Python descriptors

October 6, 2013

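If, like me, you have been puzzled by methods versus functions, by the different ways of accessing them, and by how the self argument gets passed implicitly, I suggest you read on patiently. This article also covers Python's attribute lookup strategy, so you will know exactly what Python does when it handles obj.attr and obj.attr = val.

In Python, an object's methods can also be regarded as attributes, so "attribute" below includes methods.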

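First define the following class and an instance of it; we will use both later. (The print statements and the "unbound method" output below show that these sessions come from Python 2.)

class T(object):
    name = 'name'

    def hello(self):
        print 'hello'

t = T()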

Use dir(t) to list all the valid attributes of t:

>>> dir(t)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__',
 '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__',
 '__repr__', '__setattr__', '__str__', '__weakref__', 'hello', 'name']

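Attributes fall into two groups: those Python generates automatically, such as __class__ and __hash__, and those we define ourselves, such as hello and name above. We only care about the ones we define.

Both classes and instances (in fact everything in Python is an object, and a class is an instance of type) have a __dict__ attribute that stores their user-defined attributes (for a class, it holds some other things as well).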

>>> t.__dict__
{}
>>> dict(T.__dict__)
{'__module__': '__main__', 'name': 'name',
 'hello': <function hello at 0x00CC2470>,
 '__dict__': <attribute '__dict__' of 'T' objects>,
 '__weakref__': <attribute '__weakref__' of 'T' objects>, '__doc__': None}
>>>

Instances of some built-in types, such as list and str, have no __dict__ attribute, so you cannot attach custom attributes to them.
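A small sketch of that difference. The class Plain and the variable names are made up for this illustration and are not part of the original example:

```python
class Plain(object):
    """Hypothetical user-defined class, used only for this comparison."""
    pass

p = Plain()
p.color = 'red'                 # stored in the instance's own __dict__
plain_dict = dict(p.__dict__)   # {'color': 'red'}

items = [1, 2, 3]
list_has_dict = hasattr(items, '__dict__')   # False
try:
    items.color = 'red'
    attached = True
except AttributeError:
    attached = False            # list instances reject new attributes
```

The instance of Plain picks up the new attribute in its __dict__, while the list has no __dict__ to put it in.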

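So far t.__dict__ is an empty dictionary, because we have not defined any attribute on t itself; its valid attributes hello and name both come from T, whose __dict__ contains them. So when Python meets the expression t.name, how does it find the name attribute of t?

First, Python checks whether name is an automatically generated attribute; if it is, Python finds it in a special way. Here name is not automatically generated but defined by us, so Python looks in t.__dict__ and does not find it there.

Next, Python goes to T, the class of t, and searches T.__dict__ hoping to find name. It is in luck and finds it right away, so it returns the value of name: the string 'name'. If name were not in T.__dict__ either, Python would go on to search the __dict__ of T's base classes (if T has any).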
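The steps just described can be sketched in a few lines. This is only an approximation: the helper name simple_lookup is my own, it ignores automatically generated attributes, and it knows nothing about descriptors.

```python
class T(object):
    name = 'name'

    def hello(self):
        return 'hello'

t = T()

def simple_lookup(obj, attr):
    """Simplified lookup: instance __dict__ first, then the class and its bases."""
    if attr in obj.__dict__:
        return obj.__dict__[attr]
    for klass in type(obj).__mro__:      # T first, then its base classes
        if attr in klass.__dict__:
            return klass.__dict__[attr]
    raise AttributeError(attr)

found = simple_lookup(t, 'name')          # comes from T.__dict__
t.name = 'own name'
found_after = simple_lookup(t, 'name')    # now comes from t.__dict__
```

Note that simple_lookup(t, 'hello') hands back the raw function stored in T.__dict__, which is exactly where this simplified picture stops matching what t.hello really returns.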

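This is not enough to clear up our confusion, because things are far from that simple; the steps above are a simplification.

Continuing the example: for the name attribute, T.name and T.__dict__['name'] are exactly the same.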

>>> T.name 
'name' 
>>> T.__dict__['name']
'name'
>>>

For hello, however, the situation is a little different:

>>> T.hello 
<unbound method T.hello> 
>>> T.__dict__['hello'] 
<function hello at 0x00CC2470> 
>>>

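We can see that T.hello is an unbound method, whereas T.__dict__['hello'] is a function (not a method).

Inference: a method is stored in the class __dict__ as a function (a method definition looks just like a function definition, except that the first parameter has to be self). Then T.hello should give us a function too, so how did it become an unbound method?

Now access hello through the instance t: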

>>> t.hello 
<bound method T.hello of <__main__.T object at 0x00CD0E50>> 
>>>

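This is a bound method.

Interesting. According to the lookup strategy above, since hello is a function in T.__dict__, T.hello and t.hello should both be that same function. How did it turn into a method, and into two kinds at that, unbound and bound?

The unbound/bound part is easy enough to understand. Let us assume the following: a method is meant to be called on an instance (I mean instance methods; classmethod and staticmethod come later). When it is accessed through the class, as in T.hello, hello has no connection with any instance, that is, it is not bound to one, so it is unbound. When it is accessed as t.hello, hello is associated with t, so it is bound.

But going from the function <function hello at 0x00CC2470> to the method <unbound method T.hello> really is puzzling.

All the magic comes from today's protagonist: the descriptor.

When looking up an attribute such as obj.attr, if Python finds that attr has a __get__ method, it calls that __get__ method and returns its result instead of returning attr itself (this sentence is not accurate; I just want you to get a first idea of what a descriptor is).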

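In Python, an iterator (why are we suddenly talking about iterators?) is an object that implements the iterator protocol, that is, the two methods __iter__ and next(). Similarly, a descriptor is an object that implements certain special methods: __get__, __set__ and __delete__, of which __set__ and __delete__ are optional. An iterator depends on some object (it is returned by that object's __iter__ method); a descriptor also depends on an object, as one of its attributes, and cannot act on its own. One more point: a descriptor must live in a class's __dict__. This means that only when an attribute is found in a class's __dict__ does Python check whether it has __get__ and friends; for an attribute found in an instance's __dict__, Python does not care whether it has __get__ at all and returns the attribute itself.

So what is a descriptor? Put simply, a descriptor is an attribute of an object that happens to live in the class's __dict__ and has the special method __get__ (and possibly __set__ and __delete__), which gives it some special behavior. For convenience we call such an attribute a descriptor attribute.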
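A minimal sketch of the "must live in the class __dict__" rule. The class Ten and the attribute names are invented for this illustration:

```python
class Ten(object):
    """Hypothetical descriptor whose __get__ always returns 10."""
    def __get__(self, obj, objtype=None):
        return 10

class A(object):
    in_class = Ten()                  # lives in A.__dict__

a = A()
a.__dict__['in_instance'] = Ten()     # lives in a.__dict__

from_class = a.in_class               # __get__ is called: 10
from_instance = a.in_instance         # __get__ is ignored: the Ten object itself
```

The same kind of object behaves as a descriptor in one place and as an ordinary value in the other; only where it is stored differs.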

You may still not get it, so let us use examples.

First define this class:

class Descriptor(object):
    def __get__(self, obj, type=None):
        return 'get', self, obj, type

    def __set__(self, obj, val):
        print 'set', self, obj, val

    def __delete__(self, obj):
        print 'delete', self, obj

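__set__ and __delete__ could actually be left out here, but we write them all for the sake of the explanation that follows.

Here is what the parameters of the three methods mean:

self needs no explanation: it is the current Descriptor instance. obj is the object that owns the attribute. This should not be hard to understand: as said earlier, a descriptor is a slightly special attribute of an object, and obj is the object that has it. Note that if the descriptor is accessed directly through the class (bear with me: a descriptor is an attribute, so accessing it through the class just means accessing a class attribute through the class), obj is None. type is the type of obj; as just mentioned, when the descriptor is accessed directly through the class, obj is None and type is the class itself.

What the three methods do. Suppose T is a class, t is an instance of it, d is a descriptor attribute of T (what's the big deal, it just has a __get__ method!), and value is some valid value:

Reading the attribute: T.d returns the result of d.__get__(None, T), and t.d returns the result of d.__get__(t, T).

Setting the attribute: t.d = value actually calls d.__set__(t, value), whereas T.d = value is a real assignment, after which T.d is value. Deleting works like setting: del t.d calls d.__delete__(t).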
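One way to check these rules, including the deletion case, is a descriptor that records every call it receives instead of printing. Recorder and Host are hypothetical names chosen for this sketch:

```python
class Recorder(object):
    """Hypothetical descriptor that logs every call it receives."""
    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtype=None):
        self.calls.append(('get', obj, objtype))
        return 'from __get__'

    def __set__(self, obj, value):
        self.calls.append(('set', obj, value))

    def __delete__(self, obj):
        self.calls.append(('delete', obj))

class Host(object):
    d = Recorder()

h = Host()
rec = Host.__dict__['d']   # the descriptor object itself, bypassing __get__

h.d            # rec.__get__(h, Host)
Host.d         # rec.__get__(None, Host)
h.d = 42       # rec.__set__(h, 42)
del h.d        # rec.__delete__(h)

Host.d = 'plain'                 # real assignment on the class
replaced = Host.__dict__['d']    # the descriptor has been replaced by 'plain'
```

After this runs, rec.calls holds four entries in the order get, get, set, delete, and the final class-level assignment adds nothing to the log.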

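Let us see how this actually plays out in Python.

Redefine our class T and instance t:

class T(object):
    d = Descriptor()

t = T()

d is a class attribute of T. As an instance of Descriptor it has __get__ and the other methods, so d clearly meets all the conditions: it is now a descriptor!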

>>> t.d         # t.d actually returns d.__get__(t, T)
('get', <__main__.Descriptor object at 0x00CD9450>, <__main__.T object at 0x00CD0E50>, <class '__main__.T'>)
>>> T.d         # T.d actually returns d.__get__(None, T), so obj is None
('get', <__main__.Descriptor object at 0x00CD9450>, None, <class '__main__.T'>)
>>> # Set the descriptor through the instance. The next line is not a return
>>> # value; it is printed by the print statement in __set__.
>>> t.d = 'hello'
set <__main__.Descriptor object at 0x00CD9450> <__main__.T object at 0x00CD0E50> hello
>>> t.d         # Python called __set__, and the value of t.d did not change
('get', <__main__.Descriptor object at 0x00CD9450>, <__main__.T object at 0x00CD0E50>, <class '__main__.T'>)
>>> T.d = 'hello'   # __set__ is not called
>>> T.d             # T.d really has changed
'hello'
>>> # t.d has changed too. That makes sense: by the lookup strategy described
>>> # above, t.d comes from T.__dict__, and T.__dict__['d'] is now 'hello'.
>>> t.d
'hello'

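Data descriptors and non-data descriptors

A descriptor like d above, which has both __get__ and __set__, is called a data descriptor (strictly speaking, defining __set__ or __delete__ is what makes a descriptor a data descriptor); one with only __get__ is called a non-data descriptor. Since a non-data descriptor has no __set__ method, you might guess that assigning through the instance, as in t.d = 'hello' above, no longer calls __set__ and simply makes t.d equal to 'hello'. Talk is cheap, so here is the evidence:

class Descriptor(object):
    def __get__(self, obj, type=None):
        return 'get', self, obj, type

class T(object):
    d = Descriptor()

t = T()

>>> t.d
('get', <__main__.Descriptor object at 0x00CD9550>, <__main__.T object at 0x00CD9510>, <class '__main__.T'>)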
>>> t.d = 'hello' 
>>> t.d 
'hello' 
>>>

Assigning to a non-data descriptor through the instance hides the class's non-data descriptor for that instance!
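This hiding behavior is useful in practice. As a hedged illustration (LazyAttribute and Circle are names I made up, not something from the original article), a non-data descriptor can compute a value once and then store it in the instance __dict__, so that later reads never reach the descriptor again:

```python
class LazyAttribute(object):
    """Hypothetical non-data descriptor: compute once, then let the instance hide us."""
    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                     # accessed through the class
        value = self.func(obj)
        obj.__dict__[self.name] = value     # the instance entry now hides the descriptor
        return value

class Circle(object):
    def __init__(self, radius):
        self.radius = radius
        self.computations = 0

    def area(self):
        self.computations += 1
        return 3.14159 * self.radius ** 2
    area = LazyAttribute(area)

c = Circle(2)
first = c.area     # goes through LazyAttribute.__get__
second = c.area    # found in c.__dict__, __get__ is not called
```

Here c.computations stays at 1 after both reads, which shows that the second read was served from c.__dict__.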

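It is time to come clean about the real, detailed attribute lookup strategy. For obj.attr (note that obj may be a class):

1. If attr is an attribute generated automatically by Python, it is found right away (this has very high priority!).

2. Look in obj.__class__.__dict__. If attr is there and is a data descriptor, return the result of the data descriptor's __get__ method; if not, keep looking for a data descriptor in the base classes and ancestors of obj.__class__.

3. Look in obj.__dict__. There are two cases. If obj is an ordinary instance, return the value directly if found, otherwise go to the next step. If obj is a class, search the __dict__ of obj and then of its base classes and ancestors in turn; if what is found is a descriptor, return the result of its __get__ method, otherwise return attr itself. If nothing is found, go to the next step.

4. Look in obj.__class__.__dict__ (and its base classes). If a descriptor is found (by the way, it must be a non-data descriptor; a data descriptor would already have been found in step 2), return the result of its __get__ method. If an ordinary attribute is found, return its value. If nothing is found, go to the next step.

5. Sadly, Python gives up at this point and raises AttributeError.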

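With this we can briefly see why I stressed that a descriptor has to live in a class. The steps of interest are 2, 3 and 4. Steps 2 and 4 both search the class. In step 3, if the attribute is found in an ordinary instance it is returned directly, without checking whether it has a __get__() method.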
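Steps 2 to 5 can be approximated in pure Python for the ordinary-instance case. This is a sketch under stated assumptions, not the interpreter's actual code: the helper names are mine, step 1 is left out, and the case where obj is itself a class is not handled.

```python
def find_in_classes(cls, attr):
    """Return (value, True) from the first class in the MRO that defines attr."""
    for klass in cls.__mro__:
        if attr in klass.__dict__:
            return klass.__dict__[attr], True
    return None, False

def instance_lookup(obj, attr):
    """Sketch of steps 2 to 5 for an ordinary instance (obj is not a class)."""
    cls = type(obj)
    class_attr, found = find_in_classes(cls, attr)
    has_get = found and hasattr(type(class_attr), '__get__')
    is_data = has_get and (hasattr(type(class_attr), '__set__')
                           or hasattr(type(class_attr), '__delete__'))
    if is_data:                                   # step 2: data descriptor wins
        return type(class_attr).__get__(class_attr, obj, cls)
    if attr in getattr(obj, '__dict__', {}):      # step 3: instance __dict__
        return obj.__dict__[attr]
    if has_get:                                   # step 4: non-data descriptor
        return type(class_attr).__get__(class_attr, obj, cls)
    if found:                                     # step 4: ordinary class attribute
        return class_attr
    raise AttributeError(attr)                    # step 5


class Data(object):
    def __get__(self, obj, objtype=None):
        return 'data descriptor'
    def __set__(self, obj, value):
        pass

class NonData(object):
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'

class T(object):
    data = Data()
    nondata = NonData()
    plain = 'class value'

t = T()
# Put competing entries straight into the instance __dict__.
t.__dict__.update(data='instance value', nondata='instance value')
```

With this setup, instance_lookup(t, 'data') still reaches the data descriptor, while instance_lookup(t, 'nondata') returns the value stored on the instance, matching what t.data and t.nondata give.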

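The lookup strategy for assignment, obj.attr = value:

1. Look in obj.__class__.__dict__. If attr is there and is a data descriptor, call its __set__ method and stop. If it is not there, keep searching the base classes and ancestors of obj.__class__; if a data descriptor is found, call its __set__ method. If none is found, go to the next step.

2. Store the value directly in the instance: obj.__dict__['attr'] = value.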
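The two assignment steps can be sketched the same way. The function instance_setattr and the classes Upper and NonData are hypothetical, and the sketch is simplified: it only treats an attribute as a data descriptor if it has __set__.

```python
def instance_setattr(obj, attr, value):
    """Sketch of the two assignment steps for an ordinary instance."""
    for klass in type(obj).__mro__:               # step 1: class, then ancestors
        if attr in klass.__dict__:
            class_attr = klass.__dict__[attr]
            if hasattr(type(class_attr), '__set__'):
                type(class_attr).__set__(class_attr, obj, value)
                return
            break                                 # found, but not a data descriptor
    obj.__dict__[attr] = value                    # step 2


class Upper(object):
    """Hypothetical data descriptor that stores an upper-cased copy."""
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get('_upper')

    def __set__(self, obj, value):
        obj.__dict__['_upper'] = value.upper()

class NonData(object):
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'

class T(object):
    data = Upper()
    nondata = NonData()

t = T()
instance_setattr(t, 'data', 'hello')      # routed to Upper.__set__
instance_setattr(t, 'nondata', 'hello')   # stored in t.__dict__
instance_setattr(t, 'fresh', 'hello')     # stored in t.__dict__
```

Only the first call is intercepted; the other two end up as plain entries in t.__dict__.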

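Let us also analyze why assigning to a non-data descriptor through an instance hides the class's non-data descriptor.

Continuing the non-data descriptor example above:

>>> t.__dict__
{'d': 'hello'}

The attribute d has appeared in t.__dict__. Following the assignment strategy: step 1 does find d in t.__class__.__dict__, that is T.__dict__, but it is a non-data descriptor and so does not qualify as a data descriptor; step 2 then stores the name and value directly in t.__dict__. When t.d is read, the lookup strategy runs: step 2 finds d in T.__dict__, but it is a non-data descriptor and does not qualify, so step 3 goes ahead, finds d in t.__dict__ and returns its value 'hello' directly.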

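All this talk and we still have not reached functions and methods!

Never mind, let us leave that for tomorrow.

Briefly: every function (method) has a __get__ method, and when a function sits in a class's __dict__ it is a non-data descriptor.

So far we have covered descriptors, which are a bit like setters and getters in Java. But what do descriptors have to do with the functions and methods we started out with?

Every function can act as a descriptor, because it has a __get__ method.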

>>> def hello(): pass
...
>>> dir(hello)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__',
 '__getattribute__', '__hash__', '__init__', '__module__', '__name__', '__new__',
 '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__',
 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc',
 'func_globals', 'func_name']
>>>

Note that function objects have no __set__ or __delete__ method, so they are non-data descriptors.
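A quick check of this, together with its consequence: because a function is only a non-data descriptor, an instance attribute with the same name hides the method for that instance. The variable names are chosen for this sketch.

```python
def hello():
    pass

has_get = hasattr(hello, '__get__')         # True
has_set = hasattr(hello, '__set__')         # False
has_delete = hasattr(hello, '__delete__')   # False

class T(object):
    def hello(self):
        return 'method'

t = T()
t.hello = 'shadowed'        # no __set__ on functions, so this lands in t.__dict__
via_instance = t.hello      # 'shadowed'
via_other = T().hello()     # 'method': other instances are unaffected
```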

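A method is really a function too:

>>> class T(object):
...     def hello(self): pass
...
>>> T.__dict__['hello']
<function hello at 0x00CD7EB0>
>>>

Alternatively, we can think of a method as a special function that lives in a class: when the attribute is fetched, what comes back is not the function itself (such as <function hello at 0x00CD7EB0> above) but the return value of the function's __get__ method. Continuing with the class T just defined: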

>>> T.hello
<unbound method T.hello>

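Fetching the hello attribute of T: by the lookup strategy it is found in T.__dict__, and what is found is <function hello at 0x00CD7EB0>. It is not returned as is, though; because it has a __get__ method, what is returned is the result of calling its __get__(None, T): an unbound method.

>>> # Fetch hello straight from T.__dict__: the lookup strategy is bypassed
>>> # and <function hello at 0x00CD7EB0> is returned as is.
>>> f = T.__dict__['hello']
>>> f
<function hello at 0x00CD7EB0>
>>> t = T()
>>> # Fetching through the instance returns the result of calling
>>> # __get__(t, T) on <function hello at 0x00CD7EB0>: a bound method.
>>> t.hello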
<bound method T.hello of <__main__.T object at 0x00CDAD10>>
>>>

To confirm what we said above, continue with the following code (f is still <function hello at 0x00CD7EB0> from above):

>>> f.__get__(None, T) 
<unbound method T.hello> 
>>> f.__get__(t, T) 
<bound method T.hello of <__main__.T object at 0x00CDAD10>>

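Excellent!

To sum up:

1. Every function has a __get__ method.

2. When a function sits in a class's __dict__ it can be regarded as a method; fetching it through the class or an instance returns not the function itself but the return value of its __get__ method.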
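Since the binding is done by the function's own __get__, it even works for a function that was never placed in a class. A small sketch, with greet as a made-up function; it uses __func__ and __self__, which as far as I know are available from Python 2.6 onward and correspond to the im_func and im_self names used in this article:

```python
def greet(self):
    return 'hello from %s' % type(self).__name__

class T(object):
    pass

t = T()
bound = greet.__get__(t, T)     # the same call Python makes when a method is fetched via t
result = bound()                # t is supplied as self
same_function = bound.__func__ is greet
same_instance = bound.__self__ is t
```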

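I admit I may have misled you into thinking that a method is a function, just a special one. There is in fact a difference; to be precise, a method is a method and a function is a function.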

>>> type(f)
<type 'function'>
>>> type(t.hello)
<type 'instancemethod'>
>>> type(T.hello)
<type 'instancemethod'>
>>>

A function is of type function, while a method is of type instancemethod (this is the ordinary instance method; classmethod and staticmethod are covered later).

A few more words about unbound and bound methods. In the C implementation they are the same kind of object (both are of type instancemethod). Let us first see what is inside them:

>>> dir(t.hello)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__get__',
 '__getattribute__', '__hash__', '__init__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'im_class', 'im_func',
 'im_self']

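__call__ tells us that they are callable, and we can guess that __call__ is implemented roughly like this: forward the call to another function (the one we expect, such as hello above), passing the object as the first argument.

The ones to note are im_class, im_func and im_self. They are not unfamiliar: for t.hello they stand for T, hello (the function hello stored in T.__dict__) and t respectively. With these we can roughly imagine how to implement instancemethod in pure Python :).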
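Here is one way such a pure-Python version might look. It is a rough sketch, not the real implementation: InstanceMethod and Method are hypothetical names, and Method stands in for the __get__ that real functions already have.

```python
class InstanceMethod(object):
    """Rough pure-Python stand-in for the instancemethod type (a sketch only)."""
    def __init__(self, func, instance, cls):
        self.im_func = func
        self.im_self = instance     # None means "unbound"
        self.im_class = cls

    def __call__(self, *args, **kwargs):
        if self.im_self is not None:
            # Bound: pass the stored instance as the first argument.
            return self.im_func(self.im_self, *args, **kwargs)
        # Unbound: the caller must supply an instance of im_class.
        if not args or not isinstance(args[0], self.im_class):
            raise TypeError('first argument must be an instance of %s'
                            % self.im_class.__name__)
        return self.im_func(*args, **kwargs)


class Method(object):
    """Hypothetical descriptor that mimics the __get__ of a function."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return InstanceMethod(self.func, obj, objtype)


def hello(self, name):
    return 'hello %s' % name

class T(object):
    hello = Method(hello)

t = T()
bound_result = t.hello('world')         # im_self is t
unbound_result = T.hello(t, 'world')    # im_self is None, instance passed by hand
```

The type check in the unbound branch imitates the Python 2 rule that an unbound method must be called with an instance of its class as the first argument.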

Several built-ins are also related to descriptors; here is a brief look.

classmethod

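classmethod turns a function into a class method. The implicit first argument of a class method is the class itself (for an ordinary method it is the instance). A class method can be called on the class as well as on an instance (an ordinary method needs an instance: it is called on one, or given one explicitly as the first argument).

>>> class T(object):
...     def hello(cls):
...         print 'hello', cls
...     # Does two things: converts hello into a class method and hides
...     # hello as an ordinary method.
...     hello = classmethod(hello)
...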
>>> t = T()
>>> t.hello()
hello <class '__main__.T'>
>>> T.hello()
hello <class '__main__.T'>
>>>

Note that classmethod is a class, not a function. The classmethod class has a __get__ method, so what t.hello and T.hello return above is actually the return value of classmethod's __get__ method.

>>> t.hello
<bound method type.hello of <class '__main__.T'>>
>>> type(t.hello)
<type 'instancemethod'>
>>> T.hello
<bound method type.hello of <class '__main__.T'>>
>>> type(T.hello)
<type 'instancemethod'>
>>>

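As shown above, t.hello and T.hello are of type instancemethod and are bound to T. In other words, classmethod's __get__ method returns an instancemethod object. From the earlier analysis of instancemethod we can infer that for t.hello, im_self is T, im_class is type (T is an instance of type) and im_func is the function hello: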

>>> t.hello.im_self
<class '__main__.T'>
>>> t.hello.im_class
<type 'type'>
>>> t.hello.im_func
<function hello at 0x011A40B0>
>>>

A perfect match! So implementing classmethod in pure Python is not hard either :)
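A possible pure-Python version, offered as a sketch rather than as what the built-in actually does internally. ClassMethod is a hypothetical name; types.MethodType is the standard-library constructor for bound methods, used here to bind the function to the class.

```python
import types

class ClassMethod(object):
    """Hypothetical pure-Python imitation of classmethod."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if objtype is None:
            objtype = type(obj)
        # Bind the function to the class, whichever way it was accessed.
        return types.MethodType(self.func, objtype)

class T(object):
    def hello(cls):
        return 'hello', cls
    hello = ClassMethod(hello)

t = T()
from_instance = t.hello()   # ('hello', T)
from_class = T.hello()      # ('hello', T)
```

Both calls receive T as cls, which is the behavior the session above shows for the built-in classmethod.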

staticmethod

staticmethod turns a function into a static method, which has no implicit first argument.

>>> class T(object):
...     def hello():
...         print 'hello'
...     hello = staticmethod(hello)
...
>>> T.hello()   # no implicit first argument
hello
>>> T.hello
<function hello at 0x011A4270>
>>>

T.hello returns a plain function. My guess is that the __get__ method of the staticmethod class simply returns the wrapped function itself.
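That guess is easy to try out in pure Python. StaticMethod is a hypothetical name, and this is a sketch of the idea rather than the built-in's source:

```python
class StaticMethod(object):
    """Hypothetical pure-Python imitation of staticmethod."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return self.func        # no binding at all: hand back the plain function

class T(object):
    def hello():
        return 'hello'
    hello = StaticMethod(hello)

from_class = T.hello()          # 'hello'
from_instance = T().hello()     # 'hello', the instance is not passed in
```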

#############

Reposted from:

http://hi.baidu.com/_yuan0518/blog/item/1c0a0af410e7fc36bc310935.html

Author: censor