Effective C# Item 25: Prefer Serializable Types (translation)

February 8, 2011 ⁄ General

Item 25: Prefer Serializable Types

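Persistence is a core feature of a type. It is one of those basic elements that nobody notices until you neglect to support it. If your type does not support serialization properly, you create a lot of extra work for every developer who uses it as a base class or as a member. When your type is not serializable, they have to work around it and add their own implementation of a standard feature. Developers who cannot reach your type's private members are unlikely to implement its serialization correctly. If you do not supply serialization, it is difficult or outright impossible for your users to add it.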

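Instead, add serialization to your types whenever it is practical. It should be practical for every type that does not represent a UI widget, window, or form. The perceived extra work is no excuse: .NET serialization is so simple that you have no reasonable excuse not to support it. In many cases, adding the Serializable attribute is enough: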

[Serializable]
public class MyType
{
  private string _label;
  private int _value;
}

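Adding the Serializable attribute alone is enough here because every member of the type is serializable: string and int both support .NET serialization. Why it matters to support serialization wherever possible becomes obvious when you add a field whose type is another class: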

[Serializable]
public class MyType
{
  private string      _label;
  private int         _value;
  private OtherClass  _object;
}

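The Serializable attribute works here only if OtherClass also supports serialization. If OtherClass is not serializable, you get a runtime error when you serialize MyType, because an OtherClass object is part of it. You would then have to write your own code to serialize MyType and the OtherClass object inside it, which is not possible without detailed knowledge of the internals of OtherClass.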

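.NET serialization saves all member variables of your object to the output stream. It also supports arbitrary object graphs: even if your objects contain circular references, the serialize and deserialize methods save and restore each actual object only once. The .NET serialization framework also recreates the web of references when the web of objects is deserialized. Any web of related objects that you created is restored correctly when the object graph is deserialized. One last important point: the Serializable attribute supports both binary and SOAP serialization, and every technique in this item works with both formats. But remember that this works only if every type in the object graph supports serialization. That is why it is important to support serialization in all your types. As soon as you leave out one class, you punch a hole in the object graph and make serialization harder for everyone who uses that class. Before long, they find themselves writing their own serialization code.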

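Adding the Serializable attribute is the simplest way to support serialization. But the simplest solution is not always the right one. Sometimes you do not want to serialize every member of an object: some members exist only to cache the result of a lengthy operation, and others hold runtime resources that are needed only for in-memory operations. You can handle these cases with attributes as well. Attach the NonSerialized attribute to any data member that should not be saved as part of the object's state. This marks the member as nonserialized: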

[Serializable]
public class MyType
{
  private string _label;

  [NonSerialized]
  private int _cachedValue;

  private OtherClass  _object;
}

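Nonserialized members add a little more work for you, the class designer. The serialization APIs do not initialize nonserialized members for you during deserialization. None of your type's constructors is called, so the member initializers do not run either. When you rely on the serialization attributes, nonserialized members get the system default value: 0 or null. When that default is not the right initial value, you need to implement the IDeserializationCallback interface to initialize those members. The framework calls its single method, OnDeserialization, after the entire object graph has been deserialized. You use that method to initialize every nonserialized member. Because the whole graph has been loaded, calling methods on your object or using its serialized members is safe. Unfortunately, it is not foolproof. After the graph has been loaded, the framework calls OnDeserialization on every object in the graph that implements IDeserializationCallback. Any other object in the graph may call your object's public members while it processes its own OnDeserialization. If it goes first, your nonserialized members are still null or 0. The order is not guaranteed, so you must make sure that all your public members handle the case in which nonserialized members have not yet been initialized.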

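So far, you have seen why you should add serialization to all your types: nonserializable types cause more work when they are used inside types that should be serialized. You have also seen the simplest way to serialize with attributes, including how to initialize nonserialized members.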

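Serialized data has a way of living on between versions of your program. (Translator's note: this is an important issue. .NET serialization does not let you control every byte of the data as easily as you can in C++, so versioning is a problem you run into often with serialization.) Adding serialization to a type means that one day you will need to read an older version of it. The code generated by the Serializable attribute throws exceptions when it finds that fields have been added to or removed from the object graph. When you have to support multiple versions and need more control over the serialization process, use the ISerializable interface. This interface defines the hooks that let you customize serialization for your type. The methods and storage that ISerializable uses are consistent with those of the default serialization, which means you can start with the serialization attributes and add support for ISerializable later, if it ever becomes necessary to provide your own extensions.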

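As an example, consider how you would support version 2 of MyType, in which another field has been added to the class. Simply adding a field produces a new format that is incompatible with the versions already stored on disk: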

[Serializable]
public class MyType
{
  private string _label;

  [NonSerialized]
  private int _value;

  private OtherClass  _object;

  // Added in version 2
  // The runtime throws exceptions
  // when it finds this field missing in version 1.0
  // files.
  private int  _value2;
}

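You implement the ISerializable interface to deal with this behavior. ISerializable defines one method, but you have to implement two. The interface defines GetObjectData(), which writes data to the stream. In addition, you must provide a serialization constructor that initializes the object from the stream: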

private MyType( SerializationInfo info,
  StreamingContext cntxt );

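The serialization constructor in the following class shows how to read data from a previous version and how to read the current version consistently with the default serialization produced by the Serializable attribute: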

using System.Runtime.Serialization;
using System.Security.Permissions;

[Serializable]
public sealed class MyType : ISerializable
{
  private string _label;

  [NonSerialized]
  private int _value;

  private OtherClass  _object;

  private const int DEFAULT_VALUE = 5;
  private int  _value2;

  // public constructors elided.

  // Private constructor used only by the Serialization
  // framework.
  private MyType( SerializationInfo info,
    StreamingContext cntxt )
  {
    _label = info.GetString( "_label" );
    _object = ( OtherClass )info.GetValue( "_object", typeof
      ( OtherClass ));
    try {
      _value2 = info.GetInt32( "_value2" );
    } catch ( SerializationException e )
    {
      // Found version 1.
      _value2 = DEFAULT_VALUE;
    }
  }

  [SecurityPermissionAttribute(SecurityAction.Demand,
    SerializationFormatter =true)]
  void ISerializable.GetObjectData (SerializationInfo inf,
    StreamingContext cxt)
  {
    inf.AddValue( "_label", _label );
    inf.AddValue( "_object", _object );
    inf.AddValue( "_value2", _value2 );
  }
}

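The serialization stream stores each item as a key/value pair. The code generated from the attributes uses the variable name as the key for each value. When you add the ISerializable interface, you must match the key names and the order of the variables. The order is the order in which they are declared in the class. (By the way, this means that rearranging the variables in a class or renaming them breaks compatibility with files that have already been created.)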

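Also, I have demanded the SerializationFormatter security permission. GetObjectData() could be a security hole into your class if it is not properly protected. Malicious code could create a StreamingContext, get the values from an object through GetObjectData(), serialize modified versions into another SerializationInfo, and reconstitute a modified object. That would let a malicious developer read the internal state of your object, modify it in the stream, and send the modified version back to you. Demanding the SerializationFormatter permission closes this hole. It ensures that only properly trusted code can reach the internal state of the class through this routine (see Item 47).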

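But implementing the ISerializable interface has a downside. You can see that I made MyType sealed earlier, which forces it to be a leaf class. Implementing ISerializable in a base class complicates serialization for all derived classes. It means that every derived class must create a protected constructor for deserialization. In addition, to support nonsealed classes, you must create hooks in the GetObjectData() method so that derived classes can add their own data to the stream. The compiler does not catch either of these errors. A missing constructor causes the runtime to throw an exception when a derived object is read from a stream. A GetObjectData() method without a hook means that the data from the derived portion of the object is never saved to the file, and no error is thrown. I would have liked the recommendation to be "implement Serializable in leaf classes."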

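I did not say that, because it would not work: your base classes must be serializable for the derived classes to be serializable. To turn MyType into a serializable base class, you change the serialization constructor to protected and create a virtual method that derived classes can override to store their own data: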

using System.Runtime.Serialization;
using System.Security.Permissions;

[Serializable]
public class MyType : ISerializable
{
  private string _label;

  [NonSerialized]
  private int _value;

  private OtherClass  _object;

  private const int DEFAULT_VALUE = 5;
  private int  _value2;

  // public constructors elided.

  // Protected constructor used only by the Serialization
  // framework.
  protected MyType( SerializationInfo info,
    StreamingContext cntxt )
  {
    _label = info.GetString( "_label" );
    _object = ( OtherClass )info.GetValue( "_object", typeof
      ( OtherClass ));
    try {
      _value2 = info.GetInt32( "_value2" );
    } catch ( SerializationException e )
    {
      // Found version 1.
      _value2 = DEFAULT_VALUE;
    }
  }
  [ SecurityPermissionAttribute( SecurityAction.Demand,
    SerializationFormatter =true ) ]
  void ISerializable.GetObjectData(
    SerializationInfo inf,
    StreamingContext cxt )
  {
    inf.AddValue( "_label", _label );
    inf.AddValue( "_object", _object );
    inf.AddValue( "_value2", _value2 );

    WriteObjectData( inf, cxt );
  }

  // Overridden in derived classes to write
  // derived class data:
  protected virtual void
    WriteObjectData(
    SerializationInfo inf,
    StreamingContext cxt )
  {
  }
}

A derived class should provide its own serialization constructor and override the WriteObjectData method:

[Serializable]
public class DerivedType : MyType
{
  private int _DerivedVal;

  private DerivedType ( SerializationInfo info,
    StreamingContext cntxt ) :
      base( info, cntxt )
  {
      _DerivedVal = info.GetInt32( "_DerivedVal" );
  }

  protected override void WriteObjectData(
    SerializationInfo inf,
    StreamingContext cxt )
  {
    inf.AddValue( "_DerivedVal", _DerivedVal );
  }

}

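The order in which values are written to and read from the serialization stream must be consistent. I chose to read and write the base class values first because I believe it is simpler. If your code does not serialize the entire hierarchy in exactly the same order when reading and writing, your serialization code will not work.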

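The .NET Framework provides a simple, standard algorithm for serializing your objects. If your type should be persisted, you should follow the standard implementation. If your type does not support serialization, other classes that use it cannot support serialization either. Make it as easy as possible for clients of your class: use the default serialization attributes when you can, and implement the ISerializable interface when the default attributes do not suffice.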
====================

   

Item 25: Prefer Serializable Types
Persistence is a core feature of a type. It's one of those basic elements that no one notices until you neglect to support it. If your type does not support serialization properly, you create more work for all developers who intend to use your types as a member or base class. When your type does not support serialization, they must work around it, adding their own implementation of a standard feature. It's unlikely that clients could properly implement serialization for your types without access to private details in your types. If you don't supply serialization, it's difficult or impossible for users of your class to add it.

Instead, prefer adding serialization to your types when practical. It should be practical for all types that do not represent UI widgets, windows, or forms. The extra perceived work is no excuse. .NET Serialization support is so simple that you don't have any reasonable excuse not to support it. In many cases, adding the Serializable attribute is enough:

[Serializable]
public class MyType
{
  private string _label;
  private int _value;
}

 

Adding the Serializable attribute works because all the members of this type are serializable: string and int both support .NET serialization. The reason it's important for you to support serialization wherever possible becomes obvious when you add another field of a custom type:

[Serializable]
public class MyType
{
  private string      _label;
  private int         _value;
  private OtherClass  _object;
}

 

The Serializable attribute works here only if the OtherClass type supports .NET serialization. If OtherClass is not serializable, you get a runtime error and you have to write your own code to serialize MyType and the OtherClass object inside it. That's just not possible without extensive knowledge of the internals defined in OtherClass.
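The runtime failure described above can be reproduced with a small sketch. It assumes the .NET Framework, where BinaryFormatter is available (modern .NET releases mark it obsolete). The names OpaqueHelper, Container, and NonSerializableMemberDemo are hypothetical and stand in for OtherClass and MyType:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical stand-in for OtherClass: deliberately NOT marked [Serializable].
public class OpaqueHelper
{
    public int Id = 42;
}

// Hypothetical stand-in for MyType: marked [Serializable], but it holds
// a member whose type is not serializable.
[Serializable]
public class Container
{
    public string Label = "demo";
    public OpaqueHelper Helper = new OpaqueHelper();
}

public static class NonSerializableMemberDemo
{
    // Returns true when serializing a Container fails with a
    // SerializationException, which is the runtime error the text describes.
    public static bool SerializationFails()
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            try
            {
                formatter.Serialize(stream, new Container());
                return false;
            }
            catch (SerializationException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(SerializationFails());
    }
}
```

Note that the compiler accepts Container without complaint; the problem surfaces only when an instance is actually serialized.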

.NET serialization saves all member variables in your object to the output stream. In addition, the .NET serialization code supports arbitrary object graphs: Even if you have circular references in your objects, the serialize and deserialize methods will save and restore each actual object only once. The .NET Serialization Framework also will recreate the web of references when the web of objects is deserialized. Any web of related objects that you have created is restored correctly when the object graph is deserialized. A last important note is that the Serializable attribute supports both binary and SOAP serialization. All the techniques in this item will support both serialization formats. But remember that this works only if all the types in an object graph support serialization. That's why it's important to support serialization in all your types. As soon as you leave out one class, you create a hole in the object graph that makes it harder for anyone using your types to support serialization easily. Before long, everyone is writing their own serialization code again.
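As a rough illustration of the object-graph behavior, the following sketch round-trips a two-node cycle through a MemoryStream and checks that the cycle is rebuilt in the copy. It assumes the .NET Framework with BinaryFormatter available; Node and ObjectGraphDemo are hypothetical names introduced only for this example:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical node type used to build a circular object graph.
[Serializable]
public class Node
{
    public string Name;
    public Node Next;
}

public static class ObjectGraphDemo
{
    // Serializes the graph reachable from root and reads it back.
    public static Node RoundTrip(Node root)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, root);
            stream.Position = 0;
            return (Node)formatter.Deserialize(stream);
        }
    }

    // Builds a -> b -> a, round-trips it, and reports whether the copy
    // is a distinct object whose cycle points back to the copy itself.
    public static bool CycleSurvives()
    {
        var a = new Node { Name = "a" };
        var b = new Node { Name = "b", Next = a };
        a.Next = b;

        Node copy = RoundTrip(a);
        return !ReferenceEquals(copy, a)
            && copy.Next.Name == "b"
            && ReferenceEquals(copy.Next.Next, copy);
    }

    public static void Main()
    {
        Console.WriteLine(CycleSurvives());
    }
}
```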

Adding the Serializable attribute is the simplest technique to support serializable objects. But the simplest solution is not always the right solution. Sometimes, you do not want to serialize all the members of an object: Some members might exist only to cache the result of a lengthy operation. Other members might hold on to runtime resources that are needed only for in-memory operations. You can manage these possibilities using attributes as well. Attach the [NonSerialized] attribute to any of the data members that should not be saved as part of the object state. This marks them as nonserialized members:

[Serializable]
public class MyType
{
  private string _label;

  [NonSerialized]
  private int _cachedValue;

  private OtherClass  _object;
}

 

Nonserialized members add a little more work for you, the class designer. The serialization APIs do not initialize nonserialized members for you during the deserialization process. None of your types' constructors is called, so the member initializers are not executed, either. When you use the serializable attributes, nonserialized members get the default system-initialized value: 0 or null. When the default 0 initialization is not right, you need to implement the IDeserializationCallback interface to initialize these nonserializable members. IDeserializationCallback contains one method: OnDeserialization. The framework calls this method after the entire object graph has been deserialized. You use this method to initialize any nonserialized members in your object. Because the entire object graph has been read, you know that any function you might want to call on your object or any of its serialized members is safe. Unfortunately, it's not fool-proof. After the entire object graph has been read, the framework calls OnDeserialization on every object in the graph that supports the IDeserializationCallback interface. Any other objects in the object graph can call your object's public members when processing OnDeserialization. If they go first, your object's nonserialized members are null, or 0. Order is not guaranteed, so you must ensure that all your public methods handle the case in which nonserialized members have not been initialized.
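One way to sketch this is a type that rebuilds a cached value in OnDeserialization and also guards its public property, so that a caller who arrives before the callback still gets a valid result. This is only an illustration under stated assumptions: it targets the .NET Framework with BinaryFormatter, and the names Measurement, EnsureCache, and CallbackDemo are hypothetical:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical type with a cached, nonserialized member.
[Serializable]
public class Measurement : IDeserializationCallback
{
    private double[] _samples;

    [NonSerialized]
    private double _cachedSum;   // derived data, not written to the stream

    [NonSerialized]
    private bool _cacheValid;    // false (the default) right after deserialization

    public Measurement(double[] samples)
    {
        _samples = samples;
        EnsureCache();
    }

    // Public members call EnsureCache, so they behave correctly even if
    // another object uses them before OnDeserialization has run here.
    public double Sum
    {
        get { EnsureCache(); return _cachedSum; }
    }

    private void EnsureCache()
    {
        if (_cacheValid) return;
        double total = 0;
        foreach (double sample in _samples) total += sample;
        _cachedSum = total;
        _cacheValid = true;
    }

    // Called by the framework after the whole object graph has been read.
    void IDeserializationCallback.OnDeserialization(object sender)
    {
        EnsureCache();
    }
}

public static class CallbackDemo
{
    public static Measurement RoundTrip(Measurement source)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (Measurement)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        Measurement restored = RoundTrip(new Measurement(new double[] { 1.0, 2.0, 3.0 }));
        Console.WriteLine(restored.Sum);
    }
}
```

The validity flag is a design choice for this sketch: it relies on the same default-value behavior the text describes, since a nonserialized bool comes back as false.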

So far, you've learned about why you should add serialization to all your types: Nonserializable types cause more work when used in types that should be serialized. You've learned about the simplest serialization methods using attributes, including how to initialize nonserialized members.

Serialized data has a way of living on between versions of your program. Adding serialization to your types means that one day you will need to read an older version. The code generated by the Serializable attribute throws exceptions when it finds fields that have been added or removed from the object graph. When you find yourself ready to support multiple versions and you need more control over the serialization process, use the ISerializable interface. This interface defines the hooks for you to customize the serialization of your types. The methods and storage that the ISerializable interface uses are consistent with the methods and storage that the default serialization methods use. That means you can use the serialization attributes when you create a class. If it ever becomes necessary to provide your own extensions, you then add support for the ISerializable interface.

As an example, consider how you would support MyType, version 2, when you add another field to your type. Simply adding a new field produces a new format that is incompatible with the previously stored versions on disk:

[Serializable]
public class MyType
{
  private string _label;

  [NonSerialized]
  private int _value;

  private OtherClass  _object;

  // Added in version 2
  // The runtime throws exceptions
  // when it finds this field missing in version 1.0
  // files.
  private int  _value2;
}

 

You add support for ISerializable to address this behavior. The ISerializable interface defines one method, but you have to implement two. ISerializable defines the GetObjectData() method that is used to write data to a stream. In addition, you must provide a serialization constructor to initialize the object from the stream:

private MyType( SerializationInfo info,
  StreamingContext cntxt );

 

The serialization constructor in the following class shows how to read a previous version of the type and read the current version consistently with the default implementation generated by adding the Serializable attribute:

using System.Runtime.Serialization;
using System.Security.Permissions;

[Serializable]
public sealed class MyType : ISerializable
{
  private string _label;

  [NonSerialized]
  private int _value;

  private OtherClass  _object;

  private const int DEFAULT_VALUE = 5;
  private int  _value2;

  // public constructors elided.

  // Private constructor used only by the Serialization
  // framework.
  private MyType( SerializationInfo info,
    StreamingContext cntxt )
  {
    _label = info.GetString( "_label" );
    _object = ( OtherClass )info.GetValue( "_object", typeof
      ( OtherClass ));
    try {
      _value2 = info.GetInt32( "_value2" );
    } catch ( SerializationException e )
    {
      // Found version 1.
      _value2 = DEFAULT_VALUE;
    }
  }

  [SecurityPermissionAttribute(SecurityAction.Demand,
    SerializationFormatter =true)]
  void ISerializable.GetObjectData (SerializationInfo inf,
    StreamingContext cxt)
  {
    inf.AddValue( "_label", _label );
    inf.AddValue( "_object", _object );
    inf.AddValue( "_value2", _value2 );
  }
}

 

The serialization stream stores each item as a key/value pair. The code generated from the attributes uses the variable name as the key for each value. When you add the ISerializable interface, you must match the key name and the order of the variables. The order is the order declared in the class. (By the way, this fact means that rearranging the order of variables in a class or renaming variables breaks the compatibility with files already created.)
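Because the stream is a set of named entries, an alternative to the try/catch in the serialization constructor is to check whether a key is present before reading it. The sketch below is not the author's method; SerializationInfoHelper, HasKey, and GetInt32OrDefault are hypothetical helpers built on the enumerator that SerializationInfo exposes:

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical helpers for version-tolerant reads from a SerializationInfo.
public static class SerializationInfoHelper
{
    // Walks the stored key/value entries and reports whether the key exists.
    public static bool HasKey(SerializationInfo info, string name)
    {
        foreach (SerializationEntry entry in info)
        {
            if (entry.Name == name) return true;
        }
        return false;
    }

    // Reads an int if the key was written, otherwise returns the fallback.
    // This avoids using an exception to detect an older version.
    public static int GetInt32OrDefault(SerializationInfo info, string name, int fallback)
    {
        return HasKey(info, name) ? info.GetInt32(name) : fallback;
    }
}

public static class VersionCheckDemo
{
    public static void Main()
    {
        // Simulates a "version 1" payload that has no _value2 entry.
        var info = new SerializationInfo(typeof(object), new FormatterConverter());
        info.AddValue("_label", "old file");

        Console.WriteLine(SerializationInfoHelper.GetInt32OrDefault(info, "_value2", 5));
    }
}
```

Inside a serialization constructor, the same call could replace the try/catch block around info.GetInt32( "_value2" ), with DEFAULT_VALUE as the fallback.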

Also, I have demanded the SerializationFormatter security permission. GetObjectData could be a security hole into your class if it is not properly protected. Malicious code could create a StreamingContext, get the values from an object using GetObjectData, serialize modified versions to another SerializationInfo, and reconstitute a modified object. It would allow a malicious developer to access the internal state of your object, modify it in the stream, and send the changes back to you. Demanding the SerializationFormatter permission seals this potential hole. It ensures that only properly trusted code can access this routine to get at the internal state of the object (see Item 47).

But there's a downside to implementing the ISerializable interface. You can see that I made MyType sealed earlier. That forces it to be a leaf class. Implementing the ISerializable interface in a base class complicates serialization for all derived classes. Implementing ISerializable means that every derived class must create the protected constructor for deserialization. In addition, to support nonsealed classes, you need to create hooks in the GetObjectData method for derived classes to add their own data to the stream. The compiler does not catch either of these errors. The lack of a proper constructor causes the runtime to throw an exception when reading a derived object from a stream. The lack of a hook for GetObjectData() means that the data from the derived portion of the object never gets saved to the file. No errors are thrown. I'd like the recommendation to be "implement Serializable in leaf classes."

I did not say that because that won't work. Your base classes must be serializable for the derived classes to be serializable. To modify MyType so that it can be a serializable base class, you change the serializable constructor to protected and create a virtual method that derived classes can override to store their data:

using System.Runtime.Serialization;
using System.Security.Permissions;

[Serializable]
public class MyType : ISerializable
{
  private string _label;

  [NonSerialized]
  private int _value;

  private OtherClass  _object;

  private const int DEFAULT_VALUE = 5;
  private int  _value2;

  // public constructors elided.

  // Protected constructor used only by the Serialization
  // framework.
  protected MyType( SerializationInfo info,
    StreamingContext cntxt )
  {
    _label = info.GetString( "_label" );
    _object = ( OtherClass )info.GetValue( "_object", typeof
      ( OtherClass ));
    try {
      _value2 = info.GetInt32( "_value2" );
    } catch ( SerializationException e )
    {
      // Found version 1.
      _value2 = DEFAULT_VALUE;
    }
  }
  [ SecurityPermissionAttribute( SecurityAction.Demand,
    SerializationFormatter =true ) ]
  void ISerializable.GetObjectData(
    SerializationInfo inf,
    StreamingContext cxt )
  {
    inf.AddValue( "_label", _label );
    inf.AddValue( "_object", _object );
    inf.AddValue( "_value2", _value2 );

    WriteObjectData( inf, cxt );
  }

  // Overridden in derived classes to write
  // derived class data:
  protected virtual void
    WriteObjectData(
    SerializationInfo inf,
    StreamingContext cxt )
  {
  }
}

 

A derived class would provide its own serialization constructor and override the WriteObjectData method:

[Serializable]
public class DerivedType : MyType
{
  private int _DerivedVal;

  private DerivedType ( SerializationInfo info,
    StreamingContext cntxt ) :
      base( info, cntxt )
  {
      _DerivedVal = info.GetInt32( "_DerivedVal" );
  }

  protected override void WriteObjectData(
    SerializationInfo inf,
    StreamingContext cxt )
  {
    inf.AddValue( "_DerivedVal", _DerivedVal );
  }

}

 

The order of writing and retrieving values from the serialization stream must be consistent. I've chosen to read and write the base class values first because I believe it is simpler. If your read and write code does not serialize the entire hierarchy in the exact same order, your serialization code won't work.
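To see the base and derived pieces work together, here is a compact round-trip sketch. It assumes the .NET Framework with BinaryFormatter, leaves out the security demand for brevity, and uses hypothetical names (BaseRecord, DerivedRecord, HierarchyDemo) in place of MyType and DerivedType:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical serializable base class with a hook for derived data.
[Serializable]
public class BaseRecord : ISerializable
{
    private string _label;

    public BaseRecord(string label) { _label = label; }

    // Protected so that derived serialization constructors can chain to it.
    protected BaseRecord(SerializationInfo info, StreamingContext context)
    {
        _label = info.GetString("_label");
    }

    public string Label { get { return _label; } }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("_label", _label);   // base data first
        WriteObjectData(info, context);    // then whatever derived classes add
    }

    protected virtual void WriteObjectData(SerializationInfo info, StreamingContext context)
    {
    }
}

// The attribute is not inherited, so the derived class needs its own.
[Serializable]
public class DerivedRecord : BaseRecord
{
    private int _derivedVal;

    public DerivedRecord(string label, int derivedVal) : base(label)
    {
        _derivedVal = derivedVal;
    }

    private DerivedRecord(SerializationInfo info, StreamingContext context)
        : base(info, context)
    {
        _derivedVal = info.GetInt32("_derivedVal");
    }

    public int DerivedVal { get { return _derivedVal; } }

    protected override void WriteObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("_derivedVal", _derivedVal);
    }
}

public static class HierarchyDemo
{
    public static DerivedRecord RoundTrip(DerivedRecord source)
    {
        var formatter = new BinaryFormatter();
        using (var stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (DerivedRecord)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        DerivedRecord copy = RoundTrip(new DerivedRecord("sample", 17));
        Console.WriteLine(copy.Label + " " + copy.DerivedVal);
    }
}
```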

The .NET Framework provides a simple, standard algorithm for serializing your objects. If your type should be persisted, you should follow the standard implementation. If you don't support serialization in your types, other classes that use your type can't support serialization, either. Make it as easy as possible for clients of your class. Use the default methods when you can, and implement the ISerializable interface when the default attributes don't suffice.
 
