C/C++: Should serialization methods be class members?


Keywords:c++ 


Question: 

Suppose we have a complex (i.e. non-primitive) class ComplexObject defined below:

class A{...};

class B{...};

class C{...};

class ComplexObject
{
private:
    A _fieldA;
    B _fieldB;
    C _fieldC;
};

I would like to implement a serializer that serializes instances of ComplexObject into binary form. From my experience in C#, I have seen essentially 3 distinct ways to implement a serializer.

  1. Define a serialize(binarystream&) method in ComplexObject's definition and those of the "child" classes, A, B, and C. The serialize method defined in ComplexObject will recursively call those of the child members.
  2. Create a separate class that contains methods to serialize each of ComplexObject, A, B, and C. The method used to serialize ComplexObject will recursively call those of the child members. Of course, getters will have to be defined in the classes to retrieve private fields for the serializer.
  3. Use reflection to generate a template of the object and to write all serializable fields into a table according to the generated template.

Unfortunately, I believe reflection would be extremely hard to use in C++, so I will stay away from the third option. I have seen both options 1 and 2 used very often (in C#).

An advantage that option 1 has over option 2 is that it supports classes derived from ComplexObject: the serialize(binarystream&) method can be marked virtual. However, it adds to the object's list of member functions and may confuse programmers. You don't see a serialize method defined in std::string, do you?
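
A minimal sketch of option 1 with a derived class. The `binarystream` type below is a hypothetical stand-in (a thin wrapper over a byte vector), `DerivedObject` and `serialized_size` are made-up names, and a single integer field replaces the elided A, B and C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the question's binarystream: appends raw bytes.
class binarystream
{
public:
    void write(const void* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        const auto* bytes = static_cast<const std::uint8_t*>(data);
        _bytes.insert(_bytes.end(), bytes, bytes + size);
    }
    std::size_t size() const { return _bytes.size(); }
private:
    std::vector<std::uint8_t> _bytes;
};

class ComplexObject
{
public:
    explicit ComplexObject(std::int32_t a) : _fieldA(a) {}
    virtual ~ComplexObject() = default;

    // Option 1: the object writes its own private state.
    virtual void serialize(binarystream& out) const
    {
        out.write(&_fieldA, sizeof _fieldA);
    }
private:
    std::int32_t _fieldA; // stands in for the A, B and C members
};

class DerivedObject : public ComplexObject
{
public:
    DerivedObject(std::int32_t a, double extra) : ComplexObject(a), _extra(extra) {}

    // Serialize the base part first, then the fields added here.
    void serialize(binarystream& out) const override
    {
        ComplexObject::serialize(out);
        out.write(&_extra, sizeof _extra);
    }
private:
    double _extra;
};

// Works through a base reference; the dynamic type decides what is written.
std::size_t serialized_size(const ComplexObject& obj)
{
    binarystream out;
    obj.serialize(out);
    return out.size();
}
```

Calling `serialized_size` on a `DerivedObject` through a `const ComplexObject&` writes both the base field and the extra field, which is the behavior the virtual call is meant to provide.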

On the other hand, option 2 pulls all serialization methods out and groups them together, which makes things a bit neater. However, I suppose it does not accommodate classes derived from ComplexObject as easily.
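
A minimal sketch of option 2, assuming `std::ostream` as the binary sink instead of a custom binarystream. The `Serializer` class, the getters and the `to_bytes` helper are hypothetical names, and A is reduced to a single integer for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

class A
{
public:
    explicit A(std::int32_t value) : _value(value) {}
    std::int32_t value() const { return _value; } // getter needed by the serializer
private:
    std::int32_t _value;
};

class ComplexObject
{
public:
    explicit ComplexObject(A a) : _fieldA(a) {}
    const A& fieldA() const { return _fieldA; }
private:
    A _fieldA;
};

// Option 2: all serialization code lives outside the data classes.
class Serializer
{
public:
    explicit Serializer(std::ostream& out) : _out(out) {}

    void serialize(const A& a)
    {
        const std::int32_t v = a.value();
        _out.write(reinterpret_cast<const char*>(&v), sizeof v);
    }

    void serialize(const ComplexObject& obj)
    {
        serialize(obj.fieldA()); // recurse into the child member
    }
private:
    std::ostream& _out;
};

// Convenience helper for the usage example: returns the raw bytes.
std::string to_bytes(const ComplexObject& obj)
{
    std::ostringstream out(std::ios::binary);
    Serializer serializer(out);
    serializer.serialize(obj);
    assert(out.good()); // writing to an in-memory stream should not fail
    return out.str();
}
```

Overload resolution here happens on the static type, so a class derived from ComplexObject passed by base reference would be written as a plain ComplexObject. That is the limitation with derived classes mentioned above.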

Under which circumstances should each of the options (1 and 2) be used?


3 Answers: 

I would not bother with a self-made serializer. (Note that designing deserialization is harder than serialization...) I would rather use something like:

or Boost (you can also check how they solved a similar problem).

Getting back to your dilemma: grouping all serialization code in a single class is a bad idea, because this class would grow with each new serializable object. You could use a friend "serializer" class for each "serializable" class, or use a friend method / operator<<. But there is no perfect solution, and it is not an easy task. If you can, use a library.
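
A minimal sketch of the friend operator<< variant, assuming `std::ostream` as the sink and a single integer per child class. The field layout and the `to_bytes` helper are hypothetical, chosen only to keep the example short.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

class A
{
public:
    explicit A(std::int32_t value) : _value(value) {}
private:
    std::int32_t _value;

    // The friend declaration grants access without adding public getters.
    friend std::ostream& operator<<(std::ostream& out, const A& a)
    {
        return out.write(reinterpret_cast<const char*>(&a._value), sizeof a._value);
    }
};

class ComplexObject
{
public:
    ComplexObject(A a, A b) : _fieldA(a), _fieldB(b) {}
private:
    A _fieldA;
    A _fieldB;

    friend std::ostream& operator<<(std::ostream& out, const ComplexObject& obj)
    {
        return out << obj._fieldA << obj._fieldB; // each child writes itself
    }
};

// Convenience helper for the usage example: returns the raw bytes.
std::string to_bytes(const ComplexObject& obj)
{
    std::ostringstream out(std::ios::binary);
    out << obj;
    assert(out.good()); // writing to an in-memory stream should not fail
    return out.str();
}
```

Each class keeps its serialization code next to its own definition, so no central class has to grow. Note that operator<< on a stream conventionally means formatted text output, so using it for raw bytes may surprise readers; a named friend function is a reasonable alternative.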



I choose "both". Serialization has components in the object and in (templated) free-standing functions.

For example:

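#include <cstdint>
#include <cstring>

class Serialization_Interface
{
  public:
    virtual ~Serialization_Interface() = default;
    // Reads this object's fields from the buffer and advances the pointer.
    virtual void load_from_buffer(uint8_t*& buffer_pointer) = 0;
};

// Overload for a primitive type: copy the bytes out and advance the pointer.
void Load_From_Buffer(unsigned int& number, uint8_t*& buffer_pointer)
{
  // memcpy avoids the unaligned read that a pointer cast could cause.
  std::memcpy(&number, buffer_pointer, sizeof(unsigned int));
  buffer_pointer += sizeof(unsigned int);
}

// Generic fallback: delegate to the object's own member function.
template <class Object>
void Load_From_Buffer(Object& obj, uint8_t*& buffer_pointer)
{
  obj.load_from_buffer(buffer_pointer);
}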
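
To see how the two overloads cooperate, the sketch below repeats the declarations above and adds a usage example. The `Point` class and the `load_example_point` function are hypothetical, and `std::memcpy` is used for the primitive read as an assumption made here for alignment safety.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

class Serialization_Interface
{
  public:
    virtual ~Serialization_Interface() = default;
    virtual void load_from_buffer(uint8_t*& buffer_pointer) = 0;
};

void Load_From_Buffer(unsigned int& number, uint8_t*& buffer_pointer)
{
  std::memcpy(&number, buffer_pointer, sizeof(unsigned int));
  buffer_pointer += sizeof(unsigned int);
}

template <class Object>
void Load_From_Buffer(Object& obj, uint8_t*& buffer_pointer)
{
  obj.load_from_buffer(buffer_pointer);
}

// Hypothetical class, not from the answer: two unsigned fields.
class Point : public Serialization_Interface
{
  public:
    void load_from_buffer(uint8_t*& buffer_pointer) override
    {
      Load_From_Buffer(x, buffer_pointer); // picks the unsigned int overload
      Load_From_Buffer(y, buffer_pointer);
    }
    unsigned int x = 0;
    unsigned int y = 0;
};

// Builds a buffer holding two values, then loads a Point from it.
Point load_example_point()
{
  const unsigned int values[2] = {7u, 9u};
  uint8_t buffer[sizeof values];
  std::memcpy(buffer, values, sizeof values);

  uint8_t* cursor = buffer;
  Point p;
  Load_From_Buffer(p, cursor); // picks the template, which calls the member
  assert(cursor == buffer + sizeof values); // both fields were consumed
  return p;
}
```

The caller always writes `Load_From_Buffer(thing, cursor)`; overload resolution prefers the non-template function for `unsigned int` and falls back to the template, and thus to the member function, for class types.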

Don't limit yourself to two choices. There's always a third alternative. :-)

Also, don't reinvent the wheel; check out Boost.Serialization.



C++ doesn't have reflection, but that doesn't mean serialization code needs to be written by hand.

You can use a code generator (for example, protocol buffers) to create the serialization code from a simple description. Of course, that description format doesn't support the rich C++ features you would use for your public API, but you can take the data structure type created by the code generator and embed it inside your "real" class, either directly or via pimpl. That way you write all non-serialization behavior in your class, but the class holds no data of its own; it relies on the serialization object to store the data.

It's basically like your method #2, but with inversion of control applied. The serializer logic doesn't reach into your class to access the data; instead, it becomes responsible for storing the data where your class can also use it.
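
A minimal sketch of that arrangement. `ComplexObjectData` is a hand-written stand-in for what a code generator would emit; its fields and its `serialize`/`deserialize` functions are assumptions for illustration and do not mirror the API of protocol buffers or any other real generator.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for a generated type: owns the data and knows how to (de)serialize it.
struct ComplexObjectData
{
    std::int32_t id = 0;
    std::int32_t count = 0;

    std::vector<std::uint8_t> serialize() const
    {
        std::vector<std::uint8_t> bytes(sizeof id + sizeof count);
        std::memcpy(bytes.data(), &id, sizeof id);
        std::memcpy(bytes.data() + sizeof id, &count, sizeof count);
        return bytes;
    }

    static ComplexObjectData deserialize(const std::vector<std::uint8_t>& bytes)
    {
        assert(bytes.size() == sizeof(std::int32_t) * 2);
        ComplexObjectData data;
        std::memcpy(&data.id, bytes.data(), sizeof data.id);
        std::memcpy(&data.count, bytes.data() + sizeof data.id, sizeof data.count);
        return data;
    }
};

// The "real" class: behavior only, all state lives in the embedded data object.
class ComplexObject
{
public:
    ComplexObject() = default;
    explicit ComplexObject(ComplexObjectData data) : _data(data) {}

    void increment() { ++_data.count; }
    std::int32_t count() const { return _data.count; }

    // Hands out the data object so it can be serialized as a whole.
    const ComplexObjectData& data() const { return _data; }
private:
    ComplexObjectData _data;
};
```

Saving is `obj.data().serialize()` and loading is `ComplexObject(ComplexObjectData::deserialize(bytes))`. ComplexObject itself contains no serialization code, and no per-field getters are needed.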