Data Serialization

Updated: 2021-11-30

Data can be saved to disk or sent from one application to another over a network. Its on-disk or on-the-wire format can differ from its in-memory representation.

  • Serialization: the process of encoding structured in-memory data into a format that can be stored on disk or sent over a network.
  • Deserialization: the reverse process of decoding data read from disk or the network back into in-memory structures.
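As a minimal sketch of the round trip described above, the following uses Python's standard json module. The record and its field names are hypothetical and only serve as an illustration.

```python
import json

# Hypothetical in-memory record, used only for illustration.
record = {"id": 42, "name": "Ada", "tags": ["x", "y"]}

# Serialization: in-memory object -> bytes that can be written to disk
# or sent over a network.
encoded = json.dumps(record).encode("utf-8")

# Deserialization: bytes -> an equivalent in-memory object.
decoded = json.loads(encoded.decode("utf-8"))
```

The decoded object is equal to the original, but it is a new object in memory: only the encoded bytes ever cross the disk or network boundary.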

Text Formats

  • Pro: human-readable.
  • Con: inefficient in both storage space and parse time.

Binary Formats

  • Pro: more compact and faster to process than text formats.
  • Con: not human-readable.
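The trade-off between the two families can be sketched by encoding the same data both ways. This is an illustration under assumptions, not a benchmark: the sample readings, the field names, and the fixed 12-byte binary layout are all hypothetical choices, and actual savings depend on the data and the format.

```python
import json
import struct

# Hypothetical sample data: 1000 (sensor_id, temperature) readings.
readings = [(i, 20.0 + i * 0.25) for i in range(1000)]

# Text encoding: a JSON array of objects with named fields.
text_bytes = json.dumps(
    [{"id": i, "temp": t} for i, t in readings]
).encode("utf-8")

# Binary encoding: each reading packed as a little-endian unsigned
# 32-bit int followed by a 64-bit float (12 bytes, no padding).
record_struct = struct.Struct("<Id")
binary_bytes = b"".join(record_struct.pack(i, t) for i, t in readings)

# Decoding the binary form requires knowing the layout in advance,
# which is why it is not human-readable.
decoded = [
    record_struct.unpack_from(binary_bytes, offset)
    for offset in range(0, len(binary_bytes), record_struct.size)
]

print(len(text_bytes), len(binary_bytes))
```

For this sample the JSON output is larger than the packed output, mainly because field names and punctuation are repeated in every record, while the binary bytes cannot be interpreted without the layout.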

The most notable over-the-wire formats are Protocol Buffers (ProtoBuf), Thrift, and Avro. For storage, columnar formats such as Parquet and ORC are gaining popularity.

For more info about ProtoBuf/Thrift/Avro, check the API page.


SequenceFile

  • A Hadoop binary file format used to store key-value pairs.
  • Commonly used in Hadoop as an input and output file format. MapReduce also uses SequenceFiles to store the temporary output of map functions.
  • Comes in three formats:
    • Uncompressed: neither keys nor values are compressed.
    • Record Compressed: only the value in each record is compressed.
    • Block Compressed: keys and values are collected into blocks and compressed separately.
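The difference between record and block compression can be sketched with a toy model. This is not the actual SequenceFile on-disk layout or the Hadoop API: the sample records, the zlib codec, the newline separator, and the block size of 50 are assumptions chosen only to show why compressing many records together tends to be more compact.

```python
import zlib

# Hypothetical key-value records; keys and values are raw bytes.
records = [
    (f"user-{i:04d}".encode(), b'{"status": "active", "score": 100}')
    for i in range(200)
]

def record_compressed(records):
    # Only the value of each record is compressed; keys stay as-is.
    return [(key, zlib.compress(value)) for key, value in records]

def block_compressed(records, block_size=50):
    # Keys and values of each block are gathered separately, and each
    # group is compressed as a whole. A newline separator is a toy
    # choice that only works because the sample data has no newlines.
    blocks = []
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        keys = b"\n".join(key for key, _ in chunk)
        values = b"\n".join(value for _, value in chunk)
        blocks.append((zlib.compress(keys), zlib.compress(values)))
    return blocks

def total_size(pairs):
    # Sum of the byte lengths of both elements of every pair.
    return sum(len(first) + len(second) for first, second in pairs)

print(total_size(records))
print(total_size(record_compressed(records)))
print(total_size(block_compressed(records)))
```

With small, repetitive values like these, compressing each value on its own gains little because every value carries its own compression overhead, while block compression can exploit redundancy across records.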