Comparison of data serialization formats

This is a comparison of data serialization formats: the various ways of converting complex objects into sequences of bits. It does not include markup languages used exclusively as document file formats.

Overview

| Name | Creator/Maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Includes reference support?[e] | Schema/IDL? | Standard APIs |
|---|---|---|---|---|---|---|---|---|---|
| ASN.1 | ISO, IEC, ITU-T | N/A | Yes | ISO/IEC 8824; X.680 series of ITU-T Recommendations | Yes (BER, DER, PER, or custom via ECN) | Yes (XER, GSER, or custom via ECN) | Partial[f] | Yes (built-in) | N/A |
| Bencode | BitTorrent, Inc. | N/A | Yes | Part of BitTorrent protocol specification | Partially (numbers are ASCII-based) | No | No | No | No |
| BSON | MongoDB | JSON | Yes | BSON Specification | Yes | No | No | No | No |
| Comma-separated values (CSV) | RFC author: Yakov Shafranovich | N/A | Partial (myriad informal variants used) | RFC 4180 (among others) | No | Yes | No | No | No |
| D-Bus Message Protocol | freedesktop.org | N/A | Yes | D-Bus Specification | Yes | Yes (Type Signatures) | No | No | Yes (see D-Bus) |
| JSON | Douglas Crockford | JavaScript syntax | Yes | RFC 4627 | No, but see BSON | Yes | No | Partial (Kwalify, Rx, JSON Schema Proposal) | No |
| MessagePack | Sadayuki Furuhashi | JSON (loosely) | Yes | MessagePack format specification | Yes | No | No | No | No |
| Netstrings | Dan Bernstein | N/A | Yes | netstrings.txt | Yes | Yes | No | No | No |
| OGDL | Rolf Veen | ? | Yes | 1.0 Working draft | Yes (Binary 1.0 Working draft) | Yes | Yes (Path 1.0 Working draft) | Yes (Schema WD) | |
| Property list | NeXT (creator), Apple (maintainer) | ? | Partial | Public DTD for XML format | Yes[a] | Yes[b] | No | ? | Cocoa, CoreFoundation, OpenStep, GnuStep |
| Protocol Buffers | Google | N/A | Partial | Developer Guide: Encoding | Yes | Partial[d] | No | Yes (built-in) | |
| S-expressions | Internet Draft author: Ron Rivest | Lisp, Netstrings | Partial (largely de facto) | "S-Expressions" Internet Draft | Yes ("Canonical representation") | Yes ("Advanced transport representation") | No | No | |
| Structured Data eXchange Formats | IETF | N/A | Yes | RFC 3072 | Yes | No | No | No | |
| Thrift | Facebook (creator), Apache (maintainer) | N/A | No | Original whitepaper | Yes | Partial[c] | No | Yes (built-in) | |
| eXternal Data Representation | IETF | N/A | Yes | RFC 4506 | Yes | No | Yes | Yes | Yes |
| XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition), 1.1 (Second Edition) | Partial (Binary XML) | Yes | Yes (XPointer, XPath) | Yes (XML schema) | DOM, SAX, XQuery, XPath |
| YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | XML, C, Python, Perl, Email | Yes | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) | No |
  • a. The current default format is binary.
  • b. The "classic" format is plain text, and an XML format is also supported.
  • c. Theoretically possible due to abstraction, but no implementation is included.
  • d. The primary format is binary, but a text format is available.[1]
  • e. Indicates that generic tools and libraries can encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but nothing more. Custom, non-standardized referencing techniques are excluded.
  • f. ASN.1 does offer OIDs, a standard format for globally unique identifiers. However, there is no standard for "marking" or "tagging" an arbitrary piece of data in a document with an OID, and no standard format for locally unique identifiers within a document. A generic ASN.1 tool or library therefore cannot automatically encode, decode, or resolve references within a document without help from custom-written program code.
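
As an illustration of what "No" in the reference-support column means in practice, here is a minimal sketch using Python's standard `json` module. The sample data and variable names are hypothetical. It only shows that a format without reference support writes a shared sub-object out twice, so the sharing is lost on a round trip.

```python
import json

# Hypothetical sample data: two fields refer to the same list object.
shared = [1, 2, 3]
doc = {"a": shared, "b": shared}

# JSON has no way to say "b is the same object as a",
# so the list is simply serialized twice.
text = json.dumps(doc)
restored = json.loads(text)

# The values survive the round trip, but the shared reference does not.
same_values = restored["a"] == restored["b"]
same_object = restored["a"] is restored["b"]
print(text, same_values, same_object)
```

A format with reference support would let a generic decoder restore a single shared object here without any application-specific code.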

Syntax comparison of human-readable formats

Format Null Boolean true Boolean false Integer Floating-point String Array Associative array/Object
ASN.1 (XML Encoding Rules)
  Null: <foo />
  Boolean true: <foo>true</foo>
  Boolean false: <foo>false</foo>
  Integer: <foo>685230</foo>
  Floating-point: <foo>6.8523015e+5</foo>
  String: <foo>A to Z</foo>
  Array:
    <SeqOfUnrelatedDatatypes>
        <isMarried>true</isMarried>
        <hobby />
        <velocity>-42.1e7</velocity>
        <bookname>A to Z</bookname>
        <bookname>We said, "no".</bookname>
    </SeqOfUnrelatedDatatypes>
  Associative array/Object:
    An object (the key is a field name):
    <person>
        <isMarried>true</isMarried>
        <hobby />
        <height>1.85</height>
        <name>Bob Peterson</name>
    </person>
    A data mapping (the key is a data value):
    <competition>
        <measurement>
            <name>John</name>
            <height>3.14</height>
        </measurement>
        <measurement>
            <name>Jane</name>
            <height>2.718</height>
        </measurement>
    </competition>
    [a]

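CSV[b]
  Null:
    null [a]
    (or an empty element in the row) [a]
  Boolean true:
    1 [a]
    true [a]
  Boolean false:
    0 [a]
    false [a]
  Integer:
    685230
    -685230 [a]
  Floating-point: 6.8523015e+5 [a]
  String:
    A to Z
    "We said, ""no""."
  Array: true,,-42.1e7,"A to Z"
  Associative array/Object:
    42,1
    A to Z,1,2,3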
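Netstrings[c]
  Null:
    0:, [a]
    4:null, [a]
  Boolean true:
    1:1, [a]
    4:true, [a]
  Boolean false:
    1:0, [a]
    5:false, [a]
  Integer: 6:685230, [a]
  Floating-point: 9:6.8523e+5, [a]
  String: 6:A to Z,
  Array: 29:4:true,0:,7:-42.1e7,6:A to Z,,
  Associative array/Object: 41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,, [a]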
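JSON
  Null: null
  Boolean true: true
  Boolean false: false
  Integer:
    685230
    -685230
  Floating-point: 6.8523015e+5
  String: "A to Z"
  Array: [true, null, -42.1e7, "A to Z"]
  Associative array/Object: {"42": true, "A to Z": [1, 2, 3]}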
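OGDL[verification needed]
  Null: null [a]
  Boolean true: true [a]
  Boolean false: false [a]
  Integer: 685230 [a]
  Floating-point: 6.8523015e+5 [a]
  String:
    "A to Z"
    'A to Z'
    NoSpaces
  Array:
    true
    null
    -42.1e7
    "A to Z"
    or
    (true, null, -42.1e7, "A to Z")
  Associative array/Object:
    42
      true
    "A to Z"
      1
      2
      3
    or
    42
      true
    "A to Z", (1, 2, 3)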
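Property list (plain text format)[2]
  Null: N/A
  Boolean true: <*BY>
  Boolean false: <*BN>
  Integer: <*I685230>
  Floating-point: <*R6.8523015e+5>
  String: "A to Z"
  Array: ( <*BY>, <*R-42.1e7>, "A to Z" )
  Associative array/Object:
    {
        "42" = <*BY>;
        "A to Z" = ( <*I1>, <*I2>, <*I3> );
    }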
Property list (XML format)[3][4]
  Null: N/A
  Boolean true: <true />
  Boolean false: <false />
  Integer: <integer>685230</integer>
  Floating-point: <real>6.8523015e+5</real>
  String: <string>A to Z</string>
  Array:
    <array>
        <true />
        <real>-42.1e7</real>
        <string>A to Z</string>
    </array>
  Associative array/Object:
    <dict>
        <key>42</key>
        <true />
        <key>A to Z</key>
        <array>
            <integer>1</integer>
            <integer>2</integer>
            <integer>3</integer>
        </array>
    </dict>
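S-expressions
  Null:
    NIL
    nil
  Boolean true:
    T
    #t [e]
    true
  Boolean false:
    NIL
    #f [e]
    false
  Integer: 685230
  Floating-point: 6.8523015e+5
  String:
    abc
    "abc"
    #616263#
    3:abc
    {MzphYmM=}
    |YWJj|
  Array: (T NIL -42.1e7 "A to Z")
  Associative array/Object: ((42 T) ("A to Z" (1 2 3)))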
YAML
  Null:
    ~
    null
    Null
    NULL [5]
  Boolean true:
    y
    Y
    yes
    Yes
    YES
    on
    On
    ON
    true
    True
    TRUE [6]
  Boolean false:
    n
    N
    no
    No
    NO
    off
    Off
    OFF
    false
    False
    FALSE [6]
  Integer:
    685230
    +685_230
    -685230
    02472256
    0x_0A_74_AE
    0b1010_0111_0100_1010_1110
    190:20:30 [7]
  Floating-point:
    6.8523015e+5
    685.230_15e+03
    685_230.15
    190:20:30.15
    .inf
    -.inf
    .Inf
    .INF
    .NaN
    .nan
    .NAN [8]
  String:
    A to Z
    "A to Z"
    'A to Z'
  Array:
    [y, ~, -42.1e7, "A to Z"]
    or
    - y
    -
    - -42.1e7
    - A to Z
  Associative array/Object:
    {"John":3.14, "Jane":2.718}
    or
    42: y
    A to Z: [1, 2, 3]
XML[d]
  Null: <null /> [a]
  Boolean true:
    <boolean val="true"/> [a]
    <true /> [a]
  Boolean false:
    <boolean val="false"/> [a]
    <false /> [a]
  Integer: <integer>685230</integer> [a]
  Floating-point: <float>6.8523015e+5</float> [a]
  String: A to Z [a]
  Array: [a]
    <array>
      <element type="boolean">true</element>
      <element type="null"/>
      <element type="float">-42.1e7</element>
      <element type="string">A to Z</element>
    </array>
  Associative array/Object:
    <associative-array>
      <entry>
        <key type="integer">42</key>
        <value type="boolean">true</value>
      </entry>
      <entry>
        <key type="string">A to Z</key>
        <value>
          <array>
            <element type="integer" val="1"/>
            <element type="integer" val="2"/>
            <element type="integer" val="3"/>
          </array>
        </value>
      </entry>
    </associative-array>
  • a. One possible encoding; the specification document does not specifically give an encoding for this datatype.
  • b. The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
  • c. The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
  • d. XML in and of itself is not a data serialization language, but many data serialization formats have been derived from it; as such, there are many different ways, in addition to those shown, to serialize programming data structures into XML.
  • e. This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
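
The netstrings entries in the table above follow one framing rule: decimal byte length, a colon, the payload, and a trailing comma. Nesting is just a netstring whose payload is a concatenation of netstrings. The Python sketch below reproduces the array entry. The helper names `netstring`, `encode_list`, and `decode` are hypothetical, and representing values as text such as `true` is one possible convention (footnote a), not part of the netstrings specification.

```python
def netstring(payload: bytes) -> bytes:
    """Frame a byte string as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def encode_list(items: list[bytes]) -> bytes:
    """Nest items: concatenate their netstrings, then frame the result."""
    return netstring(b"".join(netstring(item) for item in items))


def decode(data: bytes) -> tuple[bytes, bytes]:
    """Split one netstring off the front; return (payload, remaining bytes)."""
    length, sep, rest = data.partition(b":")
    if not sep or not length.isdigit():
        raise ValueError("missing or malformed length prefix")
    n = int(length)
    if rest[n:n + 1] != b",":
        raise ValueError("missing trailing comma")
    return rest[:n], rest[n + 1:]


# The array row of the table: [true, null, -42.1e7, "A to Z"]
array = encode_list([b"true", b"", b"-42.1e7", b"A to Z"])
print(array)  # b'29:4:true,0:,7:-42.1e7,6:A to Z,,'
```

Because the length prefix counts bytes, a decoder never has to scan the payload for delimiters, which is why the format needs no escaping rules.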

Comparison of binary formats

Format Null Booleans Integer Floating-point String Array Associative array/Object
ASN.1 (BER or PER encoding)
  Null: NULL type
  Booleans: BOOLEAN; BER as 1 byte in binary form
  Integer: INTEGER; variable-length big-endian binary representation, up to 2^(2^1024) bits
  Floating-point: REAL; represented as an IEEE double or as three integers (mantissa + base + exponent)
  String: Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString)
  Array: data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order)
  Associative array/Object: user-definable type
BSON[9]
  Null: Null type, 0 bytes for value
  Booleans: True: one byte \x01; False: \x00
  Integer: int32: 32-bit little-endian 2's complement, or int64: 64-bit little-endian 2's complement
  Floating-point: double: little-endian binary64
  String: UTF-8 encoded, preceded by int32-encoded string length in bytes
  Array: BSON embedded document with numeric keys
  Associative array/Object: BSON embedded document
MessagePack
  Null: \xc0
  Booleans: True: \xc3; False: \xc2
  Integer: Single byte "fixnum" (values -32..127), or typecode (one byte) + big-endian (u)int8/16/32/64
  Floating-point: Typecode (one byte) + IEEE single/double
  String: As "fixraw" (single-byte prefix + up to 31 raw bytes), or typecode (one byte) + 2 or 4 bytes length + raw bytes
  Array: As "fixarray" (single-byte prefix + up to 15 array items), or typecode (one byte) + 2 or 4 bytes length + array items
  Associative array/Object: As "fixmap" (single-byte prefix + up to 15 key-value pairs), or typecode (one byte) + 2 or 4 bytes length + key-value pairs

Netstrings
  Null: 0:,
  Booleans: True: 1:1,  False: 1:0,

OGDL Binary
Property list
(binary format)
Protocol Buffers[10]
  Integer:
    Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 31)
    Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded value (n << 1) XOR (n >> 63)
    Constant encoding length 32-bit: 32 bits in little-endian 2's complement
    Constant encoding length 64-bit: 64 bits in little-endian 2's complement
  Floating-point:
    floats: little-endian binary32
    doubles: little-endian binary64
  String: UTF-8 encoded, preceded by varint-encoded integer length of string in bytes
Thrift
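
The Protocol Buffers integer encodings listed in the table can be sketched in Python as follows. This is a minimal illustration, not Google's implementation, and the function names are hypothetical. It assumes the varint layout described in the Protocol Buffers encoding guide: 7 bits per byte, least significant group first, high bit set on every byte except the last.

```python
import struct


def zigzag32(n: int) -> int:
    """Map a signed 32-bit integer to unsigned: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    if not -2**31 <= n < 2**31:
        raise ValueError("value does not fit in 32 bits")
    # Python's >> on ints is an arithmetic shift, as (n >> 31) requires.
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint(u: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant first."""
    if u < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = u & 0x7F
        u >>= 7
        if u:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_sint32(n: int) -> bytes:
    """Variable-length signed 32-bit: varint of the ZigZag-encoded value."""
    return varint(zigzag32(n))


def encode_sfixed32(n: int) -> bytes:
    """Constant-length signed 32-bit: little-endian 2's complement."""
    return struct.pack("<i", n)


print(encode_sint32(-1))    # b'\x01'
print(encode_sint32(150))   # b'\xac\x02'
print(encode_sfixed32(-2))  # b'\xfe\xff\xff\xff'
```

ZigZag keeps small negative numbers small before the varint step. Without it, a 2's complement negative value would have its high bits set and always take the maximum number of varint bytes.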


Wikimedia Foundation. 2010.
