Proto2 vs Proto3
Proto1 is deprecated.
Proto3 is a simplification of Proto2. Both Proto2 and Proto3 are active.
Proto2 and proto3 are wire compatible: the same construct in proto2 and proto3 has the same binary representation. This means they can reference symbols across versions and generate code that works well together.
- Proto2: supports `optional` natively. The wrapper types were recommended but should be avoided in new applications.
- Proto3: originally did not support presence tracking for primitive fields. As of 2020, proto3 supports both `optional` fields, which have `has_foo()` methods, and "singular" fields, which do not. Be sure to use `optional` if your protocol requires knowledge of field presence.
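The practical difference between the two kinds of proto3 field shows up on the wire. The sketch below hand-encodes a single integer field using only the Python standard library; it is not the protobuf runtime, and the helper names (`serialize_singular`, `serialize_optional`) are hypothetical. It assumes the usual rule that a singular proto3 field holding its zero value is not written at all, while an `optional` field that was explicitly set is written even when the value is zero.

```python
from typing import Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode a tag with wire type 0 (varint) followed by the value."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def serialize_singular(field_number: int, value: int) -> bytes:
    # Singular (no presence): the zero value is never written, so a reader
    # cannot tell "set to 0" apart from "never set".
    return b"" if value == 0 else encode_varint_field(field_number, value)


def serialize_optional(field_number: int, value: Optional[int]) -> bytes:
    # Optional (explicit presence): None models "unset"; a set field is
    # written even when its value is 0.
    return b"" if value is None else encode_varint_field(field_number, value)


print(serialize_singular(1, 0).hex())     # empty: nothing on the wire
print(serialize_optional(1, 0).hex())     # tag byte plus a zero value byte
print(serialize_optional(1, None).hex())  # empty: field is unset
```

Because the singular field leaves no trace when it is zero, only the `optional` form lets a receiver distinguish an explicit zero from a missing value.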
Proto3 does not permit custom default values. All fields in proto3 have consistent zero defaults.
Proto3 removes support for `required` fields.
Proto3 `enums` require an entry with the value `0` to act as the default value.
Proto2 `enums` use the first syntactic entry in the `enum` declaration as the default value where it is otherwise unspecified.
In languages with closed enums (ex. Java):
- All proto3 enums generate an `UNRECOGNIZED` entry to accommodate unknown enum values. Proto3 setters prohibit `UNRECOGNIZED` values, so a simple copy of an enum field from one proto to another will crash if the enum field value is `UNRECOGNIZED`.
- Proto2 enums never represent unknown enum values, but instead place them in the unknown field set. A proto2 enum can have confusing behavior (ex. repeated fields report incorrect counts and are reordered in reserialization when an unknown value is encountered)
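The miscounting and reordering described for proto2 closed enums can be illustrated with a small simulation. This is not the protobuf runtime: the enum `KNOWN_COLORS` and both functions are hypothetical, and the sketch assumes that unknown values are written after the known ones on reserialization.

```python
# Hypothetical enum known to the reader; the writer also knows value 7.
KNOWN_COLORS = {0: "RED", 1: "GREEN", 2: "BLUE"}


def parse_repeated_enum_closed(wire_values):
    """Split incoming values into the typed field and the unknown field set."""
    known, unknown = [], []
    for value in wire_values:
        if value in KNOWN_COLORS:
            known.append(value)
        else:
            unknown.append(value)  # not visible through the typed accessor
    return known, unknown


def reserialize(known, unknown):
    """Assumed ordering: known values first, unknown field set afterwards."""
    return known + unknown


wire = [1, 7, 2]  # three elements were sent
known, unknown = parse_repeated_enum_closed(wire)

print(len(known))                   # count seen by application code
print(reserialize(known, unknown))  # order after a parse/serialize round trip
```

The application sees two elements where three were sent, and the round trip moves the unknown value to the end of the sequence.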
Enums cross reference
- A proto2 message can reference a proto3 enum or message
- A proto3 message cannot reference a proto2 enum due to differences in semantics.
Extensions / Any
Proto3 removes support for `extensions`; instead, use `Any` fields to represent untyped fields. The extensions mechanism is wire compatible with a normal field declaration whereas `Any` is not, so a field cannot be changed to an `Any` as the schema evolves, while it could be changed to an extension in proto2.
`Any` is significantly more verbose on the wire because it uses a string-based `type_url` as a key, while extensions use a varint-encoded field number.
Parsed eagerly or lazily:
- Extensions (other than `MessageSet`) are parsed eagerly (and sometimes selectively, if you provide a custom `ExtensionRegistry`).
- `Any` is always parsed lazily. This difference in performance profile may be important for some applications (e.g. an Android app may prefer to parse messages off the UI thread).
String field validation
Protocol Buffer string fields have always been documented to be UTF-8 encoded.
- Proto2 does not validate that inbound / outbound bytes are indeed UTF-8 encoded.
- Proto3 validates that all string fields are appropriately UTF-8 encoded during parsing and in byte-oriented setters.
This validation means that parsing string fields in proto3 is more CPU intensive and parse failures are possible when passed an improperly structured string field. The flipside is that eager validation ensures that the problem can be identified quickly and resolved at the source.
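The trade-off can be sketched in a few lines of Python. This is only an illustration of validating versus not validating, not the code path of any protobuf runtime; both function names are hypothetical, and real runtimes differ by language in how they report the failure.

```python
def parse_string_proto2_style(raw: bytes) -> bytes:
    """No validation: the bytes are passed through unchanged."""
    return raw


def parse_string_proto3_style(raw: bytes) -> str:
    """Strict validation: malformed input raises UnicodeDecodeError."""
    return raw.decode("utf-8")  # errors="strict" is the default


good = "café".encode("utf-8")
bad = b"\xff\xfe"  # 0xFF can never appear in well-formed UTF-8

print(parse_string_proto3_style(good))
print(parse_string_proto2_style(bad))  # accepted silently

try:
    parse_string_proto3_style(bad)
except UnicodeDecodeError as err:
    print("rejected at parse time:", err.reason)
```

The unvalidated path accepts the malformed bytes and defers the problem to whichever consumer first tries to treat them as text, while the validating path pays the decoding cost up front and fails where the bad data entered.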
String field parsing
In Java, proto3 parses `String` fields as UTF-8 eagerly, whereas proto2 parses them lazily.
Proto3 defines a canonical JSON specification for all features whereas there is no specification for various proto2 features like extensions. The behavior of proto2 features is thus implementation-dependent.
- proto3 adds int min/max sentinels to C++ enums, preventing use of
- In proto3, `optional` fields cannot be changed to `repeated` because that will cause old messages to be declared invalid.
- It is unsafe to rename or change the proto package of any proto used in an `Any` proto. Extension resolution is numeric, like field numbers, whereas `Any` resolution is string-based, like Stubby methods.