These days lots of people are building microservices, and microservices usually involve HTTP APIs, which in turn usually exchange data as JSON.
Not long ago somebody pointed out that a lot of effort goes into generating and parsing that JSON. It would be unwise to simply ignore this part of your system’s design.
Since it’s very easy to benchmark things in Go, I decided to do a quick comparison of JSON encoding strategies.
Go JSON Encoding
The normal way to generate JSON in Go is to use the encoding/json package and feed your struct to the json.Marshal function. This function will take anything and try to convert it to JSON. If your struct, or anything in it, has its own MarshalJSON method, that method is used; otherwise the value is examined using reflection.
Reflection is (supposed to be) expensive, so I wanted to see how much I might save by making my own JSON encoder for a struct. The main point being that I already know what the struct is made of, so I can save the encoder the trouble of examining it.
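For reference, the idiomatic route looks something like this. A minimal sketch; the struct and its fields are placeholders, not the benchmark code:

package main

import (
    "encoding/json"
    "fmt"
    "log"
)

// Inner is a stand-in nested struct with no MarshalJSON method,
// so encoding/json will examine it via reflection.
type Inner struct {
    Id   int
    Name string
}

func main() {
    v := Inner{Id: 321, Name: "Eenie"}

    b, err := json.Marshal(v)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(string(b)) // {"Id":321,"Name":"Eenie"}
}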
Benchmarked Variations
I started with several, er, structurally identical structs:
- A naïve one, with no MarshalJSON method of its own.
- A hinted one, with field names provided via struct tags.
- A smart one, with its own proper MarshalJSON method.
- A fake one, which returns previously set data from its MarshalJSON.
The point of the fake one, of course, is to isolate the overhead of the actual JSON encoding.
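Roughly, the variants look like the following sketch. The type names and helper details here are illustrative, not the exact benchmarked code:

package jsonbench

import (
    "bytes"
    "encoding/json"
    "strconv"
    "time"
)

// Naïve: no MarshalJSON method, so encoding/json falls back to reflection.
type NaiveThing struct {
    Id    int
    Stuff []string
    Desc  string
    Time  time.Time
}

// Hinted: structurally identical, with struct tags spelling out key names.
type HintedThing struct {
    Id    int       `json:"Id"`
    Stuff []string  `json:"Stuff"`
    Desc  string    `json:"Desc"`
    Time  time.Time `json:"Time"`
}

// Smart: assembles its own JSON, sparing the encoder any reflection.
type SmartThing struct {
    Id   int
    Desc string
}

func (t SmartThing) MarshalJSON() ([]byte, error) {
    var buf bytes.Buffer
    buf.WriteString(`{"Id":`)
    buf.WriteString(strconv.Itoa(t.Id))
    buf.WriteString(`,"Desc":`)
    desc, err := json.Marshal(t.Desc) // reuse the stdlib's string escaping
    if err != nil {
        return nil, err
    }
    buf.Write(desc)
    buf.WriteByte('}')
    return buf.Bytes(), nil
}

// Fake: returns canned bytes, isolating the overhead around the
// actual encoding work.
type FakeThing struct {
    canned []byte
}

func (t FakeThing) MarshalJSON() ([]byte, error) {
    return t.canned, nil
}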
All of these, when set up with a bit of standard fake data, generate the following JSON:
{
"Id" : 123,
"Stuff" : [
"fee",
"fi",
"fo",
"fum"
],
"Desc" : "Something with \"quotes\" to untangle.",
"Time" : "1970-01-01T01:16:40+01:00",
"Insiders" : {
"One" : {
"Id" : 321,
"Name" : "Eenie"
},
"Two" : {
"Id" : 421,
"Name" : "Meenie"
}
}
}
Surprising Results
Here are the benchmark results for this little experiment, as run on a MacBook Pro (Mid 2014) with 2.8 GHz i5, 8 GB RAM, under Go 1.6.1.
BenchmarkNaïveJsonMarshal-4 300000 4299 ns/op
BenchmarkHintedJsonMarshal-4 300000 4293 ns/op
BenchmarkSmartJsonMarshal-4 200000 6490 ns/op
BenchmarkSmartJsonMarshalDirect-4 300000 4149 ns/op
BenchmarkFakeSmartJsonMarshal-4 1000000 2299 ns/op
BenchmarkFakeSmartJsonMarshalDirect-4 10000000 115 ns/op
In the “Direct” benchmarks, the struct’s own MarshalJSON method is called without going through encoding/json, i.e. without any sanity-checking.
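The two flavours of benchmark look roughly like this, reusing the hypothetical SmartThing from the sketch above in a _test.go file:

package jsonbench

import (
    "encoding/json"
    "testing"
)

func BenchmarkSmartJsonMarshal(b *testing.B) {
    v := SmartThing{Id: 123, Desc: `Something with "quotes" to untangle.`}
    for i := 0; i < b.N; i++ {
        // Goes through encoding/json, which calls MarshalJSON and then
        // validates (and compacts) whatever bytes it returned.
        if _, err := json.Marshal(v); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkSmartJsonMarshalDirect(b *testing.B) {
    v := SmartThing{Id: 123, Desc: `Something with "quotes" to untangle.`}
    for i := 0; i < b.N; i++ {
        // Calls the method directly, skipping encoding/json entirely,
        // so nothing checks that the result is valid JSON.
        if _, err := v.MarshalJSON(); err != nil {
            b.Fatal(err)
        }
    }
}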
I expected to see a lot of overhead from the reflection, i.e. from the unknown struct being examined. Instead I found that using your own MarshalJSON method is actually slower, because json.Marshal (sensibly enough) validates what your MarshalJSON returns, lest it accidentally produce invalid JSON itself.
Also, the hinting doesn’t make much of a difference in speed, but it can make your JSON output prettier and more predictable: one usually uses it to get lowercase and/or underscore_separated key names in JSON objects, and to omit empty fields in order to compact the JSON.
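For instance, hinting of that sort might look like this (a hypothetical struct, not one from the benchmarks):

type PrettyThing struct {
    Id    int      `json:"id"`
    Stuff []string `json:"stuff,omitempty"` // dropped when nil or empty
    Desc  string   `json:"desc,omitempty"`  // dropped when ""
}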
Using the numbers above we can very crudely estimate:
- Custom encoding with validity checks is about 50% slower.
- Custom encoding without validity checks is about 3.5% faster.
- Best-case custom encoding with validity checks is about 50% faster.
In order to use the custom encoding without validity checks, you have to do all the encoding in a non-idiomatic way. This makes your codebase more fragile, because a new collaborator can’t just step in and do the obvious thing without undoing your optimizations.
It would be interesting to see how these numbers scaled with more complex structs, in particular deeper nested objects.
Based on these benchmarks, which I admit are oversimplified, I recommend avoiding custom MarshalJSON methods unless you absolutely need them for handling unusual data structures. If you want them for speed, make sure to benchmark your implementation before making a final decision.