Thursday, April 14, 2016

Interesting JSON Benchmarks in Go

These days lots of people are building microservices, and microservices usually involve HTTP APIs, which in turn usually exchange data as JSON.

Not long ago somebody pointed out that a lot of effort goes into generating and parsing that JSON. It would be unwise to simply ignore this part of your system’s design.

Since it’s very easy to benchmark things in Go, I decided to do a quick comparison of JSON encoding strategies.

Go JSON Encoding

The normal way to generate JSON in Go is to use the encoding/json package and pass your struct to the json.Marshal function. This function will take almost anything and try to convert it to JSON. If your struct, or anything in it, has its own MarshalJSON function (that is, it implements the json.Marshaler interface), then that is used; otherwise it’s examined using reflection.

Reflection is (supposed to be) expensive, so I wanted to see how much I might save by writing my own JSON encoder for a struct. The main point is that I already know what the struct is made of, so I can spare the encoder the trouble of examining it.

Benchmarked Variations

I started with several, er, structurally identical structs:

  1. A naïve one, with no MarshalJSON function of its own.
  2. A hinted one, with field names provided in struct tags.
  3. A smart one, with its own proper MarshalJSON function.
  4. A fake one, which returns previously set data from its MarshalJSON.

The point of the fake one, of course, is to isolate the overhead of the actual JSON encoding.

All of these, when set up with a bit of standard fake data, generate the following JSON:

{
   "Id" : 123,
   "Stuff" : [
      "fee",
      "fi",
      "fo",
      "fum"
   ],
   "Desc" : "Something with \"quotes\" to untangle.",
   "Time" : "1970-01-01T01:16:40+01:00",
   "Insiders" : {
      "One" : {
         "Id" : 321,
         "Name" : "Eenie"
      },
      "Two" : {
         "Id" : 421,
         "Name" : "Meenie"
      }
   }
}

Surprising Results

Here are the benchmark results for this little experiment, as run on a MacBook Pro (Mid 2014) with 2.8 GHz i5, 8 GB RAM, under Go 1.6.1.

BenchmarkNaïveJsonMarshal-4               300000          4299 ns/op
BenchmarkHintedJsonMarshal-4              300000          4293 ns/op
BenchmarkSmartJsonMarshal-4               200000          6490 ns/op
BenchmarkSmartJsonMarshalDirect-4         300000          4149 ns/op
BenchmarkFakeSmartJsonMarshal-4          1000000          2299 ns/op
BenchmarkFakeSmartJsonMarshalDirect-4   10000000           115 ns/op

In the “Direct” benchmarks, the struct’s own MarshalJSON function is called without going through encoding/json, i.e. without any sanity-checking.

I expected to see a lot of overhead from the reflection, i.e. from examining the unknown struct. Instead I found that using your own MarshalJSON function is actually slower, because json.Marshal (sensibly enough) validates and compacts the JSON your function returns, lest it accidentally return invalid JSON itself.

Also, the hinting makes hardly any difference to speed (4293 vs. 4299 ns/op), but it can make your JSON output prettier and more predictable: one usually uses it to get lowercase and/or underscore_separated key names in JSON objects, and to omit empty or null fields in order to compact the JSON.

Using the numbers above we can very crudely estimate:

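  • Custom encoding with validity checks is about 50% slower (6490 vs. 4299 ns/op).
  • Custom encoding without validity checks is about 3.5% faster (4149 vs. 4299 ns/op).
  • Best-case custom encoding with validity checks, i.e. returning JSON prepared in advance, takes about 47% less time (2299 vs. 4299 ns/op).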
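These crude estimates are relative changes against the naïve benchmark’s ns/op. A few lines of Go reproduce them from the table; the change helper is just for illustration.

```go
package main

import "fmt"

// change returns how much longer (positive) or shorter (negative) a run
// took than the baseline, as a percentage of the baseline's ns/op.
func change(nsPerOp, baseline float64) float64 {
	return (nsPerOp - baseline) / baseline * 100
}

func main() {
	const naive = 4299.0 // BenchmarkNaïveJsonMarshal, ns/op
	fmt.Printf("smart:        %+.1f%%\n", change(6490, naive)) // +51.0%
	fmt.Printf("smart direct: %+.1f%%\n", change(4149, naive)) // -3.5%
	fmt.Printf("fake:         %+.1f%%\n", change(2299, naive)) // -46.5%
}
```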

In order to use the custom encoding without validity checks, you have to call MarshalJSON directly and do all the encoding in a non-idiomatic way. This makes your codebase more fragile, because a new collaborator can’t just step in and do the obvious thing (calling json.Marshal) without undoing your optimizations.

It would be interesting to see how these numbers scale with more complex structs, in particular more deeply nested objects.

Based on these benchmarks, which I admit are oversimplified, I recommend avoiding custom MarshalJSON functions unless you absolutely need them for handling unusual data structures. If you want them for speed, make sure to benchmark your implementation before making a final decision.

Source Code
