Consider switching to Simdjson-go

We should consider switching our JSON parsing to minio/simdjson-go (GitHub - minio/simdjson-go: Golang port of simdjson: parsing gigabytes of JSON per second).

It claims to be 10x faster than Go’s JSON parser.

CC: @michaelcompton @vvbalaji

@mrjn Maybe you can consider json-iterator (GitHub - json-iterator/go: A high-performance 100% compatible drop-in replacement of "encoding/json"), which is more mature, also claims to be 10x faster, and is 100% compatible with Go’s standard JSON library.


Is simdjson faster without a GPU? And what about the other SIMD implementations? I thought SIMD was currently only available on GPUs and very specialized hardware.

We should probably run a poll to figure out how many of our deployments run on machines that support some form of SIMD (simdjson-go has a handy function to check whether the CPU is compatible).

json-iterator/go doesn’t seem to have any such requirements.

SIMD has been around for a while now. I think most modern Intel chips support it. FWIW, these implementations typically also have a fallback mode, where they switch to a normal Go implementation on platforms that don’t support SSE.
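To make the fallback idea concrete, here is a minimal sketch of how a caller could pick a parser at startup. It uses only the standard library: `chooseParser`, `parseFunc`, and `parseStd` are hypothetical names for illustration, and the SIMD-backed parser is left as a placeholder. In real code the boolean would come from the library's own CPU check (simdjson-go's README describes a `SupportedCPU()` function for this), which is an assumption here rather than something the sketch calls.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseFunc is a hypothetical common signature that both parsers are adapted to.
type parseFunc func(data []byte) (map[string]interface{}, error)

// parseStd is the portable path: plain encoding/json.
func parseStd(data []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

// chooseParser returns the fast parser only when the CPU check passed and a
// fast parser was actually provided; otherwise it returns the fallback.
func chooseParser(simdSupported bool, fast, fallback parseFunc) parseFunc {
	if simdSupported && fast != nil {
		return fast
	}
	return fallback
}

func main() {
	// Assumption: simdSupported would be the result of the SIMD library's
	// CPU check, and fast would wrap that library. Both are stubbed out here.
	parse := chooseParser(false, nil, parseStd)

	doc, err := parse([]byte(`{"name":"Alice","age":30}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(doc["name"]) // prints "Alice"
}
```

Deciding once at startup keeps the hot path free of per-call CPU checks, and deployments on older hardware keep working with the standard parser.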

That said, the json-iterator approach looks better. It also has a lot more stars, so it is more popular in the community.

Pretty much all CPUs nowadays have some sort of SIMD support, so this shouldn’t be a concern.

I’d love to see benchmarks on real dgraph data!
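As a starting point for such a benchmark, here is a rough sketch of a standalone harness that measures `encoding/json` on a payload and reports ns/op, MB/s, and allocations. The `samplePayload` below is a made-up Dgraph-style mutation, not real data, and `benchUnmarshal` is a hypothetical helper. Swapping in real exported data and adding the same loop for the other parsers would give a like-for-like comparison.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// samplePayload is a made-up, Dgraph-style JSON mutation used only as a
// stand-in; replace it with real data before drawing conclusions.
var samplePayload = []byte(`{"set":[{"uid":"_:alice","name":"Alice","age":30,` +
	`"friend":[{"uid":"_:bob","name":"Bob"}]}]}`)

// benchUnmarshal runs encoding/json over payload using the standard
// benchmark machinery, so the result is comparable to `go test -bench` output.
func benchUnmarshal(payload []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(len(payload))) // enables the MB/s column
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			var v interface{}
			if err := json.Unmarshal(payload, &v); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	r := benchUnmarshal(samplePayload)
	// String() reports iterations, ns/op and MB/s; MemString() reports B/op and allocs/op.
	fmt.Println(r.String(), r.MemString())
}
```

A tiny payload like this mostly measures per-call overhead, so results on large real-world documents may look quite different.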

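I found that simdjson-go has a nice benchmarks_test.go that compares it to std-json and jsoniter. A summary can be found here: GitHub - minio/simdjson-go: Golang port of simdjson: parsing gigabytes of JSON per second. In the output below, the unprefixed benchmarks are simdjson-go, the EncodingJson ones are std-json, and the Jsoniter ones are jsoniter.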

BenchmarkApache_builds-12                           8571            133707 ns/op         951.90 MB/s        1001 B/op         21 allocs/op
BenchmarkEncodingJsonApache_builds-12                745           1547654 ns/op          82.24 MB/s      464078 B/op       9717 allocs/op
BenchmarkJsoniterApache_builds-12                   1131           1036249 ns/op         122.82 MB/s      520837 B/op      13249 allocs/op
BenchmarkCanada-12                                    31          34580642 ns/op          65.10 MB/s    12463564 B/op     333536 allocs/op
BenchmarkEncodingJsonCanada-12                        25          46280032 ns/op          48.64 MB/s    12260723 B/op     392536 allocs/op
BenchmarkJsoniterCanada-12                            16          65812488 ns/op          34.20 MB/s    16636987 B/op     665989 allocs/op
BenchmarkCitm_catalog-12                             560           2155356 ns/op         801.35 MB/s      223779 B/op      14501 allocs/op
BenchmarkEncodingJsonCitm_catalog-12                  56          21589300 ns/op          80.00 MB/s     5218456 B/op      95480 allocs/op
BenchmarkJsoniterCitm_catalog-12                      99          11787867 ns/op         146.52 MB/s     5663367 B/op     118756 allocs/op
BenchmarkGithub_events-12                          13824             86082 ns/op         756.63 MB/s        1614 B/op        119 allocs/op
BenchmarkEncodingJsonGithub_events-12               1642            720462 ns/op          90.40 MB/s      187511 B/op       3331 allocs/op
BenchmarkJsoniterGithub_events-12                   2608            459739 ns/op         141.67 MB/s      222902 B/op       4443 allocs/op
BenchmarkGsoc_2018-12                                838           1426016 ns/op        2333.66 MB/s       13275 B/op         66 allocs/op
BenchmarkEncodingJsonGsoc_2018-12                     46          25282600 ns/op         131.63 MB/s     7064482 B/op      58616 allocs/op
BenchmarkJsoniterGsoc_2018-12                         66          18439356 ns/op         180.47 MB/s    11802709 B/op      90914 allocs/op
BenchmarkInstruments-12                             2926            397129 ns/op         554.85 MB/s        6398 B/op       1792 allocs/op
BenchmarkEncodingJsonInstruments-12                  367           3234329 ns/op          68.13 MB/s      889352 B/op      13337 allocs/op
BenchmarkJsoniterInstruments-12                      554           2162449 ns/op         101.90 MB/s      998816 B/op      18777 allocs/op
BenchmarkMarine_ik-12                                 30          33599490 ns/op          88.79 MB/s     8656960 B/op     436577 allocs/op
BenchmarkEncodingJsonMarine_ik-12                     16          66812538 ns/op          44.65 MB/s    22791466 B/op     614777 allocs/op
BenchmarkJsoniterMarine_ik-12                         19          59789532 ns/op          49.90 MB/s    25483422 B/op     692512 allocs/op
BenchmarkMesh-12                                     100          10199999 ns/op          70.94 MB/s     2694608 B/op     134240 allocs/op
BenchmarkEncodingJsonMesh-12                          79          15341778 ns/op          47.17 MB/s     7161936 B/op     149505 allocs/op
BenchmarkJsoniterMesh-12                              66          17712080 ns/op          40.85 MB/s     8020255 B/op     184138 allocs/op
BenchmarkMesh_pretty-12                               99          11808081 ns/op         133.58 MB/s     3312611 B/op     134240 allocs/op
BenchmarkEncodingJsonMesh_pretty-12                   54          22203720 ns/op          71.04 MB/s     7477376 B/op     149505 allocs/op
BenchmarkJsoniterMesh_pretty-12                       54          21444441 ns/op          73.56 MB/s     8657071 B/op     204038 allocs/op
BenchmarkNumbers-12                                  525           2276186 ns/op          65.95 MB/s      802246 B/op      30030 allocs/op
BenchmarkEncodingJsonNumbers-12                      487           2515404 ns/op          59.68 MB/s     1066432 B/op      20026 allocs/op
BenchmarkJsoniterNumbers-12                          354           3347443 ns/op          44.85 MB/s     1226480 B/op      30038 allocs/op
BenchmarkRandom-12                                  1052           1134981 ns/op         449.77 MB/s       10323 B/op       2067 allocs/op
BenchmarkEncodingJsonRandom-12                       124           9548385 ns/op          53.46 MB/s     2994630 B/op      70084 allocs/op
BenchmarkJsoniterRandom-12                           169           7147923 ns/op          71.42 MB/s     3197255 B/op      88091 allocs/op
BenchmarkTwitter-12                                 1642            735079 ns/op         859.11 MB/s       15740 B/op       1565 allocs/op
BenchmarkEncodingJsonTwitter-12                      148           8047331 ns/op          78.47 MB/s     2264518 B/op      31759 allocs/op
BenchmarkJsoniterTwitter-12                          220           5409089 ns/op         116.75 MB/s     2461476 B/op      45043 allocs/op
BenchmarkTwitterescaped-12                          1016           1176180 ns/op         478.17 MB/s       16347 B/op       1565 allocs/op
BenchmarkEncodingJsonTwitterescaped-12               152           7888163 ns/op          71.30 MB/s     2366189 B/op      31757 allocs/op
BenchmarkJsoniterTwitterescaped-12                   182           6516472 ns/op          86.31 MB/s     2640968 B/op      47198 allocs/op
BenchmarkUpdate_center-12                           1689            712848 ns/op         747.95 MB/s        2594 B/op         57 allocs/op
BenchmarkEncodingJsonUpdate_center-12                153           7869284 ns/op          67.75 MB/s     2768037 B/op      49089 allocs/op
BenchmarkJsoniterUpdate_center-12                    190           6242113 ns/op          85.42 MB/s     3090373 B/op      66758 allocs/op

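In these numbers, jsoniter is in the same range as std-json (and sometimes slower), while simdjson-go is far faster on most files, although the gap is much smaller on Canada, Mesh, and Numbers.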
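To put figures on that comparison, the MB/s columns can be turned into speedups relative to std-json. The sketch below does this for a handful of rows copied from the output above; the `row` type and `speedup` helper are names introduced here for illustration.

```go
package main

import "fmt"

// row holds MB/s figures copied from the benchmark output in this thread.
type row struct {
	name     string
	simdjson float64 // unprefixed benchmark
	std      float64 // BenchmarkEncodingJson*
	jsoniter float64 // BenchmarkJsoniter*
}

// speedup returns how many times higher mbps is than the baseline throughput.
func speedup(mbps, baseline float64) float64 {
	return mbps / baseline
}

var rows = []row{
	{"Apache_builds", 951.90, 82.24, 122.82},
	{"Canada", 65.10, 48.64, 34.20},
	{"Gsoc_2018", 2333.66, 131.63, 180.47},
	{"Numbers", 65.95, 59.68, 44.85},
	{"Twitter", 859.11, 78.47, 116.75},
}

func main() {
	for _, r := range rows {
		fmt.Printf("%-14s simdjson-go %5.1fx  jsoniter %5.1fx (vs std-json)\n",
			r.name, speedup(r.simdjson, r.std), speedup(r.jsoniter, r.std))
	}
}
```

A ratio below 1.0 means the parser was slower than std-json on that file, which is what happens to jsoniter on Canada and Numbers.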

The benchmarks published for jsoniter are a few years old, and it seems std-json has caught up since. I really like this article, which compares std-json performance improvements across Go versions: Go: Is the encoding/json Package Really Slow? | by Vincent Blanchon | A Journey With Go | Medium