Sorry to revive this thread, but I didn’t see any update on this, so I’ve been running some tests of antlr4 with the GraphQL grammar, and it’s incredibly slow compared with the current parser. I used the grammar linked previously with a few changes. I didn’t spend much time on it, because I mainly wanted to see how antlr4 performs.
Some results from my tests:
Antlr4
I ran just one test, with a very simple query, because adapting the grammar from GraphQL to GraphQL+- for the other test cases would have taken more time, and after seeing the poor result I didn’t invest any more.
BenchmarkQueryParser/test-4 2000 986301 ns/op 205508 B/op 5806 allocs/op
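Each line above is standard Go benchmark output: the -4 suffix is the GOMAXPROCS value, followed by the iteration count, ns/op and, when allocation reporting is enabled, B/op and allocs/op. A minimal sketch of how such a per-query measurement could be reproduced outside go test, assuming only testing.Benchmark from the standard library, is below. It is not necessarily the harness used for these numbers; parseQuery and the sample queries are hypothetical placeholders (not a Dgraph or antlr4 API), so substitute the real parser call and test queries.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps each parse result reachable so the compiler cannot drop the call.
var sink []string

// parseQuery is a HYPOTHETICAL placeholder for the parser under test
// (antlr4-generated, hand-written, or PEG-generated). It only splits the
// query into whitespace-separated tokens so the sketch runs on its own.
func parseQuery(q string) []string {
	return strings.Fields(q)
}

// benchmarkQuery measures parseQuery on one query with allocation reporting
// enabled, similar to what go test -bench . -benchmem prints per case.
func benchmarkQuery(q string) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = parseQuery(q)
		}
	})
}

func main() {
	// Hypothetical case names and query bodies; replace with the real ones.
	queries := map[string]string{
		"alias": `{ me(id: alice) { name } }`,
		"count": `{ me(id: alice) { count(friend) } }`,
	}
	for name, q := range queries {
		res := benchmarkQuery(q)
		fmt.Printf("BenchmarkQuery/%s\t%d\t%d ns/op\t%d B/op\t%d allocs/op\n",
			name, res.N, res.NsPerOp(), res.AllocedBytesPerOp(), res.AllocsPerOp())
	}
}
```

Because testing.Benchmark keeps increasing b.N until the loop has run for about one second, each query takes a second or more to measure. The same loop body can also go into a BenchmarkXxx function and be run with go test -bench . -benchmem.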
Current Parser
I took queries from the test files and from https://wiki.dgraph.io/Query_Language and got the following results:
--- FAIL: BenchmarkCurrentQuery/directors
--- FAIL: BenchmarkCurrentQuery/movies
BenchmarkCurrentQuery/filters-4 100000 11594 ns/op 6600 B/op 73 allocs/op
BenchmarkCurrentQuery/geq1-4 200000 12165 ns/op 6192 B/op 63 allocs/op
--- FAIL: BenchmarkCurrentQuery/date
BenchmarkCurrentQuery/alias-4 300000 5028 ns/op 4688 B/op 41 allocs/op
BenchmarkCurrentQuery/first_after-4 100000 10555 ns/op 6672 B/op 64 allocs/op
BenchmarkCurrentQuery/offset-4 100000 10365 ns/op 6640 B/op 64 allocs/op
BenchmarkCurrentQuery/generic-4 200000 9136 ns/op 6224 B/op 61 allocs/op
BenchmarkCurrentQuery/count-4 200000 5789 ns/op 4976 B/op 45 allocs/op
BenchmarkCurrentQuery/allof-4 200000 8286 ns/op 5776 B/op 59 allocs/op
--- FAIL: BenchmarkCurrentQuery/root
BenchmarkCurrentQuery/anyof-4 200000 8972 ns/op 6032 B/op 63 allocs/op
--- FAIL: BenchmarkCurrentQuery/anyof_at_root
BenchmarkCurrentQuery/leq-4 200000 11685 ns/op 6096 B/op 63 allocs/op
BenchmarkCurrentQuery/geq-4 100000 10667 ns/op 6128 B/op 63 allocs/op
--- FAIL: BenchmarkCurrentQuery/geo-near
--- FAIL: BenchmarkCurrentQuery/geo-within
--- FAIL: BenchmarkCurrentQuery/geo-contains
--- FAIL: BenchmarkCurrentQuery/geo-intersects
BenchmarkCurrentQuery/filters-or-4 100000 11191 ns/op 6448 B/op 75 allocs/op
BenchmarkCurrentQuery/filters-and-4 100000 11278 ns/op 6448 B/op 75 allocs/op
BenchmarkCurrentQuery/order-4 200000 10751 ns/op 5952 B/op 55 allocs/op
BenchmarkCurrentQuery/orderdesc-4 200000 9308 ns/op 6048 B/op 56 allocs/op
Please ignore the FAIL cases: getting results for them would have required more time configuring the tests with the right indexes, and I didn’t bother because the overall picture is already clear. As you can see, antlr4 is around 90x slower than the current parser (note that the antlr4 figure comes from a single simple query, while the current parser takes roughly 5,000 to 12,000 ns/op across the cases above), and it also makes far more allocations (5806 allocs/op versus 41 to 75). Maybe I could have improved the grammar used with antlr4, but I doubt it would get anywhere near the current parser.
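The "around 90x" figure can be rechecked from the tables by dividing the single antlr4 result (986301 ns/op) by each passing current-parser case. The sketch below does exactly that arithmetic with the numbers copied from above. It is only a rough comparison, since the antlr4 number was measured on one simple query and the current-parser numbers on different queries; a ratio near 90x corresponds to the cases around 11,000 ns/op.

```go
package main

import "fmt"

// antlrNsPerOp is the single antlr4 result (one simple query).
const antlrNsPerOp = 986301.0

// currentNsPerOp holds ns/op for the current-parser cases that did not FAIL,
// copied from the benchmark output above.
var currentNsPerOp = map[string]float64{
	"filters": 11594, "geq1": 12165, "alias": 5028, "first_after": 10555,
	"offset": 10365, "generic": 9136, "count": 5789, "allof": 8286,
	"anyof": 8972, "leq": 11685, "geq": 10667, "filters-or": 11191,
	"filters-and": 11278, "order": 10751, "orderdesc": 9308,
}

// slowdownRange returns the smallest and largest ratio base/ns over all cases.
func slowdownRange(base float64, others map[string]float64) (lo, hi float64) {
	first := true
	for _, ns := range others {
		r := base / ns
		if first || r < lo {
			lo = r
		}
		if first || r > hi {
			hi = r
		}
		first = false
	}
	return lo, hi
}

func main() {
	lo, hi := slowdownRange(antlrNsPerOp, currentNsPerOp)
	fmt.Printf("antlr4 is %.0fx to %.0fx slower than the current parser\n", lo, hi)
}
```

With these numbers the ratio spans roughly 81x (against geq1) to 196x (against alias), so the slowdown is about two orders of magnitude whichever case is used as the baseline.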
Other tests
I also tried another code generator, one that works from a PEG grammar. I adapted part of the antlr4 grammar and added some new terminals and non-terminals so that it can parse the GraphQL+- syntax. I don’t think the grammar is complete yet, but it can parse the examples from the wiki and some of the tests defined in the git repo (gql directory). These results were obtained with a smaller AST, which is the default.
BenchmarkQuery/directors-4 100000 13685 ns/op 9013 B/op 26 allocs/op
BenchmarkQuery/movies-4 100000 10945 ns/op 4917 B/op 25 allocs/op
BenchmarkQuery/filters-4 200000 7176 ns/op 4021 B/op 25 allocs/op
BenchmarkQuery/geq1-4 200000 6781 ns/op 3893 B/op 25 allocs/op
BenchmarkQuery/date-4 200000 6910 ns/op 3893 B/op 25 allocs/op
BenchmarkQuery/alias-4 300000 3775 ns/op 3541 B/op 26 allocs/op
BenchmarkQuery/first_after-4 200000 6822 ns/op 3765 B/op 25 allocs/op
BenchmarkQuery/offset-4 200000 6706 ns/op 3765 B/op 25 allocs/op
BenchmarkQuery/generic-4 200000 6219 ns/op 3637 B/op 25 allocs/op
BenchmarkQuery/count-4 300000 3750 ns/op 3189 B/op 25 allocs/op
BenchmarkQuery/allof-4 200000 5184 ns/op 3509 B/op 25 allocs/op
BenchmarkQuery/root-4 300000 4398 ns/op 3349 B/op 25 allocs/op
BenchmarkQuery/anyof-4 200000 5366 ns/op 3509 B/op 25 allocs/op
BenchmarkQuery/anyof_at_root-4 300000 4583 ns/op 3349 B/op 25 allocs/op
BenchmarkQuery/leq-4 200000 5947 ns/op 3637 B/op 25 allocs/op
BenchmarkQuery/geq-4 200000 5963 ns/op 3765 B/op 25 allocs/op
BenchmarkQuery/geo-near-4 300000 4339 ns/op 3445 B/op 25 allocs/op
BenchmarkQuery/geo-within-4 200000 6859 ns/op 4277 B/op 25 allocs/op
BenchmarkQuery/geo-contains-4 300000 4399 ns/op 3509 B/op 25 allocs/op
BenchmarkQuery/geo-intersects-4 200000 6827 ns/op 4277 B/op 25 allocs/op
BenchmarkQuery/filters-or-4 200000 6226 ns/op 3765 B/op 25 allocs/op
BenchmarkQuery/filters-and-4 200000 6221 ns/op 3765 B/op 25 allocs/op
BenchmarkQuery/order-4 200000 5839 ns/op 3509 B/op 25 allocs/op
BenchmarkQuery/orderdesc-4 200000 5852 ns/op 3573 B/op 25 allocs/op
These results cover only the lexer and parser; building the gql.Result that the current parser returns would require some additional time, memory and allocations.
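The gap between the current parser and the PEG parser can be quantified by dividing the current parser’s ns/op by the PEG parser’s for the cases both completed. The sketch below does that with numbers copied from the two tables. Since the PEG figures exclude building gql.Result and use a smaller AST, these ratios likely flatter the PEG parser somewhat; they are an indication, not an end-to-end comparison.

```go
package main

import "fmt"

// currentParser: ns/op for the current-parser cases that did not FAIL.
var currentParser = map[string]float64{
	"filters": 11594, "geq1": 12165, "alias": 5028, "first_after": 10555,
	"offset": 10365, "generic": 9136, "count": 5789, "allof": 8286,
	"anyof": 8972, "leq": 11685, "geq": 10667, "filters-or": 11191,
	"filters-and": 11278, "order": 10751, "orderdesc": 9308,
}

// pegParser: ns/op for the same cases from the PEG table (lexer + parser only).
var pegParser = map[string]float64{
	"filters": 7176, "geq1": 6781, "alias": 3775, "first_after": 6822,
	"offset": 6706, "generic": 6219, "count": 3750, "allof": 5184,
	"anyof": 5366, "leq": 5947, "geq": 5963, "filters-or": 6226,
	"filters-and": 6221, "order": 5839, "orderdesc": 5852,
}

// speedupRange returns the cases with the smallest and largest cur/peg
// ns/op ratio, considering only cases present in both maps.
func speedupRange(cur, peg map[string]float64) (minName string, minR float64, maxName string, maxR float64) {
	for name, c := range cur {
		p, ok := peg[name]
		if !ok {
			continue
		}
		r := c / p
		if minName == "" || r < minR {
			minName, minR = name, r
		}
		if maxName == "" || r > maxR {
			maxName, maxR = name, r
		}
	}
	return
}

func main() {
	lo, loR, hi, hiR := speedupRange(currentParser, pegParser)
	fmt.Printf("PEG parser is %.2fx (%s) to %.2fx (%s) faster\n", loR, lo, hiR, hi)
}
```

With these numbers the PEG parser comes out roughly 1.3x to 2x faster per query, and the tables also show it at 25 to 26 allocs/op against 41 to 75 for the current parser.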
Summary
- About the antlr code generator
If the project cares about performance, I don’t think using the antlr4 generator is a good approach (at least for the Go target; I don’t know how it performs in other languages). Other generators could be used instead, such as yacc (available as go tool yacc in older Go releases; since Go 1.8 it lives in golang.org/x/tools/cmd/goyacc), the PEG generator, or some other kind of generator. I didn’t test yacc; a complete comparison would need some additional time to test it.
Some additional effort will probably be needed to convert the antlr4 grammar (from GraphQL) or an EBNF grammar (openCypher publishes its specification in both formats) into a grammar for a more efficient generator to use in Dgraph.