Performances: gRPC routines blocking times

myo · June 13, 2018, 9:38am

Hello,

I wrote a small Go program that creates about 200 nodes and 200 edges in an empty database, using the Go client and Dgraph nightly builds.
It takes about 17 seconds on my Intel® Core™ i5-3470 CPU @ 3.20GHz!

I ran the CPU profiler and the CPU is barely used:

Type: cpu
Duration: 17.21s, Total samples = 150ms ( 0.87%)

But the goroutine blocking profile shows that the time is spent waiting in gRPC calls:

Showing nodes accounting for 35.49s, 100% of 35.49s total
Dropped 43 nodes (cum <= 0.18s)
Showing top 20 nodes out of 24
      flat  flat%   sum%        cum   cum%
    35.49s   100%   100%     35.49s   100%  runtime.selectgo
         0     0%   100%      3.14s  8.84%  github.com/dgraph-io/dgo.(*Dgraph).Alter
         0     0%   100%      4.08s 11.49%  github.com/dgraph-io/dgo.(*Txn).Commit
         0     0%   100%     10.40s 29.31%  github.com/dgraph-io/dgo.(*Txn).Mutate
         0     0%   100%      3.14s  8.84%  github.com/dgraph-io/dgo/protos/api.(*dgraphClient).Alter
         0     0%   100%      4.08s 11.49%  github.com/dgraph-io/dgo/protos/api.(*dgraphClient).CommitOrAbort
         0     0%   100%     10.40s 29.31%  github.com/dgraph-io/dgo/protos/api.(*dgraphClient).Mutate
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.(*ClientConn).Invoke
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.(*clientStream).RecvMsg
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.(*csAttempt).recvMsg
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.Invoke
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.invoke
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc/transport.(*Stream).RecvCompress
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc/transport.(*Stream).waitOnHeader
         0     0%   100%     17.74s 49.99%  google.golang.org/grpc/transport.(*controlBuffer).get
         0     0%   100%     17.74s 49.99%  google.golang.org/grpc/transport.(*loopyWriter).run
         0     0%   100%     17.74s 49.99%  google.golang.org/grpc/transport.newHTTP2Client.func3

I am not inserting data concurrently yet, but I strongly feel these times should be much shorter.

Indexes involved in my schema are:

            label:              string @index(term) @upsert .
            source:             uid @reverse @count .
            target:             uid @reverse @count .

If I check the CPU profile on the endpoint Dgraph exposes during insertions, the CPU is only lightly used:

Duration: 30s, Total samples = 730ms ( 2.43%)

I’m quite new to Go, Dgraph and Go profiling, so maybe I missed something?

Thank you

jcbms · June 15, 2018, 6:24am

You create a term index first, and your edge predicates have @reverse and @count; these take a long time.
If you check data before inserting, it will take even more time. Maybe @upsert is that check.
See this issue, which compares the speed with and without @reverse @count on one predicate: schema with @reverse @count in one predicate and without, the speed is total different · Issue #2385 · dgraph-io/dgraph · GitHub

shanghai-Jerry · June 15, 2018, 6:59am

That’s good, but if you use @upsert, it should be on an indexed predicate.

myo · June 15, 2018, 9:04am

Thank you for your tips.

I ran more tests on my laptop, which has a better CPU (Intel® Core™ i7-7700HQ CPU @ 2.80GHz) and an SSD.

I also benchmarked only the queries/inserts, excluding initialization and schema creation.
After optimizing my queries a bit, I now get 1 second for the same result.
If I remove @reverse and/or @count, it goes down to 0.9s.
I can optimize the queries even further and I’ll try that, but I think the performance is already OK.

BTW, is it possible to have a @reverse index but set the @count index for only one direction?

jcbms · June 15, 2018, 9:29am

You can change your schema once the data insertion is complete.

shanghai-Jerry · June 15, 2018, 9:49am

What does this mean? I can’t understand it.

You can modify your schema whenever you want, but it will not change the data already inserted into Dgraph to fit the new schema.
It only applies to new data.

jcbms · June 15, 2018, 9:55am

I think old data will still fit the new schema.

shanghai-Jerry · June 15, 2018, 10:00am

Maybe I missed something here. If you have questions, just refer to this:

If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples.

If data is already stored before the mutation, existing values are not checked to conform to the new schema. On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion.

If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified.

Reverse edges are also computed if specified by a schema mutation.

myo · June 15, 2018, 10:46am

Sorry I haven’t been clear.
What I’d like is to have a specific predicate with a reverse index, and also a count index but only on the reverse side.
For instance, on the ‘source’ predicate, I need the ‘~source’ one and I need to be able to ‘count(~source)’, but I don’t need to ‘count(source)’.
So I’d like to avoid creating that useless index for my application.

myo · June 22, 2018, 8:23am

After a big optimization by merging individual requests into a few big ones, it only takes 60ms now.
I think that is very good performance.

I still don’t know how to control indexes on reverse predicates though.

shanghai-Jerry · June 22, 2018, 9:35am

Batching requests is really a good way to improve performance.
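The thread does not show the code, so the following is only a sketch of what merging many small requests into one can look like. It assumes, from the schema posted earlier, that nodes carry a `label` and that each edge is itself a node with `label`, `source` and `target` predicates; `Edge`, `buildBatch` and `literal` are hypothetical names. Everything is rendered as RDF N-Quads with blank nodes, so one mutation can create the nodes and link the edges to them.

```go
package main

import (
	"fmt"
	"strings"
)

// Edge is a hypothetical model of one edge: a labelled node whose source and
// target are given as indexes into the node slice.
type Edge struct {
	Label          string
	Source, Target int
}

// Escapes backslashes, quotes and line breaks inside N-Quad string literals.
var literalEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`, "\r", `\r`)

func literal(s string) string { return `"` + literalEscaper.Replace(s) + `"` }

// buildBatch renders all nodes and edges as one N-Quads payload. Blank node
// names (_:n0, _:e0, ...) let a single mutation create the nodes and link
// the edges to them without looking up UIDs first.
func buildBatch(nodeLabels []string, edges []Edge) (string, error) {
	var b strings.Builder
	for i, label := range nodeLabels {
		fmt.Fprintf(&b, "_:n%d <label> %s .\n", i, literal(label))
	}
	n := len(nodeLabels)
	for i, e := range edges {
		if e.Source < 0 || e.Source >= n || e.Target < 0 || e.Target >= n {
			return "", fmt.Errorf("edge %d refers to a node that does not exist", i)
		}
		fmt.Fprintf(&b, "_:e%d <label> %s .\n", i, literal(e.Label))
		fmt.Fprintf(&b, "_:e%d <source> _:n%d .\n", i, e.Source)
		fmt.Fprintf(&b, "_:e%d <target> _:n%d .\n", i, e.Target)
	}
	return b.String(), nil
}
```

With dgo, such a payload could be sent in a single call as `api.Mutation{SetNquads: []byte(nq), CommitNow: true}` instead of one `Mutate` plus one `Commit` per node. The escaper only handles backslashes, quotes and line breaks, which is enough for plain labels but is not a complete N-Quads escaper.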

MichelDiz · June 22, 2018, 4:43pm

@count is only for one direction. There is no @count.only.reverse.

What does exist is that if you query in reverse, it may return content indexed by @count. If you do not want @count results in a query that uses the reverse edge, just do not request them in your query.

Indexing is done on the predicate in general; there is no separate indexing for @reverse. @reverse is itself an index.

karan28aug · June 22, 2018, 5:14pm

@myo Basically, what I tried was counting on both sides.
When I don’t use a reverse edge on friend, like:

{
  everyone(func: anyofterms(name, "Michael")) {
    name age uid
    friend {
      name age uid count(uid)
      friend { expand(_all_) uid }
    }
  }
}

the result for count is:

{ "count": 2 }

But when I use the reverse edge on friend instead, like:

{
  everyone(func: anyofterms(name, "Michael")) {
    name age uid
    ~friend {
      name age uid count(uid)
      friend { expand(_all_) uid }
    }
  }
}

the result for count is:

{ "count": 1 }

I guess it depends on what you need the count for in your query: you specify in the query itself which edge you want to count. When setting the schema, however, it is not possible to put @count on a predicate that has a reverse edge and apply it to only one of the two, i.e. only to the edge or only to the reverse edge.
In the schema you set @count for the predicate as a whole, not for its indices separately. I hope this helps clear up your doubt.

myo · June 23, 2018, 10:56am

Thank you for the explanation.

myo · June 29, 2018, 12:06pm

Now I get it: count(~pred) is equivalent to count(pred) and thus I cannot count outgoing reverse predicates. Too bad. I’ll have to create new predicates in the reverse direction with the @count index on them.
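A sketch of that workaround, with hypothetical predicate names: alongside the `source`/`target` edges, the application writes explicit inverse links, `outgoing` and `incoming`, from each node back to its edge nodes, and only those carry @count. In the schema syntax used earlier in the thread that might read `outgoing: uid @count .` and `incoming: uid @count .` (newer Dgraph releases spell list-valued edges `[uid]`), while `source` and `target` keep @reverse without @count. `edgeQuads` is likewise a hypothetical helper.

```go
package main

import "fmt"

// Hypothetical predicate names for the hand-maintained inverse edges;
// only these would carry @count in the schema.
const (
	predOutgoing = "outgoing" // node -> edge nodes whose source is this node
	predIncoming = "incoming" // node -> edge nodes whose target is this node
)

// edgeQuads returns the N-Quads for one edge node plus the explicit inverse
// links, so count(outgoing) and count(incoming) can be indexed without
// indexing count(source) or count(target). Arguments are N-Quad node
// references such as "_:e1" or "<0x2a>".
func edgeQuads(edge, src, dst string) []string {
	return []string{
		fmt.Sprintf("%s <source> %s .", edge, src),
		fmt.Sprintf("%s <target> %s .", edge, dst),
		fmt.Sprintf("%s <%s> %s .", src, predOutgoing, edge),
		fmt.Sprintf("%s <%s> %s .", dst, predIncoming, edge),
	}
}
```

The cost of this design is that the application, not Dgraph, must keep both directions in sync, including when edges are deleted.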

BlankRain · July 4, 2018, 1:04am

Can you show your code?

system · August 3, 2018, 1:05am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
