Performances: gRPC routines blocking times

Hello,

I wrote a small Go program that creates about 200 nodes and 200 edges in an empty database, using the Go client and the Dgraph nightly builds.
It takes about 17 seconds on my Intel® Core™ i5-3470 CPU @ 3.20GHz!

I ran profiling tests and the CPU is barely used:

Type: cpu
Duration: 17.21s, Total samples = 150ms ( 0.87%)

But goroutine blocking profiling shows that the time is spent waiting on gRPC calls:

Showing nodes accounting for 35.49s, 100% of 35.49s total
Dropped 43 nodes (cum <= 0.18s)
Showing top 20 nodes out of 24
      flat  flat%   sum%        cum   cum%
    35.49s   100%   100%     35.49s   100%  runtime.selectgo
         0     0%   100%      3.14s  8.84%  github.com/dgraph-io/dgo.(*Dgraph).Alter
         0     0%   100%      4.08s 11.49%  github.com/dgraph-io/dgo.(*Txn).Commit
         0     0%   100%     10.40s 29.31%  github.com/dgraph-io/dgo.(*Txn).Mutate
         0     0%   100%      3.14s  8.84%  github.com/dgraph-io/dgo/protos/api.(*dgraphClient).Alter
         0     0%   100%      4.08s 11.49%  github.com/dgraph-io/dgo/protos/api.(*dgraphClient).CommitOrAbort
         0     0%   100%     10.40s 29.31%  github.com/dgraph-io/dgo/protos/api.(*dgraphClient).Mutate
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.(*ClientConn).Invoke
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.(*clientStream).RecvMsg
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.(*csAttempt).recvMsg
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.Invoke
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc.invoke
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc/transport.(*Stream).RecvCompress
         0     0%   100%     17.75s 50.01%  google.golang.org/grpc/transport.(*Stream).waitOnHeader
         0     0%   100%     17.74s 49.99%  google.golang.org/grpc/transport.(*controlBuffer).get
         0     0%   100%     17.74s 49.99%  google.golang.org/grpc/transport.(*loopyWriter).run
         0     0%   100%     17.74s 49.99%  google.golang.org/grpc/transport.newHTTP2Client.func3
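
For readers who want to reproduce this kind of measurement, here is a minimal sketch of capturing a blocking profile around a piece of client code. The post does not say how the profile above was collected, so this assumes the standard runtime/pprof block profile. The names collectBlockProfile and simulatedRPC are hypothetical; simulatedRPC only stands in for a call that waits on a server response.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
	"time"
)

// collectBlockProfile runs work with block profiling enabled and returns
// the block profile in pprof's human-readable text form (debug=1).
func collectBlockProfile(work func()) (string, error) {
	// A rate of 1 records every blocking event. That is fine for a short
	// experiment but adds overhead, so it is switched off again afterwards.
	runtime.SetBlockProfileRate(1)
	defer runtime.SetBlockProfileRate(0)

	work()

	var buf bytes.Buffer
	if err := pprof.Lookup("block").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// simulatedRPC is a hypothetical stand-in for a blocking client call:
// the caller waits on a channel until a "response" arrives.
func simulatedRPC() {
	done := make(chan struct{})
	go func() {
		time.Sleep(20 * time.Millisecond)
		close(done)
	}()
	<-done // blocked here, much like a client waiting for response headers
}

func main() {
	out, err := collectBlockProfile(simulatedRPC)
	if err != nil {
		fmt.Println("profile error:", err)
		return
	}
	fmt.Println(out)
}
```

A profile like this measures time spent waiting, not CPU work, which would explain how the blocked time can exceed the 17 s wall-clock duration while the CPU profile stays almost empty.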

I am not inserting data concurrently yet, but I strongly feel the times should be much shorter.

Indexes involved in my schema are:

            label:              string @index(term) @upsert .
            source:             uid @reverse @count .
            target:             uid @reverse @count .

If I check the CPU profile on the endpoint Dgraph exposes during the insertions, the CPU is only lightly used:

Duration: 30s, Total samples = 730ms ( 2.43%)

I’m quite new to Go, Dgraph and Go profiling, so maybe I missed something?

Thank you

You create a term index first, and your edge predicates are @reverse and @count; maintaining these takes a long time.
If you check for existing data before each insert, that takes even more time. Maybe @upsert is doing that check.
See this issue: https://github.com/dgraph-io/dgraph/issues/2385

That’s good, but if you use @upsert, it should be on an indexed predicate.

Thank you for your tips.

I did more tests on my laptop, which has a better CPU (Intel® Core™ i7-7700HQ CPU @ 2.80GHz) and an SSD.

I also benchmarked only the queries/inserts, excluding initialization and schema creation.
I optimized my queries a bit and now I get 1 s for the same result.
If I remove @reverse and/or @count it goes down to 0.9 s.
I can optimize the queries even further and I’ll try that, but the performance is already OK I think.

BTW, is it possible to have a @reverse index but set the @count index for only one direction?

You can change your schema once the data insertion is complete.

What does this mean? I don’t understand.

You can modify your schema whenever you want, but it will not change the data already inserted into Dgraph to fit the new schema.
It only applies to new data.

I think old data will still fit the new schema.

Maybe I missed something here. If you have questions, just refer to this:

If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples.

If data is already stored before the mutation, existing values are not checked to conform to the new schema. On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion.

If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified.

Reverse edges are also computed if specified by a schema mutation.

Sorry I haven’t been clear.
What I’d like is to have a specific predicate with a reverse index, and also a count index but only on the reverse side.
For instance, on the ‘source’ predicate, I need the ‘~source’ one and I need to be able to ‘count(~source)’, but I don’t need to ‘count(source)’.
So I’d like to avoid creating that useless index for my application.

After a big optimization, merging the individual requests into a few big ones, it now takes only 60 ms.
I think it is a very good performance.

I still don’t know how to control indexes on reverse predicates though.

Batching requests is really a good way to improve performance.

@count is only for one direction. There is no @ count.only.reverse.

What happens is that if you query in the reverse direction, the query can still return content indexed by @count. If you do not want a count returned in a query that uses @reverse, just do not request it in your query.

Indexing applies to the predicate as a whole; there is no indexing exclusive to @reverse. @reverse is itself an index.

@myo Basically what I tried is to count for both sides…
When I don’t use a reverse edge in friends, like:

{
  everyone(func: anyofterms(name, "Michael")) {
    name age uid
    friend {
      name age uid count(uid)
      friend { expand(_all_) uid }
    }
  }
}

the result for count is :

{ "count": 2 }

But similarly when I use a reverse edge on friend, like:

{
  everyone(func: anyofterms(name, "Michael")) {
    name age uid
    ~friend {
      name age uid count(uid)
      friend { expand(_all_) uid }
    }
  }
}

the result for count is :

{ "count": 1 }

I guess it depends on what you need the count for. You specify in the query itself which edge you want counted. When setting the schema, it is not possible to enable @count for only one side of a predicate, that is, for only the forward edge or only the reverse edge.
In the schema you set @count on the predicate itself, not on its individual indices. I hope this helps clear up your doubt. :smile:

Thank you for the explanation.

Now I get it: count(~pred) is equivalent to count(pred) and thus I cannot count outgoing reverse predicates. Too bad. I’ll have to create new predicates in the reverse direction with the @count index on them.

Can you show your code?
