RFC: Streaming RDF queries

Motivation:

Building a JSON tree for a large result is a CPU- and memory-intensive operation, so Dgraph should be able to stream query results instead of sending the whole result in one go.

Implementation

After post-query aggregation, we can iterate over every subgraph and stream its N-Quad relations.
Let’s look at an implementation example for the given query.

{
  me(func: anyofterms(name, "Michael Martin Alejandro")) {
    friends @filter(ge(age, "20")) {
      name
    }
  }
}

For the above query, the subgraph will be constructed as below.

subgraph 1

Destination IDs: 21, 31, 35

We’ll skip the root graph since it is given by the user.

subgraph 2

10 (src Id) + friend => [21, 31, 35] (posting list for the uid 10)
20 (src Id) + friend => [24, 31, 34] (posting list for the uid 20)
30 (src Id) + friend => [20, 34] (posting list for the uid 30)

From this subgraph onwards, we'll stream the relation between each src ID and its row in the uidMatrix. For example, for src ID 10, we'll stream the N-Quads as

<10> <friend> <21>
<10> <friend> <31>
<10> <friend> <35> 
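
A minimal Go sketch of this step, to make the iteration concrete. The `subGraph` type and the `uidNquads` function are hypothetical and only model the fields used in this example; they are not Dgraph's actual internal types. UIDs are printed in decimal to match the notation used in this post.

```go
package main

import "fmt"

// subGraph is a hypothetical, simplified stand-in for a query subgraph.
// Only the fields needed for this sketch are modelled.
type subGraph struct {
	attr      string     // predicate name, e.g. "friend"
	srcUIDs   []uint64   // source UIDs, one per row of uidMatrix
	uidMatrix [][]uint64 // uidMatrix[i] is the posting list for srcUIDs[i]
}

// uidNquads renders one N-Quad line per (source UID, destination UID) pair.
func uidNquads(sg subGraph) []string {
	var out []string
	for i, src := range sg.srcUIDs {
		if i >= len(sg.uidMatrix) {
			break // no posting list recorded for the remaining sources
		}
		for _, dst := range sg.uidMatrix[i] {
			out = append(out, fmt.Sprintf("<%d> <%s> <%d>", src, sg.attr, dst))
		}
	}
	return out
}

func main() {
	// Data taken from subgraph 2 in the example above.
	sg := subGraph{
		attr:    "friend",
		srcUIDs: []uint64{10, 20, 30},
		uidMatrix: [][]uint64{
			{21, 31, 35},
			{24, 31, 34},
			{20, 34},
		},
	}
	for _, line := range uidNquads(sg) {
		fmt.Println(line)
	}
}
```

The sketch collects the lines in a slice for readability; a streaming version would hand each line to the client as soon as it is produced.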

subgraph 3

21 (src Id) + name => ["Jinnah"] (value list for the uid 21)
31 (src Id) + name => ["Karrupan"] (value list for the uid 31)
35 (src Id) + name => ["Vellaiyan"] (value list for the uid 35)

This subgraph has no uidMatrix; it has the corresponding value matrix instead. So, we'll stream the relation between each src ID and its values as

<21> <name> "Jinnah" 
<31> <name> "Karrupan" 
<35> <name> "Vellaiyan" 
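
The value-matrix case can be sketched the same way, this time emitting each line through a callback instead of building the whole result first. The names `valueSubGraph`, `streamValueNquads`, and `emit` are assumptions for illustration, and values are modelled as plain strings; real values are typed. `strconv.Quote` is used as an approximation of N-Quad literal escaping.

```go
package main

import (
	"fmt"
	"strconv"
)

// valueSubGraph is a hypothetical, simplified subgraph for a value predicate.
type valueSubGraph struct {
	attr        string     // predicate name, e.g. "name"
	srcUIDs     []uint64   // source UIDs, one per row of valueMatrix
	valueMatrix [][]string // valueMatrix[i] is the value list for srcUIDs[i]
}

// streamValueNquads passes one N-Quad line per (source UID, value) pair to
// emit and stops at the first error emit returns.
func streamValueNquads(sg valueSubGraph, emit func(string) error) error {
	for i, src := range sg.srcUIDs {
		if i >= len(sg.valueMatrix) {
			break // no value list recorded for the remaining sources
		}
		for _, v := range sg.valueMatrix[i] {
			// strconv.Quote only approximates N-Quad escaping rules.
			line := fmt.Sprintf("<%d> <%s> %s", src, sg.attr, strconv.Quote(v))
			if err := emit(line); err != nil {
				return err
			}
		}
	}
	return nil
}

func main() {
	// Data taken from subgraph 3 in the example above.
	sg := valueSubGraph{
		attr:        "name",
		srcUIDs:     []uint64{21, 31, 35},
		valueMatrix: [][]string{{"Jinnah"}, {"Karrupan"}, {"Vellaiyan"}},
	}
	err := streamValueNquads(sg, func(line string) error {
		fmt.Println(line)
		return nil
	})
	if err != nil {
		fmt.Println("stream failed:", err)
	}
}
```

Because nothing is buffered beyond the current line, memory use stays flat regardless of the result size, which is the point of the RFC.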

gRPC Interface

rpc QueryNquads(Request) returns (stream Nquads);
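
A rough sketch of how a handler for this endpoint could push N-Quads to the client in batches. Everything here is an assumption for illustration: the `Nquads` message shape, the `nquadSender` interface (which only mirrors the `Send` method a generated gRPC server stream would offer), and the batch size. It is defined locally so that it runs without any gRPC dependency.

```go
package main

import "fmt"

// Nquads is a hypothetical message carrying one batch of N-Quad lines.
type Nquads struct {
	Lines []string
}

// nquadSender is a hypothetical interface that mirrors the Send method of a
// server-side stream; a real handler would receive the generated stream type.
type nquadSender interface {
	Send(*Nquads) error
}

// sendInBatches splits lines into batches of at most batchSize and sends
// each batch on the stream, stopping at the first send error.
func sendInBatches(lines []string, batchSize int, stream nquadSender) error {
	if batchSize <= 0 {
		return fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	for start := 0; start < len(lines); start += batchSize {
		end := start + batchSize
		if end > len(lines) {
			end = len(lines)
		}
		if err := stream.Send(&Nquads{Lines: lines[start:end]}); err != nil {
			return err
		}
	}
	return nil
}

// recorder is an in-memory nquadSender used to exercise the sketch.
type recorder struct {
	batches [][]string
}

func (r *recorder) Send(m *Nquads) error {
	r.batches = append(r.batches, m.Lines)
	return nil
}

func main() {
	lines := []string{
		"<10> <friend> <21>",
		"<10> <friend> <31>",
		"<10> <friend> <35>",
	}
	r := &recorder{}
	if err := sendInBatches(lines, 2, r); err != nil {
		fmt.Println("send failed:", err)
		return
	}
	fmt.Println("batches sent:", len(r.batches))
}
```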

Reference

srcID, Uid Matrix relationship: http://discuss.dgraph.io/t/srcuids-uidmatrix-and-destuids-in-subgraph/5033

cc: @vvbalaji @mrjn @pawan @ashishgoswami

Do you need another gRPC endpoint? I think just not having to generate the JSON tree should be sufficient. Try that first.
