I expected the size of each directory to be roughly uniform. Is this a misconfiguration in my bulk load?
If a directory is too large, the alpha startup time becomes very long: the first group [out/0/p (314G)] took nearly an hour to start.
Can you give me some advice on how to make startup faster?
dgraph version
Dgraph version : v1.0.11-18-g4ca67ca
Commit SHA-1 : 4ca67ca
Commit timestamp : 2018-12-24 18:53:10 -0800
Branch : master
Go version : go1.11
Sorry, I can't share the actual data we use.
But I wrote an example program that builds very similar data. You can generate it yourself, which also avoids copying anything over the network. (A rough sketch of such a generator follows the flag list below.)
go run main.go --help
-c int
      how many goroutines to run (default 100)
-maxuid int
      the upper bound for randomly generated uids (default 650000000)
-num int
      how many records to build (default 10000)
-type string
      which type to build: 'nodes' or 'edges' (default "nodes")
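For reference only, here is a minimal sketch of what such a generator might look like; it is not the program above. The flag names follow the --help output, while the predicate names (parent, md5) and the N-Quad layout are assumptions taken from later in this thread.

```go
// gen.go: minimal sketch of a data generator similar to the one described above.
// Flag names follow the --help output; the predicates (parent, md5) are assumed.
package main

import (
	"bufio"
	"crypto/md5"
	"encoding/hex"
	"flag"
	"fmt"
	"math/rand"
	"os"
	"sync"
)

func main() {
	c := flag.Int("c", 100, "how many goroutines to run")
	maxUID := flag.Int64("maxuid", 650000000, "the uid random boundary value")
	num := flag.Int("num", 10000, "how many records to build")
	typ := flag.String("type", "nodes", "which type to build: 'nodes' or 'edges'")
	flag.Parse()

	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	lines := make(chan string, 1024)
	var wg sync.WaitGroup
	perWorker := *num / *c

	for i := 0; i < *c; i++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			r := rand.New(rand.NewSource(seed)) // per-goroutine RNG, no locking
			for j := 0; j < perWorker; j++ {
				uid := r.Int63n(*maxUID) + 1
				if *typ == "edges" {
					// Every edge uses the same predicate, so the bulk loader
					// puts all of them into the group that owns "parent".
					parent := r.Int63n(*maxUID) + 1
					lines <- fmt.Sprintf("<0x%x> <parent> <0x%x> .\n", uid, parent)
				} else {
					sum := md5.Sum([]byte(fmt.Sprintf("%d", uid)))
					lines <- fmt.Sprintf("<0x%x> <md5> %q .\n", uid, hex.EncodeToString(sum[:]))
				}
			}
		}(int64(i))
	}

	go func() { wg.Wait(); close(lines) }()
	for l := range lines {
		out.WriteString(l)
	}
}
```

Pipe the output into an .rdf file and feed it to dgraph bulk to reproduce the skewed shard sizes.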
Hey guys,
I have found that bulk shards the data by predicate.
In this case, group[0] holds the parent predicate,
group[1] holds the _predicate_ predicate,
and group[2] holds the md5 predicate.
bulk allocates all data for a given predicate to a single group, and that is what makes one group so big.
Most of our actual data is concentrated in one large predicate, so as things stand an alpha with such a large group cannot work well, which means I can't solve this problem by adding more machines.
Can dgraph bulk automatically split a single predicate across groups?
Also, has the group functionality mentioned in the documentation been finished?
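One way to confirm this kind of skew before running bulk is to count triples per predicate in the input. A rough sketch, assuming plain (uncompressed) N-Quad input on stdin:

```go
// predcount.go: count how many triples each predicate has in an N-Quad stream.
// Since the bulk loader keeps a predicate entirely within one group, the
// largest count here is a lower bound on the size of the largest group.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	counts := make(map[string]int64)
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		// N-Quad: <subject> <predicate> <object> .
		fields := strings.Fields(sc.Text())
		if len(fields) < 3 {
			continue
		}
		counts[strings.Trim(fields[1], "<>")]++
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	for pred, n := range counts {
		fmt.Printf("%-20s %d\n", pred, n)
	}
}
```

For example: cat data.rdf | go run predcount.go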
We currently do NOT split up a single predicate across groups. Each predicate must lie fully within a group. We have thought about potentially splitting up a predicate across groups, but that’s not a priority for now.
In fact, _predicate_ itself will be removed once we have a stronger type system, because it is expensive to keep and not transactionally sound.
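Given that constraint, one possible workaround (just a sketch, not an official feature) is to bucket a huge predicate into several predicates yourself, e.g. parent_0 .. parent_3 chosen by hashing the subject, so the bulk loader can place the buckets in different groups. The predicate name parent and the bucket count here are illustrative only, and queries then have to fan out over all buckets:

```go
// bucketpred.go: rewrite N-Quads so one huge predicate ("parent" here, as an
// example) becomes several bucketed predicates (parent_0 .. parent_3).
// This only works around the one-group-per-predicate rule; the schema and all
// queries must be updated to use the bucketed names.
package main

import (
	"bufio"
	"fmt"
	"hash/fnv"
	"os"
	"strings"
)

const buckets = 4

// bucketOf picks a stable bucket for a subject by hashing it.
func bucketOf(subject string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(subject))
	return h.Sum32() % buckets
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	in.Buffer(make([]byte, 0, 1024*1024), 1024*1024)
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	for in.Scan() {
		line := in.Text()
		fields := strings.Fields(line)
		if len(fields) >= 3 && fields[1] == "<parent>" {
			// Route this triple to one of the parent_N buckets by subject.
			fields[1] = fmt.Sprintf("<parent_%d>", bucketOf(fields[0]))
			line = strings.Join(fields, " ")
		}
		fmt.Fprintln(out, line)
	}
}
```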
Thanks for your reply.
It is a great pity that dgraph does not yet have a good solution for a very large single predicate.
If predicate performance could be improved at the expense of transactions, we would be happy to accept that trade-off.
I'm looking forward to seeing dgraph handle data at this scale.
Can this requirement be added to Roadmap 2019?
I have the same problem. I have 18 billion edges and 8 billion RDFs, and when I start up the dgraph alpha it takes nearly an hour. Is there a good solution?
I have a similar problem when one of my predicates is really large. I wanted to run my Dgraph cluster on a lot of small servers (16GB RAM), but I had to change my strategy and switch from horizontal scaling to a kind of hybrid scaling with fewer, higher-spec servers (64GB RAM, i.e. vertical scaling).
At present, this can only be solved by improving single-machine performance.
It's going to be a very complicated job.
As you can see, the team is great.