A note about benchmarks measured in triples per second

Main reference: LargeTripleStores - W3C Wiki

These numbers seem like a good direction for the benchmarks we should run. Despite being relatively old (some are over a decade old), they still appear to be the targets the graph-database market aims for. The list starts with graph databases that can comfortably handle 70M triples, then 100M, 200M, 500M, 1.7B, 7B, and up to 1+ trillion triples.

Apparently Dgraph can easily handle billions of triples. In terms of write throughput, Dgraph in normal mode is similar to Jena TDB and Blazegraph, which write around 241k triples per second, while Dgraph in ludicrous mode compares to Stardog's 50B-triple load at 500k triples/sec.

Of course, I am assuming these values for Dgraph based on my experience with my own setup. It would be good to run tests similar to those on the list, with the same CPU and RAM capacity they used, and really hammer it.

I got 400k RDF/s in ludicrous mode using a Ryzen 7 2700X. We can go even beyond that.
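
That 400k RDF/s figure is not something the snippet below reproduces; this is only a minimal, hedged sketch of how one could eyeball client-side write throughput with the dgo client. The endpoint, the batch sizes, and the dgo v2 import path are assumptions and depend on your Dgraph version.

```go
// Minimal sketch: rough client-side write-throughput measurement against a
// local Dgraph Alpha. This does NOT match what the live/bulk loader achieves;
// it only prints an approximate triples/sec figure for small N-Quad batches.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/dgraph-io/dgo/v2"            // import path is an assumption; varies by Dgraph version
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure()) // assumed local Alpha endpoint
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	const batches, perBatch = 100, 1000 // hypothetical sizes; tune for your setup
	start := time.Now()
	for i := 0; i < batches; i++ {
		var nquads []byte
		for j := 0; j < perBatch; j++ {
			nquads = append(nquads, fmt.Sprintf("_:n%d_%d <value> \"%d\" .\n", i, j, j)...)
		}
		_, err := dg.NewTxn().Mutate(context.Background(),
			&api.Mutation{SetNquads: nquads, CommitNow: true})
		if err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start).Seconds()
	fmt.Printf("~%.0f triples/sec (client-side, not comparable to loader numbers)\n",
		float64(batches*perBatch)/elapsed)
}
```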

Virtually every test on the list uses server processors capable of running many tasks in parallel and supporting terabytes of RAM. That is not possible on a common setup. I saw 256GB of RAM, 1TB of RAM, and even 2TB of RAM. That's a LOT of RAM.

In our blog, we claim close to 1M edges/sec (818k edges/sec), which is insane! Maybe we are even higher than that today. In that test, we used 488GB of RAM and 64 cores.

AllegroGraph claims to be able to load 829k triples per second, which is very similar to ours, but we did it with far fewer resources; they used 2 terabytes of RAM.

Dream Benchmark

It would be nice to get a chance to test on something like this (it is new, but look at these numbers):
688 billion edges per second! Imagine Dgraph running on something like this!

That would load any Dgraph dataset in milliseconds, or terabytes of triples in a few minutes.

A good old article to read:

On June 7, 2011, AllegroGraph announced that they were the first triplestore technology to reach a 300+ billion triple load in less than a week.

Reference: AllegroGraph and Intel's E7-8870 Xeon Processor

An interesting comment from the article: Dr. Aasman said, "Some people have asked, 'Why not do this on a distributed cloud system with Hadoop?' The quick answer: NoSQL databases like Hadoop and Cassandra fail on joins."

“The 310 billion triple result that Franz is announcing today was achieved in only two weeks of access (actual loading time of just over 78 hours) to an 8-socket Intel Xeon E7-8870 processor-based server system configured with 2 terabytes of physical memory and 22 terabytes of physical disk.”

But basically, 9 years ago they needed a powerful setup to run this:

Processor: Intel Xeon E7-8870
RAM: 2 terabytes
HDD: 22 terabytes

Maths

310 billion triples / 78 hours ≈ 3,974,358,974.36 RDF/hour (roughly 4 billion triples per hour)
Per minute ≈ 66,239,316.24 RDF/min
Per second ≈ 1,103,988.60 RDF/s
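
For reference, here is the same arithmetic in Go; the constants are just the 310 billion triples and the 78-hour load time quoted above.

```go
// Sanity check of the figures above: 310 billion triples loaded in 78 hours.
package main

import "fmt"

func main() {
	const triples = 310_000_000_000.0
	const hours = 78.0

	perHour := triples / hours
	perMinute := perHour / 60
	perSecond := perMinute / 60

	fmt.Printf("per hour:   %.2f RDF/hour\n", perHour)  // ~3,974,358,974.36
	fmt.Printf("per minute: %.2f RDF/min\n", perMinute) // ~66,239,316.24
	fmt.Printf("per second: %.2f RDF/s\n", perSecond)   // ~1,103,988.60
}
```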

They also announced a trillion-triple load; see LargeTripleStores - W3C Wiki:

Total load was 1,009,690,381,946 triples in just over 338 hours for an average rate of 829,556 triples per second.
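
That average rate checks out against the quoted numbers. A quick cross-check in Go (338.1 hours is an assumption for "just over 338 hours", so the result lands near, not exactly on, the published figure):

```go
// Cross-check of the trillion-triple load: total triples divided by elapsed time.
package main

import "fmt"

func main() {
	const triples = 1_009_690_381_946.0
	const hours = 338.1 // assumed value for "just over 338 hours"
	fmt.Printf("~%.0f triples/sec\n", triples/(hours*3600)) // ~829,546 triples/sec
}
```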

Spark running on GPU vs. CPU: 200GB of data (CSV) takes 7 minutes on the GPU.

If someone wants to use CUDA/GPU… I have a library/experience for that :slight_smile:

And if Dgraph wants to acquire an A100… I know how to get one too :wink:


Sounds like a hot knife through butter.

We definitely should test how Dgraph behaves with GPUs.