Slow performance on a ~100M-vertex graph from Wikidata

Hello, and congratulations on the great project.

I am evaluating the possible use of Dgraph for a solution within my company.

I started from a JSON Wikidata dump, simplified the structure a bit by removing references, using English data only, and keeping all the connecting properties while removing many scalar ones, as they are not interesting to us. The final result has around 100M vertices and more than 1B edges; I imported using bulk mode.

I deployed a cluster on a single physical (on-premise) machine with three zero and six alpha nodes, split into two groups, using Docker; the machine has 48 cores and 64GB RAM. Now, running a term query on a property called label (schema entry: string @index(term) ) I get unacceptable slowness (see the timing below, 1 min 41 s). I also notice that, while the query is running, the system seems to be mostly idle; see the CPU figure in the timing line after the command output.
I am adding some details below; if this is interesting for you, I can provide the entire database (bulk loaded), as well as my docker-compose.yml, just let me know.
Thanks so much, have a great day
giuseppe

Details

root@d055ae8f9bc8:/data# dgraph version

Dgraph version : v21.03.1
Dgraph codename : rocket-1
Dgraph SHA-256 : a00b73d583a720aa787171e43b4cb4dbbf75b38e522f66c9943ab2f0263007fe
Commit SHA-1 : ea1cb5f35
Commit timestamp : 2021-06-17 20:38:11 +0530
Branch : HEAD
Go version : go1.16.2
jemalloc enabled : true

time curl -X POST http://localhost:28080/query -H 'Content-Type: application/dql' -d '{
  var(func: allofterms(label, "General Motors Company")) { A as QID }
  var(func: allofterms(aliases, "General Motors Company")) { B as QID }
  WD(func: uid(A, B)) {
    name label aliases description
    P31 { QID }
    P279 { QID }
  }
}'

{"data":{"WD":[{"name":"Q29570","label":"Chevrolet","aliases":["Chevy","Chevrolet Division of General Motors Company y Nahuelito Gimenez"],"description":"American automobile division of GM","P31":[{"QID":"Q10429667"},{"QID":"Q334453"},{"QID":"Q786820"}]},{"name":"Q81965","label":"General Motors","aliases":["GM","General Motors Corporation","General Motors Company, LLC","GMC"],"description":"automotive manufacturing corporation based in Detroit, Michigan, USA","P31":[{"QID":"Q891723"},{"QID":"Q21980538"},{"QID":"Q786820"}]},{"name":"Q67604297","label":"Pilot Life Insurance Company, Petitioner v. Everate W. Dedeaux [Case 85-1043] / Metropolitan Life Insurance Company, Petitioner v. Arthur Taylor [Case 85-686] / General Motors Corporation, Petitioner v. Arthur Taylor. [Case 85-688 (NAID 118974763)","description":"item in the National Archives and Records Administration's holdings","P31":[{"QID":"Q18593264"}]},{"name":"Q19035456","label":"Pick Manufacturing Company v. General Motors Corporation","P31":[{"QID":"Q2334719"}]},{"name":"Q59356336","label":"224 West 57th Street","aliases":["Demarest and Peerless Company Building","Demarest \u0026 Peerless Building","General Motors Building","A. T. Demarest \u0026 Company Building","Argonaut Building","Peerless Motor Car Company Building"],"description":"Commercial building in Manhattan, New York","P31":[{"QID":"Q1021645"}]},{"name":"Q26349054","label":"HHE determination report no. HHE-73-73-143, Inland Manufacturing Company, General Motors Corporation, Dayton, Ohio","description":"field study","P31":[{"QID":"Q1402850"}]},{"name":"Q67604509","label":"Eastman Kodak Company, Petitioner v. Image Technical Services, Inc., et al. [Case 90-1029] / General motors Corporation, et al., Petitioners v. Evert Romein, et al. [Case 90-1390] / United States, Petitioner v. R. L. C. 
[Case 90-1 (NAID 118974953)","description":"item in the National Archives and Records Administration's holdings","P31":[{"QID":"Q18593264"}]}]},"extensions":{"server_latency":{"parsing_ns":928457,"processing_ns":101801347836,"encoding_ns":145195,"assign_timestamp_ns":724950,"total_ns":101803227071},"txn":{"start_ts":1400},"metrics":{"num_uids":{"":0,"P279":7,"P31":7,"QID":16,"_total":58,"aliases":7,"description":7,"label":7,"name":7}}}}

0.01s user 0.02s system 0% cpu 1:41.86 total
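
To make the timing easier to read, here is a small Python sketch that converts the server_latency values from the response above into seconds and into a share of total_ns. The numbers are copied from that response; the function name latency_breakdown is only illustrative and is not part of any Dgraph client.

```python
# Values copied from "extensions.server_latency" in the response above.
server_latency = {
    "parsing_ns": 928457,
    "processing_ns": 101801347836,
    "encoding_ns": 145195,
    "assign_timestamp_ns": 724950,
    "total_ns": 101803227071,
}


def latency_breakdown(latency):
    """Map each phase to (seconds, percent of total_ns).

    Illustrative helper; the name is an assumption, not a Dgraph API.
    """
    total = latency["total_ns"]
    return {
        phase: (ns / 1e9, 100.0 * ns / total)
        for phase, ns in latency.items()
        if phase != "total_ns"
    }


breakdown = latency_breakdown(server_latency)
for phase, (seconds, share) in breakdown.items():
    print(f"{phase:22s} {seconds:12.6f} s {share:9.4f} %")
```

Read this way, the roughly 101.8 s total is almost entirely processing time on the server; parsing, encoding and timestamp assignment are each around a millisecond or less.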

Why two equal blocks?

Change it to the query below and let me know.

{
  var(func: eq(label, "General")) {
    A as QID
  }
  var(func: eq(label, "Motors")) {
    B as QID
  }
  var(func: eq(label, "Company")) {
    C as QID
  }

  WD(func: uid(A, B, C)) {
    name
    label
    aliases
    description
    P31 {
      QID
    }
    P279 {
      QID
    }
  }
}

If needed, use allofterms in each block instead of eq.

Hi Michel,
thanks for your prompt reply. The two blocks in my query were not equal; I just forgot to mention that I am also querying another property named aliases ([string] @index(term) .), apologies for the mistake. In short, the query is supposed to retrieve all the items that contain all the terms “General”, “Motors”, “Company” within the same field, either a label or an alias.
I just tried your query without any change, although of course it wouldn’t provide the results I am looking for; the outcome was really bad: I stopped the process after more than 10 minutes.
I also found out which alpha group is serving the “label” tablet, and that’s alpha4…6; I repeated the query pointing to alpha4 directly, but the results were the same.
Thanks,
pepi
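
To illustrate why the two queries are not equivalent, here is a small Python sketch of the matching rules only; it does not talk to Dgraph. The tokenizer is a deliberate simplification (lowercased word tokens) and is an assumption, not Dgraph's actual term tokenizer. The function names are illustrative, the first item reuses values from the response above, and the "General Electric" item is hypothetical.

```python
import re


def terms(text):
    """Lowercased word tokens; a simplified stand-in for a term tokenizer."""
    return set(re.findall(r"\w+", text.lower()))


def allofterms(value, query):
    """True when every query term occurs in this single field value."""
    return terms(query) <= terms(value)


def matches_intended(item, query):
    """All terms in the same field: the label, or any one alias."""
    fields = [item.get("label", "")] + item.get("aliases", [])
    return any(allofterms(field, query) for field in fields)


def matches_per_term_union(item, query):
    """One block per term on label, then the union uid(A, B, C)."""
    label = item.get("label", "")
    return any(allofterms(label, term) for term in query.split())


query = "General Motors Company"
gm = {"label": "General Motors",
      "aliases": ["GM", "General Motors Corporation", "General Motors Company, LLC", "GMC"]}
ge = {"label": "General Electric"}  # hypothetical item, not from the thread

print(matches_intended(gm, query), matches_per_term_union(gm, query))
print(matches_intended(ge, query), matches_per_term_union(ge, query))
```

The per-term variant accepts any item whose label contains at least one of the terms, so it returns a much larger set than the one I am after; with eq instead of allofterms it would only match labels that are exactly one of the three words.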

@pepi, can you please share your Docker file? I suspect some config changes could help. Please share your schema too.

Hello Aman,
thanks for following up. I don’t have permission to upload content here, so you can download my schema at https://esusa.s3.amazonaws.com/public/dgraph_schema.txt
and my docker-compose at https://esusa.s3.amazonaws.com/public/dgraph_docker-compose.yml
Thanks, have a great day
pepi

@aman-bansal were you able to download the schema and compose files? As written before, I can provide you with the entire database if that helps.
Thanks,
pepi

@ahsan, can you help @pepi here?

Hey @pepi, can you please try the upcoming Dgraph release: https://github.com/dgraph-io/dgraph/tree/release/v21.09

Dgraph 21.09 is expected to be released to the public by the end of this month, but you can already give it a try to see how it performs. It has a bunch of optimizations.

I’d also recommend enabling the posting cache by setting the following flag on alpha:
dgraph alpha --cache "size-mb=20000; percentage=50,30,20;"
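
As a rough illustration of what these numbers mean, the Python sketch below splits size-mb across the three percentages. It assumes the percentages are ordered posting list cache, store block cache, store index cache, as I recall from the alpha help text for the --cache flag; please verify against dgraph alpha --help for your version. The function name and the dictionary keys are illustrative only.

```python
def split_cache(size_mb, percentages):
    """Split a cache budget (MB) across three caches.

    Assumed order: posting list cache, store block cache, store index cache.
    The key names are illustrative, not Dgraph identifiers.
    """
    names = ("posting_list_cache", "store_block_cache", "store_index_cache")
    if len(percentages) != len(names) or sum(percentages) != 100:
        raise ValueError("expected three percentages summing to 100")
    return {name: size_mb * pct // 100 for name, pct in zip(names, percentages)}


per_alpha = split_cache(20000, (50, 30, 20))
print(per_alpha)

# Six alphas share one 64GB machine in the setup described in this thread.
alphas = 6
total_cap_mb = 20000 * alphas
print(total_cap_mb)
```

With six alphas on a single 64GB machine, the same value on every alpha would add up to a cap larger than physical RAM, so size-mb probably needs to be scaled down per alpha in that setup.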

Hi Ahsan, will do and let you know. Thanks!
pepi

@ahsan @aman-bansal

With this new release things are WAY better (total server latency is down from about 101.8 s to about 1.0 s):

{"data":{"WD":[{"name":"Q29570","label":"Chevrolet","aliases":["Chevy","Chevrolet Division of General Motors Company y Nahuelito Gimenez"],"description":"American automobile division of GM","P31":[{"QID":"Q334453"},{"QID":"Q10429667"},{"QID":"Q786820"}]},{"name":"Q81965","label":"General Motors","aliases":["GM","General Motors Corporation","General Motors Company, LLC","GMC"],"description":"automotive manufacturing corporation based in Detroit, Michigan, USA","P31":[{"QID":"Q891723"},{"QID":"Q21980538"},{"QID":"Q786820"}]},{"name":"Q67604297","label":"Pilot Life Insurance Company, Petitioner v. Everate W. Dedeaux [Case 85-1043] / Metropolitan Life Insurance Company, Petitioner v. Arthur Taylor [Case 85-686] / General Motors Corporation, Petitioner v. Arthur Taylor. [Case 85-688 (NAID 118974763)","description":"item in the National Archives and Records Administration's holdings","P31":[{"QID":"Q18593264"}]},{"name":"Q19035456","label":"Pick Manufacturing Company v. General Motors Corporation","P31":[{"QID":"Q2334719"}]},{"name":"Q59356336","label":"224 West 57th Street","aliases":["Demarest and Peerless Company Building","Demarest \u0026 Peerless Building","General Motors Building","A. T. Demarest \u0026 Company Building","Argonaut Building","Peerless Motor Car Company Building"],"description":"Commercial building in Manhattan, New York","P31":[{"QID":"Q1021645"}]},{"name":"Q26349054","label":"HHE determination report no. HHE-73-73-143, Inland Manufacturing Company, General Motors Corporation, Dayton, Ohio","description":"field study","P31":[{"QID":"Q1402850"}]},{"name":"Q67604509","label":"Eastman Kodak Company, Petitioner v. Image Technical Services, Inc., et al. [Case 90-1029] / General motors Corporation, et al., Petitioners v. Evert Romein, et al. [Case 90-1390] / United States, Petitioner v. R. L. C. 
[Case 90-1 (NAID 118974953)","description":"item in the National Archives and Records Administration's holdings","P31":[{"QID":"Q18593264"}]}]},"extensions":{"server_latency":{"parsing_ns":125970,"processing_ns":1005928230,"encoding_ns":478410,"assign_timestamp_ns":976507,"total_ns":1007762179},"txn":{"start_ts":1010019},"metrics":{"num_uids":{"":0,"P279":7,"P31":7,"QID":16,"_total":58,"aliases":7,"description":7,"label":7,"name":7}}}}

0.00user 0.00system 0:01.04elapsed 1%CPU (0avgtext+0avgdata 11988maxresident)k
2024inputs+0outputs (24major+627minor)pagefaults 0swaps

Version information:

Dgraph version   : v21.09.0-rc2
Dgraph codename  : unnamed
Dgraph SHA-256   : e996a2843330e4055d9a738b449507ee5b99107eb21ee5e18b20a5680105aa0e
Commit SHA-1     : a515d0d
Commit timestamp : 2021-09-02 22:28:31 +0530
Branch           : release/v21.09
Go version       : go1.16.3
jemalloc enabled : true

This was achieved running on bare metal and without the advanced command-line options suggested by @ahsan. I will try those next, then repeat the test inside Docker, and let you know.
Good job guys! Thanks,
pepi

@pepi We’re glad that you were able to get good performance out of Dgraph v21.09. You’d see even better performance with the posting cache enabled. :slight_smile:

Hi @pepi, how is your progress with the Wikidata dump? If possible, can you please share your final configuration and preprocessing code? Thanks

Hi @gaurav02 ,
I made good progress; in fact, I was able to import all the Wikidata concepts with a limited set of properties (so far); it’s 100M vertices and 1.4B edges.
I can share my configuration and the code, just give me a few days for gathering and cleaning up; I’ll follow up in this thread. Thanks,
pepi

hi @pepi sure sounds good. thanks

With the Spark connector you would be able to get PageRanks out of all of them too :slight_smile:

Hello,

I found some time to create a public repo with my scripts, available at https://github.com/pepistrafforello/awr_kb.git
I still need to add details on the server configuration (development and deployment), but you can start checking it out and give me feedback in case I missed anything (unfortunately I don’t have time to test).
Thanks, have a great day
pepi

I just added a section with HW/SW information, which should suffice; please let me know if you need anything else.
Thanks,
pepi

hi @pepi thanks, this is great… one minor thing, is awr_kb/DGraph/docker missing in your repository, or is “DGraph” needed from elsewhere? Thanks

Hello @gaurav02 ,
sorry I could not get back to you sooner. Eventually, in my experiments I took Docker out of the picture in order to get a clearer view. This is how I am currently running alpha, single process, single shard:

dgraph alpha --zero=localhost:5080 --security whitelist=0.0.0.0/0 --cache "size-mb=4000; percentage=50,30,20;" --cwd /data/20210729.v21.09.0-rc2_single/alpha1 --port_offset=0 --logtostderr --expose_trace --profile_mode block --block_rate 10 -v=2 --telemetry sentry=false

Please let me know if you need further information; unfortunately, my time is very limited, but I’ll try to respond.
Thanks,
pepi