Live loader started at 8,000 RDF/s and slowly decreases

Hey guys – I'm currently trying v1.0.7 on GKE with an SSD datadir. I'm loading a 5 GB (gzipped; 50 GB uncompressed) RDF import using the live loader, not the bulk loader.

It started at 8,000 RDFs/s and now, almost two hours later, it's down to under 3,000 per second:

```
[...]
Total Txns done:    18036 RDFs per second:    2853 Time Elapsed: 1h45m22s, Aborts: 0
Total Txns done:    18041 RDFs per second:    2853 Time Elapsed: 1h45m24s, Aborts: 0
Total Txns done:    18045 RDFs per second:    2853 Time Elapsed: 1h45m26s, Aborts: 0
Total Txns done:    18049 RDFs per second:    2852 Time Elapsed: 1h45m28s, Aborts: 0
Total Txns done:    18050 RDFs per second:    2852 Time Elapsed: 1h45m30s, Aborts: 0
Total Txns done:    18055 RDFs per second:    2851 Time Elapsed: 1h45m32s, Aborts: 0
Total Txns done:    18057 RDFs per second:    2851 Time Elapsed: 1h45m34s, Aborts: 0
Total Txns done:    18061 RDFs per second:    2851 Time Elapsed: 1h45m36s, Aborts: 0
Total Txns done:    18064 RDFs per second:    2850 Time Elapsed: 1h45m38s, Aborts: 0
Total Txns done:    18065 RDFs per second:    2849 Time Elapsed: 1h45m40s, Aborts: 0
Total Txns done:    18068 RDFs per second:    2849 Time Elapsed: 1h45m42s, Aborts: 0
Total Txns done:    18070 RDFs per second:    2848 Time Elapsed: 1h45m44s, Aborts: 0
Total Txns done:    18070 RDFs per second:    2847 Time Elapsed: 1h45m46s, Aborts: 0
Total Txns done:    18070 RDFs per second:    2847 Time Elapsed: 1h45m48s, Aborts: 0
Total Txns done:    18070 RDFs per second:    2846 Time Elapsed: 1h45m50s, Aborts: 0
Total Txns done:    18071 RDFs per second:    2845 Time Elapsed: 1h45m52s, Aborts: 0
Total Txns done:    18073 RDFs per second:    2844 Time Elapsed: 1h45m54s, Aborts: 0
Total Txns done:    18074 RDFs per second:    2844 Time Elapsed: 1h45m56s, Aborts: 0
Total Txns done:    18076 RDFs per second:    2843 Time Elapsed: 1h45m58s, Aborts: 0
Total Txns done:    18079 RDFs per second:    2843 Time Elapsed: 1h46m0s, Aborts: 0
Total Txns done:    18081 RDFs per second:    2842 Time Elapsed: 1h46m2s, Aborts: 0
Total Txns done:    18083 RDFs per second:    2841 Time Elapsed: 1h46m4s, Aborts: 0
Total Txns done:    18085 RDFs per second:    2841 Time Elapsed: 1h46m6s, Aborts: 0
Total Txns done:    18088 RDFs per second:    2840 Time Elapsed: 1h46m8s, Aborts: 0
Total Txns done:    18091 RDFs per second:    2840 Time Elapsed: 1h46m10s, Aborts: 0
Total Txns done:    18093 RDFs per second:    2839 Time Elapsed: 1h46m12s, Aborts: 0
Total Txns done:    18095 RDFs per second:    2839 Time Elapsed: 1h46m14s, Aborts: 0
```

The machine has 2 vCPUs and 13 GB RAM – just wondering if this is normal behavior.

Thanks

The bulk loader is significantly faster than the live loader. For a 5 GB gzipped data set, that slowdown is the expected behavior with live loading.

We’re continuing to work on improvements to live loading performance. For a data set as large as yours, you’d be better off using the bulk loader.
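For reference, a bulk load invocation looks roughly like this. This is a sketch – the file/schema flag names are assumptions from the v1.0.x CLI and the schema filename is made up, so check `dgraph bulk --help` for your version:

```sh
# A sketch of a bulk load run (assumed flags; verify with `dgraph bulk --help`):
#   -r      gzipped RDF input, read directly without unzipping
#   -s      schema file declaring predicates and indices
#   --http  the loader's progress/debug endpoint (see the port note below)
#   -z      address of the running zero instance
dgraph bulk -r data.rdf.gz -s data.schema --http localhost:8000 -z localhost:5080
```

The bulk loader runs before the cluster serves data, so it can build posting lists and indices offline instead of transactionally, which is where the speedup comes from.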

I’d imagine your data set includes a lot of indices, which require more and more processing as the live load keeps updating all of them while data comes in.
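For example, with a schema like the following (hypothetical predicates, just to illustrate), every `@index` directive is extra work the live loader has to do per incoming triple, and those index structures grow as the load progresses:

```
name: string @index(term, exact) .
email: string @index(exact) .
created_at: dateTime @index(hour) .
```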

I'm using it now, but I'm facing a new problem with bulk loading: Dgraph bulk loading in a Kubernetes setup (GKE) - bind: address already in use · Issue #2534 · dgraph-io/dgraph · GitHub

I’ve replied to your issue. Use a different port that’s not already bound within the pod for the `--http` endpoint and you should be good to go.
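Concretely, something like this – the port number is just an example, and the other flags are the same assumed v1.0.x ones as above; pick any port that's free inside the pod:

```sh
# The --http default can collide with ports the other containers in the
# same pod already occupy; point it at any free port instead.
dgraph bulk -r data.rdf.gz -s data.schema --http localhost:8008 -z localhost:5080
```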


We have decreased the default number of concurrent requests to avoid some problems, but you can increase it to test. Just use the `-c` flag: `dgraph live -c 12`. It used to default to 100, so you could try bigger numbers if you prefer.
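So for your load, a sketch along these lines – `-c` is confirmed above, while `-r` for the input file is an assumption from the v1.0.x CLI, so check `dgraph live --help`:

```sh
# Raise concurrency back toward the old default of 100 and watch
# whether the RDFs/s figure holds up over time.
dgraph live -r data.rdf.gz -c 50
```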