[WIP] Bulk and Live load testing

Note: This is a work in progress. I will update this page frequently as tests are completed.

This page captures the longevity tests performed with the Dgraph bulk and live loader tools.

Environment

I am using a GCP machine with the following specs:

  1. Arch:

Linux paras-1 5.3.0-1026-gcp #28~18.04.1-Ubuntu SMP Sat Jun 6 00:09:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

The machine is outfitted with 1 TB of disk space, 64 GB of memory, and 16 cores.

  2. ulimit -n 1048576

I changed the max open files ulimit to 1048576 so that the bulk and live loaders do not run into the "too many open files" error.
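The limit can be raised for the current shell before launching the loaders. A minimal sketch (the value matches the one above; making it permanent via /etc/security/limits.conf is left out):

```shell
# Raise the max open files soft limit for this shell session only.
# May fail if the hard limit is lower and you lack privileges.
ulimit -n 1048576

# Verify the limit now in effect.
ulimit -n
```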

  3. Data sets

a. 1.344B RDFs: the 21-million-RDF data set concatenated 64 times. It is 56 GB.

b. 2.688B RDFs: two copies of (a) concatenated. It is 112 GB.
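One way to build such a data set is to append the base file to itself repeatedly; concatenated gzip streams are still a valid gzip file, so no decompression is needed. A sketch, where the file names and the assumption that the 21M set is gzipped are mine:

```shell
SRC=21million.rdf.gz   # assumed name of the 21M-RDF base file
OUT=1.344B.rdf.gz      # output: 64 copies = 1.344B RDFs
COPIES=64

# Guard so the loop only runs when the base file is present.
if [ -f "$SRC" ]; then
  : > "$OUT"                       # start from an empty output file
  for i in $(seq 1 "$COPIES"); do
    cat "$SRC" >> "$OUT"           # append one more copy
  done
fi
```

Running the same loop again over the 56 GB output (with COPIES=2) yields the 112 GB, 2.688B-RDF set.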

  4. Cluster

1 Zero and 1 Alpha for the live loader tests

1 Zero for the bulk loader tests
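The clusters can be brought up roughly as follows. This is a sketch against the v20.x CLI with default ports; exact flags differ between versions, so treat the values as assumptions:

```shell
# Zero: needed for both the bulk and the live runs.
dgraph zero --my=localhost:5080 &

# Alpha: only needed for the live loader runs.
# v20.03 also requires an --lru_mb value (e.g. --lru_mb=4096).
dgraph alpha --my=localhost:7080 --zero=localhost:5080 &
```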

Artifacts

For the bulk tests, I collect CPU, heap, and block profiles of the bulk process, along with the zero and bulk logs. top output is captured as well.
For the live tests, I collect CPU and heap profiles of the zero and alpha, along with the zero, alpha, and live logs. top output is captured as well.
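Dgraph exposes Go's pprof endpoints on its HTTP ports, so the profiles can be pulled with curl while a run is in progress. A sketch assuming the default HTTP ports (zero on 6080, alpha on 8080); output file names are mine:

```shell
# 30-second CPU profile from the alpha (pprof's default duration).
curl -o alpha.cpu.pprof   "localhost:8080/debug/pprof/profile"

# Heap and block profiles from the alpha.
curl -o alpha.heap.pprof  "localhost:8080/debug/pprof/heap"
curl -o alpha.block.pprof "localhost:8080/debug/pprof/block"

# CPU and heap profiles from the zero on its HTTP port.
curl -o zero.cpu.pprof    "localhost:6080/debug/pprof/profile"
curl -o zero.heap.pprof   "localhost:6080/debug/pprof/heap"

# One batch-mode snapshot of top for the artifact set.
top -b -n 1 > top.txt
```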

Test Results

Dgraph version v20.07.0-beta.Jun22

  1. Bulk Loader, 672M RDFs = PASS

  2. Bulk Loader, 1.344B RDFs = FAIL (REDUCE phase blocked, hogging memory)

  3. Bulk Loader, 2.688B RDFs = FAIL (REDUCE phase blocked, hogging memory)

Dgraph version v20.03.3

  1. Bulk Loader, 672M RDFs = PASS

  2. Bulk Loader, 1.344B RDFs = FAIL (REDUCE phase blocked, hogging memory)
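For reference, the loader runs above can be invoked roughly like this. This is a sketch, not the exact commands used: file names, shard counts, and ports are assumptions, and flags vary between the tested versions:

```shell
# Bulk load: reads RDFs and schema, talks to the zero on its gRPC port.
dgraph bulk -f 1.344B.rdf.gz -s data.schema \
  --map_shards=4 --reduce_shards=1 --zero=localhost:5080

# Live load: streams mutations into a running alpha.
dgraph live -f 1.344B.rdf.gz -s data.schema \
  --alpha localhost:9080 --zero localhost:5080
```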

Notes

  1. Results may vary depending on the data set, the amount of disk space, and the available memory.

  2. Always bump ulimit -n to a high value before running the loaders.