I am trying to use Dgraph to build a BGP hijack detection module. For this project I need to read thousands of packets a minute, and for each packet I do one mutate and then a couple of queries. My program isn't fast enough to keep up with the network packets in real time, and the main bottleneck is that the mutate is too slow.
Each query takes about 0.0026 seconds, while each mutate takes about 0.037 seconds. I am only setting one edge in each mutate, so not much data changes. Is this expected behavior, and is there any way I can make the mutations faster? Currently I am using the official Dgraph Python client (pydgraph) for my code.
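Roughly, my per-packet flow looks like the sketch below (simplified: the address, predicate names, and query are placeholders rather than my real schema):

import json
import pydgraph

# Placeholder connection; my real code points at the actual Alpha address.
stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(stub)

def handle_packet(prefix_uid, origin_asn):
    # One mutation per packet: set a single edge for this BGP update.
    # <announced_by> is a placeholder predicate; prefix_uid is an existing
    # node uid such as '0x1'.
    txn = client.txn()
    try:
        txn.mutate(set_nquads='<{}> <announced_by> "{}" .'.format(prefix_uid, origin_asn),
                   commit_now=True)
    finally:
        txn.discard()

    # Then a couple of read-only queries against the new state
    # (only one is shown here).
    query = '{ prefix(func: uid(%s)) { announced_by } }' % prefix_uid
    resp = client.txn(read_only=True).query(query)
    return json.loads(resp.json)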
Batching would help you here. Instead of one mutate per edge, you can send a single mutate for a batch of edges, say 1,000 at a time. Batching edges per mutation is how Dgraph Live Loader achieves its high throughput when loading data into Dgraph.
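As a rough sketch (assuming pydgraph, since you mentioned the Python client; the batch size, predicate format, and helper names are only illustrative), you could buffer edges and flush them in one mutation:

import pydgraph

BATCH_SIZE = 1000  # illustrative; tune to your workload

stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(stub)
pending = []

def add_edge(subject_uid, predicate, value):
    # Buffer the edge as an N-quad instead of sending it immediately.
    pending.append('<{}> <{}> "{}" .'.format(subject_uid, predicate, value))
    if len(pending) >= BATCH_SIZE:
        flush()

def flush():
    if not pending:
        return
    txn = client.txn()
    try:
        # One round trip (and one proposal) for the whole batch of edges.
        txn.mutate(set_nquads='\n'.join(pending), commit_now=True)
    finally:
        txn.discard()
    del pending[:]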
What are the machine specs of your Dgraph installation? That could have an impact on the latency and throughput you're seeing.
What's the order of magnitude of the number of packets per second? A few thousand? Tens of thousands? By default, mutations go through Raft consensus to maintain strict consistency for the entire cluster. If you don't need that consistency for writes, we can look into adding an option to disable Raft, which should help you here.
In this case I cannot do batching, because after every update I need to query the current state. If I checked for hijacks only after a batched mutate, the graph would look different than if I checked after each individual mutate. Though I did just think of another idea: first query the current state of the edge and only mutate if necessary, since most network packets are just refreshing the current state.
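Roughly, that check-before-write would look something like this (simplified, with placeholder predicate and function names):

import json

def refresh_edge(client, prefix_uid, origin_asn):
    # Read the current value of the (placeholder) <announced_by> edge first.
    query = '{ prefix(func: uid(%s)) { announced_by } }' % prefix_uid
    resp = client.txn(read_only=True).query(query)
    current = json.loads(resp.json).get('prefix', [])

    # Most packets just re-announce the existing state, so skip the mutate.
    if current and current[0].get('announced_by') == str(origin_asn):
        return False

    txn = client.txn()
    try:
        txn.mutate(set_nquads='<{}> <announced_by> "{}" .'.format(prefix_uid, origin_asn),
                   commit_now=True)
    finally:
        txn.discard()
    return True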
The server running Dgraph has a 32-core Intel CPU running at 3.20 GHz, so I don't think that's the problem. I gave the Dgraph Alpha 2048 MB, but since I am only doing small transactions, memory shouldn't be an issue. Is CPU the only piece of hardware that could affect performance?
In the case I am testing now, I am getting roughly 3,500 packets a second from one collector, and there are about 50 collectors in total, though network usage can be drastically different from one day to the next. I'll take a look at the option to disable Raft.
I have a few more questions about your cluster setup:
What version of Dgraph are you running? You can run dgraph version to see the version info.
How many Alphas and Zeros are there? And what's the setting for the --num_replicas flag?
CPU, total physical memory, and disk can all affect performance. We typically recommend many CPU cores and fast SSD disks with high IOPS. In your case, 32 cores sounds good. Dgraph is highly concurrent, so the more cores there are, the faster requests are processed.
We haven't exposed this option yet. It's still a work in progress.
Dgraph version : v1.1.0
Dgraph SHA-256 : 7d4294a80f74692695467e2cf17f74648c18087ed7057d798f40e1d3a31d2095
Commit SHA-1 : ef7cdb28
Commit timestamp : 2019-09-04 00:12:51 -0700
Branch : HEAD
Go version : go1.12.7
I only have one Alpha and one Zero. I didn't set --num_replicas, so it is the default. I didn't think I needed more, since there is not a lot of data (it just changes constantly) and I am the only user of this instance.
It seems like disabling Raft might work out for you. You can try out the branch we have that disables Raft (mrjn/ludicrous) and see if it works for you. Here are the build instructions. You'll need Go 1.11 or later.
$ git clone https://github.com/dgraph-io/dgraph.git
$ cd ./dgraph
$ git checkout mrjn/ludicrous
Branch 'mrjn/ludicrous' set up to track remote branch 'mrjn/ludicrous' from 'origin'.
Switched to a new branch 'mrjn/ludicrous'
$ make install
After running make install, the Dgraph binary will be in $(go env GOPATH)/bin.
I should’ve tried building it from a clean machine first.
This branch was created before we changed dependency management to Go modules, so you'll actually need to build it from within your GOPATH. Looks like you already have the correct commit checked out in your local repo, so building from a checkout under $GOPATH/src/github.com/dgraph-io/dgraph should work.
I got the installation working, and it is a LOT faster, over 70% faster. But there are some minor differences in the results when I run the same command multiple times, so I guess disabling Raft did affect the results. I thought that with only one Dgraph Alpha instance it wouldn't matter, because there is only one group. Or are there still multiple Raft groups per Alpha instance?
I've also seen cases where I do a mutate to add some new data, query it right away and it isn't there, and then it appears on the next query.
Is this an expected risk of enabling ludicrous mode, or is it a bug?