Bulk loader location

K8S version: 1.20.7
K8S Configuration: 1 Zero node, 3 Alpha nodes

Ref: https://dgraph.io/docs/deploy/fast-data-loading/bulk-loader/
The above link describes the bulk and live loaders.

A few newbie questions…

  • Where is this dgraph bulk/Live tool located?
  • From where do we run this tool?
  • Do we run this tool on Zero node(s)?
  • Can we run this tool from non-dgraph cluster node?

It is part of the Dgraph binary.

Anywhere, as long as the versions match.

The Bulk Loader is an offline tool, but it needs a Zero group running to lease UIDs, and all Alphas should be shut down while it runs. The Live Loader, on the other hand, needs the whole cluster running: it also leases UIDs from the Zero group, and it writes the data to the Alpha groups.
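To make the difference concrete, a minimal sketch of a bulk load: only a Zero is up, no Alphas, and the loader talks to the Zero to lease UIDs. File names, ports, and output paths here are illustrative, not from the thread.

```shell
# Start a single Zero (the bulk loader leases UIDs from it).
# No Alphas should be running at this point.
dgraph zero --my=localhost:5080 &

# Run the bulk loader against that Zero.
# data.rdf.gz and schema.txt are placeholder file names.
dgraph bulk -f data.rdf.gz -s schema.txt \
  --zero=localhost:5080 --out=out
```

After it finishes, the generated `out/N/p` directories become the Alphas' posting-list directories.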

Not sure what you mean, but you can run the Dgraph binary anywhere. It just needs to be able to reach the cluster (from outside, if that's the case), and it should be the same version as the cluster.

@MichelDiz thanks for the reply.

  • We are using v21.03.1 docker container on AKS
  • In my Dgraph K8S cluster, below is what I see
  • In which folder does the dgraph bulk/live tool exist?
  • Can the dgraph bulk tool load from Azure Storage account?

Docker or Kubernetes???

Neither. The bulk loader is part of the binary. If for some reason you want to know where the binary is, run `which dgraph` in your terminal and it will print something like /usr/bin/dgraph.

Dgraph Bulk Loader can run on any Linux distro. If the OS has access to your storage, then yes, it can.

This thread has become very confused.

My workflow on k8s for using bulk loader:

  1. bring up 3 zeros
  2. bring up one alpha and have it block in the init container (part of the helm chart)
  3. exec into init container on the single alpha that is up, and run dgraph bulk <flags>
  4. after the bulk loader finishes, bring up remaining alpha node pods (they also stall in init container)
  5. distribute the out/N/p/ directories from the first node to their respective group members and place them in /dgraph/p
  6. touch doneinit in every alpha to unblock the init container and it will start up with all of your data
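The steps above can be sketched with kubectl. Pod names, container names, paths, and shard counts below are assumptions based on a typical Dgraph Helm deployment; adjust them to your chart.

```shell
# Step 3: run the bulk loader inside the one alpha that is up,
# while its init container is blocking startup.
kubectl exec -it dgraph-alpha-0 -c init-alpha -- \
  dgraph bulk -f /dgraph/data.rdf.gz -s /dgraph/schema.txt \
  --zero=dgraph-zero-0.dgraph-zero:5080 \
  --map_shards=3 --reduce_shards=3 --out=/dgraph/out

# Step 5: distribute each out/N/p directory to its group's alpha,
# e.g. copying shard 1 to the second alpha via the local machine.
kubectl cp dgraph-alpha-0:/dgraph/out/1/p /tmp/p
kubectl cp /tmp/p dgraph-alpha-1:/dgraph/p

# Step 6: unblock every alpha's init container.
for i in 0 1 2; do
  kubectl exec dgraph-alpha-$i -c init-alpha -- touch /dgraph/doneinit
done
```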

Wow! this is even more confusing.

As per the official docs, the bulk loader must be run from Zero, and all Alphas should be down.

Does it work for you?

It does not have to be run from the zeros, it has to access the zeros while it’s running (to allocate uids).

You could run the bulk loader from your laptop with a port-forward to the zero leader if you want. The reason I do it in the init container of one alpha is to use the 16c / 64GiB RAM it has. (We have about 30 billion edges in my dgraph cluster now.)
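The laptop-with-port-forward approach could look like this. The pod name and ports are placeholders; 5080 is Zero's default gRPC port.

```shell
# Forward the zero leader's gRPC port to localhost
# (dgraph-zero-0 is a placeholder pod name).
kubectl port-forward pod/dgraph-zero-0 5080:5080 &

# Run the bulk loader locally against the forwarded port.
dgraph bulk -f data.rdf.gz -s schema.txt --zero=localhost:5080
```

The trade-off is that the map/reduce phases then run on your laptop's CPU and RAM instead of the cluster's, which is why a beefy alpha node is usually the better host for large datasets.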

But yes, the alphas should be down; otherwise they will interact with the zeros while the bulk loader is running.

  • Does bulk loader usage involve bootstrapping dgraph with millions/billions of nodes and edges?

Well, that is its job, so yes. It's not magic; it just reads RDF files and writes them to Badger in the same format Dgraph reads from Badger.

Also since you did not get an answer to this:

The bulk loader can read from MinIO, and MinIO can pass through to Azure Blob Storage.
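A sketch of that setup, assuming the MinIO gateway mode for Azure and Dgraph's `minio://` source scheme; the credentials, bucket, and file names are placeholders you would replace with your own:

```shell
# Run a local MinIO gateway in front of Azure Blob Storage.
# The access key is your storage account name; the secret key
# is the storage account key (placeholders below).
export MINIO_ACCESS_KEY=<azure-storage-account-name>
export MINIO_SECRET_KEY=<azure-storage-account-key>
minio gateway azure &

# Point the bulk loader at the gateway with minio:// URIs.
dgraph bulk \
  -f minio://localhost:9000/mybucket/data.rdf.gz \
  -s minio://localhost:9000/mybucket/schema.txt \
  --zero=localhost:5080
```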