I want to make sure our bulk upload process is correct. Below are the steps…
Our cluster config:
Zeros: 3 (48 GB, 1 x 1.98 SSD disk)
Alphas: 9 (56 GB, 2 x 1.98 SSD disks)
Groups: 3
All the servers also have one extra managed file disk mounted (ReadWriteMany) for copying files.
-
Prepare the RDF data files and the schema file (samples below)
-
Bring up the Zeros
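For completeness, roughly how each Zero gets started (hostnames are illustrative; --raft "idx=N" is the v21.03+ syntax, older releases use --idx N). With 9 Alphas and --replicas 3 we end up with the 3 groups listed above:

# Sketch: start Zero 0, which bootstraps the cluster.
# Hostnames below are illustrative placeholders.
dgraph zero \
  --my=dgraph-zero-0.dgraph-zero:5080 \
  --replicas 3 \
  --raft "idx=1"

# Zeros 1 and 2 join it: bump idx and point --peer at Zero 0, e.g.
#   --raft "idx=2" --peer dgraph-zero-0.dgraph-zero:5080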
-
Block the Alphas from starting with an initContainer
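The initContainer just parks each Alpha pod until we signal that the bulk output is in place. A sketch of the wait loop it runs (the marker path /dgraph/doneinit follows the Dgraph Helm chart convention; any agreed-upon path would do):

# Runs inside the Alpha pod's initContainer: block until the marker
# file appears (we create it after copying the bulk-load output in).
until [ -f /dgraph/doneinit ]; do
  echo "waiting for bulk load output..."
  sleep 10
done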
-
Launch the bulk loader from one of the Zeros
dgraph bulk -f /coldstart/upload/pending \
  -s /coldstart/upload/rdf-schema/my_schema.rdf \
  --format=rdf --store_xids --xidmap xid \
  --map_shards=3 --reduce_shards=3 \
  --http localhost:8000 --zero=localhost:5080

Parameters and flags
----------------------------
RDF data files location: /coldstart/upload/pending
Schema file: /coldstart/upload/rdf-schema/my_schema.rdf
--format=rdf
--store_xids (is this required to store the xids?)
--xidmap xid (is this the attribute name to store them under?)
--map_shards=3
--reduce_shards=3
--http localhost:8000
--zero=localhost:5080

Remarks
------------
We will launch the bulk loader multiple times until all the data files are uploaded.
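With --reduce_shards=3 the loader writes out/0, out/1 and out/2, each containing a p directory that has to become the p directory of one Alpha group before the Alphas are unblocked. A sketch of that copy step over the shared ReadWriteMany mount (the shared path, pod names, and the init-alpha container name are illustrative, not fixed):

# Sketch: distribute the reduce shards, one per Alpha group,
# then release the initContainers.
for shard in 0 1 2; do
  # every Alpha in group N gets a copy of out/$shard/p
  cp -r out/$shard/p /coldstart/shared/group-$shard/p
done

# signal the waiting initContainers that the data is in place
for pod in dgraph-alpha-{0..8}; do
  kubectl exec "$pod" -c init-alpha -- touch /dgraph/doneinit
done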
-
Below is a typical type from our schema:
type Student {
  studentId: String! @id
  courses: [Course] @hasInverse(field: student)
  xid: String! @search(by: [hash])
}
-
Below is a sample of our RDF data:
<_:my.org/Student/10101/Course/201/Event/1> <Course.eventId> "1" .
<_:my.org/Student/10101/Course/201/Event/1> <Course.timestamp> "2022-01-01T00:00:02.298240" .
<_:my.org/Student/10101/Course/201/Event/1> <Course.student> <_:my.org/Student/10101> .
<_:my.org/Student/10101> <Student.studentId> "10101" .
<_:my.org/Student/10101> <Student.courses> <_:my.org/Student/10101/Course/201/Event/1> .
<_:my.org/Student/10101/Course/201/Event/1> <Course.codeId> <_:my.org/CourseTcode/201> .
<_:my.org/CourseTcode/201> <CourseTcode.course> <_:my.org/Student/10101/Course/201/Event/1> .
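Once the Alphas come up, we plan a quick spot check over any Alpha's HTTP port (8080 by default). A sketch with curl and DQL — newer releases accept Content-Type: application/dql, older ones use application/graphql+-; eq on Student.studentId works because @id adds a hash index:

# Sketch: confirm one student landed, querying an Alpha over HTTP.
curl -s -H 'Content-Type: application/dql' localhost:8080/query -d '{
  q(func: eq(Student.studentId, "10101")) {
    uid
    Student.studentId
    Student.courses { uid }
  }
}'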