How to import bulk-loaded data into a cluster?


#1

Hi all,

I have 3 nodes, and I used the bulk loader to generate 3 data directories:

dgraph bulk -r test_bulk_load.rdf -s test.schema --map_shards=6 --reduce_shards=3

Then I installed Dgraph on the 3 nodes.
On node 1, I executed:

1. dgraph zero --my 172.31.1.18:5080 --replicas 1 &
2. dgraph alpha --lru_mb 1024 --my 172.31.1.18:7080 --zero 172.31.1.18:5080 -p /home/out/0/p &
3. dgraph-ratel &

On node 2, I executed:

1. dgraph alpha --lru_mb 1024 --my 172.31.2.13:7081 --zero 172.31.1.18:5080 -p /home/out/1/p -o 1 &

On node 3, I executed:

1. dgraph alpha --lru_mb 1024 --my 172.31.9.16:7082 --zero 172.31.1.18:5080 -p /home/out/2/p -o 2 &
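The pattern in the three node commands above (host i serves reduce shard i from /home/out/i/p, with port offset -o i so its internal port is 7080 + i) can be written down as a small generator. This is a hypothetical helper for illustration only, not a Dgraph tool; the function name and defaults are assumptions.

```python
# Hypothetical helper (not part of Dgraph): rebuild the per-node Alpha commands
# shown above from a list of hosts, one bulk loader reduce shard per host.
from typing import List


def build_alpha_commands(hosts: List[str], zero: str, out_dir: str = "/home/out",
                         lru_mb: int = 1024) -> List[str]:
    """Return one `dgraph alpha` command per host; host i serves out_dir/i/p."""
    commands = []
    for i, host in enumerate(hosts):
        # -o adds i to every listening port, so the internal port becomes 7080 + i.
        parts = [
            "dgraph alpha",
            f"--lru_mb {lru_mb}",
            f"--my {host}:{7080 + i}",
            f"--zero {zero}",
            f"-p {out_dir}/{i}/p",
        ]
        if i > 0:
            parts.append(f"-o {i}")  # offset 0 is the default, so it is omitted
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    hosts = ["172.31.1.18", "172.31.2.13", "172.31.9.16"]
    for cmd in build_alpha_commands(hosts, zero="172.31.1.18:5080"):
        print(cmd)
```

Generating the commands from one list keeps the shard index, the out/ directory, and the port offset in lockstep, which is easy to get wrong when editing three commands by hand.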

Running curl http://172.31.1.18:6080/state from any node, I can see the cluster state.

But if I query Dgraph, the results are empty! It seems that the bulk-loaded data was not imported into Dgraph…
Could someone help me with this? Thanks a lot!


#2

In addition:
if I use “curl localhost:8080/mutate” to import some data into the cluster, queries on all 3 nodes return valid results.


(Michel Conrado) #3

What version are you using?
It seems that in your case you need only one shard; set --reduce_shards to 1.

UPDATE:

If you need to, you can use these scripts; just follow the README.


#4

My Dgraph version is v1.0.16.
I don’t understand what a shard means here. I have 3 nodes, so why set --reduce_shards to 1? What is the relation between the number of shards and the number of nodes?
Thank you so much!


(Michel Conrado) #5

You don’t necessarily need to set the number of shards. Dgraph will balance predicates across the groups (Raft groups) after the cluster is up. Set them only if you need this distribution done in advance.

Basically, if you have 4 reduce shards, each shard has to be served by a specific group, and each group needs as many Alphas as the number of replicas.

For example, 4 shards with 3 replicas per group require 12 Alphas.
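The arithmetic above (Alphas = reduce shards × replicas, with every Alpha in a group holding a copy of the same shard) can be sketched as a small planning function. This is an illustrative sketch, not Dgraph code; it assumes Alphas are started one at a time in index order, that Zero fills a group up to `replicas` members before opening the next one, and that shard i ends up as group i + 1.

```python
# Sketch of the shards/replicas rule: reduce_shards * replicas Alphas in total,
# and each consecutive block of `replicas` Alphas shares one out/<shard>/p copy.
# Assumption: Alphas join groups in start order, one group filled at a time.
from typing import List, Tuple


def plan_alphas(reduce_shards: int, replicas: int,
                out_dir: str = "out") -> List[Tuple[int, int, str]]:
    """Return (alpha_index, group_id, p_dir) for each Alpha, in start order."""
    if reduce_shards < 1 or replicas < 1:
        raise ValueError("reduce_shards and replicas must be >= 1")
    plan = []
    for alpha in range(reduce_shards * replicas):
        shard = alpha // replicas  # Alphas 0..replicas-1 share shard 0, and so on
        plan.append((alpha, shard + 1, f"{out_dir}/{shard}/p"))
    return plan


if __name__ == "__main__":
    for row in plan_alphas(reduce_shards=4, replicas=3):
        print(row)
```

With reduce_shards=4 and replicas=3 this yields 12 rows; with the setup from the first post (3 shards, --replicas 1) it yields 3 Alphas, one per group.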

You can’t assign the shard output directories arbitrarily. So for those who are just starting out, I think it’s better to use a single shard and let Dgraph do the rest of the work.

If you need to understand this, check these links:
https://docs.dgraph.io/deploy/#understanding-dgraph-cluster
https://docs.dgraph.io/deploy/#bulk-loader


#6

I’m confused. If I don’t set the number of shards, the default is 1, and the p/ directory generated by the bulk loader is too big. I want to place the shards on different nodes, so I have to set the number of shards…

So just tell me: is there anything wrong with the commands I pasted last week?
If yes, how should I modify them?

THANK YOU.


(Michel Conrado) #7

Okay, I’ve double-checked your commands and tested them myself (with v1.0.16) using the 21million dataset we have. The results are fine: no errors, no missing predicates, and mutations work.

So there is nothing wrong with your commands and config. Setting --replicas to 1 and running 3 Alphas does create 3 groups, so 3 shards is correct.

Your problem is probably in how the binaries are run. If you are running the binaries locally, I suggest copying the binary into each shard folder in the output directory, or moving the p/ (posting list) folder to the path where the binary is.

Note that Dgraph will also create a folder called w/ (the write-ahead log). If you run all Alphas from the same path, their w/ folders can end up at the same path and overwrite each other’s data. In that case, define your command like this:

dgraph alpha --lru_mb 1024 (...) -p /home/out/0/p  -w /home/out/0/w
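Before starting several Alphas on one machine, the -p and -w directories can be checked for collisions. This is a hypothetical pre-flight check, not a Dgraph feature; it assumes the default -w value is the relative path "w", which resolves against each Alpha’s working directory, so a collision only matters for Alphas sharing a machine and working directory.

```python
# Hypothetical pre-flight check (not a Dgraph feature): given the -p and -w
# directories planned for each Alpha on ONE machine, report any directory
# used by more than one Alpha (or used as both -p and -w by the same Alpha).
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_dirs(alphas: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each directory used more than once to the Alphas that use it."""
    users = defaultdict(list)
    for name, (p_dir, w_dir) in alphas.items():
        users[p_dir].append(name)
        users[w_dir].append(name)
    return {d: names for d, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    # Two Alphas relying on the default relative -w directory "w".
    alphas = {
        "alpha0": ("/home/out/0/p", "w"),
        "alpha1": ("/home/out/1/p", "w"),
    }
    print(shared_dirs(alphas))
```

Giving each Alpha its own -w next to its -p, as in the command above, makes the check return an empty dict.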

Also check that this IP is correct.


"groups": {
"1": {
      "members": {1 item},
      "tablets": {
      "_predicate_": {2 items},
      "apple_movietrailer_id": {2 items},
      "art_director.films_art_directed": {2 items},
      "character.portrayed_in_films": {2 items},
      "collections": {2 items},
      "content_rating.minimum_accompanied_age": {2 items},
      "costume_design_by": {2 items},
      "costumer_designer.costume_design_for_film": {2 items},
      "distributors": {2 items},
      "dubbing_performances": {2 items},
      "festival.date_founded": {2 items},
      "language": {2 items},
      "loc": {2 items},
      "music_contributor.film": {2 items},
      "nytimes_id": {2 items},
      "other_companies": {2 items},
      "performance.special_performance_type": {2 items},
      "post_production": {2 items},
      "prequel": {2 items},
      "primary_language": {2 items},
      "set_decoration_by": {2 items},
      "story_contributor.story_credits": {2 items},
      "type.property.reverse_property": {2 items}
  },
  "checksum": "15776183693210571989"
  },
"2": {
      "members": {1 item},
      "tablets": {
      "casting_director": {2 items},
      "cinematographer.film": {2 items},
      "content_rating_system.jurisdiction": {2 items},
      "country": {2 items},
      "crew_gig.crew_role": {2 items},
      "cut.film": {2 items},
      "distribution_medium.films_distributed_in_this_medium": {2 items},
      "email": {2 items},
      "featured_locations": {2 items},
      "featured_song.performed_by": {2 items},
      "festival.individual_festivals": {2 items},
      "festival.sponsoring_organization": {2 items},
      "format": {2 items},
      "location.featured_in_films": {2 items},
      "metacritic_id": {2 items},
      "name": {2 items},
      "performance.character_note": {2 items},
      "performance.film": {2 items},
      "personal_appearance.type_of_appearance": {2 items},
      "producer.films_executive_produced": {2 items},
      "production_company.films": {2 items},
      "rated": {2 items},
      "story_by": {2 items},
      "subjects": {2 items},
      "type.property.schema": {2 items}
  },
  "checksum": "4011099541889073135"
  },
"3": {
      "members": {1 item},
      "tablets": {
      "actor.dubbing_performances": {2 items},
      "actor.film": {2 items},
      "art_direction_by": {2 items},
      "casting_director.films_casting_directed": {2 items},
      "character.portrayed_in_films_dubbed": {2 items},
      "cinematography": {2 items},
      "collection.films_in_collection": {2 items},
      "company.films": {2 items},
      "company_role_or_service.companies_performing_this_role_or_service": {2 items},
      "content_rating.country": {2 items},
      "content_rating.minimum_unaccompanied_age": {2 items},
      "content_rating.rating_system": {2 items},
      "content_rating_system.ratings": {2 items},
      "crew_gig.crewmember": {2 items},
      "crew_gig.film": {2 items},
      "crewmember.films_crewed": {2 items},
      "cut.note": {2 items},
      "cut.release_region": {2 items},
      "cut.runtime": {2 items},
      "cut.type_of_cut": {2 items},
      "dgraph.type": {2 items},
      "director.film": {2 items},
      "distributor.films_distributed": {2 items},
      "edited_by": {2 items},
      "editor.film": {2 items},
      "estimated_budget": {2 items},
      "executive_produced_by": {2 items},
      "fandango_id": {2 items},
      "featured_song": {2 items},
      "featured_song.featured_in_film": {2 items},
      "festival.focus": {2 items},
      "festival.location": {2 items},
      "festival_event.festival": {2 items},
      "festival_event.films": {2 items},
      "festival_focus.festivals_with_this_focus": {2 items},
      "festival_sponsor.festivals_sponsored": {2 items},
      "festivals": {2 items},
      "filming": {2 items},
      "format.format": {2 items},
      "genre": {2 items},
      "gross_revenue": {2 items},
      "http://www.w3.org/2000/01/rdf-schema#domain": {2 items},
      "http://www.w3.org/2000/01/rdf-schema#range": {2 items},
      "http://www.w3.org/2002/07/owl#inverseOf": {2 items},
      "initial_release_date": {2 items},
      "job.films_with_this_crew_job": {2 items},
      "locations": {2 items},
      "music": {2 items},
      "netflix_id": {2 items},
      "other_crew": {2 items},
      "performance.actor": {2 items},
      "performance.character": {2 items},
      "person_or_entity_appearing_in_films": {2 items},
      "personal_appearance.film": {2 items},
      "personal_appearance.person": {2 items},
      "personal_appearance_type.appearances": {2 items},
      "personal_appearances": {2 items},
      "pre_production": {2 items},
      "produced_by": {2 items},
      "producer.film": {2 items},
      "production_companies": {2 items},
      "production_design_by": {2 items},
      "production_designer.films_production_designed": {2 items},
      "rating": {2 items},
      "regional_release_date.release_date": {2 items},
      "regional_release_date.release_region": {2 items},
      "release_date_s": {2 items},
      "rottentomatoes_id": {2 items},
      "runtime": {2 items},
      "sequel": {2 items},
      "series": {2 items},
      "series.films_in_series": {2 items},
      "set_designer.sets_designed": {2 items},
      "song.films": {2 items},
      "song_performer.songs": {2 items},
      "songs": {2 items},
      "soundtrack": {2 items},
      "special_performance_type.performance_type": {2 items},
      "starring": {2 items},
      "subject.films": {2 items},
      "tagline": {2 items},
      "topic_server.schemastaging_corresponding_entities_type": {2 items},
      "topic_server.webref_cluster_members_type": {2 items},
      "traileraddict_id": {2 items},
      "trailers": {2 items},
      "type.property.expected_type": {2 items},
      "writer.film": {2 items},
      "written_by": {2 items}
  },
  "checksum": "15866988743810979310"
  }
}
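A state dump like the one above can also be summarized programmatically, for example to see how many predicates each group serves or which group owns a given predicate. This is a sketch under assumptions: the JSON shape ({"groups": {"<id>": {"tablets": {"<predicate>": {...}}}}}) is taken from the dump above, and the Zero HTTP address defaults to the one used in this thread.

```python
# Sketch for summarizing Zero's /state output. Assumed shape, matching the
# dump above: {"groups": {"<id>": {"tablets": {"<predicate>": {...}}}}}.
import json
from typing import Dict, List
from urllib.request import urlopen


def fetch_state(zero_http: str = "http://172.31.1.18:6080") -> dict:
    """Download Zero's /state JSON (needs network access to Zero)."""
    with urlopen(f"{zero_http}/state") as resp:
        return json.load(resp)


def tablets_by_group(state: dict) -> Dict[str, List[str]]:
    """Return the sorted predicate names served by each group."""
    return {
        gid: sorted(group.get("tablets", {}))
        for gid, group in state.get("groups", {}).items()
    }


def group_of(state: dict, predicate: str) -> str:
    """Return the id of the group serving `predicate`, or "" if none does."""
    for gid, preds in tablets_by_group(state).items():
        if predicate in preds:
            return gid
    return ""


if __name__ == "__main__":
    state = fetch_state()
    for gid, preds in sorted(tablets_by_group(state).items()):
        print(gid, len(preds), "predicates")
    print("name is served by group", group_of(state, "name"))
```

Note that a predicate listed here only shows that Zero assigned it to a group; it does not by itself show that the Alpha serving that group can read the data.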

#8

Thanks a lot for your help :-)

I don’t know whether you tried querying Dgraph or not,
because the command “curl http://172.31.1.18:6080/state” returns normal results, but queries return empty results.
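To narrow down "state looks normal but queries are empty", one option is to ask each Alpha directly how many nodes have a given predicate. This is a diagnostic sketch under assumptions: each Alpha’s HTTP port is 8080 plus its -o offset, this Dgraph version accepts a raw query body on POST /query with the Content-Type application/graphql+-, and a root count(uid) response has the shape {"data": {"q": [{"count": N}]}}. The predicate "name" is just an example taken from the state dump.

```python
# Diagnostic sketch: ask each Alpha how many nodes have a given predicate.
# Assumptions: HTTP port = 8080 + the -o offset; POST /query accepts a raw
# query body with Content-Type application/graphql+-.
import json
from urllib.request import Request, urlopen


def count_query(predicate: str) -> str:
    """Build a query counting the nodes that have `predicate`."""
    return f"{{ q(func: has({predicate})) {{ count(uid) }} }}"


def parse_count(response: dict) -> int:
    """Extract N from a response shaped like {"data": {"q": [{"count": N}]}}."""
    rows = response.get("data", {}).get("q", [])
    return rows[0].get("count", 0) if rows else 0


def query_count(alpha_http: str, predicate: str) -> int:
    """POST the count query to one Alpha (needs network access)."""
    req = Request(
        f"{alpha_http}/query",
        data=count_query(predicate).encode("utf-8"),
        headers={"Content-Type": "application/graphql+-"},
        method="POST",
    )
    with urlopen(req) as resp:
        return parse_count(json.load(resp))


if __name__ == "__main__":
    alphas = ["http://172.31.1.18:8080", "http://172.31.2.13:8081",
              "http://172.31.9.16:8082"]
    for url in alphas:
        print(url, query_count(url, "name"))
```

Since any Alpha forwards a query to the group that owns the predicate, a count of 0 from every Alpha would point at the data itself rather than at one node’s configuration.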