Export data and import into a remote server

Hi there, I’m a newbie to Dgraph.
I want to export Dgraph data from my laptop and then import the data into a remote server.

Here are my steps:

  1. On my laptop, run: `curl localhost:8080/admin/export`
  2. Then copy the 3 exported files (g01.schema.gz, g01.rdf.gz, g01.gql_schema.gz) to the remote server.
  3. On the remote server, delete the out, p, w, and zw directories.
  4. On the remote server, run: `./dgraph bulk -f /pathto/g01.rdf.gz -s /pathto/g01.schema.gz --map_shards=4 --reduce_shards=2 --http localhost:8008 --zero=localhost:5080`
  5. The bulk loader printed the output below:

	"DataFiles": "/pathto/g01.rdf.gz",
	"DataFormat": "",
	"SchemaFile": "/pathto/g01.schema.gz",
	"GqlSchemaFile": "",
	"OutDir": "./out",
	"ReplaceOutDir": false,
	"TmpDir": "tmp",
	"NumGoroutines": 3,
	"MapBufSize": 67108864,
	"SkipMapPhase": false,
	"CleanupTmp": true,
	"NumReducers": 1,
	"Version": false,
	"StoreXids": false,
	"ZeroAddr": "localhost:5080",
	"HttpAddr": "localhost:8008",
	"IgnoreErrors": false,
	"CustomTokenizers": "",
	"NewUids": false,
	"Encrypted": false,
	"MapShards": 4,
	"ReduceShards": 2,
	"BadgerKeyFile": "",
	"BadgerCompressionLevel": 1

The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Current max open files limit: 1024

Connecting to zero at localhost:5080
Predicate "dgraph.type" already exists in schema
Predicate "dgraph.graphql.xid" already exists in schema
Predicate "dgraph.graphql.schema" already exists in schema
Processing file (1 out of 1): /home/zhouj/dgraph_data/dgraph_data/g01.rdf.gz
Shard tmp/map_output/000 -> Reduce tmp/shards/shard_0/000
Shard tmp/map_output/003 -> Reduce tmp/shards/shard_1/003
Shard tmp/map_output/002 -> Reduce tmp/shards/shard_1/002
Shard tmp/map_output/001 -> Reduce tmp/shards/shard_1/001
Num CPUs: 12
[16:04:20+0800] REDUCE 01s 79.26% edge_count:1.789k edge_speed:1.789k/sec plist_count:660.0 plist_speed:660.0/sec. Num Encoding: 0
Num CPUs: 12
[16:04:21+0800] REDUCE 02s 100.00% edge_count:2.257k edge_speed:2.257k/sec plist_count:1.014k plist_speed:1.014k/sec. Num Encoding: 0
[16:04:22+0800] REDUCE 03s 100.00% edge_count:2.257k edge_speed:1.128k/sec plist_count:1.014k plist_speed:506.9/sec. Num Encoding: 0
[16:04:23+0800] REDUCE 04s 100.00% edge_count:2.257k edge_speed:752.1/sec plist_count:1.014k plist_speed:337.9/sec. Num Encoding: 0
[16:04:24+0800] REDUCE 04s 100.00% edge_count:2.257k edge_speed:620.6/sec plist_count:1.014k plist_speed:278.8/sec. Num Encoding: 0
Total: 04s
  6. Finally, queries that work fine against the Dgraph on my laptop return nothing on the remote server.

So, I want to know whether the data was imported successfully into the remote Dgraph. If not, how do I fix this? And if it was, why can’t I get any query results?

I’d appreciate it if anyone could help me figure out this problem. :smile:

Hi @DaToo-J, welcome to the community and thanks for reaching out to us.

From the logs, it looks like the bulk loader completed successfully.

Are you getting an error, or is the query returning an empty response?
The bulk loader should have created an out directory containing two subdirectories, one for each of the two groups (since you passed --reduce_shards=2). Can you show the commands you are running to spin up the Alphas?
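For reference, here is a rough sketch of serving the two bulk-loader shards, one Alpha per group, where out/0/p and out/1/p are the posting directories the bulk loader wrote (flag names as in recent Dgraph releases; double-check with `./dgraph alpha --help` for your version):

```shell
# Sketch only: one Alpha per reduce shard, each pointed at its posting directory.
# -p (--postings) selects the posting dir; -o offsets the ports so two Alphas
# on the same machine don't collide.
./dgraph alpha -p out/0/p --lru_mb 1024 --zero localhost:5080 -o 0
./dgraph alpha -p out/1/p --lru_mb 1024 --zero localhost:5080 -o 100
```

If an Alpha is started with its default `p` directory instead, it never sees the bulk-loaded data, which would show up as empty query results.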

Yes, after querying the remote Dgraph, I get an empty list and no error.
The command that runs Alpha: ./dgraph alpha --lru_mb 1024

After a while, I tried the live loader, which works as I expected.
So could you tell me the differences between bulk and live?
And which is better for development? :blush:

The live loader and the bulk loader are both used for loading data. The major difference between them is speed: the bulk loader is fast, while the live loader is slower. On the other hand, the live loader can be used against a running cluster, whereas the bulk loader can only be run when just the Zero is up, before any Alphas are started.
The bulk loader is recommended for the initial import of large datasets.
You can have a look at the docs for more details about data loading.

I think that clarifies how the two are used differently. Kindly note the directory structure created by the bulk loader and how to serve it. In case of any doubts, feel free to shoot follow-up questions.
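For example, the same export can be pushed into a running cluster with the live loader; with your setup it would look roughly like this (the -a address assumes the Alpha’s default gRPC port, so adjust as needed):

```shell
# Live loader: streams mutations to a running Alpha (-a), assigning UIDs via Zero (-z)
./dgraph live -f /pathto/g01.rdf.gz -s /pathto/g01.schema.gz \
  -a localhost:9080 -z localhost:5080
```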

:ok: Thanks very much for your guidance.

1 Like