Is this normal in the tour step "A bigger dataset"?

I can follow the tour up until the "A bigger dataset" step.

My test machine spec:

  • 2 × Intel® Xeon® CPU E5620 @ 2.40GHz
  • 1 TB 7200 RPM hard disk
  • 32 GB DDR3 memory

Following the first part of the tour, I prepared Docker:
mkdir -p ~/dgraph
docker run -it -p 5080:5080 -p 6080:6080 -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph --name dgraph dgraph/dgraph dgraph zero
docker exec -it dgraph dgraph server --memory_mb 2048 --zero localhost:5080
docker exec -it dgraph dgraph-ratel
(I also tried --memory_mb 16000.)
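
A quick way to confirm both processes are up (a minimal check, assuming the port mappings above and the standard Zero/Server HTTP endpoints) is:

curl localhost:6080/state
curl localhost:8080/health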

Then I altered the schema:
director.film: uid @reverse @count .
genre: uid @reverse .
initial_release_date: dateTime @index(year) .
name: string @index(term) .
starring: uid @count .
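
For reference, the same schema can also be applied over HTTP (a minimal sketch, assuming the 8080 port mapping above):

curl localhost:8080/alter -XPOST -d '
director.film: uid @reverse @count .
genre: uid @reverse .
initial_release_date: dateTime @index(year) .
name: string @index(term) .
starring: uid @count .
'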

After downloading 1million.rdf.gz, I used dgraph live -r to load the RDF:
docker exec -it dgraph dgraph live -r 1million.rdf.gz --zero localhost:5080

I don’t know whether the throughput and abort numbers are normal, but it feels very slow.

Total Txns done:        6 RDFs per second:    2999 Time Elapsed: 2s, Aborts: 1
Total Txns done:       18 RDFs per second:    4499 Time Elapsed: 4s, Aborts: 29
Total Txns done:       31 RDFs per second:    5166 Time Elapsed: 6s, Aborts: 69
Total Txns done:       44 RDFs per second:    5500 Time Elapsed: 8s, Aborts: 98
Total Txns done:       55 RDFs per second:    5500 Time Elapsed: 10s, Aborts: 118
Total Txns done:       64 RDFs per second:    5333 Time Elapsed: 12s, Aborts: 150
Total Txns done:       68 RDFs per second:    4857 Time Elapsed: 14s, Aborts: 220
Total Txns done:       78 RDFs per second:    4875 Time Elapsed: 16s, Aborts: 265
Total Txns done:       80 RDFs per second:    4444 Time Elapsed: 18s, Aborts: 300
Total Txns done:       84 RDFs per second:    4200 Time Elapsed: 20s, Aborts: 323
Total Txns done:       84 RDFs per second:    3818 Time Elapsed: 22s, Aborts: 334
Total Txns done:       85 RDFs per second:    3542 Time Elapsed: 24s, Aborts: 358
Total Txns done:       85 RDFs per second:    3269 Time Elapsed: 26s, Aborts: 386
Total Txns done:       85 RDFs per second:    3036 Time Elapsed: 28s, Aborts: 411
Total Txns done:       86 RDFs per second:    2867 Time Elapsed: 30s, Aborts: 441
Total Txns done:       88 RDFs per second:    2750 Time Elapsed: 32s, Aborts: 464
Total Txns done:       88 RDFs per second:    2588 Time Elapsed: 34s, Aborts: 484
Total Txns done:       88 RDFs per second:    2444 Time Elapsed: 36s, Aborts: 507
Total Txns done:       90 RDFs per second:    2368 Time Elapsed: 38s, Aborts: 561
Total Txns done:       91 RDFs per second:    2275 Time Elapsed: 40s, Aborts: 621
Total Txns done:       91 RDFs per second:    2167 Time Elapsed: 42s, Aborts: 649
Total Txns done:       91 RDFs per second:    2068 Time Elapsed: 44s, Aborts: 679
Total Txns done:       92 RDFs per second:    2000 Time Elapsed: 46s, Aborts: 713
Total Txns done:       93 RDFs per second:    1937 Time Elapsed: 48s, Aborts: 714
Total Txns done:       93 RDFs per second:    1860 Time Elapsed: 50s, Aborts: 718
Total Txns done:       94 RDFs per second:    1808 Time Elapsed: 52s, Aborts: 724
Total Txns done:       94 RDFs per second:    1741 Time Elapsed: 54s, Aborts: 738
Total Txns done:       95 RDFs per second:    1696 Time Elapsed: 56s, Aborts: 762
Total Txns done:       95 RDFs per second:    1638 Time Elapsed: 58s, Aborts: 790
Total Txns done:       95 RDFs per second:    1583 Time Elapsed: 1m0s, Aborts: 802
Total Txns done:       95 RDFs per second:    1532 Time Elapsed: 1m2s, Aborts: 817
Total Txns done:       95 RDFs per second:    1484 Time Elapsed: 1m4s, Aborts: 837
Total Txns done:       96 RDFs per second:    1455 Time Elapsed: 1m6s, Aborts: 859
Total Txns done:       97 RDFs per second:    1426 Time Elapsed: 1m8s, Aborts: 868
Total Txns done:       97 RDFs per second:    1386 Time Elapsed: 1m10s, Aborts: 881
Total Txns done:       97 RDFs per second:    1347 Time Elapsed: 1m12s, Aborts: 899
Total Txns done:       98 RDFs per second:    1324 Time Elapsed: 1m14s, Aborts: 916
Total Txns done:       98 RDFs per second:    1289 Time Elapsed: 1m16s, Aborts: 934
Total Txns done:       99 RDFs per second:    1269 Time Elapsed: 1m18s, Aborts: 955
Total Txns done:      100 RDFs per second:    1250 Time Elapsed: 1m20s, Aborts: 966
Total Txns done:      100 RDFs per second:    1219 Time Elapsed: 1m22s, Aborts: 981
Total Txns done:      101 RDFs per second:    1202 Time Elapsed: 1m24s, Aborts: 1000
Total Txns done:      101 RDFs per second:    1174 Time Elapsed: 1m26s, Aborts: 1021
Total Txns done:      101 RDFs per second:    1148 Time Elapsed: 1m28s, Aborts: 1041
Total Txns done:      102 RDFs per second:    1133 Time Elapsed: 1m30s, Aborts: 1054
Total Txns done:      103 RDFs per second:    1120 Time Elapsed: 1m32s, Aborts: 1066
Total Txns done:      103 RDFs per second:    1096 Time Elapsed: 1m34s, Aborts: 1080
Total Txns done:      103 RDFs per second:    1073 Time Elapsed: 1m36s, Aborts: 1095
Total Txns done:      103 RDFs per second:    1051 Time Elapsed: 1m38s, Aborts: 1119
Total Txns done:      103 RDFs per second:    1030 Time Elapsed: 1m40s, Aborts: 1139
Total Txns done:      103 RDFs per second:    1010 Time Elapsed: 1m42s, Aborts: 1162
Total Txns done:      103 RDFs per second:     990 Time Elapsed: 1m44s, Aborts: 1167
Total Txns done:      104 RDFs per second:     981 Time Elapsed: 1m46s, Aborts: 1179
Total Txns done:      104 RDFs per second:     963 Time Elapsed: 1m48s, Aborts: 1202
Total Txns done:      104 RDFs per second:     945 Time Elapsed: 1m50s, Aborts: 1218
Total Txns done:      104 RDFs per second:     929 Time Elapsed: 1m52s, Aborts: 1236
Total Txns done:      104 RDFs per second:     912 Time Elapsed: 1m54s, Aborts: 1252

The server output showed:

mutation.go:159: Done schema update predicate:"director.film" value_type:UID directive:REVERSE count:true
2018/03/29 13:10:01 mutation.go:159: Done schema update predicate:"genre" value_type:UID directive:REVERSE
2018/03/29 13:10:01 mutation.go:159: Done schema update predicate:"initial_release_date" value_type:DATETIME directive:INDEX tokenizer:"year"
2018/03/29 13:10:01 mutation.go:159: Done schema update predicate:"name" value_type:STRING directive:INDEX tokenizer:"term"
2018/03/29 13:10:01 mutation.go:159: Done schema update predicate:"starring" value_type:UID count:true
2018/03/29 13:10:49 groups.go:316: Asking if I can serve tablet for: dummy
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: name
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: name
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: name
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: _predicate_
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: _predicate_
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: _predicate_
2018/03/29 13:18:30 groups.go:316: Asking if I can serve tablet for: _predicate_
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: director.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: genre
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: initial_release_date
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: starring
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: starring
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: performance.actor
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: performance.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: performance.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: actor.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: performance.character
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: actor.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: actor.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: actor.film
2018/03/29 13:18:31 groups.go:316: Asking if I can serve tablet for: performance.character
2018/03/29 13:18:49 draft.go:669: Writing snapshot at index: 210, applied mark: 533
2018/03/29 13:18:49 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:210 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:20:19 draft.go:669: Writing snapshot at index: 212, applied mark: 534
2018/03/29 13:20:19 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:212 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:20:49 draft.go:669: Writing snapshot at index: 257, applied mark: 595
2018/03/29 13:20:49 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:257 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:21:19 draft.go:669: Writing snapshot at index: 610, applied mark: 1038
2018/03/29 13:21:19 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:610 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:21:49 draft.go:669: Writing snapshot at index: 1032, applied mark: 1804
2018/03/29 13:21:49 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:1032 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:22:19 draft.go:669: Writing snapshot at index: 1632, applied mark: 2019
2018/03/29 13:22:19 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:1632 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:22:49 draft.go:669: Writing snapshot at index: 2010, applied mark: 2346
2018/03/29 13:22:49 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:2010 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:23:19 draft.go:669: Writing snapshot at index: 2368, applied mark: 2600
2018/03/29 13:23:19 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:2368 Term:2 XXX_unrecognized:[]}, len(data): 27
2018/03/29 13:23:49 draft.go:669: Writing snapshot at index: 2390, applied mark: 3890
2018/03/29 13:23:49 wal.go:118: Writing snapshot to WAL, metadata: {ConfState:{Nodes:[1] XXX_unrecognized:[]} Index:2390 Term:2 XXX_unrecognized:[]}, len(data): 27

The RDFs-per-second number in the dgraph live log keeps decreasing. Is that normal?

My Dgraph version is 1.0.4.

Hey Jair,

I had a look at this; there are a couple of extra @count indexes in the tour schema which aren’t needed, and I will update the tour to remove them. The high abort count, though, is caused by conflicts among different batches of mutations.
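
With those removed, the schema would look something like this (a sketch of what the updated tour schema might be; the actual update could differ):

director.film: uid @reverse .
genre: uid @reverse .
initial_release_date: dateTime @index(year) .
name: string @index(term) .
starring: uid .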

Running cat 1million.rdf | awk -F ' ' '{print $1,$2}' | sort | uniq -c | sort -nr | less shows:

    262 <16808417249295569659> <starring>
    240 <5281357350269019391> <starring>
    203 <3429022399455207808> <starring>
    197 <4136539480493405740> <starring>
    193 <15879613395889330696> <starring>
    191 <14766569164132289417> <starring>
    191 <10729850801465314700> <starring>
    182 <409900392357024013> <starring>
    181 <13265852902313601180> <starring>
    179 <2663082822828835201> <starring>
    176 <9822379669694059074> <starring>
    173 <7998027201999106158> <actor.film>
    172 <11491455907756626836> <starring>
    171 <14757387687345491472> <starring>
    169 <18328922841925323129> <starring>
    166 <6543833726304339171> <starring>
    166 <3283659752765638071> <starring>
    166 <16320625340347498605> <starring>
    165 <5661091599378777914> <starring>
    164 <9993417256559289619> <starring>
    ...
    ...

So there are a bunch of RDF triples with the same subject and predicate, which is why the concurrent transactions are aborting. We will have a look at this and see whether we can add a mechanism to ignore data conflicts in some cases. For now, I would suggest running dgraph live with -c set to 1 for the fastest loading:

docker exec -it dgraph dgraph live -r 1million.rdf.gz --zero localhost:5080 -c 1
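
Once the load completes, a quick sanity check (a minimal sketch, assuming the default 8080 HTTP port and the tour's schema) is to count the nodes that have a genre:

curl -s localhost:8080/query -XPOST -d '
{
  films(func: has(genre)) {
    count(uid)
  }
}'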

There is already an issue tracking this suggested improvement: Add IgnoreDataConflict option · Issue #2221 · dgraph-io/dgraph · GitHub.


OK, got it. I will try it again.
