Ro0tk1t
(Rootkit)
April 8, 2021, 8:30am
1
Hello, developers.
I have a problem with batch mutations.
What I want to do
We have a lot of data to insert into Dgraph, and we created several indexed schemas for it. To speed up the inserts, we run multiple scripts that batch-insert into the Dgraph cluster, but each time new data is inserted the index gets refreshed, so we always get Exception: Transaction has been aborted. Please retry.
That means we can only insert one piece of data at a time.
I am confused about how to solve this. With this much data, we can't insert it one record at a time!
Dgraph metadata
dgraph version
Dgraph version : v21.03.0
Dgraph codename : rocket-mod
Dgraph SHA-256 : 4ca26023e812146d88fc3f5b364589a4de2776fa3dce849d2eff103f3fa9ae60
Commit SHA-1 : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch : release/v21.03
Go version : go1.15.9
jemalloc enabled : true
chewxy
(chewxy)
April 8, 2021, 8:40am
2
Can you elaborate how you’re doing the batch mutations? Do you commit after each txn is done?
Ro0tk1t
(Rootkit)
April 8, 2021, 8:52am
3
Yes, I commit after each txn is done.
I use pydgraph to do that, and the pseudo-code looks like:
for data in datas:
    new_person = data.get('name')
    query = '''{person as var(func: eq(name, "%s"))}''' % new_person
    nquad = f'''uid(person) <name> "{new_person}" .'''
    txn = client.txn()
    mutation = txn.create_mutation(set_nquads=nquad)
    req = txn.create_request(query=query, mutations=[mutation], commit_now=True)
    print(txn.do_request(req))
chewxy
(chewxy)
April 8, 2021, 8:52am
4
Ah… pydgraph has yet to be updated for v21.03.
Anurag
(Anurag)
April 8, 2021, 8:57am
5
Hi @Ro0tk1t ,
I’ll release a new version asap. You can update Pydgraph and try again.
Thanks!
Ro0tk1t
(Rootkit)
April 8, 2021, 9:06am
6
@Anurag @chewxy
Hmm… I see the latest pydgraph is still v20.07 on pypi.org. Is my usage of pydgraph correct?
Will it be fixed when I update pydgraph?
Anurag
(Anurag)
April 8, 2021, 9:10am
7
Yes, your usage of pydgraph is correct for Dgraph v20.07 or v20.11. However, as you noticed, Dgraph has been updated to v21.03, which introduced a few breaking changes for the clients. We are updating the clients to support the newer version; once I release the new version, this problem should go away. As a quick check, you can test whether your code works with Dgraph v20.07 right now.
Ro0tk1t
(Rootkit)
April 8, 2021, 9:46am
8
Running the script alone works fine, but as I said, we always get Exception: Transaction has been aborted. Please retry when running multiple scripts.
Anurag
(Anurag)
April 8, 2021, 9:48am
9
Does this happen with a different Dgraph version as well, e.g. Dgraph v20.11?
Ro0tk1t
(Rootkit)
April 9, 2021, 2:07am
11
@Anurag
After I installed the newest pydgraph from the GitHub repo, nothing changed:
Transaction has been aborted. Please retry
I think that is not because of the client, but a limitation on the Dgraph server side.
How can I speed up batch mutations?
Naman
(Naman Jain)
April 13, 2021, 8:58pm
12
Hey @Ro0tk1t , there might be conflicts in the transactions due to which the transactions get aborted.
If you want to load the data in bulk, have you considered live loader or bulk loader for that? They smartly manage this conflict resolution.
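If you do stay on the client path, aborted transactions are expected under concurrent writes that touch the same predicates, and the intended handling is to retry the whole transaction. A minimal, library-agnostic retry sketch (assumption: the abort surfaces as an exception whose message contains "aborted", as the error in this thread does):

```python
import random
import time

def with_retry(run_txn, max_retries=5,
               is_abort=lambda e: "aborted" in str(e).lower()):
    """Run a transactional operation, retrying with jittered exponential
    backoff when the server aborts it due to a write conflict."""
    for attempt in range(max_retries):
        try:
            # run_txn should open a FRESH transaction each attempt;
            # an aborted txn object cannot be reused.
            return run_txn()
        except Exception as exc:
            if not is_abort(exc) or attempt == max_retries - 1:
                raise  # not a conflict abort, or out of retries
            # Back off so competing writers spread out before retrying.
            time.sleep(0.01 * (2 ** attempt) + random.uniform(0, 0.01))
```

With pydgraph you would pass a closure that creates the txn, builds the request, and calls do_request inside it, so each retry starts a new transaction.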
Ro0tk1t
(Rootkit)
April 14, 2021, 3:11am
13
Fine, I tried bulk. There is also a problem there, and no more error detail.
I give up.
Naman
(Naman Jain)
April 14, 2021, 7:17am
14
Hey @Ro0tk1t , sorry for the inconvenience. There was a minor bug that caused the bulk loader to crash without printing the error message.
Basically, you are trying to insert data whose type is defined as scalar (string, int, float, etc.) in the schema, while in the data file it is of type uid.
Example:
Schema: name: string .
Data: _:a <name> _:b .
or _:a <name> uid(0x10) .
This PR fix(bulk): throw the error instead of crashing by NamanJain8 · Pull Request #7722 · dgraph-io/dgraph · GitHub should fix this issue. But you would still need to correct your data. With my change, you will see this crash error message:
2021/04/14 12:43:01 RDF doesn't match schema: Input for predicate "name" of type scalar is uid. Edge: entity:4600001 attr:"\000\000\000\000\000\000\000\000name" value_type:UID value_id:4700001
Do let me know in case of any queries or help needed with data loading.
Thanks for reporting the bug.
Ro0tk1t
(Rootkit)
April 16, 2021, 4:06am
16
@Naman
Thanks. One more problem… after bulk loading into Dgraph, all the edges were missing, though I think my schema and RDF are correct.
the test schema file:
field1: string .
type A {
    field1
}
A: [uid] @reverse .
field2: string .
type B {
    field2
}
the test rdf file:
<_:A_1> <dgraph.type> "A" .
<_:A_1> <field1> "value1" .
<_:A_2> <dgraph.type> "A" .
<_:A_2> <field1> "value2" .
<_:B_1> <dgraph.type> "B" .
<_:B_1> <field2> "aaaaaaaaaaaa" .
<_:B_1> <A> <_:A_1> .
and the bulk command is:
dgraph bulk -s s.schema -f test.rdf --zero localhost:5080
after dgraph cluster up, i run a query in ratel:
{
  a(func: has(field2)){
    expand(_all_)
    {
      ~A{
        expand(_all_)
      }
      A{
        expand(_all_)
      }
    }
  }
}
but I can only see the one B data node; no ~A edge and no A data node.
{
"data": {
"a": [
{
"field2": "aaaaaaaaaaaa"
}
]
},
"extensions": {
"server_latency": {
"parsing_ns": 215458,
"processing_ns": 640911,
"encoding_ns": 25487,
"assign_timestamp_ns": 1072510,
"total_ns": 2323218
},
"txn": {
"start_ts": 233
},
"metrics": {
"num_uids": {
"_total": 1,
"field2": 1,
"~A": 0
}
}
}
}
iluminae
(Kenan Kessler)
April 16, 2021, 4:42am
17
TL;DR: edge A should not be a child of the first expand in your query.
In your example set, has(field2) matches a type B node, for which expand returns field2… But expand finds no uid edges declared in type B, so the rest of your query does nothing.
You have a forward edge from (:B)-[:A]->(:A), so I think you want this:
{
  a(func: has(field2)){ # equivalent to type(B) here
    expand(_all_)       # gives you field2
    A{
      expand(_all_)     # gives you field1
    }
  }
}
If the edge predicate A were in the type definition for type B, expand would follow it for you as well.
kaustubh
(Kaustubh Joshi)
April 16, 2021, 12:31pm
19
A few points →
Each field must be given a type; this can be a simple type (string, int, etc.), a uid, or an array of the simple types.
I'd suggest the following schema for you:
type A {
    field1
}
field1: string .
type B {
    field2
    hasConnectionTo
}
field2: string .
hasConnectionTo: [uid] @reverse .
and accordingly the test RDF would be:
<_:A_1> <dgraph.type> "A" .
<_:A_1> <field1> "value1" .
<_:A_2> <dgraph.type> "A" .
<_:A_2> <field1> "value2" .
<_:B_1> <dgraph.type> "B" .
<_:B_1> <field2> "aaaaaaaaaaaa" .
<_:B_1> <hasConnectionTo> <_:A_1> .
now you can run something as follows:
{
  getField2(func: has(field2)){
    hasConnectionTo{
      expand(_all_)
    }
    ~hasConnectionTo{
      expand(_all_)
    }
  }
}
While this makes sense for now, you might want to separate out type-based selection and filter predicates in the future.
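For instance, selecting by type instead of by has(field2) might look like this (a sketch against the schema above):

```
{
  getB(func: type(B)) {
    field2
    hasConnectionTo {
      field1
    }
  }
}
```

This keeps the root function tied to the node's type, so adding field2 to another type later won't change what the query matches.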
Ro0tk1t
(Rootkit)
April 19, 2021, 2:45am
20
Thanks!
For now the test files above work fine.
For our production files, many data nodes and edges were lost after the bulk load, and the bulk loader usually fails at the REDUCE stage because of OOM (it only succeeded 2 times), so we are trying the -j 1 option to test again. It's a little slow.
Is there some way to start a bulk load from the mapped files in the tmp directory?
And I'm a little confused: if the rdf file looks like:
#<_:A_1> <dgraph.type> "A" .
#<_:A_1> <field1> "value1" .
<_:A_2> <dgraph.type> "A" .
<_:A_2> <field1> "value2" .
<_:B_1> <dgraph.type> "B" .
<_:B_1> <field2> "aaaaaaaaaaaa" .
<_:B_1> <hasConnectionTo> <_:A_1> .
will the node A_1 still exist after the bulk load?
Naman
(Naman Jain)
April 19, 2021, 5:52am
21
Hey @Ro0tk1t , can you elaborate on this, please? It would be helpful to see a sample of the kind of data that was not loaded correctly, if any.
Do you have any memory profiles? I assume you are on v21.03; correct me if I am wrong. Also, a couple of questions:
What are the specifications of the machine you are running the bulk loader on?
What is the data size?
Yes, you can use the --skip_map_phase flag to skip the map phase and --tmp to provide the tmp directory generated earlier.
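For example, resuming the reduce phase from a previous run's map output might look like this (a sketch mirroring the earlier command; the assumption is that ./tmp is the tmp directory the interrupted run produced):

```
dgraph bulk -s s.schema -f test.rdf \
  --skip_map_phase \
  --tmp ./tmp \
  --zero localhost:5080
```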
No, A_1 will not be loaded.
Ro0tk1t
(Rootkit)
April 19, 2021, 6:38am
22
Here is a piece of the log from /var/log/messages about the OOM:
Apr 15 10:46:50 localhost kernel: ls invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Apr 15 10:46:50 localhost kernel: ls cpuset=/ mems_allowed=0
Apr 15 10:46:50 localhost kernel: CPU: 1 PID: 28654 Comm: ls Kdump: loaded Not tainted 3.10.0-1160.21.1.el7.x86_64 #1
Apr 15 10:46:50 localhost kernel: Hardware name: Bochs Bochs, BIOS rel-1.7.5.1-20190822_073655 04/01/2014
Apr 15 10:46:50 localhost kernel: Call Trace:
Apr 15 10:46:50 localhost kernel: [<ffffffff90d8305a>] dump_stack+0x19/0x1b
Apr 15 10:46:50 localhost kernel: [<ffffffff90d7d97a>] dump_header+0x90/0x229
Apr 15 10:46:50 localhost kernel: [<ffffffff9090eb3b>] ? cred_has_capability+0x6b/0x120
Apr 15 10:46:50 localhost kernel: [<ffffffff907c221d>] oom_kill_process+0x2cd/0x490
Apr 15 10:46:50 localhost kernel: [<ffffffff9090ec1e>] ? selinux_capable+0x2e/0x40
Apr 15 10:46:50 localhost kernel: [<ffffffff907c290a>] out_of_memory+0x31a/0x500
Apr 15 10:46:50 localhost kernel: [<ffffffff90d7e497>] __alloc_pages_slowpath+0x5db/0x729
Apr 15 10:46:50 localhost kernel: [<ffffffff907c8e86>] __alloc_pages_nodemask+0x436/0x450
Apr 15 10:46:50 localhost kernel: [<ffffffff90818b58>] alloc_pages_current+0x98/0x110
Apr 15 10:46:50 localhost kernel: [<ffffffff907bdcd7>] __page_cache_alloc+0x97/0xb0
Apr 15 10:46:50 localhost kernel: [<ffffffff907c0c70>] filemap_fault+0x270/0x420
Apr 15 10:46:50 localhost kernel: [<ffffffffc029791e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Apr 15 10:46:50 localhost kernel: [<ffffffffc0297b1c>] xfs_filemap_fault+0x2c/0x30 [xfs]
Apr 15 10:46:50 localhost kernel: [<ffffffff907edf5a>] __do_fault.isra.61+0x8a/0x100
Apr 15 10:46:50 localhost kernel: [<ffffffff907ee50c>] do_read_fault.isra.63+0x4c/0x1b0
Apr 15 10:46:50 localhost kernel: [<ffffffff907f5d50>] handle_mm_fault+0xa20/0xfb0
Apr 15 10:46:50 localhost kernel: [<ffffffff90d90653>] __do_page_fault+0x213/0x500
Apr 15 10:46:50 localhost kernel: [<ffffffff90d90a26>] trace_do_page_fault+0x56/0x150
Apr 15 10:46:50 localhost kernel: [<ffffffff90d8ffa2>] do_async_page_fault+0x22/0xf0
Apr 15 10:46:50 localhost kernel: [<ffffffff90d8c7a8>] async_page_fault+0x28/0x30
Apr 15 10:46:50 localhost kernel: Mem-Info:
Apr 15 10:46:50 localhost kernel: active_anon:7548052 inactive_anon:137382 isolated_anon:0#012 active_file:38 inactive_file:1482 isolated_file:31#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:92639 slab_unreclaimable:14747#012 mapped:10563 shmem:405626 pagetables:47771 bounce:0#012 free:49990 free_pcp:143 free_cma:0
Apr 15 10:46:50 localhost kernel: Node 0 DMA free:15892kB min:32kB low:40kB high:48kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:16kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Apr 15 10:46:50 localhost kernel: lowmem_reserve[]: 0 2829 31991 31991
Apr 15 10:46:50 localhost kernel: Node 0 DMA32 free:122524kB min:5972kB low:7464kB high:8956kB active_anon:2526928kB inactive_anon:45416kB active_file:0kB inactive_file:1760kB unevictable:0kB isolated(anon):0kBisolated(file):0kB present:3129216kB managed:2897760kB mlocked:0kB dirty:0kB writeback:0kB mapped:4980kB shmem:146248kB slab_reclaimable:152112kB slab_unreclaimable:10172kB kernel_stack:816kB pagetables:17280kBunstable:0kB bounce:0kB free_pcp:196kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2018 all_unreclaimable? no
Apr 15 10:46:50 localhost kernel: lowmem_reserve[]: 0 0 29161 29161
Apr 15 10:46:50 localhost kernel: Node 0 Normal free:61544kB min:61572kB low:76964kB high:92356kB active_anon:27665280kB inactive_anon:504112kB active_file:204kB inactive_file:4168kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29864624kB mlocked:0kB dirty:0kB writeback:0kB mapped:37272kB shmem:1476256kB slab_reclaimable:218444kB slab_unreclaimable:48800kB kernel_stack:3904kB pagetables:173804kB unstable:0kB bounce:0kB free_pcp:376kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:4913 all_unreclaimable? no
Apr 15 10:46:50 localhost kernel: lowmem_reserve[]: 0 0 0 0
Apr 15 10:46:50 localhost kernel: Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Apr 15 10:46:50 localhost kernel: Node 0 DMA32: 1095*4kB (UE) 809*8kB (UE) 1696*16kB (UEM) 2060*32kB (UEM) 200*64kB (UEM) 31*128kB (UE) 10*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 123236kB
Apr 15 10:46:50 localhost kernel: Node 0 Normal: 1511*4kB (UE) 1466*8kB (UEM) 1149*16kB (UEM) 469*32kB (UEM) 126*64kB (UEM) 27*128kB (UEM) 2*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 63196kB
Apr 15 10:46:50 localhost kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 15 10:46:50 localhost kernel: 407831 total pagecache pages
Apr 15 10:46:50 localhost kernel: 0 pages in swap cache
Apr 15 10:46:50 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
Apr 15 10:46:50 localhost kernel: Free swap = 0kB
Apr 15 10:46:50 localhost kernel: Total swap = 0kB
Apr 15 10:46:50 localhost kernel: 8388478 pages RAM
Apr 15 10:46:50 localhost kernel: 0 pages HighMem/MovableOnly
Apr 15 10:46:50 localhost kernel: 193905 pages reserved
Apr 15 10:46:50 localhost kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Apr 15 10:46:50 localhost kernel: [ 471] 0 471 31363 12826 67 0 0 systemd-journal
Apr 15 10:46:50 localhost kernel: [ 499] 0 499 12158 559 26 0 -1000 systemd-udevd
Apr 15 10:46:50 localhost kernel: [ 638] 0 638 13883 112 26 0 -1000 auditd
Apr 15 10:46:50 localhost kernel: [ 665] 999 665 153256 2291 61 0 0 polkitd
Apr 15 10:46:50 localhost kernel: [ 668] 81 668 16571 152 33 0 -900 dbus-daemon
Apr 15 10:46:50 localhost kernel: [ 674] 998 674 30147 122 29 0 0 chronyd
Apr 15 10:46:50 localhost kernel: [ 750] 0 750 6596 75 19 0 0 systemd-logind
Apr 15 10:46:50 localhost kernel: [ 793] 0 793 31598 160 17 0 0 crond
Apr 15 10:46:50 localhost kernel: [ 908] 0 908 89710 5621 97 0 0 firewalld
Apr 15 10:46:50 localhost kernel: [ 2765] 0 2765 143572 2831 97 0 0 tuned
Apr 15 10:46:50 localhost kernel: [ 2770] 0 2770 173343 8717 182 0 0 rsyslogd
Apr 15 10:46:50 localhost kernel: [ 2827] 0 2827 27552 34 10 0 0 agetty
Apr 15 10:46:50 localhost kernel: [ 2862] 0 2862 6117 115 16 0 -1000 sshd
Apr 15 10:46:50 localhost kernel: [ 3133] 0 3133 22436 260 42 0 0 master
Apr 15 10:46:50 localhost kernel: [ 3136] 89 3136 22479 256 45 0 0 qmgr
Apr 15 10:46:50 localhost kernel: [29274] 0 29274 6967 283 19 0 0 sshd
Apr 15 10:46:50 localhost kernel: [24872] 0 24872 5988 95 16 0 0 sftp-server
Apr 15 10:46:50 localhost kernel: [13398] 0 13398 6845 130 18 0 0 sshd
Apr 15 10:46:50 localhost kernel: [16576] 0 16576 28887 104 13 0 0 bash
Apr 15 10:46:50 localhost kernel: [12019] 0 12019 6738 1290 18 0 0 tmux
Apr 15 10:46:50 localhost kernel: [12020] 0 12020 28887 116 13 0 0 bash
Apr 15 10:46:50 localhost kernel: [31979] 0 31979 40558 208 36 0 0 top
Apr 15 10:46:50 localhost kernel: [15285] 0 15285 72341147 7258379 46500 0 0 dgraph
Apr 15 10:46:50 localhost kernel: [14476] 0 14476 28887 100 13 0 0 bash
Apr 15 10:46:50 localhost kernel: [14980] 0 14980 5011 69 15 0 0 tmux
Apr 15 10:46:50 localhost kernel: [ 3844] 89 3844 22462 252 44 0 0 pickup
Apr 15 10:46:50 localhost kernel: [25393] 0 25393 27014 19 10 0 0 sleep
Apr 15 10:46:50 localhost kernel: [27707] 0 27707 27014 19 9 0 0 sleep
Apr 15 10:46:50 localhost kernel: [27710] 0 27710 27014 18 10 0 0 sleep
Apr 15 10:46:50 localhost kernel: [28642] 0 28642 27014 24 10 0 0 sleep
Apr 15 10:46:50 localhost kernel: [28645] 0 28645 27014 24 10 0 0 sleep
Apr 15 10:46:50 localhost kernel: [28650] 0 28650 27014 23 9 0 0 sleep
Apr 15 10:46:50 localhost kernel: [28653] 0 28653 4853 37 14 0 0 ls
Apr 15 10:46:50 localhost kernel: [28654] 0 28654 4853 38 13 0 0 ls
Apr 15 10:46:50 localhost kernel: [28655] 0 28655 27014 22 9 0 0 sleep
Apr 15 10:46:50 localhost kernel: Out of memory: Kill process 15285 (dgraph) score 864 or sacrifice child
Apr 15 10:46:50 localhost kernel: Killed process 15285 (dgraph), UID 0, total-vm:289364588kB, anon-rss:29033516kB, file-rss:0kB, shmem-rss:0kB
Yes, I use v21.03.
CentOS 7.6 / 20 cores / 48 G memory / 2 T SSD storage.
The RDF dataset is approximately 700 G.