Can not query dataset with 222M Edge in a bare metal server with 65GB Ram(Memory overflow)


(Taha Sevim) #1

Hi,

I have a single node that is bare metal with 64GB Ram.
I have inserted a dataset that consists of 650K node and 222M using bulk loader with below bash commands:

Initialization of a temporary zero:

`docker run --rm --name bulker -d -p 5080:5080 -p 6080:6080 -v ~/dgraph:/dgraph dgraph/dgraph  dgraph zero --my=zero:5080 `

Insertion of a single RDF file ($1) with its schema ($2):

`docker exec bulker dgraph bulk -r $1 -s $2 --map_shards=1 --reduce_shards=1`

My inserted RDF file consists of 222M triplet and schema of my dataset is illustrated below:
41%201

There was not any problem in insertion of dataset.
However, the desired query that I want to retrieve its result is:

	{
  q(func: allofterms(name@., "iphone 7"))@filter(has(has_url)){
			total : count(has_url)
			has_url {
				url_name :name@.
				~has_url @filter(not allofterms(name@., "iphone 7")){
					keyword :name@.
				}
			}
		} 

After this query, it waits approximately 2 minutes where I am monitoring the processes of Dgraph with htop at the same time that results in out of memory. Actually in Ratel UI, it sends this message:
`Error Name: TypeError

Message: Failed to fetch

URL: undefined`
But I know that actually process of my alpha becomes out of memory since I am monitoring usages with htop.

Note: Simple queries actually work and I am able get a simple query result in 0-2 seconds.

What should I do ? How I can optimize my query, actually the does problem result from the query or any other reason such as wrong setup ?

Thanks for your helps !
Bonus: Below query works perfectly (200-300ms):

{
  q(func: allofterms(name@., "iphone 7"))@filter(has(has_url))@cascade{
			total : count(has_url)
			has_url {
				url_name :name@.
			}
		} 
}

I think that problem occurs due to reverse edge querying ~has_url which results in a performance issue that I faced.


(Michel Conrado) #2

Please share logs, machine stats and so on.


(Taha Sevim) #3

Container logs:

server_1  | I0905 14:38:37.541091       1 init.go:88]
server_1  |
server_1  | Dgraph version   : v1.0.16
server_1  | Commit SHA-1     : 0590ee95
server_1  | Commit timestamp : 2019-07-11 11:52:54 -0700
server_1  | Branch           : HEAD
server_1  | Go version       : go1.12.5
server_1  |
server_1  | For Dgraph official documentation, visit https://docs.dgraph.io.
server_1  | For discussions about Dgraph     , visit https://discuss.dgraph.io.
server_1  | To say hi to the community       , visit https://dgraph.slack.com.
server_1  |
server_1  | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
server_1  | Copyright 2015-2018 Dgraph Labs, Inc.
server_1  |
server_1  |
server_1  | I0905 14:38:37.541362       1 run.go:461] x.Config: {DebugMode:false PortOffset:0 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000}
server_1  | I0905 14:38:37.541372       1 run.go:462] worker.Config: {ExportPath:export NumPendingProposals:256 Tracing:1 MyAddr:server:7080 ZeroAddr:zero:5080 RaftId:0 ExpandEdge:true WhiteListedIPRanges:[] MaxRetries:-1 StrictMutations:false SnapshotAfter:10000 AbortOlderThan:5m0s}
server_1  | I0905 14:38:37.541383       1 run.go:463] edgraph.Config: {PostingDir:p BadgerTables:mmap BadgerVlog:mmap WALDir:w MutationsMode:0 AuthToken: AllottedMemory:30000}
server_1  | I0905 14:38:37.541417       1 server.go:115] Setting Badger table load option: mmap
server_1  | I0905 14:38:37.541420       1 server.go:127] Setting Badger value log load option: mmap
server_1  | I0905 14:38:37.541423       1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x1f86470 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
server_1  | I0905 14:38:37.542682       1 node.go:88] All 1 tables opened in 0s
server_1  | I0905 14:38:37.592500       1 node.go:88] Replaying file id: 0 at offset: 50278
server_1  | I0905 14:38:37.592525       1 node.go:88] Replay took: 13.216µs
server_1  | I0905 14:38:37.592578       1 server.go:115] Setting Badger table load option: mmap
server_1  | I0905 14:38:37.592587       1 server.go:127] Setting Badger value log load option: mmap
server_1  | I0905 14:38:37.592592       1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x1f86470 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
zero_1    | I0905 06:22:33.448489       1 init.go:88]
zero_1    |
zero_1    | Dgraph version   : v1.0.16
zero_1    | Commit SHA-1     : 0590ee95
zero_1    | Commit timestamp : 2019-07-11 11:52:54 -0700
zero_1    | Branch           : HEAD
zero_1    | Go version       : go1.12.5
zero_1    |
zero_1    | For Dgraph official documentation, visit https://docs.dgraph.io.
zero_1    | For discussions about Dgraph     , visit https://discuss.dgraph.io.
zero_1    | To say hi to the community       , visit https://dgraph.slack.com.
zero_1    |
zero_1    | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
zero_1    | Copyright 2015-2018 Dgraph Labs, Inc.
zero_1    |
zero_1    |
zero_1    | I0905 06:22:33.448668       1 run.go:98] Setting up grpc listener at: 0.0.0.0:5080
zero_1    | I0905 06:22:33.448739       1 run.go:98] Setting up http listener at: 0.0.0.0:6080
zero_1    | badger 2019/09/05 06:22:33 INFO: All 1 tables opened in 0s
zero_1    | badger 2019/09/05 06:22:33 INFO: Replaying file id: 0 at offset: 198457
zero_1    | badger 2019/09/05 06:22:33 INFO: Replay took: 9.485µs
zero_1    | I0905 06:22:33.479698       1 node.go:157] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc0000c1e00 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1f86470 DisableProposalForwarding:false}
zero_1    | I0905 06:22:33.480038       1 node.go:317] Found hardstate: {Term:6 Vote:1 Commit:1075 XXX_unrecognized:[]}
zero_1    | I0905 06:22:33.480717       1 node.go:326] Group 0 found 1076 entries
zero_1    | I0905 06:22:33.480732       1 raft.go:433] Restarting node for dgraphzero
zero_1    | I0905 06:22:33.480786       1 node.go:88] 1 became follower at term 6
zero_1    | I0905 06:22:33.480831       1 node.go:88] newRaft 1 [peers: [], term: 6, commit: 1075, applied: 0, lastindex: 1075, lastterm: 6]
zero_1    | I0905 06:22:33.480924       1 run.go:282] Running Dgraph Zero...
zero_1    | I0905 06:22:33.537869       1 node.go:193] Setting conf state to nodes:1
zero_1    | I0905 06:22:33.538037       1 raft.go:676] Done applying conf change at 0x1
zero_1    | I0905 06:22:33.544150       1 pool.go:138] CONNECTED to server:7080
zero_1    | W0905 06:22:33.544312       1 pool.go:168] Shutting down extra connection to server:7080
zero_1    | W0905 06:22:33.546561       1 pool.go:226] Connection lost with server:7080. Error: rpc error: code = Canceled desc = context canceled
zero_1    | I0905 06:22:34.481346       1 node.go:88] 1 no leader at term 6; dropping index reading msg
zero_1    | I0905 06:22:34.755887       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 06:22:36.281587       1 node.go:88] 1 is starting a new election at term 6
zero_1    | I0905 06:22:36.281642       1 node.go:88] 1 became pre-candidate at term 6
zero_1    | I0905 06:22:36.281658       1 node.go:88] 1 received MsgPreVoteResp from 1 at term 6
zero_1    | I0905 06:22:36.281822       1 node.go:88] 1 became candidate at term 7
zero_1    | I0905 06:22:36.281849       1 node.go:88] 1 received MsgVoteResp from 1 at term 7
zero_1    | I0905 06:22:36.282042       1 node.go:88] 1 became leader at term 7
zero_1    | I0905 06:22:36.282074       1 node.go:88] raft.node: 1 elected leader 1 at term 7
zero_1    | I0905 06:22:36.282153       1 raft.go:642] I've become the leader, updating leases.
zero_1    | I0905 06:22:36.282184       1 assign.go:42] Updated Lease id: 10250001. Txn Ts: 50001
zero_1    | W0905 06:22:36.481535       1 node.go:668] [0x1] Read index context timed out
zero_1    | I0905 06:22:36.481986       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 06:22:36.483292       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 06:22:36.483710       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 06:43:38.278776       1 node.go:668] [0x1] Read index context timed out
zero_1    | E0905 06:43:49.422721       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 06:44:01.094119       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0905 06:44:32.524098       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 06:45:44.973761       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | W0905 06:45:45.119598       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 1m31.523s. Breakdown: [{proposals 8.915s} {disk 8.626s} {advance 4.328s}]. Num entries: 0. MustSync: false
zero_1    | W0905 06:45:45.321789       1 pool.go:226] Connection lost with server:7080. Error: rpc error: code = Unavailable desc = transport is closing
zero_1    | I0905 06:45:56.246455       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 06:45:56.247730       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 06:45:56.261262       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 06:45:56.261615       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 07:13:28.529574       1 node.go:668] [0x1] Read index context timed out
zero_1    | W0905 07:13:28.756203       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 2.96s. Breakdown: [{disk 526ms} {proposals 105ms} {advance 0s}]. Num entries: 0. MustSync: false
zero_1    | W0905 07:13:54.516033       1 node.go:668] [0x1] Read index context timed out
zero_1    | W0905 07:13:54.550225       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 25.793s. Breakdown: [{disk 18.31s} {proposals 7.484s} {advance 0s}]. Num entries: 0. MustSync: false
zero_1    | W0905 07:15:24.179545       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 168ms. Breakdown: [{disk 168ms} {proposals 0s} {advance 0s}]. Num entries: 0. MustSync: false
zero_1    | I0905 07:15:36.157365       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 07:15:36.162399       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 07:15:36.171108       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 07:15:36.171560       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 08:00:41.653357       1 node.go:668] [0x1] Read index context timed out
zero_1    | E0905 08:01:04.650223       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 08:01:42.608319       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | I0905 08:03:28.195403       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 08:03:28.205125       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 08:03:28.207881       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 08:03:28.208234       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 08:08:18.251018       1 node.go:668] [0x1] Read index context timed out
zero_1    | W0905 08:08:49.098724       1 node.go:668] [0x1] Read index context timed out
zero_1    | W0905 08:08:49.722306       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 16.422s. Breakdown: [{proposals 14.072s} {disk 2.314s} {advance 22ms}]. Num entries: 0. MustSync: false
zero_1    | E0905 08:11:09.940008       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 08:11:10.341969       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | I0905 08:11:20.568145       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 08:11:20.570036       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 08:11:20.572554       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 08:11:20.572963       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 08:11:23.668913       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 155ms. Breakdown: [{sync 154ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
zero_1    | W0905 09:25:54.926473       1 node.go:668] [0x1] Read index context timed out
zero_1    | E0905 09:26:26.989888       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:26:36.522775       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0905 09:29:27.703510       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:29:55.912520       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | W0905 09:31:20.270503       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 1m23.749s. Breakdown: [{advance 47.452s} {proposals 31.748s} {disk 4.549s}]. Num entries: 0. MustSync: false
zero_1    | E0905 09:31:20.319783       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:31:20.319820       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | I0905 09:31:20.992546       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 09:31:20.993687       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 09:31:20.993734       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 09:31:20.993884       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | W0905 09:32:34.449041       1 node.go:668] [0x1] Read index context timed out
zero_1    | E0905 09:34:06.623163       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:35:10.879947       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0905 09:38:34.605041       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:41:03.912050       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | W0905 09:41:26.885489       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 170ms. Breakdown: [{disk 162ms} {proposals 0s} {advance 0s}]. Num entries: 0. MustSync: false
zero_1    | I0905 09:41:27.065388       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 09:41:27.074384       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 09:41:27.148740       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 09:41:27.148867       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 09:41:27.151439       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 09:41:27.151551       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | E0905 09:42:21.578555       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:42:38.791525       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0905 09:43:22.858297       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 09:43:25.543838       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | I0905 09:43:34.648816       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 09:43:34.657608       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 09:43:34.662837       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 09:43:34.662965       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 10:30:49.898362       1 node.go:668] [0x1] Read index context timed out
zero_1    | E0905 10:30:54.092110       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 10:30:55.023830       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0905 10:31:05.618204       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 10:31:20.254896       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0905 10:31:50.542213       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | W0905 10:31:54.440446       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 11.776s. Breakdown: [{proposals 2.382s} {disk 676ms} {advance 0s}]. Num entries: 0. MustSync: false
zero_1    | E0905 10:31:54.489541       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | I0905 10:33:20.624647       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 10:33:20.626220       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 10:33:20.662465       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 10:33:20.662878       1 zero.go:523] Connected: id:1 addr:"server:7080"
zero_1    | W0905 10:33:20.880021       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 117ms. Breakdown: [{sync 103ms} {disk 14ms} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
zero_1    | W0905 10:33:24.099437       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 135ms. Breakdown: [{sync 135ms} {disk 0s} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
ratel_1   | 2019/09/05 06:22:34 Listening on port 8000...
server_1  | I0905 14:38:39.212704       1 node.go:88] All 8 tables opened in 1.616s
server_1  | I0905 14:38:39.246758       1 node.go:88] Replaying file id: 16 at offset: 92097704
server_1  | I0905 14:38:39.246825       1 node.go:88] Replay took: 31.835µs
server_1  | I0905 14:38:39.247393       1 groups.go:96] Current Raft Id: 0x1
server_1  | I0905 14:38:39.247710       1 worker.go:81] Worker listening at address: [::]:7080
server_1  | I0905 14:38:39.247828       1 run.go:392] gRPC server started.  Listening on port 9080
server_1  | I0905 14:38:39.247893       1 run.go:393] HTTP server started.  Listening on port 8080
server_1  | I0905 14:38:39.347803       1 pool.go:138] CONNECTED to zero:5080
zero_1    | I0905 14:38:39.351071       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 14:38:39.351447       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 14:38:39.353634       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 14:38:39.354031       1 zero.go:523] Connected: id:1 addr:"server:7080"
server_1  | I0905 14:38:39.354508       1 groups.go:119] Connected to group zero. Assigned group: 0
server_1  | I0905 14:38:39.354557       1 groups.go:121] Raft Id after connection to Zero: 0x1
server_1  | I0905 14:38:39.354976       1 draft.go:74] Node ID: 0x1 with GroupID: 1
server_1  | I0905 14:38:39.355116       1 node.go:157] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc0006d1ef0 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1f86470 DisableProposalForwarding:false}
server_1  | I0905 14:38:39.475575       1 node.go:317] Found hardstate: {Term:25 Vote:1 Commit:239 XXX_unrecognized:[]}
server_1  | I0905 14:38:39.475675       1 node.go:326] Group 1 found 240 entries
server_1  | I0905 14:38:39.475680       1 draft.go:1317] Restarting node for group: 1
server_1  | I0905 14:38:39.475731       1 node.go:88] 1 became follower at term 25
server_1  | I0905 14:38:39.475793       1 node.go:88] newRaft 1 [peers: [], term: 25, commit: 239, applied: 0, lastindex: 239, lastterm: 25]
server_1  | I0905 14:38:39.475804       1 groups.go:139] Server is ready
server_1  | I0905 14:38:39.475880       1 groups.go:650] Got address of a Zero leader: zero:5080
server_1  | I0905 14:38:39.475894       1 draft.go:716] Found Raft progress: 0
server_1  | I0905 14:38:39.475931       1 groups.go:663] Starting a new membership stream receive from zero:5080.
server_1  | I0905 14:38:39.476131       1 groups.go:680] Received first state update from Zero: counter:1091 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"server:7080" leader:true last_update:1567679603 > > tablets:<key:"_predicate_" value:<group_id:1 predicate:"_predicate_" > > tablets:<key:"has_serpword" value:<group_id:1 predicate:"has_serpword" > > tablets:<key:"has_url" value:<group_id:1 predicate:"has_url" > > tablets:<key:"name" value:<group_id:1 predicate:"name" > > checksum:9310096136728819195 > > zeros:<key:1 value:<id:1 addr:"zero:5080" leader:true > > maxLeaseId:10250000 maxTxnTs:60000 maxRaftId:1 cid:"67177056-c314-4246-87f8-d03fe3840c00"
server_1  | I0905 14:38:39.551838       1 node.go:193] Setting conf state to nodes:1
server_1  | I0905 14:38:39.552424       1 mutation.go:145] Done schema update predicate:"_predicate_" value_type:STRING list:true
server_1  | I0905 14:38:40.807847       1 node.go:88] Got compaction priority: {level:1 score:1.059707336127758 dropPrefix:[]}
server_1  | I0905 14:38:40.808001       1 node.go:88] Running for level: 1
server_1  | I0905 14:38:41.933545       1 node.go:88] LOG Compact 1->2, del 1 tables, add 1 tables, took 1.125474317s
server_1  | I0905 14:38:41.933585       1 node.go:88] Compaction for level: 1 DONE
server_1  | I0905 14:38:42.216193       1 node.go:88] 1 is starting a new election at term 25
server_1  | I0905 14:38:42.216258       1 node.go:88] 1 became pre-candidate at term 25
server_1  | I0905 14:38:42.216272       1 node.go:88] 1 received MsgPreVoteResp from 1 at term 25
server_1  | I0905 14:38:42.216393       1 node.go:88] 1 became candidate at term 26
server_1  | I0905 14:38:42.216415       1 node.go:88] 1 received MsgVoteResp from 1 at term 26
server_1  | I0905 14:38:42.216553       1 node.go:88] 1 became leader at term 26
server_1  | I0905 14:38:42.216576       1 node.go:88] raft.node: 1 elected leader 1 at term 26
server_1  | I0905 14:38:42.476424       1 groups.go:725] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
server_1  | I0905 14:38:42.476487       1 groups.go:734] Got Zero leader: zero:5080
server_1  | I0905 14:45:11.195900       1 groups.go:788] Zero leadership changed. Renewing oracle delta stream.
server_1  | W0905 14:45:37.058831       1 groups.go:706] No membership update for 10s. Closing connection to Zero.
zero_1    | E0905 14:46:21.770375       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0905 14:46:27.000771       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | W0905 14:46:29.409358       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 992ms. Breakdown: [{disk 992ms} {proposals 0s} {advance 0s}]. Num entries: 0. MustSync: false
server_1  | I0905 14:46:37.530938       1 init.go:88]
server_1  |
server_1  | Dgraph version   : v1.0.16
server_1  | Commit SHA-1     : 0590ee95
server_1  | Commit timestamp : 2019-07-11 11:52:54 -0700
server_1  | Branch           : HEAD
server_1  | Go version       : go1.12.5
server_1  |
server_1  | For Dgraph official documentation, visit https://docs.dgraph.io.
server_1  | For discussions about Dgraph     , visit https://discuss.dgraph.io.
server_1  | To say hi to the community       , visit https://dgraph.slack.com.
server_1  |
server_1  | Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
server_1  | Copyright 2015-2018 Dgraph Labs, Inc.
server_1  |
server_1  |
server_1  | I0905 14:46:37.544378       1 run.go:461] x.Config: {DebugMode:false PortOffset:0 QueryEdgeLimit:1000000 NormalizeNodeLimit:10000}
server_1  | I0905 14:46:37.544435       1 run.go:462] worker.Config: {ExportPath:export NumPendingProposals:256 Tracing:1 MyAddr:server:7080 ZeroAddr:zero:5080 RaftId:0 ExpandEdge:true WhiteListedIPRanges:[] MaxRetries:-1 StrictMutations:false SnapshotAfter:10000 AbortOlderThan:5m0s}
server_1  | I0905 14:46:37.544489       1 run.go:463] edgraph.Config: {PostingDir:p BadgerTables:mmap BadgerVlog:mmap WALDir:w MutationsMode:0 AuthToken: AllottedMemory:30000}
server_1  | I0905 14:46:37.544557       1 server.go:115] Setting Badger table load option: mmap
server_1  | I0905 14:46:37.544571       1 server.go:127] Setting Badger value log load option: mmap
server_1  | I0905 14:46:37.544583       1 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:false TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 ReadOnly:false Truncate:true Logger:0x1f86470 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
server_1  | I0905 14:46:37.602066       1 node.go:88] All 1 tables opened in 18ms
server_1  | I0905 14:46:37.647678       1 node.go:88] Replaying file id: 0 at offset: 50278
server_1  | I0905 14:46:37.660386       1 node.go:88] Replay took: 12.66766ms
server_1  | I0905 14:46:37.660560       1 server.go:115] Setting Badger table load option: mmap
server_1  | I0905 14:46:37.660587       1 server.go:127] Setting Badger value log load option: mmap
server_1  | I0905 14:46:37.660602       1 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:false TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 ReadOnly:false Truncate:true Logger:0x1f86470 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:2 CompactL0OnClose:true LogRotatesToFlush:2 managedTxns:false maxBatchCount:0 maxBatchSize:0}
server_1  | I0905 14:46:39.892346       1 node.go:88] All 8 tables opened in 2.215s
server_1  | I0905 14:46:39.937340       1 node.go:88] Replaying file id: 16 at offset: 92097704
server_1  | I0905 14:46:39.954760       1 node.go:88] Replay took: 17.391709ms
server_1  | I0905 14:46:39.956018       1 worker.go:81] Worker listening at address: [::]:7080
server_1  | I0905 14:46:39.956141       1 run.go:392] gRPC server started.  Listening on port 9080
server_1  | I0905 14:46:39.956194       1 run.go:393] HTTP server started.  Listening on port 8080
server_1  | I0905 14:46:39.968844       1 groups.go:96] Current Raft Id: 0x1
server_1  | I0905 14:46:40.069395       1 pool.go:138] CONNECTED to zero:5080
zero_1    | I0905 14:46:40.091144       1 zero.go:396] Got connection request: cluster_info_only:true
zero_1    | I0905 14:46:40.104536       1 zero.go:414] Connected: cluster_info_only:true
zero_1    | I0905 14:46:40.126420       1 zero.go:396] Got connection request: id:1 addr:"server:7080"
zero_1    | I0905 14:46:40.126498       1 zero.go:523] Connected: id:1 addr:"server:7080"
server_1  | I0905 14:46:40.126639       1 groups.go:119] Connected to group zero. Assigned group: 0
server_1  | I0905 14:46:40.126653       1 groups.go:121] Raft Id after connection to Zero: 0x1
server_1  | I0905 14:46:40.126765       1 draft.go:74] Node ID: 0x1 with GroupID: 1
server_1  | I0905 14:46:40.126800       1 node.go:157] Setting raft.Config to: &{ID:1 peers:[] learners:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc009a0e270 Applied:0 MaxSizePerMsg:262144 MaxCommittedSizePerReady:67108864 MaxUncommittedEntriesSize:0 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1f86470 DisableProposalForwarding:false}
server_1  | I0905 14:46:40.195605       1 node.go:317] Found hardstate: {Term:26 Vote:1 Commit:242 XXX_unrecognized:[]}
server_1  | I0905 14:46:40.195692       1 node.go:326] Group 1 found 243 entries
server_1  | I0905 14:46:40.195698       1 draft.go:1317] Restarting node for group: 1
server_1  | I0905 14:46:40.195733       1 node.go:88] 1 became follower at term 26
server_1  | I0905 14:46:40.195761       1 node.go:88] newRaft 1 [peers: [], term: 26, commit: 242, applied: 0, lastindex: 242, lastterm: 26]
server_1  | I0905 14:46:40.195777       1 groups.go:139] Server is ready
server_1  | I0905 14:46:40.195844       1 groups.go:650] Got address of a Zero leader: zero:5080
server_1  | I0905 14:46:40.195855       1 draft.go:716] Found Raft progress: 0
server_1  | I0905 14:46:40.195924       1 groups.go:663] Starting a new membership stream receive from zero:5080.
server_1  | I0905 14:46:40.208698       1 groups.go:680] Received first state update from Zero: counter:1093 groups:<key:1 value:<members:<key:1 value:<id:1 group_id:1 addr:"server:7080" leader:true last_update:1567694322 > > tablets:<key:"_predicate_" value:<group_id:1 predicate:"_predicate_" > > tablets:<key:"has_serpword" value:<group_id:1 predicate:"has_serpword" > > tablets:<key:"has_url" value:<group_id:1 predicate:"has_url" > > tablets:<key:"name" value:<group_id:1 predicate:"name" > > checksum:9310096136728819195 > > zeros:<key:1 value:<id:1 addr:"zero:5080" leader:true > > maxLeaseId:10250000 maxTxnTs:60000 maxRaftId:1 cid:"67177056-c314-4246-87f8-d03fe3840c00"
server_1  | I0905 14:46:40.341596       1 node.go:193] Setting conf state to nodes:1
zero_1    | W0905 14:46:40.341596       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 145ms. Breakdown: [{sync 117ms} {disk 28ms} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
server_1  | W0905 14:46:40.343070       1 draft.go:914] Raft.Ready took too long to process: Timer Total: 147ms. Breakdown: [{sync 145ms} {proposals 2ms} {disk 0s} {advance 0s}] Num entries: 0. MustSync: true
server_1  | I0905 14:46:40.363854       1 mutation.go:145] Done schema update predicate:"_predicate_" value_type:STRING list:true
server_1  | I0905 14:46:43.416474       1 node.go:88] 1 is starting a new election at term 26
server_1  | I0905 14:46:43.416526       1 node.go:88] 1 became pre-candidate at term 26
server_1  | I0905 14:46:43.416540       1 node.go:88] 1 received MsgPreVoteResp from 1 at term 26
server_1  | I0905 14:46:43.416700       1 node.go:88] 1 became candidate at term 27
server_1  | I0905 14:46:43.416715       1 node.go:88] 1 received MsgVoteResp from 1 at term 27
server_1  | I0905 14:46:43.416879       1 node.go:88] 1 became leader at term 27
server_1  | I0905 14:46:43.416907       1 node.go:88] raft.node: 1 elected leader 1 at term 27
server_1  | I0905 14:46:44.196411       1 groups.go:725] Leader idx=0x1 of group=1 is connecting to Zero for txn updates
server_1  | I0905 14:46:44.196466       1 groups.go:734] Got Zero leader: zero:5080

Htop results right before execution of query finishes:


(Taha Sevim) #4

Machine stats:

H/W path      Device           Class          Description
=========================================================
                               system         B360 HD3P-LM (Default string)
/0                             bus            B360HD3PLM-CF
/0/0                           memory         64KiB BIOS
/0/3a                          memory         64GiB System Memory
/0/3a/0                        memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/3a/1                        memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/3a/2                        memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/3a/3                        memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/44                          memory         512KiB L1 cache
/0/45                          memory         2MiB L2 cache
/0/46                          memory         16MiB L3 cache
/0/47                          processor      Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
/0/100                         bridge         Intel Corporation
/0/100/2                       display        Intel Corporation
/0/100/12                      generic        Cannon Lake PCH Thermal Controller
/0/100/14                      bus            Cannon Lake PCH USB 3.1 xHCI Host Controller
/0/100/14/0   usb1             bus            xHCI Host Controller
/0/100/14/1   usb2             bus            xHCI Host Controller
/0/100/14.2                    memory         RAM memory
/0/100/16                      communication  Cannon Lake PCH HECI Controller
/0/100/17                      storage        Cannon Lake PCH SATA AHCI Controller
/0/100/1d                      bridge         Cannon Lake PCH PCI Express Root Port 9
/0/100/1f                      bridge         Intel Corporation
/0/100/1f.4                    bus            Cannon Lake PCH SMBus Controller
/0/100/1f.5                    bus            Cannon Lake PCH SPI Controller
/0/100/1f.6   eno1             network        Ethernet Connection (7) I219-LM
/0/1          scsi0            storage
/0/1/0.0.0    /dev/sda         disk           8001GB HGST HUH721008AL
/0/1/0.0.0/1  /dev/sda1        volume         31GiB RAID partition
/0/1/0.0.0/2  /dev/sda2        volume         511MiB RAID partition
/0/1/0.0.0/3  /dev/sda3        volume         2013GiB RAID partition
/0/1/0.0.0/4  /dev/sda4        volume         5405GiB RAID partition
/0/1/0.0.0/5  /dev/sda5        volume         1023KiB BIOS Boot partition
/0/2          scsi1            storage
/0/2/0.0.0    /dev/sdb         disk           8001GB HGST HUH721008AL
/0/2/0.0.0/1  /dev/sdb1        volume         31GiB RAID partition
/0/2/0.0.0/2  /dev/sdb2        volume         511MiB RAID partition
/0/2/0.0.0/3  /dev/sdb3        volume         2013GiB RAID partition
/0/2/0.0.0/4  /dev/sdb4        volume         5405GiB RAID partition
/0/2/0.0.0/5  /dev/sdb5        volume         1023KiB BIOS Boot partition
/1                             power          To Be Filled By O.E.M.
/2            veth57ff0b2      network        Ethernet interface
/3            docker0          network        Ethernet interface
/4            veth137ffd6      network        Ethernet interface
/5            veth7ba5600      network        Ethernet interface
/6            br-a2e20f2733dd  network        Ethernet interface

As I stated, I think that query that I tried may not be efficient.


(Michel Conrado) #5

Do you have logs from the moment of your issue?

BTW, are you able to resolve those addrs? feels like your bare metal server can’t. I see issues with connection in the logs.


(Taha Sevim) #6

The only logs that I see from docker-compose logs command when issue occurs is:

zero_1    | W0905 14:46:40.341596       1 raft.go:707] Raft.Ready took too long to process: Timer Total: 145ms. Breakdown: [{sync 117ms} {disk 28ms} {proposals 0s} {advance 0s}]. Num entries: 1. MustSync: true
zero_1    | E0906 04:13:30.790387       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0906 04:13:45.762548       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded
zero_1    | E0906 04:15:47.937404       1 node.go:653] Error while trying to call ReadIndex: context deadline exceeded
zero_1    | E0906 04:15:49.091850       1 node.go:706] [0x1] While trying to do lin read index: context deadline exceeded

Ratel UI output:


Did you mean any other kind of logs that doesn’t output to screen but to a file or something else ?

In my remote server, of course I can not resolve addresses such as server:7080 and zero:5080 since they are connected with a docker network bridge that means those containers are able to resolve each other but not from outside of the container network. Therefore, I am able to access those containers with those ports from localhost. Did I get what you mean or are you questioning something else which I missed?

BTW, thanks for your interest @MichelDiz


(Michel Conrado) #7

Okay, I’ll investigate that. But have you tested with master?
Also, you can use Jaeger to troubleshoot it.


(Taha Sevim) #8

I have only a worker node (bare metal machine) where I am testing my queries. I am not sure what you mean by stating that :confused: I hope I got your question right.

I did not know Jaeger so I will investigate Jaeger thanks !
I appreciate for you helps.


(Michel Conrado) #9

I mean, shut down your current node for a while (keep it safe). And, In another path/folder, test with a newer version of Dgraph (“master branch” is newer). Redo the Bulk load and etc.

Get Dgraph v1.1.0 https://github.com/dgraph-io/dgraph/releases

Here is the v1.0.18-rc1 binary https://drive.google.com/open?id=1QSSx2mb5LG09D1ywN-FBh5FXZXQOvaMK

You could also checkout to v1.0.18-rc1 https://github.com/dgraph-io/dgraph/tree/v1.0.18-rc1 or any other and build it your self. Just run “go buil” in ~/go/src/github.com/dgraph-io/dgraph/dgraph/


(Taha Sevim) #10

I tried newer version after you told me, same problem still occurs. I just want to underline again, our data consists of two nodes that words and urls. Even though we have 650K nodes and 222M edges at the moment, we will scale it to at least 4 Billion nodes. For now, I still believe that source of the problem exists since query result is very huge, but I would expect that Dgraph could handle this kind of big data chunk with that machine stats since Dgraph seems the most convenient and the most exciting one in the industry as we see.


(Lucas Wang) #11

@taha Thanks for sharing this interesting use case with us. I created some bogus data having similar structure as yours: it has 300 phone nodes and 1M url nodes, and phones are randomly connected to urls through the <has_url> predicate.

package main

import (
    "fmt"
    "math/rand"
)

const (
    numPhones = 300
    numUrls   = 1000000
)

func main() {
    // print all phone names using blank nodes                                                                                                       
    for i := 0; i < numPhones; i++ {
        fmt.Printf("_:phone%d <name> \"phone %d\" .\n", i, i)
    }
    // print all urls using the blank nodes                                                                                                          
    for i := 0; i < numUrls; i++ {
        fmt.Printf("_:url%d <name> \"http://simple.com/url%d\" .\n", i, i)
    }

    totalCombs := numUrls * numPhones
    for j := 0; j < totalCombs; j++ {
        phone := rand.Intn(numPhones)
        url := rand.Intn(numUrls)

        fmt.Printf("_:phone%d <has_url> _:url%d .\n", phone, url)
    }
}

Yet, trying to go another level deeper trying to find other phones in those urls cause the alpha to run into OOM

{
  q(func: allofterms(name@., "phone 293"))@filter(has(has_url)) @cascade {
            total : count(has_url)
            has_url {
            url_name: name@.
            ~has_url @filter(not allofterms(name@., "iphone 7")){
             keyword :name@.
             }
            }
        }
}

On the sever side, I can see that it succeeds in finding the uids of the urls, but is never able to load those urls’ connected phones before running OOM.
As you pointed out, the result will be a huge dataset. In fact, it will potentially retrieve every single url and every phone in the DB since they are densely connected. Our current code is not able to handle such cases, and we need to come up with new mechanisms to handle such huge responses. I’ve created a github issue to track this requirement https://github.com/dgraph-io/dgraph/issues/3934. Thanks and we’ll try to get this supported soon.


(Taha Sevim) #12

@gitlw Thanks for your experiment that is exactly same what I have tried.
I will try to change logic behind my query which might also give insights that is similar to that inefficient query.

Again thanks a lot for you attention @gitlw @MichelDiz !


(Linhan Li) #13

Hi Lucas, we have a dataset of 700M RDF(100M Nodes(with two attributes each)+ 500M edges), and we run an experiment on the machines similar to @taha. As results, we got the same OOM issues as @taha, but we just wonder if Degraph’s mechanism will support such huge datasets and responses if we run queries under the distributed cluster mode?