When live loading after bulk loading, the query does not work properly


Report a Dgraph Bug

What version of Dgraph are you using?

Dgraph Version
$ dgraph version
 
Dgraph version   : v21.12.0
Dgraph codename  : zion
Dgraph SHA-256   : 078c75df9fa1057447c8c8afc10ea57cb0a29dfb22f9e61d8c334882b4b4eb37
Commit SHA-1     : d62ed5f15
Commit timestamp : 2021-12-02 21:20:09 +0530
Branch           : HEAD
Go version       : go1.17.3
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs.
For discussions about Dgraph     , visit http://discuss.dgraph.io.
For fully-managed Dgraph Cloud   , visit https://dgraph.io/cloud.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2021 Dgraph Labs, Inc.

Have you tried reproducing the issue with the latest release?

What is the hardware spec (RAM, OS)?

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Generate JSON files with explicit UIDs (we use LevelDB to avoid assigning duplicate UIDs).
  2. Bulk load 10 billion records of two types (Account, Transaction).
    Schema: an Account has edges to multiple Transactions ([uid]); a Transaction has an edge to an Account (uid).
  3. Live load about 50 million new records per day (daily batch).
  4. Run the query every day.
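
For readers who want to picture step 1, here is a minimal sketch of generating the JSON with stable UIDs per address. It is not the poster's actual code: a plain dict stands in for LevelDB, the class and function names are hypothetical, and the `account` predicate name for the Transaction-to-Account edge is an assumption (the post only names `group`, `address`, `rcv`, `amtout` and `time`).

```python
import json


class UidRegistry:
    """Maps each address to one UID. A dict stands in for LevelDB here."""

    def __init__(self, next_uid):
        self._by_address = {}
        self._next_uid = next_uid  # first unused UID from the leased range

    def uid_for(self, address):
        # Reuse the UID if the address was seen before, else take the next one.
        if address not in self._by_address:
            self._by_address[address] = self._next_uid
            self._next_uid += 1
        return hex(self._by_address[address])

    def new_uid(self):
        # Transactions are never deduplicated, so each one gets a fresh UID.
        uid = self._next_uid
        self._next_uid += 1
        return hex(uid)


def build_records(registry, transfers):
    """Turn (address, amtout, time) tuples into Account nodes with one nested Transaction."""
    records = []
    for address, amtout, time in transfers:
        account_uid = registry.uid_for(address)
        records.append({
            "uid": account_uid,
            "dgraph.type": "Account",
            "address": address,
            "rcv": [{
                "uid": registry.new_uid(),
                "dgraph.type": "Transaction",
                "amtout": amtout,
                "time": time,
                "account": {"uid": account_uid},  # assumed predicate name
            }],
        })
    return records


registry = UidRegistry(next_uid=1)
batch = build_records(registry, [
    ("testAddress01", 10, "2021-12-01"),
    ("testAddress01", 50, "2022-05-18"),
])
print(json.dumps(batch, indent=2))
```

Both records carry the same Account UID, which is the property the daily batch depends on. In a real pipeline the registry would have to persist between the bulk load and every later live load.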

Expected behaviour and actual result.

Expected Behavior

{
    "group": "corp",
    "address": "testAddress01",
    "rcv": [{ "amtout": 10, "time": "2021-12-01" }, { "amtout": 50, "time": "2022-05-18" }]
}

Actual Behaviour (only the data live loaded from 2022-05-18.json is returned)

{
    "group": "corp",
    "address": "testAddress01",
    "rcv": [{ "amtout": 50, "time": "2022-05-18" }]
}
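
To make the gap between the two results explicit, here is a small sketch that lists which `rcv` entries are missing from an actual response. The helper name `missing_edges` is hypothetical, and the two dicts are the expected and actual results above written as Python literals.

```python
def missing_edges(expected, actual, edge="rcv"):
    """Return edge entries present in `expected` but absent from `actual`."""
    # Each entry is compared by its full set of key/value pairs.
    seen = {tuple(sorted(e.items())) for e in actual.get(edge, [])}
    return [e for e in expected.get(edge, [])
            if tuple(sorted(e.items())) not in seen]


expected = {
    "group": "corp",
    "address": "testAddress01",
    "rcv": [{"amtout": 10, "time": "2021-12-01"},
            {"amtout": 50, "time": "2022-05-18"}],
}
actual = {
    "group": "corp",
    "address": "testAddress01",
    "rcv": [{"amtout": 50, "time": "2022-05-18"}],
}

print(missing_edges(expected, actual))
# prints [{'amtout': 10, 'time': '2021-12-01'}]
```

The missing entry is the bulk-loaded transaction, which matches the symptom described in this report.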

Experience Report for Feature Request

Note: Feature requests are judged based on user experience and modeled on Go Experience Reports. These reports should focus on the problems: they should not focus on and need not propose solutions.

What you wanted to do

We bulk load the past year's worth of data, live load new data in a daily batch job, and run the query the following day.

What you actually did

We generate all the data as JSON files. To prevent duplicate UIDs for an address, we write the UID previously assigned to each address into the JSON file (the UIDs were allocated through the assign API after starting Zero).

With only Zero running, we bulk load one year's worth of data, then start Alpha and live load the previous day's data each day.
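
One sanity check that may help here is confirming that every UID written into a load file falls inside the range leased from Zero. The sketch below assumes the lease was obtained from Zero's HTTP endpoint (`/assign?what=uids&num=...`) and that its start and end IDs have already been parsed into integers; the function name and the sample records are hypothetical.

```python
def uids_outside_lease(records, start_id, end_id):
    """Collect every "uid" value in nested JSON records that is outside [start_id, end_id]."""
    bad = []
    stack = list(records)
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            uid = node.get("uid")
            # UIDs are hex strings such as "0x1f"; int(..., 16) accepts the 0x prefix.
            if isinstance(uid, str) and not (start_id <= int(uid, 16) <= end_id):
                bad.append(uid)
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return bad


sample = [{
    "uid": "0x1",
    "address": "testAddress01",
    "rcv": [{"uid": "0x5", "amtout": 10}, {"uid": "0x3e9", "amtout": 50}],
}]

# With a lease of 1..1000, 0x3e9 (1001) is reported as out of range.
print(uids_outside_lease(sample, start_id=1, end_id=1000))
```

This only checks the range. It does not show whether the same address received the same UID in both the bulk and the live load files.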

Why that wasn’t great, with examples

If you run a query after this process and ask for the Transaction UIDs (edges) connected to an Account, you only see the UIDs that were live loaded. The UIDs for the one year of data that was bulk loaded do not appear.

Did I do something wrong?
The strange thing is that, looking at the SST files created in the p folder, the bulk-loaded data produced files numbered 1-8000, while the live-loaded data produced files numbered 16000-16100, so the numbering is not contiguous. Is there any solution to this?
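
For anyone trying to reproduce this, a query along the following lines could be used to see how many `rcv` edges an Account actually has. This is only a sketch: it assumes `address` has an index that supports `eq`, that Alpha listens on `localhost:8080`, and that the `/query` endpoint accepts the `application/dql` content type; the helper names are hypothetical. The code builds the request but does not send it.

```python
import json
import urllib.request


def account_query(address):
    """Build a DQL query that returns an Account with its rcv edges and their count."""
    # json.dumps quotes and escapes the address for use as a string literal.
    return """{
  account(func: eq(address, %s)) {
    uid
    address
    rcv_count: count(rcv)
    rcv { uid amtout time }
  }
}""" % json.dumps(address)


def build_request(address, alpha_url="http://localhost:8080"):
    """Prepare (but do not send) the HTTP request for Alpha's /query endpoint."""
    return urllib.request.Request(
        alpha_url + "/query",
        data=account_query(address).encode("utf-8"),
        headers={"Content-Type": "application/dql"},
        method="POST",
    )


req = build_request("testAddress01")
print(req.full_url)
print(req.data.decode("utf-8"))
```

Against a running Alpha, passing `req` to `urllib.request.urlopen` would send the query. Comparing `rcv_count` right after the bulk load and again after the first live load would show at which point the older edges stop appearing.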

Any external references to support your case

@MichelDiz Hi, could you help me solve this problem? My question is: is it possible to load data with the live loader after loading data with the bulk loader? (I use previously generated UID files and a UID range assigned by Zero.) As I already mentioned, the bulk loader creates SST files numbered 1-8000, but when I then load data with the live loader, the file numbers are not contiguous: it creates SST files numbered 16000-16100.

I'm sorry for my poor English.

Sorry for the delay. I'm no longer in the company, so I'm not following the community due to some personal projects. You can find help in the community Discord.

Avoid creating your own UIDs. Live Loader will always create new ones, unless you create a very strict method to deal with UIDs.

Of course it is. That's how things should work.

No problem, keep doing it. Make yourself understood, ignore the criticism and your English will gradually improve.

Cheers!

Thanks for your help!