How to update a large amount of data in Dgraph every day

In this case, use a client that runs an upsert block.

You can use Bulk or Live only if the data doesn’t exist in the DB yet. If both datasets have already been inserted into the DB, go to step 3. Do you always use blank nodes? If so, Bulk and Live have a special flag called “-x” that stores the uid mapping. It can be useful for later insertions.

For more, run:

dgraph live -h | grep xid

I don’t know what your datasets look like, but imagine the following.

Say you brought your data from some other DB, a source that doesn’t follow Dgraph conventions (this also applies to CSV files). One way to link such entities is to use a value that relates them: an “id” (a foreign key or similar), or any other value that is unique, is present in the entities, and serves to identify them.

Simple schema sample

<friend>: [uid] .
<linkto>: string @index(hash) .
<name>: string @index(exact) .

Dataset sample

{
   "set": [
      {
         "name": "User 1",
         "linkto": "User 2"
      },
      {
         "name": "User 2",
         "linkto": "User 3"
      },
      {
         "name": "User 3",
         "linkto": "User 2"
      },
      {
         "name": "User 4",
         "linkto": "User 2"
      },
      {
         "name": "User 5",
         "linkto": "User 2"
      },
      {
         "name": "User 6",
         "linkto": "User 2"
      },
      {
         "name": "User 7",
         "linkto": "User 2"
      }
   ]
}
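If your source is a CSV export, a payload like the one above could be generated with a few lines of Python. This is only a sketch: the column names "name" and "linkto" and the helper csv_to_set_mutation are my assumptions, not something Dgraph prescribes.

```python
import csv
import io
import json


def csv_to_set_mutation(csv_text, name_col="name", link_col="linkto"):
    """Turn CSV rows into a JSON "set" mutation payload.

    name_col and link_col are hypothetical column names; adjust them
    to whatever your export actually uses.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    nodes = []
    for row in rows:
        node = {"name": row[name_col].strip()}
        link = (row.get(link_col) or "").strip()
        if link:  # rows without a reference simply get no linkto value
            node["linkto"] = link
        nodes.append(node)
    return {"set": nodes}


sample_csv = "name,linkto\nUser 1,User 2\nUser 2,User 3\nUser 3,User 2\n"
payload = csv_to_set_mutation(sample_csv)
print(json.dumps(payload, indent=3))
```

The linkto value is kept as a plain string on purpose, so the upsert can resolve it to a real uid later.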

The upsert block to link them

Run this upsert repeatedly, one execution at a time, until there are no links left. Each run handles a single node, which is what the “first” param is for.

How do you know the links are over? Easy: if the upsert response has the field “vars”, there are still links. If it has only the “uids” field, it’s over.

upsert {
  query {
    v0 as var(func: has(linkto), first: 1) { # Never remove the "first" param.
      LK as linkto
    }
    LINK as var(func: eq(name, val(LK)))
  }

  mutation {
    set {
      uid(v0) <friend> uid(LINK) .
      uid(LINK) <friend> uid(v0) .
    }
    delete {
      uid(v0) <linkto> * .
    }
  }
}
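The "run it until the links are over" part can be automated from a client. Below is a minimal sketch in Python using only the standard library. It assumes a Dgraph Alpha reachable at localhost:8080 and posts the upsert to the HTTP /mutate endpoint with commitNow=true. The helper names (post_upsert, links_remaining, link_all) are hypothetical, and the stop condition simply encodes the rule from this post (a “vars” field means there was still a link), so check it against the responses your Dgraph version actually returns.

```python
import json
import urllib.request

UPSERT = """upsert {
  query {
    v0 as var(func: has(linkto), first: 1) {
      LK as linkto
    }
    LINK as var(func: eq(name, val(LK)))
  }
  mutation {
    set {
      uid(v0) <friend> uid(LINK) .
      uid(LINK) <friend> uid(v0) .
    }
    delete {
      uid(v0) <linkto> * .
    }
  }
}"""


def post_upsert(body, url="http://localhost:8080/mutate?commitNow=true"):
    """Send one upsert block to a Dgraph Alpha over HTTP (the URL is an assumption)."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def links_remaining(response):
    """Rule from the post: a "vars" field means a link was just processed."""
    data = response.get("data") or {}
    # Assumption: "vars" may sit under "data" or at the top level.
    return "vars" in data or "vars" in response


def link_all(send=post_upsert, max_rounds=1_000_000):
    """Run the upsert repeatedly; return how many rounds linked something."""
    rounds = 0
    while rounds < max_rounds:
        if not links_remaining(send(UPSERT)):
            break
        rounds += 1
    return rounds
```

Calling link_all() against a live cluster would loop until the stop condition is met. The send parameter is there so you can plug in another client (or a fake one for testing) without touching the loop, and max_rounds is just a safety cap.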

The query after linking

{
  q(func: has(name)) {
    name
    linkto # just to check whether this value still exists
    friend {
      name
      linkto
    }
  }
}

The result

Now you can see the data with its relations.

{
  "data": {
    "q": [
      {
        "name": "User 7",
        "friend": [
          {
            "name": "User 2"
          }
        ]
      },
      {
        "name": "User 1",
        "friend": [
          {
            "name": "User 2"
          }
        ]
      },
      {
        "name": "User 2",
        "friend": [
          {
            "name": "User 7"
          },
          {
            "name": "User 1"
          },
          {
            "name": "User 3"
          },
          {
            "name": "User 4"
          },
          {
            "name": "User 5"
          },
          {
            "name": "User 6"
          }
        ]
      },
      {
        "name": "User 3",
        "friend": [
          {
            "name": "User 2"
          }
        ]
      },
      {
        "name": "User 4",
        "friend": [
          {
            "name": "User 2"
          }
        ]
      },
      {
        "name": "User 5",
        "friend": [
          {
            "name": "User 2"
          }
        ]
      },
      {
        "name": "User 6",
        "friend": [
          {
            "name": "User 2"
          }
        ]
      }
    ]
  }
}
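If you want to cross-check a result like this without touching the DB, the same linking rule can be simulated offline. This is just an illustration in plain Python, not how Dgraph does it internally: the helper link_friends is hypothetical, and it assumes that a linkto pointing at a name that doesn't exist produces no friend edge.

```python
from collections import defaultdict

# (name, linkto) pairs taken from the dataset sample in this post.
dataset = [
    ("User 1", "User 2"),
    ("User 2", "User 3"),
    ("User 3", "User 2"),
    ("User 4", "User 2"),
    ("User 5", "User 2"),
    ("User 6", "User 2"),
    ("User 7", "User 2"),
]


def link_friends(pairs):
    """Build the bidirectional friend edges the upsert is meant to create."""
    names = {name for name, _ in pairs}
    friends = defaultdict(set)
    for name, target in pairs:
        if target in names:  # assumption: unknown targets create no edge
            friends[name].add(target)
            friends[target].add(name)
    return {name: sorted(peers) for name, peers in friends.items()}


expected = link_friends(dataset)
for name in sorted(expected):
    print(name, "->", expected[name])
```

With the sample dataset, User 2 ends up with six friends and every other user has only User 2, which is what the query result shows.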