How can I build an xidmap?


(Gao Qi Lin) #1

dgraph live --help

  dgraph live [flags]
  -x, --xidmap string            Directory to store xid to uid mapping

I need to write incremental data every day, so I need an xidmap to avoid creating duplicate nodes.
But how can I export an xidmap from an existing database?


(Gao Qi Lin) #2

I have successfully used badger to export UIDs from a Dgraph database.

Is there a tool to export an xidmap directly?

The code:

package main

import (
	"encoding/binary"
	"fmt"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions
	opts.Dir = "/dgraph/p" // Directory to store posting lists.
	opts.ValueDir = "/dgraph/p"
	db, err := badger.Open(opts)
	if err != nil {
		panic(err)
	}
	defer db.Close()
	err = ForeachUids(db)
	if err != nil {
		panic(err)
	}
}

// ForeachUids prints every key whose value decodes entirely as a single uvarint, interpreted as a uid.
func ForeachUids(db *badger.DB) error {
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchSize = 100
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				uid, n := binary.Uvarint(v) // decode the value as a uvarint-encoded uid
				if n == len(v) {
					fmt.Printf("key=[%s] , uid: 0x%x\n", k, uid)
				}
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	return err
}

(Michel Conrado) #3

Dgraph doesn’t use XIDs. It just creates them during a load. The --xidmap flag is only useful for setting up a temporary folder path.

However, you can use external IDs: https://docs.dgraph.io/mutations/#external-ids


(Gao Qi Lin) #4

Hey guys

External IDs can’t solve my current problem.

I use Dgraph to store file parent-process relationships, keyed by file md5.

The first live load works fine, but the next live load does not automatically recognize existing UIDs. On the next load, a new UID is created for the same subject.
That is not what I expected.

My use case is that new file parent-process data needs to be written every day,
so I need to automatically identify the existing UID for the same subject.

Do you have any good suggestions for me?

E.g.:

The first day:

<0000710bf0bdf4394113147bf904da3c> <md5> "0000710bf0bdf4394113147bf904da3c" .
<332feab1435662fc6c672e25beb37be3> <md5> "332feab1435662fc6c672e25beb37be3" .
<0000710bf0bdf4394113147bf904da3c> <pmd5> <332feab1435662fc6c672e25beb37be3> .

The next day:

<0000710bf0bdf4394113147bf904da3c> <pmd5> <c6fa526514b961b5b8a9585d1eff5f9d> .

I want “0000710bf0bdf4394113147bf904da3c” to reuse its existing UID instead of getting a new one, and “c6fa526514b961b5b8a9585d1eff5f9d” to automatically get a new UID.


(Michel Conrado) #5

I believe you should write a program in Python or Go that uses the upsert procedure. Neither the Live loader nor the Bulk loader performs it.

https://docs.dgraph.io/howto/#upsert-procedure
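As a rough illustration of the idea, here is a minimal Go sketch of the upsert logic: look the md5 up first, and create a node only when nothing is found. The `Store` interface and the in-memory `memStore` are hypothetical stand-ins, not part of any Dgraph client. In a real program, both steps would be a query and a mutation run inside a single Dgraph transaction.

```go
package main

import "fmt"

// Store is a hypothetical stand-in for a Dgraph transaction: Lookup plays the
// role of a query on the md5 predicate, Create the role of a mutation.
type Store interface {
	Lookup(md5 string) (uid string, found bool)
	Create(md5 string) (uid string)
}

// memStore is an in-memory fake used only to make this sketch runnable.
// It is not safe for concurrent use.
type memStore struct {
	next uint64
	uids map[string]string
}

func newMemStore() *memStore {
	return &memStore{next: 1, uids: make(map[string]string)}
}

func (s *memStore) Lookup(md5 string) (string, bool) {
	uid, ok := s.uids[md5]
	return uid, ok
}

func (s *memStore) Create(md5 string) string {
	uid := fmt.Sprintf("0x%x", s.next)
	s.next++
	s.uids[md5] = uid
	return uid
}

// upsert returns the uid for md5, creating a node only when none exists yet.
func upsert(s Store, md5 string) string {
	if uid, ok := s.Lookup(md5); ok {
		return uid
	}
	return s.Create(md5)
}

func main() {
	s := newMemStore()

	// First day: both nodes are new.
	child := upsert(s, "0000710bf0bdf4394113147bf904da3c")
	parent := upsert(s, "332feab1435662fc6c672e25beb37be3")
	fmt.Printf("<%s> <pmd5> <%s> .\n", child, parent) // <0x1> <pmd5> <0x2> .

	// Next day: the child is reused, only the new parent gets a fresh uid.
	again := upsert(s, "0000710bf0bdf4394113147bf904da3c")
	newParent := upsert(s, "c6fa526514b961b5b8a9585d1eff5f9d")
	fmt.Printf("<%s> <pmd5> <%s> .\n", again, newParent) // <0x1> <pmd5> <0x3> .
}
```

The uids printed here are generated by the fake store; with a real cluster they would be whatever Dgraph assigns.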


(Gao Qi Lin) #6

Thanks for your reply.

But now I need to maintain a third-party database to store the md5 -> UID map.

Could the bulk loader support incremental-data scenarios like this in the future?
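To make the md5 -> UID map idea concrete, here is a small Go sketch of how I imagine using it before each load: every `<md5>` reference that is already in the map is rewritten to its UID, and every unknown one becomes a blank node so that a new UID is assigned. The `rewriteTriple` helper, the `known` map, and the UID value `0x1` are made up for illustration, and I am assuming that a blank node label built from the md5 is acceptable to the loader.

```go
package main

import (
	"fmt"
	"regexp"
)

// md5Ref matches an angle-bracketed 32-character lowercase hex string.
// Predicates such as <md5> or <pmd5> do not match it.
var md5Ref = regexp.MustCompile(`<([0-9a-f]{32})>`)

// rewriteTriple replaces every <md5> reference in line with <uid> when the
// md5 is present in known, and with the blank node _:md5 otherwise.
func rewriteTriple(line string, known map[string]string) string {
	return md5Ref.ReplaceAllStringFunc(line, func(m string) string {
		md5 := m[1 : len(m)-1] // strip the surrounding < and >
		if uid, ok := known[md5]; ok {
			return "<" + uid + ">"
		}
		return "_:" + md5
	})
}

func main() {
	// Hypothetical map saved after the first day's load.
	known := map[string]string{
		"0000710bf0bdf4394113147bf904da3c": "0x1",
	}
	line := `<0000710bf0bdf4394113147bf904da3c> <pmd5> <c6fa526514b961b5b8a9585d1eff5f9d> .`
	fmt.Println(rewriteTriple(line, known))
	// <0x1> <pmd5> _:c6fa526514b961b5b8a9585d1eff5f9d .
}
```

The map still has to be updated with the UIDs assigned to the new blank nodes after each load, which is the maintenance burden I would like to avoid.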


(Michel Conrado) #7

I don’t believe so. The loaders are made just for loading RDF. They have no other function.
To add this to the loaders, we would first need to settle on something universally accepted, and that would have to come from the community.


(yeahvip) #8

I want to use dgraph live to load new data incrementally, because upserts may be slower. But how can I build the mapping between UIDs and XIDs? Can I use a unique identifier for every triple in place of the UID-to-XID mapping?


(Michel Conrado) #9

Hey @yeahvip,

When you have a question, please open a new topic and reference other topics instead of commenting on them. Writing in an old topic can trigger emails to the people involved, and not everyone likes to receive emails about old subjects.

About your question.

XID mapping only exists for entities. Consequently, edges that belong to an entity must contain its blank node in order to be mapped.

Blank nodes (unique identifiers) are used in that case. https://tour.dgraph.io/intro/5/

In general, Dgraph does not use XIDs, only UIDs. Dgraph handles this internally, and it is not open to user manipulation.

If you have a URI, URL, UUID, GUID, BIC, UDID, SSID, NPI, shortuuid, Snowflake, MongoID, etc., you must use the approach described here: https://docs.dgraph.io/master/mutations/#external-ids.
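As a minimal sketch of that approach, the external ID is stored as an ordinary predicate value and looked up with a query before writing. The Go helpers below only build the query and N-Quad text. The predicate name `xid`, the helper names, and the use of Go's `%q` quoting are assumptions for illustration, and the `eq` lookup assumes the predicate is indexed in the schema (for example `xid: string @index(exact) .`).

```go
package main

import "fmt"

// lookupQuery builds a query that finds the node whose xid predicate equals id.
func lookupQuery(id string) string {
	return fmt.Sprintf("{\n  node(func: eq(xid, %q)) {\n    uid\n  }\n}", id)
}

// createNQuad builds an N-Quad that creates a new node carrying id as its xid.
// It is meant to be used only when lookupQuery returned no node.
func createNQuad(id string) string {
	return fmt.Sprintf("_:node <xid> %q .", id)
}

func main() {
	id := "0000710bf0bdf4394113147bf904da3c"
	fmt.Println(lookupQuery(id))
	fmt.Println(createNQuad(id))
}
```

If the query returns a uid, use it as the subject of the new triples. Otherwise, send the N-Quad to create the node first.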

Cheers.