Optimal way to ingest MongoDB records into Dgraph

I have 1 lakh (100,000) records in MongoDB that I need to ingest into Dgraph.

Steps I followed:

  1. Created an RDF file for the records using Java code. Creating the RDF file for 1 lakh records took nearly 1 hour.
  2. Used the bulk loader to ingest the data (a typical invocation is shown below). I got errors due to improperly formatted RDF tuples, so I used the --ignore-errors flag.
  3. The data got ingested, but only 60k records made it in.
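For reference, a minimal bulk loader invocation for step 2 would look something like this (file names are illustrative; -f takes the RDF file(s) and -s the Dgraph schema file):

dgraph bulk -f records.rdf -s records.schema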

What I wanted to know:

  1. Can we check what the errors were during bulk loading? If yes, how?
  2. What is the best approach to convert a MongoDB document to RDF?
  3. Can you suggest a good approach for parsing the RDF files properly?
  4. What might be the reasons for such a large difference in the number of records that got ingested?

Don’t use --ignore-errors. Can you perhaps show me some examples of “good” RDF triples and some “bad” RDF triples?

Also, some bigger-picture detail on how the data is stored in Mongo would be nice.

If I don't use --ignore-errors, the bulk loader reports the first RDF tuple that is not in the proper format and then stops.

For example:
2021/05/28 11:58:44 while lexing _:qkp4pisseduqebkll6gooefd8j ““Chris”” . at line 1 column 43: Invalid input: C at lexText
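The literal here appears to be wrapped in curly typographic quotes (“ ”) rather than the plain ASCII double quotes the N-Quads lexer requires, which is presumably why lexing fails at column 43. For comparison, a tuple the lexer accepts versus the rejected form (the <firstName> predicate is a stand-in, since the error message elides the predicate):

_:qkp4pisseduqebkll6gooefd8j <firstName> "Chris" .
_:qkp4pisseduqebkll6gooefd8j <firstName> ““Chris”” .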

My file is nearly 1 GB in size; searching for this particular tuple manually and fixing it would be very time-consuming and hard. So how can I parse this file, or find out which tuples are badly formatted or were rejected by the bulk loader?
Should I write a validation script to check each RDF tuple before writing it into the file, or is there a cleaner solution than that?
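One way to do that validation, as a minimal sketch assuming Apache Jena 4.x is on the classpath (the class name and per-line strategy are illustrative, not an official tool): read the file line by line, parse each line independently as N-Quads, and log the line numbers that fail, so one bad tuple no longer aborts the whole run.

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDFLib;

public class NQuadsLineCheck {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.isBlank()) continue;
                try {
                    // Parse this single line as N-Quads and discard the output;
                    // we only care whether it is syntactically valid.
                    RDFParser.fromString(line)
                             .lang(Lang.NQUADS)
                             .parse(StreamRDFLib.sinkNull());
                } catch (Exception e) {
                    // Report the offending line so it can be fixed or dropped
                    // before running the bulk loader.
                    System.out.printf("line %d: %s%n", lineNo, e.getMessage());
                }
            }
        }
    }
}

Jena also ships a riot command-line tool with a --validate mode that can check a whole file in one pass.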

{ 
    "_id" : ObjectId("570f5e683e42ab22383c4bc7"), 
    "userId" : "3hvnvalevcepje0uf07oufds7p", 
    "TotalExperience" : NumberInt(117), 
    "dateCreated" : "2016-01-21", 
    "education" : [
        {
            "degree" : "Bachelor of Arts (BA)", 
            "fieldOfStudy" : "Painting", 
            "schoolName" : "Osmania University", 
            "schoolAndUniversity" : "Osmania University", 
            "startDate" : "2006-01-01", 
            "endDate" : "2011-01-01", 
            "isHighEducation" : false, 
            "sy" : NumberInt(106), 
            "ey" : NumberInt(111), 
            "sm" : NumberInt(0), 
            "em" : NumberInt(0), 
            "period" : NumberInt(60), 
            "recentEducation" : false
        }, 
        {
            "degree" : "Bachelor of Arts (BA)", 
            "fieldOfStudy" : "Painting", 
            "schoolName" : "PG College Of Law, Basheerbagh", 
            "schoolAndUniversity" : "PG College Of Law, Basheerbagh", 
            "universityName" : "PG College Of Law, Basheerbagh", 
            "startDate" : "2006-01-01", 
            "endDate" : "2011-01-01", 
            "isHighEducation" : true, 
            "fieldID" : "169898b4-20a8-4309-83fe-bd6cd5836882", 
            "sy" : NumberInt(106), 
            "ey" : NumberInt(111), 
            "sm" : NumberInt(0), 
            "em" : NumberInt(0), 
            "period" : NumberInt(60), 
            "recentEducation" : false
        }
    ], 
    "experience" : [
        {
            "priority" : 0.0, 
            "company" : "Phenom People Pvt Ltd.", 
            "title" : "Senior Product Analyst", 
            "jobTitle" : "Senior Product Analyst", 
            "industry" : "Information Technology & Services", 
            "size" : "51-200", 
            "type" : "Privately Held", 
            "startDate" : "2015-12-01", 
            "isCurrent" : true, 
            "fieldID" : "f3b0f5f8-b587-4460-af94-d3fb2a9d4de5", 
            "company_org" : "Phenom People", 
            "jobTitle_org" : "Senior Product Analyst", 
            "providedByUser" : false, 
            "endDate" : "1111-02-01", 
            "sy" : NumberInt(115), 
            "ey" : NumberInt(-789), 
            "sm" : NumberInt(11), 
            "em" : NumberInt(1), 
            "period" : NumberInt(62)
        }, 
        {
            "priority" : 0.0, 
            "company" : "Netwin Solutions India Pvt Ltd.", 
            "title" : "Senior UI Designer", 
            "jobTitle" : "Senior User Interface Designer", 
            "industry" : "Computer Software", 
            "size" : "51-200", 
            "type" : "Privately Held", 
            "startDate" : "2014-10-01", 
            "endDate" : "2015-12-01", 
            "isCurrent" : false, 
            "fieldID" : "3c2e7f0c-4aaa-4338-8044-aa1bc4f39d77", 
            "company_org" : "Netwin Solutions, Inc", 
            "jobTitle_org" : "Senior UI Designer", 
            "providedByUser" : false, 
            "sy" : NumberInt(114), 
            "ey" : NumberInt(115), 
            "sm" : NumberInt(9), 
            "em" : NumberInt(11), 
            "period" : NumberInt(14)
        }
    ], 
    "firstName" : "Partha", 
    "lastName" : "Sarathy. Aila", 
    "location" : {
        "location" : "Hyderābād, Telangana, India", 
        "city" : "", 
        "state" : "", 
        "country" : "", 
        "latitude" : NumberInt(-9999), 
        "longitude" : NumberInt(-9999)
    }, 
    "refNum" : "PHENA0059",  
    "ts" : ISODate("2021-03-12T05:26:23.130+0000"), 
    "updatedDate" : "2021-01-18",  
    "designation" : "Senior Product Analyst", 
    "imProfileId" : null, 
    "imStatus" : "inactive", 
    "internal" : false, 
    "newLocation" : {
        "source" : "SYSTEM_PUP_Profile_Enrichment", 
        "createdDate" : ISODate("2021-01-13T14:32:29.697+0000"), 
        "location" : "Hyderābād, Telangana, India", 
        "fieldId" : "5739b06f-8688-4528-867d-3161259e1e01"
    }, 
    "skillList" : [
        {
            "value" : "Adobe Creative Suite", 
            "value_org" : "Adobe Creative Suite", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Adobe Dreamweaver", 
            "value_org" : "Adobe Dreamweaver", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Adobe Illustrator", 
            "value_org" : "Adobe Illustrator", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Adobe Photoshop", 
            "value_org" : "Adobe Photoshop", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        },
        {
            "value" : "Layout", 
            "value_org" : "Layout"
        }, 
        {
            "value" : "Layout & Design", 
            "value_org" : "Layout & Design", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Logo Design", 
            "value_org" : "Logo Design", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        },
        {
            "value" : "Poster", 
            "value_org" : "Poster"
        },
        {
            "value" : "Typography", 
            "value_org" : "Typography", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "UI", 
            "value_org" : "UI"
        }
    ], 
    "userType" : "external"
}

This is a sample record; we have 1 lakh (100,000) such records.
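For illustration, one way the top of such a record could be flattened into Dgraph N-Quads, with the person as one node and each education entry as a child node (the predicate names and blank-node labels here are made up for the sketch):

_:user <userId> "3hvnvalevcepje0uf07oufds7p" .
_:user <firstName> "Partha" .
_:user <TotalExperience> "117"^^<xs:int> .
_:user <education> _:edu0 .
_:edu0 <degree> "Bachelor of Arts (BA)" .
_:edu0 <schoolName> "Osmania University" .
_:edu0 <startDate> "2006-01-01" .

Scalar fields become literal-valued triples and arrays of sub-documents become edges to child nodes; whatever code generates this also has to escape quotes and normalize typographic punctuation in string values before writing each literal.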

This JSON syntax isn't supported: ObjectId(...), NumberInt(...), and ISODate(...) are MongoDB shell extensions, not standard JSON.

@chewxy @hardik maybe we could add support for this in the parser. What bothers me is the concept of “documents” being injected into a graph context; they are not the same thing. Ingesting Mongo data without contextualizing the entities will be a mess, with many duplicated entities.
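To make that duplication risk concrete with a sketch (node labels and the <studiedAt> predicate are illustrative): if the converter mints a fresh blank node for every sub-document, two profiles that mention the same school produce two disconnected school nodes:

_:doc1_edu <schoolName> "Osmania University" .
_:doc2_edu <schoolName> "Osmania University" .

Deriving a deterministic label from the entity itself keeps it as a single node, at least within one load, since identical blank-node labels map to the same UID in a single bulk load run:

_:school_osmania_university <schoolName> "Osmania University" .
_:user1 <studiedAt> _:school_osmania_university .
_:user2 <studiedAt> _:school_osmania_university .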