Optimal way to ingest MongoDB records into Dgraph

I have 1 lakh (100,000) records in MongoDB that I need to ingest into Dgraph.

Steps I followed:

  1. Created an RDF file for the records using Java code. Creating the RDF file for 1 lakh records took nearly 1 hour.
  2. Used the bulk loader to ingest the data (a typical invocation is shown below). I got errors due to improperly formatted RDF tuples, so I used the --ignore-errors flag.
  3. The data got ingested, but only 60k records made it in.
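For reference, a minimal bulk loader invocation for step 2 would look something like this (file names are illustrative; -f takes the RDF file(s) and -s the Dgraph schema file):

dgraph bulk -f records.rdf -s records.schema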

What I wanted to know:

  1. Can we check what the errors were during bulk loading? If yes, how?
  2. What is the best approach to convert a MongoDB document to RDF?
  3. Can you suggest a good approach for parsing the RDF files properly?
  4. What might be the reasons for such a large difference in the number of records that got ingested?

Don’t use --ignore-errors. Can you perhaps show me some examples of “good” RDF triples and some “bad” RDF triples?

Also, some bigger-picture detail on how the data is stored in Mongo would be nice.

If I don't use --ignore-errors, the bulk loader reports the first RDF tuple that is not in the proper format and then stops.

For example:
2021/05/28 11:58:44 while lexing _:qkp4pisseduqebkll6gooefd8j ““Chris”” . at line 1 column 43: Invalid input: C at lexText
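The literal here appears to be wrapped in curly typographic quotes (“ ”) rather than the plain ASCII double quotes the N-Quads lexer requires, which is presumably why lexing fails at column 43. For comparison, a tuple the lexer accepts versus the rejected form (the <firstName> predicate is a stand-in, since the error message elides the predicate):

_:qkp4pisseduqebkll6gooefd8j <firstName> "Chris" .
_:qkp4pisseduqebkll6gooefd8j <firstName> ““Chris”” .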

My file is nearly 1 GB in size; searching for this particular tuple manually and fixing it would be very time-consuming and hard. So how can I parse this file, or find out which tuples are badly formatted or were rejected by the bulk loader?
Should I write a validation script to check each RDF tuple before writing it into the file, or is there a cleaner solution than that?
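One way to do that validation, as a minimal sketch assuming Apache Jena 4.x is on the classpath (the class name and per-line strategy are illustrative, not an official tool): read the file line by line, parse each line independently as N-Quads, and log the line numbers that fail, so one bad tuple no longer aborts the whole run.

import java.io.BufferedReader;
import java.io.FileReader;

import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDFLib;

public class NQuadsLineCheck {
    public static void main(String[] args) throws Exception {
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            long lineNo = 0;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.isBlank()) continue;
                try {
                    // Parse this single line as N-Quads and discard the output;
                    // we only care whether it is syntactically valid.
                    RDFParser.fromString(line)
                             .lang(Lang.NQUADS)
                             .parse(StreamRDFLib.sinkNull());
                } catch (Exception e) {
                    // Report the offending line so it can be fixed or dropped
                    // before running the bulk loader.
                    System.out.printf("line %d: %s%n", lineNo, e.getMessage());
                }
            }
        }
    }
}

Jena also ships a riot command-line tool with a --validate mode that can check a whole file in one pass.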

{ 
    "_id" : ObjectId("570f5e683e42ab22383c4bc7"), 
    "userId" : "3hvnvalevcepje0uf07oufds7p", 
    "TotalExperience" : NumberInt(117), 
    "dateCreated" : "2016-01-21", 
    "education" : [
        {
            "degree" : "Bachelor of Arts (BA)", 
            "fieldOfStudy" : "Painting", 
            "schoolName" : "Osmania University", 
            "schoolAndUniversity" : "Osmania University", 
            "startDate" : "2006-01-01", 
            "endDate" : "2011-01-01", 
            "isHighEducation" : false, 
            "sy" : NumberInt(106), 
            "ey" : NumberInt(111), 
            "sm" : NumberInt(0), 
            "em" : NumberInt(0), 
            "period" : NumberInt(60), 
            "recentEducation" : false
        }, 
        {
            "degree" : "Bachelor of Arts (BA)", 
            "fieldOfStudy" : "Painting", 
            "schoolName" : "PG College Of Law, Basheerbagh", 
            "schoolAndUniversity" : "PG College Of Law, Basheerbagh", 
            "universityName" : "PG College Of Law, Basheerbagh", 
            "startDate" : "2006-01-01", 
            "endDate" : "2011-01-01", 
            "isHighEducation" : true, 
            "fieldID" : "169898b4-20a8-4309-83fe-bd6cd5836882", 
            "sy" : NumberInt(106), 
            "ey" : NumberInt(111), 
            "sm" : NumberInt(0), 
            "em" : NumberInt(0), 
            "period" : NumberInt(60), 
            "recentEducation" : false
        }
    ], 
    "experience" : [
        {
            "priority" : 0.0, 
            "company" : "Phenom People Pvt Ltd.", 
            "title" : "Senior Product Analyst", 
            "jobTitle" : "Senior Product Analyst", 
            "industry" : "Information Technology & Services", 
            "size" : "51-200", 
            "type" : "Privately Held", 
            "startDate" : "2015-12-01", 
            "isCurrent" : true, 
            "fieldID" : "f3b0f5f8-b587-4460-af94-d3fb2a9d4de5", 
            "company_org" : "Phenom People", 
            "jobTitle_org" : "Senior Product Analyst", 
            "providedByUser" : false, 
            "endDate" : "1111-02-01", 
            "sy" : NumberInt(115), 
            "ey" : NumberInt(-789), 
            "sm" : NumberInt(11), 
            "em" : NumberInt(1), 
            "period" : NumberInt(62)
        }, 
        {
            "priority" : 0.0, 
            "company" : "Netwin Solutions India Pvt Ltd.", 
            "title" : "Senior UI Designer", 
            "jobTitle" : "Senior User Interface Designer", 
            "industry" : "Computer Software", 
            "size" : "51-200", 
            "type" : "Privately Held", 
            "startDate" : "2014-10-01", 
            "endDate" : "2015-12-01", 
            "isCurrent" : false, 
            "fieldID" : "3c2e7f0c-4aaa-4338-8044-aa1bc4f39d77", 
            "company_org" : "Netwin Solutions, Inc", 
            "jobTitle_org" : "Senior UI Designer", 
            "providedByUser" : false, 
            "sy" : NumberInt(114), 
            "ey" : NumberInt(115), 
            "sm" : NumberInt(9), 
            "em" : NumberInt(11), 
            "period" : NumberInt(14)
        }
    ], 
    "firstName" : "Partha", 
    "lastName" : "Sarathy. Aila", 
    "location" : {
        "location" : "Hyderābād, Telangana, India", 
        "city" : "", 
        "state" : "", 
        "country" : "", 
        "latitude" : NumberInt(-9999), 
        "longitude" : NumberInt(-9999)
    }, 
    "refNum" : "PHENA0059",  
    "ts" : ISODate("2021-03-12T05:26:23.130+0000"), 
    "updatedDate" : "2021-01-18",  
    "designation" : "Senior Product Analyst", 
    "imProfileId" : null, 
    "imStatus" : "inactive", 
    "internal" : false, 
    "newLocation" : {
        "source" : "SYSTEM_PUP_Profile_Enrichment", 
        "createdDate" : ISODate("2021-01-13T14:32:29.697+0000"), 
        "location" : "Hyderābād, Telangana, India", 
        "fieldId" : "5739b06f-8688-4528-867d-3161259e1e01"
    }, 
    "skillList" : [
        {
            "value" : "Adobe Creative Suite", 
            "value_org" : "Adobe Creative Suite", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Adobe Dreamweaver", 
            "value_org" : "Adobe Dreamweaver", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Adobe Illustrator", 
            "value_org" : "Adobe Illustrator", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Adobe Photoshop", 
            "value_org" : "Adobe Photoshop", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        },
        {
            "value" : "Layout", 
            "value_org" : "Layout"
        }, 
        {
            "value" : "Layout & Design", 
            "value_org" : "Layout & Design", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "Logo Design", 
            "value_org" : "Logo Design", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        },
        {
            "value" : "Poster", 
            "value_org" : "Poster"
        },
        {
            "value" : "Typography", 
            "value_org" : "Typography", 
            "skillSource" : [
                "SYSTEM_PUP_Profile_Enrichment"
            ], 
            "latestDate" : ISODate("2021-01-18T12:23:27.686+0000")
        }, 
        {
            "value" : "UI", 
            "value_org" : "UI"
        }
    ], 
    "userType" : "external"
}

This is a sample record; we have 1 lakh (100,000) such records.
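For illustration, one way the top of such a record could be flattened into Dgraph N-Quads, with the person as one node and each education entry as a child node (the predicate names and blank-node labels here are made up for the sketch):

_:user <userId> "3hvnvalevcepje0uf07oufds7p" .
_:user <firstName> "Partha" .
_:user <TotalExperience> "117"^^<xs:int> .
_:user <education> _:edu0 .
_:edu0 <degree> "Bachelor of Arts (BA)" .
_:edu0 <schoolName> "Osmania University" .
_:edu0 <startDate> "2006-01-01" .

Scalar fields become literal-valued triples and arrays of sub-documents become edges to child nodes; whatever code generates this also has to escape quotes and normalize typographic punctuation in string values before writing each literal.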

This JSON syntax isn't supported: ObjectId(...), NumberInt(...), and ISODate(...) are MongoDB shell extensions, not standard JSON.

@chewxy @hardik maybe we could add support for this in the parser. What bothers me is the concept of “documents” being injected into a graph context; they are not the same thing. Ingesting Mongo data without contextualizing the entities will be a mess, with many duplicated entities.
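To make that duplication risk concrete with a sketch (node labels and the <studiedAt> predicate are illustrative): if the converter mints a fresh blank node for every sub-document, two profiles that mention the same school produce two disconnected school nodes:

_:doc1_edu <schoolName> "Osmania University" .
_:doc2_edu <schoolName> "Osmania University" .

Deriving a deterministic label from the entity itself keeps it as a single node, at least within one load, since identical blank-node labels map to the same UID in a single bulk load run:

_:school_osmania_university <schoolName> "Osmania University" .
_:user1 <studiedAt> _:school_osmania_university .
_:user2 <studiedAt> _:school_osmania_university .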