Complex Bulk Upsert Howto

I’m currently attempting to use Dgraph to store data that is web-scraped with Node.js.

My schema currently looks like this:

type GraphicsCardPriceDatum {
  date: DateTime!
  price: Float!
}

type GraphicsCard {
  passmarkId: String! @id
  cardName: String! @search(by: [term])
  lookupUrl: String!
  lastUpdated: DateTime! @search
  firstUpdated: DateTime @search
  priceHistory: [GraphicsCardPriceDatum]
  g3d: Int
  g2d: Int
  samples: Int
  busInterface: String
  maxMemory: String
  coreClock: String
  memoryClock: String
  directX: String
  openGL: String
  maxTdp: String
  powerPerf: String
  category: String @search(by: [hash])
  rank: Int
}

The webscraper runs every week, generating a JSON array with one object per graphics card. I would like to perform an upsert that either creates a new node for each datum, or updates an existing node if its passmarkId matches. Note that passmarkIds CANNOT be Dgraph’s internal UIDs. The update logic is simply to replace all values for a particular card with the available new values, except for priceHistory.

For priceHistory, if the node already exists, its existing priceHistory list should be concatenated with a new price + date datum, and updated. If the node is new, a single-item list with the just-scraped price + date datum should be created.
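
The update rule described above can be sketched in plain JavaScript, independent of any Dgraph API. This is only an illustration of the intended semantics: the function name `mergeCard`, the `price` field on the scraped record, and setting `firstUpdated` on creation are all assumptions, and the real scraper output may be shaped differently.

```javascript
// Hypothetical helper: merge one scraped record into an existing card.
// Pass `null` as `existing` when the card has not been seen before.
// `scraped.price` is an assumed field name for the just-scraped price.
function mergeCard(existing, scraped, now) {
  const { price, ...fields } = scraped;
  const datum = { date: now, price };

  // Keep only the scraped fields that actually carry a value, so that
  // missing values do not overwrite existing ones.
  const available = Object.fromEntries(
    Object.entries(fields).filter(([, v]) => v !== undefined && v !== null)
  );

  if (!existing) {
    // New card: start a single-item price history.
    return { ...available, firstUpdated: now, lastUpdated: now, priceHistory: [datum] };
  }

  // Existing card: replace values, but append to the price history.
  return {
    ...existing,
    ...available,
    lastUpdated: now,
    priceHistory: [...(existing.priceHistory || []), datum],
  };
}

const now = "2021-01-01T00:00:00Z";
const created = mergeCard(null, { passmarkId: "P1", cardName: "Card A", price: 15 }, now);
const updated = mergeCard(created, { passmarkId: "P1", rank: 3, price: 25 }, now);
console.log(updated.priceHistory.map((d) => d.price)); // [ 15, 25 ]
```

The open question is how to get the same effect inside Dgraph without one round trip per card.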

I’m sure my newness to graph DBs and dgraph is showing here, but it isn’t immediately obvious to me how to do this in a constant number of requests.

I’ve attempted to do this with a GraphQL query and then an add + update mutation, but I couldn’t quite figure out how to do a bulk update. I’ve also attempted to figure out how to do this using the upsert block, but I can’t figure out how to inject my custom update logic between the query and mutation step.

Hate asking other people to write code for me in a question, but could someone advise on how to do this? Thanks.

I’m pretty new, too, but I distinctly remember seeing a bit about conditional upserts when I was reading the docs.

Definitely doesn’t cover the bulk part, but may help with the custom logic?

Yeah I saw this. I feel like all of the pieces are there, dgraph can do what I want it to, I just can’t deduce what exactly I need to write from the documentation. I also feel like the newly added lambdas would be a way to do what I want, but I’m unclear as to whether or not they can be used for mutations.

Welcome @Mantissa-23!
Here is an approach that might suit your use case. You can use the lambda feature in GraphQL to create a batch operation, and then send small batches for Dgraph to process. This approach involves the usual GraphQL plus a bit of JavaScript. I think it is a good fit because it leaves some elbow room for explicit error handling.

In the example below, I check whether the graphics card exists. If it does, I update it. You can extend this logic to add a new graphics card if it does not exist; together, that constitutes an “upsert” operation.
Please note: I have used Slash GraphQL to do this, as it makes it easier to experiment and iterate quickly.
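
On the Node.js side, splitting the scraped array into small batches could look roughly like the sketch below. The helper names `chunk` and `toMessage`, the batch size, and the `price` field on the scraped record are assumptions for illustration; the message shape follows the GraphicsCardMessage input used in this post.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Map one scraped record to the GraphicsCardMessage shape.
// `history` is a single object here, matching the example call in this post;
// `card.price` is an assumed field name on the scraped record.
function toMessage(card) {
  return { passmarkId: card.passmarkId, history: { price: card.price } };
}

const scraped = [
  { passmarkId: "P1", price: 15 },
  { passmarkId: "P2", price: 25 },
  { passmarkId: "P3", price: 35 },
];
const batches = chunk(scraped.map(toMessage), 2);
console.log(batches.length); // 2
```

Each element of `batches` can then be sent as the `nodes` array of one batch mutation call.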

Schema:

type GraphicsCardPriceDatum {
  date: DateTime
  price: Float
}

type GraphicsCard {
  passmarkId: String! @id
  cardName: String @search(by: [term])
  lookupUrl: String
  lastUpdated: DateTime @search
  firstUpdated: DateTime @search
  priceHistory: [GraphicsCardPriceDatum]
  g3d: Int
  g2d: Int
  samples: Int
  busInterface: String
  maxMemory: String
  coreClock: String
  memoryClock: String
  directX: String
  openGL: String
  maxTdp: String
  powerPerf: String
  category: String @search(by: [hash])
  rank: Int
}

input GraphicsCardHistoryMessage {
  price: Float
}

input GraphicsCardMessage {
  passmarkId: String
  history: [GraphicsCardHistoryMessage]
}

input GraphicsCardInput {
  nodes: [GraphicsCardMessage!]!
}

type Mutation {
  processGraphicsCardBatch(gcBatch: GraphicsCardInput!): String @lambda
}

The “processGraphicsCardBatch” mutation accepts an array of nodes, which forms a mini-batch. The associated lambda is shown below. For each array element, it checks whether a card with the given passmarkId exists. If it does, the price history is updated. (Please extend the catch clause to add a new card and complete the use case.)
I am only using the important fields to illustrate. Please extend it as you see fit.

async function processGraphicsCardBatch({args, graphql}) {
  let logs = "Success"
  for (const node of args.gcBatch.nodes) {
    console.log(node)

    // Look up the card by passmarkId (the external id, not Dgraph's uid).
    const cardCheck = await graphql(`query cardCheck($passmarkId: String) {
      queryGraphicsCard(filter: {passmarkId: {eq: $passmarkId}}) {
        passmarkId
      }
    }`, {"passmarkId": node.passmarkId})

    try {
      // Throws a TypeError when no card matches; handled in the catch below.
      const passmarkId = cardCheck.data.queryGraphicsCard[0].passmarkId
      console.log(" passmark id exists in backend :::: " + passmarkId)

      // The schema declares history as a list, but the example call sends a
      // single object; normalize so that both shapes work.
      for (const entry of [].concat(node.history || [])) {
        console.log(entry.price)
        // Only the price is set here, so date stays null on the new datum.
        const results = await graphql(`mutation ($passmarkId: String!, $price: Float) {
          updateGraphicsCard(input: {filter: {passmarkId: {eq: $passmarkId}}, set: {priceHistory: {price: $price}}}) {
            numUids
          }
        }`, {"passmarkId": passmarkId, "price": entry.price})
        console.log(results)
      }
    } catch (err) {
      logs = " passmark does not exist "
      // If the error is due to the record not being found, add a new GraphicsCard here.
    }
  }
  return logs
}

self.addGraphQLResolvers({
    "Mutation.processGraphicsCardBatch": processGraphicsCardBatch
})
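
One possible way to fill in the missing "add" branch is sketched below. It is not tested against a live backend. It assumes that the `graphql` helper resolves to a standard GraphQL response object with `data` and, on failure, `errors`, and that `addGraphicsCard` accepts a nested `priceHistory` entry in the same way `updateGraphicsCard` does above. The names `run` and `upsertCard` are made up for this example.

```javascript
// Run one GraphQL operation and fail loudly if the response carries errors.
// Assumption: `graphql` resolves to { data, errors } like a regular response.
async function run(graphql, query, variables) {
  const result = await graphql(query, variables);
  if (result.errors && result.errors.length > 0) {
    throw new Error(JSON.stringify(result.errors));
  }
  return result.data;
}

const CHECK = `query ($passmarkId: String!) {
  queryGraphicsCard(filter: {passmarkId: {eq: $passmarkId}}) { passmarkId }
}`;

const UPDATE = `mutation ($passmarkId: String!, $price: Float) {
  updateGraphicsCard(input: {filter: {passmarkId: {eq: $passmarkId}}, set: {priceHistory: {price: $price}}}) { numUids }
}`;

const ADD = `mutation ($passmarkId: String!, $price: Float) {
  addGraphicsCard(input: {passmarkId: $passmarkId, priceHistory: {price: $price}}) { numUids }
}`;

// Upsert one (passmarkId, price) pair: update the card if it exists,
// otherwise create it with a single-item price history.
async function upsertCard(graphql, passmarkId, price) {
  const data = await run(graphql, CHECK, { passmarkId });
  const exists = (data.queryGraphicsCard || []).length > 0;
  await run(graphql, exists ? UPDATE : ADD, { passmarkId, price });
  return exists ? "updated" : "added";
}
```

Inside the lambda's loop, calling `upsertCard(graphql, passmarkId, price)` for each record would replace the try/catch, so that a failed mutation is no longer reported as a missing card.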

Let’s seed a card with passmarkId “P1”.

mutation AddGC {
  addGraphicsCard(input: {passmarkId: "P1"}) {
    numUids
  }
}

Now let’s call “processGraphicsCardBatch” as below. We are sending a batch of two records, both for card “P1”.

mutation BatchMutation {
  processGraphicsCardBatch(gcBatch: {nodes: 
    [{passmarkId: "P1", history: {price: 15}},
     {passmarkId: "P1", history: {price: 25}}
    ]})
}
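
From the scraper, the same call would more likely be made with GraphQL variables than with inline literals. The following is a rough sketch under a few assumptions: `endpoint` is a placeholder for your backend's GraphQL URL, `buildBatchRequest` and `sendBatch` are hypothetical helper names, and the runtime provides a global `fetch` (for example Node.js 18 or later). It is worth logging `args` once in the lambda to confirm that `history` arrives in the shape the lambda expects when sent this way.

```javascript
const BATCH_MUTATION = `mutation BatchMutation($gcBatch: GraphicsCardInput!) {
  processGraphicsCardBatch(gcBatch: $gcBatch)
}`;

// Build the JSON body of a GraphQL-over-HTTP request for one batch.
function buildBatchRequest(nodes) {
  return { query: BATCH_MUTATION, variables: { gcBatch: { nodes } } };
}

// Send one batch. `endpoint` is a placeholder, not a real URL.
async function sendBatch(endpoint, nodes) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBatchRequest(nodes)),
  });
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.json();
}

// Same payload as the inline mutation above, expressed as variables.
const body = buildBatchRequest([
  { passmarkId: "P1", history: { price: 15 } },
  { passmarkId: "P1", history: { price: 25 } },
]);
console.log(JSON.stringify(body.variables));
```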

Querying the card afterwards shows both prices in priceHistory. The date is null because the lambda above only sets the price.
Query:

query MyQuery {
  queryGraphicsCard(filter: {passmarkId: {eq: "P1"}}) {
    passmarkId
    priceHistory {
      date
      price
    }
  }
}

Results:

{
  "data": {
    "queryGraphicsCard": [
      {
        "passmarkId": "P1",
        "priceHistory": [
          {
            "date": null,
            "price": 15
          },
          {
            "date": null,
            "price": 25
          }
        ]
      }
    ]
  }
}

Please review this approach.