How do you go from GraphQL to RDF for bulk loading?

tonicebrian · July 5, 2021, 9:32am

Hi,

I have a massive dataset in BigQuery. Now I’m reading the documentation for defining my graph model in GraphQL. Since I’m already familiar with RDF I just want to understand how to translate a given GraphQL model into RDF triples, because I need to serialize from what I have in BigQuery to RDF but that should be aligned with the graph model.
Is there any resource explaining how to get from your GraphQL model to the RDF to ingest?

pshaddel · July 5, 2021, 10:34am

Hi @tonicebrian
I’m not very familiar with BigQuery, and we have only been working on our own migration to Dgraph for about two months, but I think it helps to know how a GraphQL schema relates to the actual nodes and edges in Dgraph.
If we have this GraphQL Schema:

type User {
  UserID: ID!
  name: String
  verified: Boolean
  age: Int
}

This will be your dgraph schema:

<User.verified>: bool .
<User.name>: string .
<User.age>: int .

type <User> {
	User.verified
	User.name
	User.age
}

In our case I implemented a converter that generates output matching these structures.
Also, according to the docs, you have another option: you can specify a custom type name and predicate name on the GraphQL type and its fields, like this:

type Person @dgraph(type: "Human-Person") {
    name: String @dgraph(pred: "your_custom_name")
    age: Int
}
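To illustrate the default naming rule described in this post, here is a minimal Python sketch of such a converter. It is not the converter mentioned above: `dql_schema`, its `fields` dictionary, and the `pred_overrides` argument are hypothetical names made up for this example. It assumes only scalar fields, skips `ID` fields (they correspond to the node's UID rather than to a predicate), and does not handle a type renamed with `@dgraph(type: ...)`.

```python
# Hypothetical helper: derive Dgraph (DQL) schema lines from a GraphQL type description.
# GraphQL scalar name -> DQL scalar name.
SCALARS = {
    "String": "string",
    "Int": "int",
    "Float": "float",
    "Boolean": "bool",
    "DateTime": "datetime",
}


def dql_schema(type_name, fields, pred_overrides=None):
    """Build DQL schema text for one GraphQL type.

    fields: dict of field name -> GraphQL scalar name (e.g. {"name": "String"}).
    pred_overrides: dict of field name -> custom predicate name, standing in
    for @dgraph(pred: "...") on that field.
    """
    pred_overrides = pred_overrides or {}
    predicates = []
    lines = []
    for field, gql_type in fields.items():
        if gql_type == "ID":
            continue  # ID maps to the node's UID, not to a predicate
        # Default rule: <TypeName>.<fieldName>, unless an override is given.
        pred = pred_overrides.get(field, f"{type_name}.{field}")
        predicates.append(pred)
        lines.append(f"<{pred}>: {SCALARS[gql_type]} .")
    lines.append("")
    lines.append(f"type <{type_name}> {{")
    lines.extend(f"\t{pred}" for pred in predicates)
    lines.append("}")
    return "\n".join(lines)


user_fields = {"UserID": "ID", "name": "String", "verified": "Boolean", "age": "Int"}
print(dql_schema("User", user_fields))
```

Passing `pred_overrides={"name": "your_custom_name"}` makes the sketch emit `<your_custom_name>: string .` instead of the prefixed default for that field.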

MichelDiz · July 5, 2021, 1:46pm

1). Each predicate of the entity needs a prefix, which is the name of its GraphQL Type.

e.g.

_:Node1 <User.name> "Lucas" .
_:Node1 <User.age> "19" .
_:Node1 <dgraph.type> "User" .

2). Note that all entities need the predicate dgraph.type reflecting the GraphQL Type.

I think that’s all. Some GraphQL features are stored with a different structure, so if you are using any of those you need to take a closer look at the docs.
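The two rules above can be sketched in Python as a function that turns one exported row into RDF lines. This is only an illustration: `row_to_nquads` and `literal` are hypothetical helpers, the row is assumed to be a flat dictionary of scalar values, and string escaping is delegated to `json.dumps` on the assumption that its escapes are acceptable to the loader.

```python
import json


def literal(value):
    """Render a scalar as a double-quoted RDF literal."""
    if isinstance(value, bool):
        # Check bool before anything else: bool is a subclass of int in Python.
        value = "true" if value else "false"
    # json.dumps adds the surrounding quotes and escapes quotes,
    # backslashes and control characters.
    return json.dumps(str(value), ensure_ascii=False)


def row_to_nquads(type_name, blank_label, row):
    """Rule 1: prefix every predicate with the GraphQL type name.
    Rule 2: add a dgraph.type triple for the entity."""
    subject = f"_:{blank_label}"
    lines = []
    for field, value in row.items():
        if value is None:
            continue  # a missing value simply produces no triple
        lines.append(f"{subject} <{type_name}.{field}> {literal(value)} .")
    lines.append(f'{subject} <dgraph.type> "{type_name}" .')
    return lines


for line in row_to_nquads("User", "Node1", {"name": "Lucas", "age": 19}):
    print(line)
```

For the sample row this prints the same three lines as the example in this post.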

tonicebrian · July 5, 2021, 1:57pm

Thanks @pshaddel and @MichelDiz, you are putting me on the right track.

What about relations when they involve a composite key? Say I have:

type Student {
   name: String! @id,
   enroled: [Course]
}


type Course {
   name: String! @id,
   year: Int! @id,
   credits: Int
}

How would you define the RDF for bulk loading in this case?

More generally, I’m scanning the documentation but I can’t find the place where questions like this are answered. Am I looking in the wrong place?

MichelDiz · July 5, 2021, 2:18pm

Not sure, I don’t use GraphQL daily. But I think the @id directive is just an eq query under the hood. Or maybe it creates an XID edge? I can’t see why it would. I don’t think anything special is stored there; it uses the name/year predicates with the eq function.

The link itself is a regular edge connection.

tonicebrian · July 5, 2021, 2:21pm

But in RDF you cannot have anything that is not a triple. So you could have:

_:student1 <enroled> _:????

but I don’t know what to put as the Course.

MichelDiz · July 5, 2021, 2:23pm

The course’s blank node or UID.

_:student1 <enroled> _:NewCourse .

Or

_:student1 <enroled> <0xfrer31> .

tonicebrian · July 5, 2021, 2:49pm

But where does that _:NewCourse id come from? Is it a hash over the composite key (name, year)?

amaster507 · July 5, 2021, 3:07pm

This isn’t really a composite key but more like an additional unique index.

E.g. you could not have two courses with the same name even if the years were different, and you could not have two courses for the same year. I don’t think you want to use @id in this situation.

But as far as formatting RDF for types that have an @id field goes, there is nothing you need to do differently from importing RDF data into a type that has no @id field.

The thing to keep in mind is that @id applies logic in the GraphQL API only. This means you could unintentionally import data that does not have unique values for that field, and Dgraph would happily ingest it.

Even if the GraphQL schema does not show a field mapped to the ID (Dgraph UID), there is still one in the underlying data. The “id” for those types is the same as for any other type. To prove this, use that schema and add some data through mutations. Then modify the schema to add an id: ID field and query the data again. Changing the schema does not change the data, but it can provide access to data that is already there and just not visible to the GraphQL API.
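Since the uniqueness of an @id field is not checked during an RDF import, it may be worth validating the rows before serializing them. The following Python sketch shows one way to do that; `duplicate_ids` is a hypothetical helper and the sample rows are invented for illustration.

```python
from collections import Counter


def duplicate_ids(rows, id_field):
    """Return the values of id_field that occur in more than one row."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


# Invented sample data matching the Course type from this thread.
courses = [
    {"name": "Algebra", "year": 2020},
    {"name": "Algebra", "year": 2021},
    {"name": "Physics", "year": 2021},
]

# With @id on both fields, each field must be unique on its own,
# so this data set violates the schema twice.
print(duplicate_ids(courses, "name"))  # ['Algebra']
print(duplicate_ids(courses, "year"))  # [2021]
```

The sample data also shows why two separate @id fields do not behave like a composite key: every (name, year) pair is distinct, yet both fields contain repeated values.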

MichelDiz · July 5, 2021, 3:18pm

Blank nodes (e.g. _:NewCourse) will always get a new UID. A blank node is an unknown or contextual identifier (valid within a transaction). Never use one when you are trying to link your data to an existing entity. Instead of a blank node, use the UID. In order to use an existing UID, you have to query for it before the mutation. You can also use an Upsert Block to do the job.

https://dgraph.io/docs/mutations/upsert-block/#sidebar
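For the initial bulk load, where no UIDs exist yet, one option is to derive the blank-node label from the key columns, so that the row defining a course and every edge pointing at it use the same label. The Python sketch below assumes that the loader resolves a repeated blank-node label to a single UID for the duration of one load; that assumption should be checked against the loader documentation. `blank_label` and `enrolment_nquad` are hypothetical helpers, and the `Student.enroled` predicate follows the type-prefix naming rule described earlier in the thread.

```python
import hashlib


def blank_label(type_name, *key_parts):
    """Build a deterministic blank-node label from a type name and key values.

    The key is hashed so the label contains only letters, digits and
    underscores, whatever characters the key itself contains.
    """
    raw = "\x1f".join(str(part) for part in key_parts)
    digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]
    return f"_:{type_name}_{digest}"


def enrolment_nquad(student_name, course_name, course_year):
    """Emit the edge from a student to a course, both addressed by key."""
    student = blank_label("Student", student_name)
    course = blank_label("Course", course_name, course_year)
    return f"{student} <Student.enroled> {course} ."


# The same key always produces the same label, so this edge points at the
# node whose own triples were written with blank_label("Course", "Algebra", 2021).
print(enrolment_nquad("Alice", "Algebra", 2021))
```

The label only needs to be consistent within the load; it is not stored as the node's identity, so later mutations still have to look up the UID as described above.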
