I have a massive dataset in BigQuery, and I’m reading the documentation on defining my graph model in GraphQL. Since I’m already familiar with RDF, I just want to understand how to translate a given GraphQL model into RDF triples: I need to serialize what I have in BigQuery to RDF, and that RDF has to align with the graph model.
Is there any resource explaining how to get from your GraphQL model to the RDF to ingest?
Hi @tonicebrian
I’m not very familiar with BigQuery, and we have only been working on our own migration to Dgraph for two months, but I think knowing how a GraphQL schema relates to the actual nodes and edges in Dgraph might be helpful:
If we have this GraphQL Schema:
type User {
UserID: ID!
name: String
verified: Boolean
age: Int
}
This will be your Dgraph schema:
<User.verified>: bool .
<User.name>: string .
<User.age>: int .
type <User> {
User.verified
User.name
User.age
}
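On the RDF side, here is a rough Python sketch of how one row could be serialized into N-Quads that line up with the schema above. The helper names (`literal`, `user_row_to_nquads`), the row shape, and the blank-node label scheme are assumptions for illustration, not anything Dgraph prescribes. The `dgraph.type` triple is included because the GraphQL API uses it to recognize the node as a `User`.

```python
import json

def literal(value):
    """Render a Python value as an RDF literal with a Dgraph type hint."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return '"%s"^^<xs:boolean>' % str(value).lower()
    if isinstance(value, int):
        return '"%d"^^<xs:int>' % value
    # json.dumps yields a double-quoted, backslash-escaped string
    return json.dumps(str(value), ensure_ascii=False)

def user_row_to_nquads(row_key, row):
    """Turn one row (a dict) into N-Quad lines for the User type.

    row_key is assumed to be alphanumeric so it is safe in a blank-node label.
    """
    subject = "_:user_%s" % row_key
    lines = ['%s <dgraph.type> "User" .' % subject]
    for field in ("name", "verified", "age"):
        if row.get(field) is not None:
            # Predicate name follows the <Type>.<field> convention shown above
            lines.append("%s <User.%s> %s ." % (subject, field, literal(row[field])))
    return lines

for line in user_row_to_nquads("1", {"name": "Alice", "verified": True, "age": 30}):
    print(line)
```

Missing or null columns are simply skipped, so a sparse row produces fewer triples rather than empty values.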
For our case I implemented a converter that was able to generate something that maps to these structures.
Also, based on the docs, you have another option: you can specify the Dgraph type name on the GraphQL type and the predicate name on each GraphQL field, like this:
type Person @dgraph(type: "Human-Person") {
name: String @dgraph(pred: "your_custom_name")
age: Int
}
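A small Python sketch of the naming rule, as I understand it from the docs: an explicit `@dgraph(pred: ...)` is used as-is, and any other field gets the Dgraph type name as its prefix. The function `predicate_name` is a hypothetical helper, and the prefix rule for renamed types is my reading of the docs, so it is worth checking against the Dgraph schema that your GraphQL schema actually generates.

```python
def predicate_name(type_name, field_name, dgraph_type=None, pred=None):
    """Return the Dgraph predicate a GraphQL field is assumed to map to."""
    if pred is not None:
        # An explicit @dgraph(pred: "...") is used verbatim
        return pred
    # Otherwise assume <Dgraph type name>.<field name>
    return "%s.%s" % (dgraph_type or type_name, field_name)

print(predicate_name("Person", "name", dgraph_type="Human-Person", pred="your_custom_name"))
print(predicate_name("Person", "age", dgraph_type="Human-Person"))
print(predicate_name("User", "age"))
```

Under that assumption, the three calls print `your_custom_name`, `Human-Person.age`, and `User.age`.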
What about relations when they involve a composite key? Say I have:
type Student {
name: String! @id,
enroled: [Course]
}
type Course {
name: String! @id,
year: Int! @id,
credits: Int
}
How would you define the RDF for bulk loading in this case?
More generally, I’m scanning the documentation but I’m not able to spot where questions like this are answered. Maybe I’m looking in the wrong place?
Not sure, I don’t use GraphQL daily. But I think the @id directive is just an eq query under the hood. Or maybe it creates an XID edge? I can’t see why it would. I think there’s no special storage involved; it uses the name/year edge with the eq function.
This isn’t really a composite key but more like an additional unique index.
E.g. you could not have two courses with the same name even if the years were different, and you could not have two courses for the same year. I don’t think you want to use @id in this situation.
But as far as formatting RDF for types that have an @id field goes, there is nothing to do differently from importing RDF data for a type without an @id field.
The thing to keep in mind is that @id applies logic in the GraphQL API only. This means you could unintentionally import data that does not have unique values for that field, and Dgraph would happily ingest it.
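For the Student/Course schema from the question, a minimal Python sketch of the RDF could look like the following. The helper names and the idea of deriving blank-node labels from the key values are assumptions for illustration. As far as I know, the same blank-node label resolves to the same new UID within a single loader run, which is what lets the `Student.enroled` edge point at the right course.

```python
import json

def blank(kind, *key_parts):
    """Build a deterministic blank-node label from a type and its key values."""
    # Join with a separator unlikely to appear in data, then hex-encode so the
    # label only contains characters that are safe in a blank-node name
    raw = "\x1f".join(str(p) for p in key_parts)
    return "_:%s_%s" % (kind, raw.encode("utf-8").hex())

def quote(text):
    """Double-quote and escape a string for use as an RDF literal."""
    return json.dumps(str(text), ensure_ascii=False)

def course_nquads(name, year, credits=None):
    node = blank("course", name, year)
    lines = [
        '%s <dgraph.type> "Course" .' % node,
        "%s <Course.name> %s ." % (node, quote(name)),
        '%s <Course.year> "%d"^^<xs:int> .' % (node, year),
    ]
    if credits is not None:
        lines.append('%s <Course.credits> "%d"^^<xs:int> .' % (node, credits))
    return lines

def student_nquads(name, courses):
    """courses is a list of (course_name, year) pairs the student is enrolled in."""
    node = blank("student", name)
    lines = [
        '%s <dgraph.type> "Student" .' % node,
        "%s <Student.name> %s ." % (node, quote(name)),
    ]
    for course_name, year in courses:
        # The edge is just another triple whose object is the course's node
        lines.append("%s <Student.enroled> %s ." % (node, blank("course", course_name, year)))
    return lines

print("\n".join(course_nquads("Algebra", 2021, 6) + student_nquads("Ann", [("Algebra", 2021)])))
```

Because the label is derived from (name, year), emitting the same course twice within one load reuses the same node, which gives you the composite-key behaviour on the import side even though nothing enforces it in Dgraph itself.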
Even if the GraphQL schema does not show a field mapped to the ID (the Dgraph UID), there is still one in the underlying data. The “id” for those types is the same as for any other type. To prove this, use that schema and add some data through mutations. Then modify the schema to add an id: ID field and query the data again. Changing the schema does not change the data, but it can provide access to data that was already there and just not visible to the GraphQL API.
Blank nodes (e.g. _:NewCourse) will always get a new UID. A blank node is an unknown or contextual identifier (scoped to a transaction). Never use one when you are trying to link your data to an existing entity; use the UID instead. To use an existing UID, you have to query for it before the mutation. You can also use an upsert block to do the job.
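To make the upsert option concrete, here is a small Python sketch that only builds the text of a DQL upsert block for one course; sending it to Dgraph is left out. The function name `course_upsert_block` is hypothetical, and the query assumes that Course.name and Course.year are both indexed so that eq can be used on them. When the query matches nothing, `uid(c)` in a set mutation behaves like a blank node and a new node is created; otherwise the existing node is updated.

```python
import json

def course_upsert_block(name, year, credits):
    """Build a DQL upsert block that reuses a matching Course node if one exists."""
    name_lit = json.dumps(name, ensure_ascii=False)  # quoted, escaped string
    return "\n".join([
        "upsert {",
        "  query {",
        # Look the course up by both key parts and remember its UID as `c`
        "    q(func: eq(Course.name, %s)) @filter(eq(Course.year, %d)) {" % (name_lit, year),
        "      c as uid",
        "    }",
        "  }",
        "  mutation {",
        "    set {",
        '      uid(c) <dgraph.type> "Course" .',
        "      uid(c) <Course.name> %s ." % name_lit,
        '      uid(c) <Course.year> "%d" .' % year,
        '      uid(c) <Course.credits> "%d" .' % credits,
        "    }",
        "  }",
        "}",
    ])

print(course_upsert_block("Algebra", 2021, 6))
```

This is a per-record approach, so it fits incremental updates against a running cluster better than an initial bulk load.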