Graph Design, Foreign Keys, uid/[uid], and Correctness

I doubt this question has a clear or “right” answer. I’m asking to get some opinions on the matter, particularly from those who have more experience with graph databases.

For the discussion, consider the following schema:

title: string .
start_date: datetime .
tasks: [uid] @reverse .

description: string .
complete: bool .
comments: [uid] @reverse .

username: string .
timestamp: datetime .
comment: string .

type Project {
	title
	start_date
	tasks
}

type Task {
	description
	complete
	comments
}

type Comment {
	username
	timestamp
	comment
}

I’m new to graph databases and used to SQL, where you model relationships with foreign keys.

In our schema, the intention is that projects have tasks, and tasks have comments. It is also the intention that a comment can only be associated with a single task and a task can only be associated with a single project. With this schema (tasks: [uid] @reverse .), there is nothing in Dgraph ensuring that there isn’t an edge from more than one project to the same task. As a result, when following the reverse edge from Task to Project (~tasks), the result is a list instead of a single node.

In Dgraph v1.1.0 the distinction between uid and [uid] was introduced. This essentially allows foreign keys. To take advantage of this, I’d need to rework my schema to now look like this:

title: string .
start_date: datetime .

description: string .
complete: bool .
project: uid @reverse .

username: string .
timestamp: datetime .
comment: string .
task: uid @reverse .

type Project {
	title
	start_date
}

type Task {
	description
	complete
	project
}

type Comment {
	username
	timestamp
	comment
	task
}

This seems more correct because the schema enforces the correct relationships. When following the edge from Task to Project, I now get a single node, and the reverse results in a list of nodes, which is what we expect.

However, this causes a lot of usability issues. Queries and mutations, at least when using JSON, all seem to work best when the edges in your schema follow the natural hierarchy. In this case, you would expect a query of the whole graph to look like:

{"projects": [
	{"tasks": [
		{"comments": [
			{}
		]}
	]}
]}

I believe you could produce this shape with a query against the second schema by following the reverse edges (~project and ~task), as long as you aliased them to tasks and comments.
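A minimal sketch of what such a query might look like, held in a Python string so it can be handed to whatever Dgraph client you use. The constant name NESTED_QUERY is made up for illustration, and the query assumes every node has its dgraph.type set so that type(Project) finds the projects.

```python
# Hypothetical constant name; the query targets the second schema
# (project: uid @reverse and task: uid @reverse).
NESTED_QUERY = """
{
  projects(func: type(Project)) {
    title
    start_date
    tasks: ~project {        # reverse edge Task -> Project, aliased to "tasks"
      description
      complete
      comments: ~task {      # reverse edge Comment -> Task, aliased to "comments"
        username
        timestamp
        comment
      }
    }
  }
}
"""

print(NESTED_QUERY)
```

The aliases only rename keys in the response. The stored edges still point from Task to Project and from Comment to Task.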

But you won’t be able to create new nodes with a mutation that has that same shape. Instead, you’d have to send a flat list of separate objects (see here).
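To make the difference concrete, here is a rough sketch of the translation an application would have to do: take data in the natural nested shape and flatten it into a list of objects whose edges point from child to parent, as the second schema requires. The function name flatten_projects and the sample data are invented for illustration. The only Dgraph conventions assumed are blank-node identifiers of the form "_:name", the dgraph.type field, and referencing another node as {"uid": ...} in a JSON mutation.

```python
from itertools import count


def flatten_projects(projects):
    """Flatten nested project -> tasks -> comments dicts into a list of
    node objects with child -> parent edges (task.project, comment.task).

    Hypothetical helper, not part of any Dgraph client.
    """
    ids = count(1)
    nodes = []

    def blank(prefix):
        # Blank nodes ("_:name") let objects in one mutation refer to each other.
        return f"_:{prefix}{next(ids)}"

    for project in projects:
        p_uid = blank("project")
        p_fields = {k: v for k, v in project.items() if k != "tasks"}
        nodes.append({"uid": p_uid, "dgraph.type": "Project", **p_fields})
        for task in project.get("tasks", []):
            t_uid = blank("task")
            t_fields = {k: v for k, v in task.items() if k != "comments"}
            nodes.append({"uid": t_uid, "dgraph.type": "Task", **t_fields,
                          "project": {"uid": p_uid}})
            for comment in task.get("comments", []):
                nodes.append({"uid": blank("comment"), "dgraph.type": "Comment",
                              **comment, "task": {"uid": t_uid}})
    return nodes


# Made-up sample data in the "natural" nested shape.
nested = [{
    "title": "Launch",
    "tasks": [{
        "description": "Write docs",
        "complete": False,
        "comments": [{"username": "alice", "comment": "On it"}],
    }],
}]
flat = flatten_projects(nested)
for node in flat:
    print(node)
```

The resulting list is what would go into the set part of a JSON mutation. The extra step is small, but every write path in the application has to go through it.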

So I keep going back and forth between what appears to be correct modeling of the relationships and what feels better/more natural to query and mutate.

One answer is to not worry about using Dgraph to enforce the correctness of the relationships and just do that in the application. Dgraph provides a lot more safety than many NoSQL solutions anyway, so I’m sure folks coming from that background don’t worry about it. But for those of us coming from SQL, where we worry about normalization and constraints, it feels like a compromise either way.
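As a sketch of what checking this in the application could look like with the first schema (tasks: [uid] @reverse .), the snippet below scans a query result and reports any task that is reachable from more than one project. The function name find_shared_children and the sample uids are made up. The input is assumed to have the shape of a query response in which each project carries its uid and a tasks list of objects with uids.

```python
from collections import defaultdict


def find_shared_children(parents, edge):
    """Return {child_uid: [parent_uids]} for every child that is linked
    from more than one parent through the given edge.

    Hypothetical helper; it only inspects data that was already queried.
    """
    owners = defaultdict(list)
    for parent in parents:
        for child in parent.get(edge, []):
            owners[child["uid"]].append(parent["uid"])
    return {child: ps for child, ps in owners.items() if len(ps) > 1}


# Made-up result: task 0xb is attached to two projects.
projects = [
    {"uid": "0x1", "tasks": [{"uid": "0xa"}, {"uid": "0xb"}]},
    {"uid": "0x2", "tasks": [{"uid": "0xb"}]},
]
violations = find_shared_children(projects, "tasks")
print(violations)
```

The same function works for the comments edge by passing a list of tasks and "comments". Note that this only detects violations after the fact and does not prevent them.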

How are others thinking about this?

An awesome, no-compromise solution, though I have no idea whether it would be feasible or desirable to implement, would be to (1) allow (re)naming the reverse relationship and (2) allow mutating data in either direction. So Task would have a project field and Project would have a tasks field. The forward edge would be from Task to Project (project: uid @reverse .) to ensure a Task can only relate to one Project, but we would still be able to query and mutate the data in the natural (reverse) direction.

I had similar doubts, so I created a tool that consistency-checks cardinalities (so your 1-1..n relation stays 1-1..n with a [uid] schema). Please note this is highly WIP/proof-of-concept: GitHub - ppp225/dgsee: Dgraph consistency check and visualize edge cardinalities

How to model your relations depends on usability. For queries, it shouldn’t really matter, as you can alias reverse edges to suit your needs. For mutations it matters quite a bit if they need to be defined in the “natural hierarchy”. I found that some of my relations need to be [uid] for mutations’ sake, even though I’d like to define them as uid, and thus the tool above was born.

Thanks for confirming I’m not the only one thinking about this. I also appreciate the link to your tool. It will be helpful in making sure my application isn’t inserting incorrect relationships over time.

I’m hoping that someone from Dgraph can eventually weigh in on whether or not my “proposed” solution is unworkable or undesirable.