Dgraph schema system

Based on my discussion about the Dgraph schema and type system with @mrjn, let us consider the following schema:

type Person {
  name string
  age int
  address string
  Friends [Person]
}

type Actor {
  name string
  age int
  address string
  films [Film]
}

type Film {
  name string
  budget int
}

If I make the following query:

{
 Actor(_uid_:1) {
  Film
 }
}

- This would return the name and budget of all the films the actor has acted in, after verifying that the node with _uid_ 1 has the required edges specified in the Actor schema: name, age, address, and films.

And in this query

{
 me(_uid_:1) {
  Film {
   name
  }
 }
}

- This would return the names of all the films connected to the node with _uid_ 1, without doing any checks at that node. At the Film level, each film is checked for both name and budget, after which only the name is returned.
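To make the check in the two points above concrete, here is a minimal Go sketch, assuming a toy in-memory graph rather than Dgraph's real storage. Node, ObjectType and conforms are hypothetical names for illustration: a node counts as a given type only if it carries every predicate the type lists, either as a scalar value or as at least one outgoing edge.

```go
package main

import "fmt"

// Node is a hypothetical in-memory graph node: scalar values and
// outgoing uid edges, both keyed by predicate name.
type Node struct {
	Scalars map[string]string
	Edges   map[string][]uint64
}

// ObjectType is a hypothetical schema entry: the predicates a node
// must carry to count as this type.
type ObjectType struct {
	Name   string
	Fields []string
}

// conforms reports whether n carries every predicate the type lists,
// either as a scalar value or as at least one outgoing edge.
func conforms(n Node, t ObjectType) bool {
	for _, f := range t.Fields {
		_, isScalar := n.Scalars[f]
		if !isScalar && len(n.Edges[f]) == 0 {
			return false
		}
	}
	return true
}

func main() {
	actor := ObjectType{Name: "Actor", Fields: []string{"name", "age", "address", "films"}}
	full := Node{
		Scalars: map[string]string{"name": "A", "age": "40", "address": "X"},
		Edges:   map[string][]uint64{"films": {2, 3}},
	}
	noFilms := Node{Scalars: full.Scalars} // same scalars, no films edge
	fmt.Println(conforms(full, actor), conforms(noFilms, actor)) // true false
}
```

The same helper covers the Film-level check in the second query: a film node passes only if it has both name and budget, and projection to just name happens afterwards.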

Is this the approach we want to take? @mrjn
Any suggestions and comments on this system are welcome! @core-devs

A few things here:

We should use the colons. It’s not like we’re deliberately trying to not be like GraphQL.

I think we should return all the Actor fields as well: the name, age, address, and all the films they’ve acted in, along with the name and budget of each film. So, we expand recursively. This is no different from:

{
  Actor(_uid_:0x01) {}
}

I don’t see how the Film part in the query works. Is there a Film predicate there? By default, the field is a predicate. You’ll have to specify the type of the predicate to make it conform.

Film is not a predicate (from what I can gather), so it would basically return no result. If you want to ask for films, which is a predicate, you can probably do it like so:

{
  me(_uid_:0x01) {
    films: Film {
      name
    }
  }
}

This would tell it to look for the predicate films, and confirm that the entities are of type Film. All entities of type Film would return name and budget by default, so the extra name predicate is redundant.
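A minimal Go sketch of how films: Film could be resolved, assuming a toy in-memory graph and a schema that maps each object type to its scalar fields. Node, schema and followAs are hypothetical names, not Dgraph's API. The function follows the films edges, drops any target that does not carry every Film field, and returns all Film fields for the rest, so no explicit field list is needed in the query.

```go
package main

import "fmt"

// Node is a hypothetical in-memory graph node.
type Node struct {
	Scalars map[string]string
	Edges   map[string][]uint64
}

// schema maps an object type to its scalar fields (hypothetical; Film as above).
var schema = map[string][]string{
	"Film": {"name", "budget"},
}

// followAs resolves an aliased field "pred: Type": it walks the pred edges
// out of uid, drops targets that lack any field of Type, and returns every
// field of Type for the rest.
func followAs(graph map[uint64]Node, uid uint64, pred, typ string) []map[string]string {
	var out []map[string]string
	for _, dst := range graph[uid].Edges[pred] {
		n := graph[dst]
		row := map[string]string{}
		complete := true
		for _, f := range schema[typ] {
			v, ok := n.Scalars[f]
			if !ok {
				complete = false
				break
			}
			row[f] = v
		}
		if complete {
			out = append(out, row)
		}
	}
	return out
}

func main() {
	graph := map[uint64]Node{
		1: {Edges: map[string][]uint64{"films": {2, 3}}},
		2: {Scalars: map[string]string{"name": "F1", "budget": "100"}},
		3: {Scalars: map[string]string{"name": "F2"}}, // no budget, so not a Film
	}
	fmt.Println(followAs(graph, 1, "films", "Film")) // [map[budget:100 name:F1]]
}
```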

The exclusion part, where you only return certain fields, is unnecessary in our case. As I explained, the GraphQL specification seems to be tied to relational storage: the query tells the backend which table to look at, and then which fields in that table to return. So, they use the schema to aid the backend.

OTOH, we don’t have that constraint, so we can use the schema to make it easier to write queries and parse data for the end user.

Ah sure, we’ll use colons.

So do we return the entire subgraph, recursing over any number of levels? In some cases, like Friends, this could lead to cycles. What I understood was that we only return the scalar values at that level.

It was supposed to be films, my bad.

Hmm… good point. Maybe we could expand object types to just one level? So, say, Person -> expand friends -> Person, only expand scalars. Similarly, Actor -> films -> Film, only expand scalars, in case the Film type contains other object types.

Yeah, expanding just one level of scalars would be good.
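The one-level rule can be sketched in Go as follows, assuming a toy in-memory graph and a hypothetical schema where each field is either a scalar or an edge to another object type. Node, Field, scalarsOf and expandOneLevel are illustrative names only. The root's scalars are returned, each object-typed field is expanded to its targets' scalars, and recursion stops there, so the Person -> friends -> Person cycle terminates.

```go
package main

import "fmt"

// Node is a hypothetical in-memory graph node.
type Node struct {
	Scalars map[string]string
	Edges   map[string][]uint64
}

// Field is a hypothetical schema field: a scalar, or an edge to
// another object type when Target is non-empty.
type Field struct {
	Name   string
	Target string
}

var schema = map[string][]Field{
	"Person": {{Name: "name"}, {Name: "age"}, {Name: "friends", Target: "Person"}},
}

// scalarsOf returns only the scalar fields of typ for node n.
func scalarsOf(n Node, typ string) map[string]interface{} {
	out := map[string]interface{}{}
	for _, f := range schema[typ] {
		if f.Target == "" {
			if v, ok := n.Scalars[f.Name]; ok {
				out[f.Name] = v
			}
		}
	}
	return out
}

// expandOneLevel returns the scalars of uid and, for each object-typed
// field, the scalars of its targets. It never recurses further.
func expandOneLevel(graph map[uint64]Node, uid uint64, typ string) map[string]interface{} {
	n := graph[uid]
	out := scalarsOf(n, typ)
	for _, f := range schema[typ] {
		if f.Target == "" {
			continue
		}
		var children []map[string]interface{}
		for _, dst := range n.Edges[f.Name] {
			children = append(children, scalarsOf(graph[dst], f.Target))
		}
		out[f.Name] = children
	}
	return out
}

func main() {
	// Alice and Bob are friends with each other: a cycle.
	graph := map[uint64]Node{
		1: {Scalars: map[string]string{"name": "Alice", "age": "30"}, Edges: map[string][]uint64{"friends": {2}}},
		2: {Scalars: map[string]string{"name": "Bob", "age": "31"}, Edges: map[string][]uint64{"friends": {1}}},
	}
	fmt.Println(expandOneLevel(graph, 1, "Person")) // map[age:30 friends:[map[age:31 name:Bob]] name:Alice]
}
```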

Alternatively, we could restrict our object types to only have scalars, and not allow them to include other object types. I don’t see any particular downside to that approach. Note that if you need to ensure that an actor has at least one film, you could do something like this:

type Actor {
  name: string
  age: int
  address: string
  films: uid
}

And to actually get the films, you can run a query like so:

{
  Actor(_uid_: 0x01) {
    films: Film
  }
}

This would return the name, age, address, and then all film names and their budgets. We could also make it a bit more advanced and have filters if we want to ensure a certain count; say, SeniorActor, defined as someone who has done more than 10 films.

type SeniorActor {
  Actor @filter(gt(count(films), 10))
}

Note how a SeniorActor automatically contains all the fields of the Actor, but is defined as an actor with more than 10 results for the films edge.
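A minimal Go sketch of SeniorActor as "Actor plus a filter", assuming a toy in-memory graph. Node, isActor and isSeniorActor are hypothetical names. isActor mirrors the scalars-only Actor type with films: uid (at least one film), and isSeniorActor adds the gt(count(films), 10) condition, which is strictly greater than 10.

```go
package main

import "fmt"

// Node is a hypothetical in-memory graph node.
type Node struct {
	Scalars map[string]string
	Edges   map[string][]uint64
}

// isActor mirrors the scalars-only Actor type: name, age and address
// must be present, and films: uid needs at least one edge.
func isActor(n Node) bool {
	for _, f := range []string{"name", "age", "address"} {
		if _, ok := n.Scalars[f]; !ok {
			return false
		}
	}
	return len(n.Edges["films"]) > 0
}

// isSeniorActor mirrors
//
//	type SeniorActor { Actor @filter(gt(count(films), 10)) }
//
// an Actor whose films edge has strictly more than 10 targets (gt, not ge).
func isSeniorActor(n Node) bool {
	return isActor(n) && len(n.Edges["films"]) > 10
}

func main() {
	films := make([]uint64, 11)
	for i := range films {
		films[i] = uint64(i + 2) // hypothetical film uids 2..12
	}
	n := Node{
		Scalars: map[string]string{"name": "A", "age": "50", "address": "X"},
		Edges:   map[string][]uint64{"films": films},
	}
	fmt.Println(isActor(n), isSeniorActor(n)) // true true
	n.Edges["films"] = films[:10]
	fmt.Println(isActor(n), isSeniorActor(n)) // true false
}
```

Because isSeniorActor calls isActor, every SeniorActor is automatically an Actor, which is the inheritance the post describes.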

I feel we could still have the object name but just not check for its type validity at this level. It would help in identifying the type of the next level, instead of having to specify it in the query (e.g. films: Film).

Okay, sure.

So, you’d just do { Actor(_uid_:0x01) { films }} to receive all film data as well? Sure, that’d work too!

One more thing: I feel we should have global scalars like

scalar age int
scalar name string
scalar uid id

type Person {
 age: int
 name: string
}

These globals would be the way for us to type-check during mutations. Also, a given field name would have only one type (say, age in Person and age in Actor). That way, the type checked during a mutation wouldn’t violate the type specified elsewhere. What are your thoughts, @mrjn?
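A minimal Go sketch of a global scalar registry used for mutation type-checking, assuming a plain map from predicate name to type name. declare and checkMutation are hypothetical helpers, not Dgraph's implementation. declare rejects a second, conflicting type for the same predicate, so age cannot be int in Person and string in Actor; checkMutation validates a value against the single registered type. Letting undeclared predicates pass is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// declare registers a global scalar, mirroring "scalar age int". It
// rejects a second declaration with a different type, so "age" means the
// same thing in Person, Actor, and every other object type.
func declare(reg map[string]string, pred, typ string) error {
	if old, ok := reg[pred]; ok && old != typ {
		return fmt.Errorf("%q is already %s, cannot redeclare as %s", pred, old, typ)
	}
	reg[pred] = typ
	return nil
}

// checkMutation validates one (predicate, value) pair of a mutation
// against the global registry, regardless of the object type involved.
func checkMutation(reg map[string]string, pred, value string) error {
	switch reg[pred] {
	case "int":
		if _, err := strconv.ParseInt(value, 10, 64); err != nil {
			return fmt.Errorf("%q expects int, got %q", pred, value)
		}
	case "string", "":
		// Any value is a valid string; undeclared predicates pass
		// unchecked in this sketch (an assumption, not a decision).
	default:
		return fmt.Errorf("unsupported scalar type %q for %q", reg[pred], pred)
	}
	return nil
}

func main() {
	reg := map[string]string{}
	fmt.Println(declare(reg, "age", "int"), declare(reg, "name", "string")) // <nil> <nil>
	fmt.Println(declare(reg, "age", "string"))    // "age" is already int, cannot redeclare as string
	fmt.Println(checkMutation(reg, "age", "42"))  // <nil>
	fmt.Println(checkMutation(reg, "age", "old")) // "age" expects int, got "old"
}
```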

Yes, we should absolutely have global scalars. That part is already implemented, but you might want to check how it’s specified. It shouldn’t be a JSON config.

Also, how about:

scalar (
  type.object.name.en: string
  type.object.age: int
  ... others
)

Later, we could write our own dgfmt tool to fix up the schema formatting. But that’s just a nice-to-have; it’s not so important.

Sure, we could have that format:

scalar (
 age: int
 name: string
)

That would allow for nice grouping. Note, though, that we should still allow scalar x:y, and also allow multiple scalar ( … ) clauses in the same schema file.
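A minimal Go sketch of a parser that accepts both forms, assuming the colon syntax (scalar x: y) and a line-based layout where scalar ( and ) sit on their own lines. parseScalars is a hypothetical helper, not Dgraph's schema parser. Any number of single-line and grouped clauses may appear in one file, other lines (such as type blocks) are skipped, and a predicate declared twice with different types is rejected.

```go
package main

import (
	"fmt"
	"strings"
)

// parseScalars is a hypothetical line-based parser for the two forms,
// each allowed any number of times in one file:
//
//	scalar name: string
//	scalar (
//	  age: int
//	)
//
// Lines outside scalar declarations (e.g. type blocks) are skipped.
func parseScalars(src string) (map[string]string, error) {
	out := map[string]string{}
	add := func(decl string) error {
		parts := strings.SplitN(decl, ":", 2)
		if len(parts) != 2 {
			return fmt.Errorf("bad scalar declaration %q", decl)
		}
		name, typ := strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1])
		if name == "" || typ == "" {
			return fmt.Errorf("bad scalar declaration %q", decl)
		}
		if old, ok := out[name]; ok && old != typ {
			return fmt.Errorf("%q declared as both %s and %s", name, old, typ)
		}
		out[name] = typ
		return nil
	}
	inGroup := false
	for _, raw := range strings.Split(src, "\n") {
		line := strings.TrimSpace(raw)
		var err error
		switch {
		case line == "":
			continue
		case inGroup && line == ")":
			inGroup = false
		case inGroup:
			err = add(line)
		case line == "scalar (":
			inGroup = true
		case strings.HasPrefix(line, "scalar "):
			err = add(strings.TrimPrefix(line, "scalar "))
		}
		if err != nil {
			return nil, err
		}
	}
	if inGroup {
		return nil, fmt.Errorf("unterminated scalar ( ... ) block")
	}
	return out, nil
}

func main() {
	src := `
scalar name: string
scalar (
  age: int
  type.object.name.en: string
)
scalar (
  budget: int
)
`
	m, err := parseScalars(src)
	fmt.Println(m, err) // map[age:int budget:int name:string type.object.name.en:string] <nil>
}
```

Splitting on the first colon only keeps dotted names such as type.object.name.en intact.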

I have implemented object types and scalar types, with query expansion based on schema. Please have a look when free: https://github.com/dgraph-io/dgraph/compare/feature/schema-system?expand=1

Can we have a chat sometime when you’re free? I wanted to discuss a few things.
