Hi All,
This post may apply directly to the “stronger type system” feature in the 2018 Dgraph product roadmap.
My team has had a lot of success using Dgraph, and we have built a fairly complex web application on top of it. However, the approach we took to defining our schema will not scale, given the Dgraph team’s recommendation against a generic “type” edge (https://docs.dgraph.io/howto/#giving-nodes-a-type). I am looking for advice on how to migrate from our current approach to one that can store millions or billions of nodes while maintaining the rich “meta-schema” we have put in place to add context to the data. Here are the schema requirements of our app (“schema” in a general sense, not necessarily mapping directly to Dgraph’s schema features):
- A schema that is constantly changing, and is being created and modified by non-technical users
- The schema-level concept of “Classes”, “Properties”, and “Relations”. Each of these has a name and description.
- The content-level concept of “Entities”, each of which has a “Class”, “Property Values”, and “Relation Values”.
- Properties have a “type” such as number, string, date, boolean, image, and visualization.
- Classes have a set of predefined “Properties” and “Relations” that their corresponding entities can have. Each property/relation has a min and max occurrence count.
- For a given Class + Relation combination, the class of the target entity in source entity → Relation → target entity is constrained at the class level. For example, the class Business Requirement with the relation Depends On may have a valid target class of Key Value Driver, whereas the class Software System may also use the relation Depends On, but with a valid target class of Software Library.
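To make the Business Requirement example concrete, here is roughly how that constraint looks as data in our current model, where the meta-schema is stored as ordinary vertices. The predicate names (“VertexType”, “SourceClass”, etc.) and the intermediate “RelationAllowance” node are illustrative of our approach, not Dgraph built-ins:

```
{
  set {
    # Schema-level nodes, stored as ordinary vertices
    _:br  <VertexType> "Class"    .
    _:br  <Name>       "Business Requirement" .
    _:dep <VertexType> "Relation" .
    _:dep <Name>       "Depends On" .
    _:kvd <VertexType> "Class"    .
    _:kvd <Name>       "Key Value Driver" .

    # One allowance node per Class + Relation combo, since the same
    # relation can have different target classes on different classes
    _:allow <VertexType>  "RelationAllowance" .
    _:allow <SourceClass> _:br  .
    _:allow <Relation>    _:dep .
    _:allow <TargetClass> _:kvd .
  }
}
```

The key point is that every one of these triples, for every class and relation in the system, flows through the same handful of generic predicates.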
Based on the Dgraph team’s advice to use *specific* predicates rather than generic ones (due to mutation aborts and sharding between machines), I think our entire model needs to be redesigned. The shortcoming seems to be:
We use far too many generic predicates. The predicates “VertexType”, “Class”, “Property”, “Relation”, “ExternalId”, “Namespace”, and others are used by ALL entities in the system. Thus, as the system scales, we essentially must perform all mutations in series, because any two transactions will very often conflict.
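For contrast, here is a sketch of the difference in Dgraph schema terms. The first group is the shape of our current generic schema; the second is the class-specific direction the team’s advice points toward (predicate names here are illustrative, not what we actually have today):

```
# Current approach: a few generic predicates shared by every entity
VertexType: string @index(exact) .
Class:      uid .
ExternalId: string @index(exact) .

# Recommended direction: specific predicates per class, e.g.
softwareSystem.name:          string @index(term) .
softwareSystem.latestVersion: string .
softwareSystem.usesLibrary:   uid .
```

With the specific predicates, two transactions touching a softwareSystem and a businessRequirement no longer contend on the same predicate, but every new class a non-technical user creates would require new predicates, which is where our migration question comes from.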
However, I am not sure how to retain our rich schema features using Dgraph’s current schema capabilities while also following the team’s predicate advice, which we will need in order to scale.
In summary, I think we need the Dgraph schema to do for us what we have been doing ourselves at the vertex level. Specifically:
- Attach scalar attributes to predicates themselves, similar to facets but at the schema level. For example, softwareSystem is a class and must have a “name” and a “description”. We would also use a “type” attribute to identify softwareSystem as a “class”.
- Create relationships between schema items. For instance, “softwareSystem allows the property type latestVersion, which has a min and max occurrence count of 1 and 1”, or “softwareSystem allows the relation type usesLibrary, which in turn can point to the class softwareLibrary, and has a min and max occurrence count of 0 and unlimited”.
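Purely as a strawman for discussion (this is NOT existing Dgraph syntax, just an illustration of the expressiveness we are after), the two points above might look something like:

```
# Hypothetical schema syntax, for discussion only
class softwareSystem {
  name:          string                @count(1,1)
  description:   string                @count(1,1)
  latestVersion: string                @count(1,1)
  usesLibrary:   uid -> softwareLibrary @count(0,*)
}
```

Anything in this spirit — named classes, typed properties, per-class relation targets, and occurrence counts — would let us delete most of our vertex-level meta-schema machinery.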
Hopefully this makes sense. We would be very interested in commenting on the Dgraph team’s type system design, and would be happy to provide more detail about what we have built.