Can new Edges be added to large numbers of existing Nodes by referencing a Node's Type?

nick-walt · July 24, 2022, 6:54am

If our application’s functional design changes over time to better model the domain, and we want to add new Edges to large numbers of existing Nodes, is it possible to add them to a Node’s “type” so that the database applies them to existing and future Node instances?

Or would we just tell the database to find all specific Node instances and then add the new Edges in a bulk update? Thanks!

matthewmcneely · July 24, 2022, 5:21pm

Hi Nick,

Adding edges to established node types is quite common in Dgraph (and other graph databases). But there’s a distinction to be made between (1) defining a new edge and (2) assigning values to that new edge.

Say, for instance, your application has a Movie type with an [Actor] edge. Later you want to add a [Director] edge. No problem: just update your schema with the new edge. Of course, at that moment all the existing Movie nodes will have empty [Director] edges. You’ll need to go back and update those nodes with the correct data.
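
The two steps can be sketched with a toy in-memory model. This is not Dgraph's API: the `schema` and `nodes` dictionaries, the `add_edge_to_type` and `backfill` helpers, and the sample data are all hypothetical, and exist only to show that declaring an edge on a type and populating it on existing nodes are separate operations.

```python
# Toy in-memory graph: NOT Dgraph's API, just an illustration of the two steps.

# Type name -> set of edge names declared on that type.
schema = {"Movie": {"actors"}}

# Node id -> node data. "types" plays the role of a node's type labels.
nodes = {
    "m1": {"types": ["Movie"], "title": "Movie One", "actors": ["p1"]},
    "m2": {"types": ["Movie"], "title": "Movie Two", "actors": ["p2"]},
    "p1": {"types": ["Person"], "name": "Person One"},
    "p2": {"types": ["Person"], "name": "Person Two"},
    "p3": {"types": ["Person"], "name": "Person Three"},
}


def add_edge_to_type(type_name, edge_name):
    """Step 1: declare a new edge on a type. No node is touched."""
    schema.setdefault(type_name, set()).add(edge_name)


def backfill(type_name, edge_name, values_by_node):
    """Step 2: assign values for the edge on existing nodes of the type."""
    updated = 0
    for node_id, node in nodes.items():
        if type_name in node["types"] and node_id in values_by_node:
            node[edge_name] = list(values_by_node[node_id])
            updated += 1
    return updated


add_edge_to_type("Movie", "directors")
# After step 1 the edge is declared, but existing movies have no value for it.
missing = [n for n, d in nodes.items()
           if "Movie" in d["types"] and "directors" not in d]

# Step 2: the application supplies the real data for each existing movie.
count = backfill("Movie", "directors", {"m1": ["p3"], "m2": ["p1"]})
```

In a real deployment the second step would be a bulk mutation against the database rather than a loop over a dictionary; the point here is only that the schema change alone does not fill in any data.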

Did I understand your question correctly?

MichelDiz · July 24, 2022, 10:01pm

Types in Dgraph aren’t all that “strict”. They are basically a “convention”, used to “reveal” the shape of a node. So you can even add “hidden” edges that only the person writing the query knows about. What I mean is that there are no implications to changing the schema over time.

Only with GraphQL are you obligated to declare things in the schema.

You mean a bulk update via upsert, right? There’s no correlation there. A bulk update is a good way to update a database that changes over time.

nick-walt · July 25, 2022, 9:26am

@matthewmcneely @MichelDiz

Hi Matthew and Michel. Thanks for your responses.

Does this mean that Edges can be like a type wrapper for the Node to which it points? Where a Director Edge (from a Movie Node) points to a Person, thereby saying that that person is a type Director? Elsewhere in the graph another Edge from a Person Node could point to the same Person Node, that the Director Edge points to, with a label of Father. This means that the same Person can be a different type within each relationship (Movie > Director > Person) and (Person > Father > Person)?

This seems to be very different compared to my experience with Types, which comes from an Algebraic Data Type (AlgDT) system… Or is it?

A Person type that is also a Director and a Father could be expressed as a Type with Variants:

type Person
    = Director
    | Father

Depending on the function, that Person is either a Director or a Father? This is awesome! Is this why Neo4j mandates that all Edges are strictly typed?

Algebraic thinking reasons about values in terms of their types and the operations that can be applied to them by expressions (functions). These types can be directly associated with domain entities.

The AlgDT type system comes from ML-family and ML-influenced functional programming languages like Haskell, OCaml, ReScript (ReasonML), F#, Rust, Elm and, to a limited extent, TypeScript.

Most AlgDTs are user-made custom types, like UserProfile or UserID. The compiler and the application recognise them as semantically unique entities and give them the same significance as a string, integer, record, or array.

Algebraic Data Types can be understood as user-constructed types made up of single or multiple values that become entities in data. This means they can be composed in all sorts of cool ways and passed from one function to another, or nested inside other unique types.

AlgDTs are assigned a much higher level of uniqueness and meaning than conventional reference types, which are primitive types with a label. Here is a quote from the Software Engineering Stack Exchange question “Is it still valid to speak about anemic model in the context of functional programming?” (tagged domain-driven-design):

Suppose we need to define a type representing user IDs. An “anemic” definition would state that user IDs are strings. That’s technically feasible, but runs into huge problems because user IDs aren’t used like arbitrary strings. It makes no sense to concatenate them or slice out substrings of them, Unicode shouldn’t really matter, and they should be easily embeddable in URLs and other contexts with strict character and format limitations.

Solving this problem usually happens in a few stages. A simple first cut is to say “Well, a UserID is represented equivalently to a string, but they’re different types and you can’t use one where you expect the other.” Haskell (and some other typed functional languages) provides this feature via newtype:

newtype UserID = UserID String

This defines a UserID function which when given a String constructs a value that is treated like a UserID by the type system, but which is still just a String at runtime. Now functions can declare that they require a UserID instead of a string; using UserIDs where you previously were using strings guards against code concatenating two UserIDs together. The type system guarantees that can’t happen, no tests required.

The weakness here is that code can still take any arbitrary String like "hello" and construct a UserID from it. Further steps include creating a “smart constructor” function which when given a string checks some invariants and only returns a UserID if they’re satisfied. Then the “dumb” UserID constructor is made private so if a client wants a UserID they must use the smart constructor, thereby preventing malformed UserIDs from coming into existence.

Even further steps define the UserID data type in such a way that it’s impossible to construct one that’s malformed or “improper”, simply by definition. For instance, defining a UserID as a list of digits:

data Digit = Zero | One | Two | Three | Four | Five | Six | Seven | Eight | Nine
data UserID = UserID [Digit]

To construct a UserID a list of digits must be provided. Given this definition, it’s trivial to show that it’s impossible for a UserID to exist that can’t be represented in a URL. Defining data models like this in Haskell is often aided by advanced type system features like Data Kinds and Generalized Algebraic Data Types (GADTs), which allow the type system to define and prove more invariants about your code. When data is decoupled from behavior your data definition is the only means you have to enforce behavior.
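
The “smart constructor” step from the quote can be approximated outside Haskell too. The following is a minimal Python sketch, with two caveats: Python checks these rules at runtime rather than at compile time, and the names `parse_user_id` and `profile_url` as well as the digits-only invariant are assumptions chosen to mirror the list-of-digits example, not part of the quoted answer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UserID:
    """A distinct type wrapping a string, similar in spirit to `newtype`."""
    value: str

    def __post_init__(self):
        # Invariant (assumed for this sketch): one or more ASCII digits.
        if not re.fullmatch(r"[0-9]+", self.value):
            raise ValueError(f"malformed user id: {self.value!r}")


def parse_user_id(raw: str) -> Optional[UserID]:
    """Smart constructor: return a UserID only if the invariant holds."""
    try:
        return UserID(raw)
    except ValueError:
        return None


def profile_url(user_id: UserID) -> str:
    # Accepts only UserID values; a plain str is rejected at runtime here
    # (a static checker such as mypy could flag it before running).
    if not isinstance(user_id, UserID):
        raise TypeError("expected a UserID")
    return f"/users/{user_id.value}"
```

Because the check lives in `__post_init__`, every `UserID` that exists has passed validation, which is the property the quote is after; what Python cannot give is the compile-time proof.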

MichelDiz · July 25, 2022, 2:23pm

You can give several Type names to a single node (Person + Father) if you want to. Edges in general just link nodes. For example, Jorge is a Person and has an edge “Director of” xyz.
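
As a rough picture of that idea, here is a toy Python model (not Dgraph's API). The `node_types` and `edges` structures and the `ana` node are hypothetical; Jorge and xyz come from the example above. A node carries a list of type names, while the edges leaving it describe the roles it plays.

```python
# Toy model, not Dgraph's API: a node carries a list of type names,
# and edges are (source, label, target) triples.
node_types = {
    "jorge": ["Person", "Father"],
    "xyz": ["Movie"],
    "ana": ["Person"],
}

edges = [
    ("jorge", "director_of", "xyz"),
    ("jorge", "father_of", "ana"),
]


def roles_of(node_id):
    """Edge labels leaving a node: the roles it plays in relationships."""
    return sorted(label for src, label, _ in edges if src == node_id)


def has_type(node_id, type_name):
    """True if the node carries the given type name."""
    return type_name in node_types.get(node_id, [])
```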

matthewmcneely · July 25, 2022, 4:53pm

I tend to think of edges as attributes when data modeling. But I don’t think your definition is wrong, just another perspective.

One interesting aspect of GraphQL SDL in this case is the interface construct. In the case above, you could have a Person interface type, which Director (and Actor, etc.) could implement. In the Movie type, the directors edge could be defined as an array of Person (directors: [Person]). This would allow you to add any Person as a director. However, that might not be a good idea if the Director derived type has attributes (edges) that are needed by the application. In that case, it is better to declare directors: [Director]. Dgraph GraphQL mutations will “type check” supplied values in that case and accept only Director types.

If you haven’t read the GraphQL documentation on interfaces and derived types, see https://graphql.org/learn/schema/#interfaces
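
The trade-off between typing the directors edge as Person or as Director can be illustrated with a small Python analogy. This is only a sketch of the idea, not how Dgraph implements its check; the class names follow the example above, while `films_directed`, `director_type` and `add_director` are hypothetical.

```python
class Person:
    """Plays the role of the Person interface."""
    def __init__(self, name):
        self.name = name


class Director(Person):
    """Derived type with an extra attribute the application relies on."""
    def __init__(self, name, films_directed):
        super().__init__(name)
        self.films_directed = films_directed


class Movie:
    def __init__(self, title, director_type=Person):
        self.title = title
        self.director_type = director_type  # Person or Director
        self.directors = []

    def add_director(self, person):
        # Mimics the type check: reject values of the wrong type.
        if not isinstance(person, self.director_type):
            raise TypeError(f"expected {self.director_type.__name__}")
        self.directors.append(person)


loose = Movie("Movie One")              # directors edge typed as Person
strict = Movie("Movie Two", Director)   # directors edge typed as Director

loose.add_director(Person("Jorge"))     # any Person is accepted
try:
    strict.add_director(Person("Jorge"))
    rejected = False
except TypeError:
    rejected = True                     # a plain Person is refused
strict.add_director(Director("Jorge", films_directed=3))
```

With the stricter edge, code reading `directors` can rely on `films_directed` being present, which is the reason given above for preferring the Director type when its attributes matter.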
