From Neo4j to Dgraph

Hi everyone!
We are using Neo4j in production, but we are facing some scalability problems (and some stability problems too… yesterday our Neo4j cluster lost some of its pods 6 times due to CPU usage…).
Well, I just discovered Dgraph yesterday, and it looks very interesting!
I have never played with GraphQL before, so if you have time to help me, it would be much appreciated.

  • In Neo4j, we have big graphs (max depth: ~10), and sometimes we need to find all the types, which are dynamic. In Neo4j, we were using types as labels, and we just had to do (start)-[:RELATIONSHIP *]->(end) to iterate over the whole graph. Is it possible to do this in GraphQL?
  • For dynamic types like that, is it better practice to dynamically add the schema to Dgraph if it is not present and then add the data? (I think it's not… but we were doing this in Neo4j with labels.)
  • We had performance issues in Neo4j because we have a large number of inputs (sometimes 100k in less than an hour), and we have to merge them… so taking some locks, etc. Neo4j was literally dying. Would it be better in Dgraph?

Thanks for your help,
Gautier

In Dgraph there is a @recurse query directive that allows arbitrary-depth queries. You must tell it which relationships to follow though, so if you do not know what they are, that may be a problem.

But depth 10 is fine, no inherent worries there unless you are returning millions of values maybe.

https://dgraph.io/docs/query-language/recurse-query/
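
A minimal sketch of what such a recurse query could look like. The starting UID 0x1 and the edge name has_asset are placeholders I made up for illustration, not names from your data:

```
{
  # Start from one known node; 0x1 is a placeholder UID.
  graph(func: uid(0x1)) @recurse(depth: 10, loop: false) {
    uid
    dgraph.type   # the type(s) of every node reached
    has_asset     # hypothetical edge; list every edge you want followed
  }
}
```

Every predicate listed in the block is applied at each level, so the edges to traverse have to be named up front. That is the limitation mentioned above.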

GraphQL is more for the web developer as middleware. You should look into DQL, the Cypher equivalent.

J

Oh missed the part where OP said GraphQL - yea don’t do that coming from neo4j - stick with DQL, which is only graphql-ish

Thanks for your advice. Still learning about Dgraph.
In Neo4j we were doing multi-labeling on assets. In Dgraph you describe a schema for an asset, but what if an asset has two types?
For example, a Lion is an Animal but also a Feline. Is that good practice in Dgraph?

A node in Dgraph can have multiple types, that is no problem. A node can also have predicates that do not belong to any type. In fact, you could have a schemaless design in Dgraph using DQL, where predicate data types are derived from their first use, and later, if wanted, you combine these predicates into types or just add indexes on them to improve search capabilities. Without defined types, things like expand won't work though, because it uses the schema to know which predicates to query. Bottom line: if a node is of one type, there is nothing in Dgraph (using DQL) to stop it from having predicates (aka properties and edges) of another type, or not defined by any type.
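
As a rough sketch of the Lion example, a mutation in RDF format could assign both types to the same node. The predicate name `name` is an assumption for illustration:

```
{
  set {
    _:lion <dgraph.type> "Animal" .
    _:lion <dgraph.type> "Feline" .
    _:lion <name> "Lion" .
  }
}
```

A query can then select on one type and filter on the other:

```
{
  felineAnimals(func: type(Animal)) @filter(type(Feline)) {
    uid
    name
    dgraph.type
  }
}
```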

Thanks for your help.
Also, in Neo4j we had many, many types of relationships.
For context, we are doing network cartography, and every cartography had a node containing an Id, and that Id was the relationship type used to traverse the graph (in Neo4j).
→ This was the fastest way we found to traverse all the data of a graph (sometimes ~30k nodes) in Neo4j.
It looks very different in DQL. Do you have any pattern ideas for doing that? We need to know every time an asset has been detected, without having any duplicates.

Also, we had to traverse the whole graph for 2 reasons: to get all the types in the graph, and to draw the whole graph. I think with aggregation we can get all the types from the children without traversing the whole graph, no?

Could you share what you did in Neo4j vs what you have tried in Dgraph? You can get the types with aggregations, or with schema queries if you want to get all possible predicates and supertypes, which would be much quicker.
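
For the schema-query route, a minimal example looks like the following. It returns every predicate and type defined in the cluster without touching the data:

```
schema {}
```

To look at a single type instead, the query can be narrowed. The type name Port here is only an assumed example:

```
schema(type: Port) {}
```

Note that this only reports types that have been defined in the schema. A dgraph.type value that was written to nodes without a matching type definition would not show up here.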

Thanks for your help.
Here's a "simple" schema I just created which covers all our use cases in Neo4j. Please keep in mind that after Port we still have many different types of data; the graph gets bigger and bigger after Port.

So here, for SuperUser/User/Team, I have no problem using Dgraph.

One of our problems here (in Neo4j) is that Entity has many nodes connected to it, so every time we add data to it (with merge, and constraints, to avoid duplicates), we get very bad performance in Neo4j, the CPU load is at 100% all the time, and sometimes data is dropped.
We would like to cover all our use cases to see if Dgraph would be a better option than Neo4j.

What is not clear for now is:

  • As you can see, we have some Cartography nodes, each of which represents a cartography at a point in time. These nodes contain a value which represents the relationship label in our graph, and we use it to traverse the graph and get the data at a given time. What do you think about it? We had to do that in Neo4j instead of storing a key in the relationship, for speed. Should we use the same technique with Dgraph? Is it good practice with your DB's architecture?

  • To filter on our graph, and because we have many different kinds of nodes, we have to get all the nodes in the graph after the Entity, then get all their labels and count how many different values there are for a type. For example, for Port, you may have 22, 443 and 80 (so 3 Ports to filter on). This query is very slow in Neo4j, and I think it would be better to use aggregation. What do you think? Is it even possible to use aggregation for this use case?

I still think like a Neo4j user, and not from a Dgraph point of view, so before testing requests I would like to think about how to architect my data in Dgraph to optimise speed.
Thanks for your time, very much appreciated.

yea seems possible to me - maybe you can paste the cypher query and the dgraph query that you have attempted. Assuming you go from user->entity->on, the DQL query would be something like:

(by the way, I have not run this since I don't have your data or really your request pattern, but this may give you some ideas)

{
  var(func: type(User)) @filter(eq(<user.name>,"Jimbo")) { # type() function just means dgraph.type=="User"
    user.name # guessing on field names here, nothing special about this.
    has_entity { # i do not know how many you expect here but could be many I am guessing
      entity.name
      has_collections {
        assetsVar as has_asset # store children of has_asset in the variable assetsVar
      }
    }
  }
  allAssets as var(func: uid(assetsVar)) @recurse(depth: 5) { # pick a reasonable max depth
     # include here the edge(s) to recurse over
     has_asset @filter(type(Port))
  }
  ports(func: uid(allAssets)) @groupby(<asset.port>) {
    count(uid) # should return a map of asset.port:count
  }
}