Graph Data Science on Dgraph

Robin_Andersen · November 25, 2021, 9:18pm

Hi folks!

We are a startup building a social learning tool based on a graph database. We need a sufficiently good recommendation system, and we need to set up an API that serves predictions. Like many others, we are choosing between two graph databases: Neo4j and Dgraph (we will use GraphQL regardless).

As the data science components (search algorithms, predictions) are a core part of the product, we are very tempted to choose Neo4j because of its Graph Data Science Library. However, we like the “GraphQL all the way” aspect of Dgraph, as well as the fact that Dgraph is written in Go (vs. the slower Java/Scala for Neo4j). On top of that, Neo4j's pricing is most likely steeper.

We saw that the data science library for Dgraph is very limited at the moment. So we wonder if there is a smooth way to incorporate graph algorithms into Dgraph queries. More generally, do you have any tips on how to build a recommendation system without much fuss and/or extra cost (except for hosting, given the need for some hefty parallelization)?

Best regards!

EnricoMi · November 27, 2021, 11:32am

I think the current best way to run graph algorithms on Dgraph data is to load your graph / sub-graph into Spark and use one of the existing Spark graph algorithms libraries.

There is not much (graph) computational power implemented in Dgraph, and it does not look like this is on the roadmap. That is fine, given that you can couple a system that is great at storing and querying graph data with a system that is great at big data and graph processing.
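As a rough illustration of the "load your graph / sub-graph" step, the sketch below flattens a nested DQL query result into (src, predicate, dst) triples that an external tool can consume. It assumes the response has Dgraph's usual `{"data": {...}}` JSON envelope. The query block name `q`, the predicate `follows`, and the helper names are hypothetical.

```python
import json
from typing import Iterator

Edge = tuple[str, str, str]  # (source uid, predicate, target uid)


def iter_edges(node: dict) -> Iterator[Edge]:
    """Yield every uid-to-uid edge reachable from one node of a DQL result."""
    src = node.get("uid")
    for pred, value in node.items():
        # A predicate may hold a single object or a list of objects/scalars.
        children = value if isinstance(value, list) else [value]
        for child in children:
            # Only nested objects carrying a uid are edges; scalars are attributes.
            if isinstance(child, dict) and "uid" in child:
                if src is not None:
                    yield (src, pred, child["uid"])
                # Recurse so multi-hop expansions are flattened too.
                yield from iter_edges(child)


def edges_from_response(raw: str, block: str = "q") -> list[Edge]:
    """Parse a JSON response and return de-duplicated edges in first-seen order."""
    payload = json.loads(raw)
    edges: list[Edge] = []
    for root in payload["data"][block]:
        edges.extend(iter_edges(root))
    return list(dict.fromkeys(edges))


raw = ('{"data": {"q": [{"uid": "0x1", "name": "Ann", '
       '"follows": [{"uid": "0x2", "follows": [{"uid": "0x3"}]}]}]}}')
print(edges_from_response(raw))
# [('0x1', 'follows', '0x2'), ('0x2', 'follows', '0x3')]
```

You could turn an edge list in this shape into a Spark DataFrame with `src` and `dst` columns, which is the edge layout GraphFrames expects. That step is not shown here.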

Juri · November 27, 2021, 9:57pm

You can calculate amazing stuff with just DQL. No extra library needed. Everything works out of the box thanks to awesome DQL features like value variables, which can also be summed up.

Check out that tutorial.

You will be amazed by how f*cking easy it is to build an awesome, reliable recommendation system with Dgraph in just a few lines.
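Juri's suggestion is to do the scoring inside DQL, using value variables that can be aggregated. The sketch below is plain Python, not DQL and not the tutorial's code. It only shows the kind of neighborhood counting such a query would accumulate: what did people who liked the same things as this user also like? The `likes` data and the `recommend` helper are made up for illustration.

```python
from collections import Counter


def recommend(likes: dict[str, set[str]], user: str, k: int = 3) -> list[tuple[str, int]]:
    """Rank items `user` has not liked, weighting each by how many likes the
    recommending user shares with `user` (simple co-occurrence scoring)."""
    seen = likes.get(user, set())
    scores = Counter()
    for other, items in likes.items():
        if other == user:
            continue
        overlap = len(seen & items)  # shared likes = similarity to `user`
        if overlap == 0:
            continue
        for item in items - seen:
            scores[item] += overlap  # weight each candidate by that similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


likes = {
    "ann": {"graphs", "spark", "go"},
    "bob": {"graphs", "spark", "dql"},
    "cid": {"go", "rust"},
    "dan": {"java"},
}
print(recommend(likes, "ann"))  # [('dql', 2), ('rust', 1)]
```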

Robin_Andersen · November 28, 2021, 11:28pm

Thank you very much. That makes sense, and it is in fact pretty much the same approach as using the Neo4j Graph Data Science Library (GDS) with Neo4j, since Apache Spark and Neo4j GDS play pretty much the same role. (Spark is even better in terms of existing algorithms, but the concept is the same.)

My only concern is that we are building a knowledge graph. That means we need embeddings that are always up to date with the data currently stored in Dgraph, i.e. the current graph. In the best case, we aim to build a system that can host millions of users. So we need to be able to mutate the graph with embeddings continuously, at a potentially very high frequency. Do you think this would be a bigger problem with this solution (Spark + Dgraph) than with the Neo4j version?

Robin_Andersen · November 28, 2021, 11:41pm

Thank you!

This approach may absolutely be useful for some of the basic things we aim to do, but we will most likely need some more complex or clever algorithms to build what we have in mind. The reason is that we need to create a very good knowledge graph, which most likely needs embeddings that humans cannot interpret (i.e. we most likely need clustering algorithms, GraphSAGE, Personalized PageRank, etc.). I really don't want to implement these algorithms from scratch in DQL.

Disclaimer: my answer might be due to a lack of knowledge of the features of, and opportunities in, DQL, so I'd be happy to be corrected.
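Personalized PageRank, one of the algorithms mentioned above, is a PageRank variant whose random walk restarts at a chosen seed node (e.g. the current user). Its scores therefore measure proximity to that user. Below is a minimal single-machine power-iteration sketch, assuming the graph fits in memory as an adjacency dict (for example, edges exported from Dgraph). The damping factor 0.85 and sending dangling-node mass back to the seed are conventional choices, not anything specified in this thread. Library implementations (for example in Neo4j GDS or Spark graph libraries) take care of scaling this.

```python
def personalized_pagerank(
    adj: dict[str, list[str]],
    seed: str,
    alpha: float = 0.85,
    iters: int = 100,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Power iteration for Personalized PageRank with restarts to `seed`."""
    nodes = set(adj) | {m for out in adj.values() for m in out} | {seed}
    rank = {n: 0.0 for n in nodes}
    rank[seed] = 1.0  # all probability mass starts at the seed
    for _ in range(iters):
        # With probability 1 - alpha the walk restarts at the seed.
        nxt = {n: 0.0 for n in nodes}
        nxt[seed] = 1.0 - alpha
        for n, r in rank.items():
            out = adj.get(n, [])
            if out:
                share = alpha * r / len(out)
                for m in out:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seed so the total stays 1.
                nxt[seed] += alpha * r
        delta = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank


adj = {"a": ["b"], "b": ["c"], "c": ["a"]}
scores = personalized_pagerank(adj, seed="a")
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c']
```

Restarting at the seed, rather than teleporting uniformly, is what makes the ranking user-specific. Running it once per seed is exactly the cost that makes a distributed library attractive at millions of users.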

Dave_Aitel · November 29, 2021, 7:34pm

I’ll be honest: I think the right path here is Neo4j. To do this in Dgraph, what you’ll do (which is what I’ve done) is run the computations in Apache Spark, write those tables out as JSON files, and then write an importer that loads them into your Dgraph. Also, a lot of the clustering and other algorithms you want are not implemented in Apache Spark.
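The "write the tables out as JSON, then import them" step can be as small as turning a score table into Dgraph JSON mutation bodies. A minimal sketch, assuming the scores are keyed by existing Dgraph UIDs and that `pagerank` is a hypothetical float predicate already declared in the schema. The helper name is also hypothetical. Batching keeps each mutation small.

```python
import json
from typing import Iterator


def batched_set_mutations(
    scores: dict[str, float], predicate: str = "pagerank", batch_size: int = 1000
) -> Iterator[str]:
    """Yield JSON mutation bodies of the form {"set": [{"uid": ..., predicate: ...}]}.

    Assumes every key is an existing Dgraph uid (e.g. "0x1") and that
    `predicate` (hypothetical) is a float predicate already in the schema.
    """
    items = sorted(scores.items())  # deterministic order, easier to debug/retry
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        yield json.dumps({"set": [{"uid": uid, predicate: value} for uid, value in chunk]})


for body in batched_set_mutations({"0x2": 0.5, "0x1": 0.25, "0x3": 0.25}, batch_size=2):
    print(body)
```

You could then post each body to Dgraph's HTTP `/mutate?commitNow=true` endpoint with `Content-Type: application/json`, or send it through one of the official clients. Check the documentation for your Dgraph version before relying on either.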

lee.chen · November 30, 2021, 2:14am

This blog is great.

Robin_Andersen · November 30, 2021, 11:34am

Thank you, sir!

We will still see. We might still need the speed advantage and/or the better scalability of Dgraph over Neo4j, so we might end up building our own set of models with TensorFlow/Keras instead. But yes, it seems that Neo4j is the way to go for now because development is simpler. We can just run simulations to test for latency and scaling problems, and that should be a fairly easy job.
