There is little point in collecting [data] if you are not going to use it.
– @mikehawkes, Curve Jumping - Dgraph to the front - Dgraph Day 2021
Are you still around, Mike? I had another conversation today about using Dgraph for OLAP purposes.
Dgraph is great for building PoC applications, especially in a GraphQL framework where Dgraph generates the queries, mutations, and much of the building blocks from just your type definitions. But when it comes to analyzing the data, it seems hard to do that within Dgraph itself.
I am interested in approaches for using Dgraph for analytics, especially on larger data sets built on top of large data models.
Within the last two years, I had to come to grips with a data model that was spread across thousands of tables in an MS SQL database. The developers of the system had built tools to help find referenced data, but thought it unnecessary to add any FKs. As hard as it is for many to write SQL queries joining dozens of tables together, imagine doing that with only PKs and indexes, and no FKs, across thousands of tables. How do you even begin to know which tables can be linked to each other?

I took the metadata of the model itself, the tables and columns, and imported it into Dgraph. But I quickly realized that the toolset I needed to analyze the graph was missing. So I took the data and created my own data set that I could load into a graph visualizer, and from within that small Node.js application I could group and move the graph around, trying to make sense of it. I connected similarly named fields together, then grouped related tables, and my understanding of the data model rapidly improved.
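To make that concrete, the extraction was roughly along these lines (a simplified sketch, not my actual tooling; it assumes pyodbc against MS SQL's INFORMATION_SCHEMA, and the connection string and predicate names like `Table.columns` are placeholders I made up for this example):

```python
# Sketch: dump MS SQL table/column metadata as Dgraph-loadable JSON.
import json

import pyodbc

# Placeholder connection details.
CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)

METADATA_SQL = """
    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    ORDER BY TABLE_NAME, ORDINAL_POSITION
"""

def export_model(outfile: str = "model.json") -> None:
    rows = pyodbc.connect(CONN_STR).execute(METADATA_SQL).fetchall()

    # Two node types, Table and Column, linked by a Table.columns edge.
    # Blank-node uids (_:...) let the loader stitch the references together.
    tables: dict = {}
    for table_name, column_name, data_type in rows:
        node = tables.setdefault(table_name, {
            "uid": f"_:table-{table_name}",
            "dgraph.type": "Table",
            "Table.name": table_name,
            "Table.columns": [],
        })
        node["Table.columns"].append({
            "uid": f"_:col-{table_name}-{column_name}",
            "dgraph.type": "Column",
            "Column.name": column_name,
            "Column.dataType": data_type,
        })

    with open(outfile, "w") as f:
        json.dump(list(tables.values()), f, indent=2)

if __name__ == "__main__":
    export_model()
```

From there, `dgraph live -f model.json` (or a single mutate call) gets it in easily enough. Getting useful analysis back out was the hard part.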
That was a lot of work just to better understand a fairly simple two-type data set of tables and columns. But now, if I were to explode that with all of the data in the massive MS SQL database, how would I even begin to do analytics on it? I know there has been some talk around LLMs, but I don't think I want to go that deep into it, just the surface layer of analytics. Dgraph makes it easy to ingest data and query it back out in the same shape it went in, but where I see it lacking is in being able to transform that data into usable analytics within the database itself.
Or is the answer to select the data you want to analyze and export it into a different platform built for analyzing data, such as Zoho, Qlik, Power BI, Jupyter, etc.?
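If export is the answer, the path of least resistance I can see is pulling a flat projection out with pydgraph and handing it to pandas in a Jupyter notebook. A minimal sketch, reusing the made-up Table/Column predicates from above (endpoint and query are illustrative):

```python
# Sketch: query Dgraph with pydgraph, flatten into a DataFrame, analyze there.
import json

import pandas as pd
import pydgraph

QUERY = """
{
  tables(func: type(Table)) {
    table: Table.name
    cols: Table.columns {
      column: Column.name
      dataType: Column.dataType
    }
  }
}
"""

stub = pydgraph.DgraphClientStub("localhost:9080")
client = pydgraph.DgraphClient(stub)

txn = client.txn(read_only=True)
try:
    tables = json.loads(txn.query(QUERY).json)["tables"]
finally:
    txn.discard()
stub.close()

# One row per column, with its table name carried along; from here it
# is ordinary pandas instead of fighting the query language.
df = pd.json_normalize(tables, record_path="cols", meta=["table"])
print(df.groupby("dataType").size().sort_values(ascending=False))
```

That works, but at that point Dgraph really is just the data store, which is exactly what I am questioning.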
I just find it odd that Dgraph does not do much in the way of analytics by itself, other than very basic selection with limited filtering, sorting, and aggregating. And even then you still can't sort an aggregated list. Does anybody do analytics within Dgraph itself, or do you just use Dgraph as a data store and turn elsewhere to “use it”?
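For what it's worth, the closest I have found to sorting an aggregate inside Dgraph is DQL value variables, which do let you order by a computed count; nothing comparable seems to be exposed through the generated GraphQL aggregate queries, as far as I can tell. A sketch against the same made-up predicates:

```python
# Sketch: DQL value variables can aggregate and then sort by the
# aggregate -- about the extent of "analytics in the box" I have found.
import json

import pydgraph

# Count each Table's columns into a value variable, then read the
# tables back out ordered by that count.
WIDEST_TABLES = """
{
  var(func: type(Table)) {
    cols as count(Table.columns)
  }
  widestTables(func: uid(cols), orderdesc: val(cols), first: 20) {
    name: Table.name
    columnCount: val(cols)
  }
}
"""

stub = pydgraph.DgraphClientStub("localhost:9080")
client = pydgraph.DgraphClient(stub)
txn = client.txn(read_only=True)
try:
    print(json.loads(txn.query(WIDEST_TABLES).json)["widestTables"])
finally:
    txn.discard()
stub.close()
```

That covers one hand-written question at a time, though, not the kind of exploratory analytics I am after. I would love to hear how others approach this.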