Data Analytics Use Cases

amaster507 · June 30, 2023, 3:10am

There is little point in collecting [data] if you are not going to use it.

– @mikehawkes, Curve Jumping - Dgraph to the front - Dgraph Day 2021

Are you still around, Mike? I had another conversation today about using Dgraph for OLAP purposes.

Dgraph is great for building PoC applications, especially in a GraphQL framework where you can generate types and many of the building blocks from just your raw types. But when it comes to analyzing the data, it seems to me that doing so within Dgraph itself becomes hard.

I am interested in approaches on how to use Dgraph for analytics, especially for larger data sets built upon large data models.

Within the last 2 years, I had to better understand a data model that was spread across thousands of tables in an MS SQL DB. The developers of the system had built tools to help find referenced data, but thought it unnecessary to add any FKs. As hard as it is for many to write SQL queries joining dozens of tables together, imagine doing that with only PKs and indexes, but without FKs, across thousands of tables. How do you even begin to know which tables can be linked to each other? I took the metadata of the model itself (the tables and columns) and imported it into Dgraph, but quickly realized that the toolset I needed to analyze the graph was missing. Instead, I took the data and created my own data set that I could load into a graph visualizer, and from within that small Node.js application I could group and move the graph around, trying to make sense of it. I connected similarly named fields together and then grouped related tables together, and my understanding of the data model rapidly improved.
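For anyone wanting to try the same idea, here is a minimal sketch of linking tables by shared column names and then grouping the connected tables. It is not the original Node.js tool; the table metadata, the function names `link_tables_by_column` and `group_tables`, and the case-insensitive name match are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata: table name -> column names (not from the real database).
TABLES = {
    "Customer": ["CustomerID", "Name"],
    "Invoice": ["InvoiceID", "CustomerID", "Total"],
    "InvoiceLine": ["InvoiceLineID", "InvoiceID", "ProductID"],
    "AuditLog": ["AuditLogID", "Message"],
}


def link_tables_by_column(tables):
    """Return {(table_a, table_b): shared column names} as candidate links."""
    owners_by_column = defaultdict(set)
    for table, columns in tables.items():
        for column in columns:
            # Assumption: identical names (ignoring case) hint at a relationship.
            owners_by_column[column.lower()].add(table)
    links = defaultdict(set)
    for column, owners in owners_by_column.items():
        ordered = sorted(owners)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                links[(first, second)].add(column)
    return dict(links)


def group_tables(tables, links):
    """Group tables into connected components of the candidate links."""
    neighbours = defaultdict(set)
    for first, second in links:
        neighbours[first].add(second)
        neighbours[second].add(first)
    seen, groups = set(), []
    for start in sorted(tables):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            table = stack.pop()
            if table in group:
                continue
            group.add(table)
            stack.extend(neighbours[table] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


links = link_tables_by_column(TABLES)
groups = group_tables(TABLES, links)
```

Name matching only produces candidates; generic column names such as `Name` or `ID` would link unrelated tables and need to be filtered out by hand.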

That was a lot of work just to better understand a somewhat simple 2 type model dataset of tables and columns. But now, if I were to explode that with all of the data in the massive MS SQL database, how would I even begin to do analytics on it? I know there were some talks around LLMs, but I don’t know if I want to go that deep into it, just the surface layer of analytics. Dgraph makes it easy to ingest data and query that data back out in the same shape it went in, but where I see a lot lacking is in being able to transform that data into usable analytics within the database itself.

Or is the answer to select the data you want to analyze and export it into a different platform that is built for analyzing data, such as Zoho, Qlik, Power BI, Jupyter, etc.?

I just find it odd that Dgraph does not do much in the way of analytics by itself other than the very basic kind of selection with limited filtering, sorting, and aggregating. And then you still can’t sort an aggregated list. Does anybody do analytics within Dgraph itself, or just use Dgraph as a data store only and turn elsewhere to “use it”?

mikehawkes · June 30, 2023, 9:25am

Tough one for me to answer. We invested a lot of time and energy in DGraph and I pushed so that our company became the first technology partner for DGraph. We did joint PR and announcements to that effect.

I had a team developing an entirely new way of managing data that was underpinned by DGraph. Just before DGraph imploded, everyone stopped talking to us - radio silence. Just after Manish’s departure, we reached out to both DGraph’s new management team, and Manish to see if we could help - we had a full team of developers and had spent a fortune getting our technology built.

Luckily, we had built an abstraction tier that allowed us to migrate and continue our build. It cost us, but we did it. To this day, DGraph has been on radio silence for us. I’ve given up with it.

Anyway - that doesn’t help with your problem. As you point out, DGraph and analytics make poor bed-partners out of the box. We built our own toolkit to allow us to group, aggregate, sort within these sets and groups, and also find distinct values (SELECT DISTINCT in SQL terms). Graphs generally suffer from this kind of issue - and struggle with things like totalling values for sub-entities without kicking off recursive queries that can get stupidly complex.
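As a rough illustration of what such a toolkit has to do outside the database (not the toolkit described above, which is not public in this thread), here is a minimal sketch that groups, aggregates, sorts the aggregated groups, and finds distinct values over rows already pulled out of a graph query. The row shape and the function names `group_sum`, `sort_groups`, and `distinct` are assumptions.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the flattened result of a graph query.
ROWS = [
    {"customer": "Acme", "region": "EU", "amount": 120.0},
    {"customer": "Acme", "region": "EU", "amount": 80.0},
    {"customer": "Bolt", "region": "US", "amount": 50.0},
    {"customer": "Core", "region": "EU", "amount": 300.0},
]


def group_sum(rows, key, value):
    """Group rows by `key` and total the `value` field per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def sort_groups(totals, descending=True):
    """Sort aggregated groups by their total (the step done client-side here)."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=descending)


def distinct(rows, key):
    """Return distinct values of `key` in first-seen order (SELECT DISTINCT)."""
    return list(dict.fromkeys(row[key] for row in rows))


totals = group_sum(ROWS, "customer", "amount")
ranked = sort_groups(totals)
regions = distinct(ROWS, "region")
```

This works for result sets that fit in memory; for larger data the same steps are what an external analytics platform would take over.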

We are the only Qlik Elite Solutions Provider in Ireland and created interfaces to allow us to visualise DGraph data using Qlik. Back in our DGraph era, we’d ingest into DGraph and run various scripts (as seen in the videos) to allow us to create relevant relationships, then pull that into either D3 (for our custom code) or Qlik (for exploration and discovery).

There are a couple of other things to note with DGraph from an operational perspective, based on our experience - over time, it gets messy and performance drops. To keep (I presume Badger) happy, we need to export the entire database and its schemas and then re-import it. Our instances of DGraph created many SST files over time, and the more of these it created, the more RAM and CPU it needed - we automated dump and re-import as we needed to do it often.

I know there’s a lot of past tense in this reply - and I did wonder if I should respond at all, but since you asked if I was still lurking …

Anyway, for us, I couldn’t accept radio silence - especially as we’d invested so much in our development activities and built a dedicated sales and support team to help DGraph grow in the EU. It made no commercial sense for us to continue with DGraph. We quietly dropped it as a technology, and we’re just about to close down our last service built around it. It’s cost us 2 years, but we’ve built our own platform to do the type of data management we require - with analytics capabilities!

You might find things like an entgo and Postgres combination give you exactly what you need … I’ll pop the links below. Interestingly, in the chats for entgo, I came across a post where Manish was asking whether there was an interest in DGraph/ent support. Ent has a small learning curve, but it works graphs well - and sits on top of Postgres / MySQL et al.

See “Roadmap for v1 · Issue #46 · ent/ent” on GitHub for Manish’s comments back in 2019.

Sorry it’s not a happier tale.

amaster507 · June 30, 2023, 2:03pm

Whoa!

Thank you so much for taking the time to respond and for your transparency. I was asked who would be the expert for data analytics within Dgraph, using it for an OLAP-style architecture and not just for building a real-time app, and you came to mind from your Dgraph Day talk.

I have recently experienced this too.

Yes, exactly the same.

I’ll check out ent, but I don’t develop in Go, so while it is interesting, it may not be applicable.

mikehawkes · June 30, 2023, 2:39pm

No problem - it’s a real shame as I love the technology - and would have considered supporting it in other ways, if we’d been given the opportunity.

If you need a hand here or there, I may be able to point you in the right direction (he said, modestly - my modesty’s my strongest point). Bottom line is DGraph has the same issues as any other graph database from an analytics perspective.

Some of our common bits:

We added edges to help us group and aggregate - avoid facets for this type of activity as you tend to trip over yourself later.

We also found it useful to use the target node type as a component of the name in edges - it makes analytics much more specific and lets you have explicit queries following known node/edge paths. Invoice_LineItems and Basket_LineItems can both ‘point’ to LineItem types, but you know absolutely what you’re asking the analytics tools to pull out.

We also did explicit definitions for reverse edges - every edge was defined - reverse edges can lose their meaning when you try to do a reverse-edge lookup in something like Qlik. Much better to define them both explicitly.

Again, from a distinct perspective, we found it better to use intermediate nodes with edges pointing to the ‘real nodes.’ We could query the intermediates, and still get to the real nodes, if needed.
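A minimal sketch of how these conventions might fit together, using plain in-memory triples rather than any Dgraph API. Every name here (`add_edge`, `follow`, `distinct_subjects`, the reverse edge names, and the `category:` intermediate nodes) is hypothetical, and the intermediate-node part is one reading of the tip above rather than a confirmed design.

```python
# Hypothetical in-memory triples: (subject, edge name, object).
triples = []


def add_edge(store, subject, forward, reverse, target):
    """Store an edge together with its explicitly named reverse edge."""
    store.append((subject, forward, target))
    store.append((target, reverse, subject))


def follow(store, subject, edge):
    """Return the targets reached from `subject` along one named edge."""
    return [obj for subj, name, obj in store if subj == subject and name == edge]


def distinct_subjects(store, edge):
    """Return the distinct nodes that have at least one `edge` going out."""
    return sorted({subj for subj, name, _ in store if name == edge})


# Edge names carry the target type, so each query path is explicit.
add_edge(triples, "invoice1", "Invoice_LineItems", "LineItem_Invoice", "item1")
add_edge(triples, "basket1", "Basket_LineItems", "LineItem_Basket", "item1")
add_edge(triples, "basket1", "Basket_LineItems", "LineItem_Basket", "item2")

# Assumed intermediate nodes: one per distinct value, pointing at the real nodes.
add_edge(triples, "category:tools", "Category_LineItems", "LineItem_Category", "item1")
add_edge(triples, "category:tools", "Category_LineItems", "LineItem_Category", "item2")
add_edge(triples, "category:toys", "Category_LineItems", "LineItem_Category", "item3")

basket_items = follow(triples, "basket1", "Basket_LineItems")
item_invoices = follow(triples, "item1", "LineItem_Invoice")
categories = distinct_subjects(triples, "Category_LineItems")
```

Listing the intermediate nodes gives the distinct values directly, and following `Category_LineItems` from any of them still reaches the real line items.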

Probably teaching you to suck eggs here - apologies if that’s the case.

Best of luck with your project.
Mike

Raphael · July 3, 2023, 3:56pm

@mikehawkes I learned your story from this post, and it is regrettable. The only positive point is that you are not the only one who ‘loves the technology’; there is now a new, passionate team here at Dgraph.
I’m taking on product direction, and OLAP is a workload we want to eventually support (I come from a company that acquired Spotfire, so I have some notions of visual analytics myself).
Your post contains useful information for me. If any part of your work is open source, I’d like to study it too.
Thank you for your post to the Dgraph community, despite your past history. We are here to make Dgraph the product it deserves to be. It has very strong foundations, and your experience is helping us set directions.

Raphael.

amaster507 · July 6, 2023, 3:11am

@Raphael, FWIW, in light of this post, I think it would be best to replace this quote on your homepage with someone who is actually still using Dgraph in production. This could be considered a false/poor advertisement if potential customers come to this page and put the two together.

“Dgraph is a no-brainer. I can ingest any data and any structure, and I don’t have to worry about it. If I were trying to do this in a SQL table, we would end up with horrible joins and get tied up in knots. Dgraph is a godsend.”
Mike Hawkes
CTO, Capventis

Raphael · July 6, 2023, 4:14pm

Anthony, I don’t see the quote from Mike on https://dgraph.io/. Are you referring to another page?

amaster507 · July 6, 2023, 4:19pm

Raphael · July 6, 2023, 4:58pm

Thanks.
