Q1 Update from Dgraph HQ

tl;dr - we spent a lot of time steadying the ship. Dgraph has been through three major leadership changes and, frankly, operations were in very rough shape. Now things are much better and a major release is coming this quarter!

We spent a significant amount of effort toward getting Dgraph’s business and product fundamentals right. This included reducing burn/inefficiencies, securing new (unannounced) funding, improving underlying infra cost/reliability, and selecting the right team of engineers to dedicate full-time to Dgraph’s OS projects and commercial products. We also added a handful of large customers, which goes a long way in ensuring a bright future for Dgraph as a whole.

The fundamental promise of any database is reliability. As a result of the team’s effort, Dgraph Cloud is more stable than ever. We modernized most of the underlying infra and ensured that all GA releases of Dgraph will be available on Dgraph Cloud the same day.

This was no small effort and it means that we fell short of what I hoped we’d accomplish in Q1 in terms of enhancing Dgraph. But we’ve stabilized the ship and we’re nearly through our “back to basics” push. We’re now at a point where we can start introducing new enhancements.

Here’s what the Community can expect from Dgraph in Q2:

  1. Vector Support. Alpha release is this Friday; GA is expected in May as part of v24.

  2. Maintenance update(s) for performance and usability improvements starting with v24 (notably an upgrade from Golang 1.19 to 1.22).

  3. Embrace external contributions (PRs).

Thank you to the community for your continued support. To those who have messaged me here or via email, LinkedIn, etc.: your direct feedback is appreciated and helps keep me focused on what’s most important to this community.


If you’d like to give feedback on these plans, please read the expanded details below that I pulled together with the team:

Vector Support

The biggest and most impactful change in v24 will be the inclusion of vector support. The Alpha will be released this Friday (April 5, 2024). We believe the future of technology will be “AI-augmented” and we want Dgraph to be a meaningful part of that future.

Dgraph will support vector-based similarity searches, which surface inferred links and recommendations. To do this, Dgraph will support a new vfloat type, dot-product and related math operators, import/export and RDF representations of float arrays, and a tunable HNSW (Hierarchical Navigable Small World) index option for vfloat fields. As you would expect in Dgraph, the GraphQL APIs that are auto-generated from a minimal GraphQL schema will include similarity search functions based on both externally computed embeddings and the UIDs of existing entities. This unlocks a number of graph+vector use cases with RAG-based LLM queries.
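To make the two search modes above concrete (search by an externally computed embedding, and search by the UID of an existing entity), here is a minimal sketch in plain Python. It is not Dgraph's API or implementation: the function names, the in-memory dictionary of embeddings, and the choice of cosine similarity are all assumptions for illustration, and it scans every vector exactly, whereas an HNSW index answers the same question approximately without a full scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product normalized by both vector lengths."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / norm


def top_k_similar(
    query: Sequence[float],
    embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Exact (brute-force) search: score every stored vector, keep the best k."""
    scored = [(uid, cosine_similarity(query, vec)) for uid, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def similar_to_uid(
    uid: str,
    embeddings: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Search by an existing entity: use its stored vector as the query."""
    others = {u: v for u, v in embeddings.items() if u != uid}
    return top_k_similar(embeddings[uid], others, k)


# Hypothetical data: UIDs mapped to tiny two-dimensional embeddings.
EMBEDDINGS = {
    "0x1": [1.0, 0.0],
    "0x2": [0.9, 0.1],
    "0x3": [0.0, 1.0],
}
```

Calling `top_k_similar([1.0, 0.0], EMBEDDINGS, 2)` ranks `0x1` first and `0x2` second, and `similar_to_uid("0x1", EMBEDDINGS, 1)` returns `0x2` as the nearest neighbor. The brute-force scan costs one similarity computation per stored vector, which is the cost an approximate index is meant to avoid on large datasets.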

This Friday’s Alpha will come with a blog post and we look forward to early adopter feedback in GitHub. Assuming all goes well in the Alpha, the GA release will occur in May as part of Dgraph v24 (to be followed closely by Hypermode’s first public releases).

Maintenance update(s)

Dgraph v24 will have numerous performance and performance tuning improvements. The largest of these is an update to how caching is performed in Badger. It will vastly reduce cache invalidations to prevent expensive re-computation of internal data structures (reducing value conversion to posting lists, for those familiar with Dgraph internals). This eliminates CPU and memory management bottlenecks that can occur with some usage profiles.
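As a rough illustration of why fewer cache invalidations can save CPU, the sketch below compares two write strategies on a toy cache of decoded lists. Everything in it is hypothetical: the class, the comma-separated byte encoding, and the update-in-place strategy are assumptions chosen for the example, and the post does not describe the actual mechanism used in Badger or Dgraph.

```python
from typing import Dict, List


class PostingListCache:
    """Toy cache mapping a key to a decoded list of UIDs (hypothetical)."""

    def __init__(self, store: Dict[str, bytes]) -> None:
        self.store = store  # stand-in for the on-disk key-value store
        self.cache: Dict[str, List[int]] = {}
        self.decodes = 0  # counts the expensive value-to-list conversions

    def _decode(self, raw: bytes) -> List[int]:
        self.decodes += 1
        return [int(part) for part in raw.decode().split(",") if part]

    def _append_to_store(self, key: str, uid: int) -> None:
        raw = self.store.get(key, b"")
        encoded = str(uid).encode()
        self.store[key] = raw + b"," + encoded if raw else encoded

    def read(self, key: str) -> List[int]:
        """Return the decoded list, decoding only on a cache miss."""
        if key not in self.cache:
            self.cache[key] = self._decode(self.store.get(key, b""))
        return self.cache[key]

    def write_invalidate(self, key: str, uid: int) -> None:
        """Strategy A: drop the cached entry, forcing a re-decode on next read."""
        self._append_to_store(key, uid)
        self.cache.pop(key, None)

    def write_update(self, key: str, uid: int) -> None:
        """Strategy B: patch the cached entry in place so it stays valid."""
        self._append_to_store(key, uid)
        if key in self.cache:
            self.cache[key].append(uid)


def count_decodes(strategy: str, writes: int) -> int:
    """Interleave writes and reads on one key; return how many decodes ran."""
    cache = PostingListCache({"friend": b"1,2,3"})
    write = cache.write_update if strategy == "update" else cache.write_invalidate
    cache.read("friend")
    for uid in range(4, 4 + writes):
        write("friend", uid)
        cache.read("friend")
    return cache.decodes
```

With five interleaved writes and reads, `count_decodes("invalidate", 5)` performs six decodes while `count_decodes("update", 5)` performs one. Real systems also have to handle concurrency and transactional visibility, which this sketch ignores.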

Related to performance, v24 will be compiled using Golang v1.22, which includes better garbage collection and more complete monitoring, particularly around how mutex delays are monitored.

Various smaller improvements and fixes are also coming with v24, including code optimizations for simple ID-based get operations in GraphQL, the ability to update or alter ID fields, and other items that will be detailed at release time in the release notes.

We’re on track to put out a release candidate for v24 in the next 2-4 weeks.

Embrace external contributions

Our release cadence going forward will be to put out a maintenance release with new fixes and improvements at least once per quarter.

Going forward we’re going to embrace our external contributors. To start, we’ll review all existing PRs for inclusion. We have a large backlog, so most of Q2 will be about catching up. We’ll work with you to identify the top-priority PRs and merge them after the v24 GA release in May. We’ll review and merge (as appropriate) at least quarterly.

Thank you to everyone who has made changes and contributions. If you have particular PRs you would like to see in Dgraph, please let us know, plus any context about that PR that will be helpful.


Just wondering when or if any of these items might be completed:

I’m also interested in whether you have even seen this post yet, @KVG?


Thank you, as always, Anthony for helping curate these items. We really could not do this without your and others’ input and even strong push toward the right features. We do have all these items (and more) on our backlog, and like any team we will try to pick them in the best possible order.

As you know, we now have a strong AI product as well (see http://hypermode.com) which upended our prior list of priorities. We needed to immediately ensure Dgraph was a rock-solid, AI-capable component so it would be a good backing store and compatible component in the overall product suite (Hypermode+Dgraph) - hence the focus on Vectors, stability and performance for this release.

We also have large customers pushing the limits on how much performance they can get out of various cluster sizes and hardware. The golang profiles from them pointed us towards the caching and perf changes as the way to improve.

So this release is pretty much in the bag and is going to both open up completely new functionality for anyone using Dgraph who also needs to add AI to the product in a smart, easy way, and also enable greater scale at lower cost. I’m actually excited and think that was the right choice for this release.

But it does not mean that we won’t also focus on the features, improvements, docs and other items in the existing backlog - they just did not make v24 due to the re-prioritization needed to harmonize Dgraph and Hypermode stacks and improve scale and reduce cost for the largest customers.

Full disclosure, the entire backlog is going to shift again, at least somewhat, as we now have AI capabilities and expect new requests as people figure out how to use that best. We are still finding the best use cases for graph+AI which will demand new features, and some of those items will doubtless make it to the top and realistically have to push an existing high-priority item down.

Thank you particularly for your last item “introduction into the Dgraph codebase.” Everybody wants the features that match their own system and priorities, but it makes a big difference when people are also willing to help or contribute. As the new, dedicated team divides up work and system areas, we will do some knowledge transfer internally, so we can also document and maybe record videos or have working sessions as we do this ourselves. Your input on the best way to do this is most welcome (perhaps deserving a separate thread).

Separately, we welcome ideas on the new AI (vector) capabilities, and on how people can and want to use AI on top of new or existing Dgraph DBs. That will also feed into our priorities going forward.

With sincere thanks and appreciation,

Damon


Why would anyone want to use the AI portion when the validity of the model can’t be maintained?

Seriously, if you want me to poke it apart, someone spin up an instance with your best “non Enterprise” and I’ll gladly hack it apart for you.

I understood it took adding vector support to keep the cash flow, but that is an unfortunate turn of events IMHO.


At this point and with an AI company of your own, are we far from having an AI assistant that can help a user build and maintain a database?

Haha yes, eventually we’ll have agents that give us tested schemas, mutations and queries from prompts… It’s a matter of time, probably not too much. Blaze already does it for SQL!

I’m really excited for the maintenance updates; I’d love to see a list of the items slated for them so I can scour it for how they may assist our deployment, especially the performance-related ones. We’ve overhauled our schema for improved performance, but we’re still not hitting what we believe is the upper bound of Dgraph ingestion limits. I’m also excited about the external contributions. That’s a great way to keep the community involved. I posted previously about supporting concurrent transactions and how that would help our team; it would be a big performance boost. Overall, I look forward to the details of the performance improvements you have for the v24 release. Thanks for keeping to your commitment with a Q1 update!


lol, I love that meme. I used it a lot about 6 years ago. But I’ve changed my mind. Today’s AI is not the same as a pre-programmed Bot. LLMs are truly an interesting and brilliant concept. Using vector space to cluster semantics and derive mathematically precise probabilities, resulting in texts that closely approximate real-world facts, feels like some form of dark magic to me.

I was initially skeptical about GPTs. I tried “Replika” in 2018 and was underwhelmed. The concept was intriguing, but it didn’t captivate me. It was AI Dungeon in 2020 that started to pique my interest—I even discussed with Manish and others internally about the possibility of looking for someone in their circle of friends to help us potentially train on GraphQL+ with GPT2 and 3. Trying it out, I realized it wasn’t just any bot. But I only truly understood how it worked recently, in 2023. And indeed, it’s genius. There’s no reason to resist the tide. Is it based on the same principle as the word guess system on our phone keyboards? Yes, but it actually works!

My only concern is with this overwhelming hype. Everything is labeled AI now. My coffee maker has AI. The app for my condo’s lobby has AI. Every Chinese gadget now comes with an AI tag. This commercial overuse is foolish. It attempts to overshadow something truly impressive.


Yeah, I agree, we’ve reached a new level of AI. Perhaps it’s worth putting in the effort to add vectors to Dgraph. While most users may not have a use for it, there may be those few customers that need it and also provide the cash flow to keep the project going.

The maintenance stuff is the most exciting to me as well, and opening up for pull requests.


I believe the advantage here lies in the area of indexing. I’ve seen the feature in action; I just haven’t gotten hands-on with it. However, it seems like it will be highly beneficial, especially for comparison purposes. You’ll be able to compare nodes logically (of course, after training on your data and injecting the results into the database). Vector support is basic; you will need to either train a model yourself or use an external API for that. Dgraph will only handle vector storage and allow for querying at the root with filters. Essentially, I believe it facilitates “Vector Search”.

All signs point to it being incredibly useful - especially for AI devs. The only thing left to determine is how well it performs with large datasets. For smaller ones, it appeared to me to be very efficient.

And I don’t believe the database will come with an “agent”; I think this feature is more about providing support for vectors, positioning it to compete indirectly with Vector Databases like Chroma, Pinecone, etc.


It’d be nice to see a How-To / Tutorial / Short-form video about the new Vector Support in action.

Well, there’s v24.0.0-alpha2 (Release Dgraph v24.0.0-alpha2 · dgraph-io/dgraph · GitHub), so I guess we can get hands-on with it now.


Sounds nice. I’ll give it a try.

It would be really helpful if there was additional context provided for these bug fixes. I noticed this one: “fix(raft):alpha leader fails to stream snapshot to new alpha nodes” by shivaji-dgraph (Pull Request #9022 · dgraph-io/dgraph · GitHub), which references and closes an internal Hypermode ticket. I’d imagine there’s a lot more detail in that ticket to support why this bug fix is impactful and what situations it resolves. This detail would help users decide whether they want to upgrade to resolve existing issues.

Example bug fix: https://github.com/dgraph-io/dgraph/pull/9022
Description: Certain issues arise when a snapshot is taken by the alpha leader in a cluster and a new node joins afterwards.
Closes: https://linear.app/hypermode/issue/HYP-163/alpha-leader-fails-to-stream-snapshot-to-new-alpha-nodes


@rahst12 Sorry for the vague PR names. We will try to do better. This issue specifically is about adding new nodes to an existing cluster: sometimes the leader does not stream the snapshot to followers. Because of this, the new nodes are not able to serve the data. It’s a high-impact issue that only happens in certain situations.
