Graph + Vector vs Graph on Vector - Dgraph Blog

Graph + Vector vs Graph on Vector

The next version of Dgraph will support both graph and vector data structures to allow multi-model queries on graph, vector, text, and geospatial information.

We’re seeking Private Beta participants and partners.

With each technology platform shift, data management software has had to evolve to meet the needs of the new applications. The Web brought Big Data and mobile brought NoSQL; as the web hit scale, SQL had to make room for Big Data. With this latest wave of LLMs, modern apps now need vectors and graphs.

Recent Language Models (LMs) can interpret large chunks of unstructured data (documents) and encode each document as a vector. In plain language, these LMs can describe the contents of a document as a list of numbers. That list of numbers, called a vector, can then be compared to other vectors to find the most similar ones. This process is called vector search.
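As a rough illustration of the comparison step, here is a minimal sketch of brute-force vector search using cosine similarity. The document ids and the 3-dimensional vectors are made up for the example; real embeddings come from a model and typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def vector_search(query, documents, k=2):
    """Return the ids of the k documents most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical embeddings, invented for illustration only.
documents = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_taxes": [0.0, 0.2, 0.9],
}
nearest = vector_search([1.0, 0.2, 0.0], documents, k=2)
print(nearest)
```

This version compares the query against every stored vector, which is fine for a handful of documents but does not scale to large collections.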

The ability to quickly compare chunks of unstructured data unlocks dozens of previously impossible use cases. Things like recommendations, fraud detection, entity resolution, and natural language processing all become significantly easier.

As powerful as this is, a known limitation of LLMs is that they are only as current as the data they were trained on. Thus, the LLM is usually behind the current state of the data. This can be overcome by adding content that represents more current data to the question asked of the LLM.

We can iteratively store previously generated encodings in a database. Then, when asking the LLM to perform work, we can send along documents retrieved through those stored encodings, improving the context from which the LLM can respond.
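The store-then-retrieve loop can be sketched as follows. This is an assumption-laden toy: `EmbeddingStore` and `build_prompt` are hypothetical names, the store is an in-memory list rather than a database, and the 2-dimensional vectors are invented stand-ins for what an embedding model would return.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class EmbeddingStore:
    """Hypothetical in-memory stand-in for a database of (text, vector) rows."""

    def __init__(self):
        self.rows = []

    def add(self, text, vector):
        self.rows.append((text, vector))

    def top_k(self, query_vector, k):
        """Return the k stored texts whose vectors are closest to the query."""
        ranked = sorted(self.rows, key=lambda row: cosine(query_vector, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

def build_prompt(question, context_docs):
    """Prepend retrieved documents to the question sent to the LLM."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}"

store = EmbeddingStore()
store.add("Release notes: version 2 shipped last week.", [0.9, 0.1])
store.add("Office lunch menu for Friday.", [0.1, 0.9])

# Assumed to be produced by the same embedding model as the stored vectors.
question_vector = [0.8, 0.2]
prompt = build_prompt("What is the latest version?", store.top_k(question_vector, k=1))
print(prompt)
```

The LLM call itself is left out on purpose; the point is only that the prompt now carries data newer than the model's training set.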

This workaround has led to the emergence of vector-first databases like Pinecone, Weaviate, and Milvus, while established data stores like Postgres, Elasticsearch, Redis, and MongoDB have added support for storing vector embeddings and running similarity search.

Graph-based Indexes

In nearly all implementations of vector search for native vector databases and vector support in traditional databases, graphs and trees are used to index vectors. This graph structure enables efficient searches across very large sets of vectors.

Vectors are stored and then graphs are built as indexes on top of them in a Graphs-on-Vector (GOV) approach.
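To make the idea of a graph index concrete, here is a minimal sketch of a nearest-neighbour graph with a greedy walk. It is a simplified illustration, not any particular database's index: production graph indexes (HNSW is a well-known example) add layers, better entry points, and candidate lists. The function names and sample points are assumptions made for the example.

```python
import math

def build_knn_graph(vectors, k):
    """Link every vector to its k nearest neighbours (brute force, for illustration)."""
    graph = {}
    for i, v in enumerate(vectors):
        others = sorted((math.dist(v, w), j) for j, w in enumerate(vectors) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph

def greedy_search(vectors, graph, query, entry=0):
    """Walk the graph, always moving to the neighbour closest to the query."""
    current = entry
    current_dist = math.dist(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbour in graph[current]:
            d = math.dist(vectors[neighbour], query)
            if d < best_dist:
                best, best_dist = neighbour, d
        if best == current:
            # No neighbour is closer, so stop. The result is approximate:
            # on less friendly data the walk can end in a local optimum.
            return current
        current, current_dist = best, best_dist

# Six made-up 2-D points along a line, indexed 0..5.
vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
graph = build_knn_graph(vectors, k=2)
found = greedy_search(vectors, graph, query=(4.2, 0.0), entry=0)
print(found)
```

The walk only measures distances to the neighbours of the nodes it visits, which is why a graph index can avoid scanning every vector on very large sets.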

However, the GOV approach only partially harnesses the potential of graphs.

Graphs can be used to traverse multi-dimensional space to calculate adjacencies but, more importantly, they can represent structured data that explains the relationships between entities.

Very few real-world AI applications can function without composite queries that retrieve data from a database’s structured and vector components. Recognizing this, we adopted a Graph-and-Vector (GAV) approach. In this approach, both graph and vector are primary components in Dgraph, allowing for multi-model queries on graph, vector, text, and geospatial information.

This unlocks things like model-output explainability and hallucination identification, and requires fewer systems to manage. For example, there is no need to ping three databases to check “what should we recommend, is it in stock, and is it within 2-day shipping of the buyer?”
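The recommendation question above can be sketched as a single composite query over one dataset. This is plain in-memory Python, not Dgraph's query language or API; the `Product` type, the `SHIPPING_DAYS` table, and every value in it are hypothetical and exist only to show structured filters and vector ranking being evaluated together.

```python
import math
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    embedding: list   # vector component
    in_stock: bool    # structured attribute
    warehouse: str    # relationship to a warehouse node

# Hypothetical edge data: shipping days from a warehouse to a buyer region.
SHIPPING_DAYS = {("wh_east", "nyc"): 1, ("wh_west", "nyc"): 4}

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend(buyer_vector, buyer_region, products, max_days=2, k=1):
    """Filter on stock and shipping time, then rank the survivors by similarity."""
    candidates = [
        p for p in products
        if p.in_stock
        and SHIPPING_DAYS.get((p.warehouse, buyer_region), math.inf) <= max_days
    ]
    candidates.sort(key=lambda p: cosine(buyer_vector, p.embedding), reverse=True)
    return [p.name for p in candidates[:k]]

products = [
    Product("trail shoes", [0.9, 0.1], True, "wh_west"),    # too far to ship in 2 days
    Product("hiking boots", [0.85, 0.15], False, "wh_east"),  # out of stock
    Product("road shoes", [0.8, 0.2], True, "wh_east"),
    Product("blender", [0.1, 0.9], True, "wh_east"),
]
picks = recommend([1.0, 0.0], "nyc", products, max_days=2, k=2)
print(picks)
```

With separate systems, each of the three checks would be a round trip to a different store, and the application would have to join the results itself.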

Dgraph’s multi-model queries, ACID compliance, and scalability make it ideal as the transactional backend for AI applications.

We are excited about the new version of Dgraph being released soon. We are currently accepting Private Beta participants and partners for early access to the product preview, particularly our active customers and community members.


This is a companion discussion topic for the original entry at https://dgraph.io/blog/post/20231010-graph-plus-vector-vs-graph-on-vector
4 Likes

Glad to see Dgraph being positioned as the backend for transactional AI applications. Natively supporting vectors is the way to go.

Sounds cool and exciting (probably what your investors want to hear). But I am wondering: why are you not focusing on filling the known gaps that Dgraph currently has before you add completely new features?

2 Likes

I spent some time trying to see where the graph / vector work is happening in the core project codebase. Can you please point me to that?

Or is this happening in a private repo? If private, are you accepting help from OSS contributors?

I’m guessing it’s all private. KVG mentions Hypermode has been building in stealth, so it’s probably not just in the Dgraph codebase, but a new layer of infra that sits on top of it.

https://x.com/kevinvangundy/status/1721617965035270384?s=20

My understanding is that they’re giving early access to people who are interested, so you can probably DM KVG and ask.

TBH, it’s been crickets ever since I DM’d him and Dgraph support on these points. I am guessing they must be busy. Also, the YouTube live with the CTO/CEO was too abstract to even understand what they are up to. It was quite disappointing to see no clear plans (maybe my engineer brain needs that level of clarity). Dgraph Labs is becoming part of Hypermode - Dgraph Blog - #10 by honeybadger → someone please get back.

Coming back to the issue at hand, I have started playing with Zilliz / Pinecone and am already laying out my plans to migrate off Dgraph (if Dgraph chooses to continue to operate in silence). I feel like a database company should always open-source the code, to build trust! The vector work appeared to be a feature flag implementation in Dgraph when it was first announced. I don’t see a point in keeping it secret. It is starting to feel like there is no progress here, or there was no progress made to begin with.

Definitely frustrated … but still a little hopeful.

3 Likes

What I found interesting was the fact that the vector work was initially incorporated into Dgraph on GitHub. I had been monitoring a branch that included developments on vectors, and suddenly the branch was removed. I’m intrigued about what will happen next; perhaps the team needs some time to coordinate and adapt to the move to Hypermode.

1 Like

I don’t understand the rationale behind removing or deleting branches needlessly. GitHub makes its offering free for OSS projects. It doesn’t matter how many branches you have or don’t. It’s simply a bad idea to remove this code from public view. I guess it’s time to move to other vector DBs that build in public.

@mrjn you have been silent for this long. Your recent post on Dgraph has given the community some hope. If you are considering a resurrection of this, outside of Dgraph or by returning to Dgraph, it can help build some traction.

@iyinoluwaayoola do you happen to have that code on your fork to look at?

No, I don’t. I’m not attempting to generate negativity; I simply found it peculiar. I believe they may be exercising caution until decisions become more definitive following their integration into Hypermode. This post implies that they plan to eventually release the work as open source.

3 Likes

Everything in Dgraph has always been “soon”. We’ve been waiting for “soon” for many years now. How much longer do we need to wait? Days, weeks, months, years, decades, millennia?

3 Likes