Dgraph v24.0.0-alpha is now available for the community to try out support for the vector data type, which enables semantic search.
Dgraph is adding vector support to combine graph data with embeddings, enhancing graph-based applications and unlocking new AI capabilities. Core graph use cases like fraud detection, recommendations, and master data management can all be supercharged by vectors and embeddings. Graph+Vector is also a key technique used to reduce hallucinations within AI-augmented applications.
This release also includes some performance enhancements and maintenance bug fixes to improve the stability of the database engine.
Key highlights of the release include:
- Support for a native vector type at the DQL level
- Extend Liveloader to work with the vector type (Bulkloader will be available in GA)
- Community contributed PRs:
- Dgraph/Badger fixes:
- Update to Golang v1.22 - performance and monitoring improvements
- Upgraded Golang client
- A number of CVE fixes
We are working towards a GA release candidate and expect it to be out in May. Dgraph v24 GA will also include GraphQL support for the vector data type and semantic search, a new caching approach that will boost performance of all applications, and a number of community PRs and maintenance fixes.
Note that this (alpha) release is not available on Dgraph Cloud, but the GA release will be released for both on-premise and Dgraph Cloud options. The release binaries and release notes are now available on GitHub. The docker images for dgraph/dgraph and dgraph/standalone are available on DockerHub.
A simple example of using vector embeddings and similarity search queries is shown below. More examples will follow in blog posts and docs in the coming weeks. This example uses Ratel for the schema update, mutations, and queries, but you can use any approach.
Set up and install Dgraph and Ratel
Get a Dgraph docker container for the v24 alpha version
docker pull dgraph/standalone:v24.0.0-alpha2
Run a docker container, storing data on your local machine
mkdir ~/dgraph
docker run -d --name dgraph-v24alpha2 -p "8080:8080" -p "9080:9080" -v ~/dgraph:/dgraph dgraph/standalone:v24.0.0-alpha2
Then get and start the ratel tool
docker pull dgraph/ratel
docker run -d --name ratel -p "8000:8000" dgraph/ratel:latest
Ratel will now be running on localhost:8000
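Before moving on, it can help to confirm that the database itself is answering. The sketch below polls the Dgraph Alpha /health HTTP endpoint until it returns HTTP 200. It assumes port 8080 is mapped to localhost as in the docker run command above; the function name wait_for_dgraph and its retry parameters are illustrative, not part of any Dgraph tooling.

```python
import time
import urllib.request


def wait_for_dgraph(base_url="http://localhost:8080", attempts=30, delay=1.0):
    """Return True once GET {base_url}/health answers with HTTP 200, else False."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error:
            # the container is probably still starting.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_dgraph() right after docker run gives a simple ready check before you open Ratel.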
Add a schema, data and test queries
Define a DQL Schema. You can set this via the Ratel schema tab using the bulk edit option.
<Issue.description>: string .
<Issue.vector_embedding>: float32vector @index(hnsw(metric:"cosine")) .
type <Issue> {
Issue.description
Issue.vector_embedding
}
Notice that the new float32vector type is used, with a new index type of hnsw. The hnsw index can use a distance metric of cosine, euclidean, or dotproduct. Here we use cosine similarity, which works well if your vectors are not going to be normalized.
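To make the difference between the three metrics concrete, here is a minimal plain-Python sketch of the textbook definitions. It is only an illustration of the math, not a description of how Dgraph computes distances internally, and the function names are our own.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


a = [0.8, 0.8, 0.5, 0.0]
scaled = [2 * x for x in a]  # same direction, twice the length

print("cosine:   ", cosine_similarity(a, scaled))
print("euclidean:", euclidean_distance(a, scaled))
print("dot:      ", dot_product(a, scaled))
```

A vector and its doubled copy point in the same direction, so cosine treats them as maximally similar, while the Euclidean distance and dot product both change with the scaling. That is why cosine is a comfortable default when vectors are not normalized.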
At this point, the database will accept and index float vectors.
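If you would rather apply the schema from a script than through Ratel, one option is to POST it to the Dgraph Alpha /alter HTTP endpoint. The following is a sketch using only the Python standard library. It assumes the container from the setup step is listening on localhost:8080, and the helper names build_alter_request and apply_schema are hypothetical.

```python
import urllib.request

SCHEMA = """
<Issue.description>: string .
<Issue.vector_embedding>: float32vector @index(hnsw(metric:"cosine")) .
type <Issue> {
  Issue.description
  Issue.vector_embedding
}
"""


def build_alter_request(schema, base_url="http://localhost:8080"):
    """Build (but do not send) a POST request carrying the schema text."""
    return urllib.request.Request(
        f"{base_url}/alter",
        data=schema.encode("utf-8"),
        method="POST",
    )


def apply_schema(schema, base_url="http://localhost:8080"):
    """Send the schema to Dgraph; needs the container to be running."""
    with urllib.request.urlopen(build_alter_request(schema, base_url)) as resp:
        return resp.read().decode("utf-8")
```

Calling apply_schema(SCHEMA) should have the same effect as pasting the schema into Ratel's bulk edit dialog.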
Insert some data containing short, test-only embeddings using this DQL Mutation
You can paste this into Ratel as a mutation, or use curl, pydgraph or similar:
{
"set": [
{
"dgraph.type": "Issue",
"Issue.vector_embedding": "[0.8, 0.8, 0.5, 0]",
"Issue.description": "Intermittent timeouts. Logs show no such host error."
},
{
"dgraph.type": "Issue",
"Issue.vector_embedding": "[0, 0, 0, 0.7]",
"Issue.description": "Bug when user adds record with blank surName. Field is required so should be checked in web page."
},
{
"dgraph.type": "Issue",
"Issue.vector_embedding": "[0.8, 0, 0.7, 0]",
"Issue.description": "Delays on responses every 30 minutes with high network latency in backplane"
},
{
"dgraph.type": "Issue",
"Issue.vector_embedding": "[0.7, 0.8, 0.5, 0]",
"Issue.description": "Slow queries intermittently. The host is not found according to logs."
},
{
"dgraph.type": "Issue",
"Issue.vector_embedding": "[0.6, 0.3, 1.0, 0]",
"Issue.description": "Some timeouts. It seems to be a DNS host lookup issue. Seeing No Such Host message."
},
{
"dgraph.type": "Issue",
"Issue.vector_embedding": "[0.5, 0.1, 0.7, 0.7]",
"Issue.description": "Host and DNS issues are causing timeouts in the User Details web page"
}
]
}
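When the embeddings come from code rather than being typed by hand, the same JSON can be generated. The sketch below builds the mutation body in the shape used above, with each vector serialized as a string, and posts it to the /mutate endpoint with commitNow=true. It assumes the standalone container is reachable on localhost:8080; the helper names are illustrative only.

```python
import json
import urllib.request


def issue_node(description, embedding):
    """One Issue node; the vector is sent as a string such as "[0.8, 0.8, 0.5, 0]"."""
    return {
        "dgraph.type": "Issue",
        "Issue.vector_embedding": json.dumps(embedding),
        "Issue.description": description,
    }


def build_mutation(issues):
    """issues is a list of (description, embedding) pairs."""
    return {"set": [issue_node(text, vec) for text, vec in issues]}


def send_mutation(body, base_url="http://localhost:8080"):
    """POST the JSON mutation and commit immediately; needs a running Dgraph."""
    req = urllib.request.Request(
        f"{base_url}/mutate?commitNow=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_mutation([
    ("Intermittent timeouts. Logs show no such host error.", [0.8, 0.8, 0.5, 0]),
])
print(json.dumps(body, indent=2))
```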
A simple query that finds similar questions
You are ready to do similarity queries, to find Issues based on semantic similarity to a new Issue description!
For simplicity, we are not computing large vectors from an LLM. Instead, each of the four vector dimensions in the embeddings above represents one concept. In order, the dimensions are:
- Slowness or delays
- Logging or messages
- Networks
- GUIs or web pages
Use case and query
Let’s say a new issue comes in, and you want to use its text description to find similar issues you have seen in the past.
If the new issue description is “Slow response and delay in my network!”, we represent this new issue as the vector [0.9, 0.8, 0, 0]. The first “slowness” dimension is high because the description mentions both “slow response” and “delay.” “Logs” is mentioned once, so set dimension two to 0.8. Neither networks nor GUIs are mentioned, so leave those at 0.
Note that the first parameter to similar_to is the DQL field name, the second parameter is the number of results to return, and the third parameter is the vector to look for.
query slownessWithLogs() {
simVec(func: similar_to(
Issue.vector_embedding,
3,
"[0.9, 0.8, 0, 0]")) {
uid
Issue.description
}
}
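As a sanity check on what to expect from this query, the sketch below ranks the six sample embeddings against the query vector with an exact, brute-force cosine similarity. This is not how Dgraph executes similar_to (the hnsw index performs an approximate search), so the database results may differ; the helper names are our own.

```python
import math

ISSUES = [
    ("Intermittent timeouts. Logs show no such host error.", [0.8, 0.8, 0.5, 0]),
    ("Bug when user adds record with blank surName. Field is required so should be checked in web page.", [0, 0, 0, 0.7]),
    ("Delays on responses every 30 minutes with high network latency in backplane", [0.8, 0, 0.7, 0]),
    ("Slow queries intermittently. The host is not found according to logs.", [0.7, 0.8, 0.5, 0]),
    ("Some timeouts. It seems to be a DNS host lookup issue. Seeing No Such Host message.", [0.6, 0.3, 1.0, 0]),
    ("Host and DNS issues are causing timeouts in the User Details web page", [0.5, 0.1, 0.7, 0.7]),
]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k):
    """Return the k (score, description) pairs with the highest cosine similarity."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


for score, text in top_k([0.9, 0.8, 0, 0], ISSUES, 3):
    print(f"{score:.3f}  {text}")
```

With the sample data, the exact ranking puts the "Intermittent timeouts" and "Slow queries intermittently" issues first, followed by "Delays on responses".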
If you want to send in data using parameters, rewrite this as
query test($vec: float32vector) {
simVec(func: similar_to(Issue.vector_embedding, 3, $vec)) {
uid
Issue.description
}
}
And make a request (again using Ratel) with a variable named “vec” set to a JSON array of floats:
vec: [0.9, 0.8, 0, 0]
Curl alternative
Finally, for those who prefer not to use Ratel, you can do all these steps via HTTP tools, such as curl:
curl --location 'http://localhost:8080/query' \
--header 'Content-Type: application/json' \
--data ' {
"query": "query test($vec: float32vector) { simVec(func: similar_to(Issue.vector_embedding, 3, $vec)) { uid Issue.description } }",
"variables":{"$vec":"[1,0,0,0]"}
}
'
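The same request can be made from Python without extra dependencies. This sketch mirrors the curl call: it posts a JSON body with the parameterized query and a "$vec" variable to /query. It assumes Dgraph is reachable on localhost:8080, and the names build_query_body and run_query are hypothetical helpers rather than part of any Dgraph client.

```python
import json
import urllib.request

QUERY = (
    "query test($vec: float32vector) { "
    "simVec(func: similar_to(Issue.vector_embedding, 3, $vec)) "
    "{ uid Issue.description } }"
)


def build_query_body(vector):
    """The vector is passed as a string variable, as in the curl example."""
    return {"query": QUERY, "variables": {"$vec": json.dumps(vector)}}


def run_query(vector, base_url="http://localhost:8080"):
    """POST the query to Dgraph and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(build_query_body(vector)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the container running and the sample data loaded, run_query([1, 0, 0, 0]) sends the same search as the curl command.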
Summing it up
This end-to-end example shows how to define a schema with the new vector type and a cosine-similarity hnsw index, insert data with vector embeddings that conform to it, and run a semantic search for Issues via the new similar_to() function in Dgraph.