How to handle LLM embeddings (read, write)
In Dgraph, an LLM embedding is just another predicate, implemented at the DQL level and also exposed in the generated GraphQL API.
You can read, write, and delete an embedding like any other predicate, using DQL or GraphQL queries and mutations.
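For example, a minimal sketch of writing an embedding with a DQL RDF mutation, assuming an existing node `0x1` and a predicate declared with the `float32vector` type (the uid and the vector values are placeholders):

```
{
  set {
    # The vector is written as a string literal of floats
    <0x1> <Issue.vector_embedding> "[0.12, 0.53, 0.91]" .
  }
}
```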
See the @embedding directive for the GraphQL schema, and the vector type in DQL with a vector index.
In DQL you can use similar_to on vector predicates that have an hnsw index.
At the GraphQL level, Dgraph generates similarity-search queries for any type that has at least one predicate with the @embedding directive.
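As an illustration, with a hypothetical `Product` type like the one below, Dgraph generates similarity queries for the type (e.g. `querySimilarProductByEmbedding`); treat the exact generated names as assumptions to verify against your own generated schema:

```graphql
type Product {
  id: ID!
  name: String!
  # Embedding predicate with an HNSW vector index
  name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean)"])
}
```

```graphql
# Generated query (name assumed): top-5 products closest to the given vector
query {
  querySimilarProductByEmbedding(by: name_v, topK: 5, vector: [0.1, 0.2, 0.3]) {
    id
    name
  }
}
```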
Thanks, but I'm not able to add

```
name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
```

name_v as an embedding. Can you please share the procedure to create a predicate with @embedding and search on it?
Also, is there any method to search vectors with a query in the Docker deployment other than the similar_to function?
Are you using Dgraph Cloud or on-prem? Which version?
Could you clarify what happens when you deploy your GraphQL schema including the name_v predicate? What error is returned?
Please share your GraphQL schema and I can try to load it on my side.
Hello Raphael, I'm using Docker on-prem.

```
Dgraph version    : v24.0.4
Dgraph codename   : dgraph
Dgraph SHA-256    : 7d3902cec804acebd40fe6af36e83aaa8db126aa1bbdcb161acb52fd1480662b
Commit SHA-1      : 84b07e6
Commit timestamp  : 2024-10-08 09:42:18 +0530
Branch            : HEAD
Go version        : go1.22.7
```
Currently I'm using:

```
query categories($v: float32vector) {
  var(func: similar_to(Issue.vector_embedding, 15, $v)) {
    vemb as Issue.vector_embedding
    categorysimilarity as math(($v) dot vemb)
  }
  list(func: uid(categorysimilarity)) @filter(gt(val(categorysimilarity), 0.8)) {
    Issue.eid
  }
}
```
I'm able to get the top 10 results, but I want to check how the similarity is calculated. What method is used, and how do I choose the method (e.g. cosine similarity, euclidean distance)?
Also, is there any way to convert text into an embedding inside Dgraph?
Which vector index method is Dgraph using?
While defining the schema I used `<Issue.vector_embedding>: float32vector @index(hnsw(metric: "euclidean")) .` Will it play any role in the vector search?
@amir1, that clarifies things. Thanks.
So you are using DQL.
Your first post was about the GraphQL schema, that's why I was puzzled.
To clarify the story:
Dgraph is using HNSW index. When building the index tree, it’s using the metric (distance) you provide in the schema.
So in your case, euclidean distance:
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "euclidean")) .
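For reference, the metric is fixed at index-definition time. A sketch of the alternatives (Dgraph's HNSW index supports the euclidean, cosine, and dotproduct metrics; only one definition applies at a time):

```
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "euclidean")) .
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "cosine")) .
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "dotproduct")) .
```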
When you query with the `similar_to` DQL function, you provide a topK parameter, and Dgraph returns the topK most similar items based on the metric used for the index.
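A minimal standalone example (the vector values and the query name `similar` are placeholders; `similar_to` accepts the vector as a string literal when you are not using a query variable):

```
{
  similar(func: similar_to(Issue.vector_embedding, 10, "[0.1, 0.2, 0.3]")) {
    uid
    Issue.eid
  }
}
```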
You can recompute a score with the math function in DQL to further filter (as you do), or to return a distance or a similarity score to your application.
Ideally, you should compute the same metric. In your example, you are using euclidean distance for the index, but you compute a simple dot product score.
It's not a big deal if you are using normalized vectors (embeddings from most models are normalized), but I would still recommend using the correct metric.
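As a sketch, to make the filter match the euclidean metric of the index, you could compute the euclidean distance in the math block instead of a dot product (this assumes DQL's math supports vector subtraction, dot, and sqrt, as described in the blog post below; the 0.5 threshold is a placeholder) and keep the smallest distances:

```
query categories($v: float32vector) {
  var(func: similar_to(Issue.vector_embedding, 15, $v)) {
    vemb as Issue.vector_embedding
    # euclidean distance: smaller means more similar
    distance as math(sqrt(($v - vemb) dot ($v - vemb)))
  }
  list(func: uid(distance)) @filter(lt(val(distance), 0.5)) {
    Issue.eid
  }
}
```

Note that the filter direction flips compared to a similarity score: with a distance you keep values below a threshold (lt), not above it (gt).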
Refer to the blog post: Using Vector similarity search in DQL - Dgraph Blog. It has a section "Computing vector distances and similarity scores".