How to handle LLM embeddings (read, write)
In Dgraph, an LLM embedding is just another predicate, implemented at the DQL level and also exposed in the generated GraphQL API.
You can read, write, and delete an embedding like any other predicate, using DQL or GraphQL queries and mutations.
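For example, a minimal sketch of writing an embedding with a DQL RDF mutation, assuming an existing node `0x1` and a predicate declared with the `float32vector` type (the uid and the vector values are placeholders):

```
{
  set {
    # The vector is written as a string literal of floats
    <0x1> <Issue.vector_embedding> "[0.12, 0.53, 0.91]" .
  }
}
```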
See the @embedding directive for the GraphQL schema, and the vector type in DQL with a vector index.
In DQL you can use similar_to on vector predicates that have an hnsw index.
At the GraphQL level, Dgraph generates similarity-search queries for any type that has at least one predicate with the @embedding directive.
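As an illustration, with a hypothetical `Product` type like the one below, Dgraph generates similarity queries for the type (e.g. `querySimilarProductByEmbedding`); treat the exact generated names as assumptions to verify against your own generated schema:

```graphql
type Product {
  id: ID!
  name: String!
  # Embedding predicate with an HNSW vector index
  name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean)"])
}
```

```graphql
# Generated query (name assumed): top-5 products closest to the given vector
query {
  querySimilarProductByEmbedding(by: name_v, topK: 5, vector: [0.1, 0.2, 0.3]) {
    id
    name
  }
}
```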
Thanks, but I'm not able to add

```
name_v: [Float!] @embedding @search(by: ["hnsw(metric: euclidean, exponent: 4)"])
```

name_v as an embedding. Can you please share the procedure to create a predicate with @embedding and search on it?
Also, is there any method to search vectors with a query in the Docker deployment other than the similar_to function?
Are you using Dgraph Cloud or on-prem? Which version?
Could you clarify what happens when you deploy your GraphQL schema including the name_v predicate? What error is returned?
Please share your GraphQL schema and I can try to load it on my side.
Hello Raphael, I'm using Docker on-prem.

```
Dgraph version    : v24.0.4
Dgraph codename   : dgraph
Dgraph SHA-256    : 7d3902cec804acebd40fe6af36e83aaa8db126aa1bbdcb161acb52fd1480662b
Commit SHA-1      : 84b07e6
Commit timestamp  : 2024-10-08 09:42:18 +0530
Branch            : HEAD
Go version        : go1.22.7
```
Currently I'm using:

```
query categories($v: float32vector) {
  var(func: similar_to(Issue.vector_embedding, 15, $v)) {
    vemb as Issue.vector_embedding
    categorysimilarity as math(($v) dot vemb)
  }
  list(func: uid(categorysimilarity)) @filter(gt(val(categorysimilarity), 0.8)) {
    Issue.eid
  }
}
```
I'm able to get the top 10 results, but I want to check how the similarity is calculated. What method is used, and how do I choose the method (e.g. cosine similarity, euclidean distance)?
Also, is there any way to convert text into an embedding inside Dgraph?
Which vector index method is Dgraph using?
While defining the schema I used `<Issue.vector_embedding>: float32vector @index(hnsw(metric: "euclidean")) .` Will it play any role in the vector search?
@amir1, that clarifies things. Thanks.
So you are using DQL.
Your first post was about the GraphQL schema, that's why I was puzzled.
To clarify the story:
Dgraph is using HNSW index. When building the index tree, it’s using the metric (distance) you provide in the schema.
So in your case, euclidean distance:
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "euclidean")) .
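For reference, the metric is fixed at index-definition time. A sketch of the alternatives (Dgraph's HNSW index supports the euclidean, cosine, and dotproduct metrics; only one definition applies at a time):

```
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "euclidean")) .
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "cosine")) .
<Issue.vector_embedding>: float32vector @index(hnsw(metric: "dotproduct")) .
```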
When you query with the `similar_to` DQL function, you provide a topK parameter, and Dgraph returns the topK most similar items based on the metric used for the index.
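A minimal standalone example (the vector values and the query name `similar` are placeholders; `similar_to` accepts the vector as a string literal when you are not using a query variable):

```
{
  similar(func: similar_to(Issue.vector_embedding, 10, "[0.1, 0.2, 0.3]")) {
    uid
    Issue.eid
  }
}
```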
You can recompute a score with the math function in DQL to further filter (as you do), or to return a distance or a similarity score to your application.
Ideally, you should compute the same metric. In your example, you are using euclidean distance for the index, but you compute a simple dot product score.
It's not a big deal if you are using normalized vectors (embeddings from most models are normalized), but I would still recommend using the correct metric.
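As a sketch, to make the filter match the euclidean metric of the index, you could compute the euclidean distance in the math block instead of a dot product (this assumes DQL's math supports vector subtraction, dot, and sqrt, as described in the blog post below; the 0.5 threshold is a placeholder) and keep the smallest distances:

```
query categories($v: float32vector) {
  var(func: similar_to(Issue.vector_embedding, 15, $v)) {
    vemb as Issue.vector_embedding
    # euclidean distance: smaller means more similar
    distance as math(sqrt(($v - vemb) dot ($v - vemb)))
  }
  list(func: uid(distance)) @filter(lt(val(distance), 0.5)) {
    Issue.eid
  }
}
```

Note that the filter direction flips compared to a similarity score: with a distance you keep values below a threshold (lt), not above it (gt).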
Refer to the blog post: Using Vector similarity search in DQL - Dgraph Blog. It has a section "Computing vector distances and similarity scores".