How to optimize the query?

What I want to do

I want the best-performing query to build a recommendation list.

What I did

  • I have a list of users; each user has a unique address and a list of tags (a tag can be anything the user has: tokens, projects. Each user has between 5 and 500 tags).

  • I want to get the 10 users whose tags are most similar to the current user's.

  • My current query's latency grows as the number of users increases (>10M users).

  • Here is my Dgraph data model:

{
  "data": {
    "user": [
      {
        "address": "0x79852a2b8386587daad501d90674996dd19d88c9",
        "tagged": [
          {
            // token
            "name": "token:bsc_btc"
          },
          {
            // chain
            "name": "chain:bsc"
          },
          {
            // project
            "name": "project:bsc_aavev3"
          }
        ]
      }
    ]
  }
}
  • And here is my current query:
{
  my_token(func: eq(address, "0x80c1adfb1192d781a03cae1ac84faecac5c91a8a")) {
    t as tagged
  }

  var(func: type(User)) {
    x as count(tagged @filter(uid(t)))
    norm as math(1)
    score as math(x * norm)
  }

  suggestions(func: uid(score), orderdesc: val(score), first: 10) {
    address
    val(score)
  }
}

Query latency grows as the number of users increases:

  • 200 records
"server_latency": {
     "parsing_ns": 93100,
     "processing_ns": 1513700,
     "encoding_ns": 37700,
     "assign_timestamp_ns": 834700,
     "total_ns": 2531400
   },
  • 4,000 records
"extensions": {
   "server_latency": {
     "parsing_ns": 94400,
     "processing_ns": 6001900,
     "encoding_ns": 30800,
     "assign_timestamp_ns": 630300,
     "total_ns": 6824800
   },
  • 20,000 records
"extensions": {
   "server_latency": {
     "parsing_ns": 1465400,
     "processing_ns": 167639600,
     "encoding_ns": 110500,
     "assign_timestamp_ns": 775900,
     "total_ns": 170061400
   }
  • This is my current docker-compose file:
version: "3.2"
networks:
  dgraph:

services:
  zero1:
    image: dgraph/dgraph:v21.03.2
    volumes:
      - ./dgraph_data/zero1:/dgraph
    ports:
      - "5081:5080"
      - "6081:6080"
    networks:
      - dgraph
    command: dgraph zero --my=zero1:5080 --replicas 3 --raft="idx=1"

  zero2:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - zero1
    volumes:
      - ./dgraph_data/zero2:/dgraph
    ports:
      - "5082:5080"
      - "6082:6080"
    networks:
      - dgraph
    command: dgraph zero --my=zero2:5080 --replicas 3 --peer zero1:5080 --raft="idx=2"
  zero3:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - zero2
    volumes:
      - ./dgraph_data/zero3:/dgraph
    ports:
      - "5083:5080"
      - "6083:6080"
    networks:
      - dgraph
    command: dgraph zero --my=zero3:5080 --replicas 3 --peer zero1:5080 --raft="idx=3"

  alpha1:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - zero3
    volumes:
      - ./dgraph_data/alpha1:/dgraph
    ports:
      - "8081:8080"
      - "9081:9080"
    networks:
      - dgraph
    command: dgraph alpha --my=alpha1:7080 --zero=zero1:5080,zero2:5080,zero3:5080
      --security "whitelist=0.0.0.0/0"
      --telemetry "reports=false; sentry=false;"

  alpha2:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - alpha1
    volumes:
      - ./dgraph_data/alpha2:/dgraph
    ports:
      - "8082:8080"
      - "9082:9080"
    networks:
      - dgraph
    command: dgraph alpha --my=alpha2:7080 --zero=zero1:5080,zero2:5080,zero3:5080
      --security "whitelist=0.0.0.0/0"
      --telemetry "reports=false; sentry=false;"

  alpha3:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - alpha2
    volumes:
      - ./dgraph_data/alpha3:/dgraph
    ports:
      - "8083:8080"
      - "9083:9080"
    networks:
      - dgraph
    command: dgraph alpha --my=alpha3:7080 --zero=zero1:5080,zero2:5080,zero3:5080
      --security "whitelist=0.0.0.0/0"
      --telemetry "reports=false; sentry=false;"
  alpha4:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - alpha3
    volumes:
      - ./dgraph_data/alpha4:/dgraph
    ports:
      - "8084:8080"
      - "9084:9080"
    networks:
      - dgraph
    command: dgraph alpha --my=alpha4:7080 --zero=zero1:5080,zero2:5080,zero3:5080
      --security "whitelist=0.0.0.0/0"
      --telemetry "reports=false; sentry=false;"
  alpha5:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - alpha4
    volumes:
      - ./dgraph_data/alpha5:/dgraph
    ports:
      - "8085:8080"
      - "9085:9080"
    networks:
      - dgraph
    command: dgraph alpha --my=alpha5:7080 --zero=zero1:5080,zero2:5080,zero3:5080
      --security "whitelist=0.0.0.0/0"
      --telemetry "reports=false; sentry=false;"
  alpha6:
    image: dgraph/dgraph:v21.03.2
    depends_on:
      - alpha5
    volumes:
      - ./dgraph_data/alpha6:/dgraph
    ports:
      - "8086:8080"
      - "9086:9080"
    networks:
      - dgraph
    command: dgraph alpha --my=alpha6:7080 --zero=zero1:5080,zero2:5080,zero3:5080
      --security "whitelist=0.0.0.0/0"
      --telemetry "reports=false; sentry=false;"

  ratel:
    image: dgraph/ratel:v21.03.2
    ports:
      - "8000:8000"
    networks:
      - dgraph
    command: dgraph-ratel

Instead of iterating over all users (with type(User)), you should fetch the user uids via a reverse edge in the my_token query and use them in your second query.
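For the reverse edge to exist at all, the tagged predicate has to be declared with the @reverse directive in the schema — a minimal sketch, assuming tagged is a [uid] predicate:

```
tagged: [uid] @reverse .
```

With that in place, <~tagged> can be traversed from a tag node back to every user that has it, which is what lets the second block start from the candidate users instead of from type(User).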


Indeed, vnium. I would explore the tagged edge: use the reverse edge <~tagged>, which captures only the relevant users after first limiting the set of tags. If you start from the top (all users), you make a very wide query, and it therefore consumes a lot of resources.

Ideally you would also extract all the tags and create a separate query block for each tag. This has to be done in application code, since DQL has no query loops. The more separate blocks you have for each part, the more performance you can extract.
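For illustration, a sketch of what the generated per-tag blocks could look like. The tag names here are hypothetical, and this assumes a hash index on name plus @reverse on tagged; in practice the blocks would be built in application code from the current user's tag list:

```
{
  # One var block per tag of the current user, generated in code.
  var(func: eq(name, "token:bsc_btc")) { u1 as ~tagged }
  var(func: eq(name, "chain:bsc"))     { u2 as ~tagged }

  # Union of users sharing any of those tags.
  suggestions(func: uid(u1, u2), first: 10) {
    address
  }
}
```

Each block then touches only the users behind a single tag node, rather than scanning the whole User type.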


I don’t think you’re using Docker Swarm or anything like that, correct? Then you won’t get very far, since you are limited to vertical scaling. Depending on the number of threads this may even work a little, but ideally your cluster should be spread over several machines, not just subdivided between containers on the same machine. I would only recommend several containers on the same machine if it has a lot of resources available. Ideally each container would have its own NVMe SSD; it’s just complicated to configure that via Docker.

If you want to get the most out of performance, I would recommend running the Dgraph binaries manually, on bare metal, with each Alpha instance on its own SSD, in addition to distributing the cluster well.
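A minimal sketch of that setup — hostnames and mount points below are placeholders for your own machines, with each Alpha writing to its own NVMe mount:

```shell
# On the Zero machine
dgraph zero --my=zero1:5080 --replicas 3

# On each Alpha machine: point the postings and WAL directories
# at that machine's own NVMe SSD (paths are hypothetical)
dgraph alpha --my=alpha1:7080 --zero=zero1:5080 \
  --postings /mnt/nvme1/p --wal /mnt/nvme1/w
```

This is the same topology as the compose file above, just without the container layer between Dgraph and the disks.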

Thanks so much, will start from here.

Updated the query:

{
  me as var(func: eq(address, "` + account + `")) {
    t as tagged
    c as chain
  }

  var(func: uid(t)) {
    filteredUser as tokens: ~tagged
  }

  var(func: uid(filteredUser)) {
    x as count(tagged @filter(uid(t)))
    y as count(chain @filter(uid(c)))
    norm as math(1)
    score_x as math(x * norm)
    score_y as math(y * norm)
    # parentheses needed: without them this computes score_x + (score_y / 2)
    score as math((score_x + score_y) / 2)
  }

  suggestions(func: uid(score), first: 10, orderdesc: val(score))
    @cascade
    @filter(NOT uid(me)) {
      balance: balance
      address: address
      score: val(score)
    }
}