Need help with dgraph query


(abhee prince) #1

Hi ,

I would like to know how to write a query which helps me in getting the number of types of items bought by each customer. I would like to know this for and every customer.

schema

Customer

  • customer_name: String

  • did_order : [uid] (order)

Order

  • order.customer : uid (customer)

  • order.item : uid (item)

  • order.date

Item

  • item.name // examples :- apple or orange or pen …

each customer can place any number of orders, each order can have any number of items in it. There won’t be any duplicate items in each order and no facets or so to mention the quantity of the item purchased. Please assume the quantity to be one always. Finally, I would like to count the number of Distinct types of items bought by every customer. i.e :- please start the root with has(customer.name) instead of eq(customer.name,xxx).

P.S :- this is a scale-down of our use case which has abt 6 different types of entities and such. Please ignore any non-ideal schema declaration practices if any.


(August Hell) #2

Basic example:

{
	q(func: has(customer_name)) {
	customer_name
	did_order {
		order.date
		order.item {
			item.name
		}
	}
}

Do the tutorial to get a feeling of what you can do with dgraph :wink:


(abhee prince) #3

Hi August,

Thank you for the response. I have completed the tutorial(in fact many times already: sweat_smile:). My query is to get the “number of distinct items purchased by each customer”. The query posted by you gives me the list of all items purchased by each customer. But it is missing the aggregation(count) part and distinct part. To help you in visualizing my requirement better, I will give an example.

consider 2 customers who placed 2 orders each. lets say that customer A ordered item X and item Y in first order and item Y and item Z in other order, then the distinct items ordered by him are:- item X, item Y, and item Z and the count is 3.

Let’s say that customer B bought item Z in both of his orders, then the number of distinct items ordered by him is 1.
final response required is:-
A : 3
B: 1

for 1 customer I can do this

{
	var(func: eq(customer_name,"A")) {
	customer_name
	did_order {
		order.date
		item_list as order.item
	}
       item_count(func: uid(item_list)){
         final_count : count(uid)
       }
}

I want to know how to do the same when I’m starting with has(customer.name) .i.e when i want to do it for everyone at once.

now, drifting away from layman terms. the challenge I’m facing is " when there are multiple nodes at the root level, I would like to be able to store query variables (uid_lists) whose scope is local. i.e separate uid_list for each node at root level "


(August Hell) #4

Ah I see, yes thats beyond what the tutorial is covering. Sorry I misunderstood your question, it looked trivial. To do what you want, you need some kind of mapping of uids to aggregation.
While it’s easy to gather the informations for such a mapping with eg:

{
  v(func: has(customer_name)) {
    cname: customer_name
    did_order {
      order.item {
        itemuid: uid
      }
    }
  }
}

… I don’t know how to do the mapping with a query and would do that with the result in a program.

Maybe that can be done anyhow with @groupby?


(abhee prince) #5

Hi August hell,
I tried with group by too. but no luck till now.

 @MichelDiz.  Please help me with this.

(Michel Conrado) #6

Hi, I’m not sure if I get it. A sample (JSON or RDF) and also an idea of response would be nice to corroborate with what is being said.

however, I think you eventually will use aggregations tho, so:

{
	allCustomers(func: type(Customer)) {
    customer_name
    did_order {
      order.date
      item_list as count(order.item)
    }
    total_items_by_Customer : sum(val(item_list))
    }

    total(){
      final_sum : sum(val(item_list))
      }
}

Response

{
  "data": {
    "allCustomers": [...],
    "total": [
      {
        "final_sum": 10++
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 23847,
      "processing_ns": 2416875,
      "encoding_ns": 4990,
      "assign_timestamp_ns": 453883
    },
    "txn": {
      "start_ts": 30304
    }
  }
}

https://docs.dgraph.io/query-language/#aggregation


(August Hell) #7

So customer ordered:
Order 1: Item X and Item Y
Order 2: Item Y and Item Z

I’m not sure Michel, will your aggregation

  • count the 3 unique items X, Y, Z which it should
  • or 2+2=4 as the number of items per order?

Thats why I think, there is a group needed to have disinct items of all orders of one customer to get the result Sai is looking for.


(abhee prince) #9

Hi @MichelDiz,

While computing total_items_by_Customer, Your query will count the total number of items ordered by each customer right. As @AugustHell said, i want to “count the number of distinct items bought by each customer”. In the example specified by me, the items ordered by the customer A are X, Y, Y, Z . So, the distinct items ordered by him are 3 (X,Y,Z).


(abhee prince) #10

This is the schema and sample rdf

type Customer {
  customer_name: string
  did_order: [Order]
}

type Order {
  order.customer: Customer
  order.item: [Item]
  order.date: string
}

type Item {
    item.name: string
}


item.name: string @index(exact) .
customer_name: string @index(term) .
did_order: [uid] .
order.customer: uid .
order.item [uid] .
order.date: string .

Sample data

{
  set {
    _:alice <name> "Alice" .
    _:alice <dgraph.type> "Customer" .
    _:alice <did_order> _:order1 .
    _:alice <did_order> _:order2 .


    _:bob <name> "bob" .
    _:bob <dgraph.type> "Customer" .
    _:bob <did_order> _:order3 .
    _:bob <did_order> _:order4 .


    _:order1 <dgraph.type> "Order" .
    _:order1 <order.customer> _:alice .
    _:order1 <order.date> "2019-10-09" .
    _:order1 <order.item> _:x .
    _:order1 <order.item> _:y .

    _:order2 <dgraph.type> "Order" .
    _:order2 <order.customer> _:alice .
    _:order2 <order.date> "2019-10-09" .
    _:order2 <order.item> _:y .
    _:order2 <order.item> _:z .

    _:order3 <dgraph.type> "Order" .
    _:order3 <order.customer> _:bob .
    _:order3 <order.date> "2019-10-09" .
    _:order3 <order.item> _:z .

    _:order4 <dgraph.type> "Order" .
    _:order4 <order.customer> _:bob .
    _:order4 <order.date> "2019-10-09" .
    _:order4 <order.item> _:z .

    _:x <dgraph.type> "Item" .
    _:y <dgraph.type> "Item" .
    _:z <dgraph.type> "Item" .




    _:x <item.name> "pen" .
    _:y <item.name> "pencil" .
    _:z <item.name> "quill" .

  }
}

The expected result is
{
Alice : 3
Bob : 1
}

explaination :-

Alice ordered items :- pencil,pen,quill
bob ordered item :- quill

Take away for me from this example :-
some way to create and use variable whose scope is limited i.e query variable(not value variable) whose scope is limited to each customer.


(Michel Conrado) #11

As far I can tell this wouldn’t be possible. Cuz they are multiple orders. We can count, but not “count unique”. However we can do this if we define a new block for each Customer. And that would have to be done manually. As you did in your first example.

Using the new @normalize behavior (coming soon), we can do something close to what you want. But it doesn’t take out the repeated ones.

{
  q(func:  type(Customer)) {
   customer_name : name
    did_order @normalize {
      order.item  {
       count : count(uid)
       item.name : item.name
      }
    }
  }
}

It returns

{
  "data": {
    "q": [
      {
        "customer_name": "Alice",
        "did_order": [
          {
            "item.name": "pencil"
          },
          {
            "item.name": "quill"
          },
          {
            "count": 2
          },
          {
            "item.name": "pen"
          },
          {
            "item.name": "pencil"
          },
          {
            "count": 2
          }
        ]
      },
      {
        "customer_name": "bob",
        "did_order": [
          {
            "item.name": "quill"
          },
          {
            "count": 1
          },
          {
            "item.name": "quill"
          },
          {
            "count": 1
          }
        ]
  }
}

You can just “remove” the repeated ones manually.

@animesh2049 what you think about this? We could add a new feature like “@mergenormalized

{
  q(func:  type(Customer)) {
   customer_name : name
    did_order @mergenormalized(item.name) {      # "Merge by name"
      order.item  {
       count : count(uid)
       item.name : item.name
      }
    }
  }
}

Tha would return

{
  "data": {
    "q": [
      {
        "customer_name": "Alice",
        "did_order": [
          {
            "item.name": "pencil"
          },
          {
            "item.name": "quill"
          }
          {
            "item.name": "pen"
          },
          {
            "count": 3
          }
        ]
      },
      {
        "customer_name": "bob",
        "did_order": [
          {
            "item.name": "quill"
          }
          {
            "count": 1
          }
        ]
      }
    ]
  }
}