Recommendation Engine Tutorial Help

MaazM · June 20, 2018, 6:39pm

Hi guys, I am just starting with Dgraph and I am having trouble getting going. I am trying to build the recommendation engine from this blog post (https://blog.dgraph.io/post/recommendation/). Can you please tell me where I am going wrong?

I have loaded the MovieLens dataset (out.rdf.gz) from the GitHub link provided in the tutorial. I loaded the data by downloading the file and placing it in my dgraph directory. Then I ran the command:
docker exec -it dgraph dgraph live -r out.rdf.gz --zero localhost:5080 -c 1
in my terminal. Am I loading the dataset correctly? I cannot find much documentation on how to properly load data into Dgraph.

Next, when I try to run the queries (specifically the second snippet of code) into http://localhost:8000/, I get this error:
: Predicate rated doesn’t have reverse edge

I don’t understand why I am getting this error because the ‘rated’ edge is in fact bi-directional. Can anyone explain what I am doing incorrectly?

I have done the online tour/tutorial and read over the documentation but I am still having some trouble with Dgraph.

karan28aug · June 20, 2018, 7:00pm

Check your schema in Ratel and see whether the ‘rated’ predicate has a reverse edge. If it does not, alter the ‘rated’ part of the schema:

rated: uid @reverse .

Then try the query again and let me know whether it works.
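If you would rather apply that schema change over HTTP than through Ratel, here is a minimal Python sketch. It assumes the Dgraph server's HTTP port is reachable at localhost:8080 and that schema updates go to its /alter endpoint; the helper names are made up for illustration. The request is only built here, not sent.

```python
import urllib.request

# Assumption: the Dgraph server's HTTP port is exposed on localhost:8080.
ALTER_URL = "http://localhost:8080/alter"


def reverse_schema_line(predicate: str) -> str:
    """Build a schema line declaring `predicate` as a uid edge with a reverse edge."""
    return f"{predicate}: uid @reverse ."


def build_alter_request(schema: str, url: str = ALTER_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the schema update."""
    return urllib.request.Request(url, data=schema.encode("utf-8"), method="POST")


request = build_alter_request(reverse_schema_line("rated"))
print(request.get_method(), request.full_url, request.data)

# To apply the change against a running server, uncomment:
# urllib.request.urlopen(request)
```

Once the reverse edge exists, a query can follow the edge backwards with ~rated.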

MaazM · June 20, 2018, 7:24pm

Okay, just to be clear: I clicked Schema → clicked the ‘Indices’ column of the ‘rated’ row, then checked the reverse box. I ran the query again and I am no longer getting the error message. But my output is different from the expected output in the tutorial: the recommended ‘similar movies’ are not the same. Do you have any idea why this is?

Thank you for your response, I really appreciate it.

karan28aug · June 20, 2018, 7:33pm

var(func: uid( 0x30d80 ))
Maybe it depends on the UID you used in this block. Change the UID and check the results.

MaazM · June 20, 2018, 7:51pm

That was what I was thinking, but my UID is the same as in the tutorial. I am using

0x30d80

MichelDiz · June 20, 2018, 7:59pm

See the docs here: Get started with Dgraph

and, for the live loader: Get started with Dgraph

But the Bulk Loader is recommended.

I’m not sure, but the UID might have changed during the load. I would recommend doing a search. Check in the tutorial which predicate the node you are looking for has, then use this query:

{
  seek(func: has(somepredicatehere)) {
    uid
    expand(_all_)
  }
}

You should not skip setting up the schema before the bulk (or live) load. I see that you did not pass a schema to the live loader, so it will fill BadgerDB with the data, but without indexing anything or categorizing the predicates.

MaazM · June 20, 2018, 10:19pm

Thank you for the links on loading.

What do you mean by searching for the predicate of the node that I am looking for? I apologize if I am misunderstanding, but I do not see the point of this, or which predicate to look for in the tutorial. I am getting an output for ‘similarMovies’, but it is completely different from the output in the tutorial. So should I search for the uid of Shawshank Redemption and compare it to the value (0x30d80) in the tutorial to see if the uid changed sometime during the load? I tried this but strangely I don’t get any result from this query.

Also for the schema, to make sure I am understanding you correctly, you are saying that I should write a schema file before the load and then pass that file to the live load command? Also yes, I am guessing that is what happened here. The predicates were not categorized and nothing was indexed.

MichelDiz · June 20, 2018, 10:35pm

As the UID has probably changed, you need to find a valid one. So doing an open search to be sure is one way.
PS: I’m just showing you how to find the UID you need. But the ideal is to start from scratch.

It is not random, it just is not fixed. But if you do it in order, everything will go well.

If you run the live loader against an instance started from scratch, the UIDs will possibly be the same. If the instance was already in use before the live load, the UIDs will certainly not be the same. In the existing examples, there is no fixed UID.

For best results and indexing, yes.

If you load data into Dgraph without giving it a schema, functions, directives, and filters that depend on the schema (indexes, reverse edges) will not work.

Here is how to use a schema with the live loader (the Bulk Loader is better). See the example below.

Read RDFs and a schema file and send to Dgraph running at given address
> dgraph live -r <path-to-rdf-gzipped-file> -s <path-to-schema-file> -d <dgraph-server-address:grpc_port> -z <dgraph-zero-address:grpc_port>

As I said, it can be a UID range problem. Delete the p, w, and zw folders and start an instance from scratch. But use the Bulk Loader instead.

MichelDiz · June 20, 2018, 10:47pm

Here is the schema: benchmarks/movielens.schema at master · dgraph-io/benchmarks · GitHub

MaazM · June 21, 2018, 4:32am

So when I do a search for the uid 0x30d80, nothing is returned. This may be because of what you said earlier, since I did not give it a schema.

I will delete the p, w, and zw folders. Then I will run the Live Load again with the schema. I am using docker so this is the command I will use (is this right?):

docker exec -it dgraph dgraph live -r out.rdf.gz -s movielens.schema --zero localhost:5080 -c 1

I understand that Bulk Load is considerably faster, but I hesitate to use Bulk because I am unsure how to properly allocate shards.

MichelDiz · June 21, 2018, 4:40am

Well, how did you search? Like this?

{
 test(func: uid(0x30d80) ) {
   expand(_all_){ expand(_all_) }
}
}

If so, that means there may have been an error in the load, or the UID changed due to the factors I mentioned.

Yes it looks ok

So do not use such a setup. Just do a simple and straightforward bulk load.

MichelDiz · June 21, 2018, 4:46am

Search by name; it is indexed with exact and term:
name: string @index(term, exact) .

{
  q(func: allofterms(name@en, "NAME OF THE MOVIE HERE")) {
    uid # you'll get the actual UID.
    expand(_all_) { expand(_all_) } # little expand just to take a look
  }
}

or for exact

{
  q(func: eq(name@en, "NAME OF THE MOVIE HERE")) {
    uid # you'll get the actual UID.
    expand(_all_) { expand(_all_) } # little expand just to take a look
  }
}

karan28aug · June 21, 2018, 6:04am

@MaazM, as MichelDiz also said, find the UID associated with Shawshank Redemption and then use it in the query to get a result similar to the one in the example.

To get the UID for that movie, use the query he gave or this one:

{
  q(func: allofterms(name@en, "Shawshank Redemption")) 
    {
     uid
    }
}

Use this UID to get the result. Let me know whether you are able to reproduce a similar result.

MaazM · June 21, 2018, 6:20am

Hi Karandeep, so here is the output of all three of these queries as you and MichelDiz suggested. No UIDs are being returned. The ‘test’ query tells me that the name corresponding to this uid is not Shawshank Redemption but April Turner (who I guess is an actress). I will try MichelDiz’s suggestion of re-loading the data with the provided schema.

karan28aug · June 21, 2018, 6:52am

{
  q2(func: allofterms(name@en, "Shawshank Redemption")) 
    {
     uid
    }
}

When I run this query on my system, this is the response:

{
  "data": {
    "q2": [
      {
        "uid": "0x8005e8"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 12970,
      "processing_ns": 778359,
      "encoding_ns": 695365
    },
    "txn": {
      "start_ts": 58578,
      "lin_read": {
        "ids": {
          "1": 9
        }
      }
    }
  }
}

There is no need to run all the queries together; just run any one of the three suggested to get the uid.
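If you want to pull the uid out of a response like this programmatically, here is a minimal Python sketch. The response text is trimmed from the one above; extract_uids and uid_root are hypothetical helper names, and the uid you get depends on your own load.

```python
import json

# Response trimmed from the post above to its "data" part.
response_text = """
{
  "data": {
    "q2": [
      {"uid": "0x8005e8"}
    ]
  }
}
"""


def extract_uids(response: dict, block: str) -> list:
    """Return every uid listed under the named query block ([] if the block is empty or missing)."""
    nodes = response.get("data", {}).get(block, [])
    return [node["uid"] for node in nodes if "uid" in node]


def uid_root(uid_list: list) -> str:
    """Format the uids as a uid(...) root function, as used in var(func: uid(...))."""
    return "uid(" + ", ".join(uid_list) + ")"


uids = extract_uids(json.loads(response_text), "q2")
print(uids)
print(uid_root(uids))
```

An empty list here means the name lookup matched nothing, which usually points to data or a schema that was not loaded as expected.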

MaazM · June 21, 2018, 7:03am

*** I resolved this issue ***

I deleted the p, w, and zw folders as you suggested. Then I ran the following command in my terminal:

docker exec -it dgraph dgraph live -r out.rdf.gz -s movielens.schema --zero localhost:5080 -c 1

However, I got the following output:

2018/06/21 06:42:10 run.go:337: Creating temp client directory at /tmp/x513702890

Processing movielens.schema
2018/06/21 06:52:11 rpc error: code = Unknown desc = While proposing to RAFT group, err: context deadline exceeded

Also, I noticed that I have the following message on my dgraph zero server:

No healthy connection found to leader of group 1

MaazM · June 21, 2018, 7:07am

It is strange that we are not getting the same output. It’s probably because my data is not loaded correctly. Also, I am surprised the query returned a UID when your parameter was “Shawshank Redemption”. The movie’s name is “The Shawshank Redemption” with a “The” at the beginning. Wouldn’t this make a difference in what value is returned?

karan28aug · June 21, 2018, 7:15am

I just tried running the query with "The Shawshank Redemption"; the result was the same for both (The Shawshank Redemption and Shawshank Redemption).
Run drop_all to remove the data, load it again, and then query for the desired result. Let me know whether you are able to reproduce it.
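A likely reason both spellings return the same node is that allofterms only requires every term in the query to be present in the indexed value, in any order and ignoring case, so leaving out "The" still matches. Here is a simplified Python model of that behaviour. It is only an illustration, not Dgraph's actual tokenizer, and the function names are made up.

```python
import re


def terms(text: str) -> set:
    """Lowercase the text and split it into word tokens (a simplification of a term index)."""
    return set(re.findall(r"\w+", text.lower()))


def all_of_terms(stored_value: str, query: str) -> bool:
    """True when every term of the query occurs in the stored value, in any order."""
    return terms(query) <= terms(stored_value)


name = "The Shawshank Redemption"
print(all_of_terms(name, "Shawshank Redemption"))      # True
print(all_of_terms(name, "The Shawshank Redemption"))  # True
print(all_of_terms(name, "Shawshank Escape"))          # False
```

An eq query on the exact index, by contrast, would need the full name including "The".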

MaazM · June 21, 2018, 7:24am

What is the drop_all function?

karan28aug · June 21, 2018, 7:34am

drop_all deletes everything from the database, so you start again from a clean database.
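Here is a minimal Python sketch of sending it over HTTP. It assumes the server's HTTP port is at localhost:8080 and that drop_all is accepted as a JSON body on the /alter endpoint; drop_all_request is a hypothetical helper name. The request is only built, not sent, because sending it wipes all data and the schema.

```python
import json
import urllib.request

# Assumption: the Dgraph server's HTTP port is exposed on localhost:8080.
ALTER_URL = "http://localhost:8080/alter"


def drop_all_request(url: str = ALTER_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST request asking the server to drop everything."""
    payload = json.dumps({"drop_all": True})
    return urllib.request.Request(url, data=payload.encode("utf-8"), method="POST")


request = drop_all_request()
print(request.get_method(), request.full_url, request.data)

# Destructive: uncomment only when you really want an empty database.
# urllib.request.urlopen(request)
```

After that, reload the data together with the schema file so the indexes and reverse edges exist before you query.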
