Recommendation Engine Tutorial Help

MaazM · June 21, 2018, 7:34am

Sorry, I should have clarified: how do I use it, though?

karan28aug · June 21, 2018, 7:36am

{
  "drop_all": true
}

Using this in an alter operation will delete all the data stored in your database.

MaazM · June 21, 2018, 8:25am

Thank you @karan28aug and @MichelDiz. I deleted the dataset per Karandeep’s suggestion, then loaded in the dataset again with the schema (as Michel instructed). I believe this worked and the data is now loaded in correctly.

I ran a query to find the UID of “The Shawshank Redemption” and it is different from the UID in the tutorial. Is this okay? This is my result:

{
  "data": {
    "q": [
      {
        "uid": "0x6dddb"
      }
    ],

I continued anyways and ran the queries from the tutorial. I am getting different results for the “similar movies”.

MichelDiz · June 21, 2018, 3:45pm

Yes, that’s okay; UIDs often change from one load to another. The tutorial is a bit old, and some things in Dgraph have changed since it was written. The UIDs also depend on how Dgraph is used in general and on how the loader assigns them. But by following the steps we gave you, you can look up the correct UID you need and continue your studies.

It is good to know this in case it happens again in the future, and also to learn how to “debug the DB data”.

PS: Expand this query with expand(_all_) just to be sure (it’s certainly the right node, but it’s good to have a visual confirmation).
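For anyone scripting this lookup instead of copying the UID by hand, here is a minimal Python sketch. It assumes the response has the shape shown earlier in this thread, with the matches under data and a query block named q; the helper name first_uid is made up for illustration.

```python
import json


def first_uid(response_text, block="q"):
    """Return the first uid in the named query block, or None if it is empty."""
    payload = json.loads(response_text)
    results = payload.get("data", {}).get(block, [])
    return results[0]["uid"] if results else None


# Response text copied from the shape shown in this thread.
response_text = '{"data": {"q": [{"uid": "0x6dddb"}]}}'
movie_uid = first_uid(response_text)
print(movie_uid)
```

Looking the UID up by name after each fresh load avoids depending on whatever value the tutorial happened to get.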

MaazM · June 24, 2018, 11:41pm

Hi Michel, Thank you so much:)

MaazM · June 25, 2018, 8:33pm

Hi Michel. I have been trying to run the Hybrid Filtering Algorithm (from Build a Realtime Recommendation Engine: Part 2 - Dgraph Blog), however I am getting an error message regarding the groupBy() query.

I am using the same code as provided in the tutorial so I am wondering if the groupBy functionality has changed.
This is the error message I get:

Only uid predicate is allowed in count within groupby

This is the code from which the error is being generated:

var(func: uid(2)) { # 1
  rated @groupby(genre) {
    gc as count(_uid_)
  }
}

Do you have any idea why I am getting this error? I have looked at the documentation on groupBy and I don’t see where I am going wrong

MichelDiz · June 25, 2018, 9:37pm

I’m not familiar with this post. Can you tell me where to find the data in RDF so I can create an instance and test it?

I think it is here, isn’t it? benchmarks/movielens at master · dgraph-io/benchmarks · GitHub
I tried to bulk load it, but I did not succeed.

Is this coming from a mutation of yours? Show an example.

MaazM · June 25, 2018, 10:10pm

Yes, this is the dataset for the tutorial: benchmarks/movielens/conv100k at master · dgraph-io/benchmarks · GitHub
The schema can be found in benchmarks/movielens/.

This tutorial is the second part of the tutorial we were talking about a few days earlier. I was also having trouble with the bulk loader, so I used the live loader instead.

The error message is coming from the code that was provided in the tutorial (under the section titled Hybrid Filtering).

MichelDiz · June 25, 2018, 10:27pm

Well I had no problem with Bulkload.

I started Zero:

dgraph zero

I put the schema and out.rdf.gz in the same directory as the binary and ran the bulk loader:

dgraph bulk -r out.rdf.gz -s movielens.schema --reduce_shards=1 --zero=localhost:5080

I then copied the Dgraph binary into the OUT/0 folder and ran:

dgraph server --lru_mb 2024

PS: Zero must stay running the whole time.

I opened Ratel and searched for a user:

{
  user(func: has(age)) {
    uid
    expand(_all_) # Careful: this loads a lot of data; remove expand(_all_) if necessary
  }
}

then I pasted the UID that I chose 0x4e23 into the following query:

{
  var(func: uid(0x2711)) {  # 1
    rated @groupby(genre) {
      gc as count(uid)
    }

    a as math(1)
    seen as rated @facets(r as rating) @facets(ge(rating, 3)) { # 2
      ~rated @facets(sr as rating) @facets(ge(rating, 3)) {   # 3
        user_score as math((sr + r)/a) # 4
      }
    }
  }

  var(func: uid(user_score), first:30, orderdesc: val(user_score)) { #5
    norm as math(1)
    rated @filter(not uid(seen)) @facets(ur as rating) { # 6
      genre {
        q as math(gc)   # 6.1
      }
      x as sum(val(q))  # 6.2
      fscore as math((1+(x/100))*ur/norm) # 7
    }
  }

  Recommendation(func: uid(fscore), orderdesc: val(fscore), first: 10) { # 8
    val(fscore)
    name
  }
}

and the result was this:


{
  "data": {
    "Recommendation": [
      {
        "val(fscore)": 7.9,
        "name": "Empire Strikes Back, The (1980)"
      },
      {
        "val(fscore)": 7.4,
        "name": "2001: A Space Odyssey (1968)"
      },
      {
        "val(fscore)": 7.05,
        "name": "Glory (1989)"
      },
      {
        "val(fscore)": 6.795,
        "name": "Professional, The (1994)"
      },
      {
        "val(fscore)": 6.65,
        "name": "Mrs. Brown (Her Majesty, Mrs. Brown) (1997)"
      },
      {
        "val(fscore)": 6.65,
        "name": "Ed Wood (1994)"
      },
      {
        "val(fscore)": 6.65,
        "name": "Deconstructing Harry (1997)"
      },
      {
        "val(fscore)": 6.65,
        "name": "Beautiful Thing (1996)"
      },
      {
        "val(fscore)": 6.65,
        "name": "Titanic (1997)"
      },
      {
        "val(fscore)": 6.5,
        "name": "Last Man Standing (1996)"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "processing_ns": 30999100
    },
    "txn": {
      "start_ts": 17,
      "lin_read": {
        "ids": {
          "1": 4
        }
      }
    }
  }
}
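As a worked example of step 7 in the query above, here is a small Python sketch of the same arithmetic. It assumes x is the sum of the user’s genre counts (gc) over the movie’s genres, as steps 6.1 and 6.2 suggest; the function name and the sample numbers are invented for illustration and are not taken from the dataset.

```python
def final_score(genre_counts, rating, norm=1):
    """Mirror fscore = (1 + x/100) * ur / norm from step 7 of the query."""
    x = sum(genre_counts)  # step 6.2: add up the per-genre counts
    return (1 + x / 100) * rating / norm


# Hypothetical movie with two genres the user has rated 30 and 28 times.
print(final_score([30, 28], rating=5))
```

The more often the user has rated movies in a candidate’s genres, the larger x becomes, so the genre term acts as a boost on top of the rating.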

MaazM · June 25, 2018, 10:56pm

Wow, that worked perfectly :) It seems the issue was that I was using

gc as count(_uid_)

as shown in the tutorial and in the docs here. However, as you demonstrated, the underscores are not needed. It was a small error, but I appreciate that you showed me this.

Also thank you for explicitly showing how to execute the bulk load.

I have one question, however: what do you mean when you say you copied the binary into the OUT/0 folder? Why is this necessary, and what does it accomplish?

MichelDiz · June 25, 2018, 11:08pm

Nice, glad it worked.

It is! I forgot to mention that this changed a while ago, in the v0.9.0 release.

The thing is, this is an old post, and not everything can be updated right away as Dgraph evolves. But I will review those posts to fix those little details.
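Fixing up queries copied from older posts could be scripted. This is a minimal Python sketch, assuming the only change needed is the _uid_ to uid rename discussed in this thread; the function name modernize_uid is hypothetical, and other syntax changes between versions are not handled.

```python
import re

# Match _uid_ only when it stands alone, not inside a longer identifier.
OLD_UID = re.compile(r"(?<![A-Za-z0-9_])_uid_(?![A-Za-z0-9_])")


def modernize_uid(query):
    """Replace the old _uid_ keyword with uid in a query string."""
    return OLD_UID.sub("uid", query)


print(modernize_uid("gc as count(_uid_)"))
```

The lookarounds keep the rewrite from touching names that merely contain the same characters, and keywords such as expand(_all_) are left alone.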

You’re welcome xD

Oh yeah! The thing is that Dgraph creates a folder called “out” for the bulk load and, depending on the number of shards, numbers the subfolders from 0 to 9 and so on. Then I copy the binary “dgraph.exe” (I’m on Windows) into each folder. It’s just a lazy shortcut xD

However, if you are on another system, you will need to point the Dgraph server instance at that path, something like:

dgraph server --lru_mb 2024 -p ~/home/user/dgraph/out/0/p

and it will use the data generated by Bulkload.
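If there are many shards, listing the numbered folders can be scripted. This is a rough Python sketch that assumes the bulk loader wrote its output to a folder named out with one numbered subfolder per shard, as described above; the helper name shard_dirs is hypothetical.

```python
from pathlib import Path


def shard_dirs(out_dir):
    """Return the numbered shard folders under out_dir, sorted numerically."""
    root = Path(out_dir)
    if not root.is_dir():
        return []  # nothing to list if the bulk loader has not run here
    numbered = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    # Sort by integer value so that "10" comes after "9", not after "1".
    return sorted(numbered, key=lambda p: int(p.name))


for shard in shard_dirs("out"):
    print(shard)
```

Each printed folder is one shard that a separate server instance would be pointed at.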

Cheers

MichelDiz · June 25, 2018, 11:17pm

Hey, this link points to an old Dgraph version, v0.7.7.

It should be: Get started with Dgraph

MaazM · June 27, 2018, 5:44pm

Great! I really appreciate it.

I understand, everything can’t be updated immediately as Dgraph evolves. Also, I see now what you mean by the folders being numbered 0 to 9.

MaazM · June 28, 2018, 12:48am

I am trying to write a simple operation on the movieLens dataset. I am trying to take the average rating of every movie. So I am using the following query:
(I am not sure why indentation is not showing so I have attached a screen shot of my code as well)

q1(func: has(~rated)) {
  uid
  ~rated @facets(r as rating) {
    numUsers as count(rating)
    avg as math((r+avg)/numUsers)
  }
}

I am not sure how to add up all of the ratings for each movie. This code outputs every rating for every movie, but I don’t understand how to sum them. Any ideas? I was thinking of using the Go client to extract all of the ratings for each movie and then add them up in Go. But I am not sure this is really needed, as there might be a simpler option using GraphQL+-.
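For reference, the client-side approach described above is only a few lines once the ratings are out of the database. This is a minimal sketch in Python rather than Go, assuming the ratings have already been extracted from the query results into (movie UID, rating) pairs; how facets appear in the JSON response depends on the Dgraph version, so that extraction step is left out, and the sample rows are made up.

```python
from collections import defaultdict


def average_ratings(rows):
    """Return {movie: average rating} for an iterable of (movie, rating) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for movie, rating in rows:
        totals[movie] += rating
        counts[movie] += 1
    return {movie: totals[movie] / counts[movie] for movie in totals}


# Hypothetical rows: two ratings for movie 0x1 and one for movie 0x2.
rows = [("0x1", 4), ("0x1", 5), ("0x2", 3)]
print(average_ratings(rows))  # {'0x1': 4.5, '0x2': 3.0}
```

Keeping a running total and a count per movie means the ratings only need to be read once.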

MichelDiz · June 28, 2018, 3:00am

Would this example work for you? Get started with Dgraph

About your query, I wanted to understand what you want to do here:
numUsers as count (rating)

rating is a facet: it belongs to the edge, not to the body of the node.
You could try:

numUsers as Total_of_Ratings : count(~rated) #Not "rating", rated is a predicate and rating is a facet

outside the ~rated block. That gives you the total number of users who rated the movie.

So you will not be able to count “rating”. count is not a mathematical operation on values; it is an operation on edges.

E.g.: count(friends) = the total number of friends edges, i.e. friendships.

You could try using: Get started with Dgraph

“The form count(predicate) counts how many predicate edges lead out of a node.
The form count(uid) counts the number of UIDs matched in the enclosing block.”
Get started with Dgraph
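The difference between the two forms quoted above can be pictured with a toy graph. This is a plain Python analogy, not Dgraph code; the graph, the names, and both helper functions are invented for illustration.

```python
# Toy graph: each node maps predicate names to lists of target nodes.
graph = {
    "alice": {"friend": ["bob", "carol"]},
    "bob": {"friend": ["alice"]},
    "carol": {},
}


def count_predicate(node, predicate):
    """Like count(predicate): how many such edges lead out of one node."""
    return len(graph[node].get(predicate, []))


def count_uid(block):
    """Like count(uid): how many distinct nodes were matched in a block."""
    return len(set(block))


print(count_predicate("alice", "friend"))
print(count_uid(["alice", "bob", "carol"]))
```

In this picture a facet such as rating is data attached to one edge, which is why it has no edges of its own to count.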

About formatting (indentation) of your examples here on Discuss: use “ctrl + shift + c” to format text as code. There’s a toolbar at the top of the text editor. You can also apply one formatting function on top of another.

system · July 28, 2018, 3:00am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
