jso
(Jeffrey So)
March 9, 2021, 10:46pm
1
I have set up Dgraph on a VM. When I run a load test against it, the response time is pretty slow. Here is my setup.
Cloud: Azure
VM: F16s (16 CPU, 32GB memory)
Disk: Premium SSD
OS: Ubuntu 18.04
I followed the docs at https://dgraph.io/docs/deploy/single-host-setup/ and ran Ratel, Alpha and Zero using Docker. I loaded about 5GB of data into it.
I tested it by simulating 50 concurrent users and got a response time of ~700ms. With 100 users, the response time jumped to ~1.8 sec.
During the test, I found that the CPU load became very high.
Is this response time normal? How can I improve the performance?
MichelDiz
(Michel Diz)
March 9, 2021, 10:59pm
2
How did you perform the load? Live load? Can you give an example of your schema, and also an example of the load itself?
mrjn
(Manish R Jain)
March 9, 2021, 11:06pm
3
Most likely you could optimize your query better, so it’s cheaper to run. If you post your query, we could have a look.
jso
(Jeffrey So)
March 10, 2021, 12:42am
4
Here is the schema.
<Actor>: [uid] @reverse .
<actor>: [uid] .
<contenttype>: string .
<cty>: default .
<dgraph.graphql.schema>: string .
<dgraph.graphql.xid>: string @index(exact) @upsert .
<discoverable_status>: string .
<dst>: default .
<id>: string @index(hash) .
<genre>: [uid] @reverse .
<genre_category>: string @index(hash) .
<keyword>: string @index(hash) .
<longdescription>: default .
<lwe>: datetime .
<lws>: datetime .
<name>: string @index(term) @lang .
<net>: default .
<network_name>: string .
<pn>: default .
<provider_name>: string .
<score>: [uid] @count @reverse .
<series>: default .
<set>: [uid] .
<shortdescription>: default .
<st>: default .
<status>: string .
<title>: default .
<user_id>: string .
<user_name>: string @index(term) .
<uses_keyword>: [uid] @reverse .
<watch>: [uid] @reverse .
<window_end>: string .
<window_start>: string .
type <actor> {
name
}
type <character> {
name
}
type <content> {
id
name
st
shortdescription
longdescription
series
cty
pn
net
dst
lws
lwe
}
type <dgraph.graphql> {
dgraph.graphql.schema
dgraph.graphql.xid
}
type <genre> {
genre_category
}
type <keyword> {
keyword
}
type <user> {
user_name
}
And the load test.
<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.2.1">
<hashTree>
<TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Test Plan" enabled="true">
<stringProp name="TestPlan.comments"></stringProp>
<boolProp name="TestPlan.functional_mode">false</boolProp>
<boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
<boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
<elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="TestPlan.user_define_classpath"></stringProp>
</TestPlan>
<hashTree>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Thread Group" enabled="true">
<elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" enabled="true">
<boolProp name="LoopController.continue_forever">false</boolProp>
<intProp name="LoopController.loops">-1</intProp>
</elementProp>
<stringProp name="ThreadGroup.num_threads">50</stringProp>
<stringProp name="ThreadGroup.ramp_time"></stringProp>
<boolProp name="ThreadGroup.scheduler">true</boolProp>
<stringProp name="ThreadGroup.duration">300</stringProp>
<stringProp name="ThreadGroup.delay">0</stringProp>
<stringProp name="ThreadGroup.on_sample_error">startnextloop</stringProp>
<boolProp name="ThreadGroup.same_user_on_next_iteration">false</boolProp>
</ThreadGroup>
<hashTree>
<CSVDataSet guiclass="TestBeanGUI" testclass="CSVDataSet" testname="CSV Data Set Config" enabled="true">
<stringProp name="filename">C:/Perf/bookmark_testdata.csv</stringProp>
<stringProp name="fileEncoding"></stringProp>
<stringProp name="variableNames"></stringProp>
<boolProp name="ignoreFirstLine">false</boolProp>
<stringProp name="delimiter">,</stringProp>
<boolProp name="quotedData">false</boolProp>
<boolProp name="recycle">true</boolProp>
<boolProp name="stopThread">false</boolProp>
<stringProp name="shareMode">shareMode.all</stringProp>
</CSVDataSet>
<hashTree/>
<OnceOnlyController guiclass="OnceOnlyControllerGui" testclass="OnceOnlyController" testname="Once Only Controller" enabled="true"/>
<hashTree>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Authorization" enabled="true">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" enabled="true">
<collectionProp name="Arguments.arguments">
<elementProp name="grant_type" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">grant_type</stringProp>
<stringProp name="Argument.value">client_credentials</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
<elementProp name="client_id" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">client_id</stringProp>
<stringProp name="Argument.value">ios-ui-app</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
<elementProp name="client_secret" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">client_secret</stringProp>
<stringProp name="Argument.value">xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
<elementProp name="audience" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">audience</stringProp>
<stringProp name="Argument.value">edge-service</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
</collectionProp>
</elementProp>
<stringProp name="HTTPSampler.domain">example.com</stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol">https</stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/oauth2/token</stringProp>
<stringProp name="HTTPSampler.method">POST</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">true</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="Content-Type" elementType="Header">
<stringProp name="Header.name">Content-Type</stringProp>
<stringProp name="Header.value">application/x-www-form-urlencoded</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
<JSONPostProcessor guiclass="JSONPostProcessorGui" testclass="JSONPostProcessor" testname="JSON Extractor" enabled="true">
<stringProp name="JSONPostProcessor.referenceNames">bearer</stringProp>
<stringProp name="JSONPostProcessor.jsonPathExprs">$..access_token</stringProp>
<stringProp name="JSONPostProcessor.match_numbers">0</stringProp>
</JSONPostProcessor>
<hashTree/>
</hashTree>
</hashTree>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Authorization" enabled="false">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" enabled="true">
<collectionProp name="Arguments.arguments">
<elementProp name="grant_type" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">grant_type</stringProp>
<stringProp name="Argument.value">client_credentials</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
<elementProp name="client_id" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">client_id</stringProp>
<stringProp name="Argument.value">ios-ui-app</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
<elementProp name="client_secret" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">client_secret</stringProp>
<stringProp name="Argument.value">xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
<elementProp name="audience" elementType="HTTPArgument" enabled="true">
<stringProp name="Argument.name">audience</stringProp>
<stringProp name="Argument.value">edge-service</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<boolProp name="HTTPArgument.use_equals">true</boolProp>
</elementProp>
</collectionProp>
</elementProp>
<stringProp name="HTTPSampler.domain">example.com</stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol">https</stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/oauth2/token</stringProp>
<stringProp name="HTTPSampler.method">POST</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">true</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="Content-Type" elementType="Header">
<stringProp name="Header.name">Content-Type</stringProp>
<stringProp name="Header.value">application/x-www-form-urlencoded</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
<JSONPostProcessor guiclass="JSONPostProcessorGui" testclass="JSONPostProcessor" testname="JSON Extractor" enabled="true">
<stringProp name="JSONPostProcessor.referenceNames">bearer</stringProp>
<stringProp name="JSONPostProcessor.jsonPathExprs">$..access_token</stringProp>
<stringProp name="JSONPostProcessor.match_numbers">0</stringProp>
</JSONPostProcessor>
<hashTree/>
</hashTree>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Recommendation" enabled="true">
<boolProp name="HTTPSampler.postBodyRaw">true</boolProp>
<elementProp name="HTTPsampler.Arguments" elementType="Arguments">
<collectionProp name="Arguments.arguments">
<elementProp name="" elementType="HTTPArgument">
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<stringProp name="Argument.value">{
"item": "${contentId}"
}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
</collectionProp>
</elementProp>
<stringProp name="HTTPSampler.domain">example.com</stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol">https</stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/recommend</stringProp>
<stringProp name="HTTPSampler.method">POST</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">true</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="Authorization" elementType="Header">
<stringProp name="Header.name">Authorization</stringProp>
<stringProp name="Header.value">Bearer ${bearer}</stringProp>
</elementProp>
<elementProp name="Content-Type" elementType="Header">
<stringProp name="Header.name">Content-Type</stringProp>
<stringProp name="Header.value">application/json</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
<JSONPathAssertion guiclass="JSONPathAssertionGui" testclass="JSONPathAssertion" testname="JSON Assertion" enabled="true">
<stringProp name="JSON_PATH">$.header.message</stringProp>
<stringProp name="EXPECTED_VALUE">Success</stringProp>
<boolProp name="JSONVALIDATION">true</boolProp>
<boolProp name="EXPECT_NULL">false</boolProp>
<boolProp name="INVERT">false</boolProp>
<boolProp name="ISREGEX">true</boolProp>
</JSONPathAssertion>
<hashTree/>
<DurationAssertion guiclass="DurationAssertionGui" testclass="DurationAssertion" testname="Duration Assertion" enabled="false">
<stringProp name="DurationAssertion.duration">200</stringProp>
</DurationAssertion>
<hashTree/>
<ConstantTimer guiclass="ConstantTimerGui" testclass="ConstantTimer" testname="Constant Timer" enabled="false">
<stringProp name="ConstantTimer.delay">1000</stringProp>
</ConstantTimer>
<hashTree/>
</hashTree>
<ConstantTimer guiclass="ConstantTimerGui" testclass="ConstantTimer" testname="Constant Timer" enabled="true">
<stringProp name="ConstantTimer.delay">1000</stringProp>
</ConstantTimer>
<hashTree/>
<ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="true">
<boolProp name="ResultCollector.error_logging">true</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
<ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
<ResultCollector guiclass="TableVisualizer" testclass="ResultCollector" testname="View Results in Table" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
</hashTree>
</hashTree>
</jmeterTestPlan>
MichelDiz
(Michel Diz)
March 10, 2021, 1:00am
5
I didn’t get it. Did you send XML as a mutation? Dgraph supports only JSON and RDF (simple N-Quads). No other format is supported.
jso
(Jeffrey So)
March 10, 2021, 1:19am
6
Sorry, I didn’t make myself clear… we have another microservice connecting to Dgraph, and the above JMeter file hits that microservice.
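As a rough sanity check on the numbers reported earlier in the thread, the response times can be turned into an approximate throughput using the closed-loop relation from Little's law: requests per second ≈ users / (response time + think time). The sketch below assumes the enabled 1000 ms Constant Timer in the JMeter plan acts as the think time before each Recommendation request; the function name `throughput` is made up for illustration.

```go
package main

import "fmt"

// throughput estimates requests per second for a closed-loop load test,
// where each simulated user waits thinkSec, sends a request, and blocks
// for responseSec before starting the next iteration.
func throughput(users int, responseSec, thinkSec float64) float64 {
	return float64(users) / (responseSec + thinkSec)
}

func main() {
	// Figures from the thread: ~700ms at 50 users, ~1.8s at 100 users.
	// The 1.0s think time is an assumption based on the Constant Timer.
	fmt.Printf("%.1f req/s\n", throughput(50, 0.7, 1.0))  // 29.4 req/s
	fmt.Printf("%.1f req/s\n", throughput(100, 1.8, 1.0)) // 35.7 req/s
}
```

Under these assumptions, doubling the users raises the estimated throughput by only about 20%, which would be consistent with the server already being close to CPU saturation at 50 users.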
jso
(Jeffrey So)
March 10, 2021, 3:25pm
7
This is how we query Dgraph.
In our Go code, we use Dgraph’s client (dgo). We create the client once, at server startup:
// Dial a gRPC connection. The address to dial can be configured when
// setting up the Dgraph cluster.
dialOpts := append([]grpc.DialOption{},
	grpc.WithInsecure(),
	grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)))
// dgAlphaStr := vault.GetSecret(dgAlpha)
// getDgConnStr(vaultSrv) returns the IP address of the Dgraph Alpha.
d, err := grpc.Dial(getDgConnStr(vaultSrv), dialOpts...)
if err != nil {
	logger.BootstrapLogger.Fatal(err)
	return nil, err
}
dgs := &DgraphService{
	Client: dgo.NewDgraphClient(
		api.NewDgraphClient(d),
	),
}
And to run a query:
// txn is created from the dgo client with NewReadOnlyTxn().
res, err := txn.QueryWithVars(ctx, query, variables)
if err != nil {
	logger.LogError(err, logFields)
	return nil, "", err
}
// entity.RecoResponse holds only item IDs.
var items map[string][]entity.RecoResponse
if err := json.Unmarshal(res.GetJson(), &items); err != nil {
	logger.LogError(err, logFields)
	return nil, "", err
}
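To make the decoding step concrete, here is a minimal, standard-library-only sketch of how a response shaped like the `similarMovies` block could be unmarshalled into a map keyed by query block name. `RecoResponse` here is a hypothetical stand-in for `entity.RecoResponse` (whose real definition is not shown in the thread), and the sample payload and the `decodeItems` helper are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RecoResponse is a hypothetical stand-in for entity.RecoResponse.
// It keeps only the item ID; other fields in the JSON are ignored.
type RecoResponse struct {
	ID string `json:"firstlightid"`
}

// decodeItems unmarshals a Dgraph-style JSON response and returns the
// results of the named query block (for example "similarMovies").
func decodeItems(raw []byte, block string) ([]RecoResponse, error) {
	var items map[string][]RecoResponse
	if err := json.Unmarshal(raw, &items); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	return items[block], nil
}

func main() {
	// Invented sample payload, not real output from the cluster.
	raw := []byte(`{"similarMovies":[
		{"firstlightid":"a1","title":"First"},
		{"firstlightid":"b2","title":"Second"}]}`)

	recs, err := decodeItems(raw, "similarMovies")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range recs {
		fmt.Println(r.ID)
	}
}
```

Because the struct declares only the ID field, requesting fewer predicates in the query's result block would not change what the service decodes.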
MichelDiz
(Michel Diz)
March 10, 2021, 3:33pm
8
In short: you have a single-Alpha cluster and you are sending 5GB of data in a short period. That recipe doesn’t look good to me.
If you need to keep it this way, try ludicrous mode. Just add the --ludicrous_mode flag to your cluster and you are good to go. This mode skips several background tasks that guarantee consistency on timestamps and data availability, so you will get a boost in data ingestion.
See:
https://dgraph.io/docs/deploy/ludicrous-mode/
PS.
I would recommend a cluster with multiple Alphas (preferably on multiple machines/servers), with the data balanced between the instances. That way you can send mutations without a significant latency penalty. Or just use the Bulk Loader for that case.
dmai
(Daniel Mai)
March 11, 2021, 5:14am
9
@jso Can you share the query you’re running in your test bench? We can then see if the query can be optimized with some tweaks to the query itself.
jso
(Jeffrey So)
March 11, 2021, 7:36pm
10
Here is the query.
{
  # Count the genres for every movie that has been rated
  var(func: has(uses_keyword)) {
    total_keyword as count(uses_keyword )
  }
  var(func: has(genre)) {
    total_genre as count(genre)
  }
  # Calculate a Jaccard distance score for every movie that shares
  # at least 1 genre with the given movie.
  var(func: eq(fid,"xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx")) { # M1
    norm as math(1.0) # 1
    my_keyword as count(uses_keyword) # 2
    connected_keyword as uses_keyword { # 3
      ~uses_keyword {
        # M2 -- movies reached here share a genre with the initial movie
        # normalize the count to account for multiple paths
        my_keyword_norm as math(my_keyword / norm) # 4
        num_keyword as count(uses_keyword @filter(uid(connected_keyword)) ) # 5
        k_initial_denom as math(my_keyword_norm + total_keyword - num_keyword)
        k_final_denom as math(cond(k_initial_denom > 0 ,k_initial_denom,1))
        keyword_distance as math(1.0-(num_keyword/k_final_denom))
      }
    }
    my_genres as count(genre) # 2
    connected_genre as genre { # 3
      ~genre {
        # M2 -- movies reached here share a genre with the initial movie
        # normalize the count to account for multiple paths
        My_genres_norm as math(my_genres / norm) # 4
        num_genres as count(genre @filter(uid(connected_genre)) ) # 5
        initial_denom as math(My_genres_norm + total_genre - num_genres)
        final_denom as math(cond(initial_denom > 0 ,initial_denom,1))
        genre_distance as math(1.0-(num_genres/final_denom))
      }
    }
    final_distance as math(keyword_distance+genre_distance)
  }
  # Sort and return closest movies.
  similarMovies(func: uid(final_distance) , orderdesc: lws,first:10) @filter( eq(cty,"movie") and le(lws,"2021-01-28T01:00:00-00:00") and eq(dst,"published")) { # 7
    firstlightid
    title
    lws
    lwd
    dst
    cty
    val(final_distance)
  }
}
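For readers unfamiliar with the score the query computes, the following is a small illustrative sketch of a Jaccard distance between two keyword lists, 1 - |A ∩ B| / |A ∪ B|, written in plain Go rather than DQL. As I read the query, `my_keyword_norm` plays the role of |A|, `total_keyword` of |B|, and `num_keyword` of |A ∩ B|; the function name and sample keywords below are invented for illustration.

```go
package main

import "fmt"

// jaccardDistance returns 1 - |A ∩ B| / |A ∪ B| for two keyword lists.
// Duplicate keywords within a list are counted once.
func jaccardDistance(a, b []string) float64 {
	setA := make(map[string]struct{}, len(a))
	for _, k := range a {
		setA[k] = struct{}{}
	}
	setB := make(map[string]struct{}, len(b))
	for _, k := range b {
		setB[k] = struct{}{}
	}

	// Count the keywords present in both sets.
	shared := 0
	for k := range setB {
		if _, ok := setA[k]; ok {
			shared++
		}
	}

	// |A ∪ B| = |A| + |B| - |A ∩ B|
	union := len(setA) + len(setB) - shared
	if union <= 0 {
		union = 1 // mirrors the cond(... > 0, ..., 1) guard in the query
	}
	return 1.0 - float64(shared)/float64(union)
}

func main() {
	m1 := []string{"heist", "space", "robot"}
	m2 := []string{"space", "robot", "western"}
	// 2 shared keywords out of 4 distinct ones.
	fmt.Println(jaccardDistance(m1, m2)) // prints 0.5
}
```

A distance of 0 means identical keyword sets and 1 means no overlap, so lower values indicate more similar items.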
mrjn
(Manish R Jain)
March 12, 2021, 12:39am
11
You don’t need the has() queries at the top. See my rough rewrite, which takes <10ms to run.
{
  # Count the genres for every movie that has been rated
  # manish: This is not required.
  # manish: Roughly doing the work based on keywords only -- taking <10ms.
  # var(func: has(uses_keyword)) {
  # total_keyword as count(uses_keyword )
  #}
  # Calculate a Jaccard distance score for every movie that shares
  # at least 1 genre with the given movie.
  var(func: eq(firstlightid,"xxx")) { # M1
    uid
    norm as math(1.0)
    count(total_keyword) # 2
    my_keyword as count(uses_keyword)
    connected_keyword as uses_keyword { # 3
      ~uses_keyword {
        # M2 -- movies reached here share a genre with the initial movie
        # normalize the count to account for multiple paths
        uid
        my_keyword_norm: math(my_keyword / norm)
        num_keyword2: math(norm) # You don't need the below one, because math(norm) gives you that.
        # num_keyword: count(uses_keyword) @filter(uid(connected_keyword))
        kd as count(uses_keyword)
        # kd2: val(total_keyword) # manish: we don't need to calculate total_keyword. You get the same info from kd, which would only need to do this over a limited range.
      }
    }
    final_distance as math(kd)
  }
  q3(func: uid(final_distance)) @filter(eq(cty, "movie") and le(lws,"2021-01-28T01:00:00-00:00") and eq(dst,"published")) {
    uid
    count(uid)
    val(final_distance)
  }
}