We use Dgraph version 21.03 with the schema below:
type Teacher {
  teacherId: ID!
  yearLevel: [YearLevel]
  learningAreas: [LearningArea]
  similarTeacher: [SimilarTeacher]
}
type LearningArea {
  areaId: ID!
  areaName: String! @search
}
type YearLevel {
  levelId: ID!
  level: String! @search
}
type SimilarTeacher {
  similarTeacherId: ID!
  internalId: String! @search
  teacher: Teacher!
  score: Float! @search
}
There are about 130,000 teachers loaded into Dgraph. We run a simple comparison of every teacher against all other teachers, and each match is added to that teacher's similar teachers list.
The following is an example of our update query:
upsert {
  query {
    qsource(func: eq(Teacher.internalId, "53513")) {
      source as uid
    }
    qt1(func: eq(Teacher.internalId, "76637")) {
      t1 as uid
    }
    qsim1(func: eq(SimilarTeacher.internalId, "5351376637")) {
      sim1 as uid
    }
    qrev_sim1(func: eq(SimilarTeacher.internalId, "7663753513")) {
      rev_sim1 as uid
    }
    qt2(func: eq(Teacher.internalId, "56968")) {
      t2 as uid
    }
    qsim2(func: eq(SimilarTeacher.internalId, "5351356968")) {
      sim2 as uid
    }
    qrev_sim2(func: eq(SimilarTeacher.internalId, "5696853513")) {
      rev_sim2 as uid
    }
  }
  mutation {
    set {
      uid(sim1) <SimilarTeacher.internalId> "5351376637" .
      uid(sim1) <SimilarTeacher.teacher> uid(t1) .
      uid(sim1) <SimilarTeacher.score> "0.5270462766947299" .
      uid(sim1) <dgraph.type> "SimilarTeacher" .
      uid(source) <Teacher.similarTeacher> uid(sim1) .
      uid(rev_sim1) <SimilarTeacher.internalId> "7663753513" .
      uid(rev_sim1) <SimilarTeacher.teacher> uid(source) .
      uid(rev_sim1) <SimilarTeacher.score> "0.5270462766947299" .
      uid(rev_sim1) <dgraph.type> "SimilarTeacher" .
      uid(t1) <Teacher.similarTeacher> uid(rev_sim1) .
      uid(sim2) <SimilarTeacher.internalId> "5351356968" .
      uid(sim2) <SimilarTeacher.teacher> uid(t2) .
      uid(sim2) <SimilarTeacher.score> "0.5163977794943223" .
      uid(sim2) <dgraph.type> "SimilarTeacher" .
      uid(source) <Teacher.similarTeacher> uid(sim2) .
      uid(rev_sim2) <SimilarTeacher.internalId> "5696853513" .
      uid(rev_sim2) <SimilarTeacher.teacher> uid(source) .
      uid(rev_sim2) <SimilarTeacher.score> "0.5163977794943223" .
      uid(rev_sim2) <dgraph.type> "SimilarTeacher" .
      uid(t2) <Teacher.similarTeacher> uid(rev_sim2) .
    }
  }
}
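For readers who want to reproduce the shape of this request, here is a minimal Python sketch of how such an upsert block could be assembled for one source teacher and a list of matches. The function name `build_upsert` and the `(target_id, score)` input format are hypothetical and not taken from our actual loader. The only things copied from the query above are the variable naming (`sim1`, `rev_sim1`, ...) and the way `SimilarTeacher.internalId` is formed by concatenating the two teacher ids.

```python
from typing import List, Tuple


def build_upsert(source_id: str, matches: List[Tuple[str, float]]) -> str:
    """Build a DQL upsert block linking source_id to each (target_id, score).

    Hypothetical helper: it only mirrors the text of the example query above.
    """
    queries = [
        f'    qsource(func: eq(Teacher.internalId, "{source_id}")) {{ source as uid }}'
    ]
    triples = []
    for i, (target_id, score) in enumerate(matches, start=1):
        fwd_id = source_id + target_id  # e.g. "53513" + "76637"
        rev_id = target_id + source_id  # reverse direction of the same pair
        queries += [
            f'    qt{i}(func: eq(Teacher.internalId, "{target_id}")) {{ t{i} as uid }}',
            f'    qsim{i}(func: eq(SimilarTeacher.internalId, "{fwd_id}")) {{ sim{i} as uid }}',
            f'    qrev_sim{i}(func: eq(SimilarTeacher.internalId, "{rev_id}")) {{ rev_sim{i} as uid }}',
        ]
        # (node variable, its internalId, teacher it points to, teacher that owns it)
        for var, internal_id, teacher_var, owner_var in (
            (f"sim{i}", fwd_id, f"t{i}", "source"),
            (f"rev_sim{i}", rev_id, "source", f"t{i}"),
        ):
            triples += [
                f'      uid({var}) <SimilarTeacher.internalId> "{internal_id}" .',
                f'      uid({var}) <SimilarTeacher.teacher> uid({teacher_var}) .',
                f'      uid({var}) <SimilarTeacher.score> "{score!r}" .',
                f'      uid({var}) <dgraph.type> "SimilarTeacher" .',
                f'      uid({owner_var}) <Teacher.similarTeacher> uid({var}) .',
            ]
    lines = ["upsert {", "  query {", *queries, "  }",
             "  mutation {", "    set {", *triples, "    }", "  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_upsert("53513", [("76637", 0.5270462766947299),
                                 ("56968", 0.5163977794943223)]))
```

Each match contributes three query blocks and ten triples, so a call carrying 500 similar teachers sends roughly 1,500 lookups and 5,000 triples in a single request.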
When running the update, one call usually updates one teacher with 500 similar teachers.
One update takes about 250 ms.
We have 130,000 * 100 update statements, and at the moment the ETA to complete all of the updates is very large.
The ETA to update all the records is:
8,449,935,000 / 500 (rows per batch) * 250 (ms per batch) / 1000 (to get seconds) / 60 (to get minutes) / 60 (to get hours) / 24 (to get days) = 48.9 days
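The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic. The function name `eta_days` is made up for illustration, and the calculation assumes batches run strictly one after another at a constant 250 ms each. The row count used in the estimate equals the number of unordered pairs among 130,000 teachers, 130,000 * 129,999 / 2.

```python
def eta_days(total_rows: int, rows_per_batch: int, ms_per_batch: float) -> float:
    """Days needed if batches run sequentially at a fixed time per batch."""
    batches = total_rows / rows_per_batch
    total_ms = batches * ms_per_batch
    return total_ms / 1000 / 60 / 60 / 24  # ms -> s -> min -> h -> days


teachers = 130_000
pairs = teachers * (teachers - 1) // 2  # unordered teacher pairs

print(pairs)                                  # 8449935000
print(round(eta_days(pairs, 500, 250), 1))    # 48.9
```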
Are there any suggestions for speeding up this process?
We would really appreciate your help.