Return a single object instead of an array for an exactly-one relation between nodes


(benny rio) #1

How can I get a single JSON object instead of a JSON array for a one-to-one / many-to-one (unary) relationship between nodes?

For example, take the relationship from a high_school student node to a high_school node, assuming a high_school student can study at exactly one high_school.

Since Dgraph always returns an array for a node-to-node relationship, I have to model the relationship as an array too (so that I can unmarshal the Dgraph query result), which is not right from a data-model perspective.

For example (in Go), to model the N-Quad "student study_at school":

type School struct {
	Name    string `json:"name"`
	Address string `json:"address"`
}

type Student struct {
	Name   string   `json:"name"`
	School []School `json:"study_at"` // unary relationship: this should be a single object, not an array
}

(benny rio) #2

I ended up using a custom JSON unmarshal function like this (in Go):

package main

import (
	"encoding/json"
	"fmt"
)

type School struct {
	Name    string `json:"name"`
	Address string `json:"address"`
}

type Student struct {
	Name   string `json:"name"`
	School School `json:"school"`
}

// UnmarshalJSON decodes the study_at array that Dgraph returns and keeps its
// only element as s.School.
func (s *Student) UnmarshalJSON(data []byte) error {
	// TempStudent has Student's fields but not its methods, which avoids
	// recursing back into this UnmarshalJSON.
	type TempStudent Student
	temp := &struct {
		*TempStudent
		Schools []School `json:"study_at"`
	}{TempStudent: (*TempStudent)(s)}
	if err := json.Unmarshal(data, temp); err != nil {
		return err
	}
	// Only a one-element study_at array is unwrapped; an empty or longer
	// array leaves s.School unchanged.
	if len(temp.Schools) == 1 {
		s.School = temp.Schools[0]
	}
	return nil
}

func main() {
	data := `{"name": "Crazy Student", "study_at": [{"name": "XYZ highschool", "address": "a school address"}]}`
	var s Student
	if err := json.Unmarshal([]byte(data), &s); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(s.School.Name) // no more s.School[0].Name
}

But I still wish Dgraph had a built-in tag/filter to return a single object instead of an array for special cases, maybe something like this:

{
  query(func: eq(studentID, "001")) {
    name
    study_at @singleObject {  # tag/filter telling Dgraph to return a single object instead of an array of objects
      name
      address
    }
  }
}

(Adam Steele) #3

I saw this thread and had a thought I figured I would share. What if Dgraph assumed that anything with first: 1 or last: 1 should be returned as a single item? This would avoid creating another directive for this purpose and instead build it into an existing piece of functionality. Just a thought.


(benny rio) #4

I have to disagree with your idea: the “first” filter is intended to get a subset from a set. “first: 1” and “first: 2” should both consistently return a subset.


(Adam Steele) #5

While I can see your point, one of the issues is that you are asking Dgraph to change its result behavior so that it returns only a single item. The problem is that nothing in the schema enforces that relationship. This isn’t SQL, where you can put a unique index on the foreign key to enforce it at the database level, so it is entirely possible that you would be forcing Dgraph to choose which item to return from a set of nodes. Should it return the first? The last? The most recently updated? There has to be something in place that lets it decide. By “piggy-backing” off the first/last operator, this is handled inherently.

I’m actually wondering now if this is something that should be handled at the schema level. Just as you can define a predicate as being a list, maybe a predicate which is a uid should be capable of being defined as being a 1 to 1 relationship. Then nothing in the query would be required for Dgraph to drop the array, and Dgraph could potentially enforce this condition during all mutations.


(benny rio) #6

If this is handled at the schema level then yes, I think Dgraph would need a behaviour change in how it processes predicates. That is why I propose handling it at the query level, by introducing a new filter like @singleObject or @unary that tells Dgraph to return a single object instead of an array for a one-to-one/unary relationship.

Another reason I prefer introducing a new filter rather than reusing the existing “first” or “last” filter is compatibility: if we change the behaviour of “first” or “last”, existing Dgraph users could be affected, but if we introduce a new filter, they won’t be.


(Adam Steele) #7

Sorry for the delay, I missed the email notification containing the update.

After thinking about this more, I don’t think I can support doing this at the query level at all anymore (honestly after rereading my original proposed solution, I literally face palmed). Throughout our discussion I’ve never been able to shake the thought that doing this in the query feels like a hack. I think the best long term solution is essentially what I provided at the end of my last response. To reiterate:

The schema would be extended so that uid predicates can be defined as a list (currently the only behavior it supports) or, just like the other standard data types, as a single item. Mutations that would violate this relationship could then be rejected.

I come from an SQL world, so here’s the solution in SQL terms.

Assume we have three tables: schools, students, and schools_students (a join table). Schools have many students; each student has one school. Obviously you could do this in SQL without the join table, but because every association in Dgraph can currently be many-to-many, this is the closest representation.

schools:
id (int)
name (string)

students:
id (int)
name (string)

schools_students:
school_id (int)
student_id (int)

Yours:

SELECT ...
FROM students
JOIN schools_students ON schools_students.student_id = students.id
JOIN schools ON schools.id = schools_students.school_id
GROUP BY students.id

Mine:

(At table creation)

ALTER TABLE `schools_students` ADD UNIQUE INDEX `student_id` (`student_id`);
SELECT ...
FROM students
JOIN schools_students ON schools_students.student_id = students.id
JOIN schools ON schools.id = schools_students.school_id

The difference:

The primary difference is that in my example, the schema enforces this association. Assuming the database is doing its job, it is impossible to break the 1-to-1 association. In your solution, the data can drift out of spec, allowing the 1-to-1 association to become 1-to-many. With your proposed solution (@singleObject or @unary), you’re essentially doing what the GROUP BY does in this SQL example, which actually hides issues with the data: rather than throwing an error because something violates this relationship, the database will happily continue as if nothing were wrong.

Conclusion

While I think your solution would technically solve the problem, I think it would cause more problems in the long term than it solves. If Dgraph needs to support one-to-one associations, then it needs to support them fully. That means changes at the schema level, and enforcing those restrictions during all mutations. Violating this relationship should result in an error, just as saving a geo value in a datetime predicate would.

(Edit)
TL;DR: don’t do this in the query, do it in the schema.

I’ve also posted the proposed change in dgraph’s github. I’m hoping maybe one of the dgraph developers can give us their 2 cents on the issue as well.


(benny rio) #8

I prefer Dgraph to keep the “RDF spirit”.

In a sense, just like RDF, Dgraph is schema-less (correct me if I’m wrong): a schema in Dgraph is just type safety compared to a schema in SQL. We can have a Dgraph database without a schema at all, but it will be a bit verbose because we need to add the RDF type in every mutation.

Even a uniqueness constraint in Dgraph has to be enforced manually by the programmer within a transaction (see “Unique Indexes on predicates”).

So, to emulate your SQL unique-constraint example with the current Dgraph implementation, just do it in a transaction, for example when adding a new student node:

  • First, check whether the student is already linked to the school in the Dgraph database, with this query:
{
    checking_query(func: uid(student_uid)) {
        name
        study_at @filter(uid(school_uid))  {
           name
        }
    }
}
  • If that query returns a result, abort the transaction, because the change would violate the “a student must study at exactly one school” constraint; otherwise, continue the transaction by adding the new student data.
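The check-then-write steps above can be sketched as follows. This is only an illustration under stated assumptions: `memGraph` and `setSchool` are hypothetical, an in-memory map stands in for Dgraph so the example runs without a server, and a mutex stands in for the transaction that would wrap the query and the mutation in a real client. To enforce “exactly one school”, the sketch rejects any existing study_at edge, not only an edge to the same school.

```go
package main

import (
	"fmt"
	"sync"
)

// memGraph is a hypothetical in-memory stand-in for Dgraph, used only to show
// the check-then-write logic. The mutex plays the role of a transaction: the
// check and the write happen as one unit.
type memGraph struct {
	mu      sync.Mutex
	studyAt map[string][]string // student uid -> school uids
}

func newMemGraph() *memGraph {
	return &memGraph{studyAt: make(map[string][]string)}
}

// setSchool adds a study_at edge only if the student has no school yet,
// enforcing "a student studies at exactly one school" in application code.
func (g *memGraph) setSchool(student, school string) error {
	g.mu.Lock()
	defer g.mu.Unlock()
	if existing := g.studyAt[student]; len(existing) > 0 {
		// Any existing edge violates the rule, not just an edge to the same school.
		return fmt.Errorf("student %s already studies at %s", student, existing[0])
	}
	g.studyAt[student] = append(g.studyAt[student], school)
	return nil
}

func main() {
	g := newMemGraph()
	fmt.Println(g.setSchool("0x1", "0xa")) // <nil>
	fmt.Println(g.setSchool("0x1", "0xb")) // rejected: 0x1 already has a school
}
```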

My proposal isn’t about uniqueness constraints but about how query results are represented. Dgraph currently always returns an array for a node-to-node predicate/relationship; it would be nice to have a query filter that tells Dgraph to return a single object in special cases, like a one-to-one relationship.


(Fendor) #9

I stumbled on this problem as well; in my case I am using Java with dgraph4j.

However, the problem was completely solved by using Jackson as the serialization library, since it can deserialize single-element lists into a single object.

In my opinion, using a suitable serialization library is the best solution.


(benny rio) #10

That’s also my ad hoc solution: as in my second post, I use custom unmarshaling in Go.


(Fendor) #11

Yes, but manual unmarshalling is not feasible, hence a library that supports it.


(Adam Steele) #12

I agree that a library could help mitigate this issue, but for my purposes it will not be sufficient. The records we are storing in Dgraph are for reporting purposes. We are returning aggregates from these nodes, and because of the way variables and aggregates work in Dgraph, the raw nodes themselves are not what we get back. A library for deserialization would help if we were returning the raw data, but we are not.

I need a way of guaranteeing that a particular relationship will never be violated while never (or very rarely) actually returning the raw nodes. I understand this can be done in the query, and that the views expressed here are that Dgraph should be capable of running in a schema-less fashion. I do not believe that my proposed change would violate this.

I’m going to do my best not to misrepresent bennyrio. Apologies ahead of time if you feel I have.

Quoting:

In a sense, just like RDF, Dgraph is schema-less (correct me if I’m wrong): a schema in Dgraph is just type safety compared to a schema in SQL. We can have a Dgraph database without a schema at all, but it will be a bit verbose because we need to add the RDF type in every mutation.

Part of a type is whether or not it is an array. I think this actually argues for my proposal: the type []string is not the same as the type string. What I’m proposing would differentiate between []uid and uid. As far as I can tell, uid should not be an exception here. The rest of my proposal follows from this initial assumption.

So if we assume that Dgraph should be able to establish, enforce, and differentiate between []T and T where T is any supported type (string, int, etc), why shouldn’t Dgraph also do this for the difference between []uid and uid?

I guess my real confusion is that Dgraph establishes a difference between []T and T for all data types except uid. Why is uid not treated like all the other data types?


(Adam Steele) #13

One of the members of the Dgraph team has picked up the GitHub issue I submitted, so I’m going to stop checking in on and updating this forum post. If you would like to follow this feature’s development, please go to the GitHub issue. For ease of access, I’m going to re-post the link:


(benny rio) #14

Nice, I’ll close my feature request on GitHub and link to yours.