Best conventions for storage of common object-type data across types

gorillastanley · August 28, 2020, 2:16am

Main types: Job, Company, Candidate …
I’d like to store status information on each of these main types. It can be represented by multiple edges/fields, such as the name of the status and whether the status is active. Which solution would you recommend (or would you suggest your own), and why?

1:
I have a Status type to describe the status of the above types.

type Status {
  name: String # eg "Active Client" (for Company), "Searching" (for Candidates)
  isActive: Boolean
  ...
}

Each main type has a one-to-one relationship with a Status type.

type Job {
  ...
  status: Status
}

This is a bit annoying, though: to update the status, I’d have to fetch the status object’s UID and then mutate the status node. (I do not want to create a new status object every time the status changes and leave redundant, unused status nodes hanging around in the system.)
Is it possible, in a mutation, to have Dgraph update the status object on the Job (creating it if it doesn’t exist) without having to supply a UID or any other ID specifier?
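
For reference, the two-step flow described above might look roughly like the sketch below. It assumes that Job and Status each declare an `id: ID!` field (hidden behind the `...` in the schema) and that Dgraph's auto-generated `getJob` and `updateStatus` operations are used; the status value "Closed" is only a placeholder.

```graphql
# Step 1: find the ID of the Status node attached to a given Job.
# Assumes Job and Status both have an `id: ID!` field.
query JobStatusId($jobId: ID!) {
  getJob(id: $jobId) {
    status {
      id
    }
  }
}

# Step 2: update that Status node in place, using the ID returned by step 1.
# "Closed" is a placeholder status name.
mutation UpdateJobStatus($statusId: ID!) {
  updateStatus(
    input: {
      filter: { id: [$statusId] }
      set: { name: "Closed", isActive: false }
    }
  ) {
    status {
      id
      name
      isActive
    }
  }
}
```

The two operations have to be sent as separate requests, which is the round trip I would like to avoid.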

2:
Similar to the above, but the Status node’s data is flattened into each main type.

type Job {
  ...
  statusName: String
  statusIsActive: Boolean
}

type Company {
  ...
  statusName: String
  statusIsActive: Boolean
}

This means I won’t have to reference any UIDs or IDs when I want to change status-related information, since it’s just stored as predicates.
An issue with this solution is that if I want to change the representation of status-type information (e.g. add more status-related fields, such as the date of the status update), I’d have to do it manually across all the main types.
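
As a rough sketch of what an update looks like with the flattened layout: the status fields are set directly on the parent node in one mutation. This assumes Company has an `id: ID!` field (hidden behind the `...`) and uses Dgraph's auto-generated `updateCompany` mutation.

```graphql
# Only the Company's own ID is needed; there is no separate Status node to look up.
# Assumes Company declares `id: ID!`.
mutation SetCompanyStatus($companyId: ID!) {
  updateCompany(
    input: {
      filter: { id: [$companyId] }
      set: { statusName: "Active Client", statusIsActive: true }
    }
  ) {
    company {
      id
      statusName
      statusIsActive
    }
  }
}
```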

3:
I define an interface for Status instead of using a type.
All the main types implement this interface.
An issue is that the schema doesn’t read well: it isn’t intuitive that a Company implements a Status (it makes more sense that the Company owns the Status).

interface Status {
  ...
  statusName: String
  statusIsActive: Boolean
}

type Company implements Status {
  ...
}

type Job implements Status {
  ...
}
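
One thing the interface version would allow, sketched here under assumptions, is querying status information across all implementing types at once. The sketch assumes `statusIsActive` is declared with `@search` (not shown in the schema above) so that a filter is generated for it, and that Dgraph generates a `queryStatus` query for the interface.

```graphql
# Assumed schema detail (not in the schema above):
#   statusIsActive: Boolean @search
# Without @search, no filter on this field would be generated.

query ActiveStatuses {
  queryStatus(filter: { statusIsActive: true }) {
    __typename   # tells you whether the node is a Company, a Job, ...
    statusName
  }
}
```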

abhijit-kar · August 28, 2020, 3:39am

If the statuses for each type are known beforehand, then use enums. (Like Reddit tags)

e.g. Using Enums:

enum CandidateStatus {
  ActivelySearching
  NotLooking
  ActivelyApplying
  ...
}

type Candidate {
  status: CandidateStatus!
}

If the statuses for a type are not known beforehand, then go for a separate type. (Like Twitter Hashtags)

e.g.

type Status {
  name: String! @id # The name itself becomes the ID, so you won't have to fetch UIDs
}

type Company {
  status: Status!
}

If you know the statuses beforehand and you want to store more properties along with the status, then combine both approaches.

If there can be more properties related to Status, or you want to search for all the people who have set a particular status, you should have a separate status type.

Else if it’s just a single value, then it belongs to the parent type. (Like online status)
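
A minimal sketch of what combining both approaches could look like for Candidate. The names `CandidateStatusName` and `updatedAt` are made up for illustration, and the `id: ID!` fields are assumed.

```graphql
# The known set of status names lives in an enum ...
enum CandidateStatusName {
  ActivelySearching
  NotLooking
  ActivelyApplying
}

# ... while a separate type carries the extra status properties.
# `updatedAt` is a hypothetical extra field.
type CandidateStatus {
  id: ID!
  name: CandidateStatusName! @search
  isActive: Boolean
  updatedAt: DateTime
}

type Candidate {
  id: ID!
  status: CandidateStatus
}
```

With `@search` on the enum field, the generated API should let you filter status nodes by name.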

gorillastanley · August 28, 2020, 3:54am

Thanks for your reply.
Won’t use enums as I don’t know all of them beforehand.
I don’t agree with your use of Type in the 2nd example:

making a Status type that only contains one scalar predicate (name) doesn’t make sense, instead of placing that scalar predicate in the main type (Company)

I’d appreciate feedback on the solutions I proposed, such as which you think is the best.

abhijit-kar · August 28, 2020, 4:00am

It does when you want to search for all the objects of other types that refer to that Status. Plus, it is open to future expansion; I kept it nearly empty for the sake of brevity.

See:

If there can be more properties related to Status, or you want to search for all the people who have set a particular status, you should have a separate status type.
Else if it’s just a single value, then it belongs to the parent type. (Like online status)

Someone from core team can comment on your approaches better.

gorillastanley · August 28, 2020, 4:10am

The thing is that my main types (Job, Company, Candidate …) all have a one to one relationship with this status-information.
It’s not guaranteed that the name in the Status type will be unique though. Candidate A could have the status “Searching”, and Candidate B could also have the status “Searching”, but their other status information can be different.

abhijit-kar · August 28, 2020, 4:25am

I see the problem now.

You have no choice but to use id.
So if your following issue is solved, then you won’t have any problem.

It is annoying to have to get a UID before updating a node in GraphQL, but in DQL you can use upserts: fetch the UID into a variable and then use it in a set block, all in a single call.

Check out: Running DQL in GraphQL with Upserts.

P.S. Sorry can’t provide a working example.

amaster507 · August 28, 2020, 4:42am

I vote interfaces to make life simpler and improve scalability. This is a perfect use case for them.

The schema terminology will not read well until you overlook words like “implements” and think of them instead as “uses”.

I have used the same Tag interface a dozen times in my schema. Not because every use case was exactly a tag, but because they all use the same fields. I may expand this later on as needed and break it into separate interfaces but when a single interface can serve so many purposes, why limit it.

Plus, I think* (haven’t tested or used it yet) that a type can implement multiple interfaces which will help bring down the schema size when needed.

abhijit-kar · August 28, 2020, 4:49am

Multiple interface implementation works.

MichelDiz · August 28, 2020, 5:04am

@gorillastanley, if this is about GraphQL (which I see it is), you should post it in Users/GraphQL. That makes it easier to understand what you are asking, and the GraphQL guys look there first. I’ll move it.

midoc · August 28, 2020, 9:04pm

How are ENUMS treated? For example in:

enum CandidateStatus {
  ActivelySearching
  NotLooking
  ActivelyApplying
  ...
}

does each item (e.g. ‘ActivelySearching’) get its own UID (internally)? Or is it treated as String? Or a Type?

abhijit-kar · August 29, 2020, 4:07am

Excerpt:

Enums are serialised in Dgraph as strings.

Check out the documentation for Enum.

midoc · August 31, 2020, 4:50pm

@abhijit-kar this is what made me ask the question:
From Documentation for Enum :

For hash and exact search on enums, the literal enum value, without quotes "..." , is used, for regexp, strings are required.

query {
    queryPost(filter: { tags: { eq: GraphQL } } ) { ... }
}

which made me think that ENUMs get special handling. Like Types.

amaster507 · August 31, 2020, 5:17pm

I think the confusion arises between how they are represented/used in the generated API vs. how they are stored by the underlying Badger.

They are stored as a string property of the node. This can be seen by looking at the underlying DQL schema using Ratel.

They are represented/used in the generated API as a special enum case limiting input and output results.

This is further illustrated by changing the enum options in the GraphQL schema. What gets changed? Nothing on the database side. The only changes are in the representation used in the generated inputs.
