Need expand(_all_)-like behaviour where the schema isn't known beforehand

What I want to do

I want to log API responses from multiple sources in a single type. I can't really guess all possible predicates beforehand. It's also possible that a single predicate sometimes has a different type for different API responses.
I then need to expand all fields on some of these responses without knowing the type beforehand. I know most of them will have a timestamp, and I will need to filter based on it.

What I did

I tried to modify the type on every append, but it doesn't work since some predicates collide.

Another thread mentioned using facets, but that seems counter-intuitive. Could someone explain further if that is the right approach?
I know expand(_all_) needs all predicates, but couldn't there be a dgraph.dyna type that can be assigned only to the nodes that need it? That way you wouldn't have to keep track of all possible predicates on all nodes.

Once the data is inserted in the DB, each new predicate is allocated with the default type. So you can find them by querying the schema like this: schema { type }.
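
For example, a minimal sketch of that schema query, which lists every predicate Dgraph has picked up along with its type and any index:

# bare schema query, returns all predicates
schema {
  type
  index
}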

Can you share where this was suggested? The context should help here.

You can create a fake type and use it to "debug the schema" before creating the official schema. But you would have to upsert all possible nodes with "dgraph.type": "myFakeType", and then you can query with expand(myFakeType).

You do not have to have "dgraph.type": "myFakeType" on a node for expand(myFakeType) to work. The only thing that consults dgraph.type is expand(_all_).
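
A minimal sketch of that approach, assuming the log predicates used later in this thread (request_time, service_name, response_code) are already in the schema:

# schema alteration declaring the "fake" type
type myFakeType {
  request_time
  service_name
  response_code
}

# expand(myFakeType) fetches whichever of those predicates exist
# on each matched node, even without dgraph.type being set
{
  q(func: has(service_name)) {
    expand(myFakeType)
  }
}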


I didn't quite get it. I made a schema from a sample response, but a later response might have extra fields that I won't be able to query. How can I see these extra fields?
(Also, the schema is getting filled with predicates. It's okay I guess, but avoiding it would be nice.)

It seems like your use case is more along the lines of Elasticsearch: any new field that comes in is automatically indexed and searchable. That is not the default modus operandi of Dgraph. Any new predicate ingested is assigned to a group but left unindexed until you tell Dgraph to index it, and indices are required for many of the filters needed to begin a query.

To make it simple: no, there is no built-in way to expand(everyPossibleField) without extra work.

Suggestion: assuming you have structured input, keep the information split into one predicate per field, and also keep the whole record together in a single field, e.g. as JSON.

For example:

_:newLog <request_time> "2021/01/02-01:02:00" .
_:newLog <service_name> "thanos" .
_:newLog <response_code> "200" .
_:newLog <full_log> "{\"request_time\":\"2021/01/02-01:02:00\",\"service_name\":\"thanos\",\"response_code\":200}" .

Then your queries will look like this:

{
  q(func: eq(service_name,"thanos")) @filter(eq(response_code,"200")) {
    full_log
  }
}

That way you will always get "all" of the fields, because they will be in full_log no matter what you filter by. You can index whichever fields you need, whenever you decide you need to build indices for them.
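
For reference, the eq() filters in the query above need indices on the predicates they touch. A minimal sketch of that schema alteration, assuming string values as in the example mutation:

# exact indices support eq() at the query root and in @filter
service_name: string @index(exact) .
response_code: string @index(exact) .
full_log: string .

If request_time were stored in RFC 3339 format, it could instead be typed as datetime and indexed to support range filtering on the timestamp.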

Edit: and if this works for you, lobby for a JSON type to be added as a feature to the database, as it would make this feel a bit less icky.


Thanks for the reply. I might just separate out the fields that need searching (if they exist) and dump the rest into the full_log string. And yeah, a JSON field would be awesome.
The rest of my data is hard to model as well, with many relationships; that's why I chose a graph DB for this application. But now it seems ArangoDB might be the better choice here. I do like Dgraph, but it's probably not the right fit for this case.