Tracking provenance: use n-quads context/named graph alternative

rosecky · February 16, 2022, 10:43am

We’re looking into the possibility of using a graph representation for our data. So far, we’ve done a PoC with an RDF quad store. We liked the representation a lot, but were less happy with its robustness and performance. We’re considering switching to Dgraph but would like to keep a similar mindset.

In a nutshell, we currently make heavy use of the context / named graph of each triple. We use it to store provenance information about the triple, meaning the document that gave us that particular piece of information. We often search for information in specific subgraphs determined by the context / named graph. When an update arrives for a given item, we simply wipe out the old named graph and replace it with a new one.
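To make the pattern concrete, here is a minimal sketch of the wipe-and-replace behaviour described above, written as a toy in-memory quad store in Python. It is not tied to any particular RDF store or to Dgraph; the class `QuadStore`, its methods, and the graph ids `doc-1` / `doc-2` are hypothetical names used only for illustration.

```python
from collections import defaultdict


class QuadStore:
    """Toy in-memory quad store: triples grouped by named graph (context)."""

    def __init__(self):
        # graph id -> set of (subject, predicate, object) triples
        self._graphs = defaultdict(set)

    def add(self, s, p, o, graph):
        """Assert one triple inside the given named graph."""
        self._graphs[graph].add((s, p, o))

    def replace_graph(self, graph, triples):
        """Drop everything asserted by `graph`, then load the new triples."""
        self._graphs[graph] = set(triples)

    def match(self, graph, s=None, p=None, o=None):
        """Return sorted triples in one named graph matching the pattern."""
        return sorted(
            t for t in self._graphs.get(graph, ())
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = QuadStore()
store.add("alice", "friend", "bob", graph="doc-1")
store.add("alice", "name", "Alice", graph="doc-2")

# A new version of document doc-1 arrives: wipe its graph and reload it.
store.replace_graph("doc-1", [("alice", "friend", "carol")])
```

After the replacement, `store.match("doc-1")` contains only the triples from the new version of the document, while the triples contributed by `doc-2` are untouched.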

I’m new to Dgraph but would like to understand: is this possible to do in Dgraph (in DQL / GQL / both)? I noticed several things:

This page uses “Label” in place of an RDF context / named graph. According to the document, this is different to an Attribute/Predicate but I saw no examples (or didn’t understand) of (1) setting this label for each edge during the ingestion; (2) using this label to look up data.
According to this page, you can put facets on edges and retrieve their values at query time. The following example also mentions the possibility of looking up only edges with a specific facet value, which is great. But can you also easily look up any edge with a given facet value, regardless of the source / target node (our equivalent of the entire named graph)? Would this approach to “wiping out the old document” be efficient?

{
  data(func: eq(name, "Alice")) {
    friend @facets(eq(close, true)) {
      name
    }
  }
}

Daniel · February 17, 2022, 4:39pm

I don’t have any conclusive answer here but for what it’s worth:

I tried facets a bit and they’re pleasant to work with, but I missed the ability for them to point to actual nodes (as in RDF-star). Only primitive values are allowed, which wasn’t sufficient for my use case.
I don’t think it’s possible to query facets directly as you allude to, but I might be mistaken. DQL is very “search-oriented” and it sounds like your use case would benefit from the richness of SPARQL, especially the RDF-star compatible implementations. I guess you could materialize your data in Dgraph if you need the performance and then tap into both systems via a unified GraphQL layer.

rosecky · March 1, 2022, 10:03am

Hi Daniel and thanks for the response. Yes, having facets pointing to nodes would be good as we actually do attach a fair bit of information to the graphs themselves. But I think we could actually do without that. As for RDF*, I don’t think we need to take that path either, pure RDF seems more than sufficient for our use-case: we only really need some context id for each statement, not the luxury of expressing statements about statements, which RDF* provides.
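To illustrate what “a context id for each statement” could look like on the Dgraph side, here is a rough Python sketch that turns quads into triples carrying the graph id as a string facet. It assumes the facet mutation syntax from the Dgraph docs (`<s> <p> <o> (key=value) .`); the function `to_faceted_rdf` and the facet key `source` are made-up names, and whether such facets can be searched or deleted efficiently across the whole graph is exactly the open question in this thread.

```python
def to_faceted_rdf(quads, facet_key="source"):
    """Render (s, p, o, graph) quads as RDF lines with the graph id as a facet.

    Terms are expected to be already serialized (e.g. "_:alice", "<friend>",
    '"Alice"'). The facet key "source" is an assumption, not a Dgraph built-in.
    """
    lines = []
    for s, p, o, graph in quads:
        # Escape backslashes and quotes so the graph id is a valid string literal.
        escaped = graph.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{s} {p} {o} ({facet_key}="{escaped}") .')
    return lines


quads = [
    ("_:alice", "<friend>", "_:bob", "doc-1"),
    ("_:alice", "<name>", '"Alice"', "doc-2"),
]
rdf_lines = to_faceted_rdf(quads)
```

Each resulting line keeps the original triple and records the originating document in the facet, so the provenance travels with the edge rather than with a separate named graph.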

I’m wondering whether some Dgraph folks could (1) show the use of “Label” in practice; (2) provide an estimate of how expensive it is to, e.g., find / remove all edges with a specific facet value.
