Tracking provenance: use n-quads context/named graph alternative

We’re looking into ways of using a graph representation of our data. So far, we’ve done a PoC with an RDF quad store… and liked the representation a lot, but were less happy with its robustness and performance. We’re considering switching to DGraph but would like to keep a similar mindset.

In a nutshell, we make heavy use of the context / named graph of each triple. We use it to store provenance information about the triple, i.e. the document that gave us that particular piece of information. We often search for information in specific subgraphs determined by the context / named graph. When an update comes in for a given item, we simply wipe out the old named graph and replace it with a new one.
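To make the workflow above concrete, here is a minimal Python sketch of a toy in-memory quad store. It is only an illustration of the named-graph pattern, not our actual store and not anything Dgraph-specific; the class and method names (`QuadStore`, `in_graph`, `replace_graph`) and the `doc:1` style identifiers are made up for the example.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # context / named graph: here, the id of the source document


class QuadStore:
    """Toy quad store indexed by named graph (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._by_graph = defaultdict(set)

    def add(self, quad: Quad) -> None:
        self._by_graph[quad.graph].add(quad)

    def in_graph(self, graph: str) -> set[Quad]:
        # Restrict a search to the subgraph contributed by one document.
        return set(self._by_graph.get(graph, set()))

    def replace_graph(self, graph: str, quads: list[Quad]) -> None:
        # An update wipes the old named graph, then loads the new statements.
        self._by_graph.pop(graph, None)
        for quad in quads:
            if quad.graph != graph:
                raise ValueError("quad belongs to a different graph")
            self.add(quad)


store = QuadStore()
store.add(Quad("alice", "friend", "bob", "doc:1"))
store.add(Quad("alice", "name", "Alice", "doc:1"))
store.add(Quad("bob", "name", "Bob", "doc:2"))

# A new version of document doc:1 arrives: drop everything it said before.
store.replace_graph("doc:1", [Quad("alice", "friend", "carol", "doc:1")])
```

The point of the sketch is that both operations we rely on (search within a graph, replace a graph) are keyed by the graph id alone, never by subject or object.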

I’m new to DGraph but would like to understand: Is this possible to do in DGraph (in DQL / GQL / both)? I noticed several things:

  • This page uses “Label” in place of an RDF context / named graph. According to the document, this is different from an Attribute/Predicate, but I found no examples (or didn’t understand them) of (1) setting this label on each edge during ingestion or (2) using this label to look up data.
  • According to this page, you can put facets on edges and retrieve their values at query time. The following example also mentions the possibility of looking up only edges with a specific facet value, which is great. But can you also easily look up every edge with a given facet value, regardless of the source / target node (our equivalent of an entire named graph)? Would this approach to “wiping out the old document” be efficient?
  {
    data(func: eq(name, "Alice")) {
      friend @facets(eq(close, true)) {
        name
      }
    }
  }
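The difference between the query above and what the second bullet asks for can be sketched with a toy model in Python. This is not Dgraph's storage model or API; the `Edge` class, the function names, and the `doc` facet key are hypothetical and exist only to contrast a node-scoped facet filter with a lookup that has no starting node.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    predicate: str
    target: str
    facets: dict = field(default_factory=dict)  # primitive values only


# "doc" is a made-up facet key standing in for the context / named graph id.
edges = [
    Edge("alice", "friend", "bob", {"close": True, "doc": "doc:1"}),
    Edge("alice", "friend", "carol", {"close": False, "doc": "doc:2"}),
    Edge("dave", "friend", "bob", {"close": True, "doc": "doc:1"}),
]


def from_node(edges, source, predicate, key, value):
    # Node-scoped filter: start at one node and keep its matching edges.
    return [
        e for e in edges
        if e.source == source
        and e.predicate == predicate
        and e.facets.get(key) == value
    ]


def anywhere(edges, key, value):
    # Global filter: no starting node, so this toy inspects every edge.
    return [e for e in edges if e.facets.get(key) == value]


def drop(edges, key, value):
    # "Wiping out the old document": keep only edges from other documents.
    return [e for e in edges if e.facets.get(key) != value]


close_friends = from_node(edges, "alice", "friend", "close", True)
from_doc_1 = anywhere(edges, "doc", "doc:1")
remaining = drop(edges, "doc", "doc:1")
```

In this toy, `anywhere` and `drop` walk the whole edge list, which is the cost I am asking about; whether Dgraph can avoid that for facets is exactly the open question.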

I don’t have any conclusive answer here but for what it’s worth:

  1. I tried facets a bit and they’re pleasant to work with, but I missed the ability for them to point to actual nodes (as in RDF-star). Only primitive values are allowed, which wasn’t sufficient for my use case.

  2. I don’t think it’s possible to query facets directly as you allude to, but I might be mistaken. DQL is very “search-oriented” and it sounds like your use case would benefit from the richness of SPARQL, especially in an RDF-star compatible store. I guess you could materialize your data in Dgraph if you need the performance and then tap into both systems via a unified GraphQL layer.

Hi Daniel and thanks for the response. Yes, having facets pointing to nodes would be good as we actually do attach a fair bit of information to the graphs themselves. But I think we could actually do without that. As for RDF*, I don’t think we need to take that path either, pure RDF seems more than sufficient for our use-case: we only really need some context id for each statement, not the luxury of expressing statements about statements, which RDF* provides.

I’m wondering whether some DGraph folks could (1) Show the use of “Label” in practice; (2) Provide an estimate of how expensive it is to e.g. find / remove all edges with a specific facet value.