Can a subject (node) have multiple instances of the same value predicate (edge), each with different facets and values?

mutation
example

(Anshul Kanakia) #1

I am interested in storing stock price data in Dgraph. To do so, I have set up the nodes (subjects) in the graph as companies, with predicates such as name, ticker_symbol, and market_cap. I would like to add a <closing_day_price> predicate with a datetime facet (date="2019-04-16T:00:00.00") for each closing-day price over the past X years. The mutation would look something like this:
{
set {
_:company_a_id <closing_day_price> "102.23" (date="2019-04-17T:00:00.00") .
_:company_a_id <closing_day_price> "102.23" (date="2019-04-16T:00:00.00") .
_:company_a_id <closing_day_price> "101.55" (date="2019-04-15T:00:00.00") .
_:company_a_id <closing_day_price> "110.1" (date="2019-04-14T:00:00.00") .
_:company_a_id <closing_day_price> "112.99" (date="2019-04-13T:00:00.00") .
}
}

So, for node “company_a”, will this create a new edge for each closing-day price, each with its own date facet that I can filter on later (for example, to get all the closing prices between two dates)? Or will it overwrite the same <closing_day_price> predicate over and over with a different date facet and value?

In general, is this a good strategy for storing timeseries data for a particular node that is filterable? If not, what would you suggest?


(Martin Martinez Rivera) #2

This would only be possible if the values were unique, which does not seem to be the case in your application. Once you have repeated values, they will be overwritten by any new triple with the same value (even if the facets are different).

You can accomplish the same thing by having <closing_day_price> point to a uid node (instead of a scalar value). That node then carries two scalar predicates, “date” and “price”.


(Javier Alvarado) #3

Since the subject (_:company_a_id) and the predicate (<closing_day_price>) are the same in every triple, each triple overwrites the value written by the previous one. Only the last closing price and date will be found in the database after the transaction.

Instead, you could use

    set {
        <_:company1> <name> "Acme Corporation" .

        <_:closing1> <company> <_:company1> .
        <_:closing1> <price> "102.23" (date="2019-04-17T:00:00.00") .

        <_:closing2> <company> <_:company1> .
        <_:closing2> <price> "102.23" (date="2019-04-16T:00:00.00") .

        <_:closing3> <company> <_:company1> .
        <_:closing3> <price> "101.55" (date="2019-04-15T:00:00.00") .

        <_:closing4> <company> <_:company1> .
        <_:closing4> <price> "110.10" (date="2019-04-14T:00:00.00") .

        <_:closing5> <company> <_:company1> .
        <_:closing5> <price> "112.99" (date="2019-04-13T:00:00.00") .
    }

or, without facets,

    set {
        <_:company1> <name> "Acme Corporation" .

        <_:closing1> <company> <_:company1> .
        <_:closing1> <date> "2019-04-17T:00:00.00" .
        <_:closing1> <price> "102.23" .

        <_:closing2> <company> <_:company1> .
        <_:closing2> <date> "2019-04-16T:00:00.00" .
        <_:closing2> <price> "102.23" .

        <_:closing3> <company> <_:company1> .
        <_:closing3> <date> "2019-04-15T:00:00.00" .
        <_:closing3> <price> "101.55" .

        <_:closing4> <company> <_:company1> .
        <_:closing4> <date> "2019-04-14T:00:00.00" .
        <_:closing4> <price> "110.10" .

        <_:closing5> <company> <_:company1> .
        <_:closing5> <date> "2019-04-13T:00:00.00" .
        <_:closing5> <price> "112.99" .
    }

I can’t say which approach would be better. I suggest trying some typical queries and comparing.
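
For several years of daily prices, writing these triples by hand is impractical. Below is a minimal Python sketch that generates them for the facet-free layout above. The helper name `closing_nquads` and its parameters are hypothetical, the timestamps are assumed to be midnight UTC in RFC 3339 form, and the function only builds the text that goes inside the `set` block; it does not talk to Dgraph.

```python
from datetime import date


def closing_nquads(company_name, closes):
    """Build RDF N-Quads for one company and its daily closing prices.

    `closes` is an iterable of (datetime.date, float) pairs. Each pair becomes
    its own blank node carrying <company>, <date> and <price> triples.
    """
    # Escape backslashes and double quotes so the name stays a valid literal.
    safe_name = company_name.replace("\\", "\\\\").replace('"', '\\"')
    lines = [f'_:company1 <name> "{safe_name}" .']
    for i, (day, price) in enumerate(closes, start=1):
        node = f"_:closing{i}"
        lines.append(f"{node} <company> _:company1 .")
        # Assumption: one close per day, stamped at midnight UTC.
        lines.append(f'{node} <date> "{day.isoformat()}T00:00:00Z" .')
        lines.append(f'{node} <price> "{price:.2f}" .')
    return "\n".join(lines)
```

Calling `closing_nquads("Acme Corporation", [(date(2019, 4, 17), 102.23)])` returns one `<name>` triple followed by three triples for `_:closing1`, which can be pasted into a `set { ... }` block.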


(Michel Conrado) #4

I think a better approach with facets would be the one below: there is no need for a reverse edge, and you can filter on the facets of the edge.

Keep in mind that facets are aimed at edges (they add extra information to the relationship between two nodes), not at predicates that hold values. Using facets on values is not very useful.

{
 set {
        _:company1 <name> "Acme Corporation" .

        _:company1 <closing_day_price> <_:closing1> (date="2019-04-17T:00:00.00") .
        _:closing1 <price> "102.23" .

        _:company1 <closing_day_price> <_:closing2> (date="2019-04-16T:00:00.00") .
        _:closing2 <price> "102.23" .

        _:company1 <closing_day_price> <_:closing3> (date="2019-04-15T:00:00.00") .
        _:closing3 <price> "101.55" .

        _:company1 <closing_day_price> <_:closing4> (date="2019-04-14T:00:00.00") .
        _:closing4 <price> "110.10" .

        _:company1 <closing_day_price> <_:closing5> (date="2019-04-13T:00:00.00") .
        _:closing5 <price> "112.99" .

}
}

Query


{
  t(func: eq(name, "Acme Corporation")) {
    closing_day_price @facets @facets(ge(date, "2019-04-15T:00:00.00")) { 
      uid
      price
    }
  }
}

Result

{
  "data": {
    "t": [
      {
        "closing_day_price": [
          {
            "uid": "0x1ed2a8",
            "price": "102.23",
            "closing_day_price|date": "2019-04-17T:00:00.00"
          },
          {
            "uid": "0x1ed2a9",
            "price": "102.23",
            "closing_day_price|date": "2019-04-16T:00:00.00"
          },
          {
            "uid": "0x1ed2aa",
            "price": "101.55",
            "closing_day_price|date": "2019-04-15T:00:00.00"
          }
        ],
        "uid": "0x1ed2ad"
      }
    ]
  }
}
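
On the client side, the facet comes back under the key `closing_day_price|date`, as in the result above. Here is a small Python sketch that turns such a response into (date, price) pairs. `closing_series` is a hypothetical helper, and it assumes the response has exactly the shape shown in this post.

```python
import json


def closing_series(response_text, query_name="t"):
    """Return (date, price) pairs from a response shaped like the one above.

    Pairs are sorted by the date string, which is chronological as long as
    every date uses the same format.
    """
    data = json.loads(response_text)["data"]
    series = []
    for company in data.get(query_name, []):
        for closing in company.get("closing_day_price", []):
            day = closing["closing_day_price|date"]
            # Prices were stored as strings in the mutation, so convert here.
            series.append((day, float(closing["price"])))
    return sorted(series)
```

The function takes the raw JSON text of the response; `query_name` defaults to `t` because that is the name of the query block used here.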

(Anshul Kanakia) #5

Thanks very much for the feedback. I have decided to use a slightly more generic approach and create a “dated value type”, since this will be useful in other parts of our project as well. The schema looks something like this:

dated_value_t: string .
dated_int: int @index(int) .
dated_float: float @index(float) .
dated_string: string @index(hash) .
on: dateTime @index(day) .
before: dateTime @index(day) .
after: dateTime @index(day) .

And is used like this:

{
    set {
        _:company1 <name> "Acme Corporation" .

        _:company1 <closing_day_price> <_:closing1> .
        _:closing1 <dated_value_t> "" .
        _:closing1 <on> "2019-04-17T:00:00.00" .
        _:closing1 <dated_float> "102.23" .

        _:company1 <closing_day_price> <_:closing2> .
        _:closing2 <dated_value_t> "" .
        _:closing2 <on> "2019-04-16T:00:00.00" .
        _:closing2 <dated_float> "102.23" .

        _:company1 <closing_day_price> <_:closing3> .
        _:closing3 <dated_value_t> "" .
        _:closing3 <on> "2019-04-15T:00:00.00" .
        _:closing3 <dated_float> "101.55" .

        _:company1 <closing_day_price> <_:closing4> .
        _:closing4 <dated_value_t> "" .
        _:closing4 <on> "2019-04-14T:00:00.00" .
        _:closing4 <dated_float> "110.10" .

        _:company1 <closing_day_price> <_:closing5> .
        _:closing5 <dated_value_t> "" .
        _:closing5 <on> "2019-04-13T:00:00.00" .
        _:closing5 <dated_float> "112.99" .
    }
}

The “dated_value_t” predicate is just an internal node-type marker that keeps things more readable. It indicates that the node will have at least one date (“on”, “before” or “after”) and can be handled accordingly later in our pipeline. Thanks for your help. Much appreciated.
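
For anyone handling these nodes in client code, the "at least one date" rule could be enforced with something like the Python sketch below. The `DatedValue` class and its methods are hypothetical and not part of any Dgraph client; the sketch only mirrors the predicates from the schema above and does not escape string values.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DatedValue:
    """Client-side mirror of a node tagged with <dated_value_t>."""

    value: Union[int, float, str]
    on: Optional[str] = None
    before: Optional[str] = None
    after: Optional[str] = None

    def __post_init__(self):
        # Each such node must carry at least one of the three date predicates.
        if self.on is None and self.before is None and self.after is None:
            raise ValueError("need at least one of on, before, after")

    def value_predicate(self):
        """Pick dated_int, dated_float or dated_string from the Python type."""
        if isinstance(self.value, bool):
            raise TypeError("bool values are not covered by the schema")
        if isinstance(self.value, int):
            return "dated_int"
        if isinstance(self.value, float):
            return "dated_float"
        return "dated_string"

    def to_nquads(self, node):
        """Return the list of triples for this value on blank node `node`."""
        lines = [f'{node} <dated_value_t> "" .']
        for pred in ("on", "before", "after"):
            stamp = getattr(self, pred)
            if stamp is not None:
                lines.append(f'{node} <{pred}> "{stamp}" .')
        lines.append(f'{node} <{self.value_predicate()}> "{self.value}" .')
        return lines
```

For example, `DatedValue(102.23, on="2019-04-17T00:00:00Z").to_nquads("_:closing1")` yields the `<dated_value_t>`, `<on>` and `<dated_float>` triples for one closing price, while `DatedValue(102.23)` raises `ValueError` because no date is given.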

