How to groupby date

I store logs in dgraph and I want to count them by date, how can I do this?

Sorry for the delay.

Unfortunately, Dgraph has no way to group by date; @groupby only works with UID predicates. What you can do is create your own structure representing the index. Follow the example below.

Perhaps we can evaluate this as a future feature, what do you think @shanea ?

Cheers.

Schema


name: string @index(term) @lang .
release_date: datetime @index(year) .
group_year: uid @reverse .
group_month: uid @reverse .

Mutation

{
  set {

    #We create the groups as nodes
    _:group_year <name> "1979" .
    _:group_year2 <name> "1950" .
    _:group_year3 <name> "1990" .

    _:group_month <name> "ago" .

    # Then we relate them to the nodes created.
    # (To use an existing node, just replace
    # the blank node with that existing node's UID.)
    # e.g: <0x1fa43> <group_year> _:NodeA .

    _:NodeA <name> "Node A" .
    _:NodeA <release_date> "1979-12-07" .
    _:group_year <group_year> _:NodeA .

    _:NodeB <name> "Node B" .
    _:NodeB <release_date> "1979-12-07" .
    _:group_year <group_year> _:NodeB .

    _:NodeC <name> "Node C" .
    _:NodeC <release_date> "1950-12-07" .
    _:group_year2 <group_year> _:NodeC .

    _:NodeD <name> "Node D" .
    _:NodeD <release_date> "1990-12-07" .
    _:group_year3 <group_year> _:NodeD .

    _:NodeE <name> "Node E" .
    _:NodeE <release_date> "1990-12-07" .
    _:group_year3 <group_year> _:NodeE .
    _:group_month <group_month> _:NodeE .

  }
}

Query

{
  var(func: has(release_date)) @groupby(~group_year) {
    a as count(uid)
    # a is a value variable mapping each year-group UID to its count
  }

  byYear(func: uid(a), orderasc: name) {
    uid
    name
    group_year {expand(_all_)}
  }
}

Result


{
  "data": {
    "byYear": [
      {
        "uid": "0x272a",
        "name": "1950",
        "group_year": [
          {
            "name": "Node C",
            "release_date": "1950-12-07T00:00:00Z"
          }
        ]
      },
      {
        "uid": "0x272d",
        "name": "1979",
        "group_year": [
          {
            "name": "Node A",
            "release_date": "1979-12-07T00:00:00Z"
          },
          {
            "name": "Node B",
            "release_date": "1979-12-07T00:00:00Z"
          }
        ]
      },
      {
        "uid": "0x272e",
        "name": "1990",
        "group_year": [
          {
            "name": "Node D",
            "release_date": "1990-12-07T00:00:00Z"
          },
          {
            "name": "Node E",
            "release_date": "1990-12-07T00:00:00Z"
          }
        ]
      }
    ]
  }
}

Thank you. I use Dgraph to analyze logs; is it suitable for that?

You can use Dgraph perfectly well, but I would recommend BadgerDB for this situation. A graph DB is best for a complex model full of entities and connections; Dgraph's whole idea is to traverse them easily.

Now, if you want to create a logs app that has graph characteristics, where logs are related and you study log relationships for some reason, then yes, I would recommend Dgraph.

Cheers.

Yes. I captured it as a request. Thanks for flagging this @yupengfei

YES, I am building a recommendation system for our platform, so it is full of connections.


This is an example of the lack of "mundane" functionality that is scaring me right now.

How do I build charts?

  • Plot a line chart of the “issues count” (Y), series by “issue type”, grouped by “[day | week | month]” (X)

@groupby should be able to group things:

  • @groupby(company_name)
  • @groupby(DATE(_create_datetime))
  • @groupby(MONTH(_create_datetime))
  • @groupby(YEAR(_create_datetime))
  • @groupby(TIME(_create_datetime))
    etc…
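Until native support like that exists, the same date truncations can be emulated on the client side. A minimal Python sketch (the `group_key` helper and its unit names are my own, not a Dgraph API; it assumes timestamps come back as ISO strings, as Dgraph returns them):

```python
from datetime import datetime

def group_key(ts: str, unit: str) -> str:
    """Truncate an ISO timestamp to the desired grouping granularity."""
    # Python's fromisoformat (3.7+) does not accept a trailing "Z",
    # so normalize it to an explicit UTC offset first.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    fmt = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}[unit]
    return dt.strftime(fmt)

print(group_key("2018-05-01T14:32:00Z", "month"))  # 2018-05
```

Records sharing the same key then fall into the same bucket, which is exactly what `@groupby(MONTH(...))` would do server-side.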

groupby, cascade, and normalize NEED attention.

It looks to me like a super powerful and fast car with a bad steering system.

I’m trying this approach but I’m really lost here.

I've created a group_year, group_month, group_day and a group_week_day. Then I've created relationships between an issue and each one of those, based on its creation DATETIME.

How do I get all issues from 2018-05-01 to 2018-07-23, grouped by day AND COUNTED BY ISSUE TYPE?

I want this result:

{
    data: {
       result:[
            {
                date: "2018-05-01",
                issue_types: [
                     {
                        type: 'alert',
                        total: 8
                     },
                     {
                        type: 'problem',
                        total: 2
                     },
                ] 
            },
            {
                date: "2018-05-02",
                issue_types: [
                     {
                        type: 'alert',
                        total: 1
                     },
                     {
                        type: 'problem',
                        total: 5
                     },
                ] 
            },
            ...
       ]
    } 
}

And I’d like to write something like this:

{
    result(func: has(__is_issue)) 
      @filter(ge(created_time, '2018-05-01') AND le(created_time, '2018-07-23')) 
      @groupedby(date(created_time) as DAY)

   {
        date: DAY  #grouped created_time
        issue_types @groupedby(type as grouped_types) {
            type
           count(grouped_types)
        }
    }
}
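Since that syntax doesn't exist, the desired shape can be built client-side from a plain filtered query. A hedged Python sketch (the sample records and field names are hypothetical, mirroring the desired output above):

```python
from collections import defaultdict

# Flat records, as they might come back from a plain range-filtered query.
issues = [
    {"created_time": "2018-05-01T09:00:00Z", "type": "alert"},
    {"created_time": "2018-05-01T10:00:00Z", "type": "alert"},
    {"created_time": "2018-05-01T11:00:00Z", "type": "problem"},
    {"created_time": "2018-05-02T08:00:00Z", "type": "problem"},
]

def group_by_day_and_type(issues):
    # Two-level counter: day -> type -> count.
    counts = defaultdict(lambda: defaultdict(int))
    for issue in issues:
        day = issue["created_time"][:10]  # truncate ISO timestamp to the date
        counts[day][issue["type"]] += 1
    return [
        {"date": day,
         "issue_types": [{"type": t, "total": n}
                         for t, n in sorted(types.items())]}
        for day, types in sorted(counts.items())
    ]

result = group_by_day_and_type(issues)
```

This is the "loop through result sets" fallback mentioned later in the thread, just kept to a single pass with a nested counter.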

But, right now, with a HUGE deadline hanging over my head, I'd appreciate any help on this.

Thanks @MichelDiz, @shanea


If I try this:

{
    classificacoes(func: has(__is_classificacoes)) @cascade{
      uid
      issue_type: descricao
      issues: ~classificacoes @groupby(datas){
        count(uid)
      }
    }   
}

I got this:

{
  "data": {
    "classificacoes": [
      {
        "uid": "0x3f13",
        "issue_type": "normal",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f6c",
                "count": 1
              },
              {
                "datas": "0x3f91",
                "count": 1
              },
              {
                "datas": "0x3f97",
                "count": 1
              },
              {
                "datas": "0x3fcb",
                "count": 1
              },
              {
                "datas": "0x4063",
                "count": 1
              },
              {
                "datas": "0x3f81",
                "count": 3
              },
              {
                "datas": "0x3f9e",
                "count": 4
              },
              {
                "datas": "0x3fd6",
                "count": 4
              },
              {
                "datas": "0x3fb7",
                "count": 5
              },
              {
                "datas": "0x4021",
                "count": 5
              },
              {
                "datas": "0x3f3a",
                "count": 6
              },
              {
                "datas": "0x3fe8",
                "count": 7
              },
              {
                "datas": "0x4019",
                "count": 10
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f14",
        "issue_type": "alerta",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f5f",
                "count": 1
              },
              {
                "datas": "0x3f66",
                "count": 1
              },
              {
                "datas": "0x3f97",
                "count": 1
              },
              {
                "datas": "0x3f9e",
                "count": 1
              },
              {
                "datas": "0x3fcb",
                "count": 1
              },
              {
                "datas": "0x3fd6",
                "count": 1
              },
              {
                "datas": "0x3fe8",
                "count": 1
              },
              {
                "datas": "0x4019",
                "count": 1
              },
              {
                "datas": "0x4046",
                "count": 1
              },
              {
                "datas": "0x4063",
                "count": 1
              },
              {
                "datas": "0x4021",
                "count": 5
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f15",
        "issue_type": "problema",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f81",
                "count": 1
              },
              {
                "datas": "0x4021",
                "count": 1
              },
              {
                "datas": "0x3f6c",
                "count": 2
              },
              {
                "datas": "0x3f9e",
                "count": 2
              },
              {
                "datas": "0x3fe8",
                "count": 3
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f16",
        "issue_type": "urgência",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f9e",
                "count": 1
              },
              {
                "datas": "0x3fd6",
                "count": 1
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f17",
        "issue_type": "Emergência"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 13978,
      "processing_ns": 6756208,
      "encoding_ns": 1270439
    },
    "txn": {
      "start_ts": 142004,
      "lin_read": {
        "ids": {
          "1": 154624
        }
      }
    }
  }
}

But if I try this:

{
  var(func: has(__is_classificacoes)) @cascade{
    id as uid
    type as descricao
    ~classificacoes   @groupby(datas){
      total as count(uid)
    }
  }  
        
  test (func: uid(id)) {
    uid
    issue_type: val(type)
    val(total)
  }
}

All I got is this:

{
  "data": {
    "test": [
      {
        "uid": "0x3f13",
        "issue_type": "normal"
      },
      {
        "uid": "0x3f14",
        "issue_type": "alerta"
      },
      {
        "uid": "0x3f15",
        "issue_type": "problema"
      },
      {
        "uid": "0x3f16",
        "issue_type": "urgência"
      },
      {
        "uid": "0x3f17",
        "issue_type": "Emergência"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 38890,
      "processing_ns": 5725494,
      "encoding_ns": 493127
    },
    "txn": {
      "start_ts": 142024,
      "lin_read": {
        "ids": {
          "1": 154644
        }
      }
    }
  }
}

Completely out of time here, so I'll do this the dumbest way, by looping through result sets and building things by hand, but I do hope there is a better way.

If I'm just being dumb and blind, please point me to the right docs or examples.

I’m using the new 1.8 release.

Thanks.

I've made a gist for this

mutate.YML

I just added the “release_date” predicate that is indexed by date

query_1.YML

Used the filter in the Var block @filter(ge(release_date, "1950-05-01") AND le(release_date, "1980-07-23"))

query_2.YML

Used the filter in the Var block and the query itself in the edge.

Hi @MichelDiz, thank you very much for your answer. But your suggestion is missing the third level, "types", which is where I mostly got lost.

EDITED: I was wrong in this paragraph; your way covers this well:
Also, my scenario is a bit more complex because I have to deal with days, and in a given search period I can have may-10, jun-10 and jul-10, so a simple "group_days" won't do it.
EDITED

Result needed:

{
    data: {
       total_issues_by_date_by_type:[
            {
                date: "2018-05-01",
                issue_types: [
                     {
                        type: 'alert',
                        total: 8
                     },
                     {
                        type: 'problem',
                        total: 2
                     },
                ] 
            },
            {
                date: "2018-06-01",
                issue_types: [
                     {
                        type: 'alert',
                        total: 1
                     },
                     {
                        type: 'problem',
                        total: 5
                     },
                ] 
            },
            ...
       ]
    } 
}

Do you have any suggestion given those specifics?

Thank you for your answer!

This can get a bit more complicated. As is known, groupby does not group by datetime or other types, only by UID. To get what you want, you would have to create nodes for days instead of years, and that means many nodes, many for sure. Or wait for groupby support for other types.

On second thought, it would without the slightest shadow of a doubt be very complex to do it by days, for you would have to create at least two or three var blocks to get the desired result, and it would be a complex tree to deal with.

As you edited the answer, update me if you have the desired result.

Cheers.

Your suggestion does work for the dates, but I'm still lost about the types level.

I think it is time for GraphQL+- to be pushed towards a GraphQL++ by making this kind of functionality trivial for the developer, as it should be.

We are talking about a simple "group by" situation here, for the dumbest chart possible: total by type by date.

If getting to this is so hard, one of us is wrong at the most basic level: either I need to do my homework better (and it could be the case) or the language urgently needs to evolve. That's positive and constructive criticism, don't get me wrong, but this should be much easier to accomplish.

As soon as I get some time I have a bunch of complaints about @cascade and @normalize too, but I need this done first.

If you have any suggestion for the “types” thing, it would be super.

Really thanks for your effort.


Are you using this same structure in “type”? Or is it a “kind”?

Think this way:

JSON mutation as example

{
    issues:[
        {
            __is_issue: true,
            created_time: "2018-05-01 14:32:00",
            description: "Things are not good right now",
            dates: [
                {
                    __is_dates: true,
                    day: "2018-05-01 00:00:00",  // So I can use your suggestion
                }
            ],
            issue_types: [
                {
                     __is_issue_types: true,
                    type: "Alert"
                }
            ]
        },
        {
            __is_issue: true,
            created_time: "2018-05-21 17:20:00",
            description: "Things are still bad",
            dates: [
                {
                    __is_dates: true,
                    day: "2018-05-21 00:00:00",  
                }
            ],
            issue_types: [
                {
                     __is_issue_types: true,
                    type: "Big Problem"
                }
            ]
        }, 
        ...
    ]
}
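The per-day group nodes in that mutation can be derived automatically before mutating, so every issue created on the same day links to the same "dates" key. A small Python sketch (the `with_day_group` helper is mine; the field names mirror the JSON mutation above):

```python
def with_day_group(issue):
    """Attach a day key truncated to midnight, matching the mutation's "day" field,
    so the issue can be linked to a shared per-day group node."""
    day = issue["created_time"][:10] + " 00:00:00"
    return {**issue, "dates": [{"__is_dates": True, "day": day}]}

issue = {
    "__is_issue": True,
    "created_time": "2018-05-01 14:32:00",
    "description": "Things are not good right now",
    "issue_types": [{"__is_issue_types": True, "type": "Alert"}],
}
payload = {"issues": [with_day_group(issue)]}
```

For the day nodes to actually be shared (rather than duplicated per mutation), you would still need to upsert by the "day" value, e.g. query for an existing node first and use its UID instead of a blank node.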

this should give:

{
    data: {
       total_issues_by_date_by_type:[
            {
                date: "2018-05-01",
                issue_types: [
                     {
                        type: 'Alert',
                        total: 1
                     }
                ] 
            },
            {
                date: "2018-05-21",
                issue_types: [
                     {
                        type: 'Big Problem',
                        total: 1
                     }
                ] 
            },
            ...
       ]
    } 
}

I updated the gist by adding one more level of "group". It gets complicated when adding more than one group, so it would be better to wait for native support.

btw, I’m reading what you write in your examples, just analyzing.

link https://gist.github.com/MichelDiz/89b35f798661334392bd0ddd7f694913

group_month: uid @reverse .
group_type: uid @reverse .
group_year: uid @reverse .
name: string @index(exact) .
release_date: datetime @index(year) .
type: default .
{
  var(func: has(type), first: 1, offset: 0) {
    uid
    name
    group_type @groupby(~group_year) {
      a as count(uid)
      # a maps each year-group UID to its count, for the first type
    }
  }

  var(func: has(type), first: 1, offset: 1) {
    uid
    name
    group_type @groupby(~group_year) {
      a2 as count(uid)
      # a2 does the same for the second type
    }
  }

  problem(func: uid(a), orderdesc: val(a)) {
    name
    group_year { 
      name 
      release_date
    }
  }

   alert(func: uid(a2), orderdesc: val(a2)) {
    name
    group_year { 
      name 
      release_date
    }
  }
}

Result


{
  "data": {
    "problem": [
      {
        "name": "1990",
        "group_year": [
          {
            "name": "Node D",
            "release_date": "1990-12-07T00:00:00Z"
          },
          {
            "name": "Node E",
            "release_date": "1990-12-07T00:00:00Z"
          }
        ]
      },
      {
        "name": "1950",
        "group_year": [
          {
            "name": "Node C",
            "release_date": "1950-12-07T00:00:00Z"
          }
        ]
      }
    ],
    "alert": [
      {
        "name": "1979",
        "group_year": [
          {
            "name": "Node A",
            "release_date": "1979-12-07T00:00:00Z"
          },
          {
            "name": "Node B",
            "release_date": "1979-12-07T00:00:00Z"
          }
        ]
      }
    ]
  }
}

Ok @MichelDiz, thank you very much.

I think we've hit the wall here. I think that what you suggest can work, but it simply cannot be done in a real-world app, where "types" are dynamic and I have no way of knowing how many the end user has created.

I'll have to do this by looping query => code => query => code as needed. A really ugly thing, but…

The way I see it, it looks like a lot of time and energy had to be put into the low-level parts of the engine, and I do sympathize with that. But it's time to move on to the "user level" and make this more comfortable to use.

And that, IMHO, is wrong. Not in technical terms, as I am 100% sure that the devs had their reasons to do it like that, but wrong in terms of usability.

I do want to ask for more predicates inside a groupby subquery, as it just makes sense to do that.
I do want to @normalize or @cascade on a per-branch basis, because it just makes sense to.

Now, I know there must be reasons for things to be the way they are. That just doesn't mean they are good as they are.

Thank you for your support.

I believe the only reason for this is the habit with SQL.

Below it contains partly personal opinion and a bit of context with my notion of world LOL (do not consider as “law”)

Taking GraphQL as an example: technically speaking, it does not support groupBy, as far as I know. GraphQL is intentionally very rigid and limited in syntax and functions, precisely to avoid common problems found in SQL. It does not allow arbitrary logic like SQL does, for example.

On the other hand, Dgraph gives you several options, being a DB itself and not an API language. However, Dgraph's language is based on GraphQL, so for anything beyond that you have to wait for feature support or help with PRs, if that's really the case.

Although GraphQL does not support groupBy, several other libs add this support on top of GraphQL, like Apollo, Relay and so on (Prisma will support it some day). And they all do it via edges (but using filters), just like Dgraph does, except that Dgraph doesn't need filters for this.

Taking SQL as an example: usually it would do GROUP BY through JOINs between tables. Compared to Dgraph, you could consider each node in your model as a "table" to JOIN, so it kind of "fits" the SQL logic, right?
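For comparison, the SQL side of this analogy really is a one-liner. A runnable sketch using Python's built-in sqlite3 (the table and sample rows are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE issues (created_date TEXT, type TEXT)")
con.executemany("INSERT INTO issues VALUES (?, ?)", [
    ("2018-05-01", "alert"), ("2018-05-01", "alert"),
    ("2018-05-01", "problem"), ("2018-05-02", "problem"),
])

# The grouping discussed in this thread: total by type by date.
rows = con.execute(
    "SELECT created_date, type, COUNT(*) FROM issues "
    "GROUP BY created_date, type ORDER BY created_date, type"
).fetchall()
```

Each `(created_date, type)` pair comes back with its count, which is the shape the chart above needs and what a value-based @groupby would produce.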

The ideal for you, for now, is to use query templating and filters, or to try to fit your model to what I have shown.

File an issue for that, or support an open one if there is one.

I hope I'm not being confusing.

PS. You can simply put a GraphQL server in front doing all the hard work your app would do, so you would not need query templating or anything like that. GraphQL is a great API language. I recommend it.

ref: Example of GraphQL Supports.


@MichelDiz Any plans to implement @groupby(month), perhaps? This would be very helpful and would save a lot of time for us poor devs :stuck_out_tongue:

No plans, we have some issues with similar points but no plans for it either.

See this Suggestions to Improve GroupBy

I think I'm going to open a specific issue for that ("Groupby value") and see if there is some interest from the community.
