How to groupby date

I store logs in dgraph and I want to count them by date, how can I do this?

Sorry for the delay.

Unfortunately, Dgraph has no way to group by date; @groupby only works with UID predicates. What you can do is create your own structure representing the index. Follow the example below.

Perhaps we can evaluate this as a future feature, what do you think @shanea ?

Cheers.

Schema


name: string @index(term) @lang .
release_date: datetime @index(year) .
group_year: uid @reverse .
group_month: uid @reverse .

Mutation

{
  set {

    #We create the groups as nodes
    _:group_year <name> "1979" .
    _:group_year2 <name> "1950" .
    _:group_year3 <name> "1990" .

    _:group_month <name> "ago" .

    # Then we relate them to the nodes created.
    # (To use an existing node, just replace
    # the blank node with that existing node's UID.)
    # e.g: <0x1fa43> <group_year> _:NodeA .

    _:NodeA <name> "Node A" .
    _:NodeA <release_date> "1979-12-07" .
    _:group_year <group_year> _:NodeA .

    _:NodeB <name> "Node B" .
    _:NodeB <release_date> "1979-12-07" .
    _:group_year <group_year> _:NodeB .

    _:NodeC <name> "Node C" .
    _:NodeC <release_date> "1950-12-07" .
    _:group_year2 <group_year> _:NodeC .

    _:NodeD <name> "Node D" .
    _:NodeD <release_date> "1990-12-07" .
    _:group_year3 <group_year> _:NodeD .

    _:NodeE <name> "Node E" .
    _:NodeE <release_date> "1990-12-07" .
    _:group_year3 <group_year> _:NodeE .
    _:group_month <group_month> _:NodeE .

  }
}

Query

{
  var(func: has(release_date)) @groupby(~group_year) {
    a as count(uid)
    # a is a value variable mapping each year-group UID to its count
  }

  byYear(func: uid(a), orderasc: name) {
    uid
    name
    group_year {expand(_all_)}
  }
}

Result


{
  "data": {
    "byYear": [
      {
        "uid": "0x272a",
        "name": "1950",
        "group_year": [
          {
            "name": "Node C",
            "release_date": "1950-12-07T00:00:00Z"
          }
        ]
      },
      {
        "uid": "0x272d",
        "name": "1979",
        "group_year": [
          {
            "name": "Node A",
            "release_date": "1979-12-07T00:00:00Z"
          },
          {
            "name": "Node B",
            "release_date": "1979-12-07T00:00:00Z"
          }
        ]
      },
      {
        "uid": "0x272e",
        "name": "1990",
        "group_year": [
          {
            "name": "Node D",
            "release_date": "1990-12-07T00:00:00Z"
          },
          {
            "name": "Node E",
            "release_date": "1990-12-07T00:00:00Z"
          }
        ]
      }
    ]
  }
}

Thank you. I use Dgraph to analyze logs; is it suitable for that?

You can use Dgraph perfectly well, but I would recommend BadgerDB for this situation. A graph DB is best for a complex model full of entities and connections; Dgraph's whole idea is to traverse them easily.

Now, if you want to create a logs app that has graph characteristics, where logs are related and you study log relationships for some reason, then yes, I would recommend Dgraph.

Cheers.

Yes. I captured it as a request. Thanks for flagging this @yupengfei

YES, I am building a recommendation system for our platform, so it is full of connections.


This is an example of the lack of "mundane" functionality that is scaring me right now.

How do I build charts?

  • Plot a line chart of the “issues count” (Y), series by “issue type”, grouped by “[day | week | month]” (X)

@groupby should be able to group things:

  • @groupby(company_name)
  • @groupby(DATE(_create_datetime))
  • @groupby(MONTH(_create_datetime))
  • @groupby(YEAR(_create_datetime))
  • @groupby(TIME(_create_datetime))
    etc…
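Until native support like that exists, the same date truncations can be emulated on the client side. A minimal Python sketch (the `group_key` helper and its unit names are my own, not a Dgraph API; it assumes timestamps come back as ISO strings, as Dgraph returns them):

```python
from datetime import datetime

def group_key(ts: str, unit: str) -> str:
    """Truncate an ISO timestamp to the desired grouping granularity."""
    # Python's fromisoformat (3.7+) does not accept a trailing "Z",
    # so normalize it to an explicit UTC offset first.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    fmt = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}[unit]
    return dt.strftime(fmt)

print(group_key("2018-05-01T14:32:00Z", "month"))  # 2018-05
```

Records sharing the same key then fall into the same bucket, which is exactly what `@groupby(MONTH(...))` would do server-side.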

groupby, cascade, and normalize NEED attention.

It looks to me like a super powerful and fast car with a bad steering system.

I’m trying this approach but I’m really lost here.

I've created a group_year, group_month, group_day and a group_week_day. Then I've created relationships between an issue and each one of those, based on its creation DATETIME.

How do I get all issues from 2018-05-01 to 2018-07-23, grouped by day AND COUNTED BY ISSUE TYPE?

I want this result:

{
    data: {
       result:[
            {
                date: "2018-05-01",
                issue_types: [
                     {
                        type: 'alert',
                        total: 8
                     },
                     {
                        type: 'problem',
                        total: 2
                     },
                ] 
            },
            {
                date: "2018-05-02",
                issue_types: [
                     {
                        type: 'alert',
                        total: 1
                     },
                     {
                        type: 'problem',
                        total: 5
                     },
                ] 
            },
            ...
       ]
    } 
}

And I’d like to write something like this:

{
    result(func: has(__is_issue)) 
      @filter(ge(created_time, '2018-05-01') AND le(created_time, '2018-07-23')) 
      @groupedby(date(created_time) as DAY)

   {
        date: DAY  #grouped created_time
        issue_types @groupedby(type as grouped_types) {
            type
           count(grouped_types)
        }
    }
}
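Since that syntax doesn't exist, the desired shape can be built client-side from a plain filtered query. A hedged Python sketch (the sample records and field names are hypothetical, mirroring the desired output above):

```python
from collections import defaultdict

# Flat records, as they might come back from a plain range-filtered query.
issues = [
    {"created_time": "2018-05-01T09:00:00Z", "type": "alert"},
    {"created_time": "2018-05-01T10:00:00Z", "type": "alert"},
    {"created_time": "2018-05-01T11:00:00Z", "type": "problem"},
    {"created_time": "2018-05-02T08:00:00Z", "type": "problem"},
]

def group_by_day_and_type(issues):
    # Two-level counter: day -> type -> count.
    counts = defaultdict(lambda: defaultdict(int))
    for issue in issues:
        day = issue["created_time"][:10]  # truncate ISO timestamp to the date
        counts[day][issue["type"]] += 1
    return [
        {"date": day,
         "issue_types": [{"type": t, "total": n}
                         for t, n in sorted(types.items())]}
        for day, types in sorted(counts.items())
    ]

result = group_by_day_and_type(issues)
```

This is the "loop through result sets" fallback mentioned later in the thread, just kept to a single pass with a nested counter.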

But, right now, with a HUGE deadline hanging over my head, I'd appreciate any help on this.

Thanks @MichelDiz, @shanea


If I try this:

{
    classificacoes(func: has(__is_classificacoes)) @cascade{
      uid
      issue_type: descricao
      issues: ~classificacoes @groupby(datas){
        count(uid)
      }
    }   
}

I got this:

{
  "data": {
    "classificacoes": [
      {
        "uid": "0x3f13",
        "issue_type": "normal",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f6c",
                "count": 1
              },
              {
                "datas": "0x3f91",
                "count": 1
              },
              {
                "datas": "0x3f97",
                "count": 1
              },
              {
                "datas": "0x3fcb",
                "count": 1
              },
              {
                "datas": "0x4063",
                "count": 1
              },
              {
                "datas": "0x3f81",
                "count": 3
              },
              {
                "datas": "0x3f9e",
                "count": 4
              },
              {
                "datas": "0x3fd6",
                "count": 4
              },
              {
                "datas": "0x3fb7",
                "count": 5
              },
              {
                "datas": "0x4021",
                "count": 5
              },
              {
                "datas": "0x3f3a",
                "count": 6
              },
              {
                "datas": "0x3fe8",
                "count": 7
              },
              {
                "datas": "0x4019",
                "count": 10
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f14",
        "issue_type": "alerta",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f5f",
                "count": 1
              },
              {
                "datas": "0x3f66",
                "count": 1
              },
              {
                "datas": "0x3f97",
                "count": 1
              },
              {
                "datas": "0x3f9e",
                "count": 1
              },
              {
                "datas": "0x3fcb",
                "count": 1
              },
              {
                "datas": "0x3fd6",
                "count": 1
              },
              {
                "datas": "0x3fe8",
                "count": 1
              },
              {
                "datas": "0x4019",
                "count": 1
              },
              {
                "datas": "0x4046",
                "count": 1
              },
              {
                "datas": "0x4063",
                "count": 1
              },
              {
                "datas": "0x4021",
                "count": 5
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f15",
        "issue_type": "problema",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f81",
                "count": 1
              },
              {
                "datas": "0x4021",
                "count": 1
              },
              {
                "datas": "0x3f6c",
                "count": 2
              },
              {
                "datas": "0x3f9e",
                "count": 2
              },
              {
                "datas": "0x3fe8",
                "count": 3
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f16",
        "issue_type": "urgência",
        "issues": [
          {
            "@groupby": [
              {
                "datas": "0x3f9e",
                "count": 1
              },
              {
                "datas": "0x3fd6",
                "count": 1
              }
            ]
          }
        ]
      },
      {
        "uid": "0x3f17",
        "issue_type": "Emergência"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 13978,
      "processing_ns": 6756208,
      "encoding_ns": 1270439
    },
    "txn": {
      "start_ts": 142004,
      "lin_read": {
        "ids": {
          "1": 154624
        }
      }
    }
  }
}

But if I try this:

{
  var(func: has(__is_classificacoes)) @cascade{
    id as uid
    type as descricao
    ~classificacoes   @groupby(datas){
      total as count(uid)
    }
  }  
        
  test (func: uid(id)) {
    uid
    issue_type: val(type)
    val(total)
  }
}

All I got is this:

{
  "data": {
    "test": [
      {
        "uid": "0x3f13",
        "issue_type": "normal"
      },
      {
        "uid": "0x3f14",
        "issue_type": "alerta"
      },
      {
        "uid": "0x3f15",
        "issue_type": "problema"
      },
      {
        "uid": "0x3f16",
        "issue_type": "urgência"
      },
      {
        "uid": "0x3f17",
        "issue_type": "Emergência"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 38890,
      "processing_ns": 5725494,
      "encoding_ns": 493127
    },
    "txn": {
      "start_ts": 142024,
      "lin_read": {
        "ids": {
          "1": 154644
        }
      }
    }
  }
}

Completely out of time here, so I'll do this the dumbest way, by looping through result sets and building things by hand, but I do hope there is a better way.

If I'm just being dumb and blind, please point me to the right docs or examples.

I’m using the new 1.8 release.

Thanks.

I've made a gist for this

mutate.YML

I just added the “release_date” predicate that is indexed by date

query_1.YML

Used the filter in the Var block @filter(ge(release_date, "1950-05-01") AND le(release_date, "1980-07-23"))

query_2.YML

Used the filter in the Var block and the query itself in the edge.

Hi @MichelDiz, thank you very much for your answer. But your suggestion is missing the third level, "types", which is where I mostly got lost.

EDITED: I was wrong in this paragraph; your way covers this well:
Also, my scenario is a bit more complex because I have to deal with days, and in a given search period I can have may-10, jun-10 and jul-10, so a simple "group_days" won't do it.
EDITED

Result needed:

{
    data: {
       total_issues_by_date_by_type:[
            {
                date: "2018-05-01",
                issue_types: [
                     {
                        type: 'alert',
                        total: 8
                     },
                     {
                        type: 'problem',
                        total: 2
                     },
                ] 
            },
            {
                date: "2018-06-01",
                issue_types: [
                     {
                        type: 'alert',
                        total: 1
                     },
                     {
                        type: 'problem',
                        total: 5
                     },
                ] 
            },
            ...
       ]
    } 
}

Do you have any suggestion given those specifics?

Thank you for your answer!

This can get a bit more complicated. As is known, groupby does not group by datetime or other types, only by UID. To get what you want, you would have to create nodes for days instead of years, and that means many nodes, many for sure. Or wait for groupby support for other types.

On second thought, it would without the slightest shadow of a doubt be very complex to do it by days, for you would have to create at least two or three var blocks to get the desired result, and it would be a complex tree to deal with.

As you edited the answer, update me if you have the desired result.

Cheers.

Your suggestion does work for the dates, but I'm still lost about the types level.

I think it is time for GraphQL+- to be pushed towards a GraphQL++ by making this kind of functionality trivial for the developer, as it should be.

We are talking about a simple "group by" situation here, for the dumbest chart possible: total by type by date.

If getting to this is so hard, one of us is wrong at the most basic level: either I need to do my homework better (and it could be the case) or the language urgently needs to evolve. That's positive and constructive criticism, don't get me wrong, but this should be much easier to accomplish.

As soon as I get some time I have a bunch of complaints about @cascade and @normalize too, but I need this done first.

If you have any suggestion for the “types” thing, it would be super.

Really thanks for your effort.


Are you using this same structure in “type”? Or is it a “kind”?

Think this way:

JSON mutation as example

{
    issues:[
        {
            __is_issue: true,
            created_time: "2018-05-01 14:32:00",
            description: "Things are not good right now",
            dates: [
                {
                    __is_dates: true,
                    day: "2018-05-01 00:00:00",  // So I can use your suggestion
                }
            ],
            issue_types: [
                {
                     __is_issue_types: true,
                    type: "Alert"
                }
            ]
        },
        {
            __is_issue: true,
            created_time: "2018-05-21 17:20:00",
            description: "Things are still bad",
            dates: [
                {
                    __is_dates: true,
                    day: "2018-05-21 00:00:00",  
                }
            ],
            issue_types: [
                {
                     __is_issue_types: true,
                    type: "Big Problem"
                }
            ]
        }, 
        ...
    ]
}
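The per-day group nodes in that mutation can be derived automatically before mutating, so every issue created on the same day links to the same "dates" key. A small Python sketch (the `with_day_group` helper is mine; the field names mirror the JSON mutation above):

```python
def with_day_group(issue):
    """Attach a day key truncated to midnight, matching the mutation's "day" field,
    so the issue can be linked to a shared per-day group node."""
    day = issue["created_time"][:10] + " 00:00:00"
    return {**issue, "dates": [{"__is_dates": True, "day": day}]}

issue = {
    "__is_issue": True,
    "created_time": "2018-05-01 14:32:00",
    "description": "Things are not good right now",
    "issue_types": [{"__is_issue_types": True, "type": "Alert"}],
}
payload = {"issues": [with_day_group(issue)]}
```

For the day nodes to actually be shared (rather than duplicated per mutation), you would still need to upsert by the "day" value, e.g. query for an existing node first and use its UID instead of a blank node.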

this should give:

{
    data: {
       total_issues_by_date_by_type:[
            {
                date: "2018-05-01",
                issue_types: [
                     {
                        type: 'Alert',
                        total: 1
                     }
                ] 
            },
            {
                date: "2018-05-21",
                issue_types: [
                     {
                        type: 'Big Problem',
                        total: 1
                     }
                ] 
            },
            ...
       ]
    } 
}

I updated the gist by adding one more level of "group". It gets complicated when adding more than one group, so it would be better to wait for native support.

btw, I’m reading what you write in your examples, just analyzing.

link https://gist.github.com/MichelDiz/89b35f798661334392bd0ddd7f694913

group_month: uid @reverse .
group_type: uid @reverse .
group_year: uid @reverse .
name: string @index(exact) .
release_date: datetime @index(year) .
type: default .
{
  var(func: has(type), first: 1, offset: 0) {
    uid
    name
    group_type @groupby(~group_year) {
      a as count(uid)
      # a maps each year-group UID to its count, for the first type
    }
  }

  var(func: has(type), first: 1, offset: 1) {
    uid
    name
    group_type @groupby(~group_year) {
      a2 as count(uid)
      # a2 does the same for the second type
    }
  }

  problem(func: uid(a), orderdesc: val(a)) {
    name
    group_year { 
      name 
      release_date
    }
  }

   alert(func: uid(a2), orderdesc: val(a2)) {
    name
    group_year { 
      name 
      release_date
    }
  }
}

Result


{
  "data": {
    "problem": [
      {
        "name": "1990",
        "group_year": [
          {
            "name": "Node D",
            "release_date": "1990-12-07T00:00:00Z"
          },
          {
            "name": "Node E",
            "release_date": "1990-12-07T00:00:00Z"
          }
        ]
      },
      {
        "name": "1950",
        "group_year": [
          {
            "name": "Node C",
            "release_date": "1950-12-07T00:00:00Z"
          }
        ]
      }
    ],
    "alert": [
      {
        "name": "1979",
        "group_year": [
          {
            "name": "Node A",
            "release_date": "1979-12-07T00:00:00Z"
          },
          {
            "name": "Node B",
            "release_date": "1979-12-07T00:00:00Z"
          }
        ]
      }
    ]
  }
}

Ok @MichelDiz, thank you very much.

I think we've hit the wall here. I think that what you suggest can work, but it simply cannot be done in a real-world app, where "types" are dynamic and I have no way of knowing how many the end user has created.

I'll have to do this by looping query => code => query => code as needed. A really ugly thing, but…

The way I see it, it looks like a lot of time and energy had to be put into the low-level parts of the engine, and I do sympathize with that. But it's time to move on to the "user level" and make this more comfortable to use.

And that, IMHO, is wrong. Not in technical terms, as I am 100% sure that the devs had their reasons to do it like that, but wrong in terms of usability.

I do want to ask for more predicates inside a groupby subquery, as it just makes sense to do that.
I do want to @normalize or @cascade on a per-branch basis, because it just makes sense to.

Now, I know there must be reasons for things to be the way they are. That just doesn't mean they are good as they are.

Thank you for your support.

I believe the only reason for this is the habit with SQL.

Below it contains partly personal opinion and a bit of context with my notion of world LOL (do not consider as “law”)

Taking GraphQL as an example: technically speaking, it does not support groupBy, as far as I know. GraphQL is intentionally very rigid and limited in syntax and functions, precisely to avoid common problems found in SQL. It does not allow arbitrary logic like SQL does, for example.

On the other hand, Dgraph gives you several options, being a DB itself and not an API language. However, Dgraph's language is based on GraphQL, so for anything beyond that you have to wait for feature support or help with PRs, if that's really the case.

Although GraphQL does not support groupBy, several other libs add this support on top of GraphQL, like Apollo, Relay and so on (Prisma will support it some day). And they all do it via edges (but using filters), just like Dgraph does, except that Dgraph doesn't need filters for this.

Taking SQL as an example: usually it would do GROUP BY through JOINs between tables. Compared to Dgraph, you could consider each node in your model as a "table" to JOIN, so it kind of "fits" the SQL logic, right?
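For comparison, the SQL side of this analogy really is a one-liner. A runnable sketch using Python's built-in sqlite3 (the table and sample rows are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE issues (created_date TEXT, type TEXT)")
con.executemany("INSERT INTO issues VALUES (?, ?)", [
    ("2018-05-01", "alert"), ("2018-05-01", "alert"),
    ("2018-05-01", "problem"), ("2018-05-02", "problem"),
])

# The grouping discussed in this thread: total by type by date.
rows = con.execute(
    "SELECT created_date, type, COUNT(*) FROM issues "
    "GROUP BY created_date, type ORDER BY created_date, type"
).fetchall()
```

Each `(created_date, type)` pair comes back with its count, which is the shape the chart above needs and what a value-based @groupby would produce.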

The ideal for you, for now, is to use query templating and filters, or to try to fit your model to what I have shown.

File an issue for that, or support an open one if there is one.

I hope I'm not being confusing.

PS. You can simply put a GraphQL server in front doing all the hard work your app would do, so you would not need query templating or anything like that. GraphQL is a great API language. I recommend it.

ref: Example of GraphQL Supports.


@MichelDiz Any plans to implement @groupby(month), perhaps? This would be very helpful and would save a lot of time for us poor devs :stuck_out_tongue:

No plans, we have some issues with similar points but no plans for it either.

See this Suggestions to Improve GroupBy

I think I'm going to open a specific issue for that ("Groupby value") and see if there is some interest from the community.
