[feature] Random sample

Every major database has a way of randomly sampling data. The response to the original feature request suggested a way of showing one random node. This means making O(N) individual queries, which is completely unacceptable overhead for something the DB could do easily natively. The only option in GraphQL land right now is to generate a list of all uids, select some, and query for those uids, which could mean holding MBs of IDs in application memory just to generate a query.

Is it possible to provide a native way to randomly sample data?

For sample data, you can use a Lambda or a remote pointing to a public API.

It is possible in either the UI or a Lambda. You just have to count the nodes (N), pick X random positions out of N, then fetch each of those X positions using pagination in a single query with X query blocks, and return the results as a single set.
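The steps above can be sketched roughly as follows. This is only an illustration, not an official recipe: the helper name `build_sample_query`, the block aliases `s0`, `s1`, ..., and the `xid` predicate are assumptions, and the count is assumed to have been fetched beforehand with something like `COUNT_QUERY`. It also assumes the result order stays stable between the count and the paginated fetch.

```python
import random

# Assumed first step: run this to obtain N (the predicate name is hypothetical).
COUNT_QUERY = "{ total(func: has(xid)) { count(uid) } }"


def build_sample_query(total, sample_size, predicate="xid", rng=None):
    """Build one DQL query containing one paginated block per random offset."""
    rng = rng or random.Random()
    # Never ask for more nodes than exist.
    k = min(sample_size, total)
    # Distinct offsets in [0, total), so no position is picked twice.
    offsets = sorted(rng.sample(range(total), k))
    blocks = [
        f"  s{i}(func: has({predicate}), first: 1, offset: {off}) "
        f"{{ uid {predicate} }}"
        for i, off in enumerate(offsets)
    ]
    return "{\n" + "\n".join(blocks) + "\n}"


# Example: N = 1000 was returned by the count query, and we want X = 3 nodes.
query = build_sample_query(total=1000, sample_size=3, rng=random.Random(0))
print(query)
```

The caller would then merge the `s0`, `s1`, ... result lists into a single set. Each block may still have to skip to its offset on the server, so this saves round trips rather than guaranteeing cheap sampling.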

I assume you can use the random pagination feature in DQL to help solve your issue? (introduced v21.03)

{
  q(func: has(xid), random: 10){xid}
}

This means making O(N) individual queries, which is completely unacceptable overhead for something the DB could do easily natively.


@iluminae Thanks! Is this documented anywhere?? I can’t find it in the DQL docs.

I am sorry, I think I was wrong about when it was introduced. Here is the commit, and I think it may be in the next version.


No, my concept would be one query with multiple blocks inside, just FYI.