How about doing some graph-compute in a query?

BlankRain · April 22, 2019, 8:26am

let me show you a demo.
step 1. prepare data.

{
  set {
    # friends
    _:a <friend> _:b . 
    _:b <friend> _:c . 
    _:c <friend> _:d  . 
    _:a <friend> _:d  . 
    _:a <name> "Alice" .
    _:b <name> "Bob" .
    _:c <name> "Tom" .
    _:d <name> "Mallory" .
    
    # family
    _:f <name> "FlyBird" .
    _:h <name> "Hellen" .
    _:f <family> _:a  .
    _:h <family> _:b  .
    
  }
}

step 2. hack the source code of dgraph,and build.
I do this step in my own repo. we talk about it later.

step 3. run a query

3.1 louvain it’s a method for community detection

query x{
	 q (func: has(name)) {
    name
    label :  math(louvain(friend,family))  # means build edges  via friend  family
    }
}

the response is this:

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 15087,
      "processing_ns": 42974591,
      "encoding_ns": 957638
    },
    "txn": {
      "start_ts": 13
    }
  },
  "data": {
    "q": [
      {
        "name": "FlyBird",
        "label": 0
      },
      {
        "name": "Hellen",
        "label": 1
      },
      {
        "name": "Alice",
        "label": 0
      },
      {
        "name": "Bob",
        "label": 1
      },
      {
        "name": "Tom",
        "label": 2
      },
      {
        "name": "Mallory",
        "label": 2
      }
    ]
  }
}

3.2 pagerank

query x{
	 q (func: has(name)) {
    name
    rank :  math(pagerank(friend,family))  # means build edges  via friend  family
    }
}

response:

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 18284,
      "processing_ns": 47249353,
      "encoding_ns": 866556
    },
    "txn": {
      "start_ts": 17
    }
  },
  "data": {
    "q": [
      {
        "name": "FlyBird",
        "rank": 0.070086
      },
      {
        "name": "Hellen",
        "rank": 0.070086
      },
      {
        "name": "Alice",
        "rank": 0.129653
      },
      {
        "name": "Bob",
        "rank": 0.184756
      },
      {
        "name": "Tom",
        "rank": 0.227142
      },
      {
        "name": "Mallory",
        "rank": 0.318277
      }
    ]
  }
}

ok,now , what I’m facing…
I do my own hack in “parser” ,and the louvain,pagerank function inject as a math function .
Question 1:
Is it a good or graceful way to put this kind of functions in math() block?
pagerank,louvain is all about graph computing. not the common math as we familar (+,-,*,/,sin,cos ,…).

Question 2:
the grammar, or the query we write is in this pattern.

 (partA) {
  partB
}

In partA, what you do is filter and get all uids you need.
for example: q (func: has(name)) { # I use has name to get all matched uids. }
In partB, what you do is apply functions with all uids
for example:

  (partA) { 
   name
# I apply pagerank function to all uids I selected.
   rank :  math(pagerank(friend,family)) 
}

Is this struct (partA) {partB} a good or graceful or nature way for applying graph-computing functions ?

PS:
I build a image in docker.
use the docker-compose file you can quickly have a test.

docker-compose-pagerank.yaml (641 Bytes)

docker-comose -f docker-compuse-pagerank.yaml up

MichelDiz · April 22, 2019, 4:51pm

image: brainrequired/dgraph:pagerank

Have you edited Dgraph’s code? if so can you share it?

martinmr · April 22, 2019, 6:24pm

Other people have already suggested that we support other graph algorithms. In general, we are not very keen to support graph algorithms within Dgraph’s internal code unless there’s a lot of demand for it.

Currently we support shortest path queries but even doing something simple like that involves a lot of code that needs to be maintained in detriment of other core features.

I suggest you open a feature request in our github repo. If a lot of people show interest in having a specific algorithm supported we can consider integrating it inside Dgraph.

BlankRain · April 23, 2019, 1:39am

yes , I already edited dgraph 's source code. I will share it later .

BlankRain · April 23, 2019, 1:56am

hi michel, this is my code GitHub - BlankRain/dgraph at pagerank.
search ProcessFuncParam . you will get all what I changed.

BlankRain · April 23, 2019, 1:58am

Could dgraph open an interface ,like PluginTokenizer ,so they can add their own graph algorithms?

MichelDiz · April 23, 2019, 2:29am

In fact this possibility already exists. https://docs.dgraph.io/query-language/#implementing-a-plugin

BlankRain · April 23, 2019, 6:02am

This plugin is for string index, not graph algorithms. I mean, open an interface for graph algorithms.

MichelDiz · April 23, 2019, 4:07pm

Oh, okay, I thought It was to anything. Well you can PR us or open an issue detailing this.

martinmr · April 23, 2019, 9:04pm

I am not sure we could provide such an interface. It seems too general to be of any use. Also, I don’t think opening an interface would give you any advantages over simply using the query language to implement a given algorithm in your application logic.

I was thinking about this issue yesterday and I have an idea on how to implement algorithms within Dgraph that might work/be more flexible than the current approach.

Right now, we are implementing shortest paths by modifying our query language so the algorithm can be implemented using our internal representation of the graph. This probably gets us faster performance but in my view it has some issues.

Implementing a new algorithm requires someone that understands the internals of Dgraph well enough to avoid breaking other features. This set of people consists probably of only the people that work/have worked on Dgraph itself. It doesn’t lend itself to an external contributor sending a PR for a new algorithm since they first need to get familiar with Dgraph’s internals.
Implementing a new algorithm might involve changing the internals of Dgraph to make it work. Internal changes also require that we verify that the existing algorithms and query features are still working. In my opinion this might make the query code more brittle in the long run as a seemingly simple change might cause other features to break in unexpected ways.

I was thinking instead that we could have a new endpoint (called “algo” for example) along with “query”, “mutate”, and “alter”. Inside the hood, this endpoint would run the selected algorithm with the passed arguments but it would do it not by changing Dgraph internal query code, but by using the query language. Basically, it would implement an algorithm the way a user would in their application code.

The advantages of this approach would be.

Provide a thoroughly tested reference implementation of a given algorithm instead of asking every user that needs one to implement their own in their application code.
Avoid having to change the internal query code for every new algorithm implementation or changes to an existing algorithm. This would give us more confidence that each piece (query language and algorithm implementations) works correctly and won’t break the other when it’s changed.
It would let other people contribute new algorithms more easily. All that is needed is that you are familiar with golang, the algorithm you want to implement, and Dgraph’s query language. This subset of people is probably a lot larger than the set of people that could implement a new algorithm today.

The downside is some implementations might be slower than they would otherwise be if they were more closely integrated with Dgraph but that might be an acceptable trade-off for most users.

I think shortest-path algorithms are easy enough to implement using something like this as a proof of concept.

@mrjn, Any thoughts?

mrjn · April 24, 2019, 2:06am

Replying to @martinmr:

I like the general idea, but not sure how it would work. Something like shortest path requires one to do breadth-first search repeatedly, which means doing edge traversals. Note that edge traversals for us could mean going over network, which we do via ProcessTaskOverNetwork, and the general SubGraph interface. A user would need to understand that to retrieve data as they need during the execution of the algorithm. This might or might not be hard, but I don’t see a very easy solution here considering Dgraph is a distributed system and the data can’t be assumed to be available locally.

One could argue that the user can just repeatedly run queries to get the next portion of the data. This becomes an issue if the intermediate set of results are in millions and to traverse corresponding edges, user would need to run a million queries. That definitely won’t scale. So, we’d need to get the user to either become comfortable mucking with Tasks/SubGraphs or put together some wrapper functions around them.

BlankRain · April 24, 2019, 2:47am

I like your idea.
It’s great and more comprehensive.

@mrjn It’s a hard work. I have no good idea about it.

Is there any safe, distributed-algorithm frame for distributed system ? so that developer can
quickly write some algorithm without careing about the data, the net call or anything else.

system · May 24, 2019, 2:47am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is there a good way to implement graph algorithms? Users	6	2681	February 18, 2021
Graph Algorithms Support/Documentation Dgraph dgraph , priority:p2 , kind:feature , popular , area:querylang:algos	2	1545	February 1, 2019
Graph Algorithms Implementations in DGraph Dgraph	2	2593	September 30, 2019
Graph Data Science on Dgraph Dgraph	7	1937	November 30, 2021
How about a graph compute engine based on dgraph except for spark? Dgraph kind:question	4	749	December 28, 2020

How about doing some graph-compute in a query?

Related topics