How about doing some graph-compute in a query?

let me show you a demo.
step 1. prepare data.

{
  set {
    # friends
    _:a <friend> _:b . 
    _:b <friend> _:c . 
    _:c <friend> _:d  . 
    _:a <friend> _:d  . 
    _:a <name> "Alice" .
    _:b <name> "Bob" .
    _:c <name> "Tom" .
    _:d <name> "Mallory" .
    
    # family
    _:f <name> "FlyBird" .
    _:h <name> "Hellen" .
    _:f <family> _:a  .
    _:h <family> _:b  .
    
  }
}

step 2. hack the source code of dgraph,and build.
I do this step in my own repo. we talk about it later.

step 3. run a query

3.1 louvain :wink: it’s a method for community detection

query x{
	 q (func: has(name)) {
    name
    label :  math(louvain(friend,family))  # means build edges  via friend  family
    }
}

the response is this:

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 15087,
      "processing_ns": 42974591,
      "encoding_ns": 957638
    },
    "txn": {
      "start_ts": 13
    }
  },
  "data": {
    "q": [
      {
        "name": "FlyBird",
        "label": 0
      },
      {
        "name": "Hellen",
        "label": 1
      },
      {
        "name": "Alice",
        "label": 0
      },
      {
        "name": "Bob",
        "label": 1
      },
      {
        "name": "Tom",
        "label": 2
      },
      {
        "name": "Mallory",
        "label": 2
      }
    ]
  }
}

3.2 pagerank

query x{
	 q (func: has(name)) {
    name
    rank :  math(pagerank(friend,family))  # means build edges  via friend  family
    }
}

response:

{
  "extensions": {
    "server_latency": {
      "parsing_ns": 18284,
      "processing_ns": 47249353,
      "encoding_ns": 866556
    },
    "txn": {
      "start_ts": 17
    }
  },
  "data": {
    "q": [
      {
        "name": "FlyBird",
        "rank": 0.070086
      },
      {
        "name": "Hellen",
        "rank": 0.070086
      },
      {
        "name": "Alice",
        "rank": 0.129653
      },
      {
        "name": "Bob",
        "rank": 0.184756
      },
      {
        "name": "Tom",
        "rank": 0.227142
      },
      {
        "name": "Mallory",
        "rank": 0.318277
      }
    ]
  }
}

ok,now , what I’m facing…
I do my own hack in “parser” ,and the louvain,pagerank function inject as a math function .
Question 1:
Is it a good or graceful way to put this kind of functions in math() block?
pagerank,louvain is all about graph computing. not the common math as we familar (+,-,*,/,sin,cos ,…).

Question 2:
the grammar, or the query we write is in this pattern.

 (partA) {
  partB
}

In partA, what you do is filter and get all uids you need.
for example: q (func: has(name)) { # I use has name to get all matched uids. }
In partB, what you do is apply functions with all uids
for example:

  (partA) { 
   name
# I apply pagerank function to all uids I selected.
   rank :  math(pagerank(friend,family)) 
}

Is this struct (partA) {partB} a good or graceful or nature way for applying graph-computing functions ?

PS:
I build a image in docker.
use the docker-compose file you can quickly have a test.

docker-compose-pagerank.yaml (641 Bytes)

docker-comose -f docker-compuse-pagerank.yaml up

2 Likes
image: brainrequired/dgraph:pagerank

Have you edited Dgraph’s code? if so can you share it?

Other people have already suggested that we support other graph algorithms. In general, we are not very keen to support graph algorithms within Dgraph’s internal code unless there’s a lot of demand for it.

Currently we support shortest path queries but even doing something simple like that involves a lot of code that needs to be maintained in detriment of other core features.

I suggest you open a feature request in our github repo. If a lot of people show interest in having a specific algorithm supported we can consider integrating it inside Dgraph.

yes , I already edited dgraph 's source code. I will share it later .

hi michel, this is my code GitHub - BlankRain/dgraph at pagerank.
search ProcessFuncParam . you will get all what I changed.

Could dgraph open an interface ,like PluginTokenizer ,so they can add their own graph algorithms?

In fact this possibility already exists. https://docs.dgraph.io/query-language/#implementing-a-plugin

This plugin is for string index, not graph algorithms. I mean, open an interface for graph algorithms.

1 Like

Oh, okay, I thought It was to anything. Well you can PR us or open an issue detailing this.

I am not sure we could provide such an interface. It seems too general to be of any use. Also, I don’t think opening an interface would give you any advantages over simply using the query language to implement a given algorithm in your application logic.

I was thinking about this issue yesterday and I have an idea on how to implement algorithms within Dgraph that might work/be more flexible than the current approach.

Right now, we are implementing shortest paths by modifying our query language so the algorithm can be implemented using our internal representation of the graph. This probably gets us faster performance but in my view it has some issues.

  1. Implementing a new algorithm requires someone that understands the internals of Dgraph well enough to avoid breaking other features. This set of people consists probably of only the people that work/have worked on Dgraph itself. It doesn’t lend itself to an external contributor sending a PR for a new algorithm since they first need to get familiar with Dgraph’s internals.

  2. Implementing a new algorithm might involve changing the internals of Dgraph to make it work. Internal changes also require that we verify that the existing algorithms and query features are still working. In my opinion this might make the query code more brittle in the long run as a seemingly simple change might cause other features to break in unexpected ways.

I was thinking instead that we could have a new endpoint (called “algo” for example) along with “query”, “mutate”, and “alter”. Inside the hood, this endpoint would run the selected algorithm with the passed arguments but it would do it not by changing Dgraph internal query code, but by using the query language. Basically, it would implement an algorithm the way a user would in their application code.

The advantages of this approach would be.

  1. Provide a thoroughly tested reference implementation of a given algorithm instead of asking every user that needs one to implement their own in their application code.

  2. Avoid having to change the internal query code for every new algorithm implementation or changes to an existing algorithm. This would give us more confidence that each piece (query language and algorithm implementations) works correctly and won’t break the other when it’s changed.

  3. It would let other people contribute new algorithms more easily. All that is needed is that you are familiar with golang, the algorithm you want to implement, and Dgraph’s query language. This subset of people is probably a lot larger than the set of people that could implement a new algorithm today.

The downside is some implementations might be slower than they would otherwise be if they were more closely integrated with Dgraph but that might be an acceptable trade-off for most users.

I think shortest-path algorithms are easy enough to implement using something like this as a proof of concept.

@mrjn, Any thoughts?

1 Like

Replying to @martinmr:

I like the general idea, but not sure how it would work. Something like shortest path requires one to do breadth-first search repeatedly, which means doing edge traversals. Note that edge traversals for us could mean going over network, which we do via ProcessTaskOverNetwork, and the general SubGraph interface. A user would need to understand that to retrieve data as they need during the execution of the algorithm. This might or might not be hard, but I don’t see a very easy solution here considering Dgraph is a distributed system and the data can’t be assumed to be available locally.

One could argue that the user can just repeatedly run queries to get the next portion of the data. This becomes an issue if the intermediate set of results are in millions and to traverse corresponding edges, user would need to run a million queries. That definitely won’t scale. So, we’d need to get the user to either become comfortable mucking with Tasks/SubGraphs or put together some wrapper functions around them.

1 Like

I like your idea.
It’s great and more comprehensive.

@mrjn It’s a hard work. I have no good idea about it.

Is there any safe, distributed-algorithm frame for distributed system ? so that developer can
quickly write some algorithm without careing about the data, the net call or anything else.

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.