Connection not found for insert

Hi, I can query, but I received an error message while doing an insert:

{
  "name": "t",
  "url": "http://localhost:8080/mutate?commitNow=true",
  "errors": [
    {
      "message": "cannot retrieve predicate information: No connection exists",
      "extensions": {
        "code": "ErrorInvalidRequest"
      }
    }
  ]
}

The dgraph alpha server message:

Move predicate request: predicate:"name" source_gid:1 dest_gid:2 txn_ts:3440019

The dgraph zero server message:
I0825 04:40:16.655843   30155 zero.go:438] Connected: cluster_info_only:true
I0825 04:40:16.656903   30155 zero.go:420] Got connection request: id:1 addr:"localhost:7080"
I0825 04:40:16.657287   30155 zero.go:551] Connected: id:1 addr:"localhost:7080"
I0825 04:47:30.954485   30155 tablet.go:208]

Groups sorted by size: [{gid:2 size:0} {gid:1 size:240067445}]

I0825 04:47:30.954536   30155 tablet.go:213] size_diff 240067445
I0825 04:47:30.954946   30155 tablet.go:108] Going to move predicate: [name], size: [70 MB] from group 1 to 2
I0825 04:47:30.955069   30155 tablet.go:135] Starting move: predicate:"name" source_gid:1 dest_gid:2 txn_ts:3440019
E0825 04:47:30.958280   30155 tablet.go:70] while calling MovePredicate: rpc error: code = Unknown desc = Unable to find a connection for group: 2

My dgraph version is:
v1.2.6
on ubuntu 18

If my understanding of the error message is correct,
Dgraph has detected more than one group and is trying to move a predicate to another group.
I don’t think I set up any new cluster, as this is a single machine that I SSH into.
Is there any way to solve this problem, please?

Hi @M.Lau, Dgraph is trying to move the predicate from one group to another. For more information about sharding, you can read here. Can you tell us a little more about the configuration of your cluster (number of Alphas and Zeros) and possible steps to reproduce this error on a newly started cluster (if you are able to reproduce it)?

Ok. I didn’t set up any sharding because this is a single-machine setup.

I have only 1 Zero and 1 Alpha. But I wonder whether this error happened because I accidentally forgot to shut down the original Alpha while starting a new Alpha in a new folder. Is this a possible reason why it is trying to split the files into two places?

Is there a way to disable auto-sharding or automatic predicate rebalancing, so that the user can decide manually when to do it?

You can set the rebalance interval by passing --rebalance_interval when you start Dgraph Zero; after each interval, Dgraph will try to redistribute predicates among groups. Currently you cannot disable sharding completely, but you can set the interval to a high value. If there is only a single group, though, this should not be something to worry about. Can you post the response of the /state endpoint of your Zero if you are still facing the same issue?
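For example, assuming the default ports (Zero’s HTTP port is 6080) — the interval value here is just illustrative:

```shell
# Start Zero with a very long rebalance interval so automatic
# predicate moves effectively never trigger:
dgraph zero --rebalance_interval 10000h

# Inspect the cluster state on Zero's HTTP port:
curl -s localhost:6080/state
```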

I have the state endpoint of the Alpha… I am not able to get the state endpoint of the Zero. What should I do to get it?

Here is the Alpha state endpoint. It shows two Alphas at 7080, which seems weird:

{
    "counter": "1660771",
    "groups": {
        "1": {
            "members": {
                "1": {
                    "id": "1",
                    "groupId": 1,
                    "addr": "localhost:7080",
                    "leader": true,
                    "lastUpdate": "1598330419"
                }
            },
            "tablets": {
                "action": {
                    "groupId": 1,
                    "predicate": "action"
                },
                ... predicates here 
            },
            "snapshotTs": "3395461",
            "checksum": "7004392364713691865"
        },
        "2": {
            "members": {
                "2": {
                    "id": "2",
                    "groupId": 2,
                    "addr": "IPADDR:7080",
                    "leader": true,
                    "lastUpdate": "1597913245"
                }
            }
        }
    },
    "zeros": {
        "1": {
            "id": "1",
            "addr": "localhost:5080",
            "leader": true
        }
    },
    "maxLeaseId": "952307",
    "maxTxnTs": "3450000",
    "maxRaftId": "2",
    "cid": "ddee15d0-416e-4b14-ac4f-fdad40d18875",
    "license": {
        "maxNodes": "18446744073709551615",
        "expiryTs": "1586669495"
    }
}

@M.Lau,

From the state information, you have two groups ("1" and "2"). You might have arrived in this state because you forgot to stop the old Alpha. Since you started another Alpha without stopping the first one, the Zero node might have done rebalancing.

Yes. I think it is because I accidentally started a Dgraph Alpha in the wrong directory, without the p and w folders, which resulted in 2 groups registered at Zero.

I removed the group by running: curl "localhost:6080/removeNode?group=2&id=2"
because I noticed that group 1, id 1, is holding the tablets…

I was able to run a query in Ratel successfully.

But when I tried to run a mutation subsequently and Ratel gave this error message:
{
    "name": "t",
    "url": "http://localhost:8080/mutate?commitNow=true",
    "errors": [
        {
            "message": "cannot retrieve predicate information: No connection exists",
            "extensions": {
                "code": "ErrorInvalidRequest"
            }
        }
    ]
}

can you help me please?

my current zero state is:
{
    "counter": "2181",
    "groups": {
        "1": {
            "members": {
                "1": {
                    "id": "1",
                    "groupId": 1,
                    "addr": "localhost:7080",
                    "leader": true,
                    "lastUpdate": "1598603128"
                }
            },
            "tablets": { ...(... removed ...) }
        },
        "2": {}
    },
    "zeros": {
        "1": {
            "id": "1",
            "addr": "localhost:5080",
            "leader": true
        }
    },
    "maxLeaseId": "1242364",
    "maxTxnTs": "30000",
    "maxRaftId": "2",
    "removed": [
        {
            "id": "2",
            "groupId": 2,
            "addr": "IPADR:7080",
            "leader": true,
            "lastUpdate": "1598597463"
        }
    ],
    "cid": "2793f804-3c1a-4b47-8da3-d72055972c53",
    "license": {
        "maxNodes": "18446744073709551615",
        "expiryTs": "1600924445",
        "enabled": true
    }
}

my dgraph alpha /health is showing that it is online:
curl localhost:8080/health

{"version":"v1.2.6","instance":"alpha","uptime":574}

I solved the problem by:

  1. getting maxLeaseId from curl localhost:6080/state
  2. shutting down dgraph zero & dgraph alpha
  3. deleting the zw folder
  4. restarting dgraph zero & dgraph alpha
  5. remembering to run:
    curl "localhost:6080/assign?what=uids&num=<maxLeaseId>"

This allowed me to insert (run mutations) into the Alpha.
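Put together, the extraction step looks roughly like this. This is a sketch: the /state response below is a cut-down, hypothetical sample (not the real one), so the parsing can be checked offline.

```shell
# On a live cluster the state would come from:  curl -s localhost:6080/state
# Here we use a sample (hypothetical) response instead:
state='{"maxLeaseId":"952307","maxTxnTs":"3450000"}'

# Pull the numeric maxLeaseId out of the JSON:
MAX_LEASE=$(echo "$state" | grep -oE '"maxLeaseId":"[0-9]*"' | grep -oE '[0-9]+')
echo "$MAX_LEASE"

# After stopping Zero and Alpha, deleting the zw folder, and restarting,
# re-lease UIDs past the old maximum. The quotes around the URL matter:
# without them the shell treats '&' as a background operator and the
# second query parameter is silently dropped.
#   curl "localhost:6080/assign?what=uids&num=${MAX_LEASE}"
```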

I hope this helps other users who are using a single-host setup.
How to replicate the problem:

  1. Start dgraph zero, dgraph alpha single host
  2. Shut down dgraph alpha
  3. cd to another directory
  4. start dgraph alpha
  5. shut down the wrong dgraph alpha
  6. cd to the original directory
  7. start the original dgraph alpha
  8. removeNode?group=2&id=2
  9. when running a mutation, you get the "No connection exists" error.
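For anyone trying to replicate this, the commands would look roughly like the following. This is a sketch: the directory names are illustrative, and the ports are the defaults (Zero gRPC on 5080, Zero HTTP on 6080).

```shell
# Terminal 1: start Zero and leave it running
dgraph zero

# Terminal 2: start Alpha in the correct data directory, then stop it (Ctrl-C)
cd /data/dgraph && dgraph alpha --zero localhost:5080

# Accidentally start Alpha from a different directory: with no p/w folders
# there, it registers with Zero as a brand-new node and gets its own group
cd /tmp/wrong-dir && dgraph alpha --zero localhost:5080

# Stop it, go back to the original directory, restart the original Alpha,
# then try to drop the now-empty group:
curl "localhost:6080/removeNode?group=2&id=2"
```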

Since the old Dgraph Alpha had been shut down, there was no transfer of tablets into the new Dgraph Alpha that was started in the wrong directory. All you need to do is shut down the new Dgraph Alpha, shut down Dgraph Zero, delete zw, and restart Dgraph Zero and Dgraph Alpha in the right directory.

I am not sure if this is a bug, but apparently Dgraph Zero is supposed to automatically remove groups that don’t have any Alphas in them. In my case, when I queried the state of Zero, I got 2 groups; the 2nd group shows "2": {}, which means there is no Alpha inside. So I think Dgraph Zero didn’t remove the 2nd group, even after I ran removeNode. I guess the existence of 2 groups caused Zero to try to shard, and the Dgraph Alpha was unable to serve the tablets, so no mutation could happen.
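A quick offline way to spot such an empty group in the /state JSON — shown here on a cut-down, hypothetical response:

```shell
# A group that appears as "2":{} has no Alpha members left in it.
state='{"groups":{"1":{"members":{"1":{"addr":"localhost:7080"}}},"2":{}}}'

# Count groups whose value is an empty object:
empty=$(echo "$state" | grep -oE '"[0-9]+":\{\}' | wc -l)
echo "$empty"
```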