[bug] doesn't handle forking well

Moved from GitHub pydgraph/75

Posted by d4l3k:

I’ve been trying to use pydgraph to load data to train a PyTorch model, but the pydgraph client really doesn’t like multiprocessing/forking. If the parent process makes a call via pydgraph before forking, any concurrent queries made in the subprocesses throw errors.

import torch
from torch.utils.data import Dataset, DataLoader
import pydgraph

def dgraph_client():
    stub = pydgraph.DgraphClientStub('localhost:9080')
    return pydgraph.DgraphClient(stub)


class GraphDataset(Dataset):
    def __init__(self):
        super().__init__()

        self.docs = list(range(100))

    def __len__(self) -> int:
        return len(self.docs)

    def __getitem__(self, i):
        resp = dgraph_client().txn(read_only=True).query(
            """{
                user(func: has(username), first: 1) {
                   uid
                   username
                }
            }""",
        )
        print(resp)
        return torch.tensor([i])


train_dataset = GraphDataset()
train_dataset[0] # removing this line fixes this code

train_loader = DataLoader(train_dataset, batch_size=8, num_workers=8)
# running multiple dgraph requests in parallel causes the crash
print(next(iter(train_loader)))

Output

json: "{\"user\":[{\"uid\":\"0x2\",\"username\":\"blah\"}]}"
txn {
  start_ts: 1177930
}
latency {
  parsing_ns: 17013
  processing_ns: 308037700
  encoding_ns: 920495
}

Exception ignored in: <function _Rendezvous.__del__ at 0x7f182b90fb70>
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/grpc/_channel.py", line 436, in __del__
    with self._state.condition:
AttributeError: '_Rendezvous' object has no attribute '_state'
Traceback (most recent call last):
  File "repro.py", line 38, in <module>
    print(next(iter(train_loader)))
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: __init__() missing 3 required positional arguments: 'call', 'response_deserializer', and 'deadline'

Seems to be caused by some issue with grpc. Googling the error didn’t pull anything up from the grpc project so it might be specific to how pydgraph uses it.

It’d be really nice if torch’s DataLoader was well supported since it’ll make graph learning much easier to do. I’m very excited to use dgraph for ML :)

Versions:

Name: pydgraph
Version: 1.2.0
Summary: Official Dgraph client implementation for Python
Home-page: https://github.com/dgraph-io/pydgraph
Author: Dgraph Labs
Author-email: contact@dgraph.io
License: Apache License, Version 2.0
Location: /usr/lib/python3.7/site-packages
Requires: grpcio, protobuf
Required-by:
---
Name: grpcio
Version: 1.22.0
Summary: HTTP/2-based RPC framework
Home-page: https://grpc.io
Author: The gRPC Authors
Author-email: grpc-io@googlegroups.com
License: Apache License 2.0
Location: /usr/lib/python3.7/site-packages
Requires: six
Required-by: tensorflow, tensorflow-serving-api, tensorflow-serving-api-gpu, tensorboard, pydgraph
---
Name: protobuf
Version: 3.7.0
Summary: Protocol Buffers
Home-page: https://developers.google.com/protocol-buffers/
Author: None
Author-email: None
License: 3-Clause BSD License
Location: /usr/lib/python3.7/site-packages
Requires: six, setuptools
Required-by: tensorflow, tensorflow-serving-api, tensorflow-serving-api-gpu, tensorboardX, tensorboard, pydgraph, googleapis-common-protos, google-api-core
---
Name: torch
Version: 1.1.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/lib/python3.7/site-packages
Requires: numpy
Required-by: torchvision

martinmr commented:

Yes, this issue is known and is in fact related to gRPC (see "python grpc server with multiprocessing fails", issue #16001 in grpc/grpc on GitHub). As far as I know, the gRPC team has no plans to fix this, so unfortunately there’s not much we can do about it. The real issue is the lack of decent multithreading in Python (the existing solutions seem to mimic threads by spawning new processes).

d4l3k commented:

Might want to document the behavior of grpc somewhere. See fork_support.md on the master branch of grpc/grpc on GitHub.

I ended up working around this by creating a multiprocessing.Pool on program launch and then running all dgraph client requests in the pool’s subprocesses using apply. That way the parent doesn’t make any RPC calls and so never runs into the weird grpc forking behavior.
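
For anyone who wants to try the same workaround, here is a rough sketch of what it might look like. This is not the exact code from my project: the helper names (make_rpc_pool, run_query, call_in_worker) and the QUERY constant are made up for illustration, and it assumes a start method of "fork" (the Linux default at the time) and a Dgraph instance on localhost:9080 as in the repro above.

```python
import multiprocessing as mp

# Same query as in the repro; kept here only for illustration.
QUERY = """{
    user(func: has(username), first: 1) {
        uid
        username
    }
}"""


def make_rpc_pool(processes=1):
    """Create the worker pool.

    Call this at program launch, before the parent process has made
    any gRPC call of its own.
    """
    # Assumption: "fork" start method, as on Linux when this was posted.
    return mp.get_context("fork").Pool(processes=processes)


def run_query(query, addr="localhost:9080"):
    """Runs inside a pool worker, so the gRPC channel only lives there."""
    import pydgraph  # imported lazily: the parent never loads pydgraph

    stub = pydgraph.DgraphClientStub(addr)
    try:
        client = pydgraph.DgraphClient(stub)
        resp = client.txn(read_only=True).query(query)
        return resp.json  # raw JSON bytes, which pickle cleanly
    finally:
        stub.close()


def call_in_worker(pool, fn, *args):
    """Run fn(*args) in a pool worker and block until the result is back."""
    return pool.apply(fn, args)
```

With this in place, the parent would create the pool once at startup and then fetch data with something like call_in_worker(pool, run_query, QUERY) instead of calling pydgraph directly. Returning resp.json rather than the response object keeps the value that crosses the process boundary a plain bytes string.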