Moved from GitHub pydgraph/75
Posted by d4l3k:
I’ve been trying to use pydgraph to load data for training a PyTorch model, but the pydgraph client doesn’t cope well with multiprocessing/forking. If you make a call via pydgraph before forking, it throws errors as soon as concurrent queries are made in the subprocesses.
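The failure only shows up when the parent process has already made a pydgraph (gRPC) call before forking. One possible workaround is to keep the parent free of gRPC activity and run any one-off query in a short-lived child process instead. This is my own assumption, not a pydgraph or PyTorch API: `run_in_child` is a hypothetical helper built on the standard library's `multiprocessing` with the POSIX-only "fork" start method. It returns whatever picklable value the callable produces and re-raises errors in the parent.

```python
import multiprocessing as mp
from typing import Any, Callable


def run_in_child(fn: Callable[[], Any], timeout: float = 30.0) -> Any:
    """Run fn() in a short-lived forked child and return its picklable result.

    Hypothetical helper: the parent never runs fn itself, so it never
    initializes per-process state such as gRPC channels. Later forks
    (e.g. DataLoader workers) then start from that untouched parent.
    """
    ctx = mp.get_context("fork")  # POSIX only; "fork" does not pickle the target
    results = ctx.Queue()

    def target() -> None:
        try:
            results.put(("ok", fn()))
        except Exception as exc:  # forward the failure to the parent as text
            results.put(("err", repr(exc)))

    proc = ctx.Process(target=target)
    proc.start()
    # Read before join() so a large result cannot block the child's exit.
    # If the child dies without reporting, this raises queue.Empty after timeout.
    status, value = results.get(timeout=timeout)
    proc.join()
    if status == "err":
        raise RuntimeError(f"child process failed: {value}")
    return value
```

A sanity-check query that would otherwise run in the parent could be wrapped as `run_in_child(lambda: some_query())`, where `some_query` stands for whatever pydgraph call you need. Because dropping the parent-side call avoids the crash, this should let the DataLoader workers fork from a process that never made a gRPC request.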
import torch
from torch.utils.data import Dataset, DataLoader
import pydgraph

def dgraph_client():
    stub = pydgraph.DgraphClientStub('localhost:9080')
    return pydgraph.DgraphClient(stub)

class GraphDataset(Dataset):
    def __init__(self):
        super().__init__()
        self.docs = list(range(100))

    def __len__(self) -> int:
        return len(self.docs)

    def __getitem__(self, i):
        resp = dgraph_client().txn(read_only=True).query(
            """{
                user(func: has(username), first: 1) {
                    uid
                    username
                }
            }""",
        )
        print(resp)
        return torch.tensor([i])

train_dataset = GraphDataset()
train_dataset[0]  # removing this line fixes this code
train_loader = DataLoader(train_dataset, batch_size=8, num_workers=8)
# running multiple dgraph requests in parallel causes the crash
print(next(iter(train_loader)))
Output
json: "{\"user\":[{\"uid\":\"0x2\",\"username\":\"blah\"}]}"
txn {
  start_ts: 1177930
}
latency {
  parsing_ns: 17013
  processing_ns: 308037700
  encoding_ns: 920495
}
Exception ignored in: <function _Rendezvous.__del__ at 0x7f182b90fb70>
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/grpc/_channel.py", line 436, in __del__
    with self._state.condition:
AttributeError: '_Rendezvous' object has no attribute '_state'
Traceback (most recent call last):
  File "repro.py", line 38, in <module>
    print(next(iter(train_loader)))
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/usr/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: __init__() missing 3 required positional arguments: 'call', 'response_deserializer', and 'deadline'
This seems to be caused by an issue in gRPC. Searching for the error didn’t turn up anything from the gRPC project, so it might be specific to how pydgraph uses it.
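If the root cause is gRPC itself, my understanding of gRPC's documentation is that its Python package is not fork-safe by default. It offers opt-in, experimental fork support through the `GRPC_ENABLE_FORK_SUPPORT` environment variable, documented for the `epoll1` and `poll` polling strategies on Linux. I haven't verified that grpcio 1.22.0 handles this particular case, so treat this as a sketch. `enable_grpc_fork_support` is a hypothetical helper that sets those variables. It has to run before `grpc`, and therefore `pydgraph`, is imported, since gRPC reads them when it initializes.

```python
import os
import sys


def enable_grpc_fork_support() -> None:
    """Opt in to gRPC's experimental fork support (hypothetical helper).

    gRPC reads these variables when it is first imported/initialized, so this
    must run before anything imports grpc (pydgraph does so on import).
    """
    if "grpc" in sys.modules:
        raise RuntimeError(
            "call enable_grpc_fork_support() before importing grpc or pydgraph"
        )
    # setdefault keeps any value already exported in the shell.
    os.environ.setdefault("GRPC_ENABLE_FORK_SUPPORT", "true")
    # Fork support is documented for the epoll1 and poll polling strategies.
    os.environ.setdefault("GRPC_POLL_STRATEGY", "epoll1")
```

In the repro this would mean calling it at the very top of `repro.py`, before `import pydgraph`, or equivalently exporting the two variables in the shell before starting Python.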
It’d be really nice if torch’s DataLoader were well supported, since that would make graph learning much easier. I’m very excited to use Dgraph for ML.
Versions:
Name: pydgraph
Version: 1.2.0
Summary: Official Dgraph client implementation for Python
Home-page: https://github.com/dgraph-io/pydgraph
Author: Dgraph Labs
Author-email: contact@dgraph.io
License: Apache License, Version 2.0
Location: /usr/lib/python3.7/site-packages
Requires: grpcio, protobuf
Required-by:
---
Name: grpcio
Version: 1.22.0
Summary: HTTP/2-based RPC framework
Home-page: https://grpc.io
Author: The gRPC Authors
Author-email: grpc-io@googlegroups.com
License: Apache License 2.0
Location: /usr/lib/python3.7/site-packages
Requires: six
Required-by: tensorflow, tensorflow-serving-api, tensorflow-serving-api-gpu, tensorboard, pydgraph
---
Name: protobuf
Version: 3.7.0
Summary: Protocol Buffers
Home-page: https://developers.google.com/protocol-buffers/
Author: None
Author-email: None
License: 3-Clause BSD License
Location: /usr/lib/python3.7/site-packages
Requires: six, setuptools
Required-by: tensorflow, tensorflow-serving-api, tensorflow-serving-api-gpu, tensorboardX, tensorboard, pydgraph, googleapis-common-protos, google-api-core
---
Name: torch
Version: 1.1.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Location: /usr/lib/python3.7/site-packages
Requires: numpy
Required-by: torchvision