Dgraph v20.07.0 / v20.03.0 unreliability in Mac OS environment

Report a Dgraph Bug

What version of Dgraph are you using?

v20.03.0 and v20.07.0

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

macOS 10.13.6, 16 GB RAM
Docker Desktop 2.3.0.4 with Engine 19.03.12

Steps to reproduce the issue (command/config used to run Dgraph).

run testcase “testConfRef” from OS project

Expected behaviour and actual result.

Test should run with result:
Creating Eventmanager(dgraph) for confref
storing 250 events to dgraph
done after 6.1 secs

This works reproducibly with v20.03.0.

In v20.07.0 the result may be e.g.:

<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNKNOWN
	details = "strconv.ParseInt: parsing "btw": invalid syntax"
	debug_error_string = "{"created":"@1597214956.558346000","description":"Error received from peer ipv6:[::1]:9080","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"strconv.ParseInt: parsing "btw": invalid syntax","grpc_status":2}"
>

for a node that would be parsed o.k. in version v20.03.0 as:

{
  "data": {
    "events": [
      {
        "identifier": "btw2021",
        "uid": "0x6",
        "city": "Dresden",
        "year": 2021,
        "acronym": "BTW",
        "url": "http://portal.confref.org/list/btw2021"
      },

This node might actually parse OK on a second attempt in v20.07.0, but then the server might become unavailable. Please understand that I am not going to look into the details of the regression; this is just to let you know that IMHO something fishy is going on here. I even modified my start script to clean the whole ~dgraph directory before running the tests.


See “avoid regression as per https://discuss.dgraph.io/t/dgraph-v20-07-0-u… · WolfgangFahl/ProceedingsTitleParser@ce70013 · GitHub” for the modification. I’ll now try two Travis runs: one against v20.03.0 and one against v20.07.0.

The v20.03.0 Travis run (Travis CI - Test and Deploy Your Code with Confidence) was OK.

Since I am also having trouble with v20.03.0 in the Mac environment, I am not sure what is going on here. I will try to reduce the problem to a smaller test case in the project “GitHub - WolfgangFahl/DgraphAndWeaviateTest: Test GraphQL based Dgraph and Weaviate systems”.

The commit “adds testCities · WolfgangFahl/DgraphAndWeaviateTest@094904d · GitHub” now has an example that tries to batch-add cities. I am currently limiting it to 2000 entries with a batch size of 250. More often than not, the macOS environment fails.
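The batching described above (2000 entries, batch size 250) can be sketched roughly as follows. This is only a minimal illustration and not the code from the test project: the helper name `batches` and the placeholder `cities` list are hypothetical, and the actual mutation call to Dgraph is deliberately left out.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical placeholder data standing in for the city records.
cities = [{"name": f"city{i}"} for i in range(2000)]

# Each chunk would be sent to Dgraph as one mutation (not shown here).
chunks = list(batches(cities, 250))
print(len(chunks), "batches of", len(chunks[0]), "entries")
```

Sending one mutation per chunk keeps each request small, which makes it easier to see at which batch a failure such as "Socket closed" first appears.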

Travis CI - Test and Deploy Your Code with Confidence shows that the test project’s Python unit tests run fine in a Travis environment.

Running the tests against a local Ubuntu server also works.

Running v20.07.0 against a local macOS Docker image gives a message like the one below three times:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "Socket closed"
	debug_error_string = "{"created":"@1597224818.140421000","description":"Error received from peer ipv6:[::1]:9080","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Socket closed","grpc_status":14}"
>

v20.03.0 also fails, but the message appears only twice, so the problem seems to show up later.

Interesting to see another unavailable issue … I’m wondering if this is connected to my issue with the server suddenly becoming unavailable after a few hours running. I restart and things run again - but I’m also seeing a gradual increase in the memory footprint - it’s almost like the server runs out of resources, making me suspect that something’s not being cleared down, so after a while it can’t accept any more connections.

Whatever it is, it’s a relatively new thing because my code’s been stable until the latest updates. Now it fails every 24 hours. See Corrupt database issue if you think these might be related. If not, sorry for crashing the thread :slight_smile:

I wanted to verify the configuration:

  • macOS
    • client (pydgraph) on macOS host
    • server: dgraph running in docker containers
  • ubuntu
    • client (pydgraph)
    • server: dgraph in docker containers

The client is always running on the macOS host. I am working against different backends, on macOS Docker or Ubuntu Docker, to get things running. The macOS Docker environment is the unstable one. I have not yet increased the standard 2 GB RAM setting. Unfortunately I currently also have other issues that need to be fixed and clarified.

It’s quite annoying that the message

	status = StatusCode.UNKNOWN
	details = "strconv.ParseInt: parsing "btw": invalid syntax"
	debug_error_string = "{"created":"@1597307044.013794000","description":"Error received from peer ipv4:2.0.0.7:9080","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"strconv.ParseInt: parsing "btw": invalid syntax","grpc_status":2}"
>

also shows up using the Ubuntu server every once in a while. The alpha log then shows:

alpha.log:E0813 08:24:04.014427      54 draft.go:593] Applying proposal. Error: strconv.ParseInt: parsing "btw": invalid syntax. Proposal: "mutations:<group_id:1 start_ts:47 edges:<entity:10631 attr:\"acronym\" value:\"BTW\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10631 attr:\"area\" value_type:UID value_id:10632 > edges:<entity:10631 attr:\"city\" value:\"Dresden\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10631 attr:\"confSeries\" value_type:UID value_id:10630 > edges:<entity:10631 attr:\"country\" value:\"Germany\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10630 attr:\"dblpId\" value:\"https://dblp.org/db/conf/btw/\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10631 attr:\"endDate\" value:\"2021-01-01\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10631 attr:\"event\" value:\"btw2021\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10631 attr:\"eventId\" value:\"btw2021\" value_type:STRING value_id:18446744073709551615 > edges:<entity:10630 attr:\"id\" value:\"btw\" value_type:STRING > edges:<entity:10632 attr:\"id\" value:\"\\002\\000\\000\\000\\000\\000\\000\\000\" value_type:INT > edges:<entity:10631 attr:\"identifier\" value:\"btw2021\" value_type:STRING > edges:<entity:10631 attr:\"lookupAcronym\" value:\"BTW 2021\" value_type:STRING > edges:<entity:10630 attr:\"name\" value:\"Datenbanksysteme f&#252;r Business, Technologie und Web Datenbanksysteme in B&#252;ro, Technik und Wissenschaft\" value_type:STRING > edges:<entity:10631 attr:\"name\" value:\"Datenbanksysteme f&#252;r Business, Technologie und Web Datenbanksysteme in B&#252;ro, Technik und Wissenschaft\" value_type:STRING > edges:<entity:10631 attr:\"source\" value:\"confref\" value_type:STRING > edges:<entity:10631 attr:\"startDate\" value:\"2021-01-01\" value_type:STRING > edges:<entity:10631 attr:\"submissionExtended\" value:\"\\000\" value_type:BOOL > edges:<entity:10631 
attr:\"url\" value:\"http://portal.confref.org/list/btw2021\" value_type:STRING > edges:<entity:10632 attr:\"value\" value:\"Computer Science\" value_type:STRING > edges:<entity:10631 attr:\"year\" value:\"\\345\\007\\000\\000\\000\\000\\000\\000\" value_type:INT > metadata:<pred_hints:<key:\"acronym\" value:SINGLE > pred_hints:<key:\"area\" value:SINGLE > pred_hints:<key:\"city\" value:SINGLE > pred_hints:<key:\"confSeries\" value:SINGLE > pred_hints:<key:\"country\" value:SINGLE > pred_hints:<key:\"dblpId\" value:SINGLE > pred_hints:<key:\"endDate\" value:SINGLE > pred_hints:<key:\"event\" value:SINGLE > pred_hints:<key:\"eventId\" value:SINGLE > pred_hints:<key:\"id\" value:SINGLE > pred_hints:<key:\"identifier\" value:SINGLE > pred_hints:<key:\"keywords\" value:LIST > pred_hints:<key:\"lookupAcronym\" value:SINGLE > pred_hints:<key:\"name\" value:SINGLE > pred_hints:<key:\"ranks\" value:LIST > pred_hints:<key:\"source\" value:SINGLE > pred_hints:<key:\"startDate\" value:SINGLE > pred_hints:<key:\"submissionExtended\" value:SINGLE > pred_hints:<key:\"url\" value:SINGLE > pred_hints:<key:\"value\" value:SINGLE > pred_hints:<key:\"year\" value:SINGLE > > > key:\"01-956623720165124759\" index:3636 ".

The message does not make any sense to me. Why should any of the values starting with “btw” be interpreted as an integer?
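One detail that can be read directly from the proposal quoted above: the predicate "id" appears once with value_type:STRING (value "btw", entity 10630) and once with value_type:INT (entity 10632). Whether this is related to the ParseInt error is only an assumption and is not confirmed anywhere in this thread. A minimal sketch for listing such predicates from a proposal string follows; the function name `value_types_by_predicate` is hypothetical and the sample is a shortened excerpt of the log line.

```python
import re
from collections import defaultdict


def value_types_by_predicate(proposal: str) -> dict:
    """Map each predicate (attr) in a proposal dump to the set of value_types seen."""
    types = defaultdict(set)
    # Every edge starts with "edges:<"; the text before the first one is skipped.
    for piece in proposal.split("edges:<")[1:]:
        attr = re.search(r'attr:\\?"([^"\\]+)', piece)
        vtype = re.search(r'value_type:([A-Z]+)', piece)
        if attr and vtype:
            types[attr.group(1)].add(vtype.group(1))
    return dict(types)


# Shortened excerpt of the alpha.log proposal, kept in its escaped form.
sample = (
    r'edges:<entity:10630 attr:\"id\" value:\"btw\" value_type:STRING > '
    r'edges:<entity:10632 attr:\"id\" value:\"\\002\\000\\000\\000\\000\\000\\000\\000\" value_type:INT > '
    r'edges:<entity:10631 attr:\"acronym\" value:\"BTW\" value_type:STRING > '
)

# Keep only predicates that occur with more than one value type.
mixed = {p: sorted(t) for p, t in value_types_by_predicate(sample).items() if len(t) > 1}
print(mixed)  # {'id': ['INT', 'STRING']}
```

Running this against the full log line rather than the excerpt would show whether "id" is the only predicate used with mixed types in that mutation.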

You’ll even find this in the Travis log now:
https://travis-ci.org/github/WolfgangFahl/ProceedingsTitleParser/jobs/717536585

Note that this affects only one of the three runs; Python 3.7 and 3.8 run fine. I doubt that the issue is actually connected to the Python version. I’ll try a rerun to prove this. The behavior looks totally unpredictable, so I’d suspect a timing issue.

The Travis failure for Python 3.6 is repeatable, see Travis CI - Test and Deploy Your Code with Confidence. It’s not clear why the problem does not show up in 3.7 or 3.8.

Hi @WolfgangFahl, can you provide us some schema? It looks like you have some integer field in your schema that needs to be parsed to int.

It looks like acronym is the culprit here, as can be seen from the log message (attr:“acronym” value:“BTW”). From the source code, too, it looks like it is defined as a string. This makes me wonder: if that schema is used, why is Dgraph trying to parse it as an int?
But it works with Python 3.7 and Python 3.8, as you said. So maybe it is a Python or Dgraph client issue.

It’s totally random. Sometimes the Python 3.8 run fails and sometimes another one. So far it has never happened that all three were successful. IMHO there is something really fishy going on here. It does not make any sense to try to parse btw… as an int. The error message is probably misleading.


Regarding the failing Python runs: is this on macOS? Is Python installed with brew or with pyenv?