Dgraph start to panic out of nowhere on Darwin

Report a Dgraph Bug

What version of Dgraph are you using?

Dgraph version   : v20.11.0
Dgraph codename  : tchalla
Dgraph SHA-256   : 8591d39e46caf64c1f8b9d8c8dcdb1afe7a61bbe5d9e08cd597faa4978717544
Commit SHA-1     : c4245ad55
Commit timestamp : 2020-12-16 15:55:40 +0530
Branch           : HEAD
Go version       : go1.15.5
jemalloc enabled : false

Have you tried reproducing the issue with the latest release?

Just tchalla

What is the hardware spec (RAM, OS)?

Darwin - 11.1 (20C69) - Bigsur
32 GB DDR4

Steps to reproduce the issue (command/config used to run Dgraph).

Just start Dgraph.
First, it was working fine for days - I was using GraphQL. Now even removing the directories(rm -fr p w zw) it panics.

Kind of solved by stopping all related dev envs. Stopping the actual Dgraph instances, removing the folders, stopping the React dev, stopping the JS server, stopping the GraphiQL software. And it looks like it was something related to the Subscription. Not sure.

  "errors": [
      "message": "Internal Server Error - a panic was trapped.  This indicates a bug in the GraphQL server.  A stack trace was logged.  Please let us know by filing an issue with the stack trace."
I1229 22:58:18.615857   90384 log.go:34] Writes flushed. Stopping compactions now...
I1229 22:58:18.625141   90384 log.go:34] DropPrefix done
I1229 22:58:18.625160   90384 log.go:34] Resuming writes
I1229 22:58:18.625226   90384 index.go:942] Rebuilding index for attr dgraph.graphql.p_sha256hash and tokenizers [exact]
I1229 22:58:18.626944   90384 groups.go:159] Server is ready
I1229 22:58:18.626962   90384 access_ee.go:389] ResetAcl closed
I1229 22:58:18.626976   90384 access_ee.go:311] RefreshAcls closed
I1229 22:58:18.695223   90384 log.go:34] Rebuilding index for predicate dgraph.graphql.p_sha256hash (1/2): Sent data of size 0 B
I1229 22:58:18.712526   90384 mutation.go:188] Done schema update predicate:"dgraph.graphql.p_sha256hash" value_type:STRING directive:INDEX tokenizer:"exact" 
I1229 22:58:18.712570   90384 draft.go:124] Operation completed with id: opIndexing
I1229 22:58:18.829679   90384 graphql.go:39] ResetCors closed
I1229 22:58:18.830431   90384 run.go:819] Updating cors from subscription.
I1229 22:58:18.830447   90384 run.go:843] Updating cors origins: [*]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x4a05c8d]

goroutine 239 [running]:
github.com/dgraph-io/dgraph/graphql/schema.(*field).HasCustomDirective(0xc0000d1320, 0x8, 0x10)
	/ext-go/1/src/github.com/dgraph-io/dgraph/graphql/schema/wrappers.go:965 +0x4d
github.com/dgraph-io/dgraph/graphql/resolve.validateCustomFieldsRecursively(0x5522080, 0xc0000d1320, 0x9, 0x10)
	/ext-go/1/src/github.com/dgraph-io/dgraph/graphql/resolve/resolver.go:555 +0x4c
github.com/dgraph-io/dgraph/graphql/resolve.(*RequestResolver).ValidateSubscription(0xc00038cf00, 0xc000724000, 0x2, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/graphql/resolve/resolver.go:544 +0x129
github.com/dgraph-io/dgraph/graphql/subscription.(*Poller).AddSubscriber(0xc0004d9050, 0xc000724000, 0xc006403b88, 0x0, 0x0, 0x0)
	/ext-go/1/src/github.com/dgraph-io/dgraph/graphql/subscription/poller.go:71 +0x87
github.com/dgraph-io/dgraph/graphql/web.(*graphqlSubscription).Subscribe(0xc000010350, 0x5507b00, 0xc0000d0ab0, 0xc00130c000, 0xbc, 0xc00061a610, 0x7, 0xc0000d0a50, 0x8, 0x203000, ...)
	/ext-go/1/src/github.com/dgraph-io/dgraph/graphql/web/http.go:173 +0x1a7
github.com/dgraph-io/graphql-transport-ws/graphqlws/internal/connection.(*connection).readLoop(0xc0000d0810, 0x5507a40, 0xc000110c80, 0xc00000f900)
	/go/pkg/mod/github.com/dgraph-io/graphql-transport-ws@v0.0.0-20200916064635-48589439591b/graphqlws/internal/connection/connection.go:191 +0x71d
github.com/dgraph-io/graphql-transport-ws/graphqlws/internal/connection.Connect(0x550a0c0, 0xc0000ea2c0, 0x54d6fe0, 0xc000010350, 0x0, 0x0, 0x0, 0xced0263f)
	/go/pkg/mod/github.com/dgraph-io/graphql-transport-ws@v0.0.0-20200916064635-48589439591b/graphqlws/internal/connection/connection.go:97 +0x209
created by github.com/dgraph-io/graphql-transport-ws/graphqlws.NewHandlerFunc.func1
	/go/pkg/mod/github.com/dgraph-io/graphql-transport-ws@v0.0.0-20200916064635-48589439591b/graphqlws/http.go:33 +0x1bc
W1229 22:58:21.109191   90378 sentry_integration.go:140] unable to read CID from file /var/folders/sw/r5b83mfx5xsgz5rxczw519y40000gn/T//dgraph-alpha-cid-sentry open /var/folders/sw/r5b83mfx5xsgz5rxczw519y40000gn/T//dgraph-alpha-cid-sentry: no such file or directory. Skip
[Sentry] 2020/12/29 22:58:21 Sending fatal event [eea86ca9a8764dbab8de18730b0b64a1] to o318308.ingest.sentry.io project: 1805390
[Sentry] 2020/12/29 22:58:22 Buffer flushed successfully.
➜  ~ 

Error in the React application


Dgraph is creating a new folder, which is weird.

Screen Shot 2020-12-29 at 23.08.38

Lol, here I thought that mysterious t folder was intentional. Should have mentioned it in my last report.

The t folder is intentional. It was introduced recently to store temporary buffers.

Reference: https://github.com/dgraph-io/dgraph/blob/a57470e06e88530897e651c064ea70e42dbee6c1/dgraph/cmd/alpha/run.go#L106

1 Like

Nice, but folders with a single character are really confusing. Why we don’t use the whole name?
postings*, wall, zwall, tmp. It is more human friendly.

1 Like

Yes, Even I feel that will be nicer. Tagging @ahsan as he introduced this folder.

The panic seems to have happened due to some ongoing subscription query. The subscription query did not exist in the newly started dgraph instance and so, it panicked. Can you try shutting down any ongoing subscription queries and then restarting dgraph ? The subscription queries could again be restarted once schema has been provided to new dgraph intsance.

Yeah, what is funny is that the panic starts using a normal query. I have some subscriptions but only that one triggers the panic easily. Its weird. And sometimes is from nothing.

I’ve done this.

It first happened in a fully filled DB.

What are the steps to reproduce this ?

I can reproduce via https://github.com/MichelDiz/Grabber

It happens in two forms. With a full filled DB and a DB started from scratch.

With the empty DB I can reproduce via https://github.com/MichelDiz/Grabber/blob/49a57e585eb8d588a51d81269e5f94014193e236/googlelogin/src/pages/Settings/components/Settings/index.jsx#L23 - Just start the Deno server and the React one. Go to the Settings page and it happens.

With a full filled DB is a bit different to reproduce. It happened when I tried to add this mutation https://github.com/MichelDiz/Grabber/blob/main/src/graphql/addLiveBroadcast.graphql But only when the input is malformed. So instead of a malformed error, it throws a panic. And when it happened in the first time I couldn’t recover the cluster.