Finding dead / orphaned / stranded nodes

tamethecomplex · November 1, 2017, 3:16pm

Hi all,

The size of my dgraph data directories is growing at a rate that doesn’t make sense given the content I am creating. It’s likely that there is some bug in my application code that is failing to delete some edges and leaving them stranded, thus cluttering the database with unneeded data.

Ideally, I would have a way to:

Get a list of nodes with more than X outbound predicates (for all predicates)
Get the total count of a given predicate type in the system, and ideally an edge count by predicate for all predicates in the system

Is there a way to accomplish this currently?

EDIT: Also, any other tips on analyzing what fraction of disk usage goes to indexes vs data would be great. My compressed export *.rdf.gz is only 186K, but my ./p directory is 119M and ./w directory is 70M. Scaling proportionally to production scale data may take us into terabyte territory, so I'm trying to figure out what may be going on. Even when I write, and then delete / replace edges, the disk usage only seems to go up. Not to say it isn’t some bug in my program, but it’s hard to tell where the source of growth is.

peter · November 1, 2017, 11:32pm

Those are all great ideas, and it’s definitely the kind of feedback that we’re interested in. There isn’t really a way to do anything like that at the moment, although it would be possible to build an analysis tool to get that sort of information (while dgraph is offline).
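
As a rough illustration of what such an offline analysis could look like, here is a minimal Python sketch that works on the RDF export rather than on the live data directories. It assumes each line of the export is an N-Quad-style triple of the form `<subject> <predicate> object .`; the function names (`edge_stats`, `heavy_nodes`, `edge_stats_from_export`) are hypothetical and not part of Dgraph. It counts edges per predicate and lists subjects with more than X outbound edges.

```python
import gzip
import re
from collections import Counter

# Matches the subject and predicate at the start of an N-Quad-style line,
# e.g. `<_:uid1> <name> "Alice" .` (assumed export format, not verified
# against every Dgraph version).
TRIPLE_RE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s+')


def edge_stats(lines):
    """Return (edge count per predicate, outbound edge count per subject)."""
    per_predicate = Counter()
    per_subject = Counter()
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m is None:
            continue  # skip blank lines and anything that does not parse
        subject, predicate = m.group(1), m.group(2)
        per_predicate[predicate] += 1
        per_subject[subject] += 1
    return per_predicate, per_subject


def heavy_nodes(per_subject, threshold):
    """Return subjects with more than `threshold` outbound edges, sorted."""
    return sorted(s for s, n in per_subject.items() if n > threshold)


def edge_stats_from_export(path):
    """Convenience wrapper for a gzip-compressed export file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return edge_stats(fh)
```

Calling `edge_stats_from_export` on the `*.rdf.gz` file and printing `per_predicate.most_common()` would give an edge count by predicate. Note that this only reflects what the export contains, so it will not show stale data that is still occupying disk space.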

Disk usage going up while you delete and replace edges is an issue that we’ve seen before. It’s probably not a bug in your code. Dgraph’s value log garbage collection is currently very simple: it just executes once every 10 minutes. When the garbage collection triggers, you should see disk usage come back down. We plan to make the value log garbage collection a bit smarter in upcoming releases.
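
To check whether disk usage really does come back down, one option is to sample the size of the data directories over time. The following is a minimal sketch, not a Dgraph tool: the function name `usage_by_extension` is hypothetical, and it simply sums file sizes under a directory such as `./p` or `./w`, grouped by file extension, so that growth can be attributed to a particular kind of file.

```python
import os
from collections import Counter


def usage_by_extension(root):
    """Sum file sizes (in bytes) under `root`, grouped by file extension."""
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # file may have been removed mid-scan (e.g. by GC)
            ext = os.path.splitext(name)[1] or "(none)"
            totals[ext] += size
    return totals
```

Running this on `./p` and `./w` before and after a garbage collection cycle, and comparing the totals, should show whether space is actually being reclaimed.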

mrjn · November 2, 2017, 4:05am

At the same time, we’re building tools in Badger (the embedded DB) which can run fast offline GC. That’d be one way to quickly fix the space usage, though it is not a substitute for the online GC that we’re going to improve on.

system · December 2, 2017, 4:05am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
