Finding dead / orphaned / stranded nodes

Hi all,

The size of my dgraph data directories is growing at a rate that doesn’t make sense given the content I am creating. It’s likely that there is some bug in my application code that is failing to delete some edges and leaving them stranded, thus cluttering the database with unneeded data.

Ideally, I would have a way to:

  1. Get a list of nodes with more than X outbound predicates (for all predicates)
  2. Get the total count of a given predicate type in the system, and ideally an edge count by predicate for all predicates in the system

Is there a way to accomplish this currently?

EDIT: Also, any other tips on analyzing what fraction of disk usage goes to indexes vs. data would be great. My compressed export *.rdf.gz is only 186K, but my ./p directory is 119M and my ./w directory is 70M. Scaling proportionally to production-scale data may take us into terabyte territory, so I’m trying to figure out what may be going on. Even when I write, and then delete / replace edges, the disk usage only seems to go up. Not to say it isn’t some bug in my program, but it’s hard to tell where the source of growth is.

Those are all great ideas; it’s definitely the kind of feedback that we’re interested in. There isn’t really a way to do anything like that at the moment, although it would be possible to build an analysis tool to get that sort of information (while Dgraph is offline).
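In the meantime, one rough workaround is to count edges in the export file rather than in the live database. The sketch below is not a Dgraph tool; it assumes the *.rdf.gz export holds one triple per line in the usual `<subject> <predicate> object .` form, and the function names (`count_edges`, `nodes_over`, `count_export`) are made up for illustration. It also treats "outbound predicates" as the total number of outbound edges per node.

```python
import gzip
import re
from collections import Counter

# Captures the subject and predicate at the start of a triple line, e.g.
#   <0x1> <name> "Alice" .
# Assumption: one triple per line, predicate written as <...>.
TRIPLE_RE = re.compile(r'^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s')

def count_edges(lines):
    """Return (edge count per predicate, outbound edge count per subject)."""
    by_predicate = Counter()
    by_subject = Counter()
    for line in lines:
        m = TRIPLE_RE.match(line)
        if m is None:
            # Blank line, comment, or a format this sketch does not handle.
            continue
        subject, predicate = m.group(1), m.group(2)
        by_predicate[predicate] += 1
        by_subject[subject] += 1
    return by_predicate, by_subject

def nodes_over(by_subject, threshold):
    """Subjects with strictly more than `threshold` outbound edges."""
    return sorted(s for s, n in by_subject.items() if n > threshold)

def count_export(path):
    """Run count_edges over a gzip-compressed export file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_edges(f)
```

For example, `by_pred, by_subj = count_export("export.rdf.gz")` (the path is a placeholder) followed by `by_pred.most_common()` gives an edge count per predicate, and `nodes_over(by_subj, 100)` lists the nodes with more than 100 outbound edges. Since this only sees what the export contains, it can reveal stranded edges but says nothing about on-disk overhead.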

Disk usage going up while you delete and replace edges is an issue that we’ve seen before. It’s probably not a bug in your code. Dgraph’s value log garbage collection is very simple right now: it just executes once every 10 minutes. When the garbage collection triggers, you should see disk usage come back down. We plan to make the value log garbage collection a bit smarter in upcoming releases.
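To check whether usage really does drop after a garbage collection pass, it may help to sample the size of the data directories before and after. This is only a minimal sketch using the Python standard library; the helper names `dir_size` and `report` are hypothetical, and the ./p and ./w paths are the ones mentioned in the question.

```python
import os

def dir_size(path):
    """Total size in bytes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if os.path.isfile(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file may have been removed while we were walking.
                continue
    return total

def report(paths):
    """Map each existing directory to its size in MiB."""
    return {p: dir_size(p) / (1024 * 1024) for p in paths if os.path.isdir(p)}
```

Calling `report(["./p", "./w"])` every few minutes and comparing the numbers should show whether the growth is steady or whether it falls back after each collection.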

At the same time, we’re building tools in Badger (the embedded DB) that can run a fast offline GC. That would be one way to quickly fix the space usage, though it is not a substitute for the online GC that we’re going to improve.
