The size of my dgraph data directories is growing at a rate that doesn’t make sense given the content I am creating. It’s likely that there is some bug in my application code that fails to delete some edges, leaving them stranded and cluttering the database with unneeded data.
Ideally, I would have a way to:
- Get a list of nodes with more than X outbound edges (across all predicates)
- Get the total edge count for a given predicate and, ideally, an edge count broken down by predicate for every predicate in the system
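For context, the closest I’ve been able to approximate so far is per-predicate queries in DQL (a sketch, not a full solution: `myPredicate` is a placeholder, the second query assumes a `@count` index on that predicate, and each query has to be repeated for every predicate rather than covering all of them at once):

```
# Count of nodes that carry a given predicate (node count, not edge count):
{
  nodesWithPred(func: has(myPredicate)) {
    count(uid)
  }
}

# Nodes with more than 100 outbound edges for one predicate.
# Assumes the schema declares: myPredicate: [uid] @count .
{
  heavyNodes(func: gt(count(myPredicate), 100)) {
    uid
    count(myPredicate)
  }
}
```

What I haven’t found is a way to do this across all predicates in a single query, which is what I’d really need to spot stranded edges.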
Is there a way to accomplish this currently?
EDIT: Any other tips on analyzing what fraction of disk usage goes to indexes versus data would also be great. My compressed export *.rdf.gz is only 186K, but my ./p directory is 119M and my ./w directory is 70M. Scaled proportionally to production-size data, that could take us into terabyte territory, so I’m trying to figure out what may be going on. Even when I write and then delete or replace edges, the disk usage only seems to go up. That’s not to say it isn’t a bug in my program, but it’s hard to tell where the growth is coming from.
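So far I’ve only been eyeballing sizes with basic tools, roughly like this (a sketch; the path argument is a placeholder for wherever your ./p and ./w directories live, and the *.sst / *.vlog note assumes badger’s usual file layout):

```shell
# Report the on-disk footprint of the dgraph data directories.
report_dgraph_dirs() {
  base="${1:-.}"
  for d in "$base/p" "$base/w"; do
    if [ -d "$d" ]; then
      # total size of the directory
      du -sh "$d"
      # largest files first; in badger's layout this hints at whether
      # *.sst (LSM tables) or *.vlog (value log) files dominate
      ls -lhS "$d" | head -n 5
    else
      echo "not found: $d"
    fi
  done
}

# example usage (placeholder path):
report_dgraph_dirs .
```

That at least shows where the bytes are, but it doesn’t tell me how much is indexes versus data, which is the part I’m stuck on.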