Is it possible to have a fully distributed database?

I just discovered Dgraph recently, and so far it looks really amazing. I’m working on a new product to connect blockchain technologies to internet technologies, and I need to share data across many nodes without any central point. I wonder whether dgraph could work under these conditions. I saw that it’s possible to create as many clusters as we want with dgraph zero and swarm or kubernetes, but that doesn’t exactly fit my needs.

In my case I have an application that runs on every computer (the users’ computers), and those applications need to send data over the network so that it propagates to the other computers. My thought was to create a dgraph zero on a dedicated server; then, every time a user starts the application, it creates a dgraph server that connects to this zero instance. When the local server is ready, the application reads and writes in the db and all modifications are propagated to the network. But I wonder what happens to the data if I have 1000 applications running and then, a minute later, just 100. Those dgraph servers may not be up all the time, and their number can vary from 10 to 1000+, so I would love some feedback on this kind of use case and on whether dgraph can be a decent match for it.

Thanks :slight_smile:

Hi @antho1404, just wondering - what is your motivation for running dgraph locally on users’ machines rather than in the cloud? Are you planning to actually host their data, or will it be hosted on an independent service?

I’m looking for a solution right now that maximizes user data privacy, and all of these blockchain technologies (like Blockstack) look intriguing… but I haven’t come to an understanding of whether similar user privacy could be implemented for apps running on top of dgraph. I was wondering if our goals might be similar.

The plan is to have no cloud and to run the database only on the users’ machines. The users will own the database, and the company behind the product will just be another user in the network. So we will not host the data by ourselves; all the users plus the company (as another user) will host it.

Of course privacy is important, and cryptography plays a big role in that: all the data needs to be encrypted so that only the entities allowed to access it can read it, and for everyone else it stays encrypted and therefore meaningless. This has to be implemented in an application on top of the database that writes encrypted data into it, so the whole database will be filled with data that is readable only by one or some of its users.

I hope I answered your questions; if you have more, let me know :slight_smile:

This would be incredible, and get rid of so much legal complexity associated with hosting user data.

I wonder if this would scale… Say if you have 1000 users accessing a single cluster and you have replication turned on, I guess the data would exist in 1000 different places? Also, seems it may be a challenge to reliably fire up a dgraph server on every user’s machine because of differing user environments. I wonder if something more like this could accomplish your goal:

  1. When a new TEAM is created (which will have many users), a new dgraph cluster is created for that team. But the cluster is not hosted by you (the application provider)… it is hosted on some group of servers you do not know the location of and do not have access to
  2. At the same time the cluster is being provisioned, the latest version of your app is run on the same machines running the dgraph cluster. This way the application has local network access to a dgraph instance for rapid app-to-database communication.

That way you accomplish your goal of not hosting any user data, while also getting rid of the technical problem of trying to run a dgraph instance on each user’s machine. (It probably also wouldn’t be practical on mobile.) All the user would need is a browser.
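On the replication question above, a rough back-of-the-envelope sketch may help. It assumes, as I understand the Dgraph deployment docs, that the replication factor is a cluster-wide setting on zero and that servers are assigned to groups of that size, each group holding one shard of the data. The function name, the dataset size and the even-sharding assumption are all hypothetical and only there for illustration.

```python
import math

def cluster_layout(num_servers: int, replicas: int, dataset_gb: float) -> dict:
    """Estimate groups, copies per record and storage for a cluster.

    Assumption: every group is fully populated and the data is spread
    evenly across groups (real clusters will be less tidy).
    """
    groups = math.ceil(num_servers / replicas)
    return {
        "groups": groups,
        "copies_of_each_record": replicas,
        "total_storage_gb": dataset_gb * replicas,
        # Each server stores the full shard of the group it belongs to.
        "storage_per_server_gb": dataset_gb / groups,
    }

# Every server is a replica of everything: 1000 full copies.
full = cluster_layout(num_servers=1000, replicas=1000, dataset_gb=10)

# Same 1000 servers, but a replication factor of 3: many small shards.
sharded = cluster_layout(num_servers=1000, replicas=3, dataset_gb=10)
```

Under these assumptions the data only exists in 1000 places if the replication factor itself is 1000; with a small factor the servers mostly hold different shards, which raises its own availability question when a whole group goes offline.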

Just put together a diagram of an initial thought of how this might work…

Another idea… let’s say you decide it’s too hard to host the data with a 3rd party and want to host it yourself. But per

Maybe you can host all of the data yourself, but encrypted and decrypted using a key you do not have, so you have the data but it cannot be read:

Thanks for all those ideas and the clear diagrams, it’s awesome. Those solutions are pretty much valid, but there is always the need to trust a 3rd party with the data, and this is what I want to get rid of for many reasons, like data control (even if the data is encrypted) or a single point of failure (everything is down if the 3rd party service is down).

I really like the idea of a group of users (the team). It’s a nice way to have light applications that don’t need a full synchronisation of the database, but they still rely on and trust a 3rd party. I wonder whether dgraph offers a way to control which data we sync and so keep a “light” db; in that case, even on mobile, it could be possible to have a light db that syncs only the data the client on the phone needs.

Anyway, thanks a lot @tamethecomplex for taking the time. I would love to continue this discussion directly, over Skype or something like that; if you want, you can reach me at anthony@mesg.tech :wink:

Also, about the scaling: that was actually my concern too. I understand that we can have many “clusters” for a single database (maybe 100 at most), but I don’t really know how this would work with a cluster; this is one of my biggest concerns. As for the environment, that is not really a problem anymore with tools like docker.

So I was thinking more about this… If you wanted to have a user group-hosted cluster with no 3rd party, and the app would be hosted strictly on user owned devices (or maybe devices leased through a separate agreement with a private hosting provider), is this more what you had in mind?:

Alternatively, if you wanted to allow users in the group who don’t necessarily host, or needed to access the app from devices which are not a cluster node, maybe:

No worries! I actually really need to find a solution to this problem as well, and it helps to talk it through :slight_smile:

Possibly in the future, or maybe on the dgraph slack channel, but would like to keep having the discussion here in hopes that we can tie it in to a couple threads already started (Multi-tenant solution for Dgraph and Ultimate user privacy scheme) for a holistic solution to this problem possibly involving new dgraph features :slight_smile:

Sorry for the late answer. Your first diagram in the last message is what I was thinking of, with just a few differences:

  • no license verification (everyone can put some data, maybe with a rate limit or a fee they have to pay every time, to avoid having people flood the database)
  • multiple dgraph zero instances on different servers around the world. From what I understood, the dgraph zeros are the resolvers for the network and handle peer discovery, so I don’t think we need one on each user’s machine; besides, if no user is online there is no resolver, so the first user wouldn’t know where to connect

I like your idea of load balancing between all the users’ applications. At first I was thinking of trusting just one user (which can be your own computer or server) and connecting to that one, but using the ones already on the network is a nice idea.

I still wonder whether dgraph can be a good solution in this situation, because users will join and leave the network, so we might really scale from 10 to 1000 and back to 10, and I’m not sure it works in that case. From what I read and understood, it’s designed to be distributed across your own servers, which stay up 99% of the time and whose number is quite low (1-100).
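The rate limit mentioned in the first bullet could be sketched as a token bucket kept per user in the application layer. This is only an illustration: the class, its parameters and the idea of placing it in front of the database writes are assumptions, not something dgraph provides.

```python
import time

class TokenBucket:
    """Allow short bursts of writes while capping the sustained rate."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so it can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the write may proceed."""
        now = self.clock()
        # Refill in proportion to the elapsed time, up to the capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical usage: one bucket per user id, checked before each write.
buckets = {}

def may_write(user_id):
    bucket = buckets.setdefault(user_id, TokenBucket(rate_per_sec=5, capacity=20))
    return bucket.allow()
```

In a network with no central point, each node would have to enforce such a limit on the writes it relays, which is part of why a fee per write is the other option mentioned in the bullet.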
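To make the churn concern concrete: dgraph replicates each group with Raft, and a Raft group can only accept writes while a majority of its configured members are reachable. Members that simply go offline still count toward that majority until they are explicitly removed. The sketch below is plain arithmetic with hypothetical function names, not dgraph code.

```python
def quorum_size(configured_members: int) -> int:
    """Smallest number of members that forms a majority."""
    return configured_members // 2 + 1

def can_commit(configured_members: int, online_members: int) -> bool:
    """True if enough members are online to acknowledge a write."""
    return online_members >= quorum_size(configured_members)

# A group configured with 1000 members as users close the application.
for online in (1000, 600, 100):
    print(online, "online ->", "writes ok" if can_commit(1000, online) else "stalled")
```

So if 1000 user machines were all replicas of one group, dropping to 100 online machines would stall writes until the membership was shrunk, which fits the reading that the design expects a small, stable set of servers.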

Just curious - how many users are you thinking you’ll have?

Yeah, that’s my understanding as well. I still wonder if you can accomplish what you want, just with a smaller number of nodes run by “super users” who have a desktop PC or something that is connected to the internet most of the time, as opposed to laptops or mobile whose connections are transient. But there would be some mechanism to ensure that the host being connected to is a trusted member of the network.

The idea is to run a network that connects all the different technologies like web, blockchain and IoT together, in a way that lets you interact with it as easily as with the ifttt or zapier websites, but technology agnostic. So the number of users can be quite huge. Moreover, we will provide companies and developers with the tools to connect their own technologies to this network, and so let them create full applications in a decentralised way, where the nodes of the network process the application and get rewards for it. It’s as if you could run a part of zapier or ifttt yourself and get an incentive for every piece of processing you do for them, based on any technology, not only web. So we might have thousands and thousands of users / companies, hopefully millions :slight_smile:

I agree with the idea of having “super users”. As I see it, these will be users who want to run the software on a high-availability server, non-stop, in order to get more rewards. They will be fairly rare in a massively adopted service, but at the beginning of the network they will be most of the users. The idea, though, is that the network does not belong to any super user but to many other users (of any kind), with a consensus to check that no one is cheating and that everyone participates fairly in the network.

Also @tamethecomplex, if you have similar problems I would really recommend looking into blockchain technologies. Ethereum is quite amazing and can do most, maybe even all, of the stuff discussed here.

Interesting - I need to get up to speed on what ifttt and zapier are capable of. I’m not familiar with this product category.

Seems this is the way a network like this would stabilize. A handful of super stable nodes handling most of the traffic, and a long tail of net data consumers.

I’m definitely interested in blockchain, but don’t understand it very well. It seems like the blockchain is great for tracking ownership of valuable assets, and small pieces of code like smart contracts which must be immutable. It seems like blockchains have low throughput currently, so storing large amounts of data on the blockchain is prohibitive?

I was interested in the startup Blockstack as a “parallel internet where you own your data”. It seems though that building an application that requires a central database running on a server is not currently possible. If I understand correctly, the only available persistence for dapps is simple file storage / retrieval. Since databases rely on a persistent process on a single server with very low latency access to the filesystem, I’m not sure what a dapp solution to this would be? I got this response on the Blockstack forum when posting about this question. Seems they’re looking into it. https://forum.blockstack.org/t/persisting-data-to-a-database-for-a-blockstack-app/3650
