Make Multi-Tenancy open source

Just came here to share my thoughts on this matter.

I agree with most of the above: other OSS databases come with multi-namespace/schema support out of the box, and this is something DGraph IMHO should support too.

Some others have also mentioned, and I would agree with this point, that ACLs are an acceptable feature to keep enterprise-only. I would prefer they weren't, but I understand why you would want to do so.

From my use-case perspective, we have data owned by different accounts, and when querying DGraph we only want to return the data owned by the current account. Our users don’t get raw access to the DGraph instances; they go through our application services, which apply authentication/authorisation rules and also handle other non-graph data. As such, we don’t require ACLs, just a way to query only the data belonging to that user. Our current approach is to add a predicate with the account ID to every node; however, this is starting to slow down our queries once we have 1M+ nodes, even if the account itself only owns a few hundred.

We’re about to start a new project and would have preferred to use DGraph. We’ve been delaying the project to wait for multi-tenancy, but now we’re most likely going to take a different approach.

We’re still a startup with fewer than 20 staff, including a few developers, so an enterprise license is a possibility for the future, but not now. The only way smaller companies will ever invest in an enterprise license is if they can use the product in anger for 12-18 months and succeed with it; this issue is preventing our company from doing so.

I agree with most of the comments above, and I definitely think that multi-tenancy should be in the community edition. I also think that ACLs should be in the community edition, since that’s also a fairly fundamental part of databases.

In general, I think enterprise versions should be for things like:

  • 24x7x365 management / support
  • Improved resource utilization (e.g. more cores / parallel processes)
  • Quicker patches for bugs/updates
  • Handling multi-datacenter replication (learner nodes fall into this category)

These kinds of features often make financial sense to buy through an enterprise contract, because running the free version without them would cost the company more than the enterprise version.

However, as many have pointed out, the lack of multi-tenancy and ACLs can be a barrier to adopting the technology in the first place. That loses the opportunity to gain a future customer at the point where they would really see the value of the product for their business (which I believe many businesses would, once they gave it a chance).

This is especially important with a product like Dgraph because in a way it’s potentially paradigm-changing for many businesses, and as mentioned above, it could just be a non-starter because of the lack of multi-tenancy.

Just to chime in here, MT was always meant to be a proprietary feature, just like the distributed aspect of Dgraph was always meant to be open source.

The rationale for Dgraph proprietary features has been that they can be worked around with more user-side code, so they are not absolutely necessary for the functioning of the DB. That rationale still holds, and therefore we don’t have any plans to make MT or ACLs open source.

Both of these, along with all other enterprise features, are, however, available in Dgraph Cloud. That would be our recommendation for anyone who’d like to use these features.

As a new user looking for a DBMS for a world-scale project, I can say that this statement doesn’t take into account the real-world needs of this kind of development. I was pretty much convinced by your product, but I’m definitely not willing to pay $199/month just to get this basic feature, and I’m not willing to bloat my codebase with the DBMS’s job just to be able to use the community version. I doubt anyone else would want to do that either.

You say that it’s available in Dgraph Cloud, but according to your pricing plan, MT is only available on dedicated servers.

Not many companies are willing to start by spending so much on a new product, considering the learning curve, your competitors’ offerings, and the obvious need for these features in any serious project. Pet projects may be able to get by without these features, but to run a real business…

Or at least make it available on the shared plan. I can consider investing time and effort to migrate from the good old RDBMS paradigm to graph-based databases, but I can’t reasonably spend so much money and so many resources just to build an advanced proof of concept.

Many developers will agree on that point.

In my case, as a result it’s a no. I hope that you will change your mind about it, Dgraph sounds really promising.

Just to tackle this point. The shared instances start at $9.99/mo. One can spin up multiple of those as needed, each instance acting like a graph silo, to get MT. This would still be cheaper than running an Amazon RDS or equivalent.

So, since it’s unclear from the pricing page: what would a shared instance with only MT enabled cost? On what criteria is that price based? You must have an idea, and if you do, why not list the cost of this critical option on the product page?

My guess is that you know this feature is essential, and you’re trying to convert your users into customers, or your customers into better-paying customers. You run a business, you need more customers, and free users don’t pay the bills.

But the cost of adopting your product is not only the monthly price. It’s time investment, employee training, and a lot of trial and error. Before anyone knows whether it will be worth the investment, you must let users produce a proof of concept to convince their company, or themselves, that your product is the way to go. Since it’s not clear what the minimal features would cost, features that any serious DBMS provides natively, even in the open-source world, you may discourage users from even considering your product.

If you ease the creation of PoC for your users, you will get more customers.

We are currently considering DGraph as an option for switching our stack from a PostgreSQL / ClickHouse bundle to a graph database / ClickHouse bundle.

Having MT as an open-source option would definitely tip the decision toward DGraph. I’ll share our thought process here.

While having MT in open source would be great, we are OK with writing our own workarounds to patch in the functionality we need. The main question is whether you plan to provide all the performance / clustering / core functionality as open source (including machine-learning features down the road).

The sensitive nature of the data we work with limits us to hosting it ourselves. On the other hand, for me and for the co-founders of fellow startups, speed of development matters more than the costs associated with cloud offerings. Ease of setup is the major driver for using managed k8s, for example.

I believe you wouldn’t lose many DBaaS customers if you offered all the functionality in the open-source version, since in my experience the lower maintenance cost of a cloud offering is the decisive factor, with faster development speed the second most important. For those working with sensitive data and forced to manage the cluster themselves, missing functionality might be a major roadblock.

Personally, I would like to know how likely it is that core functionality will be locked behind the cloud offering, since that is the main issue for us. Rolling our own MT / ACL, while cumbersome, is still possible.

TIA

I’ve been so excited about Dgraph, started my new project with it, and then I ran into this blocker.

I don’t need ACLs. I agree that ACLs make sense as an enterprise feature. I do however need to create multiple logical databases within a single Dgraph instance.

The workarounds suggested above won’t work for me:

  • I can’t host multiple Dgraph instances because my project will have 10000+ logically separate databases.
  • I can’t go the enterprise route because my project is open source.

I think this will be a blocker for many open source projects that want to integrate with Dgraph. Supporting other open source projects seems like it goes well with Dgraph’s 1st core value of “Be Open Source”.

If the change would be accepted, I am willing to do the work and submit a pull request to bring namespaces to the community version of Dgraph (without ACLs).

How would you secure namespaces without ACL? Essentially you could never allow someone to have access to change the schema for only their namespace without also giving them the access to change the schema for every namespace.

I don’t need to secure the namespaces, I just need logical separation. I need to be able to upload a schema for one namespace without blowing away the schemas for the other namespaces.

I think the point of the ACL is to secure the namespaces, so when I say I don’t need the ACL, I’m saying I don’t need the security features.

I think this lines up with what others are saying about wanting separate namespaces for separate apps in local development, or separate test and dev namespaces.

Logical database separation is a critical feature for local development and testing even if you don’t use it at all in production.

Yeah, I thought about putting suffixes on my types, one per app:

App1: study
App2: music

Then you would have:

type User_study {
  id: ID!
  username: String
  ...
}
type User_music {
  id: ID!
  username: String
  ...
}

I guess you would query with queryUser_study, although I am not sure what the naming restrictions are (can you use _?)… maybe even prefixes somehow…

Not ideal, but if you use a query builder, this would not be terrible.

J

This is what I actually recommend: that way you can still query across the single database if needed, whereas you can’t query across the enterprise namespaced databases at all.

As far as naming restrictions go, the GraphQL spec requires type and field names to match the regex [_A-Za-z][_0-9A-Za-z]* (names starting with __ are reserved for introspection).

By convention, a name sticks to one casing style (e.g., lowercase, UPPERCASE, snake_case, camelCase, PascalCase), but this is not required.

But names cannot contain periods (.) or any other characters outside that set, such as Chinese characters.
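To check candidate prefixed names before putting them into a schema, a tiny validator helps. This is a minimal sketch assuming only the GraphQL spec's Name grammar (a letter or underscore, then letters, digits or underscores, with a leading `__` reserved for introspection); Dgraph may impose extra rules of its own that it does not check. The helper name `is_valid_graphql_name` is hypothetical.

```python
import re

# GraphQL spec Name grammar: a letter or underscore, followed by letters, digits or
# underscores. Only ASCII letters count, so "." and e.g. Chinese characters are rejected.
_GRAPHQL_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def is_valid_graphql_name(name: str) -> bool:
    """Hypothetical helper: can `name` be used as a GraphQL type or field name?"""
    if name.startswith("__"):
        # Names starting with "__" are reserved for GraphQL introspection.
        return False
    return _GRAPHQL_NAME.fullmatch(name) is not None


for candidate in ["User_study", "study_User", "study.User", "2_User", "用户"]:
    print(candidate, is_valid_graphql_name(candidate))
```

Both `User_study` and `study_User` pass, so suffixes and prefixes are equally legal; the choice is about readability.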

Ok, then I would definitely do prefixes:

type study_User {
  ...
}
type music_User {
  ...
}
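The "query builder" idea from earlier in the thread can be sketched as a few string helpers that hide the app prefix from calling code. This assumes, as far as I know, that Dgraph's GraphQL layer generates operation names by prepending `get`/`query`/`add`/`update`/`delete` to the type name (so `study_User` yields `querystudy_User`); `prefixed_type`, `generated_operations` and `build_query` are hypothetical names, not a Dgraph API.

```python
from typing import Dict, List

# Operation prefixes that, as far as I know, Dgraph's GraphQL layer generates for each
# type (e.g. queryUser, getUser, addUser). Treat this list as an assumption.
GENERATED_OPS = ("get", "query", "add", "update", "delete")


def prefixed_type(app: str, type_name: str) -> str:
    """Hypothetical helper: scope a type to an app, e.g. ("study", "User") -> "study_User"."""
    return f"{app}_{type_name}"


def generated_operations(type_name: str) -> Dict[str, str]:
    """Map each operation kind to the field name assumed to be generated for `type_name`."""
    return {op: f"{op}{type_name}" for op in GENERATED_OPS}


def build_query(app: str, type_name: str, fields: List[str]) -> str:
    """Hypothetical query builder: callers say "User", the builder adds the app prefix."""
    op = generated_operations(prefixed_type(app, type_name))["query"]
    return f"query {{ {op} {{ {' '.join(fields)} }} }}"


print(build_query("study", "User", ["id", "username"]))
print(build_query("music", "User", ["id", "username"]))
```

Keeping the prefix in one place means application code never spells out `study_User` directly, which is what makes the approach "not terrible" in practice.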

J

It may be interesting to note that Dgraph internally prefixes every predicate and type with its namespace’s number, so prefixing your predicates/types yourself is virtually identical to what Dgraph does at the storage layer.

Interesting. So internally:

<0x12345> <dgraph.type> "PREFIX_User"
<0x12345> <PREFIX_User.email> "me@me.com"

So basically an orphan node would still be mine because of the Type prefix:

<0x12345> <PREFIX_User.friends> <0xBAD_ID>

:neutral_face:
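Under this self-prefixing scheme, ownership of a triple follows from the name it uses rather than from the node it points at, which is why the orphan edge above is still "mine". A minimal sketch, assuming prefixes contain no `_` and predicates look like `PREFIX_User.email`; `owner` and `triples_for` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def owner(triple: Triple) -> Optional[str]:
    """Return the app prefix a triple belongs to under the prefixing scheme, if any.

    Assumes prefixes contain no "_" and prefixed predicates look like "PREFIX_User.email".
    """
    _, predicate, obj = triple
    # Type membership: the prefix lives in the object ("PREFIX_User").
    # Any other edge: the prefix lives in the predicate itself.
    name = obj if predicate == "dgraph.type" else predicate
    return name.split("_", 1)[0] if "_" in name else None


def triples_for(app: str, triples: List[Triple]) -> List[Triple]:
    return [t for t in triples if owner(t) == app]


triples = [
    ("0x12345", "dgraph.type", "PREFIX_User"),
    ("0x12345", "PREFIX_User.email", "me@me.com"),
    # Edge to a node with no type at all: still attributed to PREFIX via the predicate.
    ("0x12345", "PREFIX_User.friends", "0xBAD_ID"),
    ("0x67890", "OTHER_User.email", "you@you.com"),
]
print(triples_for("PREFIX", triples))
```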

J

Here is my simplified understanding of how it works.

So let’s say there are two namespaces. Namespaces are identified only by a number, not a name, so let’s call them 1 and 2. When you send a query or a mutation via DQL or your namespaced GraphQL endpoint, Dgraph knows under the hood which namespace you are working with. It determines this either by mapping your GraphQL endpoint (two namespaced databases would have separate GraphQL endpoints), or by inspecting which user is logged in and which namespace that user has been granted, or, if the user is a guardian, from a header that I believe indicates which namespace you are using, similar to SQL’s use db logic.

Then, when the query/mutation is processed, everything gets namespaced appropriately, so that when using namespace 2 a query like:

{
  n(func: type(Post)) {
    title
  }
}

would get translated to:

{
  n(func: type(2_Post)) {
    2_title
  }
}

But you cannot query for these namespaced data together. Doing something like:

{
  n(func: type(1_Post, 2_Post)) {
    1_title
    2_title
  }
}

would not work from any client, because it would get the namespace applied, so it would be interpreted as:

{
  n(func: type(2_1_Post, 2_2_Post)) {
    2_1_title
    2_2_title
  }
}
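The translation described above can be modelled in a few lines to show why a client cannot reach another namespace by naming it explicitly. This follows the simplified "prepend the namespace number" model from this post, not Dgraph's actual storage encoding; `STORED`, `namespaced` and `resolve` are hypothetical.

```python
from typing import Iterable, List, Set

# Names as they would exist in storage under the simplified model: namespaces 1 and 2
# each have their own Post type and title predicate.
STORED: Set[str] = {"1_Post", "1_title", "2_Post", "2_title"}


def namespaced(ns: int, name: str) -> str:
    """Simplified model: the server prepends the caller's namespace number."""
    return f"{ns}_{name}"


def resolve(ns: int, names: Iterable[str], stored: Set[str] = STORED) -> List[str]:
    """Translate the client's type/predicate names and keep only those that exist."""
    translated = [namespaced(ns, n) for n in names]
    return [t for t in translated if t in stored]


# A namespace-2 client asking for Post/title reaches only namespace 2's data.
print(resolve(2, ["Post", "title"]))
# Naming namespace 1 explicitly yields 2_1_Post, 2_1_title, ... which don't exist.
print(resolve(2, ["1_Post", "2_Post", "1_title", "2_title"]))
```

Because the prefix is added server-side and unconditionally, there is no client-visible name that maps onto another namespace's data.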

Bulk loading and live loading also account for namespacing somehow, adding the prefix to every type/predicate, I believe.

That is my understanding of how it works, in a very simplified view. So any user can in essence copy this, but they would not get separately controlled schemas, nor drop operations that affect only a single namespace. This is why the drop data commands were also modified when namespaces were introduced: you can’t just go willy-nilly dropping EVERYTHING in a database, because it could screw up other namespaces.

UIDs don’t have to be unique across namespaces, from what I understand. Leasing is still done globally, I think, so if a namespace 1 client leases 20 UIDs and then a namespace 2 client leases 20 UIDs, a total of 40 UIDs are leased. But it is possible for namespaces 1 and 2 to both have data mapped to the same UID, so deletes of the form S * * have to account for namespaces as well: UID 0x2a might be a user in namespace 1, while in namespace 2 it could be any other type of node with different fields. This explains why expand(_all_) has to work from types; otherwise it could, theoretically, return data across namespaces.
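A toy model makes the points above concrete: one global lease counter, data keyed by (namespace, UID), and an `S * *`-style delete that only touches the caller's namespace. This mirrors the understanding described in this post, not Dgraph's implementation; `ToyStore` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


class ToyStore:
    """Hypothetical toy model: one global UID lease counter, data keyed by namespace."""

    def __init__(self) -> None:
        self.next_uid = 1  # shared by every namespace, like a global lease
        # (namespace, uid) -> {predicate: value}
        self.data: Dict[Tuple[int, int], Dict[str, str]] = defaultdict(dict)

    def lease(self, count: int) -> range:
        """Hand out `count` fresh UIDs from the global counter."""
        start = self.next_uid
        self.next_uid += count
        return range(start, start + count)

    @property
    def total_leased(self) -> int:
        return self.next_uid - 1

    def set_value(self, ns: int, uid: int, predicate: str, value: str) -> None:
        self.data[(ns, uid)][predicate] = value

    def delete_node(self, ns: int, uid: int) -> None:
        """Equivalent of `S * *`, scoped to one namespace: other namespaces keep their data."""
        self.data.pop((ns, uid), None)


store = ToyStore()
store.lease(20)  # namespace 1 client
store.lease(20)  # namespace 2 client
print(store.total_leased)

# The same UID can carry unrelated data in two namespaces.
store.set_value(1, 0x2A, "User.username", "alice")
store.set_value(2, 0x2A, "Post.title", "hello")
store.delete_node(1, 0x2A)
print(dict(store.data))
```

Keying by (namespace, UID) rather than by UID alone is what makes the scoped delete safe; a delete keyed only by UID would wipe namespace 2's post along with namespace 1's user.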
