Multi Tenancy in Dgraph

This document is still in review. The details might change over time.

Motivation

Dgraph is currently single-tenant: a Dgraph instance can host only a single namespace. Most databases support multi-tenancy (multiple namespaces), which allows the database to be partitioned logically. In PostgreSQL, for example, each namespace is called a database, and all data is associated with a namespace.
Multi-tenancy, or logical partitioning, allows multiple namespaces to run on a single Dgraph instance in isolation. Computing resources are shared between the namespaces, but the namespaces are logically separated from each other.

Overview

For Dgraph, namespace separation will be logical. It is a breaking change because the structure of the keys changes: Dgraph will prefix predicates and types with the namespace when storing them in Badger. Multi-tenancy will require ACLs to be enabled, since a user should be able to run queries/mutations in multiple namespaces but should not have access to all namespaces.

Assumptions

  1. Each namespace acts as a logical silo. The data stored in one namespace will not be accessible by another namespace.
  2. Each user account belongs to a single namespace. Cross-namespace queries are not allowed. (We might support aggregation queries across multiple namespaces, but that is out of the scope of this RFC.)
  3. A person can access multiple namespaces, but a separate user has to be created in each namespace.
  4. Each new cluster creates two namespaces: the system namespace and the default namespace.
    • system namespace stores the metadata about the namespaces and information about super users. Superusers are the users who can create and delete the namespaces.
    • default namespace is the default namespace for the users. The user data will be stored in this namespace unless a different namespace is specified.
  5. For open-source users, dgraph will still have system and default namespace. The system namespace will be empty and everything is stored in the default namespace.

Implementation Details

Most of the components of dgraph will require changes. This section lists all the changes required.

Representation of keys

Currently, the RDF triple <0x01> <follower> <0xab> . is stored in Badger under key=<follower, 0x01>. With multi-tenancy, it will be stored under key=<namespace><namespace-delimiter><follower, 0x01>. This is a breaking change, and all existing data has to be migrated to the new format (maybe a migration command in alpha?).

The problem with leaving the keys of a namespace unprefixed is that we cannot drop that namespace efficiently: without a common prefix, there is nothing to pass to Badger’s drop-prefix operation.

Namespace and Predicate separator

We will use byte 30 as the namespace separator because it is the standard ASCII record separator (RS, 0x1E), a control character that users would not normally have access to. The separator should be rejected at the lexer level: data sent by the user must never contain it.
NamespaceSeperator = byte(30)
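The restriction could be enforced with a check along these lines. This is only a sketch: it assumes the check runs on raw user-supplied strings (predicate names, namespace names), and validateUserInput is a hypothetical helper, not an existing lexer function.

```go
package main

import (
	"fmt"
	"strings"
)

// namespaceSeparator is the reserved byte proposed above.
const namespaceSeparator = byte(30)

// validateUserInput rejects any user-supplied name that contains the
// reserved namespace separator.
func validateUserInput(name string) error {
	if i := strings.IndexByte(name, namespaceSeparator); i >= 0 {
		return fmt.Errorf("invalid input %q: reserved byte 0x1E at offset %d", name, i)
	}
	return nil
}
```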

Access Control Lists (ACL)

This section including the diagram is out of date. @ibrahim to update this.
Multi-tenancy will depend heavily on ACLs: the queries/mutations a user can perform are determined by the ACLs. The guardians group exists in Dgraph by default (referred to below as dgraph-guardians). A member of this group can create a namespace and assign users to ACL groups. When a new namespace is created, a guardians group for that namespace is also created, called guardians-of-the-namespace (referred to below as namespace-guardians). Only members of the dgraph-guardians group can add a user to, or delete a user from, a namespace-guardians group.
Each user can be part of multiple namespaces, which means they can be part of multiple namespace-guardians groups and can therefore query multiple namespaces (one namespace at a time).


The dgraph-guardians can create/delete the namespace. namespace-guardians can only query/mutate data in their namespace. They CANNOT delete the namespace they’re part of.

The dgraph.user.namespaces predicate will contain the list of namespaces a user can access.
A user can be a member of multiple namespace-guardians groups but they can perform queries/mutation on a single namespace at a time.

Current ACL Schema

dgraph.xid string @index(exact) .
dgraph.password password .
dgraph.user.group [uid] @reverse .
dgraph.acl.rule [uid] .
dgraph.acl.predicate string @index(exact) @upsert .
dgraph.rule.permission int .

Proposed ACL Schema

dgraph.xid string @index(exact) .
dgraph.password password .
dgraph.user.group [uid] @reverse .
dgraph.acl.rule [uid] .
dgraph.acl.predicate string @index(exact) @upsert .
dgraph.rule.permission int .
dgraph.namespace string @reverse .

Creating a new namespace

A namespace is created via /alter with the payload {"create_namespace": "foo"}. Only members of the dgraph-guardians group can create a namespace.

Deleting a namespace

A namespace is dropped via /alter with the payload {"drop_namespace": "foo"}. Any member of the dgraph-guardians group can drop a namespace. Note that members of namespace-guardians cannot delete a namespace; they can only perform queries/mutations.
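For client authors, the two /alter payloads could be built as in the sketch below. The JSON keys create_namespace and drop_namespace come from the text above; the alterRequest type and the helper names are assumptions made for illustration only.

```go
package main

import "encoding/json"

// alterRequest mirrors the two payloads proposed above.
// Exactly one of the fields is expected to be set per request.
type alterRequest struct {
	CreateNamespace string `json:"create_namespace,omitempty"`
	DropNamespace   string `json:"drop_namespace,omitempty"`
}

// createNamespacePayload returns the /alter body that creates a namespace.
func createNamespacePayload(name string) ([]byte, error) {
	return json.Marshal(alterRequest{CreateNamespace: name})
}

// dropNamespacePayload returns the /alter body that drops a namespace.
func dropNamespacePayload(name string) ([]byte, error) {
	return json.Marshal(alterRequest{DropNamespace: name})
}
```

The resulting bytes would be sent as the body of the /alter request, together with a token that identifies a member of dgraph-guardians.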

In Badger, drop prefix is a stop-the-world operation: operations such as memtable flushes and compactions are paused while it runs. If one user drops a namespace, this could affect the writes of other users.

Queries, Mutations, and Schema

Each query and mutation will be associated with a namespace. The claims in the token determine the namespace that should be used for the query/mutation. Internally, all keys will be prefixed with <namespace>, and this namespace will be used for everything the query touches.
Comment by Manish - Should be done via the keys.go, I think. Also, parseKey should work. So should the prefixPredicate and so forth.

Similarly, the schema is updated via the ALTER command. Every client interaction states (through the JWT token) which namespace it wants to interact with. Using that information, parseSchema will prefix the namespace to every predicate, e.g. (default-name, derived from keys.go in the x package). Then the normal flow happens: zero decides where the predicate should belong, and MutateOverNetwork runs.

When the schema is queried, retrieval happens the same way, but before returning it to the user we filter the schema down to the user’s namespace and trim the predicate names: the namespace part of each predicate key is removed.
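The prefix-on-write and filter-and-trim-on-read steps can be sketched as follows. This assumes predicates are stored as "<namespace><separator><predicate>" strings; withNamespace and filterSchema are hypothetical names, not functions from the x package.

```go
package main

import (
	"sort"
	"strings"
)

// namespaceSeparator is byte 30, written as a one-byte string.
const namespaceSeparator = "\x1e"

// withNamespace is what schema parsing would apply to every predicate.
func withNamespace(namespace, pred string) string {
	return namespace + namespaceSeparator + pred
}

// filterSchema keeps only the predicates that belong to namespace and
// strips the namespace prefix before they are returned to the user.
func filterSchema(namespace string, stored []string) []string {
	prefix := namespace + namespaceSeparator
	var out []string
	for _, pred := range stored {
		if strings.HasPrefix(pred, prefix) {
			out = append(out, strings.TrimPrefix(pred, prefix))
		}
	}
	sort.Strings(out)
	return out
}
```

A user in namespace foo would see a predicate stored as withNamespace("foo", "name") simply as name, and would not see predicates of any other namespace.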

Transactions

Transactions should also be scoped to a namespace. A transaction running in namespace “x” should not be affected by a transaction running in namespace “y”. Clients will send the namespace in the transaction context. For instance, in dgo, the user could call NewTransactionWithNamespace(foo), and foo would then be used when making requests to Dgraph. If the namespace is missing from the context, the default namespace is used.

Currently, each transaction is blocked by a watermark in both zero and alpha. The blocking happens when alpha is queried for data, based on maxAssigned. We will need to separate the blocking mechanism by namespace. To handle transactions at the namespace level, we can create a per-namespace watermark and, after the commit phase, stream the namespace’s maxAssigned to all alphas, so that reads waiting at a higher readTs in that namespace are unblocked.

Transaction implementation details

UID leasing and transaction timestamps are going to work in the same way.

The zero oracle and the alpha oracle are each separated by namespace. After the commit phase, we have to stream the MaxAssignedTs for each namespace. The alpha oracle will keep its waiters per namespace.

In the alpha oracle:

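// waiters maps namespace -> startTs -> channel used to wake the blocked reader.
var waiters map[string]map[uint64]chan struct{}

for namespace, waitersForNamespace := range waiters {
	for startTs, notifyCh := range waitersForNamespace {
		// maxAssigned[namespace] is the latest MaxAssigned streamed for this namespace.
		if startTs <= maxAssigned[namespace] {
			notifyCh <- struct{}{}
			delete(waitersForNamespace, startTs)
		}
	}
}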
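A self-contained version of the per-namespace waiters, as a sketch only. It differs from the snippet above in two labeled ways: it closes the channel instead of sending on it, so several readers can wait on the same startTs without the notifier blocking, and it guards the maps with a mutex. All names are hypothetical.

```go
package main

import "sync"

// oracle tracks, per namespace, the highest MaxAssigned seen and the
// readers waiting for their startTs to be reached.
type oracle struct {
	mu          sync.Mutex
	maxAssigned map[string]uint64
	waiters     map[string]map[uint64][]chan struct{}
}

func newOracle() *oracle {
	return &oracle{
		maxAssigned: make(map[string]uint64),
		waiters:     make(map[string]map[uint64][]chan struct{}),
	}
}

// waitCh returns a channel that is closed once maxAssigned for ns
// is greater than or equal to startTs.
func (o *oracle) waitCh(ns string, startTs uint64) <-chan struct{} {
	o.mu.Lock()
	defer o.mu.Unlock()
	ch := make(chan struct{})
	if o.maxAssigned[ns] >= startTs {
		close(ch) // Already satisfied, do not block the reader.
		return ch
	}
	if o.waiters[ns] == nil {
		o.waiters[ns] = make(map[uint64][]chan struct{})
	}
	o.waiters[ns][startTs] = append(o.waiters[ns][startTs], ch)
	return ch
}

// processDelta records a new MaxAssigned for ns and wakes only the
// waiters of that namespace whose startTs has been reached.
func (o *oracle) processDelta(ns string, maxAssigned uint64) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if maxAssigned <= o.maxAssigned[ns] {
		return // Stale or duplicate delta.
	}
	o.maxAssigned[ns] = maxAssigned
	for startTs, chans := range o.waiters[ns] {
		if startTs > maxAssigned {
			continue
		}
		for _, ch := range chans {
			close(ch)
		}
		delete(o.waiters[ns], startTs)
	}
}
```

With this layout, a delta for namespace x never wakes a reader in namespace y, which is the isolation the section asks for.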

In zero oracle, we’ll have a watermark for each namespace.

doneUntil map[string]y.WaterMark

There are two places where the delta is sent.

  1. While issuing timestamps. A transaction with a lower ts may still be in the commit phase, so we have to wait for it to complete before sending the delta (this mechanism already exists).
func (o *Oracle) storePending(ids *pb.AssignedIds, namespace string) {
	// Wait to finish up processing everything before start id.
	max := x.Max(ids.EndId, ids.ReadOnly)
	if err := o.doneUntil[namespace].WaitForMark(context.Background(), max); err != nil {
		glog.Errorf("Error while waiting for mark: %+v", err)
	}

	// Now send it out to updates.
	o.updates <- &pb.OracleDelta{MaxAssigned: max, Namespace: namespace}

	o.Lock()
	defer o.Unlock()
	o.maxAssigned[namespace] = x.Max(o.maxAssigned[namespace], max)
}
  2. After applying the commit proposal. We need to send the higher delta to the alpha oracle so that the in-memory posting lists are written to disk.
func (o *Oracle) updateCommitStatus(index uint64, src *api.TxnContext) {
	// TODO: We should check if the tablet is in read-only status here.
	if o.updateCommitStatusHelper(index, src) {
		delta := new(pb.OracleDelta)
		delta.Txns = append(delta.Txns, &pb.TxnStatus{
			StartTs:   src.StartTs,
			CommitTs:  o.commitTs(src.StartTs),
			Namespace: src.NameSpace,
		})
		o.updates <- delta
	}
}

Export and Bulk/Live Loader

Export works as it does today, except that a separate folder is created for each namespace; each folder contains the exported RDF and schema files.

mutation {
  export(input: {format: "json", namespace:"foo"}) {
    response {
      message
      code
    }
  }
}

This will export the namespace foo to a folder named foo in the export directory (by default this directory is called export).
To export all namespaces, export(input: {format: "json", namespace:"*"}) can be used. The namespace parameter can be a regex, which allows exporting multiple namespaces.
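How the namespace parameter might be resolved into a list of namespaces is sketched below. Two assumptions are made here that the text does not spell out: "*" is special-cased as "all namespaces" (on its own it is not a valid regular expression in Go's regexp syntax), and any other pattern must match the whole namespace name. namespacesToExport is a hypothetical helper.

```go
package main

import (
	"regexp"
	"sort"
)

// namespacesToExport returns the namespaces selected by pattern.
// "*" selects everything; any other value is treated as a regular
// expression that must match the entire namespace name.
func namespacesToExport(pattern string, all []string) ([]string, error) {
	if pattern == "*" {
		out := append([]string(nil), all...)
		sort.Strings(out)
		return out, nil
	}
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return nil, err
	}
	var out []string
	for _, ns := range all {
		if re.MatchString(ns) {
			out = append(out, ns)
		}
	}
	sort.Strings(out)
	return out, nil
}
```

Each returned namespace would then be exported to a folder of the same name inside the export directory.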

While importing with the bulk/live loader, we can infer the namespace(s) from the folder names, and the user will be asked to confirm them. The user can choose to import all data into the default namespace or pick a different namespace (via a command-line prompt) for each folder.

Backup and Restore

The backup contains a protobuf with all the metadata, and we can store namespace information in it. Restore can use the same information from the backup file. A prompt can confirm the namespace before the data is restored into it.

Zero node

Zero supports the /moveTablet, /state and /assign endpoints. All these endpoints should be namespace aware. Since each namespace has its own timestamps, the /assign endpoint should also accept a namespace as a parameter (or via a header). The rebalancing of the tablets should also be namespace aware (or not?).

The /state endpoint on zero should show information about only the specified namespace.
Question - Do we need a way to show information about all the namespaces?

Alpha /admin endpoint

The current admin endpoint lists details such as ongoing indexing. The /admin should also be namespace aware. We can use the same Dgraph-namespace header for it here as well.

GraphQL and SlashGraphQL?

Comment by Manish - Multi-tenancy should work with /admin GraphQL.
Needs more information - cc @Pawan

Testing

Todo - Figure out how to test multi-tenancy. Existing systests will not be enough.

Changes needed to the clients

All the clients need to support

  1. Namespace creation/deletion. This should be done via changes to the payload for the /alter call.
  2. Optionally specifying the namespace for queries and mutations.
  3. Specifying the namespace in transactions.

Questions

  • Do we need to return information about namespace to the user with each query? Should we add the namespace to the response header as well?
    ibrahim - I think we should. It might be useful if the request is redirected for some reason.
  • Separate badger storage for each namespace?
    ibrahim - Multiple badger instances can be very costly in terms of memory/cpu.

Author: @balaji and @ibrahim

Code diff from multi-tenancy call

index 830346cd..ce5a75eb 100644
--- a/dgraph/cmd/zero/assign.go
+++ b/dgraph/cmd/zero/assign.go
@@ -127,6 +127,7 @@ func (s *Server) lease(ctx context.Context, num *pb.Num, txn bool) (*pb.Assigned
 	// If we have less available than what we need, we need to renew our lease.
 	if available < num.Val+1 { // +1 for a potential readonly ts.
 		// Blocking propose to get more ids or timestamps.
+		// All proposals would need to be namespace aware.
 		if err := s.Node.proposeAndWait(ctx, &proposal); err != nil {
 			return nil, err
 		}
@@ -144,6 +145,8 @@ func (s *Server) lease(ctx context.Context, num *pb.Num, txn bool) (*pb.Assigned
 			s.nextTxnTs++
 			out.ReadOnly = s.readOnlyTs
 		}
+		// We are namespace aware here. So, we pick the right oracle for the
+		// namespace. Update the doneUntil for that namespace.
 		s.orc.doneUntil.Begin(x.Max(out.EndId, out.ReadOnly))
 	} else {
 		out.StartId = s.nextLeaseId
diff --git a/dgraph/cmd/zero/oracle.go b/dgraph/cmd/zero/oracle.go
index b96c5cf8..4c6d47ef 100644
--- a/dgraph/cmd/zero/oracle.go
+++ b/dgraph/cmd/zero/oracle.go
@@ -40,6 +40,9 @@ type syncMark struct {
 // Oracle stores and manages the transaction state and conflict detection.
 type Oracle struct {
 	x.SafeMutex
+	// Namespace aware. Stores the namespace.
+	// Zero server holds a map of namespace -> Oracle.
+
 	commits map[uint64]uint64 // startTs -> commitTs
 	// TODO: Check if we need LRU.
 	keyCommit   map[string]uint64 // fp(key) -> commitTs. Used to detect conflict.
@@ -269,6 +272,7 @@ func (o *Oracle) storePending(ids *pb.AssignedIds) {
 	}
 
 	// Now send it out to updates.
+	// Send with the namespace in it.
 	o.updates <- &pb.OracleDelta{MaxAssigned: max}
 
 	o.Lock()
@@ -433,6 +437,9 @@ func (s *Server) Oracle(_ *api.Payload, server pb.Zero_OracleServer) error {
 	if !s.Node.AmLeader() {
 		return errNotLeader
 	}
+	// I need to subscribe to k namespaces.
+	// Loop over the oracles corresponding to those namespaces and subscribe to
+	// those specifically.
 	ch, id := s.orc.newSubscriber()
 	defer s.orc.removeSubscriber(id)
 
@@ -499,6 +506,7 @@ func (s *Server) TryAbort(ctx context.Context,
 }
 
 // Timestamps is used to assign startTs for a new transaction
+// This would also have to be namespace aware.
 func (s *Server) Timestamps(ctx context.Context, num *pb.Num) (*pb.AssignedIds, error) {
 	ctx, span := otrace.StartSpan(ctx, "Zero.Timestamps")
 	defer span.End()
@@ -511,6 +519,7 @@ func (s *Server) Timestamps(ctx context.Context, num *pb.Num) (*pb.AssignedIds,
 	reply, err := s.lease(ctx, num, true)
 	span.Annotatef(nil, "Response: %+v. Error: %v", reply, err)
 
+	// Get namespace oracle.
 	if err == nil {
 		s.orc.doneUntil.Done(x.Max(reply.EndId, reply.ReadOnly))
 		go s.orc.storePending(reply)
diff --git a/go.mod b/go.mod
index a261b5f9..dbeda460 100644
--- a/go.mod
+++ b/go.mod
@@ -32,6 +32,8 @@ require (
 	github.com/google/codesearch v1.0.0
 	github.com/google/uuid v1.0.0
 	github.com/minio/minio-go v0.0.0-20181109183348-774475480ffe
+	github.com/onsi/ginkgo v1.7.0 // indirect
+	github.com/onsi/gomega v1.4.3 // indirect
 	github.com/paulmach/go.geojson v0.0.0-20170327170536-40612a87147b
 	github.com/philhofer/fwd v1.0.0 // indirect
 	github.com/pkg/errors v0.8.1
diff --git a/posting/oracle.go b/posting/oracle.go
index 59619124..69095784 100644
--- a/posting/oracle.go
+++ b/posting/oracle.go
@@ -92,6 +92,7 @@ func (txn *Txn) Store(pl *List) *List {
 	return txn.cache.SetIfAbsent(string(pl.key), pl)
 }
 
+// Per namespace.
 type oracle struct {
 	x.SafeMutex
 
diff --git a/protos/pb/pb.pb.go b/protos/pb/pb.pb.go
index d73ad68b..8b084179 100644
--- a/protos/pb/pb.pb.go
+++ b/protos/pb/pb.pb.go
@@ -7,15 +7,16 @@ import (
 	context "context"
 	encoding_binary "encoding/binary"
 	fmt "fmt"
+	io "io"
+	math "math"
+	math_bits "math/bits"
+
 	pb "github.com/dgraph-io/badger/v2/pb"
 	api "github.com/dgraph-io/dgo/v2/protos/api"
 	proto "github.com/golang/protobuf/proto"
 	grpc "google.golang.org/grpc"
 	codes "google.golang.org/grpc/codes"
 	status "google.golang.org/grpc/status"
-	io "io"
-	math "math"
-	math_bits "math/bits"
 )
 
 // Reference imports to suppress errors if they are not otherwise used.
@@ -1247,6 +1248,9 @@ func (m *License) GetEnabled() bool {
 }
 
 type ZeroProposal struct {
+	// Most likely no need to have namespace here. If we can do MaxAssigned w/ namespace,
+	// without having separate Txn timestamps / UIDs, then we're good. No need
+	// to change this.
 	SnapshotTs           map[uint32]uint64 `protobuf:"bytes,1,rep,name=snapshot_ts,json=snapshotTs,proto3" json:"snapshot_ts,omitempty" protobuf_key:"varint,1,opt,name=key,proto3" protobuf_val:"varint,2,opt,name=value,proto3"`
 	Member               *Member           `protobuf:"bytes,2,opt,name=member,proto3" json:"member,omitempty"`
 	Tablet               *Tablet           `protobuf:"bytes,3,opt,name=tablet,proto3" json:"tablet,omitempty"`
@@ -3332,6 +3336,7 @@ func (m *TxnStatus) GetCommitTs() uint64 {
 }
 
 type OracleDelta struct {
+	// Should have Namespace as well.
 	Txns                 []*TxnStatus      `protobuf:"bytes,1,rep,name=txns,proto3" json:"txns,omitempty"`
 	MaxAssigned          uint64            `protobuf:"varint,2,opt,name=max_assigned,json=maxAssigned,proto3" json:"max_assigned,omitempty"`
 	GroupChecksums       map[uint32]uint64 `protobuf:"bytes,3,rep,name=group_checksums,json=groupChecksums,proto3" json:"group_checksums,omitempty" protobuf_key:"varint,1,opt,name=key,proto3" protobuf_val:"varint,2,opt,name=value,proto3"`
@@ -3544,6 +3549,7 @@ func (m *RaftBatch) GetPayload() *api.Payload {
 }
 
 type Num struct {
+	// Would need to be namespace aware.
 	Val                  uint64   `protobuf:"varint,1,opt,name=val,proto3" json:"val,omitempty"`
 	ReadOnly             bool     `protobuf:"varint,2,opt,name=read_only,json=readOnly,proto3" json:"read_only,omitempty"`
 	Forwarded            bool     `protobuf:"varint,3,opt,name=forwarded,proto3" json:"forwarded,omitempty"`

Discussion with Manish

Multi-Tenancy in Dgraph

<dgraph.acl, user> -> Bunch of things here.

With namespace,
<namespace, dgraph.acl, user> -> Bunch of things.

Super user is the “default” namespace
<default, dgraph.acl, user> -> Guardians of Dgraph
<foo, dgraph.acl, user> -> Guardians of foo


Guiding Principles

  • There should be no change in behavior for open source users.

Default Namespace

  • Every key stored in Badger must have a namespace.
  • By default, they’ll use “default” or something.

Relationship between foo Guardian and Default Guardian

  • That way, either foo Guardian or default guardian can modify members of foo namespaces.
  • Only default guardian can modify namespaces.
  • Guardian of a foo namespace can drop the data in the namespace.
  • Guardian of foo can’t delete the namespace itself because it was created by default guardian.

User access to Namespaces

Easier (User only has access to one namespace: 1:1)

  • Keep the users across namespaces separate.
  • Every user logs in to a particular namespace, gets a token.
  • Pass in (namespace, token) to auth. Maybe the namespace can be within the token. In that case, no need for a separate namespace header.
  • ACLs are enabled by default.

Harder (User has access to multiple namespaces: 1:many)

  • Allow a user to operate across namespaces on the same token.
  • Because, then we can’t do <namespace, dgraph.acl, user>
  • <dgraph.acl, user> -> bunch of namespaces, one password, etc.
  • Then, the user ID has to be unique across all namespaces, which is just WRONG.

ACL

Current

<dgraph.xid, uid> -> “xid” // Has an exact index for “xid” -> uid.
<dgraph.group.acl, group-uid> -> “string acl”
<dgraph.password, uid> -> password
<dgraph.user.group, uid> -> list of groups

With Namespace

<n1, dgraph.xid, uid> -> “xid” // Has an exact index for “xid” -> uid.
<n1, dgraph.group.acl, group-uid> -> “string acl”
<n1, dgraph.password, uid> -> password
<n1, dgraph.user.group, uid> -> list of groups

Because everything is separate, doing the 1:many user-namespace is going to be hard. Maybe skip that for now.

Tip: Keep a watch on this, and see if 1:many would be possible via some other means.

it would be an enterprise feature?

Oh, it is terrible.

It was the most discussed topic in the yearly plan, with feedback and questions about the reasoning from many people… It is strange, at the least, that there was no feedback on the topic.

So, my guess is that almost nobody outside of the Dgraph team understands the reasoning for why a basic feature that is available for free in so many databases (even graph databases: ArangoDB, Neo4J, OrientDB, which I checked) would be an enterprise feature in Dgraph.

Questions from today’s review session
Lines prefixed with ibrahim are responses by Ibrahim.

  • Namespace key prefix vs a rdf 4-tuple
    Ibrahim - This needs input from @mrjn . AFAIK, we want to do keys with namespace prefix.
  • Should we support default namespace without any prefix
    Ibrahim - There are multiple challenges with doing this. The first one is how do we delete data? If the keys are prefixed, we can do dropPrefix(namespace) and all data will be dropped. Without the prefix, it would be very difficult to find the data to drop (we can iterate over the entire db but that’s too slow).
  • Separate p-directory for each namespace prefix vs shared
    • A separate p-directory per namespace means a larger memory footprint, since Badger needs a separate memory allocation for each, but would it provide better isolation?
      Ibrahim - Agreed, multiple badger instances could lead to a lot of per-namespace overhead but the physical isolation might be useful as well. @mrjn what do you think?
  • ACLs for different namespace prefix
    • Who gets access to default namespace?
      Todo(Ibrahim) - Update this once we have clarity on ACLs
  • Could we force users to always pick a namespace?
    Ibrahim - Based on the discussion we had on 25/06/2020, the JWT token will contain information about the namespace
    • What do other DBs do/allow?
      Ibrahim- Postgres requires you to create a DB before you start using it and select the DB before you insert/read data from that DB
    • All the existing code would need an update
  • Transactions
    • Can we have a transaction across namespaces?
      Ibrahim - Not in the initial version. The initial version will have only single-namespace queries/mutations
    • Query and transactions are limited to a namespace
      ibrahim - yes, that’s correct
  • Export and import
    • Should we allow export/import of all namespaces in a single request or allow regular expressions for namespace prefix?
      Ibrahim - the initial version will support only single namespace export.
  • Would Timestamps be shared across namespaces?
    Ibrahim - No. Zero will store timestamps for each namespace.

Todo:

  • We would need a section on Sentry and Telemetry
  • Test plan

Notes from yesterday’s call with @pawan and @abhimanyusinghgaur

  • A user wants to access predicate across namespace. Where do we store information about this user? Should there be a common users table? Or should we duplicate the user? If duplicated, they will have to use different JWT tokens.
    Ibrahim to look into Postgres.

  • How would a user get access to a namespace? More details on the exact graphql API that will be used.

  • We will need another namespace to store all the admin groups: an admin namespace which stores all the information about the admins.

I thought more about this and this might not be required. I think a good way to find out the exact requirements might be to look at everything we need for this to work with Slash GraphQL. A user having their own Dgraph instance deployed in Slash GraphQL would not access data in another namespace (belonging to another user). Guardians (superusers) are the only users that might be able to access data across namespaces.

Notes from today’s review meeting

  • How does a user find the namespace they have access to? Can they query an API which will return back all the namespaces a user has access to?
  • Can a user use Multi-tenancy without ACLs?
    • Can the user remove ACLs and we still allow them to use dgraph? Maybe merge the namespace?
    • Can they query without a JWT token? (which means ACLs are turned off)
  • Queries/mutations should return errors if the namespace does not exist, currently (on the query/mutation PR) we return an empty response.
  • We need to maintain namespace information in badger or a separate dgraph instance (the same will be needed to store user information)
  • Mandatory namespace header for all operations. If a user hasn’t provided namespace, we return an error.
  • Admins should be able to create/delete namespace. What happens if the namespace is deleted by the user? Should we delete all the data inside the namespace? Maybe we should force the user for confirmation in some way.

Admins would want to query in multiple namespaces. A regular user might not need this, but admins do.

Or if we can make a point that even admins should not be allowed to query/mutate on any namespace, then we don’t need to allow anyone to query out of their namespace.

This spiked my interest as the reason we decided to switch to a graph db was to get away from these “namespaces”. We used MySQL before and had:

  • User namespace where we controlled user access
  • Public namespace where public data lived that was accessible to all users
  • A private namespace for each user where their private data lived

EDIT: We were also linking private namespace data to its related public namespace data, which led to extra queries and data compilation on the user side

What we came to realize after 2 years of running and accumulating >550 users was that this namespacing was not optimal and actually made admin and development more difficult. I am sure there are use cases for it, but our specific case required querying across namespaces, which MySQL allows but which becomes difficult once you have more than just a few.

My opinion is that namespaces should only be used where data will never need to be queried across the top for any kind of management. It would probably work for a web developer who wants a single dgraph instance for several clients’ websites, where each client has a unique schema and no cross-site querying is ever done. Our use case required the same schema for every namespace.

I understand also the requirement of it being an enterprise feature because it requires ACL which is also an enterprise feature. This will probably deter many users from using it though, and stick to multiple dgraph instances.

Jotting down some of my thoughts. Some of these may be a repetition from above.
Overall, I think we are trying to fit in way too many use-cases in the first cut.

Mental Model

While designing, having a mental model of each tenant having their own physical DB will help. Asking what would happen if each tenant had their own DB will immediately tell us what the solution should be.

Another idea is to draw an analogy. This feature is very analogous to Process’ Virtual Memory or VMs hosted on an ESX and we can take inspiration from those battle-tested analogies as well.

Yes, other DBs may allow some exotic features (such as user logs in only once for all namespace, user can access >1 namespace, cross namespace txn etc) but they either break the mental model or are complex to implement. As a first cut of this feature, we should aim for simplicity and then iterate.

We can always add more in a subsequent iteration, once we have some feedback and field experience.

Assumptions / Notes

  1. Each user in a DB cannot / should not be able to access other namespaces. This would not have been possible in the case of 1DB / tenant scenario, so we extend to namespace. The corollary to this is that a user can only be part of one namespace.
  2. In the rare situation where an admin/user needs access to multiple namespaces, they create a user in each namespace of interest.
  3. The exception to the above is the Dgraph guardian users. These users can access any namespace and will only generally be used for managerial / administrative operations and onboarding tenants/namespaces. No tenant will have a user in this group.
  4. Now, within a namespace, each tenant can have their own guardian(s) limited to that namespace. Again, tenant users have no visibility of namespaces, to them, it is as if they are operating on their own exclusive DB.
  5. The default namespace should mostly be used for administrative purposes. Ideally, no tenant’s data should ever go in this namespace. This also means that any query / mutation / alter operation should be attached to a namespace specified by the client in the same namespace via ACL.
  6. As we discussed, the namespace can be made part of the jwtToken when a user logs in. This will make it transparent to the user / clients. Inserting it into the jwtToken ties the namespace to the ACL rules of the user, so a user must be logged in to a namespace and the ACL will check if the user belongs to it.
  7. For OSS or when ACL is turned off, either everything goes into the default namespace or there is no namespacing at all, so it is compatible with older versions. I prefer the latter, but it may be more complex to implement.

I am uncomfortable with solutions that would make privileged access to the database complex. For instance, a common pattern with Elasticsearch is setting up a privileged ingestion pipeline that collects data from many places, and puts it in specific indices. Users may have read access as necessary, but this does not match the ingestion permissions.

My stateless ingestion into dgraph would have to run Login() to every namespace it accessed and track separately the token to that namespace - even though it is a user I wish to write across many namespaces. Similarly, my stateless API accessing many customers namespaces on their behalf would have to run Login() on possibly every call, even though it is a privileged user that needs to read all namespaces.

This design seems to be ideal for exposing the database on the internet to many users, which I understand is what dgraph is doing with the SaaS offering currently. However, I would never expose the database directly to the user in an application where the database is not the whole product.

Perhaps what you’re looking for is the GraphQL auth, which is different from Multi-Tenancy. GraphQL Auth is designed to be exposed to the internet.

Just to clarify, that is not what I am looking for. I want namespaces without ACL. Preferably with data-isolation ala normal rdbms architecture.

I recognize the desire to keep the model simple as @paras suggests, but I have some concerns regarding multi-tenant applications that have dgraph behind privileged APIs, riffing on what @iluminae brings up.

Writes

I find it to be a common pattern to “hydrate” read-efficient datastores like dgraph and elasticsearch via other sources of truth (cold storage, old RDBMSs, event logs, etc). Often there exists an “all seeing, all knowing” process that can stitch this data and pipe it into the right “tenant”.

Now, if that ETL process (or worse, dozens of stateless replicas of that process) needs to call login for every request and batch data into discrete tenant-requests, I worry that performance and ergonomics will suffer. (Especially in the case of reading the “firehose”; see “Dgraph can't idle without being oomkilled after large data ingestion”, where we explored various techniques and improvements for ingestion at high speed.)

For these batch ingestion uses, an RDF 4-tuple would be more ergonomic since we could include inserts for multiple namespaces with one request. I think this quote was in regards to storage representation (which I have no preference on), but I wanted to highlight 4-tuple as an idea for the API.

Reads

Similarly, when dgraph lives as a small piece of a “greater” multi-tenant platform, lots of nuanced, domain-specific access control happens at the application layer, usually fed by APIs that have permissive access to the upstream databases/caches/services. We’d like to be able to isolate data in dgraph by various criteria, without the overhead of managing users/tokens/state etc.

Mental model

@paras calls out this class of cross-cutting permissions as being “exotic” (which I do agree with), however they are foundational in most databases. Databases evolve to include these pathways because they are not directly exposed to the end-user; intermediary APIs and services do a lot of stitching/filtering.

Conflating ACL and Isolation

The concern/critique I’ve seen so far relates to the coupling of data-isolation and access-control. I believe they are independent, but compounding features.

Data isolation is quite important, and I think for instances where only privileged users (APIs) interact with dgraph, ACL is a bit overkill and can in fact make things more painful. I see the features serving two different purposes:

Data Isolation

  • prevents dgraph schema collisions
  • avoids silly footguns (rm -rfing the whole prod database)
  • helps common DB-isms like hot/cold schema migrations/swap-over
  • provides no guarantees of security (if the header says namespaceX, they get namespaceX)

Access Control

  • Verifiably “correct”/strict data-isolation and compliance
  • User provisioning/mgmt/audit trails

I see in the meeting notes there was a comment about this approach. Was it ruled out?

Prior art

Elasticsearch is an apt comparison. In their model, I can make many isolated indices, and even bulk ingest to them in one big request… however, native access control on top of their indices is provided only under their enterprise license.

I suspect most people are familiar with a model like this, so ACL on top of namespaces seems like a good way to distinguish dgraph-enterprise from OSS… however, that is drifting into a business discussion which I want to veer away from while assessing the RFC.

That being said, we would definitely purchase enterprise for isolation capabilities, but the user-per-namespace design would require us to share/track/provision more state than we’d like to, leaving a bit of a sour taste.

I realize I’m promoting a seemingly less safe system :stuck_out_tongue: - but if we conclude isolation != security, the line seems clearer.

Also, FWIW, I think physical separation (via multiple badgers) would be a sweet enterprise upsell for those that have extremely strict compliance needs

Note here, when I say multi-tenant, I think of the tenants as different organizations. There will not be a know-it-all process in such a situation. That is the initial goal. And yes, we are talking about how to handle situations such as what you suggest above, but it may just not be in the initial release.

Per current design, at least in the first release, you will need to call Login once per namespace and not on every request. Once logged in, you will have a jwtToken (tied to that user/namespace combo) that should be used for mutations/queries.
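To make the "login once per namespace" point concrete, here is a minimal client-side sketch in Go. It assumes a hypothetical `loginFunc` standing in for whatever call performs the per-namespace login and returns the token; that function, the `tokenCache` type, and the namespace names are illustrative only and are not part of any Dgraph API. Token expiry and refresh are deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

// loginFunc is a hypothetical stand-in for the call that logs a user into
// one namespace and returns a JWT. It is not a real Dgraph API.
type loginFunc func(namespace string) (string, error)

// tokenCache remembers one token per namespace so that login runs once
// per namespace rather than once per request.
type tokenCache struct {
	mu     sync.Mutex
	login  loginFunc
	tokens map[string]string
}

func newTokenCache(login loginFunc) *tokenCache {
	return &tokenCache{login: login, tokens: make(map[string]string)}
}

// Token returns the cached token for namespace, logging in on first use.
// Expiry handling is omitted in this sketch.
func (c *tokenCache) Token(namespace string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if tok, ok := c.tokens[namespace]; ok {
		return tok, nil
	}
	tok, err := c.login(namespace)
	if err != nil {
		return "", err
	}
	c.tokens[namespace] = tok
	return tok, nil
}

func main() {
	calls := 0
	// Fake login used only for the demonstration.
	cache := newTokenCache(func(ns string) (string, error) {
		calls++
		return "jwt-for-" + ns, nil
	})
	for _, ns := range []string{"tenantA", "tenantB", "tenantA"} {
		tok, err := cache.Token(ns)
		if err != nil {
			fmt.Println("login failed:", err)
			continue
		}
		fmt.Println(ns, tok)
	}
	fmt.Println("logins performed:", calls)
}
```

A cache like this lives in process memory, so a long-running client logs in once per namespace, while a freshly started stateless worker starts with an empty cache and has to log in again.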

Right. Per current design, one request will have to contain all data pertaining to that tenant. So, segregation will need to be done on the client side. But again, in my mind, multi-tenant means different orgs, and they won’t need to do this, as all data will be presumed to be in the same namespace embedded in the jwtToken.

We are still debating on this. However, note this will only help with segregation on the client side. Batching per namespace will still need to be done.
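As a rough illustration of what per-namespace batching on the client could look like, the Go sketch below splits a mixed stream into one batch per namespace. The `record` type, the `groupByNamespace` helper, and the tenant names are assumptions made for this example; how each batch is then sent (and with which token) is left out.

```go
package main

import "fmt"

// record is a hypothetical ingestion item: one N-Quad line plus the
// namespace it should be written to.
type record struct {
	Namespace string
	NQuad     string
}

// groupByNamespace splits a mixed stream into per-namespace batches,
// preserving the order of records within each namespace.
func groupByNamespace(records []record) map[string][]string {
	batches := make(map[string][]string)
	for _, r := range records {
		batches[r.Namespace] = append(batches[r.Namespace], r.NQuad)
	}
	return batches
}

func main() {
	mixed := []record{
		{Namespace: "tenantA", NQuad: `_:alice <name> "Alice" .`},
		{Namespace: "tenantB", NQuad: `_:bob <name> "Bob" .`},
		{Namespace: "tenantA", NQuad: `_:alice <age> "30" .`},
	}
	// Each batch would become one request made with that namespace's token.
	for ns, nquads := range groupByNamespace(mixed) {
		fmt.Printf("%s: %d n-quads\n", ns, len(nquads))
	}
}
```

With this shape, a writer that multiplexes many tenants over one queue issues one request per namespace present in the batch, rather than one request for the whole batch.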

Noted. This seems to be the crux of your concern. We will see about separating the two and will take this under advisement with the team. No promises :slight_smile:

We hear you. Thanks for the comments. We really appreciate such feedback that helps us come up with a feature set that appeals to the majority of use-cases, make a release, gather feedback and iterate over it.

I had raised the same question. This will also solve another related concern about the encryption keys being global and not per namespace. @ibrahim, please opine.

Thanks again. My colleagues will chime in as needed if I missed anything and/or keep me honest.

Thanks for the detailed response @Paras, very insightful! Appreciate the RFCs being public like this.

Here’s a couple followup comments to shine a little more light on use cases to help inform the future development of the feature :slight_smile:

Ah sorry yes, you are correct. My reference to “every request” is in a model where the processes writing to dgraph are ephemeral and stateless (to allow quick horizontal scaling in a burst), which is how we try to architect our services. There are solutions to that problem, but they are more painful than just provisioning the process with a long-lived certificate to talk administratively/globally, as we do with postgres/elastic/bigtable etc.

With a multi-tenant system of hundreds of customers, creating a message-queue topic for each is not feasible. Instead we multiplex certain request types over shared message queues. (In our current model using elasticsearch we batch 100s of requests, spanning tenants, into one insert request). Agreed this is likely a problem out of scope for this initial RFC, but I imagine the more asynchronous your writers are, the more this behavior will be desired.

@iluminae and I played with the balaji/later_multitenancy branch which we liked a lot (essentially tenancy without the ACL things). If there’s any more info we can provide, feel free to reach out. Thanks!
