This document is still in review. The details might change over time.
Motivation
Dgraph is currently single-tenant, which means you can run only a single namespace inside a Dgraph instance. Most databases support multi-tenancy (or multiple namespaces), which allows for logical partitioning of the database. In PostgreSQL, each namespace is called a database, and all data is associated with a namespace.
Multi-tenancy or logical partitioning allows multiple namespaces to run on a single dgraph instance in an isolated manner. The computing resources are shared between the namespaces but they are logically separated from each other.
Overview
For Dgraph, the namespace separation in multi-tenancy will be a logical separation, and it is a breaking change since we will be changing the structure of the keys. Dgraph will prefix predicates and types with the namespace while storing them in Badger. Multi-tenancy will require ACLs to be enabled, since a user should be able to perform queries/mutations in multiple namespaces but should not have access to all the namespaces.
Assumptions
- Each namespace acts as a logical silo. The data stored in one namespace will not be accessible by another namespace.
- Each user is part of a single namespace. Cross-namespace queries are not allowed. (We might support aggregation queries across multiple namespaces, but that is out of the scope of this RFC.)
- A user can be part of multiple namespaces but the user has to be created separately for each namespace.
- Each new cluster creates two namespaces: the system namespace and the default namespace. The system namespace stores the metadata about the namespaces and information about superusers. Superusers are the users who can create and delete namespaces. The default namespace is the default namespace for users; user data will be stored in this namespace unless a different namespace is specified.
- For open-source users, Dgraph will still have the system and default namespaces. The system namespace will be empty and everything will be stored in the default namespace.
Implementation Details
Most of the components of dgraph will require changes. This section lists all the changes required.
Representation of keys
Currently, the RDF triple <0x01> <follower> <0xab> . is represented in Badger as key=<follower, 0x01>. With multi-tenancy, it will be stored as key=<namespace><namespace-delimiter><follower, 0x01>. This is a breaking change and all the data has to be migrated to the new format (maybe via a migration command in alpha?).
The problem with keeping the current key format is that we cannot drop a namespace if its keys do not have a prefix.
Namespace and Predicate separator
We will be using byte 30 as the namespace separator because it is the ASCII record separator (RS) control character, which a user would not normally be able to enter. This namespace separator should be restricted at the lexer level: the data sent by the user should never contain the namespace separator.
NamespaceSeparator = byte(30)
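The prefixing and the separator restriction described above can be sketched as follows. This is a minimal illustration, not Dgraph's actual key encoding: the helper names NamespaceAttr and ParseNamespaceAttr and the error value are hypothetical, and the real change would live in the key-generation code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// NamespaceSeparator is byte 30, the ASCII record separator proposed in this RFC.
const NamespaceSeparator = byte(30)

// ErrReservedByte is a hypothetical error for input that contains the separator.
var ErrReservedByte = errors.New("input contains the reserved namespace separator")

// NamespaceAttr (hypothetical helper) prefixes a predicate with its namespace.
// It rejects any input that already contains the separator, which is the
// restriction the lexer is expected to enforce.
func NamespaceAttr(namespace, predicate string) (string, error) {
	sep := string([]byte{NamespaceSeparator})
	if strings.Contains(namespace, sep) || strings.Contains(predicate, sep) {
		return "", ErrReservedByte
	}
	return namespace + sep + predicate, nil
}

// ParseNamespaceAttr (hypothetical helper) splits a stored attribute back into
// its namespace and predicate at the first separator.
func ParseNamespaceAttr(attr string) (namespace, predicate string, err error) {
	i := strings.IndexByte(attr, NamespaceSeparator)
	if i < 0 {
		return "", "", errors.New("attribute has no namespace prefix")
	}
	return attr[:i], attr[i+1:], nil
}

func main() {
	attr, err := NamespaceAttr("foo", "follower")
	if err != nil {
		panic(err)
	}
	ns, pred, _ := ParseNamespaceAttr(attr)
	fmt.Printf("%q -> namespace=%s predicate=%s\n", attr, ns, pred)
}
```

Because the separator can never appear in user data, every key of a namespace shares one unambiguous prefix, which is what makes dropping a namespace by prefix possible.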
Access Control Lists (ACL)
This section including the diagram is out of date. @ibrahim to update this.
Multi-tenancy will depend heavily on ACLs. The kinds of queries/mutations a user can perform will depend on the ACLs. By default, the guardians group exists in Dgraph. A user in the guardians group will be able to create a namespace and assign users to ACL groups. When a new namespace is created, a guardians group for that namespace, called guardians-of-the-namespace, will also be created. Only a user in the dgraph-guardians group can add a user to, or delete a user from, guardians-of-the-namespace.
Each user can be part of multiple namespaces, which means they can be part of multiple namespace-guardians groups so that they are able to query multiple namespaces (one namespace at a time).
The dgraph-guardians can create/delete a namespace. The namespace-guardians can only query/mutate data in their namespace. They CANNOT delete the namespace they are part of.
dgraph.user.namespaces will contain the list of namespaces a user can access.
A user can be a member of multiple namespace-guardians groups, but they can perform queries/mutations on only a single namespace at a time.
Current ACL Schema
dgraph.xid string @index(exact) .
dgraph.password password .
dgraph.user.group [uid] @reverse .
dgraph.acl.rule [uid] .
dgraph.acl.predicate string @index(exact) @upsert .
dgraph.rule.permission int .
Proposed ACL Schema
dgraph.xid string @index(exact) .
dgraph.password password .
dgraph.user.group [uid] @reverse .
dgraph.acl.rule [uid] .
dgraph.acl.predicate string @index(exact) @upsert .
dgraph.rule.permission int .
dgraph.namespace string @reverse .
Creating a new namespace
A namespace will be created via /alter with the payload {"create_namespace": "foo"}. A namespace can only be created by a member of the dgraph-guardians group.
Deleting a namespace
A namespace can be dropped via /alter with the payload {"drop_namespace": "foo"}. Any member of the dgraph-guardians group can delete/drop a namespace. Note that members of namespace-guardians cannot delete a namespace; they can only perform queries/mutations.
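As a rough sketch of what a client could send, the two payloads can be produced from a single struct. The type NamespaceOp and the function AlterPayload are hypothetical names used for illustration only; the JSON keys are the ones proposed in this RFC.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// NamespaceOp (hypothetical type) mirrors the /alter payloads proposed above.
// Exactly one field should be set; omitempty drops the unused one.
type NamespaceOp struct {
	CreateNamespace string `json:"create_namespace,omitempty"`
	DropNamespace   string `json:"drop_namespace,omitempty"`
}

// AlterPayload serializes the operation into a request body for /alter.
// It returns an error unless exactly one of the two operations is requested.
func AlterPayload(op NamespaceOp) ([]byte, error) {
	if (op.CreateNamespace == "") == (op.DropNamespace == "") {
		return nil, errors.New("exactly one of create_namespace or drop_namespace must be set")
	}
	return json.Marshal(op)
}

func main() {
	body, err := AlterPayload(NamespaceOp{CreateNamespace: "foo"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"create_namespace":"foo"}
}
```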
In Badger, drop prefix is a stop-the-world operation: operations such as memtable flushes and compactions are stopped while it runs. If one user runs a drop-namespace operation, this could affect the writes of other users.
Queries, Mutations, and Schema
Each query and mutation will be associated with a namespace. The claims in the token determine the namespace that should be used for the query/mutation. Internally, all keys will be prefixed with <namespace>, and this namespace will be used for all the queries.
Comment by Manish - Should be done via keys.go, I think. Also, parseKey should work. So should prefixPredicate, and so on.
Similarly, the schema is updated via the ALTER command. Every client interaction will indicate (through the JWT token) which namespace it wants to interact with. Using that information, parseSchema will prefix the namespace to every predicate (e.g. default-name, derived from keys.go in the x package). Then the normal flow happens: zero decides where this predicate should belong, and MutateOverNetwork runs.
When querying for the schema, retrieval happens the same way, but before returning it to the user we will filter the schema by namespace and trim the predicate names: the namespace part of the predicate key should be removed when returning the schema to the user.
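The filter-and-trim step for schema queries could look roughly like this. It is a sketch under the assumption that stored predicate names have the form namespace, separator (byte 30), predicate; the function name SchemaForNamespace is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// namespaceSeparator is byte 30, as proposed in this RFC.
const namespaceSeparator = "\x1e"

// SchemaForNamespace (hypothetical helper) keeps only the predicates stored
// under the given namespace and strips the namespace prefix, so the user sees
// the predicate names they originally wrote.
func SchemaForNamespace(namespace string, stored []string) []string {
	prefix := namespace + namespaceSeparator
	var out []string
	for _, attr := range stored {
		if strings.HasPrefix(attr, prefix) {
			out = append(out, strings.TrimPrefix(attr, prefix))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	stored := []string{"foo\x1ename", "bar\x1ename", "foo\x1eage", "foobar\x1ex"}
	fmt.Println(SchemaForNamespace("foo", stored)) // prints [age name]
}
```

Note that the predicate stored under foobar is not returned for foo: including the separator in the prefix is what keeps namespaces with a common name prefix apart.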
Transactions
Transactions should also be at the namespace level. A transaction running in namespace “x” should not be affected by a transaction running in namespace “y”. Clients will send the namespace using the transaction context. For instance, in dgo, the user could call NewTransactionWithNamespace(foo), and foo would then be used while making requests to Dgraph. If the namespace is missing from the context, the default namespace will be used.
Currently, each transaction is blocked by a watermark in both zero and alpha. The blocking happens when we query for data in alpha, based on maxAssigned. We will need to separate the blocking mechanism by namespace. To handle transactions at the namespace level, we can create a namespace-level watermark and, after the commit phase, stream the respective maxAssigned of the namespace to all the alphas, so that pending reads at a higher readTs in that namespace are unblocked.
Transaction implementation details
UID leasing and transaction timestamps are going to work in the same way.
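Every zero oracle and alpha oracle is separated by namespace. After the commit phase, we have to stream the MaxAssignedTs for each namespace. The alpha oracle will have waiters per namespace.
In the alpha oracle:
// namespace -> startTs -> channel used to notify the waiting reader.
var waiters map[string]map[uint64]chan struct{}

for namespace, waitersForNamespace := range waiters {
	for startTs, notifyCh := range waitersForNamespace {
		// maxAssigned holds the latest MaxAssignedTs streamed for each namespace.
		// A reader at startTs may proceed once maxAssigned has reached it.
		if startTs <= maxAssigned[namespace] {
			close(notifyCh) // wake up the waiter
			delete(waitersForNamespace, startTs)
		}
	}
}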
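A self-contained version of the per-namespace waiter idea is sketched below. It is a simplified stand-in, not the alpha oracle itself: the type nsOracle and its methods are hypothetical, and it assumes a reader may proceed once maxAssigned for its namespace is at least its startTs.

```go
package main

import (
	"fmt"
	"sync"
)

// nsOracle is a simplified, hypothetical stand-in for the alpha oracle that
// tracks maxAssigned and pending readers separately for each namespace.
type nsOracle struct {
	mu          sync.Mutex
	maxAssigned map[string]uint64
	waiters     map[string]map[uint64][]chan struct{}
}

func newNsOracle() *nsOracle {
	return &nsOracle{
		maxAssigned: make(map[string]uint64),
		waiters:     make(map[string]map[uint64][]chan struct{}),
	}
}

// WaitForTs returns a channel that is closed once maxAssigned for the
// namespace has reached startTs.
func (o *nsOracle) WaitForTs(namespace string, startTs uint64) <-chan struct{} {
	ch := make(chan struct{})
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.maxAssigned[namespace] >= startTs {
		close(ch)
		return ch
	}
	if o.waiters[namespace] == nil {
		o.waiters[namespace] = make(map[uint64][]chan struct{})
	}
	o.waiters[namespace][startTs] = append(o.waiters[namespace][startTs], ch)
	return ch
}

// ProcessDelta records a new maxAssigned for one namespace and wakes only the
// waiters of that namespace whose startTs is now covered.
func (o *nsOracle) ProcessDelta(namespace string, maxAssigned uint64) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if maxAssigned <= o.maxAssigned[namespace] {
		return
	}
	o.maxAssigned[namespace] = maxAssigned
	for startTs, chans := range o.waiters[namespace] {
		if startTs > maxAssigned {
			continue
		}
		for _, ch := range chans {
			close(ch)
		}
		delete(o.waiters[namespace], startTs)
	}
}

// isReady reports whether the channel has been closed, without blocking.
func isReady(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	o := newNsOracle()
	foo := o.WaitForTs("foo", 5)
	bar := o.WaitForTs("bar", 5)
	o.ProcessDelta("foo", 10)
	fmt.Println("foo unblocked:", isReady(foo), "bar unblocked:", isReady(bar))
}
```

In this sketch a delta for foo leaves the reader in bar blocked, which is the isolation property the namespace-level watermark is meant to provide.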
In the zero oracle, we will have a watermark for each namespace.
doneUntil map[string]*y.WaterMark
There are two places where the delta is sent.
- While issuing timestamps. A lower ts may still be going through the commit phase, so we have to wait for transactions with a lower ts to complete before sending the delta (this mechanism already exists).
func (o *Oracle) storePending(ids *pb.AssignedIds, namespace string) {
	// Wait to finish up processing everything before start id.
	max := x.Max(ids.EndId, ids.ReadOnly)
	if err := o.doneUntil[namespace].WaitForMark(context.Background(), max); err != nil {
		glog.Errorf("Error while waiting for mark: %+v", err)
	}
	// Now send it out to updates, tagged with the namespace.
	o.updates <- &pb.OracleDelta{MaxAssigned: max, Namespace: namespace}
	o.Lock()
	defer o.Unlock()
	// maxAssigned is tracked per namespace.
	o.maxAssigned[namespace] = x.Max(o.maxAssigned[namespace], max)
}
- After applying the commit proposal. We need to send the higher delta to the alpha oracle so that the in-memory posting lists are written to disk.
func (o *Oracle) updateCommitStatus(index uint64, src *api.TxnContext) {
	// TODO: We should check if the tablet is in read-only status here.
	if o.updateCommitStatusHelper(index, src) {
		delta := new(pb.OracleDelta)
		delta.Txns = append(delta.Txns, &pb.TxnStatus{
			StartTs:   src.StartTs,
			CommitTs:  o.commitTs(src.StartTs),
			Namespace: src.Namespace, // proposed new field on both messages
		})
		o.updates <- delta
	}
}
Export and Bulk/Live Loader
The export works as it does today, but for each namespace we will create a new folder, and each folder will contain the exported RDF and schema files.
mutation {
  export(input: {format: "json", namespace:"foo"}) {
    response {
      message
      code
    }
  }
}
This will export the namespace foo to a folder foo in the export directory (by default this directory is called export).
To export all the namespaces, export(input: {format: "json", namespace:"*"}) can be used. The namespace param can actually be a regex, which allows exporting multiple namespaces.
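One way the namespace param could be resolved into export folders is sketched here. The function ExportDirs is hypothetical. Treating "*" as a special "all namespaces" value is an assumption made for this sketch, since "*" on its own is not a valid regular expression in Go; anchoring the pattern so that it must match the whole namespace name is likewise an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
)

// ExportDirs (hypothetical helper) returns the export folder for every
// namespace selected by pattern. The literal "*" selects all namespaces
// (an assumption of this sketch); any other value is compiled as a regular
// expression that must match the whole namespace name.
func ExportDirs(exportRoot, pattern string, namespaces []string) (map[string]string, error) {
	matchAll := pattern == "*"
	var re *regexp.Regexp
	if !matchAll {
		var err error
		re, err = regexp.Compile("^(?:" + pattern + ")$")
		if err != nil {
			return nil, fmt.Errorf("invalid namespace pattern %q: %w", pattern, err)
		}
	}
	dirs := make(map[string]string)
	for _, ns := range namespaces {
		if matchAll || re.MatchString(ns) {
			// One folder per namespace inside the export directory.
			dirs[ns] = filepath.Join(exportRoot, ns)
		}
	}
	return dirs, nil
}

func main() {
	all := []string{"default", "foo", "foobar", "bar"}
	dirs, err := ExportDirs("export", "foo.*", all)
	if err != nil {
		panic(err)
	}
	for ns, dir := range dirs {
		fmt.Printf("%s -> %s\n", ns, dir)
	}
}
```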
While importing via the bulk/live loader, we can figure out the namespace(s) from the folder names, and the user will be asked to confirm the namespace. The user can choose to import all data into the default namespace or pick a different namespace (via a command-line prompt) for each folder.
Backup and Restore
The backup contains a protobuf with all the metadata, and we can store information about the namespace in it. Restore can use the same information from the backup file. A prompt can confirm the namespace and then restore the data into the specified namespace.
Zero node
Zero supports the /moveTablet, /state and /assign endpoints. All these endpoints should be namespace aware. Since each namespace has its own timestamps, the /assign endpoint should also accept a namespace as a param (or via the header). The rebalancing of the tablets should also be namespace aware (or not?).
The /state endpoint on zero should show information about only the specified namespace.
Question - Do we need a way to show information about all the namespaces?
Alpha /admin endpoint
The current admin endpoint lists details such as ongoing indexing. The /admin endpoint should also be namespace aware. We can use the same Dgraph-namespace header here as well.
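Reading the namespace from the request header could be sketched as below. The header name Dgraph-namespace comes from this RFC and the fallback to the default namespace follows the rule stated for transactions; the handler stateHandler and the helper names are hypothetical and only echo the resolved namespace.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

const (
	namespaceHeader  = "Dgraph-namespace" // header name proposed in this RFC
	defaultNamespace = "default"
)

// namespaceFromRequest (hypothetical helper) resolves the namespace for a
// request, falling back to the default namespace when the header is absent.
func namespaceFromRequest(r *http.Request) string {
	if ns := r.Header.Get(namespaceHeader); ns != "" {
		return ns
	}
	return defaultNamespace
}

// stateHandler is a hypothetical stand-in for a namespace-aware endpoint such
// as /state; it only reports which namespace it would serve.
func stateHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "state for namespace %q\n", namespaceFromRequest(r))
}

// stateFor drives stateHandler with an in-memory request, setting the header
// only when a namespace is given, and returns the response body.
func stateFor(namespace string) string {
	req := httptest.NewRequest(http.MethodGet, "/state", nil)
	if namespace != "" {
		req.Header.Set(namespaceHeader, namespace)
	}
	rec := httptest.NewRecorder()
	stateHandler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Print(stateFor("foo"))
	fmt.Print(stateFor(""))
}
```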
GraphQL and SlashGraphQL?
Comment by Manish - Multi-tenancy should work with /admin GraphQL.
Needs more information - cc @Pawan
Testing
Todo - Figure out how to test multi-tenancy. Existing systests will not be enough.
Changes needed to the clients
All the clients need to support
- Namespace creation/deletion. This should be done via changes to the payload for the /alter call.
- Queries and mutations that optionally support specifying the namespace.
- Specifying the namespace in transactions.
Questions
- Do we need to return information about namespace to the user with each query? Should we add the namespace to the response header as well?
ibrahim - I think we should. It might be useful if the request is redirected for some reason.
- Separate badger storage for each namespace?
ibrahim - Multiple badger instances can be very costly in terms of memory/cpu.
Code diff from multi-tenancy call
diff --git a/dgraph/cmd/zero/assign.go b/dgraph/cmd/zero/assign.go
index 830346cd..ce5a75eb 100644
--- a/dgraph/cmd/zero/assign.go
+++ b/dgraph/cmd/zero/assign.go
@@ -127,6 +127,7 @@ func (s *Server) lease(ctx context.Context, num *pb.Num, txn bool) (*pb.Assigned
// If we have less available than what we need, we need to renew our lease.
if available < num.Val+1 { // +1 for a potential readonly ts.
// Blocking propose to get more ids or timestamps.
+ // All proposals would need to be namespace aware.
if err := s.Node.proposeAndWait(ctx, &proposal); err != nil {
return nil, err
}
@@ -144,6 +145,8 @@ func (s *Server) lease(ctx context.Context, num *pb.Num, txn bool) (*pb.Assigned
s.nextTxnTs++
out.ReadOnly = s.readOnlyTs
}
+ // We are namespace aware here. So, we pick the right oracle for the
+ // namespace. Update the doneUntil for that namespace.
s.orc.doneUntil.Begin(x.Max(out.EndId, out.ReadOnly))
} else {
out.StartId = s.nextLeaseId
diff --git a/dgraph/cmd/zero/oracle.go b/dgraph/cmd/zero/oracle.go
index b96c5cf8..4c6d47ef 100644
--- a/dgraph/cmd/zero/oracle.go
+++ b/dgraph/cmd/zero/oracle.go
@@ -40,6 +40,9 @@ type syncMark struct {
// Oracle stores and manages the transaction state and conflict detection.
type Oracle struct {
x.SafeMutex
+ // Namespace aware. Stores the namespace.
+ // Zero server holds a map of namespace -> Oracle.
+
commits map[uint64]uint64 // startTs -> commitTs
// TODO: Check if we need LRU.
keyCommit map[string]uint64 // fp(key) -> commitTs. Used to detect conflict.
@@ -269,6 +272,7 @@ func (o *Oracle) storePending(ids *pb.AssignedIds) {
}
// Now send it out to updates.
+ // Send with the namespace in it.
o.updates <- &pb.OracleDelta{MaxAssigned: max}
o.Lock()
@@ -433,6 +437,9 @@ func (s *Server) Oracle(_ *api.Payload, server pb.Zero_OracleServer) error {
if !s.Node.AmLeader() {
return errNotLeader
}
+ // I need to subscribe to k namespaces.
+ // Loop over the oracles corresponding to those namespaces and subscribe to
+ // those specifically.
ch, id := s.orc.newSubscriber()
defer s.orc.removeSubscriber(id)
@@ -499,6 +506,7 @@ func (s *Server) TryAbort(ctx context.Context,
}
// Timestamps is used to assign startTs for a new transaction
+// This would also have to be namespace aware.
func (s *Server) Timestamps(ctx context.Context, num *pb.Num) (*pb.AssignedIds, error) {
ctx, span := otrace.StartSpan(ctx, "Zero.Timestamps")
defer span.End()
@@ -511,6 +519,7 @@ func (s *Server) Timestamps(ctx context.Context, num *pb.Num) (*pb.AssignedIds,
reply, err := s.lease(ctx, num, true)
span.Annotatef(nil, "Response: %+v. Error: %v", reply, err)
+ // Get namespace oracle.
if err == nil {
s.orc.doneUntil.Done(x.Max(reply.EndId, reply.ReadOnly))
go s.orc.storePending(reply)
diff --git a/go.mod b/go.mod
index a261b5f9..dbeda460 100644
--- a/go.mod
+++ b/go.mod
@@ -32,6 +32,8 @@ require (
github.com/google/codesearch v1.0.0
github.com/google/uuid v1.0.0
github.com/minio/minio-go v0.0.0-20181109183348-774475480ffe
+ github.com/onsi/ginkgo v1.7.0 // indirect
+ github.com/onsi/gomega v1.4.3 // indirect
github.com/paulmach/go.geojson v0.0.0-20170327170536-40612a87147b
github.com/philhofer/fwd v1.0.0 // indirect
github.com/pkg/errors v0.8.1
diff --git a/posting/oracle.go b/posting/oracle.go
index 59619124..69095784 100644
--- a/posting/oracle.go
+++ b/posting/oracle.go
@@ -92,6 +92,7 @@ func (txn *Txn) Store(pl *List) *List {
return txn.cache.SetIfAbsent(string(pl.key), pl)
}
+// Per namespace.
type oracle struct {
x.SafeMutex
diff --git a/protos/pb/pb.pb.go b/protos/pb/pb.pb.go
index d73ad68b..8b084179 100644
--- a/protos/pb/pb.pb.go
+++ b/protos/pb/pb.pb.go
@@ -7,15 +7,16 @@ import (
context "context"
encoding_binary "encoding/binary"
fmt "fmt"
+ io "io"
+ math "math"
+ math_bits "math/bits"
+
pb "github.com/dgraph-io/badger/v2/pb"
api "github.com/dgraph-io/dgo/v2/protos/api"
proto "github.com/golang/protobuf/proto"
grpc "google.golang.org/grpc"
codes "google.golang.org/grpc/codes"
status "google.golang.org/grpc/status"
- io "io"
- math "math"
- math_bits "math/bits"
)
// Reference imports to suppress errors if they are not otherwise used.
@@ -1247,6 +1248,9 @@ func (m *License) GetEnabled() bool {
}
type ZeroProposal struct {
+ // Most likely no need to have namespace here. If we can do MaxAssigned w/ namespace,
+ // without having separate Txn timestamps / UIDs, then we're good. No need
+ // to change this.
SnapshotTs map[uint32]uint64 `protobuf:"bytes,1,rep,name=snapshot_ts,json=snapshotTs,proto3" json:"snapshot_ts,omitempty" protobuf_key:"varint,1,opt,name=key,proto3" protobuf_val:"varint,2,opt,name=value,proto3"`
Member *Member `protobuf:"bytes,2,opt,name=member,proto3" json:"member,omitempty"`
Tablet *Tablet `protobuf:"bytes,3,opt,name=tablet,proto3" json:"tablet,omitempty"`
@@ -3332,6 +3336,7 @@ func (m *TxnStatus) GetCommitTs() uint64 {
}
type OracleDelta struct {
+ // Should have Namespace as well.
Txns []*TxnStatus `protobuf:"bytes,1,rep,name=txns,proto3" json:"txns,omitempty"`
MaxAssigned uint64 `protobuf:"varint,2,opt,name=max_assigned,json=maxAssigned,proto3" json:"max_assigned,omitempty"`
GroupChecksums map[uint32]uint64 `protobuf:"bytes,3,rep,name=group_checksums,json=groupChecksums,proto3" json:"group_checksums,omitempty" protobuf_key:"varint,1,opt,name=key,proto3" protobuf_val:"varint,2,opt,name=value,proto3"`
@@ -3544,6 +3549,7 @@ func (m *RaftBatch) GetPayload() *api.Payload {
}
type Num struct {
+ // Would need to be namespace aware.
Val uint64 `protobuf:"varint,1,opt,name=val,proto3" json:"val,omitempty"`
ReadOnly bool `protobuf:"varint,2,opt,name=read_only,json=readOnly,proto3" json:"read_only,omitempty"`
Forwarded bool `protobuf:"varint,3,opt,name=forwarded,proto3" json:"forwarded,omitempty"`
Discussion with Manish
Multi-Tenancy in Dgraph
<dgraph.acl, user> → Bunch of things here.
With namespace,
<namespace, dgraph.acl, user> → Bunch of things.
Super user is the “default” namespace
<default, dgraph.acl, user> → Guardians of Dgraph
<foo, dgraph.acl, user> → Guardians of foo
Guiding Principles
- There should be no change in behavior for open source users.
Default Namespace
- Every key stored in Badger must have a namespace.
- By default, they’ll use “default” or something.
Relationship between foo Guardian and Default Guardian
- That way, either foo Guardian or default guardian can modify members of foo namespaces.
- Only default guardian can modify namespaces.
- Guardian of a foo namespace can drop the data in the namespace.
- Guardian of foo can’t delete the namespace itself because it was created by default guardian.
User access to Namespaces
Easier (User only has access to one namespace: 1:1)
- Keep the users across namespaces separate.
- Every user logs in to a particular namespace, gets a token.
- Pass in (namespace, token) to auth. Maybe the namespace can be within the token. In that case, no need for a separate namespace header.
- ACLs are enabled by default.
Harder (User has access to multiple namespaces: 1:many)
- Allow a user to operate across namespaces on the same token.
- Because, then we can’t do <namespace, dgraph.acl, user>
- <dgraph.acl, user> → bunch of namespaces, one password, etc.
- Then, the user ID has to be unique across all namespaces, which is just WRONG.
ACL
Current
<dgraph.xid, uid> → “xid” // Has an exact index for “xid” → uid.
<dgraph.group.acl, group-uid> → “string acl”
<dgraph.password, uid> → password
<dgraph.user.group, uid> → list of groups
With Namespace
<n1, dgraph.xid, uid> → “xid” // Has an exact index for “xid” → uid.
<n1, dgraph.group.acl, group-uid> → “string acl”
<n1, dgraph.password, uid> → password
<n1, dgraph.user.group, uid> → list of groups
Because everything is separate, doing the 1:many user-namespace is going to be hard. Maybe skip that for now.
Tip: Keep a watch on this, and see if 1:many would be possible via some other means.