Upsert Not Working As Expected?

Hi there,
There’s a good chance I’m not using upsert correctly since what I am trying to do should be trivial but I’m struggling a lot with making it work.

I Want to Do

I want to make sure a user cannot be created twice. Using an “email” predicate is the unique constraint. In the SQL world, this is much easier by just declaring that a column is unique. In Dgraph, it seems that the only way to do this is through a conditional if.

However, even with the conditional-if, it seems to not work when I hammer it with a few goroutines that try to create a user with the same email. It ends up with duplicate users.

What I Did

Concurrently created the same users over and over again using the upsert mechanism:

type service { *dgo.Dgraph }

func (s *service) CreateUser(ctx context.Context, email, hashedPassword string) error {
	mut := fmt.Sprintf(`
		_:user <dgraph.type> "User" .
		_:user <email> %q .
		_:user <hashedPassword> %q .
	`, email, hashedPassword)
	query := fmt.Sprintf(`query {
		v as var(func: eq(email, %q))
	}`, email)
	resp, err := s.NewTxn().Do(ctx, &api.Request{
		Query: query,
		Mutations: []*api.Mutation{{
			Cond:      `@if(eq(len(v), 0))`,
			SetNquads: []byte(mut),
		}},
		CommitNow: true,
	})
	if err != nil {
		return err
	}
	if len(resp.Uids) > 1 {
		return fmt.Errorf("unexpected uid results length: %d", len(resp.Uids))
	}
	if len(resp.Uids) == 0 {
		return fmt.Errorf("user already exists")
	}
	return nil
}

// elsewhere

func testConcurrentCreate(t *testing.T, s *service) {
	email := "me@gmail.com"
	errCh := make(chan error, 5)
	for i := 0; i < 5; i++ {
		go func() {
			errCh <- s.CreateUser(ctx, email, "123")
		}()
	}
	var errCount int
	for i := 0; i < 5; i++ {
		err := <-errCh
		if err == nil {
			continue
		}
		if err.Error() != "user already exists" {
			t.Fatal(err)
		}
		errCount++
	}
	if errCount != 4 {
		t.Fatalf("expected 4 duplicate entry errors but got %d", errCount)
	}
}

What I expected

I expected only 1 user would ever be created. But instead, I end up with a few duplicate users on every run.

Dgraph Metadata

dgraph version

docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 dgraph/standalone:v20.11.1

PASTE THE RESULTS OF dgraph version HERE.

dgraph version

[Decoder]: Using assembly version of decoder
Page Size: 4096

Dgraph version : v20.11.1
Dgraph codename : tchalla-1
Dgraph SHA-256 : cefdcc880c0607a92a1d8d3ba0beb015459ebe216e79fdad613eb0d00d09f134
Commit SHA-1 : 7153d13fe
Commit timestamp : 2021-01-28 15:59:35 +0530
Branch : HEAD
Go version : go1.15.5
jemalloc enabled : true

For Dgraph official documentation, visit https://dgraph.io/docs/.
For discussions about Dgraph , visit http://discuss.dgraph.io.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

Thank you!

1 Like

I think I may have been missing the “@upsert” directive on the schema declaration. That’s a tricky one :slight_smile:

I’m now getting “ErrAborted” as opposed to just succeeding and returning zero resp.Uids. And then the second time I retry, then I do end up getting zero resp.Uids.

Would love to know if this is the right pattern.

Thanks

1 Like

Nope, you don’t need this directive. This is used for another purpose but similar. It is used to avoid concurrent inserts for the same field on the transaction level (on the fly).

Looking at your code. Feels like it is fine. Have you tried via Ratel with pure DQL? That might mitigate some issue related to Go Coding or DQL.

e.g:

upsert {
  query {
    v as varg(func: eq(email, "test@dgraph.io")){
      uid
      email
    }
  }

  mutation @if(eq(len(v), 0)) {
    set {
      uid(v) <email> "test@dgraph.io" .
    }
  }
}

This always returns the same UID. BTW, you don’t need the cond, only if you wanna avoid inserts.

Note that a single typo in the query block will continue creating new nodes endless. Make sure that _:user <email> %q . is aways printing the same as the query.

I’m not sure it’s possible to test with Ratel as I am testing running the query concurrently. For example, 5 people trying to create a “test@dgraph.io” email around the same exact time.

Without putting the @upsert query, I am not able to ensure uniqueness (via retries as you mentioned).

Here’s an end to end Go program that reproduces the issue:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v200"
	"github.com/dgraph-io/dgo/v200/protos/api"
	"google.golang.org/grpc"
)

var ctx = context.Background()

func main() {
	c := newClient("localhost:9080")
	err := c.Alter(ctx, &api.Operation{
		Schema: `
			email: string @index(exact) .
			hashedPassword: string @index(exact) .

			type User {
				email
				hashedPassword
			}
		`,
	})
	must(err)
	errCh := make(chan error, 10)
	for i := 0; i < 10; i++ {
		go func() {
			errCh <- createUser(c, "test@dgraph.io")
		}()
	}
	var errCount int
	for i := 0; i < 10; i++ {
		err := <-errCh
		if err == nil {
			continue
		}
		if err.Error() != "user already exists" {
			must(err)
		}
		errCount++
	}
	if errCount != 9 {
		log.Fatalf("expected 9 duplicate entry errors but got %d", errCount)
	}
}

func createUser(s *dgo.Dgraph, email string) error {
	mut := fmt.Sprintf(`
			_:user <dgraph.type> "User" .
			_:user <email> %q .
			_:user <hashedPassword> %q .
		`, email, "password")
	query := fmt.Sprintf(`query {
			v as var(func: eq(email, %q))
		}`, email)
	resp, err := s.NewTxn().Do(ctx, &api.Request{
		Query: query,
		Mutations: []*api.Mutation{{
			Cond:      `@if(eq(len(v), 0))`,
			SetNquads: []byte(mut),
		}},
		CommitNow: true,
	})
	if err != nil {
		return err
	}
	if len(resp.Uids) > 1 {
		return fmt.Errorf("unexpected uid results length: %d", len(resp.Uids))
	}
	if len(resp.Uids) == 0 {
		return fmt.Errorf("user already exists")
	}
	return nil
}

func newClient(url string) *dgo.Dgraph {
	d, err := grpc.Dial(url, grpc.WithInsecure())
	must(err)

	return dgo.NewDgraphClient(
		api.NewDgraphClient(d),
	)
}

func must(err error) {
	if err != nil {
		panic(err)
	}
}

Unless I’m doing something wrong above, is this a bug? Thanks

I see, so the @upsert directive is for your case too.

I don’t know. I personally don’t have a big experience with go code. I understand your code, but I can’t see any problem with it. I hope somebody else is able to check it.

1 Like

Can somebody prove upsert is working in any way. Im experiencing the same issue

Well, the proof is that everytime some dev create a feature. He also create tests See New upsert block by mangalaman93 · Pull Request #3412 · dgraph-io/dgraph · GitHub - So it works. But can happen that some use cases need improvement. And for this you can open a new ticket to let us reproduce and give you the right direction. And please, don’t respond in old posts. Just open a new one and you can give the link of this post is if you need as reference. Thank you!

Closing this.