Query by regex not working in transaction

tron · November 20, 2020, 11:02pm

I think I’ve found a bug related to transactions and querying by regex in the JavaScript gRPC client. Here are the steps:

Start a transaction.
Insert a node.
Query for the node by UID, and it will be found.
Query for the node by regex, and it will not be found.
Abort the transaction.

If I do this instead, there’s no problem:

Start a transaction.
Insert a node.
Commit and discard the transaction.
Start a new transaction
Query for the node by UID, and it will be found.
Query for the node by regex, and it will be found.

I know what follows is a lot of code, but it precisely reproduces the problem I’m seeing. I tried to write this code so that anyone could copy/paste it into a file, say regex.js, and then simply execute node regex.js to see what I’m seeing:

const { DgraphClient, DgraphClientStub, Mutation, Operation } = require('dgraph-js')

test()

async function test() {
  const stub = new DgraphClientStub('localhost:9080')
  const client = new DgraphClient(stub)
  await setSchema(client)

  await failure(client)
  await success(client)

  stub.close()
}

async function failure(client) {
  const txn = client.newTxn()
  const alice = await insertUser(txn, 'Alice')
  const aliceByUid = await findUserByUid(txn, alice.uid)
  const aliceByName = await findUsersByName(txn, 'lic', 10)
  await txn.discard()

  console.log('failure:')
  console.log('  aliceByUid is:', aliceByUid)
  console.log('  aliceByName is:', aliceByName)
}

async function success(client) {
  let txn = client.newTxn()
  const alice = await insertUser(txn, 'Alice')
  await txn.commit()
  await txn.discard()

  txn = client.newTxn()
  const aliceByUid = await findUserByUid(txn, alice.uid)
  const aliceByName = await findUsersByName(txn, 'lic', 10)

  console.log('success:')
  console.log('  aliceByUid is:', aliceByUid)
  console.log('  aliceByName is:', aliceByName)
}

function setSchema(client) {
  const operation = new Operation()

  operation.setSchema(`
    type User {
      User.name
    }
    User.name: string @index(trigram) .
  `)

  return client.alter(operation)
}

async function insertUser(txn, name) {
  const user = {
    'dgraph.type': 'User',
    'User.name': name,
    uid: '_:user'
  }

  const mutation = new Mutation()
  mutation.setSetJson(user)

  const response = await txn.mutate(mutation)
  user.uid = response.getUidsMap().get('user')

  return user
}

async function findUserByUid(txn, uid) {
  const query = `
    query user($uid: string) {
      user(func: uid($uid)) {
        User.name
        uid
      }
    }
  `

  const response = await txn.queryWithVars(query, { $uid: uid })
  return response.getJson().user[0]
}

async function findUsersByName(txn, name, limit) {
  const query = `
    query users($regex: string, $limit: int) {
      users(func: regexp(User.name, $regex), first: $limit) {
        User.name
        uid
      }
    }
  `

  const response = await txn.queryWithVars(query, { $regex: `/${name}/i`, $limit: `${limit}` })
  return response.getJson()?.users
}

Here’s what prints to my console:

failure:
  aliceByUid is: { 'User.name': 'Alice', uid: '0xc365' }
  aliceByName is: []
success:
  aliceByUid is: { 'User.name': 'Alice', uid: '0xc366' }
  aliceByName is: [ { 'User.name': 'Alice', uid: '0xc366' } ]

Am I doing something wrong, or is it a legitimate bug that the regex query doesn’t work in the “failure” case?

amaster507 · November 20, 2020, 11:07pm

If I understand the stack thorough enough, the regex filter relies on an index. I would think that the index is not updated until the transaction is committed. I haven’t messed with the Javascript client much at all so I don’t have specific proof if my hypothesis and understanding is correct.

tron · November 20, 2020, 11:36pm

That may explain what’s happening, but does that mean queries depending on an index live outside of Dgraph’s ACID transaction support?

amaster507 · November 20, 2020, 11:56pm

Idk, we need an expert. @MichelDiz ?

MichelDiz · November 23, 2020, 4:45am

@tron why do you need to query with Regex in the same transaction? In general, we just grab the uid and do the next insertions until we commit. I can’t see why you are doing like this.

As far I know, the transaction stays in RAM until you commit. So, it won’t have any index or way to access. Only the Upsert Procedure could. You can try to use this Upsert

tron · November 23, 2020, 5:43pm

why do you need to query with Regex in the same transaction?

I’m just trying to understand how Dgraph transactions work, and I provided this example because it surprised me.

I’m coming from relational databases like PostgreSQL and Microsoft SQL where both the data and the index are updated immediately for the current the transaction. So if you start a transaction, then insert, then query based on an index, the query result can contain what you just inserted.

I guess Dgraph just doesn’t support this.

azafoo · November 23, 2020, 7:12pm

Let’s say I write an api that uses query-with-regex. I’m going to want to write an integration test for it. And because I’m keeping an eye on the future where I have many tests, I would want to make sure that the test data is fully isolated so tests start from a known clean position and can run in parallel. A way to do this, ideally, would be to run any test within a transaction and rollback without committing. No test would impact another test.

However, if I cannot query by regex within a transaction, using newly inserted but not committed test data, then I would need to commit the test data before issuing the query.

This means it would not be possible to run my tests in parallel because I would need to run them serially and wipe the test data in each test setup.

MichelDiz · November 23, 2020, 7:56pm

Is it possible (in other DBs) and makes sense to run tests without commit? That feels like a hacky way of testing things.

I’m not sure if that’s a good thing to keep transactions accessible in RAM through the cluster. That might consume a lot of RAM. I’m not against it but feels unusual. And if there’s is demand - why not?

azafoo · November 23, 2020, 8:15pm

Transactions that are ACID, are Isolated. It makes sense to rely on this property for test isolation, if your database supports a full set of queries within the uncommitted transaction like Postgres does. I argue that this is not hacky at all! In fact, the I in ACID guarantees it. How else can you guarantee the integrity of parallel tests?

By default, when you checkout/1 a connection, it’s set to sandbox: true . This means that the adapter starts a database transaction for your test, making all changes to the database tentative and therefore invisible to other connections. When the test is complete, the adapter does not commit the transaction but rolls it back, abandoning all changes made within it and leaving other tests happily unaware.

Topic		Replies	Views
Some trouble with the final query Dgraph Clients untagged , dgraph-js	2	468	July 11, 2020
Trying to delete from query result in dgraph-js Dgraph kind:question	3	559	September 24, 2020
Got upsert error: rpc error: code = Unknown desc = while lexing query Users	3	1452	November 22, 2019
Dgraph Client mutation is not working? Dgraph mutation , dgraph , client	3	840	October 27, 2021
Grpc error when running query Dgraph kind:question	1	837	December 28, 2020

Query by regex not working in transaction

Related topics