Query by regex not working in transaction

I think I’ve found a bug related to transactions and querying by regex in the JavaScript gRPC client. Here are the steps:

  1. Start a transaction.
  2. Insert a node.
  3. Query for the node by UID, and it will be found.
  4. Query for the node by regex, and it will not be found.
  5. Abort the transaction.

If I do this instead, there’s no problem:

  1. Start a transaction.
  2. Insert a node.
  3. Commit and discard the transaction.
  4. Start a new transaction
  5. Query for the node by UID, and it will be found.
  6. Query for the node by regex, and it will be found.

I know what follows is a lot of code, but it precisely reproduces the problem I’m seeing. I tried to write this code so that anyone could copy/paste it into a file, say regex.js, and then simply execute node regex.js to see what I’m seeing:

const { DgraphClient, DgraphClientStub, Mutation, Operation } = require('dgraph-js')

test()

async function test() {
  const stub = new DgraphClientStub('localhost:9080')
  const client = new DgraphClient(stub)
  await setSchema(client)

  await failure(client)
  await success(client)

  stub.close()
}

async function failure(client) {
  const txn = client.newTxn()
  const alice = await insertUser(txn, 'Alice')
  const aliceByUid = await findUserByUid(txn, alice.uid)
  const aliceByName = await findUsersByName(txn, 'lic', 10)
  await txn.discard()

  console.log('failure:')
  console.log('  aliceByUid is:', aliceByUid)
  console.log('  aliceByName is:', aliceByName)
}

async function success(client) {
  let txn = client.newTxn()
  const alice = await insertUser(txn, 'Alice')
  await txn.commit()
  await txn.discard()

  txn = client.newTxn()
  const aliceByUid = await findUserByUid(txn, alice.uid)
  const aliceByName = await findUsersByName(txn, 'lic', 10)

  console.log('success:')
  console.log('  aliceByUid is:', aliceByUid)
  console.log('  aliceByName is:', aliceByName)
}

function setSchema(client) {
  const operation = new Operation()

  operation.setSchema(`
    type User {
      User.name
    }
    User.name: string @index(trigram) .
  `)

  return client.alter(operation)
}

async function insertUser(txn, name) {
  const user = {
    'dgraph.type': 'User',
    'User.name': name,
    uid: '_:user'
  }

  const mutation = new Mutation()
  mutation.setSetJson(user)

  const response = await txn.mutate(mutation)
  user.uid = response.getUidsMap().get('user')

  return user
}

async function findUserByUid(txn, uid) {
  const query = `
    query user($uid: string) {
      user(func: uid($uid)) {
        User.name
        uid
      }
    }
  `

  const response = await txn.queryWithVars(query, { $uid: uid })
  return response.getJson().user[0]
}

async function findUsersByName(txn, name, limit) {
  const query = `
    query users($regex: string, $limit: int) {
      users(func: regexp(User.name, $regex), first: $limit) {
        User.name
        uid
      }
    }
  `

  const response = await txn.queryWithVars(query, { $regex: `/${name}/i`, $limit: `${limit}` })
  return response.getJson()?.users
}

Here’s what prints to my console:

failure:
  aliceByUid is: { 'User.name': 'Alice', uid: '0xc365' }
  aliceByName is: []
success:
  aliceByUid is: { 'User.name': 'Alice', uid: '0xc366' }
  aliceByName is: [ { 'User.name': 'Alice', uid: '0xc366' } ]

Am I doing something wrong, or is it a legitimate bug that the regex query doesn’t work in the “failure” case?

1 Like

If I understand the stack thorough enough, the regex filter relies on an index. I would think that the index is not updated until the transaction is committed. I haven’t messed with the Javascript client much at all so I don’t have specific proof if my hypothesis and understanding is correct.

That may explain what’s happening, but does that mean queries depending on an index live outside of Dgraph’s ACID transaction support?

Idk, we need an expert. @MichelDiz ?

@tron why do you need to query with Regex in the same transaction? In general, we just grab the uid and do the next insertions until we commit. I can’t see why you are doing like this.

As far I know, the transaction stays in RAM until you commit. So, it won’t have any index or way to access. Only the Upsert Procedure could. You can try to use this https://www.youtube.com/watch?v=lIUNIR7L7jw

1 Like

why do you need to query with Regex in the same transaction?

I’m just trying to understand how Dgraph transactions work, and I provided this example because it surprised me.

I’m coming from relational databases like PostgreSQL and Microsoft SQL where both the data and the index are updated immediately for the current the transaction. So if you start a transaction, then insert, then query based on an index, the query result can contain what you just inserted.

I guess Dgraph just doesn’t support this.

Let’s say I write an api that uses query-with-regex. I’m going to want to write an integration test for it. And because I’m keeping an eye on the future where I have many tests, I would want to make sure that the test data is fully isolated so tests start from a known clean position and can run in parallel. A way to do this, ideally, would be to run any test within a transaction and rollback without committing. No test would impact another test.

However, if I cannot query by regex within a transaction, using newly inserted but not committed test data, then I would need to commit the test data before issuing the query.

This means it would not be possible to run my tests in parallel because I would need to run them serially and wipe the test data in each test setup.

Is it possible (in other DBs) and makes sense to run tests without commit? That feels like a hacky way of testing things.

I’m not sure if that’s a good thing to keep transactions accessible in RAM through the cluster. That might consume a lot of RAM. I’m not against it but feels unusual. And if there’s is demand - why not?