Query by regex not working in transaction

I think I’ve found a bug related to transactions and querying by regex in the JavaScript gRPC client. Here are the steps:

  1. Start a transaction.
  2. Insert a node.
  3. Query for the node by UID, and it will be found.
  4. Query for the node by regex, and it will not be found.
  5. Abort the transaction.

If I do this instead, there’s no problem:

  1. Start a transaction.
  2. Insert a node.
  3. Commit and discard the transaction.
  4. Start a new transaction
  5. Query for the node by UID, and it will be found.
  6. Query for the node by regex, and it will be found.

I know what follows is a lot of code, but it precisely reproduces the problem I’m seeing. I tried to write this code so that anyone could copy/paste it into a file, say regex.js, and then simply execute node regex.js to see what I’m seeing:

const { DgraphClient, DgraphClientStub, Mutation, Operation } = require('dgraph-js')

test()

async function test() {
  const stub = new DgraphClientStub('localhost:9080')
  const client = new DgraphClient(stub)
  await setSchema(client)

  await failure(client)
  await success(client)

  stub.close()
}

async function failure(client) {
  const txn = client.newTxn()
  const alice = await insertUser(txn, 'Alice')
  const aliceByUid = await findUserByUid(txn, alice.uid)
  const aliceByName = await findUsersByName(txn, 'lic', 10)
  await txn.discard()

  console.log('failure:')
  console.log('  aliceByUid is:', aliceByUid)
  console.log('  aliceByName is:', aliceByName)
}

async function success(client) {
  let txn = client.newTxn()
  const alice = await insertUser(txn, 'Alice')
  await txn.commit()
  await txn.discard()

  txn = client.newTxn()
  const aliceByUid = await findUserByUid(txn, alice.uid)
  const aliceByName = await findUsersByName(txn, 'lic', 10)

  console.log('success:')
  console.log('  aliceByUid is:', aliceByUid)
  console.log('  aliceByName is:', aliceByName)
}

function setSchema(client) {
  const operation = new Operation()

  operation.setSchema(`
    type User {
      User.name
    }
    User.name: string @index(trigram) .
  `)

  return client.alter(operation)
}

async function insertUser(txn, name) {
  const user = {
    'dgraph.type': 'User',
    'User.name': name,
    uid: '_:user'
  }

  const mutation = new Mutation()
  mutation.setSetJson(user)

  const response = await txn.mutate(mutation)
  user.uid = response.getUidsMap().get('user')

  return user
}

async function findUserByUid(txn, uid) {
  const query = `
    query user($uid: string) {
      user(func: uid($uid)) {
        User.name
        uid
      }
    }
  `

  const response = await txn.queryWithVars(query, { $uid: uid })
  return response.getJson().user[0]
}

async function findUsersByName(txn, name, limit) {
  const query = `
    query users($regex: string, $limit: int) {
      users(func: regexp(User.name, $regex), first: $limit) {
        User.name
        uid
      }
    }
  `

  const response = await txn.queryWithVars(query, { $regex: `/${name}/i`, $limit: `${limit}` })
  return response.getJson()?.users
}

Here’s what prints to my console:

failure:
  aliceByUid is: { 'User.name': 'Alice', uid: '0xc365' }
  aliceByName is: []
success:
  aliceByUid is: { 'User.name': 'Alice', uid: '0xc366' }
  aliceByName is: [ { 'User.name': 'Alice', uid: '0xc366' } ]

Am I doing something wrong, or is it a legitimate bug that the regex query doesn’t work in the “failure” case?

1 Like

If I understand the stack thorough enough, the regex filter relies on an index. I would think that the index is not updated until the transaction is committed. I haven’t messed with the Javascript client much at all so I don’t have specific proof if my hypothesis and understanding is correct.

That may explain what’s happening, but does that mean queries depending on an index live outside of Dgraph’s ACID transaction support?

Idk, we need an expert. @MichelDiz ?

@tron why do you need to query with Regex in the same transaction? In general, we just grab the uid and do the next insertions until we commit. I can’t see why you are doing like this.

As far I know, the transaction stays in RAM until you commit. So, it won’t have any index or way to access. Only the Upsert Procedure could. You can try to use this Upsert

1 Like

why do you need to query with Regex in the same transaction?

I’m just trying to understand how Dgraph transactions work, and I provided this example because it surprised me.

I’m coming from relational databases like PostgreSQL and Microsoft SQL where both the data and the index are updated immediately for the current the transaction. So if you start a transaction, then insert, then query based on an index, the query result can contain what you just inserted.

I guess Dgraph just doesn’t support this.

Let’s say I write an api that uses query-with-regex. I’m going to want to write an integration test for it. And because I’m keeping an eye on the future where I have many tests, I would want to make sure that the test data is fully isolated so tests start from a known clean position and can run in parallel. A way to do this, ideally, would be to run any test within a transaction and rollback without committing. No test would impact another test.

However, if I cannot query by regex within a transaction, using newly inserted but not committed test data, then I would need to commit the test data before issuing the query.

This means it would not be possible to run my tests in parallel because I would need to run them serially and wipe the test data in each test setup.

Is it possible (in other DBs) and makes sense to run tests without commit? That feels like a hacky way of testing things.

I’m not sure if that’s a good thing to keep transactions accessible in RAM through the cluster. That might consume a lot of RAM. I’m not against it but feels unusual. And if there’s is demand - why not?

Transactions that are ACID, are Isolated. It makes sense to rely on this property for test isolation, if your database supports a full set of queries within the uncommitted transaction like Postgres does. I argue that this is not hacky at all! In fact, the I in ACID guarantees it. How else can you guarantee the integrity of parallel tests?

By default, when you checkout/1 a connection, it’s set to sandbox: true . This means that the adapter starts a database transaction for your test, making all changes to the database tentative and therefore invisible to other connections. When the test is complete, the adapter does not commit the transaction but rolls it back, abandoning all changes made within it and leaving other tests happily unaware.