Running upsert in python

The README of GitHub - dgraph-io/pydgraph: Official Dgraph Python client explains upserts, but there does not seem to be any direct support for an upsert like

upsert {  
  query {
    # get the uids of all Country nodes
     countries as var (func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
        uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}

derived from How to bulk delete nodes or cascade delete nodes?

How can this upsert be applied directly with the Python client?

The client code seems to be overly complex and inconsistent with the documentation. For example, there has been a discussion on the PR adding txn.upsert (rough draft by Kingloko · Pull Request #89 · dgraph-io/pydgraph · GitHub), but the documentation says that with curl something like:

curl -H "Content-Type: application/rdf" -X POST localhost:8080/mutate?commitNow=true -d $'
upsert {
  query {
    q(func: eq(email, "user@company1.io")) {
      v as uid
      name
    }
  }

  mutation {
    set {
      uid(v) <name> "first last" .
      uid(v) <email> "user@company1.io" .
    }
  }
}' | jq

is possible, so why doesn’t the Python client simply support upsert(upsertCommand)?

Hi @WolfgangFahl,

I tested the following and it worked fine from python.

  • Assume this data model.
<OrgLocation>: string @index(exact) .
<OrgName>: string @index(exact) .
  • Create a type “Org” with the above 2 predicates.

  • Data is created as below.

{
  set{
    _:newOrg <dgraph.type> "Org" .
    _:newOrg <OrgName> "Org in D" .
    _:newOrg <OrgLocation> "D" .
  }
}
  • I just tested the following script.
import pydgraph



def main():
    client_stub = pydgraph.DgraphClientStub('localhost:9080')
    client = pydgraph.DgraphClient(client_stub)
    txn = client.txn()
    print('Connection opened!')
    query = """{
      V as var(func: type(Org)) @filter(eq(OrgLocation, \"D\"))
    }"""
    nquad = """
      uid(V) * *  .
    """
    mutation = txn.create_mutation(del_nquads=nquad)
    request = txn.create_request(query=query, mutations=[mutation], commit_now=True)
    txn.do_request(request)
    print('Transaction executed!')

    # Close the client stub.
    client_stub.close()


if __name__ == '__main__':
    try:
        main()
        print('DONE!')
    except Exception as e:
        print('Error: {}'.format(e))
  • When I query using the uid of the data just created, I see that the data has been deleted.
{
  query(func:  uid("0x34")){
    uid
    OrgName
    OrgLocation
  }
}

Please try this and let us know.
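For the Country example from the question, the same split might look like the sketch below. It assumes pydgraph is installed and a Dgraph Alpha is reachable over gRPC; `delete_nodes_of_type` is a hypothetical helper name, not part of pydgraph. The content of the upsert's query block goes into the `query` argument and the content of the delete block goes into `del_nquads`.

```python
def delete_nodes_of_type(client, type_name):
    """Delete all nodes of the given type in a single upsert request.

    `client` is expected to be a pydgraph.DgraphClient.
    Hypothetical helper, not part of pydgraph itself.
    """
    # Content of the upsert's query block: bind matching uids to a variable.
    query = "{ nodes as var(func: type(%s)) }" % type_name
    # Content of the mutation's delete block.
    del_nquads = "uid(nodes) * * ."
    txn = client.txn()
    try:
        mutation = txn.create_mutation(del_nquads=del_nquads)
        request = txn.create_request(
            query=query, mutations=[mutation], commit_now=True
        )
        return txn.do_request(request)
    finally:
        # Clean up the transaction if the request failed before committing.
        txn.discard()


# Usage (assumes a running Dgraph Alpha on localhost:9080):
#   import pydgraph
#   stub = pydgraph.DgraphClientStub('localhost:9080')
#   delete_nodes_of_type(pydgraph.DgraphClient(stub), 'Country')
#   stub.close()
```

The `upsert { ... }`, `query` and `mutation { delete { ... } }` wrappers are dropped; only their contents are passed to the client.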

@anand thank you for looking into this. Yes, this is the awkward behavior explained in the documentation of the Python library. It is inconsistent with the upsert syntax. I’d like to use the upsert command directly and not have to split and rename the parts. E.g. nquad is not part of the documentation of upsert. I think the Python library is outdated and not consistent with the current state of Dgraph.

Also, I was not able to get my example working by splitting it, and your example is different. This is a bug report, so please improve the library to be consistent with your upsert command documentation and to show the same behavior as when using curl directly. What would be the workaround to get the curl behavior with the current state of the Python library? I looked into the source code but could not find anything comparable to the curl request. The Python library seems to make a lot of assumptions about the structure of a transaction and does not look open to additions and extensions that come with new versions of Dgraph.
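One possible workaround, sketched under the assumption that the Alpha's HTTP port (8080 in the curl example) is reachable, is to bypass pydgraph and send the complete upsert block to the `/mutate` endpoint with Python's standard library. The function names `build_upsert_request` and `run_upsert` are hypothetical, and the default URL is taken from the curl example above.

```python
import json
import urllib.request


def build_upsert_request(upsert_block, base_url="http://localhost:8080"):
    """Build the same POST request that the curl example sends."""
    return urllib.request.Request(
        url=base_url + "/mutate?commitNow=true",
        data=upsert_block.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )


def run_upsert(upsert_block, base_url="http://localhost:8080"):
    """Send the upsert block and return the decoded JSON response."""
    request = build_upsert_request(upsert_block, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (assumes a running Dgraph Alpha with HTTP on localhost:8080):
#   result = run_upsert('''
#   upsert {
#     query {
#       countries as var(func: type(Country))
#     }
#     mutation {
#       delete {
#         uid(countries) * * .
#       }
#     }
#   }''')
```

This keeps the upsert text identical to what works in Ratel and with curl, at the cost of not using the gRPC client or its transaction handling.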

Thanks for the feedback @WolfgangFahl. I will raise a Jira ticket on the documentation front.

Also @WolfgangFahl, what is your feedback on this part of the doc? I just want to make sure I get the problem statement right.

https://dgraph.io/docs/mutations/conditional-upsert/ describes the syntax of an upsert statement:

upsert {
  query <query block>
  [fragment <fragment block>]
  mutation [@if(<condition>)] <mutation block 1>
  [mutation [@if(<condition>)] <mutation block 2>]
  ...
}

This is very abstract, and I had trouble finding a working example for the use case of deleting all nodes with a certain type, so I posted the Stack Overflow question: dgraph - How to delete all nodes with a given type? - Stack Overflow

I didn’t get an answer, so I tried things out myself based on the example in: How to bulk delete nodes or cascade delete nodes? and came up with the solution:
https://stackoverflow.com/a/63358827/1497139

upsert {  
  query {
    # get the uids of all Country nodes
     countries as var (func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
        uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}

The upsert works in the Ratel Web GUI using the “mutation” tab. It also is supposed to work via curl.

I then wanted to use this in my code https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/dg/dgraph.py which is a wrapper for Dgraph that makes available the functions:

  • addSchema
  • addData
  • query
  • drop_all
  • close

in a straightforward way, as outlined in the Python unit test:

 def testCountries(self):
        ''' 
        test handling countries
        '''
        countryJsonUrl="https://gist.githubusercontent.com/erdem/8c7d26765831d0f9a8c62f02782ae00d/raw/248037cd701af0a4957cce340dabb0fd04e38f4c/countries.json"
        with urllib.request.urlopen(countryJsonUrl) as url:
            countryList=json.loads(url.read().decode())
        #print(countryList)    
        cg=Dgraph(debug=True)
        cg.drop_all()
        schema='''
name: string @index(exact) .
code: string @index(exact) .     
capital: string .   
location: geo .
type Country {
   code
   name
   location
   capital
}'''
        cg.addSchema(schema)
        startTime=time.time()
        for country in countryList:
            # rename dictionary keys
            #country['name']=country.pop('Name')
            country['code']=country.pop('country_code')
            country['dgraph.type']='Country'
            lat,lng=country.pop('latlng')
            country['location']={'type': 'Point', 'coordinates': [lng,lat] }
            print(country) 
        cg.addData(countryList)
        elapsed=time.time() - startTime
        print("adding %d countries took %5.1f s" % (len(countryList),elapsed)) 
        query='''{
# list of countries
  countries(func: has(code)) {
    uid
    name
    code
    capital
    location
  }
}'''
        queryResult=cg.query(query) 
        self.assertTrue("countries" in queryResult)
        countries=queryResult["countries"]
        self.assertEqual(247,len(countries))
        schemaResult=cg.query("schema{}")
        print(schemaResult)
        self.assertTrue("schema" in schemaResult)
        schema=schemaResult["schema"]
        self.assertEqual(8,len(schema))
        # see http://discuss.dgraph.io/t/running-upsert-in-python/9364
        """mutation='''
        upsert {  
  query {
    # get the uids of all Country nodes
     countries as var (func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
        uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}'''
        cg.mutate(mutation)"""
        cg.close()

Please note how the upsert is commented out - this is the issue. The Python library should IMHO support the upsert syntax directly and allow calling upsert with an upsert command as per the documentation of its syntax. For me it does not make sense that the parts:

  • upsert syntax documentation
  • upsert example works via Ratel GUI
  • upsert example works via curl

are consistent and then the way things are handled is completely different in the Python library.

This inconsistency is very frustrating and will make adopting Dgraph hard, since there is an extra step that needs some trial and error and fiddling. For the time being I don’t even have a workaround for getting things working with the current Python library. I’d appreciate a workaround example, but the workaround is not the solution! It is very important to fix the library. Again: if you do not fix the library, there will be a very bad inconsistency making life hard for developers, because your documentation, your GUI, your curl examples and your library will not behave in the same way.


By the way, I’d appreciate it if you’d add my class Dgraph to the official Python library. I think it helps with handling the library and makes life easier. You could also adopt the Python unit tests and potentially the script for starting Dgraph; see dgraph - Starting zero alpha and ratel in a single command e.g. in MacOSX and other environments - Stack Overflow. IMHO the setup of Dgraph is much easier with the dgraph script (it has a usage message) and the wrapper class.