Running upsert in python

WolfgangFahl · August 11, 2020, 1:30pm

The pydgraph repository (GitHub - dgraph-io/pydgraph: Official Dgraph Python client) explains upserts, but there seems to be no direct support for an upsert like:

upsert {  
  query {
    # get the uids of all Country nodes
     countries as var (func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
        uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}

It is derived from “How to bulk delete nodes or cascade delete nodes?”.

How can this upsert be applied directly with the Python client?

The client code seems overly complex and inconsistent with the documentation. For example, there has been a discussion on the PR “adding txn.upsert rough draft” by Kingloko (Pull Request #89 · dgraph-io/pydgraph · GitHub), but the docs say that with curl something like:

curl -H "Content-Type: application/rdf" -X POST localhost:8080/mutate?commitNow=true -d $'
upsert {
  query {
    q(func: eq(email, "user@company1.io")) {
      v as uid
      name
    }
  }

  mutation {
    set {
      uid(v) <name> "first last" .
      uid(v) <email> "user@company1.io" .
    }
  }
}' | jq

is possible, so why doesn’t the Python client simply support upsert(upsertCommand)?

anand · August 11, 2020, 4:42pm

Hi @WolfgangFahl,

I tested the following and it worked fine from python.

Assume this data model.

<OrgLocation>: string @index(exact) .
<OrgName>: string @index(exact) .

Create a type “Org” with the above 2 predicates.
Data is created as below.

{
  set{
    _:newOrg <dgraph.type> "Org" .
    _:newOrg <OrgName> "Org in D" .
    _:newOrg <OrgLocation> "D" .
  }
}

I just tested the following script.

import pydgraph


def main():
    # Connect to the Dgraph alpha gRPC endpoint.
    client_stub = pydgraph.DgraphClientStub('localhost:9080')
    client = pydgraph.DgraphClient(client_stub)
    print('Connection opened!')
    txn = client.txn()
    try:
        # Query part of the upsert: bind the uids of all Org nodes located in "D".
        query = """{
          V as var(func: type(Org)) @filter(eq(OrgLocation, "D"))
        }"""
        # Mutation part: delete every predicate of the matched nodes.
        nquad = """
          uid(V) * * .
        """
        mutation = txn.create_mutation(del_nquads=nquad)
        request = txn.create_request(query=query, mutations=[mutation], commit_now=True)
        txn.do_request(request)
        print('Transaction executed!')
    finally:
        # Discard is a no-op after a successful commit.
        txn.discard()
        # Close the client stub.
        client_stub.close()


if __name__ == '__main__':
    try:
        main()
        print('DONE!')
    except Exception as e:
        print('Error: {}'.format(e))

When I query using the uid of the data just created, I see that the data has been deleted.

{
  query(func:  uid("0x34")){
    uid
    OrgName
    OrgLocation
  }
}

Please try this and let us know.
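For the Country example from the original question, the same pattern could be wrapped in a small helper. This is only a sketch and not part of pydgraph: `run_upsert_delete`, `COUNTRY_QUERY` and `COUNTRY_DELETE` are hypothetical names, and the helper relies only on the `txn.create_mutation`, `txn.create_request` and `txn.do_request` calls used in the script above. The `client` argument is assumed to be a `pydgraph.DgraphClient`.

```python
def run_upsert_delete(client, query, del_nquads):
    """Run a query plus a delete mutation as one upsert request.

    Hypothetical helper; `client` is assumed to be a pydgraph.DgraphClient.
    """
    txn = client.txn()
    try:
        mutation = txn.create_mutation(del_nquads=del_nquads)
        request = txn.create_request(
            query=query, mutations=[mutation], commit_now=True
        )
        return txn.do_request(request)
    finally:
        # Cleans up if the request failed before the commit went through.
        txn.discard()


# Query part of the upsert block: bind the uids of all Country nodes.
COUNTRY_QUERY = """{
  countries as var(func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
    uid
  }
}"""

# Body of the delete block: remove every predicate of the matched nodes.
COUNTRY_DELETE = "uid(countries) * * ."
```

With a connected client, `run_upsert_delete(client, COUNTRY_QUERY, COUNTRY_DELETE)` would then correspond to the upsert block from the question, with the `query` and `delete` parts passed separately.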

WolfgangFahl · August 11, 2020, 4:50pm

@anand thank you for looking into this. Yes, this is the awkward behavior explained in the documentation of the Python library. It is inconsistent with the upsert syntax. I’d like to use the upsert command directly and not have to split and rename the parts. For example, nquad is not part of the upsert documentation. I think the Python library is outdated and not consistent with the current state of Dgraph.

Also, I was not able to get my example working by splitting it, and your example is different. This is a bug report, so please improve the library to be consistent with your upsert command documentation and to show the same behavior as using curl directly. What would be the workaround to get the curl behavior with the current state of the Python library? I looked into the source code but could not find anything comparable to the curl request. The Python library seems to make a lot of assumptions about the structure of a transaction and does not look open to the additions and extensions that come with new versions of Dgraph.
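One possible way to get the curl behavior from Python is to skip pydgraph and send the same HTTP request with the standard library. The sketch below assumes the endpoint, port and Content-Type header from the curl example (`localhost:8080/mutate?commitNow=true`, `application/rdf`); `build_upsert_request` and `run_upsert` are hypothetical names, not part of any Dgraph library.

```python
import urllib.request


def build_upsert_request(upsert_block, base_url="http://localhost:8080"):
    """Build the same POST request as the curl example (hypothetical helper)."""
    return urllib.request.Request(
        url=base_url + "/mutate?commitNow=true",
        data=upsert_block.encode("utf-8"),
        headers={"Content-Type": "application/rdf"},
        method="POST",
    )


def run_upsert(upsert_block, base_url="http://localhost:8080"):
    """Send the upsert block to a running Dgraph alpha and return the response text."""
    request = build_upsert_request(upsert_block, base_url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `run_upsert(upsert)` with the complete `upsert { ... }` text would then send the block unchanged. Note that this goes through the HTTP port (8080) rather than the gRPC port (9080) that pydgraph connects to.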

anand · August 11, 2020, 4:52pm

Thanks for the feedback @WolfgangFahl. I will raise a Jira ticket on the documentation front.

anand · August 11, 2020, 4:57pm

Also @WolfgangFahl, what is your feedback on this part of the doc? I just want to make sure I get the problem statement right.

WolfgangFahl · August 12, 2020, 6:13am

https://dgraph.io/docs/mutations/conditional-upsert/ describes the syntax of an upsert statement:

upsert {
  query <query block>
  [fragment <fragment block>]
  mutation [@if(<condition>)] <mutation block 1>
  [mutation [@if(<condition>)] <mutation block 2>]
  ...
}

This is very abstract, and I had trouble finding a working example for the use case of deleting all nodes of a certain type, so I posted the Stack Overflow question: dgraph - How to delete all nodes with a given type? - Stack Overflow

I didn’t get an answer, so I tried things out myself based on the example in “How to bulk delete nodes or cascade delete nodes?” and came up with this solution:
https://stackoverflow.com/a/63358827/1497139

upsert {  
  query {
    # get the uids of all Country nodes
     countries as var (func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
        uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}

The upsert works in the Ratel Web GUI using the “mutation” tab. It also is supposed to work via curl.
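To illustrate how the parts of such an upsert block relate to the separate arguments the Python client expects, here is a rough sketch that splits the block by brace matching. `split_upsert` and `_block_body` are hypothetical names, not pydgraph functions. The sketch assumes a single `mutation` block without an `@if` condition and no braces inside string literals or comments, so it is not a real parser.

```python
import re


def _block_body(text, keyword):
    """Return the text inside the first `keyword { ... }` block, or None if absent."""
    match = re.search(r"\b" + re.escape(keyword) + r"\s*\{", text)
    if match is None:
        return None
    depth = 1
    for pos in range(match.end(), len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return text[match.end():pos]
    raise ValueError("unbalanced braces after " + keyword)


def split_upsert(upsert_block):
    """Split an upsert block into query, set and delete parts (hypothetical helper)."""
    query_body = _block_body(upsert_block, "query")
    mutation_body = _block_body(upsert_block, "mutation") or ""
    set_body = _block_body(mutation_body, "set")
    delete_body = _block_body(mutation_body, "delete")
    return {
        "query": None if query_body is None else "{" + query_body + "}",
        "set_nquads": None if set_body is None else set_body.strip(),
        "del_nquads": None if delete_body is None else delete_body.strip(),
    }


EXAMPLE = """upsert {
  query {
    countries as var(func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
      uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}"""

parts = split_upsert(EXAMPLE)
```

The resulting `parts["query"]` could be passed as `query=` to `txn.create_request`, and `parts["set_nquads"]` and `parts["del_nquads"]` to `txn.create_mutation`, which is the split the Python client currently requires.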

I then wanted to use this in my code https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/dg/dgraph.py which is a wrapper for Dgraph that makes available the functions:

addSchema
addData
query
drop_all
close

in a straightforward way, as outlined in the Python unit test:

WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testDgraph.py

'''
Created on 2020-07-23

@author: wf
'''
import unittest
from storage.simple import Simple
from storage.dgraph import Dgraph
from lodstorage.sample import Sample
import getpass
import time

class TestDgraph(unittest.TestCase):
    ''' test Dgraph database '''


    def setUp(self):
        self.host='localhost'
        #if getpass.getuser()=="wf":
        #    self.host='venus'

This file has been truncated.

    def testCountries(self):
        ''' 
        test handling countries
        '''
        countryJsonUrl="https://gist.githubusercontent.com/erdem/8c7d26765831d0f9a8c62f02782ae00d/raw/248037cd701af0a4957cce340dabb0fd04e38f4c/countries.json"
        with urllib.request.urlopen(countryJsonUrl) as url:
            countryList=json.loads(url.read().decode())
        #print(countryList)    
        cg=Dgraph(debug=True)
        cg.drop_all()
        schema='''
name: string @index(exact) .
code: string @index(exact) .     
capital: string .   
location: geo .
type Country {
   code
   name
   location
   capital
}'''
        cg.addSchema(schema)
        startTime=time.time()
        for country in countryList:
            # rename dictionary keys
            #country['name']=country.pop('Name')
            country['code']=country.pop('country_code')
            country['dgraph.type']='Country'
            lat,lng=country.pop('latlng')
            country['location']={'type': 'Point', 'coordinates': [lng,lat] }
            print(country) 
        cg.addData(countryList)
        elapsed=time.time() - startTime
        print("adding %d countries took %5.1f s" % (len(countryList),elapsed)) 
        query='''{
# list of countries
  countries(func: has(code)) {
    uid
    name
    code
    capital
    location
  }
}'''
        queryResult=cg.query(query) 
        self.assertTrue("countries" in queryResult)
        countries=queryResult["countries"]
        self.assertEqual(247,len(countries))
        schemaResult=cg.query("schema{}")
        print(schemaResult)
        self.assertTrue("schema" in schemaResult)
        schema=schemaResult["schema"]
        self.assertEqual(8,len(schema))
        # see http://discuss.dgraph.io/t/running-upsert-in-python/9364
        """mutation='''
        upsert {  
  query {
    # get the uids of all Country nodes
     countries as var (func: has(<dgraph.type>)) @filter(eq(<dgraph.type>, "Country")) {
        uid
    }
  }
  mutation {
    delete {
      uid(countries) * * .
    }
  }
}'''
        cg.mutate(mutation)"""
        cg.close()

Please note how the upsert is commented out - this is the issue. The Python library should IMHO support the upsert syntax directly and allow calling upsert with an upsert command as per the documentation of its syntax. For me it does not make sense that the parts:

upsert syntax documentation
upsert example works via Ratel GUI
upsert example works via curl

are consistent, and then things are handled completely differently in the Python library.

This inconsistency is very frustrating and will make adopting Dgraph hard, since there is an extra step that needs some trial and error and fiddling. For the time being I don’t even have a workaround to get things working with the current Python library. I’d appreciate a workaround example, but the workaround is not the solution! It is very important to fix the library. Again: if you do not fix the library, there will be a very bad inconsistency making life hard for developers, because your documentation, your GUI, your curl and your library will not behave in the same way.

WolfgangFahl · August 12, 2020, 6:19am

By the way, I’d appreciate it if you’d add my class Dgraph to the official Python library. I think it helps in handling the library and makes life easier. You can also adopt the Python unit tests and potentially the script for starting Dgraph, see “dgraph - Starting zero alpha and ratel in a single command e.g. in MacOSX and other environments - Stack Overflow”. IMHO the setup of Dgraph is much easier with the dgraph script (it has a usage message) and the wrapper class.
