Loading schema/data into Dgraph (running in docker) from file (in Windows folder), automatically

Hi,

I don’t know if this is a Docker issue, Windows issue, or me being a newb issue. Maybe all, or maybe none (though probably the last one). Believe me please, I have read, and re-read, the deploy docs.

I first tried running the “getting started” command in a cmd prompt:

docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ~/dgraph:/dgraph dgraph/standalone:v20.03.0

Then in the Docker dashboard, I open a CLI window and try to give the following kinds of commands (for example):

dgraph live -s ./toy_schema
dgraph live -s C:\Users\user1\subfolder\toy_schema
dgraph live -s <C:\Users\user1\subfolder\toy_schema>
dgraph live -s c:/Users/user1/subfolder/toy_schema

These all fail. The cmd prompt is open in the folder where the schema file is, which is why I tried the first one, but it says “file or folder not found”. With the second one, the Docker CLI makes the " \ " disappear, so that doesn’t work. The third fails because it doesn’t like the <>. The last one just says “file or folder not found” again.

If I use the docker-compose.yml file instead, and run:

docker-compose up

in the cmd prompt, then again open the CLI window and run those commands, they all fail because of

Running transaction with dgraph endpoint: 127.0.0.1:9080
While trying to setup connection: context deadline exceeded. Retrying…

If it’s possible for either of these to work (hopefully I’m just doing something wrong), a follow-up question: can this all be done automatically from one cmd prompt command? For example, by adding something to the docker-compose.yml that executes the dgraph live command, or with a single docker run command (I did see that docker run can use /bin/bash -c to chain multiple dgraph commands).

EDIT: and yes I did share the C: drive, in Docker settings->resources->file sharing.

Now I have added the following to docker-compose.yml under volumes for zero and alpha:

c:/Users/user1/subfolder:/data

I think that will fix the find file problem. Now my command in the Docker CLI window is:

dgraph live -s data/toy_schema -a localhost:9080 -z localhost:5080

And I am selecting the CLI on the zero service from the Docker dashboard. It still gives the error of:

Running transaction with dgraph endpoint: localhost:9080
While trying to setup connection: context deadline exceeded. Retrying…
2020/06/10 20:47:27 Could not setup connection after 1 retries

Could we see your docker-compose.yaml file?
Are you running dgraph live in a container, or from Windows PowerShell or command shell?

Your Windows environment is Windows 10 Pro + Docker-for-Desktop?

version: “3.2”
services:
zero:
image: dgraph/dgraph:latest
volumes:
- /tmp/data:/dgraph
- c:/Users/user1/subfolder:/data
ports:
- 5080:5080
- 6080:6080
restart: on-failure
command: dgraph zero --my=zero:5080
alpha:
image: dgraph/dgraph:latest
volumes:
- /tmp/data:/dgraph
- c:/Users/user1/subfolder:/data
ports:
- 8080:8080
- 9080:9080
restart: on-failure
command: dgraph alpha --my=alpha:7080 --lru_mb=2048 --zero=zero:5080
ratel:
image: dgraph/dgraph:latest
ports:
- 8000:8000
command: dgraph-ratel

Windows 10 Enterprise + Docker Desktop (for Windows) 2.3.0.3

the dgraph live commands are in a command-line window that pops up after I click on the “CLI” button in the Docker dashboard for “zero_1”. I assume that has to be done, I don’t think I can open a new cmd prompt and do it, because that wouldn’t be inside the container?

With Docker-for-Desktop, you can use PowerShell and run the docker command and docker-compose. Docker-for-Desktop will create a VM running Linux, which has the docker service running inside the VM. It does some networking magic to make it seem as if docker is running on localhost, so that port mapping 8080:8080 will work from the Windows host, even though it is actually from the container running on the Hyper-V VM.

I think the formatting got ruined for the docker-compose.yaml as the indentation is not correct. One thing I can spot is that /tmp/data will not work, because Windows doesn’t have a /tmp.

I could suggest using relative paths, for example, in PowerShell:

mkdir $home/dgraph_project
cd $home/dgraph_project
mkdir data

Then, in that directory, you can mount a local directory of ./data. Using your docker-compose, but changing the volume path, you could use this one.

version: "3.2"
services:
  zero:
    image: dgraph/dgraph:latest
    ports:
    - 5080:5080
    - 6080:6080
    volumes:
    - "./data:/data"
    command: dgraph zero --my=zero:5080
  alpha:
    image: dgraph/dgraph:latest
    ports:
    - 8080:8080
    - 9080:9080
    volumes:
    - "./data:/data"
    command: dgraph alpha --my=alpha:7080 --lru_mb=2048 --zero=zero:5080
  ratel:
    image: dgraph/dgraph:latest
    ports:
    - 8000:8000
    command: dgraph-ratel

In PowerShell, assuming your current path is $home/dgraph_project with a docker-compose.yaml in that directory, you can bring this up with

docker-compose up -d

You can test the cluster health through alpha on localhost:8080.

Ratel should be reachable on localhost:8000.

And also check whether gRPC is working through 9080 with this:

dgraph increment --alpha localhost:9080

See if this works. Once we have a healthy cluster, we can try other things.

Thanks for the comments and help @joaquin !!

  • I was just trying to use the yml file at the link given in the Dgraph docs for Deploy (https://dgraph.io/docs/deploy/#run-using-docker-compose-on-single-aws-instance)
    perhaps this is assumed to be used in Linux. There doesn’t seem to be any instruction on modifications or otherwise using this in Windows.
  • yes the indentation got squashed somehow when I pasted it into my previous reply, but it was there
  • I created “dgraph_data” dir in the folder with the docker-compose.yml, and updated to your code above except changing “./data:/data” to “./dgraph_data:/data”
  • running the following in Powershell works fine, starts that container in Docker:

docker-compose up -d

  • all of the localhost:8080 and localhost:8000 links work fine (although Ratel reports that v20.03.3 is “Health, Dgraph upgrade available” with a yellow dot, instead of green)
  • however, the Powershell is then not capable of running your dgraph command, with the error

dgraph : The term ‘dgraph’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

  • if I instead click the CLI button on the Docker (sub-)container for alpha_1, I can run that command just fine, and it says

# dgraph increment --alpha localhost:9080
[Decoder]: Using assembly version of decoder

Running transaction with dgraph endpoint: localhost:9080
0611 03:03:49.066 Counter VAL: 2 [ Ts: 20 ] Latency: Q 4ms M 5ms S 9ms C 10ms D 1ms
Total: 12ms

  • but if I put my schema and data files in the “dgraph_data” folder (so they’re mounted into /data by the compose yml), still the same errors persist:

# dgraph live -s data/toy_schema -a localhost:9080 -z localhost:5080
[Decoder]: Using assembly version of decoder
I0611 15:06:28.986022 57 init.go:99]

Dgraph version : v20.03.3
Dgraph SHA-256 : 08424035910be6b6720570427948bab8352a0b5a6d59a0d20c3ec5ed29533121
Commit SHA-1 : fa3c19120
Commit timestamp : 2020-06-02 16:47:25 -0700
Branch : HEAD
Go version : go1.14.1

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

Running transaction with dgraph endpoint: localhost:9080
2020/06/11 15:06:38 context deadline exceeded
Unable to connect to zero, Is it running at localhost:5080?
github.com/dgraph-io/dgraph/x.Checkf
/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:51
github.com/dgraph-io/dgraph/dgraph/cmd/live.setup
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:383
github.com/dgraph-io/dgraph/dgraph/cmd/live.run
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:443
github.com/dgraph-io/dgraph/dgraph/cmd/live.init.0.func1
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:120
github.com/spf13/cobra.(*Command).execute
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:70
main.main
/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:78
runtime.main
/usr/local/go/src/runtime/proc.go:203
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1373

And what I mean by “sub-containers” is this, from the Docker Dashboard:

each of those (alpha_1,zero_1,ratel_1) has its own button for CLI, that can bring up a separate shell window, with sometimes different behaviors on commands.

On the documentation, you are correct: the docs are definitely oriented toward a POSIX shell like bash or zsh (as on macOS or Linux). I think we can improve them to be as cross-platform as possible.

Dgraph on Windows host: If you want to run dgraph cli from the Windows host, you can download it from: Downloads | Dgraph

Dgraph from Docker container: If you use dgraph cli from the docker container, you can do something like this from powershell to enter a bash shell inside the container:

# get current directory name (docker compose uses this in naming containers)
$cwd = "$(Split-Path -Path $(Get-Location) -Leaf)"
# login container managed by docker compose
docker exec -ti ${cwd}_alpha_1 bash

This might be the same thing as how you got there through the Docker-for-Desktop UI. I’m trying this out with docker-machine on Windows 10 Home, which doesn’t have Hyper-V.

The toy_schema, is this something available online, or something you created? Usually for the schema, I would upload both schema and data, not just the schema.

For example, inside the alpha container:

PREFIX=https://github.com/dgraph-io/benchmarks/raw/master/data/
curl --silent --location --remote-name $PREFIX/21million.schema
curl --silent --location --remote-name $PREFIX/21million.rdf.gz
dgraph live \
 -f 21million.rdf.gz \
 -s 21million.schema \
 -a alpha:9080 \
 -z zero:5080
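
A note on the addresses in that command, offered as a likely explanation rather than a confirmed diagnosis: inside the Compose network every container has its own localhost, so localhost:9080 from the zero container (or localhost:5080 from the alpha container) points at the container itself, not at the other service. That would be consistent with the "context deadline exceeded" errors earlier in the thread. The sketch below only prints the command line for each vantage point; live_cmd and the container/host labels are made-up names, while alpha and zero are the service names from the compose file.

```shell
# Print the dgraph live command line for a given vantage point.
# "container": a shell inside any container on the Compose network.
# "host": the Windows host with a locally installed dgraph binary
#         (relies on the published ports 9080 and 5080).
live_cmd() {
  where=$1 data=$2 schema=$3
  case "$where" in
    container) alpha=alpha:9080 zero=zero:5080 ;;
    host)      alpha=localhost:9080 zero=localhost:5080 ;;
    *) echo "usage: live_cmd container|host DATA SCHEMA" >&2; return 1 ;;
  esac
  echo "dgraph live -f $data -s $schema -a $alpha -z $zero"
}

live_cmd container /data/toy_data.rdf /data/toy_schema
```

The file paths in the last line are placeholders for wherever the files are mounted inside the container.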

Hi again @joaquin ,

They’re just small examples I made up, to see if I could get loading schema/data from file to work.

toy_schema:

type Person {
name
age
friend
owns_pet
owns_vehicle
}

type Dog {
name
breed
}

type Cat {
name
breed
}

type Car {
name
make
}

name: string @index(term) .
age: int .
friend: [uid] .
owns_pet: [uid] .
owns_vehicle: [uid] .
breed: string .
make: string .

toy_data:

{
set {
_:michael “Michael” .
_:michael <dgraph.type> “Person” .
_:michael “39” .
_:michael <owns_pet> _:bark .
_:michael _:luke .

_:luke <name> "Luke" .
_:luke <dgraph.type> "Person" .
_:luke <age> "77" .
_:luke <owns_pet> _:meow .
_:luke <friend> _:michael .

_:bark <name> "Bark" .
_:bark <dgraph.type> "Dog" .
_:bark <breed> "Great Dane" .

_:meow <name> "Meow" .
_:meow <dgraph.type> "Cat" .
_:meow <breed> "Siamese" .

}
}

I’ve tried it both ways, and the different errors still persist.

If I go with the single command line (the “standalone”), then when I do the CLI shell from Docker, it says it can’t find the file.

in Powershell (doesn’t matter if I use “” around the volume path, results are the same):

docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ./dgraph_data:/data dgraph/standalone:v20.03.0

in Docker CLI (there is only a single container, in this version):

# dgraph live -f dgraph_data/toy_data.rdf -s dgraph_data/toy_schema.schema -a localhost:9080 -z localhost:5080
[Decoder]: Using assembly version of decoder
I0611 21:22:40.070467 75 init.go:99]

Dgraph version : v20.03.0
Dgraph SHA-256 : 07e63901be984bd20a3505a2ee5840bb8fc4f72cc7749c485f9f77db15b9b75a
Commit SHA-1 : 147c8df9
Commit timestamp : 2020-03-30 17:28:31 -0700
Branch : HEAD
Go version : go1.14.1

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph , visit http://discuss.dgraph.io.
To say hi to the community , visit https://dgraph.slack.com.

Licensed variously under the Apache Public License 2.0 and Dgraph Community License.
Copyright 2015-2020 Dgraph Labs, Inc.

Running transaction with dgraph endpoint: localhost:9080

Processing schema file “dgraph_data/toy_schema.schema”
2020/06/11 21:22:40 open dgraph_data/toy_schema.schema: no such file or directory

If I do it from docker-compose, the error still persists the same when using the above dgraph live command (with both data and schema defined).

I think we would prefer to use the docker-compose, if possible. What is the “standalone” version, and why does it only show one container in Docker? In any case, neither seem to work.

EDIT: for the standalone version, I try every variation of the command, with just using data/toy … or just using the rdf file, etc. It always says it can’t find the file.
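
A possible cause, judging only from the commands shown: with -v ./dgraph_data:/data the host folder appears inside the container as /data, so a path starting with dgraph_data/ does not exist there, and a relative data/... only resolves if the shell happens to be in /. A small sketch of the mapping follows; container_path is a hypothetical helper, not part of Docker or Dgraph.

```shell
# Map a file under the mounted host folder to its path inside the container.
# host_dir and mount_point are the two halves of "-v host_dir:mount_point".
container_path() {
  host_dir=${1%/} mount_point=${2%/} host_file=$3
  case "$host_file" in
    "$host_dir"/*)
      rel=${host_file#"$host_dir"/}   # strip the host folder prefix
      echo "$mount_point/$rel"
      ;;
    *)
      echo "not under $host_dir: $host_file" >&2
      return 1
      ;;
  esac
}

container_path ./dgraph_data /data ./dgraph_data/toy_schema.schema
```

Under that assumption the live loader arguments would be absolute container paths such as /data/toy_schema.schema.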

I wanted to answer the standalone vs. docker-compose (get to other questions later). Standalone was meant to be a one-liner simple way to run dgraph with docker run, and not meant for production.

For example:

docker run --rm -it -p 8080:8080 -p 9080:9080 -p 8000:8000 -v ./dgraph_data:/dgraph dgraph/standalone:latest

Personally, I prefer having multiple containers with docker-compose as this is closer to an eventual production environment. It also helps you learn the various components, their roles, ports, etc.

On live loader: I noticed a few things that would cause problems:

  • toy_data: missing type for <name> and <age> in first node
  • toy_data: smart quotes “” instead of ASCII quotes ". Maybe copy/paste error.
  • toy_data and toy_schema: CR/LF - I don’t think this would cause a problem, but just in case, converted it to LF.
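
The quote and line-ending items in the list above can be cleaned up mechanically. This is a minimal sketch, assuming a UTF-8 file and a POSIX shell; clean_rdf is a made-up name, and it does not restore predicates that were lost in pasting.

```shell
# Replace curly double quotes with ASCII quotes and drop carriage returns.
# Reads the file given as $1 and writes the cleaned text to stdout.
clean_rdf() {
  tr -d '\r' < "$1" | sed -e 's/“/"/g' -e 's/”/"/g'
}

# Small demonstration on a throwaway file.
sample=$(mktemp)
printf '_:michael <name> “Michael” .\r\n' > "$sample"
clean_rdf "$sample"
rm -f "$sample"
```

Redirecting the output to a new file (rather than overwriting the input) keeps the original available for comparison.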

The data needs to be in RDF or JSON format for live loader. I converted your data from the GraphQL+- form to JSON format and then tested it with live loader.

Thus, once you are logged into the container with docker exec -ti $container_name bash, you would want to run something like:

## create schema file
cat <<-EOF > toy_schema
type Person {
  name
  age
  friend
  owns_pet
  owns_vehicle
}

type Dog {
  name
  breed
}

type Cat {
  name
  breed
}

type Car {
  name
  make
}

name: string @index(term) .
age: int .
friend: [uid] .
owns_pet: [uid] .
owns_vehicle: [uid] .
breed: string .
make: string .
EOF

## create data file in JSON format
cat <<-EOF > toy_data.json
{
  "set": [
    {
      "uid": "_:michael",
      "dgraph.type": "Person",
      "name": "Michael",
      "age": "39",
      "owns_pet": { "uid": "_:bark" },
      "friend": { "uid": "_:luke" }
    },
    {
      "uid": "_:luke",
      "dgraph.type": "Person",
      "name": "Luke",
      "age": "77",
      "owns_pet": { "uid": "_:meow" }
    },
    {
      "uid": "_:bark",
      "dgraph.type": "Dog",
      "name": "Bark",
      "breed": "Great Dane"
    },
    {
      "uid": "_:meow",
      "dgraph.type": "Cat",
      "name": "Meow",
      "breed": "Siamese"
    }
  ]
}
EOF

## use live loader
dgraph live -f toy_data.json -s toy_schema -a alpha:9080 -z zero:5080

Afterwards, I can run a query in Ratel:

{
  pets_friends(func: eq(name, "Michael")) {
    name
    age
    owns_pet {
      breed 
    }
    friend {
      name
      owns_pet {
        breed
      }
    }
  }
}

This will show something like this:

[screenshot: ratel_snippet]

I also tried this out by loading the schema in the original format with corrections on a clean system (docker-compose stop; docker-compose rm; docker-compose up -d), and it got the same results.

Thanks again @joaquin for the continued great support !!

  • agree that we prefer to use the docker-compose file over the standalone command

  • the pasting from sublime (or maybe it is the blockcode function in the reply window) is stripping out/changing things that it shouldn’t be. I will just post screenshots, to avoid confusion.

  • docker-compose.yml (change to ./dgraph_data:/dgraph , as you had it above)

  • toy_schema (note: I think there is some confusion here? My understanding is that Dgraph must use the GraphQL+- language for the schema, and it does not appear you made any changes to this file?)

  • toy_data.rdf

  • toy_data.json (this one, you converted from rdf to json, which I took as you have it above)

  • with the schema/data files mounted (rather than using cat), and then using the docker exec command and your version of dgraph live command above, it works with the json file ! Maybe in the live command, it was important to use alpha: and zero: instead of localhost: ? And/or , maybe the files needed to mount to /dgraph instead of /data ? I’m not sure which things did the trick

  • but when doing with the rdf file, there is error

  • our actual data files are already in rdf format, and we prefer not to convert them to json if possible

  • lastly, is there any way you know of to embed these commands into the docker-compose file, so everything loads up on a new machine (with Docker installed) with a single command? If not, we can probably create a script file
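
On the last point, one option is a small wrapper script next to the docker-compose.yml. The following is a sketch under stated assumptions, not a tested recipe: it assumes bash or another POSIX shell (for example Git Bash or WSL on Windows), curl on the host, alpha's HTTP port published on localhost:8080, and the service names alpha and zero. The function names are made up. A successful health check may not guarantee that alpha is ready for mutations, so a retry around the load step could still be needed.

```shell
# Poll alpha's HTTP health endpoint until it answers or attempts run out.
wait_for_alpha() {
  url=${1:-http://localhost:8080/health} tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    if curl --silent --fail "$url" > /dev/null; then return 0; fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}

# Start the cluster, wait for alpha, then run the live loader inside the
# alpha container. Both file paths are paths as seen inside the container.
up_and_load() {
  data=$1 schema=$2
  docker-compose up -d || return 1
  wait_for_alpha || { echo "alpha did not become healthy" >&2; return 1; }
  docker-compose exec -T alpha dgraph live \
    -f "$data" -s "$schema" -a alpha:9080 -z zero:5080
}
```

With the files mounted at /dgraph as described above, a call would look like up_and_load /dgraph/toy_data.json /dgraph/toy_schema.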

  • text editor: I use atom.io as my text editor (VS Code is good too). Atom preserves pure text, whereas others add formatting. Sublime used to be my favorite.
  • loading schema/data:
    • loading directly if you have dgraph cli client installed on the host will work.
    • using a mounted disk didn’t work for me, at least with Docker Machine (Virtualbox) on Win10 Home.
    • docker cp to copy from host to container should also work.
  • RDF vs JSON - When I was testing this with RDF where uids are autogenerated, I could not get it to work with live loader:
    { set { 
       _:michael <name> "Michael" .
       _:michael <dgraph.type> "Person" .
       _:luke <name> "Luke" . 
     } 
    }
    
    The RDF with uids manually specified should work fine
    <0x1> <name> "Michael" .
    <0x1> <dgraph.type> "Person" . 
    <0x2> <name> "Luke" .
    

Hi @joaquin, thanks I will try atom.

  • our actual (big) data file is rdf with uids autogenerated, and we would prefer not to rewrite it if possible
  • no issue mounting, for me, it worked fine. dgraph live found the toy_data.rdf file no problem
  • the problem was there was some error with dgraph live to load the data file, in that format

Can you suggest another way to load the (already mounted) RDF data file with autogenerated uids, and give an example?

EDIT: if I manually copy and paste the rdf (auto uid) code into the “Mutation” in Ratel, it works fine. So there must be a way that this can be loaded from file??

I found out that the { set { RDF } } wrapper is only needed when sending the mutation over HTTP(S). If the file contains only the N-Quads themselves, it should work, e.g.

_:bark <name> "Bark" .
_:bark <dgraph.type> "Dog" .
_:bark <breed> "Great Dane" .

I will try this out later when I get a chance.
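
For an existing file that still has the wrapper, the outer lines can be dropped mechanically. A minimal sketch, assuming the opening "{", "set {" and the closing braces each sit on lines of their own (as in the toy_data file in this thread); strip_set_wrapper is a made-up name.

```shell
# Print the file given as $1 without the "{ set { ... } }" wrapper lines.
# A line is dropped only if, ignoring whitespace, it is exactly one of
# "{", "set{", "{set{", "}" or "}}"; every other line passes through.
strip_set_wrapper() {
  awk '
    { line = $0; gsub(/[ \t\r]/, "", line) }
    line == "{" || line == "set{" || line == "{set{" || line == "}" || line == "}}" { next }
    { print }
  ' "$1"
}
```

Usage would be along the lines of strip_set_wrapper toy_data.rdf > toy_data_plain.rdf, where the output file name is arbitrary, followed by pointing dgraph live -f at the new file.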

Yes @joaquin that works !

Thanks for all your support !!

One last question @joaquin (please let me know if needs a new thread):

Even if I run docker-compose stop; docker-compose rm and then run the dgraph live command on a new docker-compose up -d setup … it’s like the data got “doubled”. There are now twice as many edges & nodes as should be. How can the data persist like that? And anyway, is there a dgraph command I can run (in the container … zero? alpha?) to delete the old data before updating the new?

Thank you!

EDIT: I think I solved this one myself, according to https://dgraph.io/docs/deploy/#deleting-database you can simply delete the folders “p”, “w” (and I also had one called “zw”), in the same Windows directory as the schema and data.rdf files.

That is because you mounted a directory from the local disk, so the data persists even after the running container is destroyed. You would need to purge the data files as well to get a clean new environment. When you load the same data twice, you get a second set of nodes, since the blank nodes are assigned fresh uids on each load.
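
A sketch of that cleanup, assuming the p, w and zw folder names reported above and that the cluster has been stopped first (for example with docker-compose stop; docker-compose rm); purge_dgraph_state is a hypothetical helper.

```shell
# Remove Dgraph's on-disk state (p, w, zw) from the mounted host folder.
# Only those three directory names are touched; schema and data files stay.
purge_dgraph_state() {
  dir=${1:?usage: purge_dgraph_state DIR}
  for sub in p w zw; do
    rm -rf "${dir:?}/$sub"
  done
}
```

For the setup in this thread the call would be purge_dgraph_state ./dgraph_data, run from the folder that holds docker-compose.yml, before bringing the cluster back up.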

Also, I did try just the pure RDF as toy_data.rdf and it worked as expected with live loader:

dgraph live -f toy_data.rdf -s toy.schema -a alpha:9080 -z zero:5080

The RDF:

_:michael <name> "Michael" .
_:michael <dgraph.type> "Person" .
_:michael <age> "39" .
_:michael <owns_pet> _:bark .
_:michael <friend> _:luke .

_:luke <name> "Luke" .
_:luke <dgraph.type> "Person" .
_:luke <age> "77" .
_:luke <owns_pet> _:meow .
_:luke <friend> _:michael .

_:bark <name> "Bark" .
_:bark <dgraph.type> "Dog" .
_:bark <breed> "Great Dane" .

_:meow <name> "Meow" .
_:meow <dgraph.type> "Cat" .
_:meow <breed> "Siamese" .