Dgraph Live Loader Fails with S3 Bucket URIs

Report a Dgraph Bug

When the live loader is given an S3 URI that starts with the bucket name, such as s3://dgraph-dev-backups/dataset/1million.schema, it fails with:

Get "https://dataset.s3.dualstack.us-east-1.amazonaws.com/1million.schema": 301 response missing Location header

What version of Dgraph are you using?

Dgraph version   : v21.03.0
Dgraph codename  : rocket
Dgraph SHA-256   : b4e4c77011e2938e9da197395dbce91d0c6ebb83d383b190f5b70201836a773f
Commit SHA-1     : a77bbe8ae
Commit timestamp : 2021-04-07 21:36:38 +0530
Branch           : HEAD
Go version       : go1.16.2
jemalloc enabled : true

Have you tried reproducing the issue with the latest release?

Yep.

What is the hardware spec (RAM, OS)?

n/a

Steps to reproduce the issue (command/config used to run Dgraph).

  1. Start the cluster with docker-compose up -d, using this docker-compose.yml:
    version: "3.5"
    services:
      zero:
        image: dgraph/dgraph:v21.03.0
        command: dgraph zero --my=zero:5080 --replicas 1 --raft idx=1
        container_name: zero
    
      alpha:
        image: dgraph/dgraph:v21.03.0
        environment:
          DGRAPH_ALPHA_SECURITY: whitelist=0.0.0.0/0
          AWS_ACCESS_KEY_ID: REDACTED
          AWS_SECRET_ACCESS_KEY: REDACTED
        command: dgraph alpha --my=alpha:7080 --zero=zero:5080
        container_name: alpha
    
  2. Download the 1million dataset and upload it to the bucket:
    export AWS_PROFILE=dgraph-dev-backups
    PREFIX=https://github.com/dgraph-io/benchmarks/raw/master/data
    FILES=(1million.schema 1million.rdf.gz)
    for FILE in "${FILES[@]}"; do
      # Fetch each file (following GitHub's redirect), then copy it into the bucket
      curl --silent --location --remote-name "$PREFIX/$FILE"
      aws s3 cp "$FILE" s3://dgraph-dev-backups/dataset/
    done
    
  3. Live load from the S3 bucket:
    docker exec -t alpha dgraph live -C \
      -s s3://dgraph-dev-backups/dataset/1million.schema \
      -f s3://dgraph-dev-backups/dataset/1million.rdf.gz \
      -z zero:5080 \
      -a alpha:9080
    

Expected behavior and actual result.

Expect

I would expect the live load to complete without a stack trace and print a summary like:

Number of TXs run            : 1042
Number of N-Quads processed  : 1041684
Time spent                   : 2m4.010134056s
N-Quads processed per second : 8400

Actual

The live loader exits with a stack trace:

Processing schema file "s3://dgraph-dev-backups/dataset/1million.schema"
2021/06/02 01:52:00 Get "https://dataset.s3.dualstack.us-east-1.amazonaws.com/1million.schema": 301 response missing Location header
Error while reading file
github.com/dgraph-io/dgraph/x.Checkf
	/ext-go/1/src/github.com/dgraph-io/dgraph/x/error.go:51
github.com/dgraph-io/dgraph/dgraph/cmd/live.(*loader).processSchemaFile
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:257
github.com/dgraph-io/dgraph/dgraph/cmd/live.run
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:799
github.com/dgraph-io/dgraph/dgraph/cmd/live.init.0.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/live/run.go:134
github.com/spf13/cobra.(*Command).execute
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:830
github.com/spf13/cobra.(*Command).ExecuteC
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:914
github.com/spf13/cobra.(*Command).Execute
	/go/pkg/mod/github.com/spf13/cobra@v0.0.5/command.go:864
github.com/dgraph-io/dgraph/dgraph/cmd.Execute
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/cmd/root.go:78
main.main
	/ext-go/1/src/github.com/dgraph-io/dgraph/dgraph/main.go:99
runtime.main
	/usr/local/go/src/runtime/proc.go:225
runtime.goexit

Workaround

Using the full, long-form S3 URI (with the regional endpoint as the host) works:

docker exec -t alpha dgraph live -C \
  -s s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/dataset/1million.schema \
  -f s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/dataset/1million.rdf.gz \
  -z zero:5080 \
  -a alpha:9080

I was also able to get this working with a triple slash (for example s3:///dgraph-dev-backups/dataset/1million.schema), and updated the documentation to make this clearer (PR).
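The host in the error, dataset.s3.dualstack.us-east-1.amazonaws.com, suggests the short-form URI was read as s3://<host>/<bucket>/<object>: dgraph-dev-backups went into the host slot and dataset was taken as the bucket. As a rough illustration only, here is a hypothetical shell helper (describe_s3_uri is not part of Dgraph, and this is not Dgraph's actual parser) that performs that assumed split for the three URI forms in this report:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Dgraph: splits an s3:// URI as
# s3://<host>/<bucket>/<object>, the layout the error message implies.
describe_s3_uri() {
  local rest="${1#s3://}"          # strip the scheme
  local host="${rest%%/*}"         # host slot: everything before the first slash
  local bucket="" object=""
  if [[ "$rest" == */* ]]; then
    rest="${rest#*/}"              # drop the host slot and its slash
    bucket="${rest%%/*}"           # the next segment becomes the bucket
    if [[ "$rest" == */* ]]; then
      object="${rest#*/}"          # the remainder is the object key
    fi
  fi
  echo "host=${host:-(empty)} bucket=${bucket} object=${object}"
}

describe_s3_uri s3://dgraph-dev-backups/dataset/1million.schema
# host=dgraph-dev-backups bucket=dataset object=1million.schema
describe_s3_uri s3://s3.us-east-2.amazonaws.com/dgraph-dev-backups/dataset/1million.schema
# host=s3.us-east-2.amazonaws.com bucket=dgraph-dev-backups object=dataset/1million.schema
describe_s3_uri s3:///dgraph-dev-backups/dataset/1million.schema
# host=(empty) bucket=dgraph-dev-backups object=dataset/1million.schema
```

Under that assumption, only the long form and the triple-slash form put dgraph-dev-backups in the bucket slot, which matches which commands worked.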

For usability, could the live loader print a friendlier error message instead of a stack trace?
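Until then, a user-side stopgap could catch the short form before the live loader runs. This is a hypothetical sketch (check_s3_uri is not a Dgraph command or flag), assuming, as the error message suggests, that the first segment after s3:// is read as the endpoint host. It flags a host segment that is non-empty and contains neither a dot nor a :port, and suggests the triple-slash form:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of Dgraph. Assumes the live loader
# reads s3://<host>/<bucket>/<object>, so a bare bucket name in the host slot
# (e.g. s3://dgraph-dev-backups/...) shifts every segment by one.
check_s3_uri() {
  local uri="$1"
  case "$uri" in
    s3://*) ;;             # only s3:// URIs are checked
    *) return 0 ;;         # local paths and other schemes pass through
  esac
  local rest="${uri#s3://}"
  local host="${rest%%/*}" # text before the first slash
  # Heuristic: accept an empty host (as in s3:///), a dotted host, or host:port.
  if [[ -n "$host" && "$host" != *.* && "$host" != *:* ]]; then
    echo "error: '$host' in $uri looks like a bucket name, not an S3 endpoint" >&2
    echo "hint:  try s3:///$rest or s3://<endpoint>/$rest" >&2
    return 1
  fi
  return 0
}
```

Running check_s3_uri on each -s and -f value before docker exec ... dgraph live would have flagged s3://dgraph-dev-backups/dataset/1million.schema up front. The dot-or-colon test is only a heuristic: a single-label endpoint such as localhost would be rejected too.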