Vendoring dependencies for dgraph

  • dgraph uses glock [1] for dependency management
  • Go 1.6 has vendoring enabled by default
  • glock has conditional support for vendoring as that is implemented in a branch [2] [3]
  • Recent node js fiasco [4] and go best practices [5] for other projects clearly indicate that vendoring is an important and useful practice
  • Seems like other projects that are facing the similar issue are moving away from glock e.g. cockroachdb [6] [7]

Is there a plan in place for dgraph to start vendoring dependencies?

glide [8] and govendor [9] seem to be good candidates for using the /vendor directory feature for golang for dependency management


  1. github.com/dgraph-io/dgraph/blob/e754a367dab976f2c7d62b5af71a27db40d5bb5a/README.md#install-dgraph
  2. github.com/robfig/glock/issues/28
  3. github.com/robfig/glock/tree/vendor
  4. github.com/stevemao/left-pad/issues/4
  5. peter.bourgon.org/go-best-practices-2016/#dependency-management
  6. github.com/cockroachdb/cockroach/issues/4182
  7. github.com/cockroachdb/cockroach/pull/4620
  8. github.com/Masterminds/glide
  9. github.com/kardianos/govendor

Note

  • Discuss doesn’t let me assign this into dev or technical category, might be something related to permissions.
  • Apologies for unclickable links, as a new user I cannot have more than 2 urls in my post.

Hey @abhi,

Thanks for bringing this up!

I haven’t given it much thought. I put glock in place following from CockroachDB’s approach, back at Go 1.4. You’re right that Go1.6 natively supports vendoring.

We don’t have anybody looking at it at the moment. So, more than happy for you to take charge here, and get it done. Would you like to?

Cheers,
Manish

P.S. All topics by non-staff folks fall into Users category for now. Other categories are restricted to @staff.

I’ve got a govendored branch here: https://github.com/dan-compton/dgraph/tree/govendor

More than happy to open a PR this weekend

1 Like

I don’t want to over complicate this task but IMO we must consider all the possibilities before making a decision on how do we want to vendor dependencies. As discussed here (https://github.com/cockroachdb/cockroach/issues/4182) there are multiple solutions and things associated with them that we need to be careful about. Below are some of the options, feel free to add more to the list.

  • Option 1 - Use govendor or glide and check-in the /vendor directory with the src (if we are going with this option then we can just merge Dan’s @compton branch). But as mentioned in the #4182 issue on roachdb this will clutter the main repo codebase and anytime we upgrade dependencies or add new ones, PR merging will be a hassle.

  • Option 2 - Keep a separate repo under github.com/dgraph-io/vendor where all the dependencies are vendored (this can be created using glide or govendor) and add that as a submodule to dgraph repo. This will keep the main repo cleaner but adds an extra step of making sure that updated dependencies are checked in the right repository. cockroachdb is not going with this approach as they want to make local modifications in their dependencies and use them with existing tooling workflows. @mrjn - do you potentially see this being a scenario for dgraph?

  • Option 3 - Check in the dependency list file + the lock file to the main repo but ignore the /vendor directory. With a fresh check-out or pull one can always use glide or govendor to create or update the dependencies in the vendor directory. This is the least complicated solution but opens us up to left-pad scenario where original repos might vanish leaving us with a broken build.

One of the primary reasons I started this conversation was because of glock of installing dependencies in my GOPATH rather than using a dedicated install location (like a lot of folks I use a single GOPATH for my projects and any overwriting of dependencies is not a good experience).

Given that right now we have a limited set of dependencies, we can start with Option #3 and then decide between #1 or #2. This addresses my concern regarding installing dependencies in GOPATH but gives us freedom to choose either #1 or #2 later.

Thoughts?

1 Like

This is most certainly an issue – though I’m not sure it’s as terrible as people might imagine. I’d like to propose an option along the lines of Option 2 (which I find highly interesting). Perhaps it could become policy that contributions be merged as a squashed commit for non-vendored code alongside a second commit which includes additions/modifications to the vendored code.

I’ve updated the repo above to reflect what that might look like. Regardless of the outcome, I’m curious to see thoughts on this matter.

Hey guys

Great discussion!

The way we do things at DGraph is for any decision making, we choose a decision maker, closest to the problem. In this case, @abhi, you brought up the problem and have the most context about it. So, I’d like you to be the decision maker here, and choose something for DGraph which makes the most sense. Also, based on your decision, you can send us a PR, which we’ll be happy to integrate once it goes through our code review process.

In terms of advice, we don’t make any changes to the external dependencies at this stage. We might consider that later down the road, and would be great to figure out a plan for that to happen (probably what @compton is proposing).

So, if you’re up to the task, feel free to take charge here. The team would be happy to help.

Cheers,
Manish

P.S. For more context on how decision-making works, see here: http://www.slideshare.net/pearpress/the-decisionmaker-dennisbakkeppt

2 Likes

@compton - The solution you proposed is definitely a viable solution, but I think it is closer to Option 1 rather than 2 as you are still going to check in your vendor directory in your primary repo.

Perhaps it could become policy that contributions be merged as a squashed commit for non-vendored code alongside a second commit which includes additions/modifications to the vendored code.

My understanding is that you are proposing squashing commit to primary repo code + dependency commit whenever there is an update to both, right? A while ago I came across this blog post (http://jamescooke.info/git-to-squash-or-not-to-squash.html) which has interesting insights about squashing git commits. This might not be completely relevant but it does clearly convey that squashing is a somewhat complicated approach. That said, I will definitely explore your proposed solution in future as it might help keeping the git log clean, but at this point we should start with a simpler approach.

As mentioned in my previous reply, I want start with Option 3 now and then we can come back to this conversation later after considering other approaches.

I will open up a PR request in a day or two.

@mrjn - The process explained in those slides seems to be a good fit for open source projects as it encourages community involvement and interest. Thanks for sharing them.

1 Like

Yes, that’s exactly what I proposed. However, the underlying argument was really about advocating policy over submodules. I’m not a huge fan of submodules (probably not even for protobuf) I’m 100% on board with option 3.

What’s the plan for the proto btw?

Hey @abhi,

So for some weird reason, the docker build hangs on govendor sync. I tried to debug it, but couldn’t figure out why that happens. Any ideas?

In fact, any command related to govendor hangs, including govendor list.

Apologies, just saw this (apparently gmail is marking notification email as spam)

I will look into it this tomorrow.

Hey @abhi,

I looked at this issue. Couldn’t figure out why govendor was getting stuck, but I pushed out all the deps code within vendor, and removed govendor from Dockerfile. This fixes the build issue.

@mrjn

Well, this turned out to be a pretty interesting quest. Below is the list of my findings and changes that will prevent the docker build from getting stuck.

Changes to fix this:

Current Dockerfile:

RUN go get -v github.com/dgraph-io/dgraph/... && \
    go build -v github.com/dgraph-io/dgraph/... && \
    go test github.com/dgraph-io/dgraph/... && echo "v0.2.3"

Modified local version used for debugging:

RUN go get -v github.com/kardianos/govendor && \
    go get -v github.com/dgraph-io/dgraph/...

ENV GOROOT /go

RUN cd $GOPATH/src/github.com/dgraph-io/dgraph && govendor sync
       
RUN go test github.com/dgraph-io/dgraph/... && echo "v0.2.3"

This seems to not get stuck and starts test suite execution (which for some reason is writing out this message -
2016/05/18 03:25:21 grpc: Conn.resetTransport failed to create client transport: connection error: desc = "transport: dial tcp 127.0.0.1:8081: getsockopt: connection refused"; Reconnecting to "127.0.0.1:8081" )

Can you please try out this change on your machine?

Once we are certain that the behavior is same on multiple machines either you or me can clean up the dockerfile and file a bug report in govendor repo/

@pawan has been through this bug.

Hey @abhi

Thanks for debugging this. As strange as it sounds, this error is present when the go test statement is in a separate RUN statement from the previous one. That is why I added it to the same statement earlier here https://github.com/dgraph-io/dgraph/pull/79. I am guessing this is a problem with Docker though I didn’t file a bug.

How about you have it all as one RUN statement and set the GOROOT before it?

Just wanted to bring to your notice that for now we have checked in all our dependencies in the repo.

@pawan - thanks for your prompt response. Let me try that out putting everything in the same RUN statement.

Yes I saw that we checked all the dependencies in the repo, thanks for fixing the build. I just couldn’t resist chasing this govendor bug. :slight_smile:


Welp, I tried your suggestion and it complains saying GOPATH should not be set to GOROOT, which I think is recommended. But not setting GOROOT again brings back the dockerfile execution to an impasse.

I will try to dig into this later.

I also looked into glide’s (glide.sh) behavior with docker - https://hub.docker.com/r/devtransition/golang-glide/
It doesn’t seem to have any problems. We can always use that as a backup option if we ever decide to move away from checking dependencies into the repo.

1 Like

This may be an inane suggestion, but have you considered using a build container and corresponding Makefile? This would provide build consistency. Could look like:

PROJECT := github.com/dgraph-io/dgraph

# docker tag is DGRAPH_VERSION unless DGRAPH_VERSION is set
DGRAPH_VERSION := $(shell git rev-parse --short HEAD)
GIT_BRANCH := $(shell git rev-parse --abbrev-ref HEAD)
GITUNTRACKEDCHANGES := $(shell git status --porcelain --untracked-files=no)
ifneq ($(GITUNTRACKEDCHANGES),)
GITCOMMIT := $(DGRAPH_VERSION)-dirty
endif

all: clean fat build

clean:
rm -fr target bin pkg

build:
docker run \
-v `pwd`:/gopath/src/$(PROJECT) \
quay.io/dgraph-io/build-go:16
docker build -t quay.io/dgraph-io/dgraph:${DGRAPH_VERSION} .

docker-push:
docker push quay.io/dgraph-io/dgraph:${DGRAPH_VERSION}

fmt:
@gofmt -w .

.PHONY: clean all
.PHONY: $(BINARIES)
.PHONY: $(CMDS)

Edit: https://github.com/opsee/build-go is our build container.

Interesting suggestion @compton. Just not sure about the benefits at this point.

I think there are numerous considerations here including those associated with the test-deploy cycle and mitigating the risk of occurrence of issues like the above. However, the most immediate benefit to me would be that this pattern naturally limits container size.

The current size of the dgraph/dgraph container is 1.6GB whereas the current size of dancompton/dgraph is 48mb.

Termie from wercker wrote a pretty decent article about Dockerfile anti-patterns, build-test, and minimal containers http://blog.wercker.com/2015/07/28/Dockerfiles-considered-harmful.html that describes wercker’s philosophy.

I put together a branch and build-container to demonstrate this last night:

branch: https://github.com/dgraph-io/dgraph/commit/eb47a6812ec8a6b30f8aeb378dadd195ccd40444

build-container: https://github.com/dan-compton/build-go

In theory you should be able to clone my build repo (to $GOPATH/src/github.com/dgraph-io/dgraph, unfortunately), run make, and have a dgraph container so long as you have docker installed on your host

Big Edit: I accidentally wiped out my addition of the rockdb dependencies in the Dockerfile, so this will not work until I add back.

Edit2: fixed

2 Likes