Go Modules on Badger and Dgraph


(Francesc Campoy) #1

Badger v2.0.0 and Go modules

A little over a year after the last release of Badger, we decided it was time to cut a new release and announce v2.0.0.

This new release includes a bunch of changes, some of which alter the API in backward-incompatible ways, although the data model remains compatible. Dgraph 1.0.x and other open source projects depend on these changes, so tagging them is necessary for their stability.

At the same time, and following our community’s needs, we are also adding support to use Badger with Go modules. This document explains the subtle interactions and possible inconsistencies of these two changes.

Badger v2.0.0 from a publisher’s point of view

Once v2.0.0 is released, all projects that depend on badger have two options:

  • use badger without Go Modules: importing it as github.com/dgraph-io/badger as before and managing the dependencies as they did previously (most probably some flavor of vendoring)
  • use badger as a Go Module: this requires importing the package as github.com/dgraph-io/badger/v2 everywhere in their code, as otherwise the Go Modules requirements documented on the Go wiki (Modules · golang/go Wiki · GitHub) will not be met.

From the Go Wiki on Go Modules:

As a result of Semantic Import Versioning, code opting in to Go modules must comply with these rules:

  • Follow semver. (An example VCS tag is v1.2.3).
  • If the module is version v2 or higher, the major version of the module must be included as a /vN at the end of the module paths used in go.mod files (e.g., module github.com/my/mod/v2, require github.com/my/mod/v2 v2.0.0) and in the package import path (e.g., import "github.com/my/mod/v2/mypkg").
  • If the module is version v0 or v1, do not include the major version in either the module path or the import path.
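As a rough illustration of the rules quoted above, here is a minimal Go sketch that derives the module path and package import path for a given major version. The helper names modulePath and importPath are hypothetical and exist only for this example; they are not part of the go command or any library.

```go
package main

import (
	"fmt"
	"path"
)

// modulePath returns the module path that Semantic Import Versioning expects:
// unchanged for v0 and v1, with a /vN suffix for v2 and higher.
func modulePath(base string, major int) string {
	if major < 2 {
		return base
	}
	return fmt.Sprintf("%s/v%d", base, major)
}

// importPath returns the import path of a package inside the module.
// pkg is relative to the module root; "" means the root package.
func importPath(base string, major int, pkg string) string {
	return path.Join(modulePath(base, major), pkg)
}

func main() {
	fmt.Println(modulePath("github.com/my/mod", 1))
	fmt.Println(modulePath("github.com/my/mod", 2))
	fmt.Println(importPath("github.com/my/mod", 2, "mypkg"))
}
```

The only branch is the major-version check, which is the whole point of the rule: v0 and v1 share the bare path, and every later major version gets its own.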

It is important to note that the constraint on having to import the package with an appended v2 only applies to the code using Go Modules, and it is not a requirement imposed by Badger but rather the Go community and their agreed specification for how Go modules work.

For those not using Go Modules, whether Badger supports Go modules or not has no impact on their build system or code.

This drives me to argue that supporting Go modules on Badger is the right choice. For those using Go Modules, it might cause a change in import path - but that change is due to their decision to use Go Modules rather than our decision to support them.

That sounds good, but actually …

All of the above holds for modules whose packages do not import other packages from the same module. For modules that do, unfortunately, Go Modules require changing the import paths used inside the module itself.

So for instance, since github.com/dgraph-io/badger depends on github.com/dgraph-io/badger/table, files like db.go will need to update their import path of the table package to indicate they’re using v2, so the import path would now read "github.com/dgraph-io/badger/v2/table".

Once we introduce this constraint, supporting Go Modules while releasing a v2 version all of a sudden impacts all of our users (including ourselves) in a way that seems unacceptable.

Adopting Go Modules on Badger does force a change in all of our clients. The semantic versioning requirements on the import paths make Go Modules an infectious change which forces all our clients to migrate to them just so they can benefit from our latest release.

Before writing the sentence above, I did spend one hour trying to figure out whether the replace directive could be used to avoid the import path rewrite on the Badger repo. Unfortunately, while it makes Badger compile with Go modules without having to rewrite any import paths, the effect of the replace directive is limited to the module containing it, so when a client using Go modules imports v2 all the import paths will somehow be broken.

If you have any ideas on how to fix this situation, please let me know.

But I want to use Badger’s v2 with Go Modules!

That’s still possible: simply add this line to your go.mod.

require github.com/campoy/gomodtest v2.0.0+incompatible

Badger v2.0.0 from a user’s point of view

Badger is an important project to Dgraph Labs, but it is still just a supporting piece of our main project, Dgraph, the graph database. As such, we need to be careful with how the different changes the community requests from Badger impact Dgraph too. Go modules are no exception.

Dgraph is currently keeping all dependencies vendored inside of its own repository and managed by govendor. This can be kept this way, which would allow us to start benefiting from v2.0.0 without having to modify a single line of our code as no import paths need changing.

Since Dgraph is a project mostly distributed as a binary rather than a library, adopting Go modules is not a priority. But if we ever decide to go through that migration, the import path to Badger will have to be updated - but this will also be the case for every single post-v2 dependency supporting Go modules.

Conclusion

Adopting Go Modules forces a rather annoying import path change for dependencies with version 2 or higher. This is unfortunate and something that would be forced onto projects that depend on Badger regardless of their interest in adopting Go modules.

Supporting Go Modules on Badger seems therefore like a change with possibly heavy consequences on our clients and without clear benefits. I’d argue it is better not to support Go modules at this point on Badger.


(Andrew Gerrand) #2

If I understand correctly, you’re making some backward-incompatible API changes with the v2 release. So in that case, whether or not your consumers are using Go modules, you must change the import path.

If you do not change the import path, your consumers will get the latest versions with the incompatible changes, whether they want them or not, potentially causing confusion or breakage.

If you do change the import paths for v2, then you are doing the right thing by your users whether or not they use Go modules.

Apologies if I’m missing something here, but as a Go modules user this is my understanding.


(Andrew Gerrand) #3

Oh I just re-read and saw this. I think what you want in this case is to take the “major subdirectory” approach as described here.

Basically you put your v2 in a v2 directory inside your repo, with a go.mod file in that directory, and then clients find your modules under that path whether or not they are using Go modules.


(Russ Cox) #4

Francesc, it sounds like your concern with using modules is this scenario:

  • a non-module user imports github.com/dgraph-io/badger
  • github.com/dgraph-io/badger contains an import for github.com/dgraph-io/badger/v2/table
  • the non-module user’s go command trips over that /v2/ import

But that is not what happens, at least as of Go 1.9.7+, Go 1.10.3+, and Go 1.11+. Those versions of the toolchain (which everyone should have by now) understand that if non-module code finds an import in module code with a /vN/ in it that doesn’t resolve, it should retry without the /vN/.
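The fallback described above can be sketched roughly as follows. This is only an illustration of the idea, not the go command's actual implementation; the names isMajorElem, stripMajor, and resolve, and the exists callback standing in for a GOPATH lookup, are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// isMajorElem reports whether s looks like a major version path element
// such as "v2" or "v10" (v0, v1, and leading zeros do not count).
func isMajorElem(s string) bool {
	if len(s) < 2 || s[0] != 'v' || s[1] == '0' || s == "v1" {
		return false
	}
	for _, c := range s[1:] {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

// stripMajor removes the first /vN/ element from an import path, if any.
func stripMajor(importPath string) (string, bool) {
	elems := strings.Split(importPath, "/")
	for i, e := range elems {
		if isMajorElem(e) {
			out := append([]string{}, elems[:i]...)
			out = append(out, elems[i+1:]...)
			return strings.Join(out, "/"), true
		}
	}
	return importPath, false
}

// resolve tries the import path as written and, if that fails,
// retries without the /vN/ element.
func resolve(importPath string, exists func(string) bool) (string, bool) {
	if exists(importPath) {
		return importPath, true
	}
	if alt, ok := stripMajor(importPath); ok && exists(alt) {
		return alt, true
	}
	return "", false
}

func main() {
	// Hypothetical GOPATH contents for a non-module user.
	gopath := map[string]bool{"github.com/dgraph-io/badger/table": true}
	exists := func(p string) bool { return gopath[p] }
	fmt.Println(resolve("github.com/dgraph-io/badger/v2/table", exists))
}
```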

So yes, you have to update your import paths in your own module. But no, that should not break your non-module users.

See https://github.com/golang/go/wiki/Modules#how-are-v2-modules-treated-in-a-build-if-modules-support-is-not-enabled-how-does-minimal-module-compatibility-work-in-197-1103-and-111 for lots of details.

Suggesting that module users ask for v2.0.0+incompatible is problematic for other reasons and really an unfortunate decision.

Edit: It is also fine to do what Andrew suggests. It just depends on whether you want a v2 subdirectory for now or not.


(Russ Cox) #5

Also, this post seems not to accurately describe the current state of the world for github.com/dgraph-io/badger. That repo has a go.mod that says ‘module github.com/dgraph-io/badger/v2’, as it should, and the import paths have been updated. Go module users should not use v2.0.0+incompatible: it won’t work, because that’s only for non-module repos.
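To illustrate the distinction, here is a small sketch of which require line a module user would write depending on whether the tagged repository has a go.mod. The function requireLine is hypothetical, and it assumes that a repository with a go.mod at v2 or higher declares the /vN module path, as it should.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf extracts the major number from a tag such as "v2.0.0".
func majorOf(tag string) (int, error) {
	if !strings.HasPrefix(tag, "v") {
		return 0, fmt.Errorf("tag %q does not start with v", tag)
	}
	return strconv.Atoi(strings.SplitN(tag[1:], ".", 2)[0])
}

// requireLine builds the go.mod require line for repo at tag.
// hasGoMod says whether the repository is a module at that tag.
func requireLine(repo, tag string, hasGoMod bool) (string, error) {
	major, err := majorOf(tag)
	if err != nil {
		return "", err
	}
	switch {
	case major < 2:
		// v0 and v1 never carry a suffix of either kind.
		return fmt.Sprintf("require %s %s", repo, tag), nil
	case hasGoMod:
		// Module at v2+: the major version is part of the module path.
		return fmt.Sprintf("require %s/v%d %s", repo, major, tag), nil
	default:
		// Non-module repo tagged v2+: marked +incompatible.
		return fmt.Sprintf("require %s %s+incompatible", repo, tag), nil
	}
}

func main() {
	for _, hasGoMod := range []bool{true, false} {
		line, err := requireLine("github.com/dgraph-io/badger", "v2.0.0", hasGoMod)
		if err != nil {
			panic(err)
		}
		fmt.Println(line)
	}
}
```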


(Francesc Campoy) #6

Hey Andrew and Russ,

Thanks for answering here.

Yes, that’s actually something I’m going to revert - because in the current state (which if I understand correctly is the one proposed by Go Modules) we lost support for govendor and dep.

The biggest user of Badger is Dgraph, which uses govendor to manage dependencies, and breaking Dgraph’s (or any other client’s) builds to support Go Modules doesn’t seem like the right trade-off.

It seems dep might eventually be fixed to understand this internal import path change, but in my current situation I can support either Go Modules or the other tools - and in this situation it is safer to choose the tools that already worked before.

In addition to that, not supporting Go modules doesn’t make it too complicated for those who do use them to still use Badger, thanks to the +incompatible directive.

I think the bigger point here is about my disagreement with the statement here:

I understand that changing the import path has benefits, but we also have other mechanisms to control this via Gopkg.toml, go.mod, or whatever flavor of requirements.txt we might be using. Wouldn’t that be enough? That would indeed “do less to enable more”, instead of (IMHO) artificially baking into Go Modules a best practice that not everyone follows.

Finally, I have a question about the workaround @adg mentions here:

In this case, do I need to simply add that go.mod in a v2 directory, or do I also need to add Go code in there? I wouldn’t want to maintain a series of type aliases and function wrappers just to support Go Modules either.

Again, thanks for the help!


(Manish R Jain) #7

(Slightly tangential note here, but something to also consider)

Wanted to chime in about my take on semantic versioning in Badger, which I feel can be applied to other DB libraries as well. Semantic Versioning takes a purist and hard stance on what can be changed via APIs, without consideration for user expectations, industry standards and, in particular, databases.

The biggest breakage for DB library users is not a breaking API change. It’s a breaking data layout change on disk. A breaking API change requires a single PR to fix up a user’s code. Annoying, but an hour later (in most cases), you’re done. If that doesn’t work, one can just go back by reverting that PR.

But, a data layout change requires them to take a backup of their entire data set, then re-import it with the new release. They most likely can’t go back, at least not easily.

One could lump every API breakage and data breakage into a major version release, but that would cause the major version to be increased much faster than what DB users expect to see. Case in point, RocksDB and MongoDB after so many years of development are at 6 and 4 respectively. Badger, in its short lifetime, has surely made more than 3 breaking changes to the APIs since the v1.0 release.

One could further argue the need for so many breaking changes. However, I think it is inevitable if your guiding principle is to keep your code base clean (instead of introducing many if branches to deal with backwards compatibility) and your project is understaffed. Both of them have been true for Badger.

So, assuming breaking changes are inevitable and a major version release every few months isn’t expected from a DB, here’s how I think about semantic versioning from a DB standpoint:

  • A major version release happens when data layout on disk changes. This is the most painful to the end-user.
  • A patch version release happens when there’s a bug fix. This is the simplest upgrade for the end-user.
  • Hence, a minor version release should happen on either an API change or a feature release. Everyone likes features. An API change from my perspective isn’t as painful as semantic versioning literature makes it out to be, particularly in a compiled language. Of course, it can be painful if, for example, a used feature was removed. But that’s rare and can just be part of the major release. However, an API shuffling to provide a bug fix (something we did in Badger) can be part of a minor release (imported and fixed relatively easily by the end-user on upgrade).
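
The scheme in the list above can be sketched as a small Go type. The names Change, Version, and Next are hypothetical and only illustrate the proposed mapping from kind of change to version bump; this is not how Badger releases are actually computed.

```go
package main

import "fmt"

// Change classifies a change by how disruptive it is to a DB user.
type Change int

const (
	BugFix           Change = iota // simplest upgrade
	Feature                        // new functionality
	APIChange                      // fixed by updating calling code
	DataLayoutChange               // requires backup and re-import
)

// Version is a major.minor.patch triple.
type Version struct{ Major, Minor, Patch int }

func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// Next returns the version to release given the changes it contains,
// using the most disruptive change to pick the component to bump.
func (v Version) Next(changes ...Change) Version {
	worst := Change(-1)
	for _, c := range changes {
		if c > worst {
			worst = c
		}
	}
	switch worst {
	case DataLayoutChange:
		return Version{v.Major + 1, 0, 0}
	case APIChange, Feature:
		return Version{v.Major, v.Minor + 1, 0}
	case BugFix:
		return Version{v.Major, v.Minor, v.Patch + 1}
	default:
		return v // nothing to release
	}
}

func main() {
	v := Version{1, 5, 5}
	fmt.Println(v.Next(BugFix))
	fmt.Println(v.Next(BugFix, APIChange))
	fmt.Println(v.Next(Feature, DataLayoutChange))
}
```

Note that under strict semantic versioning the APIChange case would bump the major version instead; that difference is exactly what this proposal is about.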

(Francesc Campoy) #8

UPDATE: I have now reversed support for Go modules for v2 of Badger.

I then wrote a couple of programs using Go Modules and dep with Badger v1.5.5 and Badger v2.0.0-rc2 as dependencies.

You can see them on the following repository:

As you can see on that repo, by not officially supporting Go modules, Badger can be used easily with both dep and Go Modules.


(Andrew Gerrand) #9

It was a Go packaging best practice before it was a Go modules best practice. What about your users that don’t use dep? These breaking changes affect them too.

If you’re not concerned about those users, then why not make this new version of badger v1.6.0 instead of v2.0.0? Then you can make it a proper v1 Go module, and when you do decide to move to v2 then you can do it the proper Go module way.


(Francesc Campoy) #10

Before Go Modules it was a best practice; Go Modules made it a requirement - and that’s where my whole issue with Go Modules is at this point.

Most users use some kind of third-party dependency manager which understands that a move from v1.5.x to v1.6.x is safe, as no backwards incompatibilities should exist - and that the same does not apply to a move to v2.0.x. This is why we are tagging it as v2.0.0, regardless of Go Modules support.
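
As a rough sketch of the check such a tool performs, the following compares only the major component of two tags. The function names majorOf and safeUpgrade are hypothetical, and real dependency managers apply richer constraint rules than this.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf extracts the major number from a tag such as "v1.5.5" or "v2.0.0-rc2".
func majorOf(tag string) (int, error) {
	if !strings.HasPrefix(tag, "v") {
		return 0, fmt.Errorf("tag %q does not start with v", tag)
	}
	return strconv.Atoi(strings.SplitN(tag[1:], ".", 2)[0])
}

// safeUpgrade reports whether moving from one tag to another stays within
// the same major version, and is therefore expected to be API compatible.
func safeUpgrade(from, to string) (bool, error) {
	a, err := majorOf(from)
	if err != nil {
		return false, err
	}
	b, err := majorOf(to)
	if err != nil {
		return false, err
	}
	return a == b, nil
}

func main() {
	fmt.Println(safeUpgrade("v1.5.5", "v1.6.0"))
	fmt.Println(safeUpgrade("v1.5.5", "v2.0.0-rc2"))
}
```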

That said, we are facing a similar issue with v2.1.0 which will be API compatible with v2.0.0 but the data format will change forcing users to backup and restore their database.

Maybe we should make that also a non backwards compatible change, and therefore release a v3.0.0 instead. I’m sincerely curious, what would you do in this case, @adg?


(Andrew Gerrand) #11

Personally if I wanted to make breaking changes to a released, v1.x.y package, I’d release it as v2.0.0 with a new import path, so as not to break any existing users (whether or not they use a package manager, and most do not!). Then if I were to break it again, I’d issue another major release.

If I could, I’d try to make both breaking changes at the same time, to avoid making my users migrate twice. (And also to avoid having to continue to support the in-between major version.)

In the long run you’ll end up with a much less confusing situation if you give new major versions new import paths, particularly if you intend to continue supporting the v1 series. Looking at a piece of code, you’ll be able to see immediately which version of the library is in use, regardless of the package manager setup (or absence thereof). That’ll be a huge benefit in supporting your users.


(Francesc Campoy) #12

I’d love to do that, but changing the import path would make it very hard for those not using Go Modules to depend on Badger.

And unfortunately, Dgraph depends on Badger and we’d rather not adopt Go Modules yet.

If you change Gopkg.toml in the v2.0.0 directory of github.com/campoy/badger-migrations, you will see that the version is not supported due to the change in import paths.

[prune]
  go-tests = true
  unused-packages = true

[[constraint]]
  name = "github.com/dgraph-io/badger"
  version = "=2.0.0-rc1"

$ dep ensure
...
        v2.0.0-rc1: "github.com/dgraph-io/badger" imports "github.com/dgraph-io/badger/v2/y", which contains malformed code: unknown error for "github.com/dgraph-io/badger/v2/y", if you get this error see https://github.com/golang/dep/issues/351
...

Of course, I tried using the right import path (github.com/dgraph-io/badger/v2) on Gopkg.toml but it didn’t work either:

The following issues were found in Gopkg.toml:

  ✗ the name for "github.com/dgraph-io/badger/v2" should be changed to "github.com/dgraph-io/badger"

I expect dep will fix this issue, eventually.
In the meantime, it seems our best option is not to support Go Modules.


(Andrew Gerrand) #13

The idea is that everyone who uses v2 should import it as "github.com/dgraph-io/badger/v2". Not just modules users.

Here’s how I’d release v2 of badger:

  • Make a directory in the badger repo called v2.
  • Put the v2 badger packages inside the v2 directory, and update their import paths to begin with "github.com/dgraph-io/badger/v2"
  • Put a go.mod file in the v2 directory with the declaration module github.com/dgraph-io/badger/v2.
  • Commit all of that, and tag it with v2.0.0.

Then users of badger v2, whether they use modules or not, should import it with the v2 import path. That should work for everyone, right?


(Francesc Campoy) #14

But that would duplicate all of my code, right?
Is that the recommended way of going?
Having to maintain a clone of the codebase every time a new major version appears seems quite costly in the long term.


(Andrew Gerrand) #15

The code would be different, presumably, but yes, there would be some duplication.

But what is the cost, exactly?

  • If you’re the maintainer, then you want both versions, because you need to maintain both in parallel.
  • If you’re the user, then you will likely never see the clone of the repo, and if you’re using modules or a package manager that prunes unused deps, you won’t even have the full clone.
  • Because of the way Git works there’s no real cost to having exact copies of files in your tree.
  • The actual size of the copied files is 764KB in this case, which is almost nothing at all (this web page is much, much larger).

In the longer term, either

  1. Go modules is on by default in the Go distribution, in which case you can switch to a vN branch with all the vN code in the root of the repo.
  2. You still want to support people who don’t use modules or use old versions of Go, in which case you could either add more vN directories or create new repositories for the newer versions.

But the question you really have to ask yourself is: how many more major versions of this library do I really want to release? It’s not good for users (more churn), and it’s not good for you (more to maintain).

I think that the Go modules import path thing (enforcing the previous best practice) forces us to confront the real cost of making major releases. Just bumping a version number doesn’t feel like much, but having to think about a major version as a new, distinct thing that must be maintained and supported feels (appropriately) like a big deal.


(Jack Lindamood) #16

You may be interested in the PR here: https://github.com/mediocregopher/radix/pull/128 and the conversation here: https://github.com/mediocregopher/radix/issues/53

It is an example of modules which do contain imports of the module itself, a v3 repository, and supporting “go get”, “dep”, “glide”, and go modules while using go modules correctly with a /v3 import path.

It involves using “goforward” on the old location, but otherwise works fine.


(Francesc Campoy) #17

Thanks, @adg and @cep21

I will have a look and see whether we revert the reverting of the Go Modules support, hehe.


(Russ Cox) #18

Sorry, Francesc, my bad. I think I assumed that since you overwrote the old v1 API in place, you were for some reason not worried about breaking existing code. Because that’s what it does, modules or not. That is, changing github.com/dgraph-io/badger from the v1 API to the v2 API breaks all users who do a fresh go get of anything that imports github.com/dgraph-io/badger expecting the v1 API. As it has said in the Go FAQ for a long time (my emphasis):

Packages intended for public use should try to maintain backwards compatibility as they evolve. The Go 1 compatibility guidelines are a good reference here: don’t remove exported names, encourage tagged composite literals, and so on. If different functionality is required, add a new name instead of changing an old one. If a complete break is required, create a new package with a new import path.

If you don’t want to break any existing users, you should follow adg’s advice above, which is:

  • roll the repo back to the latest v1 API
  • create a v2 subdirectory containing the v2 API

For Go modules users you should also create a v2/go.mod that says module github.com/dgraph-io/badger/v2 before you tag it v2.0.1 (or whatever your next unused version number is).

At this point every tool, including dep, govendor, and go get all the way back to Go 1, will understand the difference between github.com/dgraph-io/badger and github.com/dgraph-io/badger/v2 and know where to find each.

Yes, you will still have the v1 API in the repo alongside the v2 API, but that’s the cost of not breaking your users. And if you need to update a security or other fix for v1, it’s easy to do and will work just fine when go get, govendor, or any other tool fetches the update.


(Topper Bowers) #19

Hi hi,

I just came over here to note that the tag deletion previously called for an IPFS upgrade (https://github.com/ipfs/go-ipfs/pull/6461). Lots of projects depend on IPFS (like ours :)). We then went through a lot of work to update all of our dependencies across a fleet of our apps.

We then woke up today to find them broken again by the addition of a new tag with the same name as the old tag, which broke builds for IPFS (and therefore us) again.

I would love, love to have module support back, but at the least, can you make the rc2 tag reflect the module support and then move forward with removing if that’s what you desire? (though I don’t suggest that).


(Francesc Campoy) #20

Hi @tobowers,

Thanks for your report. We’re tracking this on issue https://github.com/dgraph-io/badger/issues/904 and it’d be very useful for us if you could help us in there.

I’m very interested in knowing how people are depending on v2.0.0-rc.2 tag, which was actually tagged over a year ago, before any plans for v2 were released. This was way before I joined the company, so I can try to guess, but I feel like this was tagged by mistake as no rc.1 was ever tagged either.

Support for modules is coming back soon; I will publish a post this afternoon explaining how and when.

Again, sorry for the inconvenience caused - this was totally my personal mistake and we’re working hard on fixing it asap.