Encrypted Exports and Imports (via the Bulk and Live Loaders)

Requirements

Dgraph exports let you export the data in RDF or JSON format. Each export produces a schema file and a data file in the selected format; together, these two files are referred to below as the exported data. The exported data can then be imported into a new cluster. Until now, there was no way to encrypt it. This specification adds encrypted exports. Requirements:

  • Support encryption of exported data
  • Enterprise only
  • Encrypted exported data must be importable again via the bulk and live loaders.
  • AES in CTR mode with a 16-, 24-, or 32-byte key.

Building Blocks

AES and Chaining with Gzip (Same as Encrypted backups and restores spec)

If encryption is turned on for an alpha, we use the configured encryption key. The key size (16, 24, or 32 bytes) determines whether AES-128, AES-192, or AES-256 is chosen. We use AES in CTR mode. The exported data is already gzipped today; with encryption, we encrypt the gzipped data.

Initialization Vector (Same as Encrypted backups and restores spec)

The AES cipher should use a random initialization vector (IV), a random sequence of 16 bytes. The IV must be unique for every encryption under the same key but does not need to be secret (see https://en.wikipedia.org/wiki/Initialization_vector). In CTR mode this matters: reusing an IV with the same key exposes the XOR of the two plaintexts. By convention, the IV is stored alongside the ciphertext.

During export, the 16-byte IV is prepended to the ciphertext after encryption.

During import (via the bulk or live loader), the first 16 bytes are read as the IV, and the rest of the data is then decrypted.
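The IV layout can be illustrated with a minimal Go sketch using the standard crypto/aes and crypto/cipher packages. The helper names encryptWithIV and decryptWithIV are hypothetical, and gzip is deliberately left out here to isolate the IV handling.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"io"
)

// encryptWithIV encrypts data with AES-CTR under a fresh random IV and
// returns IV || ciphertext, the layout used for encrypted exports.
func encryptWithIV(key, data []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, aes.BlockSize+len(data))
	iv := out[:aes.BlockSize]
	// The IV must be unique per encryption under this key; it is not secret.
	if _, err := io.ReadFull(rand.Reader, iv); err != nil {
		return nil, err
	}
	cipher.NewCTR(block, iv).XORKeyStream(out[aes.BlockSize:], data)
	return out, nil
}

// decryptWithIV treats the first 16 bytes as the IV and decrypts the rest.
func decryptWithIV(key, blob []byte) ([]byte, error) {
	if len(blob) < aes.BlockSize {
		return nil, errors.New("input is shorter than the 16-byte IV")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv, ciphertext := blob[:aes.BlockSize], blob[aes.BlockSize:]
	plain := make([]byte, len(ciphertext))
	cipher.NewCTR(block, iv).XORKeyStream(plain, ciphertext)
	return plain, nil
}
```

CTR mode needs no padding, so the output is exactly 16 bytes longer than the input.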

Export

Export is an online operation, meaning it runs while the alpha is running. For encrypted exports, the alpha must be configured with the “encryption_key_file” option.

Note: encryption_key_file was previously used only for encryption at rest; it is now also used for encrypted backups and exports.

For encryption during exports, we chain writers as follows:

Plaintext Data → Gzip → AES Encryption → encrypted exports

Import via Bulk Loader

The bulk loader’s “encryption_key_file” option was previously used only to encrypt the output p dir. With this feature, the same option is also used to decrypt the encrypted export data and schema files.

Another option, --encrypted, indicates whether the input RDF/JSON data and schema files are encrypted. This switch supports the use case of migrating data from an unencrypted export to an encrypted import.

So, with the above two options, there are 4 cases:

  1. --encrypted=true and no encryption_key_file

     Error. If the input is encrypted, a key file must be provided.

  2. --encrypted=true and encryption_key_file="path to key"

     The input is encrypted, and the output “p” dir is encrypted as well.

  3. --encrypted=false and no encryption_key_file

     The input is not encrypted, and the output “p” dir is not encrypted either.

  4. --encrypted=false and encryption_key_file="path to key"

     The input is not encrypted, but the output “p” dir is encrypted (the migration use case mentioned above).
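The four cases reduce to a small decision function. This Go sketch is illustrative only: bulkEncryptionMode is a hypothetical name, and the key file is modeled as a path string that is empty when the option is not set.

```go
package main

import "errors"

// bulkEncryptionMode is a hypothetical helper mapping the bulk loader's
// --encrypted flag and encryption_key_file option to the four cases.
// It returns whether the input files must be decrypted and whether the
// output p dir is encrypted.
func bulkEncryptionMode(encrypted bool, keyFile string) (decryptInput, encryptOutput bool, err error) {
	hasKey := keyFile != ""
	switch {
	case encrypted && !hasKey:
		// Case 1: encrypted input cannot be read without a key.
		return false, false, errors.New("--encrypted=true requires encryption_key_file")
	case encrypted && hasKey:
		// Case 2: decrypt the input and encrypt the output p dir.
		return true, true, nil
	case !encrypted && !hasKey:
		// Case 3: plain input, plain output.
		return false, false, nil
	default:
		// Case 4: plain input, encrypted output (migration).
		return false, true, nil
	}
}
```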

Import via Live Loader

A new flag, keyfile, is added to the live loader. It is required to decrypt the encrypted export data and schema files. Once the export files are decrypted, the live loader streams the data to a running alpha instance.

Note: If the target alpha instance has encryption turned on, its “p” dir will be encrypted; otherwise, its “p” dir is unencrypted.

Testing Suggestions

  1. Start alpha with encryption. Export data. Verify encrypted export.
  2. Start alpha with no encryption. Export data. Verify unencrypted export.
  3. Repeat the above for multiple exports.
  4. Import unencrypted export data via the bulk and live loaders:
  5. Start bulk with --encrypted=true and the encryption key file. This should fail since the schema and data are unencrypted.
  6. Start bulk without the encryption key file. This should pass. Verify the p dir is unencrypted.
  7. Start live with keyfile. This should fail since the schema and data are unencrypted.
  8. Start live without keyfile against:
    1. Alpha with encryption. This should pass and alpha’s p dir should be encrypted.
    2. Alpha without encryption. This should pass and alpha’s p dir should be unencrypted.
  9. Import encrypted export data via the bulk and live loaders:
  10. Start bulk without the encryption key file. This should fail since the schema and data are encrypted.
  11. Start bulk with --encrypted=true and the encryption key file. This should pass. Verify the p dir is encrypted.
  12. Start live without keyfile. This should fail since the schema and data are encrypted.
  13. Start live with keyfile against:
    1. Alpha with encryption. This should pass and alpha’s p dir should be encrypted.
    2. Alpha without encryption. This should pass and alpha’s p dir should be unencrypted.
  14. Try with the 1M and 21M data sets.
  15. Try with HTTP and GraphQL clients.

Thanks, Paras, for the details. @katharine: I think this is a good candidate for clear documentation. cc: @LGalatin

I believe the documentation is already done. Please let me know if that is not the case @Paras

It is in review right now.