I think maybe the worst part of it is that it is not automated. Like, the bulk loader will read from the same location your exports were sent to (as long as it's not GCS, but I can look past that until it's fixed). But it won't even do that for the schema files, which are dropped one per group during export. So you have to download them all and concatenate them together manually before it will work… ok, fine.
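For the record, the manual step looks roughly like this. The file names here are stand-ins for whatever your export actually dropped, not the exact layout; the trick that makes it work is that concatenated gzip members are themselves a valid gzip stream, so a plain `cat` is enough:

```shell
# Toy stand-in for an export location with one schema file per group
# (names are assumptions, not the exact export layout).
mkdir -p export
printf 'name: string @index(term) .\n' | gzip > export/g01.schema.gz
printf 'age: int .\n'                  | gzip > export/g02.schema.gz

# Concatenated gzip members form a valid gzip stream, so cat is all you need
# to produce the single schema file the bulk loader expects.
cat export/g*.schema.gz > combined.schema.gz

# Sanity check: the combined file decompresses to both groups' schemas.
gunzip -c combined.schema.gz
```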
Then you have a 10h wait while it processes the 26 billion things in your export. Also fine, it's a lot of things.
But here is the second crazy part: let's say, like me, you have 5 groups. The bulk loader leaves you with 5 directories of ~4k files each, and you have to copy them across the network to the right places. This is especially stupid in Kubernetes, where I have been installing netcat in the init containers of each alpha so I can yolo-copy files between running init containers in pods… bad.
I suggest a different story for import:
- export from old cluster, just as it currently is
- a new, fresh cluster comes up. You hit an admin GraphQL mutation that says import, same signature as
- the cluster sets all peers to draining and drops all of its storage. Then the group leaders of the new system reach out to the storage location and check whether their group number has an export file.
- what if the number of groups grows between export and import? Maybe it loads 1:1 and then rebalances onto the new groups. Or maybe it does nothing special during load and auto-balances later.
- what if the number of groups contracts between export and import? Maybe it just assigns two files to one group, or something like that.
- after the leaders have loaded their export files and the schema exactly as dumped, they handle sending snapshots to the non-leaders in their group.
- once all peers are at the same Raft applied index, the system is ready to go: no other interaction needed, except maybe manually removing the draining state (to match the restore workflow)
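From the operator's side, the whole story above would collapse into a single admin call against the fresh cluster. This mutation does not exist today; the name and fields below are a sketch, loosely mirroring the location/credential shape of the existing restore mutation:

```graphql
mutation {
  import(input: {
    # where the export landed; the same kind of location restore accepts
    location: "s3://my-bucket/dgraph-exports/2021-05-01"
    accessKey: "..."
    secretKey: "..."
  }) {
    response { code message }
  }
}
```

Everything else (draining, wiping storage, leaders claiming their group's file, snapshotting to followers) happens behind that one call.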