Add a Bulk Move Tablet and/or A deterministic scheme for Tablets (Also Geo-Sharding Support)

Moved from GitHub dgraph/4452

Posted by MichelDiz:

Experience Report

What you wanted to do

I wanted to move a large number of tablets to other groups.

What you actually did

Update: This could also be part of the GraphQL admin API.

#!/bin/sh

curl http://localhost:6080/'moveTablet?tablet=name&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=actor.film&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=actor.dubbing_performances&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=art_director.films_art_directed&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=casting_director.films_casting_directed&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=character.portrayed_in_films&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=character.portrayed_in_films_dubbed&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=cinematographer.film&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=festival.date_founded&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=festival.focus&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=festival.individual_festivals&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=festival.location&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=festival.sponsoring_organization&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=cut.film&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=cut.note&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=cut.release_region&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=cut.runtime&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=cut.type_of_cut&group=2' & \
curl http://localhost:6080/'moveTablet?tablet=apple_movietrailer_id&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=art_direction_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=casting_director&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=cinematography&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=collections&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=costume_design_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=country&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=distributors&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=dubbing_performances&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=edited_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=executive_produced_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=fandango_id&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=featured_locations&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=featured_song&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=festivals&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=format&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=genre&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=initial_release_date&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=locations&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=metacritic_id&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=music&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=netflix_id&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=personal_appearances&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=prequel&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=produced_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=production_companies&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=production_design_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=rating&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=release_date_s&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=rottentomatoes_id&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=sequel&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=series&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=set_decoration_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=songs&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=starring&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=story_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=subjects&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=tagline&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=traileraddict_id&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=written_by&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=post_production&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=pre_production&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=runtime&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=other_crew&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=other_companies&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=primary_language&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=soundtrack&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=trailers&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=gross_revenue&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=estimated_budget&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=filming&group=3' & \
curl http://localhost:6080/'moveTablet?tablet=language&group=3'
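The same moves could be driven from a list instead of one hand-written line per tablet. This is only a minimal sketch: it reuses the `moveTablet` endpoint from the script above, while `ZERO_ADDR`, `DRY_RUN`, `move_url` and `move_tablets` are names made up for the example. It requests the moves one after another rather than in the background, and it does not URL-encode tablet names.

```shell
#!/bin/sh
# Sketch: request tablet moves one at a time from a "tablet group" list.
# ZERO_ADDR and DRY_RUN are hypothetical names used only in this example.
ZERO_ADDR="${ZERO_ADDR:-http://localhost:6080}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the URLs, 0 = actually call Zero

# Build the moveTablet URL in the same form as the script above.
move_url() {
    printf '%s/moveTablet?tablet=%s&group=%s\n' "$ZERO_ADDR" "$1" "$2"
}

# Read "tablet group" pairs from stdin and request the moves sequentially.
move_tablets() {
    while read -r tablet group; do
        # Skip blank lines and comment lines.
        case "$tablet" in ''|'#'*) continue ;; esac
        url=$(move_url "$tablet" "$group")
        if [ "$DRY_RUN" = "1" ]; then
            echo "$url"
        else
            curl -sS "$url" || echo "move failed: $tablet -> group $group" >&2
            echo
        fi
    done
}

# Example list; with DRY_RUN=1 this only prints the three URLs.
move_tablets <<'EOF'
name 2
actor.film 2
language 3
EOF
```

Keeping the list in a separate file (`move_tablets < tablets.txt`) would make the plan easy to review before running it with `DRY_RUN=0`.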

Why that wasn’t great, with examples

The script above is itself the example: every tablet needs its own request.

I noticed this on every first (new) Live Load of data: tablets stay in the first group they land in until some event triggers a tablet rebalance. This seems to be bad for performance.

Also, I noticed that the load rate goes from 16600 to 21800 N-Quads/s, but only at the start.

A deterministic scheme for Tablets

Perhaps a schema dedicated to explicit or implicit rules for tablets would allow better cluster control and planning. The idea is to let the user create these rules and export them, so they can be reused in an upgrade.

e.g.:

Note: This example would be an Admin GraphQL mutation payload. The # comments are annotations only and are not valid JSON.

{
   "favorite": "1,2", # Rule: groups 1 and 2 are the favorites. Tablets not listed in moveTablets are spread across them.
   "rebalance_interval": "9",
   "moveTablets":[
      {
         "move":"name, actor.film, character.portrayed_in_films",
         "group":"2", # Move to this group.
         "lock":"true" # Rule: locked tablets are excluded from the balancing process.
      },
      {
         "move":"language, filming, soundtrack",
         "group":"3",
         "lock":"false" # Rule: Dgraph moves the tablets now, but after N time it may move them again based on disk rules.
      },
      {
         "move":"executive_produced_by, featured_locations, dubbing_performances",
         "group":"3,4", # Spread among these groups.
         "lock":"true" # These tablets are balanced only between groups 3 and 4.
      }
   ]
}
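To make the "spread among these groups" rule concrete, here is a small sketch that expands one rule into "tablet group" pairs. The proposal does not say how tablets would be spread, so round-robin assignment is purely an assumption here, and `expand_rule` is a hypothetical helper, not an existing Dgraph command.

```shell
#!/bin/sh
# Sketch: expand one proposed rule into "tablet group" pairs.
# Round-robin spreading is an assumption; the proposal leaves this open.
expand_rule() {
    moves=$1    # comma-separated tablet names, as in the "move" field
    groups=$2   # comma-separated group ids, as in the "group" field

    # From here on the positional parameters hold the target groups.
    set -- $(printf '%s' "$groups" | tr ',' ' ')
    [ "$#" -gt 0 ] || return 1

    for tablet in $(printf '%s' "$moves" | tr ',' ' '); do
        group=$1
        echo "$tablet $group"
        # Rotate the group list so the next tablet gets the next group.
        shift
        set -- "$@" "$group"
    done
}

# The third rule from the payload above.
expand_rule "executive_produced_by, featured_locations, dubbing_performances" "3,4"
```

Each output line pairs one tablet with one group, which is the information a single `moveTablet` request needs.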

BTW, it would be interesting to have mass moves of tablets based on types. Types often share the same predicate, which would be a problem, but it would still be interesting: the schema for tablets would be type-oriented, and that would simplify things.

Geo-Sharding Support

This is an update (07/24/20)

We would start the servers with tags instead of guessing which group each one will form.

{
   "favorite_tag": "southChina", 
   "rebalance_interval": "9",
   "moveTablets":[
      {
         "move": "barcode, codeA, stockPlaceID",
         "group_tag": "eChina_1, eChina_2", # Move to this group based on TAG. 
         "lock": "true" 
      }
   ]
}

PS. This tagging is just an idea, and I am not sure how it would be implemented. Maybe Zero holds the tags and distributes tablets according to the Alphas connected to it, and Zero knows which “geo constraints” it belongs to.
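As a rough illustration of the tag idea, the sketch below resolves a `group_tag` value to group ids through a lookup table. Everything in it is hypothetical: the table contents, the group ids, and the `group_for_tag` and `resolve_tags` helpers. In the proposal Zero would hold this mapping; here it is just a shell variable.

```shell
#!/bin/sh
# Hypothetical tag table, one "<tag> <group id>" pair per line.
# The group ids are invented for the example.
TAG_TABLE='southChina 1
eChina_1 2
eChina_2 3'

# Print the group id registered for a tag; fail if the tag is unknown.
group_for_tag() {
    printf '%s\n' "$TAG_TABLE" |
        awk -v tag="$1" '$1 == tag { print $2; found = 1 } END { exit !found }'
}

# Turn a comma-separated "group_tag" value into a comma-separated group list.
resolve_tags() {
    result=""
    for tag in $(printf '%s' "$1" | tr ',' ' '); do
        group=$(group_for_tag "$tag") || {
            echo "unknown tag: $tag" >&2
            return 1
        }
        result="${result:+$result,}$group"
    done
    echo "$result"
}

# The group_tag value from the payload above.
resolve_tags "eChina_1, eChina_2"
```

Failing on an unknown tag, rather than silently skipping it, seems the safer choice for a rule that is meant to pin data to a region.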

martinmr commented :

The bulk loader already has logic to try to fairly assign predicates to groups (in merge_shards.go). Perhaps something similar could be implemented in the live loader. In that case there would not be as great of a need to move tablets.

We probably want a solution that minimizes the number of tablet moves as each tablet move degrades performance.

MichelDiz commented :

I understand, but since we already make it possible for the user to move tablets arbitrarily, I believe improving this is a good thing to do.

However, the idea was to “organize” before starting the dataset load, not during the load. Another detail, as I said, is that the user could plan the cluster better. They could put some groups on better instances (with NVMe) and the rest on cheaper instances. This lets them decide which data groups deserve the most resources, or which ones tend to consume more, and move the heavily used tablets to the better instances. Or they could do it for any other purpose they consider important in their cluster.

As far as I know (correct me if I’m wrong), Dgraph’s balancing is based on the “size” of the predicate (the docs say “Dgraph Zero tries to rebalance the cluster based on the disk usage in each group”). I don’t know if it takes other parameters into account, such as general resources or latency.

If the user could control this, it would be better for the DB administrator, who could manage the aspects that Dgraph has no context for.

Cheers.

martinmr commented :

@manishrjain is probably the best person to decide whether something like this makes sense. I am not really sure giving this type of control would be helpful at all.

parasssh commented :

When this ticket is picked up, we should also look at the ability to enable/disable rebalancing, since some users don’t want rebalancing to happen during certain time windows.