Proposal: Boyscout Refactoring of Functions - by enumerating them

I’m back from being sick!

Anyways, I was starting work on a PR for this issue when I noticed we could probably do better with the functions.

Specifically, my proposal would be to switch this gigantic struct into a smaller struct

// Function holds the information about gql functions.
type Function struct {
	Attr       string
	Lang       string // language of the attribute value
	Args       []Arg  // Contains the arguments of the function.
	UID        []uint64
	NeedsVar   []VarContext // If the function requires some variable
	IsValueVar bool         // eq(val(s), 5)
	IsLenVar   bool         // eq(len(s), 5)
	Fn
}

// NEW TYPE HERE THAT ENUMERATES THE FUNCTIONS SUPPORTED BY DQL
type Fn uint16 // Fn is probably not a great name.

const (
     count Fn = iota
     eq 
... 
)

The primary benefit is that all functions that DQL supports are enumerated once and exactly once, leading to an easy definition of things later on.

Also there appears to be some small benefits to the struct changes. The new struct is 16 bytes smaller as shown here.

fn_new fn_old

While not much of an impact, this will also have impacts on github.com/dgraph-io/dgraph/query.Function.

Again, having defined and enumerating the functions in the gql package, we can free up a lot of additional resources in the downstream. Instead of having a massive switch statement worker/task.go, you can just replace that with a map[parser.Fn]FuncType that is created at init, populated with the correct functions. Less work done.

This also makes the String functions and the datetime functions issues very much easier to address. Again, simply update the enumerations, and the maps in the correct places, and done!

Now I know we are not adding anything to GQL any time soon, but this tiny refactor buys a lot of flexibility down the road. I don’t mind taking it up on a part time basis either. I’ve quite a bit of experience implementing programming languages so I know what sort of pitfalls to look out for.

But, of course I could be very wrong and missing something important, because I’m new to the codebase.

Pros:

  • Buys us flexibility down the road
  • Potential performance gains due to the computer doing much less work in the first place

Cons:

  • This proposal is written by a n00b who may not know all the intricacies of the system yet
  • Really large footprint for a tiny refactor. Pretty much the entire query lang chain is touched.

Can you elaborate a bit more on what you intend to change? I’m not following. How would you fit the entire Function struct into a uint16?

I am at my vote limit currently :frowning:

@mrjn, I believe the overall goal is to remove a gigantic switch statement and put functions in an object from my understanding. Then when a function is needed we can call the object and function by name reference. Is that a close understanding @chewxy ?

I’m saying instead of passing around a string as a function name, pass around a uint16, called Fn. This allows us to directly build map[gql.Fn]T or if you really want to push the performance angle, []T.

This way in order to add new functions to DQL, we simply add to the enumerations and the maps, instead of hunting all over the place to add and then leave some places out (which is what I was fixing in the first place)

I edited the code in my top post with added comments on what I meant.

1 Like

by its identity reference, not name reference (that is what we currently do), but that’s correct.

1 Like

Got it. Optimization wise, might not make a difference. But, if it improves code quality as well, then we’re golden.

1 Like

Experience informs me that when you switch named implementations for numbered implementations, you often see some speedup (traditional example is when you go from straight lambda calculus to one with deBruijn indices, you see a 2-4x speedup). But overall, you’re right, the speedups maybe overall small compared to the fetching of data in the subgraphs, and so, insignificant.