Protocol buffer vs JSON time graphs

I did some analysis for this.

Results for the actors query are good. The query was performed for 478936 actors.

{
  me(_xid_: m.0f4vbz) {
    type.object.name.en
    film.actor.film {
      film.performance.film {
        type.object.name.en
      }
    }
  }
}

https://plot.ly/~pawanrawal/24/actors-query/

The results are not as great as expected for the directors query. Protocol buffers perform well when the number of entities is up to around 200, but for larger values the advantage is debatable.
Here are the results for the directors query (this query was performed with 98000 different directors):

{
  me(_xid_: m.06pj8) {
    type.object.name.en
    film.director.film  {
      film.film.genre {
        type.object.name.en
      }
    }
  }
}

https://plot.ly/~pawanrawal/27/directors-query/

How do the results look @mrjn ?

What were the queries? Can you post them? Also, can you add the JSON latency data in addition to the ratio?

My hunch is that memory allocation is the issue here. We can also look through your code and see if there are any ways to optimize it.

If we want to spend more time on this, we could dump the Flatbuffer responses to a file, then have a standalone script read a response back, run the PB and JSON converters, and measure the time taken. With that component isolated, the Go profiler could also help us dig into the real issue.

The other option is to send the Flatbuffer response directly back to the Go client and do the parsing there. That would surely be faster and more network-efficient, but that's a someday/maybe task for when we really want to squeeze out extra performance.

I will run the script again and store that data in the CSV file too.

Yes, we could do this for the cases in which the ratio is > 1. Should we spend more time on this?

Hmm, can we do that? I thought that wasn’t possible, which is why we went for protocol buffers. Even if we could, wouldn’t the total time still be the same?

If you can find a way to dump the Flatbuffer responses to a file and bring them back into memory in the standalone script, the rest is relatively straightforward. So it might be worth it.

Modified the graphs to also show JSON latency data.

Checking if there is a way to dump Flatbuffer data to a file, though that would need to happen on the server.

Regarding profiling

I was able to gob-encode the SubGraph and write it to a file, but on retrieving it I realised that in SubGraph

type SubGraph struct {
	Attr     string
	Count    int
	Offset   int
	Children []*SubGraph

	query  []byte // unexported, so skipped by encoding/gob
	result []byte
}

query and result are not exported, so their values were not encoded and I could not bring them back into memory. I will make some code changes to export them temporarily, but if we want the benchmarking code to be part of the repo, they would have to be exported permanently. What do you think @mrjn?

Should be alright. Just check how they’re being used – directly like sg.query, or always from a function.

Off the top of my head, I can’t think of any issues making them publicly available.


It’s from within a function always. Cool, I will change this.

Saw this amazing talk about profiling Go code: Profiling & Optimizing in Go / Brad Fitzpatrick - YouTube. @ashwin95r, I would definitely recommend you watch it.

Did some analysis on the directors query, which had 992 elements in its result, and got SVG images for the CPU and memory profiles. I will study these tomorrow and see how we can optimize.

Very good idea to do this @mrjn.


Comparison of the ToJSON output vs the JSON encoding of the protocol buffer, for the query in query_test.go:

{
	me(_uid_:0x01) {
		name
		gender
		status
		friend {
			name
		}
	}
}

ToJSON -

{  
   "_root_":[  
      {  
         "_uid_":"0x1",
         "friend":[  
            {  
               "_uid_":"0x17",
               "name":"Rick Grimes"
            },
            {  
               "_uid_":"0x18",
               "name":"Glenn Rhee"
            },
            {  
               "_uid_":"0x19",
               "name":"Daryl Dixon"
            },
            {  
               "_uid_":"0x1f",
               "name":"Andrea"
            },
            {  
               "_uid_":"0x65"
            }
         ],
         "gender":"female",
         "name":"Michonne",
         "status":"alive"
      }
   ]
}

JSON encoding of the protocol buffer -

{  
   "uid":"1",
   "attribute":"_root_",
   "properties":[  
      {  
         "prop":"name",
         "val":"TWljaG9ubmU="
      },
      {  
         "prop":"gender",
         "val":"ZmVtYWxl"
      },
      {  
         "prop":"status",
         "val":"YWxpdmU="
      }
   ],
   "children":[  
      {  
         "uid":"23",
         "attribute":"friend",
         "properties":[  
            {  
               "prop":"name",
               "val":"UmljayBHcmltZXM="
            }
         ]
      },
      {  
         "uid":"24",
         "attribute":"friend",
         "properties":[  
            {  
               "prop":"name",
               "val":"R2xlbm4gUmhlZQ=="
            }
         ]
      },
      {  
         "uid":"25",
         "attribute":"friend",
         "properties":[  
            {  
               "prop":"name",
               "val":"RGFyeWwgRGl4b24="
            }
         ]
      },
      {  
         "uid":"31",
         "attribute":"friend",
         "properties":[  
            {  
               "prop":"name",
               "val":"QW5kcmVh"
            }
         ]
      },
      {  
         "uid":"101",
         "attribute":"friend",
         "properties":[  
            {  
               "prop":"name",
               "val":""
            }
         ]
      }
   ]
}

Ignore that the val fields are base64-encoded; the main point is that the GraphQL shape of the response isn’t retained.

I see what you mean. I think the best thing to do here would be to take another look at the ToJSON code and ensure the logic is correct.


Right, I will have a look at it and see why we get the error.

Oh, that’s fun stuff. You could write a GraphQL-generating protobuf plugin and manually create scalar types for your leaf values.

cc’ing @snorecone, who is the master of this stuff.

