For more than two decades, SQL databases have been the backbone of my web projects. While they are quite solid, I have had some big troubles with them.
A long time ago, when 468x60 banner exchanges and toplists were popular and Google AdWords was quite new, I had a startup with three other marketers. We all had good ideas for making advertising affordable for small businesses, and we got huge feedback and a load of pre-orders… Within a month we had more ads to deliver than our system, which was still under development, could handle. I discussed our problems several times with a good friend of mine, a database expert at the university. But it turned out that, at that time, our needs and the available possibilities couldn’t be glued together. To get all the ads out, we would have needed huge in-RAM databases plus specialized indexes and synchronization. RAM was expensive in those days, so we had to give up, a day I will not forget. Years later I saw such a system, built by a guy who took on that challenge and did it just the way we were supposed to. He became a multi-millionaire with his ad network…
Even though I’m still neither a pro hacker nor a database specialist, I’ve got a good background and a strong interest in how to persist data and get it back fast.
No wonder the NoSQL hype of the last few years attracted me. But everything I tested failed for me on some important point. Often it’s not a general issue, just something that is a real need or must-have for me.
Most of the time, the database is the bottleneck of what I like to do on the net. For a long time I have wanted to get away from the SQL DBs I always use. And that’s not easy… as they, along with a lot of drivers and tools, are optimized to their best these days. But: handling data in spreadsheets is so boring, and always dealing with the impact of a SQL parser is lame. It’s only 2D whatever you want to do; even with 3D glasses it doesn’t get better.
Having some time, and also the need for a better option, I was one day again sitting in front of my monitor, crawling the net for another way to solve my data delivery needs.
Key-value was my topic, as my tests with Redis some years ago had been amazing. But having all your values only in volatile memory, with just a snapshot backup, wasn’t something I’d like to cook with.
I came across Badger, and that started the journey. Having the indexes in memory but the values on disk is the key. That’s how the SQL DBs get their speed optimized. Yes! Why do so many other NoSQL stores try to pack more into memory… Keys separated from values: this concept is simple, but great. All genius is based on simplicity!
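To make the idea concrete for myself, here is a minimal sketch in Python of what "keys in memory, values on disk" can look like. It is only a toy under my own assumptions, not how Badger actually works inside: the class `TinyKV`, its methods and the file name are all invented for illustration, and there is no crash recovery, compaction or index rebuild.

```python
import os
import tempfile


class TinyKV:
    """Toy store (hypothetical, not Badger): key index in memory, value bytes on disk."""

    def __init__(self, path):
        # key -> (offset, length) of the value inside the log file
        self.index = {}
        # Append-only value log; "a+b" lets us append and also read back.
        self.log = open(path, "a+b")

    def put(self, key, value):
        data = value.encode("utf-8")
        self.log.seek(0, os.SEEK_END)   # new values always go to the end
        offset = self.log.tell()
        self.log.write(data)
        self.log.flush()
        # Only the small pointer stays in memory, never the value itself.
        self.index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self.index[key]  # cheap in-memory lookup
        self.log.seek(offset)             # one targeted disk read
        return self.log.read(length).decode("utf-8")

    def close(self):
        self.log.close()
```

Used like `kv = TinyKV("values.log"); kv.put("ad:1", "banner 468x60"); kv.get("ad:1")`, every lookup touches the small in-memory index first and then reads exactly one slice of the file. An overwritten key simply points to the newest entry, and the old bytes stay in the log as garbage that a real store would have to clean up.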
I was hooked and wanted to explore more. Badger was made to back a graph database. Cool, saving pointers, something I had always wanted to do. Another of those far-fetched theories that started with object-oriented databases a long, long time ago. N:n relations, one of the big problems of a SQL database, causing redundant data. Wow, I thought to myself, this could be a great adventure to explore. After some thought about my other plans, I decided to book this trip and went to the port to fulfill the requirements.
Two weeks ago I went on board, and while reading all the safety instructions I made my first plans for how to discover this planet. This week I converted the database of one of my web projects from MySQL to Dgraph. Along the way I got my first insights into the engine room of this flagship. I haven’t seen it all, just some engines, parts and screws, and I don’t even understand how it all works together, but it feels like moving from a big container ship onto a speedboat. It’s a great taste of getting the world closer, getting more within reach, but it still holds a fear of being lost, backed by just a tiny instruction set versus the huge library of standards the big ship has.
Again, simplicity is what lifts the grey clouds and gives a bright view of the stars: the N-Quads. So far I have left out the fourth dimension, with facets and handles, and just used the triples. Subject, predicate, object. The basis of all languages, narrowed down into small objects, incredible. With just these three and some index instructions I can build more than before, wow. Today I finished the last part of this conversion and ran my first tests. Nothing to prove anything with, just the first queries, no benchmark or comparison sheets: lookups I need for that project, involving 3-4 joins, that take longer than is good on the old system. It took me some time to figure out how to get the same results, and my feeling says there is some room for evolving the query system^^ or maybe I just need to get used to it. When I pressed the run button to start the engine, I had a hard time not falling overboard… 16ms was the highest amount of time it took to 100. Uff, really. Okay, so let me do a search by term, or two terms, three, four with some extra feature. How about I only want to get those linked to those? And the ones linked with them? Maybe a hop further and one more search down there? Now it’s getting slower, but with about 100k objects handling ~2 million nodes, I’m searching through almost all the data, from top left to bottom right and back, hahaha. Overwhelming, I’m impressed.
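For readers who have never seen triples, here is a rough sketch of the idea in plain Python. It is not Dgraph’s query language or API, just my own toy model: the sample data and the helpers `build_index` and `hop` are made up for illustration. Every fact is one (subject, predicate, object) tuple, and "a hop further" means following the same predicate one more time.

```python
from collections import defaultdict

# Each fact is one triple: (subject, predicate, object).
# All names and data here are invented for illustration.
TRIPLES = [
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("carol", "follows", "alice"),
    ("carol", "follows", "dave"),
]


def build_index(triples):
    """Group objects by (subject, predicate) so an edge lookup is one dict access."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def hop(index, start, predicate, hops):
    """Return the nodes reached by following `predicate` exactly `hops` times."""
    frontier = {start}
    for _ in range(hops):
        frontier = {
            obj
            for node in frontier
            for obj in index.get((node, predicate), [])
        }
    return frontier
```

With `index = build_index(TRIPLES)`, a call like `hop(index, "alice", "follows", 2)` walks two edges without any join table in between. An n:n relation is just more triples with the same predicate, which is what makes this model feel so light compared to the link tables I was used to.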
What a great first week!