2006/09/07

BigTable, or Infrastructure Paradise, Part 1

The BigTable paper is a very interesting document that should be read alongside an old post by Adam Bosworth.

What is fascinating about BigTable is, of course, what it does: scaling to very large sizes, distribution, time-versioning, low latency, replication. Even more fascinating is what it does not do: BigTable is not a relational DBMS, it does not have a strongly typed data model, it is not a write-mostly database, it is not row-oriented, and it only supports transactions at the row level.
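To make that list a little more concrete, here is a toy, single-process sketch of the data model the paper describes: a sorted map from (row key, column, timestamp) to an uninterpreted byte string, with columns written as "family:qualifier", a bounded number of versions kept per cell, and atomic updates only within a single row. The class and method names (`ToyTable`, `mutate_row`, `get`, `scan`) are hypothetical and are not BigTable's actual client API; distribution, replication and persistence are left out entirely.

```python
import bisect
import threading
import time


class ToyTable:
    """Toy, in-memory model of a BigTable-style table (hypothetical API).

    A cell is addressed by (row key, "family:qualifier" column, timestamp)
    and holds an uninterpreted bytes value. Row keys are kept in sorted
    order so range scans read neighbouring rows together. Atomicity is
    per row only: each row has its own lock and nothing spans rows.
    """

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._rows = {}         # row key -> {column: [(ts, value), ...] newest first}
        self._sorted_keys = []  # row keys in lexicographic order
        self._locks = {}        # row key -> per-row lock
        self._meta_lock = threading.Lock()  # guards row creation and the key list

    def _row_lock(self, row):
        # Create the row (and its lock) on first write.
        with self._meta_lock:
            if row not in self._rows:
                self._rows[row] = {}
                self._locks[row] = threading.Lock()
                bisect.insort(self._sorted_keys, row)
            return self._locks[row]

    def mutate_row(self, row, puts, ts=None):
        """Apply several column writes to ONE row as a single atomic step."""
        ts = time.time_ns() // 1000 if ts is None else ts  # microseconds
        with self._row_lock(row):
            cells = self._rows[row]
            for column, value in puts.items():
                versions = cells.setdefault(column, [])
                versions.append((ts, value))
                versions.sort(key=lambda v: v[0], reverse=True)
                del versions[self.max_versions:]  # drop the oldest versions

    def get(self, row, column, ts=None):
        """Return the newest value written at or before ts (None if absent)."""
        lock = self._locks.get(row)
        if lock is None:
            return None
        with lock:
            for version_ts, value in self._rows[row].get(column, []):
                if ts is None or version_ts <= ts:
                    return value
        return None

    def scan(self, start, end):
        """Yield (row key, {column: newest value}) for start <= key < end."""
        with self._meta_lock:
            lo = bisect.bisect_left(self._sorted_keys, start)
            hi = bisect.bisect_left(self._sorted_keys, end)
            keys = self._sorted_keys[lo:hi]
        for key in keys:
            with self._locks[key]:
                snapshot = {col: vs[0][1] for col, vs in self._rows[key].items() if vs}
            yield key, snapshot
```

With reversed hostnames as row keys (as in the paper's Webtable example, e.g. `com.cnn.www`), all rows of one domain sort next to each other, so `scan("com.", "com/")` returns them together in key order. Keeping one lock per row mirrors the single-row transaction model: no operation ever has to coordinate across rows, which is what allows rows to be spread over many servers.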

Where do I really need SQL? How much of my database load is something other than record access, access to multiple records that takes advantage of the natural locality of a given table, or simply fetching a relationship? Would my search patterns be more efficient if I used information retrieval techniques on top of a "dumb" DB holding the truth (i.e. BigTable-like) instead of a relational model? Can I loosen an ACID constraint for some of my data? How can I take advantage of things I -know- about my own data patterns (hot/cold data? unrelated data?)? How can I remove from my model the hidden assumptions that will block my "transparent scaling" efforts (ID sequences, etc.)? After hearing database war stories, how do I scale with databases?
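The "ID sequences" assumption is worth a concrete illustration. A single auto-increment counter means every insert on every box has to coordinate with one place. Below is a minimal sketch of one common alternative, assuming each box is assigned a distinct node number out of band: each box mints 64-bit IDs locally from a millisecond timestamp, its node number and a per-millisecond counter. The bit layout (10 node bits, 12 counter bits) and all names here are hypothetical choices made for illustration.

```python
import threading
import time

NODE_BITS = 10  # up to 1024 boxes (hypothetical layout)
SEQ_BITS = 12   # up to 4096 IDs per box per millisecond


class LocalIdGenerator:
    """Mint unique integer IDs on one box without a shared sequence."""

    def __init__(self, node_id):
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id out of range")
        self.node_id = node_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    @staticmethod
    def _now_ms():
        return time.time_ns() // 1_000_000

    def next_id(self):
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                now = self._last_ms  # clock stepped back: stay on the last ms
            if now == self._last_ms:
                self._seq = (self._seq + 1) & ((1 << SEQ_BITS) - 1)
                if self._seq == 0:
                    # Counter wrapped within this ms: wait for the next ms.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._seq = 0
            self._last_ms = now
            # Layout: [ timestamp | node id | per-ms counter ]
            return (now << (NODE_BITS + SEQ_BITS)) | (self.node_id << SEQ_BITS) | self._seq
```

IDs from one box are strictly increasing and IDs from different boxes never collide, because the node number occupies its own bits. In exchange, the IDs have gaps and are only roughly time-ordered across boxes: exactly the kind of guarantee you give up once you stop relying on a central sequence.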

I'm not saying at all that SQL databases are inherently bad, but what is good about things like BigTable is that they force you to reexamine your assumptions about what your storage is and how your application should interoperate with it.

And of course, my ultimate dream infrastructure would be a "Web 2.0 Infrastructure Software Pack", with low-level, replicating storage (think GFS, S3, Hadoop DFS, or ParkPlace), a queue and transaction service (think SQS, Chubby), an inherently scalable database (BigTable, C-Store, ...) and a shared-nothing framework (Rails, PHP, etc.). Scaling would then mean adding a box and starting services. And I have a feeling that "standard" LAMP architectures are not the best way to accomplish this: I want a small-scale, Google-like architecture on a few boxes.

