2006/09/27

Google Code SVN History Import?

So apparently, it is now possible to import an existing repository into Google Code Hosting while preserving its history, through svnsync. This is not yet reflected in the FAQ (assuming you can actually -find- a link to the FAQ from the Google Hosting site).

2006/09/26

KVO, KVC

Isn't it strange that - (id)valueForKeyPath:(NSString *)keyPath does not play well with - (id)valueForUndefinedKey:(NSString *)key in Foundation? Todo: extract the code, post it here so I can be proved wrong, file a bug.

2006/09/07

BigTable, or Infrastructure Paradise, Part 1

The BigTable paper is a very interesting document that should be read in parallel with an old post by Adam Bosworth.

What is fascinating about BigTable is of course what it does: scaling to very large sizes, distribution, time-versioning, latency, replication. But even more fascinating is what it does not do: BigTable is not a relational DBMS, it does not have a strongly typed data model, it is not a write-mostly database, it is not row-oriented, and it only supports transactions at the row level.
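To make the data model a bit more concrete, here is a toy, in-memory sketch of the kind of map the paper describes: a value addressed by row key, column key and timestamp, with rows kept in sorted order. The class name `ToyTable` and its methods are made up for illustration; this is not BigTable's API, and it ignores distribution, persistence and everything else that makes the real thing hard.

```python
from collections import defaultdict


class ToyTable:
    """Toy map: (row key, column key, timestamp) -> value. Hypothetical names."""

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), oldest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0])  # keep versions ordered by time

    def get(self, row, column, at=None):
        """Return the newest value at or before `at` (the latest if `at` is None)."""
        cells = self._rows.get(row, {}).get(column, [])
        if at is not None:
            cells = [cell for cell in cells if cell[0] <= at]
        return cells[-1][1] if cells else None

    def scan(self, start, end):
        """Yield row keys in sorted order within [start, end)."""
        for key in sorted(self._rows):
            if start <= key < end:
                yield key
```

Note what is missing: no schema, no joins, no typed columns. Reads are either a single cell lookup or a range scan over neighbouring row keys, which is where the "natural locality" comes from.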

Where do I really need SQL? How much of my database load is something other than single-record access, access to multiple records that takes advantage of the natural locality of a given table, or just fetching a relationship? Would my search patterns be more efficient using information retrieval techniques on top of a "dumb" DB holding the truth (i.e. BigTable-like) instead of a relational model? Can I loosen an ACID constraint for some of my data? How can I take advantage of things I -know- about my own data patterns (hot/cold data? unrelated data?)? How can I remove from my model the hidden assumptions that will block my "transparent scaling" efforts (ID sequences, etc.)? After hearing database war stories, how do I scale with databases?

I'm not saying at all that SQL databases are inherently bad, but what is good about things like BigTable is that they force you to reexamine your assumptions about what your storage is, and how your application should interoperate with it.

And of course, my ultimate dream infrastructure would be a "Web 2.0 Infrastructure Software Pack", with a low-level, replicating storage (think GFS, or S3, or Hadoop DFS, ParkPlace), a queue and transaction service (think SQS, Chubby), an inherently scalable database (BigTable, C-Store, ...) and a shared-nothing framework (Rails, PHP, etc.). And scaling would mean adding a box and starting services. And I have this feeling that using "standard" LAMP architectures is not the best way to accomplish this: I want a small-scale Google-like architecture on a few boxes.


2006/09/04

S3 Browser hits 1.0

This version adds support for uploading multiple files. When uploading a directory, the contained files are mapped to slash-delimited paths used as keys, to preserve the hierarchy. Another important change in file upload is support for content-type, using by default the MIME type provided by Mac OS X for a given file (thanks Jason for the motivation). Both key and content-type can be changed from their default values when uploading. Here is an example.
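The directory-to-key mapping can be sketched roughly as follows. This is not the S3 Browser code: the function name `plan_uploads` is hypothetical, Python's `mimetypes` module stands in for the Mac OS X type lookup, and whether the top-level directory name ends up in the key is an assumption of this sketch.

```python
import mimetypes
import os


def plan_uploads(root, default_type="application/octet-stream"):
    """Return a sorted list of (key, local_path, content_type) for files under root."""
    root = os.path.abspath(root)
    base = os.path.basename(root)  # assumption: the directory name prefixes each key
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            # Always join with "/" so keys look the same whatever the OS separator is
            key = "/".join([base] + rel.split(os.sep))
            # Fall back to a generic type when the extension is unknown
            content_type = mimetypes.guess_type(name)[0] or default_type
            plan.append((key, path, content_type))
    return sorted(plan)
```

Each tuple is only a default: both the key and the content-type could still be overridden before the actual upload.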

The backend code has also been revamped to properly batch multiple parallel requests to the service, and streamed transfers can now be inspected like the non-streaming variety. Large buckets are also supported (by sending successive list queries to build the object list), and there are also a lot of small UI and back-end fixes. And a preferences panel. How could an application be finished without a preferences panel?
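For the curious, the "successive list queries" idea looks roughly like this. It is a sketch under assumptions, not the application's code: `list_page` is a hypothetical callable standing in for one list request, assumed to return a page of sorted keys strictly after a marker plus a flag saying whether more remain, and `fake_list_page` simulates the service in memory.

```python
def list_all_keys(list_page, bucket, page_size=1000):
    """Collect every key in a bucket by requesting successive pages.

    `list_page(bucket, marker, max_keys)` is a hypothetical callable returning
    (keys, is_truncated), with keys sorted and strictly after `marker`.
    """
    keys = []
    marker = ""
    while True:
        page, truncated = list_page(bucket, marker, page_size)
        keys.extend(page)
        if not truncated or not page:
            break
        marker = page[-1]  # resume right after the last key seen
    return keys


def fake_list_page(bucket, marker, max_keys):
    """Stand-in for the remote service, backed by an in-memory list of keys."""
    all_keys = sorted(f"photo-{i:04d}.jpg" for i in range(2500))
    remaining = [k for k in all_keys if k > marker]
    return remaining[:max_keys], len(remaining) > max_keys
```

Calling `list_all_keys(fake_list_page, "my-bucket")` walks the simulated bucket in three pages and returns all 2500 keys in order.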

All in all, this version has almost everything I initially wanted for this tool, thus the 1.0 moniker (1.0.1 actually, I fixed a small bug with key encoding). The problem of course is that I have new feature ideas. Complete ACL support is quite high on my list, as well as having something to deal with huge buckets from the UI point of view: probably a hierarchical view of some sort, and a live search on the object list.

