via Geeking with Greg by Greg Linden on 2/5/09

The good folks at Yahoo Research published a paper at VLDB 2008, "PNUTS: Yahoo!'s Hosted Data Serving Platform" (PDF), that has juicy details on the massively distributed database behind most of Yahoo's web applications.
We describe PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications.

Do not miss the related work section of the paper, where the authors compare PNUTS to Google Bigtable and Amazon Dynamo, among other things.
The foremost requirements of web applications are scalability, consistently good response time for geographically dispersed users, and high availability. At the same time, web applications can frequently tolerate relaxed consistency guarantees.
For example, if a user changes an avatar ... little harm is done if the new avatar is not initially visible to one friend .... It is often acceptable to read (slightly) stale data, but occasionally stronger guarantees are required by applications.
PNUTS provides a consistency model that is between the two extremes of general serializability and eventual consistency ... We provide per-record timeline consistency: all replicas of a given record apply all updates to the record in the same order .... The application [can] indicate cases where it can do with some relaxed consistency for higher performance .... [such as reading] a possibly stale version of the record.
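To make per-record timeline consistency concrete, here is a minimal Python sketch of the idea; the class and function names are my own invention for illustration, not the paper's API. The key property: because one master serializes all updates to a record and every replica applies them in that same order, a stale read returns an old version of the record, never an out-of-order one.

```python
class Record:
    """One record replicated across regions; a single master serializes writes."""
    def __init__(self, value):
        self.log = [value]          # versions, in the master's commit order

class Replica:
    """Applies the master's update log in order; may lag behind."""
    def __init__(self, record):
        self.record = record
        self.applied = 1            # how many log entries this replica has seen

    def catch_up(self):
        self.applied = len(self.record.log)

    def read_any(self):
        """Fast read: possibly stale, but always some valid past version."""
        return self.record.log[self.applied - 1]

    def read_latest(self):
        """Slower read: forces the replica up to date first."""
        self.catch_up()
        return self.record.log[-1]

def write(record, value):
    """All writes go through the master, which appends in one global order."""
    record.log.append(value)

# One record (a user's avatar), two replicas in different regions.
avatar = Record("old.png")
east, west = Replica(avatar), Replica(avatar)
write(avatar, "new.png")        # update committed at the master
east.catch_up()                 # east has replicated; west still lags
print(east.read_any())          # -> new.png
print(west.read_any())          # -> old.png (stale, but a real past version)
print(west.read_latest())       # -> new.png
```

This mirrors the avatar example above: the lagging replica briefly serves the old avatar, which the application has declared acceptable, while a reader that needs stronger guarantees can pay for an up-to-date read.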
When reading the paper, a couple things about PNUTS struck me as surprising:
First, the system is layered on top of the guarantees of a reliable pub-sub message broker, which acts "both as our replacement for a redo log and our replication mechanism." I have to wonder whether the choice not to build these pieces of the database themselves could lead to missed opportunities for improving performance and efficiency.
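As a rough illustration of that design (the structure and names here are my own, not the paper's): an update counts as committed once the broker has durably logged it, and the same published message then drives replication to each region, so one mechanism plays both roles.

```python
from collections import deque

class MessageBroker:
    """Stand-in for the reliable pub-sub broker: durably logs each message,
    then fans it out to subscribers. The ordered log doubles as a redo log."""
    def __init__(self):
        self.log = []               # durable, ordered message log
        self.subscribers = []

    def publish(self, msg):
        self.log.append(msg)        # "commit": the update is now durable
        for sub in self.subscribers:
            sub.append(msg)         # replication: deliver to each region

broker = MessageBroker()
east, west = deque(), deque()       # per-region delivery queues
broker.subscribers = [east, west]

broker.publish(("user123", "avatar", "new.png"))
print(list(east))   # each region receives the update via the same message
print(broker.log)   # and the broker's log can replay updates after a crash
```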
Second, as figures 3 and 4 show, the average latency of requests to their database seems quite high, roughly 100 ms. This is high enough that web applications probably would incur too much total latency if they made a few requests serially (e.g. ask for some data, then, depending on what the data looks like, ask for some other data). That seems like a problem.
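To put numbers on that concern (using only the post's rough 100 ms estimate): dependent requests cannot be parallelized, so their latencies add, while independent requests issued concurrently cost only as much as the slowest one.

```python
PER_REQUEST_MS = 100            # rough average latency from figures 3 and 4

# Three dependent lookups (each needs the previous result) must run serially:
serial_total = 3 * PER_REQUEST_MS
print(serial_total)             # -> 300 ms, likely too slow for a page load

# Three independent lookups issued concurrently cost only the slowest one:
parallel_total = max(PER_REQUEST_MS, PER_REQUEST_MS, PER_REQUEST_MS)
print(parallel_total)           # -> 100 ms
```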
Please see also my August 2006 post, "Google Bigtable paper", which discusses the distributed database behind many products at Google.
Please see also my earlier post, "Highly available distributed hash store at Amazon", on the distributed database behind some features at Amazon.com.
Please see also my earlier posts, "Cassandra data store at Facebook" and "HBase: A Google Bigtable clone".
Update: One of the developers of PNUTS commented on this post, pointing out that PNUTS performance is much better in practice (1-10ms/request) when caching layers are in place and making a few comparisons to Bigtable.