HADOOP DISTRIBUTED FILE SYSTEM (HDFS) ARCHITECTURAL DOCUMENTATION
Contents
2 HDFS Assumptions and Goals
2.1 Hardware Failures
2.2 Streaming Data Access
2.3 Large Data Sets
2.4 Simple Coherency Model
4 Communication Among HDFS Elements
4.1 Application Code <-> Client
4.2 Client <-> NameNode
4.3 Client <-> DataNode
4.4 NameNode <-> DataNode
5 Decomposition and Basic Concepts of HDFS Elements
5.1 Client
5.2 NameNode Decomposition
5.3 DataNode Block Management
Executive Summary
This document captures the major architectural decisions in HDFS 0.21. Its purpose is to provide a guide to the overall structure of the HDFS code so that contributors can more effectively understand how the changes they are considering can be made, and what the consequences of those changes would be.
The audience for this document is both contributors, who will use it to gain an understanding of the structure of HDFS and its design rationale, and committers, who will use it to reason about future changes and who will update it as the system evolves.