Hadoop The Definitive Guide 2nd Edition Revised & Updated 内容简介
本书从Hadoop的缘起开始,由浅入深,结合理论和实践,全方位地介绍Hadoop这一高性能处理海量数据集的理想工具。全书共14章,3个附录,涉及的主题包括:Haddoop简介;MapReduce简介;Hadoop分布式文件系统;Hadoop的I/O、MapReduce应用程序开发;MapReduce的工作机制;MapReduce的类型和格式;MapReduce的特性;如何安装Hadoop集群,如何管理Hadoop;Pig简介;Hbase简介;ZooKeeper简介,最后还提供了丰富的案例分析。
本书是Hadoop权威参考,程序员可从中探索如何分析海量数据集,管理员可以从中了解如何安装与运行Hadoop集群。
Hadoop The Definitive Guide 2nd Edition Revised & Updated 目录
Foreword
Preface
1.MeetHadoop
Data!
Data Storage and Analysis
Comparison with Other Systems
RDBMS
Grid Computing
Volunteer Computing
A Brief History of Hadoop
Apache Hadoop and the Hadoop Ecosystem
2.MapReduce
A Weather Dataset
Data Format
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Map andReduce
Java MapReduce
Scaling Out
Data Flow
Combiner Functions
Running a Distributed MapReduce Job
Hadoop Streaming
Ruby
Python
Hadoop Pipes
Compiling and Running
3.TheHadoopDistributed Filesystem
The Design of HDFS
HDFS Concepts
Blocks
Namenodes and Datanodes
The Command.Line Interfaca
Basic Filesystem Operations
Hadoop Filesystems
Interfaces
The Java Interface
Reading Data from a Hadoop URL
Reading Data Using the FileSystem API
Writing Data
Directories
Querying the Filesystem
Deleting Data
Data Flow
Anatomy of a File Read
Anatomy of a File Write
Coherency Model
Parallel Copying with distcp
Keeping an HDFS Cluster Balanced
Hadoop Archives
Using Hadoop Archives
Limitations
4.Hadoop4Hadoop
Data Integrity
Data Integrity in HDFS
LocalFileSystem
ChecksumFileSystem
Compression
Codecs
Compression and Input Splits
Using Compression in MapReduce
Serialization
The Writable Interface
Writable Classes
Implementing a Custom Writable
Serialization Frameworks
Avro
File—Based Data Structures
SequenceFile
……
5.Developing a MapReduce Application
6.How MapReduce Works
7.MapReduce Types and Formats
8.MapReduce Features
9.Setting Up a Hadoop Cluster
10.Administering Hadoop
11.Pig
12.Hive
13.HBase
14.ZooKeeper
15.Sqoop
16.Case Studies
A. Installing Apache Hadoop
B. Cloudera's Distribution for Hadoop
C. Preparing the NCDC Weather Data
Hadoop The Definitive Guide 2nd Edition Revised & Updated 精彩文摘
Hadoop got its start in Nutch. A few of us were attempting to build an open source web search engine and having trouble managing computations running on even a handful of computers. Once Google published its GFS and MapReduce papers, the route became clear. They'd devised systems to solve precisely the problems we were having with Nutch. So we started, two of us, half-time, to try to re-create these systems as a part of Nutch.
We managed to get Nutch limping along on 20 machines, but it soon became clear that to handle the Web's massive scale, we'd need to run it on thousands of machines and, moreover, that the job was bigger than two half-time developers could handle. Around that time, Yahoo! got interested, and quickly put together a team that I joined.
We split off the distributed computing part of Nutch, naming it Hadoop. With the help of Yahoo!, Hadoop soon grew into a technology that could truly scale to the Web. In 2006, Tom White started contributing to Hadoop. I alrea即knew Tom through an excellent article he'd written about Nutch, so I knew he could present complex ideas in clear prose. I soon learned that he could also develop software that was as pleasant to read as his prose.
→→→→→→→→→→→→→→→→→→→→查找获取
评论