Hadoop The Definitive Guide 2nd Edition Revised & Updated pdf

图书网, December 8, 2017, 14:24:49

Hadoop The Definitive Guide 2nd Edition Revised & Updated: Synopsis

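Starting from the origins of Hadoop, this book moves from the basics to more advanced material, combining theory and practice to give a comprehensive introduction to Hadoop, a tool well suited to high-performance processing of massive datasets. The book has 14 chapters and 3 appendixes, covering: an introduction to Hadoop; an introduction to MapReduce; the Hadoop Distributed Filesystem; Hadoop I/O; MapReduce application development; how MapReduce works; MapReduce types and formats; MapReduce features; how to set up a Hadoop cluster and how to administer Hadoop; an introduction to Pig; an introduction to HBase; and an introduction to ZooKeeper. It closes with a rich set of case studies.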

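This book is the authoritative reference on Hadoop: programmers can use it to explore how to analyze massive datasets, and administrators can learn how to set up and run Hadoop clusters.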

Hadoop The Definitive Guide 2nd Edition Revised & Updated: Table of Contents

Foreword

Preface

1. Meet Hadoop

Data!

Data Storage and Analysis

Comparison with Other Systems

RDBMS

Grid Computing

Volunteer Computing

A Brief History of Hadoop

Apache Hadoop and the Hadoop Ecosystem

2. MapReduce

A Weather Dataset

Data Format

Analyzing the Data with Unix Tools

Analyzing the Data with Hadoop

Map and Reduce

Java MapReduce

Scaling Out

Data Flow

Combiner Functions

Running a Distributed MapReduce Job

Hadoop Streaming

Ruby

Python

Hadoop Pipes

Compiling and Running

3. The Hadoop Distributed Filesystem

The Design of HDFS

HDFS Concepts

Blocks

Namenodes and Datanodes

The Command-Line Interface

Basic Filesystem Operations

Hadoop Filesystems

Interfaces

The Java Interface

Reading Data from a Hadoop URL

Reading Data Using the FileSystem API

Writing Data

Directories

Querying the Filesystem

Deleting Data

Data Flow

Anatomy of a File Read

Anatomy of a File Write

Coherency Model

Parallel Copying with distcp

Keeping an HDFS Cluster Balanced

Hadoop Archives

Using Hadoop Archives

Limitations

4. Hadoop I/O

Data Integrity

Data Integrity in HDFS

LocalFileSystem

ChecksumFileSystem

Compression

Codecs

Compression and Input Splits

Using Compression in MapReduce

Serialization

The Writable Interface

Writable Classes

Implementing a Custom Writable

Serialization Frameworks

Avro

File-Based Data Structures

SequenceFile

……

5. Developing a MapReduce Application

6. How MapReduce Works

7. MapReduce Types and Formats

8. MapReduce Features

9. Setting Up a Hadoop Cluster

10. Administering Hadoop

11. Pig

12. Hive

13. HBase

14. ZooKeeper

15. Sqoop

16. Case Studies

A. Installing Apache Hadoop

B. Cloudera's Distribution for Hadoop

C. Preparing the NCDC Weather Data

Hadoop The Definitive Guide 2nd Edition Revised & Updated: Excerpt

Hadoop got its start in Nutch. A few of us were attempting to build an open source web search engine and having trouble managing computations running on even a handful of computers. Once Google published its GFS and MapReduce papers, the route became clear. They'd devised systems to solve precisely the problems we were having with Nutch. So we started, two of us, half-time, to try to re-create these systems as a part of Nutch.

We managed to get Nutch limping along on 20 machines, but it soon became clear that to handle the Web's massive scale, we'd need to run it on thousands of machines and, moreover, that the job was bigger than two half-time developers could handle. Around that time, Yahoo! got interested, and quickly put together a team that I joined.

We split off the distributed computing part of Nutch, naming it Hadoop. With the help of Yahoo!, Hadoop soon grew into a technology that could truly scale to the Web. In 2006, Tom White started contributing to Hadoop. I already knew Tom through an excellent article he'd written about Nutch, so I knew he could present complex ideas in clear prose. I soon learned that he could also develop software that was as pleasant to read as his prose.
