Hadoop How To

Short, targeted guides to all things Hadoop

Here is the classic wordcount example, using the new Hadoop API.

Wordcount

The first part of a four-part series shows how to chain and manage multiple MapReduce jobs.

Chaining and Managing Multiple MapReduce Jobs Part 1: Using a BASH Shell

To inspect worker node logs in a Hadoop cluster that sits behind a firewall with only SSH access, a browser must be set up for tunneling.

Using a browser to tunnel into a Hadoop cluster to inspect worker node logs

Hadoop MapReduce can write key/value output to HDFS in a variety of formats. Here is how to display them.

Display Your MapReduce Output

MRUnit supports two input/output methods, add and with. Here is the difference between them.

MRUnit: with vs. add

This four-part series shows how to pass multiple values from a mapper to a reducer, and from the reducer to output.

Passing Multiple Values in MapReduce Part 1: Strings
Passing Multiple Values in MapReduce Part 2: Custom Writables
Passing Multiple Values in MapReduce Part 3: Maps
Passing Multiple Values in MapReduce Part 4: AVRO

Using the software lifecycle and build tool Maven, you can configure Eclipse for Hadoop development in minutes.

Setup Eclipse for Hadoop Development Using Maven (Linux/Mac version)

Setup Eclipse for Hadoop Development Using Maven (Windows version)


Questions about this material?

Hadoop Concepts Forum


These guides are brought to you by

Center of Excellence for Big Data (CoE4BD)
Graduate Programs in Software
University of St. Thomas
St. Paul, Minnesota

http://www.stthomas.edu/CoE4BD
CoE4BD@stthomas.edu
@CoE4BD

In collaboration with Cloudera and their Academic Partnership program

Also see our Technical Reports