See the essence of the Spark Streaming API.
Using the Scala sbt build tool with Spark code.
sbt build for Scala/Spark
How to unit test Scala Spark code.
Spark Unit Testing
Here is how to set up and run SparkR on a cluster.
This shows how to add Scala functions that look like they are part of the Spark API.
Joins in Hive using both reduce-side and bucket map-side approaches.
Here is the classic wordcount example, running in Spark on YARN, using the three language APIs.
[Python Wordcount, coming soon]
Here is the classic wordcount example, using the new Hadoop API.
Here are two methods for chaining and managing multiple MapReduce Jobs.
Chaining and Managing Multiple MapReduce Jobs with two drivers
Chaining and Managing Multiple MapReduce Jobs with one driver
To inspect worker node logs in a Hadoop cluster that is behind a firewall with only SSH access, a browser must be set up for tunneling.
Using a browser to tunnel into a Hadoop cluster to inspect worker node logs
Hadoop MapReduce can write key/value output to HDFS in a variety of formats. Here is how to display them.
Display Your MapReduce Output
MRUnit supports two different input/output methods, add and with. Here is the difference.
MRUnit: with vs. add
This four-part series shows how to pass multiple values from a mapper to a reducer, and from the reducer to output.
Passing Multiple Values in MapReduce Part 1: Strings
Passing Multiple Values in MapReduce Part 2: Custom Writables
Passing Multiple Values in MapReduce Part 3: Maps
Passing Multiple Values in MapReduce Part 4: AVRO
Using Maven, the software lifecycle and build tool, you can configure Eclipse for Hadoop development in minutes.
Setup Eclipse for Hadoop Development Using Maven (Linux/Mac version)
Setup Eclipse for Hadoop Development Using Maven (Windows version)
Questions about this material?
Hadoop Concepts Forum
These guides are brought to you by
Center of Excellence for Big Data (CoE4BD)
Graduate Programs in Software
University of St. Thomas
St. Paul, Minnesota
In collaboration with Cloudera and their Academic Partnership program
Also see our Technical Reports