Hadoop 2 (2.2.0) setup on Debian

Today’s post is a walk-through of the steps required to install Hadoop 2 on Debian Linux. Please note that this covers a single-node installation only. This guide is heavily based on the Ubuntu instructions found here.

Install Java

# install the java jdk
$ sudo apt-get install openjdk-7-jdk
 
# make a jdk symlink
$ cd /usr/lib/jvm
$ ln -s java-7-openjdk-amd64 jdk
 
# make sure that ssh server is installed
$ sudo apt-get install openssh-server
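
As a quick sanity check, confirm that the JDK is on your path and that the symlink resolves:

# verify the java installation
$ java -version

# confirm the symlink points at the jdk
$ readlink /usr/lib/jvm/jdk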

Add Hadoop Users and Groups

# create a new group for hadoop
$ sudo addgroup hadoop
 
# create the hduser and put them in the hadoop group
$ sudo adduser --ingroup hadoop hduser
 
# add them to the sudo group also
$ sudo adduser hduser sudo

Now log in as “hduser”.
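
If you’d rather not start a fresh session, you can switch to the new account in place:

# switch to the hduser account
$ su - hduser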

SSH Keys

# generate your key
$ ssh-keygen -t rsa -P ''
 
# set your public key as authorized
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
 
# test out ssh
$ ssh localhost
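
If ssh still prompts for a password, the key is most likely being rejected because of loose file permissions; ssh requires key material to be private:

# tighten permissions on the key material
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys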

Download Hadoop

# download the package
$ cd ~
$ wget http://mirror.rackcentral.com.au/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz
 
# extract the package
$ sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
$ cd /usr/local
$ sudo mv hadoop-2.2.0 hadoop
 
# get the hduser to take ownership
$ sudo chown -R hduser:hadoop hadoop
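
It’s worth confirming that the ownership change took effect before moving on:

# verify that hduser owns the installation
$ ls -ld /usr/local/hadoop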

Set Up Environment Variables

Add the following lines to your ~/.bashrc

# Hadoop variables
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
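
To pick these variables up in your current shell without logging out, re-source the file:

# reload the environment
$ source ~/.bashrc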

Add the following lines to /usr/local/hadoop/etc/hadoop/hadoop-env.sh

# modify JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/jdk/

Log back in as hduser and check the hadoop version.

$ hadoop version

Configure Hadoop

Add the following lines into the <configuration> node within /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
   <name>fs.default.name</name>
   <value>hdfs://localhost:9000</value>
</property>
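
Note that fs.default.name is the deprecated Hadoop 1 name for this property; Hadoop 2 prefers fs.defaultFS, but both names are honoured, so either will work here.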

Add the following lines into the <configuration> node within /usr/local/hadoop/etc/hadoop/yarn-site.xml

<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
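
Be careful to use mapreduce_shuffle (with an underscore) here. The older mapreduce.shuffle value was renamed in 2.2.0, and using it will prevent the NodeManager from starting.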

Create mapred-site.xml from its template

# the template lives in the hadoop configuration directory
$ cd /usr/local/hadoop/etc/hadoop
$ mv mapred-site.xml.template mapred-site.xml
$ vi mapred-site.xml

Add the following lines into the <configuration> node within /usr/local/hadoop/etc/hadoop/mapred-site.xml

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
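
This setting is what routes MapReduce jobs to YARN; without it, jobs fall back to the local job runner rather than running on the cluster.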

Prepare the Filesystem

# create the physical directories
$ cd ~
$ mkdir -p mydata/hdfs/namenode
$ mkdir -p mydata/hdfs/datanode

Add the following lines into the <configuration> node within /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
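
dfs.replication is set to 1 because this is a single-node cluster with only one datanode; the two directory properties should match the paths created above.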

Format the namenode

$ hdfs namenode -format
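
If the configuration is correct, the format output should finish by reporting that the storage directory under ~/mydata/hdfs/namenode has been successfully formatted.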

Start Hadoop

$ start-dfs.sh
$ start-yarn.sh
 
# check that services are running
$ jps
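
On a healthy single-node setup, jps should list the NameNode, DataNode and SecondaryNameNode daemons from start-dfs.sh, along with the ResourceManager and NodeManager daemons from start-yarn.sh.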

Run the Example

$ cd /usr/local/hadoop
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
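
The two arguments to the pi example are the number of map tasks and the number of samples per map, so larger values give a better estimate of pi at the cost of a longer run. While the job executes, you can watch its progress through the ResourceManager web UI, which listens on http://localhost:8088 by default.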