How To Install Apache Hadoop / HBase on Ubuntu 18.04

HBase is an open source, distributed, non-relational database developed under the Apache Software Foundation. It is written in Java and runs on top of the Hadoop Distributed File System (HDFS). HBase is one of the dominant databases for big data workloads and is designed for quick read and write access to huge amounts of structured data.

Today we cover our first guide on installing Hadoop and HBase on Ubuntu 18.04: an HBase installation on a single-node Hadoop cluster. It was done on a bare Ubuntu 18.04 virtual machine with 8 GB RAM and 4 vCPUs.

Installing Hadoop on Ubuntu 18.04

Follow these steps to install a single-node Hadoop cluster on Ubuntu 18.04 LTS.

Step 1: Update System

Before deploying Hadoop and HBase, update the Ubuntu system.

sudo apt update
sudo apt -y upgrade
sudo reboot

Step 2: Install Java

Skip this step if you already have Java installed. We install the OpenJDK 8 JDK (not just the JRE), because the JAVA_HOME setup below locates Java through the javac binary.

sudo apt update
sudo apt install -y openjdk-8-jdk-headless

Confirm the Java installation:

sabi@Ubuntu:~$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Set up JAVA_HOME variable.

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export PATH=\$PATH:\$JAVA_HOME/bin
EOF

Now, source the file to update your PATH and settings.

source /etc/profile.d/hadoop_java.sh

Testing Java

sabi@Ubuntu:~$ echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64

Step 3: Creating User Account

Next, create a dedicated user account for Hadoop so that the Hadoop file system is isolated from the Unix file system.

sabi@Ubuntu:~$ sudo adduser hadoop
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []: Sabir Hussain
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
sabi@Ubuntu:~$ sudo usermod -aG sudo hadoop

After adding the user, generate an SSH key pair for it.

sabi@Ubuntu:~$ sudo su - hadoop
hadoop@Ubuntu:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:f/lEUTkJyr49dZEHr9xZ7wCD4Lg3+ephloHQ8w8GVlY hadoop@Ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| +.E .o +|
| . = …. B.|
| . * . .oo .o=|
| o * .. + +*|
| o S . =o+|
| o O oo.o.|
| = +.oo. .|
| o o . o. |
| .o . |
+----[SHA256]-----+

Authorize the key

Add the user's public key to the list of authorized SSH keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Make sure that you can SSH to localhost using the added key.

hadoop@Ubuntu:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:jyWPWJLVC9MCHnOAFJjN8c8bwLu0o0U85cWTxHwuHvE.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
Canonical Livepatch is available for installation.
Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
0 packages can be updated.
0 updates are security updates.
Your Hardware Enablement Stack (HWE) is supported until April 2023.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
hadoop@Ubuntu:~$ exit
logout
Connection to localhost closed.

Step 4: Download & Install Hadoop

Download the latest stable release of Hadoop; this guide uses Hadoop 2.10.0.

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

Extract the files.

tar xzvf hadoop-2.10.0.tar.gz

Move the resulting directory to /usr/local/hadoop:

sudo mv hadoop-2.10.0 /usr/local/hadoop

Set up HADOOP_HOME and add the directory containing the Hadoop binaries to your $PATH.

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=\$HADOOP_HOME
export HADOOP_MAPRED_HOME=\$HADOOP_HOME
export YARN_HOME=\$HADOOP_HOME
export HADOOP_COMMON_HOME=\$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export PATH=\$PATH:\$JAVA_HOME/bin:\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin
EOF

Source the file:

source /etc/profile.d/hadoop_java.sh

Confirm your Hadoop version:

hadoop@Ubuntu:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

Step 5: Configure Hadoop

The Hadoop configuration files are located under /usr/local/hadoop/etc/hadoop/.

Several of them need to be modified to complete the installation on Ubuntu 18.04.

First of all, set JAVA_HOME in the shell script hadoop-env.sh:

$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Then configure:

1. core-site.xml

The core-site.xml file contains the Hadoop cluster information used at startup. These properties include:

  • The port number used for the Hadoop instance
  • The memory allocated for the file system
  • The memory limit for data storage
  • The size of the read/write buffers

Open core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties between the <configuration> and </configuration> tags.
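
A minimal single-node example (the hdfs://localhost:9000 address is an assumption; adjust the host and port to your environment):

<configuration>
   <property>
      <name>fs.defaultFS</name>
      <!-- assumed default filesystem URI for a single-node setup -->
      <value>hdfs://localhost:9000</value>
   </property>
</configuration>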

2. hdfs-site.xml

Configure this file for each host to be used in the cluster. It holds:

  • The namenode and datanode paths on the local filesystem
  • The data replication factor

In this guide the Hadoop data is stored on the main disk; if you have a secondary disk (for example /dev/sdb), you can partition and mount it for Hadoop as shown below.

hadoop@Ubuntu:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop1 7:1 0 54.4M 1 loop /snap/core18/1066
loop2 7:2 0 4.2M 1 loop /snap/gnome-calculator/544
loop3 7:3 0 14.8M 1 loop /snap/gnome-characters/296
loop4 7:4 0 4M 1 loop /snap/gnome-calculator/406
loop5 7:5 0 3.7M 1 loop /snap/gnome-system-monitor/123
loop6 7:6 0 89.1M 1 loop /snap/core/8268
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/375
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 88.5M 1 loop /snap/core/7270
loop11 7:11 0 156.7M 1 loop /snap/gnome-3-28-1804/110
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 44.2M 1 loop /snap/gtk-common-themes/1353
loop14 7:14 0 42.8M 1 loop /snap/gtk-common-themes/1313
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sr0 11:0 1 2G 0 rom

Partition the disk and mount it on the /hadoop directory.

sudo parted -s -- /dev/sdb mklabel gpt
sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
sudo parted -s -- /dev/sdb align-check optimal 1
sudo mkfs.xfs /dev/sdb1
sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a

Check:

hadoop@Ubuntu:~$ df -hT | grep /dev/sda1
/dev/sda1 ext4 20G 7.4G 12G 40% /

Create directories for the namenode and datanode:

sudo mkdir -p /hadoop/hdfs/{namenode,datanode}

Now, set the ownership to the hadoop user and group:

sudo chown -R hadoop:hadoop /hadoop

Open the file

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Then add the data below between the <configuration> and </configuration> tags.
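
For example, a minimal single-node setup that uses the directories created above and a replication factor of 1 (values are illustrative; adjust them as needed):

<configuration>
   <property>
      <name>dfs.replication</name>
      <!-- single node, so no replication -->
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///hadoop/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///hadoop/hdfs/datanode</value>
   </property>
</configuration>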

3. mapred-site.xml

Use this file to set the MapReduce Framework

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

Set it as shown below.
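
A minimal example that tells MapReduce to run on YARN, which is the usual choice for this kind of single-node install:

<configuration>
   <property>
      <name>mapreduce.framework.name</name>
      <!-- run MapReduce jobs on YARN -->
      <value>yarn</value>
   </property>
</configuration>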

4. yarn-site.xml

This file overrides the default YARN settings for Hadoop; it defines the resource management and job scheduling logic.

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

Apply a similar configuration between the <configuration> tags.
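
A minimal example that enables the MapReduce shuffle service in the NodeManager, which is enough for a single-node cluster (add resource limits as needed):

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <!-- enable the shuffle service required by MapReduce on YARN -->
      <value>mapreduce_shuffle</value>
   </property>
</configuration>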

Step 6: Validate Hadoop Configuration

Switch to the hadoop user and format the NameNode to initialize the HDFS store.

sudo su - hadoop
hdfs namenode -format

Test HDFS Configuration

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.

Finally, verify the YARN configuration:

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Hadoop 2.x default web UI ports.

  • NameNode – Default HTTP port is 50070.
  • ResourceManager – Default HTTP port is 8088.
  • MapReduce JobHistory Server – Default HTTP port is 19888.

Check these by typing

ss -tunelp

Access the Hadoop web dashboard at http://ServerIP:50070

See the Hadoop cluster overview at http://ServerIP:8088

Let's create a test directory:

$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-12-29 10:23 /test

Stopping Hadoop Services

Run the following commands to stop the Hadoop services.

$ stop-dfs.sh
$ stop-yarn.sh

See our next article: How To Install HBase on Ubuntu 18.04.
