3 C
Texas

How To Install Apache Hadoop / HBase on Ubuntu 18.04

HBase is an open source distributed non-relational database developed under the Apache Software Foundation. It is written in Java & runs on top of Hadoop File Systems (HDFS). HBase is one of the dominant databases when working with big data. It is designed for a quick read & write access to huge amounts of structured data.

Today, we will cover our first guide on the Installation of Hadoop & HBase on Ubuntu 18.04 and it is a HBase Installation on a Single Node Hadoop Cluster. It is done on a barebone Ubuntu 18.04 Virtual Machine with 8GB Ram & 4vCPU

Installing Hadoop on Ubuntu 18.04

Cover these steps to install a Single node Hadoop cluster on Ubuntu 18.04 LTS

Step 1: Update System

To deploy Hadoop & HBase on Ubuntu , update it.

- Advertisement -
sudo apt update
sudo apt -y upgrade
sudo reboot

Step 2: Install Java

Skip this step if you have Installed java.

sudo apt install openjdk-8-jre-headless
sudo apt update

Confirm the Installation of Java by

sabi@Ubuntu:~$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)

Set up JAVA_HOME variable.

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export PATH=$PATH:$JAVA_HOME/bin
EOF

Now, update your PATH & settings.

source /etc/profile.d/hadoop_java.sh

Testing Java

sabi@Ubuntu:~$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64

Step 3: Creating User Account

Move forward to create an Account for Hadoop so we have isolation b/w the Hadoop file system & the Unix file system.

sabi@Ubuntu:~$ sudo adduser hadoop
Adding user hadoop' ... Adding new grouphadoop' (1001) …
Adding new user hadoop' (1001) with grouphadoop' …
Creating home directory /home/hadoop' ... Copying files from/etc/skel' …
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []: Sabir Hussain
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
sabi@Ubuntu:~$ sudo usermod -aG sudo hadoop

After adding user, generate SS key pair for the user.

sabi@Ubuntu:~$ sudo su - hadoop
hadoop@Ubuntu:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:f/lEUTkJyr49dZEHr9xZ7wCD4Lg3+ephloHQ8w8GVlY hadoop@Ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| +.E .o +|
| . = …. B.|
| . * . .oo .o=|
| o * .. + +*|
| o S . =o+|
| o O oo.o.|
| = +.oo. .|
| o o . o. |
| .o . |
+----[SHA256]-----+

Allow authorization

Add this user’s key to list of Authorized ssh keys.

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Make sure that you can ssh using added key.

hadoop@Ubuntu:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:jyWPWJLVC9MCHnOAFJjN8c8bwLu0o0U85cWTxHwuHvE.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
Canonical Livepatch is available for installation.
Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
0 packages can be updated.
0 updates are security updates.
Your Hardware Enablement Stack (HWE) is supported until April 2023.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
hadoop@Ubuntu:~$ exit
logout
Connection to localhost closed.

Step 4: Download & Install Hadoop

Go for the latest release of Hadoop & download it.

wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz

Extract the files.

tar xzvf hadoop-2.10.0.tar.gz

Move resulting directory to /usr/local/hadoop

sudo mv hadoop-2.10.0 /usr/local/hadoop

Set up HADOOP_HOME and add directory with Hadoop binaries to your $PATH

cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

Source file using

source /etc/profile.d/hadoop_java.sh

Confirm your Hadoop version by

hadoop@Ubuntu:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar

Step 5: Configure Hadoop

Hadoop configurations are located under /usr/local/hadoop/etc/hadoop/

Various files needed to be modified to complete the Installation on Ubuntu 18.04

First of all edit JAVA_HOME in shell script hadoop-env.sh:

$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Then configure:

1.core-site.xml

The core-site.xml file contains Hadoop cluster information used when starting up. These properties include:

  • The port number used for Hadoop instance
  • The memory allocated for file system
  • The memory limit for data storage
  • The size of Read / Write buffers.

Open core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following properties in b/w the <configuration> and </configuration> tags.

2. hdfs-site.xml

Configure this file for each host to be used in the cluster. It holds the information of

  • The namenode & datanode paths ol the local filesystem.
  • Value of replication data

I’m using my disk to store Hadoop infrastructure. You can follow this procedure for your secondary disk.

hadoop@Ubuntu:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop1 7:1 0 54.4M 1 loop /snap/core18/1066
loop2 7:2 0 4.2M 1 loop /snap/gnome-calculator/544
loop3 7:3 0 14.8M 1 loop /snap/gnome-characters/296
loop4 7:4 0 4M 1 loop /snap/gnome-calculator/406
loop5 7:5 0 3.7M 1 loop /snap/gnome-system-monitor/123
loop6 7:6 0 89.1M 1 loop /snap/core/8268
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/375
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 88.5M 1 loop /snap/core/7270
loop11 7:11 0 156.7M 1 loop /snap/gnome-3-28-1804/110
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 44.2M 1 loop /snap/gtk-common-themes/1353
loop14 7:14 0 42.8M 1 loop /snap/gtk-common-themes/1313
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sr0 11:0 1 2G 0 rom

Do partition & mount the disk to /hadoop directory.

1.sudo parted -s -- /dev/sdb mklabel gpt
2.sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
3.sudo parted -s -- /dev/sdb align-check optimal 1
4.sudo mkfs.xfs /dev/sdb1
5.sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a

Check:

hadoop@Ubuntu:~$ df -hT | grep /dev/sda1
/dev/sda1 ext4 20G 7.4G 12G 40% /

Create directories for namenode & datanode

sudo mkdir -p /hadoop/hdfs/{namenode,datanode}

Now, set ownership to hadoop user & group

sudo chown -R hadoop:hadoop /hadoop

Open the file

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Then add the below data in b/w <configuration> & </configuration> tags.

3. mapred-site.xml

Use this file to set the MapReduce Framework

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml

Set according to the below

4. yarn-site.xml

It will overwrite the configurations for Hadoop.yarn because it will define resource management & job scheduling logic.

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml

Do similar configuration

Step 6: Validate Hadoop Configuration

Initialize Hadoop Infrastructure store.

sudo su - hadoop
hdfs namenode -format

Test HDFS Configuration

$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.

In the end, verify the YARN configurations

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

Hadoop 2.x default web UI ports.

  • NameNode – Default HTTP port is 9870.
  • ResourceManager – Default HTTP port is 8088.
  • MapReduce JobHistory Server – Default HTTP port is 19888.

Check these by typing

ss -tunelp

Access Hadoop Web Dashboard at http://ServerIP:9870

See Hadoop Cluster Overview at http://ServerIP:8080

Let’s create a directory to test

$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-12-29 10:23 /test
Stopping Hadoop Services

Run the following command to stop the Hadoop Services.

$ stop-dfs.sh
$ stop-yarn.sh

See our next article to read How To Install HBase on Ubuntu 18.04

- Advertisement -
Everything Linux, A.I, IT News, DataOps, Open Source and more delivered right to you.
Subscribe
"The best Linux newsletter on the web"

LEAVE A REPLY

Please enter your comment!
Please enter your name here



Latest article