HBase is an open source, distributed, non-relational database developed under the Apache Software Foundation. It is written in Java and runs on top of the Hadoop Distributed File System (HDFS). HBase is one of the dominant databases for big data workloads, designed for quick read and write access to huge amounts of structured data.
Today, we will cover the first part of our guide on installing Hadoop & HBase on Ubuntu 18.04: an HBase installation on a single-node Hadoop cluster. It is done on a bare Ubuntu 18.04 virtual machine with 8 GB RAM & 4 vCPUs.
Installing Hadoop on Ubuntu 18.04
Follow these steps to install a single-node Hadoop cluster on Ubuntu 18.04 LTS.
Step 1: Update System
Before deploying Hadoop & HBase on Ubuntu, update the system and reboot.
sudo apt update
sudo apt -y upgrade
sudo reboot
Step 2: Install Java
Skip this step if Java is already installed. Install OpenJDK 8; the full JDK is needed because later steps use javac to locate JAVA_HOME.
sudo apt update
sudo apt install openjdk-8-jdk-headless
Confirm the Java installation:
sabi@Ubuntu:~$ java -version
openjdk version "1.8.0_232"
OpenJDK Runtime Environment (build 1.8.0_232-8u232-b09-0ubuntu1~18.04.1-b09)
OpenJDK 64-Bit Server VM (build 25.232-b09, mixed mode)
Set up the JAVA_HOME variable.
cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export PATH=$PATH:$JAVA_HOME/bin
EOF
Now, update your PATH & settings.
source /etc/profile.d/hadoop_java.sh
Testing Java
sabi@Ubuntu:~$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64
Step 3: Creating User Account
Next, create a dedicated account for Hadoop so the Hadoop file system is isolated from the Unix file system.
sabi@Ubuntu:~$ sudo adduser hadoop
Adding user `hadoop' ...
Adding new group `hadoop' (1001) ...
Adding new user `hadoop' (1001) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []: Sabir Hussain
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n] y
sabi@Ubuntu:~$ sudo usermod -aG sudo hadoop
After adding the user, generate an SSH key pair for it.
sabi@Ubuntu:~$ sudo su - hadoop
hadoop@Ubuntu:~$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:f/lEUTkJyr49dZEHr9xZ7wCD4Lg3+ephloHQ8w8GVlY hadoop@Ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| +.E .o +|
| . = …. B.|
| . * . .oo .o=|
| o * .. + +*|
| o S . =o+|
| o O oo.o.|
| = +.oo. .|
| o o . o. |
| .o . |
+----[SHA256]-----+
Authorize the key
Add this user's public key to the list of authorized SSH keys.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Make sure that you can SSH to localhost using the added key.
hadoop@Ubuntu:~$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:jyWPWJLVC9MCHnOAFJjN8c8bwLu0o0U85cWTxHwuHvE.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.3 LTS (GNU/Linux 5.0.0-37-generic x86_64)
Documentation: https://help.ubuntu.com
Management: https://landscape.canonical.com
Support: https://ubuntu.com/advantage
Canonical Livepatch is available for installation.
Reduce system reboots and improve kernel security. Activate at:
https://ubuntu.com/livepatch
0 packages can be updated.
0 updates are security updates.
Your Hardware Enablement Stack (HWE) is supported until April 2023.
The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.
hadoop@Ubuntu:~$ exit
logout
Connection to localhost closed.
Step 4: Download & Install Hadoop
Download the Hadoop release (2.10.0 in this guide):
wget https://www-eu.apache.org/dist/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Extract the files.
tar xzvf hadoop-2.10.0.tar.gz
Move the resulting directory to /usr/local/hadoop:
sudo mv hadoop-2.10.0 /usr/local/hadoop
Set up HADOOP_HOME and add the directory containing the Hadoop binaries to your $PATH:
cat <<EOF | sudo tee /etc/profile.d/hadoop_java.sh
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
Source the file:
source /etc/profile.d/hadoop_java.sh
Confirm the Hadoop version:
hadoop@Ubuntu:~$ hadoop version
Hadoop 2.10.0
Subversion ssh://git.corp.linkedin.com:29418/hadoop/hadoop.git -r e2f1f118e465e787d8567dfa6e2f3b72a0eb9194
Compiled by jhung on 2019-10-22T19:10Z
Compiled with protoc 2.5.0
From source with checksum 7b2d8877c5ce8c9a2cca5c7e81aa4026
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.10.0.jar
Step 5: Configure Hadoop
Hadoop configurations are located under /usr/local/hadoop/etc/hadoop/
Several files need to be modified to complete the installation on Ubuntu 18.04.
First, set JAVA_HOME in the shell script hadoop-env.sh:
$ sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Point it at your Java installation, e.g. for the OpenJDK 8 package installed earlier:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Then configure:
1. core-site.xml
The core-site.xml file contains the Hadoop cluster information read at startup. These properties include:
- The port number used for the Hadoop instance
- The memory allocated for the file system
- The memory limit for data storage
- The size of the read/write buffers
Open core-site.xml
sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following properties between the <configuration> and </configuration> tags.
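A typical single-node configuration looks like the snippet below; the file system URI (localhost:9000 here) is an assumption and should match your own setup:
<!-- core-site.xml: default file system used by Hadoop clients -->
<property>
   <name>fs.defaultFS</name>
   <value>hdfs://localhost:9000</value>
</property>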
2. hdfs-site.xml
Configure this file on each host used in the cluster. It holds:
- The namenode & datanode paths on the local filesystem
- The data replication factor
I'm storing the Hadoop data on my root disk; if your server has a secondary disk, you can partition and mount it to /hadoop as shown below.
hadoop@Ubuntu:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 149.9M 1 loop /snap/gnome-3-28-1804/67
loop1 7:1 0 54.4M 1 loop /snap/core18/1066
loop2 7:2 0 4.2M 1 loop /snap/gnome-calculator/544
loop3 7:3 0 14.8M 1 loop /snap/gnome-characters/296
loop4 7:4 0 4M 1 loop /snap/gnome-calculator/406
loop5 7:5 0 3.7M 1 loop /snap/gnome-system-monitor/123
loop6 7:6 0 89.1M 1 loop /snap/core/8268
loop7 7:7 0 14.8M 1 loop /snap/gnome-characters/375
loop8 7:8 0 3.7M 1 loop /snap/gnome-system-monitor/100
loop9 7:9 0 1008K 1 loop /snap/gnome-logs/61
loop10 7:10 0 88.5M 1 loop /snap/core/7270
loop11 7:11 0 156.7M 1 loop /snap/gnome-3-28-1804/110
loop12 7:12 0 956K 1 loop /snap/gnome-logs/81
loop13 7:13 0 44.2M 1 loop /snap/gtk-common-themes/1353
loop14 7:14 0 42.8M 1 loop /snap/gtk-common-themes/1313
sda 8:0 0 20G 0 disk
└─sda1 8:1 0 20G 0 part /
sr0 11:0 1 2G 0 rom
If you are using a secondary disk (assumed here to be /dev/sdb), partition it and mount it to the /hadoop directory:
sudo parted -s -- /dev/sdb mklabel gpt
sudo parted -s -a optimal -- /dev/sdb mkpart primary 0% 100%
sudo parted -s -- /dev/sdb align-check optimal 1
sudo mkfs.xfs /dev/sdb1
sudo mkdir /hadoop
echo "/dev/sdb1 /hadoop xfs defaults 0 0" | sudo tee -a /etc/fstab
sudo mount -a
Check:
hadoop@Ubuntu:~$ df -hT | grep /dev/sda1
/dev/sda1 ext4 20G 7.4G 12G 40% /
Create directories for namenode & datanode
sudo mkdir -p /hadoop/hdfs/{namenode,datanode}
Now, set the ownership to the hadoop user & group:
sudo chown -R hadoop:hadoop /hadoop
Open the file
sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Then add the following properties between the <configuration> and </configuration> tags.
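For a single-node cluster, a typical configuration sets the replication factor to 1 and points the namenode & datanode at the directories created above; adjust the paths if you used a different location:
<!-- hdfs-site.xml: replication factor and local storage paths -->
<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:///hadoop/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:///hadoop/hdfs/datanode</value>
</property>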
3. mapred-site.xml
Use this file to set the MapReduce framework to use.
sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
Set it as shown below.
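A minimal example that tells MapReduce to run on YARN (in Hadoop 2.x this file may not exist yet; nano will create it, or you can copy it from mapred-site.xml.template):
<!-- mapred-site.xml: run MapReduce jobs on YARN -->
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>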
4. yarn-site.xml
This file overrides the default YARN settings; it defines the resource-management and job-scheduling configuration.
sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
Configure it in the same way, adding the properties between the <configuration> and </configuration> tags.
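For a single node, it is usually enough to enable the MapReduce shuffle service on the NodeManager, for example:
<!-- yarn-site.xml: auxiliary shuffle service needed by MapReduce -->
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>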
Step 6: Validate Hadoop Configuration
Switch to the hadoop user and format the HDFS namenode before first use.
sudo su - hadoop
hdfs namenode -format
Test HDFS Configuration
$ start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [hbase]
hbase: Warning: Permanently added 'hbase' (ECDSA) to the list of known hosts.
Finally, verify the YARN configuration:
$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
Hadoop 2.x default web UI ports:
- NameNode – default HTTP port is 50070.
- ResourceManager – default HTTP port is 8088.
- MapReduce JobHistory Server – default HTTP port is 19888.
Check these by typing
ss -tunelp
Access the NameNode web dashboard at http://ServerIP:50070
See the YARN cluster overview at http://ServerIP:8088
Let's create a test directory in HDFS:
$ hadoop fs -mkdir /test
$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2019-12-29 10:23 /test
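As an extra check, you can copy a local file into the new directory and read it back (the file name here is just an example):
# create a small local file, upload it to HDFS, then print its contents
$ echo "hello hadoop" > hello.txt
$ hadoop fs -put hello.txt /test/
$ hadoop fs -cat /test/hello.txt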
Stopping Hadoop Services
Run the following commands to stop the Hadoop services.
$ stop-dfs.sh
$ stop-yarn.sh
See our next article, How To Install HBase on Ubuntu 18.04.