In this article, you’ll learn how to install Apache Spark on Ubuntu 20.04. Apache Spark is a fast, general-purpose cluster computing system that provides high-level APIs in Java, Scala, Python and R. It also ships with higher-level tools such as Spark SQL, MLlib, GraphX and Spark Streaming. Follow the steps below to install Apache Spark.
Step 1: Update Your System
As usual, update your system before installing any new package.
sudo apt update && sudo apt upgrade -y
Once the update has finished, reboot your system.
sudo reboot
Step 2: Install Java On Ubuntu 20.04
Apache Spark needs Java to run, so install it by typing
sudo apt install default-jdk
Verify the installed Java version by typing java -version. The output should look similar to this:
sabi@Ubuntu20:~$ java -version
openjdk version "11.0.9.1" 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
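If you'd rather check the Java version from a script than by eye, the version banner can be parsed. Below is a minimal sketch. The function name java_major is made up for this example, and it assumes the banner format shown above (a quoted version string on the first line), with the older 1.8.x style mapped to 8.

```shell
# Hypothetical helper: extract the major Java version from a version banner.
# Assumes a line such as: openjdk version "11.0.9.1" 2020-11-04
java_major() {
    local banner="$1" ver
    # Take the text between the first pair of double quotes.
    ver=$(printf '%s\n' "$banner" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) ver=${ver#1.} ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    # Keep only the leading number (cut at the first ".", "_" or "-").
    printf '%s\n' "${ver%%[._-]*}"
}

# Usage: runs only if java is installed; java prints its banner on stderr.
if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major "$(java -version 2>&1)")"
fi
```

A script could then compare the result against the Java versions supported by the Spark release you are installing.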
Step 3: Download & Install Apache Spark On Ubuntu 20.04
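Run the below command in your terminal to download Apache Spark 3.0.1 (the version used in this article), or visit the official download page to pick a release manually. Note that the closer.lua mirror link returns an HTML page rather than the archive, so download the file directly from the Apache archive.
wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz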
tar xvzf spark-3.0.1-bin-hadoop2.7.tgz
sudo mv spark-3.0.1-bin-hadoop2.7/ /opt/spark
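Before extracting an archive like the one above, it's worth comparing its SHA-512 digest with the one published on the Apache download page. The following is only a sketch: sha512_matches is a hypothetical helper name, it assumes sha512sum from coreutils is available (it is on Ubuntu), and the digest in the usage comment is a placeholder you'd replace with the published value.

```shell
# Hypothetical helper: succeed only if FILE's SHA-512 digest equals EXPECTED.
# EXPECTED is the hex digest copied from the project's download page;
# whitespace is stripped and letter case is ignored before comparing.
sha512_matches() {
    local file="$1" expected actual
    expected=$(printf '%s' "$2" | tr -d '[:space:]' | tr 'A-F' 'a-f')
    actual=$(sha512sum "$file" | cut -d ' ' -f 1)
    [ -n "$expected" ] && [ "$actual" = "$expected" ]
}

# Usage (the digest below is a placeholder, not the real value):
# sha512_matches spark-3.0.1-bin-hadoop2.7.tgz "PASTE_PUBLISHED_DIGEST_HERE" \
#     && tar xvzf spark-3.0.1-bin-hadoop2.7.tgz
```

Chaining the check with && means the archive is unpacked only when the digests agree.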
Now, configure the Spark environment. Open your own ~/.bashrc (sudo is not needed for a file in your home directory).
nano ~/.bashrc
Then add the following environment variables to the end of the file.
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Finally, source the file so the changes apply to your current shell.
source ~/.bashrc
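To confirm the new variables are active, you can check whether the Spark directories really ended up on your PATH. This is a small sketch: path_has_dir is a hypothetical helper, and /opt/spark is used as a fallback only because that's where this article moved Spark.

```shell
# Hypothetical helper: succeed if directory DIR is an entry of the
# colon-separated list PATHLIST (defaults to the current $PATH).
path_has_dir() {
    local dir="$1" pathlist="${2:-$PATH}"
    case ":$pathlist:" in
        *":$dir:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: report whether the Spark directories from ~/.bashrc are active.
spark_home="${SPARK_HOME:-/opt/spark}"
for d in "$spark_home/bin" "$spark_home/sbin"; do
    if path_has_dir "$d"; then
        echo "on PATH: $d"
    else
        echo "missing from PATH: $d (open a new terminal or re-run: source ~/.bashrc)"
    fi
done
```

Wrapping the list in colons before matching avoids false positives, so /opt/spark alone does not count as a match for /opt/spark/bin.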
Step 4: Starting Spark Master Server
You can start the Apache Spark Master server by typing the following command in your terminal.
start-master.sh
Step 5: Access Apache Spark Via Web Interface
Open your browser and enter your server IP with port 8080 to access the Apache Spark web interface. If the browser runs on the same machine as Spark, use the loopback address:
http://127.0.0.1:8080/
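To start a new slave (worker) server under this Master server, type the following command. Replace ubuntu1 with your own master's hostname; the exact spark:// URL is shown at the top of the web interface.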
start-slave.sh spark://ubuntu1:7077
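The master URL follows the pattern spark://HOST:PORT, where 7077 is the default master port. If you want to build it in a script, something like the following may help. It's a sketch with a hypothetical function name, and it assumes the master advertises the machine's hostname; the URL shown in the web interface is the one to trust if they differ.

```shell
# Hypothetical helper: build a standalone master URL from a host and port.
# 7077 is Spark's default master port; pass another value if you changed it.
spark_master_url() {
    local host="${1:-$(hostname)}" port="${2:-7077}"
    printf 'spark://%s:%s\n' "$host" "$port"
}

# Usage: print the URL a worker on this machine would connect to.
echo "Worker should connect to: $(spark_master_url)"
# Then, with Spark 3.0.x: start-slave.sh "$(spark_master_url)"
```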
Reload the web page and you’ll see the slave server running.
Finally, finish the configuration and run the below command to verify the installation.
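One way to verify, assuming Spark's bin directory is on your PATH as configured in Step 3, is to check that the Spark launchers resolve and then print the Spark version. This is a hedged sketch rather than the only way to do it; require_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: report whether a command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "not found: $1" >&2
        return 1
    fi
}

# Usage: confirm the Spark launchers resolve, then print the Spark version.
if require_cmd spark-submit && require_cmd spark-shell; then
    spark-submit --version
fi
```

If both launchers are found, spark-submit --version prints the Spark version banner; if not, revisit the PATH setup in Step 3.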
So, this is how you can install Apache Spark on Ubuntu 20.04.