Hello, folks. Many companies are evaluating Apache Spark as a way to reduce their dependence on Elasticsearch. That’s why in this post, I’ll show you how to install Apache Spark on Debian 10.
According to the project website:
Apache Spark is a unified analytics engine for large-scale data processing.
Also, its maintenance and evolution are carried out by well-established working groups, and it integrates flexibly with other Apache projects such as Hadoop, Hive, and Kafka.
Spark is used by a wide range of organizations to process large datasets. In fact, since 2009, more than 1200 developers have contributed to Spark!
Learning Apache Spark is easy whether you come from a Java, Scala, Python, R, or SQL background.
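To give you an idea of how approachable the API is, here is a minimal word-count sketch in Scala. It is only an illustration: the input path input.txt is a placeholder and the app name is arbitrary, so adjust both to your own setup.

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // Start (or reuse) a Spark session; the app name appears in the web UI
    val spark = SparkSession.builder().appName("WordCount").getOrCreate()

    // Split each line into words, pair each word with 1, and add the counts up
    val counts = spark.sparkContext
      .textFile("input.txt") // placeholder path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.take(10).foreach(println) // print a small sample of the results
    spark.stop()
  }
}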
Install Apache Spark on Debian 10
The installation of Apache Spark is quite simple and easier than you might think.
Install some required packages
So, connect via SSH to your server or open a terminal. To make sure there are no problems, update the distribution completely.
sudo apt update
sudo apt upgrade
After that, install Java on Debian 10.
sudo apt install default-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  ca-certificates-java default-jdk-headless default-jre default-jre-headless fontconfig-config fonts-dejavu-core
  java-common libasound2 libasound2-data libavahi-client3 libavahi-common-data libavahi-common3 libcups2
  libdrm-amdgpu1 libdrm-common libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libfontconfig1 libgif7 libgl1
  libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0 libjpeg62-turbo liblcms2-2 libllvm7 libnspr4 libnss3
  libpciaccess0 libpcsclite1 libsensors-config libsensors5 libx11-6 libx11-data libx11-xcb1 libxau6 libxcb-dri2-0
  libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-sync1 libxcb1 libxdamage1 libxdmcp6 libxext6 libxfixes3 libxi6
  libxrender1 libxshmfence1 libxtst6 libxxf86vm1 openjdk-11-jdk openjdk-11-jdk-headless openjdk-11-jre
  openjdk-11-jre-headless x11-common
Suggested packages:
  libasound2-plugins alsa-utils cups-common liblcms2-utils pciutils pcscd lm-sensors openjdk-11-demo
  openjdk-11-source visualvm libnss-mdns fonts-dejavu-extra fonts-ipafont-gothic fonts-ipafont-mincho
  fonts-wqy-microhei | fonts-wqy-zenhei fonts-indic
Recommended packages:
  libxt-dev libatk-wrapper-java-jni fonts-dejavu-extra
The following NEW packages will be installed:
  ca-certificates-java default-jdk default-jdk-headless default-jre default-jre-headless fontconfig-config
  fonts-dejavu-core java-common libasound2 libasound2-data libavahi-client3 libavahi-common-data libavahi-common3
  libcups2 libdrm-amdgpu1 libdrm-common libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libdrm2 libfontconfig1 libgif7
  libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0 libjpeg62-turbo liblcms2-2 libllvm7 libnspr4
  libnss3 libpciaccess0 libpcsclite1 libsensors-config libsensors5 libx11-6 libx11-data libx11-xcb1 libxau6
  libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-sync1 libxcb1 libxdamage1 libxdmcp6 libxext6
  libxfixes3 libxi6 libxrender1 libxshmfence1 libxtst6 libxxf86vm1 openjdk-11-jdk openjdk-11-jdk-headless
  openjdk-11-jre openjdk-11-jre-headless x11-common
0 upgraded, 61 newly installed, 0 to remove and 0 not upgraded.
Need to get 294 MB of archives.
After this operation, 642 MB of additional disk space will be used.
Do you want to continue? [Y/n]
And verify that everything went well by displaying the installed version.
java --version
openjdk 11.0.9.1 2020-11-04
OpenJDK Runtime Environment (build 11.0.9.1+1-post-Debian-1deb10u2)
OpenJDK 64-Bit Server VM (build 11.0.9.1+1-post-Debian-1deb10u2, mixed mode, sharing)
With Java running correctly, it’s time to install the Scala package on Debian 10.
sudo apt install scala
Check the version of Scala to make sure it was installed correctly.
scala -version
Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL
With this, we are done with the Apache Spark dependencies.
Download and install Apache Spark on Debian 10
Now we can download the Apache Spark binary.
So, navigate to the /tmp/ folder and download the tarball from there with the wget command:
cd /tmp
wget -c https://archive.apache.org/dist/spark/spark-3.0.2/spark-3.0.2-bin-hadoop2.7.tgz
Then decompress it and move it to a permanent location such as /opt/.
tar -xvzf spark-3.0.2-bin-hadoop2.7.tgz
sudo mv spark-3.0.2-bin-hadoop2.7/ /opt/spark
To use Apache Spark seamlessly from any location at the prompt, you need to add its path to the .bashrc file:
nano ~/.bashrc
At the end of the file, add the following lines:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
Save the changes and close the editor. To apply the changes run:
source ~/.bashrc
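If the PATH change took effect, Spark’s commands should now resolve from any directory. A quick way to check is to print the version, which should report 3.0.2:

spark-submit --version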
Now start Apache Spark with the following commands. First, launch the master of the cluster:
start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-angelo-org.apache.spark.deploy.master.Master-1-osradar.out
And then the slave. In this case it connects to a master on the same localhost, but you can replace localhost with the IP address or domain name of the master machine.
start-slave.sh spark://localhost:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-angelo-org.apache.spark.deploy.worker.Worker-1-osradar.out
Now you can open a web browser and access the web interface via http://your-server:8080.
So, Apache Spark is working properly…
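As a final smoke test, you can attach Spark’s interactive Scala shell to the master we just started and run a trivial job. The spark://localhost:7077 address matches the one used for the slave above, and since the expected result is simple arithmetic (1 + 2 + … + 100 = 5050), the output should look something like this:

spark-shell --master spark://localhost:7077

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0

While the shell is open, it is also listed as a running application in the web interface on port 8080.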
Conclusion
Apache Spark is easy to install on Debian 10, yet remarkably powerful. With this tool, you can process and analyze very large datasets, on a single machine or across a whole cluster.