Before you begin with Scala and Spark, install Hadoop
The prerequisites are Java and Hadoop
Install Scala
1. sudo wget www.scala-lang.org/files/archive/scala-2.11.12.deb
2. sudo dpkg -i scala-2.11.12.deb
Type scala to launch the Scala REPL
3. scala
Test Scala in the REPL by printing a greeting message
4. println("Welcome to Scala")
To quit
5. :q
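Before running steps 1-2, it can help to check whether Scala is already installed. A minimal sketch (the need_install helper name is my own, not a standard tool):

```shell
#!/bin/sh
# Print "yes" if the named command is missing (installation needed),
# "no" if it is already on the PATH.
need_install() {
  if command -v "$1" >/dev/null 2>&1; then
    echo no
  else
    echo yes
  fi
}

need_install scala    # skip steps 1-2 if this prints "no"
```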
Install Spark
First, install the git package
6. sudo apt-get install git
Download the latest version of Spark, extract it, and move it to /usr/local/spark
7. wget http://mirrors.estointernet.in/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz
8. tar xvzf spark-2.4.3-bin-hadoop2.7.tgz
9. sudo mkdir -p /usr/local/spark
10. sudo chown -R hduser:hadoop /usr/local/spark
11. cd spark-2.4.3-bin-hadoop2.7
12. sudo mv * /usr/local/spark
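Apache mirrors occasionally serve truncated downloads, so it is worth verifying the tarball's checksum between steps 7 and 8. A minimal sketch; the verify_archive helper name is my own, and the real checksum must be taken from the official Apache download page:

```shell
#!/bin/sh
# Compare a file's SHA-256 digest against an expected value.
# Prints "ok" on a match, "mismatch" otherwise.
verify_archive() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "$actual" = "$2" ]; then
    echo ok
  else
    echo mismatch
  fi
}

# Substitute the checksum published for spark-2.4.3-bin-hadoop2.7.tgz:
# verify_archive spark-2.4.3-bin-hadoop2.7.tgz "<checksum from apache.org>"
```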
Configure the Spark path in the ~/.bashrc file by adding the following lines to it
13. vim ~/.bashrc
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
14. source ~/.bashrc
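Since ~/.bashrc is sourced by every new shell, the PATH line above prepends a duplicate entry each time. One way to keep PATH clean is a guard like this (the path_prepend helper name is my own):

```shell
#!/bin/sh
# Prepend a directory to PATH only if it is not already present,
# so repeated `source ~/.bashrc` calls do not grow PATH.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

# In ~/.bashrc, instead of a bare PATH assignment:
# path_prepend "$SPARK_HOME/bin"
# export PATH
```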
Go to the bin directory and launch the Spark shell
15. cd /usr/local/spark/bin
16. ./spark-shell
Test Spark in the REPL by printing a greeting message
17. println("Welcome to Spark")
The URL of the Spark web console (port 4040 by default) is printed in the shell's startup output
To quit
18. :q
To start and stop the master and worker daemons, run the scripts below
19. cd /usr/local/spark/sbin
20. ./start-all.sh
The master web UI will be available on port 8080, for example:
http://192.168.77.149:8080/
21. ./stop-all.sh
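Once start-all.sh has run, a quick way to confirm the master web UI is serving is to probe it with curl. A sketch; the ui_status helper name and the 2-second timeout are my own choices:

```shell
#!/bin/sh
# Print "up" if the URL answers an HTTP request within 2 seconds,
# "down" otherwise.
ui_status() {
  if curl -fs -o /dev/null --max-time 2 "$1"; then
    echo up
  else
    echo down
  fi
}

# ui_status http://192.168.77.149:8080/   # expect "up" while the master runs
```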
For examples, refer to
https://mapr.com/blog/how-get-started-using-apache-spark-graphx-scala/