Install Hadoop before you begin with Scala and Spark.

The prerequisites are Java and Hadoop.

Install Scala

1. sudo wget www.scala-lang.org/files/archive/scala-2.11.12.deb

2. sudo dpkg -i scala-2.11.12.deb

Type scala to enter the Scala REPL

3. scala

Test Scala by printing a greeting message in the REPL

4. println("Welcome to Scala")

To quit

5. :q
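
As a quick sanity check beyond the greeting, a short REPL session like the following can confirm the installation (the value and function names here are illustrative):

// Define a value and a function in the REPL, then call them.
val greeting: String = "Welcome to Scala"
def square(x: Int): Int = x * x
println(s"$greeting, 4 squared is ${square(4)}")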

Install Spark

First, install the git package

6. sudo apt-get install git

Download the latest version of Spark, extract it, and move it to /usr/local/spark

7. wget http://mirrors.estointernet.in/apache/spark/spark-2.4.3/spark-2.4.3-bin-hadoop2.7.tgz

8. tar xvzf spark-2.4.3-bin-hadoop2.7.tgz

9. sudo mkdir -p /usr/local/spark

10. sudo chown -R hduser:hadoop /usr/local/spark

11. cd spark-2.4.3-bin-hadoop2.7

12. sudo mv * /usr/local/spark

Configure the Spark path in the ~/.bashrc file by adding the following lines to it

13. vim ~/.bashrc

export SPARK_HOME=/usr/local/spark

export PATH=$SPARK_HOME/bin:$PATH

14. source ~/.bashrc

Go to the bin directory and launch the Spark shell

15. cd /usr/local/spark/bin

16. ./spark-shell

Test Spark by printing a greeting message in the REPL

17. println("Welcome to Spark")
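
Beyond printing a message, the shell's pre-created SparkContext (sc) can run a small distributed computation; a minimal sketch:

// spark-shell pre-creates sc (SparkContext) and spark (SparkSession).
// Distribute a local range across the executors and aggregate it.
val nums = sc.parallelize(1 to 100)
println(s"count = ${nums.count()}, sum = ${nums.reduce(_ + _)}")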

The Spark shell also prints the URL of its web console when it starts (the Spark UI, available on port 4040 by default).

To quit

18. :q

To start and stop both the master and slave (worker) processes, execute the commands below

19. cd /usr/local/spark/sbin

20. ./start-all.sh

The master's web UI will be available on port 8080, for example:

http://192.168.77.149:8080/

21. ./stop-all.sh
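
Once start-all.sh has run, applications can attach to the standalone master. The sketch below shows a minimal standalone application entry point; the master host and port are assumptions, so use the spark:// URL shown at the top of the port-8080 web UI:

import org.apache.spark.sql.SparkSession

object StandaloneSmokeTest {
  def main(args: Array[String]): Unit = {
    // The master URL below is an assumption; replace it with the
    // spark:// URL displayed on the master web UI.
    val spark = SparkSession.builder()
      .appName("StandaloneSmokeTest")
      .master("spark://192.168.77.149:7077")
      .getOrCreate()

    // Tiny distributed job to confirm the cluster accepts work.
    println(spark.sparkContext.parallelize(1 to 10).sum())  // expect 55.0
    spark.stop()
  }
}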

For examples, refer to

https://mapr.com/blog/how-get-started-using-apache-spark-graphx-scala/
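
As a small taste of what that article covers, here is a minimal GraphX example that can be pasted into spark-shell (the vertex and edge data are made up for illustration):

// Build a tiny property graph with GraphX and inspect its size.
import org.apache.spark.graphx.{Edge, Graph}

val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, "follows")))
val graph = Graph(vertices, edges)
println(s"vertices = ${graph.numVertices}, edges = ${graph.numEdges}")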