Spark Programming Guide in Traditional Chinese
https://www.gitbook.com/book/taiwansparkusergroup/spark-programming-guide-zh-tw/details
An example of installing Spark 1.1.0 on Ubuntu 14.04
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
An example of installing Spark 1.2.1
http://kurthung1224.pixnet.net/blog/post/270485866
Download from the official website
http://spark.apache.org/downloads.html
Spark 2.0.2
Hadoop 2.7 or later
spark-2.0.2-bin-hadoop2.7.tgz
Extract the archive
tar zxvf spark-2.0.2-bin-hadoop2.7.tgz
Add Spark to the PATH (this assumes the extracted directory has been moved to /opt/spark)
vim ~/.bashrc
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bashrc
A problem when running the Spark shell
./bin/spark-shell
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.spark.launcher.Main. Program will exit.
The message "Unsupported major.minor version 51.0" means the class files were compiled for Java 7 (class file version 51), so the shell must be run with Java/JDK 1.7 or later. Switch the default JDK:
vim ~/.tc_setjdk1.7
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export JRE_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
export PATH=$JAVA_HOME/bin:$PATH
sudo update-alternatives --set javac ${JAVA_HOME}/bin/javac
sudo update-alternatives --set java ${JAVA_HOME}/bin/java
sudo update-alternatives --set javah ${JAVA_HOME}/bin/javah
sudo update-alternatives --set javadoc ${JAVA_HOME}/bin/javadoc
sudo update-alternatives --set javap ${JAVA_HOME}/bin/javap
sudo update-alternatives --set jar ${JAVA_HOME}/bin/jar
sudo update-alternatives --set jarsigner ${JAVA_HOME}/bin/jarsigner
sudo update-alternatives --set keytool ${JAVA_HOME}/bin/keytool
Load the settings
source ~/.tc_setjdk1.7
Run the Spark shell
./bin/spark-shell
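Once the shell starts without the error, one quick sanity check (my own suggestion, not part of the original walkthrough) is to read the JVM version from inside the REPL; it should now report 1.7 or later:
scala> System.getProperty("java.version") // should print a 1.7.x or later version string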
Now you can start with the first Spark shell tutorial:
https://taiwansparkusergroup.gitbooks.io/spark-programming-guide-zh-tw/content/quick-start/using-spark-shell.html
scala> val textFile = sc.textFile("README.md")
textFile: org.apache.spark.rdd.RDD[String] = README.md MapPartitionsRDD[1] at textFile at <console>:24
scala> textFile.count() // number of lines in this RDD
res0: Long = 99
scala> textFile.first() // first line of the RDD
res1: String = # Apache Spark
An RDD (Resilient Distributed Dataset) is Spark's distributed collection abstraction.
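As a minimal sketch of the concept (not part of the linked tutorial): transformations such as filter are lazy and only describe a new RDD, while actions such as count trigger the actual computation.
scala> val nums = sc.parallelize(1 to 10) // build an RDD from a local collection
scala> val evens = nums.filter(n => n % 2 == 0) // transformation: lazy, nothing runs yet
scala> evens.count() // action: runs the job and returns 5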
scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[4] at filter at <console>:26
scala> textFile.filter(line => line.contains("Spark")).count() // How many lines contain "Spark"?
res4: Long = 19
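The quick-start guide also shows that a reused RDD can be cached in memory; a brief sketch:
scala> linesWithSpark.cache() // mark the RDD to be kept in memory once computed
scala> linesWithSpark.count() // first count computes and caches the data
scala> linesWithSpark.count() // second count is served from the cache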
scala> textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b) // word count of the longest line
res5: Int = 22
The shell supports tab completion; pressing Tab after a partial name lists the candidates:
scala> import jav
java javaewah javassist javax javolution
scala> import java.lang
final package lang
scala> import java.lang.Math
object Math
scala> import java.lang.Math
import java.lang.Math
scala> textFile.map(line => line.split(" ").size).reduce((a, b) => Math.max(a, b)) // same computation, using Math.max
res6: Int = 22
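The quick start ends with the classic word count, combining flatMap, map, and reduceByKey:
scala> val wordCounts = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
scala> wordCounts.collect() // bring the (word, count) pairs back to the driver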
Study notes for a Big Data course
https://bigdataanalytics2014.com/about/