Follow the instructions to install Apache Spark:
https://sites.google.com/a/ku.th/big-data/home/spark
Test your installation, then look at running Spark with GPUs:
https://sites.google.com/a/ku.th/gpu/spark
Spark and GPU: Deeplearning4j
https://deeplearning4j.org/spark-gpus
BlazeGraph
https://youtu.be/LCJtpcJ-bd0
https://devblogs.nvidia.com/parallelforall/gpus-graph-predictive-analytics/
IBM
https://github.com/IBMSparkGPU/GPUEnabler
Article:
Accelerating Spark workloads using GPUs
Transparent matching of Spark portability with GPU performance.
Rajesh Bordawekar, August 2, 2016
https://www.oreilly.com/learning/accelerating-spark-workloads-using-gpus
Note: CUDA and Python
https://developer.nvidia.com/how-to-cuda-python
https://developer.nvidia.com/pycuda
https://documen.tician.de/pycuda/
See also:
https://sites.google.com/a/ku.th/parallel-computing/pycuda
Options for combining Spark with GPU code:
1. PyCUDA
2. SWIG
3. JNI
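All three options follow the same pattern: hand each Spark partition, as a batch, to a native function that launches a CUDA kernel, so the host-to-device copy cost is amortized over many elements. A minimal pure-Python sketch of that pattern, with no Spark or GPU dependency — `gpu_double` and `map_partitions` are hypothetical stand-ins for a PyCUDA/SWIG/JNI kernel call and for `RDD.mapPartitions`:

```python
# Sketch of the pattern shared by the PyCUDA, SWIG, and JNI options:
# Spark hands each partition to a native function as one batch.

def gpu_double(batch):
    # Placeholder for a real kernel launch: copy `batch` to device
    # memory, run a CUDA kernel, copy the result back. Here it is a
    # plain CPU loop so the sketch runs anywhere.
    return [2 * x for x in batch]

def map_partitions(partitions, fn):
    # Stand-in for RDD.mapPartitions: `fn` sees a whole partition at
    # once rather than one element at a time, which is what makes the
    # GPU transfer worthwhile.
    return [fn(list(part)) for part in partitions]

parts = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(map_partitions(parts, gpu_double))
```

In real Spark code the call would be `rdd.mapPartitions(gpu_double)`; the per-element `rdd.map` form would force one host-device round trip per element.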
Reading:
http://tutorials.jenkov.com/java-nio/index.html
http://www.swig.org/tutorial.html
http://spark.apache.org/examples.html
http://stackoverflow.com/questions/26046410/how-can-i-obtain-an-element-position-in-sparks-rdd
scala> val r1 = sc.parallelize(List("a", "b", "c", "d", "e", "f", "g"), 3)
scala> val r2 = r1.zipWithIndex
scala> r2.foreach(println)
(c,2)
(d,3)
(e,4)
(f,5)
(g,6)
(a,0)
(b,1)
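In the transcript above, zipWithIndex assigns a global index to every element even though the RDD is split across 3 partitions (the scrambled print order is just foreach running on the executors; the indices themselves are correct). A small pure-Python sketch of how those indices are computed — each partition's local index is shifted by the total size of the preceding partitions; `zip_with_index` is a hypothetical stand-in, not Spark code:

```python
from itertools import accumulate

def zip_with_index(partitions):
    """Mimic RDD.zipWithIndex: global index = partition offset + local index.

    `partitions` is a list of lists, standing in for an RDD's partitions.
    """
    # Offsets are the cumulative sizes of the preceding partitions
    # (Spark obtains these with a first pass that counts each partition).
    offsets = [0] + list(accumulate(len(p) for p in partitions))
    return [
        [(x, offsets[i] + j) for j, x in enumerate(part)]
        for i, part in enumerate(partitions)
    ]

# Same data and partitioning as the Scala example above.
parts = [["a", "b"], ["c", "d"], ["e", "f", "g"]]
print(zip_with_index(parts))
# a gets 0, b gets 1 in partition 0; c starts at offset 2; e at offset 4.
```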