R API and SparkR
https://spark.apache.org/docs/3.1.1/sparkr.html
Interactive Jobs
module load R/4.0/4.0.2
# launch interactive sparkR session
sparkR
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.1.1
      /_/
SparkSession Web UI available at http://hdpen01.chicagobooth.edu:4040
SparkSession available as 'spark'(master = yarn, app id = application_1629131983124_2674).
>
Batch Jobs
# ssh into the cluster
ssh <BoothID>@vulcan.chicagobooth.edu
# load the R module
module load R/4.0/4.0.2
# client mode runs the driver on the machine you submit from, so stdout prints to your terminal
spark-submit script.R
# cluster mode runs the driver on the cluster: stdout is not shown in your terminal,
# but long-running jobs continue after you log off
# --deploy-mode client is used by default unless specified otherwise
spark-submit --deploy-mode cluster script.R
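The page does not show what script.R contains, so the following is only a minimal sketch of a batch script and its submission. The script body, the app name "sparkr-batch-sketch", and the use of R's built-in faithful data set are illustrative assumptions, not part of the cluster setup. The submit step is guarded so the snippet does nothing harmful on a machine without Spark.

```shell
# Write a minimal SparkR script to script.R (contents are illustrative)
cat > script.R <<'EOF'
library(SparkR)

# Start (or attach to) a Spark session; the app name is arbitrary
sparkR.session(appName = "sparkr-batch-sketch")

# Turn R's built-in 'faithful' data set into a Spark DataFrame
df <- as.DataFrame(faithful)

# Count rows per waiting time and print the first few groups
print(head(summarize(groupBy(df, df$waiting), count = n(df$waiting))))

sparkR.session.stop()
EOF

# Submit only where spark-submit is on the PATH (i.e. on the cluster)
if command -v spark-submit >/dev/null 2>&1; then
    spark-submit script.R
fi
```

With --deploy-mode cluster the printed output goes to the YARN application logs instead of the terminal. Assuming log aggregation is enabled on the cluster, `yarn logs -applicationId <application ID>` is the usual way to retrieve it after the job finishes.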
Examples
Apache Spark ships with a few example scripts that serve as useful demos.
You can find the examples at the following path: ${SPARK_HOME}/examples/src/main/r/
# load R module
module load R/4.0/4.0.2
# run a prediction model with Alternating Least Squares (ALS)
spark-submit ${SPARK_HOME}/examples/src/main/r/ml/als.R
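To see which other example scripts are available before submitting one, a small helper along these lines may be handy. The function name list_r_examples is hypothetical; it only wraps find and sort, and it assumes SPARK_HOME is set (as it should be once Spark is available in the session) unless a directory is passed explicitly.

```shell
# List the R example scripts under a directory, one path per line.
# Defaults to the examples path given above; pass another directory to override.
list_r_examples() {
    examples_dir="${1:-${SPARK_HOME:-}/examples/src/main/r}"
    find "$examples_dir" -type f -name '*.R' | sort
}
```

Calling `list_r_examples` with no argument prints the full path of every .R file under the examples directory, and any of those paths can be passed to spark-submit in the same way as als.R.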