R API and SparkR

https://spark.apache.org/docs/3.1.1/sparkr.html

Interactive Jobs

module load R/4.0/4.0.2
# launch an interactive SparkR session
sparkR
Welcome to
   ____              __
  / __/__  ___ _____/ /__
 _\ \/ _ \/ _ `/ __/  '_/
/__ / .__/\_,_/_/ /_/\_\   version 3.1.1
   /_/

    SparkSession Web UI available at http://hdpen01.chicagobooth.edu:4040
SparkSession available as 'spark'(master = yarn, app id = application_1629131983124_2674).

>
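
Once the prompt appears, SparkR functions can be called directly. The following is a minimal sketch of a first interaction, assuming R's built-in faithful dataset as sample data (the dataset choice and variable names are illustrative, not part of the cluster setup). The first two calls are only needed outside the sparkR shell, which already loads the package and creates the session.

```r
# Load SparkR and attach to the session; inside the sparkR shell the session already exists
library(SparkR)
sparkR.session()

# Convert R's built-in faithful data frame into a distributed SparkDataFrame
df <- as.DataFrame(faithful)

# Inspect the column names and types
printSchema(df)

# Keep only eruptions with a waiting time under 50 minutes
short_waits <- filter(df, df$waiting < 50)

# Pull the first few matching rows back into a local R data.frame
head(short_waits)

# Count the matching rows on the cluster
n_short <- count(short_waits)
```

Calls such as filter() only describe the computation; work is sent to the cluster when a result is requested, as with head() or count().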

Batch Jobs

# ssh into the cluster
ssh <BoothID>@vulcan.chicagobooth.edu

# load the R module
module load R/4.0/4.0.2
# client mode runs the driver on the machine where you call spark-submit, so stdout prints to your terminal
spark-submit script.R

# cluster mode runs the driver on the cluster, so stdout is not shown in your terminal, but long-running jobs continue after you log off
# --deploy-mode client is used by default unless specified otherwise
spark-submit --deploy-mode cluster script.R
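
The commands above take a script.R whose contents are not shown here. As a rough sketch, such a script might look like the following, assuming the built-in faithful dataset as input and a made-up application name (both are placeholders for illustration).

```r
# script.R: hypothetical minimal SparkR batch job (contents are an assumption)
library(SparkR)

# Start a Spark session; the application name is illustrative
sparkR.session(appName = "sparkr-batch-sketch")

# Convert R's built-in faithful data frame into a SparkDataFrame
df <- as.DataFrame(faithful)

# Count how many eruptions share each waiting time; this runs on the cluster
waiting_counts <- summarize(groupBy(df, df$waiting), count = n(df$waiting))

# Sort by frequency and bring the ten most common waiting times back to the driver
top_waits <- head(arrange(waiting_counts, desc(waiting_counts$count)), 10)
print(top_waits)

# Stop the session so cluster resources are released
sparkR.session.stop()
```

In client mode the printed table appears in the terminal. In cluster mode it is written to the driver's container log instead, which on YARN can typically be retrieved with yarn logs -applicationId <app id>, assuming log aggregation is enabled on the cluster.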

Examples

Apache Spark ships with several example R scripts that serve as useful demos. They are located under ${SPARK_HOME}/examples/src/main/r/.

# load R module
module load R/4.0/4.0.2
# fit a prediction model with Alternating Least Squares (ALS)
spark-submit ${SPARK_HOME}/examples/src/main/r/ml/als.R