About Vulcan

The Vulcan Computing Cluster is a scalable system designed for big data applications.

Vulcan comprises ten physical servers configured for Hadoop:

| Type        | Hostname     | Cores / Threads | RAM    | CPU Model                                |
|-------------|--------------|-----------------|--------|------------------------------------------|
| Edge Node   | HDPEN01      | 16C / 32T       | 187 GB | Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz |
| Name Node   | HDPMN[01-03] | 16C / 32T       | 187 GB | Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz |
| Worker Node | HDPWN[01-06] | 36C / 72T       | 376 GB | Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz |

Currently, 384 virtual cores and 2 TB of memory across the six worker nodes are reserved exclusively for user Spark applications.

Apache Hadoop and HDFS

Vulcan relies on the Apache Hadoop software library, which includes the Hadoop Distributed File System (HDFS). Hadoop's distributed design makes it suitable for big data applications in which data sets exceed the memory of any single machine.
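As a minimal sketch of how HDFS data is addressed from PySpark, the snippet below reads a CSV file by its hdfs:// path. The file name and location are hypothetical placeholders, not real Vulcan paths.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a SparkSession; on a cluster like Vulcan this is the
# usual entry point for reading data stored in HDFS.
spark = SparkSession.builder.appName("hdfs-read-example").getOrCreate()

# HDFS paths look like ordinary file paths with an hdfs:// scheme.
# The path below is a hypothetical placeholder, not a real Vulcan location.
df = spark.read.csv("hdfs:///user/jdoe/trades.csv", header=True, inferSchema=True)
df.printSchema()
```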

Apache Spark

We designed Vulcan primarily with Apache Spark in mind. Researchers typically use the Python or R APIs for working with Spark, although Java and Scala APIs are available as well.
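For illustration, here is a small PySpark sketch of the kind of distributed aggregation a researcher might run. The data and column names are made up; a real workload would read its input from HDFS as shown above.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-example").getOrCreate()

# A tiny in-memory DataFrame for illustration only.
df = spark.createDataFrame(
    [("AAPL", 150.0), ("AAPL", 152.5), ("MSFT", 300.0)],
    ["ticker", "price"],
)

# Group and aggregate; Spark distributes this work across the
# cluster's executors rather than a single machine.
df.groupBy("ticker").agg(F.avg("price").alias("avg_price")).show()
```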

Apache YARN

The cluster manager, YARN (Yet Another Resource Negotiator), schedules resources (i.e., CPUs and memory) for Spark jobs. Its queueing policy shares the cluster fairly among researchers while maximizing overall utilization.
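As a sketch of how a Spark job requests resources from YARN, the configuration below asks for a specific number of executors, cores, and memory. The values shown are illustrative assumptions, not Vulcan's defaults or limits.

```python
from pyspark.sql import SparkSession

# Resource requests are granted (and queued) by YARN, not by Spark itself.
# The executor counts and sizes below are illustrative examples.
spark = (
    SparkSession.builder
    .appName("yarn-example")
    .master("yarn")                        # ask YARN to schedule the job
    .config("spark.executor.instances", "4")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
```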

Acknowledgement of Use

Please include the following acknowledgement when publishing research conducted using the computing cluster at Booth.

This research was supported in part by the Vulcan computing cluster at The University of Chicago Booth School of Business, which is funded by the Office of the Dean.