About Vulcan

The Vulcan Computing Cluster is a scalable system designed for big data applications.

Vulcan comprises ten physical servers configured for Hadoop:

| Type        | Hostname     | Cores / Threads | RAM    | CPU Model                                |
|-------------|--------------|-----------------|--------|------------------------------------------|
| Edge Node   | HDPEN01      | 16C / 32T       | 187 GB | Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz |
| Name Node   | HDPMN[01-03] | 16C / 32T       | 187 GB | Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz |
| Worker Node | HDPWN[01-06] | 36C / 72T       | 376 GB | Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz |

Currently, 384 virtual cores and 2 TB of memory across the six worker nodes are reserved exclusively for user Spark applications.

Apache Hadoop and HDFS

Vulcan relies on the Apache Hadoop software library, which includes the Hadoop Distributed File System (HDFS). Hadoop's distributed design makes it suitable for big data applications in which data sets exceed the memory of any single machine.
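As a minimal sketch of how HDFS data is addressed from PySpark, the snippet below reads a CSV file by its hdfs:// path. The file name and location are hypothetical placeholders, not real Vulcan paths.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a SparkSession; on a cluster like Vulcan this is the
# usual entry point for reading data stored in HDFS.
spark = SparkSession.builder.appName("hdfs-read-example").getOrCreate()

# HDFS paths look like ordinary file paths with an hdfs:// scheme.
# The path below is a hypothetical placeholder, not a real Vulcan location.
df = spark.read.csv("hdfs:///user/jdoe/trades.csv", header=True, inferSchema=True)
df.printSchema()
```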

Apache Spark

We designed Vulcan primarily with Apache Spark in mind. Researchers typically use the Python or R APIs for working with Spark, although Java and Scala APIs are available as well.
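For illustration, here is a small PySpark sketch of the kind of distributed aggregation a researcher might run. The data and column names are made up; a real workload would read its input from HDFS as shown above.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-example").getOrCreate()

# A tiny in-memory DataFrame for illustration only.
df = spark.createDataFrame(
    [("AAPL", 150.0), ("AAPL", 152.5), ("MSFT", 300.0)],
    ["ticker", "price"],
)

# Group and aggregate; Spark distributes this work across the
# cluster's executors rather than a single machine.
df.groupBy("ticker").agg(F.avg("price").alias("avg_price")).show()
```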

Apache YARN

The cluster manager, YARN (Yet Another Resource Negotiator), schedules resources (i.e., CPUs and memory) for Spark jobs. Its queueing policy shares the cluster fairly among researchers while maximizing overall utilization.
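As a sketch of how a Spark job requests resources from YARN, the configuration below asks for a specific number of executors, cores, and memory. The values shown are illustrative assumptions, not Vulcan's defaults or limits.

```python
from pyspark.sql import SparkSession

# Resource requests are granted (and queued) by YARN, not by Spark itself.
# The executor counts and sizes below are illustrative examples.
spark = (
    SparkSession.builder
    .appName("yarn-example")
    .master("yarn")                        # ask YARN to schedule the job
    .config("spark.executor.instances", "4")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
```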

Acknowledgement of Use

Please include the following acknowledgement when publishing research conducted using the computing cluster at Booth.

This research was supported in part by the Vulcan computing cluster at The University of Chicago Booth School of Business, which is funded by the Office of the Dean.