About Vulcan
The Vulcan Computing Cluster is a scalable cluster designed for big data applications.
Vulcan comprises ten physical servers configured for Hadoop, summarized in the table below.
| Type | Hostname | Cores/Threads | RAM | CPU Model |
|---|---|---|---|---|
| Edge Node | HDPEN01 | 16C/32TH | 187 GB | Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz |
| Name Node | HDPMN[01-03] | 16C/32TH | 187 GB | Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz |
| Worker Node | HDPWN[01-06] | 36C/72TH | 376 GB | Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz |
Currently, 384 virtual cores and 2 TB of memory across the six worker nodes are reserved exclusively for user Spark applications.
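As an illustration, a job requests a slice of this pool when it is submitted; the executor counts, memory sizes, and script name below are hypothetical placeholders, not recommended settings.

```bash
# Hypothetical submission requesting 32 of the 384 reserved vCores
# (8 executors x 4 cores) and 64 GB of executor memory in total.
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-cores 4 \
  --executor-memory 8G \
  my_analysis.py
```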
Apache Hadoop and HDFS
Vulcan relies on the Apache Hadoop software library, which includes the Hadoop Distributed File System (HDFS). The distributed nature of Hadoop makes it suitable for big data applications whose data sets exceed the memory of a single machine.
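In day-to-day use, files on HDFS are managed through the `hdfs dfs` command-line interface. A few common operations are sketched below; the paths are placeholders.

```bash
# List the contents of an HDFS directory (path is a placeholder)
hdfs dfs -ls /user/your_username

# Copy a local file into HDFS
hdfs dfs -put data.csv /user/your_username/data.csv

# Show space used by a directory, in human-readable units
hdfs dfs -du -h /user/your_username
```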
Apache Spark
We designed Vulcan primarily with Apache Spark in mind. Researchers typically use the Python or R APIs for working with Spark, although Java and Scala APIs are available as well.
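As a minimal sketch of the Python API, the snippet below starts a Spark session, reads a CSV file from HDFS, and runs a simple aggregation; the file path and column name are hypothetical.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session; on Vulcan, sessions run on YARN.
spark = SparkSession.builder.appName("example").getOrCreate()

# Read a CSV file from HDFS into a distributed DataFrame
# (the path is a placeholder).
df = spark.read.csv("hdfs:///user/your_username/data.csv",
                    header=True, inferSchema=True)

# A simple aggregation, executed in parallel across the worker nodes
# ("category" is a hypothetical column name).
df.groupBy("category").count().show()

spark.stop()
```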
Apache YARN
The cluster manager, YARN, schedules cluster resources (i.e., CPU cores and memory) for Spark jobs. Its queueing policy shares the cluster fairly among researchers while maximizing overall utilization.
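YARN's view of the cluster can be inspected from the command line, for example:

```bash
# List applications currently accepted or running on the cluster
yarn application -list

# Show per-application resource usage interactively
yarn top
```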
Acknowledgement of Use
Please include the following acknowledgement when publishing research that used the computing cluster at Booth.

> This research was supported in part by the Vulcan computing cluster at The University of Chicago Booth School of Business, which is funded by the Office of the Dean.