The Gemini Computing Cluster at Saint Louis University is a high-performance computing resource that leverages modern technologies such as GPU computing.
The Gemini Computing Cluster consists of 18 compute nodes, plus two nodes that serve web and database applications. Each node contains two 8-core Xeon processors with 64 GB RAM and 4 TB of local scratch space. In the default configuration, four NVIDIA M2075 GPUs are attached to each node over PCIe through a 16-slot C410x expansion chassis. High-bandwidth, low-latency 40 Gbps QDR InfiniBand interfaces are used for internode communication. There is also a dedicated gigabit Ethernet network for traditional node communication over TCP.
Two MD1200 storage units in a RAID 5 configuration provide 40 TB of shared work space accessible to all of the nodes in the cluster.
[Figure: Two-step electrostatic potential visualization of thrombin.]
Quick Facts About Gemini
- 135 TFLOPs peak computing power
- 344 CPU cores
- 64 GB RAM/node (1.3 TB aggregate)
- 40 TB shared work space + 4 TB scratch/node
- 40 Gbps QDR InfiniBand
- More than 5.9 million in-silico screening compounds
- More than 2.8 million CPU and 750,000 GPU hours processing time since first coming online
Finer Details About Gemini
Because the GPUs are not physically inside the compute nodes, they can be reconfigured on the fly to provide up to eight GPUs to a single node. All of the GPUs sit on the same PCIe switch within the C410x housing unit, which means that the GPUs can access memory across devices via Unified Virtual Addressing without passing through the host bus controller (also called peer-to-peer communication).
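As a rough sketch of what peer-to-peer access looks like from application code (illustrative CUDA host code, not part of Gemini's actual software stack; the device indices and buffer size are arbitrary):

```c
#include <cuda_runtime.h>
#include <stdio.h>

/* Sketch: copy a buffer directly from GPU 0 to GPU 1 across the PCIe switch,
 * without staging through host memory. Device indices are arbitrary. */
int main(void) {
    int canAccess = 0;
    /* Ask the driver whether device 0 can map device 1's memory. */
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) {
        printf("Peer access between GPU 0 and GPU 1 is not available.\n");
        return 1;
    }

    size_t bytes = 1 << 20;            /* 1 MiB example buffer */
    void *src = NULL, *dst = NULL;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  /* let GPU 0 address GPU 1's memory */
    cudaMalloc(&src, bytes);

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    /* Device-to-device copy; with peer access enabled it goes GPU-to-GPU. */
    cudaMemcpyPeer(dst, 1, src, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

With peer access enabled, the copy moves data directly between the two GPUs over the PCIe switch rather than being staged through host memory.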
The software stack that supports the InfiniBand interfaces can be adjusted to allow direct communication between GPUs on different nodes via GPUDirect Remote Direct Memory Access (GRDMA). In this configuration, all 72 GPUs in the cluster can access each other's data with much lower latency than would otherwise be possible. (The average latency on Gemini for GRDMA transfers over InfiniBand is ~7 µs for 64 KB messages.)
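The MPI implementation used on Gemini isn't named here; assuming a CUDA-aware, GPUDirect-enabled MPI build (for example MVAPICH2-GDR, or Open MPI built with GPUDirect support), a device buffer can be handed directly to MPI calls, roughly as follows:

```c
#include <mpi.h>
#include <cuda_runtime.h>

/* Sketch: exchange a GPU-resident buffer between two ranks. With a CUDA-aware,
 * GPUDirect-enabled MPI, the pointer passed to MPI_Send/MPI_Recv can live in
 * device memory and the transfer can move GPU-to-GPU over InfiniBand.
 * Run with at least two ranks (e.g. mpirun -np 2). */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 16;             /* arbitrary message size (floats) */
    float *d_buf = NULL;
    cudaMalloc(&d_buf, n * sizeof(float));

    if (rank == 0) {
        MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```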
For applications that are easily distributed, or for running a large number of independent tasks, the entire cluster can function as a traditional batch-processing cluster. At its peak computing capacity, Gemini provides over 72 TFLOPs of computing power while drawing about 18,000 W.
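For fully independent tasks, no GPU or fine-grained communication is needed at all; a simple static division of the task list across processes is often enough. The sketch below is a generic illustration of that pattern (the task count and the process_task stub are hypothetical placeholders, not part of Gemini's software):

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical per-task work function; stands in for e.g. docking one compound. */
static void process_task(int task_id) {
    printf("processing task %d\n", task_id);
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int total_tasks = 10000;     /* arbitrary example count */

    /* Each rank takes every size-th task: rank, rank+size, rank+2*size, ... */
    for (int t = rank; t < total_tasks; t += size) {
        process_task(t);
    }

    MPI_Finalize();
    return 0;
}
```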
The cluster is managed using Bright Cluster Manager, which allows nodes to be provisioned quickly and the operating system (CentOS 7), drivers, and other components of the software stack to be reconfigured depending on the needs of a particular application.
Frequently Asked Questions
Is a cluster the right tool for my data analysis?
Clusters are very specialized resources for performing certain types of computationally demanding work. More often than not, a cluster is the wrong solution for data analysis, and not all types of data analysis are even amenable to running on a cluster.
Tasks that require a lot of manual intervention, that take only modest amounts of time on a traditional computer (a few hours, or perhaps a day), or that can't be run in parallel won't benefit from running on a cluster.
However, if a task takes a few hours and you have hundreds or thousands of them to run, a cluster may be the best option.
What kinds of computations is Gemini good at?
Gemini excels at certain kinds of computations, performs very well on others, and performs poorly on the rest.
Numerically intensive calculations, calculations that scale to a large number of processors or those that require very tightly coupled inter-process communication are ideal. Examples include molecular dynamics simulations, computational fluid dynamics, quantum mechanics/computational chemistry calculations and image processing.
Gemini can also handle batch operations on tasks that can be easily split up and processed in chunks. Certain types of data mining are possible, as is in-silico small-molecule compound docking. If your work is I/O intensive, with a lot of reading and writing to disk, Gemini is not the proper resource.
Who can use Gemini?
Gemini is available for use by anyone at SLU with a demonstrable need for its level of computing capacity. Ideal projects require large, continuous amounts of computing power for defined periods of time. Gemini can't accommodate projects that only need a couple of CPUs, or tasks that can be refactored using other means (e.g., converting scripts into executable applications).
Can part of the cluster be set aside for an individual project?
In general, no. Projects running on Gemini typically need all of the resources for defined periods of time (get on, calculate, get off). In some cases where nodes need to be reconfigured for scaling or performance purposes, a subset of the nodes may be temporarily re-provisioned.
How much does it cost to use Gemini?
Gemini is considered a collaborative tool, and there is no charge for using it. If you require a fair amount of assistance working out an analysis, need a program written, or need some other kind of specialized help, some form of cost recovery for that time may be required. More often than not, this comes in the form of co-authorship or a small amount of salary recovery in grant applications.
What does an application need in order to run on Gemini?
To run on Gemini, an application or script needs to be able to run on the Linux operating system and be capable of running through the cluster's job scheduler. It must not require access to external resources or significant database access. It also must run primarily without human intervention (that is, be non-interactive) and must not require commercial software licenses that aren't currently available or incur costs that can't be covered as part of a given project.