The HPCMP understands that not all HPC requirements can be satisfied by our current offering of large HPC clusters running in a shared batch environment. The HPCMP is now offering two systems at the AFRL DSRC for customers with special requirements. Details on how customers can apply for access to these special-purpose systems are provided below.
Shared Memory Test System (LANCER)
The Shared Memory Test system, named LANCER, provides HPCMP customers with a continuing opportunity to run shared memory applications as mainstream HPCMP systems move away from shared memory models. Known shared memory codes will be available on LANCER; a short sketch of this class of application follows the hardware list below. The Shared Memory Test system consists of the following:
- 2,560 cores (53.25 TFLOPS total peak performance)
- 2.6 GHz Intel Xeon E5-2670 Sandy Bridge processors
- 8 GB of memory per compute core (20 TB total)
- Lustre file system providing 192 TB of workspace
- Red Hat Enterprise Linux
- ScaleMP vSMP Foundation hypervisor software:
  - Creates virtual images with a maximum size of 1,024 cores and 6.5 TB of usable memory.
  - Supports multiple virtual images of various sizes using all 2,560 compute cores.
  - Consolidates local disk space from each node into one scratch disk per vSMP image.
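The following is a minimal sketch of the kind of shared memory code LANCER targets: a single process whose OpenMP threads all read and write one large array in a common address space, which a vSMP image extends across multiple physical nodes. Everything in it is illustrative rather than LANCER-specific; the problem size and the build command (e.g., gcc -fopenmp) are assumptions, not site requirements.

```c
/*
 * Minimal sketch of a shared memory (OpenMP) workload of the kind LANCER
 * targets. Illustrative only: the problem size and build command
 * (e.g., gcc -fopenmp shared_mem_sketch.c) are assumptions, not LANCER
 * requirements.
 */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    /* One large array in a single address space, shared by all threads.
     * On a vSMP image this address space can span many physical nodes. */
    size_t n = 1 << 26;               /* illustrative size: 64M doubles (~512 MB) */
    double *data = malloc(n * sizeof *data);
    if (data == NULL)
        return 1;

    double sum = 0.0;

    /* Every thread updates its share of the same array and contributes
     * to one reduction variable -- classic shared memory parallelism. */
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++) {
        data[i] = 0.5 * (double)i;
        sum += data[i];
    }

    printf("max OpenMP threads: %d, checksum: %g\n",
           omp_get_max_threads(), sum);
    free(data);
    return 0;
}
```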
Full System Workbench (TALON)
The Full System Workbench, named TALON, provides HPCMP customers with an opportunity to develop projects that require full control of an HPC system. Each TALON project will work in a dedicated workspace partition and may have sole use of the system while its jobs are running. TALON projects will be allowed to load and develop codes that are not allowed in our standard, shared batch environment, such as web interfaces or database operations; a minimal sketch of such a persistent service appears after the hardware list below. TALON projects will coordinate with the AFRL DSRC support team to have their special software applications installed. The Full System Workbench consists of the following:
- 17 user-accessible nodes, each with dual 2.8 GHz quad-core Nehalem processors (8 cores/node)
- 5 login or web nodes with 48 GB of RAM and 10 Gigabit Ethernet connections to DREN
- 12 compute nodes with 24 GB of RAM
- 33 TB high-bandwidth (1.2 GB/s) Panasas parallel file system
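To make the contrast with the standard batch environment concrete, the sketch below shows a minimal long-running network service of the kind a TALON project might host, for example behind a web interface. It is a hypothetical illustration only: the port number and the fixed status reply are placeholders, and any real service would be installed in coordination with the AFRL DSRC support team as described above.

```c
/*
 * Minimal sketch of a long-running service -- the kind of code permitted on
 * TALON but not in the standard shared batch environment. Hypothetical
 * example: the port (8080) and the fixed reply are placeholders only.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);      /* hypothetical port */

    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(listener, 16) != 0) {
        perror("bind/listen");
        return 1;
    }

    const char *reply =
        "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n"
        "project status: running\r\n";

    /* Persistent loop: answer each connection with a fixed status page. */
    for (;;) {
        int conn = accept(listener, NULL, NULL);
        if (conn < 0)
            continue;
        write(conn, reply, strlen(reply));
        close(conn);
    }
}
```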
These systems are installed on the AFRL DSRC production network and are open to all users in the HPCMP domain. Any non-HPCMP user interested in accessing the systems will need to complete the HPCMP user account process with their organizational S/AAA.
Any user may submit a project proposal to CCAC at any time, and our review team is committed to providing a response within two weeks. A copy of the call for proposals is available here. Users with routine requests to run known shared memory codes may limit their proposals to identifying the codes needed and a brief description of memory requirements and job types.
With a limited number of users sharing a small number of nodes on a system, good collaboration is critical. All system users will be added to a system mailing list monitored by our support staff. Users will be required to post notice of any planned usage above certain thresholds, alerting the support staff and other users before system load is affected.
Current plans for both systems call for configurations as similar to allocated HPC systems as possible, including the use of PBS. TALON will run a single queue with no limits; LANCER will run a single queue for all jobs. Jobs on LANCER are limited by the capabilities of the ScaleMP vSMP Foundation hypervisor software: the largest job equals 100% of the largest virtual image, which provides 1,024 cores and 6.5 TB of memory. LANCER will also offer the Advance Reservation Service (ARS) to provide scheduled access to software licenses.
Last modified: April 01, 2013