Archived Unscheduled Maintenance News
05.30.12
ERDC - Diamond - System Maintenance
Diamond returned to service at 1225 hours CDT, June 1, 2012.
We apologize for the extended delay.
**************************
Diamond remains offline due to hardware that did not survive the power shutdown last night. Additional SGI hardware support personnel will arrive this morning to assist in repairs. At this time there is no estimate of when it will return to service.
You will receive email notification when the repairs have been completed and Diamond has been returned to service.
We apologize for the inconvenience.
05.30.12
ORS - Archive - Hardware Maintenance
The ORS mass storage array on Wiseman did not come out of the maintenance outage cleanly.
06.13.12
ERDC - Archive - System Maintenance
The ERDC DSRC's Archive Server (Gold) stopped providing archival service to ERDC DSRC users sometime last night when one of its disk caches filled up.
Gold has been taken offline while the files are being written to tape and cleared off this disk cache.
It is expected that Gold will be returned to service this afternoon, Central Time.
We apologize for the inconvenience.
06.15.12
ERDC - Archive - System Maintenance
The ERDC DSRC's Archive Server (Gold) stopped providing archival service to ERDC DSRC users at 1000 hours Central Time, when one of its disk caches filled up. Gold has been taken offline while the files are being written to tape and cleared off this disk cache.
It is expected that Gold will be returned to service this afternoon.
We apologize for the inconvenience.
06.19.12
ERDC - All Systems - Network Maintenance
Network connections to the ERDC DSRC are degraded. Your connection may fail or become very slow to any ERDC DSRC system. We appreciate your patience as we work to resolve this issue.
06.25.12
ERDC - Archive - System Maintenance
The disk cache on Gold (Archive Server), /erdc2, is full. User access has been disabled while files are written to tape to clear the cache.
06.26.12
ERDC - Archive - System Maintenance
The disk cache on Gold (Archive Server), /erdc2, is full. User access will be disabled while files are written to tape to clear the cache.
06.28.12
AFRL - Raptor - System Maintenance
AFRL DSRC Raptor - Return to Service Delayed
The AFRL DSRC Cray XE6 Raptor cluster's return to service has been delayed. A current return-to-service time has not been determined.
We apologize for the inconvenience and will notify you when the system becomes available for use.
07.06.12
ORS - All Systems - Facility Maintenance
Power to Chugach, the Utility Server and Wiseman was lost. All running jobs were lost.
07.06.12
ERDC - Diamond - System Maintenance
Diamond is experiencing a /work file system failure.
The High Speed Network was reset and Diamond returned to full service.
07.06.12
ERDC - Diamond - Hardware Maintenance
At 1005 hours Central Time, the ERDC DSRC lost power to one of its machine rooms, which contains network gear and Diamond. Power has been restored and the ERDC DSRC network was returned to service at 1145 hours, but Diamond will remain down until it completes full hardware diagnostics.
At this time there is no estimate of when Diamond will return to service.
Diamond returned to service at 1900 hours.
07.06.12
ERDC - All Systems - Network Maintenance
ERDC DSRC has lost its network connectivity. Details to follow soon.
At 1005 hours Central Time, the ERDC DSRC lost power to one of its machine rooms, which contains network gear and Diamond. Power has been restored and the ERDC DSRC network was returned to service at 1145 hours, but Diamond will remain down until it completes full hardware diagnostics. Jade, the Utility Server, and the Archive Server (Gold) remained up and in service.
At this time there is no estimate of when Diamond will return to service.
07.10.12
AFRL - Hawk - System Maintenance
The AFRL DSRC SGI Altix 4700 Hawk cluster is currently down due to unscheduled maintenance. We apologize for the inconvenience and will notify you as soon as the system is available again.
08.06.12
AFRL - Hawk - Hawk-0
AFRL DSRC SGI Altix 4700 HAWK currently unavailable - 6 August 2012
The AFRL DSRC SGI Altix 4700 HAWK cluster is currently unavailable due to an unscheduled event. We apologize for the inconvenience and will notify you as soon as the system is available again.
08.23.12
ERDC - Diamond - System Maintenance
27 Aug 2012, 1515 hours, CT
The Lustre file system check (lfsck) on /work failed to complete successfully. A more manual attempt to repair the file system has been initiated.
At this time there is no estimate of when Diamond will return to production.
We apologize for the continued inconvenience.
***********************************************
24 Aug 2012, 1100 hours, CT
Diamond will be taken offline to complete its emergency maintenance.
After running fsck for the last 24 hours on one of Diamond's OSTs, there is no longer an expectation that it will complete successfully.
To put Diamond back into production, it has been decided to reformat that one OST. This will result in the loss of about 2 to 4 percent of the files on /work.
After the reformat of that OST, the entire file system will need an lfsck to confirm the integrity of the remaining files and their associated metadata, which will take at least another 8 hours.
A list of lost files will be generated and will be made available on Monday.
We apologize for the loss of any of your files.
08.30.12
AFRL - Raptor - System Maintenance
UPDATE: The Raptor system has been returned to service after unscheduled maintenance.
The AFRL DSRC Cray XE6 Raptor cluster is currently down due to unscheduled maintenance. We apologize for the inconvenience and will notify you as soon as the system is available again.
09.06.12
ORS - All Systems - Facility Maintenance
All ORS systems were shut down for emergency power repair. The estimated maintenance end time has been removed because the return to service is still uncertain.
09.06.12
ERDC - Diamond - Facility Maintenance
All ERDC HPC systems were taken down for emergency repairs. Systems are expected to return to service in a few hours.
We apologize for the inconvenience.
09.14.12
ERDC - Diamond - System Maintenance
Diamond suffered a system failure and is offline for emergency maintenance. At this time there is no estimate of when it will return to service.
We apologize for the inconvenience.
11.14.12
ERDC - Diamond - System Maintenance
Diamond was taken down last night for emergency repairs at approximately 2030 hours Central Time.
At this time there is no estimate of when it will return to service.
11.17.12
ERDC - Diamond - System Maintenance
The repairs to Diamond continue, with no estimate of when it will return to service.
We apologize for this extended outage.
11.26.12
AFRL - Utility Server - System Maintenance
The AFRL DSRC Utility Server will be going down for emergency maintenance today, 26 November 2012, at 1400. We apologize for the inconvenience and will notify you as soon as the system is available again.
12.03.12
ORS - Copper - Hardware Maintenance
A cabinet was overheating -- system had to be rebooted.
12.14.12
ERDC - Garnet - System Maintenance
Garnet has suffered a hardware failure and was taken down for emergency maintenance at 1130 hours, CT. At this time there is no estimate of when it will return to service.
Running jobs were killed by the system halt. Please resubmit after Garnet has returned to service.
We apologize for the inconvenience.
12.20.12
AFRL - Raptor - System Maintenance
Raptor Returned-to-Service after Unexpected Outage - 20 Dec 2012
The AFRL DSRC Cray XE6 Raptor cluster has returned to service after experiencing an unexpected system outage from 0444 through 2126 on 20 Dec 2012.
The AFRL DSRC apologizes for any inconvenience you may have experienced. If you have any questions, please contact the Consolidated Customer Assistance Center (CCAC) at 877-CCAC-039 (877-222-2039).
03.13.13
ERDC - Diamond - System Maintenance
Diamond is currently experiencing problems with its queueing system. The system administrators are working towards a resolution now. We will post an update here when more information is available.
03.25.13
ERDC - Garnet - Hardware Maintenance
PBS job scheduling has been suspended until 1000 hours this morning while a hard drive is replaced in the bootraid.
ERDC DSRC
03.25.13
ERDC - Garnet - Hardware Maintenance
Garnet was taken down at 1645 CT for emergency repairs to its boot disks. The estimated time for its return is currently not known.
Running jobs were killed by the system halt.
We apologize for the inconvenience.
03.28.13
ORS - Copper - System Maintenance
Copper job scheduling has been stopped while engineers diagnose a problem of user jobs being killed.
04.17.13
ERDC - Utility Server - System Maintenance
The Panasas file system at the ERDC DSRC has suffered some hardware failures and is in a degraded condition. This file system has been taken offline for emergency repairs that are estimated to take up to 24 hours.
Diamond, Garnet, and the ERDC DSRC Utility Server will be affected during this emergency outage.
Diamond and Garnet have dismounted $CENTER (Center Wide File System), but they will remain in production.
The ERDC DSRC Utility Server has been taken down and will remain offline during this entire emergency outage.
We apologize for the inconvenience.
05.09.13
ERDC - Diamond - System Maintenance
Diamond has suffered a failure on its /work file system and was taken down for emergency maintenance at 1315 hours, Central Time. At this time there is no estimate of when it will return to service.
Running jobs were killed by the system halt. Please resubmit after Diamond has been returned to service.
We apologize for the inconvenience.
05.10.13
ERDC - Diamond - System Maintenance
Diamond suffered a system failure and was taken down for emergency maintenance at 1325 hours, Central Time. At this time there is no estimate of when it will return to service.
Running jobs were killed by the system halt. Please resubmit after Diamond has been returned to service.
05.17.13
ERDC - Utility Server - System Maintenance
The ERDC DSRC Utility Server will be taken down for emergency maintenance at 0900 hours, Central Time.
At this time there is no estimate of when it will return to service.
We apologize for the inconvenience.
Last modified: May 21, 2013


