NASA - National Aeronautics and Space Administration


'All we asked was a miracle a day'
- Walt Brooks, NAS Division Chief


Below are highlights of what it took to construct NASA's Columbia supercomputer in less than 120 days, start to finish.


COLUMBIA'S CONSTRUCTION

Through extraordinary dedication and uncompromising commitment, the Columbia project team, made up of civil servants, contractors, industry partners, and vendors, achieved what many in the supercomputing community considered impossible - conceiving, planning, and constructing the world's largest Linux-based, shared-memory system in just over four months.

It took a diverse group of managers; systems, network, and facilities engineers; security experts; and scientific applications and visualization specialists, working literally around the clock, to ensure completion and benchmarking of the entire 10,240-processor system on schedule. Team engineers designed a high-speed internal network that efficiently links processors, upgraded an external network for system users, developed a robust computer security architecture, and modified several critical scientific applications to take advantage of Columbia's capabilities. In addition, the project required substantial modification to facility power and cooling systems - all accomplished on a rigorous schedule.

06.18.04 - 10,240-Processor Supercomputer Approved
Funding for the 10,240-processor SGI Altix supercomputer was approved by both houses of Congress. A joint collaboration with SGI and Intel, the new system will provide a 10-fold increase in capacity over the current SGI systems at Ames Research Center - giving NASA the capability and capacity to accelerate both science and mission-critical applications simultaneously. The system, whose construction is named "Project Columbia," will be installed at the NASA Advanced Supercomputing (NAS) Facility in 512-processor increments, with the first installment of 1,024 processors to be delivered on June 28th.

06.28.04 - First Two Columbia Systems Installed, Running
The first two SGI Altix 512-processor systems for Project Columbia were delivered. In two days, the project team (made up of NASA Advanced Supercomputing (NAS) Division staff and SGI technicians) installed both systems, completed network cabling, and ran diagnostic tests - work that normally takes 7-10 days. In addition, the design for high-level networking was configured. By week's end, one system was already running operational codes for the shuttle Return to Flight work, and the second was ready for systems development work.

07.27.04 - Preliminary Results from Project Columbia Show Promise
NASA's Columbia system has already produced results that foretell breakthrough scientific achievements even before the system is completed.

08.02.04 - Third 512-Processor Node for Project Columbia Delivered
Another of the twenty 512-processor nodes for NASA's Columbia system arrived, along with six new power distribution units. Two minor hardware component failures found during installation were quickly resolved, and the system was turned over to the diagnostics team on August 5th. At the same time, the original 512-processor system, installed in 2003 and known as Kalpana, was powered down, and its data storage (RAID) devices were moved and connected to another 512-processor node of Columbia. Within four hours, users of the system were back in production on the new node. With this move completed, the original Kalpana hardware was repositioned to the main computer room floor and powered up, and diagnostics and testing of the operating system began.

08.12.04 - Major Facilities Preparation Completed for Project Columbia
The Columbia project team completed significant facilities preparations for a new 512-processor system and 16 power distribution units (PDUs) scheduled to arrive on August 16th. Work included installing major electrical power sources to support the new PDUs, installing new network cabinets, and laying extensive copper and fiber optic network cables. The Columbia team also made excellent progress installing the large chilled-water pipe system required for the next-generation SGI Altix computers (3700-BX2 model), which will be part of the Columbia supercomputer. In addition to the facilities work, Columbia project management conducted a Critical Design Review to ensure that all aspects of the project were synchronized and that resource and scheduling requirements were met.

09.02.04 - Seventh New System Installed for Project Columbia
The seventh system for Project Columbia arrived and was installed, tested, and made available to several users within the week. The project team configured one 512-processor system dedicated to each of the four NASA missions, and the network team accomplished the switch from an old fiber connection to a new one - without affecting service to users. Additionally, most major facilities work was completed, including installation of power distribution units, electrical upgrades, and water pipe insulation.

09.08.04 - Project Columbia Installation Ahead of Schedule
The eighth system for Project Columbia was delivered and installed, and began running diagnostics, the same day. SGI revised its delivery schedule to indicate that several machines would arrive earlier than planned; this will help with installation logistics and improve safety on the computer room floor by requiring fewer people on the floor at any time. Completion of the underfloor water loop that will cool the Bx2 systems, arriving this week, marks the end of the major modifications to the NASA Advanced Supercomputing (NAS) Facility. In addition, the visualization team developed and deployed a new method for more realistic viewing of Hurricane Frances simulations. Using ideas from the Line Integral Convolution technique (also developed in the NAS Division) and image-based flow visualization, the team deployed the new method on the Finite Volume Global Circulation Model (fvGCM). This computationally intensive technique was implemented on the programmable hardware of Columbia's graphics processing unit.

09.20.04 - Project Columbia Reaches Significant Milestone
Installation of the 10,240-processor Columbia supercomputer is now 55 percent complete, with the arrival of three SGI Altix systems in the past week, including two upgraded Altix 3700Bx2 systems. All systems were installed and tested within 24 hours of delivery. The team worked to ensure related equipment and facilities work was in place for this monumental construction project. For example, the Voltaire InfiniBand switch and a custom switch cabinet, which will connect hundreds of network cables to the 20 systems, were successfully installed, and Phase 2 network installation was completed. In addition, progress continued on obtaining improved resolution on several key application codes, including Estimating the Climate and Circulation of the Ocean (ECCO) and the Finite Volume Global Circulation Model (fvGCM). The Columbia team worked closely with scientists from other centers to enhance the capability of their codes.

09.27.04 - Project Columbia Nears Completion
Eighteen of the 20 SGI Altix systems have now been integrated at the NASA Advanced Supercomputing Facility, and the installation now stands at the 90 percent completion mark. One system arrived on September 23, two on the 24th, and four systems arrived on September 27th, with a total of 12 systems being delivered and installed in September. Theoretical peak capacity is at 55 teraflops. Once again, the project team completed a staggering amount of work in record time, in order to install, test, and set up these systems for applications use. Six of these systems are being dedicated to enhance key mission applications. Significant progress to improve the efficiency of Linpack benchmarks was made over the course of the week. By week's end, systems engineers had achieved a Linpack rating of 15.1 teraflops over a configuration of eight systems, with 60 percent efficiency. With continued work, the team expects to reach their target of at least 80 percent efficiency on 20 systems. To maximize the Linpack rating, three technical approaches are being tested simultaneously.
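
The figures in this entry can be cross-checked with a little arithmetic. The following is a minimal sketch in Python, not part of the original report. It assumes 512 processors per system (as stated elsewhere on this page) and takes Linpack efficiency to mean the measured rating divided by theoretical peak. The function names are illustrative only.

```python
PROCESSORS_PER_SYSTEM = 512  # from the text: each Altix system has 512 processors


def completion_percent(installed, total=20):
    """Share of the planned systems that are installed, in percent."""
    return 100.0 * installed / total


def implied_peak_per_processor_gflops(peak_tflops, systems):
    """Per-processor peak implied by a quoted aggregate peak (assumed uniform)."""
    processors = systems * PROCESSORS_PER_SYSTEM
    return peak_tflops * 1000.0 / processors


def implied_peak_tflops(rating_tflops, efficiency):
    """Theoretical peak implied by a Linpack rating and its efficiency."""
    return rating_tflops / efficiency


print(completion_percent(18))                                # 90.0
print(round(implied_peak_per_processor_gflops(55, 18), 2))   # 5.97
print(round(implied_peak_tflops(15.1, 0.60), 1))             # 25.2
```

Under these assumptions, the quoted 55-teraflop peak works out to roughly 6 gigaflops per processor, and the eight-system run at 60 percent efficiency implies a peak of about 25 teraflops for that configuration. Both quoted inputs are rounded, so the results are approximate.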

10.12.04 - All Systems Installed for Project Columbia
The final two 512-processor nodes of the Columbia system arrived and were installed. The computer floor at the NASA Advanced Supercomputing (NAS) Facility now contains all 20 systems - twelve Altix 3700 and eight Altix 3700-BX2 systems - with a total of 10,240 Intel Itanium processors. The project team continued to make considerable progress in improving the efficiency of Linpack benchmarks. With significant progress made on one of the three methods, effort on the least promising approach has been discontinued. By the end of the week, a Linpack rating of 19.6 teraflops had been achieved on eight nodes. While the benchmarks were being tested, work continued on targeted climate and ocean models. Collaborators from other sites arrived to work with the applications and visualization teams, with the goal of producing new science, along with higher performance and resolution, from existing codes. Several applications have already achieved results that will be ready for demonstration in early November.

10.19.04 - Linpack Results Surpass the Earth Simulator
Work on improving the efficiency of the Linpack runs continued around the clock for nearly a week. NASA and SGI engineers continued to remove performance obstacles, and after a promising run on eight nodes, the team set up for an attempt on a 16-node configuration. Before the attempt, a run of the diagnostics suite located hardware flaws in the system that could have jeopardized the nine-hour run. Once these flaws were corrected, the 16-node run commenced. It ran flawlessly, performing at 87 percent efficiency for a Linpack speed of 42.7 teraflops, exceeding the 35.9-teraflop speed posted for the current top system, the Earth Simulator in Japan, by some 19 percent.
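
The 19 percent margin and the 87 percent efficiency figure can be checked from the numbers quoted above. The sketch below is illustrative Python, not part of the original report. It assumes efficiency is the Linpack rating divided by theoretical peak and that each of the 16 nodes has 512 processors. The helper names are hypothetical.

```python
def percent_faster(new_tflops, old_tflops):
    """How much faster the new result is than the old one, in percent."""
    return 100.0 * (new_tflops - old_tflops) / old_tflops


def implied_peak_tflops(rating_tflops, efficiency):
    """Theoretical peak implied by a Linpack rating and its efficiency."""
    return rating_tflops / efficiency


columbia_16_node = 42.7   # teraflops, from the text
earth_simulator = 35.9    # teraflops, from the text

margin = percent_faster(columbia_16_node, earth_simulator)
peak = implied_peak_tflops(columbia_16_node, 0.87)
per_processor_gflops = peak * 1000.0 / (16 * 512)  # assumes 512 processors per node

print(round(margin, 1))                # 18.9
print(round(peak, 1))                  # 49.1
print(round(per_processor_gflops, 2))  # 5.99
```

The margin comes out just under 19 percent, matching the "some 19 percent" stated above, and the implied peak of about 49 teraflops corresponds to roughly 6 gigaflops per processor.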

10.25.04 - Final Linpack Run Completed and a Return to Science
Following the success achieved on the 16-node run, the team set up for an attempt at a 20-node run. Runs on a large number of nodes are potentially problematic on the new hardware because all systems must remain fully functional during the entire run, which can last nine or more hours. The Columbia team completed the run on October 23, and reported the result to the Linpack Top 500 organization. The number will be revealed publicly along with hundreds of others at the annual Supercomputing Conference (SC2004) in Pittsburgh on November 9.

With this important benchmark completed, the team returned the system to its even more important use - science. Eight nodes were dedicated to the Return to Flight team, who will perform a real-time simulation of Shuttle Flight support over an uninterrupted 24-hour period. Following this simulation, two nodes will be dedicated to production usage for each of the four missions, totaling eight production nodes. Another six nodes will be allocated to the Applications Development team to prepare final scientific results for SC2004 demonstrations being performed by NASA scientists. Two nodes will continue to be used by the System Development team working on efficiency and other internal operational issues. The remaining four systems will be turned back to SGI engineers, who will interconnect them to create the first 2,048-processor system using Altix 3700-Bx2 technology.




Editor: Jill Dunbar
Webmaster: John Hardman
NASA Official: Rupak Biswas

Last Updated: June 23, 2008