Storing Data

Data storage technology has improved dramatically in the past couple of years. Only a few years ago, the disks of a supercomputer could hold only a fraction of the data produced by a typical Grand Challenge application. Such an application might produce one hundred 8-gigabyte files on a 1024-node CM-5, i.e., 800 gigabytes (billion bytes), approaching a terabyte (trillion bytes) of data. But until recently the supercomputer it was running on might have possessed only 50-100 gigabytes of disk capacity. Even with a high-performance mass storage system housing another 200 gigabytes, it would still have been necessary to send a large part of the user's data to a tape-based mass storage system. Although this transfer took place automatically, delays in sending and retrieving the data could sometimes cause serious bottlenecks.
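
The arithmetic behind that bottleneck can be sketched in a few lines. This is only an illustration using the figures quoted above; the function name overflow_to_tape and its parameters are hypothetical, and it assumes the local disk and the mass storage disk are both completely free for this one application.

```python
def overflow_to_tape(n_files, file_gb, local_disk_gb, mss_disk_gb):
    """Return (total_gb, tape_gb) for one application run.

    total_gb is the full output; tape_gb is the part that fits on
    neither the supercomputer's disk nor the mass storage disk.
    Assumption: both disk tiers are empty and dedicated to this run.
    """
    total_gb = n_files * file_gb
    tape_gb = max(0, total_gb - local_disk_gb - mss_disk_gb)
    return total_gb, tape_gb


# One hundred 8-gigabyte files, 100 GB of local disk, 200 GB on the MSS.
total_gb, tape_gb = overflow_to_tape(100, 8, 100, 200)
print(f"total output: {total_gb} GB, sent to tape: {tape_gb} GB")
```

Even in this best case, with the larger 100-gigabyte local disk, well over half of the 800 gigabytes would have had to go to tape.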

Fortunately, disk space has become much cheaper; high-performance computers now have plenty of disk capacity, and all of the data can remain on disk for as long as an application is running.

After an application "run" is finished, the resulting data is automatically sent to a mass storage system (MSS) using a software utility called FTP (File Transfer Protocol). In NCSA's case, the UNIX-compatible MSS uses UniTree software on a Convex C220 machine with 100 gigabytes of disk space. (UNIX is an operating system commonly used in scientific computing as well as in many other computer applications.)

NCSA is scheduled to upgrade its MSS to a Convex 3880, with over 200 gigabytes of disk space, in November 1995. As long as the user's dataset remains on the MSS, she or he can access the file just as quickly as the HiPPI network can transfer it, at 622 megabits per second. For long-term storage, the data is transferred to high-density Metrum tapes and stored on the shelf. The scientist can quickly retrieve the data from tape and have it transferred to disk by typing a command into his or her workstation.
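
To give a feel for what that network rate means in practice, here is a rough estimate of the transfer time for a single 8-gigabyte file of the kind described earlier. It is a sketch under idealized assumptions: the link is taken to run at the full quoted rate with no protocol overhead or contention, so the result is a lower bound rather than a measured figure. The function name transfer_seconds is hypothetical.

```python
def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: the size in bits divided by the link rate.

    Assumption: the link sustains its full rate for the whole
    transfer, with no protocol overhead, so this is a lower bound.
    """
    return size_bytes * 8 / link_bits_per_second


FILE_BYTES = 8 * 10**9   # one 8-gigabyte file (a gigabyte is a billion bytes)
LINK_RATE = 622 * 10**6  # 622 megabits per second, as quoted in the text

seconds = transfer_seconds(FILE_BYTES, LINK_RATE)
print(f"about {seconds:.0f} seconds per 8-gigabyte file")
```

Under these assumptions a single file moves in under two minutes, and a full set of one hundred such files would take a little under three hours.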

