After waiting weeks to receive cartridge tapes of data from a government service bureau, yesterday's researcher then faced the painstaking task of extracting the needed datasets. Hindered by limitations in the technology for manipulating the data, the researcher could ultimately apply only a small portion of the data to the research question.
Seismologist Gary Pavlis, professor of geological science at Indiana University Bloomington, has long been accustomed to gathering data in the hundreds of gigabytes. A seismic network established for the Wabash Valley Seismic Zone Experiment has gathered some 500 megabytes of data per day over the last decade for a study of earthquake risk estimates. It wasn't lack of computing horsepower at IU that slowed the pace of Pavlis' early experiments, but the limited ability of IU's old tape-handling system to manipulate massive amounts of data. To help overcome the computer's limitations, Pavlis worked with data in chunks, organizing it manually and writing programs to manipulate the data. Processing was fragmented. He put experiments on hold. "The usefulness of many datasets was limited by our ability to get to them," recalls Pavlis. "The inability to store and manipulate data literally throttled the progress of our research."
Inside this IBM mass storage server are two robot arms with optical scanners that can move down a track, locate and select a specific tape from storage, and insert it into a reader. The system contains eight tape racks; each rack holds 180 tape cartridges, and each cartridge holds ten gigabytes of uncompressed data. Before this system, a single tape cartridge held only two hundred megabytes of data, and each had to be handled and moved manually. University Information Technology Services staff who maintain the system are (front to back) Phill Smith, lead shift supervisor; Sandy Hamm, data handling specialist; Linda M. Davis, senior systems console operator/unit webmaster; and Julie Wetzel, senior systems console operator/NT specialist. --credit
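The caption's figures imply the library's total capacity. A minimal back-of-the-envelope sketch, using only the numbers given above (8 racks, 180 cartridges per rack, 10 GB per cartridge, versus the old 200 MB cartridges):

```python
# Capacity figures taken from the photo caption above.
racks = 8
cartridges_per_rack = 180
gb_per_cartridge = 10

total_gb = racks * cartridges_per_rack * gb_per_cartridge
print(f"Library capacity: {total_gb} GB ({total_gb / 1000:.1f} TB)")

# The previous generation of cartridges held 200 MB each.
old_mb_per_cartridge = 200
growth = (gb_per_cartridge * 1000) / old_mb_per_cartridge
print(f"Per-cartridge capacity growth: {growth:.0f}x")
```

This works out to 14.4 terabytes in a single library, consistent with the "tens of terabytes" quoted later in the article, and a fifty-fold jump in per-cartridge capacity.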
Pavlis struggled to conduct his research in the face of technology's limitations but found some problems were beyond the capabilities of the computing systems at IU and elsewhere. Today, IU's new mass data storage project and supercomputer are removing these limitations from his work.
This summer Pavlis will collect data from some forty state-of-the-art seismic stations installed in Central Asia's Tien Shan (Heavenly Mountain) region. The Tien Shan range is unusual in that its origin doesn't fit the plate tectonics model; much of its growth is believed to have occurred recently. With funding from a National Science Foundation grant, Pavlis will use a 3-D direct-imaging method with which he plans to produce revolutionary new images of the earth's interior. Generating a single image requires rapid access to a full gigabyte of raw data, hundreds of gigabytes of temporary storage, and space to turn the data into 3-D volumes--virtually cubing the amount of data he will handle. (To help put these amounts in perspective, visualize a CD as containing somewhat less than two-thirds of a gigabyte of data.)
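The CD comparison above can be made concrete. A small sketch, assuming a CD capacity of 650 MB (the article says only "somewhat less than two-thirds of a gigabyte"), showing how many discs the imaging workload would fill:

```python
import math

CD_MB = 650  # assumed CD capacity; the article says just under 2/3 GB

def cds_needed(gigabytes):
    """Number of CDs required to hold the given amount of data."""
    return math.ceil(gigabytes * 1000 / CD_MB)

print(cds_needed(1))    # one image's raw data: 2 CDs
print(cds_needed(300))  # "hundreds of gigabytes" of temporary storage: 462 CDs
```

Even a single image's gigabyte of raw data overflows one disc, and the temporary storage for one 3-D volume would fill a spindle of hundreds.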
Accompanying Pavlis is a new research partner: IU's new High Performance Storage System, which was a factor in his receiving funding. This IBM mass storage system comprises multiple distributed clients and servers, offering researchers tens of terabytes of secure, readily accessible storage space; within the next year that capacity will grow to hundreds of terabytes. What makes the storage system unique is its coupling to a distributed file service, which allows researchers to access and share files stored on the networked computers as easily as if they were working from giant databases on their own hard drives. The system is highly automated and transfers fifteen megabytes of data per second. Says Pavlis, "This system makes all data instantly accessible for processing--it changes the rules."
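The quoted transfer rate puts "instantly accessible" in perspective. A minimal sketch, assuming decimal units (1 GB = 1000 MB) and the article's figure of fifteen megabytes per second:

```python
def transfer_seconds(megabytes, rate_mb_s=15):
    """Time to move the given data at the system's quoted rate."""
    return megabytes / rate_mb_s

# One gigabyte of raw imaging data (the per-image requirement above).
print(f"1 GB: {transfer_seconds(1000):.0f} s")

# The Wabash Valley network's daily haul of roughly 500 MB.
print(f"500 MB: {transfer_seconds(500):.0f} s")
```

At this rate a full gigabyte moves in about a minute, and a day's worth of seismic network data in about half that, versus the hours of manual tape handling the old system required.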
In providing a completely new infrastructure for working with large datasets, this utility opens new possibilities for research on campus and around the state of Indiana and puts IU researchers in the vanguard of modern global research. "This system removes the ceilings for data storage and transfer, radically changing the ways researchers can think about projects," says Zdzislaw Meglicki, senior technical advisor in the Office of the Vice President for Information Technology. Affording scholars transparent access to a virtually unlimited store of data, the project removes computer-imposed restraints, opening the door to higher levels of creative thinking.
The system is one of a trio of elements that underpin the university's information technology infrastructure. IU's capabilities in processing power, network bandwidth, and massive storage are key tools at the core of such forward-looking strategic initiatives as the Digital Libraries Project (and the well-known music library VARIATIONS project) and distance learning (and IU's new Oncourse course management software). Equipped with these tools, today's scholars are taking their research beyond yesterday's boundaries.--Jan R. Holloway and Malinda Lingwall