Seismic Unix

Summary

Seismic Unix is the definitive open-source package for processing seismic reflection data.  It is heavily dependent upon the unix concept of command line modules linked by pipelines, with individual programs controlled by sometimes long lists of command line arguments.  All seismic unix processing is driven by unix shell scripts.  Being strongly reflection oriented, the package is not at all generic but is largely limited to active source data.

Roots of this Package

Brief Development History

At this writing (2016) seismic unix has been in existence for 27 years.  The following text is extracted from the Seismic Unix Users' Manuals and describes its history very well:

Acknowledgments
The Seismic Unix project is partially funded by the Society of Exploration Geophysicists (SEG), and by the Center for Wave Phenomena (CWP), Department of Geophysical Engineering,
Colorado School of Mines. Past support for SU has included these groups, as well as the Gas Research Institute (GRI). Thank you SEG and CWP for your continued support.  The sponsors of the CWP Consortium Project have long been partners in the SU project and we are pleased to explicitly acknowledge that relationship here. In addition, we wish to acknowledge extra support supplied in the past by IBM Corporation and by the Center for Geoscience Computing at the Colorado School of Mines during the late 1980’s when SU was ported to the modern workstation from its previous incarnation on graphics terminals.

So many faculty and students, both at our Center and elsewhere, have contributed to SU, that it is impossible to list them all here.  However, certain people have made such important contributions that they deserve explicit mention.

Einar Kjartansson began writing what is now called SU (the SY package) in the late 1970’s while still a graduate student at Jon Claerbout’s Stanford Exploration Project (SEP). He continued to expand the package while he was a professor at the University of Utah in the early eighties. In 1984, during an extended visit to SEP, Einar introduced SY to Shuki Ronen, then a graduate student at Stanford. Ronen further developed SY from 1984 to 1986. Other students at SEP started to use it and contributed code and ideas. SY was inspired by much other software developed at SEP and benefited from the foundations laid by Claerbout and many of his students: Rob Clayton, Stew Levin, Dave Hale, Jeff Thorson, Chuck Sword, and others who pioneered seismic processing on Unix in the seventies and early eighties.

In 1986, Shuki Ronen brought this work to our Center during a one year postdoctoral appointment at the Colorado School of Mines. During this time, Ronen aided Cohen in turning SU into a supportable and exportable product. Chris Liner, while a student at the Center, contributed to many of the graphics codes used in the pre-workstation (i.e., graphics terminal) age of SU. Liner’s broad knowledge of seismology and seismic processing enabled him to make a positive and continuing influence on the SU coding philosophy.

Craig Artley, now at Golden Geophysical, made major contributions to the graphics codes while a student at CWP and continues to make significant contributions to the general package.  Dave Hale wrote several of the “heavy duty” processing codes as well as most of the core scientific and graphics libraries. His knowledge of good C-language coding practice helped make our package an excellent example for applied computer scientists.

Ken Larner contributed many user interface ideas based on his extensive knowledge of seismic processing in the “real world.”

John Scales showed how to use SU effectively in the classroom in his electronic text, Theory of Seismic Imaging, Samizdat Press, 1994. This text is available from the Samizdat press site at: samizdat.mines.edu.

John Stockwell’s involvement with SU began in 1989. He is largely responsible for designing the Makefile structure that makes the package easy to install on the majority of Unix platforms.  He has been the main contact for the project since the first public release of SU in September of 1992 (Release 17). After Jack Cohen’s death in 1996, Stockwell assumed the role of principal investigator of the SU project and has remained in this capacity to the present. The number of codes has more than tripled in the 11 years since John Stockwell has been PI of the project.

Community

As the above acknowledgments indicate, the Seismic Unix community is a mix of academics and industrial firms involved in oil and gas exploration.   The package is heavily used academically as a teaching and research tool, as it is the best maintained package for processing seismic reflection data.   Commercially it is used mainly by small companies that cannot afford the price tag of a commercial seismic processing system.  A large fraction of these are in foreign countries where the exchange rate makes commercial software systems particularly problematic.

Data Objects Supported

Seismic Unix, like SAC, shares the (archaic) concept that the external and internal representations of data are identical.  In seismic unix all data take the form of a seismic unix file made up of fixed length blocks of binary data.  Each data block is broken into a fixed length header (a small variant of SEGY) and a vector of data stored as binary floats.  (Note this has changed over the years.  Older versions used host floating point numbers, but more recent releases store data externally as XDR binary data and convert these to host floats during reads.)   Ensembles are defined by sort order in a data file using header keys.   That is, order matters in a Seismic Unix file.   Ensemble boundaries are defined by changes in one or more header keys (e.g. a shot gather is commonly defined by changes in the field file id number, called fldr in seismic unix).
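
As a concrete illustration of this header-centric view, the following sketch shows how a data set is inspected and how a single shot gather is pulled out purely by windowing on a header key (the file name line1000.su and the record number 101 are hypothetical):

    # print the range of every header key in the file
    surange < line1000.su

    # list the field record number and offset of each trace
    sugethw key=fldr,offset < line1000.su

    # extract the shot gather for field record 101 by windowing on fldr
    suwind key=fldr min=101 max=101 < line1000.su > shot101.su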

Data Flow Model

Data flow in seismic unix is derived 100% from the unix concept of pipelines.   This generic concept is illustrated in the following figure:
generic workflow figure

All data passing through the system are assumed to be a continuous stream of fixed length SEGY-variant trace objects.   Individual processing steps are implemented by separate unix "filters" (not to be confused with the time series concept of filtering) linked through a fifo (implemented in the unix shell through the "|" symbol).   Individual programs may produce data, modify data read from stdin and write new data to stdout, or read data from stdin and display it via a set of graphics programs.   Data entering the pipe are normally defined by the unix stdin redirection operator ("<") and data to be saved are defined through the unix stdout redirection operators (">" or ">>").
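
A minimal sketch of such a pipeline (the file names and parameter values here are purely illustrative) might look like:

    # generate a synthetic section, bandpass filter it, apply a gain,
    # and save the result; intermediate data never touch the disk
    suplane | sufilter f=5,10,40,50 | sugain tpow=2.0 > section.su

    # read an existing file, window it by cdp, and send it to an X display
    suwind key=cdp min=1000 max=1200 < line1000.su | suximage perc=95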

Note that the figure above illustrates a generic data flow in classic streams-based seismic reflection processing systems, of which seismic unix is a specific implementation.  In seismic unix the split flow is awkward at best to implement, requiring something like the unix tee command and temporary files (see the sketch below).   We show this concept because, although it is not common in seismic unix, this type of workflow is common in commercial seismic processing systems like ProMax or Omega.
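
One way such a split flow could be approximated in seismic unix is to use tee to snapshot the stream into a file while the main pipeline continues (a sketch only; the file names and velocity picks are hypothetical):

    # apply NMO, save a copy of the corrected gathers for later QC,
    # and continue the main flow into the stack
    sunmo vnmo=1500,2500 tnmo=0.0,2.0 < cdps.su |
        tee nmo_gathers.su |
        sustack > stack.su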

Data Management System

Metadata

Metadata in seismic unix are synonymous with header attributes.  The list of attributes and the naming convention are totally static and become part of the vocabulary of working with this package.   The set of attributes is very reflection-seismic centric, and it is awkward to shoehorn passive array data into this limited namespace.   Like any reflection system, a number of tools are available in the package to manipulate header fields (metadata).  Thus data management is the same as managing the header data.
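
For example, header fields are set or computed with programs such as sushw and suchw (a sketch with hypothetical file names and values):

    # set a constant sample interval (in microseconds) in every trace header
    sushw key=dt a=4000 < raw.su > fixed.su

    # compute offset from the source and receiver x coordinates,
    # i.e. offset = gx - sx, using suchw's key1 = (a + b*key2 + c*key3)/d form
    suchw key1=offset key2=gx key3=sx b=1 c=-1 < fixed.su > withoffset.su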

Seismograms

Because seismic unix locks headers (metadata) and data together, the two are intrinsically inseparable.   Data sets are maintained as one or more seismic unix files stored in directories with some naming convention like "line1000.su", "line2000.su", etc.

Processing Algorithm Concepts

Most algorithms are implemented as unix filters that read SU data from stdin and write SU data to stdout.   The exceptions are readers, import/export programs, graphics programs, and the susort program, which uses a disk sort and can only be used at the end of a pipeline.
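
A sketch of the typical pattern (file names hypothetical): susort takes its sort keys as positional arguments and rearranges the traces into a new ensemble order, either as its own step or at the end of a short pipeline:

    # resort a shot-ordered line into cdp gathers, ordered by offset within cdp
    susort cdp offset < shots.su > cdps.su

    # the same sort placed at the end of a pipeline
    suwind key=tracl min=1 max=10000 < shots.su | susort cdp offset > cdps.su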

User Interface

Individual program options are controlled by complex incantations of command line arguments with a reasonably consistent syntax.   Processing work flows are normally defined by unix shell scripts that can become fairly elaborate.  A GUI called GeBR has been developed and may be useful to some.
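
A typical work flow script might look like the following sketch (the file names, filter corners, and velocity picks are all hypothetical):

    #!/bin/sh
    # simple brute-stack work flow for one line
    indata=line1000.su

    susort cdp offset < $indata |
    sufilter f=5,10,40,50 |
    sugain tpow=2.0 |
    sunmo vnmo=1500,1800,2500 tnmo=0.0,1.0,2.0 |
    sustack > line1000_stack.su

    # display the result
    suximage < line1000_stack.su perc=95 title="line 1000 brute stack" &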

Programming Interface for Extensions

New processing modules are readily added to Seismic Unix by writing a standard unix filter program.   The install procedure builds a set of static libraries that can be used in the link phase to build a custom program that fits into the classic pipeline data flow model.   CWP does not provide any documentation for this, probably because it is so simple: all the libraries are installed in one place, so the link line needs only the flag -L$(CWPROOT)/lib plus references to whichever of the Seismic Unix static libraries the application requires.
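
A hedged sketch of what building such a filter might look like (the source file mymute.c is hypothetical; libsu, libpar, and libcwp are the static libraries normally installed under $CWPROOT/lib):

    # compile and link a new SU-style filter against the installed static libraries
    cc -O -I$CWPROOT/include -o mymute mymute.c \
        -L$CWPROOT/lib -lsu -lpar -lcwp -lm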

Parallel Processing Capability

Like much of our data processing infrastructure, Seismic Unix was conceived, and its main development completed, before any notion of modern massively parallel systems.  Note that the pipeline structure itself is, in a way, a cheap form of parallelism:  the Unix operating system launches one process for each member of a pipeline chain, and those processes run concurrently as data flow through the system.  Nonetheless, there are a few additional extensions that might be feasible. We comment here on those, but emphasize these are purely theoretical.

Shared Memory Processing Support

Seismic Unix has no parallel constructs.  It should, however, be pretty straightforward to modify a number of the key processing routines to use multithreading with OpenMP.   Similarly, GPU support is feasible through OpenACC.  In both cases, this would be useful only if an extensive effort were made to profile key programs, identify which sections of the code are the bottleneck, and understand them well enough to know whether parallelization with one of these simplified tools would help.  Further, the pipeline data flow model may make such an effort pointless anyway, as a processing chain can only run at the rate of its slowest member, and the optimal number of threads to devote to a particular task is never predictable.

Cluster Computing Support

One could conceive of a master/worker model in a cluster environment where the master farmed out each program in a processing chain to one or more cluster nodes.  The master would then also have to set up the communications channels between the nodes to replace the pipeline fifo constructs used in the unix shell.   This would be an enormous development effort.

Extensibility

The data flow model of Seismic Unix is widely used in scientific computing and has well known limitations for extensibility.  Only the core algorithms of seismic unix could be retained if the package were redesigned to be extensible beyond single node systems.