ObsPy

Summary

ObsPy is an open-source project dedicated to providing a Python framework for processing seismological data. It provides parsers for common file formats, clients to access data centers, and seismological signal processing routines that allow the manipulation of seismological time series (see Beyreuther et al. 2010; Megies et al. 2011). See the ObsPy web site for an expanded summary.

Citations:

(1) Moritz Beyreuther, Robert Barsch, Lion Krischer, Tobias Megies, Yannik Behr and Joachim Wassermann (2010), ObsPy: A Python Toolbox for Seismology, Seismological Research Letters, 81(3), 530-533.

(2) Tobias Megies, Moritz Beyreuther, Robert Barsch, Lion Krischer and Joachim Wassermann (2011), ObsPy – What can it do for data centers and observatories? Annals of Geophysics, 54(1), 47-58, doi:10.4401/ag-4838.

Roots of this Package

Brief Development History

Need someone from the ObsPy development team to write a short paragraph here.

Community

ObsPy has developed a rapidly expanding community of users.   It is now heavily used by research groups in Europe and the US.   --Would be helpful for ObsPy developers to expand this--

Data Objects Supported

At the time of writing, ObsPy is the only fully object-oriented package for seismic processing in wide use in the research community. A single-channel seismogram is abstracted as a "Trace" object. A Trace object has some core, required attributes such as the sample interval and the number of samples. A dictionary-like Python container maintains the ancillary metadata that would be stored as header attributes in a package like SAC or Seismic Unix. As a result, there is no fixed limit on the attributes that can be attached to a Trace object.

Bundles of seismic "Trace" objects are stored in a "Stream" container. Stream objects can be used to bundle three-component data. They can also be used to bundle ensembles of scalar data channels. Ensembles of three-component seismograms are not directly supported, but are readily implemented as Python lists of Stream objects, each formed from one three-component bundle.

ObsPy also has a set of auxiliary objects useful for some processing:
  1. UTCDateTime - used to maintain absolute time stamps
  2. Catalog - Container used to hold source information
  3. Inventory - Container for internal handling of station metadata.

Data Flow Model

ObsPy is based on a programming model in which the Python language controls data flow. It is reasonable to think of ObsPy as an object-oriented version of SAC with Python as the command language. The SAC analogy is limited, however: using Python as the command language vastly expands the capabilities of ObsPy, because a rich ecosystem of modern software can be linked with ObsPy through Python. For example, BRTT now has a rich API for Antelope that is known to interact cleanly with ObsPy, and links between MATLAB and Python are possible, among numerous others.

Data Management System

Metadata

Internally, seismogram metadata are managed through a Python container that ObsPy refers to as stats. The stats part of the object behaves like a Python dictionary (associative array) and serves as ObsPy's implementation of a freely extensible header.

Source and receiver data are maintained as separate objects, and it is the programmer's responsibility to properly link source and receiver data to a given seismogram. ObsPy does not supply a database for maintaining these links, although one is readily implemented by mixing Python and Antelope's Python Datascope API. Many users seem to rely on the Python pickle utilities to save metadata components in files.
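
The pickle approach can be sketched with the standard library alone. The dictionary layout, the station key "XX.DEMO", the event identifier, and all coordinate values below are hypothetical; they only illustrate one way a script might keep the source and receiver links together.

```python
import os
import pickle
import tempfile

# Hypothetical receiver metadata keyed by "network.station".
receivers = {
    "XX.DEMO": {"latitude": 39.17, "longitude": -86.52, "elevation_m": 246.0},
}

# Hypothetical source metadata for one event.
source = {
    "event_id": "ev001",
    "latitude": 35.0,
    "longitude": 140.0,
    "depth_km": 30.0,
    "origin_time": "2020-01-01T00:00:00",
}

metadata = {"receivers": receivers, "source": source}

# Save the metadata to a file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "metadata.pkl")
with open(path, "wb") as handle:
    pickle.dump(metadata, handle)

# A later script restores it and looks up the receiver for a seismogram.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

receiver = restored["receivers"]["XX.DEMO"]
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.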

ObsPy has a number of simple tools for retrieving metadata from the IRIS DMC and other data centers that have adopted the FDSN web services standards. It is thus possible to populate receiver and source information in a "just in time" model by requesting these data through web services. This is best done for small requests that do not demand rapid response or tie up a large system, because the latency of web requests makes any application using that model inherently slow.


Seismograms

ObsPy has a good collection of import routines that abstract the process of reading data in a range of external representations and placing the results in a Trace or Stream object. Seismograms can be read from and written to local files or imported from data centers through web services. Use the former for performance and the latter as a way to assemble a working data set for later processing. Most users of ObsPy today manage their waveform data with files and naming conventions. Experiments have been done on using the Datascope relational database to manage active waveform data, but no production system exists that uses this capability.

Processing Algorithm Concepts

The engine that drives ObsPy is the Python scripting language. Data flow is determined solely by the script (program) used to define the processing. Most processing scripts with ObsPy involve some variant of reading data into memory, running a series of procedures or methods on that data, and writing out results. This is effectively the same as a typical SAC processing sequence, except that the command language is Python instead of the SAC command language. Because Python is much more powerful and extensible than the SAC command language, however, the prevalence of that approach reflects mainly how current-generation programmers think about data processing rather than any limitation of the language. There are Python bindings not only to Datascope but also to common SQL databases like Oracle and NoSQL databases like MongoDB. There is also a well-defined Python API for MATLAB. This means far more elaborate schemes could be designed using the Python language, with ObsPy as one component of a larger system.

User Interface

All interactions with ObsPy use the Python language.


Programming Interface for Extensions

Python is becoming a workhorse in modern computing because of its capability as a glue for joining otherwise disconnected pieces of software. This is done through the concepts of a Python "module" and "package", each of which encapsulates a well-defined API to a body of software. Python packages can be pure Python code, compiled language code (C/C++, FORTRAN, etc.) with wrappers, or a mix of the two. Users of a package announce their need for it with the keyword "import", which has several variants. The API for that package is then visible to the Python source code that issues the import statement.
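
The common variants of the import statement are shown here with standard-library modules so the sketch runs without ObsPy; the same forms apply to any package, for example "from obspy import read".

```python
# Variant 1: import a whole module; names are reached through the module.
import math

# Variant 2: import a module under a shorter alias.
import os.path as osp

# Variant 3: import selected names directly into the current namespace.
from collections import OrderedDict

hypotenuse = math.hypot(3.0, 4.0)
file_name = osp.basename("/data/XX.DEMO..BHZ.mseed")  # hypothetical path
header = OrderedDict(station="DEMO", channel="BHZ")   # hypothetical values
```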

Parallel Processing Capability

ObsPy at present has no embedded concept of parallelism. Python, however, offers multiple ways to do parallel processing through appropriate packages. Below are the packages currently known to us that might be exploitable in Python scripts using ObsPy.

Shared Memory Processing Support

  1. Python has a simple threading package. This is known to work and has been used, for example, in the fetchirisdmc script found in the antelope contrib repository. As in all threaded programs, however, one must be careful about variables accessed by multiple threads and protect them with a mutex when necessary (implemented in the threading package as a Lock object with acquire and release methods).
  2. Packages exist for Python bindings to OpenMP. No examples using this approach are known in seismology.
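
The locking pattern mentioned in item 1 can be sketched with the standard threading package. The fetch_segment function and the request names are hypothetical stand-ins for a slow operation such as a web request; this is not taken from the fetchirisdmc script.

```python
import threading


def fetch_segment(name):
    """Hypothetical stand-in for a slow fetch; returns a fake sample count."""
    return len(name) * 100


requests = ["XX.DEMO..BHZ", "XX.DEMO..BHN", "XX.DEMO..BHE"]

results = {}                      # shared between threads
results_lock = threading.Lock()   # mutex protecting the shared dictionary


def worker(name):
    value = fetch_segment(name)
    # The with-statement acquires the lock and releases it on exit.
    with results_lock:
        results[name] = value


threads = [threading.Thread(target=worker, args=(name,)) for name in requests]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()  # wait for every worker to finish
```

Threads of this kind mainly help with input/output-bound work such as waiting on requests; they do not by themselves speed up CPU-bound numerical loops in standard Python.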

Cluster Computing Support

Python is widely used as a framework for cluster computing. We are aware of two fundamentally different current approaches to cluster computing with Python:
  1. The standard for massively parallel computer clusters for the last decade has been MPI (Message Passing Interface). Python packages that implement MPI are available in any high performance computing center worthy of that title.
  2. The competing standard to MPI is the parallel processing model generically referred to as "Map Reduce". The map reduce concept is at the core of what is now commonly called "big data" computing. The most common implementation today is the Hadoop framework championed by Apache, but there are other competing systems. All those we are aware of support Python and hence could, in principle, be used with ObsPy or any Python package.
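
The map reduce pattern itself can be illustrated serially with the standard library. This is only a sketch of the programming model, not of Hadoop or MPI: a real framework would distribute the map step across cluster nodes. The trace identifiers and sample values are hypothetical.

```python
from functools import reduce

# Hypothetical per-channel sample lists keyed by trace identifier.
traces = {
    "XX.AAA..BHZ": [0.1, -0.7, 0.3],
    "XX.AAA..BHN": [0.2, 0.9, -0.4],
    "XX.BBB..BHZ": [-1.5, 0.5, 0.0],
}


def map_peak(item):
    """Map step: emit (station, peak absolute amplitude) for one trace."""
    trace_id, samples = item
    station = trace_id.split(".")[1]
    return station, max(abs(value) for value in samples)


def reduce_max(accumulator, pair):
    """Reduce step: keep the largest peak seen for each station."""
    station, peak = pair
    accumulator[station] = max(accumulator.get(station, 0.0), peak)
    return accumulator


mapped = map(map_peak, traces.items())
station_peaks = reduce(reduce_max, mapped, {})
```

The map step is independent for each trace, which is what makes it a candidate for distribution; only the reduce step needs to see results from more than one trace.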

Extensibility

The Python interface allows broad extensibility through frameworks such as MPI or Hadoop. ObsPy does not yet support these, but such support could, in principle, be developed by the community.