Summary
ObsPy is an open-source project dedicated to providing a Python
framework for processing seismological data. It provides parsers
for common file formats, clients to access data centers and
seismological signal processing routines which allow the manipulation
of seismological time series (see Beyreuther et al. 2010, Megies et al.
2011). See the ObsPy web site for an expanded summary.
(1) Moritz Beyreuther, Robert Barsch, Lion Krischer, Tobias Megies,
Yannik Behr and Joachim Wassermann (2010), ObsPy: A Python Toolbox for
Seismology, Seismological Research Letters, 81(3), 530-533.
(2) Tobias Megies, Moritz Beyreuther, Robert Barsch, Lion Krischer,
Joachim Wassermann (2011), ObsPy – What can it do for data centers and
observatories? Annals Of Geophysics, 54(1), 47-58, doi:10.4401/ag-4838.
Roots of this Package
Brief Development History
Need someone from the ObsPy development team to write a short paragraph here.
Community
ObsPy has developed a rapidly expanding community of users. It is
now heavily used by research groups in Europe and the US. --Would
be helpful for ObsPy developers to expand this--
Data Objects Supported
ObsPy is the only fully object oriented package for seismic
processing in use in the research community. A single channel
seismogram is abstracted as a "Trace" object. A Trace object has
some core, required attributes like sample interval and number of
samples. A python dictionary is used to maintain ancillary
metadata that would be stored as header attributes in a package
like SAC or Seismic Unix. This leaves the set of attributes that
can be attached to a Trace object open ended.
Bundles of seismic "Trace" objects are stored in a "Stream"
container. Stream objects can be used to bundle three-component
data. They can also be used to easily bundle ensembles of scalar
data channels. Ensembles of three component seismograms are not
directly supported, but are readily implemented as python lists
of Stream objects.
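The Trace/Stream design described above can be illustrated with a
schematic sketch. This is plain python, not the real ObsPy classes;
the class layout, attribute names, and station/channel values are
illustrative stand-ins for the concepts (required core attributes,
an extensible metadata dictionary, Streams as containers, and
ensembles as lists of Streams):

```python
# Schematic sketch of the Trace/Stream abstraction (NOT the real
# ObsPy classes): a Trace holds required core attributes plus an
# open-ended metadata dictionary; a Stream is a container of Traces.

class Trace:
    def __init__(self, data, delta, stats=None):
        self.data = data         # sample values
        self.delta = delta       # sample interval (s): a required core attribute
        self.npts = len(data)    # number of samples: a required core attribute
        # ancillary metadata: an extensible dictionary, unlike the
        # fixed header of SAC or Seismic Unix
        self.stats = dict(stats or {})

class Stream:
    """Container bundling Trace objects, e.g. three components."""
    def __init__(self, traces):
        self.traces = list(traces)

    def __len__(self):
        return len(self.traces)

# A three-component bundle for one (illustrative) station
z = Trace([0.0, 1.0, 0.5], delta=0.01, stats={"station": "ANMO", "channel": "BHZ"})
n = Trace([0.1, 0.2, 0.3], delta=0.01, stats={"station": "ANMO", "channel": "BHN"})
e = Trace([0.0, 0.0, 0.1], delta=0.01, stats={"station": "ANMO", "channel": "BHE"})
three_c = Stream([z, n, e])

# An ensemble of three-component seismograms: a plain python list of Streams
ensemble = [three_c]
```
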
ObsPy also has a set of auxiliary objects useful for some processing:
- UTCDateTime - used to maintain absolute time stamps
- Catalog - Container used to hold source information
- Inventory - Container for internal handling of station metadata.
Data Flow Model
ObsPy is based on a programming model with the Python language
used to control data flow. It is reasonable to think of ObsPy as
an object-oriented version of SAC with python as the command
language. The SAC analogy is limited, however, because using
python as the command language vastly expands the capabilities of
ObsPy: a rich ecosystem of modern software systems can be linked
with ObsPy through python.
For example, BRTT now has a rich API for Antelope that is known to
interact cleanly with ObsPy; MATLAB links to python are possible;
and numerous other connections exist.
Data Management System
Internally seismogram metadata are managed through a python
container ObsPy refers to as stats. The stats part of the object
is a python dictionary (associative array) used as the
implementation of an infinitely extensible header.
Source and receiver data are maintained as separate objects and it
is the programmer's responsibility to properly link source and
receiver data to a given seismogram. ObsPy has no direct database
support, although this is readily implemented by mixing python and
Antelope's python Datascope API. Many users seem to save metadata
components to files written by the python pickle utilities.
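That pickle-based pattern can be sketched with the standard
library alone; the file name and metadata keys here are
illustrative, not a convention ObsPy prescribes:

```python
import os
import pickle
import tempfile

# Illustrative stats-style metadata for one channel
stats = {"station": "ANMO", "network": "IU", "sampling_rate": 20.0,
         "starttime": "2011-03-11T05:46:23"}

path = os.path.join(tempfile.mkdtemp(), "anmo_stats.pkl")
with open(path, "wb") as f:
    pickle.dump(stats, f)        # save the metadata component to a file

with open(path, "rb") as f:
    restored = pickle.load(f)    # round-trip it back into a dictionary
```
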
ObsPy has a number of simple tools for retrieving metadata from
the IRIS DMC and other data centers that have adopted the recent
FDSN web services standards. It is thus possible to populate
receiver and source information in a "just in time" model by
requesting these data through web services. This is best done for
small requests that do not demand rapid response or tie up a large
system, as the latency of web requests makes any application using
that model implicitly very slow.
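As a sketch of what one such "just in time" request looks like,
the snippet below builds (but does not issue) a query URL for the
FDSN station web service at the IRIS DMC; the network/station
values are illustrative. Each such round trip carries network
latency, which is exactly why this model is slow inside a
processing loop:

```python
from urllib.parse import urlencode

# FDSN station-service endpoint at the IRIS DMC; parameters follow
# the FDSN web services specification. We only construct the URL
# here -- issuing it would perform the actual metadata request.
base = "https://service.iris.edu/fdsnws/station/1/query"
params = {"net": "IU", "sta": "ANMO", "level": "channel", "format": "text"}
url = base + "?" + urlencode(params)
```
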
ObsPy has a good collection of import routines that abstract the
process of reading data in a range of external representations and
placing the results in a Trace or Stream object.
Seismograms can be read and written from local files or imported
from data centers through web services. Use the former for
performance and the latter as a way to assemble a working data set
for later processing. Most users of ObsPy today manage their
waveform data with files and naming conventions. Experiments have
been done to use the Datascope relational database to manage
active waveform data, but no production system exists that uses
the capability.
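A file-and-naming-convention scheme of the kind mentioned above
can be sketched as a small function mapping waveform identifiers
to paths. This particular layout is illustrative (loosely modeled
on year/network/station directory trees many groups use), not a
standard ObsPy imposes:

```python
def waveform_path(root, network, station, channel, year, jday):
    """Map waveform identifiers to a file path under an
    (illustrative) year/network/station directory convention."""
    return "/".join([root, str(year), network, station,
                     f"{network}.{station}.{channel}.{year}.{jday:03d}"])

# One day of data for an illustrative channel
p = waveform_path("/data", "IU", "ANMO", "BHZ", 2011, 70)
```
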
Processing Algorithm Concepts
The engine that drives ObsPy is the python scripting language.
Data flow is then determined solely by the script (program) used
to define the processing. Most processing scripts with ObsPy
involve some variant of: read data into memory, run a series of
procedures/methods on that data, and write out results. This is
effectively the same as a typical SAC processing sequence except
the command language is python instead of the SAC command
language. Although the python language is much more powerful and
extensible than the SAC command language, this approach persists
mainly because of the way current generation programmers think
about how data should be processed.
There are python bindings not only to Datascope but also to common
SQL databases like Oracle and NoSQL databases like MongoDB. There
is also a well defined API linking Python and MATLAB.
This means far more elaborate schemes could be designed using the
python language and ObsPy as one component of a larger system.
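As one concrete example of such database bindings, python's
built-in sqlite3 module can hold per-seismogram metadata alongside
an ObsPy workflow. The schema and values below are illustrative,
not a schema ObsPy or Datascope defines:

```python
import sqlite3

# In-memory SQL database holding illustrative per-seismogram metadata
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE wfmeta
                (station TEXT, channel TEXT, samprate REAL, npts INTEGER)""")
# One day of 20 samples/s data for an illustrative channel
conn.execute("INSERT INTO wfmeta VALUES (?, ?, ?, ?)",
             ("ANMO", "BHZ", 20.0, 86400 * 20))
row = conn.execute("SELECT station, npts FROM wfmeta").fetchone()
conn.close()
```
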
User Interface
All interactions with ObsPy use the python language.
Programming Interface for Extensions
The reason python is becoming a workhorse in modern computing is
its capability as a glue to paste disconnected pieces of software
together. This is done through the concepts of a python "module"
and "package", which encapsulate a well defined API to a package
of software.
Python packages can be pure python code, compiled language code (C/C++,
FORTRAN, etc.) with wrappers, or a mix of the two. Users of
the package announce their need for that package with the keyword
"import", which has several variants. The API for that package is
then visible to the source code that issues the import statement.
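The common variants of the import statement look like this; the
math and json modules are standard-library examples standing in
for any package, ObsPy included:

```python
# The "import" keyword and its common variants, each making a
# package's API visible to the source file that issues it:
import math                    # qualified access: math.sqrt(...)
from math import sqrt          # bind a single name from the package
import importlib
json = importlib.import_module("json")   # programmatic import by name

a = math.sqrt(16.0)
b = sqrt(16.0)
s = json.dumps({"ok": True})
```
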
Parallel Processing Capability
ObsPy at present has no embedded concepts of parallelism itself.
Python itself, however, has multiple ways to do parallelism
through appropriate packages. Below are currently known packages
in python that might be exploitable in python scripts using ObsPy.
Shared Memory Processing Support
- Python has a simple threading package. This is known to work and has been used, for example, in the fetchirisdmc script found in the Antelope contrib
repository. Like all threaded programs, however, one must
be careful about variables accessed by multiple threads and protect
them with a mutex (implemented in python's threading package with
acquire/release methods on a Lock object) when necessary.
- Packages exist for python bindings to OpenMP. No examples using this approach are known in seismology.
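The mutex pattern described in the first bullet can be sketched
with the standard threading package; the per-thread "processing"
here is a trivial stand-in for real per-channel signal processing:

```python
import threading

# Several worker threads update a shared result list; a Lock
# (mutex) protects the shared state from concurrent access.
results = []
lock = threading.Lock()

def process(trace_id):
    value = trace_id * 2      # stand-in for real work on one channel
    with lock:                # acquire/release the mutex around shared state
        results.append((trace_id, value))

threads = [threading.Thread(target=process, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
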
Cluster Computing Support
Python is being widely used as a framework for cluster computing.
We are aware of two fundamentally different current approaches to
cluster computing with python:
- The standard for massively parallel computer clusters for the last decade has been MPI. Python packages implementing MPI (such as mpi4py) are available in any high performance computing center worthy of that title.
- The competing standard to MPI is the parallel processing model
generically referred to as "map reduce". The map reduce concept
is at the core of the modern computing buzzword "big data". The
most common implementation today is the Hadoop framework
championed by Apache, but there are other competing systems. All
we are aware of support python and hence could, in principle, be
used with ObsPy or any python package.
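The map reduce model named above can be illustrated in a few lines
of pure python (this is the concept only, not Hadoop or any
cluster framework; the toy records stand in for independent
seismogram channels):

```python
from functools import reduce

# "Map" a function over independent seismogram-like records, then
# "reduce" the per-record results to a single answer. On a cluster,
# the map step runs in parallel across nodes.
records = [[0.0, 3.0, -1.0], [2.0, -5.0], [4.0]]   # toy channel data
mapped = map(lambda trace: max(abs(s) for s in trace), records)  # peak per record
peak = reduce(max, mapped)                                       # combine results
```
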
Extensibility
The python interface allows infinite extensibility through the
MPI or Hadoop frameworks. ObsPy does not yet support these, but
they could, in principle, be developed by the community.