CHEMICAL INFORMATION SOURCES

Chemistry has been called the "central science" because it overlaps with a number of other scientific disciplines, bridging physics and the biosciences. Published findings about modern chemistry date to the 18th century and have increased tremendously in volume since World War II. In 1946, Chemical Abstracts Service (CAS) included in its flagship publication, Chemical Abstracts (CA), references to 49,578 documents; in 1997, the number was 716,564. Chemists pride themselves on their skill in using the literature of chemistry, despite its enormous size and complexity. They have made extraordinary contributions to the organization of chemical information and have had a major impact on the information retrieval practices in other scientific disciplines.

Commercial Databases and Vendors. Much of the control of the literature of chemistry in the last third of the twentieth century was due to partnerships between the abstracting and indexing (A&I) services and the vendors of online services who leased their databases. The partnerships formed by Lockheed Information Systems (later Dialog) and System Development Corporation (later Orbit) with various private, not-for-profit, and governmental A&I services led to the rapid adoption of online searching in the early 1970s.

STN International and Chemical Abstracts Service (CAS). Over 200 databases are now available on the STN International search system, in which CAS is a major stakeholder. STN offers information on a broad range of topics, including chemistry, engineering, life sciences, biotechnology, regulatory compliance, patents, and business. STN International is the only vendor that provides the abstract data from Chemical Abstracts in addition to full structure searching capabilities (in the CA and Registry databases respectively). Utilizing the CAS Registry Number, the unique number that Chemical Abstracts Service assigns to compounds and mixtures indexed by CAS, it is possible to search in excess of 60 different databases on the STN system. An example of a CAS Registry Number is that for isatin: 91-56-5. Chemical registry databases enable the unique identification of a chemical substance and help to avoid errors in indexing chemical compounds in related databases. They can be searched by such parameters as chemical name, molecular formula, and structure. The registry database identifier for a chemical (usually a number) is used to index documents in other databases that deal with chemical substances. The main CAS databases are the Chemical Abstracts (CA) and REGISTRY files. At the end of 1998, they included about 14 million document records and 18 million substance records respectively. CAS also produces a database that covers chemical reactions (CASREACT), as well as one that combines the catalogs of chemical suppliers (CHEMCATS), and a database that indexes regulated chemicals (CHEMLIST).
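The Registry Number quoted above (91-56-5 for isatin) can illustrate how a registry identifier helps catch indexing errors. The article does not describe the mechanism. The sketch below assumes the publicly documented CAS check-digit rule: the final digit equals the sum of the preceding digits, weighted 1, 2, 3, and so on from right to left, modulo 10. The function name is hypothetical and is not part of any CAS or STN software.

```python
import re

# Assumed layout: two to seven digits, two digits, one check digit.
_CAS_RN_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_rn(cas_rn: str) -> bool:
    """Return True if the string is well formed and its check digit matches.

    Hypothetical helper for illustration only; it checks the format and the
    arithmetic, not whether the number has actually been assigned by CAS.
    """
    match = _CAS_RN_PATTERN.fullmatch(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted_sum = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == check_digit


if __name__ == "__main__":
    print(is_valid_cas_rn("91-56-5"))  # isatin, the example in the text
    print(is_valid_cas_rn("91-65-5"))  # same digits with two transposed
```

For isatin the weighted sum is 6×1 + 5×2 + 1×3 + 9×4 = 55, and 55 modulo 10 gives the check digit 5. Transposing two digits changes the sum, so a simple keying error of this kind is detected before the number is used to link records across databases.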

The CA file has images of all CA issues back to its beginnings in 1907. However, the database is fully searchable only from 1967, with limited searching by CAS Registry Number in the CAOLD file between 1957 and 1966. The Registry file contains the records of all substances indexed by CAS from 1965 onward, as well as most older compounds. The CA database is available in a number of formats, including a CD-ROM version from 1987 to the present. STN also has a Web interface via STNEasy, and direct searching is possible through STN International and other vendors of database services. With such a large and complex database, CAS saw the need to develop a software tool that would allow chemists who are infrequent or novice searchers of the database to effectively utilize the information source. The solution was SciFinder, an easy-to-use tool that now has an academic counterpart, SciFinder Scholar. The new release of SciFinder is integrated with STN's ChemPort Connection, which in turn links to participating publishers' Web sites for the full-text electronic journals. Thus, a search can lead from a reference in the CA database to the text of the original article without ever having to visit a library. Such links to electronic journals are becoming more common as abstracting and indexing databases and journal vendors (aggregators) forge new partnerships with primary journal publishers.

Other Vendors and Databases. Other major database vendors maintain files for subjects as diverse as competitive intelligence and chemical reaction searching. ISI, the Institute for Scientific Information, has a Web version of Science Citation Index in its Web of Science product. Reaction searches can be done on the ISI Chemistry Server. Special offerings on the Questel·Orbit system are its Generic DARC and Markush DARC chemical databases. Markush structures are imprecisely defined chemical structures found in the patent literature. They cover a large number of related compounds for which patent protection is sought. Generic DARC works with subsets of the CA databases that have exactly defined structures. Questel·Orbit is the only vendor besides STN to offer full structure searching of the entire CAS Registry System data. Markush DARC searches the Markush formulae databases MPHARM and IPAT (pharmaceutical and patent databases from INPI, the French Patent Office) and Derwent Information Ltd.'s comprehensive patent databases, WPIM and WPAT.

Impressive as the databases mentioned above are, their coverage is limited to the last three decades or so. The appearance of the Beilstein CrossFire system in the mid-1990s re-awakened interest in the early literature of chemistry. Beilstein covers the literature of modern chemistry all the way back to its beginnings in the 18th century. Users can search for facts, perform structure searches, and construct reaction searches in a database of over 7 million organic and approximately 1 million inorganic and organometallic compounds.

The Cambridge Structural Database (CSD) from the Cambridge Crystallographic Data Centre is the largest searchable database of experimentally determined crystal structures in the world. The CSD contains crystal structure information for over 190,000 organic and organometallic compounds analyzed using X-ray or neutron diffraction techniques. Network access is available to European users, but others must license and mount the files locally.

Cambridge Scientific Abstracts produces specialized files on chemically related subjects as diverse as corrosion, biotechnology, cancer research, and materials science. It makes available its own files as well as such databases as AGRICOLA (agriculture), MEDLINE, and TOXLINE. Ovid Technologies, Inc. provides access to bibliographic and full-text databases for academic, biomedical and scientific research. In 1994, the company acquired BRS Online and now has more than 90 databases, including MEDLINE and a growing collection of full-text electronic journals. Ovid links the references from certain of its databases directly to the original articles in the journals.

The Chemical Information System (CIS) contains over 30 databases covering a variety of subjects related to chemistry and the environment. Such topics as site assessment, hazardous materials, material safety data sheets (MSDSs), chemical and physical properties, biodegradation and bioremediation, toxicology and carcinogenicity, regulations, pharmaceuticals, and spectroscopy can be found on the CIS. The system allows both structure and nomenclature searching.

Technical Database Services, Inc. (TDS) is a provider of technical scientific information in the areas of chemistry, biology, environmental science, and medicine. Included is the American Institute of Chemical Engineers' DIPPR Pure Component Data Compilation. DIPPR covers 29 fixed-value properties and 13 temperature-dependent properties for about 1600 industrial chemicals.

One of the earliest commercial database vendors was the National Library of Medicine (NLM), whose MEDLINE database covers the literature back to 1965. In the last year, versions of MEDLINE and other databases produced by NLM have become available free on the Internet. There are two avenues to the Internet files: PubMed and Internet Grateful Med. PubMed has linkages to publishers’ Web sites for approximately 250 journals and links to molecular biology databases of DNA/protein sequences and 3-D structure data. Internet Grateful Med connects to 15 NLM databases. Among those is ChemID, a chemical registry database with over 339,000 compounds of biomedical and regulatory interest.

Other Free Internet Sources. The most reliable databases are generally those that charge a fee for searching. However, a growing number of databases can be searched free on the Internet. A good way to find such resources is to consult CHEMINFO and its SIRCh (Selected Internet Resources for Chemistry) guide. For specific chemicals, CambridgeSoft’s ChemFinder can locate hundreds of Internet sites by searching a chemical name, CAS Registry Number, molecular formula, or molecular weight. In addition, with CambridgeSoft's ChemDraw plug-in software, structure searching is possible on ChemFinder. Among the sites indexed by ChemFinder is the NIST Chemistry WebBook. It contains thermochemical data, reaction thermochemistry data, mass spectra, UV/Visible spectra, electronic and vibrational spectra, and constants of diatomic molecules, among other data. CHEMCYCLOPEDIA gives several avenues to commercial sources of chemicals. One can search by chemical name or supplier name to find trade names, packaging, special shipping requirements, potential applications, and CAS Registry Numbers. Chemicals are divided into categories such as surfactants or specialty gases, which makes it easy for users to locate specific chemicals. Chemical Patents Plus from CAS has the full text for all classes of patents issued by the U.S. Patent and Trademark Office from 1975 to the present, as well as partial coverage from 1971 to 1974. Patent page images are available from January 1, 1995. Searching of the database is free, as is the display of patent titles and abstracts.

 

Handbooks, Encyclopedias, and Data Compilations. CD-ROM versions of various handbooks and larger compilations of data and facts have been available for several years, e.g., the CRC Handbook of Chemistry and Physics and the Merck Index. Encyclopedias, such as the Kirk-Othmer Encyclopedia of Chemical Technology, are now poised for migration to the Internet. What is missing from the mix of CD-ROM and Internet data sources is a comprehensive, authoritative scientific database of the scope of Landolt-Börnstein. Although impressive in their coverage, Web sources such as WebElements and the NIST Chemistry WebBook are of much narrower scope.

The Future. It will be a long time before computer databases and the Internet eliminate most needs to consult a traditional chemistry library. Nevertheless, the sources already available provide enough useful information that many chemists are turning to them first. As networks become faster and computers more robust, the traditional chemistry library will likely serve primarily an archival function. The Internet has fostered a new communication process that simply did not exist a decade or so ago. Newsgroups and listservs, such as CHMINF-L (The Chemical Information Sources Discussion List), provide almost instantaneous answers to questions that would have taken days or weeks of research in the past. Even more exciting developments are appearing, as the marriage of databases with molecular modeling and visualization techniques (e.g., using MDL's Chime) becomes more widely applied in Internet chemical information sources.

Gary D. Wiggins

Bibliography.

Ash, J.E.; Warr, W.A.; Willett, P. Chemical Structure Systems: Computational Techniques for Representation, Searching, and Processing of Structural Information. New York: Ellis Horwood, 1991.

Maizell, R.E. How to Find Chemical Information: A Guide for Practicing Chemists, Educators, and Students. 3rd ed. New York: John Wiley & Sons, 1998.

Wiggins, G. Chemical Information Sources. New York: McGraw-Hill, 1991.

Ridley, D.D. Online Searching: A Scientist's Perspective; A Guide for the Chemical and Life Sciences. Chichester, New York: John Wiley & Sons, 1996.

URLs of Sources in the Order Mentioned in the Text

Chemical Abstracts Service (http://www.cas.org/)

CAS Statistical Summary, 1907-1997 (http://www.cas.org/EO/CAS.summ.pdf)

The Dialog Corporation (http://www.dialog.com/)

Questel-Orbit France Telecom Group (http://www.questel.orbit.com/patents/)

STN International (http://info.cas.org/stn.html)

STNEasy (http://stneasy.cas.org/)

Institute for Scientific Information (ISI) (http://www.isinet.com/)

Derwent (http://www.derwent.co.uk/)

Beilstein Information (http://www.beilstein.com/)

Cambridge Crystallographic Data Centre (http://www.ccdc.cam.ac.uk/)

Cambridge Scientific Abstracts (http://www.csa.com)

Ovid Technologies (http://www.ovid.com/)

Chemical Information System (http://www.nisc.com/cis/)

TDS/Numerica (http://www.tds-tds.com/)

The National Library of Medicine (http://www.nlm.nih.gov/)

PubMed (http://www.ncbi.nlm.nih.gov/PubMed/)

Internet Grateful Med (http://igm.nlm.nih.gov/)

CHEMINFO: Chemical Information Sources from Indiana University [SIRCh] (http://www.indiana.edu/~cheminfo/)

CS ChemFinder (http://chemfinder.camsoft.com/)

NIST Chemistry WebBook (http://webbook.nist.gov/chemistry/)

CHEMCYCLOPEDIA: The Manual of Commercially Available Chemicals (http://pubs.acs.org/chemcy/)

Chemical Patents Plus! (http://casweb.cas.org/chempatplus/)

The Periodic Table on the WWW: WebElements by Mark Winter (http://www.shef.ac.uk/~chem/web-elements/)

MDL Information Systems, Inc. Download Software [Chime] (http://www.mdli.com/download/)

 

Other Chemistry URLs

International Union of Pure and Applied Chemistry (http://www.chem.qmw.ac.uk/iupac/)

The Most Comprehensive List of Chemistry Journals Online at ChemConnect (http://www.chemconnect.com/library/journals.shtml)

BioChemNet: The Best Biology and Chemistry Educational Resources on the Web (http://schmidel.com/bionet.htm)

Chemical Object Test Page at Lawrence Livermore National Laboratory (http://www-dsed.llnl.gov/documents/tests/chem.html)

CSIR: Chemistry Software and Information Resources (http://www.csir.org/)

Environmental Chemicals Data Information Network and the European Community Pharmaceutical Information Network (http://ulisse.etoit.eudra.org/Ecdin/Ecdin.html)

Abbreviations of Chemical Compounds (http://www.chemie.fu-berlin.de/cgi-bin/abbscomp)

The Royal Society of Chemistry Library and Information Centre Catalogue of Books, Monographs and Images (http://www.rsc.org/cgi-bin/fx.exe?DB=lic)

Gary Wiggins

Head, Chemistry Library

Indiana University

Bloomington, IN 47405

wiggins@indiana.edu

© McGraw-Hill, Inc. 1999. Reproduced with permission from the McGraw-Hill Yearbook of Science and Technology 2000.