
Chemical Name, Formula, and Registry Number Searching


C472 Lecture Notes
Updated: 28 January 2001

I. Introduction

A search for information on a single chemical substance may start with the name of the substance, its molecular formula, or various other words or codes that can be associated with it. (See: How to Search for CAS Registry Numbers in the CAS Registry File.) In this lecture, we will encounter various coding systems that have been applied to the retrieval of chemical substances from both printed and computer-based sources.

II. Substance Searching Using Chemical Abstracts Service Registry Numbers

One very effective method of retrieving chemical substance information from a reference source is to use the Chemical Abstracts Service REGISTRY NUMBER for the substance. The Registry Number (CAS RN) is a unique number assigned to each substance indexed by CAS. It appears in the indexing of CA File records in preference to the formal name of the compound. (See the indexing for an STN LCA record displayed in the ALL format.) The CAS RN has the format Y-XX-X, where Y is a group of two to six digits and each X is a single digit, for example, 494-12-2. The Registry Number is found in many databases and, increasingly, as an index to printed reference works.
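
The Y-XX-X shape lends itself to a quick format check before a number is used in a search. The final digit also acts as a check digit, a detail not stated in these notes: each preceding digit is weighted by its position counted from the right, and the sum modulo 10 should equal the final digit. The following Python sketch is illustrative only. The function name is hypothetical, and the pattern accepts up to seven digits in the first group as an assumption, since the registry has grown since these notes were written.

```python
import re

# Pattern for the hyphenated form described above. The notes give two to
# six digits for the first group; seven is allowed here as an assumption.
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn has the Y-XX-X shape and a consistent check digit."""
    match = CAS_RN_PATTERN.match(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)   # every digit except the last
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


print(is_valid_cas_rn("494-12-2"))   # Flavan
print(is_valid_cas_rn("91-56-5"))    # 1H-Indole-2,3-dione (isatin)
print(is_valid_cas_rn("494-12-3"))   # same number with an altered final digit
```

A check of this kind catches most typing errors, but it cannot confirm that a number has actually been assigned to a substance; only a Registry File lookup can do that.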

III. The Index Guide and the Printed Chemical Substance Indexes

Just as the Index Guide controls the vocabulary that must be used in the Chemical Abstracts General Subject Index, it also provides the correct name to use in searching the CA Chemical Substance Index. For example, a check of the Index Guide for "Flavan" finds the following:

Flavan
See 2H-1-Benzopyran, 3,4-dihydro-2-phenyl- [494-12-2]

In alphabetizing chemical substance names in the index, locant numbers, stereo designators, etc. are ignored. Thus, we must look in the "B" section of the printed CA Chemical Substance Index for "Benzopyran" in order to find index entries on the compound. Note that the CAS Index Name for Flavan is inverted, with the name of the so-called HEADING PARENT listed first. This keeps structurally related compounds in the same area of the index. The basic Heading Parent compound is listed first, followed by derivatives and other structurally related compounds.
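
The effect of ignoring locants when alphabetizing can be approximated in a few lines of Python. This is a rough heuristic, not the CAS filing rules: it keeps only tokens that consist of two or more letters, so numeric locants, indicated hydrogen such as "2H", and single-letter descriptors drop out. Spelled-out descriptors such as "cis" or "trans" are not handled. The function names are hypothetical.

```python
import re


def alphabetizing_key(index_name: str) -> str:
    """Rough sort key: keep only multi-letter alphabetic tokens, lowercased."""
    tokens = re.split(r"[-,\s]+", index_name)
    words = [t.lower() for t in tokens if t.isalpha() and len(t) > 1]
    return " ".join(words)


def index_section(index_name: str) -> str:
    """Letter of the printed index under which the name would be filed."""
    key = alphabetizing_key(index_name)
    return key[0].upper() if key else ""


flavan = "2H-1-Benzopyran, 3,4-dihydro-2-phenyl-"
print(alphabetizing_key(flavan))
print(index_section(flavan))
```

Because the heading parent comes first in the inverted name, the key begins with "benzopyran", which is why the entry files under "B" rather than under a digit or "H".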

Likewise, in searching the CAS Registry File with tools such as STN Express or STN on the Web, you must ascertain the proper order of the name segments before a compound can be found in the chemical name field (/CN) if formal chemical name searching is done. A Chemical Name Segment (/CNS) search can help if you are unsure of the order of the name in the database.

IV. Abbreviations in Command Searching of the CA Files

In an online search, it is important to include the CAS standard abbreviations and acronyms, since the abbreviations are used in preference to the full terms in the online records and hence in the Basic Index of the CA File.

V. CAS Roles in the CA and other Files

ROLES are CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of compounds. Roles were first applied to new online CA File records with v. 121 (July 1994). They were then applied retrospectively to all CA File records by means of a computer algorithm. With 38 specific roles and 7 broad super roles, they substantially expand the indexing terms that were in use before their introduction, and they give a more precise link to the substance. For example, it is now possible to specify not only that you want the preparation of the substance, but also that the preparation be a synthetic preparation, as opposed to industrial manufacture. In the past, the term "Preparation" made no such distinction. Nevertheless, it is still possible to search the CA File for all manner of preparations of a substance, or of a group of substances found in the Registry File, by appending "/P" to the answer set number from the Registry File (or, for a single substance, by appending "P" directly to the Registry Number in a CA File search), e.g.,

=> SEARCH L2/P (where L2 is an answer set from the Registry File)

or

=> SEARCH 494-12-2P (where 494-12-2 is the CAS Registry Number for Flavan)

Roles must be attached to an L# answer set formed in the Registry File if used in conjunction with that L# to search the CA File. An example of the use of the role code "SPN" (Synthetic Preparation) is:

=> FILE REGISTRY

=> S FULLERENE/CNS

L2 3287 FULLERENE/CNS

CNS is the chemical name segment field designator on STN.

=> FILE CA

=> S L2/SPN OR FULLERENES/SPN
          5347 L2
         35422 SPN/RL
           206 L2/SPN
               (L2 (L) SPN/RL)
          1759 FULLERENES/CT
         35422 SPN/RL
           108 FULLERENES/SPN
               (FULLERENES/CT (L) SPN/RL)
L3         248 L2/SPN OR FULLERENES/SPN

The roles can be viewed in an online thesaurus that shows the role hierarchies and definitions. To obtain a hierarchical list of roles and super roles, enter HELP ROLES at an arrow prompt in CA, HCA, CAplus, or HCAplus. Roles are currently used in the CA and CAplus files and in the CASREACT and MARPAT files.

VI. Searching the Registry File with a Chemical Name

The Registry File is the largest single source of chemical names in existence. It can be searched by a trade or common name for a substance (CN), by its CAS Index Name (CN) or by fragments of the CAS Index Name (CNS field). (See: Tips for Chemical Name Searching.) The Basic Index of the Registry File includes both chemical name fragments and molecular formula fragments. It may be necessary to follow certain protocols for special characters in order to search for a chemical name. Greek characters, for example, are spelled out in their entirety with a period before and after the Greek part of the name. Examples of chemical name searches in the Complete Chemical Name Index (/CN) or the Chemical Name Segment Index (/CNS) of the Registry File are:

=> SEARCH ISATIN/CN

=> SEARCH .ALPHA.-METHYLBENZOIN/CN

=> SEARCH ACETYLSALICYLIC ACID/CN

=> SEARCH IMINO/CNS

Since there is a fee to search terms in the Registry File, it is best to check the name by first expanding it in the relevant index. Often, the combination of a molecular formula search and a Chemical Name Segment search is an effective way to retrieve a substance when the molecular formula alone has many isomers.
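
The Greek-letter convention described in this section (the letter spelled out, with a period before and after) can be applied mechanically when a name is typed or pasted with actual Greek characters. The Python sketch below is one way to do it, under stated assumptions: the function name is hypothetical, only unaccented Greek letters are converted, and the rest of the name is uppercased simply to match the examples above, not because these notes require it.

```python
import unicodedata


def spell_out_greek(name: str) -> str:
    """Replace each Greek letter with its spelled-out name between periods."""
    prefixes = ("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")
    pieces = []
    for char in name:
        unicode_name = unicodedata.name(char, "")
        for prefix in prefixes:
            letter = unicode_name[len(prefix):]
            # Accept plain letters only, e.g. "ALPHA", not "ALPHA WITH TONOS".
            if unicode_name.startswith(prefix) and letter.isalpha():
                pieces.append("." + letter + ".")
                break
        else:
            # Not a plain Greek letter: keep the character, uppercased
            # (an assumption made to match the examples in this section).
            pieces.append(char.upper())
    return "".join(pieces)


print("=> SEARCH " + spell_out_greek("α-Methylbenzoin") + "/CN")
```

Other special characters may need their own handling; this sketch covers only the Greek-letter case mentioned here.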

VII. Searching the Registry File and Printed CA Indexes with a Molecular Formula

The system which is most commonly used today for arranging molecular formulas in indexes is the HILL SYSTEM. The Hill System covers both organic and inorganic compounds according to the following rules:

  1. Sum individually all like atoms within the molecule.

  2. If carbon is present, place it and the total number of C's first in the formula.

  3. If both carbon and hydrogen are present, place hydrogen and the total number of H's second. Note that if carbon is not present, rule 4 applies to the substance, and the H is placed in its regular position in the alphabet.

  4. All other atoms in the molecule are arranged alphabetically.
    That means that for inorganic substances without carbon, the arrangement is alphabetical.
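
The four rules can be sketched in Python as follows. The function name and the use of a dictionary of atom counts as input are assumptions made for illustration; the dictionary stands in for rule 1, with like atoms already summed.

```python
def hill_formula(counts: dict) -> str:
    """Build a Hill System formula from summed atom counts (rule 1)."""
    remaining = dict(counts)
    ordered = []
    if "C" in remaining:
        ordered.append(("C", remaining.pop("C")))        # rule 2: carbon first
        if "H" in remaining:
            ordered.append(("H", remaining.pop("H")))    # rule 3: then hydrogen
    ordered.extend(sorted(remaining.items()))             # rule 4: alphabetical
    # A count of 1 is left implicit, as in ordinary formula notation.
    return "".join(symbol + (str(n) if n > 1 else "") for symbol, n in ordered)


print(hill_formula({"H": 2, "S": 1, "O": 4}))            # sulfuric acid
print(hill_formula({"H": 1, "Br": 1}))                   # hydrobromic acid
print(hill_formula({"C": 8, "H": 5, "N": 1, "O": 2}))    # isatin
```

Note that hydrogen moves to second place only when carbon is present; without carbon it takes its ordinary alphabetical position.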

Within the index itself, the numbers of elements come into play. Here is an example of compounds arranged for a Hill System Index:

Al6 Ca5 O14
B2 O3
B2 Zr3
Br H
C Cl4
C H Cl3
C H N O
C2 Ca
C2 H4
C2 H4 Br Cl
C2 H5 Al Br2
C5 H8 O2
C8 H5 N O2
C15 H24 N2
C22 H24 F N3 O2
Ca O3 Ti
Cl H
H2 O4 S
H4 Sn
O3 Pb Rb2
O5 P14 Zn7
Sn Zr4

Note that in the Registry File, the formulas may be searched with or without spaces between the element symbols. They are put here for clarity. The Hill System gives rise to some formulas that are quite different from those a chemist is used to seeing, e.g., H2O4S for sulfuric acid or BrH for hydrobromic acid.
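
The order of the example list can be reproduced by comparing formulas element by element, first by symbol and then by the number of atoms. The Python sketch below is one such ordering, checked only against the list in this section; the function name is hypothetical, and the printed indexes may follow additional conventions not covered here.

```python
import re

# One element symbol (capital letter, optional lowercase letter) and its count.
ELEMENT_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def hill_sort_key(formula: str):
    """Turn 'C2 H4 Br Cl' or 'C2H4BrCl' into [('C', 2), ('H', 4), ('Br', 1), ('Cl', 1)]."""
    compact = formula.replace(" ", "")
    return [(symbol, int(count) if count else 1)
            for symbol, count in ELEMENT_TOKEN.findall(compact)]


formulas = ["H2 O4 S", "C2 H4", "Ca O3 Ti", "C H Cl3", "C22 H24 F N3 O2", "C Cl4", "Br H"]
for formula in sorted(formulas, key=hill_sort_key):
    print(formula)
```

Comparing counts as integers rather than as text is what places C5 and C8 ahead of C15 and C22.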

The printed CA Formula Indexes do not have detailed entries for the 600 or so qualified substances, that is, substances about which a great deal has been written. Thus, we find in the CA Formula Index from the 10th Collective Index period (1977-81):

C8H5NO2
1H-Indole-2,3-dione [91-56-5].

This tells us that the printed Chemical Substance Index must be used for detailed information on isatin itself, but the Formula Index entry also shows directly that three documents dealt with the sodium salt of isatin during the period. The entries in the Chemical Substance Index include the TEXT MODIFICATIONS that give more information about the documents that are indexed. The Formula Index gives the abstract numbers for the sodium salt of isatin because relatively few documents were written about that compound during the 10th Collective Index period.

Bear in mind that a chemical formula in the Hill System may have more than one substance with that formula. For a given formula, isomers are arranged alphabetically by the CAS Index Name.

In the online molecular formula index of the Registry File (/MF), salts, addition compounds, and mixtures have the molecular formulas for the components arranged separately, with ratios for salts and addition compounds specified when known. If the ratios are unknown, a lower case "x" before the second formula or subsequent formulas is used, e.g.,

C15 H24 N2.2Cl H

C22 H24 F N3 O2.xH2 O4 S

These are examples of the so-called DOT-DISCONNECTED FORMULAS. As noted above for chemical name searches, it is best to expand the molecular formula in the MF index before actually searching online. (See: Tips for Molecular Formula Searching.)
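
A dot-disconnected formula can be taken apart by splitting at the periods and reading any leading number or lower case "x" as the ratio. The Python sketch below handles only the two forms shown above; the function name is hypothetical, treating an unprefixed component as ratio 1 is an assumption, and other ratio notations are not covered.

```python
import re

# Optional ratio (digits or a lower case "x") followed by the component formula.
COMPONENT = re.compile(r"^(\d+|x)?([A-Z].*)$")


def split_dot_disconnected(formula: str):
    """Split 'C15 H24 N2.2Cl H' into a list of (ratio, component formula)."""
    components = []
    for part in formula.split("."):
        match = COMPONENT.match(part.strip())
        if match is None:
            raise ValueError(f"unrecognized component: {part!r}")
        ratio_text, component = match.groups()
        if ratio_text is None:
            ratio = 1          # assumption: no prefix means one unit
        elif ratio_text == "x":
            ratio = None       # ratio unknown
        else:
            ratio = int(ratio_text)
        components.append((ratio, component))
    return components


print(split_dot_disconnected("C15 H24 N2.2Cl H"))
print(split_dot_disconnected("C22 H24 F N3 O2.xH2 O4 S"))
```

The lower case "x" cannot be confused with an element symbol, since every element symbol begins with a capital letter.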

VIII. The CAOLD File

The CAOLD File contains records for documents indexed in Chemical Abstracts 1907-66. It is possible to search the CAOLD file with the CAS Registry Number. The records for items in the CAOLD file bear little resemblance to those in the CA file, providing merely a link to the printed Chemical Abstracts accession numbers or a mechanism with which to link to a graphics file of the page. It is important to know that the CAOLD file records were generated from the CA Formula Indexes. Since the qualified substances do not have Formula Index entries, there are many CA accession numbers in the period 1907-66 that do not have pointers from the CAOLD file. For that reason, it is always best to double-check the results of a CAOLD file search against the printed Collective Index.

IX. Other Means of Searching the Registry File

There are a number of other indexes that can be used in an online search of the Registry File, e.g.,

Compound Class Identifiers (/CI) Example: => SEARCH PMS/CI (retrieves polymers)

Periodic Group Codes (/PG) Example: => SEARCH LNTH/PG (retrieves lanthanide series molecular formulas)

Such searches are of use in combination with other Registry File searches in order to narrow an answer set. See the Registry File Summary Sheet for additional possibilities.

X. Ring Indexes

The Ring Systems Handbook provides an easy way to find the Heading Parent name for ring compounds. This name can then be used in the printed Chemical Substance Index or, for an online search, either the name or the Registry Number can be used to retrieve the Registry File record. Access to the entries in the Ring Systems Handbook is by name or by ring analysis (and then by the molecular formula of the rings making up the compound, ignoring hydrogens). The main part of the set is arranged by the number of rings in the compound and by the individual sizes of the smallest set of smallest rings. Thus, the number of component rings, the sizes of those rings, and the elements comprising them are enough information to find a ring compound. A section in the main body of the work might be labeled:



We would find in the section an entry for 1H-Indole [120-72-9]
                         H
               C         .
             :   .     . N .
           C:      .C.      . C
           .        :         :
           .        :         :
           C:       C.........C
             :    .            
               :C.              

with the molecular formula C8H7N and a 2-dimensional structural drawing of the molecule.

It would not be too difficult then to guess the proper Chemical Abstracts Index name for isatin: 1H-Indole-2,3-dione

[Structure drawing of isatin]

Chemical Abstracts includes an Index of Ring Systems with each Formula Index, beginning with the 7th Collective Index period (1962-66). The Registry File now has much information about rings that can be searched online, such as the Elemental Sequence for the Smallest Ring (/ESS), the number of rings in the ring system (/NRRS), etc. These search techniques can be valuable in refining a substance search in the Registry File.

XI. Other Online Chemical Dictionary Files

Databases such as the Registry File are referred to as ONLINE CHEMICAL DICTIONARY FILES. They exist to help you identify substances, to gather like substances into a set, and to discover which files on the database vendor's system have information on the substance(s).

Of particular interest are the online chemical dictionary files from the National Library of Medicine. Although not nearly as large as the Registry File, NLM's CHEMLINE file contained over 1,360,000 records as of mid-1995. The CAS Registry Number is part of each record. Searching is possible by CAS RNs, molecular formulas, CAS Index Names, synonyms, and various name and structural fragments. A smaller NLM file is ChemID, with nearly 350,000 compounds. ChemIDplus has over 68,000 structures that can be displayed but not searched as structures. An important feature of the ChemID file is SUPERLIST, a collection of lists of chemical substances maintained by key federal and state government regulatory agencies, as well as by scientific organizations concerned with health and environmental hazards of chemical substances. ChemID provides directory assistance to those lists. Searching the NLM files is considerably cheaper than searching the CAS Registry File. The ChemID file is currently free through the Internet Grateful Med connection.

XII. Other Sources

Here are links to printed and Internet sources that are relevant to this topic.

Search REGISTRY the world's fastest growing authority database on chemical substance information

Using The CAS Registry File on STN

Copyright
Gary Wiggins
7 October 1995