
STN Command-Language Searching: Basic Concepts


C472 Lecture Notes
Updated: 23 January 2001

I. Logging On and Logging Off

One way to gain access to the STN International system is via telephone lines and a modem. Another way to access the STN system is via the Telnet program on the Internet, using the address STNC.CAS.ORG or STN.FIZ-KARLSRUHE.DE. If using STN Express with Discover! or STN on the Web, the connection to STN is performed by the software.

All commercial systems that charge for online searching of their databases require a loginid and a password. For the STN Academic Program via the Internet, the logon sequence via modem or Telnet would be as below. User input is indicated in bold. "(CR)" means hit the "Enter" key.

Logging onto STN's CAS ONLINE Academic Program via Telnet:

telnet stnc.cas.org (CR)
(CR)
Welcome to STN International! Enter x: i (CR) (1)
LOGINID: dummyid (CR)
PASSWORD: ######### (Enter the password and a CR) (2)
TERMINAL (ENTER 1, 2, 3, OR ?): 3 (CR) (3)
* * * * * * * * * Welcome to STN International * * * * * * * * * *

[News messages appear here.]

=> file lreg (CR)

[Searching occurs here.]

=> log y (CR)


Comments:

(1) The "i" indicates that we are entering with a restricted access Academic Program account, accessible after 5:00 PM on weekdays and certain weekend hours. Users with full access enter "x" at this point.

(2) The LOGINID will appear on the screen, but the password (we hope!) is masked by the #########.

(3) Terminal choices are:

Once in the STN system, the prompt is: =>

II. Basic STN Commands

The STN Messenger search software assumes that you are a novice searcher if you spell out the entire command words. Some commands have single letter equivalents which, if used, signal Messenger that you do not want to be prompted for any information the system needs to complete your search. In this case, it will DEFAULT to system-defined parameters--what the computer assumes you want to do in the absence of explicit information to the contrary.

The five basic STN commands, with single letter equivalents in parentheses where appropriate, are:

"Basic STN Commands" gives fuller information.

III. Proximity Operators.

There are more specific variants of the AND command that can be used to define the spatial relationships of search terms. These are called POSITIONAL or PROXIMITY OPERATORS. On STN, they are:

STN assumes that multi-word phrases are to be searched using the (W) operator in the absence of explicit positional or other Boolean operators.

IV. Truncation (Masking) of Characters to Expand a Search

TRUNCATION is the search technique that allows the searching of more than one form of a word with a single command. See:

for examples of truncation in STN.

Truncation can occur at the left end or the right end of a word stem or within the word. STN now allows all three types of truncation in the CA File Basic Index. The limit of terms that can be gathered in a set by truncation is 30,000 stems. For left truncation the search term must have at least four characters.

II. Truncation (Masking): The Use of Wild Cards

In many cases where subject searches are concerned, we are looking for topics that involve words built on a common root word, or that have some other variations that are easily signaled to a computer by means of a special symbol. TRUNCATION is the technique that tells the computer to form an answer set consisting of all records that contain words with the characters input for the search, but could also contain related words with suffixes (or, in some cases, prefixes or variable characters at a given point in the word).

On the STN system, truncation symbols are:

Symbol Function Example
exclamation point (!) Exactly one character cataly!e
hash mark (#) One or no character alcohol#
question mark (?) Any number of characters ?therap?

As noted in the table, the # sign can be used at the end of a word to pick up both singular and plural forms of a word. Another way of accomplishing the same thing on STN using the command language option is to enter SET PLURALS ON at the system prompt. Both left- and right-hand truncations are allowed with the "?".

There are limits to the number of terms that can be gathered into a set using truncation. Therefore, caution must be exercised in using truncation to prevent too many search terms (or unexpected words) from entering the answer set.
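To see which word forms a truncated term would admit before running it, the three symbols in the table can be modeled as regular expressions. This is only a rough sketch in Python, not how Messenger itself works: the function names are invented for illustration, "?" is assumed to mean zero or more characters, and the regex wildcard accepts any character, whereas STN may be stricter.

```python
import re

# STN truncation symbols as listed in the table above:
#   !  exactly one character
#   #  one or no character
#   ?  any number of characters (assumed here to include none)
_SYMBOL_TO_REGEX = {"!": ".", "#": ".?", "?": ".*"}


def truncation_to_regex(term):
    """Translate an STN-style truncated term into a compiled regex (illustrative)."""
    parts = [_SYMBOL_TO_REGEX.get(ch, re.escape(ch)) for ch in term]
    return re.compile("".join(parts), re.IGNORECASE)


def matches(term, word):
    """Return True if the whole word fits the truncated term."""
    return truncation_to_regex(term).fullmatch(word) is not None
```

Checking a few candidate words this way (for example, whether alcohol# would admit "alcoholic") is a cheap offline test of how broad a truncated term is likely to be.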

III. Expanding (Neighboring)

We have already seen the expand technique profitably used in author searching. It is also a very useful option in subject searching, especially since it allows us to determine whether the search term we are considering is actually used in the system. In addition, keyboarding errors that have gone undetected may be revealed in an expand list. For example, in the STN CA file, the following list appeared when "organomagnesium" was expanded in the Basic Index at the time of the search:

 Set #  # of Answers  Variant Spelling

 E1    1  ORGANOMAGNESIATE/BI
 E2    1  ORGANOMAGNESIATES/BI
 E3  823  ORGANOMAGNESIUM/BI
 E4    1  ORGANOMAGNESIUMALUMINUM/BI
 E5    2  ORGANOMAGNESIUMOXANE/BI
 E6   59  ORGANOMAGNESIUMS/BI
 E7    1  ORGANOMAGNETIC/BI
 E8    1  ORGANOMAGNETISM/BI
 E9    1  ORGANOMAGNSIUM/BI
 E10   1  ORGANOMANGANATE/BI
 E11   1  ORGANOMANGANATES/BI
 E12  74  ORGANOMANGANESE/BI

Note that E9, the one document in the file with the misspelled term "organomagnsium," would probably be missed in a subject search if not spotted in the expand list, so the search statement to pull all of the variants into one set in the CA file would be:

For the online CA File on STN, the preferred terms are searched with the field labels "CT" for phrases or "CW" for words. Thus, a search for parasympathomimetics would find in the Index Guide that the preferred phrase to search is Cholinergic agonists. The online CA file search using command language would then be:

=> S CHOLINERGIC AGONISTS/CT

In an online search, it is important to include the CAS standard abbreviations and acronyms since the abbreviations are used in preference to the full terms in the online records, hence, in the Basic Index of the CA File.

One can always issue the DISPLAY IND command to see how a particularly relevant document has been indexed and then input relevant indexing terms to broaden or narrow a search. Look especially for abbreviations such as DETN or DEGRDN. These are used in preference to the full terms such as determination, degradation, etc. in indexing CA. See CAS Standard Abbreviations and Acronyms. On the STN system, it is now possible to use the command SET ABBREVIATION ON to automatically check if there are CAS abbreviations used for the search terms you input. If so, the system automatically searches those forms. If SET PLURAL ON is also in use, the plural forms of the abbreviations will also be found. Users of SciFinder and SciFinder Scholar do not need to worry about such subtleties because the search algorithm automatically makes allowances for such variants.

Look at a sample record from the CA Student Edition on OCLC, paying particular attention to the index terms and the use of abbreviations.

V. CAS Roles in the CA and other Files

ROLES are CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of compounds. Roles were first applied to new online CA File records with v. 121 (July 1994). They were then applied retrospectively to all CA File records by means of a computer algorithm. Originally there were 38 specific roles and 7 broad super roles. They substantially expand the indexing terms that were used prior to their introduction. The role terms give a more precise link to the substance. For example, it is now possible to specify not only that you want the preparation of the substance, but also that the preparation be a synthetic preparation, as opposed to industrial manufacture. In the past, there was no distinction made in the use of the term "Preparation" in such cases. Nevertheless, it is still possible to search in the CA File for all manner of preparations of a substance or a group of substances found in the Registry File by appending a "/P" to the answer set number from the Registry File (or for a single substance, by appending a "P" directly to the Registry Number in a CA File search), e.g.,

=> SEARCH L2/P (where L2 is an answer set from the Registry File)

or

=> SEARCH 494-12-2P (where 494-12-2 is the CAS Registry Number for Flavan)

Roles must be attached to an L# answer set formed in the Registry File if used in conjunction with that L# to search the CA File. An example of the use of the role code "SPN" (Synthetic Preparation) is:

=> FILE REGISTRY

=> S FULLERENE/CNS

L2 3287 FULLERENE/CNS

CNS is the chemical name segment field designator on STN.

=> FILE CA

=> S L2/SPN OR FULLERENES/SPN
         5347 L2
        35422 SPN/RL
          206 L2/SPN (L2 (L) SPN/RL)
         1759 FULLERENES/CT
        35422 SPN/RL
          108 FULLERENES/SPN (FULLERENES/CT (L) SPN/RL)
L3        248 L2/SPN OR FULLERENES/SPN

The Roles can be viewed in an online thesaurus to see the role hierarchies and definitions. They are currently used in the CA and CAplus files and in the CASREACT and MARPAT files.

To ensure that the CAS Role Indicators are in agreement with the current focus and direction of chemistry, the following key changes to new and modified Role Indicators were made in late 2001. New Roles have been added:

Ambiguous Biological Roles have been discontinued. Most will fall into the Biological Study, Unclassified Role. Other Roles have been divided to allow for more precision, e.g., Reactant Roles and Reagent Roles. The Nonbiological Use, Unclassified Role definition has been clarified as Other Use, Unclassified Role.

VI. Searching the Registry File with a Chemical Name

The Registry File is the largest single source of chemical names in existence. It can be searched by a trade or common name for a substance (CN), by its CAS Index Name (CN) or by fragments of the CAS Index Name (CNS field). (See: Tips for Chemical Name Searching.) The Basic Index of the Registry File includes both chemical name fragments and molecular formula fragments. It may be necessary to follow certain protocols for special characters in order to search for a chemical name. Greek characters, for example, are spelled out in their entirety with a period before and after the Greek part of the name. Examples of chemical name searches in the Complete Chemical Name Index (/CN) or the Chemical Name Segment Index (/CNS) of the Registry File are:

=> SEARCH ISATIN/CN

=> SEARCH .ALPHA.-METHYLBENZOIN/CN

=> SEARCH ACETYLSALICYLIC ACID/CN

=> SEARCH IMINO/CNS

Since there is a fee to search terms in the Registry File, it is best to check the name by first expanding it in the relevant index. Often, the combination of a molecular formula search and a Chemical Name Segment search is an effective way to retrieve a substance when the molecular formula alone has many isomers.

An example of such a chemical name search in SciFinder Scholar is below. Note that in the SciFinder Scholar system, the search will work with or without the periods around the "alpha," but in STN command-language searching, the dots are mandatory.

alpha-Methylbenzoin Name Search

V. Fields in Records

The STN database summary sheets have examples of the RECORDS in the corresponding databases. Limiting a search to a specific part of the record (a FIELD) is done on STN using a two-letter code preceded by a forward slash. When the search term or phrase is entered, the field code is appended as below:

=> S PARMENTER C?/AU (CR)

=> S ISATIN/CN (CR)

See "Ways to Narrow Your Answer Set in the CA File" for CA File examples of the use of the language field, the document type field, and the publication year field.

VI. The Concept of the Basic Index

What if no field code is used in the search statement? Messenger assumes by default that you want the search to run in the BASIC INDEX. The fields included in the Basic Index vary from database to database.
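The default can be pictured as a small parsing rule: if a search term carries a "/XX" suffix, that field is searched; otherwise the Basic Index (/BI) is used. The Python sketch below is only a model of that rule. The function name is hypothetical, and the test for what counts as a field code (letters, optionally with a dot as in /FG.RCT) is an assumption that would misread a term whose own text ends in a slash followed by letters.

```python
def split_field(statement, default_field="BI"):
    """Split 'TERM/FIELD' into (term, field); fall back to the Basic Index.

    Illustrative only: a trailing '/letters' suffix is assumed to be a field code.
    """
    text = statement.strip()
    term, sep, field = text.rpartition("/")
    if sep and term and field.replace(".", "").isalpha():
        return term, field.upper()
    # No recognizable field code: model Messenger's default to the Basic Index.
    return text, default_field
```

For instance, split_field("ISATIN/CN") keeps the /CN field, while a bare term such as ORGANOMAGNESIUM is assigned to BI.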

For the CA File, the Basic Index includes:

For the Registry File, the Basic Index includes:

VII. Section Codes for Online Searches

Since the information in Chemical Abstracts is classified into 80 major subject sections, the section numbers and codes can actually be used on STN with the CA Classification "CC" field in subject searches to assist in limiting a search. For example, works dealing primarily with enzymes are found in section 7 of the weekly Chemical Abstracts. Other documents are assigned to one of the 80 subject categories divided into the following gross categories:

Section Name Section Code Section Numbers
Biochemistry BIO/CC 1-20
Organic Chemistry ORG/CC 21-34
Macromolecular Chemistry MAC/CC 35-46
Applied Chemistry & Chemical Engineering APP/CC 47-64
Physical, Inorganic, & Analytical Chemistry PIA/CC 65-80

Thus, a strategy that included in an online search on STN:

or

would have the effect of limiting the retrieved documents in answer set L4 to those dealing with enzymes (found in section 7 of the printed CA) or, more broadly, those of a biochemical nature found anywhere in sections 1-20 of the printed product.
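The table of section groupings lends itself to a simple lookup. The following Python sketch, with invented names, returns the broad section code for a given CA section number. It encodes only the ranges shown in the table and says nothing about how STN itself resolves the CC field.

```python
# Section ranges and codes copied from the table of broad CA section groupings.
SECTION_GROUPS = [
    (range(1, 21), "BIO"),   # Biochemistry, sections 1-20
    (range(21, 35), "ORG"),  # Organic Chemistry, sections 21-34
    (range(35, 47), "MAC"),  # Macromolecular Chemistry, sections 35-46
    (range(47, 65), "APP"),  # Applied Chemistry & Chemical Engineering, 47-64
    (range(65, 81), "PIA"),  # Physical, Inorganic, & Analytical Chemistry, 65-80
]


def section_group_code(section_number):
    """Return the broad section code (e.g. 'BIO/CC') for a CA section number."""
    for numbers, code in SECTION_GROUPS:
        if section_number in numbers:
            return f"{code}/CC"
    raise ValueError(f"CA section numbers run from 1 to 80, got {section_number}")
```

Section 7 (enzymes) falls in the range 1-20, so the lookup gives the Biochemistry code.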

IV. How to Create a Structure in the Registry File.

The "old-fashioned" way of building structures on the STN system is to use alphanumeric commands to gradually create the molecule. There are front-end programs such as STN Express or Molkick that can be used to draw a graphic depiction of the molecule offline and upload it to STN once the connection is made. Of course, SciFinder and STN on the Web also have a structure searching option. Nevertheless, it is instructive to see the original commands used to draw the molecule and the options for assigning parameters to the structure. When building the structure online via commands, it is advisable for cost reasons to build it in the cheap LREG file. Once the structure is complete and an L# is assigned to the structure query, you can transfer to the more expensive Registry File to run the search.

These are the basic steps that must be followed to create the structure online on STN using command language:

  1. Initiate the structure creation sub-program on the STN system by giving the STRUCTURE command at the STN LREG file prompt "=>".
  2. Build the outline of the structure using the GRAph command.
  3. Specify the non-carbon atoms with the NODe command.
  4. Specify the types of bonds in the molecule with the BONd command.
  5. Specify additional requirements for the molecule, such as:
  6. Do a final display of the molecule you have built with the DIS SIA (Display the Structure Image and Attributes) command.
  7. Terminate structure building with the END command.

At this point, an L# is assigned to the structure query you have created. Once the Registry File is entered, the structure search is initiated with the SEARCH L# command. An example of the structure building process using commands on STN and a Type 3 (alphanumeric) terminal setting is seen here.

V. The GRA Command and the Use of Pre-Drawn Structures

The Graph command builds the basic outline of the molecule. This can be a cumbersome process for larger molecules. Hence, there are alternatives. One way is to start with the Registry Number of a known substance that is similar to the compound of interest. Once the STRUCTURE command is given, you are prompted to:

ENTER NAME OF STRUCTURE TO BE RECALLED (NONE):

At this point, you could enter a Registry Number or, if you have built another structure in this session, the L# for that query structure. Another alternative is to enter a code for the pre-drawn systems used in creating structures. Rings of size 5 to 12 ring atoms can be created simply by inputting the appropriate number at the prompt. Other pre-drawn options include STEROD (steroids) and ADAMAN (adamantanes).

If starting from scratch, the two basic options for the GRA command are to draw a chain (c) or ring (r) followed by a number indicating the size of the chain or ring. Thus, GRA c3 builds a chain of 3 atoms, and GRA r6 builds a 6-membered ring. The structures appear on the screen with carbon atoms as the default nodes, and unspecified bonds. All nodes are numbered, so further commands to the system utilize the node numbers for appropriate actions.

One potentially confusing use of the GRA command occurs when two nodes are to be connected. Intuitively, this would seem to involve the BON command because we want to form a bond between the two atom nodes. However, BON is used only to modify an unspecified bond created with the GRA command. Thus, if we wanted to create a 14-membered ring, one way to do it would be to GRA c14, then GRA 1-14. That puts the necessary link between the two end nodes (although some other moving of the atoms would be necessary to make it appear reasonable on the screen).
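Why GRA, rather than BON, closes the ring becomes clearer if the structure is thought of as a graph: GRA adds nodes and links, while BON only changes the type of a link that already exists. The Python sketch below is a toy model of that idea, not an emulation of STN. The function names are invented, and the node numbering simply follows the 1 to n convention described above.

```python
def gra_chain(n):
    """Model 'GRA c<n>': nodes 1..n joined in a line; returns an adjacency dict."""
    adj = {i: set() for i in range(1, n + 1)}
    for i in range(1, n):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    return adj


def gra_connect(adj, a, b):
    """Model 'GRA a-b': add a link between two existing nodes."""
    adj[a].add(b)
    adj[b].add(a)
    return adj


def is_single_ring(adj):
    """True if every node has exactly two neighbours and all nodes are connected."""
    if not adj or any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    seen, stack = set(), [next(iter(adj))]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj[node] - seen)
    return len(seen) == len(adj)
```

In this model, gra_chain(14) is an open chain whose end nodes have one neighbour each; after gra_connect(chain, 1, 14) every node has two neighbours and the structure is a single 14-membered ring.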

VI. Use of the NOD Command

The NOD command takes the form: NOD # symbol where the # refers to the number of the node in the molecule and the symbol is defined either by regular symbols for the elements or by special node symbols understood by the STN system. The latter include such things as "X" to represent any halogen, "M" to represent a metal, or "Gk" (where k represents a number from 1 to 20) to indicate a node which can vary according to your definition of the possible symbols (done with the VARiable command). There are also a number of SHORTCUT SYMBOLS for groups such as methyl "ME" or tert-butyl "T-BU".

There are four GENERIC GROUP SYMBOLS:

By issuing the GGC (Generic Group Category) command, these symbols can be further limited by type, for example, linear "LIN" or low carbon (6 or fewer carbons) "LOC".

Finally, it may be necessary to define a node as potentially being in either a ring or a chain. This is done with the command NOD # rc. Since the system assumes by default that the node is only to exist in the environment drawn, it is necessary to override the default with the rc specification when it is ok for an end node to be in either a ring or a chain in a substructure answer set.

VII. Use of the BON Command

The bond codes used in the Registry File structure building process are letter codes to specify bond types such as "se" (single exact) or "d" (double) or "n" (normalized). A NORMALIZED BOND is an aromatic bond or one found in a tautomer or combinations of rings and tautomers. If a ring has an even number of atoms and contains alternating single and double bonds all the way around the ring system, the bonds in the ring are designated as normalized. For fused rings, only the outside path is considered.
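The ring rule just stated (an even number of ring atoms with single and double bonds alternating all the way around) can be written as a short check. The Python sketch below covers only that simple single-ring case. It does not attempt the tautomer rule or the outside-path treatment of fused rings, and the bond letters "s" and "d" are a simplification chosen for the example rather than the full set of STN bond codes.

```python
def ring_is_normalized(bonds):
    """Apply the simple-ring rule to bonds listed in order around one ring.

    bonds: sequence of 's' (single) or 'd' (double), one entry per ring bond,
    so len(bonds) equals the number of ring atoms.
    """
    n = len(bonds)
    if n < 3 or n % 2 != 0:
        return False  # not a ring, or an odd-membered ring
    if any(b not in ("s", "d") for b in bonds):
        return False  # only plain single/double bonds are modeled here
    # Each bond must differ from the next one, wrapping around the ring.
    return all(bonds[i] != bonds[(i + 1) % n] for i in range(n))
```

A six-membered ring drawn as "sdsdsd" passes the check, while a five-membered ring or a ring with two adjacent single bonds does not.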

For a tautomer, the following environment must exist:

where:

It is also possible to specify that a bond is only a ring bond or only a chain bond by defining it as BON rs or BON cd, for example. By default the system will assume that the bond is only to be part of a compound that has the environment in which it is drawn.

CASREACT: In addition, common functional groups in the reactants, reagents, and products are searchable with a name labeled as /FG. For example,

=> S PRIMARY AMINE/FG

or

=> S TRIHALIDE/FG.RCT

ROLES: used to describe the information that deals with the substances indexed in a document. One of the super roles is PREP (Preparation), which has more specific roles:

The PREP super role is equivalent to the STN CA/CAplus File search => S L#/P. The same results would be found with the strategy: => S L#/PREP

Roles must be appended to a Registry File answer set if used with an L#. However, they can be applied both to L#'s which may contain one or more substances and to individual CAS Registry Numbers or General Subject Index terms for classes of substances. For example, => S 91-56-5/SPN would find a laboratory-scale preparation of isatin.

It is also possible to label a substance with the role RCT (Reactant) in order to limit the answer set to references where a particular reactant is used, as in:

=> S L# AND 91-56-5/RCT (Note that the L# in this case may not be from the Registry File.)

In the CA File on STN, a convenient way to find all kinds of ways of preparing a substance is to search the CAS Registry Number, either directly or by crossing over from the Registry File. A "P" appended to the search strategy results in the search being limited to the items of interest. For example:

=> S 91-56-5P

or

=> S L#/P

In the second case, the "L#" (answer set) would have resulted from a search in the Registry File that found one or more compounds. It could represent a group of related substances from a substructure search. The "/" is required before the "P" in the CA/CAplus file search when using such an L#.
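Because a mistyped Registry Number wastes a search, it can help to check the number and assemble the search term offline. The Python sketch below builds the two forms shown above ("P" appended directly to a Registry Number, "/P" appended to an L#). The function names are hypothetical. The check-digit test is not covered in these notes: it relies on the standard CAS rule that the final digit equals the weighted sum of the preceding digits (rightmost digit times 1, the next times 2, and so on) modulo 10.

```python
import re

# A CAS Registry Number has 2-7 digits, a hyphen, 2 digits, a hyphen, 1 check digit.
_CAS_RN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas_rn(rn):
    """Check the format and check digit of a CAS Registry Number string."""
    m = _CAS_RN.fullmatch(rn)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def preparation_term(target):
    """Return a CA File search term limited to preparations.

    An answer set such as 'L2' becomes 'L2/P'; a Registry Number such as
    '91-56-5' becomes '91-56-5P'.
    """
    if re.fullmatch(r"L\d+", target, re.IGNORECASE):
        return f"{target.upper()}/P"
    if is_valid_cas_rn(target):
        return f"{target}P"
    raise ValueError(f"not an L# or a valid CAS Registry Number: {target!r}")
```

Both Registry Numbers used in these notes, 91-56-5 (isatin) and 494-12-2 (flavan), pass the check-digit test.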

MISCELLANEOUS COMMANDS:

SET REG OFF

This command allows you to suppress the automatic REGISTRY search and crossover initiated by REG1stRY when a CAS Registry Number is entered in the CA/CAplus family of files. REG1stRY is, by default, ON. Simply enter SET REG1stRY OFF at any arrow prompt to suppress REG1stRY. When SET REG OFF is used, the CAS Registry Number is searched in the BASIC INDEX (/BI). When you search terms in other REG1stRY fields, e.g. chemical name (/CN) or molecular formula (/MF), SET REG OFF does not affect the automatic REGISTRY search and crossover. See HELP SET REG for more details.