Almost all abstracting and indexing services, not to mention many other secondary and primary works, have subject indexes. In this chapter we will look closely at the subject indexes for some of the major works already covered, as well as note the existence of specialized abstracting and indexing services devoted to a particular document type and full-text databases of primary and other literature types. Discussion of the type of subject search that uses the name of a specific chemical compound will be deferred to a later topic, although words that stand for classes of compounds will be discussed here.
The searches dealt with here are word searches. We must often find the right word(s) or group of words (phrases) to pull needed information from a given reference tool. Such searches cover techniques, processes, types of reactions, equipment, etc. The searcher has to be aware of variant spellings, the use of initialisms and acronyms, synonyms, and other complicating factors in such a subject search. In addition, the interpretation that the search system gives to the form in which the search statement is input is critical. For example, does the search system interpret two adjacent words as a phrase that must always have the words in that order? Or does it assume that either of those words could be present in a record in order to be a valid hit?
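The two interpretations of adjacent words described above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names and the three sample records are invented for this example, and no real search system is claimed to work exactly this way.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_phrase(query, text):
    """True if the query words occur adjacent and in the same order in the text."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

def matches_any(query, text):
    """True if at least one query word occurs anywhere in the text (an OR search)."""
    return bool(set(tokenize(query)) & set(tokenize(text)))

# Hypothetical records, invented for illustration only.
records = [
    "Electron spectroscopy of thin oxide films",
    "Spectroscopy of the electron gas",
    "Mass spectrometry of peptides",
]

phrase_hits = [r for r in records if matches_phrase("electron spectroscopy", r)]
any_hits = [r for r in records if matches_any("electron spectroscopy", r)]

print(len(phrase_hits), "hit(s) as a phrase;", len(any_hits), "hit(s) as an OR search")
```

The same two-word input retrieves a different answer set under each interpretation, which is why it pays to check how a given system reads the search statement.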
A fundamental question in conducting a subject search is whether all possible words, including synonyms, acronyms, abbreviations, etc., should be used or whether the search can be conducted using a set of preferred terms selected by the indexers of the documents. As computers have become more and more powerful, the techniques of FULL-TEXT SEARCHING have become popular, with every word in a document being a potential subject term. Unfortunately, the number of false drops yielded in this type of UNCONTROLLED VOCABULARY search can be quite large. Therefore, searching with terms selected from the CONTROLLED VOCABULARY of a THESAURUS or other subject term authority list is often preferable. An example is MeSH, the Medical Subject Headings list that is used with the National Library of Medicine's Medline database. Another NLM effort that is even broader in its scope is the development of a Unified Medical Language System. Included in that project is the UMLS Metathesaurus. Chemical Abstracts Service uses the Index Guide to control search terms in the printed product, and the CA Lexicon on the STN system shows the underlying structure of the CAS vocabulary control system.
The distinction between uncontrolled (keyword) searching and searching using controlled vocabulary is important and is the main point of this lesson. That distinction is blurred in a tool like SciFinder Scholar. Most keyword searches, such as those in Science Citation Index, impose on the searcher the burden of selecting alternate names, acronyms, etc. for the concept of interest when performing the subject search. For example, Electron Spectroscopy for Chemical Analysis (ESCA) and X-Ray Photoelectron Spectroscopy (XPS) are both names for the same technique. Therefore, a search for all references to the technique in a keyword subject index would force the searcher to use both ESCA and XPS in the search strategy.
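The burden that keyword searching places on the searcher can be pictured with a short sketch. It assumes a hypothetical synonym group that the searcher supplies by hand (here ESCA and XPS, as in the text) and three invented records; it is not a model of any particular database.

```python
import re

# Hypothetical synonym groups supplied by the searcher, not by the database.
SYNONYM_GROUPS = [
    {"esca", "xps"},
]

def expand(term):
    """Return the term together with any synonyms the searcher has listed."""
    term = term.lower()
    for group in SYNONYM_GROUPS:
        if term in group:
            return set(group)
    return {term}

def keyword_search(term, records, use_synonyms):
    """Return records containing the term, or any listed synonym if requested."""
    wanted = expand(term) if use_synonyms else {term.lower()}
    hits = []
    for rec in records:
        words = set(re.findall(r"[a-z0-9]+", rec.lower()))
        if words & wanted:
            hits.append(rec)
    return hits

# Invented sample records for illustration.
records = [
    "ESCA study of oxidized nickel surfaces",
    "XPS characterization of polymer films",
    "Infrared spectra of adsorbed carbon monoxide",
]

narrow = keyword_search("ESCA", records, use_synonyms=False)
broad = keyword_search("ESCA", records, use_synonyms=True)
print(len(narrow), "record(s) for ESCA alone;", len(broad), "for ESCA OR XPS")
```

Searching on one acronym alone misses the record that uses the other, so the synonym list has to come from the searcher's own knowledge of the field.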
The relationship of the tool for controlling the vocabulary in Chemical Abstracts (CA Index Guide) to the printed subject indexes is presented below. Truncation symbols in use on various systems are also covered in this lesson, as are the capability to limit online searches in various ways (by date of publication, format, etc.) and to analyze results.
In many cases where subject searches are concerned, we are looking for topics that involve words built on a common root word, or that have some other variations that are easily signaled to a computer by means of a special symbol. TRUNCATION is the technique that tells the computer to form an answer set consisting of all records that contain words with the characters input for the search, but could also contain related words with suffixes (or, in some cases, prefixes or variable characters at a given point in the word).
On the STN system, truncation symbols are:
| Symbol | Function | Example |
|---|---|---|
| exclamation point (!) | Exactly one character | cataly!e |
| hash mark (#) | One or no character | alcohol# |
| question mark (?) | Any number of characters | ?therap? |
As noted in the table, the # sign can be used at the end of a word to pick up both singular and plural forms of a word. Another way of accomplishing the same thing on STN using the command language option is to enter SET PLURALS ON at the system prompt. Both left- and right-hand truncations are allowed with the "?".
There are limits to the number of terms that can be gathered into a set using truncation. Therefore, caution must be exercised in using truncation to prevent too many search terms (or unexpected words) from entering the answer set.
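One way to see what the three STN symbols in the table do is to translate them into regular expressions. The sketch below is an approximation under stated assumptions: the translation table, the function names, and the word list are invented here, "any number of characters" is taken to include zero, and actual STN behavior may differ in detail.

```python
import re

# Assumed translation of the truncation symbols in the table into regex pieces.
SYMBOL_TO_REGEX = {
    "!": r"\w",   # exactly one character
    "#": r"\w?",  # one or no character
    "?": r"\w*",  # any number of characters (assumed to include zero)
}

def stn_pattern_to_regex(pattern):
    """Build a compiled regex from a search term containing truncation symbols."""
    parts = [SYMBOL_TO_REGEX.get(ch, re.escape(ch)) for ch in pattern.lower()]
    return re.compile("".join(parts))

def matching_words(pattern, words):
    """Return the words that the truncated term would match in full."""
    rx = stn_pattern_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w.lower())]

# Invented word list standing in for the terms found in a database index.
words = [
    "catalyze", "catalyse", "catalyst",
    "alcohol", "alcohols", "alcoholic",
    "therapy", "chemotherapy", "chemotherapeutic",
]

for term in ("cataly!e", "alcohol#", "?therap?"):
    print(term, "->", matching_words(term, words))
```

Running the loop shows how quickly an open-ended symbol such as "?" widens the set of matched words compared with "!" or "#", which is the reason for the caution about unexpected terms.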
There is no uniformity of symbols used to designate truncation among different vendors or search engines, although often we find an asterisk (*) used to indicate the right-hand truncation point. That is the case with the Web of Science, for example.
With SciFinder Scholar, no truncation is used. The searcher simply types into the Research Topic search window the natural language expression that defines the search, without even trying to insert Boolean operators. The SciFinder Scholar search algorithm has some built-in intelligence to look for relevant word forms for the search. For instance, the search system automatically searches for both singular and plural subject words.
Let's see an example of a search on SciFinder Scholar for the analytical technique "Electron Spectroscopy for Chemical Analysis (ESCA)."
At the time it was run, the search as entered found 3474 references where the two concepts "electron spectroscopy" and "chemical analysis" were closely associated with each other and only 517 where the phrase as entered was found. In this case, let's repeat the search using the acronym for the analytical technique (ESCA) and also use a synonymous acronym, XPS. (The technique is also known as X-Ray Photoelectron Spectroscopy.) We have the option of entering synonymous words in parentheses, following a term or phrase. Thus, entering the research topic search on SciFinder Scholar as:
would imply to the system that you are looking for synonymous terms (an OR search). This search found considerably more documents: 81,559 at the time of the search.
Let us restrict the phrase KEYWORD SEARCH to the type of uncontrolled vocabulary searching that is done when the terms are not selected from an authoritative subject list. Keyword indexes are often computer-produced indexes that result in every significant word in the document (or in certain fields of the document) becoming a KEYWORD. Such indexes exist in the weekly printed issues of Chemical Abstracts and in the Science Citation Index in its Permuterm Subject Index. The same is true of the Web of Science subject searches and searches on CARL UnCover. However, Science Citation Index has for a number of years included the capability to enhance keyword searches using its KeyWords Plus feature.
ISI utilizes words that authors sometimes provide in their articles that they feel best represent the content of the paper. These keywords are contained in the SCI record and are searchable. In addition, ISI generates KeyWords Plus for many articles. KeyWords Plus are words or phrases that frequently appear in the titles of an article's references, but do not necessarily appear in the title of the article itself. KeyWords Plus may be present for articles that have no author keywords, or may include important terms not listed among the title, abstract, or author keywords.
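The idea behind KeyWords Plus, terms that recur in the titles of an article's references, can be illustrated with a simple word count. This is only a rough sketch and not ISI's actual procedure: the stopword list, the threshold `min_count`, and the sample titles are all assumptions made for the example, and it handles single words rather than phrases.

```python
import re
from collections import Counter

# Small assumed stopword list; a real system would use a much longer one.
STOPWORDS = {"of", "the", "and", "in", "for", "a", "an", "on", "by", "with"}

def title_words(title):
    """Return the significant lowercase words of a title."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]

def frequent_reference_words(reference_titles, min_count=2):
    """Words that appear in the titles of at least min_count references."""
    counts = Counter()
    for ref in reference_titles:
        counts.update(set(title_words(ref)))  # count each word once per reference
    return sorted(w for w, n in counts.items() if n >= min_count)

# Invented article and reference titles for illustration.
article_title = "Surface oxidation of nickel"
reference_titles = [
    "Photoelectron spectroscopy of nickel oxide",
    "Photoelectron studies of nickel surfaces",
    "Oxide growth on metal films",
]

frequent = frequent_reference_words(reference_titles)
new_terms = [w for w in frequent if w not in title_words(article_title)]
print("frequent in reference titles:", frequent)
print("not in the article's own title:", new_terms)
```

The second list shows how such derived terms can add access points that the article's own title does not provide.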
One of the virtues of a keyword subject index is that the index terms reflect the current, ever-changing vocabulary of science. As soon as a new name for a concept, technique, etc., is used in a document, it could become an indexing term. Controlled vocabulary lists, on the other hand, are slower to adapt to changes in scientific terminology, but their greatest benefit is that they guide you to the preferred term for the concept. Hence, the searcher need only identify the preferred indexing term to find documents of interest.
The printed tool that controls the vocabulary in the Chemical Abstracts six-month volume and five-year collective indexes is the INDEX GUIDE. For example, looking in the "E" section of the Index Guide for ESCA reveals the following:
Thus, the searcher would know that documents on this topic can be found in the "P" section of the General Subject Index to Chemical Abstracts. It is important to use the CA Index Guide before using the General Subject Index because there are no "see" references in the General Subject Index itself. Furthermore, each five-year collective index period has its own Index Guide. CAS has provided a guide to Hierarchies of General Subject Headings to assist in selecting terms.
In regular database searches, the database vendors will usually define a default subject index in which words are searched. This is known as the BASIC INDEX. In the CA File, the Basic Index contains subject words from the titles, keywords, abstracts, and controlled vocabulary of the documents indexed by CAS. The vendors will define in the database summary sheets exactly what terms are included in a Basic Index search.
Prior to 1972, there were five- and ten-year Subject Indexes to Chemical Abstracts. Beginning with the 9th Collective Index period for 1972-76, the chemical name index entries for single chemical substances were put into a new work, the CHEMICAL SUBSTANCE INDEX. Everything else, including names for classes of substances (e.g., ethers), went into the GENERAL SUBJECT INDEX. Thus, terms referring to classes of compounds, reactions, processes, equipment, or plant and animal species should be searched in the General Subject Index after the proper term or phrase has been found in the Index Guide. Another way of finding the proper General Subject Index terms for recent CA entries is to utilize the CA Lexicon on STN. The 14th Collective Index period covers the years 1997-2001. You must keep in mind that the terminology rules may change from one collective index period to another. For example, the 14th CI Period moved significantly toward the current terminology in various fields, preferring DNA to the previous Deoxyribonucleic acids and Drugs to Pharmaceuticals. It is important to check the Index Guide that corresponds to the period you are searching in order to be sure of finding the correct term for use in the General Subject Index.
Not every preferred term or phrase is found in the Index Guide, and if you do not find a listing there, assume that you have chosen the correct preferred term and look in the appropriate section of the General Subject Index. Always be aware that preferred terms may change when the boundaries of the Collective Index periods are crossed.
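The lookup routine just described (consult the Index Guide for the period, and fall back to your own term if it is not listed) can be sketched as a small table lookup. The entries below are hypothetical extracts built only from the DNA and Drugs examples in the text; the entries for period 13 are an assumption about the earlier headings, and the data structure is invented for illustration.

```python
# Hypothetical "see" references, keyed by collective index period.
# Period 14 reflects the examples in the text; period 13 is an assumed mirror image.
INDEX_GUIDES = {
    13: {"dna": "Deoxyribonucleic acids", "drugs": "Pharmaceuticals"},
    14: {"deoxyribonucleic acids": "DNA", "pharmaceuticals": "Drugs"},
}

def preferred_heading(term, period):
    """Return the heading to use in the General Subject Index for that period.

    If the term has no "see" reference in the guide for the period, the term
    itself is assumed to be the preferred heading.
    """
    guide = INDEX_GUIDES.get(period, {})
    return guide.get(term.lower(), term)

for period in (13, 14):
    print(period, preferred_heading("DNA", period), preferred_heading("Pharmaceuticals", period))
```

Keying the table by period makes the main point visible: the same starting term can lead to different headings on either side of a collective index boundary.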
Look at a sample record from the CA Student Edition on OCLC, paying particular attention to the index terms and the use of abbreviations.
The keywords used in the weekly issue indexing and single words from the titles, abstracts, controlled vocabulary terms, and so-called text modifications of the controlled vocabulary entries are all included in another CA file index, the BASIC INDEX. The text modifications were sometimes difficult to interpret, so beginning in October 1994, CAS introduced a format that is easier to read.
Old style:
As noted above, the SciFinder Scholar topic search will do some behind-the-scenes work to find appropriate terms to include in a search, so people who use that search tool do not have to worry as much about controlled or uncontrolled vocabulary when they perform a research topic search. However, it is recommended that you use synonyms in parentheses next to a related concept, for example, ESCA (XPS).
Since the information in Chemical Abstracts is classified into 80 major subject sections, the section numbers and codes can actually be used on STN with the CA Classification "CC" field in subject searches to assist in limiting a search. For example, works dealing primarily with enzymes are found in section 7 of the weekly Chemical Abstracts. Other documents are assigned to one of the 80 subject categories divided into the following gross categories:
| Section Name | Section Code | Section Numbers |
|---|---|---|
| Biochemistry | BIO/CC | 1-20 |
| Organic Chemistry | ORG/CC | 21-34 |
| Macromolecular Chemistry | MAC/CC | 35-46 |
| Applied Chemistry & Chemical Engineering | APP/CC | 47-64 |
| Physical, Inorganic, & Analytical Chemistry | PIA/CC | 65-80 |
Thus, a strategy that included in an online search on STN:
would have the effect of limiting the retrieved documents in answer set L4 to those dealing with enzymes (found in section 7 of the printed CA) or, more broadly, those of a biochemical nature found anywhere in sections 1-20 of the printed product.
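The relationship between the 80 section numbers and the five gross section codes in the table can be expressed as a simple range lookup. This is a sketch for orientation only: the function name and data layout are invented here, and it does not reproduce any STN command syntax.

```python
# Section codes and number ranges taken from the table above.
SECTION_GROUPS = [
    ("BIO/CC", "Biochemistry", range(1, 21)),
    ("ORG/CC", "Organic Chemistry", range(21, 35)),
    ("MAC/CC", "Macromolecular Chemistry", range(35, 47)),
    ("APP/CC", "Applied Chemistry & Chemical Engineering", range(47, 65)),
    ("PIA/CC", "Physical, Inorganic, & Analytical Chemistry", range(65, 81)),
]

def section_group(section_number):
    """Return the gross section code that contains a CA section number."""
    for code, _name, numbers in SECTION_GROUPS:
        if section_number in numbers:
            return code
    raise ValueError(f"CA sections run from 1 to 80, got {section_number}")

# Section 7 (enzymes) falls within the biochemistry group.
print(section_group(7))
```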
In the printed Chemical Abstracts, a B or P immediately before an abstract number designates a book or a patent, respectively. In the online CA file, these and other document types are recorded in the Document Type (DT) field:
| Code | Document Type |
|---|---|
| B | Book |
| C | Conference proceedings |
| D | Dissertation |
| GR | General review |
| J | Journal article |
| P | Patent |
| R | Review |
| T | Technical report |
Thus, combining an answer set number with one or more codes or words can either limit the answer set to a particular document type (or perhaps eliminate an unwanted type), e.g.,
or
Eight new document types (biography, book review, editorial, errata, letter, miscellaneous, news announcement, and product review) were introduced to the CAPlus file in 1994. SciFinder Scholar allows you to refine the answer set by many parameters, among them the document types shown below.
SciFinder Scholar searches can be refined by many other options, as seen below.
Similar refinements are possible with Web of Science and other database searches.
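Limiting an answer set by document type, or removing an unwanted type, amounts to filtering records on the code in their DT field. The sketch below uses the codes from the document type table; the record layout (dictionaries with `title` and `dt` keys), the function names, and the sample records are assumptions made for illustration and do not represent STN or SciFinder Scholar syntax.

```python
# Document type codes from the table above.
DOCUMENT_TYPES = {
    "B": "Book", "C": "Conference proceedings", "D": "Dissertation",
    "GR": "General review", "J": "Journal article", "P": "Patent",
    "R": "Review", "T": "Technical report",
}

def limit_to(records, codes):
    """Keep only records whose document type code is in codes."""
    return [r for r in records if r["dt"] in codes]

def exclude(records, codes):
    """Drop records whose document type code is in codes."""
    return [r for r in records if r["dt"] not in codes]

# Invented answer set for illustration.
answer_set = [
    {"title": "Enzyme immobilization on silica", "dt": "J"},
    {"title": "Process for enzyme stabilization", "dt": "P"},
    {"title": "Advances in enzyme catalysis", "dt": "R"},
]

reviews_only = limit_to(answer_set, {"R", "GR"})
no_patents = exclude(answer_set, {"P"})
print([DOCUMENT_TYPES[r["dt"]] for r in reviews_only])
print([r["title"] for r in no_patents])
```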
There are many specialized abstracting or indexing services that cover either a subset of chemistry, e.g., Analytical Abstracts, or a particular format, e.g., Proquest's Dissertation Abstracts International and their Online Dissertation Services. Many of the techniques for subject searching discussed in this chapter are applicable to those works, but acquainting yourself with the guides, database summary sheets, and other user aids for any tools you choose to search is a very good idea.
Special techniques, particularly the use of proximity operators, are critical to success in searching text databases. Electronic primary journal databases are now widely available on the Web. American Chemical Society journals can be searched by subject on the Web only by words in the article titles or in the full text of the articles. More sophisticated searching is reserved for the Chemical Abstracts database, with a link through CAS's ChemPort service to the articles themselves. The ACS Electronic Supporting Information (formerly called Supplementary Material), containing more detailed data and other supplements not found in the printed journals, is also available to subscribers of ACS journals on the ACS Publications Web site. Links to the Supporting Information can be found in the table of contents for those issues that include such data or in the HTML version of the articles themselves.
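The proximity operators mentioned at the start of this section require two terms to occur within a stated number of words of each other, which matters in full text where the same two words may appear pages apart. A minimal sketch of the idea follows; the function name `within`, the convention that adjacent words are at distance 1, and the sample sentence are assumptions, and real systems each define their own operators.

```python
import re

def within(text, term_a, term_b, max_distance):
    """True if term_a and term_b occur no more than max_distance words apart.

    Distance is the difference in word positions, so adjacent words are
    at distance 1. Word order is ignored in this sketch.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)

# Invented sentence standing in for a passage of full text.
text = "The electron spectroscopy results support the chemical analysis of the film"

print(within(text, "electron", "spectroscopy", 1))
print(within(text, "electron", "analysis", 3))
```

Tightening or loosening `max_distance` is the full-text counterpart of choosing between a phrase search and a plain AND of the two words.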
Elsevier Science makes available on the Web a search engine that covers both Elsevier journals and Web resources. It is Scirus, found at http://www.scirus.com
Copyright
Gary Wiggins
22 September 1995