The exercise for this session requires you to use a free Internet search engine (Google) and commercial databases: SciFinder Scholar, ingenta (formerly, CARL UnCover), and Academic Search Elite.
The lecture attempts to clarify some aspects of Boolean searching that may not be that obvious to you, such as proximity operators and nesting. In addition, we introduce the concept of truncation (masking) to expand the scope of a search.
It is important to recognize the different years of coverage of various databases and to be aware that most of the really authoritative chemistry database searching is still done in commercial databases through database vendors.
Database producers and database vendors make it possible to search files that are located outside our geographic area through the techniques of online database searching. The online database industry is now in its fourth decade, and many sophisticated search techniques have been developed during that period. In comparison, the search techniques found in Internet search engines might be considered rudimentary, but they are constantly improving.
Each vendor offers a range of databases, some of which are specific to a discipline (chemistry, physics, etc.). Others deal with mission-oriented problems such as energy or the environment, cutting across disciplines in their subject coverage. Once connected to a database vendor's system, it is possible to perform CROSS-DATABASE SEARCHES simultaneously in a number of related files (multi-file searching).
Recall that there are databases corresponding to the different primary and secondary printed sources:
In a sense, the Internet search engines have turned the entire Web into one giant database. However, it has been shown that no single Web search engine indexes everything on the Web. Typically, one-third or less of the publicly available pages are caught by any given search engine's robot, the program that roams the Web looking for pages to index. Nor does any robot make the voyage around the Web every day; it may take months for a robot to complete its journey through the desired Web pages. Thus, no search engine's results are ever totally up to date. That is true whether you try Google, HotBot, Northern Light, AltaVista, or any other search engine. Furthermore, the free Web search engines do not have access to library databases such as the Web OPACs that tell you the holdings of the libraries, nor can they access any of the commercial vendors' offerings.
Nevertheless, the search engines are very powerful tools, and for certain types of questions they can be very useful in a search for information. For example, many people, including chemists, now maintain their own personal Web pages. For locating someone and perhaps finding a full or selective bibliography or a curriculum vitae (CV) of a chemist, the Web may offer the best route to reliable, up-to-date information. Likewise, very new or hot topics may be discussed in Web news groups or discussion lists long before they appear in traditional journals and, later, in abstracting and indexing services.
For all of these reasons, commercial vendors are beginning to add options that transfer the search strategy used in a commercial database search to the Internet for further information. Examples are Elsevier Science Direct's Scirus and STN's eScience.
Nevertheless, it ought to be fairly rare that you would begin a subject search for information with a Web search engine if you have easy access to CD-ROM or online commercial databases in your organization. Databases such as the Web of Science (including Science Citation Index potentially all the way back to 1945), Beilstein CrossFire (which, with the Gmelin and Beilstein databases, covers the literature of modern inorganic, organic, and organometallic chemistry back to their beginnings in the 18th and 19th centuries), and Chemical Abstracts (which covers all areas of chemistry in a comprehensive manner back to 1907) are usually much better first choices, if they are available to you.
The options for database searching include:
VENDORS of online search services (for example, STN International) lease or acquire databases from the database PRODUCERS (such as Chemical Abstracts Service or the Institute for Scientific Information) and make them available on remote computers. For a given vendor, which may have dozens or hundreds of databases on its computers, the databases are all searched by a common command language or graphical user interface. In the vast majority of these cases, there is a fee for searching the databases.
WEB SEARCH ENGINES
As noted above, the powerful search engines of today can provide a useful supplement to traditional online searches. A useful guide to search engines is maintained on the Search Engine Watch Web Site.
Some databases that are available for searching free on the Internet are of very high quality, for example, those produced by the National Library of Medicine or other government agencies or commercial organizations. However, the quality of most databases that are freely accessible on the Internet is likely to be lower than that of commercial databases. In addition, the search interfaces that the user encounters differ considerably among free Internet databases. Nevertheless, they should not be ignored for certain types of searches.
Chemical and pharmaceutical companies now routinely load databases on their own computers. This includes the placement of CD-ROM products on networks. CD-ROMs often require the searcher to use a number of different search systems, since each product typically comes with its own unique search interface. Several new models of providing database searching are now being explored with the advent of client/server computer systems.
The costs of a commercial online search are usually not fixed, but are dependent on several factors, including telecommunications network charges (even a connection via the Internet is not free on a commercial system), connect time on the vendor's computer, royalties charged for the information extracted from the database (known as HIT CHARGES), and on some systems, charges for the search terms input in the search strategy.
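As a rough illustration of how these factors add up, the sketch below totals the components of a single search session. The function name and every rate in it are hypothetical placeholders chosen for easy arithmetic; they are not actual vendor prices.

```python
def search_cost(connect_minutes, hits_displayed, terms_entered,
                network_rate_per_min, connect_rate_per_min,
                hit_charge, term_charge=0.0):
    """Estimate the cost of one online search session (all rates hypothetical)."""
    telecom = connect_minutes * network_rate_per_min  # telecommunications network
    connect = connect_minutes * connect_rate_per_min  # time on the vendor's computer
    royalties = hits_displayed * hit_charge           # hit charges
    terms = terms_entered * term_charge               # charged only on some systems
    return telecom + connect + royalties + terms


# Hypothetical session: 12 minutes online, 25 records displayed, 6 search terms.
total = search_cost(12, 25, 6,
                    network_rate_per_min=0.25,
                    connect_rate_per_min=1.0,
                    hit_charge=2.0,
                    term_charge=0.5)
print(f"Estimated cost: ${total:.2f}")  # Estimated cost: $68.00
```

With these made-up rates the hit charges dominate the bill, which is why it can pay to refine a search strategy before displaying any records.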
The benefits of using an online vendor to search databases include:
STN International is at present the only online vendor to have available the abstract data from Chemical Abstracts. The abstract's summary of the document provides a quick way to assess whether the document itself should be read for further information. See the examples of journal article and patent abstracts in the STN CA File Quick Reference Card. The card also shows examples of the Messenger search commands that must be used on the STN system when searching the CA database, with its nearly 20,000,000 bibliographic records, in native command mode.
STN and Questel/ORBIT are the only vendors on which full structure searches of the complete CAS Registry File, approaching 30,000,000 compounds, can be performed. A subset of the complete CA database that is not used in this course is the CA Student Edition on the OCLC FirstSearch system.
Online search systems offer BOOLEAN SEARCH OPERATORS that show the logical relationship among different concepts. See "Operators for Relating Search Terms" for some examples of Boolean search operators on the STN system.
The most common Boolean operators are:
The normal use of the English word "or" implies a choice, with only one thing possible in the final selection. In a Boolean sense, OR really grabs all of the items and puts them into a set. A special variant of the OR operator is XOR. XOR retrieves a document only if exactly one of the terms in the OR statement is present; it skips any documents that have both terms.
If each of the pieces of pie and cake in a bakery were placed on its own plate and arranged on an enormous tray, we would satisfy the search (pie OR cake), and the tray would represent our answer set. Since the XOR operator was not used, there could even be some plates on which both pie and cake were found.
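The difference between OR and XOR can be sketched with Python sets, where each number stands for one plate on the tray. The plate numbers are made up for illustration.

```python
# Which plates hold pie and which hold cake (plates 3 and 7 hold both).
pie = {1, 2, 3, 7}
cake = {3, 4, 5, 7}

either = pie | cake        # pie OR cake: plates with pie, cake, or both
exactly_one = pie ^ cake   # pie XOR cake: plates with one dessert but not both

print(sorted(either))       # [1, 2, 3, 4, 5, 7]
print(sorted(exactly_one))  # [1, 2, 4, 5]
```

Plates 3 and 7 stay in the OR answer set but drop out of the XOR answer set.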
In an AND search such as cake AND ice cream, think of each of the pieces of cake as having to be on its own plate with some ice cream on top in order to satisfy the search.
Example: (pie OR cake) NOT chocolate
Let's assume that you are allergic to chocolate. What would happen in the NOT examples if chocolate cake were the only type of cake available? In the first case, you would not get any dessert, because NOT eliminates every item that contains the excluded term. It throws out each of the plates containing the chocolate cake, even if the ice cream is your favorite, vanilla. In the second NOT case, however, our search would allow us to have a piece of pie (as long as it wasn't chocolate pie and the plate didn't also have some chocolate cake on it!).
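The chocolate scenario can be traced with a small sketch. It assumes the first NOT example was a search of the form (cake AND ice cream) NOT chocolate, which the discussion above implies but does not show; the plate contents and the helper name having are invented for illustration.

```python
# Each plate is described by a set of index terms (made-up data).
plates = {
    1: {"cake", "chocolate", "ice cream", "vanilla"},  # chocolate cake, vanilla ice cream
    2: {"cake", "chocolate"},                          # plain chocolate cake
    3: {"pie", "apple"},                               # apple pie
    4: {"pie", "chocolate"},                           # chocolate pie
    5: {"pie", "apple", "cake", "chocolate"},          # apple pie sharing a plate with cake
}


def having(term):
    """Return the numbers of all plates indexed with the given term."""
    return {number for number, terms in plates.items() if term in terms}


# Assumed first NOT example: (cake AND ice cream) NOT chocolate
first = (having("cake") & having("ice cream")) - having("chocolate")

# Second NOT example from the text: (pie OR cake) NOT chocolate
second = (having("pie") | having("cake")) - having("chocolate")

print(sorted(first))   # []
print(sorted(second))  # [3]
```

The first search returns nothing because every cake is chocolate. The second keeps only plate 3: plate 4 is chocolate pie, and plate 5 is rejected because chocolate cake shares the plate.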
There are more specific variants of the AND command that can be used to define the spatial relationships of search terms. These are called POSITIONAL or PROXIMITY OPERATORS. On STN, they are:
STN assumes that multi-word phrases are to be searched using the (W) operator in the absence of explicit positional or other Boolean operators.
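To see why a proximity operator is narrower than AND, consider the minimal sketch below. It assumes that (W) requires the two words to be adjacent and in the order typed; the function names are hypothetical, and a real system applies further rules that are not modeled here.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def both_present(text, first, second):
    """Plain AND: both words occur somewhere in the text."""
    words = set(tokenize(text))
    return first in words and second in words


def adjacent_in_order(text, first, second):
    """(W)-style match (assumed semantics): second immediately follows first."""
    words = tokenize(text)
    return any(a == first and b == second for a, b in zip(words, words[1:]))


title = "Abstracts of papers on chemical analysis"
print(both_present(title, "chemical", "abstracts"))       # True
print(adjacent_in_order(title, "chemical", "abstracts"))  # False
print(adjacent_in_order(title, "chemical", "analysis"))   # True
```

A searcher looking for the phrase "chemical abstracts" would retrieve this title with AND, but not with the adjacency test.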
TRUNCATION is the search technique that allows the searching of more than one form of a word with a single command. See:
for examples of truncation in STN.
Truncation can occur at the left end or the right end of a word stem or within the word. STN now allows all three types of truncation in the CA File Basic Index, an index of subject words from the title words, words in the abstracts, or index terms (including Registry Numbers for compounds discussed in the documents). The limit of terms that can be gathered in a set by truncation is 30,000 stems. For left truncation the search term must have at least four characters.
Novice searchers and even professionals sometimes make gross errors with truncation, especially in systems that allow both left- and right-hand truncation. Think what would happen if a search were run with these character strings truncated on both sides:
Every occurrence of the word "chemical" or "chemistry" would be pulled in the first search, and every English word that ends in -ION would be pulled in the second case. Probably not what the searcher would have wanted!
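The effect of truncating on the wrong side can be tried on a toy word list. In this sketch the asterisk is Python's shell-style wildcard from the fnmatch module, used here only as a stand-in; it is not STN's truncation symbol, and the vocabulary is made up.

```python
from fnmatch import fnmatchcase

# A tiny, made-up index vocabulary.
vocabulary = ["ion", "ionic", "ionization", "cation", "anion",
              "oxidation", "solution", "national", "chemist", "biochemistry"]


def expand(pattern):
    """Return the vocabulary words matched by a wildcard pattern."""
    return [word for word in vocabulary if fnmatchcase(word, pattern)]


print(expand("ion*"))   # right truncation: ['ion', 'ionic', 'ionization']
print(expand("*ion"))   # left truncation: every word ending in -ion
print(expand("*ion*"))  # both sides: every word containing "ion"
```

Even in this ten-word list, truncating on both sides pulls in eight words, including "national", which has nothing to do with ions.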
Chemical Abstracts is the largest and most nearly comprehensive abstracting service for information in chemistry. It covers a very broad range of topics and has been published since 1907. At present Chemical Abstracts Service creates three main files and several related databases. These include the CA File of literature since 1947 (and soon to extend back to 1907) and the CAOLD file that at present covers the period 1907-66. The Registry File contains searchable information that leads to the rapid identification of a compound, when a name, molecular structure, or other pertinent data is known about it. The Registry File also links these substances to the information that is indexed in the CA File and other chemical databases on the STN system through the Registry Numbers assigned by Chemical Abstracts Service to chemical substances. The CAS REGISTRY NUMBER is a unique number assigned to each chemical substance in the Registry File. For isatin, it is 91-56-5.
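A Registry Number such as 91-56-5 carries a built-in safeguard against typing errors: its last digit is a check digit. The rule is not stated in the text above; the sketch below assumes the commonly described convention, in which the digits before the check digit are weighted 1, 2, 3, ... from right to left and the sum modulo 10 must equal the check digit. The function name is hypothetical.

```python
def cas_check_digit_ok(registry_number):
    """Validate the final (check) digit of a CAS Registry Number string."""
    parts = registry_number.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    body = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(parts[2])


print(cas_check_digit_ok("91-56-5"))  # True  (isatin: 6*1 + 5*2 + 1*3 + 9*4 = 55)
print(cas_check_digit_ok("91-56-4"))  # False (last digit altered)
```

A check of this kind only catches mistyped numbers; it cannot confirm that a well-formed number has actually been assigned to a substance.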
Also produced by CAS are the CASREACT file of organic reaction data, the CHEMCATS file that links chemical substances with commercial suppliers, the CHEMLIST file of regulatory data, and a special variant of the CA File, CAplus, that offers rapid coverage of the articles in the main journals of chemistry.
The CA File covers chemical literature found in journals, patents, patent families, technical reports, books, conference proceedings, and dissertations from all areas of chemistry, biochemistry, chemical engineering, and related sciences from 1967 to the present. The CAplus file is a special version of the CA File. Since October 1994 it has contained all articles from more than 1,350 key chemical journals, including records for document types not covered in Chemical Abstracts (CA): biographical items, book reviews, editorials, errata, letters to the editor, news announcements, product reviews, meeting abstracts, and miscellaneous items. Bibliographic information and abstracts for the articles from the key chemical journals are added within one week of journal receipt. Both the CA and CAplus files are being retrospectively converted to include earlier information. At the present time, you can also search bibliographic information and CA abstracts from references in the period 1947-66. By the end of 2002, all CA bibliographic data will be included in the CA and CAplus files.
There are low-cost learning files that correspond to:
Learning the command language of STN International, DIALOG, or other vendors can be a significant barrier to online searching for some. There are programs that can help the novice searcher. One such FRONT-END program is STN Express with Discover. Questel/ORBIT's IMAGINATION software is another front-end software package.
The most recent efforts by the major vendors to win online searchers have been directed toward the Internet. For example, STN EASY allows direct access to the STN databases with a relatively straightforward graphical user interface. Most recently, STN has developed for professional searchers STN on the Web. The U.S. National Library of Medicine's PubMed gives free and easy access to a version of the National Library of Medicine's main database, Medline.
Other STN products are SciFinder and its academic counterpart, SciFinder Scholar, which make the searching of some of STN's databases (CAplus, Registry, CHEMLIST, CHEMCATS, and CASREACT) relatively effortless. They let the user perform chemical searches by clicking on the icons depicted below.
SciFinder Scholar removes the need to know the STN Messenger search commands. It even makes it unnecessary to know the proper use of Boolean operators in a subject (Research Topic) search or to know how to use truncation symbols. It employs sophisticated built-in intelligence to deduce the relationships the searcher desires among the various words and phrases. Nevertheless, many online search systems, including Internet search engines, require at least a passing knowledge of these techniques in order to use them effectively.
Copyright
Gary Wiggins
24 August 1997