S PN=5577239 OR PN=5950192 OR PN=4642762

               2  PN=5577239
               1  PN=5950192
               1  PN=4642762
      S2       3  PN=5577239 OR PN=5950192 OR PN=4642762
?
 
 
 TYPE 2/2/ALL


  2/2/1     (Item 1 from file: 653) 
DIALOG(R)File 653:US Patents Fulltext
(c) format only 2001 The Dialog Corp. All rts. reserv.
 
             01566805
Utility
STORAGE AND RETRIEVAL OF GENERIC CHEMICAL STRUCTURE REPRESENTATIONS
 
PATENT NO.:  4,642,762
ISSUED:      February 10, 1987 (19870210)
INVENTOR(s): Fisanick, William, Columbus, OH (Ohio), US (United States of
             America)
ASSIGNEE(s): American Chemical Society, (A  U.S. Company or Corporation ),
             Washington, DC (District of Columbia), US (United States of
             America)
EXTRA INFO:  Expired, effective February 10, 1999 (19990210), recorded in
             O.G. of April 20, 1999 (19990420)
             Reinstated, effective June 28, 1999 (19990628), recorded in
             O.G. of July 27, 1999 (19990727)
APPL. NO.:   6-614,219
FILED:       May 25, 1984 (19840525)
U.S. CLASS:  707-3 cross ref: 707-104
INTL CLASS:  [4] G06F 15-40
FIELD OF SEARCH: 364-200MSFILE; 364-900MSFILE
                             References Cited
 
                          U.S. PATENT DOCUMENTS
 
    4,473,890    9/1984   Araki                                  364-900
 
PRIMARY EXAMINER: Zache, Raulfe B.
ATTORNEY, AGENT, OR FIRM: Pollick, Philip J.
CLAIMS:           31
EXEMPLARY CLAIM:  1
DRAWING PAGES:    22
DRAWING FIGURES:  22
ART UNIT:         232
FULL TEXT:        1514 lines
 
 
  2/2/2     (Item 1 from file: 654) 
DIALOG(R)File 654:US PAT.FULL.
(c) format only 2001 The Dialog Corp. All rts. reserv.
 
             02999303
Utility
RELATIONAL  DATABASE  MANGEMENT  SYSTEM  FOR  CHEMICAL  STRUCTURE  STORAGE,
SEARCHING AND RETRIEVAL
 
PATENT NO.:  5,950,192
ISSUED:      September 07, 1999 (19990907)
INVENTOR(s): Moore, Jeffrey, Timonimun, MD (Maryland), US (United States of
             America)
             Brazil, Joanne, White Hall, MD (Maryland), US (United States
             of America)
             Hoover, Jeffrey R., Baltimore, MD (Maryland), US (United
             States of America)
ASSIGNEE(s): Oxford Molecular Group, Inc , (A U.S. Company or Corporation),
             Towson, MD (Maryland), US (United States of America)
APPL. NO.:   8-883,165
FILED:       June 26, 1997 (19970626)
 
  This  application  is a continuation, of application Ser. No. 08-715,708,
filed  Sep. 19, 1996, now abandoned, which is a continuation application of
Ser.  No. 08-288,503, filed Aug. 10, 1994, now U.S. Pat. No. 5,577,239, the
entire disclosure of which is incorporated herein by reference.
 
U.S. CLASS:  707-3 cross ref: 702-27
INTL CLASS:  [6] G06F 17-30
FIELD OF SEARCH: 395-496; 395-497; 395-499; 395-600; 395-603; 707-3; 707-2;
             707-1; 707-100; 707-102; 707-104; 707-22; 707-27; 707-19;
             707-20; 702-22; 702-27; 702-19; 702-20
 
                             References Cited
 
                          U.S. PATENT DOCUMENTS
 
    4,642,762    2/1987   Fisanick                                 707-3
    4,811,217    3/1989   Tokizane et al.                        364-300
    4,855,931    8/1989   Saunders                               364-499
    5,025,388    6/1991   Cramer, III et al.                     364-496
    5,056,035   10/1991   Fujita                                 364-497
    5,259,137   11/1993   Wilson et al.                          364-496
    5,367,058   11/1994   Pitner et al.                        530-391.9
    5,379,234    1/1995   Wilson et al.                          364-496
    5,386,507    1/1995   Teig et al.                            395-161
    5,418,944    5/1995   DiPace et al.                          395-600
    5,463,564   10/1995   Agrafiotis et al.                      364-496
    5,577,239   11/1996   Moore et al.                           395-603
 
                         NON-U.S. PATENT DOCUMENTS
 
    090 895 A2   10/1983   EP (European Patent Office)
    213 483 A2    3/1987   EP (European Patent Office)
 
                             OTHER REFERENCES
 
 
Viking Instruments Corp. (Hewlett Packard); SpectraTrak Transportable GC/MS
Systems; (brochure)-No Date.
 
Chemical  Structures, The International Language of Chemistry; Wendy A. Warr
(Ed.); "Interfacing DARC--Oracle" AJCM (Juus) de Jong (1988).
 
J.  Chem.  Inf.  Comput.  Sci.  (1983)  , vol. 23, No. 3; pp. 102-108; DARC
Substructure  Search  System: A New Approach to Chemical Information; Roger
Attias.
 
J.  Chem. Inf. Comput. Sci. (1987), vol. 27, No. 2; pp. 74-82; DARC System:
Notions  of Defined and Generic Substructures. Filiation and Coding of FREL
Substructure (SS) Classes; Jacques-Emile Dubois et al.
 
J.   Chem.  Inf.  Comput.  Sci.  (1990),  vol.  30,  No.  2;  pp.  191-199,
Substructure  Search Systems. 1. Performance Comparison of the MACCS, DARC,
HTSS,  and CAS Registry MVSSS, and S4 Substructure Search System; Martin G.
Hicks & Clemens.
 
J.  Chem.  Inf.  Comput.  Sci.  (1988),  vol.  28,  No.  4; pp. 221-226; An
Efficient Graph Approach to Matching Chemical Structures, O. Owolabi.
 
J.  Chem.  Inf. Comput. Sci. (1990), vol. 30, No. 4; pp. 332-339; Reactions
in the Beilstein Information System: Nonaporic Organic Synthesis; Martin G.
Hicks.
 
Analytica  Chimica Acta, 235 (1990), pp. 87-92; Substructure Search Systems
for Large Chemical Data Bases; Martin G. Hicks et al.
 
J.  Chem.  Inf.  Comput.  Sci.  (1991),  vol.  31,  No. 2; pp. 320-326; The
Beilstein Structure Registry System. 1. General Design; Laszlo Domokos.
 
J. Chem. Inf. Comput. Sci. (1989), vol. 29, No. 4; pp. 255-260; 3DSearch; A
System for Three-Dimensional Substructure Searching; Robert P. Sheridan, et
al.
 
Substructure  Searches  of  Chemical  Structure  Files;  (Jan.  23,  1973);
Strategic   Considerations   in  the  Design  of  a  Screening  System  for
Substructure  Searches  of  Chemical Structure Files; George W. Adamson, et
al.
 
Chemical  Structure  Searching;  (Jan.  21,  1975); An Efficient Design for
Chemical Structure Searching. I. The Screens; Alfred Feldman et al.
 
J.  Chem. Inf. Comput. Sci. (1982), vol. 22, No. 4; The Third BASIC Fragment
Search Dictionary; W. Graf, H. K. Kaindl, et al.
 
J.  Chem.  Inf.  Comput. Sci. (1983), vol. 23, No. 3; The CAS Online Search
System.  1.  General  System  Design  and Selection, Generation, and Use of
Search Screens; P. G. Dittmar, et al.
 
Computer  Chemical,  (1991),  vol.  15,  No. 2, pp. 103-107; A Central Atom
Based  Algorithm  and Computer Program for Substructure Search; Alf Dengler
and Ivar Ugi.
 
J.  Chem.  Inf. Comput. Sci. (1993), vol. 33, No. 4; pp. 545-547; Structure
Searching  in  Chemical  Databases  by  Direct  Lookup  Methods; Bradley D.
Christie et al.
 
 
PRIMARY EXAMINER: Von Buhr, Maria N.
ATTORNEY, AGENT, OR FIRM: Dickstein Shapiro Morin & Oshinsky
CLAIMS:           14
EXEMPLARY CLAIM:  1
DRAWING PAGES:    7
DRAWING FIGURES:  12
ART UNIT:         277
FULL TEXT:        798 lines
 
 
  2/2/3     (Item 2 from file: 654) 
DIALOG(R)File 654:US PAT.FULL.
(c) format only 2001 The Dialog Corp. All rts. reserv.
 
             02592280
Utility
CHEMICAL STRUCTURE STORAGE, SEARCHING AND RETRIEVAL SYSTEM
 
PATENT NO.:  5,577,239
ISSUED:      November 19, 1996 (19961119)
INVENTOR(s): Moore, Jeffrey, 12 Breezy Tree Ct., Timonimun, MD (Maryland),
             US (United States of America), 21093
             Brazil, Joanne, 4500 Jolly Acres Rd., White Hall, MD
             (Maryland), US (United States of America), 21161
             Hoover, Jeffrey R., 8639 Willow Oak Rd., Baltimore, MD
             (Maryland), US (United States of America), 21234
             [Assignee Code(s): 68000]
EXTRA INFO:  Assignment transaction [Reassigned], recorded October 12,
             1994 (19941012)
             Assignment transaction [Reassigned], recorded January 21,
             1997 (19970121)
APPL. NO.:   8-288,503
FILED:       August 10, 1994 (19940810)
U.S. CLASS:  707-3 cross ref: 702-27
INTL CLASS:  [6] G06F 17-30
FIELD OF SEARCH: 364-DIG.1; 364-DIG.2; 364-496; 364-497; 364-499; 395-600
 
                             References Cited
 
                          U.S. PATENT DOCUMENTS
 
    4,642,762    2/1987   Fisanick                               364-300
    4,811,217    3/1989   Tokizane et al.                        364-300
    4,855,931    8/1989   Saunders                               364-499
    5,025,388    6/1991   Cramer, III et al.                     364-496
    5,056,035   10/1991   Fujita                                 364-497
    5,249,137    2/1993   Wilson et al.                          364-496
    5,367,058   11/1994   Pitner et al.                        530-391.9
    5,379,234    1/1995   Wilson et al.                          364-496
    5,386,507    1/1995   Teig et al.                            395-161
    5,418,944    5/1995   DiPace et al.                          395-600
    5,463,564   10/1995   Agrafiotis et al.                      364-496
 
                             OTHER REFERENCES
 
 
Viking  Instruments  Corp.  (Hewlett  Packard);  Spectra Trak Transportable
GC/MS System; (brochure), No date.
 
Chemical  Structure,  The International Language of Chemistry; Wendy A. Warr
(Ed.); "Interfacing DARC-Oracle" AJCM (Juus) de Jong (1988).
 
J.  Chem.  Inf.  Comput.  Sci.  (1983),  vol.  23,  No. 3 pp. 102-108; DARC
Substructure  Search  System; A New Approach to Chemical Information; Roger
Attias.
 
J.  Chem. Inf. Comput. Sci. (1987), vol. 27, No. 2; pp. 74-82; DARC System;
Notions  of Defined and Generic Substructures. Filiation and Coding of FREL
Substructure (SS) Classes; Jacques-Emile Dubois et al.
 
J.   Chem.  Inf.  Comput.  Sci.  (1990),  vol.  30,  No.  2;  pp.  191-199,
Substructure  Search Systems, 1, Performance Comparison of the MACCS, DARC,
HTSS,  CAS  Registry  MVSSS,  and S4 Substructure Search Systems; Martin G.
Hicks.
 
J.  Chem.  Inf.  Comput.  Sci.  (1988),  vol.  28,  No.  4; pp. 221-226; An
Efficient Graph Approach to Matching Chemical Structures, O. Owolabi.
 
J.  Chem.  Inf. Comput. Sci. (1990), vol. 30, No. 4; pp. 332-339; Reactions
in the Beilstein Information System: Nonaporic Organic Synthesis; Martin G.
Hicks.
 
Analytica  Chimica Acta, 235 (1990), pp. 87-92; Substructure Search Systems
for Large Chemical Data Bases; Martin G. Hicks et al.
 
J.  Chem.  Inf.  Comput.  Sci.  (1991),  vol.  31,  No. 2; pp. 320-326; The
Beilstein Structure Registry System, 1, General Design; Laszlo Domokos.
 
J. Chem. Inf. Comput. Sci. (1989), vol. 29, No. 4; pp. 255-260; 3DSearch; A
System for Three-Dimensional Substructure Searching; Robert P. Sheridan, et
al.
 
Substructure  Searches  of  Chemical  Structure  Files;  (Jan.  23,  1973);
Strategic   Considerations   in  the  Design  of  a  Screening  System  for
Substructure  Searches  of  Chemical Structure Files; George W. Adamson, et
al.
 
Chemical  Structure  Searching;  (Jan.  21,  1975); An Efficient Design for
Chemical Structure Searching, I, The Screens; Alfred Feldman et al.
 
J. Chem. Inf. Comput. Sci. (1982), vol. 22, No. 4; The Third BASIC Fragment
Search Dictionary; W. Graf, H. K. Kaindl, et al.
 
J.  Chem.  Inf.  Comput. Sci. (1983), vol. 23, No. 3; The CAS ONLINE Search
System,  1,  General  System  Design  and Selection, Generation, and Use of
Search Screens; P. G. Dittmar, et al.
 
Computer  Chemical,  (1991),  vol.  15,  No. 2; pp. 103-107; A Central Atom
Based  Algorithm  and Computer Program for Substructure Search; Alf Dengler
and Ivar Ugi.
 
J.  Chem.  Inf. Comput. Sci. (1993), vol. 33, No. 4; pp. 545-547; Structure
Searching  in  Chemical  Databases  by  Direct  Lookup  Methods; Bradley D.
Christie et al.
 
J.   Chem.  Inf.  Comput.  Sci.  (1993);  vol.  33,  No.  4;  pp.  539-541;
Substructure  Searching  on  Very  Large  Files  by  Using Multiple Storage
Techniques; Alexander Bartmann et al.
 
 
PRIMARY EXAMINER: Black, Thomas G.
ASST. EXAMINER:   Von Buhr, Maria N.
ATTORNEY, AGENT, OR FIRM: Dickstein Shapiro Morin & Oshinsky LLP
CLAIMS:           12
EXEMPLARY CLAIM:  1
DRAWING PAGES:    7
DRAWING FIGURES:  12
ART UNIT:         237
FULL TEXT:        791 lines
?
 
 
 TYPE 2/2,EM,SU/ALL


  2/2,EM,SU/1     (Item 1 from file: 653) 
DIALOG(R)File 653:US Patents Fulltext
(c) format only 2001 The Dialog Corp. All rts. reserv.
 
             01566805
Utility
STORAGE AND RETRIEVAL OF GENERIC CHEMICAL STRUCTURE REPRESENTATIONS
 
PATENT NO.:  4,642,762
ISSUED:      February 10, 1987 (19870210)
INVENTOR(s): Fisanick, William, Columbus, OH (Ohio), US (United States of
             America)
ASSIGNEE(s): American Chemical Society, (A  U.S. Company or Corporation ),
             Washington, DC (District of Columbia), US (United States of
             America)
EXTRA INFO:  Expired, effective February 10, 1999 (19990210), recorded in
             O.G. of April 20, 1999 (19990420)
             Reinstated, effective June 28, 1999 (19990628), recorded in
             O.G. of July 27, 1999 (19990727)
APPL. NO.:   6-614,219
FILED:       May 25, 1984 (19840525)
U.S. CLASS:  707-3 cross ref: 707-104
INTL CLASS:  [4] G06F 15-40
FIELD OF SEARCH: 364-200MSFILE; 364-900MSFILE
                             References Cited
 
                          U.S. PATENT DOCUMENTS
 
    4,473,890    9/1984   Araki                                  364-900
 
PRIMARY EXAMINER: Zache, Raulfe B.
ATTORNEY, AGENT, OR FIRM: Pollick, Philip J.
CLAIMS:           31
EXEMPLARY CLAIM:  1
DRAWING PAGES:    22
DRAWING FIGURES:  22
ART UNIT:         232
FULL TEXT:        1514 lines
 
 
                                  FIELD
 
  This  invention  relates  to  a method for storing and retrieving generic
chemical  structure  representations (Markush formulations) and information
associated   with  them.  It  is  directed  especially  to  development  of
specific(real)-atom   and  generic-group  representations  of  the  Markush
formulation  that are used in atom-by-atom and group-by-group comparison of
query  and  file  representations  and  the  use of screening techniques to
eliminate  a  high  percentage  of irrelevant file representations prior to
group-by-group   and   atom-by-atom   comparison   of   generic-group   and
specific-atom representations.
 
                               BACKGROUND
 
  The  ability  to  effectively  retrieve  information  on generic chemical
structures,  i.e.,  so-called  Markush  structures,  has  been a problem of
varying  magnitude  and  complexity  since  the inception of the use of the
Markush  claim  by  the  Patent  Office  in  the  1920's.  Many  manual and
mechanized  information  retrieval  systems have been developed to meet the
challenge  of  this problem but the known techniques for such retrieval are
imprecise  and  often  place  a  premium  on  the knowledge, intuition, and
cognitive skills of the searcher.
 
  The  basic  system for dealing with Markush structures is a manual system
in   which  individual  documents  containing  the  Markush  structure  are
classified   according  to  a  highly  refined  classification  system  and
physically  grouped  according  to  the classification scheme into a search
file. In making a search, the searcher proceeds by classifying the document
(query)  in  hand  and  then  goes to the appropriately classified physical
group of documents in the search file and manually searches those documents
for relevant retrievals. Such a system places a high premium on the correct
initial  classification of search file documents, correct classification of
the  query,  physical search-file integrity, and highly-developed cognitive
skills  of  the  searcher.  Moreover,  because  the  Markush  may represent
thousands  or  even  millions  of  compounds,  it  often  is  impossible to
promulgate   copies   of   the   document  into  all  of  the  search  file
classifications  represented  by the Markush formulation. Weaknesses in any
of  the  aforementioned  areas are likely to produce unsatisfactory search
results.  (U.S.  Department  of  Commerce,  "Development  and Use of Patent
Classification Systems", U.S. Government Printing Office, Washington, D.C.,
1966.)
 
  Another  technique  used  in  both  manual and mechanized systems for the
handling   of   Markush   structures  involves  the  use  of  a  system  of
fragmentation  codes  that  are  in  effect  generic  or  real-atom "group"
representations  of  portions  of  a  particular  Markush  formulation. For
example,  that portion of the formulation containing chains of carbon atoms
might  be  generically  encoded  as  alkyl,  or  OH  group as an alcohol or
hydroxide,  and  F,  Cl,  Br,  and I as a halide. Real-atom groups, such as
methyl  for  CH sub 3 --, ethyl for CH sub 3 CH sub 2 --, and phenyl for C
sub  6 H sub 5 --, are also typically used. (Balent, M. Z.; Emberger, J. M.
"A  Unique Chemical Fragmentation System for Indexing Patent Literature" J.
Chem.  Inf.  Comput.  Sci.  1975,  15,  100-104.  Kaback,  S.  M. "Chemical
Structure Searching in Derwent's World Patents Index" J. Chem. Inf. Comput.
Sci.  1980,  20, 1-6. Rossler, S.; Kolb, A. "The GREMAS System, an Integral
Part  of the IDC System for Chemical Documentation" J. Chem. Doc. 1970, 10,
128-134. Rowlett, R. J. "Gleaning Patents with Chemical Abstracts" Chemtech
1979,  June,  348-349.  Silk,  J.  A.  "Present  and  Future  Prospects for
Structural  Searching  of the Journal and Patent Literature." J. Chem. Inf.
Comput.  Sci.  1979,  19,  195-198.) However, the inter-relationships among
these  groups  in  a  Markush  formulation  are typically not encoded. As a
result,  such  systems tend to have good recall, i.e., most of the relevant
search  file  answers  are retrieved. Because the inter-relationships among
the  groups  cannot  be  specified,  and because of the reliance on generic
terminology,  they also have a pronounced tendency to lack precision, i.e.,
many of the answers retrieved are irrelevant to the query. Precision has been
improved  by  incorporation  of  a  higher  degree  of specificity into the
fragmentation codes, but only at a price paid in terms of higher complexity
and  difficulty  in  file  encoding  and  search  profile formulation and a
resulting higher potential for error.
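
The recall and precision trade-off described above can be made concrete with a small calculation. The following Python sketch is illustrative only: the document identifiers and the result sets are hypothetical, not data from any actual fragmentation-code system.

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for one search.

    retrieved: set of document ids returned by the search
    relevant:  set of document ids that truly answer the query
    """
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 1.0
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    return recall, precision


# Hypothetical fragment-code search: every relevant document is found,
# but several irrelevant documents are retrieved along with them.
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8", "D9", "D10"}
recall, precision = recall_and_precision(retrieved, relevant)
print(recall, precision)
```

In this made-up case all four relevant documents are retrieved (high recall), yet only four of the ten answers are relevant (low precision), which is the behavior attributed to fragmentation-code systems.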
 
  Mechanized  specific  atom-by-atom  structure  matching of query and file
structural  representations  is  a well-known commercial technique that has
been  available  since  the  1960s  and  has  demonstrated  high recall and
precision  as  a search and retrieval technique. (Wigington, R. L. "Machine
Methods for Accessing Chemical Abstracts Service Information" in Proceedings
of  the  IBM  Symposium  on  Computers  and  Chemistry; IBM Data Processing
Division:  White  Plains, NY, 1969. Eakin, D. R. "The ICI CROSSBOW System,"
in  Ash,  J.  E.;  Hyde, E., Eds. Chemical Information Systems, Chichester,
Horwood,  1975.  Dubois,  J.  E.  "DARC  System  in Chemistry", in Computer
Representation  and  Manipulation  of  Chemical  Information, Wipke, W. T.;
Heller,  S.; Feldman, R.; Hyde, E., Eds., Wiley, New York, 1974. Schenk, H.
R.;  Wegmuller,  F. "Substructure Search by Means of the Chemical Abstracts
Service  Chemical  Registry II System" J. Chem. Inf. Comput. Sci. 1976, 16,
153-161.   Feldman,   R.  J.  "Interactive  Graphic  Chemical  Substructure
Searching"   in   Computer  Representation  and  Manipulation  of  Chemical
Information,  Wipke, W. T.; Heller, S.; Feldman, R.; Hyde, E., Eds., Wiley,
New  York,  1974.)  Because atom-by-atom structure matching is a relatively
slow  process, screening techniques have been developed to eliminate a high
percentage of irrelevant file representations. Typically screening involves
capturing key features of the file representations such as atom environment
and  atom  sequences  and  then  matching similar key features of the query
representation  to give a set of answers that are then used in atom-by-atom
structure  matching.  (Dittmar, P. G.; Farmer, N. A.; Fisanick, W.; Haines,
R.  C.;  Mockus, J. "The CAS ONLINE Search System. 1. General System Design
and Selection, Generation, and Use of Search Screens" J. Chem. Inf. Comput.
Sci.  1983,  23, 93-102. Attias, R. "DARC Substructure Search System: A New
Approach  to  Chemical  Information"  J.  Chem. Inf. Comput. Sci. 1983, 23,
102-108.)  Unfortunately,  structure matching techniques tend to be limited
to  files  containing  representations  of  unique individual compounds and
queries  have been limited to specific structural representations that must
exactly   match   the   structural  representation  of  the  file  compound
(full-structure  search)  or  be  embedded within it (substructure search).
Structure  matching  techniques  have  been applied to Markush formulations
which  represent  a  relatively  small  number  of specific compounds using
queries  that  contain  only real atoms. (Meyer, E. "Topological Search for
Classes   of   Compounds  in  Large  Files--even  of  Markush  Formulas--at
Reasonable  Machine  Cost"  in  Computer Representation and Manipulation of
Chemical  Information,  Wipke,  W.  T.;  Heller, S.; Feldman, R.; Hyde, E.,
Eds.,  Wiley,  New  York,  1974.) However, in attempting to apply structure
matching  techniques  to  query  and file structures represented by Markush
formulations  of  the  type  often  found  in  broad  patent claims, one is
immediately  faced  with  the problem that a single Markush formulation may
literally represent millions of specific compounds. When one considers that
the  file  size of the current large commercial structural matching systems
is  a little less than seven million specific compounds, an appreciation is
gained  for the difficulty in using structure matching techniques to search
Markush  structures effectively. Although proposals have been made to apply
structure  matching  techniques  to  broad  Markush formulations, no viable
system  for searching such Markush formulations that gives a high degree of
recall  and precision has yet been achieved. (Lynch, M. F.; Barnard, J. M.;
Welford,  S.  M.  "Computer  Storage  and  Retrieval  of  Generic  Chemical
Structures  in Patents. 1. Introduction and General Strategy" J. Chem. Inf.
Comput.  Sci.  1981, 21, 148-150. Barnard, J. M.; Lynch, M. F.; Welford, S.
M.  "Computer  Storage  and  Retrieval  of  Generic  Chemical Structures in
Patents.  2.  GENSAL,  a  Formal  Language  for  the Description of Generic
Chemical Structures" J. Chem. Inf. Comput. Sci. 1981, 21, 151-161. Welford,
S.  M.;  Lynch,  M.  F.;  Barnard, J. M. "Computer Storage and Retrieval of
Generic Chemical Structures in Patents. 3. Chemical Grammars and their Role
in  the  Manipulation  of  Chemical  Structures" J. Chem. Inf. Comput. Sci.
1981,  21,  161-168. Barnard, J. M.; Lynch, M. F.; Welford, S. M. "Computer
Storage  and  Retrieval  of  Generic  Chemical Structures in Patents. 4. An
Extended Connection Table Representation (ECTR) for Generic Structures." J.
Chem.  Inf.  Comput.  Sci.  1982,  22,  160-164. Nakayama, T.; Fujiwara, Y.
"Computer  Representation  of  Generic  Chemical  Structures by an Extended
Block-Cutpoint Tree" J. Chem. Inf. Comput. Sci. 1983, 23, 80-87. Kudo, Y.;
Chihara  H.  "Chemical  Substance  Retrieval  System  for Searching Generic
Representations.  1.  A  Prototype System for the Gazetted List of Existing
Chemical  Substances  of  Japan"  J.  Chem.  Inf.  Comput.  Sci.  1983, 23,
109-117.)
 
                                 SUMMARY
 
  A typical  Markush storage and retrieval process according to the present
invention   comprises   the   steps   of   forming  a  file  of  structural
representations  of  Markush formulations in which each Markush formulation
is represented by a single specific atom multiple connectivity node (SpMCN)
representation  in which the formal valence requirements of requisite atoms
are  relaxed  to  allow for the attachment of all atoms and groups of atoms
depicted   in   the   Markush   formulation  and,  as  a  result,  gives  a
representation  containing  all  implicit specific atom structures found in
the  Markush  formulation.  The  SpMCN  is  then converted to an associated
generic group multiple connectivity node (GnMCN) representation through the
use  of a generic-group hierarchy. A query Markush formulation is similarly
converted   to   SpMCN   and   GnMCN   representations.   The  query  GnMCN
representation  then  is  compared on a group-by-group basis with each file
GnMCN  in  such fashion so that a match is found when at least one implicit
generic  structure  representation  (IGSR)  of the query GnMCN is identical
with  (overlaps)  or is contained in (embedded in) at least one IGSR of the
file  GnMCN.  The  query  SpMCN  representation  then  is  compared  on  an
atom-by-atom  basis with the file SpMCN representations associated with the
file  GnMCN  representations  (answers)  obtained  in  the  previous  query
GnMCN/file  GnMCN  matching  step  in such fashion so that a match is found
when at least one implicit specific atom structure representation (ISSR) of
the  query  SpMCN structure is identical with (overlaps) or is contained in
(embedded  in)  at  least one ISSR of the file SpMCN. An indexing system is
used to identify IGSRs and ISSRs for the matching process and to manipulate
large or complex GnMCN and SpMCN representations.
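
The two-stage comparison summarized above (group-by-group on GnMCNs first, then atom-by-atom on the SpMCNs of the surviving answers) can be sketched in Python under strong simplifying assumptions. Here each representation is reduced to a dictionary from a variable node to its set of alternative fragments, all formulations are assumed to share the same fixed structure, and only the "overlap" case is handled. The names `GENERIC_GROUP`, `to_generic`, `shares_implicit_structure` and `search`, and the file entries A, B and C, are hypothetical and are not taken from the patent.

```python
# Hypothetical generic-group hierarchy: specific fragment -> generic group.
GENERIC_GROUP = {
    "Cl": "halide", "Br": "halide", "F": "halide",
    "CH3": "alkyl", "CH3CH2": "alkyl",
    "OH": "hydroxy",
}


def to_generic(spmcn):
    """Derive a toy GnMCN from a toy SpMCN (node -> set of fragments)."""
    return {node: {GENERIC_GROUP[f] for f in frags}
            for node, frags in spmcn.items()}


def shares_implicit_structure(query, file_rep):
    """True if some implicit structure of query equals one of file_rep.

    In this toy model the nodes vary independently, so an identical
    implicit structure exists exactly when every node has at least one
    alternative in common. Nothing is enumerated or stored.
    """
    return (query.keys() == file_rep.keys()
            and all(query[n] & file_rep[n] for n in query))


def search(query_sp, file_sps):
    """Return (stage-1 candidates, final answers) as lists of names."""
    query_gn = to_generic(query_sp)
    # Stage 1: group-by-group comparison of query GnMCN with file GnMCNs.
    candidates = [name for name, sp in file_sps.items()
                  if shares_implicit_structure(query_gn, to_generic(sp))]
    # Stage 2: atom-by-atom comparison, only for the stage-1 answers.
    answers = [name for name in candidates
               if shares_implicit_structure(query_sp, file_sps[name])]
    return candidates, answers


file_sps = {
    "A": {"R1": {"Cl", "Br"}, "R2": {"CH3", "CH3CH2"}},
    "B": {"R1": {"F"}, "R2": {"CH3"}},
    "C": {"R1": {"OH"}, "R2": {"CH3"}},
}
query_sp = {"R1": {"Cl"}, "R2": {"CH3"}}
candidates, answers = search(query_sp, file_sps)
print(candidates, answers)
```

Entry C is dropped at the generic-group stage, entry B survives that stage but fails the specific-atom stage, and only entry A is returned. A real implementation would compare connection tables rather than fragment labels and would also handle embedment.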
 
  As  a  further  refinement,  generic  features  of  the  original Markush
formulation are captured by using the generic-group hierarchy as a means of
representing  generic features of the Markush formulation in both the SpMCN
and  GnMCN. To ensure high recall, a roll-back feature is used to allow for
the  exchange  of  generic-group and specific-atom representations in SpMCN
matching so that all real atom file or query structural features implied in
the  generic structural features of the file or query SpMCN are matched. In
addition,  specific features of the SpMCN and specifically identified parts
of  generic  features of the original Markush formulation, such as specific
atoms,  type  of  bonding, ring size, etc. are associated with each generic
group  of  the file GnMCN as group attributes and are matched against group
attributes  of  the  generic  groups  of  the  query  GnMCN  prior to SpMCN
matching.
 
  As  a  further refinement, screening techniques are applied to both SpMCN
and  GnMCN  representations  in  order  to  eliminate  a  large  number  of
irrelevant  file  representations prior to the more exacting group-by-group
and atom-by-atom comparisons. In order to achieve a high level of recall, a
Boolean  strategy  is  used  in  the  query screen logic expression whereby
special,  "diagnostic"  generic-group  screens are used as alternatives for
sets  of  specific-atom  screens  in  order  to  retrieve  file  answers in
situations  where  real-atom  structures  of  the SpMCN query structure are
implied  in  the  generic  portions  of the file SpMCNs that originate from
generic  features  of  the original Markush formulation and for which there
are no real-atom counterparts.
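
The Boolean screen strategy described above can be illustrated with a short Python sketch. It assumes that each file representation has been reduced to a set of screen labels and that the query logic is an AND of clauses, where each clause is satisfied either by a complete set of specific-atom screens or by one "diagnostic" generic-group screen. The screen labels and file names are invented for illustration and do not come from the patent.

```python
def passes_screens(file_screens, query_logic):
    """Evaluate the query screen logic against one file's screen set.

    query_logic is a list of (specific_screens, diagnostic_screen) pairs.
    All pairs must be satisfied (AND); a pair is satisfied when the file
    has every specific-atom screen OR the diagnostic generic-group screen.
    """
    return all(
        specific <= file_screens or diagnostic in file_screens
        for specific, diagnostic in query_logic
    )


# Hypothetical query: a six-membered carbon ring bearing a chlorine.
query_logic = [
    ({"ring:6C"}, "generic:carbocycle"),
    ({"atom:Cl", "bond:C-Cl"}, "generic:halogen"),
]

# Hypothetical file screen sets.
file_screens = {
    "F1": {"ring:6C", "atom:Cl", "bond:C-Cl"},   # chlorine given explicitly
    "F2": {"ring:6C", "generic:halogen"},        # only "halogen" is stated
    "F3": {"ring:6C", "atom:O", "bond:C-O"},     # no halogen at all
}

survivors = sorted(name for name, screens in file_screens.items()
                   if passes_screens(screens, query_logic))
print(survivors)
```

File F2 has no real-atom chlorine screens, but it is kept because its generic "halogen" screen is accepted as an alternative, which is what protects recall. File F3 is eliminated before any group-by-group or atom-by-atom comparison is attempted.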
 
                          DETAILED DESCRIPTION
 
  A simple Markush formulation is set forth in structure Ia of FIG. 1. This
formulation  consists of a fixed structure portion to which is attached the
variable  groups  R  sub 1 and R sub 2. As indicated in the text portion of
the  formulation,  R sub 1 may be chlorine (Cl) or bromine (Br) and R sub 2
may  be  ethyl  (CH  sub  3 CH sub 2) or methyl (CH sub 3). Implicit in the
Markush  formulation  is  the  representation  of  four distinct individual
compound representations, Ia1-Ia4, that are, in effect, all of the possible
individual  structures  resulting from the combinations of fragments in the
variable groups denoted by R sub 1 and R sub 2.
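
As an illustration of how the four individual compounds follow from the two variable groups, the combinations can be generated with a Cartesian product. This Python sketch treats the fragments as plain labels; the order in which the combinations are produced is arbitrary and is not claimed to correspond to the numbering Ia1-Ia4 in FIG. 1.

```python
from itertools import product

# Variable groups as stated in the text part of the Markush formulation.
variable_groups = {
    "R1": ["Cl", "Br"],
    "R2": ["CH3CH2", "CH3"],
}


def enumerate_compounds(groups):
    """List every combination of one fragment per variable group."""
    names = list(groups)
    return [dict(zip(names, combo))
            for combo in product(*(groups[n] for n in names))]


compounds = enumerate_compounds(variable_groups)
for compound in compounds:
    print(compound)
```

Two choices for R sub 1 times two choices for R sub 2 give the four implicit compounds. With more variable groups and more alternatives per group the count grows multiplicatively, which is why explicit enumeration quickly becomes impractical for broad formulations.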
 
  In  representation  Ia,  it  is  noted  that  carbon  (C) typically has a
connectivity (valence) of four, i.e., is capable of attaching or connecting
itself  to  four  other entities or to fewer than four other entities via a
multiple  bond  to  one  or  more of the entities. Specifically in the ring
system  of  representation Ia, each carbon is bound to a second carbon by a
multiple  (double)  bond, to a third carbon atom by a single bond, and to a
hydrogen atom (H) or to a variable group node (R sub 1,R sub 2) by a single
bond  to  give  the usual carbon valence of four. As is shown in structures
Ia1-Ia4,  it  is  common  practice in the chemical arts to omit the hydrogen
atoms  and  to  designate  the  alternating single and double bonds between
carbon  atoms  in the ring as a circle. The latter convention is felt to
represent  more  realistically a delocalized bonding situation in which each
carbon-carbon  link  in the ring is more like one and a half bonds. Except
where  noted,  these  common  conventions  will  be followed throughout the
remainder of the specification and drawings.
 
  Structure  Ib  of FIG. 1 is a multiple connectivity node (MCN) structure.
In  it,  all of the fragments belonging to the variable groups described in
the  text  part  of  the  Markush  formulation  have been attached to their
respective  nodes  or points of variability (as shown in the structure part
of  the  Markush  formulation  Ia)  giving rise to nodes of abnormally high
connectivity  and  hence  the multiple connectivity node (MCN) designation.
Since  Ib  represents  all  of the specific atoms identified in the Markush
formulation, it is designated as a specific-atom multiple connectivity node
(SpMCN)   representation.  It  should  be  noted  that  the  four  distinct
individual  compound  representations,  Ia1-Ia4,  are  also implicit in the
SpMCN.  These  individual  implicit  representations  are  referred  to  as
implicit specific-atom structural representations (ISSRs).
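
One way to picture the SpMCN is as a single stored object that holds every alternative at each node, with the ISSRs left implicit. The Python sketch below is a toy model under that assumption: the class name `SpMCN`, its fields, and the string label used for the fixed structure are hypothetical, and real connection tables would carry atoms and bonds rather than fragment labels.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class SpMCN:
    """Toy specific-atom multiple connectivity node representation."""
    core: str                                   # label for the fixed portion
    nodes: dict = field(default_factory=dict)   # node -> set of fragments

    def implicit_count(self):
        """Number of implicit structures, computed without listing them."""
        return prod(len(frags) for frags in self.nodes.values())

    def contains(self, compound):
        """True if the given single compound is one of the implicit ones."""
        return (compound.keys() == self.nodes.keys()
                and all(compound[n] in self.nodes[n] for n in self.nodes))


ib = SpMCN(core="ring",
           nodes={"R1": {"Cl", "Br"}, "R2": {"CH3CH2", "CH3"}})
print(ib.implicit_count())
print(ib.contains({"R1": "Br", "R2": "CH3"}))
print(ib.contains({"R1": "F", "R2": "CH3"}))
```

The object stores four fragments in total, yet it answers questions about all four implicit compounds, which mirrors the point that the individual representations are implicit in the SpMCN rather than stored separately.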
 
  By  using  common generic terminology, it is possible to simplify further
the  specific  multiple  connectivity  node structure (SpMCN). For example,
carbon  ring  structures containing only carbon atoms in the ring are often
given the generic description of carbocycles; linear chains of carbon atoms
are  generically  termed  alkyls; and chlorine and bromine are often called
halides.  Using this basic generic terminology, it is possible to transform
the  SpMCN  representation shown in Ib to the generic multiple connectivity
node  representation  (GnMCN)  shown  in Ic. In transforming the SpMCN to a
GnMCN,  the  bonding level between the generic groups is preserved. In this
particular  example,  only  a single bond exists between the carbocycle and
the  variable  groups.  If,  however,  a  multiple  bond exists between the
generic  representations,  such bonding is indicated in the GnMCN. Implicit
within  the  GnMCN representation are four implicit generic group structure
representations  (IGSRs),  Ic1-Ic4,  corresponding  to  the  four  distinct
compound  representations,  ISSRs, implicit in the original Markush and the
SpMCN.  It  is  critical  to  note  that  IGSRs  and  ISSRs  are  used  for
illustrative  purposes only. This invention does not anticipate the storage
of  all  ISSRs  and  IGSRs  associated with the respective SpMCN and GnMCN.
Rather  the invention is directed at the individual ISSRs and IGSRs as they
are   implicitly   contained   within  the  SpMCN  and  GnMCN.  The  actual
representation  and  processing  uses  only  the  explicit  SpMCN and GnMCN
representations.  The  ISSRs and IGSRs are used only as they are implicitly
found within the SpMCN and GnMCN representations.
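
The SpMCN-to-GnMCN transformation can be sketched as a lookup in a generic-group hierarchy that keeps the bonding level of each attachment. In this Python sketch the hierarchy is a flat dictionary, each attachment is a (fragment, bond order) pair, and the names `GENERIC_GROUP` and `to_gnmcn` are hypothetical. An actual hierarchy would have several levels and would operate on connection tables.

```python
# Hypothetical one-level generic-group hierarchy.
GENERIC_GROUP = {
    "Cl": "halide",
    "Br": "halide",
    "CH3": "alkyl",
    "CH3CH2": "alkyl",
}


def to_gnmcn(spmcn_nodes):
    """Replace each specific fragment by its generic group.

    spmcn_nodes maps a node to a list of (fragment, bond_order) pairs.
    The bond order between the fixed portion and each group is preserved.
    """
    return {
        node: [(GENERIC_GROUP[frag], bond) for frag, bond in attachments]
        for node, attachments in spmcn_nodes.items()
    }


# Toy version of representation Ib: every attachment is a single bond.
ib_nodes = {
    "R1": [("Cl", 1), ("Br", 1)],
    "R2": [("CH3CH2", 1), ("CH3", 1)],
}
ic_nodes = to_gnmcn(ib_nodes)
print(ic_nodes)
```

Each specific attachment yields one generic attachment, so the four implicit combinations of the SpMCN still have four generic counterparts, even though several of them are identical once expressed as halide and alkyl.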
 
  FIGS.  2,  2',  3,  3'  and  4  illustrate the use of the GnMCN and SpMCN
representations  in  file  searching  and  retrieval.  In  FIGS.  2 and 2',
representations  IIa-VIa are illustrative file Markush formulations as they
might  appear in patent documents, IIb-VIb are SpMCN representations of the
corresponding Markush formulation, and IIc-VIc are GnMCN representations of
the corresponding SpMCN representations. The query Markush formulation VIIa
is  also  shown as a SpMCN representation (VIIb) and a GnMCN representation
(VIIc). As shown in FIGS. 3 and 3', a file search is initiated by comparing
each  query  IGSR  (VIIc1  and  VIIc2)  with  each  file  IGSR  (IIc1-IIc5,
IIIc1-IIIc3,   IVc1-IVc3,   Vc1-Vc4,   and  VIc1-VIc4;  identical  implicit
structures  are  shown  only  once). As seen, query IGSR VIIc1 matches with
file  IGSRs IIc1-IIc5 and Vc4; VIIc2 matches with Vc2-Vc3 and VIc1-VIc4. At
this point, representations III and IV have been eliminated from the search
and,  as  shown  in  FIG.  4, matching now proceeds between the query ISSRs
VIIb1-VIIb2  and  the  file  ISSRs IIb1-IIb6, Vb1-Vb4, and VIb1-VIb4; query
ISSR  VIIb1  matches  only with file ISSR IIb1 and query ISSR VIIb2 matches
nothing,  specifically  illustrating that only one ISSR need match one file
ISSR  to  give  an  answer.  To  complete  the search, relevant information
associated with the Markush formulation IIa such as, but not limited to, an
abstract, patent number, or patent document is retrieved for the searcher.
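
  The two-level flow just described, generic screening followed by specific
matching, might be sketched in Python as below. The record layout, the
function name two_stage_search, and the placeholder structure labels are
hypothetical and do not reproduce the contents of FIGS. 2-4. The implicit
structures are listed explicitly only to keep the sketch short, and
matching is reduced to label equality.

```python
def two_stage_search(query, files):
    """Screen on generic structures first, then match specific structures."""
    # Stage 1: keep files sharing at least one generic structure with the query.
    survivors = [f for f in files if query["igsrs"] & f["igsrs"]]
    # Stage 2: a file is an answer if any query ISSR matches any file ISSR.
    return [f["id"] for f in survivors if query["issrs"] & f["issrs"]]

# Toy records; the labels are placeholders, not the figures' structures.
files = [
    {"id": "II", "igsrs": {"Cb-Ak", "Cb-Hal"}, "issrs": {"Ph-Me", "Ph-Cl"}},
    {"id": "III", "igsrs": {"Hc-Ak"}, "issrs": {"Py-Me"}},
    {"id": "V", "igsrs": {"Cb-Ak"}, "issrs": {"Ph-Et"}},
]
query = {"igsrs": {"Cb-Ak"}, "issrs": {"Ph-Me", "Ph-Pr"}}
answers = two_stage_search(query, files)
```

  In this toy data, record III is eliminated at the generic level, record V
survives the generic screen but has no matching specific structure, and
only record II is returned.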
 
  FIG. 5 illustrates the two types of matching criteria that a searcher may
use  in  carrying  out  a  search. Representation VIII is a single compound
representation  in  which  the ISSR is identical with the actual structure.
This structure matches exactly with the file representation X which is also
a single compound representation. The exact matching of all characteristics
of the query representation with those of the file representation is termed
"overlap" or full-structure search. Exact matching may be relaxed such that
the   query   representation   need  only  be  contained  within  the  file
representation.  Thus,  although query representation VIII does not exactly
match  or  "overlap"  the  single  file  representation XI, it is contained
within  representation  XI.  Such  containment  of the query representation
within  the  file  representation  is  termed  "embedment"  or substructure
search.  Systems  for  both  full-structure  and  substructure  search  are
commercially  available,  e.g.,  CAS  ONLINE:  The  Registry File, Chemical
Abstracts Service, Columbus, Ohio.
 
  Atom-by-atom  searching involves the comparison of a query structure with
a file structure using a path-tracing technique. Typically the path-tracing
technique  involves selecting a starting atom (node) of the query structure
(usually a noncarbon atom) and comparing it with the first atom of the file
structure. If the atoms do not match, the file structure is advanced to the
next atom (node) until a match with the starting query node is obtained. If
a match is obtained, the query proceeds to the next connected atom which is
compared  with the next connected atom of the file structure. If these next
atoms  do  not  match,  the  file  structure is backtracked to the original
matching  atom  and  another  connected  atom  is  selected for match. This
advancing/comparing/backtracking routine is continued until all atoms match
or all atom sequences of the query are exhausted. Overlap requires that all
atoms  of  the  query  match  with  all  atoms  of the file structure while
embedment  requires  that all of the atoms of the query be contained within
the  file  structure.  A  description  of atom-by-atom matching is given in
Lynch, M. F.; Harrison, J. M.; Town, W. G.; Ash, J. E. Computer Handling of
Chemical  Information,  MacDonald,  London  and American Elsevier Inc., New
York, 1971 at pp. 73-74, all of which is herein incorporated by reference.
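
  A minimal Python sketch of such an advancing/comparing/backtracking
routine is given below. It is an illustration under simplifying
assumptions, not the routine of the cited reference: structures are
hypothetical (labels, bonds) pairs, query atoms are placed in index order
rather than along a traced path, and bond orders and hydrogens are ignored.
The function name atoms_match and the overlap flag are likewise assumed.

```python
def atoms_match(query, target, overlap=False):
    """Backtracking atom-by-atom match.

    Each structure is (labels, bonds): a list of element symbols and a set
    of frozenset({i, j}) atom-index pairs. Bond orders are ignored here.
    """
    q_labels, q_bonds = query
    t_labels, t_bonds = target
    # Overlap additionally requires the same number of atoms and bonds.
    if overlap and (len(q_labels) != len(t_labels)
                    or len(q_bonds) != len(t_bonds)):
        return False

    def extend(mapping):
        if len(mapping) == len(q_labels):
            return True                      # every query atom is placed
        q = len(mapping)                     # next query atom to place
        for t, label in enumerate(t_labels):
            if label != q_labels[q] or t in mapping.values():
                continue
            # Bonds from q to already placed query atoms must exist in the file.
            if all(frozenset((t, mapping[p])) in t_bonds
                   for p in mapping if frozenset((q, p)) in q_bonds):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]               # backtrack, try another file atom
        return False

    return extend({})

c_o = (["C", "O"], {frozenset((0, 1))})            # C-O fragment
c_c_o = (["C", "C", "O"],
         {frozenset((0, 1)), frozenset((1, 2))})   # C-C-O chain
```

  With these toy structures, atoms_match(c_o, c_c_o) succeeds as an
embedment after one backtrack (the first carbon of the chain is not bonded
to oxygen), while the same call with overlap=True fails because the atom
counts differ.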
 
  It  is  an  object  of  this  invention  to  extend  both the overlap and
embedment  matching  concepts to Markush searching. Thus if the query SpMCN
IXa  search is limited to overlap only, the query ISSR IXa1 will match only
with  file  ISSR  XIIa2. If the matching criterion is relaxed to embedment,
ISSR  XIIIa1  is  also  a  valid  match.  It  is not necessary to limit the
searching  of  a  Markush  query  to a Markush file, e.g., the ISSRs of the
SpMCN  representation  also  can  be  compared  with both specific compound
representations such as X and XI and the ISSRs of the SpMCN representations
XIIa  and  XIIIa.  At the overlap level of search, query IXa retrieves file
representations  X  and  XIIa;  at  the  embedment  level  of  search, file
representations  X,  XI,  XIIa,  and  XIIIa  are retrieved. Single specific
compound  queries also can be searched against the Markush file, e.g., VIII
matches  with  XIIa1  (overlap)  and  with XIIIa1 (embedment). Although not
illustrated,  embedment  and  overlap criteria are also used at the generic
level   of  searching.  Thus  an  implicit  generic  query  representation,
alkyl-halide,   overlaps   an   implicit   generic   file   representation,
alkyl-halide,  and  is embedded in an implicit generic file representation,
carbocycle-alkyl-halide. Finally it is noted that the overlap criterion can
be  applied  to  the  entire  SpMCN  representation  itself.  Such  a match
condition  requires  all structural elements of the file SpMCN be identical
to  all structural elements of the query SpMCN, i.e., all ISSRs of the file
and   query   SpMCNs   must   be  identical.  Requiring  the  entire  SpMCN
representation IXa to match at the overlap level permits only the retrieval
of  file  SpMCNs that are identical to it, i.e., contain both IXa1 and IXa2
but   only   those  two  implicit  representations.  For  an  entire  SpMCN
representation   to   match   at   the  embedment  level,  the  file  SpMCN
representation must contain all ISSRs of the query representation.
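
  The three match conditions discussed in this paragraph can be expressed
compactly if each representation is viewed, for illustration only, as the
set of its implicit structures. In the Python sketch below the function
names and the opaque string labels are assumptions, and identical implicit
structures are assumed to carry identical labels.

```python
def issr_level_match(query_issrs, file_issrs):
    """Ordinary search: one query ISSR matching one file ISSR suffices."""
    return bool(query_issrs & file_issrs)

def whole_overlap(query_issrs, file_issrs):
    """Entire-representation overlap: the two ISSR sets must be identical."""
    return query_issrs == file_issrs

def whole_embedment(query_issrs, file_issrs):
    """Entire-representation embedment: the file holds every query ISSR."""
    return query_issrs <= file_issrs

query_ixa = {"IXa1", "IXa2"}
file_same = {"IXa1", "IXa2"}
file_larger = {"IXa1", "IXa2", "other"}   # hypothetical extra structure
file_partial = {"IXa1"}
```

  Here file_larger satisfies whole embedment but not whole overlap, and
file_partial satisfies only the ordinary ISSR-level match.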
 
  In order to convert SpMCN representations to GnMCN representations, it is
highly  desirable  to have a classification scheme that uses a small number
of  controlled-vocabulary  hierarchical terms that permit classification of
all  groups  of atoms likely to be encountered in a specific substance or a
Markush  formulation.  FIG. 6 illustrates such a classification scheme. The
overall  structure  of  the classification scheme consists of breaking each
less-specific  group into two mutually exclusive, more specific groups. The
general  group "G" is used to handle groups of atoms that cannot be easily
associated   with   a   more   specific   group  classification,  e.g.,  an
electron-withdrawing  group,  a group containing nitrogen, etc. The G group
is  classified further into two mutually exclusive groups: any cyclic group
(Cy)  or  any acyclic group (Ay). The cyclic group (Cy) is broken down into
any  carbocycle  group  (Cb)  or any heterocycle group (Hc). The carbocycle
group  (Cb)  characterizes any ring system containing only carbon atoms and
any  attached  hydrogen  atoms.  The  Cb group may be attached to any other
group,  including itself, or it may stand alone. The heterocycle group (Hc)
characterizes  any  ring  system  containing one or more hetero (noncarbon)
atoms and any attached hydrogen atoms. Similar to Cb, Hc may be attached to
any  group, including Hc, or it may stand alone. A fused ring system, i.e.,
two or more rings joined at two or more atoms on each ring with each other,
is  considered  a  single  group while two rings joined to each other by an
acyclic bond are considered as two groups. Thus a naphthalene ring system is
designated  as Cb while a biphenyl system would be characterized as Cb--Cb.
A  quinoline  ring  system,  which  consists  of  a  carbocycle  fused to a
heterocycle, is considered as a single heterocycle group, Hc.
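
  The ring rules above can be illustrated with a short Python sketch. It
assumes, hypothetically, that ring perception has already been done and
that each fused ring system arrives as one list of its ring-atom element
symbols; the function names are likewise assumptions.

```python
def classify_ring_system(ring_atoms):
    """Return 'Cb' if every ring atom is carbon, otherwise 'Hc'."""
    return "Cb" if all(atom == "C" for atom in ring_atoms) else "Hc"

def classify_ring_systems(systems):
    """Classify each separately bonded ring system; fused rings count once."""
    return "--".join(classify_ring_system(s) for s in systems)

naphthalene = [["C"] * 10]            # one fused system of ten carbons
biphenyl = [["C"] * 6, ["C"] * 6]     # two rings joined by an acyclic bond
quinoline = [["C"] * 9 + ["N"]]       # carbocycle fused to a heterocycle
```

  Under these assumptions naphthalene is classified as Cb, biphenyl as
Cb--Cb, and quinoline as Hc, in agreement with the examples in the text.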
 
  Moving  to  the  acyclic side of the hierarchy, the acyclic group (Ay) is
broken  down  into  any  acyclic  carbon  (chain) group (Ch) or any acyclic
noncarbon  (functional)  group  (Fg).  The  acyclic noncarbon group (Fg) is
further broken down into any acyclic noncarbon connecting group (Fc) or any
acyclic   noncarbon  terminal  group  (Ft).  The  terminal  group  (Ft)  is
characterized  as  a single atom that is neither carbon nor hydrogen but may
be  attached  to  one  or  more  hydrogens.  The  Ft  group may stand alone
(unattached to any other group), e.g. NH sub 3, H sub 2 O, Cu, or it may be
attached  to  one and only one other group where the other group may be any
other  group  including  Ft  except that the Ft group cannot be bound to an
alkyl  group  (Ak)  by a multiple bond since, by definition, an alkyl group
bound  to a Ft group by a multiple bond is a Cg group. Thus C sub 6 H sub 5
--NH  sub  2  transforms  to Cb--Ft while an aldehyde such as CH sub 3 --CH
double  bond  O  transforms to Cg double bond Ft and not Ak double bond Ft.
See infra Cg and Ak. The acyclic noncarbon connecting group (Fc) is defined
as  a single atom that is neither carbon nor hydrogen but may be attached to
one  or  more  hydrogens  and  must be attached to two or more other groups
including  itself,  e.g.,  phenyl-O-phenyl  is  expressed as Cb--Fc--Cb. By
definition, Fc may not stand by itself or be attached to only one other group.
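
  The portion of the hierarchy described so far, and the distinction between
Ft and Fc, can be sketched in Python as follows. The PARENT table encodes
only the parent/child relations stated above; the function names and the
convention of counting attached groups (excluding hydrogens) are
assumptions made for illustration.

```python
# Child -> parent relations for the groups introduced up to this point.
PARENT = {
    "Cy": "G", "Ay": "G",
    "Cb": "Cy", "Hc": "Cy",
    "Ch": "Ay", "Fg": "Ay",
    "Fc": "Fg", "Ft": "Fg",
}

def is_a(group, ancestor):
    """True if group equals ancestor or lies below it in the hierarchy."""
    while group is not None:
        if group == ancestor:
            return True
        group = PARENT.get(group)
    return False

def classify_functional_atom(n_attached_groups):
    """Single noncarbon, nonhydrogen acyclic atom: Fc if it joins two or
    more groups, otherwise Ft (attached to one group or standing alone)."""
    return "Fc" if n_attached_groups >= 2 else "Ft"

ether_oxygen = classify_functional_atom(2)     # phenyl-O-phenyl
amine_nitrogen = classify_functional_atom(1)   # C sub 6 H sub 5 --NH sub 2
water_oxygen = classify_functional_atom(0)     # H sub 2 O standing alone
```

  A lookup such as is_a("Ft", "Ay") walks upward through Fg to Ay, which is
how a more specific file group could be compared with a less specific
query term.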
 
  The  acyclic  carbon  group  (Ch)  is further broken down into an acyclic
carbon group (Cg) attached to an acyclic noncarbon terminal group (Ft) by a
multiple bond, or any other acyclic carbon group (Ak) not defined as Cg. By
definition,  the  Cg  group  cannot  stand alone. It must be attached to at
least  one Ft group by a multiple bond and it may also be attached to other
groups, except Ak or Cg. The Ak group consists of a group of acyclic carbon
atoms  and  any  attached  hydrogen  atoms  that  may stand alone or may be
attached  to  any group, except Cg or Ak. When a Cg is attached to an Ak or
another  Cg or when an Ak is attached to a Cg or another Ak, the two groups
merge into the appropriate single group, e.g., Ft double bond Cg--Cg double
bond  Ft  becomes  Ft  double bond Cg double bond Ft, Ft double bond Cg--Ak
becomes Ft double bond Cg, and Ak--Ak becomes Ak. CH sub 3 --CH double bond
O  becomes Cg double bond Ft; CH sub 3 --OH becomes Ak--Ft. The compound CH
sub  3  --CH  sub  3 is not represented Ak--Ak but rather as simply Ak. The