Allison Tipton

I590

May 5, 2005

Using the Lawson Number when Searching Beilstein

            This paper will explore the derivation and use of the Lawson Number in the Beilstein database.  The Lawson Number is part of every record in the online Beilstein database, and it tells the user something about the structure of the molecule.  The Lawson Number is derived from the Beilstein system and was first implemented when Beilstein went online in 1987.  (Lawson, 1990).  Lawson Numbers were originally used in the program SANDRA, which took a structure and predicted its location in the Beilstein Handbook.  SANDRA linked a compound to a system number and a location, which allowed Beilstein users to find the compound easily.  The Lawson Number was stored as a 2-byte code.  A 2-byte code can hold values from 0 to 65535, but the assigned values stop at 32759.  (Wiggins and Coca, 2004).  In 1990, Lawson planned a second phase of the implementation that would have extended the numbers up to 40951, but it still has not happened.  The second phase was to include shape discriminators as well as numbers for large and complex rings, and rings containing heteroatoms other than oxygen, sulfur, or nitrogen would also have been assigned Lawson Numbers.  (Wiggins and Coca, 2004).
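
Because the Lawson Number is stored as a 2-byte code, the ranges above can be checked with simple arithmetic.  The Python sketch below packs an LN into two bytes and unpacks it again; the big-endian layout and the function names are assumptions made for illustration, not Beilstein's actual storage format.

```python
import struct

MAX_2_BYTE = 2**16 - 1     # 65535: largest unsigned 2-byte value
MAX_ASSIGNED_LN = 32759    # highest LN actually assigned (Wiggins and Coca, 2004)
PLANNED_MAX_LN = 40951     # ceiling of Lawson's unrealized second phase

def pack_ln(ln: int) -> bytes:
    """Pack a Lawson Number into 2 bytes (hypothetical big-endian layout)."""
    if not 0 <= ln <= MAX_2_BYTE:
        raise ValueError(f"{ln} does not fit in 2 bytes")
    return struct.pack(">H", ln)

def unpack_ln(raw: bytes) -> int:
    """Recover the Lawson Number from its 2-byte code."""
    (ln,) = struct.unpack(">H", raw)
    return ln

print(unpack_ln(pack_ln(MAX_ASSIGNED_LN)))  # 32759
print(PLANNED_MAX_LN <= MAX_2_BYTE)         # True: the extension also fits
print(MAX_2_BYTE - MAX_ASSIGNED_LN)         # 32776 values above the highest assigned LN
```

Both the assigned range and the planned extension fit comfortably in two bytes, leaving a large part of the 0-65535 space unused.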

            The Lawson Number shares several characteristics with its counterpart, the Beilstein system number.  The first is an overlap in number ranges, which can make the meanings of the numbers ambiguous.  This overlap was carried over from the printed Beilstein volumes, where one system number would end and another would start on the same page.  The two sets of numbers are also defined by the same rules, but while Beilstein never published the meanings of the system numbers, Lawson did provide meanings for the Lawson Numbers.  (Wiggins and Coca, 2004).  Those meanings are linked to their numbers in Coca’s database.  Lawson Numbers represent certain fragments and can be used to find compounds with similar structural elements.  They are not substructures, however.  They are also not unique identifiers, and unlike Beilstein system numbers, a compound often has more than one Lawson Number.  Lawson states that 25.1% of compounds have one Lawson Number, 39.4% have two, 24.0% have three, 8.5% have four, and 3.0% have more than four.  (1990).
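
Lawson's figures can be checked with exact decimal arithmetic: they sum to 100%, and about three quarters of compounds carry more than one Lawson Number.  The dictionary keys in this sketch are labels chosen for illustration.

```python
from decimal import Decimal

# Share of compounds by number of Lawson Numbers, in percent (Lawson, 1990)
ln_count_share = {
    "1": Decimal("25.1"),
    "2": Decimal("39.4"),
    "3": Decimal("24.0"),
    "4": Decimal("8.5"),
    "more than 4": Decimal("3.0"),
}

total = sum(ln_count_share.values())
more_than_one = total - ln_count_share["1"]

print(total)          # 100.0
print(more_than_one)  # 74.9
```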

            The definitions Lawson provided for the Lawson Numbers can be misleading if the user does not know much about the Beilstein system from which the numbers are derived.  The system uses the structure of the compound to assign a number.  The first decision Beilstein makes is whether a compound is a principal compound or a derivative.  The first rule is that a principal compound contains no heteroatoms (atoms other than carbon or hydrogen) except free ring atoms in covalent rings, meaning atoms bonded only to other ring atoms or to hydrogen, and atoms in free functional groups, which are defined below.  The second rule is that oxygen is the only chalcogen allowed in principal compounds.  A principal compound must also contain no radicals and only natural isotopes.  (Weissbach, 1976).  For Lawson Numbers, multiple bonds and branching are also taken into account.

            The next set of rules concerns functional groups.  The first are the non-functional substituents.  These are exterior groups: the halogens and the nitro, nitroso, and azido groups.  Compounds with non-functional substituents are always considered derivatives.  The next set is the free functional groups, which include the oxygen-containing groups and the further functions.  The first oxygen group is the hydroxyl group (-OH).  The second is oxo (=O), more commonly known as carbonyl.  Its hydrate, C(OH)OH, is treated the same as an oxo group by Beilstein standards.  The third is the carboxyl group (-C(=O)OH).  The orthoacid group (-C(OH)(OH)OH) is the hydrate of the carboxyl group, and Beilstein treats it as a carboxyl group.  (Weissbach, 1976).
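
The hydrate rules are consistent with counting carbon-oxygen bonds, the quantity Weissbach calls multiplicity: each hydrate has the same number of C-O bonds as the group it is treated as.  The sketch below, with hypothetical names, classifies a free oxygen function from the bond orders of its C-O bonds.  It is a simplification that considers only free OH and =O groups on a single carbon; ring and ether oxygens are out of scope.

```python
from __future__ import annotations

# Multiplicity = number of carbon-oxygen bonds on one carbon (bond orders summed)
FUNCTION_BY_MULTIPLICITY = {1: "hydroxyl", 2: "oxo", 3: "carboxyl"}

def oxygen_function(co_bond_orders: list[int]) -> str:
    """Classify a free oxygen function from the orders of its C-O bonds."""
    multiplicity = sum(co_bond_orders)
    if multiplicity not in FUNCTION_BY_MULTIPLICITY:
        raise ValueError(f"unsupported multiplicity {multiplicity}")
    return FUNCTION_BY_MULTIPLICITY[multiplicity]

groups = {
    "-OH": [1],
    ">C=O": [2],
    ">C(OH)OH (oxo hydrate)": [1, 1],
    "-C(=O)OH": [2, 1],
    "-C(OH)(OH)OH (orthoacid)": [1, 1, 1],
}
for name, orders in groups.items():
    print(f"{name:26} -> {oxygen_function(orders)}")
```

Both hydrates land in the same class as their parent group, and the classes come out in the order hydroxyl, oxo, carboxyl.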

            The further functions are other functional groups with the formula –Z(O)mHn or –ZxHn.  The subscripts “m” and “n” range from zero upward, depending on Z’s valence.  In the second formula, “x” is two or greater, and “n” depends on “x” as well as on Z’s valence.  There are no oxygen-oxygen bonds in the further functions, and Z is the atom that bonds to the main carbon framework.  Z cannot be a halogen or a bivalent chalcogen, and the C-Z bond must be covalent.  (Weissbach, 1976).  Examples of further functions are –SO2H, –SO3H, and –NH2.  For all free functional groups, the carbon the group is bonded to must not have any further non-ring bonds to other non-metals when Z is also a non-metal.  (Weissbach, 1976).

            Once a compound has been determined to be either a principal compound or a derivative, it is classified.  The first classification is its ring class.  The three ring classes are acyclic, isocyclic (or carbocyclic), and heterocyclic.  (Weissbach, 1976).  Acyclic compounds contain no rings in their structures, although they can have derivatives that do contain rings.  D-Glucose is an example of a compound classified as acyclic even though it contains a ring.

            Isocyclic compounds contain at least one ring in their structure, but that ring must consist of carbon atoms only.  They can contain any number of acyclic fragments, and any heteroatoms they contain cannot be ring-members.  (Weissbach, 1976).  The fragment can have one ring or a large system of rings. 

            The heterocyclic compounds also have rings, but they may contain heteroatoms as ring members.  Like the isocyclic compounds, the heterocyclic compounds can have one ring or many, and they can include acyclic and isocyclic fragments in addition to their heterocyclic fragments.  Heterocyclic compounds are further classified by the types and numbers of heteroatoms they contain.  Each heteroatom is ranked according to its position in the periodic table.  Oxygen is ranked first, followed by the other chalcogens.  The nitrogen group, from lightest to heaviest, is next.  The classification then moves to the left and down each column, with the heavy transition metals last.  (Weissbach, 1976).  This is why oxygen appears first in the heterocyclic Lawson Numbers, followed by nitrogen and then other atoms.  Lawson Numbers do not specify ring atoms other than oxygen or nitrogen.
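
The heteroatom ordering described above can be expressed as a sort key: start with group 16, move one group to the left at a time, and within each group go from top to bottom.  This sketch covers only a handful of main-group elements in groups 16 to 13; the element table and function name are illustrative assumptions, and the full Beilstein list, including the metals, is not reproduced here.

```python
from __future__ import annotations

# (group, period) for a few ring heteroatoms; illustrative, not complete
POSITION = {
    "O": (16, 2), "S": (16, 3), "Se": (16, 4), "Te": (16, 5),
    "N": (15, 2), "P": (15, 3), "As": (15, 4), "Sb": (15, 5), "Bi": (15, 6),
    "Si": (14, 3), "Ge": (14, 4), "Sn": (14, 5), "Pb": (14, 6),
    "B": (13, 2), "Al": (13, 3),
}

def heteroatom_rank(symbol: str) -> tuple[int, int]:
    """Sort key: higher group first (moving left), then lighter element first."""
    group, period = POSITION[symbol]
    return (-group, period)

ring_atoms = ["N", "S", "B", "O", "P", "Si"]
print(sorted(ring_atoms, key=heteroatom_rank))
# ['O', 'S', 'N', 'P', 'Si', 'B']
```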

            The next classification of principal compounds concerns their functional groups.  The functional groups are first ranked by the coupling atom, using the same order as for the heterocyclic ring classes, so oxygen comes first among functional group coupling atoms.  The three oxygen groups are then ranked by multiplicity, which is the number of carbon-oxygen bonds.  That puts hydroxyl first, followed by oxo and then carboxyl.  The further functions are ranked next, according to their coupling atom Z.  They are ranked by multiplicity index, which is one for –Z(O)mHn and “x” for –ZxHn, with the lowest numbers first.  They are then sub-ranked by the ligation index for –Z(O)mHn, or by the unsaturation index for –ZxHn.  The ligation index is calculated by adding the number of hydrogens bonded to Z, the number of OH groups bonded to Z, and twice the number of lone oxygens bonded to Z.  The lone oxygens are doubled because they are double-bonded to Z.  The unsaturation index is xv - 2x - n + 1, where “x” and “n” are the subscripts in the formula and “v” is Z’s valency.  The next classification is the oxygen index, which is the ligation index minus the number of hydrogens bonded to Z; this leaves the number of bonds to oxygen.  The last index for –ZxHn is the isomerism index, which refers to the longest unbranched chain of Z atoms.  (Weissbach, 1976).  These rules are difficult to understand, but in practice they mean that the number of oxygens is the next classification of functional groups, followed by branching.  That means –SO2H comes before –SO3H, and –NH2 comes before –NHOH.  This also puts –NH-NH-NH2 before –N(NH2)2.  The only exception is –PH4 coming after –P(OH)2, but that is because trivalent phosphorus is ranked before pentavalent phosphorus.  (Weissbach, 1976).  The final classification of principal compounds depends on the number of each functional group present.
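
The ligation, oxygen, and unsaturation indices are simple sums, so they can be computed directly.  The sketch below uses hypothetical class and function names and assumes that, within one coupling atom Z, further functions are ordered by ligation index and then by oxygen index, lowest first.  It writes –SO2H as –S(=O)OH and –SO3H as –S(=O)2OH, so neither has a hydrogen bonded directly to sulfur.

```python
from dataclasses import dataclass

@dataclass
class FurtherFunction:
    """A -Z(O)mHn group described by what is bonded to the coupling atom Z."""
    formula: str
    h_on_z: int   # hydrogens bonded directly to Z
    oh_on_z: int  # OH groups bonded to Z
    lone_o: int   # lone (double-bonded) oxygens on Z

    def ligation_index(self) -> int:
        # Lone oxygens count twice because they are double-bonded to Z
        return self.h_on_z + self.oh_on_z + 2 * self.lone_o

    def oxygen_index(self) -> int:
        # Ligation index minus hydrogens on Z leaves the bonds from Z to oxygen
        return self.ligation_index() - self.h_on_z

def rank_key(f: FurtherFunction) -> tuple:
    # Assumed ordering within one coupling atom: lowest values first
    return (f.ligation_index(), f.oxygen_index())

def unsaturation_index(x: int, n: int, v: int) -> int:
    """Unsaturation index xv - 2x - n + 1 of a -ZxHn group (Z of valency v)."""
    return x * v - 2 * x - n + 1

# Groups are only compared with others that share the same coupling atom Z
sulfur = [FurtherFunction("-SO3H", 0, 1, 2), FurtherFunction("-SO2H", 0, 1, 1)]
nitrogen = [FurtherFunction("-NHOH", 1, 1, 0), FurtherFunction("-NH2", 2, 0, 0)]
phosphorus = [FurtherFunction("-PH4", 4, 0, 0), FurtherFunction("-P(OH)2", 0, 2, 0)]

for group in (sulfur, nitrogen, phosphorus):
    print([f.formula for f in sorted(group, key=rank_key)])
# ['-SO2H', '-SO3H']
# ['-NH2', '-NHOH']
# ['-P(OH)2', '-PH4']

print(unsaturation_index(x=2, n=3, v=3))  # -NH-NH2: 0
print(unsaturation_index(x=2, n=1, v=3))  # -N=NH: 2
print(unsaturation_index(x=3, n=4, v=3))  # -NH-NH-NH2 and -N(NH2)2 both: 0
```

With this encoding, the ligation index alone (2 versus 4) already places –P(OH)2 ahead of –PH4, which agrees with the order given above.  –NH-NH-NH2 and –N(NH2)2 tie on the unsaturation index, so the isomerism index has to separate them.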

            The next set of rules in the Beilstein system concerns derivatives.  These rules are more important for understanding the Lawson Number than the rules for principal compounds.  While Lawson’s list of Lawson Number definitions makes sense for principal compounds, interpreting derivatives requires an actual understanding of how the Beilstein system classifies them.  The first derivatives Beilstein deals with are the chalcogen analogs, in which a bivalent chalcogen atom is treated as if it were an oxygen atom.  These compounds are examples.

Figure 1:  Chalcogen Analogs

The next rule concerns non-functional substituents.  These are treated like hydrogens unless the carbon they are bonded to is also bonded to another functional group.  That means chloromethane is considered a derivative of methane, while chloromethanol is not; chloromethanol is derived in Figure 4.  The next derivatives are formed by condensation, meaning the derivative is the product of a reaction that gives off water as the only other product.  Examples are esters, amides, and acid anhydrides.  The next derivatives are obtained by valency-changing addition.  Only halogens, bivalent chalcogens that are anular or bonded to two carbons, and nitrogens that are anular or bonded to at least two carbons can have radicals added to them.  The following is an example of valency-changing addition, in which the sulfur is changed from bivalent to hexavalent.

Figure 2:  Valency changing addition.

The next derivatives are charged compounds, created by adding carbocations or carbanions to compounds.  These are added to heteroatoms that have free electron pairs (for carbocations) or an electron deficit (for carbanions).  The next group of derivatives is created when hydrogen atoms are removed, leaving behind a free radical.  The last set of derivatives is the salts, which include oxonium ions, alkoxides, hydroxides, acetylene derivatives, and organometallic salts.  (Weissbach, 1976).

            Beilstein and Lawson differ in how they assign numbers.  Lawson gives a number to every significant fragment, while Beilstein gives a system number only to the main fragment.  The main fragment is the fragment whose registry compound appears last in the file.  To assign a place in its registry, Beilstein deconstructs the compound.  First, salts, radicals, and charged compounds are deconstructed by reversing the rules that form those derivatives.  The next deconstruction involves interior heteroatoms, which are heteroatoms that are neither ring members nor terminal atoms.  At those points, the molecule is subjected to hydrolysis, and there are many rules governing how bonds are hydrolyzed.  For a compound that has an anular heteroatom bonded to another heteroatom, the bond is broken by capping the anular heteroatom with hydrogen and the non-anular heteroatom with a hydroxyl group.  (Weissbach, 1976).

Figure 3:  Hydrolysis between two heteroatoms.

Heteroatoms that are multiple-bonded to carbon are treated as if they were derived from carbonyl groups; for example, imines are treated as derived from ketones, and nitriles as derived from carboxylic acids.  Carbon atoms bonded to more than one non-metal heteroatom are also broken down, unless all of those bonds are to non-functional substituents.  For example, chloromethanol has a non-functional substituent (chlorine) on the same carbon as a hydroxyl group, so it is treated as an oxo derivative.

Figure 4:  A carbonyl derivative

 

Bonds are also cleaved between a bivalent chalcogen atom and another heteroatom.  Examples are peroxides, which are cleaved between the two oxygens, and –ONH2 compounds, which are cleaved between the oxygen and the nitrogen.  Carbon-bound chains of heteroatoms are also cleaved from other, different heteroatoms.  (Weissbach, 1976).  Weissbach states that Beilstein follows a priority list to decide which heteroatom is capped with hydrogen and which is capped with hydroxyl.  The list is long and is based on position in the periodic table, starting from the top right, with smaller valencies taking priority.  After all the relevant bonds in a molecule have been hydrolyzed, the non-functional substituents are replaced by hydrogen, and bivalent chalcogens are replaced by oxygen.  (Weissbach, 1976).  Because of the hydrolysis rule, some compounds that appear to be cyclic get acyclic numbers.  The compound in Figure 5 is an example.  It is cleaved between the phosphorus and each of the two ring oxygens, which leaves an acyclic fragment behind.  Since the phosphate group is an inorganic fragment, it does not get a system number or a Lawson Number.

Figure 5:  A cyclic derivative of acyclic compounds

  

Self-condensing compounds such as sugars also appear as rings but are really acyclic.  The principal compound for the derivative in Figure 6 can be found by breaking the bond marked with an arrow.  Carbon 1 (at the 2 o'clock position) will then carry an oxo group, and the ring oxygen will be capped with a hydrogen.

Figure 6:  Glucose, another cyclic derivative of an acyclic compound.

  

This is where glucose gets its Lawson Number, 1122 (hydroxyl-oxo with six oxygens in total).

Once the compound has been deconstructed, the chemically significant fragments receive Lawson Numbers.  If a compound has more than one Lawson Number, it is known to be a derivative.  This differs from the Beilstein system, in which only one number is assigned.  That number, as well as the compound’s place in the printed Beilstein, comes from the fragment with the highest number.  Therefore, if a compound has acyclic, isocyclic, and heterocyclic fragments, it is placed with the heterocyclic compounds.

Figure 7: A compound with three significant fragments. 

For example, the compound in Figure 7 has three Lawson Numbers.  The oxygens marked with arrows are where bonds are cleaved, and each of the three resulting fragments gets a Lawson Number: 1771 (hydroxyl-acid, middle fragment), 298 (hydroxyl, right fragment), and 289 (hydroxyl, left fragment).  Fragment 1771 is the main fragment and is the one used to find this compound in the printed Beilstein.  Another example, isatin, has only one Lawson Number, 25776, which means a heterocyclic compound with one anular nitrogen and two oxo groups attached at other points in the ring system.  Isatin is a principal compound, and the system number with the same definition as LN 25776 is the locator for isatin in Beilstein.
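
The two rules used here, that more than one Lawson Number marks a derivative and that the highest-numbered fragment is the main fragment, can be sketched in a few lines.  The class and method names are hypothetical; the LNs are the ones quoted above for Figure 7 and isatin.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Compound:
    name: str
    lawson_numbers: list[int]  # one LN per chemically significant fragment

    def is_known_derivative(self) -> bool:
        # More than one LN means a derivative; a single LN alone does not
        # show that the compound is a principal compound.
        return len(self.lawson_numbers) > 1

    def main_fragment_ln(self) -> int:
        # The fragment with the highest number decides the printed location
        return max(self.lawson_numbers)

figure_7 = Compound("Figure 7 compound", [1771, 298, 289])
isatin = Compound("isatin", [25776])

for c in (figure_7, isatin):
    print(c.name, c.main_fragment_ln(), c.is_known_derivative())
# Figure 7 compound 1771 True
# isatin 25776 False
```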

            Lawson Numbers are most effective when combined with other search terms.  For example, they can be combined with molecular formula searches or used to exclude results from them.  The following query returned 108 substances.

Figure 8:  A Beilstein query with LN’s

The first Lawson Number range requires that the oxygen in the formula be part of an acyclic fragment.  That alone does not remove every answer that contains rings, however.  The second Lawson Number in the search is the largest Lawson Number an acyclic fragment can have, and it was entered so that no compound retrieved can have a Lawson Number larger than that.  This blocks fragments containing isocyclic three- and four-membered rings.

            The Lawson Number can also be combined with the number of elements.  This is especially useful because of the overlap in Lawson Number ranges.  For example, 3704 falls in both the hydrazine + acid range and the diazene/azo range.  Since compounds in the first range contain four elements and compounds in the second contain only three, adding an element count can limit the hitset to one of those ranges.
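
Counting elements needs only the element symbols in the molecular formula.  The sketch below uses a regular expression that handles simple formulas without parentheses or charges, and applies it to two hypothetical records; the record layout and IDs are assumptions, not a CrossFire export format.

```python
import re

def element_count(formula: str) -> int:
    """Number of distinct elements in a simple formula such as 'C2H6N2O'."""
    # An element symbol is a capital letter optionally followed by a lowercase one
    return len(set(re.findall(r"[A-Z][a-z]?", formula)))

# Hypothetical records that both carry LN 3704
records = [
    {"id": "A", "formula": "C2H6N2", "lns": [3704]},
    {"id": "B", "formula": "C2H6N2O", "lns": [3704]},
]

# Keep only the three-element (diazene/azo) reading of LN 3704
azo_like = [r["id"] for r in records
            if 3704 in r["lns"] and element_count(r["formula"]) == 3]
print(azo_like)  # ['A']
```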

            Lawson Numbers can also be combined with chemical names or name fragments.  The phosphoric acid derivative in Figure 5 was found by combining “phosphoric” as a name segment with the Lawson Number range 512-639.  Another query, combining Lawson Number 3632 with the name segment “hydrazine”, found five substances.  Name fragment searches like these work best when the user has the Lawson Number definitions at hand.

            The last use of the Lawson Number is in substructure searches.  A Lawson Number range can be combined with a substructure using the “not” operator.  Since Lawson Numbers represent structural fragments, this can keep users from retrieving structures they do not want.  In the following search, two sets of Lawson Numbers are removed.  The first removal is of straight-chain alkyl groups as substituents on the marked oxygen.  The second removal is of all Lawson Numbers that correspond to rings, and it was done on two lines.  The first line of the search was added because “not” searches cannot be done on the first line of CrossFire’s search grid, even if there is a structure in the structure query box.  To make the search work, a “less than or equal to” search is combined with a “not greater than” search.  The “less than or equal to” search returns hits if at least one Lawson Number meets the criterion, so to block a range of Lawson Numbers completely, a “not” search has to be used.

Figure 9:  A substructure search with Lawson Numbers

 

This search was done in stages, and the third line of the search makes a very large difference: before it was added, there were over 17,000 hits, and afterward the hits dropped to about 3,100.  In a search like this, the Lawson Number is used to refine hitsets.
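
The difference between a “less than or equal to” search and a “not greater than” search is the difference between “at least one Lawson Number qualifies” and “no Lawson Number is excluded”.  The sketch below models the two behaviours described above as Python predicates over a hypothetical hitset; the cut-off value and compound IDs are invented for illustration, and the code does not reproduce CrossFire itself.

```python
from __future__ import annotations

def any_le(lns: list[int], cap: int) -> bool:
    """'<= cap' behaviour: a hit if at least one LN meets the criterion."""
    return any(ln <= cap for ln in lns)

def not_any_gt(lns: list[int], cap: int) -> bool:
    """'NOT > cap' behaviour: rejected if any LN exceeds the cap."""
    return not any(ln > cap for ln in lns)

CAP = 2000  # hypothetical cut-off, not an actual Lawson range boundary

hits = {"X": [1771, 298], "Y": [1771, 25776]}  # hypothetical hitset

print([h for h, lns in hits.items() if any_le(lns, CAP)])      # ['X', 'Y']
print([h for h, lns in hits.items() if not_any_gt(lns, CAP)])  # ['X']
```

Compound Y slips through the first search because one of its Lawson Numbers is under the cut-off, even though it also has a heterocyclic fragment; only the “not” form removes it.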

            Overall, the Lawson Number is a useful tool, but it is easier to use with a list of definitions.  With the definitions, basic searches with Lawson Numbers can be run.  To understand the Lawson Number completely, a basic understanding of Beilstein’s system is also needed, since the list of definitions is based on unmodified functional groups.  With both the list of definitions and a basic understanding of Beilstein’s system, the Lawson Number can be a very powerful searching tool.

 

References

Lawson, Alexander J.  The Lawson Number (LN): Offline Generation and Online Use. In The Beilstein Online Database: Implementation, Content and Retrieval; Stephen R. Heller, Ed.; ACS Symposium Series 436; American Chemical Society: Washington, DC, 1990; pp 143-155.

Wiggins, Gary; Coca, Usha.  Maximizing the Use of the Lawson Number in Beilstein Searching.  Presented at ACS CERM, Indianapolis, IN, June 2-4, 2004, http://www.indiana.edu/~cheminfo/gw/Lawson_Number.ppt.

Weissbach, Oskar.  The Beilstein Guide: A Manual for the Use of Beilstein Handbuch der Organischen Chemie; Springer-Verlag: Berlin, 1976.