I590
May 5, 2005
This
paper will explore the derivation and use of the Lawson Number in the Beilstein
database. The Lawson Number is a part of
all records in the online Beilstein database.
It is a number that tells the user something about the structure of the
molecule. The Lawson Number is derived
from the Beilstein system, and was first implemented when Beilstein went online
in 1987. (Lawson, 1990). The Lawson Numbers were originally used in
the program SANDRA, which took a structure and predicted a location for it in
the Beilstein Handbook. It linked a
system number to a compound and a location, which allowed the compound to easily
be located by Beilstein users. The
Lawson Number was the 2-byte code that was stored. This means it could have a value from
0-65535, but the actual values stop at 32759.
(Wiggins and Coca, 2004). Lawson
intended a second phase of the implementation in 1990 that would have extended
the numbers up to 40951, but this has not yet happened. It was to include shape discriminators as
well as numbers for large and complex rings.
Also, rings with other heteroatoms besides oxygen, sulfur, or nitrogen
would have gotten Lawson Numbers assigned.
(Wiggins and Coca, 2004).
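The arithmetic behind these ranges can be checked with a short Python sketch. It assumes the two bytes are read as an unsigned 16-bit integer; the sources cited here do not state the signedness or byte order, so that reading and the variable names are illustrative only.

```python
# Capacity of a 2-byte code, assuming an unsigned 16-bit integer.
# (The signedness is an assumption; it is not stated in the sources.)
BYTES = 2
max_code = 2 ** (8 * BYTES) - 1      # largest value two bytes can hold

highest_assigned = 32759             # current upper limit (Wiggins and Coca, 2004)
planned_limit = 40951                # upper limit planned for the second phase

unused_now = max_code - highest_assigned
added_by_phase_two = planned_limit - highest_assigned

print(max_code, unused_now, added_by_phase_two)
```

Under that assumption, roughly half of the available code space is unused, and the planned second phase would have added 8192 further values.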
The
Lawson Number shares several characteristics with its counterpart, the
Beilstein system number. The first is an
overlap in the number ranges, which can cause ambiguity in the meanings of the
numbers. This was carried over from the
printed Beilstein volumes, where one system number would end and another would
start on the same page. The two sets of
numbers are also defined using the same rules, but while Beilstein never
published the meanings of the system numbers, Lawson did provide meanings for
the Lawson numbers. (Wiggins and Coca,
2004). Those meanings are linked to
their numbers in Coca’s database. The
Lawson numbers represent certain fragments, and can be used to find compounds
with similar structural elements. These
are not substructures. They are also not
unique identifiers, and unlike the Beilstein system, compounds often have more
than one Lawson Number. Lawson (1990) states
that 25.1% of compounds have one Lawson Number, 39.4% have two, 24.0% have
three, 8.5% have four, and 3.0% have more than four.
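As a quick check on these figures, the sketch below totals Lawson's percentages and works out the share of compounds with more than one Lawson Number. The dictionary name and layout are illustrative; only the percentages come from Lawson (1990).

```python
# Share of compounds by number of Lawson Numbers (Lawson, 1990),
# stored in tenths of a percent to avoid floating-point rounding.
share_tenths = {"1": 251, "2": 394, "3": 240, "4": 85, ">4": 30}

total = sum(share_tenths.values())
more_than_one = total - share_tenths["1"]

print(total / 10, more_than_one / 10)
```

The shares account for all compounds, and about three quarters of them carry more than one Lawson Number.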
The
definitions Lawson provided for the Lawson Numbers can be misleading if the
user does not know much about the Beilstein system from which the numbers are
derived. The system uses the structure
of the compound to assign a number. The
first decision Beilstein makes is whether a compound is a principal compound or a
derivative. The first rule is that a principal compound
contains no heteroatoms (atoms other than carbon or hydrogen) except those that are free
members of covalent rings, meaning they are bonded only to other ring atoms or
hydrogen, and those in free functional groups, which are defined below. The second rule is that oxygen is
the only chalcogen allowed in principal compounds. A principal compound must also contain no radicals and have
only natural isotopes. (Weissbach,
1976). For Lawson Numbers, multiple
bonds and branching are also taken into account.
The
next set of rules refers to functional groups.
The first listed functional groups are the non-functional
substituents. These are exterior groups,
which include the halogens, nitro, nitroso, and azide. Compounds with non-functional substituents
are always considered derivatives.
The next set is the free functional groups,
which comprise the oxygen-containing groups and the
further functions. The first oxygen-containing group is the
hydroxyl group (-OH). The second is oxo
(=O), which is more commonly known as carbonyl.
Its hydrate, C(OH)OH, is considered the same as an oxo group by
Beilstein standards. The third free
functional group is the carboxyl group (-C(=O)OH). The orthoacid group (-C(OH)(OH)OH) is the
hydrate of the carboxyl group, and Beilstein treats it as if it were a carboxyl
group. (Weissbach, 1976).
The
further functions are other functional groups with the formula –Z(O)mHn or
–ZxHn. The subscripts “m” and “n” can be
zero or greater, depending on Z's valence. In the second formula, “x” is
two or greater, and “n” depends on “x” as well as on Z’s valence. There are no oxygen-oxygen bonds in the
further functions, and Z is the atom that bonds to the main carbon
framework. Z cannot be a halogen
or a bivalent chalcogen. The C-Z bond
also must be covalent. (Weissbach,
1976). Examples of the further functions
are –SO2H, -SO3H, and -NH2. The rule for
all free functional groups is that the carbon they are bonded to must not have
any further non-ring bonds to other non-metals, if Z is also a non-metal. (Weissbach, 1976).
Once
a compound has been determined to be either a principal compound or a derivative,
it is classified. The first
classification of a compound is its ring class.
The three ring classes are acyclic, isocyclic, (or carbocyclic) and
heterocyclic. (Weissbach, 1976). Acyclic compounds contain no rings in their
structures, although they can have derivatives that do contain rings. D-Glucose is an example of a compound that
contains a ring but is classified as acyclic.
Isocyclic
compounds contain at least one ring in their structure, but that ring must
consist of carbon atoms only. They can
contain any number of acyclic fragments, and any heteroatoms they contain
cannot be ring-members. (Weissbach,
1976). The fragment can have one ring or
a large system of rings.
The
heterocyclic compounds also have rings, but are allowed to contain heteroatoms
as ring-members. Like the isocyclic
compounds, the heterocyclic compounds can have one ring or many. They can also include acyclic and isocyclic
fragments in addition to their heterocyclic fragments. Heterocyclic compounds are further classified
by the types and numbers of heteroatoms they contain. Each heteroatom is given a number, depending
on its position in the periodic table.
Oxygen is ranked first, followed by the other chalcogens. The nitrogen group from smallest to largest
is next in the classification. The
classification then continues leftward across the periodic table, going down each column in turn, with the heavy
transition metals last.
(Weissbach, 1976). This is why
oxygen appears first in the heterocyclic Lawson Numbers, followed by nitrogen,
and then followed by other atoms. Lawson
Numbers do not specify ring atoms other than oxygen or nitrogen.
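The ordering described above can be sketched as a sort on periodic-table position. This is only an illustration of the rule as stated in this paper, not Beilstein's actual table: the selection of elements, the dictionary, and the function names are all assumptions made for the example.

```python
# (group, period) for a few possible ring heteroatoms; the selection is illustrative.
PERIODIC_POSITION = {
    "O": (16, 2), "S": (16, 3), "Se": (16, 4), "Te": (16, 5),
    "N": (15, 2), "P": (15, 3), "As": (15, 4),
    "Si": (14, 3), "B": (13, 2),
}

def heteroatom_rank_key(symbol):
    """Sort key: columns further right rank first; within a column, lighter atoms first."""
    group, period = PERIODIC_POSITION[symbol]
    return (-group, period)

def rank_heteroatoms(symbols):
    """Return the symbols in the ranking order described in the text."""
    return sorted(symbols, key=heteroatom_rank_key)

ranked = rank_heteroatoms(["B", "P", "Te", "N", "Si", "O", "As", "S", "Se"])
print(ranked)
```

With this key, oxygen and the other chalcogens come first, the nitrogen group follows, and the columns further left come after that.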
The
next classification of principal compounds concerns their functional
groups. The functional groups are first
ranked by the coupling atom, which is ranked using the same order as the
heterocyclic ring classes. That means
oxygen comes first in the classification of functional group coupling
atoms. The three oxygen groups are then ranked
by multiplicity, which is the number of carbon-oxygen bonds. That puts hydroxyl first. The oxo group is ranked second, followed by
carboxyl. The further functions are
ranked next, depending on what the coupling atom Z is. They are ranked according to multiplicity
index, which is one for –Z(O)mHn, and “x” for –ZxHn. They are ranked with the lowest numbers
first. They are then sub-ranked by the
ligation index for –Z(O)mHn, or the unsaturation index for -ZxHn. The ligation index is calculated by adding
all the hydrogens bonded to Z, all the OH’s bonded to Z, and double the lone
oxygens bonded to Z. The lone oxygens
are doubled because they are double-bonded to Z. The unsaturation index is xv - 2x - n + 1, where
“x” and “n” are the subscripts in the formula, and “v” is Z’s valency. The next classification is the oxygen
index. It is the ligation index minus
the number of hydrogens, which leaves the
number of bonds to oxygen. The
last index for –ZxHn is the isomerism index, which refers to the longest
unbranched chain of Z. (Weissbach,
1976). In practice, these rules
mean that functional groups are next classified by their number
of oxygens, and then by branching. That means –SO2H comes before –SO3H, and –NH2
comes before –NHOH. This also puts
–NH-NH-NH2 before –N(NH2)2. The only
exception is –PH4 coming after –P(OH)2, but that is because trivalent
phosphorus is ranked before pentavalent phosphorus. (Weissbach, 1976). The final classification of principal
compounds depends on the number of each functional group present.
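The indices described above can be worked through in a short Python sketch. The class and function names are illustrative, and two points are assumptions rather than statements from Weissbach: –SO2H and –SO3H are taken to be –S(=O)OH and –S(=O)2OH, and lower index values are assumed to rank first for the ligation and oxygen indices as they do for the multiplicity index. The ranking is only applied to groups that share the same coupling atom Z.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OxoFunction:
    """A further function of the form -Z(O)mHn, described by what is bonded to Z."""
    name: str
    h_on_z: int        # hydrogens bonded directly to Z
    oh_on_z: int       # hydroxyl groups bonded to Z
    lone_o_on_z: int   # double-bonded ("lone") oxygens on Z

    def ligation_index(self):
        # Hydrogens + hydroxyls + twice the double-bonded oxygens.
        return self.h_on_z + self.oh_on_z + 2 * self.lone_o_on_z

    def oxygen_index(self):
        # Ligation index minus hydrogens: the bonds from Z to oxygen.
        return self.ligation_index() - self.h_on_z

def unsaturation_index(x, n, v):
    """xv - 2x - n + 1 for a -ZxHn group, where v is the valency of Z."""
    return x * v - 2 * x - n + 1

def rank_key(group):
    # Assumed ordering: lower ligation index first, then lower oxygen index.
    return (group.ligation_index(), group.oxygen_index())

so2h = OxoFunction("-SO2H", h_on_z=0, oh_on_z=1, lone_o_on_z=1)
so3h = OxoFunction("-SO3H", h_on_z=0, oh_on_z=1, lone_o_on_z=2)
nh2 = OxoFunction("-NH2", h_on_z=2, oh_on_z=0, lone_o_on_z=0)
nhoh = OxoFunction("-NHOH", h_on_z=1, oh_on_z=1, lone_o_on_z=0)

sulfur_order = [g.name for g in sorted([so3h, so2h], key=rank_key)]
nitrogen_order = [g.name for g in sorted([nhoh, nh2], key=rank_key)]
print(sulfur_order, nitrogen_order)

# -NH-NH-NH2 and -N(NH2)2 both have x = 3, n = 4, v = 3, so the unsaturation
# index cannot separate them; that is left to the isomerism index.
print(unsaturation_index(x=3, n=4, v=3))
```

In this sketch –NH2 and –NHOH share a ligation index of 2 and are separated only by the oxygen index, which matches the order given in the text.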
The
next set of rules in the Beilstein system refers to derivatives. These are more important in understanding the
Lawson Number than the rules for principal compounds. While Lawson’s list of Lawson number
definitions makes sense when looking at principal compounds, interpreting the numbers for
derivatives requires an actual understanding of how the Beilstein system
classifies them. The first set of
derivatives Beilstein deals with is the chalcogen analogs. In these compounds, the bivalent chalcogen
atom is treated as if it were an oxygen atom.
Examples are shown in Figure 1.
Figure 1: Chalcogen Analogs

The next rule refers to
non-functional substituents. These are
treated like hydrogens unless the carbon they are bonded to is also bonded to
another type of functional group. That
means chloromethane is considered a derivative of methane, while
chloromethanol is not. The derivation of chloromethanol
is shown in Figure 4. The next set of
derivatives is formed by condensation.
That means the derivative is the product of a chemical reaction that
gives off water as the only other product.
Examples are esters, amides, and acid anhydrides. The next derivatives are obtained by
valency-changing addition. Only
halogens, bivalent chalcogens that are annular or bonded to two carbons, and
nitrogens that are either annular or bonded to at least two carbons can have
radicals added to them. Figure 2 shows an
example of valency-changing addition,
in which the sulfur changes from bivalent to hexavalent.
Figure 2: Valency changing addition.

The next derivatives are charged
compounds created by adding carbocations or carbanions to compounds. These are added to heteroatoms, which have
either free electrons in the first case or a deficit of electrons in the
second. The next group of derivatives is
created when hydrogen atoms are pulled off.
This leaves behind a free radical.
The last set of derivatives is the salts.
These include oxonium ions, alkoxides, hydroxides, acetylene derivatives,
and organometallic salts. (Weissbach,
1976).
Beilstein
and Lawson differ on how they assign numbers.
Lawson gives a number for every significant fragment, while Beilstein
only gives a system number for the main fragment. The main fragment is the fragment whose
registry compound appears last in the file.
To assign a place in Beilstein’s registry, the compound is
deconstructed. First, salts,
radicals, and charged compounds are deconstructed by reversing the rules that form those
derivatives. The next deconstruction
involves interior heteroatoms, that is,
heteroatoms that are neither ring members nor terminal. At those points, the molecule is subjected to
hydrolysis. There are many rules
concerning the hydrolysis of bonds.
For a compound that has an annular heteroatom bonded to another
heteroatom, the bond is broken by capping the annular heteroatom with hydrogen
and the non-annular heteroatom with a hydroxyl group. (Weissbach, 1976).
Figure 3: Hydrolysis between two heteroatoms.

Heteroatoms that are multiple-bonded to carbon are treated as
if they are derived from carbonyl groups.
Examples are imines being derived from ketones, and nitriles being
derived from carboxylic acids. The next
group involves carbon atoms that have bonds to more than one non-metal
heteroatom, unless all bonds are to non-functional substituents. If a compound has a non-functional group
bonded to the same carbon as another functional group, the compound is treated
as an oxo derivative.
Figure 4: A carbonyl derivative
Other places to cleave bonds are between a bivalent chalcogen
atom and another heteroatom. Examples
here are peroxides, which get cleaved between the two oxygens, and –ONH2 compounds,
which get cleaved between the oxygen and the nitrogen. Also carbon-bound chains of heteroatoms get
cleaved from other different heteroatoms.
(Weissbach, 1976). Weissbach then
states that Beilstein has a list that it follows, which determines which heteroatom
is capped with hydrogen and which is capped with hydroxyl. The list is long, and depends on position in
the periodic table, with top-right position starting the list. Smaller valencies get priority. After hydrolyzing all the relevant bonds in a
molecule, the non-functional substituents are replaced by hydrogen, and
bivalent chalcogens are replaced by oxygen.
(Weissbach, 1976). The hydrolysis
rule means that there are compounds that can appear to be cyclic, but they get
acyclic numbers. The compound in Figure 5 is
an example. It gets cleaved between the
phosphorus and each of the two ring oxygens,
which leaves an acyclic fragment behind.
Since the phosphate group is an inorganic fragment, it does not get a
system number or a Lawson number.
Figure 5: A cyclic derivative of acyclic compounds
Self-condensing compounds such as sugars also appear as rings
but are really acyclic. The principal
compound for the following derivative in Figure 6 can be found by breaking the
bond that has an arrow drawn to it.
Carbon 1 (2:00 position) will then have an oxo group on it, and the
ring-oxygen will be capped with a hydrogen.
Figure 6: Glucose, another cyclic derivative of an acyclic compound.
That is where the Lawson Number 1122 (hydroxyl-oxo with 6
total oxygens) comes from.
Once the compound has been
deconstructed, the chemically significant fragments receive Lawson
Numbers. If a compound has more than
one, it is known to be a derivative.
This is different from the Beilstein system where only one number is
assigned. That number, as well as the
compound’s place in the printed Beilstein comes from the fragment that has the
biggest number. Therefore, if a compound
has acyclic, isocyclic, and heterocyclic fragments, it will be placed with the
heterocyclic compounds.
Figure 7: A compound with three significant fragments.

For example, the compound in Figure 7 has three Lawson numbers. The oxygens with arrows drawn to them are
where bonds get cleaved. The three resulting
fragments each get a Lawson number. In
this case, they are 1771 (hydroxyl-acid, middle fragment), 298 (hydroxyl, right
fragment), and 289 (hydroxyl, left fragment).
Fragment 1771 would be the main fragment, and the one used to find this
compound in printed Beilstein. Another
example, isatin, only has one Lawson number.
It is 25776, which denotes a heterocyclic compound with one annular
nitrogen and two oxo groups attached to other points in the ring system. Isatin is a principal compound, and the
system number with the same definition as LN 25776 is the locator for isatin in
Beilstein.
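The choice of main fragment described above amounts to taking the largest Lawson Number of a compound. A minimal sketch follows, using the two examples from the text; the function names are illustrative and are not part of any Beilstein software.

```python
def main_fragment(lawson_numbers):
    """Return the Lawson Number of the main fragment, taken as the largest one."""
    if not lawson_numbers:
        raise ValueError("a compound needs at least one Lawson Number")
    return max(lawson_numbers)

def known_derivative(lawson_numbers):
    """More than one distinct Lawson Number marks the compound as a derivative."""
    return len(set(lawson_numbers)) > 1

figure_7_compound = [1771, 298, 289]   # the three fragments named in the text
isatin = [25776]

print(main_fragment(figure_7_compound), known_derivative(figure_7_compound))
print(main_fragment(isatin), known_derivative(isatin))
```

Note that a single Lawson Number does not by itself prove that a compound is a principal compound; the check only flags compounds that are certainly derivatives.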
Lawson
numbers are most effective when combined with other search terms. For example, a Lawson number can be combined with a molecular
formula search, or used to exclude hits from one. The query in Figure 8 gave 108 substances.
Figure 8: A Beilstein query with LN’s

The first Lawson number range specifies that the
oxygen in the formula must be part of an acyclic fragment. That alone does not eliminate all
answers that contain rings, though. The
second Lawson number in the search is the largest Lawson number an acyclic
fragment can have, and it was entered so that no compound retrieved can have Lawson
numbers larger than that. This
blocks compounds with fragments that contain isocyclic 3-membered and 4-membered rings.
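The logic of such a query can be sketched in Python as an "at least one" test combined with a "none above" test. This is not CrossFire syntax, and the numeric bounds and records below are hypothetical placeholders chosen for the example; they are not real Lawson Number boundaries.

```python
# Hypothetical bounds, for illustration only (not real Lawson Number ranges).
WANTED_LOW, WANTED_HIGH = 100, 200   # range the oxygen-containing fragment must fall in
ACYCLIC_MAX = 1000                   # stand-in for the largest acyclic Lawson Number

def matches_query(lawson_numbers):
    """True if some LN is in the wanted range and no LN exceeds the acyclic maximum."""
    has_wanted = any(WANTED_LOW <= ln <= WANTED_HIGH for ln in lawson_numbers)
    has_blocked = any(ln > ACYCLIC_MAX for ln in lawson_numbers)
    return has_wanted and not has_blocked

# Hypothetical compounds, each with its list of Lawson Numbers.
compounds = {
    "A": [150],          # one acyclic fragment in the wanted range
    "B": [150, 5000],    # wanted fragment plus a ring fragment
    "C": [900],          # acyclic, but outside the wanted range
}

hits = [name for name, lns in compounds.items() if matches_query(lns)]
print(hits)
```

Compound B shows why the range alone is not enough: it satisfies the first condition but is removed only by the second.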
The Lawson
number can also be combined with Number of Elements. This is especially useful because of the
overlap in Lawson Number ranges. For
example, 3704 is in both the hydrazine + acid range and the diazene/azo
range. Since the first range has four
elements in it, and the second only has three, using an element count can limit
the hitset to only one of those ranges.
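A sketch of how an element count separates the two meanings of 3704 is given below. The records are hypothetical and the field names are invented for the example; only the Lawson Number and the element counts of four and three come from the text.

```python
# Hypothetical records; only LN 3704 and the element counts come from the text.
records = [
    {"id": "hypothetical-1", "lawson_numbers": [3704], "element_count": 4},
    {"id": "hypothetical-2", "lawson_numbers": [3704], "element_count": 3},
    {"id": "hypothetical-3", "lawson_numbers": [298], "element_count": 3},
]

def search(records, lawson_number, element_count):
    """Return ids of records that carry the Lawson Number and have the given element count."""
    return [
        r["id"]
        for r in records
        if lawson_number in r["lawson_numbers"] and r["element_count"] == element_count
    ]

hydrazine_acid_hits = search(records, 3704, element_count=4)
diazene_hits = search(records, 3704, element_count=3)
print(hydrazine_acid_hits, diazene_hits)
```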
The Lawson number can also
be combined with chemical names or name fragments. The phosphoric acid derivative in Figure 5
was found by combining “phosphoric” as a name segment with the Lawson Number
range 512-639. Another query combining
Lawson number 3632 with name segment “hydrazine” found five substances. These name fragment searches are best done if
the user has the Lawson number definitions.
The last use
of the Lawson number is in substructure searches. A Lawson number range can be combined with a
substructure using the “Not” operator.
Since the Lawson numbers represent structural fragments, this can keep
users from getting structures that they do not want. In the following search, two sets of Lawson
numbers are removed. The first removal
is of straight-chain alkyl groups as substituents on the marked oxygen. The next removal is of all Lawson numbers
that correspond to rings. This removal actually
takes two lines. The first line of
the search was added because “not” searches cannot be done on the first line of
CrossFire’s search grid, even if there is a structure in the structure query
box. To make the search work, a “less
than or equal to” search is combined with a “not greater than” search. On its own, the “less than or equal to” search brings
back a hit if at least one of its Lawson numbers meets the criterion. To block any range of Lawson numbers, a “not”
search has to be used.
Figure 9: A substructure search with Lawson Numbers
This search was done in stages. The third line of the search makes a very
large difference. Before it was added,
there were over 17000 hits. After it was
added, the hits dropped down to about 3100.
With a search like this, the Lawson number is used to refine
hitsets.
Overall, the
Lawson number is a useful tool, but it is easier to use with a list of
definitions at hand. With the
definitions, basic searches with Lawson Numbers can be performed. To completely understand the Lawson number, a
basic understanding of Beilstein’s system is also needed, since the list of
definitions is based on unmodified functional groups. With the list of definitions available as
well as a basic understanding of Beilstein’s system, the Lawson number can be a
very powerful searching tool.
References
Lawson, Alexander J. The Lawson Number (LN): Offline Generation
and Online Use. In The Beilstein Online
Database: Implementation, Content and Retrieval; Stephen R. Heller, Ed.;
ACS Symposium Series 436; American Chemical Society:
Wiggins,
Weissbach, Oskar. The
Beilstein Guide: A Manual for the Use of Beilstein Handbuch der Organischen
Chemie; Springer-Verlag;