C371 Chemical Informatics I Projects and Papers

Updated: 8 November 2004


The following projects have been selected:

Elizabeth Adams: de novo Drug Design paper
Philip Bidwell: Protein Folding paper
Matt Carrico: Proteomics paper
Patrick Coffman: Application of 3D QSAR (CoMFA) to the Design of New Drugs paper
Tasia Pyburn: de novo Drug Design paper
Manuel Rodriguez: Chemistry Development Kit (CDK)
Matt Spraker: ChemTK Lite
Daniel Vu: Project work with Dr. Mookie Baik


In addition to the options above, it might be acceptable to write a paper on one of the following topics (these are larger topics used at the University of Sheffield in conjunction with their MS program):


In past years, C371 students have worked on the following projects:

I. Digitizing and Indexing Groth's Chemische Krystallographie.

This old five-volume German set is in the IUB Chemistry Library Reference Area with call number QD951.G8. Organic chemists investigating inclusion compounds have recently found it to be a useful resource. Since the work was published prior to 1923, there is no copyright restriction on scanning it and placing it on the Web.

Participants must choose an appropriate format and scan the material in a way that allows easy retrieval on the Web. Consideration must be given to file size and download times, as well as to indexing.

The work itself, of course, is in German, so one of our tasks will be to determine English IUPAC names, CAS Registry Numbers, and other appropriate chemical identifiers for the substances. A SMILES string is to be constructed for each substance.

This is a long-term project that will likely stretch over more than one C371 class, so we must make judicious choices about how to create this digital library item. It is therefore advisable to consult the staff of the IU Libraries/UITS Digital Library Program and to use the resources they have developed.

II. Reciprocal Net and the Common Molecules Project

The IUB Molecular Structure Center has for several years been developing a project known as Reciprocal Net. In particular, the Common Molecules portion of the project is one that could use the assistance of C371 students.

III. Importing Chemical Data Into Excel or Bibliography Manager Software

Large databases such as Beilstein or Chemical Abstracts hold prodigious amounts of data and offer many options for retrieving them. However, chemists often want to massage the data and/or reuse them for other purposes. In this project, the students will learn to make effective use of a standard spreadsheet program (Microsoft Excel) and one of three bibliography manager packages (EndNote, ProCite, or Reference Manager) to import data from commercial chemical databases and the Web.

Some of the major database vendors have developed ways to download data for import into other packages, but those do not always work well for chemical data. Your task will be to document the procedures for retrieving and importing the data and to determine which, if any, of the standard software packages can import structure data that can subsequently be displayed as chemical structures.
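As a rough illustration of the import step, the sketch below writes a few substance records to comma-separated text, a format that Excel can open. It is only a minimal example in Python: the field names (name, cas_rn, smiles) and the function records_to_csv are assumptions made for this illustration and do not correspond to the export format of any particular database vendor.

```python
import csv
import io

# Hypothetical column names chosen for this sketch; a real database
# export will use its own field labels.
FIELDS = ["name", "cas_rn", "smiles"]


def records_to_csv(records):
    """Return the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Two well-known substances used as sample rows.
records = [
    {"name": "benzene", "cas_rn": "71-43-2", "smiles": "c1ccccc1"},
    {"name": "ethanol", "cas_rn": "64-17-5", "smiles": "CCO"},
]

csv_text = records_to_csv(records)
print(csv_text)
```

Note that the SMILES column arrives in the spreadsheet as plain text. Whether any of the packages can turn such a column into a displayed structure is exactly the question the project is meant to answer.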

IV. A Prototype System for Reformatting CA File Data

Past SLIS students have attempted to extract records from a Chemical Abstracts search and run them through a Perl script that, in effect, repunctuates each record into the reference format prescribed by The ACS Style Guide.
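The reformatting idea can be sketched in a few lines. The example below is written in Python rather than Perl and is not the students' script. It assumes a record has already been parsed into a dictionary with the hypothetical keys authors, title, journal, year, volume, and pages, and that author names are already inverted with initials. The sample record is an invented placeholder, not a real reference.

```python
def format_acs(record):
    """Return a plain-text, ACS-style journal citation for one record.

    The keys used here are assumptions of this sketch, not the field
    labels of the CA File.
    """
    authors = "; ".join(record["authors"])  # ACS separates authors with semicolons
    title = record["title"].rstrip(".")     # avoid a doubled period after the title
    return (
        f"{authors} {title}. {record['journal']} "
        f"{record['year']}, {record['volume']}, {record['pages']}."
    )


# Invented placeholder record, for illustration only.
sample = {
    "authors": ["Doe, J. A.", "Roe, R. B."],
    "title": "An Example Title",
    "journal": "J. Example Chem.",
    "year": 2000,
    "volume": 1,
    "pages": "1-10",
}

citation = format_acs(sample)
print(citation)
# Doe, J. A.; Roe, R. B. An Example Title. J. Example Chem. 2000, 1, 1-10.
```

Plain text cannot carry the typography the style calls for (italic journal abbreviation, bold year, italic volume), so a fuller prototype would need an output format such as HTML or RTF.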

Copyright 2003
Gary Wiggins