C571 Chemical Information Technology Projects
Updated: 6 October 2003
The names of the students have been added to the projects they have
chosen. Please check that I have your name in the appropriate place.
Students who wish to work on projects other than those listed below must first
clear them with Dr. Milosevich or Dr. Wiggins. Likewise, if you want to
substitute a research paper for the project, you should clear the topic with
one of the instructors before beginning the research.
Efforts on 3D Cell Modeling: Jianmiao Fan (IUB)
Metadata, XML, CML: Spencer Lerch (IUPUI), Xiang Zhou (IUB),
Jianyong Zhu (IUB)
Using a well-known format for images, EXIF 2.2-compliant JPEG, how can we enhance the information associated
with a molecular image so that it is not only viewable, but searchable by appropriate
Protein Folding (review article): David Shaw (IUPUI),
Jain Manojkumardhoka (IUPUI), Mandi Zins (IUPUI), Joey Strausburg (IUPUI),
Abdulkadir Genel (IUB), Robert Hollo (IUB), Kent Krieg (IUB), Paul Myrda (IUB)
What is the current state of the art in protein folding? (Broad or deep coverage, but not both)
Given motion pictures of molecular dynamics trajectories, how are the data behind the
visual image mined? Most approaches treat the trajectory file
not as a database, but as frames in a movie.
De novo Drug Design (review article): Allyson Novotny (IUPUI),
JoAnna Hernandez (IUPUI), Ryan Denton (IUPUI), Tulay Ercanli (IUPUI)
What specific progress can we claim with de novo drug design at this point in time?
Daylight Chemical Information System Software
Learn to use the software and give a classroom presentation on its main features.
Bioinformatics Software and Databases from Commercial Companies
What aspects of commercial software (i.e., not free and publicly available on the Web)
offer significant enhancements over the features available from the free options?
Other Potential Projects/Review Articles:
- Investigation of the Chemistry Development Kit (CDK), an open-source
Java library for chemo- and bioinformatics
The CDK is described in an article by Christoph Steinbeck and others
in the Journal of Chemical Information and Computer
Sciences, 2003, 43, 493-500. Demonstrate how the CDK implements tasks such
as 2D and 3D rendering of chemical structures, SMILES parsing and generation,
ring searches, isomorphism checking, structure diagram generation, etc.
- Investigation of OpenEye software and comparison with the Daylight Toolkit
The OEChem beta software is loaded in Informatics at IUB on Xavier in the publicly accessible
directory /usr/local/oechem/. There are C++ and Python libraries, example code, and documentation there.
At present, Daylight is available only at IUB, but it is possible that we can get it loaded at IUPUI also.
- Investigation of the appropriateness of the Expereact Web program to
maintain a local database of chemical products that can be shared over the Internet
is the site where the free copy of Expereact WEB can be obtained. How does it relate to MolBank and
the Budapest Open Access Initiative for publishing journals?
- OpenBabel vs. CLiDE vs. VEGA vs. Kekule as tools for converting between chemical file formats
Babel and CLiDE have features to recognize and/or interconvert common file formats for
coding chemical structures. For information on OpenBabel, see
http://openbabel.sourceforge.net. See Henry Rzepa's August 15, 2003 posting on chemweb for compiled
versions of OpenBabel for MacOS X, SGI (Irix 6.5), Linux (Redhat 8), and Windows.
Information on the CLiDE Lite program, available for Windows,
can be found at http://www.simbiosys.ca. Information on
VEGA is found at http://users.unimi.it/~ddl. Kekule can be
found at http://www.sigmaaldrich.com. You'll need to
select "Advanced Search," then under "Other Search Options," choose
"Sub Structure Search". On that page there is an option to download Kekule.
- RSS and blogging: applications to Chemistry
See "Towards the Chemical Semantic Web: An Introduction to RSS," by Peter Murray-Rust and Henry Rzepa at:
Also, Rzepa's postings on April 29, June 13, and June 27, 2003 on:
- Comparison of the ChemWeb.com Compound Search at
and NLM's structure searching site at:
- Evaluation of chemistry software for Linux: Thaddeus Jones (IUB)
Two sources of information are:
In addition to the options above, it might be acceptable to write a paper on
one of the following topics (these are larger topics used at the University of
Sheffield in conjunction with their MS program).
These topics must be cleared with the instructors before you begin the work:
- Describe the applications of graph theory to the processing of chemical structure databases.
- What factors must be taken into account when designing a screening system for chemical substructure searching?
- Discuss the application of similarity searching in the discovery of novel bioactive compounds.
- Describe the services and systems currently available for the retrieval of information from chemical patents.
- Discuss the application of either classical QSAR (Hansch analysis) or 3D QSAR (CoMFA) to the design of new drugs.
- Describe the mode of operation of either docking programs or de novo ligand design programs.
- Discuss the application of compound selection strategies and molecular diversity in combinatorial chemistry.
- In relation to nucleic acid and protein sequence databases, detail the theory and application of sequence alignment.
- Give an overview of the significance of database interoperation to biologists. What practical difficulties are involved and how are these being addressed?