
Undergraduates can participate in projects for credit by registering for CS 4974 or CS 4994. Consult the Faculty Advisor or Research Supervisor before you register for either course.
Participation in a VTURCS project could also lead to an honors thesis for CS majors interested in graduating with honors.
Can't find anything that piques your curiosity? Don't be afraid to check out the Computer Science faculty list for someone who has a research interest you'd like to know more about. They might just have something for you.








XcisClique is a system that combines the analysis of genomic sequence, known regulatory sequences, and experimental data on gene expression to analyze the statistical significance of combinations (bicliques) of regulatory sequences and gene expression. It consists of local data resources in a relational database together with tools for analyzing sequences and bicliques. Currently, it holds only the genome of a small model plant, Arabidopsis thaliana. Amrita Pati completed the current version in 2005, and she is still part of the research group.
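To make the biclique idea concrete, here is a minimal sketch in Python. It is an illustration only, not XcisClique's code or data model: the gene names, the motif assignments, and the helper functions genes_sharing and bicliques are all hypothetical. It treats genes and regulatory motifs as the two sides of a bipartite graph and lists every motif combination that occurs together in at least a minimum number of genes. Assessing statistical significance, which is the point of the real system, is not attempted here.

```python
from itertools import combinations

# Hypothetical toy data: gene identifier -> regulatory motifs found in its promoter.
# The gene names and motif assignments are placeholders, not XcisClique records.
promoter_motifs = {
    "gene_a": {"ABRE", "DRE", "G-box"},
    "gene_b": {"ABRE", "DRE"},
    "gene_c": {"ABRE", "G-box"},
    "gene_d": {"DRE"},
}


def genes_sharing(motifs, promoter_motifs):
    """Return the sorted genes whose promoters contain every motif in `motifs`."""
    wanted = set(motifs)
    return sorted(g for g, found in promoter_motifs.items() if wanted <= found)


def bicliques(promoter_motifs, motif_count=2, min_genes=2):
    """List (motif combination, genes) pairs in which every gene has every motif."""
    all_motifs = sorted(set().union(*promoter_motifs.values()))
    result = []
    for combo in combinations(all_motifs, motif_count):
        genes = genes_sharing(combo, promoter_motifs)
        if len(genes) >= min_genes:
            result.append((combo, genes))
    return result


pairs = bicliques(promoter_motifs)
for combo, genes in pairs:
    print(combo, genes)
```

Each reported pair is a biclique in the sense that every listed gene is linked to every listed motif. A real analysis would then ask whether such a combination appears more often than expected by chance, for example among genes with similar expression.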
Opportunities for Enhancements
(1) A very important genome that recently became available is that of rice. In addition, the genomes of other organisms that can benefit from the capabilities of XcisClique will become available over time. Every organism poses unique challenges when it is put into a relational database, because there are no standards for what must be included in a genome or in what format. The rice genome will be highly valuable to add to XcisClique, but it will take some effort.
(2) Some analyses take too long to run through the web interface. Instead, they are precomputed for a limited set of parameters and stored in a database. A research task is to develop and implement methods that eliminate precomputation and to enhance the web interface to support greater user capabilities.
(3) Certain functionalities of the XcisClique system could be made more efficient with appropriate enhancements to the code. Improving the running time for an analysis in the current system is another research task.
(4) The computational biology and bioinformatics (CBB) group is acquiring a database server so we can expand the size of the databases that are available through our web services. The rice genome is much larger than the Arabidopsis genome, and there is more gene expression data available on the web that could be integrated with the rest of the data.
(5) With enough data, one can imagine mining the database for biologically meaningful patterns. Tools available from Amrita and others can be used, or new mining tools based on specific needs can be developed.
Background Required
Knowledge of Perl and MATLAB is required. Knowledge of C++ is desirable but not essential. The current database is built on PostgreSQL, so knowledge of SQL will be helpful. The existing web interface has been built using PHP and Perl.