Browse Active Research Projects

Undergraduates can participate in projects for credit by registering in CS 4974 or CS 4994. Consult the Faculty Advisor or Research Supervisor before you register for either course.

Participation in a VTURCS project could also lead to an honors thesis for CS majors interested in graduating with honors.

Can't find anything that piques your curiosity? Don't be afraid to check out the Computer Science faculty list for someone who has a research interest you'd like to know more about. They might just have something for you.

T. M. Murali

Automatic Class Discovery in Biological Data

Faculty Advisor
T. M. Murali
Research Supervisor
T. M. Murali
Description of Work
Despite hundreds, if not thousands, of years of progress in medicine, identifying complex diseases such as cancer (for example, by viewing diseased cells under a microscope) remains, to some extent, an art. Over the last 10 years, DNA microarrays have opened up a promising avenue for this problem. A DNA microarray measures the expression levels (activities) of all the genes in a cell. Therefore, by taking DNA microarray samples from patients diagnosed with various diseases and comparing these measurements, it may be possible to put disease diagnosis on a solid molecular footing, where gene expression patterns define a molecular signature for a disease.

This project will explore the use of biclustering algorithms to automatically discover disease types and sub-classes. A bicluster isolates a subset of genes and a subset of samples with very coherent gene expression patterns. Thus, it is possible that these genes form a molecular signature for the disease associated with the samples. If no single disease is associated with the samples, the bicluster may point to a new disease or a new sub-class of an existing disease.

There are three main aspects to this project: (i) Implement different biclustering algorithms already published in the literature. (ii) Develop and implement an automatic class discovery framework that uses the biclusters computed by the algorithms implemented in step (i). There is tremendous scope for innovation and new ideas in this aspect. (iii) Validate the class discovery methodology on actual gene expression and disease datasets. This project can involve two students.
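To make the idea of a bicluster concrete, here is a minimal Python sketch. It assumes "coherent" means that a gene's expression varies by at most a tolerance across the chosen samples; that definition, the function names, and the toy data are all illustrative assumptions, not one of the published algorithms the project refers to.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: expression[gene][sample] = measured expression level.
Expression = Dict[str, Dict[str, float]]


def conserved_genes(expr: Expression, samples: List[str], tol: float) -> Set[str]:
    """Return the genes whose expression varies by at most `tol` across `samples`."""
    keep = set()
    for gene, levels in expr.items():
        values = [levels[s] for s in samples]
        if max(values) - min(values) <= tol:
            keep.add(gene)
    return keep


def grow_bicluster(expr: Expression, seed: List[str], all_samples: List[str],
                   tol: float) -> Tuple[Set[str], List[str]]:
    """Grow a bicluster from a few seed samples.

    Step 1 keeps the genes that are coherent across the seed samples.
    Step 2 adds every other sample that leaves all of those genes coherent.
    """
    genes = conserved_genes(expr, seed, tol)
    samples = list(seed)
    if not genes:
        return genes, samples
    restricted = {g: expr[g] for g in genes}
    for s in all_samples:
        if s in samples:
            continue
        if conserved_genes(restricted, samples + [s], tol) == genes:
            samples.append(s)
    return genes, samples


# Toy data, invented for illustration only.
toy_expr = {
    "geneA": {"p1": 5.0, "p2": 5.1, "p3": 5.2, "p4": 1.0},
    "geneB": {"p1": 2.0, "p2": 2.1, "p3": 1.9, "p4": 2.0},
    "geneC": {"p1": 0.5, "p2": 3.0, "p3": 1.5, "p4": 4.0},
}
genes, samples = grow_bicluster(toy_expr, ["p1", "p2"], ["p1", "p2", "p3", "p4"], tol=0.5)
print(sorted(genes), samples)
```

In this toy run, the resulting gene subset would be the candidate molecular signature, and the sample subset would be the candidate disease class to compare against known diagnoses.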
Application Instructions
Send CV to murali@cs.vt.edu
Project URL
http://bioinformatics.cs.vt.edu/~murali/papers/xmotif-classifier/
Area(s) of Research
Computational Biology, Data Mining
Compensation
Work for Credit
Contact
murali@cs.vt.edu
T. M. Murali and Naren Ramakrishnan

Data Mining the Wikipedia

Faculty Advisor
T. M. Murali and Naren Ramakrishnan
Research Supervisor
T. M. Murali
Description of Work
The Wikipedia is rapidly emerging as a popular online encyclopaedia. This project poses several fundamental data mining questions about the Wikipedia. What is the link structure of the Wikipedia? How did this structure evolve over time? Can we decompose the Wikipedia automatically into topics? Can we computationally assign topics to Wikipedia articles by exploiting the link structure? This project can span two semesters and involve two or more students.
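As a rough illustration of the last question, the sketch below builds a link graph from (source, target) article pairs and labels an article with the most common topic among the articles it links to. The neighbour-voting rule, the function names, and the sample articles are assumptions made for this example; the project itself does not prescribe a method.

```python
from collections import Counter, defaultdict


def link_stats(links):
    """Build out-link sets and in-degree counts from (source, target) pairs."""
    out_links = defaultdict(set)
    in_degree = Counter()
    for source, target in links:
        if target not in out_links[source]:  # ignore duplicate links
            out_links[source].add(target)
            in_degree[target] += 1
    return out_links, in_degree


def assign_topic(article, out_links, known_topics):
    """Label `article` by majority vote over the topics of the articles it links to.

    Returns None when none of its link targets has a known topic.
    """
    votes = Counter(
        known_topics[t] for t in out_links.get(article, ()) if t in known_topics
    )
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical link data and seed labels, for illustration only.
toy_links = [
    ("BLAST", "DNA"),
    ("BLAST", "Gene"),
    ("BLAST", "Algorithm"),
    ("Gene", "DNA"),
    ("Quicksort", "Algorithm"),
]
toy_topics = {"DNA": "biology", "Gene": "biology", "Algorithm": "computing"}

out_links, in_degree = link_stats(toy_links)
print(in_degree.most_common(2))
print(assign_topic("BLAST", out_links, toy_topics))
```

Running the same statistics on snapshots taken at different dates would be one simple way to start looking at how the link structure evolves.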
Application Instructions
Send CV to murali@cs.vt.edu
Project URL
http://wikipedia.org
Area(s) of Research
Databases, Data Mining
Compensation
Work for Credit
Contact
murali@cs.vt.edu
Lenwood S. Heath

Deep, Personalized Searching

Faculty Advisor
Lenwood S. Heath
Research Supervisor
Lenwood S. Heath
Description of Work
Powerful keyword-based searching is available for the web (e.g., Google) and for scientific literature (e.g., Web of Science). However, a person searching for a very specific kind of resource may spend much effort on a search that ends in frustration due to a mismatch between keyword search and the semantics of her information resource needs.

The focus of this project is to incorporate semantic-based searching that is deep and time consuming, even leisurely. A search that takes 24 hours to find just the right resource(s) can be considered successful, as long as those 24 hours consist of automated effort only, while the person pursues other tasks and interests.

Keyword semantics will be derived from word senses obtained through WordNet. A command-line user interface will launch semantic search algorithms that integrate keyword search (probably Google) and semantics. A relational database will be built to record search progress. Search milestones can be reported by email. A prototype searcher will emphasize searching for a small list of high-quality tutorials on a precise topic specified by the user. Implementation will be under Linux or Mac OS X.
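One possible shape for the relational database that records search progress is sketched below, using Python's built-in sqlite3 module. The table name, columns, and scoring scheme are hypothetical choices for illustration; the project description does not specify a schema or a database engine.

```python
import sqlite3


def open_progress_db(path=":memory:"):
    """Open (or create) a database holding candidate resources found so far."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               url    TEXT PRIMARY KEY,
               query  TEXT NOT NULL,
               score  REAL NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def record_candidate(conn, url, query, score):
    """Store a candidate; if the URL was seen before, keep the higher score."""
    conn.execute(
        "INSERT OR IGNORE INTO candidates (url, query, score) VALUES (?, ?, ?)",
        (url, query, score),
    )
    conn.execute(
        "UPDATE candidates SET score = ? WHERE url = ? AND score < ?",
        (score, url, score),
    )
    conn.commit()


def best_candidates(conn, limit):
    """Return the top-scoring (url, score) pairs, best first."""
    rows = conn.execute(
        "SELECT url, score FROM candidates ORDER BY score DESC LIMIT ?", (limit,)
    )
    return rows.fetchall()
```

Because a long-running search may be interrupted, keeping this state on disk (by passing a file path instead of `:memory:`) would let the searcher resume where it left off and decide when a milestone is worth an email.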

Application Instructions
If this description charges you up, then see Professor Heath during his office hours (available on his web site). Please bring a resume and transcript. A love of the subtleties of the English language is a definite plus.
Project URL
http://
Area(s) of Research
Databases, Data Mining, Knowledge, Artificial Intelligence
Compensation
Work for Credit or Volunteer
Contact
heath@vt.edu
Wu Feng

High-Performance Biological Sequence Search

Faculty Advisor
Wu Feng
Research Supervisor
Jeremy Archuleta
Description of Work
Biological sequence searching has become a fundamental aspect of all bioinformatics. It can help in tasks such as sequencing the human genome, designing pathogen signatures for pathogen detection, identifying unknown viruses (e.g., the virus now known as SARS), and so on. In this project, you will code modules for part of a much larger project, mpiBLAST (http://www.mpiblast.org), in order to improve its functionality, maintainability, and performance.
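For readers new to sequence search, the toy Python sketch below scores database sequences by the number of k-mers (length-k substrings) they share with a query, searching each database fragment independently and then merging the hits. It is only an illustration of the general idea under those assumptions; it is not mpiBLAST's code or algorithm, and all names and sequences are made up.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_fragment(fragment, query, k):
    """Score each sequence in one database fragment by k-mers shared with the query.

    `fragment` maps sequence names to sequences; sequences sharing no k-mer
    with the query are omitted from the result.
    """
    query_kmers = kmers(query, k)
    hits = []
    for name, seq in fragment.items():
        shared = len(query_kmers & kmers(seq, k))
        if shared:
            hits.append((name, shared))
    return hits


def search_database(fragments, query, k=3):
    """Search every fragment and merge the hits, best score first."""
    hits = []
    # Each fragment is searched independently, so this loop is the part
    # that could be handed out to separate workers.
    for fragment in fragments:
        hits.extend(search_fragment(fragment, query, k))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Made-up sequences split into two fragments.
toy_fragments = [
    {"seq1": "ACGTACGT", "seq2": "TTTTTTTT"},
    {"seq3": "GGACGTCC"},
]
print(search_database(toy_fragments, "ACGTA"))
```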
Application Instructions
E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include an unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
http://www.mpiblast.org/
Area(s) of Research
Bioinformatics, Parallel Computation, Software Engineering, Systems, Theory, Computational Biology, Databases, Data Mining, Artificial Intelligence
Compensation
Negotiable
Contact
feng@cs.vt.edu
Lenwood S. Heath

Mining Plant Biology Papers to Identify Gene Functions (MineFun)

Faculty Advisor
Lenwood S. Heath
Research Supervisor
Lenwood S. Heath, Naren Ramakrishnan
Description of Work
There are 26,000+ genes in the model plant Arabidopsis thaliana, each of which has some biological function. Biological databases such as TAIR (http://www.arabidopsis.org) catalog all the genes, including their DNA sequences and putative functions. In many cases, the functional annotation of a gene given in the database is inaccurate or simply unknown. However, an accurate annotation can often be extracted from the scientific literature. To avoid the laborious manual process of reading thousands of papers, it is desirable to partially automate the extraction of annotation from literature. There are databases of scientific literature, including public databases such as PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed) and AGRICOLA (http://agricola.nal.usda.gov/), in which the abstracts of thousands of papers are indexed and searchable. Moreover, the process of extracting relationships from text has previously been automated in the Snowball system (http://snowball.cs.columbia.edu/). This tool is not particularly targeted toward the needs of Arabidopsis gene annotation, but its methods are an excellent starting point for the MineFun project. In this project, we are building tools to data mine gene function information from scientific extracts. The resulting improved annotations will be of great benefit to plant biology. The interested student must be proficient in multiple languages: at least one high-level language such as C, C++, or Java and, especially important for this project, Perl. Experience in text processing is desirable.
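The sketch below gives a flavour of pattern-based relation extraction from abstract text. It is written in Python for brevity (the project itself calls for Perl), and it uses a few hand-written verb patterns rather than Snowball's method. The regular expression for gene identifiers (locus codes of the form AT1G01010), the verb list, and the sample sentences are assumptions made for illustration; the sentences are invented and are not real annotations.

```python
import re

# Assumed identifier format: Arabidopsis locus codes such as AT1G01010.
GENE_ID = r"AT[1-5CM]G\d{5}"

# Hypothetical hand-written patterns: "<gene> <verb phrase> <function>".
RELATION = re.compile(
    rf"\b({GENE_ID})\b\s+(encodes|regulates|is required for)\s+([^.;]+)",
    re.IGNORECASE,
)


def extract_relations(text):
    """Return (gene, relation, description) triples found in `text`."""
    triples = []
    for match in RELATION.finditer(text):
        gene, relation, description = match.groups()
        triples.append((gene.upper(), relation.lower(), description.strip()))
    return triples


# Invented sentences, not taken from any real abstract.
toy_abstract = (
    "We show that AT1G01010 encodes a putative kinase. "
    "In contrast, At2g46830 regulates root hair growth; its loss has no effect on seeds."
)
for triple in extract_relations(toy_abstract):
    print(triple)
```

Hand-written patterns like these miss most phrasings, which is why a system that learns new patterns from seed examples is a more promising starting point.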
Application Instructions
Please see Dr. Heath or Dr. Ramakrishnan during their office hours. Send email to set up an appointment, if necessary.
Project URL
http://
Area(s) of Research
Bioinformatics, Data Mining, Databases
Compensation
Negotiable
Contact
heath@vt.edu
Wu Feng

Parallel Programming with Video Cards and More ...

Faculty Advisor
Wu Feng
Research Supervisor
Description of Work
The world of computing is now irrevocably parallel. CPU clock speeds have "topped out" at roughly 3.0 GHz. So, while performance in the past has doubled roughly every 2 years due to increases in clock frequency, future performance increases will come from doubling the number of cores in a system every 2 years. As such, we are looking at programming models, environments, and applications on multicore and manycore architectures. Of particular relevance and accessibility for VTURCS students is mapping applications onto traditional multicore (Intel and AMD), hybrid multicore (Cell and PlayStation3), manycore (video cards), and reconfigurable multicore (Tilera TILE64) architectures.
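The doubling claim above is easy to turn into arithmetic. The short Python sketch below projects core counts under the stated assumption of one doubling every 2 years; the starting core count and time span are arbitrary example values.

```python
def projected_cores(start_cores, years, doubling_period=2.0):
    """Project a core count that doubles once every `doubling_period` years."""
    return start_cores * 2 ** (years / doubling_period)


# Example values chosen for illustration: a 4-core system, 10 years out.
print(projected_cores(4, 10))
```

Ten years is five doublings, so a 4-core system would grow to 4 x 2^5 = 128 cores under this assumption, which is why software that cannot use many cores would stop getting faster.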
Application Instructions
E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include an unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
http://synergy.cs.vt.edu/
Area(s) of Research
Bioinformatics, Computational Biology, Data Mining, Human-Computer Interaction, Parallel Computation, Systems, Theory
Compensation
Negotiable
Contact
feng@cs.vt.edu
W. Feng

Supercomputing on Video Gaming Consoles

Faculty Advisor
W. Feng
Research Supervisor
Ashwin Aji
Description of Work
Given the extreme demands of today's sophisticated video games, game consoles and video graphics cards have become supercomputers in their own right. The goal of this project is to program *and* optimize a bioinformatics application (or perhaps something else, if reasonable) on the Sony PlayStation 3 and/or the NVIDIA Tesla video graphics card using the CUDA programming environment. (For those interested in human-computer interaction, building a good interface to the above bioinformatics application, or to our existing codes, would also make a suitable project.)
Application Instructions
E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include an unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
http://
Area(s) of Research
Bioinformatics, Human-Computer Interaction, Parallel Computation, Problem Solving Environments, Software Engineering, Systems, Theory, Databases, Data Mining, Knowledge
Compensation
Negotiable
Contact
feng@cs.vt.edu