Browse Active Research Projects
Undergraduates can participate in projects for credit by registering for CS 4974 or CS 4994. Consult the Faculty Advisor or Research Supervisor before you register for either course.
Participation in a VTURCS project can also lead to an honors thesis for CS majors who want to graduate with honors.
Can't find anything that piques your curiosity? Don't be afraid to browse the Computer Science faculty list for someone whose research interests you'd like to know more about. They might just have something for you.
Automatic Class Discovery in Biological Data
Faculty Advisor
- T. M. Murali
Research Supervisor
- T. M. Murali
Description of Work
- Despite hundreds, if not thousands, of years of development in medicine, identifying complex diseases such as cancer (e.g., by viewing diseased cells under a microscope) remains, to some extent, an art. Over the last 10 years, DNA microarrays have opened up a promising avenue for this problem. A DNA microarray measures the expression levels (activities) of all the genes in a cell. Therefore, by taking DNA microarray samples from patients diagnosed with various diseases and comparing these measurements, it may be possible to put disease diagnosis on a solid molecular footing, where gene expression patterns define a molecular signature for a disease.
This project will explore the use of biclustering algorithms to automatically discover disease types and subclasses. A bicluster isolates a subset of genes and a subset of samples that share very coherent gene expression patterns. These genes may therefore form a molecular signature for the disease associated with the samples. If no single disease is associated with the samples, the bicluster may point to a new disease or a new subclass of an existing disease.
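To make "very coherent gene expression patterns" concrete, here is a minimal sketch of one coherence score from the biclustering literature, the mean squared residue of Cheng and Church. It is only an illustration under stated assumptions: the expression matrix is a hypothetical genes-by-samples list of lists, and the project may use other algorithms and scores.

```python
from statistics import fmean


def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols].

    A score near 0 means every selected gene (row) follows the same pattern
    across the selected samples (columns), up to an additive shift.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [fmean(r) for r in sub]
    col_means = [fmean(sub[r][c] for r in range(len(rows))) for c in range(len(cols))]
    overall = fmean(row_means)  # equals the mean of all entries in the submatrix
    residues = [
        (sub[r][c] - row_means[r] - col_means[c] + overall) ** 2
        for r in range(len(rows))
        for c in range(len(cols))
    ]
    return fmean(residues)


# Hypothetical expression values: 4 genes (rows) x 4 samples (columns).
expr = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 4.0],
    [0.5, 8.0, 1.0, 2.0],
]
print(mean_squared_residue(expr, [0, 1, 2], [0, 1, 2]))        # near 0: coherent
print(mean_squared_residue(expr, [0, 1, 2, 3], [0, 1, 2, 3]))  # much larger
```

Genes 0 to 2 rise in lockstep across samples 0 to 2, so that submatrix scores essentially zero; a biclustering algorithm searches for large row and column subsets that keep this score below a threshold.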
There are three main aspects to this project:
(i) Implement different biclustering algorithms already published in the literature.
(ii) Develop and implement an automatic class discovery framework that uses the biclusters computed by the algorithms implemented in step (i). There is tremendous scope for innovation and new ideas in this aspect.
(iii) Validate the class discovery methodology on actual gene expression and disease datasets.
This project can involve two students.
Application Instructions
- Send CV to murali@cs.vt.edu
Project URL
- http://bioinformatics.cs.vt.edu/~murali/papers/xmotif-classifier/
Area(s) of Research
- Computational Biology, Data Mining
Compensation
- Work for Credit
Contact
- murali@cs.vt.edu
Data Mining Wikipedia
Faculty Advisor
- T. M. Murali and Naren Ramakrishnan
Research Supervisor
- T. M. Murali
Description of Work
- Wikipedia is rapidly emerging as a popular online encyclopedia. This project poses several fundamental data mining questions about Wikipedia. What is the link structure of Wikipedia? How did this structure evolve over time? Can we automatically decompose Wikipedia into topics? Can we computationally assign topics to Wikipedia articles by exploiting the link structure?
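As a starting point for the link-structure questions, the sketch below treats articles as nodes of a directed graph. It computes in-degrees (how often each article is linked to) and groups articles that are connected when link direction is ignored. The article titles are a hypothetical toy graph, and connected components are only a crude stand-in for real topic decomposition, not the method this project will use.

```python
from collections import deque

# Hypothetical toy link graph: article -> articles it links to.
links = {
    "Gene": ["DNA", "Protein"],
    "DNA": ["Gene"],
    "Protein": ["DNA"],
    "Jazz": ["Saxophone"],
    "Saxophone": ["Jazz", "Brass instrument"],
    "Brass instrument": [],
}


def in_degrees(graph):
    """Count incoming links for every article."""
    deg = {page: 0 for page in graph}
    for targets in graph.values():
        for dst in targets:
            deg[dst] = deg.get(dst, 0) + 1
    return deg


def link_components(graph):
    """Group articles that are connected when link direction is ignored."""
    undirected = {}
    for src, targets in graph.items():
        undirected.setdefault(src, set())
        for dst in targets:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)
    seen, groups = set(), []
    for start in undirected:
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search from `start`
            page = queue.popleft()
            group.append(page)
            for nxt in undirected[page]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return groups


print(in_degrees(links))
print(link_components(links))
```

On a real Wikipedia dump the graph has millions of nodes, so practical work would use stronger community-detection or link-analysis methods, and tracking evolution over time would mean comparing graphs built from dated snapshots.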
This project can span two semesters and involve two or more students.
Application Instructions
- Send CV to murali@cs.vt.edu
Project URL
- http://wikipedia.org
Area(s) of Research
- Databases, Data Mining
Compensation
- Work for Credit
Contact
- murali@cs.vt.edu
Deep, Personalized Searching
Faculty Advisor
- Lenwood S. Heath
Research Supervisor
- Lenwood S. Heath
Description of Work
- Powerful keyword-based searching is available for the web (e.g., Google) and for scientific literature (e.g., Web of Science). However, a person searching for a very specific kind of resource may spend considerable effort on a search that ends in frustration, because keyword search does not match the semantics of her information needs.
The focus of this project is semantic searching that is deep and time-consuming, even leisurely. A search that takes 24 hours to find just the right resource(s) can be considered successful, as long as those 24 hours consist only of automated effort while the person pursues other tasks and interests.
Keyword semantics will be derived from word senses provided by WordNet.
A command-line user interface will launch semantic search algorithms that integrate keyword search (probably Google) with semantics. A relational database will record search progress, and email notifications can be sent at search milestones. A prototype searcher will focus on finding a small list of high-quality tutorials on a precise topic specified by the user. Implementation will be under Linux or Mac OS X.
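One way to picture the pieces described above is sketched below: a keyword is expanded with terms tied to the word sense the user means, and each expanded query is logged in a relational table so a long-running search can resume and report progress. This is a hypothetical sketch. `SENSES` is a hard-coded stand-in for WordNet (a real system might query WordNet through a library such as NLTK), and the table name `search_progress` and its columns are invented for illustration.

```python
import sqlite3

# Hypothetical stand-in for WordNet: keyword -> sense -> disambiguating terms.
SENSES = {
    "python": {
        "snake": ["reptile", "constrictor"],
        "language": ["programming", "tutorial"],
    },
}


def expand_query(keyword, sense):
    """Build a keyword query for one chosen sense by adding its related terms."""
    extra = SENSES.get(keyword, {}).get(sense, [])
    return " ".join([keyword, *extra])


def open_progress_db(path=":memory:"):
    """Open (or create) the hypothetical table that records search progress."""
    db = sqlite3.connect(path)
    # status holds values such as 'pending', 'fetched', or 'accepted'.
    db.execute(
        """CREATE TABLE IF NOT EXISTS search_progress (
               id INTEGER PRIMARY KEY,
               query TEXT NOT NULL,
               url TEXT,
               status TEXT NOT NULL
           )"""
    )
    return db


def record(db, query, url, status):
    """Log one step of the search so it survives restarts."""
    db.execute(
        "INSERT INTO search_progress (query, url, status) VALUES (?, ?, ?)",
        (query, url, status),
    )
    db.commit()


db = open_progress_db()
record(db, expand_query("python", "language"), None, "pending")
```

Passing a file path instead of `":memory:"` would let the 24-hour search keep its state across restarts; a milestone email could be triggered whenever a row reaches an accepted status.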
Application Instructions
- If this description charges you up, then see Professor Heath during his office hours (available on his web site). Please bring a resume and transcript. A love of the subtleties of the English language is a definite plus.
Project URL
- http://
Area(s) of Research
- Databases, Data Mining, Knowledge, Artificial Intelligence
Compensation
- Work for Credit or Volunteer
Contact
- heath@vt.edu
High-Performance Biological Sequence Search
Faculty Advisor
- Wu Feng
Research Supervisor
- Jeremy Archuleta
Description of Work
- Biological sequence searching has become fundamental to bioinformatics. It supports tasks such as sequencing the human genome, designing pathogen signatures for pathogen detection, and identifying unknown viruses (e.g., the virus now known as SARS). In this project, you will code modules of a much larger project (mpiBLAST, at http://www.mpiblast.org) to improve its functionality, maintainability, and performance.
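BLAST is a heuristic that speeds up local sequence alignment; the exact dynamic-programming formulation it approximates is the Smith-Waterman algorithm. The sketch below computes a Smith-Waterman local alignment score to show what "searching" compares. The scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration, and this is not code from mpiBLAST.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets an alignment start fresh anywhere (local alignment).
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman("GGACGTCC", "ACGT"))
```

The table costs O(len(a) * len(b)) time per pair, which is why searching a large sequence database calls for heuristics like BLAST and for parallel tools such as mpiBLAST.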
Application Instructions
- E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
- http://www.mpiblast.org/
Area(s) of Research
- Bioinformatics, Parallel Computation, Software Engineering, Systems, Theory, Computational Biology, Databases, Data Mining, Artificial Intelligence
Compensation
- Negotiable
Contact
- feng@cs.vt.edu
Mining Plant Biology Papers to Identify Gene Functions (MineFun)
Faculty Advisor
- Lenwood S. Heath
Research Supervisor
- Lenwood S. Heath, Naren Ramakrishnan
Description of Work
- There are 26,000+ genes in the model plant Arabidopsis thaliana, each of which has some biological function. Biological databases such as TAIR (http://www.arabidopsis.org) catalog all the genes, including their DNA sequences and putative functions. In many cases, the functional annotation of a gene given in the database is inaccurate or simply unknown. However, an accurate annotation can often be extracted from the scientific literature. To avoid the laborious manual process of reading thousands of papers, it is desirable to partially automate the extraction of annotations from the literature.
There are databases of scientific literature, including public databases such as PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed) and AGRICOLA (http://agricola.nal.usda.gov/), in which the abstracts of thousands of papers are indexed and searchable. Moreover, the process of extracting relationships from text has previously been automated in the Snowball system (http://snowball.cs.columbia.edu/). This tool is not specifically targeted at the needs of Arabidopsis gene annotation, but its methods are an excellent starting point for the MineFun project.
In this project, we are building tools to mine gene function information from scientific abstracts. The resulting improved annotations will be of great benefit to plant biology.
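A simple way to see what "extracting relationships from text" means is a single hand-written pattern that pairs an Arabidopsis gene identifier with a function phrase. The sketch below assumes AGI locus identifiers of the form AT1G01010 and a hypothetical phrase pattern ("is involved in ..." / "is required for ..."). Snowball goes further by learning such patterns automatically from a few seed examples; this fixed regex is only an illustrative baseline, and the sample abstract text is invented.

```python
import re

# Arabidopsis Genome Initiative (AGI) locus identifiers, e.g. AT1G01010.
AGI_ID = r"AT[1-5CM]G\d{5}"

# Hypothetical hand-written pattern: <gene> ... is/are involved in|required for <function>.
PATTERN = re.compile(
    rf"\b({AGI_ID})\b[^.]*?\b(?:is|are) (?:involved in|required for) ([^.,;]+)",
    re.IGNORECASE,
)


def extract_functions(abstract):
    """Return (gene_id, function_phrase) pairs found in one abstract."""
    return [(gene.upper(), func.strip()) for gene, func in PATTERN.findall(abstract)]


sample = (
    "Our results show that At1g01010 is required for root hair development. "
    "AT2G17950 is involved in stem cell maintenance, and loss of function alters meristem size."
)
print(extract_functions(sample))
```

Normalizing identifiers to upper case makes it easy to join extracted pairs against the gene records in a database such as TAIR.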
The interested student must be proficient in multiple programming languages: at least one high-level language such as C, C++, or Java, and, especially important for this project, Perl. Experience in text processing is desirable.
Application Instructions
- Please see Dr. Heath or Dr. Ramakrishnan during their office hours. Send email to set up an appointment, if necessary.
Project URL
- http://
Area(s) of Research
- Bioinformatics, Data Mining, Databases
Compensation
- Negotiable
Contact
- heath@vt.edu
Parallel Programming with Video Cards and More ...
Faculty Advisor
- Wu Feng
Research Supervisor
Description of Work
- The world of computing is now irrevocably parallel. CPU clock frequencies have topped out at roughly 3.0 GHz. So, while performance in the past doubled roughly every 2 years due to increases in clock frequency, future performance increases will come from doubling the number of cores in a system every 2 years.
We are therefore studying programming models, environments, and applications for multicore and manycore architectures. Of particular relevance and accessibility to VTURCS students is mapping applications onto traditional multicore (Intel and AMD), hybrid multicore (Cell and PlayStation 3), manycore (video cards), and reconfigurable multicore (Tilera TILE64) architectures.
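Doubling the number of cores only helps as much as an application can use them. A standard way to reason about this (Amdahl's law, not stated in the original description) is that the serial fraction of the running time bounds the speedup. The sketch below computes that ideal speedup; the 95% parallel fraction in the usage loop is an illustrative assumption, not a measurement of any application.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the
    running time can be parallelized (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


# Assumed workload that is 95% parallelizable.
for cores in (2, 4, 8, 16, 32):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

Even with 95% of the work parallelized, 32 cores give a speedup of only about 12.5, which is why mapping applications onto multicore and manycore hardware means shrinking the serial portion, not just adding cores.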
Application Instructions
- E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
- http://synergy.cs.vt.edu/
Area(s) of Research
- Bioinformatics, Computational Biology, Data Mining, Human-Computer Interaction, Parallel Computation, Systems, Theory
Compensation
- Negotiable
Contact
- feng@cs.vt.edu
Supercomputing on Video Gaming Consoles
Faculty Advisor
- W. Feng
Research Supervisor
- Ashwin Aji
Description of Work
- Given the extreme demands of today's sophisticated video games, game consoles and video graphics cards have become supercomputers in their own right. The goal of this project is to program *and* optimize a bioinformatics application (or perhaps something else, if reasonable) on the Sony PlayStation 3 and/or the NVIDIA Tesla video graphics card using the CUDA programming environment. (For those interested in human-computer interaction, building a good interface to this bioinformatics application, or to our existing codes, would also make a fine project.)
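In CUDA, a kernel is launched over a grid of thread blocks, and each thread typically computes a global index from `blockIdx.x`, `blockDim.x`, and `threadIdx.x` to decide which data element it handles. The sketch below emulates that indexing pattern serially in Python to show how work is divided; it is not CUDA code and runs nothing on a GPU. The per-sequence GC-counting task and the helper names `gc_count_kernel` and `launch` are hypothetical stand-ins for a real bioinformatics kernel.

```python
import math


def gc_count_kernel(block_idx, block_dim, thread_idx, seqs, out):
    """Work done by one CUDA-style thread: count G/C bases in one sequence."""
    i = block_idx * block_dim + thread_idx  # global thread index
    if i < len(seqs):  # threads past the end of the data do nothing
        out[i] = sum(base in "GC" for base in seqs[i])


def launch(kernel, n_items, block_dim, *args):
    """Serially emulate a 1-D grid launch with ceil(n_items / block_dim) blocks."""
    grid_dim = math.ceil(n_items / block_dim)
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


seqs = ["ACGT", "GGCC", "ATAT", "GCGA", "TTTT"]
out = [0] * len(seqs)
launch(gc_count_kernel, len(seqs), 2, seqs, out)
print(out)
```

On a GPU the threads would run concurrently rather than in these nested loops; the bounds check matters because the grid (here 3 blocks of 2 threads) usually has more threads than data items.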
Application Instructions
- E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
- http://
Area(s) of Research
- Bioinformatics, Human-Computer Interaction, Parallel Computation, Problem Solving Environments, Software Engineering, Systems, Theory, Databases, Data Mining, Knowledge
Compensation
- Negotiable
Contact
- feng@cs.vt.edu