Browse Active Research Projects

Undergraduates can participate in projects for credit by registering for CS 4974 or CS 4994. Consult the Faculty Advisor or Research Supervisor before you register for either course.

Participation in a VTURCS project could also lead to an honors thesis for CS majors interested in graduating with honors.

Can't find anything that piques your curiosity? Don't be afraid to check out the Computer Science faculty list for someone who has a research interest you'd like to know more about. They might just have something for you.

T. M. Murali and Naren Ramakrishnan

Data Mining the WikiPedia

Faculty Advisor
T. M. Murali and Naren Ramakrishnan
Research Supervisor
T. M. Murali
Description of Work
Wikipedia is rapidly emerging as a popular online encyclopaedia. This project poses several fundamental data mining questions about Wikipedia. What is the link structure of Wikipedia? How did this structure evolve over time? Can we decompose Wikipedia automatically into topics? Can we computationally assign topics to Wikipedia articles by exploiting the link structure? This project can span two semesters and involve two or more students.
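As a rough illustration of the last two questions, the sketch below counts incoming links and assigns a topic to an article by majority vote over the articles it links to. The toy graph, the seed topic labels, and the function names are all assumptions made for this example; they are not part of the project, and a real study would work from an actual link dump and a far more careful method.

```python
from collections import Counter

# Hypothetical toy link graph: article title -> titles it links to.
links = {
    "Graph theory": ["Mathematics", "Statistics", "Computer science"],
    "Data mining": ["Computer science", "Statistics"],
    "Statistics": ["Mathematics"],
    "Computer science": ["Mathematics"],
    "Mathematics": [],
}

# Hypothetical hand-labeled seed articles.
seed_topics = {
    "Mathematics": "math",
    "Statistics": "math",
    "Computer science": "computing",
}


def in_degrees(graph):
    """Count how many articles link to each article."""
    counts = Counter({title: 0 for title in graph})
    for targets in graph.values():
        counts.update(targets)
    return counts


def assign_topic(article, graph, seeds):
    """Pick the most common topic among the labeled articles this one links to."""
    votes = Counter(seeds[t] for t in graph[article] if t in seeds)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    print(in_degrees(links).most_common(3))
    print(assign_topic("Graph theory", links, seed_topics))
```

Majority voting over outgoing links is only one simple choice; incoming links, link weights, or iterative propagation are equally plausible starting points.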
Application Instructions
Send CV to murali@cs.vt.edu
Project URL
http://wikipedia.org
Area(s) of Research
Databases, Data Mining
Compensation
Work for Credit
Contact
murali@cs.vt.edu

Nicholas F. Polys

Deep Media Blacksburg

Faculty Advisor
Nicholas F. Polys
Research Supervisor
Nicholas F. Polys
Description of Work
The collection, translation and integration of geospatial information is increasingly important for safety, development, transportation and policy. This independent study will examine the feasibility of mashups of data sources from numerous sensor modalities, including imagery, topography, LIDAR, weather and more. We will integrate datasets for the VT campus and the Town of Blacksburg into X3D-Earth [http://www.web3d.org/x3d-earth/]. The X3D Earth Working Group uses the Web architecture, XML languages, and open protocols to build a standards-based X3D Earth specification usable by governments, industry, scientists, academia, and the general public. X3D-Earth efforts encompass client-side, server-side, authoring, and conversion technologies. Credit will be assessed based on the following prototype and report: using real spatial contexts for regional data, the student will document the translation, integration and delivery issues in producing X3D-Earth Blacksburg and develop innovative solutions to them.
Application Instructions
Email to set up a convenient time to discuss details.
Project URL
not online yet
Area(s) of Research
Computer-Aided Instruction, Digital Libraries, Human-Computer Interaction, Databases
Compensation
Work for Credit
Contact
npolys@vt.edu

Lenwood S. Heath

Deep, Personalized Searching

Faculty Advisor
Lenwood S. Heath
Research Supervisor
Lenwood S. Heath
Description of Work
Powerful keyword-based searching is available for the web (e.g., Google) and for scientific literature (e.g., Web of Science). However, a person searching for a very specific kind of resource may spend much effort on a search that ends in frustration due to a mismatch between keyword search and the semantics of her information resource needs.

The focus of this project is to incorporate semantic-based searching that is deep and time consuming, even leisurely. A search that takes 24 hours to find just the right resource(s) can be considered successful, as long as those 24 hours consist of automated effort only, while the person pursues other tasks and interests.

Keyword semantics will be derived from word senses obtained through WordNet. A command-line user interface will launch semantic search algorithms that integrate keyword search (probably Google) and semantics. A relational database will be built to record search progress. Email notification of search milestones can be given. A prototype searcher will emphasize searching for a small list of high-quality tutorials on a precise topic specified by the user. Implementation will be under Linux or Mac OS X.
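To make the idea of recording search progress in a relational database concrete, here is a minimal sketch using SQLite from the Python standard library. The table layout, column names, and function names are assumptions invented for this illustration; the project description does not specify a schema, a database engine, or a scoring method.

```python
import sqlite3


def open_progress_db(path=":memory:"):
    """Open (or create) a database that tracks candidate resources for a search."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               url    TEXT PRIMARY KEY,
               query  TEXT NOT NULL,
               score  REAL,                        -- NULL until evaluated
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return conn


def add_candidate(conn, query, url):
    """Remember a URL returned by keyword search; repeated URLs are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO candidates (url, query) VALUES (?, ?)",
        (url, query),
    )
    conn.commit()


def record_score(conn, url, score):
    """Store the (hypothetical) semantic relevance score for a candidate."""
    conn.execute(
        "UPDATE candidates SET score = ?, status = 'scored' WHERE url = ?",
        (score, url),
    )
    conn.commit()


def best_candidates(conn, query, limit=5):
    """Return the highest-scoring evaluated candidates for a query."""
    rows = conn.execute(
        "SELECT url, score FROM candidates "
        "WHERE query = ? AND status = 'scored' "
        "ORDER BY score DESC LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Because progress lives in the database rather than in memory, a long-running search of this kind could be interrupted and resumed, and a milestone notifier could simply poll the table.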

Application Instructions
If this description charges you up, then see Professor Heath during his office hours (available on his web site). Please bring a resume and transcript. A love of the subtleties of the English language is a definite plus.
Project URL
http://
Area(s) of Research
Databases, Data Mining, Knowledge, Artificial Intelligence
Compensation
Work for Credit or Volunteer
Contact
heath@vt.edu

Wu Feng

High-Performance Biological Sequence Search

Faculty Advisor
Wu Feng
Research Supervisor
Jeremy Archuleta
Description of Work
Biological sequence searching has become a fundamental aspect of all bioinformatics. It can help in tasks such as sequencing the human genome, designing pathogen signatures for pathogen detection, identifying unknown viruses (e.g., the virus now known as SARS), and so on. In this project, you will code different modules of a much larger project (mpiBLAST at http://www.mpiblast.org) in order to improve functionality, maintainability, and performance.
Application Instructions
E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include an unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
http://www.mpiblast.org/
Area(s) of Research
Bioinformatics, Parallel Computation, Software Engineering, Systems, Theory, Computational Biology, Databases, Data Mining, Artificial Intelligence
Compensation
Negotiable
Contact
feng@cs.vt.edu

Scott Midkiff

Investigating the Application of Pervasive Computing Concepts to Teaching and Learning

Faculty Advisor
Scott Midkiff
Research Supervisor
William (Bill) Plymale
Description of Work
The focus of this project is to learn how pervasive computing concepts and technologies can be used to enhance teaching, learning, and other university experiences. Pervasive computing concepts will be studied and realized using hardware prototyping and development kits. Team-based projects will associate pervasive computing concepts with real-life student experiences at Virginia Tech. Sun Microsystems' Sun SPOTs, the Arduino controller and development environment, Crossbow and Sentilla/Moteiv motes, and the Processing programming system will be used for hands-on work.
Application Instructions
Please contact Bill Plymale (plymale@vt.edu) with an expression of interest. Include a current resume and/or a list of technical courses taken and any other relevant experiences.
Project URL
Area(s) of Research
Databases, Human-Computer Interaction, Networking, Systems
Compensation
Work for Credit
Contact
plymale@vt.edu

Lenwood S. Heath

Mining Plant Biology Papers to Identify Gene Functions (MineFun)

Faculty Advisor
Lenwood S. Heath
Research Supervisor
Lenwood S. Heath, Naren Ramakrishnan
Description of Work
There are 26,000+ genes in the model plant Arabidopsis thaliana, each of which has some biological function. Biological databases such as TAIR (http://www.arabidopsis.org) catalog all the genes, including their DNA sequences and putative functions. In many cases, the functional annotation of a gene given in the database is inaccurate or simply unknown. However, an accurate annotation can often be extracted from the scientific literature. To avoid the laborious manual process of reading thousands of papers, it is desirable to partially automate the extraction of annotation from literature. There are databases of scientific literature, including public databases such as PubMed (http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed) and AGRICOLA (http://agricola.nal.usda.gov/), in which the abstracts of thousands of papers are indexed and searchable. Moreover, the process of extracting relationships from text has previously been automated in the Snowball system (http://snowball.cs.columbia.edu/). This tool is not particularly targeted toward the needs of Arabidopsis gene annotation, but its methods are an excellent starting point for the MineFun project. In this project, we are building tools to data mine gene function information from scientific abstracts. The resulting improved annotations will be of great benefit to plant biology. The interested student must have proficiency in multiple languages: at least one high-level language such as C/C++/Java and, especially important for this project, Perl. Experience in text processing is desirable.
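The following sketch shows, in a deliberately simplified form, what extracting a gene-function relationship from abstract text can look like. It uses a fixed, hand-written pattern, which is not how Snowball works and not the MineFun method; the trigger phrases, the function name, and the example sentences are assumptions invented for illustration, and the sentences are not real annotations. The example is in Python rather than Perl purely for consistency with the other sketches on this page.

```python
import re

# Arabidopsis locus identifiers look like AT1G01010: "AT", a chromosome
# (1-5, C for chloroplast, M for mitochondrion), "G", then five digits.
AGI = r"AT[1-5CM]G\d{5}"

# Hypothetical trigger phrases that often introduce a functional statement.
PATTERN = re.compile(
    rf"\b({AGI})\b\s+(?:encodes|is required for|is involved in)\s+([^.;]+)",
    re.IGNORECASE,
)


def extract_annotations(text):
    """Return (gene_id, function_phrase) pairs found by the fixed pattern."""
    return [(m.group(1).upper(), m.group(2).strip()) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # Invented sentences for demonstration only.
    abstract = (
        "We find that AT1G01010 encodes a transcription factor. "
        "AT5G10140 is required for repression of flowering; "
        "other genes were not tested."
    )
    for gene, function in extract_annotations(abstract):
        print(gene, "->", function)
```

A fixed pattern like this misses most phrasings, which is why bootstrapped or learned patterns are attractive for the real task.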
Application Instructions
Please see Dr. Heath or Dr. Ramakrishnan during their office hours. Send email to set up an appointment, if necessary.
Project URL
http://
Area(s) of Research
Bioinformatics, Data Mining, Databases
Compensation
Negotiable
Contact
heath@vt.edu

W. Feng

Supercomputing on Video Gaming Consoles

Faculty Advisor
W. Feng
Research Supervisor
Ashwin Aji
Description of Work
Given the extreme needs of today's sophisticated video games, game consoles and video graphics cards have become supercomputers in their own right. The goal of this project is to program *and* optimize a bioinformatics application (or perhaps something else, if reasonable) on the Sony PlayStation 3 and/or the NVIDIA Tesla video graphics card using the CUDA programming environment. (For those interested in human-computer interaction, building a good interface to the above bioinformatics application, or to our existing codes, would also make a suitable project.)
Application Instructions
E-mail a resume to feng@cs.vt.edu. Optional, but preferred, materials include an unofficial undergraduate transcript and a brief one-paragraph statement of what interests you about this project.
Project URL
http://
Area(s) of Research
Bioinformatics, Human-Computer Interaction, Parallel Computation, Problem Solving Environments, Software Engineering, Systems, Theory, Databases, Data Mining, Knowledge
Compensation
Negotiable
Contact
feng@cs.vt.edu

Lenwood S. Heath

XcisClique

Faculty Advisor
Lenwood S. Heath
Research Supervisor
Lenwood S. Heath
Description of Work
The genome of an organism consists of DNA molecules (chromosomes) in every cell that encode information for the functioning of the cell. The genome is typically thought of as a set of sequences over the chemical alphabet {A,C,G,T}. These sequences encode, among other things, the genes of the organism. In turn, genes carry the genetic codes for proteins. For a genetic code to result in a protein, the gene must be transcribed (copied) to a messenger RNA (mRNA) molecule, which later serves as the template for translation into a protein. The transcription step is controlled by regulatory sequences embedded in the genomic sequence. If the gene is actually transcribed into mRNA, then the gene is said to be expressed.
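The two steps described above, transcription and translation, can be sketched in a few lines. This is a textbook simplification and not part of XcisClique: it assumes the input is the coding strand of a gene with no introns, uses the standard genetic code, and starts translating at the first AUG. The function names are made up for this example.

```python
# Build the standard genetic code: 64 codons in UCAG order mapped to
# one-letter amino acid codes, with "*" marking a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def transcribe(coding_dna):
    """Copy a coding-strand DNA sequence into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

Whether a gene is transcribed at all is the regulatory question that systems such as XcisClique address; this sketch only covers what happens after that decision.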

XcisClique is a system that combines the analysis of genomic sequence, known regulatory sequences, and experimental data on gene expression to analyze the statistical significance of combinations (bicliques) of regulatory sequences and gene expression. It consists of local data resources in a relational database together with tools for analyzing sequences and bicliques. Currently, it only has the genome of a small model plant called Arabidopsis thaliana. Amrita Pati completed the current version in 2005, and she is still part of the research group.

Opportunities for Enhancements

(1) A very important genome that recently became available is that of rice. In addition, other organisms will become available over time that can benefit from the capabilities of XcisClique. Every organism has unique challenges related to putting it into a relational database. In other words, there are no standards for what must be included in a genome and in what format. The rice genome will be highly valuable to add to XcisClique, but it will take some effort.

(2) There are some time-consuming analyses that take too long to be done through the web interface. Instead, they are precomputed for a limited set of parameters and stored in a database. A research task is to develop and implement methods that eliminate precomputation and to enhance the web interface to support greater user capabilities.

(3) Certain functionalities of the XcisClique system could be made more efficient with appropriate enhancements to the code. Improving the running time for an analysis in the current system is another research task.

(4) The computational biology and bioinformatics (CBB) group is acquiring a database server so we can expand the size of the databases that are available through our web services. The rice genome is much larger than the Arabidopsis genome. And there is more gene expression data available on the web that could be integrated with the rest of the data.

(5) With enough data, one can imagine mining the database for biologically meaningful patterns. Tools available from Amrita and others can be used, or new mining tools based on specific needs can be developed.

Background Required

Knowledge of Perl and MATLAB is required. Knowledge of C++ is desirable but not essential. The current database is built on the Postgres platform, so knowledge of SQL will be helpful. The existing web interface has been built using PHP and Perl.

Application Instructions
Visit Dr. Heath during his office hours to discuss your interest.
Project URL
https://bioinformatics.cs.vt.edu/xcisclique/
Area(s) of Research
Bioinformatics, Computational Biology, Databases
Compensation
Work for Credit
Contact
heath@vt.edu