
Undergraduates can participate in projects for credit by registering for CS 4974 or CS 4994. Consult the Faculty Advisor or Research Supervisor before you register for either course.
Participation in a VTURCS project could also lead to an honors thesis for CS majors interested in graduating with honors.
Can't find anything that piques your curiosity? Don't be afraid to check the Computer Science faculty list for someone whose research interests you'd like to know more about. They might just have something for you.























The focus of this project is semantic search that is deep and time-consuming, even leisurely. A search that takes 24 hours to find just the right resource(s) can be considered successful, as long as those 24 hours consist of automated effort only, while the person pursues other tasks and interests.
Keyword semantics will be derived from word senses in WordNet. A command-line user interface will launch semantic search algorithms that integrate keyword search (probably through Google) with semantics. A relational database will be built to record search progress, and email notification of search milestones can be sent. A prototype searcher will emphasize finding a small list of high-quality tutorials on a precise topic specified by the user. Implementation will be under Linux or Mac OS X.
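As one possible reading of "a relational database will be built to record search progress", the following minimal Python sketch uses the standard-library sqlite3 module. The table name, column names, and helper functions are all hypothetical and are not part of the project description; the point is only that a long-running search can be stopped and resumed if every candidate resource and its score is stored as it is found.

```python
import sqlite3
import time

def open_progress_db(path=":memory:"):
    """Open (or create) the progress database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS candidates (
               url     TEXT PRIMARY KEY,
               query   TEXT NOT NULL,
               score   REAL,
               status  TEXT NOT NULL DEFAULT 'pending',
               updated REAL NOT NULL
           )"""
    )
    return conn

def record_candidate(conn, url, query):
    """Remember a URL returned by a keyword query; duplicates are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO candidates (url, query, updated) VALUES (?, ?, ?)",
        (url, query, time.time()),
    )
    conn.commit()

def mark_scored(conn, url, score):
    """Store the semantic relevance score computed for a candidate."""
    conn.execute(
        "UPDATE candidates SET score = ?, status = 'scored', updated = ? WHERE url = ?",
        (score, time.time(), url),
    )
    conn.commit()

def best_candidates(conn, limit=5):
    """Return the highest-scoring candidates seen so far."""
    rows = conn.execute(
        "SELECT url, score FROM candidates WHERE status = 'scored' "
        "ORDER BY score DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()

if __name__ == "__main__":
    db = open_progress_db()
    record_candidate(db, "http://example.org/tutorial-a", "wordnet tutorial")
    record_candidate(db, "http://example.org/tutorial-b", "wordnet tutorial")
    mark_scored(db, "http://example.org/tutorial-a", 0.9)
    mark_scored(db, "http://example.org/tutorial-b", 0.4)
    print(best_candidates(db, limit=1))
```

Passing a file path instead of ":memory:" would let a search survive a restart; a milestone notification could then be triggered whenever best_candidates changes.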










Machines needed: any number from 1 to infinity.
Calendar time required: any amount, to infinity.
A typical run would attempt to factor any one (or some subset) of F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, with the candidates from F14 and up being most efficiently attacked by the proposed executable.
Fermat numbers remain shrouded in mystery. For example, we know that F14 (the fourteenth Fermat number) is composite (that is, not prime), yet not a single prime factor of F14 has ever been found. Similarly, only two prime factors of F18 are known so far (see the table below, which shows the state of knowledge on Fermat numbers through F_24).
There is C code ("fermat.c"), developed at Apple's Advanced Computation Group and elsewhere over the last decade, that combines fast-FFT methods and elliptic-curve methods to attack large Fermat numbers. The code is suitable for powerful machine clusters (e.g., the Mac System X terascale computer) and is easy to port.
STATUS OF FERMAT NUMBERS (Nov 2003):
F0-F4: prime
F5-F11: completely factored
F12 = 114689 * 26017793 * 63766529 * 190274191361 * 1256132134125569 * composite
F13 = 2710954639361 * 2663848877152141313 * 3603109844542291969 * 319546020820551643220672513 * composite
F14 = composite
F15 = 1214251009 * 2327042503868417 * 168768817029516972383024127016961 * composite
F16 = 825753601 * 188981757975021318420037633 * composite
F17 = 31065037602817 * composite
F18 = 13631489 * 81274690703860512587777 * composite
F19 = 70525124609 * 646730219521 * composite
F20 = composite
F21 = 4485296422913 * composite
F22 = composite
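Entries in the status list above can be spot-checked without ever writing out a Fermat number in full. The n-th Fermat number is F_n = 2^(2^n) + 1, so a prime p divides F_n exactly when 2^(2^n) is congruent to -1 modulo p. The short Python sketch below uses that fact, plus Pepin's test for the small prime cases. It is only an illustration and is unrelated to fermat.c; the function names are made up for this example.

```python
def fermat(n):
    """Return the n-th Fermat number F_n = 2**(2**n) + 1."""
    return (1 << (1 << n)) + 1

def divides_fermat(p, n):
    """True if p divides F_n, checked by modular exponentiation only."""
    # p | F_n  <=>  2**(2**n) == -1 (mod p); F_n itself is never constructed.
    return pow(2, 1 << n, p) == p - 1

def pepin_is_prime(n):
    """Pepin's test: for n >= 1, F_n is prime iff 3**((F_n - 1)/2) == -1 (mod F_n)."""
    if n < 1:
        raise ValueError("Pepin's test applies to n >= 1")
    f = fermat(n)
    return pow(3, (f - 1) // 2, f) == f - 1

# A few of the smaller factors from the status list above.
LISTED_FACTORS = {
    12: [114689, 26017793, 63766529],
    16: [825753601],
    18: [13631489],
}

if __name__ == "__main__":
    for n, factors in sorted(LISTED_FACTORS.items()):
        for p in factors:
            print(f"{p} divides F{n}: {divides_fermat(p, n)}")
    for n in range(1, 8):
        print(f"F{n} prime by Pepin's test: {pepin_is_prime(n)}")
```

Checking a known factor is cheap because the exponentiation takes only n modular squarings. Finding a new factor of something like F14 is the hard part, and that is what the FFT and elliptic-curve machinery is for.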
This project would be a collaboration with
R. E. Crandall
Apple Distinguished Scientist
Advanced Computation Group
crandall@apple.com






The aim of this project is to design a radically new kind of "file system" (actually a human memory mirror) by analogy with human memory. A prototype that runs under Linux should be implemented as a proof of concept.





































XcisClique is a system that combines the analysis of genomic sequence, known regulatory sequences, and experimental data on gene expression to assess the statistical significance of combinations (bicliques) of regulatory sequences and gene expression. It consists of local data resources in a relational database together with tools for analyzing sequences and bicliques. Currently, it contains only the genome of Arabidopsis thaliana, a small model plant. Amrita Pati completed the current version in 2005, and she is still part of the research group.
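To make the biclique idea concrete, one can think of a set of regulatory motifs paired with the set of genes whose upstream regions contain all of them. The Python sketch below uses toy data to find the genes carrying a given motif combination and then asks how surprising their overlap with a co-expressed gene cluster is, using a hypergeometric tail probability. This is not XcisClique's actual method, which the description above does not specify; the gene identifiers, motif assignments, and function names are all hypothetical.

```python
from math import comb

def genes_with_all(motifs_by_gene, motif_combo):
    """Genes whose motif set contains every motif in the combination."""
    combo = set(motif_combo)
    return {gene for gene, motifs in motifs_by_gene.items() if combo <= set(motifs)}

def hypergeom_tail(total, marked, drawn, overlap):
    """P(X >= overlap) when `drawn` genes are sampled without replacement
    from `total` genes, of which `marked` carry the motif combination."""
    upper = min(marked, drawn)
    ways = sum(
        comb(marked, i) * comb(total - marked, drawn - i)
        for i in range(overlap, upper + 1)
    )
    return ways / comb(total, drawn)

def score_combo(motifs_by_gene, motif_combo, expression_cluster):
    """Return (genes in both sets, p-value) for one motif combination."""
    carriers = genes_with_all(motifs_by_gene, motif_combo)
    cluster = set(expression_cluster) & set(motifs_by_gene)
    shared = carriers & cluster
    p = hypergeom_tail(len(motifs_by_gene), len(carriers), len(cluster), len(shared))
    return shared, p

if __name__ == "__main__":
    # Toy data: hypothetical genes and the motifs found upstream of each.
    toy = {
        "g1": {"ABRE", "DRE"}, "g2": {"ABRE", "DRE"}, "g3": {"ABRE", "DRE", "G-box"},
        "g4": {"ABRE", "DRE"}, "g5": {"G-box"}, "g6": {"ABRE"},
        "g7": set(), "g8": {"DRE"},
    }
    shared, p = score_combo(toy, ["ABRE", "DRE"], ["g1", "g2", "g3", "g5"])
    print(sorted(shared), round(p, 4))
```

A small p-value suggests that the motif combination and the expression cluster co-occur more often than chance would predict. A real analysis would also need to correct for the many combinations tested.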
Opportunities for Enhancements
(1) A very important genome that recently became available is that of rice. In addition, the genomes of other organisms that could benefit from the capabilities of XcisClique will become available over time. Every organism poses unique challenges when it is put into a relational database, because there are no standards for what a genome must include or in what format. The rice genome will be highly valuable to add to XcisClique, but it will take some effort.
(2) There are some time-consuming analyses that take too long to be done through the web interface. Instead, they are precomputed for a limited set of parameters and stored in a database. A research task is to develop and implement methods that eliminate precomputation and to enhance the web interface to support greater user capabilities.
(3) Certain functionalities of the XcisClique system could be made more efficient with appropriate enhancements to the code. Improving the running time for an analysis in the current system is another research task.
(4) The computational biology and bioinformatics (CBB) group is acquiring a database server so we can expand the size of the databases that are available through our web services. The rice genome is much larger than the Arabidopsis genome. In addition, more gene expression data is available on the web that could be integrated with the rest of the data.
(5) With enough data, one can imagine mining the database for biologically meaningful patterns. Tools available from Amrita and others can be used, or new mining tools based on specific needs can be developed.
Background Required
Knowledge of Perl and MATLAB is required. Knowledge of C++ is desirable but not essential. The current database is built on PostgreSQL, so knowledge of SQL will be helpful. The existing web interface was built using PHP and Perl.