Bioinformatic tools at the Genomics facility
EMBOSS
The EMBOSS package is a popular open-source suite of bioinformatics tools. See emboss.sourceforge.net for more information and click here for a list of programs. EMBOSS is our recommended sequence analysis package. It will run on Macs (OS X) and on Windows using the Cygwin Unix-emulation program. Parts of EMBOSS may also run as native Windows programs; see the above link. The Genomics facility only supports EMBOSS running on our Sun/Solaris Unix servers.
Web interfaces
There are two web interfaces to EMBOSS. Click here for our recommended web interface (called EMBOSS GUI), which is well organized. Another popular interface is called Pise (pronounced 'peas'). Click here for our Pise interface, which is less well organized.
Other web interfaces
The Agronomy department supports EMBOSS, Blast and other programs via a Bioteam/Inquiry Pise-based web interface. Their system consists of a cluster of Macintosh computers and thus can run large jobs. Send email to Brian Abernathy in order to obtain an account. Indiana University has a nice EMBOSS/GCG Pise-based bioinformatics portal which people at Purdue can use on a limited basis.
Command line interface
The command line is the best way to work with a large number of files. In particular, any program that deals with directories of sequences is not available via our web interfaces. Once you are logged into one of our computers via SSH or VNC, type 'wossname' to see a list of all programs or 'emnu' for a simple menu system.
BLAST and BLAT
BLAST is the very popular program that compares sequences to a database such as GenBank or RefSeq. The Genomics facility has a WebBlast interface. However, unless you wish to use one of our specialized databases, the NCBI web BLAST runs much faster. If you want to run large jobs, use the command line program 'blastall'. See the EMBOSS section for more information on the command line.
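For large jobs, the blastall command lines can be generated rather than typed one by one. The following is a minimal sketch, assuming the standard legacy blastall options (-p, -d, -i, -o, -e). The function names, the "*.fasta" pattern, the "blast_out" directory and the ".blast" report suffix are hypothetical choices for illustration, not facility conventions.

```python
from pathlib import Path


def build_blastall_command(query, database, program="blastn",
                           evalue="1e-5", out_dir="blast_out"):
    """Return the argument list for one blastall run (nothing is executed)."""
    query = Path(query)
    # Hypothetical naming scheme: one report per query, e.g. a.fasta -> a.blast
    report = Path(out_dir) / (query.stem + ".blast")
    return [
        "blastall",
        "-p", program,      # BLAST flavour: blastn, blastp, blastx, ...
        "-d", database,     # database name (assumed to be already formatted)
        "-i", str(query),   # query sequence file
        "-o", str(report),  # report file
        "-e", evalue,       # expectation-value cutoff
    ]


def commands_for_directory(directory, database, pattern="*.fasta", **options):
    """Build one blastall command per matching file in a directory."""
    files = sorted(Path(directory).glob(pattern))
    return [build_blastall_command(f, database, **options) for f in files]
```

On a machine where blastall is installed, each list could be passed to subprocess.run(cmd, check=True). The output directory has to exist beforehand, and the database name depends on what is installed locally.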
Another very good comparison program that is often overlooked is 'BLAT'. BLAT runs at least 5 times as fast as BLAST, but it may not pick up low homology; that is, it is designed for more exact comparisons than BLAST. In most cases, though, people just want the more exact matches. The Genomics facility does not have a web interface to BLAT; it is installed as a command line program.
GCG package - version 10.2
The Purdue Genomics facility has not updated the GCG package since 2002. Support for GCG is limited. We suggest using the Emboss suite of programs.
The GCG package, also known as the "Wisconsin" package,
is a suite of programs that has been in existence for many years.
The scope of the package is very large; almost every bioprogram is available
within it. However the package runs only on larger computers -- e.g.,
the Sun computers found at the Genomics center and not PCs or Macs -- which makes the package's
user interface hard for PC and Mac people to understand at first.
The web interface to the package can make this learning
curve much gentler, and we suggest that novice users use the web
interface instead of trying to use the command line interface. However, the web interface does not provide access to all of the programs, so command line use may be required.
Web interface
Click here for the web
interface to the GCG package.
Java and JavaScript must be enabled
for your web browser. Use your normal Silverjack Genomics account
and password to log in. There are severe restrictions on the supported
browsers and operating systems:
Windows 95/98/NT: Netscape 4.77 and Internet Explorer 5.5
Windows 2000/XP: Internet Explorer 5.5 (at least it seems to work; please report any problems)
Macintosh: Netscape 4.77 and Internet Explorer 5.5. The Java portions are unavailable.
Unix: Netscape 4.77
X-windows interface
There is an X-windows interface available, known as 'wpi'. Once you have
an X server installed on your PC or Mac (Unix machines come with one
already installed), configure it to connect to silverjack.genomics.purdue.edu
and run the program wpi.csh. The Hummingbird program is easy
to set up in this manner, but see one of the sysadmins
if you have problems.
Command line interface
You can also use the command line interface by logging into either
Silverjack or Fermat via VNC or SSH.
The online documentation is available in program-by-program
form or organized by section.
VectorNTI
VectorNTI is a commercial package of bioinformatics programs that runs on PCs and Macs. See http://www.invitrogen.com/ for more information. VectorNTI is very graphical and powerful; however, it may not have the range of programs that EMBOSS contains. It also costs about $1000 per PC/Mac as an initial startup cost and around $300/year thereafter (2004 prices). The price seems to change yearly, so do not take these figures as current. The web site contains a downloadable demonstration copy.
The Genomics facility's support of VectorNTI is limited to managing the group purchase of the package. We currently offer no support on how to use the program. If you are interested in purchasing VectorNTI via the Genomics facility, contact Rick Westerman.
Other programs
We support a variety of other programs. If you want a different program
installed please get hold of Rick Westerman.
We are willing to host any bio-program
that runs on Unix.
Most of our installed programs only run on the computer known as 'silverjack'. The documentation
for these programs tends to be skimpy and incomplete. Here are references
to most of them.
t_coffee is a multiple sequence alignment program. It is perhaps the
best of the bunch; certainly it is better than the 'pileup' program in the GCG package.
Documentation is available in various formats.
As a quick start, try t_coffee -output=gcg -infile=your_fasta_file followed by prettybox your_output_file{*}.
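The t_coffee quick start can also be scripted. This is only a sketch: the function name and the file names are hypothetical, and the name of the alignment file that t_coffee writes is passed in by the caller rather than guessed. The two command lines themselves are the ones given in the quick start.

```python
def t_coffee_quick_start(fasta_file, output_file):
    """Return the two quick-start command lines as argument lists.

    Nothing is executed here; the caller decides how to run them.
    """
    # Step 1: align the sequences and write GCG-format output.
    align = ["t_coffee", "-output=gcg", f"-infile={fasta_file}"]
    # Step 2: display the alignment with the GCG 'prettybox' program.
    # The trailing {*} is the quick start's suffix on the file name.
    pretty = ["prettybox", output_file + "{*}"]
    return align, pretty
```

Passing each list to subprocess.run (rather than building one shell string) keeps the {*} suffix from being interpreted by the shell.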
The University of Washington assembly package consists of several programs.
These are documented at www.phrap.org.
We also have some local copies of the documentation, including that for the phred
base caller and the associated sequence viewer consed;
the README for Consed version 11 is here.
TIGR assembler.
LUCY program (in PDF format, so you need the Adobe Acrobat reader).
Staden package.
'cap3' and 'formcon' assembly programs.
miropeats is a
program that discovers regions of sequence similarity and then displays
this graphically.
The GASP [PDF format] sequence assembly package from the University of Washington.
This program processes raw trace files from a sequencer. See also the GASP home page.
DOTTER: a program to do dot plots. The GCG package can also do dot plots, but DOTTER
handles more generic files.
VTrace
reads both ABI and SCF chromatograms, automatically detecting the file
type, and displays/prints the traces.
MSPcrunch can post-process Blast files.
The ACT program will visually compare two whole genome sequences; it works in conjunction with the MSPcrunch program. The Unix version
of ACT is already installed on Silverjack: type 'act' and ignore the font
warnings. You do need to have your X-windows display set up properly.
PC and Mac versions are available as well. Both of these
programs are from Sanger (www.sanger.ac.uk/Software/ACT). You may need to install a Java interpreter on your PC/Mac; the above link will lead you to a place where you can do this.
LOOK (also known as GeneMine) integrates the study of protein sequence and structure to help
you understand function. LOOK performs sequence alignment, site-directed
mutagenesis predictions, evolutionary conservation analyses, structural
analyses, and residue-specific literature searches. To run LOOK start
up an X-windows session (either via VNC or an X-windows
server) and, at the command line, type 'xlook'.
There is on-line help, a PDF-formatted
manual and an
offsite web site.