Purdue and Genomics
     
 

Bioinformatic tools at the Genomics facility



EMBOSS

The EMBOSS package is a popular open source suite of bioinformatics tools. See emboss.sourceforge.net for more information and click here for a list of programs. EMBOSS is our recommended sequence analysis package. It will run on Macs (OS X) and on Windows using the Cygwin Unix-emulation program. Parts of EMBOSS may also run as native Windows programs; see the above link. The Genomics facility only supports EMBOSS running on our Sun/Solaris Unix servers.

Web interfaces

There are two web interfaces to EMBOSS. Click here for our recommended web interface (called EMBOSS GUI), which is well organized. Another popular interface is called Pise (pronounced 'peas'). Click here for our Pise interface; it is not as well organized.

Other web interfaces

The Agronomy department supports EMBOSS, Blast and other programs via a Bioteam/Inquiry Pise-based web interface. Their system consists of a cluster of Macintosh computers and thus can run large jobs. Send email to Brian Abernathy in order to obtain an account. Indiana University has a nice EMBOSS/GCG Pise-based bioinformatics portal which people at Purdue can use on a limited basis.

Command line interface

The command line is the best way to work with a large number of files. In particular, any program that deals with directories of sequences will not be available via our web interfaces. Once you are logged into one of our computers via SSH or VNC, type 'wossname' to see a list of all programs or 'emnu' for a simple menu system.
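
As a rough sketch of what working with a directory of sequences might look like at the command line, the shell function below runs the EMBOSS program 'seqret' on every file ending in .fasta in a directory and writes a GenBank-format copy of each. The function name, the .fasta extension and the output naming are assumptions made for illustration only; substitute whichever EMBOSS program you actually need.

```shell
# Convert every *.fasta file in a directory to GenBank format with seqret.
# emboss_convert_dir is a hypothetical helper name, not part of EMBOSS.
emboss_convert_dir() {
    dir=$1
    for f in "$dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$f" ] || continue
        # -auto turns off interactive prompting so the loop runs unattended.
        seqret -sequence "$f" -outseq "${f%.fasta}.gb" -osformat genbank -auto
    done
}
```

After defining the function, something like 'emboss_convert_dir my_sequences' would process each file in turn, which is the kind of batch job the web interfaces cannot do.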


BLAST and BLAT

BLAST is the very popular program that compares sequences to a database such as GenBank, RefSeq, etc. The Genomics facility has a WebBlast interface. However, unless you wish to use one of our specialized databases, the NCBI web BLAST runs much faster. If you want to run large jobs then you should use the command line program 'blastall'. See the EMBOSS section for more information on the command line.
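
For a large job, a loop around 'blastall' is one way to go. The sketch below assumes nucleotide queries (blastn), query files ending in .fasta, and an E-value cutoff of 1e-5 with tabular output; the function name and the database name you pass in are hypothetical, so adjust all of these to your own data.

```shell
# Run blastall on every *.fasta query in a directory (legacy NCBI BLAST).
# blast_dir and its arguments are hypothetical names for this sketch.
blast_dir() {
    dir=$1      # directory holding the query files
    db=$2       # name of a formatted BLAST database
    for q in "$dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$q" ] || continue
        # -p program, -d database, -i query file, -o output file,
        # -e E-value cutoff, -m 8 tab-delimited output
        blastall -p blastn -d "$db" -i "$q" -o "${q%.fasta}.blast" -e 1e-5 -m 8
    done
}
```

Each query file gets its own .blast result file next to it, so an interrupted run can be resumed without redoing finished queries by hand.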

Another very good comparison program that is often overlooked is called 'BLAT'. BLAT runs at least 5 times as fast as BLAST; however, it may not pick up low-homology matches, i.e., it is made for more exact comparisons than BLAST. In most cases, though, people just want the more exact matches. The Genomics facility does not have a web interface to the BLAT program. It is installed as a command line program.
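
A minimal sketch of a BLAT run from the command line follows. BLAT takes the target (database) file first, then the query file, then the output file, which is written in PSL format by default. The function name and the output naming are assumptions for illustration, and the sketch assumes the query file name has an extension such as .fa.

```shell
# Align one query file against a target sequence file with BLAT.
# blat_one is a hypothetical helper name, not part of BLAT.
blat_one() {
    target=$1   # file of sequences to search against
    query=$2    # file of query sequences (assumed to have an extension)
    # Output goes next to the query, with its extension replaced by .psl.
    blat "$target" "$query" "${query%.*}.psl"
}
```

For example, 'blat_one genome.fa est.fa' would write its matches to est.psl.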


GCG package - version 10.2

The Purdue Genomics facility has not updated the GCG package since 2002. Support for GCG is limited. We suggest using the EMBOSS suite of programs.

The GCG package, also known as the "Wisconsin" package, is a suite of programs that has been in existence for many years. The scope of the package is very large; almost every kind of bioinformatics program is available within it. However, the package runs only on larger computers -- e.g., the Sun computers found at the Genomics center and not PCs or Macs -- which makes the package's user interface hard for PC and Mac people to understand at first. The web interface to the package can make this learning curve much easier, and we suggest that novice users use the web interface instead of trying to use the command line interface. However, the web interface does not provide access to all of the programs, so command line use may be required.

Web interface

Click here for the web interface to the GCG package. Java and JavaScript must be enabled for your web browser. Use your normal Silverjack Genomics account and password to log in. There are severe restrictions on the supported browsers and operating systems:

Windows 95/98/NT: Netscape 4.77 and Internet Explorer 5.5
Windows 2000/XP: Internet Explorer 5.5 (at least it seems to work, please report any problems)
Macintosh: Netscape 4.77 and Internet Explorer 5.5. The Java portions are unavailable.
Unix: Netscape 4.77

X-windows interface

There is an X-windows interface available, known as 'wpi'. First install an X server on your PC or Mac (Unix machines come with one already installed). Then configure the X server to connect to silverjack.genomics.purdue.edu and run the program wpi.csh. The Hummingbird program is easy to set up in this manner, but see one of the sysadmins if you have problems.

Command line interface

You can also use the command line interface by logging into either Silverjack or Fermat via VNC or SSH.

The online documentation is available either program by program or organized by section.


VectorNTI

VectorNTI is a commercial package of bioinformatics programs that runs on PCs and Macs. See http://www.invitrogen.com/ for more information. VectorNTI is very graphical and powerful; however, it may not have the range of programs that EMBOSS contains. It also costs about $1000 per PC/Mac as an initial startup cost and around $300/year (2004 prices) thereafter. The price does seem to change yearly, so do not take the prices mentioned here as current. The web site contains a downloadable demonstration copy.

The Genomics facility's support of VectorNTI is limited to managing the group purchase of the package. We currently offer no support on how to use the program. If you are interested in purchasing VectorNTI via the Genomics facility then get hold of Rick Westerman.



Other programs

We support a variety of other programs. If you want a different program installed please get hold of Rick Westerman. We are willing to host any bio-program that runs on Unix.

Most of our installed programs only run on the computer known as 'silverjack'. The documentation for these programs tends to be skimpy and incomplete. Here are references to most of them.

t_coffee is a multiple sequence alignment program. It is perhaps the best of the bunch; certainly it is better than the 'pileup' program in the GCG package. Documentation is available in various formats. Try 't_coffee -output=gcg -infile=your_fasta_file' followed by 'prettybox your_output_file{*}' as a quick start.
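
If you have several sets of sequences to align, the t_coffee quick-start command can be wrapped in a loop. This is only a sketch: the function name and the .fasta extension are assumptions, and it covers the t_coffee step only, leaving the prettybox step to be run by hand on each result.

```shell
# Run the quick-start t_coffee command on each *.fasta file in a directory.
# tcoffee_dir is a hypothetical helper name, not part of t_coffee.
tcoffee_dir() {
    dir=$1
    for f in "$dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$f" ] || continue
        # Same options as the quick start: GCG-format output, one input file.
        t_coffee -output=gcg -infile="$f"
    done
}
```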

The University of Washington assembly package consists of several programs. These are documented at www.phrap.org. We also have local copies of some of the documentation, including that for the phred base caller and the associated sequence viewer consed; the README for Consed version 11 is here.

TIGR assembler.

LUCY program (in PDF format so you need the Adobe Acrobat reader)

Staden package.

'cap3' and 'formcon' assembly programs.

miropeats is a program that discovers regions of sequence similarity and then displays this graphically.

The GASP [PDF format] sequence assembly package from the University of Washington. This program processes raw trace files from a sequencer. See also the GASP home page.

DOTTER: a program to do dot plots. The GCG package can also do dot plots but DOTTER handles more generic files.

VTrace reads both ABI and SCF chromatograms, automatically detecting the file type, and displays/prints the traces.

MSPcrunch can post-process Blast files. The ACT program will visually compare two whole genome sequences; it is used in conjunction with the MSPcrunch program. The Unix version of ACT is already installed on Silverjack: type in 'act' and ignore the font warnings; you do need to have your X-windows display set up properly. There are PC and Mac versions available as well. Both of these programs are from Sanger (www.sanger.ac.uk/Software/ACT). You may need to install a Java interpreter on your PC/Mac. The above link will lead you to a place where you can do this.

LOOK (also known as GeneMine) integrates the study of protein sequence and structure to help you understand function. LOOK performs sequence alignment, site-directed mutagenesis predictions, evolutionary conservation analyses, structural analyses, and residue-specific literature searches. To run LOOK start up an X-windows session (either via VNC or an X-windows server) and, at the command line, type 'xlook'. There is on-line help, a PDF-formatted manual and an offsite web site.


 
 