By Phillip SanMiguel
of the Genomics Core Facility
Affymetrix has recently decided to provide their main GeneChip analysis software at no charge. This is the new "GeneChip Operating Software" (GCOS), which folds the old "Micro Array Suite" (MAS) and "Micro DB" into a single program. Previously this program would have sold for something like $6000 a license. Currently it is free to download and can be run on PCs running Windows 2000. (It may also run on Windows XP machines--Affymetrix is still validating the program under that operating system.) This is a bold move on their part, but I'm convinced that it is a very clever one. Nevertheless, Affy is clearly quaking with fear over this decision and has not really made any great announcement of it that I'm aware of.
Thus far, to my mind, the killer application provided by GCOS is a scatter plot with fold-difference lines on it. Take any two chips from the same experiment and plot them against each other. You can click on any point and get the name of the gene, the signal strengths, etc. Further, you can click an "Info" button and be taken to Affy's NetAffx site to see more information about the gene of interest. Of course, as with other biological data, if you haven't done replicates you might be looking at random noise. But that caveat aside, if you focus on the 10x or 30x fold changes where one of the signals is reasonably high (above 100 at least), you should gather a list of interesting genes that can be followed up.
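The same filter can be applied outside GCOS once you have the per-gene signal values for two chips in hand (for example from a tab-delimited export). The following is only a minimal sketch in Python, not anything GCOS itself provides: the function name, the dictionary layout and the probe set names are made up for illustration, and the thresholds are simply the 10x and 100 figures mentioned above.

```python
def interesting_genes(chip_a, chip_b, min_fold=10.0, min_signal=100.0):
    """Return (gene, signal_a, signal_b, fold) tuples for large fold changes.

    chip_a and chip_b are dicts mapping a probe set name to its signal.
    Only probe sets present on both chips are compared.
    """
    hits = []
    for gene in sorted(set(chip_a) & set(chip_b)):
        a, b = chip_a[gene], chip_b[gene]
        low, high = min(a, b), max(a, b)
        # Require one signal to be reasonably high, and skip non-positive
        # signals, for which a ratio is meaningless.
        if high <= min_signal or low <= 0:
            continue
        fold = high / low
        if fold >= min_fold:
            hits.append((gene, a, b, fold))
    return hits


# Hypothetical probe set names and signal values, for illustration only.
chip_a = {"geneA_at": 1500.0, "geneB_at": 50.0, "geneC_at": 200.0}
chip_b = {"geneA_at": 100.0, "geneB_at": 2.0, "geneC_at": 180.0}
for hit in interesting_genes(chip_a, chip_b):
    print(hit)
```

Here only geneA_at survives: geneB_at changes 25-fold, but both of its signals are low, and geneC_at barely changes at all. Without replicates the earlier caveat still applies to whatever list comes out.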
Caution: installing this program also installs MSDE (the Microsoft SQL Server Desktop Engine). If you do not currently use a firewall, you might want to consider using one (such as ZoneAlarm) on the machine you install this package on. Like most Microsoft products it is inherently non-secure, and MSDE will open up additional security holes in any machine on which it is installed and which is connected to the internet. Here is a link to a utility that is purported to lock down access to the MSDE from the net:
www.affymetrix.com/support/technical/software_patches/msde_lock_utility.affx
But my advice would be to go with a firewall and not allow MSDE to take server connections from the internet.
To obtain GCOS go to this site:
www.affymetrix.com/support/technical/product_updates/gcos_download.affx
To obtain the software you will need to register at the Affymetrix site (if you have not already). This is free. Once you have done so you can choose one of the two versions of GCOS currently available. My understanding is that v1.1.1 will allow you to analyze data produced from the 11 micron feature size v2.0 chips just now becoming available from Affy. The Purdue Genomics Core started using a v.2.0 scanner in late 2004. If you have v.2 chips from summer 2004 or earlier then you did not get them from us. For the purposes of this document, I'm assuming you will download v1.1.1--although it does require some extra work every time you add new chip library files. If you get GCOS v1.0 plus 1.0.4 the tips I give below may not apply.
As of April 2005, Affy has version 1.2 of the GCOS software. It is probably better to use this than the v.1.1.1 software.
It is not difficult to download and install the software, but there were a few points where I got tripped up.
(1) The download involves several stages. You are sent to the Software Download Center at Affymetrix where you must type in the correct activation code. Then you provide registration information. After you submit your registration via their web page, an email is sent to you. This email contains a link to the download site. By clicking this link, you are sent to a vestibule of the downloading site. Here there is a link to do the download--but, of equal importance, a "license code". You need this code.
(2) You download the zipped software. It will take some time (the file is over 100 megabytes). You unzip the software by double-clicking on the file that has been downloaded. If your computer doesn't have unzipping software, you can get a free trial version of WinZip:
www.winzip.com/info.htm
(3) After unzipping you are not done. You must go to the unzipped folder that was created. Then double-click on the setup.exe file therein. During or shortly after installation you will be asked for a "serial number" but what the software really wants is the "license code" mentioned in (1).
(4) You still are not done--you will need to obtain library files for whatever chips you wish to analyze. Users of the older program, MAS, do not abandon hope. One of the greatest improvements in GCOS over MAS is that library files take only a few minutes to install whereas in MAS they (inexplicably) took many hours. Here is the Affy library download site:
www.affymetrix.com/support/technical/libraryfilesmain.affx
Once you download the zipped file, you must unzip it--then install it by double clicking the "setup.exe" inside the unzipped folder.
(5) If you downloaded v1.1.1 you will need to undertake an additional step at this point. After downloading the library(ies) you are interested in, you need to update them to work with the v1.1.1 version of GCOS. To do so, get the following patch:
www.affymetrix.com/support/technical/product_updates/scanner3k_hires_patch.affx
Unzip it. Run it. It will patch your libraries. Keep this folder around--next time you download a new library file, you will need to run it again.
(6) Now you are ready to actually get data into GCOS for analysis. Just a brief detour to explain the various types of files created during expression analysis using Affy Chips. The general relationship is:
.DAT --> .CEL --> .CHP
.DAT files are the raw image files, largely unprocessed by GCOS. .CEL files are produced when GCOS reads the .DAT files and assigns the hybridization data therein to specific probes. .CHP files are generated from .CEL files by consolidating all the probe pairs that interrogate a gene into a single value and an Absent/Present/Marginal call. .CHP files can be exported into a tab-delimited format that can be read by Excel.
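If you have a directory full of these files, it can help to see which stage each chip has reached. The sketch below is one way to do that in Python. It assumes that the files belonging to one chip share a base name (for example chip1.DAT, chip1.CEL, chip1.CHP); that naming is an assumption made for illustration, and the function names are hypothetical rather than part of any Affy tool.

```python
from collections import defaultdict
from pathlib import Path

# Pipeline order: .DAT --> .CEL --> .CHP
STAGES = (".DAT", ".CEL", ".CHP")


def stages_by_chip(directory):
    """Map each chip's base file name to the set of stages present."""
    found = defaultdict(set)
    for path in Path(directory).iterdir():
        ext = path.suffix.upper()  # treat .cel and .CEL alike
        if path.is_file() and ext in STAGES:
            found[path.stem].add(ext)
    return dict(found)


def missing_stages(directory):
    """For chips lacking any stage, list what is missing in pipeline order."""
    return {
        chip: [s for s in STAGES if s not in present]
        for chip, present in stages_by_chip(directory).items()
        if len(present) < len(STAGES)
    }
```

Calling missing_stages on a directory returns, for instance, {"chip2": [".CHP"]} when chip2 has a .DAT and a .CEL file but has not yet been analyzed.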
Near as I can tell at this point, with only the .DAT file you can recreate all the other files. This was definitely the case with MAS and seems to be the case with GCOS also. But GCOS refuses to sully itself by directly interacting with a .DAT file. Instead this file must be imported into the GCOS local database using the GCOS Batch Importer. This is a separate program installed at the same time as GCOS. I've used it, and it does work. But I imported from directories that contained .DAT, .CEL and .CHP files as well as another file type, .EXP. The latter contains some information about the experiment. The Batch Importer needs to have write permission to the directories where the .DAT files are--because it creates a new .EXP file during the importation process. In the case I mention above all 3 types of files were imported, not just the .DAT files. Details about how to use the Batch Importer are available in the GCOS tutorial:
www.affymetrix.com/support/technical/tutorial/gcos/index.affx
Look under "How to Import Data to GCOS in Batches". The other topic headings can be useful also. But most of the topics that involve transferring data out of and into GCOS will want you to use a database backup/restore method, which I won't go into here.
Important Note:
I didn't trip over this, but it could cause confusion. During batch importation your "Data Path" is where you have the .DAT files you wish to import--click the '...' button to browse there. The "Library Path" box is where your library files are. If you do a default installation of your library files, then the directory your files end up in is:
C:\GeneChip\Affy_Data\Library
so browse to that directory.
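Before starting a batch import, it may be worth checking the two paths yourself. Below is a small Python sketch of such a check. It is not part of GCOS or the Batch Importer, and the function name is hypothetical. It tests the points raised above: the library directory exists, the data directory contains .DAT files, and new files (such as the .EXP file) can be created in the data directory.

```python
import os
import tempfile


def check_import_paths(data_path, library_path):
    """Return a list of problems found; an empty list means none were spotted."""
    problems = []
    if not os.path.isdir(library_path):
        problems.append("Library Path is not a directory: " + library_path)
    if not os.path.isdir(data_path):
        problems.append("Data Path is not a directory: " + data_path)
        return problems
    if not any(n.upper().endswith(".DAT") for n in os.listdir(data_path)):
        problems.append("No .DAT files found in: " + data_path)
    try:
        # Creating a file is a more direct test than reading permission
        # bits; the temporary file is deleted as soon as it is closed.
        with tempfile.TemporaryFile(dir=data_path):
            pass
    except OSError:
        problems.append("Cannot create files in: " + data_path)
    return problems
```

For example, check_import_paths(r"D:\my_chips", r"C:\GeneChip\Affy_Data\Library") would report whether a (hypothetical) D:\my_chips directory is ready to be imported against the default library location.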
(7) Once your data has been imported using the GCOS Batch Importer, it will be available to the GCOS program. Note that currently .CHP files generated by the Genomics Core Facility are neither scaled nor normalized. Affy has a good manual for data analysis:
www.affymetrix.com/Auth/support/downloads/manuals/data_analysis_fundamentals_manual.pdf
It is based on MAS, but GCOS is very similar to MAS.
A word about Scaling.
Scaling occurs during the creation of CHP files from the CEL files. As I mention above, we have it turned off (set to 1.0), but Affy recommends that it be set for all probe sets to "500". To do this go to "Tools:Expression Settings". Choose the "Scaling" tab. Click the "All Probe Sets" radio button. Type "500" into the Target Signal box. You will need to re-analyze your CEL files for this setting to be applied. Before you do this, consider another issue:
Baseline Comparison Data.
In most cases you are comparing chips with each other. But Affy wants you to choose one chip as the baseline for each comparison. For instance, say you had two chips to be compared: Col_wt_10days and C53979_10days. Open the Expression Analysis Settings window again. This time choose the "Baseline" tab. Click the "Use Baseline Comparison Data" box. Then click "Browse..." Choose the baseline chip (in this case Col_wt_10days) for the comparison. Now do the analysis on the CEL files to generate new CHP files.
The easiest way to do the analysis is to click on "Run:Batch Analysis" from the top left menu bar. Then drag the .CEL files to be analyzed into the Batch Analysis window. Click the Analyze button. You can watch the progress of the analysis in the bottom right window. Once it is complete, a new .CHP file will have been created for each .CEL file. Double click on these .CHP files to open them. Once the pivot charts appear you can click on the "Scatter" plot button to generate a graph. Here you click on the points that look interesting and they will be identified.
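For readers wondering what the Target Signal setting discussed under Scaling does numerically, here is a rough Python sketch. It assumes that scaling multiplies every signal on a chip by one factor chosen so that a trimmed mean of the signals lands on the target, and that the trimmed mean drops the lowest and highest 2% of values. Both points are assumptions about the algorithm rather than something taken from the GCOS program itself, and the function names are hypothetical.

```python
def scaling_factor(signals, target=500.0, trim=0.02):
    """Factor that brings the trimmed mean of the signals to the target."""
    ordered = sorted(signals)
    cut = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[cut:len(ordered) - cut]
    trimmed_mean = sum(kept) / len(kept)
    return target / trimmed_mean


def scale(signals, target=500.0, trim=0.02):
    """Return the signals multiplied by the chip's scaling factor."""
    factor = scaling_factor(signals, target, trim)
    return [s * factor for s in signals]
```

The point of the exercise is that two chips scaled to the same target have comparable overall brightness, which is what makes plotting one against the other meaningful. With scaling left at 1.0, as in the .CHP files from the Genomics Core Facility, the signals are used as they come.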
Phillip SanMiguel