Gene and Exon Expression Using Affymetrix Mouse Exon Arrays (MoEx)

Our lab makes extensive use of Affymetrix expression arrays to measure gene expression. I’ve already processed several different types of mouse and human gene arrays to determine which genes are active or inactive in the experimental sample. The pipeline for this is based on advice from contacts at the Sanger, and is based on the bioconductor Affy package. We use RMA but also MAS5 absent/present calls.

Some of our work requires greater resolution than gene expression – we want a measure of which exons are the most transcribed. This is close to the related problem of determining the specific transcript of a particular gene (which set of exons) is transcribed in the cell type used in the experiment. You can do this with RNA-seq, but it’s quite expensive and the tools aren’t very mature yet. We chose to go with Affymetrix’s mouse exon arrays, or MoEx for short.

However, using R/Bioconductor to derive RMA and MAS5 for MoEx is not as easy as the gene arrays. In R you should be able to do (crudely):


> library("affy")
Loading required package: Biobase

Welcome to Bioconductor

Vignettes contain introductory material. To view, type
'openVignette()'. To cite Bioconductor, see
'citation("Biobase")' and for packages 'citation(pkgname)'.

> rawAffy rmaAffy <- rma(rawAffy)

but for MoEx you get:


Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain MoEx-1_0-st-v1
Library - package moex10stv1cdf not installed
Bioconductor - moex10stv1cdf not available

For gene arrays, you can usually just use bioclite to install a Bioconductor library that includes the CDF environment, and indeed there is a bioclite package for MoEx, but it doesn’t fix the error message.

I’m sure if I was better at R I could resolve this, but instead I downloaded the CDF manually from the XMap page (more about XMap in a minute) and then do this:


library("affy")
adata <- ReadAffy()
adata@cdfName <- "mouseexonpmcdf"
rma <- rma(adata)

Then you can dump this list of RMA values like this:


write.exprs(rma, "RMA_output.csv", sep="\t")

This produces a file that has RMA values for each MoEx probeset for each biorep we used. Each probeset represent a set of oligo probes which, together, cover a region of the genome to provide information about the exon it overlaps. Now we need to translate probeset IDs to exons. In my case, I want Ensembl exon IDs, since Ensembl is the gold standard annotation for us.

With GeneChip arrays, Ensembl has computed mappings from Affy’s probeset IDs to Ensembl genes, and you can get hold of this information quickly using BioMart. Ensembl also has mappings for MoEx, but these are from MoEx probeset IDs to gene IDs, and not exon IDs.

I tried a variety of methods to derive a specific mapping to exon IDs. I used the coordinates Affy provides for its probes (and thus probesets), which can be obtained under technical documentation (registration required?) to look up the overlapping exons from a chosen Ensembl annotation. I also tried using the sequences of the probes (also available on the Affy web page) and BLAST them against the genome to obtain the coordinates of the probesets myself, and then did the overlapping as before. However, all these methods are rather hacky and I’m by no means an expert with this kind of technology so I can’t be tell if what I’ve done is appropriate or not.

Instead, we took advice and decided to use the Manchester Paterson’s exonmap mapping. Exonmap is the new name for XMap, which I mentioned earlier – they seem to be the MoEx experts. Exonmap is actually a full bioconductor package, that allows you to derive RMA values for specific gene lists and a variety of other functionality. However, as always, we need more fine-grained control over how probesets are combined. So I just extracted the probeset to Ensembl exon ID mapping from the exonmap database tables to use for my stuff.

Downstream, I have custom code to work out expression level for individual exons and genes as well as differential expression between treatment groups. We use an approach based on the splice index method.

However, there are other packaged solutions for doing this. One we tried was AltAnalyze, a Python tool with a nice GUI that allows you to run a variety of types of exon analysis on MoEx and other experiment types. It comes automatically packaged with the Affy power tools (although you can also get those yourself) and a selection of Affy annotation files.

As usual, we needed something we could fiddle with more, so we still ended up with custom code. However, we did use the background-corrected calculation for RMA values from AltAnalyze, which is actually an Affy power tools command:


$HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/APT/Linux/64bit/apt-probeset-summarize -p $HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/LibraryFiles/MoEx-1_0-st-v1.r2.pgf -c $HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/LibraryFiles/MoEx-1_0-st-v1.r2.clf -b $HOME/src/AltAnalyze_v1release/AltDatabase/affymetrix/LibraryFiles/MoEx-1_0-st-v1.r2.antigenomic.bgp --kill-list $HOME/src/AltAnalyze_v1release/AltDatabase/EnsMart55/Mm/exon/Mm_probes_to_remove.txt -a rma-sketch -a dabg --cel-files cel_files.txt -o .

Obviously this has a lot of file dependencies from AltAnalyze, but this gives us not only RMA values, but p-value confidences for those values, which we use downstream. This replaces the R/Bioconductor RMA value generation. I expect there are arguments to be had about which tool generates the best expression values.

I like this Affy power tools approach, because it does not rely on R/Bioconductor and can be easily scripted. We managed to find an ArrayExpress entry containing MoEx bioreps for 54 different mouse tissues. Using the Affy power tools, I wrote a script to generate RMA values and apply some other parts of our pipeline to all 54 groups. We could then compare probeset levels in our experiments to all these tissue types to derive the relative level of expression for exons in our samples – all this took only a matter of days.

Advertisements

2 Responses to “Gene and Exon Expression Using Affymetrix Mouse Exon Arrays (MoEx)”

  1. rishi Says:

    When processing MoEx-1_0-st-v1 library with ‘affy’ package I am getting following error

    >Data <- ReadAffy()
    Error:

    The affy package is not designed for this array type.
    Please use either the oligo or xps package.

  2. psaffrey Says:

    Rishi,

    This blog post is really old – I’m sure the Bioconductor packages have changed a lot from what I’ve done here. I’m afraid you’ll have to Google bash to work around this problem.

    Peter

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: