Saturday, October 30, 2010

Microarray Analysis

Microarray analysis is something new I learned in this class. I also went to several seminars about ChIP-chip analysis. Microarray analysis is another topic in bioinformatics besides gene prediction. I will skip the microarray process itself and only talk about the analysis part.

1. The goal of microarray analysis is to compare two gene lists (the expression measurements from two conditions): which genes are upregulated, and which are downregulated? We cannot say gene A is upregulated just because one list shows more of gene A, since the raw measurements carry a lot of bias and noise. We need statistical methods to find the genes whose up- or downregulation is statistically significant.
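To make the "statistical method" idea concrete, here is a small sketch in Python. It is not what BRB-ArrayTools does internally; it is just one simple option, an exact permutation test on the difference of group means for each gene. The gene names and expression values are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    as_extreme = 0
    n_splits = 0
    # Try every way of relabelling the arrays into two groups of the same sizes.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so floating-point rounding does not hide ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical log2 expression values: (control arrays, treated arrays).
expression = {
    "gene_A": ([8.1, 8.3, 7.9, 8.2], [10.0, 10.4, 9.8, 10.1]),
    "gene_B": ([6.0, 6.4, 5.9, 6.2], [6.1, 6.3, 6.0, 6.2]),
}

for gene, (control, treated) in expression.items():
    diff = mean(treated) - mean(control)
    p = permutation_p_value(control, treated)
    direction = "up" if diff > 0 else "down"
    print(f"{gene}: {direction} by {diff:.3f} (log2), permutation p = {p:.4f}")
```

With only four arrays per group, the smallest p-value this test can give is 2/70, so real experiments need more replicates. A real analysis would also correct for testing thousands of genes at once.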

2. Software for microarray analysis. In the project for this class, we used BRB-ArrayTools to do the microarray analysis. BRB-ArrayTools is an add-in for Microsoft Excel. But be careful: from my experience, Office 2003 is the best version to install this add-in on, and Office 2007 may crash. :( In BRB-ArrayTools, we need to import the data, then filter and normalize it. Then we add a tag to each array (depending on how we want to compare them) and run the comparison. It generates a list of significantly different genes, plus heat maps and cluster trees if you select them. It's convenient. However, I think that if we want to understand the whole process and the algorithms behind it, it will take a lot of time. I borrowed a book from the library, "Beginner for microarray", since I want to understand the analysis process. I have only read a small part of it, and it really covers a lot.

3. Principal component analysis and microarray data. This is the hardest part, since it needs a lot of linear algebra and statistics. I checked several references, and the following is my understanding of PCA. In gene expression arrays we often have many genes being co-expressed in response to the same biological phenomenon, so there is a large number of measurements, and the measurements are correlated. Principal component analysis is a method for reducing the dimensionality when one has correlated measurements. It forms linear combinations of a group of variables in such a way that a few of these combinations represent the data well.
The idea of PCA is: let X_1, ..., X_p be a set of real-valued random variables, define a vector a=(a_1, a_2, ..., a_p)^T, and seek a derived variable Z=a_1X_1+a_2X_2+...+a_pX_p such that var(Z) is maximized under the constraint ||a||=1. The derived variable Z then attempts to capture the common variation in the variables X_i. Usually the single variable Z is not enough to represent the original variables X_1, ..., X_p. In that case we find a second derived variable, uncorrelated with the first, with the largest variance, and so on.
More formally, the task is to find, for k=1, 2, ..., variables Z_k=a_1kX_1+a_2kX_2+...+a_pkX_p such that var(Z_k) is maximized under the constraint ||a_k||=1, with each Z_k uncorrelated with Z_1, ..., Z_(k-1).
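A short derivation shows why this maximization turns into an eigenvector problem. The symbols \Sigma (the covariance matrix of X_1, ..., X_p) and \lambda (a Lagrange multiplier) are my own notation here, not something from the references:

```latex
\begin{aligned}
\operatorname{var}(Z) &= \operatorname{var}(a^{T}X) = a^{T}\Sigma a, \\
L(a,\lambda) &= a^{T}\Sigma a - \lambda\,(a^{T}a - 1), \\
\frac{\partial L}{\partial a} &= 2\Sigma a - 2\lambda a = 0
  \quad\Longrightarrow\quad \Sigma a = \lambda a, \\
\operatorname{var}(Z) &= a^{T}\Sigma a = \lambda\, a^{T}a = \lambda .
\end{aligned}
```

So a must be an eigenvector of \Sigma, and var(Z) equals its eigenvalue. The first principal component therefore uses the eigenvector with the largest eigenvalue, and the later components use the remaining eigenvectors in decreasing order of eigenvalue.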

4. Sample PCA: Let us represent an expression array by x_ij, where i indexes one of the p genes and j indexes one of the n samples. The first sample principal component z_1j is defined to be the linear combination z_1j=sum over i of a_i1*x_ij, with ||a_1||=1, that has the largest sample variance. To calculate the principal components, we only need to calculate the eigenvectors of the sample covariance matrix. These eigenvectors a_k are called eigenarrays, and the z_kj are called eigengenes.
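Here is a minimal sketch of the sample PCA calculation in plain Python. The expression values are made up, and the function names are my own. To keep it free of extra libraries, it finds only the first eigenvector, using power iteration; in practice one would probably call an eigen-decomposition or SVD routine from a linear algebra library instead.

```python
import math


def center_rows(x):
    """Subtract each gene's mean across samples (rows are genes)."""
    centered = []
    for row in x:
        m = sum(row) / len(row)
        centered.append([v - m for v in row])
    return centered


def sample_covariance(xc):
    """p x p sample covariance matrix of the genes, from centered data."""
    p, n = len(xc), len(xc[0])
    return [[sum(xc[i][j] * xc[k][j] for j in range(n)) / (n - 1)
             for k in range(p)] for i in range(p)]


def leading_eigenvector(s, iterations=200):
    """Power iteration. Assumes the start vector is not orthogonal to the
    leading eigenvector and that the largest eigenvalue is well separated."""
    p = len(s)
    a = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        b = [sum(s[i][k] * a[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in b))
        a = [v / norm for v in b]
    return a


# Made-up expression values: p = 3 genes (rows), n = 5 samples (columns).
x = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 7.8, 10.0],
    [5.0, 4.0, 6.0, 5.0, 5.0],
]

xc = center_rows(x)
s = sample_covariance(xc)
a1 = leading_eigenvector(s)  # first eigenarray, with ||a_1|| = 1

# z_1j = sum over i of a_i1 * x_ij, using the centered values.
z1 = [sum(a1[i] * xc[i][j] for i in range(len(xc))) for j in range(len(xc[0]))]
var_z1 = sum(v * v for v in z1) / (len(z1) - 1)  # z1 already has mean zero

print("a_1 =", [round(v, 3) for v in a1])
print("z_1 =", [round(v, 3) for v in z1])
print("var(z_1) =", round(var_z1, 3))
```

In this toy data the first two genes move together, so a_1 puts most of its weight on them, and var(z_1) comes out at least as large as the variance of any single gene.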


After reading this, I feel I understand more clearly what PCA is and how it applies to microarray analysis.
