Saturday, September 25, 2010

3D Star Co-ordinate Projection for the Visualization of Microarray Data

Mining of high-dimensional datasets such as microarray data to gain insight into the data has been the topic of research for decades. The first step in understanding the data is to visually examine the data to be able to make a better judgement about the kinds of algorithms that may best suit ones needs down the pipeline. Since humans can see and understand the data in 3 dimensions or less and since high dimensional datasets often have more than 3 dimensions, visualization algorithms provide a mechanism  to view the high dimensional data in 3 dimensions or less. Although visualization algorithms such as principal component analysis (PCA), linear discriminant analysis(LDA) and locally linear embedding (LLE) are often used by the research community, it can be easily seen that when using orthogonal dimensional anchors, one often runs out of space. Co-ordinate based methods such as parallel co-ordinates provide a solution by employing non-orthogonal parallel co-ordinates as dimensional anchors. But, the use of parallel co-ordinates is limited by the dimensionality and for extremely high-dimensional datasets, the plots become messy. To address this issue, Kandogan introduced 2D-star-coordinate projection (2DSCP) method. This method instead of using parallel-coordinates now uses dimensional anchors as spokes radiating into 2 dimensional space. One advantage of using this method is that one can represent the high-dimensional data as a dot in 2-dimensional space instead of a line as in the case of 2DSCP. This algorithm has flavor of both projection based algorithms such as PCA and co-ordinate based methods such as parallel co-ordinates because it provides plots just like PCA but uses non-orthogonal dimensional anchors such as parallel-coordinates. The 2DSCP however does not convey the depth information and for one to be able to obtain that, the data needs to be projected to 3 dimensions. In this work, we have extended the 2DSCP method to 3 dimensions, called 3DSCP. Also, since the dimensional anchors for the coordinate based methods are non-orthogonal, the projection results are not unique as in the case of PCA. We have therefore developed an automation algorithm to find the configuration of best projection. The algorithm has been successfully applied to high-dimensional datasets from various fields including microarray data. Please refer to the following paper for more details. To understand the concept, the reader is encouraged to read the following notes. Finally, the Matlab code is available from this link. The archive at this link consists of two use cases, one showing application of 3DSCP on artificial dataset and the other showing the application of 3DSCP on colon cancer microarray data. The archive also consists of 3 videos showing the use of automated 3DSCP algorithm on 3 different datasets. The readers are encouraged to write to for suggestions and help.

No comments:

Post a Comment