SAAS-CNV: Somatic copy number alteration analysis using sequencing and SNP array data

Zhongyang Zhang and Ke Hao


saasCNV is a package for the analysis of somatic copy number alterations (SCNAs) in tumor samples using whole genome/exome sequencing (WGS/WES) and SNP array data. It extracts from the sequencing (SNP array) platform two signal dimensions related to SCNA: 1) total read depth (intensity), reflecting total copy number change; 2) allele-specific read depth (intensity), reflecting allelic imbalance that results from differential copy number changes of the two alleles. The latter also provides valuable clues for inferring tumor ploidy and purity. saasCNV then analyzes these two signal dimensions jointly in both the segmentation and calling steps. It also provides visualization for diagnosing intermediate data processing and analysis and for illustrating final results. More details can be found in the paper:

Zhongyang Zhang and Ke Hao. (2015) SAAS-CNV: A Joint Segmentation Approach on Aggregated and Allele Specific Signals for the Identification of Somatic Copy Number Alterations with Next-Generation Sequencing Data. PLoS Computational Biology, 11(11): e1004618.
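
The two signal dimensions described above can be illustrated with a minimal Python sketch. It is not part of saasCNV (which is an R package) and does not reproduce its API. The function names, the input layout, the mirrored-BAF definition (BAF folded around 0.5) and the median centering of the log2ratio are all assumptions made for illustration; the paper gives the exact definitions.

```python
import math
from statistics import median


def mirrored_baf(ref_depth, alt_depth):
    """Fold the B-allele frequency around 0.5 so that it lies in [0.5, 1]."""
    baf = alt_depth / (ref_depth + alt_depth)
    return abs(baf - 0.5) + 0.5


def scna_signals(loci):
    """Return one (log2ratio, log2mBAF) pair per heterozygous locus.

    Each locus is a hypothetical dict holding tumor (t_*) and normal (n_*)
    reference/alternative read depths.
    """
    # Total read depth dimension: tumor depth relative to matched normal.
    raw_ratio = [
        math.log2((loc["t_ref"] + loc["t_alt"]) / (loc["n_ref"] + loc["n_alt"]))
        for loc in loci
    ]
    # Assumption: center on the median to absorb library-size differences.
    center = median(raw_ratio)

    signals = []
    for loc, ratio in zip(loci, raw_ratio):
        # Allele-specific dimension: tumor mBAF relative to normal mBAF.
        log2mbaf = math.log2(
            mirrored_baf(loc["t_ref"], loc["t_alt"])
            / mirrored_baf(loc["n_ref"], loc["n_alt"])
        )
        signals.append((ratio - center, log2mbaf))
    return signals


example_loci = [
    {"t_ref": 50, "t_alt": 50, "n_ref": 50, "n_alt": 50},    # no change
    {"t_ref": 90, "t_alt": 30, "n_ref": 60, "n_alt": 60},    # imbalance, same total depth
    {"t_ref": 100, "t_alt": 100, "n_ref": 50, "n_alt": 50},  # balanced gain in depth
]
example_signals = scna_signals(example_loci)
```

In this toy input, the second locus changes only in the log2mBAF dimension and the third only in the log2ratio dimension, which is why the two signals are informative when analyzed jointly.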



This is an example SCNA profile from a hepatocellular carcinoma (HCC) sample assayed with whole-genome deep sequencing and SNP array. (A) and (B) display SNP array data and (C) and (D) display WGS data. In (A) and (C), the top panel plots the log2ratio signal against chromosomal position and the bottom panel plots the log2mBAF signal. The dots, each representing a locus, are colored alternately to distinguish chromosomes. The segments, each representing a DNA segment resulting from the joint segmentation, are colored according to inferred copy number status. In (B) and (D), on the main log2mBAF-log2ratio panel, each circle corresponds to a segment in (A) and (C), with its size reflecting the length of the segment; the color code is specified in the legend; the dashed gray lines indicate the adjusted baselines. The side panels, corresponding to the log2ratio and log2mBAF dimensions respectively, show the distribution of the median values of the segments.




Zhongyang Zhang and Ke Hao
Department of Genetics and Genomic Sciences
Icahn Institute for Genomics and Multiscale Biology
Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA