Title: A Hidden Markov Modeling Approach for Identifying Tumor Subclones in Next-Generation Sequencing Studies
Allele-specific copy number alteration (ASCNA) analysis is used to identify copy number abnormalities in tumor cells. Unlike normal cells, tumor cells are heterogeneous, consisting of a dominant clone and minor subclones with distinct copy number profiles. Estimating the clonal proportion and identifying main-clone and subclone genotypes across the genome are important for understanding tumor progression. Several ASCNA tools have recently been developed, but they are limited to identifying subclone regions and do not infer subclone genotypes. In this paper, we propose subHMM, a hidden Markov model-based approach that estimates subclone regions as well as region-specific subclone genotypes and clonal proportions. We specify a hidden state variable that combines the clonal genotype with the subclone status. Parameters are estimated with a two-step algorithm: in the first step, a standard hidden Markov model with this conglomerated state variable is fit; in the second step, region-specific estimates of the clonal proportions are obtained by maximizing region-specific pseudo-likelihoods. We apply subHMM to renal cell carcinoma datasets from The Cancer Genome Atlas. In addition, simulation studies show that the proposed approach performs well.
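The Python sketch below illustrates the conglomerated state space and the second estimation step under simplifying assumptions that are ours, not the paper's. The genotype list `GENOTYPES` and all function names are hypothetical. So are the Gaussian emission model for B-allele frequency (BAF) and log R ratio (LRR), the noise levels, the definition of `p` as the subclone's share of tumor cells, and the absence of normal-cell contamination. The sketch pairs each main-clone genotype with either no subclone or a different subclone genotype. For a single region whose state is taken as already decoded by the first-step HMM, it estimates the clonal proportion by grid-maximizing a pseudo-likelihood that treats SNPs in the region as independent. The first step (fitting the HMM itself, for example by Baum-Welch) is not shown.

```python
import itertools
import math
import random

# Hypothetical allele-specific genotypes for one clone (homozygous deletion omitted).
GENOTYPES = ["AB", "A", "AAB", "AA"]


def conglomerated_states(genotypes):
    """Hidden states pairing a main-clone genotype with a subclone status.

    (g, None) means no subclone in the region; (gm, gs) means a subclone
    with genotype gs coexists with the main clone gm.
    """
    states = [(g, None) for g in genotypes]
    states += list(itertools.permutations(genotypes, 2))
    return states


def expected_signals(main, sub, p):
    """Expected (BAF, LRR) for a tumor mixing a main clone and a subclone.

    Assumption: p is the fraction of tumor cells carrying the subclone
    genotype, and normal-cell contamination is ignored.
    """
    if sub is None:
        sub, p = main, 0.0
    copies = (1 - p) * len(main) + p * len(sub)
    b_copies = (1 - p) * main.count("B") + p * sub.count("B")
    return b_copies / copies, math.log2(copies / 2)


def log_normal(x, mu, sd):
    """Log density of a normal distribution N(mu, sd^2) at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def region_pseudo_loglik(baf, lrr, main, sub, p, sd_baf=0.05, sd_lrr=0.2):
    """Sum of per-SNP log densities, treating SNPs in the region as independent."""
    mu_baf, mu_lrr = expected_signals(main, sub, p)
    total = 0.0
    for b, r in zip(baf, lrr):
        # Which parental allele is gained or lost is unknown, so BAF is modeled
        # as a 50/50 mixture mirrored about 0.5 (combined via log-sum-exp).
        a1 = log_normal(b, mu_baf, sd_baf)
        a2 = log_normal(b, 1 - mu_baf, sd_baf)
        m = max(a1, a2)
        total += m + math.log(0.5 * math.exp(a1 - m) + 0.5 * math.exp(a2 - m))
        total += log_normal(r, mu_lrr, sd_lrr)
    return total


def estimate_region_proportion(baf, lrr, main, sub, grid_size=99):
    """Second-step sketch: grid-maximize the pseudo-likelihood over p in (0, 1)."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return max(grid, key=lambda p: region_pseudo_loglik(baf, lrr, main, sub, p))


# Demo on simulated data for one region decoded as main clone AB with subclone A.
random.seed(1)
true_p = 0.4
mu_b, mu_r = expected_signals("AB", "A", true_p)
baf = [random.gauss(random.choice([mu_b, 1 - mu_b]), 0.05) for _ in range(500)]
lrr = [random.gauss(mu_r, 0.2) for _ in range(500)]
p_hat = estimate_region_proportion(baf, lrr, "AB", "A")
print(len(conglomerated_states(GENOTYPES)), round(p_hat, 2))
```

A grid search keeps the sketch free of dependencies; because only one parameter is estimated per region, a bounded scalar optimizer would work equally well.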