CytoGPS: A large-scale karyotype analysis of CML data


      • Large-scale karyotype analysis of chronic myeloid leukemia (CML).
      • Novel method for analyzing text-based (ISCN) karyotypes.
      • Demonstrated a proof-of-principle approach to identify disease subgroups based on cytogenetic profiling.
      • Determined 28 clusters from 4969 CML karyotypes based on cytogenetic profile.


      Karyotyping, the practice of visually examining and recording chromosomal abnormalities, is commonly used to diagnose diseases of genetic origin, including cancers. Karyotypes are recorded as text written in the International System for Human Cytogenetic Nomenclature (ISCN). Downstream analysis of karyotypes is conducted manually because of the visual nature of the analysis and the linguistic structure of the ISCN. Because the ISCN has not been computer-readable, the full potential of these genomic data has not been realized. In response, we developed CytoGPS, a platform to analyze large volumes of cytogenetic data using a Loss-Gain-Fusion model that converts human-readable ISCN karyotypes into a machine-readable binary format. As proof of principle, we applied CytoGPS to cytogenetic data from the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer, a National Cancer Institute-hosted database of over 69,000 karyotypes of human cancers. Using the Jaccard coefficient to measure similarity between karyotypes represented as binary vectors, we identified novel patterns in 4,968 Mitelman CML karyotypes, such as the co-occurrence of trisomy 19 and trisomy 21. The CytoGPS platform unlocks the potential for large-scale, comparative analysis of cytogenetic data. This methodological platform is freely available at
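
      The similarity step described in the abstract can be illustrated with a minimal Python sketch. It is not the CytoGPS implementation: the feature names and the three toy vectors below are hypothetical, chosen only to show how the Jaccard coefficient compares karyotypes once they are encoded as binary loss/gain/fusion flags. Treating two all-zero vectors as identical is also an assumption of this sketch.

```python
from typing import Sequence


def jaccard_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard coefficient between two equal-length binary vectors.

    Counts positions set in both vectors and divides by positions set
    in at least one of them.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumption of this sketch: two vectors with no abnormalities at all
    # are treated as identical rather than undefined.
    return 1.0 if either == 0 else both / either


# Hypothetical feature layout: one flag per abnormality of interest.
# A real encoding would have far more positions than this toy example.
FEATURES = ["gain_chr8", "gain_chr19", "gain_chr21", "loss_chr7", "fusion_9_22"]

karyotype_a = [0, 1, 1, 0, 1]  # toy example: gains of 19 and 21 plus a 9;22 fusion
karyotype_b = [1, 1, 1, 0, 1]  # same as A with an extra gain of chromosome 8
karyotype_c = [0, 0, 0, 1, 0]  # toy example: loss of chromosome 7 only

sim_ab = jaccard_similarity(karyotype_a, karyotype_b)
sim_ac = jaccard_similarity(karyotype_a, karyotype_c)
print(f"A vs B: {sim_ab:.2f}")
print(f"A vs C: {sim_ac:.2f}")
```

      Karyotypes A and B share three of the four abnormalities present in either, so they score 0.75, while A and C share none and score 0. A pairwise matrix of such scores, or of the corresponding distances (one minus the similarity), is the kind of input a clustering method would take.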



