CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
Authors:
Kevin Lam,
William Daniels,
J Maxwell Douglas,
Daniel Lai,
Samuel Aparicio,
Benjamin Bloem-Reddy,
Yongjin Park
Abstract:
Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
Submitted 28 June, 2025;
originally announced June 2025.
A Gaussian process based approach for validation of multi-variable measurement systems: application to SAR measurement systems
Authors:
C. Bujard,
E. Neufeld,
M. Douglas,
J. Wiart,
N. Kuster
Abstract:
Resource-efficient and robust validation of systems designed to measure a multi-dimensional parameter space remains an unsolved problem, because comprehensive validation coverage would require millions of test permutations. In this paper, an efficient and comprehensive validation approach based on a Gaussian Process (GP) model of the test system has been developed that operates system-agnostically, avoids calibration to a fixed set of known validation benchmarks, and supports large configuration spaces. The approach consists of three steps that can be performed independently by different parties: 1) GP model creation, 2) model confirmation, and 3) targeted search for critical cases. It has been applied to two systems that measure specific absorption rate (SAR) for compliance testing of wireless devices using different SAR measurement methods: a probe-scanning system (per IEC/IEEE 62209-1528) and a static sensor-array system (per IEC 62209-3). The results demonstrate that the approach is practical, suitable for proving effective equivalence, and applicable to any measurement method and implementation. The presented method is sufficiently general to be of value not only for SAR system validation, but also in a wide variety of applications that require critical, independent, and efficient validation.
Submitted 23 April, 2024; v1 submitted 23 November, 2022;
originally announced November 2022.