Search | arXiv e-print repository

CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation

Authors: Kevin Lam, William Daniels, J Maxwell Douglas, Daniel Lai, Samuel Aparicio, Benjamin Bloem-Reddy, Yongjin Park

Abstract: Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-… ▽ More Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis. △ Less

Submitted 28 June, 2025; originally announced June 2025.

Comments: 8 pages, 4 figures

arXiv:2411.10462 [pdf]

Gamification and AI: Enhancing User Engagement through Intelligent Systems

Authors: Carlos J. Costa, Joao Tiago Aparicio, Manuela Aparicio, Sofia Aparicio

Abstract: Gamification applies game mechanics to non-game environments to motivate and engage users. Artificial Intelligence (AI) offers powerful tools for personalizing and optimizing gamification, adapting to users' needs, preferences, and performance levels. By integrating AI with gamification, systems can dynamically adjust game mechanics, deliver personalized feedback, and predict user behavior, signif… ▽ More Gamification applies game mechanics to non-game environments to motivate and engage users. Artificial Intelligence (AI) offers powerful tools for personalizing and optimizing gamification, adapting to users' needs, preferences, and performance levels. By integrating AI with gamification, systems can dynamically adjust game mechanics, deliver personalized feedback, and predict user behavior, significantly enhancing the effectiveness of gamification efforts. This paper examines the intersection of gamification and AI, exploring AI's methods to optimize gamified experiences and proposing mathematical models for adaptive and predictive gamification. △ Less

Submitted 2 November, 2024; originally announced November 2024.

arXiv:2408.17268 [pdf]

Predicting the Impact of Generative AI Using an Agent-Based Model

Authors: Joao Tiago Aparicio, Manuela Aparicio, Sofia Aparicio, Carlos J. Costa

Abstract: Generative artificial intelligence (AI) systems have transformed various industries by autonomously generating content that mimics human creativity. However, concerns about their social and economic consequences arise with widespread adoption. This paper employs agent-based modeling (ABM) to explore these implications, predicting the impact of generative AI on societal frameworks. The ABM integrat… ▽ More Generative artificial intelligence (AI) systems have transformed various industries by autonomously generating content that mimics human creativity. However, concerns about their social and economic consequences arise with widespread adoption. This paper employs agent-based modeling (ABM) to explore these implications, predicting the impact of generative AI on societal frameworks. The ABM integrates individual, business, and governmental agents to simulate dynamics such as education, skills acquisition, AI adoption, and regulatory responses. This study enhances understanding of AI's complex interactions and provides insights for policymaking. The literature review underscores ABM's effectiveness in forecasting AI impacts, revealing AI adoption, employment, and regulation trends with potential policy implications. Future research will refine the model, assess long-term implications and ethical considerations, and deepen understanding of generative AI's societal effects. △ Less

Submitted 30 August, 2024; originally announced August 2024.

Comments: 6 pages, 2 figures

arXiv:2310.03866 [pdf, other]

On Repairing Natural Language to SQL Queries

Authors: Aidan Z. H. Yang, Ricardo Brancas, Pedro Esteves, Sofia Aparicio, Joao Pedro Nadkarni, Miguel Terra-Neves, Vasco Manquinho, Ruben Martins

Abstract: Data analysts use SQL queries to access and manipulate data on their databases. However, these queries are often challenging to write, and small mistakes can lead to unexpected data output. Recent work has explored several ways to automatically synthesize queries based on a user-provided specification. One promising technique called text-to-SQL consists of the user providing a natural language des… ▽ More Data analysts use SQL queries to access and manipulate data on their databases. However, these queries are often challenging to write, and small mistakes can lead to unexpected data output. Recent work has explored several ways to automatically synthesize queries based on a user-provided specification. One promising technique called text-to-SQL consists of the user providing a natural language description of the intended behavior and the database's schema. Even though text-to-SQL tools are becoming more accurate, there are still many instances where they fail to produce the correct query. In this paper, we analyze when text-to-SQL tools fail to return the correct query and show that it is often the case that the returned query is close to a correct query. We propose to repair these failing queries using a mutation-based approach that is agnostic to the text-to-SQL tool being used. We evaluate our approach on two recent text-to-SQL tools, RAT-SQL and SmBoP, and show that our approach can repair a significant number of failing queries. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2308.15239 [pdf, other]

Natural language to SQL in low-code platforms

Authors: Sofia Aparicio, Samuel Arcadinho, João Nadkarni, David Aparício, João Lages, Mariana Lourenço, Bartłomiej Matejczyk, Filipe Assunção

Abstract: One of the developers' biggest challenges in low-code platforms is retrieving data from a database using SQL queries. Here, we propose a pipeline allowing developers to write natural language (NL) to retrieve data. In this study, we collect, label, and validate data covering the SQL queries most often performed by OutSystems users. We use that data to train a NL model that generates SQL. Alongside… ▽ More One of the developers' biggest challenges in low-code platforms is retrieving data from a database using SQL queries. Here, we propose a pipeline allowing developers to write natural language (NL) to retrieve data. In this study, we collect, label, and validate data covering the SQL queries most often performed by OutSystems users. We use that data to train a NL model that generates SQL. Alongside this, we describe the entire pipeline, which comprises a feedback loop that allows us to quickly collect production data and use it to retrain our SQL generation model. Using crowd-sourcing, we collect 26k NL and SQL pairs and obtain an additional 1k pairs from production data. Finally, we develop a UI that allows developers to input a NL query in a prompt and receive a user-friendly representation of the resulting SQL query. We use A/B testing to compare four different models in production and observe a 240% improvement in terms of adoption of the feature, 220% in terms of engagement rate, and a 90% decrease in failure rate when compared against the first model that we put into production, showcasing the effectiveness of our pipeline in continuously improving our feature. △ Less

Submitted 29 August, 2023; originally announced August 2023.

Showing 1–5 of 5 results for author: Aparicio, S