-
CN-SBM: Categorical Block Modelling For Primary and Residual Copy Number Variation
Authors:
Kevin Lam,
William Daniels,
J Maxwell Douglas,
Daniel Lai,
Samuel Aparicio,
Benjamin Bloem-Reddy,
Yongjin Park
Abstract:
Cancer is a genetic disorder whose clonal evolution can be monitored by tracking noisy genome-wide copy number variants. We introduce the Copy Number Stochastic Block Model (CN-SBM), a probabilistic framework that jointly clusters samples and genomic regions based on discrete copy number states using a bipartite categorical block model. Unlike models relying on Gaussian or Poisson assumptions, CN-SBM respects the discrete nature of CNV calls and captures subpopulation-specific patterns through block-wise structure. Using a two-stage approach, CN-SBM decomposes CNV data into primary and residual components, enabling detection of both large-scale chromosomal alterations and finer aberrations. We derive a scalable variational inference algorithm for application to large cohorts and high-resolution data. Benchmarks on simulated and real datasets show improved model fit over existing methods. Applied to TCGA low-grade glioma data, CN-SBM reveals clinically relevant subtypes and structured residual variation, aiding patient stratification in survival analysis. These results establish CN-SBM as an interpretable, scalable framework for CNV analysis with direct relevance for tumor heterogeneity and prognosis.
Submitted 28 June, 2025;
originally announced June 2025.
-
Gamification and AI: Enhancing User Engagement through Intelligent Systems
Authors:
Carlos J. Costa,
Joao Tiago Aparicio,
Manuela Aparicio,
Sofia Aparicio
Abstract:
Gamification applies game mechanics to non-game environments to motivate and engage users. Artificial Intelligence (AI) offers powerful tools for personalizing and optimizing gamification, adapting to users' needs, preferences, and performance levels. By integrating AI with gamification, systems can dynamically adjust game mechanics, deliver personalized feedback, and predict user behavior, significantly enhancing the effectiveness of gamification efforts. This paper examines the intersection of gamification and AI, exploring AI methods for optimizing gamified experiences and proposing mathematical models for adaptive and predictive gamification.
Submitted 2 November, 2024;
originally announced November 2024.
-
Predicting the Impact of Generative AI Using an Agent-Based Model
Authors:
Joao Tiago Aparicio,
Manuela Aparicio,
Sofia Aparicio,
Carlos J. Costa
Abstract:
Generative artificial intelligence (AI) systems have transformed various industries by autonomously generating content that mimics human creativity. However, their widespread adoption raises concerns about social and economic consequences. This paper employs agent-based modeling (ABM) to explore these implications, predicting the impact of generative AI on societal frameworks. The ABM integrates individual, business, and governmental agents to simulate dynamics such as education, skills acquisition, AI adoption, and regulatory responses. This study enhances understanding of AI's complex interactions and provides insights for policymaking. The literature review underscores ABM's effectiveness in forecasting AI impacts, revealing trends in AI adoption, employment, and regulation with potential policy implications. Future research will refine the model, assess long-term implications and ethical considerations, and deepen understanding of generative AI's societal effects.
Submitted 30 August, 2024;
originally announced August 2024.
-
On Repairing Natural Language to SQL Queries
Authors:
Aidan Z. H. Yang,
Ricardo Brancas,
Pedro Esteves,
Sofia Aparicio,
Joao Pedro Nadkarni,
Miguel Terra-Neves,
Vasco Manquinho,
Ruben Martins
Abstract:
Data analysts use SQL queries to access and manipulate data on their databases. However, these queries are often challenging to write, and small mistakes can lead to unexpected data output. Recent work has explored several ways to automatically synthesize queries based on a user-provided specification. One promising technique called text-to-SQL consists of the user providing a natural language description of the intended behavior and the database's schema. Even though text-to-SQL tools are becoming more accurate, there are still many instances where they fail to produce the correct query.
In this paper, we analyze when text-to-SQL tools fail to return the correct query and show that it is often the case that the returned query is close to a correct query. We propose to repair these failing queries using a mutation-based approach that is agnostic to the text-to-SQL tool being used. We evaluate our approach on two recent text-to-SQL tools, RAT-SQL and SmBoP, and show that our approach can repair a significant number of failing queries.
Submitted 5 October, 2023;
originally announced October 2023.
-
Natural language to SQL in low-code platforms
Authors:
Sofia Aparicio,
Samuel Arcadinho,
João Nadkarni,
David Aparício,
João Lages,
Mariana Lourenço,
Bartłomiej Matejczyk,
Filipe Assunção
Abstract:
One of developers' biggest challenges in low-code platforms is retrieving data from a database using SQL queries. Here, we propose a pipeline allowing developers to write natural language (NL) to retrieve data. In this study, we collect, label, and validate data covering the SQL queries most often performed by OutSystems users. We use that data to train an NL model that generates SQL. Alongside this, we describe the entire pipeline, which comprises a feedback loop that allows us to quickly collect production data and use it to retrain our SQL generation model. Using crowd-sourcing, we collect 26k NL and SQL pairs and obtain an additional 1k pairs from production data. Finally, we develop a UI that allows developers to input an NL query in a prompt and receive a user-friendly representation of the resulting SQL query. We use A/B testing to compare four different models in production and observe a 240% improvement in terms of adoption of the feature, 220% in terms of engagement rate, and a 90% decrease in failure rate when compared against the first model that we put into production, showcasing the effectiveness of our pipeline in continuously improving our feature.
Submitted 29 August, 2023;
originally announced August 2023.