-
A Novel Strategy for Detecting Multiple Mediators in High-Dimensional Mediation Models
Authors:
Pei-Shan Yen,
Soumya Sahu,
Debarghya Nandi,
Zhaoliang Zhou,
Olusola Ajilore,
Dulal Bhaumik
Abstract:
This article presents a novel methodology for detecting multiple biomarkers in high-dimensional mediation models by utilizing a modified Least Absolute Shrinkage and Selection Operator (LASSO) alongside Pathway LASSO. This approach effectively addresses the problem of overestimating direct effects, which can result in the inaccurate identification of mediators with nonzero indirect effects. To mitigate this overestimation and improve the true positive rate for detecting mediators, two constraints on the $L_1$-norm penalty are introduced. The proposed methodology's effectiveness is demonstrated through extensive simulations across various scenarios, highlighting its robustness and reliability under different conditions. Furthermore, a procedure for selecting an optimal threshold for dimension reduction using sure independence screening is introduced, enhancing the accuracy of true biomarker detection and yielding a final model that is both robust and well-suited for real-world applications. To illustrate the practical utility of this methodology, the results are applied to a study dataset involving patients with internalizing psychopathology, showcasing its applicability in clinical settings. Overall, this methodology signifies a substantial advancement in biomarker detection within high-dimensional mediation models, offering promising implications for both research and clinical practices.
Submitted 15 April, 2025;
originally announced April 2025.
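The sure independence screening (SIS) step mentioned in the mediation-model abstract above reduces the number of candidate mediators before the penalized fit. What follows is a minimal sketch of generic SIS. It assumes the common recipe of ranking candidates by absolute marginal Pearson correlation with a single response (here taken to be the outcome) and keeping the top `d`. The paper's own threshold-selection procedure and its modified LASSO are not reproduced, and `pearson`, `sis_screen` and `d` are illustrative names.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sis_screen(columns, y, d):
    """Keep the d candidate mediators with the largest |corr(column, y)|.

    columns: list of candidate columns, each a list of n observations
    y:       response vector of length n (assumed here to be the outcome)
    d:       number of columns to retain (illustrative threshold)
    Returns the retained column indices in ascending order.
    """
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Toy usage: columns 1 and 3 track y closely, columns 0 and 2 much less so.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
cols = [
    [5.0, 3.0, 4.0, 1.0, 2.0],
    [1.1, 2.0, 2.9, 4.2, 5.0],
    [2.0, 1.0, 2.0, 1.0, 2.0],
    [-1.0, -2.0, -3.0, -4.0, -5.5],
]
print(sis_screen(cols, y, 2))  # [1, 3]
```

Screening on absolute correlation keeps column 3 even though it is negatively associated with `y`, which is the intended behavior for a marginal screen.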
-
Fiducial Inference for Random-Effects Calibration Models: Advancing Reliable Quantification in Environmental Analytical Chemistry
Authors:
Soumya Sahu,
Thomas Mathew,
Robert Gibbons,
Dulal K. Bhaumik
Abstract:
This article addresses calibration challenges in analytical chemistry by employing a random-effects calibration curve model and its generalizations to capture variability in analyte concentrations. The model is motivated by specific issues in analytical chemistry, where measurement errors remain constant at low concentrations but increase proportionally as concentrations rise. To account for this, the model permits the parameters of the calibration curve, which relate instrument responses to true concentrations, to vary across different laboratories, thereby reflecting real-world variability in measurement processes. Traditional large-sample interval estimation methods are inadequate for small samples, which motivates an alternative, namely the fiducial approach. A calibration curve that accurately captures the heteroscedastic nature of the data results in more reliable estimates across diverse laboratory conditions. It turns out that the fiducial approach, when used to construct a confidence interval for an unknown concentration, produces slightly wider intervals while achieving the desired coverage probability. Applications considered include determining the presence of an analyte and the interval estimation of an unknown true analyte concentration. The proposed method is demonstrated on both simulated and real interlaboratory data, including examples involving copper and cadmium in distilled water.
Submitted 25 March, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
-
Fiducial Confidence Intervals for Agreement Measures Among Raters Under a Generalized Linear Mixed Effects Model
Authors:
Soumya Sahu,
Thomas Mathew,
Dulal K. Bhaumik
Abstract:
A generalization of the classical concordance correlation coefficient (CCC) is considered under a three-level design in which multiple raters rate every subject over time, and each rater rates every subject multiple times at each measurement time point. The ratings can be discrete or continuous. A methodology is developed for the interval estimation of the CCC based on a suitable linearization of the model along with an adaptation of the fiducial inference approach. The resulting confidence intervals have satisfactory coverage probabilities and shorter expected widths than the interval based on the Fisher Z-transformation, even under moderate sample sizes. Two real applications from the literature are discussed. The first is based on a clinical trial to determine whether various treatments are more effective than a placebo for treating knee pain associated with osteoarthritis. There, the CCC was used to assess agreement among the manual measurements of joint space widths on plain radiographs by two raters and the computer-generated measurements on digitized radiographs. The second example concerns corticospinal tractography, where the CCC was again applied to evaluate the agreement between a well-trained technologist and a neuroradiologist on measurements of the fiber number in both the right and left corticospinal tracts. Other relevant applications of our general approach are highlighted in many areas, including artificial intelligence.
Submitted 13 April, 2025; v1 submitted 6 March, 2025;
originally announced March 2025.
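The agreement-measures abstract above generalizes the classical concordance correlation coefficient. For reference, here is a minimal sketch of the classical two-rater sample CCC (Lin's estimator, with moments computed using divisor n). The three-level generalization, the linearization and the fiducial interval from the paper are not reproduced, and `lin_ccc` is an illustrative name.

```python
def lin_ccc(x, y):
    """Sample concordance correlation coefficient for two raters' paired scores.

    rho_c = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    with all moments computed using divisor n.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


print(lin_ccc([1, 2, 3], [1, 2, 3]))  # 1.0: perfect agreement
print(lin_ccc([1, 2, 3], [2, 3, 4]))  # ~0.5714 (= 4/7): perfectly correlated but shifted
```

Unlike Pearson correlation, which is 1 for both pairs above, the CCC penalizes a constant offset between raters through the squared mean difference in the denominator.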
-
Metrics for popularity bias in dynamic recommender systems
Authors:
Valentijn Braun,
Debarati Bhaumik,
Diptish Dey
Abstract:
Despite the widespread application of recommender systems (RecSys) in our daily lives, rather limited research has been done on quantifying unfairness and biases present in such systems. Prior work largely focuses on determining whether a RecSys is discriminating or not but does not compute the amount of bias present in these systems. Biased recommendations may lead to decisions that can potentially have adverse effects on individuals, sensitive user groups, and society. Hence, it is important to quantify these biases for fair and safe commercial applications of these systems. This paper focuses on quantifying popularity bias that stems directly from the output of RecSys models, leading to over-recommendation of popular items that are likely to be misaligned with user preferences. Four metrics are proposed to quantify popularity bias in RecSys over time, in a dynamic setting, across different sensitive user groups. These metrics are demonstrated for four collaborative-filtering-based RecSys algorithms trained on two benchmark datasets commonly used in the literature. The results show that, when used conjointly, the proposed metrics provide a comprehensive understanding of growing disparities in treatment between sensitive groups over time.
Submitted 12 October, 2023;
originally announced October 2023.
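The popularity-bias abstract above does not spell out its four metrics, so the following is not one of them. It is a minimal sketch, under stated assumptions, of the general idea of comparing how popular the recommended items are against how popular the items each sensitive group actually consumes. `item_popularity`, `popularity_lift` and the toy data are hypothetical.

```python
from statistics import mean


def item_popularity(profiles):
    """Share of users who interacted with each item.

    profiles: dict user -> set of item ids the user interacted with.
    """
    counts = {}
    for items in profiles.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    n_users = len(profiles)
    return {item: c / n_users for item, c in counts.items()}


def popularity_lift(recs, profiles, groups):
    """Per-group ratio of the mean popularity of recommended items to the mean
    popularity of the items in the users' own profiles. A value above 1 means
    the group is pushed towards more popular items than it consumes.

    recs:   dict user -> list of recommended item ids
    groups: dict user -> sensitive-group label
    """
    pop = item_popularity(profiles)
    lift = {}
    for g in sorted(set(groups.values())):
        users = [u for u, gu in groups.items() if gu == g]
        rec_pop = mean(mean(pop.get(i, 0.0) for i in recs[u]) for u in users)
        prof_pop = mean(mean(pop[i] for i in profiles[u]) for u in users)
        lift[g] = rec_pop / prof_pop
    return lift


# Hypothetical toy data: three users in two groups.
profiles = {"u1": {"a", "b"}, "u2": {"a"}, "u3": {"a", "c"}}
recs = {"u1": ["a"], "u2": ["a"], "u3": ["c"]}
groups = {"u1": "g1", "u2": "g1", "u3": "g2"}
print(popularity_lift(recs, profiles, groups))  # roughly {'g1': 1.2, 'g2': 0.5}
```

Evaluating such a quantity on each retraining snapshot would give one series per group, which is one way to look at disparities over time in a dynamic setting.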
-
APPRAISE: a governance framework for innovation with AI systems
Authors:
Diptish Dey,
Debarati Bhaumik
Abstract:
As artificial intelligence (AI) systems increasingly impact society, the EU Artificial Intelligence Act (AIA) is the first serious legislative attempt to contain the harmful effects of AI systems. This paper proposes a governance framework for AI innovation. The framework bridges the gap between strategic variables and responsible value creation, recommending audit as an enforcement mechanism. Strategic variables include, among others, organization size and the exploration-versus-exploitation and build-versus-buy dilemmas. The proposed framework is based on primary and secondary research; the latter describes four pressures that organizations innovating with AI experience. Primary research includes an experimental setup in which 34 organizations in the Netherlands are surveyed, followed by 2 validation interviews. The survey measures the extent to which organizations coordinate technical elements of AI systems to ultimately comply with the AIA. The validation interviews generated additional in-depth insights and identified root causes. The moderating effect of the strategic variables is tested and found to be statistically significant for variables such as organization size. Relevant insights from primary and secondary research are finally combined to propose the APPRAISE framework.
Submitted 11 December, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Lode Enhancer: Level Co-creation Through Scaling
Authors:
Debosmita Bhaumik,
Julian Togelius,
Georgios N. Yannakakis,
Ahmed Khalifa
Abstract:
We explore AI-powered upscaling as a design assistance tool in the context of creating 2D game levels. Deep neural networks are used to upscale artificially downscaled patches of levels from the puzzle platformer game Lode Runner. The trained networks are incorporated into a web-based editor, where the user can create and edit levels at three different levels of resolution: 4x4, 8x8, and 16x16. An edit at any resolution instantly transfers to the other resolutions. As upscaling requires inventing features that might not be present at lower resolutions, we train neural networks to reproduce these features. We introduce a neural network architecture that is capable of not only learning upscaling but also giving higher priority to less frequent tiles. To investigate the potential of this tool and guide further development, we conduct a qualitative study with 3 designers to understand how they use it. Designers enjoyed co-designing with the tool, liked its underlying concept, and provided feedback for further improvement.
Submitted 3 August, 2023;
originally announced August 2023.
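The Lode Enhancer abstract mentions an architecture that gives higher priority to less frequent tiles but does not describe the mechanism. One common way to achieve that kind of prioritization is to weight the per-tile training loss by inverse tile frequency. It is swapped in here purely as an illustration and is not claimed to be the paper's method. The tile alphabet, `inverse_frequency_weights` and `weighted_tile_loss` are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(levels):
    """Per-tile loss weights proportional to 1 / frequency.

    levels: list of levels, each a list of row strings (one character per tile).
    Weight = total_tiles / (n_tile_types * count), so rare tiles get weights
    above 1 and the count-weighted sum of weights equals total_tiles.
    """
    counts = Counter(tile for level in levels for row in level for tile in row)
    total = sum(counts.values())
    k = len(counts)
    return {tile: total / (k * c) for tile, c in counts.items()}


def weighted_tile_loss(probs, target, weights):
    """Cross-entropy for one predicted tile, scaled by the true tile's weight.

    probs: dict tile -> probability predicted by an upscaling network.
    """
    return -weights[target] * math.log(probs[target])


# Hypothetical tile alphabet: '.' empty, '#' brick, 'G' gold.
levels = [["..G.", "####"]]
w = inverse_frequency_weights(levels)
print(w)  # 'G' (1 occurrence) gets the largest weight, '#' (4 occurrences) the smallest
```

With these weights, a wrong prediction on a rare gold tile costs more than the same mistake on a common brick tile, which nudges training towards reproducing rare features.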
-
Lode Encoder: AI-constrained co-creativity
Authors:
Debosmita Bhaumik,
Ahmed Khalifa,
Julian Togelius
Abstract:
We present Lode Encoder, a gamified mixed-initiative level creation system for the classic platform-puzzle game Lode Runner. The system is built around several autoencoders which are trained on sets of Lode Runner levels. When fed with the user's design, each autoencoder produces a version of that design which is closer in style to the levels that it was trained on. The Lode Encoder interface allows the user to build and edit levels through 'painting' from the suggestions provided by the autoencoders. Crucially, in order to encourage designers to explore new possibilities, the system does not include more traditional editing tools. We report on the system design and training procedure, as well as on the evolution of the system itself and user tests.
Submitted 2 August, 2023;
originally announced August 2023.
-
Complying with the EU AI Act
Authors:
Jacintha Walters,
Diptish Dey,
Debarati Bhaumik,
Sophie Horsman
Abstract:
The EU AI Act is the proposed EU legislation concerning AI systems. This paper identifies several categories within the AI Act. Based on this categorization, a questionnaire is developed that serves as a tool to offer insights by generating quantitative data. Analysis of the data shows various challenges for organizations in different compliance categories. The influence of organization characteristics, such as size and sector, is examined to determine their impact on compliance. The paper also shares qualitative data on which questions were prevalent among respondents, both on the content of the AI Act and on its application. The paper concludes that there is still room for improvement in terms of compliance with the AIA and refers to a related project that examines a solution to help these organizations.
Submitted 19 July, 2023;
originally announced July 2023.
-
An Audit Framework for Technical Assessment of Binary Classifiers
Authors:
Debarati Bhaumik,
Diptish Dey
Abstract:
Multilevel models using logistic regression (MLogRM) and random forest models (RFM) are increasingly deployed in industry for the purpose of binary classification. The European Commission's proposed Artificial Intelligence Act (AIA) requires, under certain conditions, that the application of such models be fair, transparent, and ethical, which in turn implies technical assessment of these models. This paper proposes and demonstrates an audit framework for technical assessment of RFMs and MLogRMs by focussing on model-, discrimination-, and transparency & explainability-related aspects. To measure these aspects, 20 KPIs are proposed, which are paired with a traffic light risk assessment method. An open-source dataset is used to train an RFM and an MLogRM, and these KPIs are computed and compared with the traffic lights. The performance of popular explainability methods such as kernel- and tree-SHAP is assessed. The framework is expected to assist regulatory bodies in performing conformity assessments of binary classifiers and also to benefit providers and users deploying such AI systems to comply with the AIA.
Submitted 17 November, 2022;
originally announced November 2022.
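The audit-framework abstract above pairs each KPI with a traffic light risk assessment. Here is a minimal sketch of such a mapping. It assumes two thresholds per KPI, one for green and one for red, with amber in between. The paper's 20 KPIs and their actual thresholds are not given in the abstract, so the KPI examples and all threshold values below are hypothetical.

```python
def traffic_light(value, green, red, higher_is_better=True):
    """Map a KPI value to 'green', 'amber' or 'red'.

    green: threshold at or beyond which the KPI is considered acceptable
    red:   threshold at or beyond which the KPI signals high risk
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if not higher_is_better:
        # Flip signs so that "larger is better" holds for the comparisons below.
        value, green, red = -value, -green, -red
    if green <= red:
        raise ValueError("the green threshold must be better than the red threshold")
    if value >= green:
        return "green"
    if value <= red:
        return "red"
    return "amber"


# Hypothetical KPIs: AUC (higher is better) and a group disparity gap (lower is better).
print(traffic_light(0.85, green=0.80, red=0.60))                          # green
print(traffic_light(0.10, green=0.05, red=0.20, higher_is_better=False))  # amber
```

Keeping the direction of each KPI as an explicit flag lets one function serve both "higher is better" metrics such as discrimination performance and "lower is better" metrics such as fairness gaps.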
-
A Framework for Auditing Multilevel Models using Explainability Methods
Authors:
Debarati Bhaumik,
Diptish Dey,
Subhradeep Kayal
Abstract:
Applications of multilevel models usually result in binary classification within groups or hierarchies based on a set of input features. For transparent and ethical applications of such models, sound audit frameworks need to be developed. In this paper, an audit framework for technical assessment of regression MLMs is proposed. The focus is on three aspects: model, discrimination, and transparency and explainability. These aspects are subsequently divided into sub-aspects. Contributors, such as inter MLM group fairness, feature contribution order, and aggregated feature contribution, are identified for each of these sub-aspects. To measure the performance of the contributors, the framework proposes a shortlist of KPIs. A traffic light risk assessment method is furthermore coupled to these KPIs. For assessing transparency and explainability, different explainability methods (SHAP and LIME) are used, which are compared with a model-intrinsic method using quantitative methods and machine learning modelling. Using an open-source dataset, a model is trained and tested and the KPIs are computed. It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models. They fail to predict the order of feature importance, the magnitudes, and occasionally even the nature of the feature contributions. For other contributors, such as group fairness and their associated KPIs, similar analyses and calculations have been performed with the aim of adding depth to the proposed audit framework. The framework is expected to assist regulatory bodies in performing conformity assessments of AI systems using multilevel binomial classification models at businesses. It will also help businesses deploying MLMs to be future-proof and aligned with the European Commission's proposed Regulation on Artificial Intelligence.
Submitted 15 July, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
Inter-relational Model for understanding Chatbot acceptance across retail sectors
Authors:
Diptish Dey,
Debarati Bhaumik
Abstract:
Despite the rising interest in chatbots, deployment has been slow in the retail sector. In the absence of comparative cross-sector research on the user acceptance of chatbots in retail, we present a model and a research framework that proposes customer and chatbot antecedents, using trust and customer satisfaction as relationship mediators and word of mouth and expectation of continuity as relationship outcomes. In determining our framework, we assimilate constructs from different models and theories spanning user experience with chatbots, technology acceptance, and relationship marketing, and propose a selection of 11 constructs as antecedents. Furthermore, we propose retail sector as one of our 4 moderators. Finally, we provide insight into our current activities, which are expected to identify which factors affect relationship outcomes, and to what extent, across different retail sectors.
Submitted 4 July, 2022;
originally announced July 2022.
-
Predicting Personas Using Mechanic Frequencies and Game State Traces
Authors:
Michael Cerny Green,
Ahmed Khalifa,
M Charity,
Debosmita Bhaumik,
Julian Togelius
Abstract:
We investigate how to efficiently predict play personas based on playtraces. Play personas can be computed by calculating the action agreement ratio between a player and a generative model of playing behavior, a so-called procedural persona. But this is computationally expensive and assumes that appropriate procedural personas are readily available. We present two methods for estimating player persona, one using regular supervised learning and aggregate measures of game mechanics initiated, and another based on sequence learning on a trace of closely cropped gameplay observations. While both of these methods achieve high accuracy when predicting play personas defined by agreement with procedural personas, they utterly fail to predict play style as defined by the players themselves using a questionnaire. This interesting result highlights the value of using computational methods in defining play personas.
Submitted 15 June, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
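The persona-prediction abstract above defines play personas through the action agreement ratio between a player and a procedural persona. Here is a minimal sketch, assuming a persona can be queried as a function from game state to action. It also shows where the cost mentioned in the abstract comes from, since every persona must be evaluated on every state of every trace. The toy personas and function names are hypothetical.

```python
def action_agreement_ratio(trace, persona):
    """Fraction of steps where the persona picks the same action as the player.

    trace:   list of (state, action) pairs from a player's playtrace
    persona: callable mapping a state to the action the persona would take
    """
    if not trace:
        raise ValueError("trace must contain at least one step")
    matches = sum(1 for state, action in trace if persona(state) == action)
    return matches / len(trace)


def closest_persona(trace, personas):
    """Name of the persona with the highest agreement ratio on this trace."""
    return max(personas, key=lambda name: action_agreement_ratio(trace, personas[name]))


# Hypothetical toy personas that ignore the state entirely.
trace = [(0, "right"), (1, "right"), (2, "jump"), (3, "right")]
personas = {"runner": lambda s: "right", "jumper": lambda s: "jump"}
print(action_agreement_ratio(trace, personas["runner"]))  # 0.75
print(closest_persona(trace, personas))                   # runner
```

The two estimation methods in the abstract can be read as attempts to predict the output of `closest_persona` without calling every persona on every state.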
-
Computing first passage times for Markov-modulated fluid models using numerical PDE problem solvers
Authors:
Debarati Bhaumik,
Marko A. A. Boon,
Daan Crommelin,
Barry Koren,
Bert Zwart
Abstract:
A popular method to compute first-passage probabilities in continuous-time Markov chains is to numerically invert their Laplace transforms. Over the past decades, the scientific computing community has developed excellent numerical methods for solving problems governed by partial differential equations (PDEs), making the availability of a Laplace transform unnecessary for computational purposes. In this study we demonstrate that numerical PDE problem solvers are suitable for computing first passage times and can be very efficient for this purpose. Through extensive computational experiments, we show that modern PDE problem solvers can outperform numerical Laplace transform inversion, even if a transform is available. When the Laplace transform is explicit (e.g., does not require the computation of an eigensystem), numerical transform inversion remains the primary method of choice.
Submitted 30 March, 2020;
originally announced March 2020.
-
Tree Search vs Optimization Approaches for Map Generation
Authors:
Debosmita Bhaumik,
Ahmed Khalifa,
Michael Cerny Green,
Julian Togelius
Abstract:
Search-based procedural content generation uses stochastic global optimization algorithms to search for game content. However, standard tree search algorithms can be competitive with evolution on some optimization problems. We investigate the applicability of several tree search methods to level generation and compare them systematically with several optimization algorithms, including evolutionary algorithms. We compare them on three different game level generation problems: Binary, Zelda, and Sokoban. We introduce two new representations that can help tree search algorithms deal with the large branching factor of the generation problem. We find that in general, optimization algorithms clearly outperform tree search algorithms, but given the right problem representation certain tree search algorithms perform similarly to optimization algorithms, and in one particular problem, we see surprisingly strong results from MCTS.
Submitted 12 August, 2020; v1 submitted 27 March, 2019;
originally announced March 2019.
-
Estimating historic movement of a climatological variable from a pair of misaligned data sets
Authors:
Dibyendu Bhaumik,
Debasis Sengupta
Abstract:
We consider in this paper the problem of estimating the mean function from a pair of paleoclimatic functional data sets, after one of them has been registered with the other. We show theoretically that registering one data set with respect to the other is the right way to formulate this problem, which is in contrast with estimation of the mean function in a "neutral" time scale that is preferred in the analysis of multiple sets of longitudinal growth data. Once this registration is done, the Nadaraya-Watson estimator of the mean function may be computed from the pooled data. We show that, if a consistent estimator of the time transformation is used for this registration, the above estimator of the mean function would be consistent under a few additional conditions. We study the potential change in asymptotic mean squared error of the estimator that may be possible because of the contribution of the time-transformed data set. After demonstrating through simulation that the additional data can lead to improved estimation in spite of estimation error in registration, we estimate the mean function of three pairs of paleoclimatic data sets. The analysis reveals some interesting aspects of the data sets and the estimation problem.
Submitted 20 December, 2017;
originally announced December 2017.
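The abstract on misaligned paleoclimatic data sets computes the Nadaraya-Watson estimator of the mean function from pooled data once one data set has been registered with the other. Here is a minimal sketch, assuming a Gaussian kernel, a fixed bandwidth `h`, and a registration map `warp` that has already been estimated. The paper's registration method and bandwidth choice are not reproduced, and all names are illustrative.

```python
import math


def nadaraya_watson(t0, times, values, h):
    """Gaussian-kernel Nadaraya-Watson estimate of the mean function at t0."""
    weights = [math.exp(-0.5 * ((t0 - t) / h) ** 2) for t in times]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def pooled_estimate(t0, data1, data2, warp, h):
    """Estimate the mean at t0 (on data1's time scale) from both data sets.

    data1, data2: lists of (time, value) pairs
    warp: maps data2's times onto data1's time scale; assumed to have been
          estimated beforehand by a separate registration step.
    """
    times = [t for t, _ in data1] + [warp(t) for t, _ in data2]
    values = [v for _, v in data1] + [v for _, v in data2]
    return nadaraya_watson(t0, times, values, h)


# Toy usage: data2 runs on a clock twice as fast, so warp(t) = 2 * t.
data1 = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
data2 = [(0.25, 2.0), (0.75, 4.0)]
print(pooled_estimate(1.0, data1, data2, lambda t: 2 * t, h=0.5))
```

Because the toy data lie on the line 1 + 2t and the pooled times are symmetric around t0 = 1, the estimate is 3.0 up to rounding. With a poorly estimated `warp`, the registered points of `data2` would land at the wrong times and bias the pooled estimate, which is why the abstract's consistency result requires a consistent estimator of the time transformation.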
-
Feature Sensitive Curve Registration by Kernel Matching
Authors:
Dibyendu Bhaumik,
Radhendushka Srivastava,
Debasis Sengupta
Abstract:
In this paper, we argue that the problem of registering two sets of functional data, where the underlying mean function has sharp features, is not properly addressed by methods designed to align a collection of growth curves. We provide a new method, which is able to pool local information without smoothing and to match sharp landmarks without manual identification. This method, which we refer to as kernel-matched registration, is based on maximizing a kernel-based measure of alignment. We prove that the proposed method is consistent under fairly general conditions. Simulation results show that the proposed method outperforms two existing methods. The proposed method is illustrated through the analysis of three sets of paleoclimatic data.
Submitted 17 December, 2017; v1 submitted 10 April, 2017;
originally announced April 2017.
-
Feature Sensitive and Automated Curve Registration
Authors:
Dibyendu Bhaumik,
Radhendushka Srivastava,
Debasis Sengupta
Abstract:
Given two sets of functional data having a common underlying mean function but different degrees of distortion in time measurements, we provide a method of estimating the time transformation necessary to align (or 'register') them. We prove that the proposed method is consistent under fairly general conditions. Simulation results show that the proposed method outperforms two existing methods. The proposed method is illustrated through the analysis of three paleoclimatic data sets.
Submitted 20 April, 2016; v1 submitted 23 August, 2015;
originally announced August 2015.
-
Algorithm for Predicting Protein Secondary Structure
Authors:
K. K Senapati,
G. Sahoo,
D. Bhaumik
Abstract:
Predicting protein structure from amino acid sequence is one of the most important unsolved problems of molecular biology and biophysics. Not only would a successful prediction algorithm be a tremendous advance in the understanding of the biochemical mechanisms of proteins, but such an algorithm could also conceivably be used to design proteins to carry out specific functions. Prediction of the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure as well as its function. In this research, we use different Hidden Markov models for protein secondary structure prediction. In this paper, we propose an algorithm for predicting protein secondary structure. We use a Hidden Markov model with a sliding window for secondary structure prediction. The secondary structure has three regular forms, and for each secondary structural element we use one Hidden Markov Model.
Submitted 14 June, 2010;
originally announced June 2010.
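The protein secondary structure abstract above describes one Hidden Markov Model per secondary structural element, combined with a sliding window. One plausible reading, sketched here under that assumption, is to score the window around each residue with the forward algorithm under each class's HMM and assign the class with the highest likelihood. The one-state toy models and their emission probabilities are hypothetical and are not trained on real protein data.

```python
import math


def forward_log_likelihood(seq, start, trans, emit):
    """Log P(seq | HMM) via the scaled forward algorithm.

    start[i], trans[i][j] and emit[i][symbol] are probabilities for hidden
    states i, j; symbols missing from emit[i] have probability 0.
    """
    n = len(start)
    log_lik = 0.0
    alpha = None
    for t, sym in enumerate(seq):
        if t == 0:
            alpha = [start[i] * emit[i].get(sym, 0.0) for i in range(n)]
        else:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j].get(sym, 0.0)
                for j in range(n)
            ]
        scale = sum(alpha)  # P(symbol t | symbols before t)
        if scale == 0.0:
            return float("-inf")
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik


def predict_structure(sequence, models, window=5):
    """Label each residue with the class whose HMM best explains the window
    centred on it (the window is truncated at the sequence ends).

    models: dict label -> (start, trans, emit)
    """
    half = window // 2
    labels = []
    for pos in range(len(sequence)):
        lo, hi = max(0, pos - half), min(len(sequence), pos + half + 1)
        segment = sequence[lo:hi]
        labels.append(max(models, key=lambda lab: forward_log_likelihood(segment, *models[lab])))
    return "".join(labels)


# Hypothetical one-state toy models (start, transitions, emissions) per class.
TOY_MODELS = {
    "H": ([1.0], [[1.0]], [{"A": 0.6, "L": 0.3, "V": 0.05, "G": 0.05}]),
    "E": ([1.0], [[1.0]], [{"A": 0.05, "L": 0.05, "V": 0.8, "G": 0.1}]),
    "C": ([1.0], [[1.0]], [{"A": 0.1, "L": 0.1, "V": 0.1, "G": 0.7}]),
}
print(predict_structure("AALAVVVVGGG", TOY_MODELS, window=3))  # HHHHEEEECCC
```

Real per-class models would have several hidden states and transition probabilities learned from labelled sequences; the one-state toys reduce each HMM to independent emissions only to keep the example checkable by hand.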