-
A Deep Bayesian Convolutional Spiking Neural Network-based CAD system with Uncertainty Quantification for Medical Images Classification
Authors:
Mohaddeseh Chegini,
Ali Mahloojifar
Abstract:
The Computer_Aided Diagnosis (CAD) systems facilitate accurate diagnosis of diseases. The development of CADs by leveraging third generation neural network, namely, Spiking Neural Network (SNN), is essential to utilize of the benefits of SNNs, such as their event_driven processing, parallelism, low power consumption, and the ability to process sparse temporal_spatial information. However, Deep SNN…
▽ More
The Computer_Aided Diagnosis (CAD) systems facilitate accurate diagnosis of diseases. The development of CADs by leveraging third generation neural network, namely, Spiking Neural Network (SNN), is essential to utilize of the benefits of SNNs, such as their event_driven processing, parallelism, low power consumption, and the ability to process sparse temporal_spatial information. However, Deep SNN as a deep learning model faces challenges with unreliability. To deal with unreliability challenges due to inability to quantify the uncertainty of the predictions, we proposed a deep Bayesian Convolutional Spiking Neural Network based_CADs with uncertainty_aware module. In this study, the Monte Carlo Dropout method as Bayesian approximation is used as an uncertainty quantification method. This method was applied to several medical image classification tasks. Our experimental results demonstrate that our proposed model is accurate and reliable and will be a proper alternative to conventional deep learning for medical image classification.
△ Less
Submitted 23 April, 2025;
originally announced April 2025.
-
BI-RADS prediction of mammographic masses using uncertainty information extracted from a Bayesian Deep Learning model
Authors:
Mohaddeseh Chegini,
Ali Mahloojifar
Abstract:
The BI_RADS score is a probabilistic reporting tool used by radiologists to express the level of uncertainty in predicting breast cancer based on some morphological features in mammography images. There is a significant variability in describing masses which sometimes leads to BI_RADS misclassification. Using a BI_RADS prediction system is required to support the final radiologist decisions. In th…
▽ More
The BI_RADS score is a probabilistic reporting tool used by radiologists to express the level of uncertainty in predicting breast cancer based on some morphological features in mammography images. There is a significant variability in describing masses which sometimes leads to BI_RADS misclassification. Using a BI_RADS prediction system is required to support the final radiologist decisions. In this study, the uncertainty information extracted by a Bayesian deep learning model is utilized to predict the BI_RADS score. The investigation results based on the pathology information demonstrate that the f1-scores of the predictions of the radiologist are 42.86%, 48.33% and 48.28%, meanwhile, the f1-scores of the model performance are 73.33%, 59.60% and 59.26% in the BI_RADS 2, 3 and 5 dataset samples, respectively. Also, the model can distinguish malignant from benign samples in the BI_RADS 0 category of the used dataset with an accuracy of 75.86% and correctly identify all malignant samples as BI_RADS 5. The Grad-CAM visualization shows the model pays attention to the morphological features of the lesions. Therefore, this study shows the uncertainty-aware Bayesian Deep Learning model can report his uncertainty about the malignancy of a lesion based on morphological features, like a radiologist.
△ Less
Submitted 24 March, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
RePanda: Pandas-powered Tabular Verification and Reasoning
Authors:
Atoosa Malemir Chegini,
Keivan Rezaei,
Hamid Eghbalzadeh,
Soheil Feizi
Abstract:
Fact-checking tabular data is essential for ensuring the accuracy of structured information. However, existing methods often rely on black-box models with opaque reasoning. We introduce RePanda, a structured fact verification approach that translates claims into executable pandas queries, enabling interpretable and verifiable reasoning.
To train RePanda, we construct PanTabFact, a structured dat…
▽ More
Fact-checking tabular data is essential for ensuring the accuracy of structured information. However, existing methods often rely on black-box models with opaque reasoning. We introduce RePanda, a structured fact verification approach that translates claims into executable pandas queries, enabling interpretable and verifiable reasoning.
To train RePanda, we construct PanTabFact, a structured dataset derived from the TabFact train set, where claims are paired with executable queries generated using DeepSeek-Chat and refined through automated error correction. Fine-tuning DeepSeek-coder-7B-instruct-v1.5 on PanTabFact, RePanda achieves 84.09% accuracy on the TabFact test set.
To evaluate Out-of-Distribution (OOD) generalization, we interpret question-answer pairs from WikiTableQuestions as factual claims and refer to this dataset as WikiFact. Without additional fine-tuning, RePanda achieves 84.72% accuracy on WikiFact, significantly outperforming all other baselines and demonstrating strong OOD robustness. Notably, these results closely match the zero-shot performance of DeepSeek-Chat (671B), indicating that our fine-tuning approach effectively distills structured reasoning from a much larger model into a compact, locally executable 7B model.
Beyond fact verification, RePanda extends to tabular question answering by generating executable queries that retrieve precise answers. To support this, we introduce PanWiki, a dataset mapping WikiTableQuestions to pandas queries. Fine-tuning on PanWiki, RePanda achieves 75.1% accuracy in direct answer retrieval. These results highlight the effectiveness of structured execution-based reasoning for tabular verification and question answering.
We have publicly released the dataset on Hugging Face at datasets/AtoosaChegini/PanTabFact.
△ Less
Submitted 20 March, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
-
Reliable Breast Cancer Molecular Subtype Prediction based on uncertainty-aware Bayesian Deep Learning by Mammography
Authors:
Mohaddeseh Chegini,
Ali Mahloojifar
Abstract:
Breast cancer is a heterogeneous disease with different molecular subtypes, clinical behavior, treatment responses as well as survival outcomes. The development of a reliable, accurate, available and inexpensive method to predict the molecular subtypes using medical images plays an important role in the diagnosis and prognosis of breast cancer. Recently, deep learning methods have shown good perfo…
▽ More
Breast cancer is a heterogeneous disease with different molecular subtypes, clinical behavior, treatment responses as well as survival outcomes. The development of a reliable, accurate, available and inexpensive method to predict the molecular subtypes using medical images plays an important role in the diagnosis and prognosis of breast cancer. Recently, deep learning methods have shown good performance in the breast cancer classification tasks using various medical images. Despite all that success, classical deep learning cannot deliver the predictive uncertainty. The uncertainty represents the validity of the predictions. Therefore, the high predicted uncertainty might cause a negative effect in the accurate diagnosis of breast cancer molecular subtypes. To overcome this, uncertainty quantification methods are used to determine the predictive uncertainty. Accordingly, in this study, we proposed an uncertainty-aware Bayesian deep learning model using the full mammogram images. In addition, to increase the performance of the multi-class molecular subtype classification task, we proposed a novel hierarchical classification strategy, named the two-stage classification strategy. The separate AUC of the proposed model for each subtype was 0.71, 0.75 and 0.86 for HER2-enriched, luminal and triple-negative classes, respectively. The proposed model not only has a comparable performance to other studies in the field of breast cancer molecular subtypes prediction, even using full mammography images, but it is also more reliable, due to quantify the predictive uncertainty.
△ Less
Submitted 19 December, 2024; v1 submitted 16 December, 2024;
originally announced December 2024.
-
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Authors:
Gaurav Sahu,
Abhay Puri,
Juan Rodriguez,
Amirhossein Abaskohi,
Mohammad Chegini,
Alexandre Drouin,
Perouz Taslakian,
Valentina Zantedeschi,
Alexandre Lacoste,
David Vazquez,
Nicolas Chapados,
Christopher Pal,
Sai Rajeswar Mudumba,
Issam Hadj Laradji
Abstract:
Data analytics is essential for extracting valuable insights from data that can assist organizations in making effective decisions. We introduce InsightBench, a benchmark dataset with three key features. First, it consists of 100 datasets representing diverse business use cases such as finance and incident management, each accompanied by a carefully curated set of insights planted in the datasets.…
▽ More
Data analytics is essential for extracting valuable insights from data that can assist organizations in making effective decisions. We introduce InsightBench, a benchmark dataset with three key features. First, it consists of 100 datasets representing diverse business use cases such as finance and incident management, each accompanied by a carefully curated set of insights planted in the datasets. Second, unlike existing benchmarks focusing on answering single queries, InsightBench evaluates agents based on their ability to perform end-to-end data analytics, including formulating questions, interpreting answers, and generating a summary of insights and actionable steps. Third, we conducted comprehensive quality assurance to ensure that each dataset in the benchmark had clear goals and included relevant and meaningful questions and analysis. Furthermore, we implement a two-way evaluation mechanism using LLaMA-3 as an effective, open-source evaluator to assess agents' ability to extract insights. We also propose AgentPoirot, our baseline data analysis agent capable of performing end-to-end data analytics. Our evaluation on InsightBench shows that AgentPoirot outperforms existing approaches (such as Pandas Agent) that focus on resolving single queries. We also compare the performance of open- and closed-source LLMs and various evaluation strategies. Overall, this benchmark serves as a testbed to motivate further development in comprehensive automated data analytics and can be accessed here: https://github.com/ServiceNow/insight-bench.
△ Less
Submitted 27 February, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.
-
EditVal: Benchmarking Diffusion Based Text-Guided Image Editing Methods
Authors:
Samyadeep Basu,
Mehrdad Saberi,
Shweta Bhardwaj,
Atoosa Malemir Chegini,
Daniela Massiceti,
Maziar Sanjabi,
Shell Xu Hu,
Soheil Feizi
Abstract:
A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol, however, does not exist to compare methods across different types of fine-grained edits. To address this gap, we introduce EditVal, a standardized benchmark fo…
▽ More
A plethora of text-guided image editing methods have recently been developed by leveraging the impressive capabilities of large-scale diffusion-based generative models such as Imagen and Stable Diffusion. A standardized evaluation protocol, however, does not exist to compare methods across different types of fine-grained edits. To address this gap, we introduce EditVal, a standardized benchmark for quantitatively evaluating text-guided image editing methods. EditVal consists of a curated dataset of images, a set of editable attributes for each image drawn from 13 possible edit types, and an automated evaluation pipeline that uses pre-trained vision-language models to assess the fidelity of generated images for each edit type. We use EditVal to benchmark 8 cutting-edge diffusion-based editing methods including SINE, Imagic and Instruct-Pix2Pix. We complement this with a large-scale human study where we show that EditVall's automated evaluation pipeline is strongly correlated with human-preferences for the edit types we considered. From both the human study and automated evaluation, we find that: (i) Instruct-Pix2Pix, Null-Text and SINE are the top-performing methods averaged across different edit types, however {\it only} Instruct-Pix2Pix and Null-Text are able to preserve original image properties; (ii) Most of the editing methods fail at edits involving spatial operations (e.g., changing the position of an object). (iii) There is no `winner' method which ranks the best individually across a range of different edit types. We hope that our benchmark can pave the way to developing more reliable text-guided image editing tools in the future. We will publicly release EditVal, and all associated code and human-study templates to support these research directions in https://deep-ml-research.github.io/editval/.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Data-Centric Debugging: mitigating model failures via targeted data collection
Authors:
Sahil Singla,
Atoosa Malemir Chegini,
Mazda Moayeri,
Soheil Feiz
Abstract:
Deep neural networks can be unreliable in the real world when the training set does not adequately cover all the settings where they are deployed. Focusing on image classification, we consider the setting where we have an error distribution $\mathcal{E}$ representing a deployment scenario where the model fails. We have access to a small set of samples $\mathcal{E}_{sample}$ from $\mathcal{E}$ and…
▽ More
Deep neural networks can be unreliable in the real world when the training set does not adequately cover all the settings where they are deployed. Focusing on image classification, we consider the setting where we have an error distribution $\mathcal{E}$ representing a deployment scenario where the model fails. We have access to a small set of samples $\mathcal{E}_{sample}$ from $\mathcal{E}$ and it can be expensive to obtain additional samples. In the traditional model development framework, mitigating failures of the model in $\mathcal{E}$ can be challenging and is often done in an ad hoc manner. In this paper, we propose a general methodology for model debugging that can systemically improve model performance on $\mathcal{E}$ while maintaining its performance on the original test set. Our key assumption is that we have access to a large pool of weakly (noisily) labeled data $\mathcal{F}$. However, naively adding $\mathcal{F}$ to the training would hurt model performance due to the large extent of label noise. Our Data-Centric Debugging (DCD) framework carefully creates a debug-train set by selecting images from $\mathcal{F}$ that are perceptually similar to the images in $\mathcal{E}_{sample}$. To do this, we use the $\ell_2$ distance in the feature space (penultimate layer activations) of various models including ResNet, Robust ResNet and DINO where we observe DINO ViTs are significantly better at discovering similar images compared to Resnets. Compared to LPIPS, we find that our method reduces compute and storage requirements by 99.58\%. Compared to the baselines that maintain model performance on the test set, we achieve significantly (+9.45\%) improved results on the debug-heldout sets.
△ Less
Submitted 17 November, 2022;
originally announced November 2022.
-
InForecaster: Forecasting Influenza Hemagglutinin Mutations Through the Lens of Anomaly Detection
Authors:
Ali Garjani,
Atoosa Malemir Chegini,
Mohammadreza Salehi,
Alireza Tabibzadeh,
Parastoo Yousefi,
Mohammad Hossein Razizadeh,
Moein Esghaei,
Maryam Esghaei,
Mohammad Hossein Rohban
Abstract:
The influenza virus hemagglutinin is an important part of the virus attachment to the host cells. The hemagglutinin proteins are one of the genetic regions of the virus with a high potential for mutations. Due to the importance of predicting mutations in producing effective and low-cost vaccines, solutions that attempt to approach this problem have recently gained a significant attention. A histor…
▽ More
The influenza virus hemagglutinin is an important part of the virus attachment to the host cells. The hemagglutinin proteins are one of the genetic regions of the virus with a high potential for mutations. Due to the importance of predicting mutations in producing effective and low-cost vaccines, solutions that attempt to approach this problem have recently gained a significant attention. A historical record of mutations have been used to train predictive models in such solutions. However, the imbalance between mutations and the preserved proteins is a big challenge for the development of such models that needs to be addressed. Here, we propose to tackle this challenge through anomaly detection (AD). AD is a well-established field in Machine Learning (ML) that tries to distinguish unseen anomalies from the normal patterns using only normal training samples. By considering mutations as the anomalous behavior, we could benefit existing rich solutions in this field that have emerged recently. Such methods also fit the problem setup of extreme imbalance between the number of unmutated vs. mutated training samples. Motivated by this formulation, our method tries to find a compact representation for unmutated samples while forcing anomalies to be separated from the normal ones. This helps the model to learn a shared unique representation between normal training samples as much as possible, which improves the discernibility and detectability of mutated samples from the unmutated ones at the test time. We conduct a large number of experiments on four publicly available datasets, consisting of 3 different hemagglutinin protein datasets, and one SARS-CoV-2 dataset, and show the effectiveness of our method through different standard criteria.
△ Less
Submitted 24 October, 2022;
originally announced October 2022.
-
Optimization of Quantized Phase Shifts for Reconfigurable Smart Surfaces Assisted Communications
Authors:
Maximiliano Rivera,
Mohammad Chegini,
Wael Jaafar,
Safwan Alfattani,
Halim Yanikomeroglu
Abstract:
Reconfigurable Smart Surface (RSS) is assumed to be a key enabler for future wireless communication systems due to its ability to control the wireless propagation environment and, thus, enhance communications quality. Although optimal and continuous phase-shift configuration can be analytically obtained, practical RSS systems are prone to both channel estimation errors, discrete control, and curse…
▽ More
Reconfigurable Smart Surface (RSS) is assumed to be a key enabler for future wireless communication systems due to its ability to control the wireless propagation environment and, thus, enhance communications quality. Although optimal and continuous phase-shift configuration can be analytically obtained, practical RSS systems are prone to both channel estimation errors, discrete control, and curse of dimensionality. This leads to relaying on a finite number of phase-shift configurations that is expected to degrade the system's performances. In this paper, we tackle the problem of quantized RSS phase-shift configuration, aiming to maximize the data rate of an orthogonal frequency division multiplexing (OFDM) point-to-point RSS-assisted communication. Due to the complexity of optimally solving the formulated problem, we propose here a sub-optimal greedy algorithm to solve it. Simulation results illustrate the performance superiority of the proposed algorithm compared to baseline approaches. Finally, the impact of several parameters ,e.g., quantization resolution and RSS placement, is investigated.
△ Less
Submitted 22 November, 2021; v1 submitted 19 November, 2021;
originally announced November 2021.