Search | arXiv e-print repository

FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes

Authors: Janki Atul Nawale, Mohammed Safi Ur Rahman Khan, Janani D, Mansi Gupta, Danish Pruthi, Mitesh M. Khapra

Abstract: Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-c… ▽ More Existing studies on fairness are largely Western-focused, making them inadequate for culturally diverse countries such as India. To address this gap, we introduce INDIC-BIAS, a comprehensive India-centric benchmark designed to evaluate fairness of LLMs across 85 identity groups encompassing diverse castes, religions, regions, and tribes. We first consult domain experts to curate over 1,800 socio-cultural topics spanning behaviors and situations, where biases and stereotypes are likely to emerge. Grounded in these topics, we generate and manually validate 20,000 real-world scenario templates to probe LLMs for fairness. We structure these templates into three evaluation tasks: plausibility, judgment, and generation. Our evaluation of 14 popular LLMs on these tasks reveals strong negative biases against marginalized identities, with models frequently reinforcing common stereotypes. Additionally, we find that models struggle to mitigate bias even when explicitly asked to rationalize their decision. Our evaluation provides evidence of both allocative and representational harms that current LLMs could cause towards Indian identities, calling for a more cautious usage in practical applications. We release INDIC-BIAS as an open-source benchmark to advance research on benchmarking and mitigating biases and stereotypes in the Indian context. △ Less

Submitted 29 June, 2025; originally announced June 2025.

Comments: Accepted in ACL 2025

arXiv:2506.19863 [pdf, ps, other]

Exploring the Capabilities of the Frontier Large Language Models for Nuclear Energy Research

Authors: Ahmed Almeldein, Mohammed Alnaggar, Rick Archibald, Tom Beck, Arpan Biswas, Rike Bostelmann, Wes Brewer, Chris Bryan, Christopher Calle, Cihangir Celik, Rajni Chahal, Jong Youl Choi, Arindam Chowdhury, Mark Cianciosa, Franklin Curtis, Gregory Davidson, Sebastian De Pascuale, Lisa Fassino, Ana Gainaru, Yashika Ghai, Luke Gibson, Qian Gong, Christopher Greulich, Scott Greenwood, Cory Hauck , et al. (25 additional authors not shown)

Abstract: The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to au… ▽ More The AI for Nuclear Energy workshop at Oak Ridge National Laboratory evaluated the potential of Large Language Models (LLMs) to accelerate fusion and fission research. Fourteen interdisciplinary teams explored diverse nuclear science challenges using ChatGPT, Gemini, Claude, and other AI models over a single day. Applications ranged from developing foundation models for fusion reactor control to automating Monte Carlo simulations, predicting material degradation, and designing experimental programs for advanced reactors. Teams employed structured workflows combining prompt engineering, deep research capabilities, and iterative refinement to generate hypotheses, prototype code, and research strategies. Key findings demonstrate that LLMs excel at early-stage exploration, literature synthesis, and workflow design, successfully identifying research gaps and generating plausible experimental frameworks. However, significant limitations emerged, including difficulties with novel materials designs, advanced code generation for modeling and simulation, and domain-specific details requiring expert validation. The successful outcomes resulted from expert-driven prompt engineering and treating AI as a complementary tool rather than a replacement for physics-based methods. The workshop validated AI's potential to accelerate nuclear energy research through rapid iteration and cross-disciplinary synthesis while highlighting the need for curated nuclear-specific datasets, workflow automation, and specialized model development. These results provide a roadmap for integrating AI tools into nuclear science workflows, potentially reducing development cycles for safer, more efficient nuclear energy systems while maintaining rigorous scientific standards. △ Less

Submitted 26 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

arXiv:2506.19583 [pdf, ps, other]

ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks

Authors: Santiago A. Cadena, Andrea Merlo, Emanuel Laude, Alexander Bauer, Atul Agrawal, Maria Pascu, Marija Savtchouk, Enrico Guiraud, Lukas Bonauer, Stuart Hudson, Markus Kaiser

Abstract: Stellarators are magnetic confinement devices under active development to deliver steady-state carbon-free fusion energy. Their design involves a high-dimensional, constrained optimization problem that requires expensive physics simulations and significant domain expertise. Recent advances in plasma physics and open-source tools have made stellarator optimization more accessible. However, broader… ▽ More Stellarators are magnetic confinement devices under active development to deliver steady-state carbon-free fusion energy. Their design involves a high-dimensional, constrained optimization problem that requires expensive physics simulations and significant domain expertise. Recent advances in plasma physics and open-source tools have made stellarator optimization more accessible. However, broader community progress is currently bottlenecked by the lack of standardized optimization problems with strong baselines and datasets that enable data-driven approaches, particularly for quasi-isodynamic (QI) stellarator configurations, considered as a promising path to commercial fusion due to their inherent resilience to current-driven disruptions. Here, we release an open dataset of diverse QI-like stellarator plasma boundary shapes, paired with their ideal magnetohydrodynamic (MHD) equilibria and performance metrics. We generated this dataset by sampling a variety of QI fields and optimizing corresponding stellarator plasma boundaries. We introduce three optimization benchmarks of increasing complexity: (1) a single-objective geometric optimization problem, (2) a "simple-to-build" QI stellarator, and (3) a multi-objective ideal-MHD stable QI stellarator that investigates trade-offs between compactness and coil simplicity. For every benchmark, we provide reference code, evaluation scripts, and strong baselines based on classical optimization techniques. Finally, we show how learned models trained on our dataset can efficiently generate novel, feasible configurations without querying expensive physics oracles. By openly releasing the dataset along with benchmark problems and baselines, we aim to lower the entry barrier for optimization and machine learning researchers to engage in stellarator design and to accelerate cross-disciplinary progress toward bringing fusion energy to the grid. △ Less

Submitted 24 June, 2025; originally announced June 2025.

arXiv:2506.18789 [pdf, ps, other]

Shift Happens: Mixture of Experts based Continual Adaptation in Federated Learning

Authors: Rahul Atul Bhope, K. R. Jayaram, Praveen Venkateswaran, Nalini Venkatasubramanian

Abstract: Federated Learning (FL) enables collaborative model training across decentralized clients without sharing raw data, yet faces significant challenges in real-world settings where client data distributions evolve dynamically over time. This paper tackles the critical problem of covariate and label shifts in streaming FL environments, where non-stationary data distributions degrade model performance… ▽ More Federated Learning (FL) enables collaborative model training across decentralized clients without sharing raw data, yet faces significant challenges in real-world settings where client data distributions evolve dynamically over time. This paper tackles the critical problem of covariate and label shifts in streaming FL environments, where non-stationary data distributions degrade model performance and require adaptive middleware solutions. We introduce ShiftEx, a shift-aware mixture of experts framework that dynamically creates and trains specialized global models in response to detected distribution shifts using Maximum Mean Discrepancy for covariate shifts. The framework employs a latent memory mechanism for expert reuse and implements facility location-based optimization to jointly minimize covariate mismatch, expert creation costs, and label imbalance. Through theoretical analysis and comprehensive experiments on benchmark datasets, we demonstrate 5.5-12.9 percentage point accuracy improvements and 22-95 % faster adaptation compared to state-of-the-art FL baselines across diverse shift scenarios. The proposed approach offers a scalable, privacy-preserving middleware solution for FL systems operating in non-stationary, real-world conditions while minimizing communication and computational overhead. △ Less

Submitted 23 June, 2025; originally announced June 2025.

arXiv:2506.14900 [pdf, ps, other]

Adverse Event Extraction from Discharge Summaries: A New Dataset, Annotation Scheme, and Initial Findings

Authors: Imane Guellil, Salomé Andres, Atul Anand, Bruce Guthrie, Huayu Zhang, Abul Hasan, Honghan Wu, Beatrice Alex

Abstract: In this work, we present a manually annotated corpus for Adverse Event (AE) extraction from discharge summaries of elderly patients, a population often underrepresented in clinical NLP resources. The dataset includes 14 clinically significant AEs-such as falls, delirium, and intracranial haemorrhage, along with contextual attributes like negation, diagnosis type, and in-hospital occurrence. Unique… ▽ More In this work, we present a manually annotated corpus for Adverse Event (AE) extraction from discharge summaries of elderly patients, a population often underrepresented in clinical NLP resources. The dataset includes 14 clinically significant AEs-such as falls, delirium, and intracranial haemorrhage, along with contextual attributes like negation, diagnosis type, and in-hospital occurrence. Uniquely, the annotation schema supports both discontinuous and overlapping entities, addressing challenges rarely tackled in prior work. We evaluate multiple models using FlairNLP across three annotation granularities: fine-grained, coarse-grained, and coarse-grained with negation. While transformer-based models (e.g., BERT-cased) achieve strong performance on document-level coarse-grained extraction (F1 = 0.943), performance drops notably for fine-grained entity-level tasks (e.g., F1 = 0.675), particularly for rare events and complex attributes. These results demonstrate that despite high-level scores, significant challenges remain in detecting underrepresented AEs and capturing nuanced clinical language. Developed within a Trusted Research Environment (TRE), the dataset is available upon request via DataLoch and serves as a robust benchmark for evaluating AE extraction methods and supporting future cross-dataset generalisation. △ Less

Submitted 17 June, 2025; originally announced June 2025.

Comments: Accepted and will be published at ACL2025 (main conference)

arXiv:2506.13912 [pdf, ps, other]

Density-aware Walks for Coordinated Campaign Detection

Authors: Atul Anand Gopalakrishnan, Jakir Hossain, Tuğrulcan Elmas, Ahmet Erdem Sarıyüce

Abstract: Coordinated campaigns frequently exploit social media platforms by artificially amplifying topics, making inauthentic trends appear organic, and misleading users into engagement. Distinguishing these coordinated efforts from genuine public discourse remains a significant challenge due to the sophisticated nature of such attacks. Our work focuses on detecting coordinated campaigns by modeling the p… ▽ More Coordinated campaigns frequently exploit social media platforms by artificially amplifying topics, making inauthentic trends appear organic, and misleading users into engagement. Distinguishing these coordinated efforts from genuine public discourse remains a significant challenge due to the sophisticated nature of such attacks. Our work focuses on detecting coordinated campaigns by modeling the problem as a graph classification task. We leverage the recently introduced Large Engagement Networks (LEN) dataset, which contains over 300 networks capturing engagement patterns from both fake and authentic trends on Twitter prior to the 2023 Turkish elections. The graphs in LEN were constructed by collecting interactions related to campaigns that stemmed from ephemeral astroturfing. Established graph neural networks (GNNs) struggle to accurately classify campaign graphs, highlighting the challenges posed by LEN due to the large size of its networks. To address this, we introduce a new graph classification method that leverages the density of local network structures. We propose a random weighted walk (RWW) approach in which node transitions are biased by local density measures such as degree, core number, or truss number. These RWWs are encoded using the Skip-gram model, producing density-aware structural embeddings for the nodes. Training message-passing neural networks (MPNNs) on these density-aware embeddings yields superior results compared to the simpler node features available in the dataset, with nearly a 12\% and 5\% improvement in accuracy for binary and multiclass classification, respectively. Our findings demonstrate that incorporating density-aware structural encoding with MPNNs provides a robust framework for identifying coordinated inauthentic behavior on social media networks such as Twitter. △ Less

Submitted 16 June, 2025; originally announced June 2025.

Comments: 16 Pages. Accepted at ECML-PKDD 2025

arXiv:2506.10999 [pdf]

Automated Validation of COBOL to Java Transformation

Authors: Atul Kumar, Diptikalyan Saha, Toshikai Yasue, Kohichi Ono, Saravanan Krishnan, Sandeep Hans, Fumiko Satoh, Gerald Mitchell, Sachin Kumar

Abstract: Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterpriselevel code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code. We propose a framework and a tool t… ▽ More Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterpriselevel code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code. We propose a framework and a tool to help validate the equivalence of COBOL and translated Java. The results can also help repair the code if there are some issues and provide feedback to the AI model to improve. We have developed a symbolic-execution-based test generation to automatically generate unit tests for the source COBOL programs which also mocks the external resource calls. We generate equivalent JUnit test cases with equivalent mocking as COBOL and run them to check semantic equivalence between original and translated programs. △ Less

Submitted 14 April, 2025; originally announced June 2025.

Comments: arXiv admin note: text overlap with arXiv:2504.10548

Journal ref: ASE 2024

arXiv:2506.08504 [pdf, ps, other]

CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations

Authors: Divyaksh Shukla, Ritesh Baviskar, Dwijesh Gohil, Aniket Tiwari, Atul Shree, Ashutosh Modi

Abstract: Discourse parsing is an important task useful for NLU applications such as summarization, machine comprehension, and emotion recognition. The current discourse parsing datasets based on conversations consists of written English dialogues restricted to a single domain. In this resource paper, we introduce CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations. Th… ▽ More Discourse parsing is an important task useful for NLU applications such as summarization, machine comprehension, and emotion recognition. The current discourse parsing datasets based on conversations consists of written English dialogues restricted to a single domain. In this resource paper, we introduce CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations. The corpus (code-mixed in Hindi and English) has both audio and transcribed text and is annotated with nine discourse relations. We experiment with various SoTA baseline models; the poor performance of SoTA models highlights the challenges of multi-domain code-mixed corpus, pointing towards the need for developing better models for such realistic settings. △ Less

Submitted 10 June, 2025; originally announced June 2025.

Comments: Accepted at ACL Findings 2025 (16 pages: 5 pages main content + 3 pages references + 8 pages appendix)

arXiv:2506.06003 [pdf, ps, other]

What Really is a Member? Discrediting Membership Inference via Poisoning

Authors: Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury

Abstract: Membership inference tests aim to determine whether a particular data point was included in a language model's training set. However, recent works have shown that such tests often fail under the strict definition of membership based on exact matching, and have suggested relaxing this definition to include semantic neighbors as members as well. In this work, we show that membership inference tests… ▽ More Membership inference tests aim to determine whether a particular data point was included in a language model's training set. However, recent works have shown that such tests often fail under the strict definition of membership based on exact matching, and have suggested relaxing this definition to include semantic neighbors as members as well. In this work, we show that membership inference tests are still unreliable under this relaxation - it is possible to poison the training dataset in a way that causes the test to produce incorrect predictions for a target point. We theoretically reveal a trade-off between a test's accuracy and its robustness to poisoning. We also present a concrete instantiation of this poisoning attack and empirically validate its effectiveness. Our results show that it can degrade the performance of existing tests to well below random. △ Less

Submitted 6 June, 2025; originally announced June 2025.

arXiv:2506.00316 [pdf, ps, other]

Active Learning via Regression Beyond Realizability

Authors: Atul Ganju, Shashaank Aiyer, Ved Sriraman, Karthik Sridharan

Abstract: We present a new active learning framework for multiclass classification based on surrogate risk minimization that operates beyond the standard realizability assumption. Existing surrogate-based active learning algorithms crucially rely on realizability$\unicode{x2014}$the assumption that the optimal surrogate predictor lies within the model class$\unicode{x2014}$limiting their applicability in pr… ▽ More We present a new active learning framework for multiclass classification based on surrogate risk minimization that operates beyond the standard realizability assumption. Existing surrogate-based active learning algorithms crucially rely on realizability$\unicode{x2014}$the assumption that the optimal surrogate predictor lies within the model class$\unicode{x2014}$limiting their applicability in practical, misspecified settings. In this work we show that under conditions significantly weaker than realizability, as long as the class of models considered is convex, one can still obtain a label and sample complexity comparable to prior work. Despite achieving similar rates, the algorithmic approaches from prior works can be shown to fail in non-realizable settings where our assumption is satisfied. Our epoch-based active learning algorithm departs from prior methods by fitting a model from the full class to the queried data in each epoch and returning an improper classifier obtained by aggregating these models. △ Less

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2505.22550 [pdf]

Domain specific ontologies from Linked Open Data (LOD)

Authors: Rosario Uceda-Sosa, Nandana Mihindukulasooriya, Atul Kumar, Sahil Bansal, Seema Nagar

Abstract: Logical and probabilistic reasoning tasks that require a deeper knowledge of semantics are increasingly relying on general purpose ontologies such as Wikidata and DBpedia. However, tasks such as entity disambiguation and linking may benefit from domain specific knowledge graphs, which make it more efficient to consume the knowledge and easier to extend with proprietary content. We discuss our expe… ▽ More Logical and probabilistic reasoning tasks that require a deeper knowledge of semantics are increasingly relying on general purpose ontologies such as Wikidata and DBpedia. However, tasks such as entity disambiguation and linking may benefit from domain specific knowledge graphs, which make it more efficient to consume the knowledge and easier to extend with proprietary content. We discuss our experience bootstrapping one such ontology for IT with a domain-agnostic pipeline, and extending it using domain-specific glossaries. △ Less

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.18058 [pdf]

A Foundation Model Framework for Multi-View MRI Classification of Extramural Vascular Invasion and Mesorectal Fascia Invasion in Rectal Cancer

Authors: Yumeng Zhang, Zohaib Salahuddin, Danial Khan, Shruti Atul Mali, Henry C. Woodruff, Sina Amirrajab, Eduardo Ibor-Crespo, Ana Jimenez-Pastor, Luis Marti-Bonmati, Philippe Lambin

Abstract: Background: Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is pivotal for risk-stratified management of rectal cancer, yet visual assessment is subjective and vulnerable to inter-institutional variability. Purpose: To develop and externally evaluate a multicenter, foundation-model-driven framework that automatically classifies EVI and M… ▽ More Background: Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is pivotal for risk-stratified management of rectal cancer, yet visual assessment is subjective and vulnerable to inter-institutional variability. Purpose: To develop and externally evaluate a multicenter, foundation-model-driven framework that automatically classifies EVI and MFI on axial and sagittal T2-weighted MRI. Methods: This retrospective study used 331 pre-treatment rectal cancer MRI examinations from three European hospitals. After TotalSegmentator-guided rectal patch extraction, a self-supervised frequency-domain harmonization pipeline was trained to minimize scanner-related contrast shifts. Four classifiers were compared: ResNet50, SeResNet, the universal biomedical pretrained transformer (UMedPT) with a lightweight MLP head, and a logistic-regression variant using frozen UMedPT features (UMedPT_LR). Results: UMedPT_LR achieved the best EVI detection when axial and sagittal features were fused (AUC = 0.82; sensitivity = 0.75; F1 score = 0.73), surpassing the Chaimeleon Grand-Challenge winner (AUC = 0.74). The highest MFI performance was attained by UMedPT on axial harmonized images (AUC = 0.77), surpassing the Chaimeleon Grand-Challenge winner (AUC = 0.75). Frequency-domain harmonization improved MFI classification but variably affected EVI performance. Conventional CNNs (ResNet50, SeResNet) underperformed, especially in F1 score and balanced accuracy. Conclusion: These findings demonstrate that combining foundation model features, harmonization, and multi-view fusion significantly enhances diagnostic performance in rectal MRI. △ Less

Submitted 23 May, 2025; originally announced May 2025.

Comments: 22 pages, 8 figures

arXiv:2505.18019 [pdf, other]

LLM assisted web application functional requirements generation: A case study of four popular LLMs over a Mess Management System

Authors: Rashmi Gupta, Aditya K Gupta, Aarav Jain, Avinash C Pandey, Atul Gupta

Abstract: Like any other discipline, Large Language Models (LLMs) have significantly impacted software engineering by helping developers generate the required artifacts across various phases of software development. This paper presents a case study comparing the performance of popular LLMs GPT, Claude, Gemini, and DeepSeek in generating functional specifications that include use cases, business rules, and c… ▽ More Like any other discipline, Large Language Models (LLMs) have significantly impacted software engineering by helping developers generate the required artifacts across various phases of software development. This paper presents a case study comparing the performance of popular LLMs GPT, Claude, Gemini, and DeepSeek in generating functional specifications that include use cases, business rules, and collaborative workflows for a web application, the Mess Management System. The study evaluated the quality of LLM generated use cases, business rules, and collaborative workflows in terms of their syntactic and semantic correctness, consistency, non ambiguity, and completeness compared to the reference specifications against the zero-shot prompted problem statement. Our results suggested that all four LLMs can specify syntactically and semantically correct, mostly non-ambiguous artifacts. Still, they may be inconsistent at times and may differ significantly in the completeness of the generated specification. Claude and Gemini generated all the reference use cases, with Claude achieving the most complete but somewhat redundant use case specifications. Similar results were obtained for specifying workflows. However, all four LLMs struggled to generate relevant Business Rules, with DeepSeek generating the most reference rules but with less completeness. Overall, Claude generated more complete specification artifacts, while Gemini was more precise in the specifications it generated. △ Less

Submitted 23 May, 2025; originally announced May 2025.

Comments: 11 pages, 12 figures, Accepted in EASE 2025 https://conf.researchr.org/details/ease-2025/ease-2025-ai-models---data/11/LLM-assisted-web-application-functional-requirements-generation-A-case-study-of-fou

arXiv:2505.17971 [pdf]

Explainable Anatomy-Guided AI for Prostate MRI: Foundation Models and In Silico Clinical Trials for Virtual Biopsy-based Risk Assessment

Authors: Danial Khan, Zohaib Salahuddin, Yumeng Zhang, Sheng Kuang, Shruti Atul Mali, Henry C. Woodruff, Sina Amirrajab, Rachel Cavill, Eduardo Ibor-Crespo, Ana Jimenez-Pastor, Adrian Galiana-Bordera, Paula Jimenez Gomez, Luis Marti-Bonmati, Philippe Lambin

Abstract: We present a fully automated, anatomically guided deep learning pipeline for prostate cancer (PCa) risk stratification using routine MRI. The pipeline integrates three key components: an nnU-Net module for segmenting the prostate gland and its zones on axial T2-weighted MRI; a classification module based on the UMedPT Swin Transformer foundation model, fine-tuned on 3D patches with optional anatom… ▽ More We present a fully automated, anatomically guided deep learning pipeline for prostate cancer (PCa) risk stratification using routine MRI. The pipeline integrates three key components: an nnU-Net module for segmenting the prostate gland and its zones on axial T2-weighted MRI; a classification module based on the UMedPT Swin Transformer foundation model, fine-tuned on 3D patches with optional anatomical priors and clinical data; and a VAE-GAN framework for generating counterfactual heatmaps that localize decision-driving image regions. The system was developed using 1,500 PI-CAI cases for segmentation and 617 biparametric MRIs with metadata from the CHAIMELEON challenge for classification (split into 70% training, 10% validation, and 20% testing). Segmentation achieved mean Dice scores of 0.95 (gland), 0.94 (peripheral zone), and 0.92 (transition zone). Incorporating gland priors improved AUC from 0.69 to 0.72, with a three-scale ensemble achieving top performance (AUC = 0.79, composite score = 0.76), outperforming the 2024 CHAIMELEON challenge winners. Counterfactual heatmaps reliably highlighted lesions within segmented regions, enhancing model interpretability. In a prospective multi-center in-silico trial with 20 clinicians, AI assistance increased diagnostic accuracy from 0.72 to 0.77 and Cohen's kappa from 0.43 to 0.53, while reducing review time per case by 40%. These results demonstrate that anatomy-aware foundation models with counterfactual explainability can enable accurate, interpretable, and efficient PCa risk assessment, supporting their potential use as virtual biopsies in clinical practice. △ Less

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.17893 [pdf]

Pixels to Prognosis: Harmonized Multi-Region CT-Radiomics and Foundation-Model Signatures Across Multicentre NSCLC Data

Authors: Shruti Atul Mali, Zohaib Salahuddin, Danial Khan, Yumeng Zhang, Henry C. Woodruff, Eduardo Ibor-Crespo, Ana Jimenez-Pastor, Luis Marti-Bonmati, Philippe Lambin

Abstract: Purpose: To evaluate the impact of harmonization and multi-region CT image feature integration on survival prediction in non-small cell lung cancer (NSCLC) patients, using handcrafted radiomics, pretrained foundation model (FM) features, and clinical data from a multicenter dataset. Methods: We analyzed CT scans and clinical data from 876 NSCLC patients (604 training, 272 test) across five cente… ▽ More Purpose: To evaluate the impact of harmonization and multi-region CT image feature integration on survival prediction in non-small cell lung cancer (NSCLC) patients, using handcrafted radiomics, pretrained foundation model (FM) features, and clinical data from a multicenter dataset. Methods: We analyzed CT scans and clinical data from 876 NSCLC patients (604 training, 272 test) across five centers. Features were extracted from the whole lung, tumor, mediastinal nodes, coronary arteries, and coronary artery calcium (CAC). Handcrafted radiomics and FM deep features were harmonized using ComBat, reconstruction kernel normalization (RKN), and RKN+ComBat. Regularized Cox models predicted overall survival; performance was assessed using the concordance index (C-index), 5-year time-dependent area under the curve (t-AUC), and hazard ratio (HR). SHapley Additive exPlanations (SHAP) values explained feature contributions. A consensus model used agreement across top region of interest (ROI) models to stratify patient risk. Results: TNM staging showed prognostic utility (C-index = 0.67; HR = 2.70; t-AUC = 0.85). The clinical + tumor radiomics model with ComBat achieved a C-index of 0.7552 and t-AUC of 0.8820. FM features (50-voxel cubes) combined with clinical data yielded the highest performance (C-index = 0.7616; t-AUC = 0.8866). An ensemble of all ROIs and FM features reached a C-index of 0.7142 and t-AUC of 0.7885. The consensus model, covering 78% of valid test cases, achieved a t-AUC of 0.92, sensitivity of 97.6%, and specificity of 66.7%. Conclusion: Harmonization and multi-region feature integration improve survival prediction in multicenter NSCLC data. Combining interpretable radiomics, FM features, and consensus modeling enables robust risk stratification across imaging centers. △ Less

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2505.06885 [pdf]

Incremental Analysis of Legacy Applications Using Knowledge Graphs for Application Modernization

Authors: Saravanan Krishnan, Amith Singhee, Keerthi Narayan Raghunath, Alex Mathai, Atul Kumar, David Wenk

Abstract: Industries such as banking, telecom, and airlines - o6en have large so6ware systems that are several decades old. Many of these systems are written in old programming languages such as COBOL, PL/1, Assembler, etc. In many cases, the documentation is not updated, and those who developed/designed these systems are no longer around. Understanding these systems for either modernization or even regular… ▽ More Industries such as banking, telecom, and airlines - o6en have large so6ware systems that are several decades old. Many of these systems are written in old programming languages such as COBOL, PL/1, Assembler, etc. In many cases, the documentation is not updated, and those who developed/designed these systems are no longer around. Understanding these systems for either modernization or even regular maintenance has been a challenge. An extensive application may have natural boundaries based on its code dependencies and architecture. There are also other logical boundaries in an enterprise setting driven by business functions, data domains, etc. Due to these complications, the system architects generally plan their modernization across these logical boundaries in parts, thereby adopting an incremental approach for the modernization journey of the entire system. In this work, we present a so6ware system analysis tool that allows a subject ma=er expert (SME) or system architect to analyze a large so6ware system incrementally. We analyze the source code and other artifacts (such as data schema) to create a knowledge graph using a customizable ontology/schema. Entities and relations in our ontology can be defined for any combination of programming languages and platforms. Using this knowledge graph, the analyst can then define logical boundaries around dependent Entities (e.g. Programs, Transactions, Database Tables etc.). Our tool then presents different views showcasing the dependencies from the newly defined boundary to/from the other logical groups of the system. This exercise is repeated interactively to 1) Identify the Entities and groupings of interest for a modernization task and 2) Understand how a change in one part of the system may affect the other parts. To validate the efficacy of our tool, we provide an initial study of our system on two client applications. △ Less

Submitted 11 May, 2025; originally announced May 2025.

arXiv:2504.20988 [pdf, other]

Hubs and Spokes Learning: Efficient and Scalable Collaborative Machine Learning

Authors: Atul Sharma, Kavindu Herath, Saurabh Bagchi, Chaoyue Liu, Somali Chaterji

Abstract: We introduce the Hubs and Spokes Learning (HSL) framework, a novel paradigm for collaborative machine learning that combines the strengths of Federated Learning (FL) and Decentralized Learning (P2PL). HSL employs a two-tier communication structure that avoids the single point of failure inherent in FL and outperforms the state-of-the-art P2PL framework, Epidemic Learning Local (ELL). At equal comm… ▽ More We introduce the Hubs and Spokes Learning (HSL) framework, a novel paradigm for collaborative machine learning that combines the strengths of Federated Learning (FL) and Decentralized Learning (P2PL). HSL employs a two-tier communication structure that avoids the single point of failure inherent in FL and outperforms the state-of-the-art P2PL framework, Epidemic Learning Local (ELL). At equal communication budgets (total edges), HSL achieves higher performance than ELL, while at significantly lower communication budgets, it can match ELL's performance. For instance, with only 400 edges, HSL reaches the same test accuracy that ELL achieves with 1000 edges for 100 peers (spokes) on CIFAR-10, demonstrating its suitability for resource-constrained systems. HSL also achieves stronger consensus among nodes after mixing, resulting in improved performance with fewer training rounds. We substantiate these claims through rigorous theoretical analyses and extensive experimental results, showcasing HSL's practicality for large-scale collaborative learning. △ Less

Submitted 29 April, 2025; originally announced April 2025.

arXiv:2504.12515 [pdf, other]

Event Quality Score (EQS): Assessing the Realism of Simulated Event Camera Streams via Distances in Latent Space

Authors: Kaustav Chanda, Aayush Atul Verma, Arpitsinh Vaghela, Yezhou Yang, Bharatesh Chakravarthi

Abstract: Event cameras promise a paradigm shift in vision sensing with their low latency, high dynamic range, and asynchronous nature of events. Unfortunately, the scarcity of high-quality labeled datasets hinders their widespread adoption in deep learning-driven computer vision. To mitigate this, several simulators have been proposed to generate synthetic event data for training models for detection and e… ▽ More Event cameras promise a paradigm shift in vision sensing with their low latency, high dynamic range, and asynchronous nature of events. Unfortunately, the scarcity of high-quality labeled datasets hinders their widespread adoption in deep learning-driven computer vision. To mitigate this, several simulators have been proposed to generate synthetic event data for training models for detection and estimation tasks. However, the fundamentally different sensor design of event cameras compared to traditional frame-based cameras poses a challenge for accurate simulation. As a result, most simulated data fail to mimic data captured by real event cameras. Inspired by existing work on using deep features for image comparison, we introduce event quality score (EQS), a quality metric that utilizes activations of the RVT architecture. Through sim-to-real experiments on the DSEC driving dataset, it is shown that a higher EQS implies improved generalization to real-world data after training on simulated events. Thus, optimizing for EQS can lead to developing more realistic event camera simulators, effectively reducing the simulation gap. EQS is available at https://github.com/eventbasedvision/EQS. △ Less

Submitted 20 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

Comments: Accepted at 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW); Fifth International Workshop on Event-Based Vision

arXiv:2504.10548 [pdf]

Automated Testing of COBOL to Java Transformation

Authors: Sandeep Hans, Atul Kumar, Toshikai Yasue, Kouichi Ono, Saravanan Krishnan, Devika Sondhi, Fumiko Satoh, Gerald Mitchell, Sachin Kumar, Diptikalyan Saha

Abstract: Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code, making manual validation of transl… ▽ More Recent advances in Large Language Model (LLM) based Generative AI techniques have made it feasible to translate enterprise-level code from legacy languages such as COBOL to modern languages such as Java or Python. While the results of LLM-based automatic transformation are encouraging, the resulting code cannot be trusted to correctly translate the original code, making manual validation of translated Java code from COBOL a necessary but time-consuming and labor-intensive process. In this paper, we share our experience of developing a testing framework for IBM Watsonx Code Assistant for Z (WCA4Z) [5], an industrial tool designed for COBOL to Java translation. The framework automates the process of testing the functional equivalence of the translated Java code against the original COBOL programs in an industry context. Our framework uses symbolic execution to generate unit tests for COBOL, mocking external calls and transforming them into JUnit tests to validate semantic equivalence with translated Java. The results not only help identify and repair any detected discrepancies but also provide feedback to improve the AI model. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.10335 [pdf]

MorphTok: Morphologically Grounded Tokenization for Indian Languages

Authors: Maharaj Brahma, N J Karthika, Atul Singh, Devaraj Adiga, Smruti Bhate, Ganesh Ramakrishnan, Rohit Saluja, Maunendra Sankar Desarkar

Abstract: Tokenization is a crucial step in NLP, especially with the rise of large language models (LLMs), impacting downstream performance, computational cost, and efficiency. Existing LLMs rely on the classical Byte-pair Encoding (BPE) algorithm for subword tokenization that greedily merges frequent character bigrams. This often leads to segmentation that does not align with linguistically meaningful unit… ▽ More Tokenization is a crucial step in NLP, especially with the rise of large language models (LLMs), impacting downstream performance, computational cost, and efficiency. Existing LLMs rely on the classical Byte-pair Encoding (BPE) algorithm for subword tokenization that greedily merges frequent character bigrams. This often leads to segmentation that does not align with linguistically meaningful units. To address this, we propose morphology-aware segmentation as a pre-tokenization step prior to applying BPE. To facilitate morphology-aware segmentation, we create a novel dataset for Hindi and Marathi, incorporating sandhi splitting to enhance the subword tokenization. Experiments on downstream tasks show that morphologically grounded tokenization improves performance for machine translation and language modeling. Additionally, to handle the ambiguity in the Unicode characters for diacritics, particularly dependent vowels in syllable-based writing systems, we introduce Constrained BPE (CBPE), an extension to the traditional BPE algorithm that incorporates script-specific constraints. Specifically, CBPE handles dependent vowels. Our results show that CBPE achieves a 1.68\% reduction in fertility scores while maintaining comparable or improved downstream performance in machine translation, offering a computationally efficient alternative to standard BPE. Moreover, to evaluate segmentation across different tokenization algorithms, we introduce a new human evaluation metric, \textit{EvalTok}, enabling more human-grounded assessment. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.09877 [pdf]

Constructing Micro Knowledge Graphs from Technical Support Documents

Authors: Atul Kumar, Nisha Gupta, Saswati Dana

Abstract: Short technical support pages such as IBM Technotes are quite common in technical support domain. These pages can be very useful as the knowledge sources for technical support applications such as chatbots, search engines and question-answering (QA) systems. Information extracted from documents to drive technical support applications is often stored in the form of Knowledge Graph (KG). Building KG… ▽ More Short technical support pages such as IBM Technotes are quite common in technical support domain. These pages can be very useful as the knowledge sources for technical support applications such as chatbots, search engines and question-answering (QA) systems. Information extracted from documents to drive technical support applications is often stored in the form of Knowledge Graph (KG). Building KGs from a large corpus of documents poses a challenge of granularity because a large number of entities and actions are present in each page. The KG becomes virtually unusable if all entities and actions from these pages are stored in the KG. Therefore, only key entities and actions from each page are extracted and stored in the KG. This approach however leads to loss of knowledge represented by entities and actions left out of the KG as they are no longer available to graph search and reasoning functions. We propose a set of techniques to create micro knowledge graph (micrograph) for each of such web pages. The micrograph stores all the entities and actions in a page and also takes advantage of the structure of the page to represent exactly in which part of that page these entities and actions appeared, and also how they relate to each other. These micrographs can be used as additional knowledge sources by technical support applications. We define schemas for representing semi-structured and plain text knowledge present in the technical support web pages. Solutions in technical support domain include procedures made of steps. We also propose a technique to extract procedures from these webpages and the schemas to represent them in the micrographs. We also discuss how technical support applications can take advantage of the micrographs. △ Less

Submitted 14 April, 2025; originally announced April 2025.

arXiv:2504.06622 [pdf, other]

Quantum neural networks facilitating quantum state classification

Authors: Diksha Sharma, Vivek Balasaheb Sabale, Thirumalai M., Atul Kumar

Abstract: The classification of quantum states into distinct classes poses a significant challenge. In this study, we address this problem using quantum neural networks in combination with a problem-inspired circuit and customised as well as predefined ansätz. To facilitate the resource-efficient quantum state classification, we construct the dataset of quantum states using the proposed problem-inspired cir… ▽ More The classification of quantum states into distinct classes poses a significant challenge. In this study, we address this problem using quantum neural networks in combination with a problem-inspired circuit and customised as well as predefined ansätz. To facilitate the resource-efficient quantum state classification, we construct the dataset of quantum states using the proposed problem-inspired circuit. The problem-inspired circuit incorporates two-qubit parameterised unitary gates of varying entangling power, which is further integrated with the ansätz, developing an entire quantum neural network. To demonstrate the capability of the selected ansätz, we visualise the mitigated barren plateaus. The designed quantum neural network demonstrates the efficiency in binary and multi-class classification tasks. This work establishes a foundation for the classification of multi-qubit quantum states and offers the potential for generalisation to multi-qubit pure quantum states. △ Less

Submitted 9 April, 2025; originally announced April 2025.

arXiv:2503.14095 [pdf, other]

Towards Location-Specific Precipitation Projections Using Deep Neural Networks

Authors: Bipin Kumar, Bhvisy Kumar Yadav, Soumypdeep Mukhopadhyay, Rakshit Rohan, Bhupendra Bahadur Singh, Rajib Chattopadhyay, Nagraju Chilukoti, Atul Kumar Sahai

Abstract: Accurate precipitation estimates at individual locations are crucial for weather forecasting and spatial analysis. This study presents a paradigm shift by leveraging Deep Neural Networks (DNNs) to surpass traditional methods like Kriging for station-specific precipitation approximation. We propose two innovative NN architectures: one utilizing precipitation, elevation, and location, and another in… ▽ More Accurate precipitation estimates at individual locations are crucial for weather forecasting and spatial analysis. This study presents a paradigm shift by leveraging Deep Neural Networks (DNNs) to surpass traditional methods like Kriging for station-specific precipitation approximation. We propose two innovative NN architectures: one utilizing precipitation, elevation, and location, and another incorporating additional meteorological parameters like humidity, temperature, and wind speed. Trained on a vast dataset (1980-2019), these models outperform Kriging across various evaluation metrics (correlation coefficient, root mean square error, bias, and skill score) on a five-year validation set. This compelling evidence demonstrates the transformative power of deep learning for spatial prediction, offering a robust and precise alternative for station-specific precipitation estimation. △ Less

Submitted 18 March, 2025; originally announced March 2025.

Comments: 21 pages, 9 figures

arXiv:2503.00599 [pdf, other]

Large Engagement Networks for Classifying Coordinated Campaigns and Organic Twitter Trends

Authors: Atul Anand Gopalakrishnan, Jakir Hossain, Tugrulcan Elmas, Ahmet Erdem Sariyuce

Abstract: Social media users and inauthentic accounts, such as bots, may coordinate in promoting their topics. Such topics may give the impression that they are organically popular among the public, even though they are astroturfing campaigns that are centrally managed. It is challenging to predict if a topic is organic or a coordinated campaign due to the lack of reliable ground truth. In this paper, we cr… ▽ More Social media users and inauthentic accounts, such as bots, may coordinate in promoting their topics. Such topics may give the impression that they are organically popular among the public, even though they are astroturfing campaigns that are centrally managed. It is challenging to predict if a topic is organic or a coordinated campaign due to the lack of reliable ground truth. In this paper, we create such ground truth by detecting the campaigns promoted by ephemeral astroturfing attacks. These attacks push any topic to Twitter's (X) trends list by employing bots that tweet in a coordinated manner in a short period and then immediately delete their tweets. We manually curate a dataset of organic Twitter trends. We then create engagement networks out of these datasets which can serve as a challenging testbed for graph classification task to distinguish between campaigns and organic trends. Engagement networks consist of users as nodes and engagements as edges (retweets, replies, and quotes) between users. We release the engagement networks for 179 campaigns and 135 non-campaigns, and also provide finer-grain labels to characterize the type of the campaigns and non-campaigns. Our dataset, LEN (Large Engagement Networks), is available in the URL below. In comparison to traditional graph classification datasets, which are small with tens of nodes and hundreds of edges at most, graphs in LEN are larger. The average graph in LEN has ~11K nodes and ~23K edges. We show that state-of-the-art GNN methods give only mediocre results for campaign vs. non-campaign and campaign type classification on LEN. LEN offers a unique and challenging playfield for the graph classification problem. We believe that LEN will help advance the frontiers of graph classification techniques on large networks and also provide an interesting use case in terms of distinguishing coordinated campaigns and organic trends. △ Less

Submitted 28 March, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

Comments: 14 Pages

Journal ref: ICWSM 2025

arXiv:2502.11306 [pdf, other]

Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation

Authors: Hieu Nguyen, Zihao He, Shoumik Atul Gandre, Ujjwal Pasupulety, Sharanya Kumari Shivakumar, Kristina Lerman

Abstract: Large language models (LLMs) often suffer from hallucination, generating factually incorrect or ungrounded content, which limits their reliability in high-stakes applications. A key factor contributing to hallucination is the use of hard labels during training, which enforce deterministic supervision, encourage overconfidence, and disregard the uncertainty inherent in natural language. To address… ▽ More Large language models (LLMs) often suffer from hallucination, generating factually incorrect or ungrounded content, which limits their reliability in high-stakes applications. A key factor contributing to hallucination is the use of hard labels during training, which enforce deterministic supervision, encourage overconfidence, and disregard the uncertainty inherent in natural language. To address this, we propose mitigating hallucination through knowledge distillation (KD), where a teacher model provides smoothed soft labels to a student model, reducing overconfidence and improving factual grounding. We apply KD during supervised finetuning on instructional data, evaluating its effectiveness across LLMs from different families. Experimental results on summarization benchmarks demonstrate that KD reduces hallucination compared to standard finetuning while preserving performance on general NLP tasks. These findings highlight KD as a promising approach for mitigating hallucination in LLMs and improving model reliability. △ Less

Submitted 16 February, 2025; originally announced February 2025.

arXiv:2502.11287 [pdf, other]

MC-BEVRO: Multi-Camera Bird Eye View Road Occupancy Detection for Traffic Monitoring

Authors: Arpitsinh Vaghela, Duo Lu, Aayush Atul Verma, Bharatesh Chakravarthi, Hua Wei, Yezhou Yang

Abstract: Single camera 3D perception for traffic monitoring faces significant challenges due to occlusion and limited field of view. Moreover, fusing information from multiple cameras at the image feature level is difficult because of different view angles. Further, the necessity for practical implementation and compatibility with existing traffic infrastructure compounds these challenges. To address these… ▽ More Single camera 3D perception for traffic monitoring faces significant challenges due to occlusion and limited field of view. Moreover, fusing information from multiple cameras at the image feature level is difficult because of different view angles. Further, the necessity for practical implementation and compatibility with existing traffic infrastructure compounds these challenges. To address these issues, this paper introduces a novel Bird's-Eye-View road occupancy detection framework that leverages multiple roadside cameras to overcome the aforementioned limitations. To facilitate the framework's development and evaluation, a synthetic dataset featuring diverse scenes and varying camera configurations is generated using the CARLA simulator. A late fusion and three early fusion methods were implemented within the proposed framework, with performance further enhanced by integrating backgrounds. Extensive evaluations were conducted to analyze the impact of multi-camera inputs and varying BEV occupancy map sizes on model performance. Additionally, a real-world data collection pipeline was developed to assess the model's ability to generalize to real-world environments. The sim-to-real capabilities of the model were evaluated using zero-shot and few-shot fine-tuning, demonstrating its potential for practical application. This research aims to advance perception systems in traffic monitoring, contributing to improved traffic management, operational efficiency, and road safety. △ Less

Submitted 16 February, 2025; originally announced February 2025.

arXiv:2502.10953 [pdf, other]

Empirical evaluation of LLMs in predicting fixes of Configuration bugs in Smart Home System

Authors: Sheikh Moonwara Anjum Monisha, Atul Bharadwaj

Abstract: This empirical study evaluates the effectiveness of Large Language Models (LLMs) in predicting fixes for configuration bugs in smart home systems. The research analyzes three prominent LLMs - GPT-4, GPT-4o (GPT-4 Turbo), and Claude 3.5 Sonnet - using four distinct prompt designs to assess their ability to identify appropriate fix strategies and generate correct solutions. The study utilized a data… ▽ More This empirical study evaluates the effectiveness of Large Language Models (LLMs) in predicting fixes for configuration bugs in smart home systems. The research analyzes three prominent LLMs - GPT-4, GPT-4o (GPT-4 Turbo), and Claude 3.5 Sonnet - using four distinct prompt designs to assess their ability to identify appropriate fix strategies and generate correct solutions. The study utilized a dataset of 129 debugging issues from the Home Assistant Community, focusing on 21 randomly selected cases for in-depth analysis. Results demonstrate that GPT-4 and Claude 3.5 Sonnet achieved 80\% accuracy in strategy prediction when provided with both bug descriptions and original scripts. GPT-4 exhibited consistent performance across different prompt types, while GPT-4o showed advantages in speed and cost-effectiveness despite slightly lower accuracy. The findings reveal that prompt design significantly impacts model performance, with comprehensive prompts containing both description and original script yielding the best results. This research provides valuable insights for improving automated bug fixing in smart home system configurations and demonstrates the potential of LLMs in addressing configuration-related challenges. △ Less

Submitted 15 February, 2025; originally announced February 2025.

arXiv:2502.04718 [pdf]

Evaluating Text Style Transfer Evaluation: Are There Any Reliable Metrics?

Authors: Sourabrata Mukherjee, Atul Kr. Ojha, John P. McCrae, Ondrej Dusek

Abstract: Text style transfer (TST) is the task of transforming a text to reflect a particular style while preserving its original content. Evaluating TST outputs is a multidimensional challenge, requiring the assessment of style transfer accuracy, content preservation, and naturalness. Using human evaluation is ideal but costly, as is common in other natural language processing (NLP) tasks, however, automa… ▽ More Text style transfer (TST) is the task of transforming a text to reflect a particular style while preserving its original content. Evaluating TST outputs is a multidimensional challenge, requiring the assessment of style transfer accuracy, content preservation, and naturalness. Using human evaluation is ideal but costly, as is common in other natural language processing (NLP) tasks, however, automatic metrics for TST have not received as much attention as metrics for, e.g., machine translation or summarization. In this paper, we examine both set of existing and novel metrics from broader NLP tasks for TST evaluation, focusing on two popular subtasks, sentiment transfer and detoxification, in a multilingual context comprising English, Hindi, and Bengali. By conducting meta-evaluation through correlation with human judgments, we demonstrate the effectiveness of these metrics when used individually and in ensembles. Additionally, we investigate the potential of large language models (LLMs) as tools for TST evaluation. Our findings highlight newly applied advanced NLP metrics and LLM-based evaluations provide better insights than existing TST metrics. Our oracle ensemble approaches show even more potential. △ Less

Submitted 23 April, 2025; v1 submitted 7 February, 2025; originally announced February 2025.

Comments: Accepted at NAACL SRW 2025

arXiv:2502.03472 [pdf, ps, other]

Powering LLM Regulation through Data: Bridging the Gap from Compute Thresholds to Customer Experiences

Authors: Wesley Pasfield

Abstract: The rapid advancement of Large Language Models (LLMs) has created a critical gap in consumer protection due to the lack of standardized certification processes for LLM-powered Artificial Intelligence (AI) systems. This paper argues that current regulatory approaches, which focus on compute-level thresholds and generalized model evaluations, are insufficient to ensure the safety and effectiveness o… ▽ More The rapid advancement of Large Language Models (LLMs) has created a critical gap in consumer protection due to the lack of standardized certification processes for LLM-powered Artificial Intelligence (AI) systems. This paper argues that current regulatory approaches, which focus on compute-level thresholds and generalized model evaluations, are insufficient to ensure the safety and effectiveness of specific LLM-based user experiences. We propose a shift towards a certification process centered on actual user-facing experiences and the curation of high-quality datasets for evaluation. This approach offers several benefits: it drives consumer confidence in AI system performance, enables businesses to demonstrate the credibility of their products, and allows regulators to focus on direct consumer protection. The paper outlines a potential certification workflow, emphasizing the importance of domain-specific datasets and expert evaluation. By repositioning data as the strategic center of regulatory efforts, this framework aims to address the challenges posed by the probabilistic nature of AI systems and the rapid pace of technological advancement. This shift in regulatory focus has the potential to foster innovation while ensuring responsible AI development, ultimately benefiting consumers, businesses, and government entities alike. △ Less

Submitted 12 January, 2025; originally announced February 2025.

Comments: Presented at the 2nd Workshop on Regulatable ML at NeurIPS 2024

arXiv:2502.03470 [pdf, other]

Responsible Artificial Intelligence (RAI) in U.S. Federal Government : Principles, Policies, and Practices

Authors: Atul Rawal, Katie Johnson, Curtis Mitchell, Michael Walton, Diamond Nwankwo

Abstract: Artificial intelligence (AI) and machine learning (ML) have made tremendous advancements in the past decades. From simple recommendation systems to more complex tumor identification systems, AI/ML systems have been utilized in a plethora of applications. This rapid growth of AI/ML and its proliferation in numerous private and public sector applications, while successful, has also opened new challe… ▽ More Artificial intelligence (AI) and machine learning (ML) have made tremendous advancements in the past decades. From simple recommendation systems to more complex tumor identification systems, AI/ML systems have been utilized in a plethora of applications. This rapid growth of AI/ML and its proliferation in numerous private and public sector applications, while successful, has also opened new challenges and obstacles for regulators. With almost little to no human involvement required for some of the new decision-making AI/ML systems, there is now a pressing need to ensure the responsible use of these systems. Particularly in federal government use-cases, the use of AI technologies must be carefully governed by appropriate transparency and accountability mechanisms. This has given rise to new interdisciplinary fields of AI research such as \textit{Responsible AI (RAI)}. In this position paper we provide a brief overview of development in RAI and discuss some of the motivating principles commonly explored in the field. An overview of the current regulatory landscape relating to AI is also discussed with analysis of different Executive Orders, policies and frameworks. We then present examples of how federal agencies are aiming for the responsible use of AI, specifically we present use-case examples of different projects and research from the Census Bureau on implementing the responsible use of AI. We also provide a brief overview for a Responsible AI Assessment Toolkit currently under-development aimed at helping federal agencies operationalize RAI principles. Finally, a robust discussion on how different policies/regulations map to RAI principles, along with challenges and opportunities for regulation/governance of responsible AI within the federal government is presented. △ Less

Submitted 12 January, 2025; originally announced February 2025.

Comments: Presented at the 2nd Workshop on Regulatable ML at NeurIPS 2024

arXiv:2502.01798 [pdf, other]

doi 10.1145/3696410.3714573

Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale

Authors: Elisa Tsai, Neal Mangaokar, Boyuan Zheng, Haizhong Zheng, Atul Prakash

Abstract: Terms and conditions for online shopping websites often contain terms that can have significant financial consequences for customers. Despite their impact, there is currently no comprehensive understanding of the types and potential risks associated with unfavorable financial terms. Furthermore, there are no publicly available detection systems or datasets to systematically identify or mitigate th… ▽ More Terms and conditions for online shopping websites often contain terms that can have significant financial consequences for customers. Despite their impact, there is currently no comprehensive understanding of the types and potential risks associated with unfavorable financial terms. Furthermore, there are no publicly available detection systems or datasets to systematically identify or mitigate these terms. In this paper, we take the first steps toward solving this problem with three key contributions. \textit{First}, we introduce \textit{TermMiner}, an automated data collection and topic modeling pipeline to understand the landscape of unfavorable financial terms. \textit{Second}, we create \textit{ShopTC-100K}, a dataset of terms and conditions from shopping websites in the Tranco top 100K list, comprising 1.8 million terms from 8,251 websites. Consequently, we develop a taxonomy of 22 types from 4 categories of unfavorable financial terms -- spanning purchase, post-purchase, account termination, and legal aspects. \textit{Third}, we build \textit{TermLens}, an automated detector that uses Large Language Models (LLMs) to identify unfavorable financial terms. Fine-tuned on an annotated dataset, \textit{TermLens} achieves an F1 score of 94.6\% and a false positive rate of 2.3\% using GPT-4o. When applied to shopping websites from the Tranco top 100K, we find that 42.06\% of these sites contain at least one unfavorable financial term, with such terms being more prevalent on less popular websites. Case studies further highlight the financial risks and customer dissatisfaction associated with unfavorable financial terms, as well as the limitations of existing ecosystem defenses. △ Less

Submitted 3 February, 2025; originally announced February 2025.

Comments: This paper has been accepted to The Web Conference 2025 (WWW '25)

ACM Class: H.3.3; K.4.1; K.4.2; I.2.7

arXiv:2501.15030 [pdf, other]

OptiSeq: Ordering Examples On-The-Fly for In-Context Learning

Authors: Rahul Atul Bhope, Praveen Venkateswaran, K. R. Jayaram, Vatche Isahagian, Vinod Muthusamy, Nalini Venkatasubramanian

Abstract: Developers using LLMs and LLM-based agents in their applications have provided plenty of anecdotal evidence that in-context-learning (ICL) is fragile. In this paper, we show that in addition to the quantity and quality of examples, the order in which the in-context examples are listed in the prompt affects the output of the LLM and, consequently, their performance. While prior work has explored im… ▽ More Developers using LLMs and LLM-based agents in their applications have provided plenty of anecdotal evidence that in-context-learning (ICL) is fragile. In this paper, we show that in addition to the quantity and quality of examples, the order in which the in-context examples are listed in the prompt affects the output of the LLM and, consequently, their performance. While prior work has explored improving ICL through dataset-dependent techniques, we introduce OptiSeq, a purely inference-time, dataset-free optimization method that efficiently determines the best example order. OptiSeq leverages log probabilities of LLM-generated outputs to systematically prune the search space of possible orderings and recommend the best order(s) by distinguishing orderings that yield high levels of accuracy and those that underperform. Extensive empirical evaluation on multiple LLMs, datasets, and prompts demonstrate that OptiSeq improves accuracy by 5.5 - 10.5 percentage points across multiple tasks. △ Less

Submitted 18 February, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

arXiv:2501.11978 [pdf, ps, other]

Weight Distribution of the Weighted Coordinates Poset Block Space and Singleton Bound

Authors: Atul Kumar Shriwastva, R. S. Selvaraj

Abstract: In this paper, we determine the complete weight distribution of the space $ \mathbb{F}_q^N $ endowed by the weighted coordinates poset block metric ($(P,w,π)$-metric), also known as the $(P,w,π)$-space, thereby obtaining it for $(P,w)$-space, $(P,π)$-space, $π$-space, and $P$-space as special cases. Further, when $P$ is a chain, the resulting space is called as Niederreiter-Rosenbloom-Tsfasman (NR… ▽ More In this paper, we determine the complete weight distribution of the space $ \mathbb{F}_q^N $ endowed by the weighted coordinates poset block metric ($(P,w,π)$-metric), also known as the $(P,w,π)$-space, thereby obtaining it for $(P,w)$-space, $(P,π)$-space, $π$-space, and $P$-space as special cases. Further, when $P$ is a chain, the resulting space is called as Niederreiter-Rosenbloom-Tsfasman (NRT) weighted block space and when $P$ is hierarchical, the resulting space is called as weighted coordinates hierarchical poset block space. The complete weight distribution of both the spaces are deduced from the main result. Moreover, we define an $I$-ball for an ideal $I$ in $P$ and study the characteristics of it in $(P,w,π)$-space. We investigate the relationship between the $I$-perfect codes and $t$-perfect codes in $(P,w,π)$-space. Given an ideal $I$, we investigate how the maximum distance separability (MDS) is related with $I$-perfect codes and $t$-perfect codes in $(P,w,π)$-space. Duality theorem is derived for an MDS $(P,w,π)$-code when all the blocks are of same length. Finally, the distribution of codewords among $r$-balls is analyzed in the case of chain poset, when all the blocks are of same length. △ Less

Submitted 21 January, 2025; originally announced January 2025.

Comments: 28 pages. arXiv admin note: substantial text overlap with arXiv:2210.12183

MSC Class: 94B05; 15A03; 06A06

arXiv:2501.09359 [pdf, other]

A Multi-tiered Solution for Personalized Baggage Item Recommendations using FastText and Association Rule Mining

Authors: Mudavath Ravi, Atul Negi

Abstract: This paper introduces an intelligent baggage item recommendation system to optimize packing for air travelers by providing tailored suggestions based on specific travel needs and destinations. Using FastText word embeddings and Association Rule Mining (ARM), the system ensures efficient luggage space utilization, compliance with weight limits, and an enhanced travel experience. The methodology com… ▽ More This paper introduces an intelligent baggage item recommendation system to optimize packing for air travelers by providing tailored suggestions based on specific travel needs and destinations. Using FastText word embeddings and Association Rule Mining (ARM), the system ensures efficient luggage space utilization, compliance with weight limits, and an enhanced travel experience. The methodology comprises four phases: (1) data collection and preprocessing with pre-trained FastText embeddings for text representation and similarity scoring (2) a content-based recommendation system enriched by user search history (3) application of ARM to user interactions to uncover meaningful item associations and (4) integration of FastText and ARM for accurate, personalized recommendations. Performance is evaluated using metrics such as coverage, support, confidence, lift, leverage, and conviction. Results demonstrate the system's effectiveness in providing relevant suggestions, improving customer satisfaction, and simplifying the packing process. These insights advance personalized recommendations, targeted marketing, and product optimization in air travel and beyond. △ Less

Submitted 16 January, 2025; originally announced January 2025.

arXiv:2501.01464 [pdf, other]

doi 10.1007/978-3-031-43999-5_13

Estimation of 3T MR images from 1.5T images regularized with Physics based Constraint

Authors: Prabhjot Kaur, Atul Singh Minhas, Chirag Kamal Ahuja, Anil Kumar Sao

Abstract: Limited accessibility to high field MRI scanners (such as 7T, 11T) has motivated the development of post-processing methods to improve low field images. Several existing post-processing methods have shown the feasibility to improve 3T images to produce 7T-like images [3,18]. It has been observed that improving lower field (LF, <=1.5T) images comes with additional challenges due to poor image quali… ▽ More Limited accessibility to high field MRI scanners (such as 7T, 11T) has motivated the development of post-processing methods to improve low field images. Several existing post-processing methods have shown the feasibility to improve 3T images to produce 7T-like images [3,18]. It has been observed that improving lower field (LF, <=1.5T) images comes with additional challenges due to poor image quality such as the function mapping 1.5T and higher field (HF, 3T) images is more complex than the function relating 3T and 7T images [10]. Except for [10], no method has been addressed to improve <=1.5T MRI images. Further, most of the existing methods [3,18] including [10] require example images, and also often rely on pixel to pixel correspondences between LF and HF images which are usually inaccurate for <=1.5T images. The focus of this paper is to address the unsupervised framework for quality improvement of 1.5T images and avoid the expensive requirements of example images and associated image registration. The LF and HF images are assumed to be related by a linear transformation (LT). The unknown HF image and unknown LT are estimated in alternate minimization framework. Further, a physics based constraint is proposed that provides an additional non-linear function relating LF and HF images in order to achieve the desired high contrast in estimated HF image. The experimental results demonstrate that the proposed approach provides processed 1.5T images, i.e., estimated 3T-like images with improved image quality, and is comparably better than the existing methods addressing similar problems. The improvement in image quality is also shown to provide better tissue segmentation and volume quantification as compared to scanner acquired 1.5T images. △ Less

Submitted 31 December, 2024; originally announced January 2025.

Comments: conference paper

Journal ref: Medical Image Computing and Computer Assisted Intervention - MICCAI 2023. Lecture Notes in Computer Science, vol 14229. Springer, Cham

arXiv:2412.16607 [pdf, other]

Improving Discovery of Known Software Vulnerability For Enhanced Cybersecurity

Authors: Devesh Sawant, Manjesh K. Hanawal, Atul Kabra

Abstract: Software vulnerabilities are commonly exploited as attack vectors in cyberattacks. Hence, it is crucial to identify vulnerable software configurations early to apply preventive measures. Effective vulnerability detection relies on identifying software vulnerabilities through standardized identifiers such as Common Platform Enumeration (CPE) strings. However, non-standardized CPE strings issued by… ▽ More Software vulnerabilities are commonly exploited as attack vectors in cyberattacks. Hence, it is crucial to identify vulnerable software configurations early to apply preventive measures. Effective vulnerability detection relies on identifying software vulnerabilities through standardized identifiers such as Common Platform Enumeration (CPE) strings. However, non-standardized CPE strings issued by software vendors create a significant challenge. Inconsistent formats, naming conventions, and versioning practices lead to mismatches when querying databases like the National Vulnerability Database (NVD), hindering accurate vulnerability detection. Failure to properly identify and prioritize vulnerable software complicates the patching process and causes delays in updating the vulnerable software, thereby giving attackers a window of opportunity. To address this, we present a method to enhance CPE string consistency by implementing a multi-layered sanitization process combined with a fuzzy matching algorithm on data collected using Osquery. Our method includes a union query with priority weighting, which assigns relevance to various attribute combinations, followed by a fuzzy matching process with threshold-based similarity scoring, yielding higher confidence in accurate matches. Comparative analysis with open-source tools such as FleetDM demonstrates that our approach improves detection accuracy by 40%. △ Less

Submitted 21 December, 2024; originally announced December 2024.

arXiv:2412.05026 [pdf, ps, other]

Quantum Security Analysis of the Key-Alternating Ciphers

Authors: Chen Bai, Mehdi Esmaili, Atul Mantri

Abstract: In this work, we study the quantum security of key-alternating ciphers (KAC), a natural multi-round generalization of the Even-Mansour (EM) cipher underlying many block cipher constructions, including AES. While the classical security of KAC and the quantum security of the $1$-round KAC (i.e. Even-Mansour) cipher are well understood, the quantum resistance of multi-round KAC remains largely unexpl… ▽ More In this work, we study the quantum security of key-alternating ciphers (KAC), a natural multi-round generalization of the Even-Mansour (EM) cipher underlying many block cipher constructions, including AES. While the classical security of KAC and the quantum security of the $1$-round KAC (i.e. Even-Mansour) cipher are well understood, the quantum resistance of multi-round KAC remains largely unexplored. We focus on the $2$-round KAC construction, defined using public $n$-bit permutations $P_1$, $P_2$ and keys $k_0$, $k_1$, and $k_2$ as $E(x) = P_2(P_1(x \oplus k_0) \oplus k_1) \oplus k_2$. Our main contributions are as follows: 1. Quantum Lower Bounds. We provide the first formal analysis showing that a $2$-round KAC is quantum-secure in both the $Q1$ and $Q2$ models. Specifically, in the $Q1$ model, a (non-adaptive) adversary must make at least $2^{2n/5}$ quantum queries to the public permutations and at least $2^{2n/5}$ classical queries to the cipher in order to distinguish it from a random permutation (in contrast to the classical lower bound of $2^{2n/3}$ queries). As a corollary, we show that in the $Q2$ model, a (non-adaptive) adversary requires $2^{n/4}$ quantum queries. To achieve such a result, we employ the quantum hybrid method along with recently proposed lifting theorems in the ideal cipher and random permutation oracle model. 2. Quantum Key-Recovery Attack. We give the first nontrivial quantum key-recovery attack on multi-round KAC in the $Q1$ model. Our quantum attack applies to any $t$-round KAC and achieves quantum query complexity $O(2^{αn})$, where $α= \frac{t(t+1)}{(t+1)^2 + 1}$, improving over the best known classical bound of $O(2^{α' n})$, where $α' = \frac{t}{t+1}$, from Bogdanov et al. (EUROCRYPT 2012). The attack leverages a novel application of quantum walk algorithms specifically adapted to the KAC structure. △ Less

Submitted 23 May, 2025; v1 submitted 6 December, 2024; originally announced December 2024.

Comments: (v2) Added new lower bound results for 2-KAC in the Q1 and Q2 models. Improved presentation throughout

arXiv:2412.02052 [pdf, other]

FoveaSPAD: Exploiting Depth Priors for Adaptive and Efficient Single-Photon 3D Imaging

Authors: Justin Folden, Atul Ingle, Sanjeev J. Koppal

Abstract: Fast, efficient, and accurate depth-sensing is important for safety-critical applications such as autonomous vehicles. Direct time-of-flight LiDAR has the potential to fulfill these demands, thanks to its ability to provide high-precision depth measurements at long standoff distances. While conventional LiDAR relies on avalanche photodiodes (APDs), single-photon avalanche diodes (SPADs) are an eme… ▽ More Fast, efficient, and accurate depth-sensing is important for safety-critical applications such as autonomous vehicles. Direct time-of-flight LiDAR has the potential to fulfill these demands, thanks to its ability to provide high-precision depth measurements at long standoff distances. While conventional LiDAR relies on avalanche photodiodes (APDs), single-photon avalanche diodes (SPADs) are an emerging image-sensing technology that offer many advantages such as extreme sensitivity and time resolution. In this paper, we remove the key challenges to widespread adoption of SPAD-based LiDARs: their susceptibility to ambient light and the large amount of raw photon data that must be processed to obtain in-pixel depth estimates. We propose new algorithms and sensing policies that improve signal-to-noise ratio (SNR) and increase computing and memory efficiency for SPAD-based LiDARs. During capture, we use external signals to \emph{foveate}, i.e., guide how the SPAD system estimates scene depths. This foveated approach allows our method to ``zoom into'' the signal of interest, reducing the amount of raw photon data that needs to be stored and transferred from the SPAD sensor, while also improving resilience to ambient light. We show results both in simulation and also with real hardware emulation, with specific implementations achieving a 1548-fold reduction in memory usage, and our algorithms can be applied to newly available and future SPAD arrays. △ Less

Submitted 2 December, 2024; originally announced December 2024.

arXiv:2411.14611 [pdf, other]

CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs

Authors: Alex Mathai, Kranthi Sedamaki, Debeshee Das, Noble Saji Mathews, Srikanth Tamilselvam, Sridhar Chimalakonda, Atul Kumar

Abstract: Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspir… ▽ More Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspired by natural language processing (NLP), have shown remarkable success in SE tasks. However, source code contains structural and semantic properties embedded within its grammar, which can be extracted from structured code-views like the Abstract Syntax Tree (AST), Data-Flow Graph (DFG), and Control-Flow Graph (CFG). These code-views can complement NLP techniques, further improving SE tasks. Unfortunately, there are no flexible frameworks to infuse arbitrary code-views into existing transformer-based models effectively. Therefore, in this work, we propose CodeSAM, a novel scalable framework to infuse multiple code-views into transformer-based models by creating self-attention masks. We use CodeSAM to fine-tune a small language model (SLM) like CodeBERT on the downstream SE tasks of semantic code search, code clone detection, and program classification. Experimental results show that by using this technique, we improve downstream performance when compared to SLMs like GraphCodeBERT and CodeBERT on all three tasks by utilizing individual code-views or a combination of code-views during fine-tuning. We believe that these results are indicative that techniques like CodeSAM can help create compact yet performant code SLMs that fit in resource constrained settings. △ Less

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2411.10817 [pdf, other]

Conformation Generation using Transformer Flows

Authors: Sohil Atul Shah, Vladlen Koltun

Abstract: Estimating three-dimensional conformations of a molecular graph allows insight into the molecule's biological and chemical functions. Fast generation of valid conformations is thus central to molecular modeling. Recent advances in graph-based deep networks have accelerated conformation generation from hours to seconds. However, current network architectures do not scale well to large molecules. He… ▽ More Estimating three-dimensional conformations of a molecular graph allows insight into the molecule's biological and chemical functions. Fast generation of valid conformations is thus central to molecular modeling. Recent advances in graph-based deep networks have accelerated conformation generation from hours to seconds. However, current network architectures do not scale well to large molecules. Here we present ConfFlow, a flow-based model for conformation generation based on transformer networks. In contrast with existing approaches, ConfFlow directly samples in the coordinate space without enforcing any explicit physical constraints. The generative procedure is highly interpretable and is akin to force field updates in molecular dynamics simulation. When applied to the generation of large molecule conformations, ConfFlow improve accuracy by up to $40\%$ relative to state-of-the-art learning-based methods. The source code is made available at https://github.com/IntelLabs/ConfFlow. △ Less

Submitted 16 February, 2025; v1 submitted 16 November, 2024; originally announced November 2024.

Comments: This paper was completed in December 2022. Code available at https://github.com/IntelLabs/ConfFlow

arXiv:2411.08400 [pdf, other]

BAMAX: Backtrack Assisted Multi-Agent Exploration using Reinforcement Learning

Authors: Geetansh Kalra, Amit Patel, Atul Chaudhari, Divye Singh

Abstract: Autonomous robots collaboratively exploring an unknown environment is still an open problem. The problem has its roots in coordination among non-stationary agents, each with only a partial view of information. The problem is compounded when the multiple robots must completely explore the environment. In this paper, we introduce Backtrack Assisted Multi-Agent Exploration using Reinforcement Learnin… ▽ More Autonomous robots collaboratively exploring an unknown environment is still an open problem. The problem has its roots in coordination among non-stationary agents, each with only a partial view of information. The problem is compounded when the multiple robots must completely explore the environment. In this paper, we introduce Backtrack Assisted Multi-Agent Exploration using Reinforcement Learning (BAMAX), a method for collaborative exploration in multi-agent systems which attempts to explore an entire virtual environment. As in the name, BAMAX leverages backtrack assistance to enhance the performance of agents in exploration tasks. To evaluate BAMAX against traditional approaches, we present the results of experiments conducted across multiple hexagonal shaped grids sizes, ranging from 10x10 to 60x60. The results demonstrate that BAMAX outperforms other methods in terms of faster coverage and less backtracking across these environments. △ Less

Submitted 13 November, 2024; originally announced November 2024.

arXiv:2411.05088 [pdf]

Findings of the IWSLT 2024 Evaluation Campaign

Authors: Ibrahim Said Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha , et al. (20 additional authors not shown)

Abstract: This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in… ▽ More This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia. △ Less

Submitted 7 November, 2024; originally announced November 2024.

Comments: IWSLT 2024; 59 pages

arXiv:2411.02993 [pdf]

Empowering Library Users: Creative Strategies for Engagement and Innovation

Authors: Snehasish Paul, Shivali Chauhan, Atul Kumar Pal

Abstract: This study investigated the integration of cutting-edge technologies and methodologies for creating dynamic, user-centered library environments. In creative strategies for engagement and innovation, library users must be empowered to undertake the new role of modernizing library services and enhancing user experiences. It also enhances the information management and user engagement. This can be at… ▽ More This study investigated the integration of cutting-edge technologies and methodologies for creating dynamic, user-centered library environments. In creative strategies for engagement and innovation, library users must be empowered to undertake the new role of modernizing library services and enhancing user experiences. It also enhances the information management and user engagement. This can be attained from personalized approaches, such as recommendation systems to interactive platforms that will have effective experiences tailored to users of different natures. It investigates the consumer engagement practices of enthusiasm, sharing, and learning about their roles in cognitive, affective, and behavioural engagements. Combined, these new approaches will help promote learning, interaction, and growth, add value, and have a more positive impact on users. The challenge for libraries in this rapidly changing, technologically advancing, and digitally networked world, with a base of expectant users, is to remain relevant and engaging. This study discusses innovative strategies for empowering library users and enhancing their engagement through creative and technological approaches. This investigation was conducted to integrate cutting-edge technologies and methodologies into creating dynamic library settings that are user-centered and foster learning, interaction, and personal growth. △ Less

Submitted 5 November, 2024; originally announced November 2024.

arXiv:2410.21119 [pdf, ps, other]

doi 10.1145/3711896.3736825

A Unified Solution to Diverse Heterogeneities in One-shot Federated Learning

Authors: Jun Bai, Yiliao Song, Di Wu, Atul Sajjanhar, Yong Xiang, Wei Zhou, Xiaohui Tao, Yan Li, Yue Li

Abstract: One-Shot Federated Learning (OSFL) restricts communication between the server and clients to a single round, significantly reducing communication costs and minimizing privacy leakage risks compared to traditional Federated Learning (FL), which requires multiple rounds of communication. However, existing OSFL frameworks remain vulnerable to distributional heterogeneity, as they primarily focus on m… ▽ More One-Shot Federated Learning (OSFL) restricts communication between the server and clients to a single round, significantly reducing communication costs and minimizing privacy leakage risks compared to traditional Federated Learning (FL), which requires multiple rounds of communication. However, existing OSFL frameworks remain vulnerable to distributional heterogeneity, as they primarily focus on model heterogeneity while neglecting data heterogeneity. To bridge this gap, we propose FedHydra, a unified, data-free, OSFL framework designed to effectively address both model and data heterogeneity. Unlike existing OSFL approaches, FedHydra introduces a novel two-stage learning mechanism. Specifically, it incorporates model stratification and heterogeneity-aware stratified aggregation to mitigate the challenges posed by both model and data heterogeneity. By this design, the data and model heterogeneity issues are simultaneously monitored from different aspects during learning. Consequently, FedHydra can effectively mitigate both issues by minimizing their inherent conflicts. We compared FedHydra with five SOTA baselines on four benchmark datasets. Experimental results show that our method outperforms the previous OSFL methods in both homogeneous and heterogeneous settings. The code is available at https://github.com/Jun-B0518/FedHydra. △ Less

Submitted 1 June, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

Comments: Accepted version to KDD 2025

arXiv:2410.08938 [pdf, other]

KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors

Authors: Benson Chen, Tomasz Danel, Patrick J. McEnaney, Nikhil Jain, Kirill Novikov, Spurti Umesh Akki, Joshua L. Turnbull, Virja Atul Pandya, Boris P. Belotserkovskii, Jared Bryce Weaver, Ankita Biswas, Dat Nguyen, Gabriel H. S. Dreiman, Mohammad Sultan, Nathaniel Stanley, Daniel M Whalen, Divya Kanichar, Christoph Klein, Emily Fox, R. Edward Watts

Abstract: DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To… ▽ More DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at: https://github.com/insitro/kindel. △ Less

Submitted 11 October, 2024; originally announced October 2024.

arXiv:2409.18337 [pdf, other]

Photon Inhibition for Energy-Efficient Single-Photon Imaging

Authors: Lucas J. Koerner, Shantanu Gupta, Atul Ingle, Mohit Gupta

Abstract: Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity i… ▽ More Single-photon cameras (SPCs) are emerging as sensors of choice for various challenging imaging applications. One class of SPCs based on the single-photon avalanche diode (SPAD) detects individual photons using an avalanche process; the raw photon data can then be processed to extract scene information under extremely low light, high dynamic range, and rapid motion. Yet, single-photon sensitivity in SPADs comes at a cost -- each photon detection consumes more energy than that of a CMOS camera. This avalanche power significantly limits sensor resolution and could restrict widespread adoption of SPAD-based SPCs. We propose a computational-imaging approach called \emph{photon inhibition} to address this challenge. Photon inhibition strategically allocates detections in space and time based on downstream inference task goals and resource constraints. We develop lightweight, on-sensor computational inhibition policies that use past photon data to disable SPAD pixels in real-time, to select the most informative future photons. As case studies, we design policies tailored for image reconstruction and edge detection, and demonstrate, both via simulations and real SPC captured data, considerable reduction in photon detections (over 90\% of photons) while maintaining task performance metrics. Our work raises the question of ``which photons should be detected?'', and paves the way for future energy-efficient single-photon imaging. △ Less

Submitted 26 September, 2024; originally announced September 2024.

Comments: Accepted for ECCV 2024. Supplementary material and code available at https://wisionlab.com/project/inhibition

arXiv:2409.17460 [pdf, other]

Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study

Authors: Qi Liu, Atul Singh, Jingbo Liu, Cun Mu, Zheng Yan

Abstract: Training Learning-to-Rank models for e-commerce product search ranking can be challenging due to the lack of a gold standard of ranking relevance. In this paper, we decompose ranking relevance into content-based and engagement-based aspects, and we propose to leverage Large Language Models (LLMs) for both label and feature generation in model training, primarily aiming to improve the model's predi… ▽ More Training Learning-to-Rank models for e-commerce product search ranking can be challenging due to the lack of a gold standard of ranking relevance. In this paper, we decompose ranking relevance into content-based and engagement-based aspects, and we propose to leverage Large Language Models (LLMs) for both label and feature generation in model training, primarily aiming to improve the model's predictive capability for content-based relevance. Additionally, we introduce different sigmoid transformations on the LLM outputs to polarize relevance scores in labeling, enhancing the model's ability to balance content-based and engagement-based relevances and thus prioritize highly relevant items overall. Comprehensive online tests and offline evaluations are also conducted for the proposed design. Our work sheds light on advanced strategies for integrating LLMs into e-commerce product search ranking model training, offering a pathway to more effective and balanced models with improved ranking relevance. △ Less

Submitted 25 September, 2024; originally announced September 2024.

Comments: To be published in CIKM 2024 GenAIECommerce Workshop

arXiv:2409.17456 [pdf, other]

Long or Short or Both? An Exploration on Lookback Time Windows of Behavioral Features in Product Search Ranking

Authors: Qi Liu, Atul Singh, Jingbo Liu, Cun Mu, Zheng Yan, Jan Pedersen

Abstract: Customer shopping behavioral features are core to product search ranking models in eCommerce. In this paper, we investigate the effect of lookback time windows when aggregating these features at the (query, product) level over history. By studying the pros and cons of using long and short time windows, we propose a novel approach to integrating these historical behavioral features of different tim… ▽ More Customer shopping behavioral features are core to product search ranking models in eCommerce. In this paper, we investigate the effect of lookback time windows when aggregating these features at the (query, product) level over history. By studying the pros and cons of using long and short time windows, we propose a novel approach to integrating these historical behavioral features of different time windows. In particular, we address the criticality of using query-level vertical signals in ranking models to effectively aggregate all information from different behavioral features. Anecdotal evidence for the proposed approach is also provided using live product search traffic on Walmart.com. △ Less

Submitted 25 September, 2024; originally announced September 2024.

Comments: Published in ACM SIGIR Workshop on eCommerce 2024

arXiv:2409.17176 [pdf]

XDC Gasless Subnet: Gasless Subnet Staking dApp for XDC Network

Authors: Mohuya Chakraborty, Atul Khekade

Abstract: With a delegated proof-of-stake (XDPoS) consensus mechanism, the XDC Network is an enterprise-focused blockchain platform that combines the strength of public and private blockchains to provide quick transaction times, low energy consumption, and economical gas fees. XDC is designed for interoperability and supports decentralized apps (dApps) and integrates smoothly with financial systems. It is p… ▽ More With a delegated proof-of-stake (XDPoS) consensus mechanism, the XDC Network is an enterprise-focused blockchain platform that combines the strength of public and private blockchains to provide quick transaction times, low energy consumption, and economical gas fees. XDC is designed for interoperability and supports decentralized apps (dApps) and integrates smoothly with financial systems. It is perfect for trade financing and tokenisation of physical assets because of its emphasis on security and scalability. However, there are a few critical issues that hamper wider acceptance and usability for certain high-frequency applications. This whitepaper introduces a novel and enthralling dApp for establishing a gasless subnet in which mainnet XDC can be staked to spin off a subnet that functions similarly to a non-crypto network, accepting currency fees on the XDC network. This would allow users to stake their tokens without incurring gas fees making the staking process more efficient, cost-effective, and simultaneously enhancing scalability. Performance evaluation of the dApp shows promising results in terms of throughput, latency, scalability, security, and cost efficiency. The use cases and applications of this approach along with challenges and ensuing solutions are included. △ Less

Submitted 21 September, 2024; originally announced September 2024.

Comments: 24 pages, 5 figures, 3 tables, 16 references

Report number: Sep 2024

arXiv:2409.13083 [pdf, other]

FedAT: Federated Adversarial Training for Distributed Insider Threat Detection

Authors: R G Gayathri, Atul Sajjanhar, Md Palash Uddin, Yong Xiang

Abstract: Insider threats usually occur from within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions the entities take on the resources to which they have access rights allows us to identify the insiders. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches gained attention in the last few years. However, most techniques… ▽ More Insider threats usually occur from within the workplace, where the attacker is an entity closely associated with the organization. The sequence of actions the entities take on the resources to which they have access rights allows us to identify the insiders. Insider Threat Detection (ITD) using Machine Learning (ML)-based approaches gained attention in the last few years. However, most techniques employed centralized ML methods to perform such an ITD. Organizations operating from multiple locations cannot contribute to the centralized models as the data is generated from various locations. In particular, the user behavior data, which is the primary source of ITD, cannot be shared among the locations due to privacy concerns. Additionally, the data distributed across various locations result in extreme class imbalance due to the rarity of attacks. Federated Learning (FL), a distributed data modeling paradigm, gained much interest recently. However, FL-enabled ITD is not yet explored, and it still needs research to study the significant issues of its implementation in practical settings. As such, our work investigates an FL-enabled multiclass ITD paradigm that considers non-Independent and Identically Distributed (non-IID) data distribution to detect insider threats from different locations (clients) of an organization. Specifically, we propose a Federated Adversarial Training (FedAT) approach using a generative model to alleviate the extreme data skewness arising from the non-IID data distribution among the clients. Besides, we propose to utilize a Self-normalized Neural Network-based Multi-Layer Perceptron (SNN-MLP) model to improve ITD. We perform comprehensive experiments and compare the results with the benchmarks to manifest the enhanced performance of the proposed FedATdriven ITD scheme. △ Less

Submitted 19 September, 2024; originally announced September 2024.

Comments: 10 pages, 7 figures

Showing 1–50 of 273 results for author: Atul