Search | arXiv e-print repository

Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

Authors: Maria-Florina Balcan, Saumya Goyal, Dravyansh Sharma

Abstract: Modern regression problems often involve high-dimensional data and a careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced direction of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error for the validation loss when tuning the L1 and L2 coefficients, including ridge, lasso and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with feature dimension, d. While these bounds are shown to be tight for worst-case distributions, our bounds improve with the "niceness" of the data distribution. Concretely, we show that under additional assumptions that instances within each task are i.i.d. draws from broad well-studied classes of distributions including sub-Gaussians, our generalization bounds do not get worse with increasing d, and are much sharper than prior work for very large d. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground truth distribution.

Submitted 7 July, 2025; originally announced July 2025.

Comments: 49 pages

arXiv:2507.04896 [pdf, ps, other]

Cross sections of $η$ mesons in $p$$+$$p$ collisions at forward rapidity at $\sqrt{s}=500$ GeV and central rapidity at $\sqrt{s}=510$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, M. Alfred, D. Anderson, K. R. Andrews, A. Angerami, S. Antsupov, K. Aoki, N. Apadula, E. Appelt, Y. Aramaki, R. Armendariz, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun , et al. (476 additional authors not shown)

Abstract: We present the first measurements of the forward and midrapidity $η$-meson cross sections from $p$$+$$p$ collisions at $\sqrt{s}=500$ and $510$~GeV, respectively. We also report the midrapidity $η/π^0$ ratio at 510 GeV. The forward cross section is measured differentially in $η$-meson transverse momentum ($p_T$) from 1.0 to 6.5~GeV/$c$ for pseudorapidity $3.0<|η|<3.8$. The midrapidity cross section is measured from 3.5 to 44 GeV/$c$ for pseudorapidity $|η|<0.35$. Both cross sections serve as critical inputs to an updated global analysis of the $η$-meson fragmentation functions.

Submitted 7 July, 2025; originally announced July 2025.

Comments: 500 authors from 81 institutions, 14 pages, 7 figures, 3 tables. v1 is version submitted to Physical Review D. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2507.04463 [pdf, ps, other]

Low-mass vector-meson production at forward rapidity in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, D. Anderson, V. Andrieux, S. Antsupov, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, E. Bannikov, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont , et al. (331 additional authors not shown)

Abstract: The PHENIX experiment at the Relativistic Heavy Ion Collider has measured low-mass vector-meson ($ω+ρ$ and $φ$) production through the dimuon decay channel at forward rapidity $(1.2<|\mbox{y}|<2.2)$ in $p$$+$$p$ and Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. The low-mass vector-meson yield and nuclear-modification factor were measured as a function of the average number of participating nucleons, $\langle N_{\rm part}\rangle$, and the transverse momentum $p_T$. These results were compared with those obtained via the kaon decay channel in a similar $p_T$ range at midrapidity. The nuclear-modification factors in both rapidity regions are consistent within the uncertainties. A comparison of the $ω+ρ$ and $J/ψ$ mesons reveals that the light and heavy flavors are consistently suppressed across both $p_T$ and ${\langle}N_{\rm part}\rangle$. In contrast, the $φ$ meson displays a nuclear-modification factor consistent with unity, suggesting strangeness enhancement in the medium formed.

Submitted 6 July, 2025; originally announced July 2025.

Comments: 356 authors from 71 institutions, 14 pages, 14 figures, 1 table. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2506.22120 [pdf, ps, other]

Walking Through Complex Spatial Patterns of Climate and Conflict-Induced Displacements

Authors: David Carranza, Devansh Sharma, Francisco Malveiro, Gustavo Kohlrausch, Jisha Mariyam John, Kaloyan Danovski, Malvina Bozhidarova, Rui Zheng, Sandro Sousa

Abstract: Extreme weather events are projected to intensify global migration, increase resource competition, and amplify socio-spatial phenomena, including intergroup conflicts, socioeconomic inequalities, and unplanned displacements, among others. Addressing these challenges requires consolidating heterogeneous data to identify, estimate, and predict the dynamical process behind climate-induced movements. We propose a novel hybrid approach to reconstruct hazard-induced displacements by analysing the statistical properties of a diffusion process (walks) that explores the spatial network constructed from real displacements. The likely trajectories produced by the walks inform the typical journey of individuals, identifying potential hazards that may be encountered when fleeing high-risk areas. As a proof of concept, we apply this method to Somalia's detailed displacement tracking matrix, containing 20,220 movements dating from February 8 to June 18, 2025. We reconstruct the likely routes that displaced persons could have taken when fleeing areas affected by conflict or climate hazards. We find that individuals using the most likely paths based on current flows would experience mainly droughts and conflicts, while the latter becomes less prominent at every subsequent step. We also find that the probability of conflict and drought across all trajectories is widely dispersed, meaning that there is no typical exposure.
This work provides an understanding of the mechanisms underlying displacement patterns and a framework for estimating future movements in areas expected to face increasing hazards.

Submitted 4 July, 2025; v1 submitted 27 June, 2025; originally announced June 2025.

Comments: Preprint from Complexity 72H Workshop, held at Carlos III University of Madrid, Spain, 23-27 June 2025

arXiv:2506.10615 [pdf, ps, other]

Strongly correlated topological surface states in type-II Dirac semimetal NiTe$_{2}$

Authors: Neeraj Bhatt, Asif Ali, Deepali Sharma, Sakshi Bansal, Manasi Mandal, Ravi Prakash Singh, Ravi Shankar Singh

Abstract: Nontrivial topology in type-II Dirac semimetal NiTe$_2$, leading to topologically protected surface states, gives rise to fascinating phenomena holding great promise for next-generation electronic and spintronic devices. Key parameters $-$ such as lattice parameter, disorder, vacancies, and electron correlation $-$ significantly influence the electronic structure and, subsequently, the physical properties. To resolve the discrepancy between the theoretical description and experimentally observed topological surface states, we comprehensively investigate the electronic structure of NiTe$_2$ using angle-resolved photoemission spectroscopy and density functional theory. Although the bulk electronic structure is found to be well-described within mean field approaches, an accurate description of topological surface states is obtained only by incorporating surface electronic correlation. We reveal that the strongly correlated surface states forming a Dirac-like conical crossing well below the Fermi level have hybridized Ni 3$d$ and Te 5$p$ character. These findings underscore the intricate interplay between electron correlation and band topology, broadening our understanding of many-body correlation effects on the topological surface states in quantum materials.

Submitted 12 June, 2025; originally announced June 2025.

Comments: to appear in Phys. Rev. B

arXiv:2506.05252 [pdf, ps, other]

Conservative classifiers do consistently well with improving agents: characterizing statistical and online learning

Authors: Dravyansh Sharma, Alec Sun

Abstract: Machine learning is now ubiquitous in societal decision-making, for example in evaluating job candidates or loan applications, and it is increasingly important to take into account how classified agents will react to the learning algorithms. The majority of recent literature on strategic classification has focused on reducing and countering deceptive behaviors by the classified agents, but recent work of Attias et al. identifies surprising properties of learnability when the agents genuinely improve in order to attain the desirable classification, such as smaller generalization error than standard PAC-learning. In this paper we characterize so-called learnability with improvements across multiple new axes. We introduce an asymmetric variant of minimally consistent concept classes and use it to provide an exact characterization of proper learning with improvements in the realizable setting. While prior work studies learnability only under general, arbitrary agent improvement regions, we give positive results for more natural Euclidean ball improvement sets. In particular, we characterize improper learning under a mild generative assumption on the data distribution. We further show how to learn in more challenging settings, achieving lower generalization error under well-studied bounded noise models and obtaining mistake bounds in realizable and agnostic online learning. We resolve open questions posed by Attias et al. for both proper and improper learning.

Submitted 5 June, 2025; originally announced June 2025.

Comments: 24 pages

arXiv:2506.00357 [pdf]

Dynamic Control of Momentum-Polarization Photoluminescence States with Liquid-Crystal-tuned Nanocavities

Authors: Chengkun Dong, Matthew R. Chua, Rasna Maruthiyodan Veetil, T. Thu Ha Do, Lu Ding, Deepak K. Sharma, Jun Xia, Ramón Paniagua-Domínguez

Abstract: Dynamic control of light, and in particular beam steering, is pivotal in various optical applications, including telecommunications, LiDAR, and biomedical imaging. Traditional approaches achieve this by interfacing a tunable modulating device with an external light source, facing challenges in achieving compact devices. Here, we introduce a dynamic photoluminescence (PL) modulating device, with which the properties of light directly emitted by a quasi-two-dimensional perovskite (in particular its directionality and polarization) can be modified continuously and over a large range. The device is based on a liquid-crystal-tunable Fabry-Perot (FP) nanocavity and uses the FP energy-momentum dispersion and spin-orbit coupling between the excitons and the cavity modes to enable this dynamic control over the emitted radiation. With this device, we achieve electrically-controlled, continuous and variable emission angles up to a maximum of 28°, as well as manipulation of the PL polarization state, enabling both the creation of polarization gradients and the achievement of polarization conversion at specific emission angles. Moreover, due to its resonant character, a 3-fold increase in the emission intensity is observed, as confirmed through time-resolved photoluminescence (TRPL) measurements. Our approach leverages the unique properties of actively tunable birefringent nanocavities to improve emission directivity, angle tunability and polarization control, presenting a promising solution for next-generation, deeply integrated beam steering devices.

Submitted 30 May, 2025; originally announced June 2025.

arXiv:2505.22650 [pdf, ps, other]

On Learning Verifiers for Chain-of-Thought Reasoning

Authors: Maria-Florina Balcan, Avrim Blum, Zhiyuan Li, Dravyansh Sharma

Abstract: Chain-of-Thought reasoning has emerged as a powerful approach for solving complex mathematical and logical problems. However, it can often veer off track through incorrect or unsubstantiated inferences. Formal mathematical reasoning, which can be checked with a formal verifier, is one approach to addressing this issue. However, currently LLMs are simply not good enough to solve complex problems in a formal way, and even just formalizing an informal problem statement can be challenging. Motivated by this fact, in this work we consider the problem of learning reliable verifiers for natural language Chain-of-Thought reasoning. That is, given a problem statement and step-by-step solution in natural language, the aim of the verifier is to output [Yes] if the reasoning steps in the solution are all valid, and [No] otherwise. In this work we give a formal PAC-learning framework for studying this problem. We propose and analyze several natural verification goals, at different levels of strength, in this framework. We provide sample complexity upper-bounds for learning verifiers satisfying these goals, as well as lower-bound and impossibility results for learning other natural verification objectives without additional assumptions.

Submitted 28 May, 2025; originally announced May 2025.

arXiv:2505.16465 [pdf, ps, other]

Explicit and Mixed Estimates for Thue inequalities with few coefficients

Authors: N. Saradha, Divyum Sharma

Abstract: Let $F(x,y)$ be an irreducible form of degree $r\geq 3$ and having $s+1$ non-zero coefficients. Let $h\geq 1$ be an integer and consider the Thue inequality $$|F(x,y)|\leq h.$$ Following the seminal work of Thue in 1909, several papers were written giving an upper bound for the number of solutions of the above inequality as $\ll c(r,s,h)$ where $c(r,s,h)$ is an explicit function of $r,s$ and $h.$ Invariably, the absolute constant involved in $\ll$ has been left undetermined. In this paper, following Bombieri, Schmidt and Mueller, we give three different upper bounds which are explicit in every aspect.

Submitted 22 May, 2025; originally announced May 2025.

Comments: 45 pages

MSC Class: 11D61

arXiv:2505.08910 [pdf, ps, other]

Behind Maya: Building a Multilingual Vision Language Model

Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

Abstract: In recent times, we have seen a rapid development of large Vision-Language Models (VLMs). They have shown impressive results on academic benchmarks, primarily in widely spoken languages but lack performance on low-resource languages and varied cultural contexts. To address these limitations, we introduce Maya, an open-source Multilingual VLM. Our contributions are: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; and 2) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.

Submitted 15 May, 2025; v1 submitted 13 May, 2025; originally announced May 2025.

Comments: Accepted at VLMs4ALL CVPR 2025 Workshop; corrected workshop name spelling

arXiv:2505.08442 [pdf, ps, other]

Robust μ-distortion constraints on primordial supermassive black holes from cubic (gNL) non-Gaussian perturbations

Authors: Xavier Pritchard, Christian T. Byrnes, Julien Lesgourgues, Devanshu Sharma

Abstract: We make the first calculation of the spectral distortion constraints on the primordial curvature power spectrum in the limit of large cubic non-Gaussianity. This calculation involves computing a 2-loop integral, which we perform analytically. Despite being non-perturbatively non-Gaussian, we show that the constraints only change significantly from the case of Gaussian perturbations in the high-k tail, where spectral distortions become weak. We conclude that generating primordial supermassive black holes requires even more extreme forms of non-Gaussianity. We also argue why the mu-distortion constraint is unlikely to significantly change even in the presence of more extreme local non-Gaussianity.

Submitted 13 May, 2025; originally announced May 2025.

Comments: 12 pages plus appendices, 6 figures

arXiv:2505.06151 [pdf, ps, other]

Estimating Quality in Therapeutic Conversations: A Multi-Dimensional Natural Language Processing Framework

Authors: Alice Rueda, Argyrios Perivolaris, Niloy Roy, Dylan Weston, Sarmed Shaya, Zachary Cote, Martin Ivanov, Bazen G. Teferra, Yuqi Wu, Sirisha Rambhatla, Divya Sharma, Andrew Greenshaw, Rakesh Jetly, Yanbo Zhang, Bo Cao, Reza Samavi, Sridhar Krishnan, Venkat Bhat

Abstract: Engagement between client and therapist is a critical determinant of therapeutic success. We propose a multi-dimensional natural language processing (NLP) framework that objectively classifies engagement quality in counseling sessions based on textual transcripts. Using 253 motivational interviewing transcripts (150 high-quality, 103 low-quality), we extracted 42 features across four domains: conversational dynamics, semantic similarity as topic alignment, sentiment classification, and question detection. Classifiers, including Random Forest (RF), CatBoost, and Support Vector Machines (SVM), were hyperparameter tuned and trained using a stratified 5-fold cross-validation and evaluated on a holdout test set. On balanced (non-augmented) data, RF achieved the highest classification accuracy (76.7%), and SVM achieved the highest AUC (85.4%). After SMOTE-Tomek augmentation, performance improved significantly: RF achieved up to 88.9% accuracy, 90.0% F1-score, and 94.6% AUC, while SVM reached 81.1% accuracy, 83.1% F1-score, and 93.6% AUC. The augmented data results reflect the potential of the framework in future larger-scale applications. Feature contribution revealed conversational dynamics and semantic similarity between clients and therapists were among the top contributors, led by words uttered by the client (mean and standard deviation). The framework was robust across the original and augmented datasets and demonstrated consistent improvements in F1 scores and recall.
While currently text-based, the framework supports future multimodal extensions (e.g., vocal tone, facial affect) for more holistic assessments. This work introduces a scalable, data-driven method for evaluating engagement quality of the therapy session, offering clinicians real-time feedback to enhance the quality of both virtual and in-person therapeutic interactions.

Submitted 9 May, 2025; originally announced May 2025.

Comments: 12 pages, 4 figures, 7 tables

arXiv:2505.01482 [pdf, other]

Understanding LLM Scientific Reasoning through Promptings and Model's Explanation on the Answers

Authors: Alice Rueda, Mohammed S. Hassan, Argyrios Perivolaris, Bazen G. Teferra, Reza Samavi, Sirisha Rambhatla, Yuqi Wu, Yanbo Zhang, Bo Cao, Divya Sharma, Sridhar Krishnan, Venkat Bhat

Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and problem-solving across various domains. However, their ability to perform complex, multi-step reasoning tasks, which is essential for applications in science, medicine, and law, remains an area of active investigation. This paper examines the reasoning capabilities of contemporary LLMs, analyzing their strengths, limitations, and potential for improvement. The study uses prompt engineering techniques on the Graduate-Level Google-Proof Q&A (GPQA) dataset to assess the scientific reasoning of GPT-4o. Five popular prompt engineering techniques and two tailored promptings were tested: baseline direct answer (zero-shot), chain-of-thought (CoT), zero-shot CoT, self-ask, self-consistency, decomposition, and multipath promptings. Our findings indicate that while LLMs exhibit emergent reasoning abilities, they often rely on pattern recognition rather than true logical inference, leading to inconsistencies in complex problem-solving. The results indicated that self-consistency outperformed the other prompt engineering techniques with an accuracy of 52.99%, followed by direct answer (52.23%). Zero-shot CoT (50%) outperformed multipath (48.44%), decomposition (47.77%), self-ask (46.88%), and CoT (43.75%). Self-consistency performed the second worst in explaining the answers. Simple techniques such as direct answer, CoT, and zero-shot CoT have the best scientific reasoning.
We propose a research agenda aimed at bridging these gaps by integrating structured reasoning frameworks, hybrid AI approaches, and human-in-the-loop methodologies. By critically evaluating the reasoning mechanisms of LLMs, this paper contributes to the ongoing discourse on the future of artificial general intelligence and the development of more robust, trustworthy AI systems.

Submitted 2 May, 2025; originally announced May 2025.

arXiv:2504.17120 [pdf, other]

Dynamic Shock Recovery in IO Networks with Priority Constraints

Authors: Jichu Han, Lina Wang, Richard Bookstaber, Dhruv Sharma

Abstract: Physical risks, such as droughts, floods, rising temperatures, earthquakes, infrastructure failures, and geopolitical conflicts, can ripple through global supply chains, raising costs, and constraining production across industries. Assessing these risks requires understanding not only their immediate effects, but also their cascading impacts. For example, a localized drought can disrupt the supply of critical raw materials such as cobalt or copper, affecting battery and electric vehicle production. Similarly, regional conflicts can impede cross-border trade, leading to broader economic consequences. Building on an existing model of simultaneous supply and demand shocks, we introduce a new propagation algorithm, Priority with Constraint, which modifies standard priority-based rationing by incorporating a minimum supply guarantee for all customers, regardless of their size or priority ranking. We also identify a buffer effect inherent in the Industry Proportional algorithm, which reflects real-world economic resilience. Finally, we extend the static shock propagation model to incorporate dynamic processes. We introduce mechanisms for gradual shock propagation, reflecting demand stickiness and the potential buffering role of inventories, and gradual recovery, modeling the simultaneous recovery of supply capacity and the inherent tendency for demand to return to pre-shock levels.
Simulations demonstrate how the interplay between demand adjustment speed and supply recovery speed significantly influences the severity and duration of the economic impact after a shock.

Submitted 23 April, 2025; originally announced April 2025.

Comments: 10 pages, 4 figures

arXiv:2504.13863 [pdf]

Utsarjan: A smartphone App for providing kidney care and real-time assistance to children with nephrotic syndrome

Authors: Snigdha Tiwari, Sahil Sharma, Arvind Bagga, Aditi Sinha, Deepak Sharma

Abstract: Background: Telemedicine has the potential to provide secure and cost-effective healthcare at the touch of a button. Nephrotic syndrome is a chronic childhood illness involving frequent relapses and demands long/complex treatment. Hence, developing a remote means of doctor-patient interface will ensure the provision of quality healthcare to patients. Methods: The Utsarjan mobile App framework was built with Flutter that enables cross-platform development (Android, iOS, Windows) with speed, smoothness, and open-source benefits. The frontend uses Dart for user interaction, while the backend employs Node.js, Express, and NGINX for APIs, load balancing and high performance. MongoDB ensures a flexible database, Bcrypt secures passwords, PM2 handles deployment, uptime and logs, while Firebase Cloud Messaging powers free push notifications. Results: Utsarjan (means excretion) is a multi-functional smartphone application for giving nephrotic care and real-time assistance to all patients (especially those in rural regions and/or who do not have access to specialists). It helps patients and doctors by ensuring opportune visits, recording each clinical test/parameter and improving medication adherence. It gives a graphical visualization of relapses, medicine dosage as well as different anthropometric parameters (urine protein, BP, height and weight). This is the first nephrotic care App that enables prompt access to doctor's advice. Conclusions: Utsarjan is a mobile App to provide kidney care and real-time assistance to children with nephrotic syndrome.
It gives a graphical overview of changes in a patient's health over the long course of treatment. This will assist doctors in appropriately modifying the treatment regimen. Consequently, it will (hopefully) lead to the prevention of relapses and/or complications.

Submitted 26 March, 2025; originally announced April 2025.

Comments: 16 pages, 3 figures

arXiv:2504.11952 [pdf, other]

Robust and Fine-Grained Detection of AI Generated Texts

Authors: Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Kanwal Mehreen, Drishti Sharma, Siddhant Gupta, Jebish Purbey, Ashay Srivastava, Subhasya TippaReddy, Arvind Reddy Bobbili, Suraj Telugara Chandrashekhar, Modabbir Adeeb, Srinadh Vura, Hamza Farooq

Abstract: An ideal detection system for machine generated content is supposed to work well on any generator as many more advanced LLMs come into existence day by day. Existing systems often struggle with accurately identifying AI-generated content over shorter texts. Further, not all texts might be entirely authored by a human or LLM, hence we focused more on partial cases, i.e., human-LLM co-authored texts. Our paper introduces a set of models built for the task of token classification which are trained on an extensive collection of human-machine co-authored texts, which performed well over texts of unseen domains, unseen generators, texts by non-native speakers and those with adversarial inputs. We also introduce a new dataset of over 2.4M such texts mostly co-authored by several popular proprietary LLMs over 23 languages. We also present findings of our models' performance on the texts of each domain and generator. Additional findings include comparison of performance against each adversarial method, length of input texts and characteristics of generated texts compared to the original human authored texts.

Submitted 22 May, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

Comments: 18 pages, 6 figures

arXiv:2504.09753 [pdf, other]

Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance

Authors: Ram Mohan Rao Kadiyala, Siddartha Pullakhandam, Siddhant Gupta, Drishti Sharma, Jebish Purbey, Kanwal Mehreen, Muhammad Arham, Hamza Farooq

Abstract: Large Language Models (LLMs) have shown remarkable capabilities, but their development has primarily focused on English and other high-resource languages, leaving many languages underserved. We present our latest Hindi-English bilingual LLM Mantra-14B with ~3% average improvement in benchmark scores over both languages, outperforming models twice its size. Using a curated dataset composed of English and Hindi instruction data of 485K samples, we instruction tuned models such as Qwen-2.5-14B-Instruct and Phi-4 to improve performance over both English and Hindi. Our experiments encompassing seven different LLMs of varying parameter sizes and over 140 training attempts with varying English-Hindi training data ratios demonstrated that it is possible to significantly improve multilingual performance without compromising native performance. Further, our approach avoids resource-intensive techniques like vocabulary expansion or architectural modifications, thus keeping the model size small. Our results indicate that modest fine-tuning with culturally and locally informed data can bridge performance gaps without incurring significant computational overhead. We release our training code, datasets, and models under MIT and Apache licenses to aid further research towards under-represented and low-resource languages.

Submitted 22 May, 2025; v1 submitted 13 April, 2025; originally announced April 2025.

Comments: 24 pages, 18 figures

arXiv:2504.08403 [pdf, other]

Optimizing Collaborative UAV Networks for Data Efficiency in IoT Ecosystems

Authors: Priyavrat Dev Sharma, Ibrahim Sorkhoh, Muthucumaru Maheswaran

Abstract: Advances in the Internet of Things are revolutionizing data acquisition, enhancing artificial intelligence and quality of service. Unmanned Aerial Vehicles (UAVs) provide an efficient data-gathering solution across varied environments. This paper addresses challenges in integrating UAVs for large scale data operations, including mobility, multi-hop paths, and optimized multi-source information transfer. We propose a collaborative UAV framework that enables efficient data sharing with minimal communication overhead, featuring adaptive power control and dynamic resource allocation. Formulated as an NP-hard Integer Linear Program, our approach uses heuristic algorithms to optimize routing through UAV hubs. Simulations show promise in terms of computation time (99% speedup) and outcome (down to 14% deviation from the optimal).

Submitted 11 April, 2025; originally announced April 2025.

Comments: 7 pages, 6 figures. Accepted for presentation at the IEEE ICC Workshop 2025 in Montreal, Canada

arXiv:2504.07072 [pdf, other]

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Authors: Israfel Salazar, Manuel Fernández Burda, Shayekh Bin Islam, Arshia Soltani Moakhar, Shivalika Singh, Fabian Farestam, Angelika Romanou, Danylo Boiko, Dipika Khullar, Mike Zhang, Dominik Krzemiński, Jekaterina Novikova, Luísa Shimabucoro, Joseph Marvin Imperial, Rishabh Maheshwary, Sharad Duwal, Alfonso Amayuelas, Swati Rajwal, Jebish Purbey, Ahmed Ruby, Nicholas Popovič, Marek Suppa, Azmine Toushik Wasi, Ram Mohan Rao Kadiyala, Olga Tsymboi , et al. (20 additional authors not shown)

Abstract: The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded, both in size and languages, many rely on translations of English datasets, failing to capture cultural nuances. In this work, we propose Kaleidoscope, as the most comprehensive exam benchmark to date for the multilingual evaluation of vision-language models. Kaleidoscope is a large-scale, in-language multimodal benchmark designed to evaluate VLMs across diverse languages and visual inputs. Kaleidoscope covers 18 languages and 14 different subjects, amounting to a total of 20,911 multiple-choice questions. Built through an open science collaboration with a diverse group of researchers worldwide, Kaleidoscope ensures linguistic and cultural authenticity. We evaluate top-performing multilingual vision-language models and find that they perform poorly on low-resource languages and in complex multimodal scenarios. Our results highlight the need for progress on culturally inclusive multimodal evaluation frameworks.

Submitted 29 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

Comments: v2: corrected the author list

arXiv:2504.06622 [pdf, other]

Quantum neural networks facilitating quantum state classification

Authors: Diksha Sharma, Vivek Balasaheb Sabale, Thirumalai M., Atul Kumar

Abstract: The classification of quantum states into distinct classes poses a significant challenge. In this study, we address this problem using quantum neural networks in combination with a problem-inspired circuit and customised as well as predefined ansätz. To facilitate the resource-efficient quantum state classification, we construct the dataset of quantum states using the proposed problem-inspired circuit. The problem-inspired circuit incorporates two-qubit parameterised unitary gates of varying entangling power, which is further integrated with the ansätz, developing an entire quantum neural network. To demonstrate the capability of the selected ansätz, we visualise the mitigated barren plateaus. The designed quantum neural network demonstrates the efficiency in binary and multi-class classification tasks. This work establishes a foundation for the classification of multi-qubit quantum states and offers the potential for generalisation to multi-qubit pure quantum states.

Submitted 9 April, 2025; originally announced April 2025.

arXiv:2504.02955 [pdf, other]

Azimuthal anisotropy of direct photons in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, M. Alfred, S. Antsupov, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, M. Bai, N. S. Bandara, B. Bannier, E. Bannikov, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, S. Beckman, R. Belmont, A. Berdnikov, Y. Berdnikov , et al. (301 additional authors not shown)

Abstract: The PHENIX experiment at the Relativistic Heavy Ion Collider measured the second Fourier component $v_2$ of the direct-photon azimuthal anisotropy at midrapidity in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV. The results are presented in 10\% wide bins of collision centrality and cover the transverse-momentum range of $1<p_T<20$ GeV/$c$, and are in quantitative agreement with findings published earlier, but provide better granularity and higher $p_T$ reach. Above a $p_T$ of 8--10 GeV/$c$, where hard scattering dominates the direct-photon production, $v_2$ is consistent with zero. Below that in each centrality bin $v_2$ as a function of $p_T$ is comparable to the $π^0$ anisotropy albeit with a tendency of being somewhat smaller. The results are compared to recent theory calculations that include, in addition to thermal radiation from the quark-gluon plasma and hadron gas, sources of photons from pre-equilibrium, strong magnetic fields, or radiative hadronization. While the newer theoretical calculations describe the data better than previous models, none of them alone can fully explain the results, particularly in the region of $p_T=4$--8 GeV/$c$.

Submitted 3 April, 2025; originally announced April 2025.

Comments: 325 authors from 71 institutions, 12 pages, 9 figures, 2 tables. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2503.21529 [pdf, other]

Physics-Informed Neural Network-Based Control for Grid-Forming Converter's Stability Under Overload Conditions

Authors: Abhay Kumar, Dushyant Sharma, Mayukha Pal

Abstract: Grid-forming converters (GFCs) are crucial for frequency and voltage stability in modern power systems. However, their performance under overload conditions remains a challenge. This paper highlights the limitations of existing approaches in managing DC source saturation and AC current limits, emphasizing the need for improved control strategies to ensure system stability. This paper proposes a control strategy based on a physics-informed neural network (PINN) to improve GFC performance under overloaded conditions, effectively preventing switch failures and mitigating DC source saturation. This approach outperforms conventional methods by maintaining stable voltage and frequency, even under significant load increase where traditional droop control alone proves inadequate. The post-disturbance operating point of GFCs remains unchanged using PINN-based control with an improvement of 0.245 Hz in frequency and 0.03 p.u. in active power when compared to an already existing current limitation strategy. Additionally, it reduces peak voltage deviations during transients by 24.14%, lowers the rate of change of frequency (ROCOF) from 0.02 Hz/s to 0.005 Hz/s, and improves the rate of change of voltage (ROCOV), keeping both within acceptable limits. These improvements significantly enhance system resilience, especially in inertia-less power networks.

Submitted 22 May, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.21163 [pdf]

Effect of convective transport in edge/SOL plasmas of ADITYA-U tokamak

Authors: Ritu Dey, Joydeep Ghosh, Tanmay M. Macwan, Kaushlender Singh, M. B. Chowdhuri, H. Raj, R. L. Tanna, Deepti Sharma, T. D. Rognlien

Abstract: The 2-D edge plasma fluid transport code, UEDGE, has been used to simulate the edge region of circular limiter plasmas of ADITYA-U for modelling the measured electron density profile. The limiter geometry of ADITYA-U has been introduced in the UEDGE code, which is primarily developed and used for divertor configuration. The computational mesh defining the limiter geometry is generated by a routine developed in-house, and has successfully been integrated with the UEDGE code to simulate the edge plasma parameters of ADITYA-U. The radial profiles of edge and scrape-off layer (SOL) electron density, $n_e$, and temperature are obtained from the simulations and used to model the $n_e$ profile measured with a Langmuir probe array. It has been found that a convective velocity, $v_{conv}$, is definitely needed in addition to the constant perpendicular diffusion coefficient, $D$, throughout the edge and SOL regions to model the edge $n_e$ profile. The obtained $v_{conv}$ is inward and radially constant with a value of 1.5 m/s and the radially constant $D$ is ~0.2 m$^2$/s. The value of $D$ ~ 0.2 m$^2$/s is found to be much less than fluctuation induced diffusivities and lies in-between the neoclassical diffusivity and Bohm diffusivity estimated in the edge-SOL region of ADITYA-U tokamak. Furthermore, the transport of radial electron heat flux is found to be maximizing near the limiter tip location in the poloidal plane.

Submitted 27 March, 2025; originally announced March 2025.

Comments: 24 pages, 13 figures, peer reviewed journal

Report number: RR_1337_2021

arXiv:2503.20339 [pdf, other]

Effect of $α$-clusters on particle production in O$-$O and p$-$O collisions at LHC energies

Authors: Deependra Sharma, Arpit Singh, Md. Samsul Islam, Basanta Nandi, Sadhana Dash

Abstract: In the present work, O$-$O collisions at $\sqrt{s_{NN}}$ = 7 TeV and p$-$O collisions at $\sqrt{s_{NN}}$ = 9.9 TeV are studied using the PYTHIA8/Angantyr model for heavy-ion collisions. The theoretically predicted $α$-cluster structure of the oxygen nucleus is implemented in the model to investigate the effect of the initial configuration of the oxygen nucleus on final state observables. The results obtained from the $α$-cluster structure are compared with those obtained from the Woods-Saxon nuclear charge density distribution. The Angantyr model simulation showed that the radial distribution of the oxygen nucleus in the $α$-cluster configuration is more compact in comparison to the Woods-Saxon distribution. Results on charged and identified particle pseudorapidity distributions are obtained for the two initial-state configurations of the oxygen nucleus. The results demonstrated that the effect of the initial geometrical configuration is more distinct in non-central collisions than in central collisions for both O$-$O and p$-$O collisions.

Submitted 26 March, 2025; originally announced March 2025.

Comments: 11 pages, 11 figures

arXiv:2503.16115 [pdf, other]

A Non-Hermitian State-to-State Analysis of Transport in Aggregates with Multiple Endpoints

Authors: Devansh Sharma, Amartya Bose

Abstract: Efficiency of quantum transport through aggregates with multiple end-points or traps proves to be an emergent and a highly non-equilibrium phenomenon. We present a numerically exact approach for computing the emergent time scale and amount of extraction specific to particular traps leveraging a non-Hermitian generalization of the recently introduced state-to-state transport analysis [Bose and Walters, J. Chem. Theory Comput. 2023, 19, 15, 4828-4836]. This method is able to simultaneously account for the coupling between various sites, the many-body effects brought in by the vibrations and environment held at a non-zero temperature, and the local extraction processes described by non-Hermitian terms in the Hamiltonian. In fact, our non-Hermitian state-to-state analysis goes beyond merely providing an emergent loss time-scale. It can parse the entire dynamics into the constituent internal transport pathways and loss to environment. We demonstrate this method using examples of an exciton transport in a lossy polaritonic cavity. The loss at the cavity and the extraction of the exciton from a terminal molecule provide competing mechanisms that our method helps to unravel, revealing extremely interesting non-intuitive physics. This non-Hermitian state-to-state analysis technique contributes an important link in understanding and elucidating the routes of transport in open quantum systems.

Submitted 20 March, 2025; originally announced March 2025.

Comments: 7 pages, 7 figures

arXiv:2503.11972 [pdf, other]

MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models

Authors: Yuchen Xia, Divyam Sharma, Yichao Yuan, Souvik Kundu, Nishil Talati

Abstract: Diffusion-based text-to-image generation models trade latency for quality: small models are fast but generate lower-quality images, while large models produce better images but are slow. We present MoDM, a novel caching-based serving system for diffusion models that dynamically balances latency and quality through a mixture of diffusion models. Unlike prior approaches that rely on model-specific internal features, MoDM caches final images, allowing seamless retrieval and reuse across multiple diffusion model families. This design enables adaptive serving by dynamically balancing latency and image quality: using smaller models for cache-hit requests to reduce latency while reserving larger models for cache-miss requests to maintain quality. Small model image quality is preserved using retrieved cached images. We design a global monitor that optimally allocates GPU resources and balances inference workload, ensuring high throughput while meeting service-level objectives under varying request rates. Our evaluations show that MoDM significantly reduces average serving time by 2.5x while retaining image quality, making it a practical solution for scalable and resource-efficient model deployment.

Submitted 14 March, 2025; originally announced March 2025.

arXiv:2503.11807 [pdf, other]

Mitigating Bad Ground Truth in Supervised Machine Learning based Crop Classification: A Multi-Level Framework with Sentinel-2 Images

Authors: Sanayya A, Amoolya Shetty, Abhijeet Sharma, Venkatesh Ravichandran, Masthan Wali Gosuvarapalli, Sarthak Jain, Priyamvada Nanjundiah, Ujjal Kr Dutta, Divya Sharma

Abstract: In agricultural management, precise Ground Truth (GT) data is crucial for accurate Machine Learning (ML) based crop classification. Yet, issues like crop mislabeling and incorrect land identification are common. We propose a multi-level GT cleaning framework that utilizes multi-temporal Sentinel-2 data to address these issues. Specifically, this framework generates embeddings for farmland, clusters similar crop profiles, and identifies outliers indicating GT errors. We validated clusters with False Colour Composite (FCC) checks and used distance-based metrics to scale and automate this verification process. The importance of cleaning the GT data became apparent when the models were trained on the clean and unclean data. For instance, when we trained a Random Forest model with the clean GT data, we achieved an F1 score up to 70 absolute percentage points higher. This approach advances crop classification methodologies, with potential for applications towards improving loan underwriting and agricultural decision-making.

Submitted 14 March, 2025; originally announced March 2025.

Comments: Accepted In IEEE India Geoscience and Remote Sensing Symposium (InGARSS) 2024

arXiv:2503.04184 [pdf]

Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

Authors: Adnan Shahid, Adrian Kliks, Ahmed Al-Tahmeesschi, Ahmed Elbakary, Alexandros Nikou, Ali Maatouk, Ali Mokh, Amirreza Kazemi, Antonio De Domenico, Athanasios Karapantelakis, Bo Cheng, Bo Yang, Bohao Wang, Carlo Fischione, Chao Zhang, Chaouki Ben Issaid, Chau Yuen, Chenghui Peng, Chongwen Huang, Christina Chaccour, Christo Kurisummoottil Thomas, Dheeraj Sharma, Dimitris Kalogiros, Dusit Niyato, Eli De Poorter , et al. (110 additional authors not shown)

Abstract: This white paper discusses the role of large-scale AI in the telecommunications industry, with a specific focus on the potential of generative AI to revolutionize network functions and user experiences, especially in the context of 6G systems. It highlights the development and deployment of Large Telecom Models (LTMs), which are tailored AI models designed to address the complex challenges faced by modern telecom networks. The paper covers a wide range of topics, from the architecture and deployment strategies of LTMs to their applications in network management, resource allocation, and optimization. It also explores the regulatory, ethical, and standardization considerations for LTMs, offering insights into their future integration into telecom infrastructure. The goal is to provide a comprehensive roadmap for the adoption of LTMs to enhance scalability, performance, and user-centric innovation in telecom networks.

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2503.03184 [pdf, ps, other]

PAC Learning with Improvements

Authors: Idan Attias, Avrim Blum, Keziah Naggita, Donya Saless, Dravyansh Sharma, Matthew Walter

Abstract: One of the most basic lower bounds in machine learning is that in nearly any nontrivial setting, it takes $\textit{at least}$ $1/ε$ samples to learn to error $ε$ (and more, if the classifier being learned is complex). However, suppose that data points are agents who have the ability to improve by a small amount if doing so will allow them to receive a (desired) positive classification. In that case, we may actually be able to achieve $\textit{zero}$ error by just being "close enough". For example, imagine a hiring test used to measure an agent's skill at some job such that for some threshold $θ$, agents who score above $θ$ will be successful and those who score below $θ$ will not (i.e., learning a threshold on the line). Suppose also that by putting in effort, agents can improve their skill level by some small amount $r$. In that case, if we learn an approximation $\hat{θ}$ of $θ$ such that $θ \leq \hat{θ} \leq θ + r$ and use it for hiring, we can actually achieve error zero, in the sense that (a) any agent classified as positive is truly qualified, and (b) any agent who truly is qualified can be classified as positive by putting in effort. Thus, the ability for agents to improve has the potential to allow for a goal one could not hope to achieve in standard models, namely zero error. In this paper, we explore this phenomenon more broadly, giving general results and examining under what conditions the ability of agents to improve can allow for a reduction in the sample complexity of learning, or alternatively, can make learning harder. We also examine both theoretically and empirically what kinds of improvement-aware algorithms can take into account agents who have the ability to improve to a limited extent when it is in their interest to do so.

Submitted 3 June, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

Comments: 41 pages, 13 figures, ICML 2025

arXiv:2502.16255 [pdf]

rECGnition_v2.0: Self-Attentive Canonical Fusion of ECG and Patient Data using deep learning for effective Cardiac Diagnostics

Authors: Shreya Srivastava, Durgesh Kumar, Ram Jiwari, Sandeep Seth, Deepak Sharma

Abstract: The variability in ECG readings influenced by individual patient characteristics has posed a considerable challenge to adopting automated ECG analysis in clinical settings. A novel feature fusion technique termed SACC (Self Attentive Canonical Correlation) was proposed to address this. This technique is combined with DPN (Dual Pathway Network) and depth-wise separable convolution to create a robust, interpretable, and fast end-to-end arrhythmia classification model named rECGnition_v2.0 (robust ECG abnormality detection). This study uses the MIT-BIH, INCARTDB and EDB datasets to evaluate the efficiency of rECGnition_v2.0 for various classes of arrhythmias. To investigate the influence of constituting model components, various ablation studies were performed, i.e. simple concatenation, CCA and proposed SACC were compared, while the importance of global and local ECG features was tested using DPN rECGnition_v2.0 model and vice versa. It was also benchmarked with state-of-the-art CNN models for overall accuracy vs model parameters, FLOPs, memory requirements, and prediction time. Furthermore, the inner working of the model was interpreted by comparing the activation locations in ECG before and after the SACC layer. rECGnition_v2.0 showed a remarkable accuracy of 98.07% and an F1-score of 98.05% for classifying ten distinct classes of arrhythmia with just 82.7M FLOPs per sample, thereby going beyond the performance metrics of current state-of-the-art (SOTA) models by utilizing MIT-BIH Arrhythmia dataset. Similarly, on the INCARTDB and EDB datasets, excellent F1-scores of 98.01% and 96.21%, respectively, were achieved for AAMI classification. The compact architectural footprint of rECGnition_v2.0, characterized by its fewer trainable parameters and diminished computational demands, unfurled several advantages including interpretability and scalability.

Submitted 22 February, 2025; originally announced February 2025.

arXiv:2502.14360 [pdf]

Weed Detection using Convolutional Neural Network

Authors: Santosh Kumar Tripathi, Shivendra Pratap Singh, Devansh Sharma, Harshavardhan U Patekar

Abstract: In this paper we use convolutional neural networks (CNNs) for weed detection in agricultural land. We specifically investigate the application of two CNN layer types, Conv2d and dilated Conv2d, for weed detection in crop fields. The suggested method extracts features from the input photos using pre-trained models, which are subsequently adjusted for weed detection. The findings of the experiment, which used a sizable dataset consisting of 15336 segments (3249 of soil, 7376 of soybean, 3520 of grass and 1191 of broadleaf weeds), show that the suggested approach can detect weeds with an accuracy of 94%. This study has significant ramifications for lowering the usage of toxic herbicides and increasing the effectiveness of weed management in agriculture.

Submitted 20 February, 2025; originally announced February 2025.

arXiv:2502.14234 [pdf, other]

OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes

Authors: Félix Therrien, Jamal Abou Haibeh, Divya Sharma, Rhiannon Hendley, Alex Hernández-García, Sun Sun, Alain Tchagang, Jiang Su, Samuel Huberman, Yoshua Bengio, Hongyu Guo, Homin Shin

Abstract: Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data is scarce. Here, we present OBELiX, a domain-expert-curated database of $\sim$600 synthesized solid electrolyte materials and their experimentally measured room temperature ionic conductivities gathered from literature. Each material is described by its measured composition, space group and lattice parameters. A full-crystal description in the form of a crystallographic information file (CIF) is provided for $\sim$320 structures for which atomic positions were available. We discuss various statistics and features of the dataset and provide training and testing splits that avoid data leakage. Finally, we benchmark seven existing ML models on the task of predicting ionic conductivity and discuss their performance. The goal of this work is to facilitate the use of machine learning for solid-state electrolyte materials discovery.

Submitted 19 February, 2025; originally announced February 2025.

Comments: 8 pages, 3 figures and 2 tables

arXiv:2502.12937 [pdf, other]

Tuning Algorithmic and Architectural Hyperparameters in Graph-Based Semi-Supervised Learning with Provable Guarantees

Authors: Ally Yalei Du, Eric Huang, Dravyansh Sharma

Abstract: Graph-based semi-supervised learning is a powerful paradigm in machine learning for modeling and exploiting the underlying graph structure that captures the relationship between labeled and unlabeled data. A large number of classical as well as modern deep learning based algorithms have been proposed for this problem, often having tunable hyperparameters. We initiate a formal study of tuning algorithm hyperparameters from parameterized algorithm families for this problem. We obtain novel $O(\log n)$ pseudo-dimension upper bounds for hyperparameter selection in three classical label propagation-based algorithm families, where $n$ is the number of nodes, implying bounds on the amount of data needed for learning provably good parameters. We further provide matching $Ω(\log n)$ pseudo-dimension lower bounds, thus asymptotically characterizing the learning-theoretic complexity of the parameter tuning problem. We extend our study to selecting architectural hyperparameters in modern graph neural networks. We bound the Rademacher complexity for tuning the self-loop weighting in recently proposed Simplified Graph Convolution (SGC) networks. We further propose a tunable architecture that interpolates graph convolutional neural networks (GCN) and graph attention networks (GAT) in every layer, and provide Rademacher complexity bounds for tuning the interpolation coefficient.

Submitted 18 February, 2025; originally announced February 2025.

Comments: 31 pages (11 pages main body), 2 figures

arXiv:2502.00296 [pdf, ps, other]

Perfect powers as sums of convergent denominators of quadratic irrationals

Authors: Divyum Sharma, L. Singhal

Abstract: Let $α$ be a fixed quadratic irrational. Consider the Diophantine equation \[ y^a\ =\ q_{N_1} + \cdots + q_{N_K},\quad N_1 \geq \cdots \geq N_{K} \geq 0,\quad a, y \geq 2 \] where $(q_N)_{N\,\geq\,0}$ is the sequence of convergent denominators to $α$. We find two effective upper bounds for $y^a$ which depend on the Hamming weights of $y$ with respect to its radix and Zeckendorf representations, respectively. The latter bound extends a recent result of Vukusic and Ziegler. En route, we obtain an analogue of a theorem by Kebli, Kihel, Larone and Luca.

Submitted 31 January, 2025; originally announced February 2025.

Comments: 17 pages, 1 figure

MSC Class: 11D61; 11J86; 11A63; 11B39

arXiv:2501.18953 [pdf, other]

StruM: Structured Mixed Precision for Efficient Deep Learning Hardware Codesign

Authors: Michael Wu, Arnab Raha, Deepak A. Mathaikutty, Martin Langhammer, Engin Tunali, Daksha Sharma

Abstract: In this paper, we propose StruM, a novel structured mixed-precision-based deep learning inference method, co-designed with its associated hardware accelerator (DPU), to address the escalating computational and memory demands of deep learning workloads in data centers and edge applications. Diverging from traditional approaches, our method avoids time-consuming re-training/fine-tuning and specialized hardware access. By leveraging the variance in weight magnitudes within layers, we quantize values within blocks to two different levels, achieving up to a 50% reduction in precision for 8-bit integer weights to 4-bit values across various Convolutional Neural Networks (CNNs) with negligible loss in inference accuracy. To demonstrate efficiency gains by utilizing mixed precision, we implement StruM on top of our in-house FlexNN DNN accelerator [1] that supports low and mixed-precision execution. Experimental results show that the proposed StruM-based hardware architecture achieves a 31-34% reduction in processing element (PE) power consumption and a 10% reduction in area at the accelerator level. In addition, the statically configured StruM results in 23-26% area reduction at the PE level and 2-3% area savings at the DPU level.

Submitted 17 May, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: Version 1, 12 pages, 13 figures

arXiv:2501.16687 [pdf, other]

Optical Skyrmions of Vortex Darkness

Authors: Nilo Mata-Cervera, Deepak K. Sharma, Yijie Shen, Ramon Paniagua-Dominguez, Miguel A. Porras

Abstract: We disclose the existence of a type of optical skyrmion, Gauss-Stokes (GS) skyrmions, that is naturally present in an optical vortex around its phase singularity. Contrary to previous research with optical skyrmions, we neither shape vector beams nor superpose different spatial modes and polarizations. In GS skyrmions, the phase singularity in the transversal field of a single monochromatic beam of uniform polarization (a scalar beam) is concealed by the axial field dictated by Gauss's divergence law, giving rise to a polarization singularity of undefined polarization plane. This singularity is enclosed by a rich skyrmionic polarization texture fulfilling a topological map and covering all the states of transverse-axial polarization. In our experiment, we facilitate the observation of a GS skyrmion with the predicted features using focused fields with enhanced axial component.

Submitted 27 January, 2025; originally announced January 2025.

Comments: 10 pages, 6 figures

arXiv:2501.13734 [pdf, other]

Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function

Authors: Maria-Florina Balcan, Anh Tuan Nguyen, Dravyansh Sharma

Abstract: Modern machine learning algorithms, especially deep learning based techniques, typically involve careful hyperparameter tuning to achieve the best performance. Despite the surge of intense interest in practical techniques like Bayesian optimization and random search based approaches to automating this laborious and compute intensive task, the fundamental learning theoretic complexity of tuning hyperparameters for deep neural networks is poorly understood. Inspired by this glaring gap, we initiate the formal study of hyperparameter tuning complexity in deep learning through a recently introduced data driven setting. We assume that we have a series of deep learning tasks, and we have to tune hyperparameters to do well on average over the distribution of tasks. A major difficulty is that the utility function as a function of the hyperparameter is very volatile and furthermore, it is given implicitly by an optimization problem over the model parameters. To tackle this challenge, we introduce a new technique to characterize the discontinuities and oscillations of the utility function on any fixed problem instance as we vary the hyperparameter; our analysis relies on subtle concepts including tools from differential/algebraic geometry and constrained optimization. This can be used to show that the learning theoretic complexity of the corresponding family of utility functions is bounded.
We instantiate our results and provide sample complexity bounds for concrete applications tuning a hyperparameter that interpolates neural activation functions and setting the kernel parameter in graph neural networks.

Submitted 29 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

Comments: 57 pages, 4 figures

arXiv:2501.02926 [pdf, other]

Offline-to-online hyperparameter transfer for stochastic bandits

Authors: Dravyansh Sharma, Arun Sai Suggala

Abstract: Classic algorithms for stochastic bandits typically use hyperparameters that govern their critical properties such as the trade-off between exploration and exploitation. Tuning these hyperparameters is a problem of great practical significance. However, this is a challenging problem and in certain cases is information theoretically impossible. To address this challenge, we consider a practically relevant transfer learning setting where one has access to offline data collected from several bandit problems (tasks) coming from an unknown distribution over the tasks. Our aim is to use this offline data to set the hyperparameters for a new task drawn from the unknown distribution. We provide bounds on the inter-task (number of tasks) and intra-task (number of arm pulls for each task) sample complexity for learning near-optimal hyperparameters on unseen tasks drawn from the distribution. Our results apply to several classic algorithms, including tuning the exploration parameters in UCB and LinUCB and the noise parameter in GP-UCB. Our experiments indicate the significance and effectiveness of the transfer of hyperparameters from offline problems in online learning with stochastic bandit feedback.

Submitted 6 January, 2025; originally announced January 2025.

Comments: AAAI 2025

arXiv:2412.14026 [pdf, other]

Dynamics of Hot QCD Matter 2024 -- Hard Probes

Authors: Santosh K. Das, Prabhakar Palni, Amal Sarkar, Vineet Kumar Agotiya, Aritra Bandyopadhyay, Partha Pratim Bhaduri, Saumen Datta, Vaishnavi Desai, Debarshi Dey, Vincenzo Greco, Mohammad Yousuf Jamal, Gurleen Kaur, Manisha Kumari, Monideepa Maity, Subrata Pal, Binoy Krishna Patra, Pooja, Jai Prakash, Manaswini Priyadarshini, Vyshakh B R, Marco Ruggieri, Nihar Ranjan Sahoo, Raghunath Sahoo, Om Shahi, Devanshu Sharma, et al. (2 additional authors not shown)

Abstract: The hot and dense QCD matter, known as the Quark-Gluon Plasma (QGP), is explored through heavy-ion collision experiments at the LHC and RHIC. Jets and heavy flavors, produced from the initial hard scattering, are used as hard probes to study the properties of the QGP. Recent experimental observations on jet quenching and heavy-flavor suppression have strengthened our understanding, allowing for fine-tuning of theoretical models in hard probes. The second conference, HOT QCD Matter 2024, was organized to bring the community together for discussions on key topics in the field. This article comprises 15 sections, each addressing various aspects of hard probes in relativistic heavy-ion collisions, offering a snapshot of current experimental observations and theoretical advancements. The article begins with a discussion on memory effects in the quantum evolution of quarkonia in the quark-gluon plasma, followed by an experimental review, new insights on jet quenching at RHIC and LHC, and concludes with a machine learning approach to heavy flavor production at the Large Hadron Collider.

Submitted 18 December, 2024; originally announced December 2024.

Comments: Compilation of the 15 contributions in Hard Probes presented at the second 'Hot QCD Matter 2024 Conference' held from July 1-3, 2024, organized by IIT Mandi, India

arXiv:2412.12552 [pdf, other]

SAModified: A Foundation Model-Based Zero-Shot Approach for Refining Noisy Land-Use Land-Cover Maps

Authors: Sparsh Pekhale, Rakshith Sathish, Sathisha Basavaraju, Divya Sharma

Abstract: Land-use and land cover (LULC) analysis is critical in remote sensing, with wide-ranging applications across diverse fields such as agriculture, utilities, and urban planning. However, automating LULC map generation using machine learning is rendered challenging due to noisy labels. Typically, the ground truths (e.g. ESRI LULC, MapBioMass) have noisy labels that hamper the model's ability to learn to accurately classify the pixels. Further, these erroneous labels can significantly distort the performance metrics of a model, leading to misleading evaluations. Traditionally, the ambiguous labels are rectified using unsupervised algorithms. These algorithms struggle not only with scalability but also with generalization across different geographies. To overcome these challenges, we propose a zero-shot approach using the foundation model, Segment Anything Model (SAM), to automatically delineate different land parcels/regions and leverage them to relabel the unsure pixels by using the local label statistics within each detected region. We achieve a significant reduction in label noise and an improvement in the performance of the downstream segmentation model by $\approx 5\%$ when trained with denoised labels.

Submitted 17 December, 2024; originally announced December 2024.

arXiv:2412.11836 [pdf]

UnMA-CapSumT: Unified and Multi-Head Attention-driven Caption Summarization Transformer

Authors: Dhruv Sharma, Chhavi Dhiman, Dinesh Kumar

Abstract: Image captioning is the generation of natural language descriptions of images, which has gained immense popularity in the recent past. With this, different deep-learning techniques have been devised for the development of factual and stylized image captioning models. Previous models focused more on the generation of factual and stylized captions separately, providing more than one caption for a single image. The descriptions generated from these suffer from out-of-vocabulary and repetition issues. To the best of our knowledge, no existing work provides a description that integrates different captioning methods to describe the contents of an image with factual and stylized (romantic and humorous) elements. To overcome these limitations, this paper presents a novel Unified Attention and Multi-Head Attention-driven Caption Summarization Transformer (UnMA-CapSumT) based Captioning Framework. It utilizes both factual captions and stylized captions generated by the Modified Adaptive Attention-based factual image captioning model (MAA-FIC) and Style Factored Bi-LSTM with attention (SF-Bi-ALSTM) driven stylized image captioning model respectively. The SF-Bi-ALSTM-based stylized IC model generates two prominent styles of expression: romance and humor. The proposed summarizer UnMHA-ST combines both factual and stylized descriptions of an input image to generate styled rich coherent summarized captions.
The proposed UnMHA-ST transformer learns and summarizes different linguistic styles efficiently by incorporating proposed word embedding fastText with Attention Word Embedding (fTA-WE) and pointer-generator network with coverage mechanism concept to solve the out-of-vocabulary issues and repetition problem. Extensive experiments are conducted on Flickr8K and a subset of FlickrStyle10K with supporting ablation studies to prove the efficiency and efficacy of the proposed framework.

Submitted 16 December, 2024; originally announced December 2024.

arXiv:2412.07112 [pdf, other]

Maya: An Instruction Finetuned Multilingual Multimodal Model

Authors: Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji

Abstract: The rapid development of large Vision-Language Models (VLMs) has led to impressive results on academic benchmarks, primarily in widely spoken languages. However, significant gaps remain in the ability of current VLMs to handle low-resource languages and varied cultural contexts, largely due to a lack of high-quality, diverse, and safety-vetted data. Consequently, these models often struggle to understand low-resource languages and cultural nuances in a manner free from toxicity. To address these limitations, we introduce Maya, an open-source Multimodal Multilingual model. Our contributions are threefold: 1) a multilingual image-text pretraining dataset in eight languages, based on the LLaVA pretraining dataset; 2) a thorough analysis of toxicity within the LLaVA dataset, followed by the creation of a novel toxicity-free version across eight languages; and 3) a multilingual image-text model supporting these languages, enhancing cultural and linguistic comprehension in vision-language tasks. Code available at https://github.com/nahidalam/maya.

Submitted 9 December, 2024; originally announced December 2024.

arXiv:2412.06009 [pdf, other]

1-800-SHARED-TASKS at RegNLP: Lexical Reranking of Semantic Retrieval (LeSeR) for Regulatory Question Answering

Authors: Jebish Purbey, Drishti Sharma, Siddhant Gupta, Khawaja Murad, Siddartha Pullakhandam, Ram Mohan Rao Kadiyala

Abstract: This paper presents the system description of our entry for the COLING 2025 RegNLP RIRAG (Regulatory Information Retrieval and Answer Generation) challenge, focusing on leveraging advanced information retrieval and answer generation techniques in regulatory domains. We experimented with a combination of embedding models, including Stella, BGE, CDE, and Mpnet, and leveraged fine-tuning and reranking for retrieving relevant documents in top ranks. We utilized a novel approach, LeSeR, which achieved competitive results with a recall@10 of 0.8201 and map@10 of 0.6655 for retrievals. This work highlights the transformative potential of natural language processing techniques in regulatory applications, offering insights into their capabilities for implementing a retrieval augmented generation system while identifying areas for future improvement in robustness and domain adaptation.

Submitted 8 December, 2024; originally announced December 2024.

Comments: 5 pages, Accepted to RegNLP @ COLING 2025

arXiv:2412.04351 [pdf, ps, other]

BhashaVerse : Translation Ecosystem for Indian Subcontinent Languages

Authors: Vandan Mujadia, Dipti Misra Sharma

Abstract: This paper focuses on developing translation models and related applications for 36 Indian languages, including Assamese, Awadhi, Bengali, Bhojpuri, Braj, Bodo, Dogri, English, Konkani, Gondi, Gujarati, Hindi, Hinglish, Ho, Kannada, Kangri, Kashmiri (Arabic and Devanagari), Khasi, Mizo, Magahi, Maithili, Malayalam, Marathi, Manipuri (Bengali and Meitei), Nepali, Oriya, Punjabi, Sanskrit, Santali, Sinhala, Sindhi (Arabic and Devanagari), Tamil, Tulu, Telugu, and Urdu. Achieving this requires parallel and other types of corpora for all 36 * 36 language pairs, addressing challenges like script variations, phonetic differences, and syntactic diversity. For instance, languages like Kashmiri and Sindhi, which use multiple scripts, demand script normalization for alignment, while low-resource languages such as Khasi and Santali require synthetic data augmentation to ensure sufficient coverage and quality. To address these challenges, this work proposes strategies for corpus creation by leveraging existing resources, developing parallel datasets, generating domain-specific corpora, and utilizing synthetic data techniques. Additionally, it evaluates machine translation across various dimensions, including standard and discourse-level translation, domain-specific translation, reference-based and reference-free evaluation, error analysis, and automatic post-editing.
By integrating these elements, the study establishes a comprehensive framework to improve machine translation quality and enable better cross-lingual communication in India's linguistically diverse ecosystem.

Submitted 2 January, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

arXiv:2412.03033 [pdf]

Unveiling Saving and Credit Dynamics: Insights from Financial Diaries and Surveys among Low-Income Households in Unauthorized Colonies in Delhi

Authors: Divya Sharma

Abstract: The paper presents findings from a comprehensive study examining the saving and credit behaviors of low-income households residing in unauthorized colonies within a metropolitan area. Utilizing a dual approach, the study engaged in prolonged fieldwork, including repeated fortnightly interviews with selected households and a one-time primary survey with a larger sample size. The research meticulously analyzed the financial lives of these households, focusing on their saving and credit behaviors and assessing the accessibility and intensity of usage of financial instruments available to them. Through suitable regression models, the study identified key factors influencing the usage of financial instruments among low-income households. Transaction costs, convenience, and financial knowledge emerged as significant determinants impacting both usage decisions and the intensity of usage. The research underscores the importance of addressing demand side factors to ensure widespread financial services usage among low-income groups. Efforts to reduce time costs, enhance product accessibility and liquidity, and augment financial literacy are essential for fostering financial inclusion in unauthorized colonies. The findings highlight the imperative of moving beyond mere financial access towards promoting universal usage to realize the full benefits of financial inclusion.

Submitted 4 December, 2024; originally announced December 2024.

Comments: 53, 8

arXiv:2412.01867 [pdf]

Decoding Financial Behaviour: An Analysis of urbanised households in India using AIDIS 77th round

Authors: Divya Sharma

Abstract: This research paper delves into the financial behavior of urbanized households in India, specifically focusing on million-plus agglomerations. Using data from the 77th round of the All India Debt and Investment Survey, the study analyzes assets, borrowing patterns, and the usage of financial instruments like bank accounts, e-wallets, and life insurance. The research explores demographic factors, household structures, and district-level parameters to understand the intricate financial landscape. With a focus on low-income households, the study identifies income, education, type of employment, and branches per capita as significant factors influencing financial behavior. The study contributes valuable insights for financial institutions, policymakers, and researchers seeking a comprehensive understanding of the financial lives of urban households in India.

Submitted 2 December, 2024; originally announced December 2024.

Comments: 36, 17

arXiv:2412.00549 [pdf, other]

SeQwen at the Financial Misinformation Detection Challenge Task: Sequential Learning for Claim Verification and Explanation Generation in Financial Domains

Authors: Jebish Purbey, Siddhant Gupta, Nikhil Manali, Siddartha Pullakhandam, Drishti Sharma, Ashay Srivastava, Ram Mohan Rao Kadiyala

Abstract: This paper presents the system description of our entry for the COLING 2025 FMD challenge, focusing on misinformation detection in financial domains. We experimented with a combination of large language models, including Qwen, Mistral, and Gemma-2, and leveraged pre-processing and sequential learning for not only identifying fraudulent financial content but also generating coherent and concise explanations that clarify the rationale behind the classifications. Our approach achieved competitive results with an F1-score of 0.8283 for classification, and ROUGE-1 of 0.7253 for explanations. This work highlights the transformative potential of LLMs in financial applications, offering insights into their capabilities for combating misinformation and enhancing transparency while identifying areas for future improvement in robustness and domain adaptation.

Submitted 30 November, 2024; originally announced December 2024.

Comments: 6 pages, 9 figures, Submitted to FinNLP-FNP-LLMFinLegal @ COLING 2025

arXiv:2412.00361 [pdf, other]

Investigating the relation between environment and internal structure of massive elliptical galaxies using strong lensing

Authors: S M Rafee Adnan, Muhammad Jobair Hasan, Ahmad Al-Imtiaz, Sulyman H. Robin, Fahim R. Shwadhin, Anowar J. Shajib, Mamun Hossain Nahid, Mehedi Hasan Tanver, Tanjela Akter, Nusrath Jahan, Zareef Jafar, Mamunur Rashid, Anik Biswas, Akbar Ahmed Chowdhury, Jannatul Feardous, Ajmi Rahaman, Masuk Ridwan, Rahul D. Sharma, Zannat Chowdhury, Mir Sazzat Hossain

Abstract: Strong lensing by massive galaxies probes their mass distribution, thus providing a window to study their internal structure, i.e., the distributions of luminous and dark matter. In this paper, we investigate the relation between the internal structure of massive elliptical galaxies and their environment using a sample of 15 strong lensing systems. We performed lens modeling for them using Lenstronomy and constrained the mass and light distributions of the deflector galaxies. We adopt the local galaxy density as a metric for the environment and test our results against several alternative definitions of it. We robustly find that the centroid offset between the mass and light is not correlated with the local galaxy density. This result supports using centroid offsets as a probe of dark matter theories since the environment's impact on it can be treated as negligible. Although we find a moderate to strong correlation between the position angle offset and the standard definition of the local galaxy density, consistent with previous studies, the correlation becomes weaker for alternative definitions of the local galaxy density. This result weakens the support for interpreting the position angle misalignment as having originated from interaction with the environment.
Furthermore, we find the 'residual shear' magnitude in the lens model to be uncorrelated with the local galaxy density, supporting the interpretation of the residual shear originating, in part, from the inadequacy in modeling the angular structure of the lensing galaxy and not solely from the structures present in the environment or along the line of sight.

Submitted 27 April, 2025; v1 submitted 30 November, 2024; originally announced December 2024.

Comments: 16 pages, 9 figures, 3 tables. Accepted version for A&A

arXiv:2411.19799

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Authors: Angelika Romanou, Negar Foroutan, Anna Sotnikova, Zeming Chen, Sree Harsha Nelaturu, Shivalika Singh, Rishabh Maheshwary, Micol Altomare, Mohamed A. Haggag, Snegha A, Alfonso Amayuelas, Azril Hafizi Amirudin, Viraat Aryabumi, Danylo Boiko, Michael Chang, Jenny Chim, Gal Cohen, Aditya Kumar Dalmia, Abraham Diress, Sharad Duwal, Daniil Dzenhaliou, Daniel Fernando Erazo Florez, Fabian Farestam, Joseph Marvin Imperial, Shayekh Bin Islam, et al. (34 additional authors not shown)

Abstract: The performance differential of large language models (LLMs) between languages hinders their effective deployment in many regions, inhibiting the potential economic and societal value of generative AI tools in many communities. However, the development of functional LLMs in many languages (i.e., multilingual LLMs) is bottlenecked by the lack of high-quality evaluation resources in languages other than English. Moreover, current practices in multilingual benchmark construction often translate English resources, ignoring the regional and cultural knowledge of the environments in which multilingual systems would be used. In this work, we construct an evaluation suite of 197,243 QA pairs from local exam sources to measure the capabilities of multilingual LLMs in a variety of regional contexts. Our novel resource, INCLUDE, is a comprehensive knowledge- and reasoning-centric benchmark across 44 written languages that evaluates multilingual LLMs for performance in the actual language environments where they would be deployed.

Submitted 29 November, 2024; originally announced November 2024.

arXiv:2411.08854

doi 10.1088/1475-7516/2025/03/017

Stochastic inflation and non-perturbative power spectrum beyond slow roll

Authors: Devanshu Sharma

Abstract: Stochastic inflation, together with the $ΔN$ formalism, provides a powerful tool for estimating the large-scale behaviour of primordial fluctuations. In this work, we develop a numerical code to capture the non-perturbative statistics of these fluctuations and validate it to obtain the exponential non-Gaussian tail of the curvature perturbations. We present a numerical algorithm to compute the non-perturbative curvature power spectrum and apply it to both slow-roll (SR) and ultra-slow-roll (USR) single-field models of inflation. We accurately generate a non-perturbative scale-invariant power spectrum in the SR scenario. In the USR case, we obtain a peak in the power spectrum that, in the time-independent regime, aligns with the structure of its perturbative counterpart. Additionally, we underscore how the evolving nature of the super-Hubble perturbations in the USR model complicates the numerical computation of the non-perturbative spectrum.

Submitted 7 March, 2025; v1 submitted 13 November, 2024; originally announced November 2024.

Comments: 26 pages, 9 figures, matched with the published version in JCAP

Journal ref: JCAP03(2025)017

Showing 1–50 of 476 results for author: Sharma, D