
Bilingual Adaptation of Monolingual Foundation Models

Authors: Gurpreet Gosal, Yishi Xu, Gokul Ramakrishnan, Rituraj Joshi, Avraham Sheinin, Zhiming Chen, Biswajit Mishra, Natalia Vassilieva, Joel Hestness, Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Onkar Pandit, Satheesh Katipomu, Samta Kamboj, Samujjwal Ghosh, Rahul Pal, Parvez Mullah, Soundar Doraiswamy, Mohamed El Karim Chami, Preslav Nakov

Abstract: We present an efficient method for adapting a monolingual Large Language Model (LLM) to another language, addressing challenges of catastrophic forgetting and tokenizer limitations. We focus this study on adapting Llama 2 to Arabic. Our two-stage approach begins with expanding the vocabulary and training only the embeddings matrix, followed by full model continual pre-training on a bilingual corpus. By continually pre-training on a mix of Arabic and English corpora, the model retains its proficiency in English while acquiring capabilities in Arabic. Our approach results in significant improvements in Arabic and slight enhancements in English, demonstrating cost-effective cross-lingual transfer. We perform ablations on embedding initialization techniques, data mix ratios, and learning rates and release a detailed training recipe. To demonstrate generalizability of this approach we also adapted Llama 3 8B to Arabic and Llama 2 13B to Hindi.

Submitted 25 July, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.11973 [pdf, other]

Preliminary Study of the Impact of AI-Based Interventions on Health and Behavioral Outcomes in Maternal Health Programs

Authors: Arpan Dasgupta, Niclas Boehmer, Neha Madhiwalla, Aparna Hedge, Bryan Wilder, Milind Tambe, Aparna Taneja

Abstract: Automated voice calls are an effective method of delivering maternal and child health information to mothers in underserved communities. One method to fight dwindling listenership is through an intervention in which health workers make live service calls. Previous work has shown that we can use AI to identify beneficiaries whose listenership gets the greatest boost from an intervention. It has also been demonstrated that listening to the automated voice calls consistently leads to improved health outcomes for the beneficiaries of the program. These two observations combined suggest the positive effect of AI-based intervention scheduling on behavioral and health outcomes. This study analyzes the relationship between the two. Specifically, we are interested in mothers' health knowledge in the post-natal period, measured through survey questions. We present evidence that improved listenership through AI-scheduled interventions leads to a better understanding of key health issues during pregnancy and infancy. This improved understanding has the potential to benefit the health outcomes of mothers and their babies.

Submitted 23 May, 2024; originally announced July 2024.

Comments: Accepted at Autonomous Agents for Social Good (AASG) workshop at AAMAS'24

arXiv:2407.09722 [pdf, other]

Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference

Authors: Zongyue Qin, Ziniu Hu, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun

Abstract: Large language models (LLMs) have achieved remarkable success across diverse tasks, yet their inference processes are hindered by substantial time and energy demands due to single-token generation at each decoding step. While previous methods such as speculative decoding mitigate these inefficiencies by producing multiple tokens per step, each token is still generated by its single-token distribution, thereby enhancing speed without improving effectiveness. In contrast, our work simultaneously enhances inference speed and improves the output effectiveness. We consider multi-token joint decoding (MTJD), which generates multiple tokens from their joint distribution at each iteration, theoretically reducing perplexity and enhancing task performance. However, MTJD suffers from the high cost of sampling from the joint distribution of multiple tokens. Inspired by speculative decoding, we introduce multi-token assisted decoding (MTAD), a novel framework designed to accelerate MTJD. MTAD leverages a smaller auxiliary model to approximate the joint distribution of a larger model, incorporating a verification mechanism that not only ensures the accuracy of this approximation, but also improves the decoding efficiency over conventional speculative decoding. Theoretically, we demonstrate that MTAD closely approximates exact MTJD with bounded error.
Empirical evaluations using Llama-2 and OPT models ranging from 13B to 70B parameters across various tasks reveal that MTAD reduces perplexity by 21.2% and improves downstream performance compared to standard single-token sampling. Furthermore, MTAD achieves a 1.42x speed-up and consumes 1.54x less energy than conventional speculative decoding methods. These results highlight MTAD's ability to make multi-token joint decoding both effective and efficient, promoting more sustainable and high-performance deployment of LLMs.

Submitted 9 April, 2025; v1 submitted 12 July, 2024; originally announced July 2024.

Journal ref: ICLR 2025

arXiv:2407.07526 [pdf, other]

ler : LVK (LIGO-Virgo-KAGRA collaboration) event (compact-binary mergers) rate calculator and simulator

Authors: Hemantakumar Phurailatpam, Anupreeta More, Harsh Narola, Ng Chung Yin, Justin Janquart, Chris Van Den Broeck, Otto Akseli Hannuksela, Neha Singh, David Keitel

Abstract: '$ler$' is a statistics-based Python package specifically designed for computing detectable rates of both lensed and unlensed GW events, catering to the requirements of the LIGO-Virgo-KAGRA Scientific Collaboration and astrophysics research scholars. The core functionality of '$ler$' intricately hinges upon the interplay of various components which include sampling the properties of compact-binary sources, lens galaxies characteristics, solving lens equations to derive properties of resultant images, and computing detectable GW rates. This comprehensive functionality builds on the leveraging of array operations and linear algebra from the $numpy$ library, enhanced by interpolation methods from $scipy$ and Python's $multiprocessing$ capabilities. Efficiency is further boosted by the $numba$ library's Just-In-Time ($njit$) compilation, optimizing extensive numerical computations and employing the inverse transform sampling method to replace more cumbersome rejection sampling. The modular design of '$ler$' not only optimizes speed and functionality but also ensures adaptability and upgradability, supporting the integration of additional statistics as research evolves. Currently, '$ler$' is an important tool in generating simulated GW events, both lensed and unlensed, and provides astrophysically accurate distributions of event-related parameters for both detectable and non-detectable events. This functionality aids in event validation and enhances the forecasting of detection capabilities across various GW detectors to study such events.
The architecture of the '$ler$' API facilitates seamless compatibility with other software packages, allowing researchers to integrate and utilize its functionalities based on specific scientific requirements.

Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 5 pages, 1 Logo in each of the pages, this is for the JOSS publication

arXiv:2407.06761 [pdf, other]

Spectro-polarimetric view of the gamma-ray emitting NLS1 1H0323+342

Authors: Jincen Jose, Suvendu Rakshit, Swayamtrupta Panda, Jong-Hak Woo, C. S. Stalin, Neha Sharma, Shivangi Pandey

Abstract: The gamma-ray emitting narrow-line Seyfert 1 galaxies are a unique class of objects that launch powerful jets from relatively lower-mass black hole systems compared to the Blazars. However, the black hole masses estimated from the total flux spectrum suffer from the projection effect, making the mass measurement highly uncertain. The polarized spectrum provides a unique view of the central engine through scattered light. We performed spectro-polarimetric observations of the gamma-ray emitting narrow-line Seyfert 1 galaxy 1H0323+342 using SPOL/MMT. The degree of polarization and polarization angle is 0.122 $\pm$ 0.040 % and 142 $\pm$ 9 degrees, while the H$α$ line is polarized at 0.265 $\pm$ 0.280 %. We decomposed the total flux spectrum and estimated broad H$α$ FWHM of 1015 km/s. The polarized flux spectrum shows a broadening similar to the total flux spectrum, with a broadening ratio of 1.22. The Monte Carlo radiative transfer code `STOKES' applied to the data provides the best fit for a small viewing angle of 9-24 degrees and a small optical depth ratio between the polar and the equatorial scatters. A thick BLR with significant scale height can explain a similar broadening of the polarized spectrum compared to the total flux spectrum with a small viewing angle.

Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: Accepted for publication in MNRAS (Jul. 08th, 2024). 12 pages, 6 figures

arXiv:2407.01688 [pdf, other]

How We Built Cedar: A Verification-Guided Approach

Authors: Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Michael Hicks, Kesha Hietala, John Kastner, Anwar Mamat, Matt McCutchen, Neha Rungta, Bhakti Shah, Emina Torlak, Andrew Wells

Abstract: This paper presents verification-guided development (VGD), a software engineering process we used to build Cedar, a new policy language for expressive, fast, safe, and analyzable authorization. Developing a system with VGD involves writing an executable model of the system and mechanically proving properties about the model; writing production code for the system and using differential random testing (DRT) to check that the production code matches the model; and using property-based testing (PBT) to check properties of unmodeled parts of the production code. Using VGD for Cedar, we can build fast, idiomatic production code, prove our model correct, and find and fix subtle implementation bugs that evade code reviews and unit testing. While carrying out proofs, we found and fixed 4 bugs in Cedar's policy validator, and DRT and PBT helped us find and fix 21 additional bugs in various parts of Cedar.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.19241 [pdf, other]

Stiefel-Whitney Classes for Finite Special Linear Groups of Even Rank

Authors: Neha Malik, Steven Spallone

Abstract: We compute the total Stiefel-Whitney Classes (SWCs) for orthogonal representations of special linear groups $\text{SL}(n,q)$ when $n$ and $q$ are odd. These classes are expressed in terms of character values at diagonal elements of order $2$. We give several consequences, and work out the $4$th SWC explicitly, and the $8$th SWC when the $4$th vanishes.

Submitted 7 March, 2025; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 18 pages

MSC Class: 20G40; 55R40

arXiv:2406.15554 [pdf, other]

doi 10.1051/0004-6361/202451037

Testing particle acceleration in blazar jets with continuous high-cadence optical polarization observations

Authors: Ioannis Liodakis, Sebastian Kiehlmann, Alan P. Marscher, Haocheng Zhang, Dmitry Blinov, Svetlana G. Jorstad, Iván Agudo, Erika Benítez, Andrei Berdyugin, Giacomo Bonnoli, Carolina Casadio, Chien-Ting Chen, Wen-Ping Chen, Steven R. Ehlert, Juan Escudero, Tatiana S. Grishina, David Hiriart, Angela Hsu, Ryo Imazawa, Helen E. Jermak, Jincen Jose, Philip Kaaret, Evgenia N. Kopatskaya, Bhavana Lalchand, Elena G. Larionova, et al. (22 additional authors not shown)

Abstract: Variability can be the pathway to understanding the physical processes in astrophysical jets, however, the high-cadence observations required to test particle acceleration models are still missing. Here we report on the first attempt to produce continuous, >24 hour polarization light curves of blazars using telescopes distributed across the globe and the rotation of the Earth to avoid the rising Sun. Our campaign involved 16 telescopes in Asia, Europe, and North America. We observed BL Lacertae and CGRaBS J0211+1051 for a combined 685 telescope hours. We find large variations in the polarization degree and angle for both sources in sub-hour timescales as well as a ~180 degree rotation of the polarization angle in CGRaBS J0211+1051 in less than two days. We compared our high-cadence observations to Particle-In-Cell magnetic reconnection and turbulent plasma simulations. We find that although the state of the art simulation frameworks can produce a large fraction of the polarization properties, they do not account for the entirety of the observed polarization behavior in blazar jets.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 20 pages, 15 figures, 2 tables, accepted for publication in A&A. The data used in the paper are available here: https://doi.org/10.7910/DVN/IETSXS

Journal ref: A&A 689, A200 (2024)

arXiv:2406.08802 [pdf, other]

DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing

Authors: Neha Sahipjohn, Ashishkumar Gudmalwar, Nirmesh Shah, Pankaj Wasnik, Rajiv Ratn Shah

Abstract: Audio-visual alignment after dubbing is a challenging research problem. To this end, we propose a novel method, DubWise Multi-modal Large Language Model (LLM)-based Text-to-Speech (TTS), which can control the speech duration of synthesized speech in such a way that it aligns well with the speaker's lip movements given in the reference video even when the spoken text is different or in a different language. To accomplish this, we propose to utilize cross-modal attention techniques in a pre-trained GPT-based TTS. We combine linguistic tokens from text, speaker identity tokens via a voice cloning network, and video tokens via a proposed duration controller network. We demonstrate the effectiveness of our system on the Lip2Wav-Chemistry and LRS2 datasets. Also, the proposed method achieves improved lip sync and naturalness compared to the SOTAs for the same language but different text (i.e., non-parallel) and the different language, different text (i.e., cross-lingual) scenarios.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted at INTERSPEECH 2024

arXiv:2406.02193 [pdf, other]

doi 10.1051/0004-6361/202449720

Optical polarimetry study of Lambda-Orionis star-forming region

Authors: Sharma Neha, Archana Soam, G. Maheswar

Abstract: We present an optical polarimetric study of a nearby star-forming region, Lambda-Orionis, to map plane-of-the-sky magnetic field geometry to understand the magnetized evolution of the HII region and associated small molecular clouds. We made multi-wavelength polarization observations of 34 bright stars distributed across the region. R-band polarization measurements focused on small molecular clouds BRC 17 and BRC 18 located at the periphery of the HII region are also presented. The magnetic field lines exhibit a large-scale ordered orientation consistent with the Planck sub-mm polarization measurements. The magnetic field lines in both the BRCs are found to be roughly in north-south directions; however, a larger dispersion is noticed in the orientation for BRC 17 compared to BRC 18. Using structure-function analysis, the strength of the plane-of-the-sky component of the magnetic field is estimated as $\sim$28 $μ$G for BRC 17 and $\sim$40 $μ$G for BRC 18. The average dust grain size and the mean value of the total-to-selective extinction ratio (R$_{V}$) in the HII region are found to be $\sim$0.51 $\pm$ 0.05 $μ$m and $\sim$2.9 $\pm$ 0.3, respectively. The distance of the whole HII region is estimated as $\sim$392 $\pm$ 8 pc by combining astrometry information from GAIA EDR3 for YSOs associated with BRCs and confirmed members of central cluster Collinder 69.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 17 pages, Accepted in A&A on 20/05/2024

Journal ref: A&A 689, A225 (2024)

arXiv:2405.20070 [pdf]

doi 10.1038/s41598-024-58935-6

Pick-up and assembling of chemically sensitive van der Waals heterostructures using dry cryogenic exfoliation

Authors: Vilas Patil, Sanat Ghosh, Amit Basu, Kuldeep, Achintya Dutta, Khushabu Agrawal, Neha Bhatia, Amit Shah, Digambar A. Jangade, Ruta Kulkarni, A. Thamizhavel, Mandar M. Deshmukh

Abstract: Assembling atomic layers of van der Waals materials (vdW) combines the physics of two materials, offering opportunities for novel functional devices. Realization of this has been possible because of advancements in nanofabrication processes which often involve chemical processing of the materials under study; this can be detrimental to device performance. To address this issue, we have developed a modified micro-manipulator setup for cryogenic exfoliation, pick up, and transfer of vdW materials to assemble heterostructures. We use the glass transition of a polymer PDMS to cleave a flake into two, followed by its pick-up and drop to form pristine twisted junctions. To demonstrate the potential of the technique, we fabricated twisted heterostructure of Bi$_2$Sr$_2$CaCu$_2$O$_{8+x}$ (BSCCO), a van der Waals high-temperature cuprate superconductor. We also employed this method to re-exfoliate NbSe$_2$ and make twisted heterostructure. Transport measurements of the fabricated devices indicate the high quality of the artificial twisted interface. In addition, we extend this cryogenic exfoliation method for other vdW materials, offering an effective way of assembling heterostructures and twisted junctions with pristine interfaces.

Submitted 30 May, 2024; originally announced May 2024.

Journal ref: Scientific Reports 14, Article number: 11097 (2024)

arXiv:2405.19261 [pdf, other]

Faster Cascades via Speculative Decoding

Authors: Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Seungyeon Kim, Neha Gupta, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in parallel verification mode. These mechanisms offer different benefits: empirically, cascades offer better cost-quality trade-offs, often even outperforming the large model, while theoretically, speculative decoding offers a guarantee of quality-neutrality. In this paper, we leverage the best of both these approaches by designing new speculative cascading techniques that implement their deferral rule through speculative execution. We characterize the optimal deferral rule for our speculative cascades, and employ a plug-in approximation to the optimal rule. Experiments with Gemma and T5 models on a range of language benchmarks show that our approach yields better cost-quality trade-offs than cascading and speculative decoding baselines.

Submitted 21 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18061 [pdf, other]

doi 10.18653/v1/2024.wassa-1.28

Context is Important in Depressive Language: A Study of the Interaction Between the Sentiments and Linguistic Markers in Reddit Discussions

Authors: Neha Sharma, Kairit Sirts

Abstract: Research exploring linguistic markers in individuals with depression has demonstrated that language usage can serve as an indicator of mental health. This study investigates the impact of discussion topic as context on linguistic markers and emotional expression in depression, using a Reddit dataset to explore interaction effects. Contrary to common findings, our sentiment analysis revealed a broader range of emotional intensity in depressed individuals, with both higher negative and positive sentiments than controls. This pattern was driven by posts containing no emotion words, revealing the limitations of lexicon-based approaches in capturing the full emotional context. We observed several interesting results demonstrating the importance of contextual analyses. For instance, the use of 1st person singular pronouns and words related to anger and sadness correlated with increased positive sentiments, whereas a higher rate of present-focused words was associated with more negative sentiments. Our findings highlight the importance of discussion contexts while interpreting the language used in depression, revealing that the emotional intensity and meaning of linguistic markers can vary based on the topic of discussion.

Submitted 3 July, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Report number: 2024.wassa-1.28

Journal ref: https://aclanthology.org/2024.wassa-1.28

arXiv:2405.17342 [pdf]

doi 10.1088/2753-3751/ad7d10

Measuring Exploration: Review and Systematic Evaluation of Modelling to Generate Alternatives Methods in Macro-Energy Systems Planning Models

Authors: Michael Lau, Neha Patankar, Jesse D. Jenkins

Abstract: As decarbonization agendas mature, macro-energy systems modelling studies have increasingly focused on enhanced decision support methods that move beyond least-cost modelling to improve consideration of additional objectives and tradeoffs. One candidate is Modeling to Generate Alternatives (MGA), which systematically explores new objectives without explicit stakeholder elicitation. Previous literature lacks both a comprehensive review of MGA vector selection methods in large-scale energy system models and comparative testing of their relative efficacies in this setting. To fill this gap, this paper provides a comprehensive review of the MGA literature, identifying at least seven MGA vector selection methodologies and carrying out a systematic evaluation of four: Hop-Skip-Jump, Random Vector, Variable Min/Max, and Modelling All Alternatives. We examine each method's runtime, parallelizability, new solution discovery efficiency, and spatial exploration in lower dimensional (N <= 100) spaces, as well as spatial exploration in a three-zone, 8760-hour capacity expansion model case. Through these tests, we find Random Vector provides the broadest exploration of the near-optimal feasible region and Variable Min/Max provides the most extreme results, while the two tie on computational speed. We thus propose a new Hybrid vector selection approach combining the two methods to take advantage of the strengths of each.
Additional analysis is provided on MGA variable selection, in which we demonstrate MGA problems formulated over generation variables fail to retain cost-optimal dispatch and are thus not reflective of real operations of equivalent hypothetical capacity choices. As such, we recommend future studies utilize a parallelized combined vector approach over the set of capacity variables for best results in computational speed and spatial exploration while retaining optimal dispatch.

Submitted 27 May, 2024; originally announced May 2024.

Journal ref: Environmental Research: Energy, Volume 1, Number 4, 2024

arXiv:2405.16401 [pdf, other]

Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

Authors: Neha Kalibhat, Priyatham Kattakinda, Sumit Nawathe, Arman Zarei, Nikita Seleznev, Samuel Sharpe, Senthil Kumar, Soheil Feizi

Abstract: Vision transformers have established a precedent of patchifying images into uniformly-sized chunks before processing. We hypothesize that this design choice may limit models in learning comprehensive and compositional representations from visual data. This paper explores the notion of providing semantically-meaningful visual tokens to transformer encoders within a vision-language pre-training framework. Leveraging off-the-shelf segmentation and scene-graph models, we extract representations of instance segmentation masks (referred to as tangible tokens) and relationships and actions (referred to as intangible tokens). Subsequently, we pre-train a vision-side transformer by incorporating these newly extracted tokens and aligning the resultant embeddings with caption embeddings from a text-side encoder. To capture the structural and semantic relationships among visual tokens, we introduce additive attention weights, which are used to compute self-attention scores. Our experiments on COCO demonstrate notable improvements over ViTs in learned representation quality across text-to-image (+47%) and image-to-text retrieval (+44%) tasks. Furthermore, we showcase the advantages on compositionality benchmarks such as ARO (+18%) and Winoground (+10%).

Submitted 19 May, 2025; v1 submitted 25 May, 2024; originally announced May 2024.

Comments: Published at CVPR Workshops 2025

arXiv:2405.11056 [pdf, other]

A Comparative Study of Garment Draping Techniques

Authors: Prerana Achar, Mayank Patel, Anushka Mulik, Neha Katre, Stevina Dias, Chirag Raman

Abstract: We present a comparison review that evaluates popular techniques for garment draping for 3D fashion design, virtual try-ons, and animations. A comparative study is performed between various methods for garment draping of clothing over the human body. These include numerous models, such as physics and machine learning based techniques, collision handling, and more. Performance evaluations and trade-offs are discussed to ensure informed decision-making when choosing the most appropriate approach. These methods aim to accurately represent deformations and fine wrinkles of digital garments, considering the factors of data requirements, and efficiency, to produce realistic results. The research can be insightful to researchers, designers, and developers in visualizing dynamic multi-layered 3D clothing.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.09966 [pdf, ps, other]

Tempered Fractional Hawkes Process and Its Generalization

Authors: Neha Gupta, Aditya Maheshwari

Abstract: Hawkes process (HP) is a point process with a conditionally dependent intensity function. This paper defines the tempered fractional Hawkes process (TFHP) by time-changing the HP with an inverse tempered stable subordinator. We obtained results that generalize the fractional Hawkes process defined in Hainaut (2020) to a tempered version which has \textit{semi-heavy tailed} decay. We derive the mean, the variance, covariance and the governing fractional difference-differential equations of the TFHP. Additionally, we introduce the generalized fractional Hawkes process (GFHP) by time-changing the HP with the inverse Lévy subordinator. This definition encompasses all potential (inverse Lévy) time changes as specific instances. We also explore the distributional characteristics and the governing difference-differential equation of the one-dimensional distribution for the GFHP.

Submitted 16 May, 2024; originally announced May 2024.

Comments: 15 pages

MSC Class: 60G22; 60G51; 60G55

arXiv:2405.07995 [pdf, ps, other]

On estimation of Hankel determinants for certain class of starlike functions

Authors: S. Sivaprasad Kumar, Neha Verma

Abstract: In the present study, we consider two subclasses of starlike and convex functions, denoted by $\mathcal{S}_{\mathcal{B}}^{*}$ and $\mathcal{C}_{\mathcal{B}}$ respectively, associated with a bean-shaped domain. Further, we estimate certain sharp initial coefficients, as well as second, third and fourth-order Hankel determinants for functions belonging to the class $\mathcal{S}_{\mathcal{B}}^{*}$. Additionally, we compute sharp second and third-order Hankel determinants for functions belonging to the $\mathcal{C}_{\mathcal{B}}$ class.

Submitted 22 April, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2210.01435

arXiv:2405.06080 [pdf, other]

Scalable Learning of Segment-Level Traffic Congestion Functions

Authors: Shushman Choudhury, Abdul Rahman Kreidieh, Iveel Tsogsuren, Neha Arora, Carolina Osorio, Alexandre Bayen

Abstract: We propose and study a data-driven framework for identifying traffic congestion functions (numerical relationships between observations of traffic variables) at global scale and segment-level granularity. In contrast to methods that estimate a separate set of parameters for each roadway, ours learns a single black-box function over all roadways in a metropolitan area. First, we pool traffic data from all segments into one dataset, combining static attributes with dynamic time-dependent features. Second, we train a feed-forward neural network on this dataset, which we can then use on any segment in the area. We evaluate how well our framework identifies congestion functions on observed segments and how it generalizes to unobserved segments and predicts segment attributes on a large dataset covering multiple cities worldwide. For identification error on observed segments, our single data-driven congestion function compares favorably to segment-specific model-based functions on highway roads, but has room to improve on arterial roads. For generalization, our approach shows strong performance across cities and road types: both on unobserved segments in the same city and on zero-shot transfer learning between cities. Finally, for predicting segment attributes, we find that our approach can approximate critical densities for individual segments using their static properties.

Submitted 25 September, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: Published at IEEE ITSC 2024

arXiv:2405.06067 [pdf, other]

HMT: Hierarchical Memory Transformer for Efficient Long Context Language Processing

Authors: Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong

Abstract: Transformer-based large language models (LLM) have been widely used in language processing applications. However, due to the memory constraints of the devices, most of them restrict the context window. Even though recurrent models in previous works can memorize past tokens to enable unlimited context and maintain effectiveness, they have ``flat'' memory architectures. Such architectures have limitations in selecting and filtering information. Since humans are good at learning and self-adjustment, we believe that imitating brain memory hierarchy is beneficial for model memorization. Thus, we propose the Hierarchical Memory Transformer (HMT), a novel framework that facilitates a model's long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input segments, passing memory embeddings along the sequence, and recalling relevant information from history. Evaluating general language modeling, question-answering tasks, and the summarization task, we show that HMT consistently improves the long-context processing ability of existing models. Furthermore, HMT achieves a comparable or superior generation quality to long-context LLMs with $2 \sim 57\times$ fewer parameters and $2.5 \sim 116\times$ less inference memory, significantly outperforming previous memory-augmented models. Code on Github: https://github.com/OswaldHe/HMT-pytorch.

Submitted 6 February, 2025; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: NAACL 2025 Main Conference

arXiv:2405.06060 [pdf, other]

doi 10.1063/5.0219503

GdWN$_3$ is a Nitride Perovskite

Authors: Rebecca W. Smaha, John S. Mangum, Neha Yadav, Christopher L. Rom, Brian M. Wieliczka, Baptiste Julien, Andrew Treglia, Craig L. Perkins, Prashun Gorai, Sage R. Bauers, Andriy Zakutayev

Abstract: Nitride perovskites $AB$N$_3$ are an emerging and highly under-explored class of materials that are of interest due to their intriguing calculated ferroelectric, optoelectronic, and other functional properties. Incorporating novel $A$-site cations is one strategy to tune and expand such properties; for example, Gd$^{3+}$ is compelling due to its large magnetic moment, potentially leading to multiferroic behavior. However, the theoretically predicted ground state of GdWN$_3$ is a non-perovskite monoclinic structure. Here, we experimentally show that GdWN$_3$ crystallizes in a perovskite structure. High-throughput combinatorial sputtering with activated nitrogen is employed to synthesize thin films of Gd$_{1-x}$W$_{x}$N$_{3-y}$ with low oxygen content within the bulk of the films. Ex-situ annealing crystallizes a polycrystalline perovskite phase in a narrow composition window near $x=1$. LeBail fits of synchrotron grazing incidence wide angle X-ray scattering data are consistent with a perovskite ground-state structure. New density functional theory calculations that included antiferromagnetic configurations confirm that the ground-state structure of GdWN$_3$ is a distorted $Pnma$ perovskite with antiferromagnetic ordering, in contrast to prior predictions. Initial property measurements find that GdWN$_3$ is paramagnetic down to $T=2$ K with antiferromagnetic correlations and that the absorption onset depends on cation stoichiometry. This work provides an important stepping stone towards the rapid expansion of the emerging family of nitride perovskites and towards our understanding of their potential multiferroic properties.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

Journal ref: Applied Physics Letters 125, 112902 (2024)

arXiv:2405.00204 [pdf, other]

General Purpose Verification for Chain of Thought Prompting

Authors: Robert Vacareanu, Anurag Pratik, Evangelia Spiliopoulou, Zheng Qi, Giovanni Paolini, Neha Anna John, Jie Ma, Yassine Benajiba, Miguel Ballesteros

Abstract: Many of the recent capabilities demonstrated by Large Language Models (LLMs) arise primarily from their ability to exploit contextual information. In this paper, we explore ways to improve reasoning capabilities of LLMs through (1) exploration of different chains of thought and (2) validation of the individual steps of the reasoning process. We propose three general principles that a model should adhere to while reasoning: (i) Relevance, (ii) Mathematical Accuracy, and (iii) Logical Consistency. We apply these constraints to the reasoning steps generated by the LLM to improve the accuracy of the final generation. The constraints are applied in the form of verifiers: the model itself is asked to verify if the generated steps satisfy each constraint. To further steer the generations towards high-quality solutions, we use the perplexity of the reasoning steps as an additional verifier. We evaluate our method on 4 distinct types of reasoning tasks, spanning a total of 9 different datasets. Experiments show that our method is always better than vanilla generation, and, in 6 out of the 9 datasets, it is better than best-of N sampling which samples N reasoning chains and picks the lowest perplexity generation.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 22 pages, preprint

arXiv:2404.17474 [pdf, other]

Establishing best practices for modeling long duration energy storage in deeply decarbonized energy systems

Authors: Gabriel Mantegna, Wilson Ricks, Aneesha Manocha, Neha Patankar, Dharik Mallapragada, Jesse Jenkins

Abstract: Long duration energy storage (LDES) may become a critical technology for the decarbonization of the power sector, as current commercially available Li-ion battery storage technologies cannot cost-effectively shift energy to address multi-day or seasonal variability in demand and renewable energy availability. LDES is difficult to model in existing energy system planning models (such as electricity system capacity expansion models), as it is much more dependent on an accurate representation of chronology than other resources. Techniques exist for modeling LDES in these planning models; however, it is not known how spatial and temporal resolution affect the performance of these techniques, creating a research gap. In this study we examine what spatial and temporal resolution is necessary to accurately capture the full value of LDES, in the context of a continent-scale capacity expansion model. We use the results to draw conclusions and present best practices for modelers seeking to accurately model LDES in a macro-energy systems planning context. Our key findings are: 1) modeling LDES with linked representative periods is crucial to capturing its full value, 2) LDES value is highly sensitive to the cost and availability of other resources, and 3) temporal resolution is more important than spatial resolution for capturing the full value of LDES, although how much temporal resolution is needed will depend on the specific model context.

Submitted 26 April, 2024; originally announced April 2024.

Comments: Working paper

arXiv:2404.16893 [pdf, other]

Automatic AI controller that can drive with confidence: steering vehicle with uncertainty knowledge

Authors: Neha Kumari, Sumit Kumar, Sneha Priya, Ayush Kumar, Akash Fogla

Abstract: In safety-critical systems that interface with the real world, the role of uncertainty in decision-making is pivotal, particularly in the context of machine learning models. For the secure functioning of Cyber-Physical Systems (CPS), it is imperative to manage such uncertainty adeptly. In this research, we focus on the development of a vehicle's lateral control system using a machine learning framework. Specifically, we employ a Bayesian Neural Network (BNN), a probabilistic learning model, to address uncertainty quantification. This capability allows us to gauge the level of confidence or uncertainty in the model's predictions. The BNN based controller is trained using simulated data gathered from the vehicle traversing a single track and subsequently tested on various other tracks. We want to share two significant results: firstly, the trained model demonstrates the ability to adapt and effectively control the vehicle on multiple similar tracks. Secondly, the quantification of prediction confidence integrated into the controller serves as an early-warning system, signaling when the algorithm lacks confidence in its predictions and is therefore susceptible to failure. By establishing a confidence threshold, we can trigger manual intervention, ensuring that control is relinquished from the algorithm when it operates outside of safe parameters.

Submitted 24 April, 2024; originally announced April 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2303.08187

arXiv:2404.14548 [pdf, ps, other]

Advancing a Consent-Forward Paradigm for Digital Mental Health Data

Authors: Sachin R. Pendse, Logan Stapleton, Neha Kumar, Munmun De Choudhury, Stevie Chancellor

Abstract: The field of digital mental health is advancing at a rapid pace. Passively collected data from user engagements with digital tools and services continue to contribute new insights into mental health and illness. As the field of digital mental health grows, a concerning norm has been established -- digital service users are given little say over how their data is collected, shared, or used to generate revenue for private companies. Given a long history of service user exclusion from data collection practices, we propose an alternative approach that is attentive to this history: the consent-forward paradigm. This paradigm embeds principles of affirmative consent in the design of digital mental health tools and services, strengthening trust through designing around individual choices and needs, and proactively protecting users from unexpected harm. In this perspective, we outline practical steps to implement this paradigm, toward ensuring that people searching for care have the safest experiences possible.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 15 pages with 2 tables

arXiv:2404.13270 [pdf, other]

StrideNET: Swin Transformer for Terrain Recognition with Dynamic Roughness Extraction

Authors: Maitreya Shelare, Neha Shigvan, Atharva Satam, Poonam Sonar

Abstract: The field of remote-sensing image classification has seen immense progress with the rise of convolutional neural networks, and more recently, through vision transformers. These models, with their self-attention mechanism, can effectively capture global relationships and long-range dependencies between the image patches, in contrast with traditional convolutional models. This paper introduces StrideNET, a dual-branch transformer-based model developed for terrain recognition and surface roughness extraction. The terrain recognition branch employs the Swin Transformer to classify varied terrains by leveraging its capability to capture both local and global features. Complementing this, the roughness extraction branch utilizes a statistical texture-feature analysis technique to dynamically extract important land surface properties such as roughness and slipperiness. The model was trained on a custom dataset consisting of four terrain classes - grassy, marshy, sandy, and rocky, and it outperforms benchmark CNN and transformer based models, by achieving an average test accuracy of over 99 % across all classes. The applications of this work extend to different domains such as environmental monitoring, land use and cover classification, disaster response and precision agriculture.

Submitted 19 September, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures, 3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI)

arXiv:2404.11717 [pdf, other]

How often are errors in natural language reasoning due to paraphrastic variability?

Authors: Neha Srikanth, Marine Carpuat, Rachel Rudinger

Abstract: Large language models have been shown to behave inconsistently in response to meaning-preserving paraphrastic inputs. At the same time, researchers evaluate the knowledge and reasoning abilities of these models with test evaluations that do not disaggregate the effect of paraphrastic variability on performance. We propose a metric for evaluating the paraphrastic consistency of natural language reasoning models based on the probability of a model achieving the same correctness on two paraphrases of the same problem. We mathematically connect this metric to the proportion of a model's variance in correctness attributable to paraphrasing. To estimate paraphrastic consistency, we collect ParaNLU, a dataset of 7,782 human-written and validated paraphrased reasoning problems constructed on top of existing benchmark datasets for defeasible and abductive natural language inference. Using ParaNLU, we measure the paraphrastic consistency of several model classes and show that consistency dramatically increases with pretraining but not finetuning. All models tested exhibited room for improvement in paraphrastic consistency.

Submitted 17 April, 2024; originally announced April 2024.

Comments: accepted to TACL 2024 (pre-MIT Press publication version)

arXiv:2404.10136 [pdf, other]

Language Model Cascades: Token-level uncertainty and beyond

Authors: Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

Abstract: Recent advances in language models (LMs) have led to significant improvements in quality on complex NLP tasks, but at the expense of increased inference costs. Cascading offers a simple strategy to achieve more favorable cost-quality tradeoffs: here, a small model is invoked for most "easy" instances, while a few "hard" instances are deferred to the large model. While the principles underpinning cascading are well-studied for classification tasks - with deferral based on predicted class uncertainty favored theoretically and practically - a similar understanding is lacking for generative LM tasks. In this work, we initiate a systematic study of deferral rules for LM cascades. We begin by examining the natural extension of predicted class uncertainty to generative LM tasks, namely, the predicted sequence uncertainty. We show that this measure suffers from the length bias problem, either over- or under-emphasizing outputs based on their lengths. This is because LMs produce a sequence of uncertainty values, one for each output token; and moreover, the number of output tokens is variable across examples. To mitigate this issue, we propose to exploit the richer token-level uncertainty information implicit in generative LMs. We argue that naive predicted sequence uncertainty corresponds to a simple aggregation of these uncertainties. By contrast, we show that incorporating token-level uncertainty through learned post-hoc deferral rules can significantly outperform such simple aggregation strategies, via experiments on a range of natural language benchmarks with FLAN-T5 models. We further show that incorporating embeddings from the smaller model and intermediate layers of the larger model can give an additional boost in the overall cost-quality tradeoff.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.03908 [pdf, other]

Multi-Task Learning for Lung sound & Lung disease classification

Authors: Suma K V, Deepali Koppad, Preethi Kumar, Neha A Kantikar, Surabhi Ramesh

Abstract: In recent years, advancements in deep learning techniques have considerably enhanced the efficiency and accuracy of medical diagnostics. In this work, a novel approach using multi-task learning (MTL) for the simultaneous classification of lung sounds and lung diseases is proposed. Our proposed model leverages MTL with four different deep learning models such as 2D CNN, ResNet50, MobileNet and Densenet to extract relevant features from the lung sound recordings. The ICBHI 2017 Respiratory Sound Database was employed in the current study. The MTL for MobileNet model performed better than the other models considered, with an accuracy of 74\% for lung sound analysis and 91\% for lung diseases classification. Results of the experimentation demonstrate the efficacy of our approach in classifying both lung sounds and lung diseases concurrently. In this study, using the demographic data of the patients from the database, risk level computation for Chronic Obstructive Pulmonary Disease is also carried out. For this computation, three machine learning algorithms namely Logistic Regression, SVM and Random Forest classifiers were employed. Among these ML algorithms, the Random Forest classifier had the highest accuracy of 92\%. This work helps in considerably reducing the physician's burden of not just diagnosing the pathology but also effectively communicating to the patient about the possible causes or outcomes.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.01244 [pdf, other]

Searching for enhancement in coalescence of in-jet (anti-)deuterons in proton-proton collisions

Authors: Yoshini Bailung, Neha Shah, Ankhi Roy

Abstract: Recent measurements from ALICE report that $``$in-jet'' nucleons carry a higher probability of forming a deuteron via coalescence than the nucleons from the underlying event (UE). This study makes use of an event shape classifier to separate the $``$in-jet'' deuterons and the deuterons in the UE produced in high multiplicity proton-proton collisions at $\sqrt{s} = 13$ TeV. Event shape variables such as transverse spherocity allow the categorization of hard and soft components of an event, which can be divided into two respective classes; $``$jetty'' and $``$isotropic''. The $``$jetty'' deuterons minus the contribution of the deuterons from the $``$isotropic'' event are taken as $``$in-jet'' deuterons, and the coalescence mechanism is tested. The coalescence is performed with a Wigner function formalism, augmented as an afterburner to \textsc{pythia}8. The possible enhancement of the coalescence probability of $``$in-jet'' deuterons is investigated by calculating the coalescence parameter ($B_{2}$) in different spherocity classes in high-multiplicity $pp$ collisions.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 11 pages, 9 figures, To appear in Physical Review C

arXiv:2403.19712 [pdf, other]

Second and Third order differential subordination for exponential function

Authors: S. Sivaprasad Kumar, Neha Verma

Abstract: This article presents several findings regarding second and third-order differential subordination of the form: $$ p(z)+γ_1 zp'(z)+γ_2 z^2p''(z)\prec h(z)\implies p(z)\prec e^z $$ and $$ p(z)+γ_1 zp'(z)+γ_2 z^2p''(z)+γ_3 z^3p'''(z)\prec h(z)\implies p(z)\prec e^z. $$ Here, $γ_1$, $γ_2$, and $γ_3$ represent positive real numbers, and various selections of $h(z)$ are explored within the context of the class $\mathcal{S}^{*}_{e} := \{f \in \mathcal{A} : zf'(z)/f(z) \prec e^z\}$, which denotes the class of starlike functions associated with the exponential function.

Submitted 26 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2306.11215

arXiv:2403.17563 [pdf, other]

Higher order differential subordinations for certain starlike functions

Authors: Neha Verma, S. Sivaprasad Kumar

Abstract: In this paper, we employ a novel second and third-order differential subordination technique to establish the sufficient conditions for functions to belong to the classes $\mathcal{S}^*_s$ and $\mathcal{S}^*_ρ$, where $\mathcal{S}^*_s$ is the set of all normalized analytic functions $f$ satisfying $ zf'(z)/f(z)\prec 1+\sin z$ and $\mathcal{S}^*_ρ$ is the set of all normalized analytic functions $f$ satisfying $ zf'(z)/f(z)\prec 1+\sinh^{-1} z$.

Submitted 26 March, 2024; originally announced March 2024.

Comments: arXiv admin note: text overlap with arXiv:2306.11215

arXiv:2403.17558 [pdf, ps, other]

Neural category

Authors: Neha Gupta, Suhith K N

Abstract: A neural code on $ n $ neurons is a collection of subsets of the set $ [n]=\{1,2,\dots,n\} $. Curto et al. \cite{curto2013neural} associated a ring $\mathcal{R}_{\mathcal{C}}$ (neural ring) to a neural code $\mathcal{C}$. A special class of ring homomorphisms between two neural rings, called neural ring homomorphism, was introduced by Curto and Youngs \cite{curto2020neural}. The main work in this paper comprises constructing two categories. First is the $\mathfrak{C}$ category, a subcategory of SETS consisting of neural codes and code maps. Second is the neural category $\mathfrak{N}$, a subcategory of \textit{Rngs} consisting of neural rings and neural ring homomorphisms. Then, the rest of the paper characterizes the properties of these two categories like initial and final objects, products, coproducts, limits, etc. Also, we show that these two categories are in dual equivalence.

Submitted 26 March, 2024; originally announced March 2024.

MSC Class: 52A37; 92B99; 18A99

arXiv:2403.17548 [pdf, other]

Properties of graphs of neural codes

Authors: Suhith K N, Neha Gupta

Abstract: A neural code on $ n $ neurons is a collection of subsets of the set $ [n]=\{1,2,\dots,n\} $. In this paper, we study some properties of graphs of neural codes. In particular, we study codeword containment graph (CCG) given by Chan et al. (SIAM J. on Dis. Math., 37(1):114-145,2017) and general relationship graph (GRG) given by Gross et al. (Adv. in App. Math., 95:65-95, 2018). We provide a sufficient condition for CCG to be connected. We also show that the connectedness and completeness of CCG are preserved under surjective morphisms between neural codes defined by A. Jeffs (SIAM J. on App. Alg. and Geo., 4(1):99-122,2020). Further, we show that if CCG of any neural code $\mathcal{C}$ is complete with $|\mathcal{C}|=m$, then $\mathcal{C} \cong \{\emptyset,1,12,\dots,123\cdots m\}$ as neural codes. We also prove that a code whose CCG is complete is open convex. Later, we show that if a code $\mathcal{C}$ with $|\mathcal{C}|>3$ has its CCG to be connected 2-regular then $|\mathcal{C}| $ is even. The GRG was defined only for degree two neural codes using the canonical forms of its neural ideal. We first define GRG for any neural code. Then, we show the behaviour of GRGs under the various elementary code maps. At last, we compare these two graphs for certain classes of codes and see their properties.

Submitted 26 March, 2024; originally announced March 2024.

MSC Class: 52A37; 92B99; 05C40; 05C99

arXiv:2403.13313 [pdf, other]

Polaris: A Safety-focused LLM Constellation Architecture for Healthcare

Authors: Subhabrata Mukherjee, Paul Gamble, Markel Sanz Ausin, Neel Kant, Kriti Aggarwal, Neha Manjunath, Debajyoti Datta, Zhengliang Liu, Jiayuan Ding, Sophia Busacca, Cezanne Bianco, Swapnil Sharma, Rae Lasko, Michelle Voisard, Sanchay Harneja, Darya Filippova, Gerry Meixiong, Kevin Cha, Amir Youssefi, Meyhaa Buvanesh, Howard Weingram, Sebastian Bierman-Lytle, Harpreet Singh Mangat, Kim Parikh, Saad Godil, et al. (1 additional author not shown)

Abstract: We develop Polaris, the first safety-focused LLM constellation for real-time patient-AI healthcare conversations. Unlike prior LLM works in healthcare focusing on tasks like question answering, our work specifically focuses on long multi-turn voice conversations. Our one-trillion parameter constellation system is composed of several multibillion parameter LLMs as co-operative agents: a stateful primary agent that focuses on driving an engaging conversation and several specialist support agents focused on healthcare tasks performed by nurses to increase safety and reduce hallucinations. We develop a sophisticated training protocol for iterative co-training of the agents that optimize for diverse objectives. We train our models on proprietary data, clinical care plans, healthcare regulatory documents, medical manuals, and other medical reasoning documents. We align our models to speak like medical professionals, using organic healthcare conversations and simulated ones between patient actors and experienced nurses. This allows our system to express unique capabilities such as rapport building, trust building, empathy and bedside manner. Finally, we present the first comprehensive clinician evaluation of an LLM system for healthcare. We recruited over 1100 U.S. licensed nurses and over 130 U.S. licensed physicians to perform end-to-end conversational evaluations of our system by posing as patients and rating the system on several measures. We demonstrate Polaris performs on par with human nurses on aggregate across dimensions such as medical safety, clinical readiness, conversational quality, and bedside manner. Additionally, we conduct a challenging task-based evaluation of the individual specialist support agents, where we demonstrate our LLM agents significantly outperform a much larger general-purpose LLM (GPT-4) as well as a model from their own medium-size class (LLaMA-2 70B).

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13190 [pdf, other]

3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D

Authors: Vincent Cartillier, Neha Jain, Irfan Essa

Abstract: We study the task of 3D multi-object re-identification from embodied tours. Specifically, an agent is given two tours of an environment (e.g. an apartment) under two different layouts (e.g. arrangements of furniture). Its task is to detect and re-identify objects in 3D - e.g. a "sofa" moved from location A to B, a new "chair" in the second layout at location C, or a "lamp" from location D in the first layout missing in the second. To support this task, we create an automated infrastructure to generate paired egocentric tours of initial/modified layouts in the Habitat simulator using Matterport3D scenes, YCB and Google-scanned objects. We present 3D Semantic MapNet (3D-SMNet) - a two-stage re-identification model consisting of (1) a 3D object detector that operates on RGB-D videos with known pose, and (2) a differentiable object matching module that solves correspondence estimation between two sets of 3D bounding boxes. Overall, 3D-SMNet builds object-based maps of each layout and then uses a differentiable matcher to re-identify objects across the tours. After training 3D-SMNet on our generated episodes, we demonstrate zero-shot transfer to real-world rearrangement scenarios by instantiating our task in Replica, Active Vision, and RIO environments depicting rearrangements. On all datasets, we find 3D-SMNet outperforms competitive baselines. Further, we show jointly training on real and generated episodes can lead to significant improvements over training on real data alone.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 8 pages

arXiv:2403.07534 [pdf, ps, other]

Frobenius numbers associated with Diophantine triples of $x^2+y^2=z^r$ (extended version)

Authors: Takao Komatsu, Neha Gupta, Manoj Upreti

Abstract: We give an explicit formula for the $p$-Frobenius number of triples associated with Diophantine equations $x^2+y^2=z^r$, that is, the largest positive integer that can only be represented in $p$ ways by combining the three integers of the solutions of Diophantine equations $x^2+y^2=z^r$. When $r=2$, the Frobenius number has already been given.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05726 [pdf, other]

Augmentations vs Algorithms: What Works in Self-Supervised Learning

Authors: Warren Morningstar, Alex Bijamov, Chris Duvarney, Luke Friedman, Neha Kalibhat, Luyang Liu, Philip Mansfield, Renan Rojas-Gomez, Karan Singhal, Bradley Green, Sushant Prakash

Abstract: We study the relative effects of data augmentations, pretraining algorithms, and model architectures in Self-Supervised Learning (SSL). While the recent literature in this space leaves the impression that the pretraining algorithm is of critical importance to performance, understanding its effect is complicated by the difficulty in making objective and direct comparisons between methods. We propose a new framework which unifies many seemingly disparate SSL methods into a single shared template. Using this framework, we identify aspects in which methods differ and observe that in addition to changing the pretraining algorithm, many works also use new data augmentations or more powerful model architectures. We compare several popular SSL methods using our framework and find that many algorithmic additions, such as prediction networks or new losses, have a minor impact on downstream task performance (often less than $1\%$), while enhanced augmentation techniques offer more significant performance improvements ($2-4\%$). Our findings challenge the premise that SSL is being driven primarily by algorithmic improvements, and suggest instead a bitter lesson for SSL: that augmentation diversity and data / model scale are more critical contributors to recent advances in self-supervised learning.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 18 pages, 1 figure

arXiv:2403.04651 [pdf, other]

Cedar: A New Language for Expressive, Fast, Safe, and Analyzable Authorization (Extended Version)

Authors: Joseph W. Cutler, Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Michael Hicks, Kesha Hietala, Eleftherios Ioannidis, John Kastner, Anwar Mamat, Darin McAdams, Matt McCutchen, Neha Rungta, Emina Torlak, Andrew Wells

Abstract: Cedar is a new authorization policy language designed to be ergonomic, fast, safe, and analyzable. Rather than embed authorization logic in an application's code, developers can write that logic as Cedar policies and delegate access decisions to Cedar's evaluation engine. Cedar's simple and intuitive syntax supports common authorization use-cases with readable policies, naturally leveraging concepts from role-based, attribute-based, and relation-based access control models. Cedar's policy structure enables access requests to be decided quickly. Cedar's policy validator leverages optional typing to help policy writers avoid mistakes, but not get in their way. Cedar's design has been finely balanced to allow for a sound and complete logical encoding, which enables precise policy analysis, e.g., to ensure that when refactoring a set of policies, the authorized permissions do not change. We have modeled Cedar in the Lean programming language, and used Lean's proof assistant to prove important properties of Cedar's design. We have implemented Cedar in Rust, and released it open-source. Comparing Cedar to two open-source languages, OpenFGA and Rego, we find (subjectively) that Cedar has equally or more readable policies, but (objectively) performs far better.

Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.00986 [pdf, other]

Merging Text Transformer Models from Different Initializations

Authors: Neha Verma, Maha Elbayad

Abstract: Recent work on permutation-based model merging has shown impressive low- or zero-barrier mode connectivity between models from completely different initializations. However, this line of work has not yet extended to the Transformer architecture, despite its dominant popularity in the language domain. Therefore, in this work, we investigate the extent to which separate Transformer minima learn similar features, and propose a model merging technique to investigate the relationship between these minima in the loss landscape. The specifics of the architecture, like its residual connections, multi-headed attention, and discrete, sequential input, require specific interventions in order to compute model permutations that remain within the same functional equivalence class. In merging these models with our method, we consistently find lower loss barriers between minima compared to model averaging, across models trained on a masked-language modeling task or fine-tuned on a language understanding benchmark. Our results show that the minima of these models are less sharp and isolated than previously understood, and provide a basis for future work on merging separately trained Transformer models.

Submitted 16 December, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: TMLR, November 2024

arXiv:2402.18796 [pdf, other]

MOSAIC: A Modular System for Assistive and Interactive Cooking

Authors: Huaxiaoyue Wang, Kushal Kedia, Juntao Ren, Rahma Abdullah, Atiksh Bhardwaj, Angela Chao, Kelly Y Chen, Nathaniel Chin, Prithwish Dan, Xinyi Fan, Gonzalo Gonzalez-Pumariega, Aditya Kompella, Maximus Adrian Pace, Yash Sharma, Xiangwan Sun, Neha Sunkara, Sanjiban Choudhury

Abstract: We present MOSAIC, a modular architecture for home robots to perform complex collaborative tasks, such as cooking with everyday users. MOSAIC tightly collaborates with humans, interacts with users using natural language, coordinates multiple robots, and manages an open vocabulary of everyday objects. At its core, MOSAIC employs modularity: it leverages multiple large-scale pre-trained models for general tasks like language and image recognition, while using streamlined modules designed for task-specific control. We extensively evaluate MOSAIC on 60 end-to-end trials where two robots collaborate with a human user to cook a combination of 6 recipes. We also extensively test individual modules with 180 episodes of visuomotor picking, 60 episodes of human motion forecasting, and 46 online user evaluations of the task planner. We show that MOSAIC is able to efficiently collaborate with humans by running the overall system end-to-end with a real human user, completing 68.3% (41/60) collaborative cooking trials of 6 different recipes with a subtask completion rate of 91.6%. Finally, we discuss the limitations of the current system and exciting open challenges in this domain. The project's website is at https://portal-cornell.github.io/MOSAIC/

Submitted 4 November, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 22 pages, 13 figures; CoRL 2024

arXiv:2402.14595 [pdf, other]

Agile Requirement Change Management Model for Global Software Development

Authors: Neha Koulecar, Bachan Ghimire

Abstract: We propose a novel, comprehensive and robust agile requirements change management (ARCM) model that addresses the limitations of existing models and is tailored for agile software development in the global software development paradigm. To achieve this goal, we conducted an exhaustive literature review and an empirical study with RCM industry experts. Our study evaluated the effectiveness of the proposed RCM model in a real-world setting and identified limitations and areas for improvement. The results of our study provide valuable insights into how the proposed RCM model can be applied in agile global software development environments to improve software development practices and optimize project success rates.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 15 pages, 1 figure

arXiv:2402.12840 [pdf, other]

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic

Authors: Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, Timothy Baldwin

Abstract: The focus of language model evaluation has transitioned towards reasoning and knowledge-intensive tasks, driven by advancements in pretraining large models. While state-of-the-art models are partially trained on large Arabic texts, evaluating their performance in Arabic remains challenging due to the limited availability of relevant datasets. To bridge this gap, we present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language, sourced from school exams across diverse educational levels in different countries spanning North Africa, the Levant, and the Gulf regions. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA) and is carefully constructed by collaborating with native speakers in the region. Our comprehensive evaluations of 35 models reveal substantial room for improvement, particularly among the best open-source models. Notably, BLOOMZ, mT0, LLaMA2, and Falcon struggle to achieve a score of 50%, while even the top-performing Arabic-centric model only achieves a score of 62.3%.

Submitted 29 July, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Findings of ACL 2024

arXiv:2402.08791 [pdf, other]

Investigating Neutron Scattering in a Spherical Proportional Counter: A Tabletop Experiment

Authors: N. Panchal, L. Balogh, J. -F. Caron, G. Giroux, P. Gros

Abstract: In this paper, we report on a tabletop experiment studying neutron scattering in a Spherical Proportional Counter using an Am-Be source. Systematic studies were carried out to investigate the effect of gas mixture, pressure, operating voltage, and sphere size on the drift time-rise time relationship of the signal in a spherical proportional counter. Our experimental results showed good agreement with MagBoltz simulations. These findings are a crucial step towards measuring the quenching factor in gases using a neutron beam for the New Experiments With Spheres-Gas (NEWS-G) experiment and have important implications for the development of neutron detection techniques and their potential applications in nuclear and particle physics.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2401.17823 [pdf, other]

Privacy-preserving data release leveraging optimal transport and particle gradient descent

Authors: Konstantin Donhauser, Javier Abad, Neha Hulkund, Fanny Yang

Abstract: We present a novel approach for differentially private data synthesis of protected tabular datasets, a relevant task in highly sensitive domains such as healthcare and government. Current state-of-the-art methods predominantly use marginal-based approaches, where a dataset is generated from private estimates of the marginals. In this paper, we introduce PrivPGD, a new generation method for marginal-based private data synthesis, leveraging tools from optimal transport and particle gradient descent. Our algorithm outperforms existing methods on a large range of datasets while being highly scalable and offering the flexibility to incorporate additional domain-specific constraints.

Submitted 29 July, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

Comments: Published at the Forty-first International Conference on Machine Learning

arXiv:2401.17705 [pdf]

Predicting suicidal behavior among Indian adults using childhood trauma, mental health questionnaires and machine learning cascade ensembles

Authors: Akash K Rao, Gunjan Y Trivedi, Riri G Trivedi, Anshika Bajpai, Gajraj Singh Chauhan, Vishnu K Menon, Kathirvel Soundappan, Hemalatha Ramani, Neha Pandya, Varun Dutt

Abstract: Among young adults, suicide is India's leading cause of death, accounting for an alarming national suicide rate of around 16%. In recent years, machine learning algorithms have emerged to predict suicidal behavior using various behavioral traits. But to date, the efficacy of machine learning algorithms in predicting suicidal behavior in the Indian context has not been explored in literature. In this study, different machine learning algorithms and ensembles were developed to predict suicidal behavior based on childhood trauma, different mental health parameters, and other behavioral factors. The dataset was acquired from 391 individuals from a wellness center in India. Information regarding their childhood trauma, psychological wellness, and other mental health issues was acquired through standardized questionnaires. Results revealed that cascade ensemble learning methods using a support vector machine, decision trees, and random forest were able to classify suicidal behavior with an accuracy of 95.04% using data from childhood trauma and mental health questionnaires. The study highlights the potential of using these machine learning ensembles to identify individuals with suicidal tendencies so that targeted interventions could be provided efficiently.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 11 pages, presented at the 4th International Conference on Frontiers in Computing and Systems (COMSYS 2023), Himachal Pradesh, October 2023

arXiv:2401.16939 [pdf, other]

Linear stability analysis of compressible boundary layer over an insulated wall: Existence of multiple new unstable modes for Mach number beyond 3

Authors: Neha Chaturvedi, Swagata Bhaumik, Rituparn Somvanshi

Abstract: Here, we investigate the linear spatial stability of a parallel two-dimensional compressible boundary layer on an adiabatic plate by considering 2D and 3D disturbances. We employ the Compound Matrix Method for the first time for compressible flows, which, unlike other conventional techniques, can efficiently eliminate the stiffness of the original equation. Our study explores flow Mach numbers ranging from low subsonic to supersonic cases, to investigate the effects of flow compressibility and spanwise variation of disturbances. We get some interesting results depending on the flow Mach number. Mack (AGARD Report No. 709, 1984) reported the existence of two unstable modes for Mach number greater than 3 from viscous calculations (the so-called second mode) that subsequently fuse to create only one unstable zone when Mach number increases. Our calculations show a series of unstable modes for a Mach number greater than 3. The number of such modes is much more than two (unlike what Mack reports). The number and the frequency extent of the corresponding unstable zones increase with an increase in M, which is significantly higher than subsonic or low-supersonic cases. While the shape of the neutral curves for the second unstable mode for a Mach number greater than 4 is similar to the fused neutral curve shown by Mack for a Mach number of 4.8, the characteristics of higher-order spatially unstable modes considering the viscous stability of supersonic boundary layers remain unreported to the best of our knowledge. The last one is the most novel element in the reported results.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 38 Pages, 9 Figures

arXiv:2401.14362 [pdf, ps, other]

The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support

Authors: Inhwa Song, Sachin R. Pendse, Neha Kumar, Munmun De Choudhury

Abstract: People experiencing severe distress increasingly use Large Language Model (LLM) chatbots as mental health support tools. Discussions on social media have described how engagements were lifesaving for some, but evidence suggests that general-purpose LLM chatbots also have notable risks that could endanger the welfare of users if not designed responsibly. In this study, we investigate the lived experiences of people who have used LLM chatbots for mental health support. We build on interviews with 21 individuals from globally diverse backgrounds to analyze how users create unique support roles for their chatbots, fill in gaps in everyday care, and navigate associated cultural limitations when seeking support from chatbots. We ground our analysis in psychotherapy literature around effective support, and introduce the concept of therapeutic alignment, or aligning AI with therapeutic values for mental health contexts. Our study offers recommendations for how designers can approach the ethical and effective use of LLM chatbots and other AI mental health support tools in mental health care.

Submitted 9 May, 2025; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: The first two authors contributed equally to this work; typos corrected and post-review revisions incorporated

arXiv:2401.13673 [pdf, other]

Sacred Ecology: The Environmental Impact of African Traditional Religions

Authors: Neha Deopa, Daniele Rinaldo

Abstract: Do religions codify ecological principles? This paper explores theoretically and empirically the role religious beliefs play in shaping environmental interactions. We study African Traditional Religions (ATR) which place forests within a sacred sphere. We build a model of non-market interactions of the mean-field type where the actions of agents with heterogeneous religious beliefs continuously affect the spatial density of forest cover. The equilibrium extraction policy shows how individual beliefs and their distribution among the population can be a key driver of forest conservation. The model also characterizes the role of resource scarcity in both individual and population extraction decisions. We test the model predictions empirically relying on the unique case of Benin, where ATR adherence is freely reported. Using an instrumental variable strategy that exploits the variation in proximity to the Benin-Nigerian border, we find that a 1 standard deviation increase in ATR adherence has a 0.4 standard deviation positive impact on forest cover change. We study the impact of historically belonging to the ancient Kingdom of Dahomey, birthplace of the Vodun religion. Using the original boundaries as a spatial discontinuity, we find positive evidence of Dahomey affiliation on contemporary forest change. Lastly, we compare observed forest cover to counterfactual outcomes by simulating the absence of ATR beliefs across the population.

Submitted 9 November, 2023; originally announced January 2024.

arXiv:2401.12092 [pdf, other]

doi 10.1038/s41467-023-44340-6

Discrete symmetries tested at 10$^{-4}$ precision using linear polarization of photons from positronium annihilations

Authors: Paweł Moskal, Eryk Czerwiński, Juhi Raj, Steven D. Bass, Ermias Y. Beyene, Neha Chug, Aurélien Coussat, Catalina Curceanu, Meysam Dadgar, Manish Das, Kamil Dulski, Aleksander Gajos, Marek Gorgol, Beatrix C. Hiesmayr, Bożena Jasińska, Krzysztof Kacprzak, Tevfik Kaplanoglu, Łukasz Kapłon, Konrad Klimaszewski, Paweł Konieczka, Grzegorz Korcyl, Tomasz Kozik, Wojciech Krzemień, Deepak Kumar, Simbarashe Moyo, et al. (16 additional authors not shown)

Abstract: Discrete symmetries play an important role in particle physics with violation of CP connected to the matter-antimatter imbalance in the Universe. We report the most precise test of P, T and CP invariance in decays of ortho-positronium, performed with methodology involving polarization of photons from these decays. Positronium, the simplest bound state of an electron and positron, is of recent interest with discrepancies reported between measured hyperfine energy structure and theory at the level of $10^{-4}$ signaling a need for better understanding of the positronium system at this level. We test discrete symmetries using photon polarizations determined via Compton scattering in the dedicated J-PET tomograph on an event-by-event basis and without the need to control the spin of the positronium with an external magnetic field, in contrast to previous experiments. Our result is consistent with QED expectations at the level of 0.0007 and one standard deviation.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 17 pages, 10 figures

Journal ref: Nature Communications 15, 78 (2024)

Showing 101–150 of 575 results for author: Neha