Showing 201–250 of 4,837 results for author: R.A.

  1. arXiv:2503.13224  [pdf, other]

    cs.CR cs.LG

    ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction

    Authors: Tong Zhou, Shijin Duan, Gaowen Liu, Charles Fleming, Ramana Rao Kompella, Shaolei Ren, Xiaolin Xu

    Abstract: Pre-trained models are valuable intellectual property, capturing both domain-specific and domain-invariant features within their weight spaces. However, model extraction attacks threaten these assets by enabling unauthorized source-domain inference and facilitating cross-domain transfer via the exploitation of domain-invariant features. In this work, we introduce **ProDiF**, a novel framework that…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Accepted at the ICLR Workshop on Neural Network Weights as a New Data Modality 2025

  2. arXiv:2503.13133  [pdf, other]

    gr-qc hep-th

    Regular black holes and their singular families

    Authors: Hyat Huang, Xiao-Pin Rao

    Abstract: Regular black holes without curvature singularity can arise in Einstein gravity with appropriate matter energy-momentum tensor. We show that these regular solutions represent only a special case of a much broader family of black holes with a free mass parameter. The regularity is achieved only at a specific mass value, and any deviation from the fine-tuned parameter inevitably results in curvature…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: 14 pages, 3 figures. Comments are welcome

  3. arXiv:2503.12964  [pdf, other]

    cs.CV cs.AI cs.LG

    Training Video Foundation Models with NVIDIA NeMo

    Authors: Zeeshan Patel, Ethan He, Parth Mannan, Xiaowei Ren, Ryan Wolf, Niket Agarwal, Jacob Huffman, Zhuoyao Wang, Carl Wang, Jack Chang, Yan Bai, Tommy Huang, Linnan Wang, Sahil Jain, Shanmugam Ramasamy, Joseph Jennings, Ekaterina Sirazitdinova, Oleg Sudakov, Mingyuan Ma, Bobby Chen, Forrest Lin, Hao Wang, Vasanth Rao Naik Sabavat, Sriharsha Niverty, Rong Ou, et al. (4 additional authors not shown)

    Abstract: Video Foundation Models (VFMs) have recently been used to simulate the real world to train physical AI systems and develop creative visual experiences. However, there are significant challenges in training large-scale, high quality VFMs that can generate high-quality videos. We present a scalable, open-source VFM training pipeline with NVIDIA NeMo, providing accelerated video dataset curation, mul…

    Submitted 17 March, 2025; originally announced March 2025.

  4. arXiv:2503.12863  [pdf, ps, other]

    math.PR math.ST

    Parameter estimation for generalized mixed fractional stochastic heat equation

    Authors: B. L. S. Prakasa Rao

    Abstract: We study the properties of a stochastic heat equation with a generalized mixed fractional Brownian noise. We obtain the covariance structure, stationarity and obtain bounds for the asymptotic behaviour of the solution. We suggest estimators for the unknown parameters based on discrete time observations and study their asymptotic properties.

    Submitted 17 March, 2025; originally announced March 2025.

    MSC Class: 60G22

  5. arXiv:2503.12826  [pdf, other]

    gr-qc

    Active and Passive Conformal Transformations in Scalar-Tensor Gravitational Theories

    Authors: Israel Quiros, Amit Kumar Rao

    Abstract: Through considering the conformal transformations as coordinate transformations in some abstract space of fields, where the different fields are assumed as ``generalized coordinates,'' we introduce the notion of active and passive conformal transformations. We then apply both complementary approaches to the conformal frames issue, arising in the context of scalar-tensor gravity theories, in order…

    Submitted 24 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: 19 pages 1 figure. Improvements in the text for better understanding, bibliographic references added

  6. arXiv:2503.12446  [pdf, other]

    cs.CV cs.AI

    BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries

    Authors: Tianle Li, Yongming Rao, Winston Hu, Yu Cheng

    Abstract: Encoder-free multimodal large language models (MLLMs) eliminate the need for a well-trained vision encoder by directly processing image tokens before the language model. While this approach reduces computational overhead and model complexity, it often requires large amounts of training data to effectively capture the visual knowledge typically encoded by vision models like CLIP. The absence of a vi…

    Submitted 16 March, 2025; originally announced March 2025.

  7. A Comparative Study of Invariance-Aware Loss Functions for Deep Learning-based Gridless Direction-of-Arrival Estimation

    Authors: Kuan-Lin Chen, Bhaskar D. Rao

    Abstract: Covariance matrix reconstruction has been the most widely used guiding objective in gridless direction-of-arrival (DoA) estimation for sparse linear arrays. Many semidefinite programming (SDP)-based methods fall under this category. Although deep learning-based approaches enable the construction of more sophisticated objective functions, most methods still rely on covariance matrix reconstruction.…

    Submitted 16 March, 2025; originally announced March 2025.

    Comments: 5 pages. Accepted at ICASSP 2025

  8. arXiv:2503.12317  [pdf]

    cs.AI

    A Transformer-based survival model for prediction of all-cause mortality in heart failure patients: a multi-cohort study

    Authors: Shishir Rao, Nouman Ahmed, Gholamreza Salimi-Khorshidi, Christopher Yau, Huimin Su, Nathalie Conrad, Folkert W Asselbergs, Mark Woodward, Rod Jackson, John GF Cleland, Kazem Rahimi

    Abstract: We developed and validated TRisk, a Transformer-based AI model predicting 36-month mortality in heart failure patients by analysing temporal patient journeys from UK electronic health records (EHR). Our study included 403,534 heart failure patients (ages 40-90) from 1,418 English general practices, with 1,063 practices for model derivation and 355 for external validation. TRisk was compared agains…

    Submitted 15 March, 2025; originally announced March 2025.

  9. arXiv:2503.12295  [pdf, other]

    cs.LG math.NA

    Towards Learning High-Precision Least Squares Algorithms with Sequence Models

    Authors: Jerry Liu, Jessica Grogan, Owen Dugan, Ashish Rao, Simran Arora, Atri Rudra, Christopher Ré

    Abstract: This paper investigates whether sequence models can learn to perform numerical algorithms, e.g. gradient descent, on the fundamental problem of least squares. Our goal is to inherit two properties of standard algorithms from numerical analysis: (1) machine precision, i.e. we want to obtain solutions that are accurate to near floating point error, and (2) numerical generality, i.e. we want them to…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: 75 pages, 18 figures. ICLR 2025

  10. arXiv:2503.10842  [pdf, other]

    quant-ph

    Monte Carlo model of distilled remote entanglement between superconducting qubits across optical channels

    Authors: Nicolas Dirnegger, Moein Malekakhlagh, Vikesh Siddhu, Ashutosh Rao, Chi Xiong, Muir Kumph, Jason Orcutt, Abram Falk

    Abstract: A promising quantum computing architecture comprises modules of superconducting quantum processors linked by optical channels via quantum transducers. To map transducer device performance to system-level channel performance, our model uses Monte Carlo simulations that incorporate 2-to-1 and 3-to-1 entanglement distillation protocols. We show that the Extreme Photon Loss distillation protocol is pa…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 13 pages, 8 figures, 1 table, APS March Meeting Conference

  11. arXiv:2503.10621  [pdf, other]

    cs.CV cs.RO

    DriveLMM-o1: A Step-by-Step Reasoning Dataset and Large Multimodal Model for Driving Scenario Understanding

    Authors: Ayesha Ishaq, Jean Lahoud, Ketan More, Omkar Thawakar, Ritesh Thawkar, Dinura Dissanayake, Noor Ahsan, Yuhao Li, Fahad Shahbaz Khan, Hisham Cholakkal, Ivan Laptev, Rao Muhammad Anwer, Salman Khan

    Abstract: While large multimodal models (LMMs) have demonstrated strong performance across various Visual Question Answering (VQA) tasks, certain challenges require complex multi-step reasoning to reach accurate answers. One particularly challenging task is autonomous driving, which demands thorough cognitive processing before decisions can be made. In this domain, a sequential and interpretive understandin…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 8 pages, 4 figures, 3 tables, github: https://github.com/ayesha-ishaq/DriveLMM-o1

  12. arXiv:2503.10615  [pdf, other]

    cs.CV

    R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

    Authors: Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen

    Abstract: Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge. Existing visual-language models often struggle to effectively analyze and reason visual content, resulting in suboptimal performance on complex reasoning tasks. Moreover, the abse…

    Submitted 18 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: Code and Model: https://github.com/Fancy-MLLM/R1-onevision

  13. arXiv:2503.10512  [pdf, other]

    cs.LG cs.AI

    Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression

    Authors: Hooman Shahrokhi, Devjeet Raj Roy, Yan Yan, Venera Arnaoudova, Janaradhan Rao Doppa

    Abstract: We consider the problem of generating valid and small prediction sets by sampling outputs (e.g., software code and natural language text) from a black-box deep generative model for a given input (e.g., textual prompt). The validity of a prediction set is determined by a user-defined binary admissibility function depending on the target application. For example, requiring at least one program in th…

    Submitted 13 March, 2025; originally announced March 2025.

  14. arXiv:2503.10330  [pdf, other]

    cond-mat.str-el

    Dynamical response theory of interacting Majorana fermions and its application to generic Kitaev quantum spin liquids in a field

    Authors: Peng Rao, Roderich Moessner, Johannes Knolle

    Abstract: Motivated by the appearance of Majorana fermions in a broad range of correlated and topological electronic systems, we develop a general method to compute the dynamical response of interacting Majorana fermions in the random-phase approximation (RPA). This can be applied self-consistently on top of Majorana mean-field theory (MFT) backgrounds, thereby in particular providing a powerful tool to ana…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: 19 pages, 14 figures

  15. FDCT: Frequency-Aware Decomposition and Cross-Modal Token-Alignment for Multi-Sensor Target Classification

    Authors: Shoaib Meraj Sami, Md Mahedi Hasan, Nasser M. Nasrabadi, Raghuveer Rao

    Abstract: In automatic target recognition (ATR) systems, sensors may fail to capture discriminative, fine-grained detail features due to environmental conditions, noise created by CMOS chips, occlusion, parallaxes, and sensor misalignment. Therefore, multi-sensor image fusion is an effective choice to overcome these constraints. However, multi-modal image sensors are heterogeneous and have domain and granul…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: 12 pages Accepted in the IEEE Transactions on Aerospace and Electronic Systems

  16. arXiv:2503.09290  [pdf, ps, other]

    eess.SP

    Adaptive and Self-Tuning SBL with Total Variation Priors for Block-Sparse Signal Recovery

    Authors: Hamza Djelouat, Reijo Leinonen, Mikko J. Sillanpää, Bhaskar D. Rao, Markku Juntti

    Abstract: This letter addresses the problem of estimating block sparse signal with unknown group partitions in a multiple measurement vector (MMV) setup. We propose a Bayesian framework by applying an adaptive total variation (TV) penalty on the hyper-parameter space of the sparse signal. The main contributions are two-fold. 1) We extend the TV penalty beyond the immediate neighbor, thus enabling better cap…

    Submitted 12 March, 2025; originally announced March 2025.

  17. arXiv:2503.08600  [pdf, other]

    cs.CL

    NSF-SciFy: Mining the NSF Awards Database for Scientific Claims

    Authors: Delip Rao, Weiqiu You, Eric Wong, Chris Callison-Burch

    Abstract: We present NSF-SciFy, a large-scale dataset for scientific claim extraction derived from the National Science Foundation (NSF) awards database, comprising over 400K grant abstracts spanning five decades. While previous datasets relied on published literature, we leverage grant abstracts which offer a unique advantage: they capture claims at an earlier stage in the research lifecycle before publica…

    Submitted 15 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 11 pages, 3 figures, 6 tables

  18. arXiv:2503.07891  [pdf, other]

    cs.CL cs.AI

    Gemini Embedding: Generalizable Embeddings from Gemini

    Authors: Jinhyuk Lee, Feiyang Chen, Sahil Dua, Daniel Cer, Madhuri Shanbhogue, Iftekhar Naim, Gustavo Hernández Ábrego, Zhe Li, Kaifeng Chen, Henrique Schechter Vera, Xiaoqi Ren, Shanfeng Zhang, Daniel Salz, Michael Boratko, Jay Han, Blair Chen, Shuo Huang, Vikram Rao, Paul Suganthan, Feng Han, Andreas Doumanoglou, Nithi Gupta, Fedor Moiseev, Cathy Yip, Aashi Jain, et al. (22 additional authors not shown)

    Abstract: In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding capabilities, Gemini Embedding produces highly generalizable embeddings for text spanning numerous languages and textual modalities. The representations generated by Gemini…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: 19 pages

  19. arXiv:2503.07284  [pdf, other]

    math.NA

    An asymptotic preserving scheme satisfying entropy stability for the barotropic Euler system

    Authors: Megala Anandan, Mária Lukáčová-Medvid'ová, S. V. Raghurama Rao

    Abstract: In this paper we study structure-preserving numerical methods for low Mach number barotropic Euler equations. Besides their asymptotic preserving properties that are crucial in order to obtain uniformly consistent and stable approximations of the Euler equations in their singular limit as the Mach number approaches zero, our aim is also to preserve discrete entropy stability. Suitable acoustic/adv…

    Submitted 14 May, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  20. arXiv:2503.06486  [pdf, other]

    cs.CV cs.AI

    PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training

    Authors: Cong Chen, Mingyu Liu, Chenchen Jing, Yizhou Zhou, Fengyun Rao, Hao Chen, Bo Zhang, Chunhua Shen

    Abstract: This paper aims to address the challenge of hallucinations in Multimodal Large Language Models (MLLMs) particularly for dense image captioning tasks. To tackle the challenge, we identify the current lack of a metric that finely measures the caption quality in concept level. We hereby introduce HalFscore, a novel metric built upon the language graph and is designed to evaluate both the accuracy and…

    Submitted 9 March, 2025; originally announced March 2025.

  21. arXiv:2503.06426  [pdf, ps, other]

    cs.LG cs.CV cs.DC

    Federated Learning for Diffusion Models

    Authors: Zihao Peng, Xijun Wang, Shengbo Chen, Hong Rao, Cong Shen

    Abstract: Diffusion models are powerful generative models that can produce highly realistic samples for various tasks. Typically, these models are constructed using centralized, independently and identically distributed (IID) training data. However, in practical scenarios, data is often distributed across multiple clients and frequently manifests non-IID characteristics. Federated Learning (FL) can leverage…

    Submitted 8 March, 2025; originally announced March 2025.

  22. arXiv:2503.05931  [pdf, other]

    cs.CL eess.AS

    Training and Inference Efficiency of Encoder-Decoder Speech Models

    Authors: Piotr Żelasko, Kunal Dhawan, Daniel Galvez, Krishna C. Puvvada, Ankita Pasad, Nithin Rao Koluguri, Ke Hu, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Attention encoder-decoder model architecture is the backbone of several recent top performing foundation speech models: Whisper, Seamless, OWSM, and Canary-1B. However, the reported data and compute requirements for their training are prohibitive for many in the research community. In this work, we focus on the efficiency angle and ask the questions of whether we are training these speech models e…

    Submitted 19 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

  23. arXiv:2503.05701  [pdf]

    cs.LG cs.CL

    OPTIC: Optimizing Patient-Provider Triaging & Improving Communications in Clinical Operations using GPT-4 Data Labeling and Model Distillation

    Authors: Alberto Santamaria-Pang, Frank Tuan, Ross Campbell, Cindy Zhang, Ankush Jindal, Roopa Surapur, Brad Holloman, Deanna Hanisch, Rae Buckley, Carisa Cooney, Ivan Tarapov, Kimberly S. Peairs, Brian Hasselfeld, Peter Greene

    Abstract: The COVID-19 pandemic has accelerated the adoption of telemedicine and patient messaging through electronic medical portals (patient medical advice requests, or PMARs). While these platforms enhance patient access to healthcare, they have also increased the burden on healthcare providers due to the surge in PMARs. This study seeks to develop an efficient tool for message triaging to reduce physici…

    Submitted 5 February, 2025; originally announced March 2025.

    Comments: 15 pages, 8 figures. submitted to Journal of the American Medical Informatics Association

  24. arXiv:2503.05473  [pdf, other]

    cs.NE cs.AI

    The Society of HiveMind: Multi-Agent Optimization of Foundation Model Swarms to Unlock the Potential of Collective Intelligence

    Authors: Noah Mamie, Susie Xi Rao

    Abstract: Multi-agent systems address issues of accessibility and scalability of artificial intelligence (AI) foundation models, which are often represented by large language models. We develop a framework - the "Society of HiveMind" (SOHM) - that orchestrates the interaction between multiple AI foundation models, imitating the observed behavior of animal swarms in nature by following modern evolutionary th…

    Submitted 13 March, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: 11 pages (excl. appendix)

  25. arXiv:2503.05122  [pdf, other]

    cs.CV

    EDM: Efficient Deep Feature Matching

    Authors: Xi Li, Tong Rao, Cihui Pan

    Abstract: Recent feature matching methods have achieved remarkable performance but lack efficiency consideration. In this paper, we revisit the mainstream detector-free matching pipeline and improve all its stages considering both accuracy and efficiency. We propose an Efficient Deep feature Matching network, EDM. We first adopt a deeper CNN with fewer dimensions to extract multi-level features. Then we pre…

    Submitted 22 May, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  26. arXiv:2503.04998  [pdf, other]

    cs.RO

    Multi-Agent Ergodic Exploration under Smoke-Based, Time-Varying Sensor Visibility Constraints

    Authors: Elena Wittemyer, Ananya Rao, Ian Abraham, Howie Choset

    Abstract: In this work, we consider the problem of multi-agent informative path planning (IPP) for robots whose sensor visibility continuously changes as a consequence of a time-varying natural phenomenon. We leverage ergodic trajectory optimization (ETO), which generates paths such that the amount of time an agent spends in an area is proportional to the expected information in that area. We focus specific…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted to ICRA 2025

  27. arXiv:2503.04857  [pdf, other]

    cs.LG stat.ML

    A kinetic-based regularization method for data science applications

    Authors: Abhisek Ganguly, Alessandro Gabbana, Vybhav Rao, Sauro Succi, Santosh Ansumali

    Abstract: We propose a physics-based regularization technique for function learning, inspired by statistical mechanics. By drawing an analogy between optimizing the parameters of an interpolator and minimizing the energy of a system, we introduce corrections that impose constraints on the lower-order moments of the data distribution. This minimizes the discrepancy between the discrete and continuum represen…

    Submitted 6 March, 2025; originally announced March 2025.

  28. arXiv:2503.04724  [pdf, other]

    cs.CL

    LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

    Authors: Sambal Shikhar, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jean Lahoud, Fahad Khan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

    Abstract: Recent advancements in speech-to-speech dialogue systems leverage LLMs for multimodal interactions, yet they remain hindered by fine-tuning requirements, high computational overhead, and text-speech misalignment. Existing speech-enabled LLMs often degrade conversational quality by modifying the LLM, thereby compromising its linguistic capabilities. In contrast, we propose LLMVoX, a lightweight 30M…

    Submitted 6 March, 2025; originally announced March 2025.

  29. arXiv:2503.04641  [pdf, other]

    cs.CV cs.AI cs.LG

    Simulating the Real World: A Unified Survey of Multimodal Generative Models

    Authors: Yuqi Hu, Longguang Wang, Xian Liu, Ling-Hao Chen, Yuwei Guo, Yukai Shi, Ce Liu, Anyi Rao, Zeyu Wang, Hui Xiong

    Abstract: Understanding and replicating the real world is a critical challenge in Artificial General Intelligence (AGI) research. To achieve this, many existing approaches, such as world models, aim to capture the fundamental principles governing the physical world, enabling more accurate simulations and meaningful interactions. However, current methods often treat different modalities, including 2D (images…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: Repository for the related papers at https://github.com/ALEEEHU/World-Simulator

  30. arXiv:2503.04477  [pdf, other]

    q-bio.MN

    Exact first passage time distribution for nonlinear chemical reaction networks II: monomolecular reactions and a A + B - C type of second-order reaction with arbitrary initial conditions

    Authors: Changqian Rao, David Waxman, Wei Lin, Zhuoyi Song

    Abstract: In biochemical reaction networks, the first passage time (FPT) of a reaction quantifies the time it takes for the reaction to first occur, from the initial state. While the mean FPT historically served as a summary metric, a far more comprehensive characterization of the dynamics of the network is contained within the complete FPT distribution. The relatively uncommon theoretical treatments of the…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 13 pages, 5 figures, 4 tables

  31. arXiv:2503.03922  [pdf, other]

    cond-mat.mtrl-sci

    Mapping strain and structural heterogeneities around bubbles in amorphous ionically conductive Bi$_2$O$_3$

    Authors: Ellis Rae Kennedy, Stephanie M. Ribet, Ian S. Winter, Caitlin A. Kohnert, Yongqiang Wang, Karen C. Bustillo, Colin Ophus, Benjamin K. Derby

    Abstract: While amorphous materials are often approximated to have a statistically homogeneous atomic structure, they frequently exhibit localized structural heterogeneity that challenges simplified models. This study uses 4D scanning transmission electron microscopy to investigate the strain and structural modifications around gas bubbles in amorphous Bi$_2$O$_3$ induced by argon irradiation. We present a…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 13 pages, 7 figures. E.R. Kennedy and S.M. Ribet contributed equally to this work

    Report number: LA-UR-24-27734

  32. arXiv:2503.03623  [pdf]

    cond-mat.mtrl-sci

    Effect of Ag nano-additivation on microstructure formation in Nd-Fe-B magnets built by laser powder bed fusion

    Authors: Varatharaja Nallathambi, Philipp Gabriel, Xinren Chen, Ziyuan Rao, Konstantin Skokov, Oliver Gutfleisch, Stephan Barcikowski, Anna Rosa Ziefuss, Baptiste Gault

    Abstract: Laser powder bed fusion (PBF-LB/M) enables the near-net shape production of permanent magnets with complex geometry while reducing material waste. However, controlling the microstructure and optimizing magnetic properties remain challenging due to rapid solidification and intrinsic heat treatment effects occurring during both inter-layer and intra-layer processing. Surface additivation of the feed…

    Submitted 5 March, 2025; originally announced March 2025.

  33. arXiv:2503.03381  [pdf]

    cond-mat.mtrl-sci

    Lattice dynamics of hexagonal ZnMgS

    Authors: Abdelmahjid Elmahjoubi, Mala Rao, Alexandre Ivanov, Andrei Postnikov, Alain Polian, Toni Alhaddad, Samrath Chaplot, Andrea Piovano, Sebastien Diliberto, Stephanie Michel, Alain Maillard, Karol Strzalkowski, O. Pages

    Abstract: Inelastic neutron scattering measurements on the hexagonal Zn67Mg33S semiconductor alloy reveal a bimodal pattern of the optical modes across the Brillouin zone, confirmed by first-principles simulations. Such modes are sensitive to the local fluctuations in the composition inherent to random Zn/Mg alloying, distinguishing homo from hetero environments of a given bond (1-bond/2-mode), as is formal…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: 30 pages, 14 figures

    MSC Class: 70 ACM Class: J.2

  34. arXiv:2503.03255  [pdf, other]

    cs.CV

    Computational Analysis of Degradation Modeling in Blind Panoramic Image Quality Assessment

    Authors: Jiebin Yan, Ziwen Tan, Jiale Rao, Lei Wu, Yifan Zuo, Yuming Fang

    Abstract: Blind panoramic image quality assessment (BPIQA) has recently brought new challenge to the visual quality community, due to the complex interaction between immersive content and human behavior. Although many efforts have been made to advance BPIQA from both conducting psychophysical experiments and designing performance-driven objective algorithms, \textit{limited content} and \textit{few samples}…

    Submitted 5 March, 2025; originally announced March 2025.

  35. arXiv:2503.02665  [pdf, other]

    physics.ao-ph cs.LG math.OC

    Weakly-Constrained 4D Var for Downscaling with Uncertainty using Data-Driven Surrogate Models

    Authors: Philip Dinenis, Vishwas Rao, Mihai Anitescu

    Abstract: Dynamic downscaling typically involves using numerical weather prediction (NWP) solvers to refine coarse data to higher spatial resolutions. Data-driven models such as FourCastNet have emerged as a promising alternative to the traditional NWP models for forecasting. Once these models are trained, they are capable of delivering forecasts in a few seconds, thousands of times faster compared to class…

    Submitted 4 March, 2025; originally announced March 2025.

  36. arXiv:2503.01883  [pdf, other]

    cs.LG cs.AI

    Learning Surrogates for Offline Black-Box Optimization via Gradient Matching

    Authors: Minh Hoang, Azza Fadhel, Aryan Deshwal, Janardhan Rao Doppa, Trong Nghia Hoang

    Abstract: Offline design optimization problem arises in numerous science and engineering applications including material and chemical design, where expensive online experimentation necessitates the use of in silico surrogate functions to predict and maximize the target objective over candidate designs. Although these surrogates can be learned from offline data, their predictions are often inaccurate outside…

    Submitted 26 February, 2025; originally announced March 2025.

    Comments: Accepted at ICML 2024

  37. arXiv:2503.01725  [pdf, other]

    cs.CV

    HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization

    Authors: Zitang Zhou, Ke Mei, Yu Lu, Tianyi Wang, Fengyun Rao

    Abstract: This paper introduces HarmonySet, a comprehensive dataset designed to advance video-music understanding. HarmonySet consists of 48,328 diverse video-music pairs, annotated with detailed information on rhythmic synchronization, emotional alignment, thematic coherence, and cultural relevance. We propose a multi-step human-machine collaborative framework for efficient annotation, combining human insi…

    Submitted 4 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted at CVPR 2025. Project page: https://harmonyset.github.io/

  38. arXiv:2503.01325  [pdf, other]

    math.OC cs.NE

    Towards net-zero manufacturing: carbon-aware scheduling for GHG emissions reduction

    Authors: Andrea Mencaroni, Pieter Leyman, Birger Raa, Stijn De Vuyst, Dieter Claeys

    Abstract: Detailed scheduling has traditionally been optimized for the reduction of makespan and manufacturing costs. However, growing awareness of environmental concerns and increasingly stringent regulations are pushing manufacturing towards reducing the carbon footprint of its operations. Scope 2 emissions, which are the indirect emissions related to the production and consumption of grid electricity, ar…

    Submitted 3 March, 2025; originally announced March 2025.

  39. arXiv:2503.00983  [pdf, other]

    quant-ph

    Quantum nonlocal double slit interference with partially coherent qubits

    Authors: Sakshi Rao, Bhaskar Kanseri

    Abstract: Partially coherent quantum-entangled beams combine quantum entanglement with partial coherence, allowing them to maintain quantum characteristics while being more resistant to distortions caused by random media during propagation. In this study, we investigate the effect of coherence variation of such beams on non-local double-slit quantum interference. The spatial coherence variation is achieved…

    Submitted 2 March, 2025; originally announced March 2025.

  40. arXiv:2503.00885  [pdf, ps, other]

    cs.GT

    Social Welfare Maximization in Approval-Based Committee Voting under Uncertainty

    Authors: Haris Aziz, Yuhang Guo, Venkateswara Rao Kagita, Baharak Rastegari, Mashbat Suzuki

    Abstract: Approval voting is widely used for making multi-winner voting decisions. The canonical rule (also called Approval Voting) used in the setting aims to maximize social welfare by selecting candidates with the highest number of approvals. We revisit approval-based multi-winner voting in scenarios where the information regarding the voters' preferences is uncertain. We present several algorithmic resu…

    Submitted 2 March, 2025; originally announced March 2025.

  41. arXiv:2503.00723  [pdf, other]

    cs.LG

    Re-Imagining Multimodal Instruction Tuning: A Representation View

    Authors: Yiyang Liu, James Chenhao Liang, Ruixiang Tang, Yugyung Lee, Majid Rabbani, Sohail Dianat, Raghuveer Rao, Lifu Huang, Dongfang Liu, Qifan Wang, Cheng Han

    Abstract: Multimodal instruction tuning has proven to be an effective strategy for achieving zero-shot generalization by fine-tuning pre-trained Large Multimodal Models (LMMs) with instruction-following data. However, as the scale of LMMs continues to grow, fully fine-tuning these models has become highly parameter-intensive. Although Parameter-Efficient Fine-Tuning (PEFT) methods have been introduced to re…

    Submitted 20 March, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

  42. arXiv:2503.00636  [pdf, other]

    astro-ph.IM astro-ph.CO astro-ph.EP astro-ph.GA astro-ph.HE astro-ph.SR

    The Simons Observatory: Science Goals and Forecasts for the Enhanced Large Aperture Telescope

    Authors: The Simons Observatory Collaboration, M. Abitbol, I. Abril-Cabezas, S. Adachi, P. Ade, A. E. Adler, P. Agrawal, J. Aguirre, Z. Ahmed, S. Aiola, T. Alford, A. Ali, D. Alonso, M. A. Alvarez, R. An, K. Arnold, P. Ashton, Z. Atkins, J. Austermann, S. Azzoni, C. Baccigalupi, A. Baleato Lizancos, D. Barron, P. Barry, J. Bartlett , et al. (397 additional authors not shown)

    Abstract: We describe updated scientific goals for the wide-field, millimeter-wave survey that will be produced by the Simons Observatory (SO). Significant upgrades to the 6-meter SO Large Aperture Telescope (LAT) are expected to be complete by 2028, and will include a doubled mapping speed with 30,000 new detectors and an automated data reduction pipeline. In addition, a new photovoltaic array will supply…

    Submitted 15 March, 2025; v1 submitted 1 March, 2025; originally announced March 2025.

    Comments: 44 pages, 7 figures; abstract slightly abridged; submitted to JCAP. Author contributions to this paper are available at https://simonsobservatory.org/wp-content/uploads/2025/02/Author-contribution-statement-20250228.pdf

  43. arXiv:2503.00619  [pdf, other]

    cs.IR cs.AI cs.LG

    PinLanding: Content-First Keyword Landing Page Generation via Multi-Modal AI for Web-Scale Discovery

    Authors: Faye Zhang, Jasmine Wan, Qianyu Cheng, Jinfeng Rao

    Abstract: Online platforms like Pinterest hosting vast content collections traditionally rely on manual curation or user-generated search logs to create keyword landing pages (KLPs) -- topic-centered collection pages that serve as entry points for content discovery. While manual curation ensures quality, it doesn't scale to millions of collections, and search log approaches result in limited topic coverage…

    Submitted 1 March, 2025; originally announced March 2025.

  44. arXiv:2502.21321  [pdf, other]

    cs.CL cs.CV

    LLM Post-Training: A Deep Dive into Reasoning Large Language Models

    Authors: Komal Kumar, Tajamul Ashraf, Omkar Thawakar, Rao Muhammad Anwer, Hisham Cholakkal, Mubarak Shah, Ming-Hsuan Yang, Phillip H. S. Torr, Fahad Shahbaz Khan, Salman Khan

    Abstract: Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications. Pretraining on vast web-scale data has laid the foundation for these models, yet the research community is now increasingly shifting focus toward post-training techniques to achieve further breakthroughs. While pretraining provides a broad linguistic foundation, post-tr…

    Submitted 24 March, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

    Comments: 32 pages, 7 figures, 3 tables, 377 references. Github Repo: https://github.com/mbzuai-oryx/Awesome-LLM-Post-training

  45. arXiv:2502.21066  [pdf]

    cond-mat.mtrl-sci

    Enhanced Electromechanical Properties of Solution-Processed K$_{0.5}$Na$_{0.5}$NbO$_{3}$ Thin Films

    Authors: Nagamalleswara Rao Alluri, Longfei Song, Stephanie Girod, Barnik Mandal, Juliette Cardoletti, Vid Bobnar, Torsten Granzow, Veronika Kovacova, Adrian-Marie Philippe, Emmanuel Defay, Sebastjan Glinsek

    Abstract: K$_{0.5}$Na$_{0.5}$NbO$_{3}$ is among the most promising lead-free piezoelectrics. While its sputtered films match the performance of the champion piezoelectric Pb(Zr,Ti)O$_{3}$, processing of high-quality, reproducible, and time-stable solution-processed K$_{0.5}$Na$_{0.5}$NbO$_{3}$ films remains challenging. Here, we report 1 $μ$m-thick Mn-doped K$_{0.5}$Na$_{0.5}$NbO$_{3}$ films prepared throug…

    Submitted 28 February, 2025; originally announced February 2025.

  46. arXiv:2502.20286  [pdf, other]

    stat.ML cs.LG stat.CO stat.ME

    Multiple Linked Tensor Factorization

    Authors: Zhiyu Kang, Raghavendra B. Rao, Eric F. Lock

    Abstract: In biomedical research and other fields, it is now common to generate high content data that are both multi-source and multi-way. Multi-source data are collected from different high-throughput technologies while multi-way data are collected over multiple dimensions, yielding multiple tensor arrays. Integrative analysis of these data sets is needed, e.g., to capture and synthesize different facets…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 26 pages, 4 figures, 7 tables

  47. arXiv:2502.19057  [pdf]

    physics.optics cond-mat.mtrl-sci cond-mat.soft

    Light-Emitting Microfibers from Lotus Root for Eco-friendly Optical Waveguides and Biosensing

    Authors: X. Yang, L. Xu, S. Xiong, H. Rao, F. Tan, J. Yan, Y. Bao, A. Albanese, A. Camposeo, D. Pisignano, B. Li

    Abstract: Optical biosensors based on micro-/nano-fibers are highly valuable for probing and monitoring liquid environments and bioactivity. Most of current optical biosensors, however, are still based on glass, semiconductors, or metallic materials, which might be not fully suited for biologically-relevant environments. Here, we introduce biocompatible and flexible microfibers from Lotus silk as micro-envi…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 41 pages, 17 Figures, Nano Letters 2024

  48. arXiv:2502.19046  [pdf, other]

    eess.IV cs.CV

    Max360IQ: Blind Omnidirectional Image Quality Assessment with Multi-axis Attention

    Authors: Jiebin Yan, Ziwen Tan, Yuming Fang, Jiale Rao, Yifan Zuo

    Abstract: Omnidirectional image, also called 360-degree image, is able to capture the entire 360-degree scene, thereby providing more realistic immersive feelings for users than general 2D image and stereoscopic image. Meanwhile, this feature brings great challenges to measuring the perceptual quality of omnidirectional images, which is closely related to users' quality of experience, especially when the om…

    Submitted 26 February, 2025; originally announced February 2025.

  49. arXiv:2502.18776  [pdf, other]

    astro-ph.IM

    First frequency phase transfer from the 3 mm to the 1 mm band on an Earth-sized baseline

    Authors: Sara Issaoun, Dominic W. Pesce, María J. Rioja, Richard Dodson, Lindy Blackburn, Garrett K. Keating, Sheperd S. Doeleman, Bong Won Sohn, Wu Jiang, Dan Hoak, Wei Yu, Pablo Torne, Ramprasad Rao, Remo P. J. Tilanus, Iván Martí-Vidal, Taehyun Jung, Garret Fitzpatrick, Miguel Sánchez-Portal, Salvador Sánchez, Jonathan Weintroub, Mark Gurwell, Carsten Kramer, Carlos Durán, David John, Juan L. Santaren , et al. (11 additional authors not shown)

    Abstract: Frequency Phase Transfer (FPT) is a technique designed to increase coherence and sensitivity in radio interferometry by making use of the non-dispersive nature of the troposphere to calibrate high-frequency data using solutions derived at a lower frequency. While the Korean VLBI Network has pioneered the use of simultaneous multi-band systems for routine FPT up to an observing frequency of 130 GHz…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 11 pages, 5 figures, accepted to AJ

  50. arXiv:2502.18302  [pdf, other]

    cs.CV

    LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation

    Authors: Pengzhi Li, Pengfei Yu, Zide Liu, Wei He, Xuhao Pan, Xudong Rao, Tao Wei, Wei Chen

    Abstract: In this paper, we introduce LDGen, a novel method for integrating large language models (LLMs) into existing text-to-image diffusion models while minimizing computational demands. Traditional text encoders, such as CLIP and T5, exhibit limitations in multilingual processing, hindering image generation across diverse languages. We address these challenges by leveraging the advanced capabilities of…

    Submitted 25 February, 2025; originally announced February 2025.