Showing 1–50 of 156 results for author: Xu, A

  1. arXiv:2505.09972  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Who Said What WSW 2.0? Enhanced Automated Analysis of Preschool Classroom Speech

    Authors: Anchen Sun, Tiantian Feng, Gabriela Gutierrez, Juan J Londono, Anfeng Xu, Batya Elbaum, Shrikanth Narayanan, Lynn K Perry, Daniel S Messinger

    Abstract: This paper introduces an automated framework WSW2.0 for analyzing vocal interactions in preschool classrooms, enhancing both accuracy and scalability through the integration of wav2vec2-based speaker classification and Whisper (large-v2 and large-v3) speech transcription. A total of 235 minutes of audio recordings (160 minutes from 12 children and 75 minutes from 5 teachers) were used to compare…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 8 pages, 2 figures, 5 tables

  2. arXiv:2504.16943  [pdf, other]

    cs.CY cs.LG

    Flexibility of German gas-fired generation: evidence from clustering empirical operation

    Authors: Chiara Fusar Bassini, Alice Lixuan Xu, Jorge Sánchez Canales, Lion Hirth, Lynn H. Kaack

    Abstract: Key inputs to energy models are assumptions about the flexibility of power generation units, i.e., how quickly and often they can start up. These assumptions are usually calibrated on the technical characteristics of the units, such as installed capacity or technology type. However, even if power generation units technically can dispatch flexibly, service obligations and market incentives may con…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: 29 pages, 6 figures, 6 tables

  3. arXiv:2504.15976  [pdf, other]

    cs.RO

    ad-trait: A Fast and Flexible Automatic Differentiation Library in Rust

    Authors: Chen Liang, Qian Wang, Andy Xu, Daniel Rakita

    Abstract: The Rust programming language is an attractive choice for robotics and related fields, offering highly efficient and memory-safe code. However, a key limitation preventing its broader adoption in these domains is the lack of high-quality, well-supported Automatic Differentiation (AD), a fundamental technique that enables convenient derivative computation by systematically accumulating data during f…

    Submitted 22 April, 2025; originally announced April 2025.

  4. arXiv:2504.15253  [pdf, other]

    cs.CL cs.LG

    Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators

    Authors: Yilun Zhou, Austin Xu, Peifeng Wang, Caiming Xiong, Shafiq Joty

    Abstract: Scaling test-time computation, or affording a generator large language model (LLM) extra compute during inference, typically employs the help of external non-generative evaluators (i.e., reward models). Concurrently, LLM-judges, models trained to generate evaluations and critiques (explanations) in natural language, are becoming increasingly popular in automatic evaluation. Despite judge empirical…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: The first two authors contributed equally. The codebase is at https://github.com/SalesforceAIResearch/jetts-benchmark

  5. arXiv:2504.14548  [pdf, other]

    cs.CV cs.AI

    VGNC: Reducing the Overfitting of Sparse-view 3DGS via Validation-guided Gaussian Number Control

    Authors: Lifeng Lin, Rongfeng Lu, Quan Chen, Haofan Ren, Ming Lu, Yaoqi Sun, Chenggang Yan, Anke Xue

    Abstract: Sparse-view 3D reconstruction is a fundamental yet challenging task in practical 3D reconstruction applications. Recently, many methods based on the 3D Gaussian Splatting (3DGS) framework have been proposed to address sparse-view 3D reconstruction. Although these methods have made considerable advancements, they still show significant issues with overfitting. To reduce the overfitting, we introduc…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: 10 pages, 8 figures

  6. arXiv:2504.13787  [pdf, other]

    cs.LG cs.AI

    Probabilistic Stability Guarantees for Feature Attributions

    Authors: Helen Jin, Anton Xue, Weiqiu You, Surbhi Goel, Eric Wong

    Abstract: Stability guarantees are an emerging tool for evaluating feature attributions, but existing certification methods rely on smoothed classifiers and often yield conservative guarantees. To address these limitations, we introduce soft stability and propose a simple, model-agnostic, and sample-efficient stability certification algorithm (SCA) that provides non-trivial and interpretable guarantees for…

    Submitted 18 April, 2025; originally announced April 2025.

  7. arXiv:2504.09502  [pdf, other]

    cs.CV cs.LG

    PCM-SAR: Physics-Driven Contrastive Mutual Learning for SAR Classification

    Authors: Pengfei Wang, Hao Zheng, Zhigang Hu, Aikun Xu, Meiguang Zheng, Liu Yang

    Abstract: Existing SAR image classification methods based on Contrastive Learning often rely on sample generation strategies designed for optical images, failing to capture the distinct semantic and physical characteristics of SAR data. To address this, we propose Physics-Driven Contrastive Mutual Learning for SAR Classification (PCM-SAR), which incorporates domain-specific physical insights to improve samp…

    Submitted 13 April, 2025; originally announced April 2025.

  8. arXiv:2504.09037  [pdf, other]

    cs.AI cs.CL

    A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems

    Authors: Zixuan Ke, Fangkai Jiao, Yifei Ming, Xuan-Phi Nguyen, Austin Xu, Do Xuan Long, Minzhi Li, Chengwei Qin, Peifeng Wang, Silvio Savarese, Caiming Xiong, Shafiq Joty

    Abstract: Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making. With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems from conventional models that empower chatbots. In this survey, we categorize existing methods along two orthogonal dimensions: (1) Regimes, whi…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 72 pages, 6 figures

  9. arXiv:2503.17503  [pdf, other]

    cs.LG physics.geo-ph stat.ML

    Towards Understanding the Benefits of Neural Network Parameterizations in Geophysical Inversions: A Study With Neural Fields

    Authors: Anran Xu, Lindsey J. Heagy

    Abstract: In this work, we employ neural fields, which use neural networks to map a coordinate to the corresponding physical property value at that coordinate, in a test-time learning manner. For a test-time learning method, the weights are learned during the inversion, as compared to traditional approaches which require a network to be trained using a training data set. Results for synthetic examples in se…

    Submitted 21 March, 2025; originally announced March 2025.

  10. arXiv:2503.15843  [pdf, other]

    quant-ph cs.ET

    Reducing T Gates with Unitary Synthesis

    Authors: Tianyi Hao, Amanda Xu, Swamit Tannu

    Abstract: Quantum error correction is essential for achieving practical quantum computing but has a significant computational overhead. Among fault-tolerant (FT) gate operations, non-Clifford gates, such as $T$, are particularly expensive due to their reliance on magic state distillation. These costly $T$ gates appear frequently in FT circuits as many quantum algorithms require arbitrary single-qubit rotati…

    Submitted 20 March, 2025; originally announced March 2025.

  11. arXiv:2503.15620  [pdf, other]

    cs.CL cs.AI cs.LG

    Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings

    Authors: Austin Xu, Srijan Bansal, Yifei Ming, Semih Yavuz, Shafiq Joty

    Abstract: The large language model (LLM)-as-judge paradigm has been used to meet the demand for a cheap, reliable, and fast evaluation of model outputs during AI system development and post-deployment monitoring. While judge models -- LLMs finetuned to specialize in assessing and critiquing model outputs -- have been touted as general purpose evaluators, they are typically evaluated only on non-contextual s…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 23 pages, 13 figures, 6 tables

  12. arXiv:2503.05026  [pdf, other]

    cs.RO

    Ergodic Exploration over Meshable Surfaces

    Authors: Dayi Dong, Albert Xu, Geordan Gutow, Howie Choset, Ian Abraham

    Abstract: Robotic search and rescue, exploration, and inspection require trajectory planning across a variety of domains. A popular approach to trajectory planning for these types of missions is ergodic search, which biases a trajectory to spend time in parts of the exploration domain that are believed to contain more information. Most prior work on ergodic search has been limited to searching simple surfac…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 6 content pages, 1 references page, 6 figures, International Conference on Robotics and Automation 2025

  13. arXiv:2502.12169  [pdf, other]

    physics.ins-det cs.LG hep-ex

    Antimatter Annihilation Vertex Reconstruction with Deep Learning for ALPHA-g Radial Time Projection Chamber

    Authors: Ashley Ferreira, Mahip Singh, Yukiya Saito, Andrea Capra, Ina Carli, Daniel Duque Quiceno, Wojciech T. Fedorko, Makoto C. Fujiwara, Muyan Li, Lars Martin, Gareth Smith, Anqi Xu

    Abstract: The ALPHA-g experiment at CERN aims to precisely measure the terrestrial gravitational acceleration of antihydrogen atoms. A radial Time Projection Chamber (rTPC), which surrounds the ALPHA-g magnetic trap, is employed to determine the annihilation location, called the vertex. The standard approach requires identifying the trajectories of the ionizing particles in the rTPC from the location of thei…

    Submitted 13 February, 2025; originally announced February 2025.

  14. arXiv:2502.11364  [pdf, other]

    cs.CL

    Blessing of Multilinguality: A Systematic Analysis of Multilingual In-Context Learning

    Authors: Yilei Tu, Andrew Xue, Freda Shi

    Abstract: While multilingual large language models generally perform adequately, and sometimes even rival English performance on high-resource languages (HRLs), they often significantly underperform on low-resource languages (LRLs). Among several prompting strategies aiming at bridging the gap, multilingual in-context learning (ICL) has been particularly effective when demonstration in target languages is u…

    Submitted 18 February, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  15. arXiv:2502.08566  [pdf]

    cs.ET cs.CV cs.HC

    AR Glulam: Accurate Augmented Reality Using Multiple Fiducial Markers for Glulam Fabrication

    Authors: Alexander Htet Kyaw, Arvin Xu, Sasa Zivkovic, Gwyllim Jahn, Cameron Newnham, Nick Van Den Berg

    Abstract: Recent advancements in Augmented Reality (AR) have demonstrated applications in architecture, design, and fabrication. Compared to conventional 2D construction drawings, AR can be used to superimpose contextual instructions, display 3D spatial information and enable on-site engagement. Despite the potential of AR, the widespread adoption of the technology in the industry is limited by its precisio…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 10 Figures, Project Paper for Association for Computer Aided Design in Architecture

  16. arXiv:2502.06693  [pdf, ps, other]

    cs.LG cs.AI cs.CY

    Recent Advances, Applications and Open Challenges in Machine Learning for Health: Reflections from Research Roundtables at ML4H 2024 Symposium

    Authors: Amin Adibi, Xu Cao, Zongliang Ji, Jivat Neet Kaur, Winston Chen, Elizabeth Healey, Brighton Nuwagira, Wenqian Ye, Geoffrey Woollard, Maxwell A Xu, Hejie Cui, Johnny Xi, Trenton Chang, Vasiliki Bikia, Nicole Zhang, Ayush Noori, Yuan Xia, Md. Belal Hossain, Hanna A. Frank, Alina Peluso, Yuan Pu, Shannon Zejiang Shen, John Wu, Adibvafa Fallahpour, Sazan Mahbub, et al. (17 additional authors not shown)

    Abstract: The fourth Machine Learning for Health (ML4H) symposium was held in person on December 15th and 16th, 2024, in the traditional, ancestral, and unceded territories of the Musqueam, Squamish, and Tsleil-Waututh Nations in Vancouver, British Columbia, Canada. The symposium included research roundtable sessions to foster discussions between participants and senior researchers on timely and relevant to…

    Submitted 10 February, 2025; originally announced February 2025.

  17. arXiv:2502.01763  [pdf, other]

    cs.LG math.OC stat.ML

    On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning

    Authors: Thomas T. Zhang, Behrad Moniri, Ansh Nagwekar, Faraz Rahman, Anton Xue, Hamed Hassani, Nikolai Matni

    Abstract: Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms that introduce preconditioners per axis of each layer's weight tensors. These methods have seen a recent resurgence, demonstrating impressive performance relative to entry-wise ("diagonal") preconditioning methods such as Adam(W) on a wide range of neural network optimization tasks. Complementary to their p…

    Submitted 3 February, 2025; originally announced February 2025.

  18. arXiv:2502.01108  [pdf, other]

    cs.LG cs.AI eess.SP

    Pulse-PPG: An Open-Source Field-Trained PPG Foundation Model for Wearable Applications Across Lab and Field Settings

    Authors: Mithun Saha, Maxwell A. Xu, Wanting Mao, Sameer Neupane, James M. Rehg, Santosh Kumar

    Abstract: Photoplethysmography (PPG)-based foundation models are gaining traction due to the widespread use of PPG in biosignal monitoring and their potential to generalize across diverse health applications. In this paper, we introduce Pulse-PPG, the first open-source PPG foundation model trained exclusively on raw PPG data collected over a 100-day field study with 120 participants. Existing PPG foundation…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: The first two listed authors contributed equally to this research

  19. arXiv:2501.09426  [pdf, other]

    cs.CL

    AutoCBT: An Autonomous Multi-agent Framework for Cognitive Behavioral Therapy in Psychological Counseling

    Authors: Ancheng Xu, Di Yang, Renhao Li, Jingwei Zhu, Minghuan Tan, Min Yang, Wanxin Qiu, Mingchen Ma, Haihong Wu, Bingyu Li, Feng Sha, Chengming Li, Xiping Hu, Qiang Qu, Derek F. Wong, Ruifeng Xu

    Abstract: Traditional in-person psychological counseling remains primarily niche, often chosen by individuals with psychological issues, while online automated counseling offers a potential solution for those hesitant to seek help due to feelings of shame. Cognitive Behavioral Therapy (CBT) is an essential and widely used approach in psychological counseling. The advent of large language models (LLMs) and a…

    Submitted 16 January, 2025; originally announced January 2025.

  20. arXiv:2501.04733  [pdf]

    cs.AI cs.ET cs.LG physics.ao-ph

    AI-Driven Reinvention of Hydrological Modeling for Accurate Predictions and Interpretation to Transform Earth System Modeling

    Authors: Cuihui Xia, Lei Yue, Deliang Chen, Yuyang Li, Hongqiang Yang, Ancheng Xue, Zhiqiang Li, Qing He, Guoqing Zhang, Dambaru Ballab Kattel, Lei Lei, Ming Zhou

    Abstract: Traditional equation-driven hydrological models often struggle to accurately predict streamflow in challenging regional Earth systems like the Tibetan Plateau, while hybrid and existing algorithm-driven models face difficulties in interpreting hydrological behaviors. This work introduces HydroTrace, an algorithm-driven, data-agnostic model that substantially outperforms these approaches, achieving…

    Submitted 7 January, 2025; originally announced January 2025.

  21. arXiv:2501.02364  [pdf, other]

    cs.LG cs.CV stat.ML

    Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data

    Authors: Alec S. Xu, Can Yaras, Peng Wang, Qing Qu

    Abstract: Deep neural networks have attained remarkable success across diverse classification tasks. Recent empirical studies have shown that deep networks learn features that are linearly separable across classes. However, these findings often lack rigorous justifications, even under relatively simple settings. In this work, we address this gap by examining the linear separation capabilities of shallow non…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: 32 pages, 9 figures

  22. arXiv:2412.17210  [pdf, other]

    cs.CV

    Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection

    Authors: Hongsong Wang, Andi Xu, Pinle Ding, Jie Gui

    Abstract: Video Anomaly Detection (VAD) is essential for computer vision research. Existing VAD methods utilize either reconstruction-based or prediction-based frameworks. The former excels at detecting irregular patterns or structures, whereas the latter is capable of spotting abnormal deviations or trends. We address pose-based video anomaly detection and introduce a novel framework called Dual Conditione…

    Submitted 8 March, 2025; v1 submitted 22 December, 2024; originally announced December 2024.

    Comments: Code is on https://github.com/guijiejie/DCMD-main

  23. arXiv:2412.06382  [pdf, other]

    cs.LG cs.SE

    PyPulse: A Python Library for Biosignal Imputation

    Authors: Kevin Gao, Maxwell A. Xu, James M. Rehg, Alexander Moreno

    Abstract: We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings. Missingness is commonplace in these settings and can arise from multiple causes, such as insecure sensor attachment or data transmission loss. PyPulse provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bio…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: 7 pages, 3 figures. Implementation and documentation are available at https://github.com/rehg-lab/pulseimpute

  24. arXiv:2412.04785  [pdf, other]

    cs.LG cs.CR

    Differentially Private Random Feature Model

    Authors: Chunyang Liao, Deanna Needell, Alexander Xue

    Abstract: Designing privacy-preserving machine learning algorithms has received great attention in recent years, especially in settings where the data contains sensitive information. Differential privacy (DP) is a widely used mechanism for data analysis with privacy guarantees. In this paper, we produce a differentially private random feature model. Random features, which were proposed to approximate larg…

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: Submitted to an IEEE journal

  25. arXiv:2412.03154  [pdf, other]

    cs.LG cs.AI cs.SE

    Testing Neural Network Verifiers: A Soundness Benchmark with Hidden Counterexamples

    Authors: Xingjian Zhou, Hongji Xu, Andy Xu, Zhouxing Shi, Cho-Jui Hsieh, Huan Zhang

    Abstract: In recent years, many neural network (NN) verifiers have been developed to formally verify certain properties of neural networks such as robustness. Although many benchmarks have been constructed to evaluate the performance of NN verifiers, they typically lack a ground-truth for hard instances where no current verifier can verify and no counterexample can be found, which makes it difficult to chec…

    Submitted 4 December, 2024; originally announced December 2024.

    Comments: Preprint

  26. arXiv:2412.00961  [pdf, other]

    physics.data-an cs.LG

    AI Meets Antimatter: Unveiling Antihydrogen Annihilations

    Authors: Ashley Ferreira, Mahip Singh, Andrea Capra, Ina Carli, Daniel Duque Quiceno, Wojciech T. Fedorko, Makoto C. Fujiwara, Muyan Li, Lars Martin, Yukiya Saito, Gareth Smith, Anqi Xu

    Abstract: The ALPHA-g experiment at CERN aims to perform the first-ever direct measurement of the effect of gravity on antimatter, determining its weight to within 1% precision. This measurement requires an accurate prediction of the vertical position of annihilations within the detector. In this work, we present a novel approach to annihilation position reconstruction using an ensemble of models based on t…

    Submitted 3 December, 2024; v1 submitted 1 December, 2024; originally announced December 2024.

    Comments: 6 pages, 4 figures, submitted to Machine Learning and the Physical Sciences Workshop at the 38th conference on Neural Information Processing Systems (NeurIPS)

  27. arXiv:2411.18822  [pdf, other]

    eess.SP cs.AI cs.LG

    RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data

    Authors: Maxwell A. Xu, Jaya Narain, Gregory Darnell, Haraldur Hallgrimsson, Hyewon Jeong, Darren Forde, Richard Fineman, Karthik J. Raghuram, James M. Rehg, Shirley Ren

    Abstract: We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable distance measure is trained to capture motif similarity and domain-specific semantic information such as rotation invariance. Then, the learned distance provides a measurement of semantic similarity between a pair of accele…

    Submitted 10 April, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to ICLR 2025. Code here: https://github.com/maxxu05/relcon

    Journal ref: The Thirteenth International Conference on Learning Representations (ICLR), 2025

  28. arXiv:2411.17866  [pdf, other]

    cs.LG

    Distributed Sign Momentum with Local Steps for Training Transformers

    Authors: Shuhua Yu, Ding Zhou, Cong Xie, An Xu, Zhi Zhang, Xin Liu, Soummya Kar

    Abstract: Pre-training Transformer models is resource-intensive, and recent studies have shown that sign momentum is an efficient technique for training large-scale deep learning models, particularly Transformers. However, its application in distributed training remains underexplored. This paper investigates a novel communication-efficient distributed sign momentum method with multiple local steps, to cope…

    Submitted 7 March, 2025; v1 submitted 26 November, 2024; originally announced November 2024.

    Comments: Added convergence analysis for deterministic sign operator

  29. arXiv:2411.10761  [pdf, other]

    cs.CL

    Can Generic LLMs Help Analyze Child-adult Interactions Involving Children with Autism in Clinical Observation?

    Authors: Tiantian Feng, Anfeng Xu, Rimita Lahiri, Helen Tager-Flusberg, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan

    Abstract: Large Language Models (LLMs) have shown significant potential in understanding human communication and interaction. However, their performance in the domain of child-inclusive interactions, including in clinical settings, remains less explored. In this work, we evaluate generic LLMs' ability to analyze child-adult dyadic interactions in a clinically relevant context involving children with ASD. Sp…

    Submitted 16 November, 2024; originally announced November 2024.

    Comments: GenAI for Health Workshop, NeurIPS 2024

  30. arXiv:2411.06306  [pdf, other]

    cs.RO cs.AI cs.HC

    Optimal Driver Warning Generation in Dynamic Driving Environment

    Authors: Chenran Li, Aolin Xu, Enna Sachdeva, Teruhisa Misu, Behzad Dariush

    Abstract: The driver warning system that alerts the human driver about potential risks during driving is a key feature of an advanced driver assistance system. Existing driver warning technologies, mainly the forward collision warning and unsafe lane change warning, can reduce the risk of collision caused by human errors. However, the current design methods have several major limitations. Firstly, the warni…

    Submitted 9 November, 2024; originally announced November 2024.

    Comments: ICRA 2024

  31. Word reuse and combination support efficient communication of emerging concepts

    Authors: Aotao Xu, Charles Kemp, Lea Frermann, Yang Xu

    Abstract: A key function of the lexicon is to express novel concepts as they emerge over time through a process known as lexicalization. The most common lexicalization strategies are the reuse and combination of existing words, but they have typically been studied separately in the areas of word meaning extension and word formation. Here we offer an information-theoretic account of how both strategies are c…

    Submitted 8 November, 2024; originally announced November 2024.

    Comments: Published in Proceedings of the National Academy of Sciences

    Journal ref: Proceedings of the National Academy of Sciences, 121(46), e2406971121 (2024)

  32. arXiv:2411.04469  [pdf, other]

    cs.CV

    FreeCap: Hybrid Calibration-Free Motion Capture in Open Environments

    Authors: Aoru Xue, Yiming Ren, Zining Song, Mao Ye, Xinge Zhu, Yuexin Ma

    Abstract: We propose a novel hybrid calibration-free method FreeCap to accurately capture global multi-person motions in open environments. Our system combines a single LiDAR with expandable moving cameras, allowing for flexible and precise motion estimation in a unified world coordinate. In particular, we introduce a local-to-global pose-aware cross-sensor human-matching module that predicts the alignment…

    Submitted 10 February, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

  33. arXiv:2411.04104  [pdf, other]

    cs.PL quant-ph

    Optimizing Quantum Circuits, Fast and Slow

    Authors: Amanda Xu, Abtin Molavi, Swamit Tannu, Aws Albarghouthi

    Abstract: Optimizing quantum circuits is critical: the number of quantum operations needs to be minimized for a successful evaluation of a circuit on a quantum processor. In this paper we unify two disparate ideas for optimizing quantum circuits: rewrite rules, which are fast standard optimizer passes, and unitary synthesis, which is slow, requiring a search through the space of circuits. We present a clean…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: ASPLOS 2025

  34. arXiv:2410.24178  [pdf, other]

    cs.LG

    AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties

    Authors: Xiayan Ji, Anton Xue, Eric Wong, Oleg Sokolsky, Insup Lee

    Abstract: Anomaly detection is widely used for identifying critical errors and suspicious behaviors, but current methods lack interpretability. We leverage common properties of existing methods and recent advances in generative models to introduce counterfactual explanations for anomaly detection. Given an input, we generate its counterfactual as a diffusion-based repair that shows what a non-anomalous vers…

    Submitted 31 October, 2024; originally announced October 2024.

  35. arXiv:2410.12013  [pdf, other]

    cs.CL cs.AI cs.LG

    MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router

    Authors: Yanyue Xie, Zhi Zhang, Ding Zhou, Cong Xie, Ziang Song, Xin Liu, Yanzhi Wang, Xue Lin, An Xu

    Abstract: Mixture-of-Experts (MoE) architectures face challenges such as high memory consumption and redundancy in experts. Pruning MoE can reduce network weights while maintaining model performance. Motivated by the recent observation of emergent large magnitude features in Large Language Models (LLM) and MoE routing policy, we propose MoE-Pruner, a method that prunes weights with the smallest magnitudes m…

    Submitted 15 October, 2024; originally announced October 2024.

  36. arXiv:2410.03207  [pdf, other]

    cs.HC

    StoryNavi: On-Demand Narrative-Driven Reconstruction of Video Play With Generative AI

    Authors: Alston Lantian Xu, Tianwei Ma, Tianmeng Liu, Can Liu, Alvaro Cassinelli

    Abstract: Manually navigating lengthy videos to seek information or answer questions can be a tedious and time-consuming task for users. We introduce StoryNavi, a novel system powered by VLLMs for generating customised video play experiences by retrieving materials from original videos. It directly answers users' queries by constructing a non-linear sequence of identified relevant clips to form a cohesive nar…

    Submitted 4 October, 2024; originally announced October 2024.

  37. arXiv:2409.14664  [pdf, other]

    cs.CL

    Direct Judgement Preference Optimization

    Authors: Peifeng Wang, Austin Xu, Yilun Zhou, Caiming Xiong, Shafiq Joty

    Abstract: Auto-evaluation is crucial for assessing response quality and offering feedback for model development. Recent studies have explored training large language models (LLMs) as generative judges to evaluate and critique other models' outputs. In this work, we investigate the idea of learning from both positive and negative data with preference optimization to enhance the evaluation capabilities of LLM…

    Submitted 29 September, 2024; v1 submitted 22 September, 2024; originally announced September 2024.

    Comments: Preprint

  38. arXiv:2409.13684  [pdf, other]

    cs.LG cs.AI

    The FIX Benchmark: Extracting Features Interpretable to eXperts

    Authors: Helen Jin, Shreya Havaldar, Chaehyeon Kim, Anton Xue, Weiqiu You, Helen Qu, Marco Gatti, Daniel A Hashimoto, Bhuvnesh Jain, Amin Madani, Masao Sako, Lyle Ungar, Eric Wong

    Abstract: Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that…

    Submitted 23 December, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

  39. arXiv:2409.11376  [pdf, other]

    cs.LG

    Towards Time Series Reasoning with LLMs

    Authors: Winnie Chow, Lauren Gardiner, Haraldur T. Hallgrímsson, Maxwell A. Xu, Shirley You Ren

    Abstract: Multi-modal large language models (MLLMs) have enabled numerous advances in understanding and reasoning in domains like vision, but we have not yet seen this broad success for time-series. Although prior works on time-series MLLMs have shown promising performance in time-series forecasting, very few works show how an LLM could be used for time-series reasoning in natural language. We propose a nov…

    Submitted 4 December, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: Oral Presentation at 2024 NeurIPS Workshop on Time Series in the Age of Large Models

  40. arXiv:2409.09916  [pdf, other]

    cs.CL cs.AI

    SFR-RAG: Towards Contextually Faithful LLMs

    Authors: Xuan-Phi Nguyen, Shrey Pandit, Senthil Purushwalkam, Austin Xu, Hailin Chen, Yifei Ming, Zixuan Ke, Silvio Savarese, Caiming Xiong, Shafiq Joty

    Abstract: Retrieval Augmented Generation (RAG), a paradigm that integrates external contextual information with large language models (LLMs) to enhance factual accuracy and relevance, has emerged as a pivotal area in generative AI. The LLMs used in RAG applications are required to faithfully and completely comprehend the provided context and users' questions, avoid hallucination, handle unanswerable, counte…

    Submitted 15 September, 2024; originally announced September 2024.

    Comments: Technical report

  41. arXiv:2409.09340  [pdf, other]

    cs.SD cs.AI eess.AS

    Egocentric Speaker Classification in Child-Adult Dyadic Interactions: From Sensing to Computational Modeling

    Authors: Tiantian Feng, Anfeng Xu, Xuan Shi, Somer Bishop, Shrikanth Narayanan

    Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by challenges in social communication, repetitive behavior, and sensory processing. One important research area in ASD is evaluating children's behavioral changes over time during treatment. The standard protocol with this objective is BOSCC, which involves dyadic interactions between a child and clinicians performing a…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: pre-print under review

  42. arXiv:2409.09164  [pdf, other]

    cs.RO

    Measure Preserving Flows for Ergodic Search in Convoluted Environments

    Authors: Albert Xu, Bhaskar Vundurthy, Geordan Gutow, Ian Abraham, Jeff Schneider, Howie Choset

    Abstract: Autonomous robotic search has important applications in robotics, such as the search for signs of life after a disaster. When a priori information is available, for example in the form of a distribution, a planner can use that distribution to guide the search. Ergodic search is one method that uses the information distribution to generate a trajectory that minimizes the ergodic metric, in t…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 15 pages, accepted to DARS 2024

  43. arXiv:2409.08605  [pdf, other]

    eess.AS cs.SD

    Effective Integration of KAN for Keyword Spotting

    Authors: Anfeng Xu, Biqiao Zhang, Shuyu Kong, Yiteng Huang, Zhaojun Yang, Sangeeta Srivastava, Ming Sun

    Abstract: Keyword spotting (KWS) is an important speech processing component for smart devices with voice assistance capability. In this paper, we investigate if Kolmogorov-Arnold Networks (KAN) can be used to enhance the performance of KWS. We explore various approaches to integrate KAN for a model architecture based on 1D Convolutional Neural Networks (CNN). We find that KAN is effective at modeling high-…

    Submitted 11 January, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Accepted to ICASSP 2025

  44. arXiv:2409.07200  [pdf, other]

    cs.CV cs.AI

    ThermalGaussian: Thermal 3D Gaussian Splatting

    Authors: Rongfeng Lu, Hangyu Chen, Zunjie Zhu, Yuhang Qin, Ming Lu, Le Zhang, Chenggang Yan, Anke Xue

    Abstract: Thermography is especially valuable for the military and other users of surveillance cameras. Some recent methods based on Neural Radiance Fields (NeRF) are proposed to reconstruct the thermal scenes in 3D from a set of thermal and RGB images. However, unlike NeRF, 3D Gaussian splatting (3DGS) prevails due to its rapid training and real-time rendering. In this work, we propose ThermalGaussian, the…

    Submitted 22 April, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

    Comments: 10 pages, 7 figures

  45. arXiv:2409.04897  [pdf, other]

    cs.DS cs.CY cs.LG econ.TH stat.ML

    Centralized Selection with Preferences in the Presence of Biases

    Authors: L. Elisa Celis, Amit Kumar, Nisheeth K. Vishnoi, Andrew Xu

    Abstract: This paper considers the scenario in which there are multiple institutions, each with a limited capacity for candidates, and candidates, each with preferences over the institutions. A central entity evaluates the utility of each candidate to the institutions, and the goal is to select candidates for each institution in a way that maximizes utility while also considering the candidates' preferences…

    Submitted 7 September, 2024; originally announced September 2024.

    Comments: The conference version of this paper appears in ICML 2024

  46. arXiv:2409.01666  [pdf, other]

    cs.CL

    In Defense of RAG in the Era of Long-Context Language Models

    Authors: Tan Yu, Anbang Xu, Rama Akkiraju

    Abstract: Overcoming the limited context limitations in early-generation LLMs, retrieval-augmented generation (RAG) has been a reliable solution for context-based answer generation in the past. Recently, the emergence of long-context LLMs allows the models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-co…

    Submitted 3 September, 2024; originally announced September 2024.

  47. arXiv:2408.10499  [pdf, other]

    cs.HC cs.AI cs.PL

    ProgramAlly: Creating Custom Visual Access Programs via Multi-Modal End-User Programming

    Authors: Jaylin Herskovitz, Andi Xu, Rahaf Alharbi, Anhong Guo

    Abstract: Existing visual assistive technologies are built for simple and common use cases, and have few avenues for blind people to customize their functionalities. Drawing from prior work on DIY assistive technology, this paper investigates end-user programming as a means for users to create and customize visual access programs to meet their unique needs. We introduce ProgramAlly, a system for creating cu…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: UIST 2024

  48. arXiv:2408.08300  [pdf, other]

    cs.SE cs.LG

    HELP: Hierarchical Embeddings-based Log Parsing

    Authors: Andy Xu, Arno Gau

    Abstract: Logs are a first-hand source of information for software maintenance and failure diagnosis. Log parsing, which converts semi-structured log messages into structured templates, is a prerequisite for automated log analysis tasks such as anomaly detection, troubleshooting, and root cause analysis. However, existing log parsers fail in real-world systems for three main reasons. First, traditional heur…

    Submitted 15 August, 2024; originally announced August 2024.

  49. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Lluis Castrejon, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, et al. (237 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 21 December, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  50. arXiv:2407.17722  [pdf, other]

    cs.IR cs.LG

    Text-Driven Neural Collaborative Filtering Model for Paper Source Tracing

    Authors: Aobo Xu, Bingyu Chang, Qingpeng Liu, Ling Jian

    Abstract: Identifying significant references within the complex interrelations of a citation knowledge graph is challenging, which encompasses connections through citations, authorship, keywords, and other relational attributes. The Paper Source Tracing (PST) task seeks to automate the identification of pivotal references for given scholarly articles utilizing advanced data mining techniques. In the KDD CUP…

    Submitted 19 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: KDD CUP 2024 OAG-Challenges, Paper Source Tracing, Technical Report of Team AoboSama @ KDD CUP 2024. August 25--29, 2024. Barcelona, Spain