Showing 1–50 of 789 results for author: Tan, T

  1. arXiv:2506.09965  [pdf, ps, other]

    cs.CV cs.AI

    Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

    Authors: Junfei Wu, Jian Guan, Kaituo Feng, Qiang Liu, Shu Wu, Liang Wang, Wei Wu, Tieniu Tan

    Abstract: As textual reasoning with large language models (LLMs) has advanced significantly, there has been growing interest in enhancing the multimodal reasoning capabilities of large vision-language models (LVLMs). However, existing methods primarily approach multimodal reasoning in a straightforward, text-centric manner, where both reasoning and answer derivation are conducted purely through text, with t…

    Submitted 11 June, 2025; originally announced June 2025.

    ACM Class: I.2

  2. arXiv:2506.09079  [pdf, ps, other]

    cs.CV cs.AI

    VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

    Authors: Xinlong Chen, Yuanxing Zhang, Yushuo Guan, Bohan Zeng, Yang Shi, Sihan Yang, Pengfei Wan, Qiang Liu, Liang Wang, Tieniu Tan

    Abstract: Recent advancements in multimodal large language models have successfully extended the Reason-Then-Respond paradigm to image-based reasoning, yet video-based reasoning remains an underdeveloped frontier, primarily due to the scarcity of high-quality reasoning-oriented data and effective training methodologies. To bridge this gap, we introduce DarkEventInfer and MixVidQA, two novel datasets specifi…

    Submitted 9 June, 2025; originally announced June 2025.

  3. arXiv:2506.07961  [pdf, ps, other]

    cs.RO cs.AI

    BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

    Authors: Peiyan Li, Yixiang Chen, Hongtao Wu, Xiao Ma, Xiangnan Wu, Yan Huang, Liang Wang, Tao Kong, Tieniu Tan

    Abstract: Recently, leveraging pre-trained vision-language models (VLMs) for building vision-language-action (VLA) models has emerged as a promising approach to effective robot manipulation learning. However, only a few methods incorporate 3D signals into VLMs for action prediction, and they do not fully leverage the spatial structure inherent in 3D data, leading to low sample efficiency. In this paper, we in…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: In Submission

  4. arXiv:2506.06043  [pdf, ps, other]

    eess.IV

    Implicit Neural Representation-Based MRI Reconstruction Method with Sensitivity Map Constraints

    Authors: Lixuan Rao, Xinlin Zhang, Yiman Huang, Tao Tan, Tong Tong

    Abstract: Magnetic Resonance Imaging (MRI) is a widely utilized diagnostic tool in clinical settings, but its application is limited by the relatively long acquisition time. As a result, fast MRI reconstruction has become a significant area of research. In recent years, Implicit Neural Representation (INR), as a scan-specific method, has demonstrated outstanding performance in fast MRI reconstruction withou…

    Submitted 6 June, 2025; originally announced June 2025.

  5. arXiv:2506.04923  [pdf, ps, other]

    astro-ph.EP

    Three Hot Jupiters transiting K-dwarfs with a significant heavy element mass

    Authors: Y. G. C. Frensch, F. Bouchy, G. Lo Curto, S. Ulmer-Moll, S. G. Sousa, N. C. Santos, K. G. Stassun, C. N. Watkins, H. Chakraborty, K. Barkaoui, M. Battley, W. Ceva, K. A. Collins, T. Daylan, P. Evans, J. P. Faria, C. Farret Jentink, E. Fontanet, E. Fridén, G. Furesz, M. Gillon, N. Grieves, C. Hellier, E. Jehin, J. M. Jenkins, et al. (28 additional authors not shown)

    Abstract: Albeit at a lower frequency than around hotter stars, short-period gas giants around low-mass stars ($T_\mathrm{eff} < 4965$ K) do exist, despite predictions from planetary population synthesis models that such systems should be exceedingly rare. By combining data from TESS and ground-based follow-up observations, we seek to confirm and characterize giant planets transiting K dwarfs, particularly…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: 22 pages, 17 figures, accepted for publication in A&A

  6. arXiv:2506.04737  [pdf, ps, other]

    cs.CV

    Bridging Annotation Gaps: Transferring Labels to Align Object Detection Datasets

    Authors: Mikhail Kennerley, Angelica Aviles-Rivero, Carola-Bibiane Schönlieb, Robby T. Tan

    Abstract: Combining multiple object detection datasets offers a path to improved generalisation but is hindered by inconsistencies in class semantics and bounding box annotations. Some methods to address this assume shared label taxonomies and address only spatial inconsistencies; others require manual relabelling, or produce a unified label space, which may be unsuitable when a fixed target label space is…

    Submitted 6 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  7. arXiv:2506.04116  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    A Diffusion-Driven Temporal Super-Resolution and Spatial Consistency Enhancement Framework for 4D MRI imaging

    Authors: Xuanru Zhou, Jiarun Liu, Shoujun Yu, Hao Yang, Cheng Li, Tao Tan, Shanshan Wang

    Abstract: In medical imaging, 4D MRI enables dynamic 3D visualization, yet the trade-off between spatial and temporal resolution requires prolonged scan time that can compromise temporal fidelity--especially during rapid, large-amplitude motion. Traditional approaches typically rely on registration-based interpolation to generate intermediate frames. However, these methods struggle with large deformations,…

    Submitted 8 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  8. arXiv:2506.03018  [pdf, ps, other]

    physics.atom-ph nucl-ex physics.optics quant-ph

    $^{229}$Th Nuclear Spectroscopy in an Opaque Material: Laser-Based Conversion Electron Mössbauer Spectroscopy of $^{229}$ThO$_2$

    Authors: Ricky Elwell, James E. S. Terhune, Christian Schneider, Harry W. T. Morgan, Hoang Bao Tran Tan, Udeshika C. Perera, Daniel A. Rehn, Marisa C. Alfonso, Lars von der Wense, Benedict Seiferle, Kevin Scharl, Peter G. Thirolf, Andrei Derevianko, Eric R. Hudson

    Abstract: Here, we report the first demonstration of laser-induced conversion electron Mössbauer spectroscopy of the $^{229}$Th nuclear isomeric state, which provides the ability to probe the nuclear transition in a material that is opaque to light resonant with the nuclear transition. Specifically, we excite the nuclear transition in a thin ThO$_2$ sample whose band gap ($\sim$ 6 eV) is considerably smalle…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: 18 pages, 7 figures

  9. arXiv:2505.20844  [pdf, ps, other]

    quant-ph

    Engineering continuous-variable entanglement in mechanical oscillators with optimal control

    Authors: Maverick J. Millican, Vassili G. Matsos, Christophe H. Valahu, Tomas Navickas, Liam J. Bond, Ting Rei Tan

    Abstract: We demonstrate an optimal quantum control strategy for the deterministic preparation of entangled harmonic oscillator states in trapped ions. The protocol employs dynamical phase modulation of laser-driven Jaynes-Cummings and anti-Jaynes-Cummings interactions. We prepare Two-Mode Squeezed Vacuum (TMSV) states in the mechanical motions of a trapped ion and characterize the states with phase-space t…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 10 pages, 5 figures

  10. arXiv:2505.18933  [pdf, ps, other]

    cs.AI cs.CL

    REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

    Authors: Haitian Zhong, Yuhuan Liu, Ziyang Xu, Guofan Liu, Qiang Liu, Shu Wu, Zhe Zhao, Liang Wang, Tieniu Tan

    Abstract: Large language model editing methods frequently suffer from overfitting, wherein factual updates can propagate beyond their intended scope, overemphasizing the edited target even when it's contextually inappropriate. To address this challenge, we introduce REACT (Representation Extraction And Controllable Tuning), a unified two-phase framework designed for precise and controllable knowledge editin…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 15 pages, 4 figures

  11. arXiv:2505.17061  [pdf, ps, other]

    cs.CL cs.AI cs.CV

    Mixture of Decoding: An Attention-Inspired Adaptive Decoding Strategy to Mitigate Hallucinations in Large Vision-Language Models

    Authors: Xinlong Chen, Yuanxing Zhang, Qiang Liu, Junfei Wu, Fuzheng Zhang, Tieniu Tan

    Abstract: Large Vision-Language Models (LVLMs) have exhibited impressive capabilities across various visual tasks, yet they remain hindered by the persistent challenge of hallucinations. To address this critical issue, we propose Mixture of Decoding (MoD), a novel approach for hallucination mitigation that dynamically adapts decoding strategies by evaluating the correctness of the model's attention on image…

    Submitted 10 June, 2025; v1 submitted 17 May, 2025; originally announced May 2025.

    Comments: Accepted to Findings of ACL 2025

  12. arXiv:2505.13503  [pdf]

    stat.ME q-bio.GN

    Ancestry-Adjusted Polygenic Risk Scores for Predicting Obesity Risk in the Indonesian Population

    Authors: Jocelyn Verna Siswanto, Belinda Mutiara, Felicia Austin, Jonathan Susanto, Cathelyn Theophila Tan, Restu Unggul Kresnadi, Kezia Irene

    Abstract: Obesity prevalence in Indonesian adults increased from 10.5% in 2007 to 23.4% in 2023. Studies showed that genetic predisposition significantly influences obesity susceptibility. To aid this, polygenic risk scores (PRS) help aggregate the effects of numerous genetic variants to assess genetic risk. However, 91% of genome-wide association studies (GWAS) involve European populations, limiting their…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 7 pages, 8 figures

    MSC Class: 62P10; 92B15

  13. arXiv:2505.12250  [pdf, ps, other]

    cs.CL cs.AI

    Not All Documents Are What You Need for Extracting Instruction Tuning Data

    Authors: Chi Zhang, Huaping Zhong, Hongtao Li, Chengliang Chai, Jiawei Hong, Yuhao Deng, Jiacheng Wang, Tian Tan, Yizhou Yan, Jiantao Qiu, Ye Yuan, Guoren Wang, Conghui He, Lei Cao

    Abstract: Instruction tuning improves the performance of large language models (LLMs), but it heavily relies on high-quality training data. Recently, LLMs have been used to synthesize instruction data using seed question-answer (QA) pairs. However, these synthesized instructions often lack diversity and tend to be similar to the input seeds, limiting their applicability in real-world scenarios. To address t…

    Submitted 18 May, 2025; originally announced May 2025.

  14. arXiv:2505.10981  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

    Authors: Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan

    Abstract: Recently, scaling test-time compute on Large Language Models (LLMs) has garnered wide attention. However, there has been limited investigation of how various reasoning prompting strategies perform as compute scales. In this paper, we focus on a standard and realistic scaling setting: majority voting. We systematically conduct experiments on 6 LLMs $\times$ 8 prompting strategies $\times$ 6 benchmarks. Exp…

    Submitted 4 June, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

    Comments: ACL 2025 Main, 33 pages, 51 figures

  15. arXiv:2505.09965  [pdf, ps, other]

    cs.CV

    MambaControl: Anatomy Graph-Enhanced Mamba ControlNet with Fourier Refinement for Diffusion-Based Disease Trajectory Prediction

    Authors: Hao Yang, Tao Tan, Shuai Tan, Weiqin Yang, Kunyan Cai, Calvin Chen, Yue Sun

    Abstract: Modelling disease progression in precision medicine requires capturing complex spatio-temporal dynamics while preserving anatomical integrity. Existing methods often struggle with longitudinal dependencies and structural consistency in progressive disorders. To address these limitations, we introduce MambaControl, a novel framework that integrates selective state-space modelling with diffusion pro…

    Submitted 15 May, 2025; originally announced May 2025.

  16. arXiv:2505.09493  [pdf, ps, other]

    astro-ph.CO

    DESI DR1 Lyα 1D power spectrum: The Fast Fourier Transform estimator measurement

    Authors: Corentin Ravoux, Marie-Lynn Abdul-Karim, Jean-Marc Le Goff, Eric Armengaud, Jessica N. Aguilar, Steven Ahlen, Stephen Bailey, Davide Bianchi, Allyson Brodzeller, David Brooks, Jonás Chaves-Montero, Todd Claybaugh, Andrei Cuceu, Roger de Belsunce, Axel de la Macorra, Arjun Dey, Zhejie Ding, Peter Doel, Simone Ferraro, Andreu Font-Ribera, Jaime E. Forero-Romero, Enrique Gaztañaga, Naim Göksel Karaçaylı, Satya Gontcho A Gontcho, Gaston Gutierrez, et al. (42 additional authors not shown)

    Abstract: We present the one-dimensional Lyman-alpha forest power spectrum measurement derived from the data release 1 (DR1) of the Dark Energy Spectroscopic Instrument (DESI). The measurement of the Lyman-alpha forest power spectrum along the line of sight from high-redshift quasar spectra provides information on the shape of the linear matter power spectrum, neutrino masses, and the properties of dark mat…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 40 pages, 16 figures

  17. arXiv:2505.09329  [pdf, ps, other]

    cs.CV cs.AI

    BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

    Authors: Jiarun Liu, Hong-Yu Zhou, Weijian Huang, Hao Yang, Dongning Song, Tao Tan, Yong Liang, Shanshan Wang

    Abstract: Scaling up model and data size has demonstrated impressive performance improvement over a wide range of tasks. Despite extensive studies on scaling behaviors for general-purpose tasks, medical images exhibit substantial differences from natural data. The key factors in developing medical vision foundation models at scale remain unclear due to the absence of an extensive understanding of scali…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 11 pages, 4 figures

  18. arXiv:2505.07974  [pdf, ps, other]

    astro-ph.CO

    DESI DR1 Ly$α$ 1D power spectrum: The optimal estimator measurement

    Authors: N. G. Karaçaylı, P. Martini, J. Aguilar, S. Ahlen, E. Armengaud, S. Bailey, A. Bault, D. Bianchi, A. Brodzeller, D. Brooks, J. Chaves-Montero, T. Claybaugh, A. Cuceu, A. de la Macorra, A. Dey, B. Dey, P. Doel, S. Ferraro, A. Font-Ribera, J. E. Forero-Romero, E. Gaztañaga, S. Gontcho A Gontcho, G. Gutierrez, J. Guy, C. Hahn, et al. (39 additional authors not shown)

    Abstract: The one-dimensional power spectrum $P_{\mathrm{1D}}$ of the Ly$α$ forest offers rich insights into cosmological and astrophysical parameters, including constraints on the sum of neutrino masses, warm dark matter models, and the thermal state of the intergalactic medium. We present the measurement of $P_{\mathrm{1D}}$ using the optimal quadratic maximum likelihood estimator applied to over 300,000 Ly…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: 40 pages, 15 figures

  19. arXiv:2505.07170  [pdf]

    cs.ET

    Empowering the Grid: Collaborative Edge Artificial Intelligence for Decentralized Energy Systems

    Authors: Eddie de Paula Jr, Niel Bunda, Hezerul Abdul Karim, Nouar AlDahoul, Myles Joshua Toledo Tan

    Abstract: This paper examines how decentralized energy systems can be enhanced using collaborative Edge Artificial Intelligence. Decentralized grids use local renewable sources to reduce transmission losses and improve energy security. Edge AI enables real-time, privacy-preserving data processing at the network edge. Techniques such as federated learning and distributed control improve demand response, equi…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: 16 pages, 1 table

  20. arXiv:2505.03621  [pdf, other]

    cs.CV

    PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing

    Authors: Yiping Xie, Bo Zhao, Mingtong Dai, Jian-Ping Zhou, Yue Sun, Tao Tan, Weicheng Xie, Linlin Shen, Zitong Yu

    Abstract: Remote photoplethysmography (rPPG) enables non-contact physiological measurement but remains highly susceptible to illumination changes, motion artifacts, and limited temporal modeling. Large Language Models (LLMs) excel at capturing long-range dependencies, offering a potential solution but struggle with the continuous, noise-sensitive nature of rPPG signals due to their text-centric design. To b…

    Submitted 6 May, 2025; originally announced May 2025.

  21. arXiv:2505.01596  [pdf, ps, other]

    astro-ph.IM astro-ph.CO

    Using Active Learning to Improve Quasar Identification for the DESI Spectra Processing Pipeline

    Authors: Dylan Green, David Kirkby, J. Aguilar, S. Ahlen, D. M. Alexander, E. Armengaud, S. Bailey, A. Bault, D. Bianchi, A. Brodzeller, D. Brooks, T. Claybaugh, R. de Belsunce, A. de la Macorra, P. Doel, V. A. Fawcett, S. Ferraro, A. Font-Ribera, J. E. Forero-Romero, E. Gaztañaga, S. Gontcho A Gontcho, G. Gutierrez, M. Ishak, S. Juneau, R. Kehoe, et al. (29 additional authors not shown)

    Abstract: The Dark Energy Spectroscopic Instrument (DESI) survey uses an automatic spectral classification pipeline to classify spectra. QuasarNET is a convolutional neural network used as part of this pipeline originally trained using data from the Baryon Oscillation Spectroscopic Survey (BOSS). In this paper we implement an active learning algorithm to optimally select spectra to use for training a new ve…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 23 pages, 9 figures. Prepared for submission to JCAP

  22. arXiv:2504.20491  [pdf, ps, other]

    cs.LO

    Separation and Definability in Fragments of Two-Variable First-Order Logic with Counting

    Authors: Louwe Kuijer, Tony Tan, Frank Wolter, Michael Zakharyaschev

    Abstract: For fragments L of first-order logic (FO) with counting quantifiers, we consider the definability problem, which asks whether a given L-formula can be equivalently expressed by a formula in some fragment of L without counting, and the more general separation problem asking whether two mutually exclusive L-formulas can be separated in some counting-free fragment of L. We show that separation is und…

    Submitted 30 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: The article has been accepted for LICS 2025

    MSC Class: 03B45 (Primary) 03C40 (Secondary)

  23. arXiv:2504.18178  [pdf, other]

    cs.CG

    Smallest Intersecting and Enclosing Balls

    Authors: Jiaqi Zheng, Tiow-Seng Tan

    Abstract: We study the smallest intersecting and enclosing ball problems in Euclidean spaces for input objects that are compact and convex. They link and unify many problems in computational geometry and machine learning. We show that both problems can be modeled as zero-sum games, and propose an approximation algorithm for the former. Specifically, the algorithm produces the first results in high-dimension…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: Computational Geometry: Young Researchers Forum (CG:YRF), 2025

  24. arXiv:2504.16036  [pdf]

    physics.med-ph eess.SP physics.app-ph

    Rotational ultrasound and photoacoustic tomography of the human body

    Authors: Yang Zhang, Shuai Na, Jonathan J. Russin, Karteekeya Sastry, Li Lin, Junfu Zheng, Yilin Luo, Xin Tong, Yujin An, Peng Hu, Konstantin Maslov, Tze-Woei Tan, Charles Y. Liu, Lihong V. Wang

    Abstract: Imaging the human body's morphological and angiographic information is essential for diagnosing, monitoring, and treating medical conditions. Ultrasonography performs the morphological assessment of the soft tissue based on acoustic impedance variations, whereas photoacoustic tomography (PAT) can visualize blood vessels based on intrinsic hemoglobin absorption. Three-dimensional (3D) panoramic ima…

    Submitted 22 April, 2025; originally announced April 2025.

  25. arXiv:2504.15037  [pdf, ps, other]

    cs.LG

    Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes

    Authors: Huanyu Zhang, Chengzu Li, Wenshan Wu, Shaoguang Mao, Yifan Zhang, Haochen Tian, Ivan Vulić, Zhang Zhang, Liang Wang, Tieniu Tan, Furu Wei

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks. However, recent studies have exposed critical limitations in their spatial reasoning capabilities. This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world, thereby limiting their broader applications. We argue that…

    Submitted 3 June, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

  26. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  27. arXiv:2504.11368  [pdf, other]

    cs.CV

    From Gaze to Insight: Bridging Human Visual Attention and Vision Language Model Explanation for Weakly-Supervised Medical Image Segmentation

    Authors: Jingkun Chen, Haoran Duan, Xiao Zhang, Boyan Gao, Tao Tan, Vicente Grau, Jungong Han

    Abstract: Medical image segmentation remains challenging due to the high cost of pixel-level annotations for training. In the context of weak supervision, clinician gaze data captures regions of diagnostic interest; however, its sparsity limits its use for segmentation. In contrast, vision-language models (VLMs) provide semantic context through textual descriptions but lack the explanation precision require…

    Submitted 15 April, 2025; originally announced April 2025.

    Comments: 10 pages, 5 figures

    MSC Class: 68T45; ACM Class: I.2.10; I.4.8

  28. arXiv:2504.07780  [pdf, other]

    cond-mat.str-el cond-mat.other

    Interference-caged quantum many-body scars: the Fock space topological localization and interference zeros

    Authors: Tao-Lin Tan, Yi-Ping Huang

    Abstract: We propose a general mechanism for realizing athermal finite-energy-density eigenstates -- termed interference-caged quantum many-body scars (ICQMBS) -- which originate from exact many-body destructive interference on the Fock space graph. These eigenstates are strictly localized to specific subsets of vertices, analogous to compact localized states in flat-band systems. Central to our framework i…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: 51 pages, 23 figures

  29. arXiv:2504.03798  [pdf, other]

    cs.CY cs.AI

    An Intelligent and Privacy-Preserving Digital Twin Model for Aging-in-Place

    Authors: Yongjie Wang, Jonathan Cyril Leung, Ming Chen, Zhiwei Zeng, Benny Toh Hsiang Tan, Yang Qiu, Zhiqi Shen

    Abstract: The population of older adults is steadily increasing, with a strong preference for aging-in-place rather than moving to care facilities. Consequently, supporting this growing demographic has become a significant global challenge. However, facilitating successful aging-in-place is challenging, requiring consideration of multiple factors such as data privacy, health status monitoring, and living en…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: accepted to IEEE TENSYMP 2025

    MSC Class: 68T05; ACM Class: I.2; J.3

  30. arXiv:2504.03641  [pdf, other]

    cs.CV

    MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

    Authors: Wulin Xie, Yi-Fan Zhang, Chaoyou Fu, Yang Shi, Bingyan Nie, Hongkai Chen, Zhang Zhang, Liang Wang, Tieniu Tan

    Abstract: Existing MLLM benchmarks face significant challenges in evaluating Unified MLLMs (U-MLLMs) due to: 1) lack of standardized benchmarks for traditional tasks, leading to inconsistent comparisons; 2) absence of benchmarks for mixed-modality generation, which fails to assess multimodal reasoning capabilities. We present a comprehensive evaluation framework designed to systematically assess U-MLLMs. Ou…

    Submitted 7 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: Project page: https://mme-unify.github.io/

  31. arXiv:2504.03193  [pdf, other]

    cs.CV

    Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation

    Authors: Xin Zhang, Robby T. Tan

    Abstract: Vision Foundation Models (VFMs) and Vision-Language Models (VLMs) have gained traction in Domain Generalized Semantic Segmentation (DGSS) due to their strong generalization capabilities. However, existing DGSS methods often rely exclusively on either VFMs or VLMs, overlooking their complementary strengths. VFMs (e.g., DINOv2) excel at capturing fine-grained features, while VLMs (e.g., CLIP) provid…

    Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR 2025 (Highlight)

  32. arXiv:2504.02842  [pdf, other]

    eess.SP cs.LG stat.AP stat.ME

    Enhanced ECG Arrhythmia Detection Accuracy by Optimizing Divergence-Based Data Fusion

    Authors: Baozhuo Su, Qingli Dou, Kang Liu, Zhengxian Qu, Jerry Deng, Ting Tan, Yanan Gu

    Abstract: AI computation in healthcare faces significant challenges when clinical datasets are limited and heterogeneous. Integrating datasets from multiple sources and different equipment is critical for effective AI computation but is complicated by their diversity, complexity, and lack of representativeness, so we often need to join multiple datasets for analysis. The currently used method is fusion aft…

    Submitted 19 March, 2025; originally announced April 2025.

    Comments: 13 pages, 8 figures, 6 tables

  33. arXiv:2504.02264  [pdf, other]

    cs.CV

    MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception

    Authors: Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu

    Abstract: Advanced driver assistance systems require a comprehensive understanding of the driver's mental/physical state and traffic context but existing works often neglect the potential benefits of joint learning between these tasks. This paper proposes MMTL-UniAD, a unified multi-modal multi-task learning framework that simultaneously recognizes driver behavior (e.g., looking around, talking), driver emo…

    Submitted 3 April, 2025; originally announced April 2025.

  34. arXiv:2503.21124  [pdf, other]

    cs.CV

    AdaMHF: Adaptive Multimodal Hierarchical Fusion for Survival Prediction

    Authors: Shuaiyu Zhang, Xun Lin, Rongxiang Zhang, Yu Bai, Yong Xu, Tao Tan, Xunbin Zheng, Zitong Yu

    Abstract: The integration of pathologic images and genomic data for survival analysis has gained increasing attention with advances in multimodal learning. However, current methods often ignore biological characteristics, such as heterogeneity and sparsity, both within and across modalities, ultimately limiting their adaptability to clinical practice. To address these challenges, we propose AdaMHF: Adaptive…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: Accepted by ICME 2025

  35. arXiv:2503.20211  [pdf, other]

    cs.CV cs.RO

    Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors

    Authors: Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan

    Abstract: Self-supervised depth estimation from monocular cameras in diverse outdoor conditions, such as daytime, rain, and nighttime, is challenging due to the difficulty of learning universal representations and the severe lack of labeled real-world adverse data. Previous methods either rely on synthetic inputs and pseudo-depth labels or directly apply daytime strategies to adverse conditions, resulting i…

    Submitted 26 March, 2025; originally announced March 2025.

  36. TOI-2005b: An Eccentric Warm Jupiter in Spin-Orbit Alignment

    Authors: Allyson Bieryla, Jiayin Dong, George Zhou, Jason D. Eastman, L. C. Mayorga, David W. Latham, Brad Carter, Chelsea X. Huang, Samuel N. Quinn, Karen A. Collins, Lyu Abe, Yuri Beletsky, Rafael Brahm, Nicole D. Colón, Zahra Ensak, Tristan Guillot, Thomas Henning, Melissa J. Hobson, Keith Horne, Jon M. Jenkins, Matías I. Jones, Andrés Jordán, David Osip, George R. Ricker, Joseph E. Rodriguez, et al. (14 additional authors not shown)

    Abstract: We report the discovery and characterization of TOI-2005b, a warm Jupiter on an eccentric (e~0.59), 17.3-day orbit around a V_mag = 9.867 rapidly rotating F-star. The object was detected as a candidate by TESS and the planetary nature of TOI-2005b was then confirmed via a series of ground-based photometric, spectroscopic, and diffraction-limited imaging observations. The planet was found to reside…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 15 pages, 10 figures, 4 tables

  37. arXiv:2503.18853  [pdf, other]

    cs.CV

    3DSwapping: Texture Swapping For 3D Object From Single Reference Image

    Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

    Abstract: 3D texture swapping allows for the customization of 3D object textures, enabling efficient and versatile visual transformations in 3D editing. While no dedicated method exists, adapted 2D editing and text-driven 3D editing approaches can serve this purpose. However, 2D editing requires frame-by-frame manipulation, causing inconsistencies across views, while text-driven 3D editing struggles to pres…

    Submitted 24 March, 2025; originally announced March 2025.

  38. arXiv:2503.14906  [pdf, other]

    eess.IV cs.CV

    FetalFlex: Anatomy-Guided Diffusion Model for Flexible Control on Fetal Ultrasound Image Synthesis

    Authors: Yaofei Duan, Tao Tan, Zhiyuan Zhu, Yuhao Huang, Yuanji Zhang, Rui Gao, Patrick Cheong-Iao Pang, Xinru Gao, Guowei Tao, Xiang Cong, Zhou Li, Lianying Liang, Guangzhi He, Linliang Yin, Xuedong Deng, Xin Yang, Dong Ni

    Abstract: Fetal ultrasound (US) examinations require the acquisition of multiple planes, each providing unique diagnostic information to evaluate fetal development and screen for congenital anomalies. However, obtaining a comprehensive, multi-plane annotated fetal US dataset remains challenging, particularly for rare or complex anomalies owing to their low incidence and numerous subtypes. This poses diff…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 18 pages, 10 figures

  39. arXiv:2503.14745  [pdf, other

    astro-ph.CO

    Data Release 1 of the Dark Energy Spectroscopic Instrument

    Authors: DESI Collaboration, M. Abdul-Karim, A. G. Adame, D. Aguado, J. Aguilar, S. Ahlen, S. Alam, G. Aldering, D. M. Alexander, R. Alfarsy, L. Allen, C. Allende Prieto, O. Alves, A. Anand, U. Andrade, E. Armengaud, S. Avila, A. Aviles, H. Awan, S. Bailey, A. Baleato Lizancos, O. Ballester, A. Bault, J. Bautista, S. BenZvi , et al. (253 additional authors not shown)

    Abstract: In 2021 May the Dark Energy Spectroscopic Instrument (DESI) collaboration began a 5-year spectroscopic redshift survey to produce a detailed map of the evolving three-dimensional structure of the universe between $z=0$ and $z\approx4$. DESI's principal scientific objectives are to place precise constraints on the equation of state of dark energy, the gravitationally driven growth of large-scale st…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 62 pages, 7 figures, 15 tables, submitted to The Astronomical Journal

  40. arXiv:2503.14744  [pdf, other

    astro-ph.CO

    Constraints on Neutrino Physics from DESI DR2 BAO and DR1 Full Shape

    Authors: W. Elbers, A. Aviles, H. E. Noriega, D. Chebat, A. Menegas, C. S. Frenk, C. Garcia-Quintero, D. Gonzalez, M. Ishak, O. Lahav, K. Naidoo, G. Niz, C. Yèche, M. Abdul-Karim, S. Ahlen, O. Alves, U. Andrade, E. Armengaud, S. BenZvi, D. Bianchi, S. Brieden, A. Brodzeller, D. Brooks, E. Burtin, R. Calderon , et al. (94 additional authors not shown)

    Abstract: The Dark Energy Spectroscopic Instrument (DESI) Collaboration has obtained robust measurements of baryon acoustic oscillations (BAO) in the redshift range, $0.1 < z < 4.2$, based on the Lyman-$α$ forest and galaxies from Data Release 2 (DR2). We combine these measurements with external cosmic microwave background (CMB) data from Planck and ACT to place our tightest constraints yet on the sum of ne…

    Submitted 3 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: 34 pages, 17 figures. This DESI Collaboration Publication is part of the Data Release 2 publication series (see https://data.desi.lbl.gov/doc/papers/)

  41. arXiv:2503.14743  [pdf, other

    astro-ph.CO

    Extended Dark Energy analysis using DESI DR2 BAO measurements

    Authors: K. Lodha, R. Calderon, W. L. Matthewson, A. Shafieloo, M. Ishak, J. Pan, C. Garcia-Quintero, D. Huterer, G. Valogiannis, L. A. Ureña-López, N. V. Kamble, D. Parkinson, A. G. Kim, G. B. Zhao, J. L. Cervantes-Cota, J. Rohlf, F. Lozano-Rodríguez, J. O. Román-Herrera, M. Abdul-Karim, J. Aguilar, S. Ahlen, O. Alves, U. Andrade, E. Armengaud, A. Aviles , et al. (100 additional authors not shown)

    Abstract: We conduct an extended analysis of dark energy constraints, in support of the findings of the DESI DR2 cosmology key paper, including DESI data, Planck CMB observations, and three different supernova compilations. Using a broad range of parametric and non-parametric methods, we explore the dark energy phenomenology and find consistent trends across all approaches, in good agreement with the…

    Submitted 3 April, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: 27 pages, 18 figures. This DESI Collaboration Publication is part of the Data Release 2 publication series (see https://data.desi.lbl.gov/doc/papers )

  42. arXiv:2503.14742  [pdf, other

    astro-ph.CO

    Validation of the DESI DR2 Measurements of Baryon Acoustic Oscillations from Galaxies and Quasars

    Authors: U. Andrade, E. Paillas, J. Mena-Fernández, Q. Li, A. J. Ross, S. Nadathur, M. Rashkovetskyi, A. Pérez-Fernández, H. Seo, N. Sanders, O. Alves, X. Chen, N. Deiosso, A. de Mattia, M. White, M. Abdul-Karim, S. Ahlen, E. Armengaud, A. Aviles, D. Bianchi, S. Brieden, A. Brodzeller, D. Brooks, E. Burtin, R. Calderon , et al. (94 additional authors not shown)

    Abstract: The Dark Energy Spectroscopic Instrument (DESI) data release 2 (DR2) galaxy and quasar clustering data represents a significant expansion of data from DR1, providing improved statistical precision in BAO constraints across multiple tracers, including bright galaxies (BGS), luminous red galaxies (LRGs), emission line galaxies (ELGs), and quasars (QSOs). In this paper, we validate the BAO analysis o…

    Submitted 27 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: This DESI Collaboration Publication is part of the Data Release 2 publication series (see https://data.desi.lbl.gov/doc/papers )

  43. arXiv:2503.14741  [pdf, other

    astro-ph.IM astro-ph.CO

    Validation of the DESI DR2 Ly$α$ BAO analysis using synthetic datasets

    Authors: L. Casas, H. K. Herrera-Alcantar, J. Chaves-Montero, A. Cuceu, A. Font-Ribera, M. Lokken, M. Abdul-Karim, C. Ramírez-Pérez, J. Aguilar, S. Ahlen, U. Andrade, E. Armengaud, A. Aviles, S. Bailey, S. BenZvi, D. Bianchi, A. Brodzeller, D. Brooks, R. Canning, A. Carnero Rosell, M. Charles, E. Chaussidon, T. Claybaugh, K. S. Dawson, A. de la Macorra , et al. (73 additional authors not shown)

    Abstract: The second data release (DR2) of the Dark Energy Spectroscopic Instrument (DESI), containing data from the first three years of observations, doubles the number of Lyman-$α$ (Ly$α$) forest spectra in DR1 and it provides the largest dataset of its kind. To ensure a robust validation of the Baryonic Acoustic Oscillation (BAO) analysis using Ly$α$ forests, we have made significant updates compared to…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: This DESI Collaboration Publication is part of the Data Release 2 publication series (see https://data.desi.lbl.gov/doc/papers)

  44. arXiv:2503.14740  [pdf, ps, other

    astro-ph.CO astro-ph.GA

    Construction of the Damped Ly$α$ Absorber Catalog for DESI DR2 Ly$α$ BAO

    Authors: A. Brodzeller, M. Wolfson, D. M. Santos, M. Ho, T. Tan, M. M. Pieri, A. Cuceu, M. Abdul-Karim, J. Aguilar, S. Ahlen, A. Anand, U. Andrade, E. Armengaud, A. Aviles, S. Bailey, A. Bault, D. Bianchi, D. Brooks, R. Canning, L. Casas, M. Charles, E. Chaussidon, J. Chaves-Montero, D. Chebat, T. Claybaugh , et al. (74 additional authors not shown)

    Abstract: We present the Damped Ly$α$ Toolkit for automated detection and characterization of Damped Ly$α$ absorbers (DLA) in quasar spectra. Our method uses quasar spectral templates with and without absorption from intervening DLAs to reconstruct observed quasar forest regions. The best-fitting model determines whether a DLA is present while estimating the redshift and \texttt{HI} column density. With an…

    Submitted 9 June, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: This DESI Collaboration Publication is part of the Data Release 2 publication series, see https://data.desi.lbl.gov/doc/papers/

  45. arXiv:2503.14739  [pdf, other

    astro-ph.CO

    DESI DR2 Results I: Baryon Acoustic Oscillations from the Lyman Alpha Forest

    Authors: DESI Collaboration, M. Abdul-Karim, J. Aguilar, S. Ahlen, C. Allende Prieto, O. Alves, A. Anand, U. Andrade, E. Armengaud, A. Aviles, S. Bailey, A. Bault, S. BenZvi, D. Bianchi, C. Blake, A. Brodzeller, D. Brooks, E. Buckley-Geer, E. Burtin, R. Calderon, R. Canning, A. Carnero Rosell, P. Carrilho, L. Casas, F. J. Castander , et al. (124 additional authors not shown)

    Abstract: We present the Baryon Acoustic Oscillation (BAO) measurements with the Lyman-alpha (LyA) forest from the second data release (DR2) of the Dark Energy Spectroscopic Instrument (DESI) survey. Our BAO measurements include both the auto-correlation of the LyA forest absorption observed in the spectra of high-redshift quasars and the cross-correlation of the absorption with the quasar positions. The to…

    Submitted 26 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Submitted to PRD. Updated authors and references. 28 pages and 13 figures. This DESI Collaboration Publication is part of the Data Release 2 publication series (see https://data.desi.lbl.gov/doc/papers )

  46. arXiv:2503.14738  [pdf, other

    astro-ph.CO

    DESI DR2 Results II: Measurements of Baryon Acoustic Oscillations and Cosmological Constraints

    Authors: DESI Collaboration, M. Abdul-Karim, J. Aguilar, S. Ahlen, S. Alam, L. Allen, C. Allende Prieto, O. Alves, A. Anand, U. Andrade, E. Armengaud, A. Aviles, S. Bailey, C. Baltay, P. Bansal, A. Bault, J. Behera, S. BenZvi, D. Bianchi, C. Blake, S. Brieden, A. Brodzeller, D. Brooks, E. Buckley-Geer, E. Burtin , et al. (162 additional authors not shown)

    Abstract: We present baryon acoustic oscillation (BAO) measurements from more than 14 million galaxies and quasars drawn from the Dark Energy Spectroscopic Instrument (DESI) Data Release 2 (DR2), based on three years of operation. For cosmology inference, these galaxy measurements are combined with DESI Lyman-$α$ forest BAO results presented in a companion paper. The DR2 BAO results are consistent with DESI…

    Submitted 26 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: 40 pages, 18 figures. This DESI Collaboration Publication is part of the Data Release 2 publication series (see https://data.desi.lbl.gov/doc/papers )

  47. arXiv:2503.14504  [pdf, ps, other

    cs.CV

    Aligning Multimodal LLM with Human Preference: A Survey

    Authors: Tao Yu, Yi-Fan Zhang, Chaoyou Fu, Junkang Wu, Jinda Lu, Kun Wang, Xingyu Lu, Yunhang Shen, Guibin Zhang, Dingjie Song, Yibo Yan, Tianlong Xu, Qingsong Wen, Zhang Zhang, Yan Huang, Liang Wang, Tieniu Tan

    Abstract: Large language models (LLMs) can handle a wide variety of general tasks with simple prompts, without the need for task-specific training. Multimodal Large Language Models (MLLMs), built upon LLMs, have demonstrated impressive potential in tackling complex tasks involving visual, auditory, and textual data. However, critical issues related to truthfulness, safety, o1-like reasoning, and alignment w…

    Submitted 23 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

    Comments: Project page: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Alignment

  48. arXiv:2503.11374  [pdf, other

    cond-mat.mtrl-sci nucl-th physics.atom-ph physics.optics

    A spinless crystal for a high-performance solid-state $^{229}$Th nuclear clock

    Authors: Harry W. T. Morgan, James E. S. Terhune, Ricky Elwell, Hoang Bao Tran Tan, Udeshika C. Perera, Andrei Derevianko, Eric R. Hudson, Anastassia N. Alexandrova

    Abstract: Solid-state $^{229}$Th nuclear clocks require a host material whose band gap is larger than the 8.4 eV nuclear transition energy. As such, excitation of the $^{229}$Th nuclear state has so far only been demonstrated in metal fluorides, specifically CaF$_2$, LiSrAlF$_6$, and ThF$_4$, where the large electronegativity of the halogen leads to sufficient band gaps. However, it is expected that the nuc…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages of main text and references, 3 pages of supporting information. 2 figures

  49. arXiv:2503.10692  [pdf, other

    cs.CV cs.RO

    Exploring the best way for UAV visual localization under Low-altitude Multi-view Observation Condition: a Benchmark

    Authors: Yibin Ye, Xichao Teng, Shuo Chen, Zhang Li, Leqi Liu, Qifeng Yu, Tao Tan

    Abstract: Absolute Visual Localization (AVL) enables Unmanned Aerial Vehicle (UAV) to determine its position in GNSS-denied environments by establishing geometric relationships between UAV images and geo-tagged reference maps. While many previous works have achieved AVL with image retrieval and matching techniques, research in low-altitude multi-view scenarios still remains limited. Low-altitude Multi-view…

    Submitted 11 March, 2025; originally announced March 2025.

  50. arXiv:2503.10112  [pdf, other

    cs.CV

    MoEdit: On Learning Quantity Perception for Multi-object Image Editing

    Authors: Yanfeng Li, Kahou Chan, Yue Sun, Chantong Lam, Tong Tong, Zitong Yu, Keren Fu, Xiaohong Liu, Tao Tan

    Abstract: Multi-object images are prevalent in various real-world scenarios, including augmented reality, advertisement design, and medical imaging. Efficient and precise editing of these images is critical for these applications. With the advent of Stable Diffusion (SD), high-quality image generation and editing have entered a new era. However, existing methods often struggle to consider each object both i…

    Submitted 13 March, 2025; originally announced March 2025.