
Showing 51–100 of 1,045 results for author: Wu, G

  1. arXiv:2503.11571  [pdf, other]

    cs.CV cs.AI

    RASA: Replace Anyone, Say Anything -- A Training-Free Framework for Audio-Driven and Universal Portrait Video Editing

    Authors: Tianrui Pan, Lin Liu, Jie Liu, Xiaopeng Zhang, Jie Tang, Gangshan Wu, Qi Tian

    Abstract: Portrait video editing focuses on modifying specific attributes of portrait videos, guided by audio or video streams. Previous methods typically either concentrate on lip-region reenactment or require training specialized models to extract keypoints for motion transfer to a new identity. In this paper, we introduce a training-free universal portrait video editing framework that provides a versatil…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: Demo is available at https://alice01010101.github.io/RASA/

  2. arXiv:2503.11383  [pdf, other]

    hep-ex

    Study of $φ\to K\bar{K}$ and $K_{S}^{0}-K_{L}^{0}$ asymmetry in the amplitude analysis of $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$ decay

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (701 additional authors not shown)

    Abstract: Using $e^+e^-$ annihilation data corresponding to a total integrated luminosity of 7.33 $\rm fb^{-1}$ collected at center-of-mass energies between 4.128 and 4.226~GeV with the BESIII detector, we provide the first amplitude analysis and absolute branching fraction measurement of the hadronic decay $D_{s}^{+} \to K_{S}^{0}K_{L}^{0}π^{+}$. The branching fraction of…

    Submitted 23 March, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: 11 pages, 4 figures

  3. arXiv:2503.08010  [pdf, other]

    cs.CV cs.AI

    SKALD: Learning-Based Shot Assembly for Coherent Multi-Shot Video Creation

    Authors: Chen Yi Lu, Md Mehrab Tanjim, Ishita Dasgupta, Somdeb Sarkhel, Gang Wu, Saayan Mitra, Somali Chaterji

    Abstract: We present SKALD, a multi-shot video assembly method that constructs coherent video sequences from candidate shots with minimal reliance on text. Central to our approach is the Learned Clip Assembly (LCA) score, a learning-based metric that measures temporal and semantic relationships between shots to quantify narrative coherence. We tackle the exponential complexity of combining multiple shots wi…

    Submitted 10 March, 2025; originally announced March 2025.

  4. arXiv:2503.07703  [pdf, other]

    cs.CV

    Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model

    Authors: Lixue Gong, Xiaoxia Hou, Fanshi Li, Liang Li, Xiaochen Lian, Fei Liu, Liyang Liu, Wei Liu, Wei Lu, Yichun Shi, Shiqi Sun, Yu Tian, Zhi Tian, Peng Wang, Xun Wang, Ye Wang, Guofeng Wu, Jie Wu, Xin Xia, Xuefeng Xiao, Linjie Yang, Zhonghua Zhai, Xinyu Zhang, Qi Zhang, Yuwei Zhang, et al. (3 additional authors not shown)

    Abstract: Rapid advancement of diffusion models has catalyzed remarkable progress in the field of image generation. However, prevalent models such as Flux, SD3.5, and Midjourney still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances. To address these limitations, we present Seedream 2.0, a native Chinese-English bilingual im…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Official Page: https://team.doubao.com/tech/seedream

  5. arXiv:2503.07597  [pdf, other]

    cs.CV

    HumanMM: Global Human Motion Recovery from Multi-shot Videos

    Authors: Yuhong Zhang, Guanlin Wu, Ling-Hao Chen, Zhuokai Zhao, Jing Lin, Xiaoke Jiang, Jiamin Wu, Zhuoheng Li, Hao Frank Yang, Haoqian Wang, Lei Zhang

    Abstract: In this paper, we present a novel framework designed to reconstruct long-sequence 3D human motion in world coordinates from in-the-wild videos with multiple shot transitions. Such long-sequence in-the-wild motions are highly valuable to applications such as motion generation and motion understanding, but are very challenging to recover due to abrupt shot transitions, partial occlusions,…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: CVPR 2025; Project page: https://zhangyuhong01.github.io/HumanMM/

  6. arXiv:2503.06896  [pdf, other]

    cs.CV

    CATANet: Efficient Content-Aware Token Aggregation for Lightweight Image Super-Resolution

    Authors: Xin Liu, Jie Liu, Jie Tang, Gangshan Wu

    Abstract: Transformer-based methods have demonstrated impressive performance in low-level visual tasks such as Image Super-Resolution (SR). However, their computational complexity grows quadratically with the spatial resolution. A series of works attempt to alleviate this problem by dividing Low-Resolution images into local windows, axial stripes, or dilated windows. SR typically leverages the redundancy of i…

    Submitted 10 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR2025

  7. arXiv:2503.06477  [pdf, other]

    cs.CV cs.AI

    PDB: Not All Drivers Are the Same -- A Personalized Dataset for Understanding Driving Behavior

    Authors: Chuheng Wei, Ziye Qin, Siyan Li, Ziyan Zhang, Xuanpeng Zhao, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Matthew J. Barth, Guoyuan Wu

    Abstract: Driving behavior is inherently personal, influenced by individual habits, decision-making styles, and physiological states. However, most existing datasets treat all drivers as homogeneous, overlooking driver-specific variability. To address this gap, we introduce the Personalized Driving Behavior (PDB) dataset, a multi-modal dataset designed to capture personalization in driving behavior under na…

    Submitted 9 March, 2025; originally announced March 2025.

  8. arXiv:2503.05382  [pdf, other]

    hep-ex

    Measurement of the branching fractions of $D^+ \to K^+K^-π^+π^+π^-$, $φπ^+π^+π^-$, $K^0_SK^+π^+π^-π^0$, $K^0_SK^+η$, and $K^0_SK^+ω$ decays

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (693 additional authors not shown)

    Abstract: Using $20.3~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data collected at a center-of-mass energy of 3.773 GeV with the BESIII detector operating at the BEPCII collider, the branching fractions of three hadronic charm meson decays, $D^+\to φπ^+π^+π^-$, $D^+\to K^0_SK^+π^+π^-π^0$, and $D^+\to K^0_SK^+ω$, are measured for the first time to be $(0.54\pm0.19\pm0.02)\times 10^{-4}$,…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 11 pages, 3 figures

    Report number: BAM-00841

  9. arXiv:2503.04344  [pdf, other]

    cs.CV

    LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

    Authors: Shen Zhang, Yaning Tan, Siyuan Liang, Zhaowei Chen, Linze Li, Ge Wu, Yuhao Chen, Shuheng Li, Zhenyu Zhao, Caihua Chen, Jiajun Liang, Yao Tang

    Abstract: Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions. The primary obstacle is that explicit positional encodings (PE), such as RoPE, need extrapolation, which degrades performance when the inference resolution differs from training. In this paper, we propose a Length-Extrapolatable Diffusion Transformer (LEDiT), a simple yet powerful archi…

    Submitted 7 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

  10. OCL: Ordinal Contrastive Learning for Imputating Features with Progressive Labels

    Authors: Seunghun Baek, Jaeyoon Sim, Guorong Wu, Won Hwa Kim

    Abstract: Accurately discriminating progressive stages of Alzheimer's Disease (AD) is crucial for early diagnosis and prevention. It often involves multiple imaging modalities to understand the complex pathology of AD; however, acquiring a complete set of images is challenging due to the high cost and burden for subjects. In the end, missing data become inevitable, which leads to limited sample size and decrease…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: MICCAI 2024 (Provisional Accept)

  11. Modality-Agnostic Style Transfer for Holistic Feature Imputation

    Authors: Seunghun Baek, Jaeyoon Sim, Mustafa Dere, Minjeong Kim, Guorong Wu, Won Hwa Kim

    Abstract: Characterizing a preclinical stage of Alzheimer's Disease (AD) via single imaging is difficult as its early symptoms are quite subtle. Therefore, many neuroimaging studies are curated with various imaging modalities, e.g., MRI and PET; however, it is often challenging to acquire all of them from all subjects, and missing data become inevitable. In this regard, we propose a framework…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: ISBI 2024 (oral)

  12. arXiv:2503.02196  [pdf, ps, other]

    hep-ex

    First Measurement of the Decay Dynamics in the Semileptonic Transition of the $D^{+(0)}$ into the Axial-vector Meson $\bar K_1(1270)$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (680 additional authors not shown)

    Abstract: Using $e^+e^-$ collision data taken at the center-of-mass energy of 3.773 GeV with the BESIII detector, corresponding to an integrated luminosity of 20.3 fb$^{-1}$, we report the first amplitude and angular analyses of the semileptonic decays $D^{+(0)}\to K^-π^+π^{0(-)} e^+ν_e$. From the amplitude analysis, we determine for the first time the hadronic form factors of the semileptonic $D$ decays in…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 15 pages, 6 figures, submitted to PRL

  13. arXiv:2503.01754  [pdf, other]

    cs.CV

    SDRT: Enhance Vision-Language Models by Self-Distillation with Diverse Reasoning Traces

    Authors: Guande Wu, Huan Song, Yawei Wang, Qiaojing Yan, Yijun Tian, Lin Lee Cheong, Panpan Xu

    Abstract: Reasoning is increasingly crucial for various tasks. While chain-of-thought prompting enables large language models to leverage reasoning effectively, harnessing the reasoning capabilities of Vision-Language Models (VLMs) remains challenging. To solve this problem, we propose a novel self-distillation framework that enhances the reasoning capabilities of the model. The proposed framework introduce…

    Submitted 19 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  14. arXiv:2503.01565  [pdf, other]

    cs.CV eess.IV

    AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning

    Authors: Yuheng Xu, Shijie Yang, Xin Liu, Jie Liu, Jie Tang, Gangshan Wu

    Abstract: In recent years, the increasing popularity of Hi-DPI screens has driven a rising demand for high-resolution images. However, the limited computational power of edge devices poses a challenge in deploying complex super-resolution neural networks, highlighting the need for efficient methods. While prior works have made significant progress, they have not fully exploited pixel-level information. More…

    Submitted 7 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR2025

  15. Learning Covariance-Based Multi-Scale Representation of Neuroimaging Measures for Alzheimer Classification

    Authors: Seunghun Baek, Injun Choi, Mustafa Dere, Minjeong Kim, Guorong Wu, Won Hwa Kim

    Abstract: Stacking excessive layers in a DNN results in a highly underdetermined system when training samples are limited, which is very common in medical applications. In this regard, we present a framework capable of deriving an efficient high-dimensional space with a reasonable increase in model size. This is done by utilizing a transform (i.e., convolution) that leverages scale-space theory with covariance st…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: ISBI 2023

  16. arXiv:2503.01210  [pdf, other]

    cs.CV

    Every SAM Drop Counts: Embracing Semantic Priors for Multi-Modality Image Fusion and Beyond

    Authors: Guanyao Wu, Haoyu Liu, Hongming Fu, Yichuan Peng, Jinyuan Liu, Xin Fan, Risheng Liu

    Abstract: Multi-modality image fusion, particularly infrared and visible, plays a crucial role in integrating diverse modalities to enhance scene understanding. Although early research prioritized visual quality, preserving fine details and adapting to downstream tasks remains challenging. Recent approaches attempt task-specific design but rarely achieve "The Best of Both Worlds" due to inconsistent optimiz…

    Submitted 25 March, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

  17. arXiv:2503.01143  [pdf, other]

    cs.LG

    DPR: Diffusion Preference-based Reward for Offline Reinforcement Learning

    Authors: Teng Pang, Bingzheng Wang, Guoqiang Wu, Yilong Yin

    Abstract: Offline preference-based reinforcement learning (PbRL) mitigates the need for reward definition, aligning with human preferences via preference-driven reward feedback without interacting with the environment. However, the effectiveness of preference-driven reward functions depends on the modeling ability of the learning model, which current MLP-based and Transformer-based methods may fail to adequ…

    Submitted 13 May, 2025; v1 submitted 2 March, 2025; originally announced March 2025.

  18. arXiv:2503.00510  [pdf, other]

    eess.IV cs.CV

    NeuroSymAD: A Neuro-Symbolic Framework for Interpretable Alzheimer's Disease Diagnosis

    Authors: Yexiao He, Ziyao Wang, Yuning Zhang, Tingting Dan, Tianlong Chen, Guorong Wu, Ang Li

    Abstract: Alzheimer's disease (AD) diagnosis is complex, requiring the integration of imaging and clinical data for accurate assessment. While deep learning has shown promise in brain MRI analysis, it often functions as a black box, limiting interpretability and lacking mechanisms to effectively integrate critical clinical data such as biomarkers, medical history, and demographic information. To bridge this…

    Submitted 1 March, 2025; originally announced March 2025.

  19. arXiv:2502.20821  [pdf, other]

    hep-ex

    Improved measurement of absolute branching fraction of the inclusive decay $Λ_{c}^{+} \to K_{S}^{0} X$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (679 additional authors not shown)

    Abstract: By analyzing $4.5$ fb$^{-1}$ of $e^{+}e^{-}$ collision data accumulated with the BESIII detector at center-of-mass energies ranging from $4599.53$ MeV to $4698.82$ MeV, we report the measurement of the absolute branching fraction (BF) of the inclusive decay $Λ_{c}^{+} \to K_{S}^{0} X$ using the double-tag technique. The result is $\mathcal{B}(Λ_{c}^{+} \to K_{S}^{0} X)=(10.9\pm0.2\pm0.1)\%$, where…

    Submitted 28 February, 2025; originally announced February 2025.

  20. arXiv:2502.19850  [pdf, other]

    hep-ex

    Precision measurement of the branching fraction for the decay $ψ(2S)\rightarrowτ^{+}τ^{-}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (691 additional authors not shown)

    Abstract: Using $(2259.3 \pm 11.1)\times10^{6}$ $ψ(2S)$ events acquired with the BESIII detector, the branching fraction of $ψ(2S)\rightarrowτ^{+}τ^{-}$ is measured with improved precision to be $\mathcal{B}_{ψ(2S)\rightarrowτ^{+}τ^{-}}=(3.240~\pm~0.023~\pm~0.081)\times 10^{-3}$, where the first and second uncertainties are statistical and systematic, respectively, which is consistent with the world average…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: 10 pages, 5 figures

  21. arXiv:2502.18167  [pdf, ps, other]

    cs.LG stat.ML

    Sharper Concentration Inequalities for Multi-Graph Dependent Variables

    Authors: Xiao Shao, Guoqiang Wu

    Abstract: In multi-task learning (MTL) with each task involving graph-dependent data, generalization results of existing theoretical analyses yield a sub-optimal risk bound of $O(\frac{1}{\sqrt{n}})$, where $n$ is the number of training samples. This is attributed to the lack of a foundational sharper concentration inequality for multi-graph dependent random variables. To fill this gap, this paper proposes a…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 34 pages

  22. arXiv:2502.16084  [pdf, other]

    hep-ex

    Single Inclusive $π^\pm$ and $K^\pm$ Production in $e^+e^-$ Annihilation at center-of-mass Energies from 2.000 to 3.671GeV

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (707 additional authors not shown)

    Abstract: Using data samples with a total integrated luminosity of 253 $\rm pb^{-1}$ collected by the BESIII detector operating at the BEPCII collider, the differential cross-sections of inclusive $π^\pm$ and $K^\pm$ production, as a function of momentum and normalized by the total hadronic cross-section, are measured at center-of-mass energies from 2.000 to 3.671 GeV. The measured $π^{\pm}$ cross sections…

    Submitted 22 February, 2025; originally announced February 2025.

  23. arXiv:2502.14583  [pdf, other]

    cs.LG cs.AI

    A Theory for Conditional Generative Modeling on Multiple Data Sources

    Authors: Rongzhen Wang, Yan Zhang, Chenyu Zheng, Chongxuan Li, Guoqiang Wu

    Abstract: The success of large generative models has driven a paradigm shift, leveraging massive multi-source data to enhance model capabilities. However, the interaction among these sources remains theoretically underexplored. This paper takes the first step toward a rigorous analysis of multi-source training in conditional generative modeling, where each condition represents a distinct data source. Specif…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 35 pages

  24. arXiv:2502.13540  [pdf, other]

    hep-ex

    Amplitude analysis of $ψ(3686)\to γK_S^0 K_S^0 $

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (704 additional authors not shown)

    Abstract: Using $(2712\pm14)\times10^6$ $ψ(3686)$ events collected with the BESIII detector, we perform the first amplitude analysis of the radiative decay $ψ(3686)\to γK_S^0 K_S^0$ within the mass region $M_{K_S^0 K_S^0 }<2.8$ GeV/$c^2$. Employing a one-channel K-matrix approach for the description of the dynamics of the $K^0_S K^0_S$ system, the data sample is well described with four poles for the $f_0$-…

    Submitted 7 May, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: 20 pages, 4 figures, submitted to JHEP

  25. arXiv:2502.12202  [pdf, other]

    cs.CL cs.AI cs.LG

    BoT: Breaking Long Thought Processes of o1-like Large Language Models through Backdoor Attack

    Authors: Zihao Zhu, Hongbao Zhang, Mingda Zhang, Ruotong Wang, Guanzong Wu, Ke Xu, Baoyuan Wu

    Abstract: Longer thought, better performance: large language models with deep reasoning capabilities, particularly o1-like models, have demonstrated remarkable performance by generating extensive thought processes during inference. This trade-off reveals a potential vulnerability: adversaries could compromise model performance by forcing immediate responses without thought processes. To this end, in this pa…

    Submitted 16 February, 2025; originally announced February 2025.

  26. arXiv:2502.11047  [pdf, ps, other]

    hep-ex

    Search for the Cabibbo-suppressed decays $Λ_c^{+}\toΣ^0K^{+}π^{0}$ and $Λ_c^{+}\toΣ^0K^{+}π^{+}π^{-}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (687 additional authors not shown)

    Abstract: Utilizing 4.5 $\mathrm{fb}^{-1}$ of $e^+e^-$ annihilation data collected at center-of-mass energies ranging from 4599.53 MeV to 4698.82 MeV by the BESIII detector at the BEPCII collider, we search for the singly Cabibbo-suppressed hadronic decays $Λ_{c}^{+}\toΣ^{0} K^{+}π^{0}$ and $Λ_{c}^{+}\toΣ^{0}K^{+}π^+π^-$ with a single-tag method. No significant signal is observed for either decay. The upper limits on…

    Submitted 16 February, 2025; originally announced February 2025.

    Comments: 12 pages, 6 figures

  27. arXiv:2502.10606  [pdf, other]

    cs.CV cs.RO

    HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation

    Authors: Yibo Liu, Zhaodong Jiang, Binbin Xu, Guile Wu, Yuan Ren, Tongtong Cao, Bingbing Liu, Rui Heng Yang, Amir Rasouli, Jinjun Shan

    Abstract: This work focuses on model-free zero-shot 6D object pose estimation for robotics applications. While existing methods can estimate the precise 6D pose of objects, they heavily rely on curated CAD models or reference images, the preparation of which is a time-consuming and labor-intensive process. Moreover, in real-world scenarios, 3D models or reference images may not be available in advance and i…

    Submitted 14 February, 2025; originally announced February 2025.

  28. arXiv:2502.06171  [pdf]

    eess.IV cs.CV

    A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation

    Authors: Wenhui Lei, Hanyu Chen, Zitian Zhang, Luyang Luo, Qiong Xiao, Yannian Gu, Peng Gao, Yankai Jiang, Ci Wang, Guangtao Wu, Tongjia Xu, Yingjie Zhang, Xiaofan Zhang, Pranav Rajpurkar, Shaoting Zhang, Zhenning Wang

    Abstract: Artificial intelligence-assisted imaging analysis has made substantial strides in tumor diagnosis and management. Here we present PASTA, a pan-tumor CT foundation model that achieves state-of-the-art performance on 45 of 46 representative oncology tasks -- including lesion segmentation, tumor detection in plain CT, tumor staging, survival prediction, structured report generation, and cross-modalit…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: 57 pages, 7 figures

  29. arXiv:2502.05519  [pdf, other]

    astro-ph.HE gr-qc

    Observational and Theoretical Constraints on First-Order Phase Transitions in Neutron Stars

    Authors: Zuhua Ji, Jiarui Chen, Gaojian Wu

    Abstract: Understanding the equation of state (EOS) of neutron stars (NSs) is a fundamental challenge in astrophysics and nuclear physics. A first-order phase transition (FOPT) at high densities could lead to the formation of a quark core, significantly affecting NS properties. This review explores observational and theoretical constraints on such transitions using multi-messenger astrophysics. X-ray observ…

    Submitted 8 May, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  30. arXiv:2502.02529  [pdf, ps, other]

    math.PR math.OC stat.ML

    A weak convergence approach to large deviations for stochastic approximations

    Authors: Henrik Hult, Adam Lindhe, Pierre Nyquist, Guo-Jhen Wu

    Abstract: The theory of stochastic approximations forms the theoretical foundation for studying convergence properties of many popular recursive learning algorithms in statistics, machine learning and statistical physics. Large deviations for stochastic approximations provide asymptotic estimates of the probability that the learning algorithm deviates from its expected path, given by a limit ODE, and the lar…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 60 p

    MSC Class: 60F10; 62L20; 60J20

  31. arXiv:2501.16391  [pdf, other]

    cs.LG cs.AI q-bio.BM

    Inductive-Associative Meta-learning Pipeline with Human Cognitive Patterns for Unseen Drug-Target Interaction Prediction

    Authors: Xiaoqing Lian, Jie Zhu, Tianxu Lv, Shiyun Nie, Hang Fan, Guosheng Wu, Yunjun Ge, Lihua Li, Xiangxiang Zeng, Xiang Pan

    Abstract: Significant differences in protein structures hinder the generalization of existing drug-target interaction (DTI) models, which often rely heavily on pre-learned binding principles or detailed annotations. In contrast, BioBridge designs an Inductive-Associative pipeline inspired by the workflow of scientists who base their accumulated expertise on drawing insights into novel drug-target pairs from…

    Submitted 27 March, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

  32. arXiv:2501.15447  [pdf, ps, other]

    hep-ex

    Observation of $h_{c}$ radiative decays to multiple light hadrons and the tensor state $f_2(1270)$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (666 additional authors not shown)

    Abstract: Using $ψ(3686)\rightarrow π^{0} h_{c}$ decays from a data sample of $(27.12\pm0.14)\times10^{8}$ $ψ(3686)$ events collected by the BESIII detector at the BEPCII collider, $h_c$ radiative decays to $γπ^{+}π^{-},~γπ^{+}π^{-}η,~γ2(π^{+}π^{-})$, and $γp\bar{p}$ are observed for the first time, each with a significance greater than $5σ$. The corresponding branching fractions are measured. Furtherm…

    Submitted 26 January, 2025; originally announced January 2025.

  33. arXiv:2501.13896  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration

    Authors: Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu

    Abstract: Graphical User Interface (GUI) action grounding is a critical step in GUI automation that maps language instructions to actionable elements on GUI screens. Most recent works of GUI action grounding leverage large GUI datasets to fine-tune MLLMs. However, the fine-tuning data always covers limited GUI environments, and we find the performance of the resulting model deteriorates in novel environment…

    Submitted 27 January, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

  34. arXiv:2501.13183  [pdf, other]

    cs.CV

    MONA: Moving Object Detection from Videos Shot by Dynamic Camera

    Authors: Boxun Hu, Mingze Xia, Ding Zhao, Guanlin Wu

    Abstract: Dynamic urban environments, characterized by moving cameras and objects, pose significant challenges for camera trajectory estimation by complicating the distinction between camera-induced and object motion. We introduce MONA, a novel framework designed for robust moving object detection and segmentation from videos shot by dynamic cameras. MONA comprises two key modules: Dynamic Points Extraction…

    Submitted 22 January, 2025; originally announced January 2025.

  35. arXiv:2501.12420  [pdf, other]

    cs.SE cs.AI cs.LG

    Consolidating TinyML Lifecycle with Large Language Models: Reality, Illusion, or Opportunity?

    Authors: Guanghan Wu, Sasu Tarkoma, Roberto Morabito

    Abstract: The evolving requirements of Internet of Things (IoT) applications are driving an increasing shift toward bringing intelligence to the edge, enabling real-time insights and decision-making within resource-constrained environments. Tiny Machine Learning (TinyML) has emerged as a key enabler of this evolution, facilitating the deployment of ML models on devices such as microcontrollers and embedded…

    Submitted 5 April, 2025; v1 submitted 20 January, 2025; originally announced January 2025.

    Comments: This paper has been accepted for publication in the IEEE Internet of Things Magazine (Special Issue on Applications of Large Language Models in IoT). The copyright will be transferred to IEEE upon publication. A preliminary version of this work was presented at the Edge AI Foundation event Beyond LLMs and Chatbots: The Journey to Generative AI at the Edge (https://youtu.be/aFWfisdjQIs)

  36. arXiv:2501.10761  [pdf, other]

    cs.CV

    Infrared and Visible Image Fusion: From Data Compatibility to Task Adaption

    Authors: Jinyuan Liu, Guanyao Wu, Zhu Liu, Di Wang, Zhiying Jiang, Long Ma, Wei Zhong, Xin Fan, Risheng Liu

    Abstract: Infrared-visible image fusion (IVIF) is a critical task in computer vision, aimed at integrating the unique features of both infrared and visible spectra into a unified representation. Since 2018, the field has entered the deep learning era, with an increasing variety of approaches introducing a range of networks and loss functions to enhance visual performance. However, challenges such as data co…

    Submitted 18 January, 2025; originally announced January 2025.

  37. arXiv:2501.08080  [pdf, other]

    hep-ex

    Search for the FCNC charmonium decay $J/ψ\to D^0 μ^+ μ^- + \text{c.c.}$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (680 additional authors not shown)

    Abstract: Based on a data sample of $(10087 \pm 44) \times 10^6$ $J/ψ$ events taken with the BESIII detector, we search for the flavor-changing neutral current charmonium decay $J/ψ\to D^{0} μ^{+} μ^{-} + \text{c.c.}$. No significant signal above the background is observed, and the upper limit on its branching fraction is set to be $\mathcal{B}(J/ψ\to D^{0}μ^{+}μ^{-} + \text{c.c.} ) < 1.1 \times 10^{-7}$ at…

    Submitted 14 February, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

    Comments: 20 pages, 4 figures

  38. arXiv:2501.06220  [pdf, other]

    cs.LG cs.CV

    Powerful Design of Small Vision Transformer on CIFAR10

    Authors: Gent Wu

    Abstract: Vision Transformers (ViTs) have demonstrated remarkable success on large-scale datasets, but their performance on smaller datasets often falls short of convolutional neural networks (CNNs). This paper explores the design and optimization of Tiny ViTs for small datasets, using CIFAR-10 as a benchmark. We systematically evaluate the impact of data augmentation, patch token initialization, low-rank c…

    Submitted 6 January, 2025; originally announced January 2025.

  39. arXiv:2501.05098  [pdf, other]

    cs.CV

    Motion-X++: A Large-Scale Multimodal 3D Whole-body Human Motion Dataset

    Authors: Yuhong Zhang, Jing Lin, Ailing Zeng, Guanlin Wu, Shunlin Lu, Yurong Fu, Yuanhao Cai, Ruimao Zhang, Haoqian Wang, Lei Zhang

    Abstract: In this paper, we introduce Motion-X++, a large-scale multimodal 3D expressive whole-body human motion dataset. Existing motion datasets predominantly capture body-only poses, lacking facial expressions, hand gestures, and fine-grained pose descriptions, and are typically limited to lab settings with manually labeled text descriptions, thereby restricting their scalability. To address this issue,…

    Submitted 9 January, 2025; originally announced January 2025.

    Comments: 17 pages, 14 figures, This work extends and enhances the research published in the NeurIPS 2023 paper, "Motion-X: A Large-scale 3D Expressive Whole-body Human Motion Dataset". arXiv admin note: substantial text overlap with arXiv:2307.00818

  40. arXiv:2501.04156  [pdf, other]

    cs.HC

    AdaptiveCoPilot: Design and Testing of a NeuroAdaptive LLM Cockpit Guidance System in both Novice and Expert Pilots

    Authors: Shaoyue Wen, Michael Middleton, Songming Ping, Nayan N Chawla, Guande Wu, Bradley S Feest, Chihab Nadri, Yunmei Liu, David Kaber, Maryam Zahabi, Ryan P. McMahan, Sonia Castelo, Ryan Mckendrick, Jing Qian, Claudio Silva

    Abstract: Pilots operating modern cockpits often face high cognitive demands due to complex interfaces and multitasking requirements, which can lead to overload and decreased performance. This study introduces AdaptiveCoPilot, a neuroadaptive guidance system that adapts visual, auditory, and textual cues in real time based on the pilot's cognitive workload, measured via functional Near-Infrared Spectroscopy…

    Submitted 7 January, 2025; originally announced January 2025.

    ACM Class: H.1.2; I.2.1; I.2.7

  41. arXiv:2501.01591  [pdf, other]

    cs.LG math.ST

    Multivariate Time Series Anomaly Detection using DiffGAN Model

    Authors: Guangqiang Wu, Fu Zhang

    Abstract: In recent years, some researchers have applied diffusion models to multivariate time series anomaly detection. The partial diffusion strategy, which depends on the diffusion steps, is commonly used for anomaly detection in these models. However, different diffusion steps have an impact on the reconstruction of the original data, thereby impacting the effectiveness of anomaly detection. To address…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: 19 pages, 3 figures, 1 table

    ACM Class: G.3

  42. arXiv:2501.00829  [pdf, other]

    cs.NE cs.AI

    An LLM-Empowered Adaptive Evolutionary Algorithm For Multi-Component Deep Learning Systems

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, An Guo, Yuan Zhou, Jie Zhang, Shuo Li, Jun Wei, Tianwei Zhang

    Abstract: Multi-objective evolutionary algorithms (MOEAs) are widely used for searching optimal solutions in complex multi-component applications. Traditional MOEAs for multi-component deep learning (MCDL) systems face challenges in enhancing the search efficiency while maintaining the diversity. To combat these, this paper proposes $μ$MOEA, the first LLM-empowered adaptive evolutionary search algorithm to…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: 9

  43. arXiv:2412.20637  [pdf, other]

    cs.CL

    Knowledge Editing for Large Language Model with Knowledge Neuronal Ensemble

    Authors: Yongchang Li, Yujin Zhu, Tao Yan, Shijian Fan, Gang Wu, Liang Xu

    Abstract: As real-world knowledge is constantly evolving, ensuring the timeliness and accuracy of a model's knowledge is crucial. This has made knowledge editing in large language models increasingly important. However, existing knowledge editing methods face several challenges, including parameter localization coupling, imprecise localization, and a lack of dynamic interaction across layers. In this paper,…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: 26 pages, 5 figures, 2 tables

    MSC Class: 68T50

  44. arXiv:2412.20311  [pdf]

    physics.optics

    Narrowband parallel coherent LiDAR with frequency interleaving

    Authors: Long Wang, Liang Hu, Wenhai Jiao, Yaxin Shang, Jianping Chen, Guiling Wu

    Abstract: The high demand for 3D imaging in intelligent robotics is motivating the advances of coherent LiDARs towards high performances with low complexity/cost. However, the current coherent LiDARs suffer from the tight coupling between the high ranging-imaging performance and the high complexity/cost. Herein, we propose a narrowband parallel coherent LiDAR with frequency-interleaving architecture. The Li…

    Submitted 28 December, 2024; originally announced December 2024.

  45. arXiv:2412.18396  [pdf, other]

    cs.IR

    Contrastive Representation for Interactive Recommendation

    Authors: Jingyu Li, Zhiyong Feng, Dongxiao He, Hongqi Chen, Qinghang Gao, Guoli Wu

    Abstract: Interactive Recommendation (IR) has gained significant attention recently for its capability to quickly capture dynamic interest and optimize both short and long term objectives. IR agents are typically implemented through Deep Reinforcement Learning (DRL), because DRL is inherently compatible with the dynamic nature of IR. However, DRL is currently not perfect for IR. Due to the large action spac…

    Submitted 20 January, 2025; v1 submitted 24 December, 2024; originally announced December 2024.

    Comments: AAAI-2025 Accepted paper

  46. arXiv:2412.18231  [pdf, other]

    cs.LG

    Towards Macro-AUC oriented Imbalanced Multi-Label Continual Learning

    Authors: Yan Zhang, Guoqiang Wu, Bingzheng Wang, Teng Pang, Haoliang Sun, Yilong Yin

    Abstract: In Continual Learning (CL), while existing work primarily focuses on the multi-class classification task, there has been limited research on Multi-Label Learning (MLL). In practice, MLL datasets are often class-imbalanced, making it inherently challenging, a problem that is even more acute in CL. Due to its sensitivity to imbalance, Macro-AUC is an appropriate and widely used measure in MLL. Howev…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 7 pages of main text, 11 pages of appendix, accepted to AAAI 2025

  47. arXiv:2412.17404  [pdf, other]

    cs.AI

    BrainMAP: Learning Multiple Activation Pathways in Brain Networks

    Authors: Song Wang, Zhenyu Lei, Zhen Tan, Jiaqi Ding, Xinyu Zhao, Yushun Dong, Guorong Wu, Tianlong Chen, Chen Chen, Aiying Zhang, Jundong Li

    Abstract: Functional Magnetic Resonance Image (fMRI) is commonly employed to study human brain activity, since it offers insight into the relationship between functional fluctuations and human behavior. To enhance analysis and comprehension of brain activity, Graph Neural Networks (GNNs) have been widely applied to the analysis of functional connectivities (FC) derived from fMRI data, due to their ability t…

    Submitted 31 January, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

    Comments: AAAI 2025

  48. arXiv:2412.16085  [pdf, other]

    eess.IV cs.CV

    Efficient MedSAMs: Segment Anything in Medical Images on Laptop

    Authors: Jun Ma, Feifei Li, Sumin Kim, Reza Asakereh, Bao-Hiep Le, Dang-Khoa Nguyen-Vu, Alexander Pfefferle, Muxin Wei, Ruochen Gao, Donghang Lyu, Songxiao Yang, Lennart Purucker, Zdravko Marinov, Marius Staring, Haisheng Lu, Thuy Thanh Dao, Xincheng Ye, Zhi Li, Gianluca Brugnara, Philipp Vollmuth, Martha Foltyn-Dumitru, Jaeyoung Cho, Mustafa Ahmed Mahmutoglu, Martin Bendszus, Irada Pflüger, et al. (57 additional authors not shown)

    Abstract: Promptable segmentation foundation models have emerged as a transformative approach to addressing the diverse needs in medical images, but most existing models require expensive computing, posing a big barrier to their adoption in clinical practice. In this work, we organized the first international competition dedicated to promptable medical image segmentation, featuring a large-scale dataset spa…

    Submitted 20 December, 2024; originally announced December 2024.

    Comments: CVPR 2024 MedSAM on Laptop Competition Summary: https://www.codabench.org/competitions/1847/

  49. arXiv:2412.13501  [pdf, other]

    cs.AI cs.HC

    GUI Agents: A Survey

    Authors: Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen, et al. (4 additional authors not shown)

    Abstract: Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and funda…

    Submitted 17 December, 2024; originally announced December 2024.

  50. arXiv:2412.12998  [pdf, other]

    hep-ex

    Observation of the charmonium decay $η_c\toγγ$ in $J/ψ\toγη_c$

    Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, H. Cai, et al. (658 additional authors not shown)

    Abstract: Using $(2712.4\pm14.3)\times10^{6}$ $ψ(3686)$ events collected with the BESIII detector at the BEPCII collider, the decay $η_c\toγγ$ in $J/ψ\toγη_c$ is observed. We determine the product branching fraction $\mathcal{B}(J/ψ\toγη_c)\times\mathcal{B}(η_c\toγγ)=(5.23\pm0.26_{\rm{stat.}}\pm0.30_{\rm{syst.}})\times10^{-6}$. This result is consistent with the LQCD calculation…

    Submitted 2 April, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: 10 pages, 4 figures