
Showing 1–50 of 547 results for author: Tae, J

  1. arXiv:2506.09310  [pdf, ps, other]

    gr-qc

    Wormhole Solutions and pre-inflationary in $F(R, T)$ Gravity with Axion Fields

    Authors: Guo-He Li, Yeqi Fang, Yuchi Wu, Jun Tao

    Abstract: In this study, we investigate wormhole solutions and inflationary initial conditions in the coupled axion-inflaton/scalar field system within $F(R,T)$ gravity. Notably, the Euclidean action is reduced by approximately $10^5$ compared with the GR case. In Euclidean AdS spacetime, we construct Euclidean (semi-)wormhole geometries that naturally set inflationary initial conditions. To enhance the pro…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 15 pages, 9 figures

  2. arXiv:2506.07935  [pdf, ps, other]

    cs.MA cs.AI cs.GT

    Diffusion of Responsibility in Collective Decision Making

    Authors: Pavel Naumov, Jia Tao

    Abstract: The term "diffusion of responsibility" refers to situations in which multiple agents share responsibility for an outcome, obscuring individual accountability. This paper examines this frequently undesirable phenomenon in the context of collective decision-making mechanisms. The work shows that if a decision is made by two agents, then the only way to avoid diffusion of responsibility is for one…

    Submitted 9 June, 2025; originally announced June 2025.

  3. arXiv:2506.03880  [pdf, ps, other]

    cs.CL cs.AI

    RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing

    Authors: Ruihan Jin, Pengpeng Shao, Zhengqi Wen, Jinyang Wu, Mingkuan Feng, Shuai Zhang, Jianhua Tao

    Abstract: The rapid advancements in large language models (LLMs) have led to the emergence of routing techniques, which aim to efficiently select the optimal LLM from diverse candidates to tackle specific tasks, optimizing performance while reducing costs. Current LLM routing methods are limited in effectiveness due to insufficient exploration of the intrinsic connection between user queries and the charact…

    Submitted 4 June, 2025; originally announced June 2025.

  4. arXiv:2506.02931  [pdf, ps, other]

    cs.MA cs.AI cs.LG

    ThinkTank: A Framework for Generalizing Domain-Specific AI Agent Systems into Universal Collaborative Intelligence Platforms

    Authors: Praneet Sai Madhu Surabhi, Dheeraj Reddy Mudireddy, Jian Tao

    Abstract: This paper presents ThinkTank, a comprehensive and scalable framework designed to transform specialized AI agent systems into versatile collaborative intelligence platforms capable of supporting complex problem-solving across diverse domains. ThinkTank systematically generalizes agent roles, meeting structures, and knowledge integration mechanisms by adapting proven scientific collaboration method…

    Submitted 3 June, 2025; originally announced June 2025.

  5. arXiv:2506.02707  [pdf]

    eess.SY

    Unit Commitment with Cost-Oriented Temporal Resolution

    Authors: Junyi Tao, Ran Li, Salvador Pineda

    Abstract: Time-adaptive unit commitment (UC) has recently been investigated to reduce the scheduling costs by flexibly varying the temporal resolution, which is usually determined by clustering the net load patterns. However, there exists a misalignment between cost and net load patterns due to the discrete start-up costs and out-of-merit-order dispatch triggered by ramping and other constraints. The optima…

    Submitted 3 June, 2025; originally announced June 2025.

  6. Event-based multi-view photogrammetry for high-dynamic, high-velocity target measurement

    Authors: Taihang Lei, Banglei Guan, Minzu Liang, Xiangyu Li, Jianbing Liu, Jing Tao, Yang Shang, Qifeng Yu

    Abstract: The characterization of mechanical properties for high-dynamic, high-velocity target motion is essential in industry. It provides crucial data for validating weapon systems, precision manufacturing processes, etc. However, existing measurement methods face challenges such as limited dynamic range, discontinuous observations, and high costs. This paper presents a new approach leveraging an even…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: 9 pages, 9 figures, 1 table. This paper was accepted by Acta Mechanica Sinica on 30 May 2025.

  7. arXiv:2506.00375  [pdf, ps, other]

    cs.SD eess.AS

    RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection

    Authors: Ruibo Fu, Xiaopeng Wang, Zhengqi Wen, Jianhua Tao, Yuankun Xie, Zhiyong Wang, Chunyu Qiang, Xuefei Liu, Cunhang Fan, Chenxing Li, Guanjun Li

    Abstract: Existing methods for deepfake audio detection have demonstrated some effectiveness. However, they still face challenges in generalizing to new forgery techniques and evolving attack patterns. This limitation mainly arises because the models rely heavily on the distribution of the training data and fail to learn a decision boundary that captures the essential characteristics of forgeries. Additiona…

    Submitted 31 May, 2025; originally announced June 2025.

  8. arXiv:2505.19287  [pdf, ps, other]

    stat.ME stat.AP

    svc: An R package for Spatially Varying Coefficient Models

    Authors: Justice Akuoko-Frimpong, Edward Shao, Jonathan Ta

    Abstract: Traditional regression models assume stationary relationships between predictors and responses, failing to capture the spatial heterogeneity present in many environmental, epidemiological, and ecological processes. To address this limitation, we develop a scalable Bayesian framework for spatially varying coefficient (SVC) models, implemented in the svc R package (available at https://github.…

    Submitted 25 May, 2025; originally announced May 2025.

  9. arXiv:2505.18232  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    ELDeR: Getting Efficient LLMs through Data-Driven Regularized Layer-wise Pruning

    Authors: Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Hongjian Fang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao

    Abstract: The deployment of large language models (LLMs) in many fields is largely hindered by their high computational and memory costs. Recent studies suggest that LLMs exhibit sparsity, which can be used for pruning. Previous pruning methods typically follow a prune-then-finetune paradigm. Since the pruned parts still contain valuable information, statically removing them without updating the remaining p…

    Submitted 23 May, 2025; originally announced May 2025.

  10. arXiv:2505.15692  [pdf, other]

    cs.CL cs.LG

    Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities

    Authors: Jinyang Wu, Chonghua Liao, Mingkuan Feng, Shuai Zhang, Zhengqi Wen, Pengpeng Shao, Huazhe Xu, Jianhua Tao

    Abstract: Reinforcement learning (RL) has emerged as an effective method for training reasoning models. However, existing RL approaches typically bias the model's output distribution toward reward-maximizing paths without introducing external knowledge. This limits their exploration capacity and results in a narrower reasoning capability boundary compared to base models. To address this limitation, we propo…

    Submitted 26 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  11. arXiv:2505.15210  [pdf, other]

    cs.CL cs.IR

    Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs

    Authors: Jie Ma, Ning Qu, Zhitao Gao, Rui Xing, Jun Liu, Hongbin Pei, Jiang Xie, Linyun Song, Pinghui Wang, Jing Tao, Zhou Su

    Abstract: Knowledge graph-based retrieval-augmented generation seeks to mitigate hallucinations in Large Language Models (LLMs) caused by insufficient or outdated knowledge. However, existing methods often fail to fully exploit the prior knowledge embedded in knowledge graphs (KGs), particularly their structural information and explicit or implicit constraints. The former can enhance the faithfulness of LLM…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Under Review

    ACM Class: I.2.4

  12. arXiv:2505.14135  [pdf, other]

    cs.CV

    Hunyuan-Game: Industrial-grade Intelligent Game Creation Model

    Authors: Ruihuang Li, Caijin Zhou, Shoujian Zheng, Jianxiang Lu, Jiabin Huang, Comi Chen, Junshu Tang, Guangzheng Xu, Jiale Tao, Hongmei Wang, Donghao Li, Wenqing Yu, Senbo Wang, Zhimin Li, Yetshuan Shi, Haoyu Yang, Yukun Wang, Wenxun Dai, Jiaqi Li, Linqing Wang, Qixun Wang, Zhiyong Xu, Yingfang Zhang, Jiangfeng Xiong, Weijie Kong, et al. (33 additional authors not shown)

    Abstract: Intelligent game creation represents a transformative advancement in game development, utilizing generative artificial intelligence to dynamically generate and enhance game content. Despite notable progress in generative models, the comprehensive synthesis of high-quality game assets, including both images and videos, remains a challenging frontier. To create high-fidelity game content that simult…

    Submitted 28 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  13. arXiv:2505.11770  [pdf, ps, other]

    cs.LG cs.AI cs.CL stat.ML

    Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors

    Authors: Jing Huang, Junyi Tao, Thomas Icard, Diyi Yang, Christopher Potts

    Abstract: Interpretability research now offers a variety of techniques for identifying abstract internal mechanisms in neural networks. Can such techniques be used to predict how models will behave on out-of-distribution examples? In this work, we provide a positive answer to this question. Through a diverse set of language modeling tasks--including symbol manipulation, knowledge retrieval, and instruction…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: ICML 2025

  14. arXiv:2505.11733  [pdf, ps, other]

    cs.CL

    MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports

    Authors: Kevin Wu, Eric Wu, Rahul Thapa, Kevin Wei, Angela Zhang, Arvind Suresh, Jacqueline J. Tao, Min Woo Sun, Alejandro Lozano, James Zou

    Abstract: Doctors and patients alike increasingly use Large Language Models (LLMs) to diagnose clinical cases. However, unlike domains such as math or coding, where correctness can be objectively defined by the final answer, medical diagnosis requires both the outcome and the reasoning process to be accurate. Currently, widely used medical benchmarks like MedQA and MMLU assess only accuracy in the final ans…

    Submitted 20 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  15. arXiv:2505.11079  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    $\mathcal{A}LLM4ADD$: Unlocking the Capabilities of Audio Large Language Models for Audio Deepfake Detection

    Authors: Hao Gu, Jiangyan Yi, Chenglong Wang, Jianhua Tao, Zheng Lian, Jiayi He, Yong Ren, Yujie Chen, Zhengqi Wen

    Abstract: Audio deepfake detection (ADD) has grown increasingly important due to the rise of high-fidelity audio generative models and their potential for misuse. Given that audio large language models (ALLMs) have made significant progress in various audio processing tasks, a heuristic question arises: Can ALLMs be leveraged to solve ADD? In this paper, we first conduct a comprehensive zero-shot evaluatio…

    Submitted 16 May, 2025; originally announced May 2025.

  16. arXiv:2505.11044  [pdf, ps, other]

    cs.LG

    Exploration by Random Distribution Distillation

    Authors: Zhirui Fang, Kai Yang, Jian Tao, Jiafei Lyu, Lusong Li, Li Shen, Xiu Li

    Abstract: Exploration remains a critical challenge in online reinforcement learning, as an agent must effectively explore unknown environments to achieve high returns. Currently, the main exploration algorithms are primarily count-based methods and curiosity-based methods, with prediction-error methods being a prominent example. In this paper, we propose a novel method called Random Distri…

    Submitted 16 May, 2025; originally announced May 2025.

  17. arXiv:2505.09592  [pdf, ps, other]

    physics.atom-ph

    Quantum-State-Controlled Collisions of Ultracold Polyatomic Molecules

    Authors: Nathaniel B. Vilas, Paige Robichaud, Christian Hallas, Junheng Tao, Loïc Anderegg, Grace K. Li, Hana Lampson, Lucie D. Augustovičová, John L. Bohn, John M. Doyle

    Abstract: Collisions between ultracold calcium monohydroxide (CaOH) molecules are realized and studied. Inelastic collision rate constants are measured for CaOH prepared in ground and excited vibrational states, and the electric field dependence of these rates is measured for molecules in single quantum states of the parity-doubled bending mode. Theoretical calculations of collision rate coefficients are pe…

    Submitted 14 May, 2025; originally announced May 2025.

    Comments: 22 pages, 10 figures

  18. arXiv:2505.06312  [pdf, other]

    cs.GT cs.AI

    Responsibility Gap in Collective Decision Making

    Authors: Pavel Naumov, Jia Tao

    Abstract: The responsibility gap is a set of outcomes of a collective decision-making mechanism in which no single agent is individually responsible. In general, when designing a decision-making process, it is desirable to minimise the gap. The paper proposes a concept of an elected dictatorship. It shows that, in a perfect information setting, the gap is empty if and only if the mechanism is an elected d…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: full version of an IJCAI-25 paper

  19. arXiv:2504.19423  [pdf, other]

    cs.HC

    MER 2025: When Affective Computing Meets Large Language Models

    Authors: Zheng Lian, Rui Liu, Kele Xu, Bin Liu, Xuefei Liu, Yazhou Zhang, Xin Liu, Yong Li, Zebang Cheng, Haolin Zuo, Ziyang Ma, Xiaojiang Peng, Xie Chen, Ya Li, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: MER2025 is the third year of our MER series of challenges, aiming to bring together researchers in the affective computing community to explore emerging trends and future directions in the field. Previously, MER2023 focused on multi-label learning, noise robustness, and semi-supervised learning, while MER2024 introduced a new track dedicated to open-vocabulary emotion recognition. This year, MER20…

    Submitted 29 April, 2025; v1 submitted 27 April, 2025; originally announced April 2025.

  20. arXiv:2504.14465  [pdf, ps, other]

    physics.flu-dyn nlin.CD nlin.PS

    The Onset of Metastable Turbulence in Pipe Flow

    Authors: Jiashun Guan, Jianjun Tao

    Abstract: The onset of turbulence in pipe flow has been a fundamental challenge in physics, applied mathematics, and engineering for over 140 years. To date, the precursor of this laminar-turbulent transition is recognized as transient turbulent spots or puffs, but their defining characteristics - longevity, abrupt relaminarization, and super-exponential lifetime scaling - have lacked first-principles…

    Submitted 9 June, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

    Comments: 22 pages, 7 figures

  21. arXiv:2504.12395  [pdf, other]

    cs.CV

    InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework

    Authors: Jiale Tao, Yanbing Zhang, Qixun Wang, Yiji Cheng, Haofan Wang, Xu Bai, Zhengguang Zhou, Ruihuang Li, Linqing Wang, Chunyu Wang, Qin Lin, Qinglin Lu

    Abstract: Current learning-based subject customization approaches, predominantly relying on U-Net architectures, suffer from limited generalization ability and compromised image quality. Meanwhile, optimization-based methods require subject-specific fine-tuning, which inevitably degrades textual controllability. To address these challenges, we propose InstantCharacter, a scalable framework for character cus…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: Tech Report. Code is available at https://github.com/Tencent/InstantCharacter

  22. arXiv:2504.08614  [pdf, other]

    cond-mat.quant-gas physics.atom-ph quant-ph

    Imaginary gauge potentials in a non-Hermitian spin-orbit coupled quantum gas

    Authors: Junheng Tao, Emmanuel Mercado-Gutierrez, Mingshu Zhao, Ian Spielman

    Abstract: In 1996, Hatano and Nelson proposed a non-Hermitian lattice model containing an imaginary Peierls phase [Phys. Rev. Lett. 77 570-573 (1996)], which subsequent analyses revealed to be an instance of a new class of topological systems. Here, we experimentally realize a continuum analog to this model containing an imaginary gauge potential using a homogeneous spin-orbit coupled Bose-Einstein condensa…

    Submitted 11 April, 2025; originally announced April 2025.

  23. arXiv:2504.05197  [pdf, other]

    cs.SD

    P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation

    Authors: Yong Ren, Jiangyan Yi, Tao Wang, Jianhua Tao, Zheng Lian, Zhengqi Wen, Chenxing Li, Ruibo Fu, Ye Bai, Xiaohui Zhang

    Abstract: Neural speech generation (NSG) has rapidly advanced as a key component of artificial intelligence-generated content, enabling the generation of high-quality, highly realistic speech for diverse applications. This development increases the risk of technique misuse and threatens social security. Audio watermarking can embed imperceptible marks into generated audio, providing a promising approach for…

    Submitted 5 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

  24. arXiv:2504.00487  [pdf, other]

    cs.MM cs.CL cs.CV

    FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning

    Authors: Jie Ma, Zhitao Gao, Qi Chai, Jun Liu, Pinghui Wang, Jing Tao, Zhou Su

    Abstract: Audio-Visual Question Answering (AVQA) is a challenging multimodal reasoning task requiring intelligent systems to answer natural language queries based on paired audio-video inputs accurately. However, existing AVQA approaches often suffer from overfitting to dataset biases, leading to poor robustness. Moreover, current datasets may not effectively diagnose these methods. To address these challen…

    Submitted 2 April, 2025; v1 submitted 1 April, 2025; originally announced April 2025.

    Comments: Under Review

    ACM Class: H.5.1; I.2.4

  25. arXiv:2503.22724  [pdf, other]

    cs.LG

    A Spatial-temporal Deep Probabilistic Diffusion Model for Reliable Hail Nowcasting with Radar Echo Extrapolation

    Authors: Haonan Shi, Long Tian, Jie Tao, Yufei Li, Liming Wang, Xiyang Liu

    Abstract: Hail is a considerable contributor to meteorological disasters, and there is a great need to mitigate its socioeconomic effects through precise forecasts that offer high resolution, long lead times, and local detail over large areas. Existing medium-range weather forecasting methods primarily rely on changes in upper air currents and cloud layers to predict precipitation events, such a…

    Submitted 26 March, 2025; originally announced March 2025.

  26. arXiv:2503.18246  [pdf, other]

    eess.IV cs.CV

    ZECO: ZeroFusion Guided 3D MRI Conditional Generation

    Authors: Feiran Wang, Bin Duan, Jiachen Tao, Nikhil Sharma, Dawen Cai, Yan Yan

    Abstract: Medical image segmentation is crucial for enhancing diagnostic accuracy and treatment planning in Magnetic Resonance Imaging (MRI). However, acquiring precise lesion masks for segmentation model training demands specialized expertise and significant time investment, leading to a small dataset scale in clinical practice. In this paper, we present ZECO, a ZeroFusion guided 3D MRI conditional generat…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: Project page: https://brack-wang.github.io/ZECO_web/; GitHub code: https://github.com/Brack-Wang/ZECO

  27. arXiv:2503.14359  [pdf, other]

    cs.CV

    ImViD: Immersive Volumetric Videos for Enhanced VR Engagement

    Authors: Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu

    Abstract: User engagement is greatly enhanced by fully immersive multi-modal experiences that combine visual and auditory stimuli. Consequently, the next frontier in VR/AR technologies lies in immersive volumetric videos with complete scene capture, large 6-DoF interaction space, multi-modal feedback, and high resolution & frame-rate contents. To stimulate the reconstruction of immersive volumetric videos,…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  28. arXiv:2503.13991  [pdf, other]

    cs.CV cs.AI

    GraphTEN: Graph Enhanced Texture Encoding Network

    Authors: Bo Peng, Jintao Chen, Mufeng Yao, Chenhao Zhang, Jianghui Zhang, Mingmin Chi, Jiang Tao

    Abstract: Texture recognition is a fundamental problem in computer vision and pattern recognition. Recent progress leverages feature aggregation into discriminative descriptions based on convolutional neural networks (CNNs). However, modeling non-local context relations through visual primitives remains challenging due to the variability and randomness of texture primitives in spatial distributions. In this…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: 6 pages, 7 figures, conference paper

    MSC Class: 68T45

    ACM Class: I.2.10; I.4.7

  29. arXiv:2503.09962  [pdf, other]

    cs.CV

    Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification

    Authors: Jiayu Jiang, Changxing Ding, Wentao Tan, Junhong Wang, Jin Tao, Xiangmin Xu

    Abstract: Text-to-image person re-identification (ReID) aims to retrieve images of a person of interest based on textual descriptions. One main challenge for this task is the high cost of manually annotating large-scale databases, which affects the generalization ability of ReID models. Recent works handle this problem by leveraging Multi-modal Large Language Models (MLLMs) to describe pedestrian images…

    Submitted 12 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project website: https://github.com/sssaury/HAM

  30. arXiv:2503.08596  [pdf, other]

    cs.CV

    X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction

    Authors: Feiran Wang, Jiachen Tao, Junyi Wu, Haoxuan Wang, Bin Duan, Kai Wang, Zongxin Yang, Yan Yan

    Abstract: X-ray imaging is indispensable in medical diagnostics, yet its use is tightly regulated due to potential health risks. To mitigate radiation exposure, recent research focuses on generating novel views from sparse inputs and reconstructing Computed Tomography (CT) volumes, borrowing representations from the 3D reconstruction area. However, these representations originally target visible light imagi…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Project page: https://brack-wang.github.io/XField/; GitHub code: https://github.com/Brack-Wang/X-Field

  31. arXiv:2503.08131  [pdf, ps, other]

    cs.LG

    Large Scale Multi-Task Bayesian Optimization with Large Language Models

    Authors: Yimeng Zeng, Natalie Maus, Haydn Thomas Jones, Jeffrey Tao, Fangping Wan, Marcelo Der Torossian Torres, Cesar de la Fuente-Nunez, Ryan Marcus, Osbert Bastani, Jacob R. Gardner

    Abstract: In multi-task Bayesian optimization, the goal is to leverage experience from optimizing existing tasks to improve the efficiency of optimizing new ones. While approaches using multi-task Gaussian processes or deep kernel transfer exist, the performance improvement is marginal when scaling beyond a moderate number of tasks. We introduce a novel approach leveraging large language models (LLMs) to le…

    Submitted 12 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

  32. arXiv:2503.07886  [pdf, other]

    physics.geo-ph

    Experimental Study on the Rotation-induced Reduction of Penetration Resistance in Sand

    Authors: Yong Tang, Yi Zhong, Julian Tao

    Abstract: Soil-dwelling organisms have evolved diverse strategies for efficient subterranean movement. For example, the seeds of Erodium cicutarium and Pelargonium species employ continuous rotational motion for self-burial, while the angled worm lizard Agamodon angeliceps tunnels by oscillating its head around its trunk's axis. These rotational movements significantly reduce penetration resistance. This st…

    Submitted 6 April, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

    Comments: 17 pages, 18 figures

  33. arXiv:2502.18549  [pdf, other]

    cs.LG cs.CR

    Target Defense with Multiple Defenders and an Agile Attacker via Residual Policy Learning

    Authors: Jiyue Tao, Tongsheng Shen, Dexin Zhao, Feitian Zhang

    Abstract: The target defense problem involves intercepting an attacker before it reaches a designated target region using one or more defenders. This letter focuses on a particularly challenging scenario in which the attacker is more agile than the defenders, significantly increasing the difficulty of effective interception. To address this challenge, we propose a novel residual policy framework that integr…

    Submitted 25 February, 2025; originally announced February 2025.

  34. arXiv:2502.17475  [pdf, other]

    eess.SP cs.AI cs.CL cs.LG

    ECG-Expert-QA: A Benchmark for Evaluating Medical Large Language Models in Heart Disease Diagnosis

    Authors: Xu Wang, Jiaju Kang, Puyu Han, Yubao Zhao, Qian Liu, Liwenfei He, Lingqiong Zhang, Lingyun Dai, Yongcheng Wang, Jie Tao

    Abstract: We present ECG-Expert-QA, a comprehensive multimodal dataset for evaluating diagnostic capabilities in electrocardiogram (ECG) interpretation. It combines real-world clinical ECG data with systematically generated synthetic cases, covering 12 essential diagnostic tasks and totaling 47,211 expert-validated QA pairs. These encompass diverse clinical scenarios, from basic rhythm recognition to comple…

    Submitted 7 April, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

  35. arXiv:2502.13917  [pdf, ps, other]

    cs.CL

    TESS 2: A Large-Scale Generalist Diffusion Language Model

    Authors: Jaesung Tae, Hamish Ivison, Sachin Kumar, Arman Cohan

    Abstract: We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, as well as matches and sometimes exceeds strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with the usual cross-entropy as diffusion loss, and then performing further instruction tuning. We fin…

    Submitted 31 May, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

    Comments: ACL 2025 camera-ready

  36. arXiv:2502.11342  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Revisiting the charge-density-wave superlattice of 1$T$-TiSe$_2$

    Authors: Wei Wang, Patrick Liu, Lijun Wu, Jing Tao, Genda Gu, Alfred Zong, Yimei Zhu

    Abstract: A number of intriguing phenomena, including exciton condensation, orbital ordering, and emergence of chirality, have been proposed to accompany charge-density-wave (CDW) formation in the layered transition metal dichalcogenide 1$T$-TiSe$_2$. Explaining these effects relies on knowledge of the atomic displacement pattern underlying the CDW, yet structural proposals based on spatially-averaging bulk…

    Submitted 16 February, 2025; originally announced February 2025.

  37. arXiv:2502.05809  [pdf]

    cond-mat.mtrl-sci

    Achieving electrode smoothing by controlling the nucleation phase of metal deposition through polymer-substrate binding

    Authors: Ying Xia, Duo Song, Mingyi Zhang, Zheming Wang, Chenyang Shi, Jingshan S. Du, Sun Hae Ra Shin, Mark H. Engelhard, Praveen K. Thallapally, Christine A. Orme, Jinhui Tao, Maria L. Sushko, James J. De Yoreo, Jun Liu

    Abstract: Polymer additives [like polyethylene oxide (PEO)] are widely used for smooth electrode deposition in aqueous zinc and a number of other battery systems currently investigated for energy storage applications. However, the precise mechanism by which they regulate morphology and suppress dendrite formation remains unclear. In this study, we address this knowledge gap by using in-situ electrochemical…

    Submitted 16 February, 2025; v1 submitted 9 February, 2025; originally announced February 2025.

  38. arXiv:2502.05256  [pdf, other]

    cs.DB

    Learned Offline Query Planning via Bayesian Optimization

    Authors: Jeffrey Tao, Natalie Maus, Haydn Jones, Yimeng Zeng, Jacob R. Gardner, Ryan Marcus

    Abstract: Analytics database workloads often contain queries that are executed repeatedly. Existing optimization techniques generally prioritize keeping optimization cost low, normally well below the time it takes to execute a single instance of a query. If a given query is going to be executed thousands of times, could it be worth investing significantly more optimization time? In contrast to traditional o…

    Submitted 7 February, 2025; originally announced February 2025.

  39. arXiv:2502.02339  [pdf, ps, other]

    cs.CL

    Boosting Multimodal Reasoning with Automated Structured Thinking

    Authors: Jinyang Wu, Mingkuan Feng, Shuai Zhang, Fangrui Lv, Ruihan Jin, Feihu Che, Zengqi Wen, Jianhua Tao

    Abstract: Multimodal large language models excel across diverse domains but struggle with complex visual reasoning tasks. Current approaches aim to incorporate structured thinking via two strategies: explicit search methods and post-training techniques. However, both approaches face significant limitations: Search-based methods suffer from computational inefficiency due to extensive solution space explorati…

    Submitted 30 May, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

  40. arXiv:2501.17905  [pdf, other]

    cs.LG cs.AI cs.CL

    DReSS: Data-driven Regularized Structured Streamlining for Large Language Models

    Authors: Mingkuan Feng, Jinyang Wu, Shuai Zhang, Pengpeng Shao, Ruihan Jin, Zhengqi Wen, Jianhua Tao, Feihu Che

    Abstract: Large language models (LLMs) have achieved significant progress across various domains, but their increasing scale results in high computational and memory costs. Recent studies have revealed that LLMs exhibit sparsity, providing the potential to reduce model size through pruning techniques. However, existing pruning methods typically follow a prune-then-finetune paradigm. Since the pruned compone…

    Submitted 9 February, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

  41. arXiv:2501.16566  [pdf, other]

    cs.HC

    AffectGPT: A New Dataset, Model, and Benchmark for Emotion Understanding with Multimodal Large Language Models

    Authors: Zheng Lian, Haoyu Chen, Lan Chen, Haiyang Sun, Licai Sun, Yong Ren, Zebang Cheng, Bin Liu, Rui Liu, Xiaojiang Peng, Jiangyan Yi, Jianhua Tao

    Abstract: The emergence of multimodal large language models (MLLMs) advances multimodal emotion recognition (MER) to the next level, from naive discriminative tasks to complex emotion understanding with advanced video understanding abilities and natural language description. However, the current community suffers from a lack of large-scale datasets with intensive, descriptive emotion annotations, as well as…

    Submitted 7 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  42. arXiv:2501.15269  [pdf, other]

    cs.LG cs.CR cs.CV

    Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

    Authors: Yining Wang, Mi Zhang, Junjie Sun, Chenyue Wang, Min Yang, Hui Xue, Jialing Tao, Ranjie Duan, Jiexi Liu

    Abstract: Fusing visual understanding into language generation, Multi-modal Large Language Models (MLLMs) are revolutionizing visual-language applications. Yet, these models are often plagued by the hallucination problem, which involves generating inaccurate objects, attributes, and relationships that do not match the visual content. In this work, we delve into the internal attention mechanisms of MLLMs to…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: USENIX Security 2025

  43. arXiv:2501.15044  [pdf, other]

    eess.SP

    Signal Whisperers: Enhancing Wireless Reception Using DRL-Guided Reflector Arrays

    Authors: Hieu Le, Oguz Bedir, Mostafa Ibrahim, Jian Tao, Sabit Ekin

    Abstract: This paper presents a novel approach for enhancing wireless signal reception through self-adjustable metallic surfaces, termed reflectors, which are guided by deep reinforcement learning (DRL). The designed reflector system aims to improve signal quality for multiple users in scenarios where a direct line-of-sight (LOS) from the access point (AP) and reflector to users is not guaranteed. Utilizing…

    Submitted 20 May, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

  44. arXiv:2501.06869  [pdf, other]

    cs.AI cs.CV cs.HC cs.LG

    A Foundational Generative Model for Breast Ultrasound Image Analysis

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Haotian Ye, Siyu He, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, James Zou, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential development to breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired ex…

    Submitted 12 January, 2025; originally announced January 2025.

    Comments: Peking University; Stanford University; Peking University Cancer Hospital & Institute; Peking Union Medical College Hospital; Cancer Hospital, Chinese Academy of Medical Sciences

  45. arXiv:2501.06764  [pdf, other]

    cs.LG

    MTPareto: A MultiModal Targeted Pareto Framework for Fake News Detection

    Authors: Kaiying Yan, Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li

    Abstract: Multimodal fake news detection is essential for maintaining the authenticity of Internet multimedia information. Significant differences in form and content of multimodal information lead to intensified optimization conflicts, hindering effective model training as well as reducing the effectiveness of existing fusion methods for bimodal. To address this problem, we propose the MTPareto framework t…

    Submitted 24 January, 2025; v1 submitted 12 January, 2025; originally announced January 2025.

  46. arXiv:2501.04931  [pdf, other]

    cs.CR cs.AI cs.CL

    Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

    Authors: Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei

    Abstract: Multimodal Large Language Models (MLLMs) have achieved impressive performance and have been put into practical use in commercial applications, but they still have potential safety mechanism vulnerabilities. Jailbreak attacks are red teaming methods that aim to bypass safety mechanisms and discover MLLMs' potential risks. Existing MLLMs' jailbreak methods often bypass the model's safety mechanism t…

    Submitted 8 January, 2025; originally announced January 2025.

  47. arXiv:2412.19099  [pdf, other]

    cs.SD eess.AS

    BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement

    Authors: Cunhang Fan, Enrui Liu, Andong Li, Jianhua Tao, Jian Zhou, Jiahao Li, Chengshi Zheng, Zhao Lv

    Abstract: Although the complex spectrum-based speech enhancement (SE) methods have achieved significant performance, coupling amplitude and phase can lead to a compensation effect, where amplitude information is sacrificed to compensate for the phase that is harmful to SE. In addition, to further improve the performance of SE, many modules are stacked onto SE, resulting in increased model complexity that lim…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  48. arXiv:2412.15517  [pdf, other]

    cs.LG

    Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning

    Authors: Yangkun Chen, Kai Yang, Jian Tao, Jiafei Lyu

    Abstract: Recently, deep Multi-Agent Reinforcement Learning (MARL) has demonstrated its potential to tackle complex cooperative tasks, pushing the boundaries of AI in collaborative environments. However, the efficiency of these systems is often compromised by inadequate sample utilization and a lack of diversity in learning strategies. To enhance MARL performance, we introduce a novel sample reuse approach…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: AAAI 2025

  49. arXiv:2412.12759  [pdf, other]

    cs.LG

    Versatile Ordering Network: An Attention-based Neural Network for Ordering Across Scales and Quality Metrics

    Authors: Zehua Yu, Weihan Zhang, Sihan Pan, Jun Tao

    Abstract: Ordering has been extensively studied in many visualization applications, such as axis and matrix reordering, for the simple reason that the order will greatly impact the perceived pattern of data. Many quality metrics concerning data pattern, perception, and aesthetics are proposed, and respective optimization algorithms are developed. However, the optimization problems related to ordering are of…

    Submitted 18 December, 2024; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: has been accepted by TVCG on 11-Dec-2024

    MSC Class: I.2.6

  50. arXiv:2412.11551  [pdf, other]

    cs.SD cs.AI eess.AS

    Region-Based Optimization in Continual Learning for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Cunhang Fan, Jianhua Tao, Yong Ren, Siding Zeng, Chu Yuan Zhang, Xinrui Yan, Hao Gu, Jun Xue, Chenglong Wang, Zhao Lv, Xiaohui Zhang

    Abstract: Rapid advancements in speech synthesis and voice conversion bring convenience but also new security risks, creating an urgent need for effective audio deepfake detection. Although current models perform well, their effectiveness diminishes when confronted with the diverse and evolving nature of real-world deepfakes. To address this issue, we propose a continual learning method named Region-Based O…

    Submitted 16 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025