Showing 1–50 of 358 results for author: Tian, J

Searching in archive cs.

  1. arXiv:2505.04983  [pdf, ps, other]

    stat.ME cs.AI

    Decomposition of Probabilities of Causation with Two Mediators

    Authors: Yuta Kawakami, Jin Tian

    Abstract: Mediation analysis for probabilities of causation (PoC) provides a fundamental framework for evaluating the necessity and sufficiency of treatment in provoking an event through different causal pathways. One of the primary objectives of causal mediation analysis is to decompose the total effect into path-specific components. In this study, we investigate the path-specific probability of necessity…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: arXiv admin note: text overlap with arXiv:2412.14491

  2. arXiv:2505.04971  [pdf, ps, other]

    stat.ME cs.AI

    Moments of Causal Effects

    Authors: Yuta Kawakami, Jin Tian

    Abstract: The moments of random variables are fundamental statistical measures for characterizing the shape of a probability distribution, encompassing metrics such as mean, variance, skewness, and kurtosis. Additionally, the product moments, including covariance and correlation, reveal the relationships between multiple random variables. On the other hand, the primary focus of causal inference is the evalu…

    Submitted 8 May, 2025; originally announced May 2025.

  3. arXiv:2504.21385  [pdf, other]

    cs.CV

    IDDM: Bridging Synthetic-to-Real Domain Gap from Physics-Guided Diffusion for Real-world Image Dehazing

    Authors: Shijun Zhou, Yajing Liu, Chunhui Hao, Zhiyuan Liu, Jiandong Tian

    Abstract: Due to the domain gap between real-world and synthetic hazy images, current data-driven dehazing algorithms trained on synthetic datasets perform well on synthetic data but struggle to generalize to real-world scenarios. To address this challenge, we propose Image Dehazing Diffusion Models (IDDM), a novel diffusion process that incorporates the atmospheric scatt…

    Submitted 30 April, 2025; originally announced April 2025.

  4. arXiv:2504.17789  [pdf, other]

    cs.CV

    Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models

    Authors: Xu Ma, Peize Sun, Haoyu Ma, Hao Tang, Chih-Yao Ma, Jialiang Wang, Kunpeng Li, Xiaoliang Dai, Yujun Shi, Xuan Ju, Yushi Hu, Artsiom Sanakoyeu, Felix Juefei-Xu, Ji Hou, Junjiao Tian, Tao Xu, Tingbo Hou, Yen-Cheng Liu, Zecheng He, Zijian He, Matt Feiszli, Peizhao Zhang, Peter Vajda, Sam Tsai, Yun Fu

    Abstract: Autoregressive (AR) models, long dominant in language generation, are increasingly applied to image synthesis but are often considered less competitive than Diffusion-based models. A primary limitation is the substantial number of image tokens required for AR models, which constrains both training and inference efficiency, as well as image resolution. To address this, we present Token-Shuffle, a n…

    Submitted 27 April, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: Project Page: https://ma-xu.github.io/token-shuffle/ Add related works

  5. arXiv:2504.14493  [pdf, other]

    cs.IR cs.AI cs.LG

    FinSage: A Multi-aspect RAG System for Financial Filings Question Answering

    Authors: Xinyu Wang, Jijun Chi, Zhenghan Tai, Tung Sum Thomas Kwok, Muzhi Li, Zhuhong Li, Hailin He, Yuchen Hua, Peng Lu, Suyuchen Wang, Yihong Wu, Jerry Huang, Jingrui Tian, Ling Zhou

    Abstract: Leveraging large language models in real-world settings often entails a need to utilize domain-specific data and tools in order to follow the complex regulations that need to be followed for acceptable use. Within financial sectors, modern enterprises increasingly rely on Retrieval-Augmented Generation (RAG) systems to address complex compliance requirements in financial document workflows. Howeve…

    Submitted 29 April, 2025; v1 submitted 20 April, 2025; originally announced April 2025.

  6. arXiv:2504.12737  [pdf, other]

    cs.CL

    Chinese-Vicuna: A Chinese Instruction-following Llama-based Model

    Authors: Chenghao Fan, Zhenyi Lu, Jie Tian

    Abstract: Chinese-Vicuna is an open-source, resource-efficient language model designed to bridge the gap in Chinese instruction-following capabilities by fine-tuning Meta's LLaMA architecture using Low-Rank Adaptation (LoRA). Targeting low-resource environments, it enables cost-effective deployment on consumer GPUs (e.g., RTX-2080Ti for 7B models) and supports domain-specific adaptation in fields like healt…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Chinese-Vicuna Technique Report

  7. arXiv:2504.06474  [pdf, other]

    cs.AR

    FETTA: Flexible and Efficient Hardware Accelerator for Tensorized Neural Network Training

    Authors: Jinming Lu, Jiayi Tian, Hai Li, Ian Young, Zheng Zhang

    Abstract: The increasing demand for on-device training of deep neural networks (DNNs) aims to leverage personal data for high-performance applications while addressing privacy concerns and reducing communication latency. However, resource-constrained platforms face significant challenges due to the intensive computational and memory demands of DNN training. Tensor decomposition emerges as a promising approa…

    Submitted 8 April, 2025; originally announced April 2025.

  8. arXiv:2504.05692  [pdf, other]

    eess.IV cs.CV

    POMATO: Marrying Pointmap Matching with Temporal Motion for Dynamic 3D Reconstruction

    Authors: Songyan Zhang, Yongtao Ge, Jinyuan Tian, Guangkai Xu, Hao Chen, Chen Lv, Chunhua Shen

    Abstract: 3D reconstruction in dynamic scenes primarily relies on the combination of geometry estimation and matching modules where the latter task is pivotal for distinguishing dynamic regions which can help to mitigate the interference introduced by camera and object motion. Furthermore, the matching module explicitly models object motion, enabling the tracking of specific targets and advancing motion und…

    Submitted 8 April, 2025; originally announced April 2025.

    Comments: code: https://github.com/wyddmw/POMATO

  9. arXiv:2504.04374  [pdf, other]

    cs.CR cs.AI

    iADCPS: Time Series Anomaly Detection for Evolving Cyber-physical Systems via Incremental Meta-learning

    Authors: Jiyu Tian, Mingchu Li, Liming Chen, Zumin Wang

    Abstract: Anomaly detection for cyber-physical systems (ADCPS) is crucial in identifying faults and potential attacks by analyzing the time series of sensor measurements and actuator states. However, current methods lack adaptation to data distribution shifts in both temporal and spatial dimensions as cyber-physical systems evolve. To tackle this issue, we propose an incremental meta-learning-based approach…

    Submitted 6 April, 2025; originally announced April 2025.

  10. arXiv:2504.01561  [pdf, other]

    eess.IV cs.CV

    STPNet: Scale-aware Text Prompt Network for Medical Image Segmentation

    Authors: Dandan Shan, Zihan Li, Yunxiang Li, Qingde Li, Jie Tian, Qingqi Hong

    Abstract: Accurate segmentation of lesions plays a critical role in medical image analysis and diagnosis. Traditional segmentation approaches that rely solely on visual features often struggle with the inherent uncertainty in lesion distribution and size. To address these issues, we propose STPNet, a Scale-aware Text Prompt Network that leverages vision-language modeling to enhance medical image segmentatio…

    Submitted 2 April, 2025; originally announced April 2025.

  11. arXiv:2504.00999  [pdf, other]

    cs.CV cs.AI

    MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization

    Authors: Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei

    Abstract: Masked Image Modeling (MIM) with Vector Quantization (VQ) has achieved great success in both self-supervised pre-training and image generation. However, most existing methods struggle to address the trade-off in shared latent space for generation quality vs. representation learning and efficiency. To push the limits of this paradigm, we propose MergeVQ, which incorporates token merging techniques…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: CVPR2025 (in process for more analysis and extension)

  12. arXiv:2503.22205  [pdf, other]

    cs.LG cs.CV

    Data-Free Universal Attack by Exploiting the Intrinsic Vulnerability of Deep Models

    Authors: YangTian Yan, Jinyu Tian

    Abstract: Deep neural networks (DNNs) are susceptible to Universal Adversarial Perturbations (UAPs), which are instance agnostic perturbations that can deceive a target model across a wide range of samples. Unlike instance-specific adversarial examples, UAPs present a greater challenge as they must generalize across different samples and models. Generating UAPs typically requires access to numerous examples…

    Submitted 28 March, 2025; originally announced March 2025.

    Comments: Accepted in AAAI 2025

  13. arXiv:2503.20322  [pdf, other]

    cs.CV

    Dynamic Pyramid Network for Efficient Multimodal Large Language Model

    Authors: Hao Ai, Kunyi Wang, Zezhou Wang, Hao Lu, Jin Tian, Yaxin Luo, Peng Xing, Jen-Yuan Huang, Huaxia Li, Gen luo

    Abstract: Multimodal large language models (MLLMs) have demonstrated impressive performance in various vision-language (VL) tasks, but their expensive computations still limit the real-world application. To address this issue, recent efforts aim to compress the visual features to save the computational costs of MLLMs. However, direct visual compression methods, e.g. efficient projectors, inevitably destroy…

    Submitted 24 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  14. arXiv:2503.20031  [pdf, other]

    astro-ph.IM cs.CE

    Lossy Compression of Scientific Data: Applications Constrains and Requirements

    Authors: Franck Cappello, Allison Baker, Ebru Bozda, Martin Burtscher, Kyle Chard, Sheng Di, Paul Christopher O Grady, Peng Jiang, Shaomeng Li, Erik Lindahl, Peter Lindstrom, Magnus Lundborg, Kai Zhao, Xin Liang, Masaru Nagaso, Kento Sato, Amarjit Singh, Seung Woo Son, Dingwen Tao, Jiannan Tian, Robert Underwood, Kazutomo Yoshii, Danylo Lykov, Yuri Alexeev, Kyle Gerard Felker

    Abstract: Increasing data volumes from scientific simulations and instruments (supercomputers, accelerators, telescopes) often exceed network, storage, and analysis capabilities. The scientific community's response to this challenge is scientific data reduction. Reduction can take many forms, such as triggering, sampling, filtering, quantization, and dimensionality reduction. This report focuses on a specif…

    Submitted 25 March, 2025; originally announced March 2025.

    Comments: 33 pages

  15. arXiv:2503.19889  [pdf]

    cond-mat.mtrl-sci cs.RO

    A Multi-Agent Framework Integrating Large Language Models and Generative AI for Accelerated Metamaterial Design

    Authors: Jie Tian, Martin Taylor Sobczak, Dhanush Patil, Jixin Hou, Lin Pang, Arunachalam Ramanathan, Libin Yang, Xianyan Chen, Yuval Golan, Xiaoming Zhai, Hongyue Sun, Kenan Song, Xianqiao Wang

    Abstract: Metamaterials, renowned for their exceptional mechanical, electromagnetic, and thermal properties, hold transformative potential across diverse applications, yet their design remains constrained by labor-intensive trial-and-error methods and limited data interoperability. Here, we introduce CrossMatAgent -- a novel multi-agent framework that synergistically integrates large language models with st…

    Submitted 6 April, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

  16. arXiv:2503.17407  [pdf, other]

    cs.CL cs.LG

    A Comprehensive Survey on Long Context Language Modeling

    Authors: Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, et al. (12 additional authors not shown)

    Abstract: Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs in an effective and efficient way. In this paper, we present a comprehensive survey on recent advances in long-c…

    Submitted 20 March, 2025; originally announced March 2025.

  17. arXiv:2503.13435  [pdf, other]

    cs.CV

    WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes

    Authors: Ling Yang, Kaixin Zhu, Juanxi Tian, Bohan Zeng, Mingbao Lin, Hongjuan Pei, Wentao Zhang, Shuicheng Yan

    Abstract: With the rapid development of 3D reconstruction technology, research in 4D reconstruction is also advancing, and existing 4D reconstruction methods can generate high-quality 4D scenes. However, due to the challenges in acquiring multi-view video data, the current 4D reconstruction benchmarks mainly display actions performed in place, such as dancing, within limited scenarios. In practical scenarios, m…

    Submitted 29 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: Project: https://github.com/Gen-Verse/WideRange4D

  18. arXiv:2503.09294  [pdf, other]

    cs.CV

    IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond

    Authors: Peng Hu, Chunming He, Lei Xu, Jingduo Tian, Sina Farsiu, Yulun Zhang, Pei Liu, Xiu Li

    Abstract: Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs. Conventional approaches predominantly rely on learning feature representations from ground-truth (GT) data; however, inherent imperfections in GT datasets constrain restoration performance to the mean quality level of the training data, rather than attainin…

    Submitted 12 March, 2025; originally announced March 2025.

  19. arXiv:2503.08533  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet-SDS: Unified Toolkit and Demo for Spoken Dialogue Systems

    Authors: Siddhant Arora, Yifan Peng, Jiatong Shi, Jinchuan Tian, William Chen, Shikhar Bharadwaj, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Shuichiro Shimizu, Vaibhav Srivastav, Shinji Watanabe

    Abstract: Advancements in audio foundation models (FMs) have fueled interest in end-to-end (E2E) spoken dialogue systems, but different web interfaces for each system make it challenging to compare and contrast them effectively. Motivated by this, we introduce an open-source, user-friendly toolkit designed to build unified web interfaces for various cascaded and E2E spoken dialogue systems. Our demo furthe…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Accepted at NAACL 2025 Demo Track

  20. arXiv:2503.05808  [pdf, other]

    cs.AI cs.LG cs.RO

    DriveGen: Towards Infinite Diverse Traffic Scenarios with Large Models

    Authors: Shenyu Zhang, Jiaguo Tian, Zhengbang Zhu, Shan Huang, Jucheng Yang, Weinan Zhang

    Abstract: Microscopic traffic simulation has become an important tool for autonomous driving training and testing. Although recent data-driven approaches advance realistic behavior generation, their learning still relies primarily on a single real-world dataset, which limits their diversity and thereby hinders downstream algorithm optimization. In this paper, we propose DriveGen, a novel traffic simulation…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 8 pages, 3 figures

  21. arXiv:2503.05595  [pdf, other]

    cs.CV

    Anti-Diffusion: Preventing Abuse of Modifications of Diffusion-Based Models

    Authors: Zheng Li, Liangbin Xie, Jiantao Zhou, Xintao Wang, Haiwei Wu, Jinyu Tian

    Abstract: Although diffusion-based techniques have shown remarkable success in image generation and editing tasks, their abuse can lead to severe negative social impacts. Recently, some works have been proposed to provide defense against the abuse of diffusion-based methods. However, their protection may be limited in specific scenarios by manually defined prompts or the stable diffusion (SD) version. Furth…

    Submitted 7 March, 2025; originally announced March 2025.

  22. arXiv:2503.03528  [pdf, other]

    cs.CV cs.AI

    AdaSin: Enhancing Hard Sample Metrics with Dual Adaptive Penalty for Face Recognition

    Authors: Qiqi Guo, Zhuowen Zheng, Guanghua Yang, Zhiquan Liu, Xiaofan Li, Jianqing Li, Jinyu Tian, Xueyuan Gong

    Abstract: In recent years, the emergence of deep convolutional neural networks has positioned face recognition as a prominent research focus in computer vision. Traditional loss functions, such as margin-based, hard-sample mining-based, and hybrid approaches, have achieved notable performance improvements, with some leveraging curriculum learning to optimize training. However, these methods often fall short…

    Submitted 5 March, 2025; originally announced March 2025.

  23. arXiv:2503.02332  [pdf, other]

    eess.IV cs.CV

    COMMA: Coordinate-aware Modulated Mamba Network for 3D Dispersed Vessel Segmentation

    Authors: Gen Shi, Hui Zhang, Jie Tian

    Abstract: Accurate segmentation of 3D vascular structures is essential for various medical imaging applications. The dispersed nature of vascular structures leads to inherent spatial uncertainty and necessitates location awareness, yet most current 3D medical segmentation models rely on the patch-wise training strategy that usually loses this spatial context. In this study, we introduce the Coordinate-aware…

    Submitted 14 March, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

  24. arXiv:2503.00948  [pdf, other]

    cs.CV

    Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think

    Authors: Jie Tian, Xiaoye Qu, Zhenyi Lu, Wei Wei, Sichen Liu, Yu Cheng

    Abstract: Image-to-Video (I2V) generation aims to synthesize a video clip according to a given image and condition (e.g., text). The key challenge of this task lies in simultaneously generating natural motions while preserving the original appearance of the images. However, current I2V diffusion models (I2V-DMs) often produce videos with limited motion degrees or exhibit uncontrollable motion that conflicts…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR2025

    MSC Class: 68T45; ACM Class: I.2.10

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025

  25. arXiv:2502.16880  [pdf, other]

    cs.CL cs.AI cs.LG

    CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter

    Authors: Yepeng Weng, Dianwen Mei, Huishi Qiu, Xujie Chen, Li Liu, Jiang Tian, Zhongchao Shi

    Abstract: Speculative decoding is a powerful technique that accelerates Large Language Model (LLM) inference by leveraging a lightweight speculative draft model. However, existing designs suffer in performance due to misalignment between training and inference. Recent methods have tried to solve this issue by adopting a multi-step training strategy, but the complex inputs of different training steps make i…

    Submitted 1 March, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: Under Review

  26. arXiv:2502.15895  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Directional Gradient Projection for Robust Fine-Tuning of Foundation Models

    Authors: Chengyue Huang, Junjiao Tian, Brisa Maneechotesuwan, Shivang Chopra, Zsolt Kira

    Abstract: Robust fine-tuning aims to adapt large foundation models to downstream tasks while preserving their robustness to distribution shifts. Existing methods primarily focus on constraining and projecting current model towards the pre-trained initialization based on the magnitudes between fine-tuned and pre-trained weights, which often require extensive hyper-parameter tuning and can sometimes result in…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: Accepted to ICLR 2025

  27. arXiv:2502.15802  [pdf, other]

    cs.LG cs.AI cs.IT

    A General Error-Theoretical Analysis Framework for Constructing Compression Strategies

    Authors: Boyang Zhang, Daning Cheng, Yunquan Zhang, Meiqi Tu, Fangmin Liu, Jiake Tian

    Abstract: The exponential growth in parameter size and computational complexity of deep models poses significant challenges for efficient deployment. The core problem of existing compression methods is that different layers of the model have significant differences in their tolerance to compression levels. For instance, the first layer of a model can typically sustain a higher compression level compared to…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: Under Review

  28. arXiv:2502.15218  [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet-SpeechLM: An Open Speech Language Model Toolkit

    Authors: Jinchuan Tian, Jiatong Shi, William Chen, Siddhant Arora, Yoshiki Masuyama, Takashi Maekaku, Yihan Wu, Junyi Peng, Shikhar Bharadwaj, Yiwen Zhao, Samuele Cornell, Yifan Peng, Xiang Yue, Chao-Han Huck Yang, Graham Neubig, Shinji Watanabe

    Abstract: We present ESPnet-SpeechLM, an open toolkit designed to democratize the development of speech language models (SpeechLMs) and voice-driven agentic applications. The toolkit standardizes speech processing tasks by framing them as universal sequential modeling problems, encompassing a cohesive workflow of data preprocessing, pre-training, inference, and task evaluation. With ESPnet-SpeechLM, users c…

    Submitted 24 February, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

  29. arXiv:2502.14893  [pdf, other]

    cs.CV cs.AI cs.LG cs.SD eess.AS

    NOTA: Multimodal Music Notation Understanding for Visual Large Language Model

    Authors: Mingni Tang, Jiajia Li, Lu Yang, Zhiqiang Zhang, Jinghao Tian, Zuchao Li, Lefei Zhang, Ping Wang

    Abstract: Symbolic music is represented in two distinct forms: two-dimensional, visually intuitive score images, and one-dimensional, standardized text annotation sequences. While large language models have shown extraordinary potential in music, current research has primarily focused on unimodal symbol sequence text. Existing general-domain visual language models still lack the ability of music notation un…

    Submitted 17 February, 2025; originally announced February 2025.

  30. arXiv:2502.10373  [pdf, other]

    cs.CL cs.AI cs.LG eess.AS

    OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models

    Authors: William Chen, Jinchuan Tian, Yifan Peng, Brian Yan, Chao-Han Huck Yang, Shinji Watanabe

    Abstract: Neural scaling laws offer valuable insights for designing robust sequence processing architectures. While these laws have been extensively characterized in other modalities, their behavior in speech remains comparatively underexplored. In this work, we introduce OWLS, an open-access, reproducible suite of multilingual speech recognition and translation models spanning 0.25B to 18B parameters, with…

    Submitted 14 February, 2025; originally announced February 2025.

    Comments: 23 pages, 13 figures

  31. arXiv:2502.06655  [pdf, other]

    cs.AI

    Unbiased Evaluation of Large Language Models from a Causal Perspective

    Authors: Meilin Chen, Jian Tian, Liang Ma, Di Xie, Weijie Chen, Jiang Zhu

    Abstract: Benchmark contamination has become a significant concern in the LLM evaluation community. Previous Agents-as-an-Evaluator methods address this issue by involving agents in the generation of questions. Despite their success, the biases in Agents-as-an-Evaluator methods remain largely unexplored. In this paper, we present a theoretical formulation of evaluation bias, providing valuable insights into designi…

    Submitted 12 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted by ICML 2025

  32. arXiv:2502.01789  [pdf]

    cs.AI cs.MA

    An Agentic AI Workflow for Detecting Cognitive Concerns in Real-world Data

    Authors: Jiazi Tian, Liqin Wang, Pedram Fard, Valdery Moura Junior, Deborah Blacker, Jennifer S. Haas, Chirag Patel, Shawn N. Murphy, Lidia M. V. R. Moura, Hossein Estiri

    Abstract: Early identification of cognitive concerns is critical but often hindered by subtle symptom presentation. This study developed and validated a fully automated, multi-agent AI workflow using LLaMA 3 8B to identify cognitive concerns in 3,338 clinical notes from Mass General Brigham. The agentic workflow, leveraging task-specific agents that dynamically collaborate to extract meaningful insights fro…

    Submitted 3 February, 2025; originally announced February 2025.

  33. arXiv:2501.11949  [pdf, other]

    cs.LG

    GLAM: Global-Local Variation Awareness in Mamba-based World Model

    Authors: Qian He, Wenqi Liang, Chunhui Hao, Gan Sun, Jiandong Tian

    Abstract: Mimicking the real interaction trajectory in the inference of the world model has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods directly use known state sequences for reasoning. However, this approach fails to enhance the quality of reasoning by capturing the subtle variation between states. Much like how humans infer trends in ev…

    Submitted 21 January, 2025; originally announced January 2025.

  34. arXiv:2501.06663  [pdf, other]

    cs.LG cs.AR cs.CL

    Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization

    Authors: Jiayi Tian, Jinming Lu, Hai Li, Xiangwei Wang, Cong Hao, Ian Young, Zheng Zhang

    Abstract: Transformer models have achieved state-of-the-art performance across a wide range of machine learning tasks. There is growing interest in training transformers on resource-constrained edge devices due to considerations such as privacy, domain adaptation, and on-device scientific machine learning. However, the significant computational and memory demands required for transformer training often exce…

    Submitted 11 January, 2025; originally announced January 2025.

  35. arXiv:2501.04308  [pdf, other]

    eess.SP cs.LG

    FSC-loss: A Frequency-domain Structure Consistency Learning Approach for Signal Data Recovery and Reconstruction

    Authors: Liwen Zhang, Zhaoji Miao, Fan Yang, Gen Shi, Jie He, Yu An, Hui Hui, Jie Tian

    Abstract: A core challenge for signal data recovery is to model the distribution of signal matrix (SM) data based on measured low-quality data in biomedical engineering of magnetic particle imaging (MPI). For acquiring the high-resolution (high-quality) SM, the number of meticulous measurements at numerous positions in the field-of-view proves time-consuming (measurement of a 37x37x37 SM takes about 32 hour…

    Submitted 8 January, 2025; originally announced January 2025.

    Comments: 11 pages, 7 figures

    MSC Class: F.2.2

  36. arXiv:2501.01604  [pdf, other]

    cs.SD eess.AS

    Disentangling Hierarchical Features for Anomalous Sound Detection Under Domain Shift

    Authors: Jian Guan, Jiantong Tian, Qiaoxi Zhu, Feiyang Xiao, Hejing Zhang, Xubo Liu

    Abstract: Anomalous sound detection (ASD) encounters difficulties with domain shift, where the sounds of machines in target domains differ significantly from those in source domains due to varying operating conditions. Existing methods typically employ domain classifiers to enhance detection performance, but they often overlook the influence of domain-unrelated information. This oversight can hinder the mod…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  37. arXiv:2501.00461  [pdf]

    cs.AI cs.LG cs.MA

    Efficient support ticket resolution using Knowledge Graphs

    Authors: Sherwin Varghese, James Tian

    Abstract: A review of over 160,000 customer cases indicates that about 90% of product support time is spent solving a subset of around 10% of tickets, for which a trivial solution may not exist. Many of these challenging cases require the support of several engineers working together within a "swarm", and some also need to go to development support as bugs. These challenging customer issues represent a…

    Submitted 31 December, 2024; originally announced January 2025.

  38. arXiv:2412.19947  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV stat.ML

    Standard-Deviation-Inspired Regularization for Improving Adversarial Robustness

    Authors: Olukorede Fakorede, Modeste Atsague, Jin Tian

    Abstract: Adversarial Training (AT) has been demonstrated to improve the robustness of deep neural networks (DNNs) against adversarial attacks. AT is a min-max optimization procedure wherein adversarial examples are generated to train a more robust DNN. The inner maximization step of AT increases the losses of inputs with respect to their actual classes. The outer minimization involves minimizing the losse…

    Submitted 27 December, 2024; originally announced December 2024.

  39. arXiv:2412.17667  [pdf, other]

    cs.SD cs.MM eess.AS

    VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

    Authors: Jiatong Shi, Hye-jin Shim, Jinchuan Tian, Siddhant Arora, Haibin Wu, Darius Petermann, Jia Qi Yip, You Zhang, Yuxun Tang, Wangyou Zhang, Dareen Safar Alharthi, Yichen Huang, Koichi Saito, Jionghao Han, Yiwen Zhao, Chris Donahue, Shinji Watanabe

    Abstract: In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for various speech, audio, and music signals. The toolkit features a Pythonic interface with flexible configuration and dependency control, making it user-friendly and efficient. With full installation, VERSA offers 65 metrics with 729 metric variations based on different configurations. These metrics encompas…

    Submitted 26 March, 2025; v1 submitted 23 December, 2024; originally announced December 2024.

  40. arXiv:2412.14491  [pdf, ps, other]

    cs.AI

    Mediation Analysis for Probabilities of Causation

    Authors: Yuta Kawakami, Jin Tian

    Abstract: Probabilities of causation (PoC) offer valuable insights for informed decision-making. This paper introduces novel variants of PoC: controlled direct, natural direct, and natural indirect probability of necessity and sufficiency (PNS). These metrics quantify the necessity and sufficiency of a treatment for producing an outcome, accounting for different causal pathways. We develop identification the…

    Submitted 18 December, 2024; originally announced December 2024.

  41. arXiv:2412.11618  [pdf, other]

    cs.LG cs.AI

    EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations

    Authors: Nuowei Liu, Changzhi Sun, Tao Ji, Junfeng Tian, Jianxin Tang, Yuanbin Wu, Man Lan

    Abstract: Current Large Language Models (LLMs) for understanding proteins primarily treat amino acid sequences as a text modality. Meanwhile, Protein Language Models (PLMs), such as ESM-2, have learned massive sequential evolutionary knowledge from the universe of natural protein sequences. Furthermore, structure-based encoders like ProteinMPNN learn the structural information of proteins through Graph Neu…

    Submitted 16 December, 2024; originally announced December 2024.

  42. arXiv:2412.06867  [pdf, other]

    cs.LG cs.AI cs.CC

    Lossless Model Compression via Joint Low-Rank Factorization Optimization

    Authors: Boyang Zhang, Daning Cheng, Yunquan Zhang, Fangmin Liu, Jiake Tian

    Abstract: Low-rank factorization is a popular model compression technique that minimizes the error $δ$ between approximated and original weight matrices. Despite achieving performances close to the original models when $δ$ is optimized, a performance discrepancy remains due to the separate optimization processes for low-rank factorization and model performance, resulting in unavoidable losses. We address th…

    Submitted 9 December, 2024; originally announced December 2024.

    Comments: Under Review

  43. arXiv:2412.06412  [pdf, other]

    astro-ph.IM cs.AI cs.CL

    StarWhisper Telescope: Agent-Based Observation Assistant System to Approach AI Astrophysicist

    Authors: Cunshi Wang, Xinjie Hu, Yu Zhang, Xunhao Chen, Pengliang Du, Yiming Mao, Rui Wang, Yuyang Li, Ying Wu, Hang Yang, Yansong Li, Beichuan Wang, Haiyang Mu, Zheng Wang, Jianfeng Tian, Liang Ge, Yongna Mao, Shengming Li, Xiaomeng Lu, Jinhang Zou, Yang Huang, Ningchen Sun, Jie Zheng, Min He, Yu Bai, et al. (3 additional authors not shown)

    Abstract: With the rapid advancements in Large Language Models (LLMs), LLM-based agents have introduced convenient and user-friendly methods for leveraging tools across various domains. In the field of astronomical observation, the construction of new telescopes has significantly increased astronomers' workload. Deploying LLM-powered agents can effectively alleviate this burden and reduce the costs associat…

    Submitted 10 April, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

    Comments: 36 pages

  44. arXiv:2412.04429  [pdf, other]

    cs.CV cs.LG

    Grounding Descriptions in Images informs Zero-Shot Visual Recognition

    Authors: Shaunak Halbe, Junjiao Tian, K J Joseph, James Seale Smith, Katherine Stevo, Vineeth N Balasubramanian, Zsolt Kira

    Abstract: Vision-language models (VLMs) like CLIP have been cherished for their ability to perform zero-shot visual recognition on open-vocabulary concepts. This is achieved by selecting the object category whose textual representation bears the highest similarity with the query image. While successful in some domains, this method struggles with identifying fine-grained entities as well as generalizing to u…

    Submitted 5 December, 2024; originally announced December 2024.

  45. arXiv:2412.02410  [pdf, other]

    cs.SE cs.AI

    A Multi-Agent Framework for Extensible Structured Text Generation in PLCs

    Authors: Donghao Yang, Aolang Wu, Tianyi Zhang, Li Zhang, Fang Liu, Xiaoli Lian, Yuming Ren, Jiaji Tian

    Abstract: Programmable Logic Controllers (PLCs) are microcomputers essential for automating factory operations. Structured Text (ST), a high-level language adhering to the IEC 61131-3 standard, is pivotal for PLCs due to its ability to express logic succinctly and to seamlessly integrate with other languages within the same standard. However, vendors develop their own customized versions of ST, and the lack…

    Submitted 3 December, 2024; originally announced December 2024.

  46. arXiv:2412.01268  [pdf, other]

    cs.CV

    Ponder & Press: Advancing Visual GUI Agent towards General Computer Control

    Authors: Yiqin Wang, Haoji Zhang, Jingqi Tian, Yansong Tang

    Abstract: Most existing GUI agents typically depend on non-vision inputs like HTML source code or accessibility trees, limiting their flexibility across diverse software environments and platforms. Current multimodal large language models (MLLMs), which excel at using vision to ground real-world objects, offer a potential alternative. However, they often struggle with accurately localizing GUI elements -- a…

    Submitted 2 December, 2024; originally announced December 2024.

  47. arXiv:2412.01253  [pdf, other]

    cs.CL cs.AI cs.LG

    Yi-Lightning Technical Report

    Authors: Alan Wake, Bei Chen, C. X. Lv, Chao Li, Chengen Huang, Chenglin Cai, Chujie Zheng, Daniel Cooper, Fan Zhou, Feng Hu, Ge Zhang, Guoyin Wang, Heng Ji, Howard Qiu, Jiangcheng Zhu, Jun Tian, Katherine Su, Lihuan Zhang, Liying Li, Ming Song, Mou Li, Peng Liu, Qicheng Hu, Shawn Wang, Shijun Zhou, et al. (19 additional authors not shown)

    Abstract: This technical report presents Yi-Lightning, our latest flagship large language model (LLM). It achieves exceptional performance, ranking 6th overall on Chatbot Arena, with particularly strong results (2nd to 4th place) in specialized categories including Chinese, Math, Coding, and Hard Prompts. Yi-Lightning leverages an enhanced Mixture-of-Experts (MoE) architecture, featuring advanced expert seg…

    Submitted 22 January, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  48. arXiv:2411.15504  [pdf, other]

    physics.med-ph cs.RO

    Effects of Muscle Synergy during Overhead Work with a Passive Shoulder Exoskeleton: A Case Study

    Authors: Jin Tian, Baichun Wei, Chifu Yang, Suo Luo, Jiadong Feng, Ping Li, Changbing Chen, Yingjie Liu, Haiqi Zhu, Chunzhi Yi

    Abstract: Objective: Shoulder exoskeletons can effectively assist with overhead work. However, their impacts on muscle synergy remain unclear. The objective is to systematically investigate the effects of the shoulder exoskeleton on muscle synergies during overhead work. Methods: Eight male participants were recruited to perform a screwing task both with (Intervention) and without (Normal) the exoskeleton. E…

    Submitted 23 November, 2024; originally announced November 2024.

  49. arXiv:2411.13770  [pdf, other]

    cs.RO

    A Novel Passive Occupational Shoulder Exoskeleton With Adjustable Peak Assistive Torque Angle For Overhead Tasks

    Authors: Jin Tian, Haiqi Zhu, Changjia Lu, Chifu Yang, Yingjie Liu, Baichun Wei, Chunzhi Yi

    Abstract: Objective: Overhead tasks are a primary inducement to work-related musculoskeletal disorders. Aiming to reduce shoulder physical loads, passive shoulder exoskeletons are increasingly prevalent in the industry due to their lightweight, affordability, and effectiveness. However, they can only accommodate a specific task and cannot effectively balance between compactness and sufficient range of motio…

    Submitted 23 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

  50. arXiv:2411.10008  [pdf, other]

    cs.AI

    Graph-based Complexity for Causal Effect by Empirical Plug-in

    Authors: Rina Dechter, Annie Raichev, Alexander Ihler, Jin Tian

    Abstract: This paper focuses on the computational complexity of computing empirical plug-in estimates for causal effect queries. Given a causal graph and observational data, any identifiable causal query can be estimated from an expression over the observed variables, called the estimand. The estimand can then be evaluated by plugging in probabilities computed empirically from data. In contrast to conventio…

    Submitted 15 November, 2024; originally announced November 2024.