Showing 1–50 of 151 results for author: Wei, k

Searching in archive cs.
  1. arXiv:2506.13776  [pdf, ps, other]

    cs.AI cs.CY cs.HC

    Recommendations and Reporting Checklist for Rigorous & Transparent Human Baselines in Model Evaluations

    Authors: Kevin L. Wei, Patricia Paskov, Sunishchal Dev, Michael J. Byun, Anka Reuel, Xavier Roberts-Gaal, Rachel Calcott, Evie Coxon, Chinmay Deshpande

    Abstract: In this position paper, we argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and we provide recommendations and a reporting checklist towards this end. Human performance baselines are vital for the machine learning community, downstream users, and policymakers to interpret AI evaluatio…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: A version of this paper has been accepted to ICML 2025 as a position paper (spotlight), with the title: "Position: Human Baselines in Model Evaluations Need Rigor and Transparency (With Recommendations & Reporting Checklist)."

  2. arXiv:2506.09562  [pdf, ps, other]

    cs.CR cs.LG

    TooBadRL: Trigger Optimization to Boost Effectiveness of Backdoor Attacks on Deep Reinforcement Learning

    Authors: Songze Li, Mingxuan Zhang, Kang Wei, Shouling Ji

    Abstract: Deep reinforcement learning (DRL) has achieved remarkable success in a wide range of sequential decision-making domains, including robotics, healthcare, smart grids, and finance. Recent research demonstrates that attackers can efficiently exploit system vulnerabilities during the training phase to execute backdoor attacks, producing malicious actions when specific trigger patterns are present in t…

    Submitted 12 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  3. Bridging the Artificial Intelligence Governance Gap: The United States' and China's Divergent Approaches to Governing General-Purpose Artificial Intelligence

    Authors: Oliver Guest, Kevin Wei

    Abstract: The United States and China are among the world's top players in the development of advanced artificial intelligence (AI) systems, and both are keen to lead in global AI governance and development. A look at U.S. and Chinese policy landscapes reveals differences in how the two countries approach the governance of general-purpose artificial intelligence (GPAI) systems. Three areas of divergence are…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Published as a RAND commentary

    Report number: PE-A3703-1

    Journal ref: Santa Monica, CA: RAND Corporation, 2024. https://www.rand.org/pubs/perspectives/PEA3703-1.html

  4. arXiv:2505.22313  [pdf, ps, other]

    physics.optics cs.CV cs.ET cs.GR

    Large-Area Fabrication-aware Computational Diffractive Optics

    Authors: Kaixuan Wei, Hector A. Jimenez-Romero, Hadi Amata, Jipeng Sun, Qiang Fu, Felix Heide, Wolfgang Heidrich

    Abstract: Differentiable optics, as an emerging paradigm that jointly optimizes optics and (optional) image processing algorithms, has made innovative optical designs possible across a broad range of applications. Many of these systems utilize diffractive optical components (DOEs) for holography, PSF engineering, or wavefront shaping. Existing approaches have, however, mostly remained limited to laboratory…

    Submitted 28 May, 2025; originally announced May 2025.

  5. arXiv:2505.19514  [pdf, other]

    cs.CL cs.AI cs.LG

    SIPDO: Closed-Loop Prompt Optimization via Synthetic Data Feedback

    Authors: Yaoning Yu, Ye Yu, Kai Wei, Haojing Luo, Haohan Wang

    Abstract: Prompt quality plays a critical role in the performance of large language models (LLMs), motivating a growing body of work on prompt optimization. Most existing methods optimize prompts over a fixed dataset, assuming static input distributions and offering limited support for iterative improvement. We introduce SIPDO (Self-Improving Prompts through Data-Augmented Optimization), a closed-loop frame…

    Submitted 26 May, 2025; originally announced May 2025.

  6. arXiv:2505.17217  [pdf, ps, other]

    cs.CL cs.AI cs.CY

    Mitigating Gender Bias via Fostering Exploratory Thinking in LLMs

    Authors: Kangda Wei, Hasnat Md Abdullah, Ruihong Huang

    Abstract: Large Language Models (LLMs) often exhibit gender bias, resulting in unequal treatment of male and female subjects across different contexts. To address this issue, we propose a novel data generation framework that fosters exploratory thinking in LLMs. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios,…

    Submitted 22 May, 2025; originally announced May 2025.

  7. arXiv:2505.11733  [pdf, ps, other]

    cs.CL

    MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports

    Authors: Kevin Wu, Eric Wu, Rahul Thapa, Kevin Wei, Angela Zhang, Arvind Suresh, Jacqueline J. Tao, Min Woo Sun, Alejandro Lozano, James Zou

    Abstract: Doctors and patients alike increasingly use Large Language Models (LLMs) to diagnose clinical cases. However, unlike domains such as math or coding, where correctness can be objectively defined by the final answer, medical diagnosis requires both the outcome and the reasoning process to be accurate. Currently, widely used medical benchmarks like MedQA and MMLU assess only accuracy in the final ans…

    Submitted 20 May, 2025; v1 submitted 16 May, 2025; originally announced May 2025.

  8. arXiv:2505.01643  [pdf, other]

    cs.CY

    Third-party compliance reviews for frontier AI safety frameworks

    Authors: Aidan Homewood, Sophie Williams, Noemi Dreksler, John Lidiard, Malcolm Murray, Lennart Heim, Marta Ziosi, Seán Ó hÉigeartaigh, Michael Chen, Kevin Wei, Christoph Winter, Miles Brundage, Ben Garfinkel, Jonas Schuett

    Abstract: Safety frameworks have emerged as a best practice for managing risks from frontier artificial intelligence (AI) systems. However, it may be difficult for stakeholders to know if companies are adhering to their frameworks. This paper explores a potential solution: third-party compliance reviews. During a third-party compliance review, an independent external party assesses whether a frontier AI com…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 27 pages, 1 figure, 5 tables

  9. arXiv:2504.12324  [pdf, other]

    cs.CL cs.AI

    Cross-Document Cross-Lingual NLI via RST-Enhanced Graph Fusion and Interpretability Prediction

    Authors: Mengying Yuan, Wenhao Wang, Zixuan Wang, Yujie Huang, Kangli Wei, Fei Li, Chong Teng, Donghong Ji

    Abstract: Natural Language Inference (NLI) is a fundamental task in natural language processing. While NLI has developed many sub-directions such as sentence-level NLI, document-level NLI and cross-lingual NLI, Cross-Document Cross-Lingual NLI (CDCL-NLI) remains largely unexplored. In this paper, we propose a novel paradigm: CDCL-NLI, which extends traditional NLI capabilities to multi-document, multilingua…

    Submitted 20 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  10. arXiv:2504.04346  [pdf, other]

    cs.AI cs.SI

    Crowdsourcing-Based Knowledge Graph Construction for Drug Side Effects Using Large Language Models with an Application on Semaglutide

    Authors: Zhijie Duan, Kai Wei, Zhaoqian Xue, Jiayan Zhou, Shu Yang, Siyuan Ma, Jin Jin, Lingyao Li

    Abstract: Social media is a rich source of real-world data that captures valuable patient experience information for pharmacovigilance. However, mining data from unstructured and noisy social media content remains a challenging task. We present a systematic framework that leverages large language models (LLMs) to extract medication side effects from social media and organize them into a knowledge graph (KG)…

    Submitted 7 April, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

    MSC Class: J.4

  11. arXiv:2504.03906  [pdf, other]

    cs.CL

    CliME: Evaluating Multimodal Climate Discourse on Social Media and the Climate Alignment Quotient (CAQ)

    Authors: Abhilekh Borah, Hasnat Md Abdullah, Kangda Wei, Ruihong Huang

    Abstract: The rise of Large Language Models (LLMs) has raised questions about their ability to understand climate-related contexts. Though climate change dominates social media, analyzing its multimodal expressions is understudied, and current tools have failed to determine whether LLMs amplify credible solutions or spread unsubstantiated claims. To address this, we introduce CliME (Climate Change Multimoda…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 16 pages, 9 figures

  12. arXiv:2503.09251  [pdf, other]

    cs.LG cs.AI q-bio.QM

    SCOPE-DTI: Semi-Inductive Dataset Construction and Framework Optimization for Practical Usability Enhancement in Deep Learning-Based Drug Target Interaction Prediction

    Authors: Yigang Chen, Xiang Ji, Ziyue Zhang, Yuming Zhou, Yang-Chi-Dung Lin, Hsi-Yuan Huang, Tao Zhang, Yi Lai, Ke Chen, Chang Su, Xingqiao Lin, Zihao Zhu, Yanggyi Zhang, Kangping Wei, Jiehui Fu, Yixian Huang, Shidong Cui, Shih-Chung Yen, Ariel Warshel, Hsien-Da Huang

    Abstract: Deep learning-based drug-target interaction (DTI) prediction methods have demonstrated strong performance; however, real-world applicability remains constrained by limited data diversity and modeling complexity. To address these challenges, we propose SCOPE-DTI, a unified framework combining a large-scale, balanced semi-inductive human DTI dataset with advanced deep learning modeling. Constructed…

    Submitted 12 March, 2025; originally announced March 2025.

  13. arXiv:2503.00162  [pdf, other]

    cs.CV cs.AI cs.CL cs.MA

    PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos

    Authors: Kangda Wei, Zhengyu Zhou, Bingqing Wang, Jun Araki, Lukas Lange, Ruihong Huang, Zhe Feng

    Abstract: In recent years, online lecture videos have become an increasingly popular resource for acquiring new knowledge. Systems capable of effectively understanding/indexing lecture videos are thus highly desirable, enabling downstream tasks like question answering to help users efficiently locate specific information within videos. This work proposes PreMind, a novel multi-agent multimodal framework tha…

    Submitted 28 February, 2025; originally announced March 2025.

  14. arXiv:2502.19425  [pdf, other]

    physics.soc-ph cs.CY

    Will the Technological Singularity Come Soon? Modeling the Dynamics of Artificial Intelligence Development via Multi-Logistic Growth Process

    Authors: Guangyin Jin, Xiaohan Ni, Kun Wei, Jie Zhao, Haoming Zhang, Leiming Jia

    Abstract: We are currently in an era of escalating technological complexity and profound societal transformations, where artificial intelligence (AI) technologies exemplified by large language models (LLMs) have reignited discussions on the 'Technological Singularity'. 'Technological Singularity' is a philosophical concept referring to an irreversible and profound transformation that occurs when AI capabili…

    Submitted 10 February, 2025; originally announced February 2025.

  15. arXiv:2502.15677  [pdf, other]

    cs.CL cs.AI cs.LG

    FLEKE: Federated Locate-then-Edit Knowledge Editing

    Authors: Zongkai Zhao, Guozeng Xu, Xiuhua Li, Kaiwen Wei, Jiang Zhong

    Abstract: Locate-then-Edit Knowledge Editing (LEKE) is a key technique for updating large language models (LLMs) without full retraining. However, existing methods assume a single-user setting and become inefficient in real-world multi-client scenarios, where decentralized organizations (e.g., hospitals, financial institutions) independently update overlapping knowledge, leading to redundant mediator knowle…

    Submitted 21 February, 2025; originally announced February 2025.

  16. arXiv:2502.14864  [pdf, other]

    cs.AI cs.CV

    Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

    Authors: Yuming Yang, Jiang Zhong, Li Jin, Jingwang Huang, Jingpeng Gao, Qing Liu, Yang Bai, Jingyuan Zhang, Rui Jiang, Kaiwen Wei

    Abstract: Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically g…

    Submitted 20 February, 2025; originally announced February 2025.

  17. arXiv:2502.13954  [pdf, other]

    cs.CL cs.LG

    Latent Distribution Decoupling: A Probabilistic Framework for Uncertainty-Aware Multimodal Emotion Recognition

    Authors: Jingwang Huang, Jiang Zhong, Qin Lei, Jinpeng Gao, Yuming Yang, Sirui Wang, Peiguang Li, Kaiwen Wei

    Abstract: Multimodal multi-label emotion recognition (MMER) aims to identify the concurrent presence of multiple emotions in multimodal data. Existing studies primarily focus on improving fusion strategies and modeling modality-to-label dependencies. However, they often overlook the impact of aleatoric uncertainty, which is the inherent noise in the multimodal data and hinders the effectiveness of…

    Submitted 19 February, 2025; originally announced February 2025.

  18. arXiv:2502.12509   

    cs.CL cs.AI

    LegalCore: A Dataset for Event Coreference Resolution in Legal Documents

    Authors: Kangda Wei, Xi Shi, Jonathan Tong, Sai Ramana Reddy, Anandhavelu Natarajan, Rajiv Jain, Aparna Garimella, Ruihong Huang

    Abstract: Recognizing events and their coreferential mentions in a document is essential for understanding semantic meanings of text. The existing research on event coreference resolution is mostly limited to news articles. In this paper, we present the first dataset for the legal domain, LegalCore, which has been annotated with comprehensive event and event coreference information. The legal contract docum…

    Submitted 20 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: Need company internal approval before public release

  19. arXiv:2502.10641  [pdf, other]

    cs.CL

    Toward Equitable Access: Leveraging Crowdsourced Reviews to Investigate Public Perceptions of Health Resource Accessibility

    Authors: Zhaoqian Xue, Guanhong Liu, Kai Wei, Chong Zhang, Qingcheng Zeng, Songhua Hu, Wenyue Hua, Lizhou Fan, Yongfeng Zhang, Lingyao Li

    Abstract: Access to health resources is a critical determinant of public well-being and societal resilience, particularly during public health crises when demand for medical services and preventive care surges. However, disparities in accessibility persist across demographic and geographic groups, raising concerns about equity. Traditional survey methods often fall short due to limitations in coverage, cost…

    Submitted 14 February, 2025; originally announced February 2025.

  20. arXiv:2502.01635  [pdf, other]

    cs.SE cs.AI

    The AI Agent Index

    Authors: Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, Michael Gerovitch, Stewart Slocum, Kevin Wei, Nikola Jurkovic, Ariba Khan, Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt

    Abstract: Leading AI developers and startups are increasingly deploying agentic AI systems that can plan and execute complex tasks with limited human involvement. However, there is currently no structured framework for documenting the technical components, intended uses, and safety features of agentic systems. To fill this gap, we introduce the AI Agent Index, the first public database to document informati…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: Accompanying website: https://aiagentindex.mit.edu/

  21. arXiv:2501.16515  [pdf, other]

    cs.HC

    SimulataR: Rapid Assisted Reality Prototyping using Design-Blended Videos

    Authors: Ashwin Ram, Yue Gu, Bowen Wang, Sneha Jaikumar, Youqi Wu, Benjamin Tan Kuan Wei, Qingyang Xu, Haiming Liu, Shengdong Zhao

    Abstract: Assisted Reality (aR) is a subfield of Augmented Reality (AR) that overlays information onto a user's immediate view via see-through head-mounted displays (OST-HMDs). This technology has proven to be effective and energy-efficient to support the user and information interaction for everyday wearable intelligent systems. The aR viewing experience, however, is affected by varying real-world backgrou…

    Submitted 9 February, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

  22. arXiv:2501.13497  [pdf, other]

    cs.SD cs.CL eess.AS

    DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition

    Authors: Qijie Shao, Linhao Dong, Kun Wei, Sining Sun, Lei Xie

    Abstract: Data2vec is a self-supervised learning (SSL) approach that employs a teacher-student architecture for contextual representation learning via masked prediction, demonstrating remarkable performance in monolingual ASR. Previous studies have revealed that data2vec's shallow layers capture speaker and language information, middle layers encode phoneme and word features, while deep layers are responsib…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: Submitted to the IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

  23. arXiv:2501.13306  [pdf, other]

    cs.SD cs.CL eess.AS

    OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

    Authors: Xuelong Geng, Kun Wei, Qijie Shao, Shuiyun Liu, Zhennan Lin, Zhixian Zhao, Guojian Li, Wenjie Tian, Peikun Chen, Yangze Li, Pengcheng Guo, Mingchen Shao, Shuiyuan Wang, Yuang Cao, Chengyou Wang, Tianyi Xu, Yuhang Dai, Xinfa Zhu, Yue Li, Li Zhang, Lei Xie

    Abstract: Large Language Models (LLMs) have made significant progress in various downstream tasks, inspiring the development of Speech Understanding Language Models (SULMs) to enable comprehensive speech-based interactions. However, most advanced SULMs are developed by the industry, leveraging large-scale datasets and computational resources that are not readily available to the academic community. Moreover…

    Submitted 16 February, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: OSUM Technical Report v2. The experimental results reported herein differ from those in v1 because of adding new data and training in more steps

  24. arXiv:2501.10114  [pdf, other]

    cs.AI

    Infrastructure for AI Agents

    Authors: Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, Markus Anderljung

    Abstract: AI agents plan and execute interactions in open-ended environments. For example, OpenAI's Operator can use a web browser to do product comparisons and buy online goods. To facilitate beneficial interactions and mitigate harmful ones, much research focuses on directly modifying agent behaviour. For example, developers can train agents to follow user instructions. This focus on direct modifications…

    Submitted 16 May, 2025; v1 submitted 17 January, 2025; originally announced January 2025.

    Comments: Accepted to TMLR

  25. arXiv:2501.09606  [pdf, other]

    cs.CY

    Local US officials' views on the impacts and governance of AI: Evidence from 2022 and 2023 survey waves

    Authors: Sophia Hatz, Noemi Dreksler, Kevin Wei, Baobao Zhang

    Abstract: This paper presents a survey of local US policymakers' views on the future impact and regulation of AI. Our survey provides insight into US policymakers' expectations regarding the effects of AI on local communities and the nation, as well as their attitudes towards specific regulatory policies. Conducted in two waves (2022 and 2023), the survey captures changes in attitudes following the release…

    Submitted 16 January, 2025; originally announced January 2025.

  26. arXiv:2501.07120  [pdf, other]

    eess.IV cs.CV

    MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation

    Authors: Xiaoxian Yang, Qi Wang, Kaiqi Zhang, Ke Wei, Jun Lyu, Lingchao Chen

    Abstract: Ultrasound imaging frequently encounters challenges, such as those related to elevated noise levels, diminished spatiotemporal resolution, and the complexity of anatomical structures. These factors significantly hinder the model's ability to accurately capture and analyze structural relationships and dynamic patterns across various regions of the heart. Mamba, an emerging model, is one of the most…

    Submitted 13 January, 2025; originally announced January 2025.

  27. arXiv:2410.13042  [pdf, ps, other]

    cs.CY

    How Do AI Companies "Fine-Tune" Policy? Examining Regulatory Capture in AI Governance

    Authors: Kevin Wei, Carson Ezell, Nick Gabrieli, Chinmay Deshpande

    Abstract: Industry actors in the United States have gained extensive influence in conversations about the regulation of general-purpose artificial intelligence (AI) systems. Although industry participation is an important part of the policy process, it can also cause regulatory capture, whereby industry co-opts regulatory regimes to prioritize private over public welfare. Capture of AI policy by AI develope…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 39 pages (14 pages main text), 3 figures, 9 tables. To be published in the Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, & Society (AIES)

    Journal ref: Proc. AAAI/ACM Conf. AI, Ethics & Soc., 7 (2024) 1539-1555

  28. arXiv:2410.01180  [pdf, other]

    cs.CV cs.CL

    UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark

    Authors: Hasnat Md Abdullah, Tian Liu, Kangda Wei, Shu Kong, Ruihong Huang

    Abstract: Localizing unusual activities, such as human errors or surveillance incidents, in videos holds practical significance. However, current video understanding models struggle with localizing these unusual events likely because of their insufficient representation in models' pretraining datasets. To explore foundation models' capability in localizing unusual activity, we introduce UAL-Bench, a compreh…

    Submitted 1 October, 2024; originally announced October 2024.

    Journal ref: wacv(2025) 5801-5811

  29. arXiv:2409.19878  [pdf, other]

    cs.SD eess.AS

    HDMoLE: Mixture of LoRA Experts with Hierarchical Routing and Dynamic Thresholds for Fine-Tuning LLM-based ASR Models

    Authors: Bingshen Mu, Kun Wei, Qijie Shao, Yong Xu, Lei Xie

    Abstract: Recent advancements in integrating Large Language Models (LLM) with automatic speech recognition (ASR) have performed remarkably in general domains. While supervised fine-tuning (SFT) of all model parameters is often employed to adapt pre-trained LLM-based ASR models to specific domains, it imposes high computational costs and notably reduces their performance in general domains. In this paper, we…

    Submitted 3 January, 2025; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: Accepted by ICASSP 2025

  30. arXiv:2409.11214  [pdf, other]

    eess.AS cs.SD

    Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text

    Authors: Hongfei Xue, Wei Ren, Xuelong Geng, Kun Wei, Longhao Li, Qijie Shao, Linju Yang, Kai Diao, Lei Xie

    Abstract: Integrating audio encoders with LLMs through connectors has enabled these models to process and comprehend audio modalities, significantly enhancing speech-to-text tasks, including automatic speech recognition (ASR) and automatic speech translation (AST). However, these methods often overlook the critical aspect of language adaptation in multilingual settings, relying instead on multilingual data…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 5 pages, 3 figures, submitted to ICASSP 2025

  31. arXiv:2409.09754  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Towards Single-Lens Controllable Depth-of-Field Imaging via Depth-Aware Point Spread Functions

    Authors: Xiaolong Qian, Qi Jiang, Yao Gao, Shaohua Gao, Zhonghua Yi, Lei Sun, Kai Wei, Haifeng Li, Kailun Yang, Kaiwei Wang, Jian Bai

    Abstract: Controllable Depth-of-Field (DoF) imaging commonly produces amazing visual effects based on heavy and expensive high-end lenses. However, confronted with the increasing demand for mobile scenarios, it is desirable to achieve a lightweight solution with Minimalist Optical Systems (MOS). This work centers around two major limitations of MOS, i.e., the severe optical aberrations and uncontrollable Do…

    Submitted 11 February, 2025; v1 submitted 15 September, 2024; originally announced September 2024.

    Comments: Accepted to IEEE Transactions on Computational Imaging (TCI). The source code and the established dataset will be publicly available at https://github.com/XiaolongQian/DCDI

  32. arXiv:2408.17431  [pdf, other]

    eess.AS cs.AI

    Advancing Multi-talker ASR Performance with Large Language Models

    Authors: Mohan Shi, Zengrui Jin, Yaoxun Xu, Yong Xu, Shi-Xiong Zhang, Kun Wei, Yiwen Shao, Chunlei Zhang, Dong Yu

    Abstract: Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with the idea of concatenating transcriptions from multiple speakers according to the emission times of their speech for training. However, SOT-style transcr…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages, accepted by IEEE SLT 2024

  33. arXiv:2408.14298  [pdf, other]

    cs.DC cs.LG

    Resource Efficient Asynchronous Federated Learning for Digital Twin Empowered IoT Network

    Authors: Shunfeng Chu, Jun Li, Jianxin Wang, Yiyang Ni, Kang Wei, Wen Chen, Shi Jin

    Abstract: As an emerging technology, digital twin (DT) can provide real-time status and dynamic topology mapping for Internet of Things (IoT) devices. However, DT and its implementation within industrial IoT networks necessitate substantial, distributed data support, which often leads to "data silos" and raises privacy concerns. To address these issues, we develop a dynamic resource scheduling algorithm…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  34. arXiv:2407.19568  [pdf, other]

    cs.CL cs.AI

    Are LLMs Good Annotators for Discourse-level Event Relation Extraction?

    Authors: Kangda Wei, Aayush Gautam, Ruihong Huang

    Abstract: Large Language Models (LLMs) have demonstrated proficiency in a wide array of natural language processing tasks. However, their effectiveness over discourse-level event relation extraction (ERE) tasks remains unexplored. In this paper, we assess the effectiveness of LLMs in addressing discourse-level ERE tasks characterized by lengthy documents and intricate relations encompassing coreference, tempo…

    Submitted 22 February, 2025; v1 submitted 28 July, 2024; originally announced July 2024.

  35. arXiv:2407.19446  [pdf, ps, other]

    cs.IT stat.ML

    Leave-One-Out Analysis for Nonconvex Robust Matrix Completion with General Thresholding Functions

    Authors: Tianming Wang, Ke Wei

    Abstract: We study the problem of robust matrix completion (RMC), where the partially observed entries of an underlying low-rank matrix are corrupted by sparse noise. Existing analysis of the non-convex methods for this problem either requires the explicit but empirically redundant regularization in the algorithm or requires sample splitting in the analysis. In this paper, we consider a simple yet efficient…

    Submitted 25 April, 2025; v1 submitted 28 July, 2024; originally announced July 2024.

    Comments: The dependence of the condition number is improved in the sample complexity

  36. arXiv:2407.19078  [pdf, other]

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  37. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 2 October, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EMNLP 2024 main conference

  38. arXiv:2406.00752  [pdf, other]

    cs.DC

    Blockchain-aided wireless federated learning: Resource allocation and client scheduling

    Authors: Jun Li, Weiwei Zhang, Kang Wei, Guangji Chen, Feng Shu, Wen Chen, Shi Jin

    Abstract: Federated learning (FL) based on the centralized design faces both challenges regarding the trust issue and a single point of failure. To alleviate these issues, blockchain-aided decentralized FL (BDFL) introduces the decentralized network architecture into the FL training process, which can effectively overcome the defects of centralized architecture. However, deploying BDFL in wireless networks…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures

  39. arXiv:2405.17932  [pdf, ps, other]

    cs.LG cs.DC

    Towards Communication-efficient Federated Learning via Sparse and Aligned Adaptive Optimization

    Authors: Xiumei Deng, Jun Li, Kang Wei, Long Shi, Zeihui Xiong, Ming Ding, Wen Chen, Shi Jin, H. Vincent Poor

    Abstract: Adaptive moment estimation (Adam), as a Stochastic Gradient Descent (SGD) variant, has gained widespread popularity in federated learning (FL) due to its fast convergence. However, federated Adam (FedAdam) algorithms suffer from a threefold increase in uplink communication overhead compared to federated SGD (FedSGD) algorithms, which arises from the necessity to transmit both local model updates a…

    Submitted 28 May, 2024; originally announced May 2024.

  40. arXiv:2405.17914  [pdf, other]

    cs.LG

    Trustworthy DNN Partition for Blockchain-enabled Digital Twin in Wireless IIoT Networks

    Authors: Xiumei Deng, Jun Li, Long Shi, Kang Wei, Ming Ding, Yumeng Shao, Wen Chen, Shi Jin

    Abstract: Digital twin (DT) has emerged as a promising solution to enhance manufacturing efficiency in industrial Internet of Things (IIoT) networks. To promote the efficiency and trustworthiness of DT for wireless IIoT networks, we propose a blockchain-enabled DT (B-DT) framework that employs deep neural network (DNN) partitioning technique and reputation-based consensus mechanism, wherein the DTs maintain…

    Submitted 28 May, 2024; originally announced May 2024.

  41. arXiv:2405.13080  [pdf, other]

    cs.CR cs.LG

    EmInspector: Combating Backdoor Attacks in Federated Self-Supervised Learning Through Embedding Inspection

    Authors: Yuwen Qian, Shuchi Wu, Kang Wei, Ming Ding, Di Xiao, Tao Xiang, Chuan Ma, Song Guo

    Abstract: Federated self-supervised learning (FSSL) has recently emerged as a promising paradigm that enables the exploitation of clients' vast amounts of unlabeled data while preserving data privacy. While FSSL offers advantages, its susceptibility to backdoor attacks, a concern identified in traditional federated supervised learning (FSL), has not been investigated. To fill the research gap, we undertake…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 18 pages, 12 figures

  42. arXiv:2405.06993  [pdf, other]

    cs.LG cs.DC

    Robust Model Aggregation for Heterogeneous Federated Learning: Analysis and Optimizations

    Authors: Yumeng Shao, Jun Li, Long Shi, Kang Wei, Ming Ding, Qianmu Li, Zengxiang Li, Wen Chen, Shi Jin

    Abstract: Conventional synchronous federated learning (SFL) frameworks suffer from performance degradation in heterogeneous systems due to imbalanced local data size and diverse computing power on the client side. To address this problem, asynchronous FL (AFL) and semi-asynchronous FL have been proposed to recover the performance loss by allowing asynchronous aggregation. However, asynchronous aggregation i…

    Submitted 11 May, 2024; originally announced May 2024.

  43. arXiv:2405.05802  [pdf, other]

    cs.DC cs.AI

    Deploying Graph Neural Networks in Wireless Networks: A Link Stability Viewpoint

    Authors: Jun Li, Weiwei Zhang, Kang Wei, Guangji Chen, Long Shi, Wen Chen

    Abstract: As an emerging artificial intelligence technology, graph neural networks (GNNs) have exhibited promising performance across a wide range of graph-related applications. However, information exchanges among neighbor nodes in GNN pose new challenges in the resource-constrained scenario, especially in wireless systems. In practical wireless systems, the communication links among nodes are usually unre…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 5 pages, 3 figures

  44. arXiv:2405.03516  [pdf, other]

    cs.LG

    GI-SMN: Gradient Inversion Attack against Federated Learning without Prior Knowledge

    Authors: Jin Qian, Kaimin Wei, Yongdong Wu, Jilian Zhang, Jipeng Chen, Huan Bao

    Abstract: Federated learning (FL) has emerged as a privacy-preserving machine learning approach where multiple parties share gradient information rather than original user data. Recent work has demonstrated that gradient inversion attacks can exploit the gradients of FL to recreate the original user data, posing significant privacy risks. However, these attacks make strong assumptions about the attacker, su…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 18 pages, 10 figures, conference

  45. arXiv:2405.03152  [pdf, other]

    eess.AS cs.SD

    MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

    Authors: Bingshen Mu, Yangze Li, Qijie Shao, Kun Wei, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Despite notable advancements in automatic speech recognition (ASR), performance tends to degrade when faced with adverse conditions. Generative error correction (GER) leverages the exceptional text comprehension capabilities of large language models (LLM), delivering impressive performance in ASR error correction, where N-best hypotheses provide valuable information for transcription prediction. H…

    Submitted 6 May, 2024; originally announced May 2024.

  46. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu…

    Submitted 4 November, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  47. arXiv:2404.16348  [pdf, other]

    cs.CV

    Dual Expert Distillation Network for Generalized Zero-Shot Learning

    Authors: Zhijie Rao, Jingcai Guo, Xiaocheng Lu, Jingming Liang, Jie Zhang, Haozhao Wang, Kang Wei, Xiaofeng Cao

    Abstract: Zero-shot learning has consistently yielded remarkable progress via modeling nuanced one-to-one visual-attribute correlation. Existing studies resort to refining a uniform mapping function to align and correlate the sample regions and subattributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) the unutilized channel information. This paper addresses these issues by…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 9 pages, 4 figures; Accepted to IJCAI 2024

  48. arXiv:2404.13860  [pdf, other]

    cs.LG cs.CR

    Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning

    Authors: Huan Bao, Kaimin Wei, Yongdong Wu, Jin Qian, Robert H. Deng

    Abstract: A Model Inversion (MI) attack based on Generative Adversarial Networks (GAN) aims to recover the private training data from complex deep learning models by searching codes in the latent space. However, they merely search a deterministic latent space such that the found latent code is usually suboptimal. In addition, the existing distributional MI schemes assume that an attacker can access the stru…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.03372  [pdf, other]

    math.OC cs.LG

    Elementary Analysis of Policy Gradient Methods

    Authors: Jiacai Liu, Wenye Li, Ke Wei

    Abstract: Projected policy gradient under the simplex parameterization, policy gradient and natural policy gradient under the softmax parameterization, are fundamental algorithms in reinforcement learning. There have been a flurry of recent activities in studying these algorithms from the theoretical aspect. Despite this, their convergence behavior is still not fully understood, even given the access to exa…

    Submitted 10 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  50. arXiv:2402.15526  [pdf, other]

    cs.AI cs.LG

    Chain-of-Specificity: An Iteratively Refining Method for Eliciting Knowledge from Large Language Models

    Authors: Kaiwen Wei, Jingyuan Zhang, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Li Jin, Yue Yu

    Abstract: Large Language Models (LLMs) exhibit remarkable generative capabilities, enabling the generation of valuable information. Despite these advancements, previous research found that LLMs sometimes struggle with adhering to specific constraints (e.g., in specific place or at specific time), at times even overlooking them, which leads to responses that are either too generic or not fully satisfactory.…

    Submitted 20 February, 2024; originally announced February 2024.