
Showing 1–50 of 487 results for author: Cheng, M

Searching in archive cs.
  1. arXiv:2505.06864

    q-fin.PM cs.LG

    NewsNet-SDF: Stochastic Discount Factor Estimation with Pretrained Language Model News Embeddings via Adversarial Networks

    Authors: Shunyao Wang, Ming Cheng, Christina Dan Wang

    Abstract: Stochastic Discount Factor (SDF) models provide a unified framework for asset pricing and risk assessment, yet traditional formulations struggle to incorporate unstructured textual information. We introduce NewsNet-SDF, a novel deep learning framework that seamlessly integrates pretrained language model embeddings with financial time series through adversarial networks. Our multimodal architecture…

    Submitted 11 May, 2025; originally announced May 2025.

  2. arXiv:2505.04917

    cs.CV

    A Simple Detector with Frame Dynamics is a Strong Tracker

    Authors: Chenxu Peng, Chenxu Wang, Minrui Zou, Danyang Li, Zhengpeng Yang, Yimian Dai, Ming-Ming Cheng, Xiang Li

    Abstract: Infrared object tracking plays a crucial role in Anti-Unmanned Aerial Vehicle (Anti-UAV) applications. Existing trackers often depend on cropped template regions and have limited motion modeling capabilities, which pose challenges when dealing with tiny targets. To address this, we propose a simple yet effective infrared tiny-object tracker that enhances tracking performance by integrating global…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 2025 CVPR Anti-UAV Workshop

  3. arXiv:2505.03683

    cs.SE

    Moral Testing of Autonomous Driving Systems

    Authors: Wenbing Tang, Mingfei Cheng, Yuan Zhou, Yang Liu

    Abstract: Autonomous Driving System (ADS) testing plays a crucial role in their development, with the current focus primarily on functional and safety testing. However, evaluating the non-functional morality of ADSs, particularly their decision-making capabilities in unavoidable collision scenarios, is equally important to ensure the systems' trustworthiness and public acceptance. Unfortunately, testing ADS…

    Submitted 6 May, 2025; originally announced May 2025.

  4. arXiv:2505.03475

    cs.AI cs.LG

    am-ELO: A Stable Framework for Arena-based LLM Evaluation

    Authors: Zirui Liu, Jiatong Li, Yan Zhuang, Qi Liu, Shuanghong Shen, Jie Ouyang, Mingyue Cheng, Shijin Wang

    Abstract: Arena-based evaluation is a fundamental yet significant evaluation paradigm for modern AI models, especially large language models (LLMs). The existing framework based on the ELO rating system suffers from the inevitable instability problem due to ranking inconsistency and the lack of attention to the varying abilities of annotators. In this paper, we introduce a novel stable arena framework to address th…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: ICML2025 Accepted

  5. arXiv:2505.01688

    cs.IT eess.SP

    Sensing Safety Analysis for Vehicular Networks with Integrated Sensing and Communication (ISAC)

    Authors: Tingyu Shui, Walid Saad, Mingzhe Cheng

    Abstract: Integrated sensing and communication (ISAC) emerged as a key feature of next-generation 6G wireless systems, allowing them to achieve high data rates and sensing accuracy. While prior research has primarily focused on addressing communication safety in ISAC systems, the equally critical issue of sensing safety remains largely ignored. In this paper, a novel threat to the sensing safety of ISAC veh…

    Submitted 3 May, 2025; originally announced May 2025.

  6. arXiv:2504.18765

    cs.AI

    A Vision for Auto Research with LLM Agents

    Authors: Chengwei Liu, Chong Wang, Jiayue Cao, Jingquan Ge, Kun Wang, Lvye Zhang, Ming-Ming Cheng, Penghai Zhao, Tianlin Li, Xiaojun Jia, Xiang Li, Xinfeng Li, Yang Liu, Yebo Feng, Yihao Huang, Yijia Xu, Yuqiang Sun, Zhenhong Zhou, Zhengzi Xu

    Abstract: This paper introduces Agent-Based Auto Research, a structured multi-agent framework designed to automate, coordinate, and optimize the full lifecycle of scientific research. Leveraging the capabilities of large language models (LLMs) and modular agent collaboration, the system spans all major research phases, including literature review, ideation, methodology planning, experimentation, paper writi…

    Submitted 25 April, 2025; originally announced April 2025.

  7. arXiv:2504.18335

    cs.IT

    Rack-Aware Minimum Storage Partially Cooperative Regenerating Codes with Small Sub-Packetization

    Authors: Hengming Zhao, Dianhua Wu, Minquan Cheng

    Abstract: In the rack-aware model, there are $\bar{n}$ racks each of which has $u$ nodes with the same storage capacity. Assume that there are $h$ failed nodes uniformly distributed in $\bar{h}$ host racks (defined as racks containing failed nodes), each rack containing $h/\bar{h}$ failed nodes where $h$ is divisible by $\bar{h}$. Then together with its internal helper nodes, each host rack downloads recov…

    Submitted 25 April, 2025; originally announced April 2025.

  8. arXiv:2504.18126

    hep-lat cs.LG hep-th

    Lecture Notes on Normalizing Flows for Lattice Quantum Field Theories

    Authors: Miranda C. N. Cheng, Niki Stratikopoulou

    Abstract: Numerical simulations of quantum field theories on lattices serve as a fundamental tool for studying the non-perturbative regime of the theories, where analytic tools often fall short. Challenges arise when one takes the continuum limit or as the system approaches a critical point, especially in the presence of non-trivial topological structures in the theory. Rapid recent advances in machine lear…

    Submitted 25 April, 2025; originally announced April 2025.

    Comments: 70 pages

  9. arXiv:2504.17815

    cs.CV

    Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning

    Authors: Mingxuan Cui, Qing Guo, Yuyi Wang, Hongkai Yu, Di Lin, Qin Zou, Ming-Ming Cheng, Xi Li

    Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful and efficient 3D representation for novel view synthesis. This paper extends 3DGS capabilities to inpainting, where masked objects in a scene are replaced with new contents that blend seamlessly with the surroundings. Unlike 2D image inpainting, 3D Gaussian inpainting (3DGI) is challenging in effectively leveraging complementary visual and sem…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 14 pages, 12 figures, ICCV

  10. arXiv:2504.16448

    cs.CL cs.AI

    EMRModel: A Large Language Model for Extracting Medical Consultation Dialogues into Structured Medical Records

    Authors: Shuguang Zhao, Qiangzhong Feng, Zhiyang He, Peipei Sun, Yingying Wang, Xiaodong Tao, Xiaoliang Lu, Mei Cheng, Xinyue Wu, Yanyan Wang, Wei Liang

    Abstract: Medical consultation dialogues contain critical clinical information, yet their unstructured nature hinders effective utilization in diagnosis and treatment. Traditional methods, relying on rule-based or shallow machine learning techniques, struggle to capture deep and implicit semantics. Recently, large pre-trained language models and Low-Rank Adaptation (LoRA), a lightweight fine-tuning method,…

    Submitted 23 April, 2025; originally announced April 2025.

  11. arXiv:2504.14827

    cs.HC

    LACE: Exploring Turn-Taking and Parallel Interaction Modes in Human-AI Co-Creation for Iterative Image Generation

    Authors: YenKai Huang, Zheng Ning, Ming Cheng

    Abstract: This paper introduces LACE, a co-creative system enabling professional artists to leverage generative AI through controlled prompting and iterative refinement within Photoshop. Addressing challenges in precision, iterative coherence, and workflow compatibility, LACE allows flexible control via layer-based editing and dual-mode collaboration (turn-taking and parallel). A pilot study (N=21) demonstr…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Extended version of the short paper accepted at the GenAICHI Workshop at CHI 2025. Includes additional results, analysis, qualitative feedback, and discussion

    ACM Class: H.5.2; I.2.6; I.4.8

  12. arXiv:2504.13145

    cs.AI

    Exploring Expert Failures Improves LLM Agent Tuning

    Authors: Li-Cheng Lan, Andrew Bai, Minhao Cheng, Cho-Jui Hsieh, Tianyi Zhou

    Abstract: Large Language Models (LLMs) have shown tremendous potential as agents, excelling at tasks that require multiple rounds of reasoning and interactions. Rejection Sampling Fine-Tuning (RFT) has emerged as an effective method for finetuning LLMs as agents: it first imitates expert-generated successful trajectories and further improves agentic skills through iterative fine-tuning on successful, self-g…

    Submitted 18 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

  13. arXiv:2504.12048

    cs.CV

    Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM

    Authors: Zirui Pan, Xin Wang, Yipeng Zhang, Hong Chen, Kwan Man Cheng, Yaofei Wu, Wenwu Zhu

    Abstract: Text-to-Video generation, which utilizes the provided text prompt to generate high-quality videos, has drawn increasing attention and achieved great success due to the development of diffusion models recently. Existing methods mainly rely on a pre-trained text encoder to capture the semantic information and perform cross attention with the encoded text prompt to guide the generation of video. Howe…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: AAAI 2025 Poster

  14. arXiv:2504.10434

    cs.CV

    Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing

    Authors: Taihang Hu, Linxuan Li, Kai Wang, Yaxing Wang, Jian Yang, Ming-Ming Cheng

    Abstract: Text-to-image generation has seen groundbreaking advancements with diffusion models, enabling high-fidelity synthesis and precise image editing through cross-attention manipulation. Recently, autoregressive (AR) models have re-emerged as powerful alternatives, leveraging next-token generation to match diffusion models. However, existing editing techniques designed for diffusion models fail to tran…

    Submitted 14 April, 2025; originally announced April 2025.

  15. Shrinkage Initialization for Smooth Learning of Neural Networks

    Authors: Miao Cheng, Feiyan Zhou, Hongwei Zou, Limin Wang

    Abstract: The successes of intelligent systems have relied heavily on the artificial learning of information, which has led to the broad application of neural learning solutions. As is commonly understood, the training of neural networks can be greatly improved by specifically defined initialization, neuron layers, and activation functions. Though sequential layer-based initialization is available, the…

    Submitted 12 April, 2025; originally announced April 2025.

    Comments: 6 pages, 4 figures

    ACM Class: I.2.6; F.2.1

  16. arXiv:2504.07960

    cs.CV

    VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

    Authors: Zhong-Yu Li, Ruoyi Du, Juncheng Yan, Le Zhuo, Zhen Li, Peng Gao, Zhanyu Ma, Ming-Ming Cheng

    Abstract: Recent progress in diffusion models significantly advances various image generation tasks. However, the current mainstream approach remains focused on building task-specific models, which have limited efficiency when supporting a wide range of different needs. While universal models attempt to address this limitation, they face critical challenges, including generalizable task instruction, appropr…

    Submitted 10 April, 2025; originally announced April 2025.

    Comments: Project page: https://visualcloze.github.io/

  17. arXiv:2504.04701

    cs.CV

    DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation

    Authors: Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou

    Abstract: Recent advances in scene understanding benefit a lot from depth maps because of the 3D geometry information, especially in complex conditions (e.g., low light and overexposed). Existing approaches encode depth maps along with RGB images and perform feature fusion between them to enable more robust predictions. Taking into account that depth can be regarded as a geometry supplement for RGB images,…

    Submitted 6 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025

  18. arXiv:2504.00911

    cs.SE

    Foundation Models for Autonomous Driving System: An Initial Roadmap

    Authors: Xiongfei Wu, Mingfei Cheng, Qiang Hu, Jianlang Chen, Yuheng Huang, Manabu Okada, Michio Hayashi, Tomoyuki Tsuchiya, Xiaofei Xie, Lei Ma

    Abstract: Recent advancements in Foundation Models (FMs), such as Large Language Models (LLMs), have significantly enhanced Autonomous Driving Systems (ADSs) by improving perception, reasoning, and decision-making in dynamic and uncertain environments. However, ADSs are highly complex cyber-physical systems that demand rigorous software engineering practices to ensure reliability and safety. Integrating FMs…

    Submitted 1 April, 2025; originally announced April 2025.

  19. arXiv:2503.23508

    cs.CV

    Re-Aligning Language to Visual Objects with an Agentic Workflow

    Authors: Yuming Chen, Jiangyan Feng, Haodong Zhang, Lijun Gong, Feng Zhu, Rui Zhao, Qibin Hou, Ming-Ming Cheng, Yibing Song

    Abstract: Language-based object detection (LOD) aims to align visual objects with language expressions. A large amount of paired data is utilized to improve LOD model generalizations. During the training process, recent studies leverage vision-language models (VLMs) to automatically generate human-like expressions for visual objects, facilitating training data scaling up. In this process, we observe that VL…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: 33 pages, 20 figures, 17 tables, ICLR 2025

  20. arXiv:2503.21076

    cs.CV cs.LG

    KAC: Kolmogorov-Arnold Classifier for Continual Learning

    Authors: Yusong Hu, Zichen Liang, Fei Yang, Qibin Hou, Xialei Liu, Ming-Ming Cheng

    Abstract: Continual learning requires models to train continuously across consecutive tasks without forgetting. Most existing methods utilize linear classifiers, which struggle to maintain a stable classification space while learning new tasks. Inspired by the success of Kolmogorov-Arnold Networks (KAN) in preserving learning stability during simple continual regression tasks, we set out to explore their po…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  21. arXiv:2503.19551

    cs.CL cs.AI

    Scaling Laws of Synthetic Data for Language Models

    Authors: Zeyu Qin, Qingxiu Dong, Xingxing Zhang, Li Dong, Xiaolong Huang, Ziyi Yang, Mahmoud Khademi, Dongdong Zhang, Hany Hassan Awadalla, Yi R. Fung, Weizhu Chen, Minhao Cheng, Furu Wei

    Abstract: Large language models (LLMs) achieve strong performance across diverse tasks, largely driven by high-quality web data used in pre-training. However, recent studies indicate this data source is rapidly depleting. Synthetic data emerges as a promising alternative, but it remains unclear whether synthetic datasets exhibit predictable scalability comparable to raw pre-training data. In this work, we s…

    Submitted 26 March, 2025; v1 submitted 25 March, 2025; originally announced March 2025.

    Comments: work in progress

  22. arXiv:2503.18403

    cs.CV cs.AI cs.LG

    Knowledge Graph Enhanced Generative Multi-modal Models for Class-Incremental Learning

    Authors: Xusheng Cao, Haori Lu, Linlan Huang, Fei Yang, Xialei Liu, Ming-Ming Cheng

    Abstract: Continual learning in computer vision faces the critical challenge of catastrophic forgetting, where models struggle to retain prior knowledge while adapting to new tasks. Although recent studies have attempted to leverage the generalization capabilities of pre-trained models to mitigate overfitting on current tasks, models still tend to forget details of previously learned categories as tasks pro…

    Submitted 24 March, 2025; originally announced March 2025.

  23. arXiv:2503.14110

    cs.IR

    A Comprehensive Survey on Cross-Domain Recommendation: Taxonomy, Progress, and Prospects

    Authors: Hao Zhang, Mingyue Cheng, Qi Liu, Junzhe Jiang, Xianquan Wang, Rujiao Zhang, Chenyi Lei, Enhong Chen

    Abstract: Recommender systems (RS) have become crucial tools for information filtering in various real-world scenarios. And cross-domain recommendation (CDR) has been widely explored in recent years in order to provide better recommendation results in the target domain with the help of other domains. The CDR technology has developed rapidly, yet there is a lack of a comprehensive survey summarizing recent w…

    Submitted 18 March, 2025; originally announced March 2025.

  24. arXiv:2503.12929

    cs.CV

    AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction

    Authors: Xuying Zhang, Yupeng Zhou, Kai Wang, Yikai Wang, Zhen Li, Shaohui Jiao, Daquan Zhou, Qibin Hou, Ming-Ming Cheng

    Abstract: Novel view synthesis (NVS) is a cornerstone for image-to-3D creation. However, existing works still struggle to maintain consistency between the generated views and the input views, especially when there is a significant camera pose difference, leading to poor-quality 3D geometries and textures. We attribute this issue to their treatment of all target views with equal priority according to our emp…

    Submitted 27 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

  25. arXiv:2503.12150

    cs.CV

    Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis

    Authors: Hongyu Sun, Qiuhong Ke, Ming Cheng, Yongcai Wang, Deying Li, Chenhui Gou, Jianfei Cai

    Abstract: This paper proposes a general solution to enable point cloud recognition models to handle distribution shifts at test time. Unlike prior methods, which rely heavily on training data (often inaccessible during online inference) and are limited to recognizing a fixed set of point cloud classes predefined during training, we explore a more practical and challenging scenario: adapting the model solely…

    Submitted 27 April, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025; 24 pages, 14 figures, 18 tables

  26. arXiv:2503.10677

    cs.CL cs.AI

    A Survey on Knowledge-Oriented Retrieval-Augmented Generation

    Authors: Mingyue Cheng, Yucong Luo, Jie Ouyang, Qi Liu, Huijie Liu, Li Li, Shuo Yu, Bohou Zhang, Jiawei Cao, Jie Ma, Daoyu Wang, Enhong Chen

    Abstract: Retrieval-Augmented Generation (RAG) has gained significant attention in recent years for its potential to enhance natural language understanding and generation by combining large-scale retrieval systems with generative models. RAG leverages external knowledge sources, such as documents, databases, or structured data, to improve model performance and generate more accurate and contextually relevan…

    Submitted 17 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  27. arXiv:2503.05132

    cs.AI cs.CV cs.LG

    R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

    Authors: Hengguang Zhou, Xirui Li, Ruochen Wang, Minhao Cheng, Tianyi Zhou, Cho-Jui Hsieh

    Abstract: Recently DeepSeek R1 demonstrated how reinforcement learning with simple rule-based incentives can enable autonomous development of complex reasoning in large language models, characterized by the "aha moment", in which the model manifests self-reflection and increased response length during training. However, attempts to extend this success to multimodal reasoning often failed to reproduce these k…

    Submitted 9 March, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

    Comments: 10 pages, 6 figures

  28. arXiv:2503.04800

    cs.CL cs.AI

    HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation

    Authors: Jie Ouyang, Tingyue Pan, Mingyue Cheng, Ruiran Yan, Yucong Luo, Jiaying Lin, Qi Liu

    Abstract: While Retrieval-Augmented Generation (RAG) has emerged as an effective approach for addressing the knowledge outdating problem in Large Language Models (LLMs), it faces a critical challenge: the prevalence of outdated information in knowledge bases. Current research primarily focuses on incorporating up-to-date information, yet the impact of outdated information coexisting in retrieval sources rem…

    Submitted 3 March, 2025; originally announced March 2025.

  29. arXiv:2503.02250

    cs.CY

    AI Automatons: AI Systems Intended to Imitate Humans

    Authors: Alexandra Olteanu, Solon Barocas, Su Lin Blodgett, Lisa Egede, Alicia DeVrio, Myra Cheng

    Abstract: There is a growing proliferation of AI systems designed to mimic people's behavior, work, abilities, likenesses, or humanness -- systems we dub AI automatons. Individuals, groups, or generic humans are being simulated to produce creative work in their styles, to respond to surveys in their places, to probe how they would use a new system before deployment, to provide users with assistance and comp…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: 26 pages, 1 figure

  30. arXiv:2503.01767

    cs.HC

    Designing VR Simulation System for Clinical Communication Training with LLMs-Based Embodied Conversational Agents

    Authors: Xiuqi Tommy Zhu, Heidi Cheerman, Minxin Cheng, Sheri Kiami, Leanne Chukoskie, Eileen McGivney

    Abstract: VR simulation in Health Professions (HP) education demonstrates huge potential, but fixed learning content with little customization limits its application beyond lab environments. To address these limitations in the context of VR for patient communication training, we conducted a user-centered study involving semi-structured interviews with advanced HP students to understand their challenges in c…

    Submitted 3 March, 2025; originally announced March 2025.

  31. arXiv:2503.00306

    cs.CL cs.LG

    Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning

    Authors: Tianci Liu, Ruirui Li, Yunzhe Qi, Hui Liu, Xianfeng Tang, Tianqi Zheng, Qingyu Yin, Monica Xiao Cheng, Jun Huan, Haoyu Wang, Jing Gao

    Abstract: Large language models (LLMs) have achieved remarkable performance on various natural language tasks. However, they are trained on static corpora and their knowledge can become outdated quickly in the fast-changing world. This motivates the development of knowledge editing methods designed to update certain knowledge in LLMs without changing unrelated others. To make selective edits, previous effor…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: ICLR 2025

  32. arXiv:2502.15979

    cs.IR cs.CV

    Visual Zero-Shot E-Commerce Product Attribute Value Extraction

    Authors: Jiaying Gong, Ming Cheng, Hongda Shen, Pierre-Yves Vandenbussche, Janet Jenq, Hoda Eldardiry

    Abstract: Existing zero-shot product attribute value (aspect) extraction approaches in the e-commerce industry rely on uni-modal or multi-modal models, where the sellers are asked to provide detailed textual inputs (product descriptions) for the products. However, manually providing (typing) the product descriptions is time-consuming and frustrating for the sellers. Thus, we propose a cross-modal zero-shot attr…

    Submitted 21 February, 2025; originally announced February 2025.

    Comments: 10 pages, 4 figures, accepted for publication in NAACL 2025 Industry Track

  33. arXiv:2502.14019

    cs.CL cs.AI cs.HC

    Dehumanizing Machines: Mitigating Anthropomorphic Behaviors in Text Generation Systems

    Authors: Myra Cheng, Su Lin Blodgett, Alicia DeVrio, Lisa Egede, Alexandra Olteanu

    Abstract: As text generation systems' outputs are increasingly anthropomorphic -- perceived as human-like -- scholars have also raised increasing concerns about how such outputs can lead to harmful outcomes, such as users over-relying or developing emotional dependence on these systems. How to intervene on such system outputs to mitigate anthropomorphic behaviors and their attendant harmful outcomes, howeve…

    Submitted 19 February, 2025; originally announced February 2025.

  34. arXiv:2502.13259

    cs.CL cs.AI cs.CY

    HumT DumT: Measuring and controlling human-like language in LLMs

    Authors: Myra Cheng, Sunny Yu, Dan Jurafsky

    Abstract: Should LLMs generate language that makes them seem human? Human-like language might improve user experience, but might also lead to overreliance and stereotyping. Assessing these potential impacts requires a systematic way to measure human-like tone in LLM outputs. We introduce HumT and SocioT, metrics for human-like tone and other dimensions of social perceptions in text data based on relative pr…

    Submitted 18 February, 2025; originally announced February 2025.

  35. arXiv:2502.10248

    cs.CV cs.CL

    Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model

    Authors: Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, et al. (90 additional authors not shown)

    Abstract: We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video generation tasks, achieving 16x16 spatial and 8x temporal compression ratios, while maintaining exceptional video reconstruction quality. User prompts are encoded…

    Submitted 24 February, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 36 pages, 14 figures

  36. arXiv:2502.09977

    cs.CL cs.AI

    LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing

    Authors: Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Shuai Wang, Minhao Cheng

    Abstract: Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this by retrieving the most relevant fragments into LLMs. However, the advancements in context window size for LLMs offer an alternative approach, raising the questio…

    Submitted 5 March, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: 22 pages

  37. arXiv:2502.09870

    cs.HC cs.AI cs.CL

    A Taxonomy of Linguistic Expressions That Contribute To Anthropomorphism of Language Technologies

    Authors: Alicia DeVrio, Myra Cheng, Lisa Egede, Alexandra Olteanu, Su Lin Blodgett

    Abstract: Recent attention to anthropomorphism -- the attribution of human-like qualities to non-human objects or entities -- of language technologies like LLMs has sparked renewed discussions about potential negative impacts of anthropomorphism. To productively discuss the impacts of this anthropomorphism and in what contexts it is appropriate, we need a shared vocabulary for the vast variety of ways that…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 18 pages, 1 figure, to appear at CHI 2025

    Journal ref: ACM CHI Conference on Human Factors in Computing Systems (CHI 2025), Yokohama, Japan

  38. arXiv:2502.09192

    cs.CL

    Thinking beyond the anthropomorphic paradigm benefits LLM research

    Authors: Lujain Ibrahim, Myra Cheng

    Abstract: Anthropomorphism, or the attribution of human traits to technology, is an automatic and unconscious response that occurs even in those with advanced technical expertise. In this position paper, we analyze hundreds of thousands of computer science research articles from the past decade and present empirical evidence of the prevalence and growth of anthropomorphic terminology in research on large la…

    Submitted 13 February, 2025; originally announced February 2025.

  39. arXiv:2502.08504

    cs.SE

    MoDitector: Module-Directed Testing for Autonomous Driving Systems

    Authors: Renzhi Wang, Mingfei Cheng, Xiaofei Xie, Yuan Zhou, Lei Ma

    Abstract: Testing Autonomous Driving Systems (ADS) is crucial for ensuring their safety, reliability, and performance. Despite numerous testing methods available that can generate diverse and challenging scenarios to uncover potential vulnerabilities, these methods often treat ADS as a black-box, primarily focusing on identifying system failures like collisions or near-misses without pinpointing the specifi…

    Submitted 12 February, 2025; originally announced February 2025.

  40. arXiv:2502.08374

    cs.CV

    AdvSwap: Covert Adversarial Perturbation with High Frequency Info-swapping for Autonomous Driving Perception

    Authors: Yuanhao Huang, Qinfan Zhang, Jiandong Xing, Mengyue Cheng, Haiyang Yu, Yilong Ren, Xiao Xiong

    Abstract: The perception modules of autonomous vehicles (AVs) are increasingly susceptible to attacks that exploit vulnerabilities in neural networks through adversarial inputs, thereby compromising AI safety. Some studies focus on creating covert adversarial samples, but existing global noise techniques are detectable and struggle to deceive the human visual system. This paper introduces a novel a…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: 27th IEEE International Conference on Intelligent Transportation Systems (ITSC)

  41. arXiv:2502.06710

    cs.CV cs.MM cs.SD eess.AS

    Learning Musical Representations for Music Performance Question Answering

    Authors: Xingjian Diao, Chunhui Zhang, Tingxuan Wu, Ming Cheng, Zhongyu Ouyang, Weiyi Wu, Jiang Gui

    Abstract: Music performances are representative scenarios for audio-visual modeling. Unlike common scenarios with sparse audio, music performances continuously involve dense audio signals throughout. While existing multimodal learning methods on the audio-video QA demonstrate impressive capabilities in general scenarios, they are incapable of dealing with fundamental problems within the music performances:…

    Submitted 10 February, 2025; originally announced February 2025.

    Comments: Accepted at EMNLP 2024

  42. arXiv:2502.06020

    cs.CV cs.MM cs.SD eess.AS

    Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding

    Authors: Xingjian Diao, Chunhui Zhang, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui

    Abstract: Multimodal foundation models (MFMs) have demonstrated significant success in tasks such as visual captioning, question answering, and image-text retrieval. However, these models face inherent limitations due to their finite internal capacity, which restricts their ability to process extended temporal sequences, a crucial requirement for comprehensive video and audio analysis. To overcome these cha…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: Accepted at NAACL 2025

  43. arXiv:2502.04040  [pdf, other]

    cs.LG cs.AI cs.CL

    Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment

    Authors: Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao

    Abstract: Training safe LLMs is one of the most critical research challenges. However, the commonly used method, Refusal Training (RT), struggles to generalize against various OOD jailbreaking attacks. Many safety training methods have been proposed to address this issue. While they offer valuable insights, we aim to complement this line of research by investigating whether OOD attacks truly exceed the capab…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: The first two authors contributed equally

  44. arXiv:2502.03772  [pdf, other]

    cs.CV cs.AI

    A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma

    Authors: Chaoyin She, Ruifang Lu, Danni He, Jiayi Lv, Yadan Lin, Meiqing Cheng, Hui Huang, Fengyu Ye, Lida Chen, Wei Wang, Qinghua Huang

    Abstract: Hepatocellular carcinoma (HCC), ranking as the third leading cause of cancer-related mortality worldwide, demands urgent improvements in early detection to enhance patient survival. While ultrasound remains the preferred screening modality due to its cost-effectiveness and real-time capabilities, its sensitivity (59%-78%) heavily relies on radiologists' expertise, leading to inconsistent diagnosti…

    Submitted 20 March, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  45. arXiv:2502.02905  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    AI-driven materials design: a mini-review

    Authors: Mouyang Cheng, Chu-Liang Fu, Ryotaro Okabe, Abhijatmedhi Chotrattanapituk, Artittaya Boonkird, Nguyen Tuan Hung, Mingda Li

    Abstract: Materials design is an important component of modern science and technology, yet traditional approaches rely heavily on trial-and-error and can be inefficient. Computational techniques, enhanced by modern artificial intelligence (AI), have greatly accelerated the design of new materials. Among these approaches, inverse design has shown great promise in designing materials that meet specific proper…

    Submitted 5 February, 2025; originally announced February 2025.

    Comments: 18 pages, 7 figures, 1 table; Review article

  46. arXiv:2502.02215  [pdf, other]

    cs.CV

    InterLCM: Low-Quality Images as Intermediate States of Latent Consistency Models for Effective Blind Face Restoration

    Authors: Senmao Li, Kai Wang, Joost van de Weijer, Fahad Shahbaz Khan, Chun-Le Guo, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng

    Abstract: Diffusion priors have been used for blind face restoration (BFR) by fine-tuning diffusion models (DMs) on restoration datasets to recover low-quality images. However, the naive application of DMs presents several key limitations. (i) The diffusion prior has inferior semantic consistency (e.g., ID, structure, and color), increasing the difficulty of optimizing the BFR model; (ii) reliance on hundre…

    Submitted 21 March, 2025; v1 submitted 4 February, 2025; originally announced February 2025.

    Comments: Accepted at ICLR 2025

  47. arXiv:2501.18045  [pdf, other]

    cs.CY cs.AI cs.CL cs.HC

    From tools to thieves: Measuring and understanding public perceptions of AI through crowdsourced metaphors

    Authors: Myra Cheng, Angela Y. Lee, Kristina Rapuano, Kate Niederhoffer, Alex Liebscher, Jeffrey Hancock

    Abstract: How has the public responded to the increasing prevalence of artificial intelligence (AI)-based technologies? We investigate public perceptions of AI by collecting over 12,000 responses over 12 months from a nationally representative U.S. sample. Participants provided open-ended metaphors reflecting their mental models of AI, a methodology that overcomes the limitations of traditional self-reporte…

    Submitted 29 April, 2025; v1 submitted 29 January, 2025; originally announced January 2025.

    Comments: To appear at the ACM Conference on Fairness, Accountability, and Transparency 2025

  48. arXiv:2501.17858  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Improving Your Model Ranking on Chatbot Arena by Vote Rigging

    Authors: Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min Lin

    Abstract: Chatbot Arena is a popular platform for evaluating LLMs by pairwise battles, where users vote for their preferred response from two randomly sampled anonymous models. While Chatbot Arena is widely regarded as a reliable LLM ranking leaderboard, we show that crowdsourced voting can be rigged to improve (or decrease) the ranking of a target model $m_{t}$. We first introduce a straightforward target-…

    Submitted 29 January, 2025; originally announced January 2025.

  49. arXiv:2501.13554  [pdf, other]

    cs.CV cs.AI cs.LG

    One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

    Authors: Tao Liu, Kai Wang, Senmao Li, Joost van de Weijer, Fahad Shahbaz Khan, Shiqi Yang, Yaxing Wang, Jian Yang, Ming-Ming Cheng

    Abstract: Text-to-image generation models can create high-quality images from input prompts. However, they struggle to support the consistent generation of identity-preserving requirements for storytelling. Existing approaches to this problem typically require extensive training on large datasets or additional modifications to the original model architectures. This limits their applicability across differen…

    Submitted 5 February, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

    Comments: 28 pages, 22 figures, ICLR2025 conference

  50. arXiv:2501.12528  [pdf, other]

    cs.IT

    Improved Coded Caching Scheme for Multi-User Information Retrieval System

    Authors: Junyi Wang, Quan Zang, Jinyu Wang, Minquan Cheng

    Abstract: In this paper, we study the coded caching scheme for the $(L, K, M, N)$ multi-user information retrieval (MIR) system, which consists of a content library containing $N$ files, a base station (BS) with $L$ antennas that cannot access the library, and $K$ single-antenna users, each of which can cache at most $M$ files from the library. The users communicate with the others assisted by the BS to dec…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: 14