Showing 1–50 of 132 results for author: Cheng, F

Searching in archive cs.
  1. arXiv:2507.01464  [pdf, ps, other]

    cs.IT

    Coding for Quasi-Static Fading Channel with Imperfect CSI at the Transmitter and Quantized Feedback

    Authors: Yuhan Yang, Mei Han, Haonan Zhang, Haoheng Yuan, Fan Cheng, Bin Dai

    Abstract: The classical Schalkwijk-Kailath (SK) scheme for the additive Gaussian noise channel with noiseless feedback is highly efficient since its coding complexity is extremely low and the decoding error doubly exponentially decays as the coding blocklength tends to infinity. However, its application to the fading channel with imperfect CSI at the transmitter (I-CSIT) is challenging since the SK scheme i…

    Submitted 2 July, 2025; originally announced July 2025.

    Comments: 7 pages, 6 figures, conference, this paper will be presented at the 2025 IEEE ITW

  2. arXiv:2507.00942  [pdf, ps, other]

    cs.IT

    Optimal Feedback Schemes for Dirty Paper Channels With State Estimation at the Receiver

    Authors: Dengfeng Xia, Han Deng, Haonan Zhang, Fan Cheng, Bin Dai, Liuguo Yin

    Abstract: In the literature, it has been shown that feedback does not increase the optimal rate-distortion region of the dirty paper channel with state estimation at the receiver (SE-R). On the other hand, it is well-known that feedback helps to construct low-complexity coding schemes in Gaussian channels, such as the elegant Schalkwijk-Kailath (SK) feedback scheme. This motivates us to explore capacity-ach…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: This paper will be presented at the 2025 IEEE Information Theory Workshop (ITW)

  3. arXiv:2506.22677  [pdf, ps, other]

    cs.ET

    Prediction of Protein Three-dimensional Structures via a Hardware-Executable Quantum Computing Framework

    Authors: Yuqi Zhang, Yuxin Yang, William Martin, Kingsten Lin, Zixu Wang, Cheng-Chang Lu, Weiwen Jiang, Ruth Nussinov, Joseph Loscalzo, Qiang Guan, Feixiong Cheng

    Abstract: Accurate prediction of protein active site structures remains a central challenge in structural biology, particularly for short and flexible peptide fragments where conventional methods often fail. Here, we present a quantum computing framework specifically developed for utility-level quantum processors to address this problem. Starting from an amino acid sequence, we formulate the structure predi…

    Submitted 27 June, 2025; originally announced June 2025.

    Comments: 22 pages, 4 figures

  4. arXiv:2506.18290  [pdf, ps, other]

    cs.LG

    Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction

    Authors: Han Zhang, Jinghong Mao, Shangwen Zhu, Zhantao Yang, Lianghua Huang, Yu Liu, Deli Zhao, Ruili Feng, Fan Cheng

    Abstract: Diffusion reconstruction plays a critical role in various applications such as image editing, restoration, and style transfer. In theory, the reconstruction should be simple - it just inverts and regenerates images by numerically solving the Probability Flow-Ordinary Differential Equation (PF-ODE). Yet in practice, noticeable reconstruction errors have been observed, which cannot be well explained…

    Submitted 23 June, 2025; originally announced June 2025.

  5. arXiv:2506.16633  [pdf, ps, other]

    cs.CL cs.AI cs.MM

    GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View

    Authors: Fenghua Cheng, Jinxiang Wang, Sen Wang, Zi Huang, Xue Li

    Abstract: Multimodal reasoning is a process of understanding, integrating and inferring information across different data modalities. It has recently attracted surging academic attention as a benchmark for Artificial Intelligence (AI). Although there are various tasks for evaluating multimodal reasoning ability, they still have limitations. Lack of reasoning on hierarchical visual clues at different levels…

    Submitted 19 June, 2025; originally announced June 2025.

  6. arXiv:2506.14377  [pdf]

    cs.CY

    Between Regulation and Accessibility: How Chinese University Students Navigate Global and Domestic Generative AI

    Authors: Qin Xie, Ming Li, Fei Cheng

    Abstract: Despite the rapid proliferation of generative AI in higher education, students in China face significant barriers in accessing global tools like ChatGPT due to regulations and constraints. Grounded in the Unified Theory of Acceptance and Use of Technology 2 (UTAUT2) model, this study employs qualitative interviews to investigate how Chinese university students interact with both global and domesti…

    Submitted 17 June, 2025; originally announced June 2025.

  7. arXiv:2506.10941  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    VINCIE: Unlocking In-context Image Editing from Video

    Authors: Leigang Qu, Feng Cheng, Ziyan Yang, Qi Zhao, Shanchuan Lin, Yichun Shi, Yicong Li, Wenjie Wang, Tat-Seng Chua, Lu Jiang

    Abstract: In-context image editing aims to modify images based on a contextual sequence comprising text and previously generated images. Existing methods typically depend on task-specific pipelines and expert models (e.g., segmentation and inpainting) to curate training data. In this work, we explore whether an in-context image editing model can be learned directly from videos. We introduce a scalable appro…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: Project page: https://vincie2025.github.io/

  8. arXiv:2506.00394  [pdf, other]

    cs.CV

    Sequence-Based Identification of First-Person Camera Wearers in Third-Person Views

    Authors: Ziwei Zhao, Xizi Wang, Yuchen Wang, Feng Cheng, David Crandall

    Abstract: The increasing popularity of egocentric cameras has generated growing interest in studying multi-camera interactions in shared environments. Although large-scale datasets such as Ego4D and Ego-Exo4D have propelled egocentric vision research, interactions between multiple camera wearers remain underexplored, a key gap for applications like immersive learning and collaborative robotics. To bridge thi…

    Submitted 31 May, 2025; originally announced June 2025.

  9. arXiv:2505.10748  [pdf, ps, other]

    cs.AR

    AutoRAC: Automated Processing-in-Memory Accelerator Design for Recommender Systems

    Authors: Feng Cheng, Tunhou Zhang, Junyao Zhang, Jonathan Hao-Cheng Ku, Yitu Wang, Xiaoxuan Yang, Hai Li, Yiran Chen

    Abstract: The performance bottleneck of deep-learning-based recommender systems resides in their backbone Deep Neural Networks. By integrating Processing-In-Memory (PIM) architectures, researchers can reduce data movement and enhance energy efficiency, paving the way for next-generation recommender models. Nevertheless, achieving performance and efficiency gains is challenging due to the complexity of the P…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: GLSVLSI 2025

  10. arXiv:2505.06901  [pdf, ps, other]

    cs.AR

    Ecco: Improving Memory Bandwidth and Capacity for LLMs via Entropy-aware Cache Compression

    Authors: Feng Cheng, Cong Guo, Chiyue Wei, Junyao Zhang, Changchun Zhou, Edward Hanson, Jiaqi Zhang, Xiaoxiao Liu, Hai "Helen" Li, Yiran Chen

    Abstract: Large language models (LLMs) have demonstrated transformative capabilities across diverse artificial intelligence applications, yet their deployment is hindered by substantial memory and computational demands, especially in resource-constrained environments. Quantization techniques have emerged as a critical solution, reducing data precision to enhance memory and computational efficiency. However,…

    Submitted 11 May, 2025; originally announced May 2025.

    Comments: ISCA 2025

  11. arXiv:2504.08685  [pdf, other]

    cs.CV cs.AI

    Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model

    Authors: Team Seawead, Ceyuan Yang, Zhijie Lin, Yang Zhao, Shanchuan Lin, Zhibei Ma, Haoyuan Guo, Hao Chen, Lu Qi, Sen Wang, Feng Cheng, Feilong Zuo, Xuejiao Zeng, Ziyan Yang, Fangyuan Kong, Meng Wei, Zhiwu Qing, Fei Xiao, Tuyen Hoang, Siyu Zhang, Peihao Zhu, Qi Zhao, Jiangqiao Yan, Liangke Gui, Sheng Bi, et al. (30 additional authors not shown)

    Abstract: This technical report presents a cost-efficient strategy for training a video generation foundation model. We present a mid-sized research model with approximately 7 billion parameters (7B) called Seaweed-7B trained from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary…

    Submitted 4 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Technical report (some typos fixed)

  12. arXiv:2504.02636  [pdf]

    cs.CY

    A Framework for Developing University Policies on Generative AI Governance: A Cross-national Comparative Study

    Authors: Ming Li, Qin Xie, Ariunaa Enkhtur, Shuoyang Meng, Lilan Chen, Beverley Anne Yamamoto, Fei Cheng, Masayuki Murakami

    Abstract: As generative artificial intelligence (GAI) becomes more integrated into higher education and research, universities adopt varied approaches to GAI policy development. To explore these variations, this study conducts a comparative analysis of leading universities in the United States, Japan, and China, examining their institution-wide policies on GAI application and governance. Based on these find…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Work in progress

  13. arXiv:2503.22346  [pdf, other]

    cs.CV

    ArchCAD-400K: An Open Large-Scale Architectural CAD Dataset and New Baseline for Panoptic Symbol Spotting

    Authors: Ruifeng Luo, Zhengjie Liu, Tianxiao Cheng, Jie Wang, Tongjie Wang, Xingguang Wei, Haomin Wang, YanPeng Li, Fu Chai, Fei Cheng, Shenglong Ye, Wenhai Wang, Yanting Zhang, Yu Qiao, Hongjie Zhang, Xianzhong Zhao

    Abstract: Recognizing symbols in architectural CAD drawings is critical for various advanced engineering applications. In this paper, we propose a novel CAD data annotation engine that leverages intrinsic attributes from systematically archived CAD drawings to automatically generate high-quality annotations, thus significantly reducing manual labeling efforts. Utilizing this engine, we construct ArchCAD-400…

    Submitted 2 April, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  14. arXiv:2503.20822  [pdf, other]

    eess.IV cs.AI cs.GR

    Synthetic Video Enhances Physical Fidelity in Video Synthesis

    Authors: Qi Zhao, Xingyu Ni, Ziyu Wang, Feng Cheng, Ziyan Yang, Lu Jiang, Bohan Wang

    Abstract: We investigate how to enhance the physical fidelity of video generation models by leveraging synthetic videos derived from computer graphics pipelines. These rendered videos respect real-world physics, such as maintaining 3D consistency, and serve as a valuable resource that can potentially improve video generation models. To harness this potential, we propose a solution that curates and integrate…

    Submitted 25 March, 2025; originally announced March 2025.

  15. arXiv:2503.09675  [pdf, other]

    cs.CV

    Accelerating Diffusion Sampling via Exploiting Local Transition Coherence

    Authors: Shangwen Zhu, Han Zhang, Zhantao Yang, Qianyu Peng, Zhao Pu, Huangji Wang, Fan Cheng

    Abstract: Text-based diffusion models have made significant breakthroughs in generating high-quality images and videos from textual descriptions. However, the lengthy sampling time of the denoising process remains a significant bottleneck in practical applications. Previous methods either ignore the statistical relationships between adjacent steps or rely on attention or feature similarity between them, whi…

    Submitted 12 March, 2025; originally announced March 2025.

  16. arXiv:2503.03379  [pdf, other]

    cs.AR

    Prosperity: Accelerating Spiking Neural Networks via Product Sparsity

    Authors: Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao "Frank" Yang, Hai "Helen" Li, Yiran Chen

    Abstract: Spiking Neural Networks (SNNs) are highly efficient due to their spike-based activation, which inherently produces bit-sparse computation patterns. Existing hardware implementations of SNNs leverage this sparsity pattern to avoid wasteful zero-value computations, yet this approach fails to fully capitalize on the potential efficiency of SNNs. This study introduces a novel sparsity paradigm called…

    Submitted 2 April, 2025; v1 submitted 5 March, 2025; originally announced March 2025.

    Comments: HPCA 2025

  17. arXiv:2503.01302  [pdf, ps, other]

    cs.CL

    Causal Tree Extraction from Medical Case Reports: A Novel Task for Experts-like Text Comprehension

    Authors: Sakiko Yahata, Zhen Wan, Fei Cheng, Sadao Kurohashi, Hisahiko Sato, Ryozo Nagai

    Abstract: Extracting causal relationships from a medical case report is essential for comprehending the case, particularly its diagnostic process. Since the diagnostic process is regarded as a bottom-up inference, causal relationships in cases naturally form a multi-layered tree structure. The existing tasks, such as medical relation extraction, are insufficient for capturing the causal relationships of an…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: Work in progress

  18. arXiv:2502.17945  [pdf, other]

    cs.CL

    Assessing Large Language Models in Agentic Multilingual National Bias

    Authors: Qianying Liu, Katrina Qiyao Wang, Fei Cheng, Sadao Kurohashi

    Abstract: Large Language Models have garnered significant attention for their capabilities in multilingual natural language processing, while studies on risks associated with cross biases are limited to immediate context preferences. Cross-language disparities in reasoning-based recommendations remain largely unexplored, with a lack of even descriptive analysis. This study is the first to address this gap.…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 13 pages

  19. arXiv:2502.15652  [pdf, ps, other]

    cs.AI cs.CL

    Empowering LLMs with Logical Reasoning: A Comprehensive Survey

    Authors: Fengxiang Cheng, Haoxuan Li, Fenrong Liu, Robert van Rooij, Kun Zhang, Zhouchen Lin

    Abstract: Large language models (LLMs) have achieved remarkable successes on various tasks. However, recent studies have found that there are still significant challenges to the logical reasoning abilities of LLMs, which can be categorized into the following two aspects: (1) Logical question answering: LLMs often fail to generate the correct answer within a complex logical problem which requires sophisticat…

    Submitted 4 June, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: Accepted by IJCAI 2025 (Survey Track)

  20. arXiv:2502.14241  [pdf, other]

    cs.HC

    Augmented Reality In-the-Wild: Usage Patterns and Experiences of Working with AR Laptops in Real-World Settings

    Authors: Yi Fei Cheng, Ari Carden, Hyunsung Cho, Catarina G. Fidalgo, Jonathan Wieland, David Lindlbauer

    Abstract: Augmented Reality (AR) is increasingly positioned as a tool for knowledge work, providing beneficial affordances such as a virtually limitless display space that integrates digital information with the user's physical surroundings. However, for AR to supplant traditional screen-based devices in knowledge work, it must support prolonged usage across diverse contexts. Until now, few studies have exp…

    Submitted 19 February, 2025; originally announced February 2025.

  21. arXiv:2502.11594  [pdf, other]

    cs.CV

    iMOVE: Instance-Motion-Aware Video Understanding

    Authors: Jiaze Li, Yaya Shi, Zongyang Ma, Haoran Xu, Feng Cheng, Huihui Xiao, Ruiwen Kang, Fan Yang, Tingting Gao, Di Zhang

    Abstract: Enhancing the fine-grained instance spatiotemporal motion perception capabilities of Video Large Language Models is crucial for improving their temporal and general video understanding. However, current models struggle to perceive detailed and complex instance motions. To address these challenges, we have made improvements from both data and model perspectives. In terms of data, we have meticulous…

    Submitted 17 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

  22. arXiv:2502.09925  [pdf, other]

    cs.CV cs.AI

    TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types

    Authors: Jiankang Chen, Tianke Zhang, Changyi Liu, Haojie Ding, Yaya Shi, Feng Cheng, Huihui Xiao, Bin Wen, Fan Yang, Tingting Gao, Di Zhang

    Abstract: Multimodal visual language models are gaining prominence in open-world applications, driven by advancements in model architectures, training techniques, and high-quality data. However, their performance is often limited by insufficient task-specific data, leading to poor generalization and biased outputs. Existing efforts to increase task diversity in fine-tuning datasets are hindered by the labor…

    Submitted 14 February, 2025; originally announced February 2025.

  23. arXiv:2501.17615  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition

    Authors: Zhengdong Yang, Qianying Liu, Sheng Li, Fei Cheng, Chenhui Chu

    Abstract: We present a novel approach centered on the decoding stage of Automatic Speech Recognition (ASR) that enhances multilingual performance, especially for low-resource languages. It utilizes a cross-lingual embedding clustering method to construct a hierarchical Softmax (H-Softmax) decoder, which enables similar tokens across different languages to share similar decoder representations. It addresses…

    Submitted 29 January, 2025; originally announced January 2025.

  24. arXiv:2501.06173  [pdf, ps, other]

    cs.CV

    VideoAuteur: Towards Long Narrative Video Generation

    Authors: Junfei Xiao, Feng Cheng, Lu Qi, Liangke Gui, Jiepeng Cen, Zhibei Ma, Alan Yuille, Lu Jiang

    Abstract: Recent video generation models have shown promising results in producing high-quality video clips lasting several seconds. However, these models face challenges in generating long sequences that convey clear and informative events, limiting their ability to support coherent narrations. In this paper, we present a large-scale cooking video dataset designed to advance long-form narrative generation…

    Submitted 7 June, 2025; v1 submitted 10 January, 2025; originally announced January 2025.

    Comments: Preprint, https://videoauteur.github.io/; V2: Method is updated

  25. arXiv:2412.20373  [pdf, other]

    cs.LG cs.AI stat.AP

    A Deep Subgrouping Framework for Precision Drug Repurposing via Emulating Clinical Trials on Real-world Patient Data

    Authors: Seungyeon Lee, Ruoqi Liu, Feixiong Cheng, Ping Zhang

    Abstract: Drug repurposing identifies new therapeutic uses for existing drugs, reducing the time and costs compared to traditional de novo drug discovery. Most existing drug repurposing studies using real-world patient data often treat the entire population as homogeneous, ignoring the heterogeneity of treatment responses across patient subgroups. This approach may overlook promising drugs that benefit spec…

    Submitted 29 December, 2024; originally announced December 2024.

    Comments: To be published in KDD 2025

    ACM Class: I.2.0; J.3

  26. arXiv:2412.13520  [pdf, other]

    cs.AI cs.DB cs.MA

    ROMAS: A Role-Based Multi-Agent System for Database monitoring and Planning

    Authors: Yi Huang, Fangyin Cheng, Fan Zhou, Jiahui Li, Jian Gong, Hongjun Yang, Zhidong Fan, Caigao Jiang, Siqiao Xue, Faqiang Chen

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated remarkable capabilities in data analytics when integrated with Multi-Agent Systems (MAS). However, these systems often struggle with complex tasks that involve diverse functional requirements and intricate data processing challenges, necessitating customized solutions that lack broad applicability. Furthermore, current MAS fail to emu…

    Submitted 18 December, 2024; originally announced December 2024.

  27. arXiv:2412.09601  [pdf, other]

    cs.CV cs.AI cs.CL

    TimeRefine: Temporal Grounding with Time Refining Video LLM

    Authors: Xizi Wang, Feng Cheng, Ziyang Wang, Huiyu Wang, Md Mohaiminul Islam, Lorenzo Torresani, Mohit Bansal, Gedas Bertasius, David Crandall

    Abstract: Video temporal grounding aims to localize relevant temporal boundaries in a video given a textual prompt. Recent work has focused on enabling Video LLMs to perform video temporal grounding via next-token prediction of temporal timestamps. However, accurately localizing timestamps in videos remains challenging for Video LLMs when relying solely on temporal token prediction. Our proposed TimeRefine…

    Submitted 5 March, 2025; v1 submitted 12 December, 2024; originally announced December 2024.

  28. arXiv:2412.05012  [pdf, other]

    cs.CV

    SAMCL: Empowering SAM to Continually Learn from Dynamic Domains

    Authors: Zeqing Wang, Kangye Ji, Di Wang, Fei Cheng

    Abstract: Segment Anything Model (SAM) struggles with segmenting objects in the open world, especially across diverse and dynamic domains. Continual segmentation (CS) is a potential technique to solve this issue, but a significant obstacle is the intractable balance between previous domains (stability) and new domains (plasticity) during CS. Furthermore, how to utilize two kinds of features of SAM, images a…

    Submitted 6 December, 2024; originally announced December 2024.

    Comments: 14 pages, 11 figures

  29. arXiv:2411.07569  [pdf, other]

    cs.IR

    Towards Automated Model Design on Recommender Systems

    Authors: Tunhou Zhang, Dehua Cheng, Yuchen He, Zhengxing Chen, Xiaoliang Dai, Liang Xiong, Yudong Liu, Feng Cheng, Yufan Cao, Feng Yan, Hai Li, Yiran Chen, Wei Wen

    Abstract: The increasing popularity of deep learning models has created new opportunities for developing AI-based recommender systems. Designing recommender systems using deep neural networks requires careful architecture design, and further optimization demands extensive co-design efforts on jointly optimizing model architecture and hardware. Design automation, such as Automated Machine Learning (AutoML),…

    Submitted 12 November, 2024; originally announced November 2024.

    Comments: Accepted in ACM Transactions on Recommender Systems. arXiv admin note: substantial text overlap with arXiv:2207.07187

    Journal ref: ACM Transactions on Recommender Systems (TORS) 2024

  30. arXiv:2410.19704  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Multi-view biomedical foundation models for molecule-target and property prediction

    Authors: Parthasarathy Suryanarayanan, Yunguang Qiu, Shreyans Sethi, Diwakar Mahajan, Hongyang Li, Yuxin Yang, Elif Eyigoz, Aldo Guzman Saenz, Daniel E. Platt, Timothy H. Rumbell, Kenney Ng, Sanjoy Dey, Myson Burch, Bum Chul Kwon, Pablo Meyer, Feixiong Cheng, Jianying Hu, Joseph A. Morrone

    Abstract: Foundation models applied to bio-molecular space hold promise to accelerate drug discovery. Molecular representation is key to building such models. Previous works have typically focused on a single representation or view of the molecules. Here, we develop a multi-view foundation model approach, that integrates molecular views of graph, image and text. Single-view foundation models are each pre-tr…

    Submitted 31 January, 2025; v1 submitted 25 October, 2024; originally announced October 2024.

    Comments: 37 pages including supplement. 10 figures, 8 tables

  31. arXiv:2410.07265  [pdf, other]

    cs.AR cs.AI cs.LG cs.SE

    A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

    Authors: Cong Guo, Feng Cheng, Zhixu Du, James Kiessling, Jonathan Ku, Shiyu Li, Ziru Li, Mingyuan Ma, Tergel Molom-Ochir, Benjamin Morris, Haoxuan Shan, Jingwei Sun, Yitu Wang, Chiyue Wei, Xueying Wu, Yuhao Wu, Hao Frank Yang, Jingyang Zhang, Junyao Zhang, Qilin Zheng, Guanglei Zhou, Hai Li, Yiran Chen

    Abstract: The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language processing and moving towards multi-modal functionality. These models are increasingly integrated into diverse applications, impacting both research and industry. However, their development and deployment present substan…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: Accepted by IEEE Circuits and Systems Magazine

  32. arXiv:2410.06550  [pdf, other]

    cs.CL cs.AI

    Investigating Cost-Efficiency of LLM-Generated Training Data for Conversational Semantic Frame Analysis

    Authors: Shiho Matta, Yin Jou Huang, Fei Cheng, Hirokazu Kiyomaru, Yugo Murawaki

    Abstract: Recent studies have demonstrated that few-shot learning allows LLMs to generate training data for supervised models at a low cost. However, the quality of LLM-generated data may not entirely match that of human-labeled data. This raises a crucial question: how should one balance the trade-off between the higher quality but more expensive human data and the lower quality yet substantially cheaper L…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 12 pages including 4 pages of references and appendix. 7 figures

  33. arXiv:2409.00413  [pdf, other]

    cs.HC

    iToT: An Interactive System for Customized Tree-of-Thought Generation

    Authors: Alan Boyle, Isha Gupta, Sebastian Hönig, Lukas Mautner, Kenza Amara, Furui Cheng, Mennatallah El-Assady

    Abstract: As language models have become increasingly successful at a wide array of tasks, different prompt engineering methods have been developed alongside them in order to adapt these models to new tasks. One of them is Tree-of-Thoughts (ToT), a prompting strategy and framework for language model inference and problem-solving. It allows the model to explore multiple solution paths and select the best cou…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: 6 pages excl. figures and comments; 8 figures. Will appear in IEEE 2024 NLVIZ Workshop

  34. arXiv:2408.10811  [pdf, other]

    cs.CL cs.AI

    Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?

    Authors: Chengzhi Zhong, Fei Cheng, Qianying Liu, Junfeng Jiang, Zhen Wan, Chenhui Chu, Yugo Murawaki, Sadao Kurohashi

    Abstract: In this study, we investigate whether non-English-centric LLMs, despite their strong performance, 'think' in their respective dominant language: more precisely, 'think' refers to how the representations of intermediate layers, when un-embedded into the vocabulary space, exhibit higher probabilities for certain dominant languages during generation. We term such languages as internal…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: work in progress

  35. arXiv:2407.17892  [pdf, ps, other]

    cs.LG cs.AI

    An Iterative Approach to Topic Modelling

    Authors: Albert Wong, Florence Wing Yau Cheng, Ashley Keung, Yamileth Hercules, Mary Alexandra Garcia, Yew-Wei Lim, Lien Pham

    Abstract: Topic modelling has become increasingly popular for summarizing text data, such as social media posts and articles. However, topic modelling is usually completed in one shot. Assessing the quality of resulting topics is challenging. No effective methods or measures have been developed for assessing the results or for making further enhancements to the topics. In this research, we propose…

    Submitted 25 July, 2024; originally announced July 2024.

  36. arXiv:2407.10545  [pdf, other]

    cs.LG cs.AI cs.CV

    LightCL: Compact Continual Learning with Low Memory Footprint For Edge Device

    Authors: Zeqing Wang, Fei Cheng, Kangye Ji, Bohu Huang

    Abstract: Continual learning (CL) is a technique that enables neural networks to constantly adapt to their dynamic surroundings. Despite being overlooked for a long time, this technology can considerably address the customized needs of users in edge devices. Actually, most CL methods require huge resource consumption by the training behavior to acquire generalizability among all tasks for delaying forgettin…

    Submitted 8 March, 2025; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Published in ASPDAC'25

  37. arXiv:2407.03963  [pdf, other]

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano, et al. (58 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its…

    Submitted 30 December, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  38. arXiv:2407.03314  [pdf, other]

    cs.CV cs.CL cs.DB

    BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: Advancements in large Vision-Language Models have brought precise, accurate image captioning, vital for advancing multi-modal image understanding and processing. Yet these captions often carry lengthy, intertwined contexts that are difficult to parse and frequently overlook essential cues, posing a great barrier for models like GroundingDINO and SDXL, which lack the strong text encoding and syntax…

    Submitted 27 March, 2025; v1 submitted 3 July, 2024; originally announced July 2024.

  39. arXiv:2406.10432  [pdf, other]

    cs.CL

    AMR-RE: Abstract Meaning Representations for Retrieval-Based In-Context Learning in Relation Extraction

    Authors: Peitao Han, Lis Kanashiro Pereira, Fei Cheng, Wan Jou She, Eiji Aramaki

    Abstract: Existing in-context learning (ICL) methods for relation extraction (RE) often prioritize language similarity over structural similarity, which can lead to overlooking entity relationships. To address this, we propose an AMR-enhanced retrieval-based ICL method for RE. Our model retrieves in-context examples based on semantic structure similarity between task inputs and training samples. Evaluations…

    Submitted 24 April, 2025; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to NAACL 2025 SRW

  40. arXiv:2406.06847  [pdf, other]

    cs.CV

    Generalized W-Net: Arbitrary-style Chinese Character Synthesization

    Authors: Haochuan Jiang, Guanyu Yang, Fei Cheng, Kaizhu Huang

    Abstract: Synthesizing Chinese characters with consistent style using few stylized examples is challenging. Existing models struggle to generate arbitrary style characters with limited examples. In this paper, we propose the Generalized W-Net, a novel class of W-shaped architectures that addresses this. By incorporating Adaptive Instance Normalization and introducing multi-content, our approach can synthesi…

    Submitted 10 June, 2024; originally announced June 2024.

    Journal ref: International Conference on Brain Inspired Cognitive Systems 2023

  41. arXiv:2405.19209  [pdf, other]

    cs.CV cs.AI cs.CL

    VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

    Authors: Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal

    Abstract: Long-form video understanding is complicated by the high redundancy of video data and the abundance of query-irrelevant information. To tackle these challenges, we propose VideoTree, a training-free framework which builds a query-adaptive and hierarchical video representation for LLM reasoning over long-form videos. First, VideoTree extracts query-relevant information from the input video through…

    Submitted 14 March, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: CVPR 2025; First three authors contributed equally; Project page: https://videotree2024.github.io/

  42. arXiv:2405.17137  [pdf, other]

    cs.CV

    Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label

    Authors: Kangye Ji, Fei Cheng, Zeqing Wang, Bohu Huang

    Abstract: Sample selection is the most straightforward technique to combat label noise, aiming to distinguish mislabeled samples during training and avoid the degradation of the robustness of the model. In the workflow, $\textit{selecting possibly clean data}$ and $\textit{model update}$ are iterative. However, their interplay and intrinsic characteristics hinder the robustness and efficiency of learning wi…

    Submitted 27 August, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  43. arXiv:2405.11921  [pdf, other]

    cs.CV

    MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections

    Authors: Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan

    Abstract: 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting.…

    Submitted 20 May, 2024; originally announced May 2024.

  44. arXiv:2405.03913  [pdf, other]

    q-bio.QM cs.LG stat.ML

    Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process

    Authors: Fuqiang Cheng, Wei Xie, Hua Zheng

    Abstract: Biomanufacturing innovation relies on an efficient Design of Experiments (DoEs) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach for digital twin model calibration. In this study, we consider…

    Submitted 28 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  45. arXiv:2405.00708  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Interactive Analysis of LLMs using Meaningful Counterfactuals

    Authors: Furui Cheng, Vilém Zouhar, Robin Shing Moon Chan, Daniel Fürst, Hendrik Strobelt, Mennatallah El-Assady

    Abstract: Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users and thus can be mentally compared to draw conclusions. Se…

    Submitted 23 April, 2024; originally announced May 2024.

    ACM Class: I.2.7; H.5.2

  46. arXiv:2404.10209  [pdf, other]

    cs.AI cs.LG

    Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

    Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interact…

    Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  47. arXiv:2403.18504  [pdf]

    cs.CL

    AcTED: Automatic Acquisition of Typical Event Duration for Semi-supervised Temporal Commonsense QA

    Authors: Felix Virgo, Fei Cheng, Lis Kanashiro Pereira, Masayuki Asahara, Ichiro Kobayashi, Sadao Kurohashi

    Abstract: We propose a voting-driven semi-supervised approach to automatically acquire the typical duration of an event and use it as pseudo-labeled data. The human evaluation demonstrates that our pseudo labels exhibit surprisingly high accuracy and balanced coverage. In the temporal commonsense QA task, experimental results show that using only pseudo examples of 400 events, we achieve performance compara…

    Submitted 27 March, 2024; originally announced March 2024.

  48. arXiv:2403.11517  [pdf, other]

    q-bio.NC cs.HC

    Inter-individual and inter-site neural code conversion without shared stimuli

    Authors: Haibao Wang, Jun Kai Ho, Fan L. Cheng, Shuntaro C. Aoki, Yusuke Muraki, Misato Tanaka, Yukiyasu Kamitani

    Abstract: Inter-individual variability in fine-grained functional brain organization poses challenges for scalable data analysis and modeling. Functional alignment techniques can help mitigate these individual differences but typically require paired brain data with the same stimuli between individuals, which is often unavailable. We present a neural code conversion method that overcomes this constraint by…

    Submitted 1 August, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  49. arXiv:2403.08755  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    DAM: Dynamic Adapter Merging for Continual Video QA Learning

    Authors: Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

    Abstract: We present a parameter-efficient method for continual video question-answering (VidQA) learning. Our method, named DAM, uses the proposed Dynamic Adapter Merging to (i) mitigate catastrophic forgetting, (ii) enable efficient adaptation to continually arriving datasets, (iii) handle inputs from unknown datasets during inference, and (iv) enable knowledge sharing across similar dataset domains. Give…

    Submitted 22 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: The first two authors contribute equally

  50. arXiv:2403.03690  [pdf]

    cs.CL cs.AI

    Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese

    Authors: Yikun Sun, Zhen Wan, Nobuhiro Ueda, Sakiko Yahata, Fei Cheng, Chenhui Chu, Sadao Kurohashi

    Abstract: The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propos…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: COLING 2024. Our code is available here: self-instruct data (https://github.com/hitoshizuku7/awesome-Ja-self-instruct) and evaluation benchmark (https://github.com/ku-nlp/ja-vicuna-qa-benchmark)