Showing 1–50 of 58 results for author: Zou, P

Searching in archive cs.
  1. arXiv:2505.00753  [pdf, other]

    cs.CL cs.LG

    A Survey on Large Language Model based Human-Agent Systems

    Authors: Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, Philip S. Yu

    Abstract: Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world app…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers

  2. arXiv:2503.20376  [pdf, other]

    cs.IR

    Dewey Long Context Embedding Model: A Technical Report

    Authors: Dun Zhang, Panxiang Zou, Yudong Zhou

    Abstract: This technical report presents the training methodology and evaluation results of the open-source dewey_en_beta embedding model. The increasing demand for retrieval-augmented generation (RAG) systems and the expanding context window capabilities of large language models (LLMs) have created critical challenges for conventional embedding models. Current approaches often struggle to maintain semantic…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: 5 pages, 1 figure

  3. arXiv:2503.06467  [pdf, other]

    cs.CV

    SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts

    Authors: Shijia Zhao, Qiming Xia, Xusheng Guo, Pufan Zou, Maoji Zheng, Hai Wu, Chenglu Wen, Cheng Wang

    Abstract: Recently, sparsely-supervised 3D object detection has gained great attention, achieving performance close to fully-supervised 3D objectors while requiring only a few annotated instances. Nevertheless, these methods suffer challenges when accurate labels are extremely absent. In this paper, we propose a boosting strategy, termed SP3D, explicitly utilizing the cross-modal semantic prompts generated…

    Submitted 9 March, 2025; originally announced March 2025.

    Comments: 11 pages, 3 figures

  4. arXiv:2503.03062  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Semi-Supervised In-Context Learning: A Baseline Study

    Authors: Zhengyao Gu, Henry Peng Zou, Yankai Chen, Aiwei Liu, Weizhi Zhang, Philip S. Yu

    Abstract: Most existing work in data selection for In-Context Learning (ICL) has focused on constructing demonstrations from ground truth annotations, with limited attention given to selecting reliable self-generated annotations. In this work, we propose a three-step semi-supervised ICL framework: annotation generation, demonstration selection, and semi-supervised inference. Our baseline, Naive-SemiICL, whi…

    Submitted 4 March, 2025; originally announced March 2025.

  5. arXiv:2503.01814  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Wooseong Yang, Henry Peng Zou, Yuqing Liu, Ke Xu, Sourav Medya, Philip S. Yu

    Abstract: Collaborative filtering models, particularly graph-based approaches, have demonstrated strong performance in capturing user-item interactions for recommendation systems. However, they continue to struggle in cold-start and data-sparse scenarios. The emergence of large language models (LLMs) like GPT and LLaMA presents new possibilities for enhancing recommendation performance, especially in cold-s…

    Submitted 3 March, 2025; originally announced March 2025.

  6. arXiv:2502.19163  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency

    Authors: Henry Peng Zou, Zhengyao Gu, Yue Zhou, Yankai Chen, Weizhi Zhang, Liancheng Fang, Yibo Wang, Yangning Li, Kay Liu, Philip S. Yu

    Abstract: Test-time computing approaches, which leverage additional computational resources during inference, have been proven effective in enhancing large language model performance. This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data: it classifies an input instance by considering not only th…

    Submitted 26 February, 2025; originally announced February 2025.

  7. arXiv:2502.18414  [pdf, other]

    cs.CL cs.LG

    GLEAN: Generalized Category Discovery with Diverse and Quality-Enhanced LLM Feedback

    Authors: Henry Peng Zou, Siffi Singh, Yi Nian, Jianfeng He, Jason Cai, Saab Mansour, Hang Su

    Abstract: Generalized Category Discovery (GCD) is a practical and challenging open-world task that aims to recognize both known and novel categories in unlabeled data using limited labeled data from known categories. Due to the lack of supervision, previous GCD methods face significant challenges, such as difficulty in rectifying errors for confusing instances, and inability to effectively uncover and lever…

    Submitted 25 February, 2025; originally announced February 2025.

  8. arXiv:2502.16804  [pdf, other]

    cs.MA cs.AI

    Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances

    Authors: Yaozu Wu, Dongyuan Li, Yankai Chen, Renhe Jiang, Henry Peng Zou, Liancheng Fang, Zhen Wang, Philip S. Yu

    Abstract: Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited per…

    Submitted 23 February, 2025; originally announced February 2025.

  9. arXiv:2502.16414  [pdf, other]

    cs.LG cs.AI

    TabGen-ICL: Residual-Aware In-Context Example Selection for Tabular Data Generation

    Authors: Liancheng Fang, Aiwei Liu, Hengrui Zhang, Henry Peng Zou, Weizhi Zhang, Philip S. Yu

    Abstract: Large Language models (LLMs) have achieved encouraging results in tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. We observe that using randomly selected in-context examples hampers the LLM's performance, resulting in sub-optimal generation quality. To…

    Submitted 22 February, 2025; originally announced February 2025.

  10. arXiv:2502.07934  [pdf, other]

    cs.IT

    Age of Information Optimization with Preemption Strategies for Correlated Systems

    Authors: Egemen Erbayat, Ali Maatouk, Peng Zou, Suresh Subramaniam

    Abstract: In this paper, we examine a multi-sensor system where each sensor monitors multiple dynamic information processes and transmits updates over a shared communication channel. These updates may include correlated information across the various processes. In this type of system, we analyze the impact of preemption, where ongoing transmissions are replaced by newer updates, on minimizing the Age of Inf…

    Submitted 11 February, 2025; originally announced February 2025.

  11. arXiv:2502.02196  [pdf, other]

    cs.CV cs.AI

    Exploiting Ensemble Learning for Cross-View Isolated Sign Language Recognition

    Authors: Fei Wang, Kun Li, Yiqi Nie, Zhangling Duan, Peng Zou, Zhiliang Wu, Yuwei Wang, Yanyan Wei

    Abstract: In this paper, we present our solution to the Cross-View Isolated Sign Language Recognition (CV-ISLR) challenge held at WWW 2025. CV-ISLR addresses a critical issue in traditional Isolated Sign Language Recognition (ISLR), where existing datasets predominantly capture sign language videos from a frontal perspective, while real-world camera angles often vary. To accurately recognize sign language f…

    Submitted 4 February, 2025; originally announced February 2025.

    Comments: 3rd Place in Cross-View Isolated Sign Language Recognition Challenge at WWW 2025

  12. arXiv:2501.01945  [pdf, other]

    cs.IR cs.AI

    Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap

    Authors: Weizhi Zhang, Yuanchen Bei, Liangwei Yang, Henry Peng Zou, Peilin Zhou, Aiwei Liu, Yinghui Li, Hao Chen, Jianling Wang, Yu Wang, Feiran Huang, Sheng Zhou, Jiajun Bu, Allen Lin, James Caverlee, Fakhri Karray, Irwin King, Philip S. Yu

    Abstract: Cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large langu…

    Submitted 16 January, 2025; v1 submitted 3 January, 2025; originally announced January 2025.

  13. arXiv:2412.18255  [pdf, other]

    cs.CV

    AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction

    Authors: Pufan Zou, Shijia Zhao, Weijie Huang, Qiming Xia, Chenglu Wen, Wei Li, Cheng Wang

    Abstract: Recently, Visual Foundation Models (VFMs) have shown a remarkable generalization performance in 3D perception tasks. However, their effectiveness in large-scale outdoor datasets remains constrained by the scarcity of accurate supervision signals, the extensive noise caused by variable outdoor conditions, and the abundance of unknown objects. In this work, we propose a novel label-free learning met…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 2025 AAAI

  14. arXiv:2411.13578  [pdf, other]

    cs.CV cs.AI cs.LG

    COOD: Concept-based Zero-shot OOD Detection

    Authors: Zhendong Liu, Yi Nian, Henry Peng Zou, Li Li, Xiyang Hu, Yue Zhao

    Abstract: How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large languag…

    Submitted 15 November, 2024; originally announced November 2024.

  15. arXiv:2410.11327  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Sequential LLM Framework for Fashion Recommendation

    Authors: Han Liu, Xianfeng Tang, Tianlang Chen, Jiapeng Liu, Indu Indu, Henry Peng Zou, Peng Dai, Roberto Fernandez Galan, Michael D Porter, Dongmei Jia, Ning Zhang, Lian Xiong

    Abstract: The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely studied, most are designed for general e-commerce problems and struggle with the unique challenges of the fashion domain. To address these issues, we prop…

    Submitted 15 October, 2024; originally announced October 2024.

  16. arXiv:2409.12139  [pdf, other]

    cs.SD cs.AI eess.AS

    Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models

    Authors: Sijing Chen, Yuan Feng, Laipeng He, Tianwei He, Wendi He, Yanni Hu, Bin Lin, Yiting Lin, Yu Pan, Pengfei Tan, Chengwei Tian, Chen Wang, Zhicheng Wang, Ruoye Xie, Jixun Yao, Quanlei Yan, Yuguang Yang, Jianhao Ye, Jingjing Yin, Yanzhen Yu, Huimin Zhang, Xiang Zhang, Guangcheng Zhao, Hongbin Zhou, Pengpeng Zou

    Abstract: With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-…

    Submitted 23 September, 2024; v1 submitted 18 September, 2024; originally announced September 2024.

    Comments: Technical Report; 18 pages; typos corrected, references added, demo url modified, author name modified

  17. arXiv:2409.09927  [pdf, other]

    cs.CL cs.AI

    Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges

    Authors: Vinay Samuel, Yue Zhou, Henry Peng Zou

    Abstract: As large language models achieve increasingly impressive results, questions arise about whether such performance is from generalizability or mere data memorization. Thus, numerous data contamination detection methods have been proposed. However, these approaches are often validated with traditional benchmarks and early-stage LLMs, leaving uncertainty about their effectiveness when evaluating state…

    Submitted 8 December, 2024; v1 submitted 15 September, 2024; originally announced September 2024.

    Comments: Accepted to COLING 2025; 12 pages, 1 figure

  18. arXiv:2409.09214  [pdf, other]

    cs.SD eess.AS

    Seed-Music: A Unified Framework for High Quality and Controlled Music Generation

    Authors: Ye Bai, Haonan Chen, Jitong Chen, Zhuo Chen, Yi Deng, Xiaohong Dong, Lamtharn Hantrakul, Weituo Hao, Qingqing Huang, Zhongyi Huang, Dongya Jia, Feihu La, Duc Le, Bochen Li, Chumin Li, Hui Li, Xingxing Li, Shouda Liu, Wei-Tsung Lu, Yiqing Lu, Andrew Shaw, Janne Spijkervet, Yakun Sun, Bo Wang, Ju-Chiang Wang, et al. (13 additional authors not shown)

    Abstract: We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music gene…

    Submitted 19 September, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: Seed-Music technical report, 20 pages, 5 figures

  19. arXiv:2409.03055  [pdf, other]

    cs.SD eess.AS

    SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints

    Authors: Haonan Chen, Jordan B. L. Smith, Janne Spijkervet, Ju-Chiang Wang, Pei Zou, Bochen Li, Qiuqiang Kong, Xingjian Du

    Abstract: Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (for transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences.…

    Submitted 9 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: ISMIR 2024

  20. arXiv:2407.18910  [pdf, other]

    cs.LG cs.IR

    Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Liancheng Fang, Philip S. Yu

    Abstract: The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equ…

    Submitted 28 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2024

  21. arXiv:2407.18416  [pdf, other]

    cs.CL cs.AI cs.LG

    PersonaGym: Evaluating Persona Agents and LLMs

    Authors: Vinay Samuel, Henry Peng Zou, Yue Zhou, Shreyas Chaudhari, Ashwin Kalyan, Tanmay Rajpurohit, Ameet Deshpande, Karthik Narasimhan, Vishvak Murahari

    Abstract: Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements thereby broadening the…

    Submitted 18 December, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 21 pages, 5 figures

  22. arXiv:2407.12037  [pdf, other]

    cs.AR cs.SE

    A Novel HDL Code Generator for Effectively Testing FPGA Logic Synthesis Compilers

    Authors: Zhihao Xu, Shikai Guo, Guilin Zhao, Peiyu Zou, Xiaochen Li, He Jiang

    Abstract: Field Programmable Gate Array (FPGA) logic synthesis compilers (e.g., Vivado, Iverilog, Yosys, and Quartus) are widely applied in Electronic Design Automation (EDA), such as the development of FPGA programs. However, defects (i.e., incorrect synthesis) in logic synthesis compilers may lead to unexpected behaviors in target applications, posing security risks. Therefore, it is crucial to thoroughly…

    Submitted 1 July, 2024; originally announced July 2024.

  23. arXiv:2407.05721  [pdf, other]

    cs.CL

    PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation

    Authors: Jinpeng Hu, Tengteng Dong, Luo Gang, Hui Ma, Peng Zou, Xiao Sun, Dan Guo, Xun Yang, Meng Wang

    Abstract: Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In t…

    Submitted 6 December, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Computational Social Systems. https://github.com/MACLAB-HFUT/PsycoLLM

  24. arXiv:2407.00869  [pdf, other]

    cs.CL cs.AI

    Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks

    Authors: Yue Zhou, Henry Peng Zou, Barbara Di Eugenio, Yang Zhang

    Abstract: We find that language models have difficulties generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits an aligned language model for malicious output. Specifically, we query the model to generate a fallacious y…

    Submitted 23 September, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to the main conference of EMNLP 2024

  25. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 2 October, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: Accepted by EMNLP 2024 main conference

  26. arXiv:2406.05392  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey

    Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Yichen Wang, Kuofeng Gao, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Shenghao Wu, Zongxing Xie, Weimin Lyu, Sihong He, Lu Cheng, Haohan Wang, Jun Zhuang

    Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an…

    Submitted 21 October, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  27. arXiv:2404.15954  [pdf, other]

    cs.IR cs.LG

    Mixed Supervised Graph Contrastive Learning for Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

    Abstract: Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  28. arXiv:2404.15592  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction

    Authors: Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea

    Abstract: Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction.…

    Submitted 19 July, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted by ACL 2024 (Findings) - Scores: Soundness - 4/4/4, Dataset - 4/4/4, Overall Assessment - 4/3.5/3.5, Meta - 4

  29. arXiv:2404.08886  [pdf, other]

    cs.CV cs.AI cs.CL cs.IR cs.LG

    EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM

    Authors: Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea

    Abstract: In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by NAACL 2024 Industry Track

  30. arXiv:2404.08638  [pdf, other]

    cs.IT

    Age of Information Optimization and State Error Analysis for Correlated Multi-Process Multi-Sensor Systems

    Authors: Egemen Erbayat, Ali Maatouk, Peng Zou, Suresh Subramaniam

    Abstract: In this paper, we examine a multi-sensor system where each sensor may monitor more than one time-varying information process and send status updates to a remote monitor over a common channel. We consider that each sensor's status update may contain information about more than one information process in the system subject to the system's constraints. To investigate the impact of this correlation on…

    Submitted 20 August, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  31. arXiv:2402.17785  [pdf, other]

    cs.SD cs.AI eess.AS

    ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

    Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

    Abstract: Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Eval…

    Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  32. arXiv:2310.14627  [pdf, other]

    cs.CL cs.LG

    CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification

    Authors: Henry Peng Zou, Yue Zhou, Cornelia Caragea, Doina Caragea

    Abstract: The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge,…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by ISCRAM 2023

  33. arXiv:2310.14583  [pdf, other]

    cs.CL cs.LG

    JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification

    Authors: Henry Peng Zou, Cornelia Caragea

    Abstract: Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data. However, existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation. In this paper, we propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning an…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 (Main)

  34. arXiv:2310.14577  [pdf, other]

    cs.CL cs.LG

    DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank

    Authors: Henry Peng Zou, Yue Zhou, Weizhi Zhang, Cornelia Caragea

    Abstract: During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approache…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted by EMNLP 2023 (Findings)

  35. arXiv:2310.02593  [pdf]

    cs.AI

    A ModelOps-based Framework for Intelligent Medical Knowledge Extraction

    Authors: Hongxin Ding, Peinie Zou, Zhiyuan Wang, Junfeng Zhao, Yasha Wang, Qiang Zhou

    Abstract: Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts such as doctors, to utilize knowledge extracti…

    Submitted 4 October, 2023; originally announced October 2023.

  36. arXiv:2309.16247  [pdf, other]

    eess.AS cs.SD

    PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

    Authors: Xiang Lyu, Yuhang Cao, Qing Wang, Jingjing Yin, Yuguang Yang, Pengpeng Zou, Yanni Hu, Heng Lu

    Abstract: Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges due to factors such as speaker overlap, speaker variability, background noise, and reverberation. In this study, we propose PP-MeT system, a real-world personal…

    Submitted 28 September, 2023; originally announced September 2023.

  37. arXiv:2304.12256  [pdf, other]

    cs.IT

    How Costly Was That (In)Decision?

    Authors: Peng Zou, Ali Maatouk, Jin Zhang, Suresh Subramaniam

    Abstract: In this paper, we introduce a new metric, named Penalty upon Decision (PuD), for measuring the impact of communication delays and state changes at the source on a remote decision maker. Specifically, the metric quantifies the performance degradation at the decision maker's side due to delayed, erroneous, and (possibly) missed decisions. We clarify the rationale for the metric and derive closed-for…

    Submitted 24 April, 2023; originally announced April 2023.

  38. arXiv:2303.10561  [pdf, other]

    cs.CV

    Spatial-temporal Transformer for Affective Behavior Analysis

    Authors: Peng Zou, Rui Wang, Kehua Wen, Yasi Peng, Xiao Sun

    Abstract: The in-the-wild affective behavior analysis has been an important study. In this paper, we submit our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes V-A Estimation, Facial Expression Classification and AU Detection Sub-challenges. We propose a Transformer Encoder with Multi-Head Attention framework to learn the distribution of both…

    Submitted 19 March, 2023; originally announced March 2023.

  39. arXiv:2208.03051  [pdf, other]

    cs.CV cs.CL cs.SD eess.AS eess.IV

    Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis

    Authors: Jia Li, Ziyang Zhang, Junjie Lang, Yueqi Jiang, Liuwei An, Peng Zou, Yangyang Xu, Sheng Gao, Jie Lin, Chunxiao Fan, Xiao Sun, Meng Wang

    Abstract: In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic,…

    Submitted 12 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

    Comments: 8 pages, 2 figures, to appear in MuSe 2022 (ACM MM2022 co-located workshop)

  40. arXiv:2206.08224  [pdf, other]

    cs.CV

    Multi scale Feature Extraction and Fusion for Online Knowledge Distillation

    Authors: Panpan Zou, Yinglei Teng, Tao Niu

    Abstract: Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect the further exploration of the representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distilla…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: 12 pages, 3 figures

  41. arXiv:2206.08186  [pdf, other]

    cs.CV cs.AI

    Asymptotic Soft Cluster Pruning for Deep Neural Networks

    Authors: Tao Niu, Yinglei Teng, Panpan Zou

    Abstract: Filter pruning method introduces structural sparsity by removing selected filters and is thus particularly effective for reducing complexity. Previous works empirically prune networks from the point of view that filter with smaller norm contributes less to the final results. However, such criteria has been proven sensitive to the distribution of filters, and the accuracy may hard to recover since…

    Submitted 16 June, 2022; originally announced June 2022.

  42. arXiv:2202.12081  [pdf, other]

    cs.IR

    Community Trend Prediction on Heterogeneous Graph in E-commerce

    Authors: Jiahao Yuan, Zhao Li, Pengcheng Zou, Xuan Gao, Jinwei Pan, Wendi Ji, Xiaoling Wang

    Abstract: In online shopping, ever-changing fashion trends make merchants need to prepare more differentiated products to meet the diversified demands, and e-commerce platforms need to capture the market trend with a prophetic vision. For the trend prediction, the attribute tags, as the essential description of items, can genuinely reflect the decision basis of consumers. However, few existing works explore…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: Published as a full paper at WSDM 2022

  43. arXiv:2201.02968  [pdf, other]

    cs.LG cs.AI cs.NI

    An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic

    Authors: Tao Niu, Yinglei Teng, Zhu Han, Panpan Zou

    Abstract: Recently, the applications of deep neural network (DNN) have been very prominent in many fields such as computer vision (CV) and natural language processing (NLP) due to its superior feature extraction performance. However, the high-dimension parameter model and large-scale mathematical calculation restrict the execution efficiency, especially for Internet of Things (IoT) devices. Different from t…

    Submitted 9 January, 2022; originally announced January 2022.

  44. arXiv:2112.05725  [pdf, ps, other]

    cs.DS cs.CC

    Beyond the Longest Letter-duplicated Subsequence Problem

    Authors: Wenfeng Lai, Adiesha Liyanage, Binhai Zhu, Peng Zou

    Abstract: Given a sequence $S$ of length $n$, a letter-duplicated subsequence is a subsequence of $S$ in the form of $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ with $x_i\in\Sigma$, $x_j\neq x_{j+1}$ and $d_i\geq 2$ for all $i$ in $[k]$ and $j$ in $[k-1]$. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of $S$ can be easily obtained. In this paper, we focus on two variants of this…

    Submitted 4 January, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

    Comments: 18 pages

    MSC Class: 68W01; 68W32

  45. arXiv:2110.05020  [pdf, other]

    cs.SD cs.MM eess.AS

    MELONS: generating melody with long-term structure using transformers and structure graph

    Authors: Yi Zou, Pei Zou, Yi Zhao, Kaixiang Zhang, Ran Zhang, Xiaorui Wang

    Abstract: The creation of long melody sequences requires effective expression of coherent musical structure. However, there is no clear representation of musical structure. Recent works on music generation have suggested various approaches to deal with the structural information of music, but generating a full-song melody with clear long-term structure remains a challenge. In this paper, we propose MELONS,…

    Submitted 3 November, 2021; v1 submitted 11 October, 2021; originally announced October 2021.

  46. arXiv:2109.14062  [pdf, ps, other]

    cs.IT

    Overage and Staleness Metrics for Status Update Systems

    Authors: Peng Zou, Jin Zhang, Xianglin Wei, Suresh Subramaniam

    Abstract: Status update systems consist of sensors that take measurements of a physical parameter and transmit them to a remote receiver. Age of Information (AoI) has been studied extensively as a metric for the freshness of information in such systems with and without an enforced hard or soft deadline. In this paper, we propose three metrics for status update systems to measure the ability of different que…

    Submitted 9 October, 2021; v1 submitted 28 September, 2021; originally announced September 2021.

  47. arXiv:2011.04166  [pdf, other]

    cs.AI

    Distant Supervision for E-commerce Query Segmentation via Attention Network

    Authors: Zhao Li, Donghui Ding, Pengcheng Zou, Yu Gong, Xi Chen, Ji Zhang, Jianliang Gao, Youxi Wu, Yucong Duan

    Abstract: The booming online e-commerce platforms demand highly accurate approaches to segment queries that carry the product requirements of consumers. Recent works have shown that the supervised methods, especially those based on deep learning, are attractive for achieving better performance on the problem of query segmentation. However, the lack of labeled data is still a big challenge for training a dee…

    Submitted 8 November, 2020; originally announced November 2020.

  48. arXiv:2003.14069  [pdf, ps, other]

    cs.IT eess.SY

    On Age and Value of Information in Status Update Systems

    Authors: Peng Zou, Omur Ozel, Suresh Subramaniam

    Abstract: Motivated by the inherent value of packets arising in many cyber-physical applications (e.g., due to precision of the information content or an alarm message), we consider status update systems with update packets carrying values as well as their generation time stamps. Once generated, a status update packet has a random initial value and a deterministic deadline after which it is not useful (ulti…

    Submitted 31 March, 2020; originally announced March 2020.

  49. arXiv:2003.13577  [pdf, ps, other]

    cs.IT cs.NI

    Maintaining Information Freshness in Power-Efficient Status Update Systems

    Authors: Parisa Rafiee, Peng Zou, Omur Ozel, Suresh Subramaniam

    Abstract: This paper is motivated by emerging edge computing systems which consist of sensor nodes that acquire and process information and then transmit status updates to an edge receiver for possible further processing. As power is a scarce resource at the sensor nodes, the system is modeled as a tandem computation-transmission queue with power-efficient computing. Jobs arrive at the computation server wi…

    Submitted 30 March, 2020; originally announced March 2020.

  50. arXiv:2002.04778  [pdf, other]

    cs.DS cs.CC

    Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms

    Authors: Manuel Lafond, Binhai Zhu, Peng Zou

    Abstract: Recently, due to the genomic sequence analysis in several types of cancer, the genomic data based on {\em copy number profiles} ({\em CNP} for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific gene or segment of interest. In this paper, we present two streams of results. The first is the nega…

    Submitted 11 February, 2020; originally announced February 2020.

    Comments: 16 pages, 3 figures

    MSC Class: 68

    ACM Class: F.2.2; J.3