-
A Survey on Large Language Model based Human-Agent Systems
Authors:
Henry Peng Zou,
Wei-Chieh Huang,
Yaozu Wu,
Yankai Chen,
Chunyu Miao,
Hoang Nguyen,
Yue Zhou,
Weizhi Zhang,
Liancheng Fang,
Langzhou He,
Yangning Li,
Yuwei Cao,
Dongyuan Li,
Renhe Jiang,
Philip S. Yu
Abstract:
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world app…
▽ More
Recent advances in large language models (LLMs) have sparked growing interest in building fully autonomous agents. However, fully autonomous LLM-based agents still face significant challenges, including limited reliability due to hallucinations, difficulty in handling complex tasks, and substantial safety and ethical risks, all of which limit their feasibility and trustworthiness in real-world applications. To overcome these limitations, LLM-based human-agent systems (LLM-HAS) incorporate human-provided information, feedback, or control into the agent system to enhance system performance, reliability and safety. This paper provides the first comprehensive and structured survey of LLM-HAS. It clarifies fundamental concepts, systematically presents core components shaping these systems, including environment & profiling, human feedback, interaction types, orchestration and communication, explores emerging applications, and discusses unique challenges and opportunities. By consolidating current knowledge and offering a structured overview, we aim to foster further research and innovation in this rapidly evolving interdisciplinary field. Paper lists and resources are available at https://github.com/HenryPengZou/Awesome-LLM-Based-Human-Agent-System-Papers.
△ Less
Submitted 1 May, 2025;
originally announced May 2025.
-
Dewey Long Context Embedding Model: A Technical Report
Authors:
Dun Zhang,
Panxiang Zou,
Yudong Zhou
Abstract:
This technical report presents the training methodology and evaluation results of the open-source dewey_en_beta embedding model. The increasing demand for retrieval-augmented generation (RAG) systems and the expanding context window capabilities of large language models (LLMs) have created critical challenges for conventional embedding models. Current approaches often struggle to maintain semantic…
▽ More
This technical report presents the training methodology and evaluation results of the open-source dewey_en_beta embedding model. The increasing demand for retrieval-augmented generation (RAG) systems and the expanding context window capabilities of large language models (LLMs) have created critical challenges for conventional embedding models. Current approaches often struggle to maintain semantic coherence when processing documents exceeding typical sequence length limitations, significantly impacting retrieval performance in knowledge-intensive applications. This paper presents dewey_en_beta, a novel text embedding model that achieves excellent performance on MTEB (Eng, v2) and LongEmbed benchmark while supporting 128K token sequences. Our technical contribution centers on chunk alignment training, an innovative methodology that enables the simultaneous generation of localized chunk embeddings and global document-level representations through distillation. Information regarding the model release can be found at https://huggingface.co/infgrad/dewey_en_beta.
△ Less
Submitted 26 March, 2025;
originally announced March 2025.
-
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
Authors:
Shijia Zhao,
Qiming Xia,
Xusheng Guo,
Pufan Zou,
Maoji Zheng,
Hai Wu,
Chenglu Wen,
Cheng Wang
Abstract:
Recently, sparsely-supervised 3D object detection has gained great attention, achieving performance close to fully-supervised 3D objectors while requiring only a few annotated instances. Nevertheless, these methods suffer challenges when accurate labels are extremely absent. In this paper, we propose a boosting strategy, termed SP3D, explicitly utilizing the cross-modal semantic prompts generated…
▽ More
Recently, sparsely-supervised 3D object detection has gained great attention, achieving performance close to fully-supervised 3D objectors while requiring only a few annotated instances. Nevertheless, these methods suffer challenges when accurate labels are extremely absent. In this paper, we propose a boosting strategy, termed SP3D, explicitly utilizing the cross-modal semantic prompts generated from Large Multimodal Models (LMMs) to boost the 3D detector with robust feature discrimination capability under sparse annotation settings. Specifically, we first develop a Confident Points Semantic Transfer (CPST) module that generates accurate cross-modal semantic prompts through boundary-constrained center cluster selection. Based on these accurate semantic prompts, which we treat as seed points, we introduce a Dynamic Cluster Pseudo-label Generation (DCPG) module to yield pseudo-supervision signals from the geometry shape of multi-scale neighbor points. Additionally, we design a Distribution Shape score (DS score) that chooses high-quality supervision signals for the initial training of the 3D detector. Experiments on the KITTI dataset and Waymo Open Dataset (WOD) have validated that SP3D can enhance the performance of sparsely supervised detectors by a large margin under meager labeling conditions. Moreover, we verified SP3D in the zero-shot setting, where its performance exceeded that of the state-of-the-art methods. The code is available at https://github.com/xmuqimingxia/SP3D.
△ Less
Submitted 9 March, 2025;
originally announced March 2025.
-
Semi-Supervised In-Context Learning: A Baseline Study
Authors:
Zhengyao Gu,
Henry Peng Zou,
Yankai Chen,
Aiwei Liu,
Weizhi Zhang,
Philip S. Yu
Abstract:
Most existing work in data selection for In-Context Learning (ICL) has focused on constructing demonstrations from ground truth annotations, with limited attention given to selecting reliable self-generated annotations. In this work, we propose a three-step semi-supervised ICL framework: annotation generation, demonstration selection, and semi-supervised inference. Our baseline, Naive-SemiICL, whi…
▽ More
Most existing work in data selection for In-Context Learning (ICL) has focused on constructing demonstrations from ground truth annotations, with limited attention given to selecting reliable self-generated annotations. In this work, we propose a three-step semi-supervised ICL framework: annotation generation, demonstration selection, and semi-supervised inference. Our baseline, Naive-SemiICL, which prompts select high-confidence self-generated demonstrations for ICL prompting, outperforms a 16-shot baseline by an average of 9.94% across 16 datasets. We further introduce IterPSD, an annotation approach that refines pseudo-demonstrations iteratively, achieving up to 6.8% additional gains in classification tasks. Lastly, we reveal a scaling law for semi-supervised ICL, where models achieve optimal performance with over 1,000 demonstrations.
△ Less
Submitted 4 March, 2025;
originally announced March 2025.
-
LLMInit: A Free Lunch from Large Language Models for Selective Initialization of Recommendation
Authors:
Weizhi Zhang,
Liangwei Yang,
Wooseong Yang,
Henry Peng Zou,
Yuqing Liu,
Ke Xu,
Sourav Medya,
Philip S. Yu
Abstract:
Collaborative filtering models, particularly graph-based approaches, have demonstrated strong performance in capturing user-item interactions for recommendation systems. However, they continue to struggle in cold-start and data-sparse scenarios. The emergence of large language models (LLMs) like GPT and LLaMA presents new possibilities for enhancing recommendation performance, especially in cold-s…
▽ More
Collaborative filtering models, particularly graph-based approaches, have demonstrated strong performance in capturing user-item interactions for recommendation systems. However, they continue to struggle in cold-start and data-sparse scenarios. The emergence of large language models (LLMs) like GPT and LLaMA presents new possibilities for enhancing recommendation performance, especially in cold-start settings. Despite their promise, LLMs pose challenges related to scalability and efficiency due to their high computational demands and limited ability to model complex user-item relationships effectively. In this work, we introduce a novel perspective on leveraging LLMs for CF model initialization. Through experiments, we uncover an embedding collapse issue when scaling CF models to larger embedding dimensions. To effectively harness large-scale LLM embeddings, we propose innovative selective initialization strategies utilizing random, uniform, and variance-based index sampling. Our comprehensive evaluation on multiple real-world datasets demonstrates significant performance gains across various CF models while maintaining a lower computational cost compared to existing LLM-based recommendation approaches.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency
Authors:
Henry Peng Zou,
Zhengyao Gu,
Yue Zhou,
Yankai Chen,
Weizhi Zhang,
Liancheng Fang,
Yibo Wang,
Yangning Li,
Kay Liu,
Philip S. Yu
Abstract:
Test-time computing approaches, which leverage additional computational resources during inference, have been proven effective in enhancing large language model performance. This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data-it classifies an input instance by considering not only th…
▽ More
Test-time computing approaches, which leverage additional computational resources during inference, have been proven effective in enhancing large language model performance. This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data-it classifies an input instance by considering not only the model's prediction on that instance but also on neighboring unlabeled instances. We evaluate TestNUC across eight diverse datasets, spanning intent classification, topic mining, domain discovery, and emotion detection, demonstrating its consistent superiority over baseline methods such as standard prompting and self-consistency. Furthermore, TestNUC can be seamlessly integrated with existing test-time computing approaches, substantially boosting their performance. Our analysis reveals that TestNUC scales effectively with increasing amounts of unlabeled data and performs robustly across different embedding models, making it practical for real-world applications. Our code is available at https://github.com/HenryPengZou/TestNUC.
△ Less
Submitted 26 February, 2025;
originally announced February 2025.
-
GLEAN: Generalized Category Discovery with Diverse and Quality-Enhanced LLM Feedback
Authors:
Henry Peng Zou,
Siffi Singh,
Yi Nian,
Jianfeng He,
Jason Cai,
Saab Mansour,
Hang Su
Abstract:
Generalized Category Discovery (GCD) is a practical and challenging open-world task that aims to recognize both known and novel categories in unlabeled data using limited labeled data from known categories. Due to the lack of supervision, previous GCD methods face significant challenges, such as difficulty in rectifying errors for confusing instances, and inability to effectively uncover and lever…
▽ More
Generalized Category Discovery (GCD) is a practical and challenging open-world task that aims to recognize both known and novel categories in unlabeled data using limited labeled data from known categories. Due to the lack of supervision, previous GCD methods face significant challenges, such as difficulty in rectifying errors for confusing instances, and inability to effectively uncover and leverage the semantic meanings of discovered clusters. Therefore, additional annotations are usually required for real-world applicability. However, human annotation is extremely costly and inefficient. To address these issues, we propose GLEAN, a unified framework for generalized category discovery that actively learns from diverse and quality-enhanced LLM feedback. Our approach leverages three different types of LLM feedback to: (1) improve instance-level contrastive features, (2) generate category descriptions, and (3) align uncertain instances with LLM-selected category descriptions. Extensive experiments demonstrate the superior performance of \MethodName over state-of-the-art models across diverse datasets, metrics, and supervision settings. Our code is available at https://github.com/amazon-science/Glean.
△ Less
Submitted 25 February, 2025;
originally announced February 2025.
-
Multi-Agent Autonomous Driving Systems with Large Language Models: A Survey of Recent Advances
Authors:
Yaozu Wu,
Dongyuan Li,
Yankai Chen,
Renhe Jiang,
Henry Peng Zou,
Liancheng Fang,
Zhen Wang,
Philip S. Yu
Abstract:
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited per…
▽ More
Autonomous Driving Systems (ADSs) are revolutionizing transportation by reducing human intervention, improving operational efficiency, and enhancing safety. Large Language Models (LLMs), known for their exceptional planning and reasoning capabilities, have been integrated into ADSs to assist with driving decision-making. However, LLM-based single-agent ADSs face three major challenges: limited perception, insufficient collaboration, and high computational demands. To address these issues, recent advancements in LLM-based multi-agent ADSs have focused on improving inter-agent communication and cooperation. This paper provides a frontier survey of LLM-based multi-agent ADSs. We begin with a background introduction to related concepts, followed by a categorization of existing LLM-based approaches based on different agent interaction modes. We then discuss agent-human interactions in scenarios where LLM-based agents engage with humans. Finally, we summarize key applications, datasets, and challenges in this field to support future research (https://anonymous.4open.science/r/LLM-based_Multi-agent_ADS-3A5C/README.md).
△ Less
Submitted 23 February, 2025;
originally announced February 2025.
-
TabGen-ICL: Residual-Aware In-Context Example Selection for Tabular Data Generation
Authors:
Liancheng Fang,
Aiwei Liu,
Hengrui Zhang,
Henry Peng Zou,
Weizhi Zhang,
Philip S. Yu
Abstract:
Large Language models (LLMs) have achieved encouraging results in tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. We observe that using randomly selected in-context examples hampers the LLM's performance, resulting in sub-optimal generation quality. To…
▽ More
Large Language models (LLMs) have achieved encouraging results in tabular data generation. However, existing approaches require fine-tuning, which is computationally expensive. This paper explores an alternative: prompting a fixed LLM with in-context examples. We observe that using randomly selected in-context examples hampers the LLM's performance, resulting in sub-optimal generation quality. To address this, we propose a novel in-context learning framework: TabGen-ICL, to enhance the in-context learning ability of LLMs for tabular data generation. TabGen-ICL operates iteratively, retrieving a subset of real samples that represent the residual between currently generated samples and true data distributions. This approach serves two purposes: locally, it provides more effective in-context learning examples for the LLM in each iteration; globally, it progressively narrows the gap between generated and real data. Extensive experiments on five real-world tabular datasets demonstrate that TabGen-ICL significantly outperforms the random selection strategy. Specifically, it reduces the error rate by a margin of $3.5\%-42.2\%$ on fidelity metrics. We demonstrate for the first time that prompting a fixed LLM can yield high-quality synthetic tabular data. The code is provided in the \href{https://github.com/fangliancheng/TabGEN-ICL}{link}.
△ Less
Submitted 22 February, 2025;
originally announced February 2025.
-
Age of Information Optimization with Preemption Strategies for Correlated Systems
Authors:
Egemen Erbayat,
Ali Maatouk,
Peng Zou,
Suresh Subramaniam
Abstract:
In this paper, we examine a multi-sensor system where each sensor monitors multiple dynamic information processes and transmits updates over a shared communication channel. These updates may include correlated information across the various processes. In this type of system, we analyze the impact of preemption, where ongoing transmissions are replaced by newer updates, on minimizing the Age of Inf…
▽ More
In this paper, we examine a multi-sensor system where each sensor monitors multiple dynamic information processes and transmits updates over a shared communication channel. These updates may include correlated information across the various processes. In this type of system, we analyze the impact of preemption, where ongoing transmissions are replaced by newer updates, on minimizing the Age of Information (AoI). While preemption is optimal in some scenarios, its effectiveness in multi-sensor correlated systems remains an open question. To address this, we introduce a probabilistic preemption policy, where the source sensor preemption decision is stochastic. We derive closed-form expressions for the AoI and frame its optimization as a sum of linear ratios problem, a well-known NP-hard problem. To navigate this complexity, we establish an upper bound on the iterations using a branch-and-bound algorithm by leveraging a reformulation of the problem. This analysis reveals linear scalability with the number of processes and a logarithmic dependency on the reciprocal of the error that shows the optimal solution can be efficiently found. Building on these findings, we show how different correlation matrices can lead to distinct optimal preemption strategies. Interestingly, we demonstrate that the diversity of processes within the sensors' packets, as captured by the correlation matrix, plays a more significant role in preemption priority than the number of updates.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
Exploiting Ensemble Learning for Cross-View Isolated Sign Language Recognition
Authors:
Fei Wang,
Kun Li,
Yiqi Nie,
Zhangling Duan,
Peng Zou,
Zhiliang Wu,
Yuwei Wang,
Yanyan Wei
Abstract:
In this paper, we present our solution to the Cross-View Isolated Sign Language Recognition (CV-ISLR) challenge held at WWW 2025. CV-ISLR addresses a critical issue in traditional Isolated Sign Language Recognition (ISLR), where existing datasets predominantly capture sign language videos from a frontal perspective, while real-world camera angles often vary. To accurately recognize sign language f…
▽ More
In this paper, we present our solution to the Cross-View Isolated Sign Language Recognition (CV-ISLR) challenge held at WWW 2025. CV-ISLR addresses a critical issue in traditional Isolated Sign Language Recognition (ISLR), where existing datasets predominantly capture sign language videos from a frontal perspective, while real-world camera angles often vary. To accurately recognize sign language from different viewpoints, models must be capable of understanding gestures from multiple angles, making cross-view recognition challenging. To address this, we explore the advantages of ensemble learning, which enhances model robustness and generalization across diverse views. Our approach, built on a multi-dimensional Video Swin Transformer model, leverages this ensemble strategy to achieve competitive performance. Finally, our solution ranked 3rd in both the RGB-based ISLR and RGB-D-based ISLR tracks, demonstrating the effectiveness in handling the challenges of cross-view recognition. The code is available at: https://github.com/Jiafei127/CV_ISLR_WWW2025.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap
Authors:
Weizhi Zhang,
Yuanchen Bei,
Liangwei Yang,
Henry Peng Zou,
Peilin Zhou,
Aiwei Liu,
Yinghui Li,
Hao Chen,
Jianling Wang,
Yu Wang,
Feiran Huang,
Sheng Zhou,
Jiajun Bu,
Allen Lin,
James Caverlee,
Fakhri Karray,
Irwin King,
Philip S. Yu
Abstract:
Cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large langu…
▽ More
Cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large language models (LLMs) have achieved tremendous success and possess strong capabilities in modeling user and item information, providing new potential for cold-start recommendations. However, the research community on CSR still lacks a comprehensive review and reflection in this field. Based on this, in this paper, we stand in the context of the era of large language models and provide a comprehensive review and discussion on the roadmap, related literature, and future directions of CSR. Specifically, we have conducted an exploration of the development path of how existing CSR utilizes information, from content features, graph relations, and domain information, to the world knowledge possessed by large language models, aiming to provide new insights for both the research and industrial communities on CSR. Related resources of cold-start recommendations are collected and continuously updated for the community in https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation.
△ Less
Submitted 16 January, 2025; v1 submitted 3 January, 2025;
originally announced January 2025.
-
AdaCo: Overcoming Visual Foundation Model Noise in 3D Semantic Segmentation via Adaptive Label Correction
Authors:
Pufan Zou,
Shijia Zhao,
Weijie Huang,
Qiming Xia,
Chenglu Wen,
Wei Li,
Cheng Wang
Abstract:
Recently, Visual Foundation Models (VFMs) have shown a remarkable generalization performance in 3D perception tasks. However, their effectiveness in large-scale outdoor datasets remains constrained by the scarcity of accurate supervision signals, the extensive noise caused by variable outdoor conditions, and the abundance of unknown objects. In this work, we propose a novel label-free learning met…
▽ More
Recently, Visual Foundation Models (VFMs) have shown a remarkable generalization performance in 3D perception tasks. However, their effectiveness in large-scale outdoor datasets remains constrained by the scarcity of accurate supervision signals, the extensive noise caused by variable outdoor conditions, and the abundance of unknown objects. In this work, we propose a novel label-free learning method, Adaptive Label Correction (AdaCo), for 3D semantic segmentation. AdaCo first introduces the Cross-modal Label Generation Module (CLGM), providing cross-modal supervision with the formidable interpretive capabilities of the VFMs. Subsequently, AdaCo incorporates the Adaptive Noise Corrector (ANC), updating and adjusting the noisy samples within this supervision iteratively during training. Moreover, we develop an Adaptive Robust Loss (ARL) function to modulate each sample's sensitivity to noisy supervision, preventing potential underfitting issues associated with robust loss. Our proposed AdaCo can effectively mitigate the performance limitations of label-free learning networks in 3D semantic segmentation tasks. Extensive experiments on two outdoor benchmark datasets highlight the superior performance of our method.
△ Less
Submitted 24 December, 2024;
originally announced December 2024.
-
COOD: Concept-based Zero-shot OOD Detection
Authors:
Zhendong Liu,
Yi Nian,
Henry Peng Zou,
Li Li,
Xiyang Hu,
Yue Zhao
Abstract:
How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large languag…
▽ More
How can models effectively detect out-of-distribution (OOD) samples in complex, multi-label settings without extensive retraining? Existing OOD detection methods struggle to capture the intricate semantic relationships and label co-occurrences inherent in multi-label settings, often requiring large amounts of training data and failing to generalize to unseen label combinations. While large language models have revolutionized zero-shot OOD detection, they primarily focus on single-label scenarios, leaving a critical gap in handling real-world tasks where samples can be associated with multiple interdependent labels. To address these challenges, we introduce COOD, a novel zero-shot multi-label OOD detection framework. COOD leverages pre-trained vision-language models, enhancing them with a concept-based label expansion strategy and a new scoring function. By enriching the semantic space with both positive and negative concepts for each label, our approach models complex label dependencies, precisely differentiating OOD samples without the need for additional training. Extensive experiments demonstrate that our method significantly outperforms existing approaches, achieving approximately 95% average AUROC on both VOC and COCO datasets, while maintaining robust performance across varying numbers of labels and different types of OOD samples.
△ Less
Submitted 15 November, 2024;
originally announced November 2024.
-
Sequential LLM Framework for Fashion Recommendation
Authors:
Han Liu,
Xianfeng Tang,
Tianlang Chen,
Jiapeng Liu,
Indu Indu,
Henry Peng Zou,
Peng Dai,
Roberto Fernandez Galan,
Michael D Porter,
Dongmei Jia,
Ning Zhang,
Lian Xiong
Abstract:
The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely studied, most are designed for general e-commerce problems and struggle with the unique challenges of the fashion domain. To address these issues, we prop…
▽ More
The fashion industry is one of the leading domains in the global e-commerce sector, prompting major online retailers to employ recommendation systems for product suggestions and customer convenience. While recommendation systems have been widely studied, most are designed for general e-commerce problems and struggle with the unique challenges of the fashion domain. To address these issues, we propose a sequential fashion recommendation framework that leverages a pre-trained large language model (LLM) enhanced with recommendation-specific prompts. Our framework employs parameter-efficient fine-tuning with extensive fashion data and introduces a novel mix-up-based retrieval technique for translating text into relevant product suggestions. Extensive experiments show our proposed framework significantly enhances fashion recommendation performance.
△ Less
Submitted 15 October, 2024;
originally announced October 2024.
-
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
Authors:
Sijing Chen,
Yuan Feng,
Laipeng He,
Tianwei He,
Wendi He,
Yanni Hu,
Bin Lin,
Yiting Lin,
Yu Pan,
Pengfei Tan,
Chengwei Tian,
Chen Wang,
Zhicheng Wang,
Ruoye Xie,
Jixun Yao,
Quanlei Yan,
Yuguang Yang,
Jianhao Ye,
Jingjing Yin,
Yanzhen Yu,
Huimin Zhang,
Xiang Zhang,
Guangcheng Zhao,
Hongbin Zhou,
Pengpeng Zou
Abstract:
With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-…
▽ More
With the advent of the big data and large language model era, zero-shot personalized rapid customization has emerged as a significant trend. In this report, we introduce Takin AudioLLM, a series of techniques and models, mainly including Takin TTS, Takin VC, and Takin Morphing, specifically designed for audiobook production. These models are capable of zero-shot speech production, generating high-quality speech that is nearly indistinguishable from real human speech and facilitating individuals to customize the speech content according to their own needs. Specifically, we first introduce Takin TTS, a neural codec language model that builds upon an enhanced neural speech codec and a multi-task training framework, capable of generating high-fidelity natural speech in a zero-shot way. For Takin VC, we advocate an effective content and timbre joint modeling approach to improve the speaker similarity, while advocating for a conditional flow matching based decoder to further enhance its naturalness and expressiveness. Last, we propose the Takin Morphing system with highly decoupled and advanced timbre and prosody modeling approaches, which enables individuals to customize speech production with their preferred timbre and prosody in a precise and controllable manner. Extensive experiments validate the effectiveness and robustness of our Takin AudioLLM series models. For detailed demos, please refer to https://everest-ai.github.io/takinaudiollm/.
△ Less
Submitted 23 September, 2024; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Towards Data Contamination Detection for Modern Large Language Models: Limitations, Inconsistencies, and Oracle Challenges
Authors:
Vinay Samuel,
Yue Zhou,
Henry Peng Zou
Abstract:
As large language models achieve increasingly impressive results, questions arise about whether such performance is from generalizability or mere data memorization. Thus, numerous data contamination detection methods have been proposed. However, these approaches are often validated with traditional benchmarks and early-stage LLMs, leaving uncertainty about their effectiveness when evaluating state…
▽ More
As large language models achieve increasingly impressive results, questions arise about whether such performance is from generalizability or mere data memorization. Thus, numerous data contamination detection methods have been proposed. However, these approaches are often validated with traditional benchmarks and early-stage LLMs, leaving uncertainty about their effectiveness when evaluating state-of-the-art LLMs on the contamination of more challenging benchmarks. To address this gap and provide a dual investigation of SOTA LLM contamination status and detection method robustness, we evaluate five contamination detection approaches with four state-of-the-art LLMs across eight challenging datasets often used in modern LLM evaluation. Our analysis reveals that (1) Current methods have non-trivial limitations in their assumptions and practical applications; (2) Notable difficulties exist in detecting contamination introduced during instruction fine-tuning with answer augmentation; and (3) Limited consistencies between SOTA contamination detection techniques. These findings highlight the complexity of contamination detection in advanced LLMs and the urgent need for further research on robust and generalizable contamination evaluation. Our code is available at https://github.com/vsamuel2003/data-contamination.
△ Less
Submitted 8 December, 2024; v1 submitted 15 September, 2024;
originally announced September 2024.
-
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation
Authors:
Ye Bai,
Haonan Chen,
Jitong Chen,
Zhuo Chen,
Yi Deng,
Xiaohong Dong,
Lamtharn Hantrakul,
Weituo Hao,
Qingqing Huang,
Zhongyi Huang,
Dongya Jia,
Feihu La,
Duc Le,
Bochen Li,
Chumin Li,
Hui Li,
Xingxing Li,
Shouda Liu,
Wei-Tsung Lu,
Yiqing Lu,
Andrew Shaw,
Janne Spijkervet,
Yakun Sun,
Bo Wang,
Ju-Chiang Wang
, et al. (13 additional authors not shown)
Abstract:
We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music gene…
▽ More
We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: controlled music generation and post-production editing. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio.
We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music "https://team.doubao.com/seed-music".
△ Less
Submitted 19 September, 2024; v1 submitted 13 September, 2024;
originally announced September 2024.
-
SymPAC: Scalable Symbolic Music Generation With Prompts And Constraints
Authors:
Haonan Chen,
Jordan B. L. Smith,
Janne Spijkervet,
Ju-Chiang Wang,
Pei Zou,
Bochen Li,
Qiuqiang Kong,
Xingjian Du
Abstract:
Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (for transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences.…
▽ More
Progress in the task of symbolic music generation may be lagging behind other tasks like audio and text generation, in part because of the scarcity of symbolic training data. In this paper, we leverage the greater scale of audio music data by applying pre-trained MIR models (for transcription, beat tracking, structure analysis, etc.) to extract symbolic events and encode them into token sequences. To the best of our knowledge, this work is the first to demonstrate the feasibility of training symbolic generation models solely from auto-transcribed audio data. Furthermore, to enhance the controllability of the trained model, we introduce SymPAC (Symbolic Music Language Model with Prompting And Constrained Generation), which is distinguished by using (a) prompt bars in encoding and (b) a technique called Constrained Generation via Finite State Machines (FSMs) during inference time. We show the flexibility and controllability of this approach, which may be critical in making music AI useful to creators and users.
△ Less
Submitted 9 September, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation
Authors:
Weizhi Zhang,
Liangwei Yang,
Zihe Song,
Henry Peng Zou,
Ke Xu,
Liancheng Fang,
Philip S. Yu
Abstract:
The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equ…
▽ More
The efficiency and scalability of graph convolution networks (GCNs) in training recommender systems (RecSys) have been persistent concerns, hindering their deployment in real-world applications. This paper presents a critical examination of the necessity of graph convolutions during the training phase and introduces an innovative alternative: the Light Post-Training Graph Ordinary-Differential-Equation (LightGODE). Our investigation reveals that the benefits of GCNs are more pronounced during testing rather than training. Motivated by this, LightGODE utilizes a novel post-training graph convolution method that bypasses the computation-intensive message passing of GCNs and employs a non-parametric continuous graph ordinary-differential-equation (ODE) to dynamically model node representations. This approach drastically reduces training time while achieving fine-grained post-training graph convolution to avoid the distortion of the original training embedding space, termed the embedding discrepancy issue. We validate our model across several real-world datasets of different scales, demonstrating that LightGODE not only outperforms GCN-based models in terms of efficiency and effectiveness but also significantly mitigates the embedding discrepancy commonly associated with deeper graph convolution layers. Our LightGODE challenges the prevailing paradigms in RecSys training and suggests re-evaluating the role of graph convolutions, potentially guiding future developments of efficient large-scale graph-based RecSys.
△ Less
Submitted 28 July, 2024; v1 submitted 26 July, 2024;
originally announced July 2024.
-
PersonaGym: Evaluating Persona Agents and LLMs
Authors:
Vinay Samuel,
Henry Peng Zou,
Yue Zhou,
Shreyas Chaudhari,
Ashwin Kalyan,
Tanmay Rajpurohit,
Ameet Deshpande,
Karthik Narasimhan,
Vishvak Murahari
Abstract:
Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements thereby broadening the…
▽ More
Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements thereby broadening the scope of agent applications. However, evaluating persona agent performance is incredibly challenging due to the complexity of assessing persona adherence in free-form interactions across various environments that are relevant to each persona agent. We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory for comprehensive large-scale evaluation of persona agents. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in persona agent capabilities across state-of-the-art models. For example, Claude 3.5 Sonnet only has a 2.97% relative improvement in PersonaScore than GPT 3.5 despite being a much more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply enhanced persona agent capabilities thereby highlighting the pressing need for algorithmic and architectural invention towards faithful and performant persona agents.
△ Less
Submitted 18 December, 2024; v1 submitted 25 July, 2024;
originally announced July 2024.
-
A Novel HDL Code Generator for Effectively Testing FPGA Logic Synthesis Compilers
Authors:
Zhihao Xu,
Shikai Guo,
Guilin Zhao,
Peiyu Zou,
Xiaochen Li,
He Jiang
Abstract:
Field Programmable Gate Array (FPGA) logic synthesis compilers (e.g., Vivado, Iverilog, Yosys, and Quartus) are widely applied in Electronic Design Automation (EDA), such as the development of FPGA programs.However, defects (i.e., incorrect synthesis) in logic synthesis compilers may lead to unexpected behaviors in target applications, posing security risks. Therefore, it is crucial to thoroughly…
▽ More
Field Programmable Gate Array (FPGA) logic synthesis compilers (e.g., Vivado, Iverilog, Yosys, and Quartus) are widely applied in Electronic Design Automation (EDA), such as the development of FPGA programs.However, defects (i.e., incorrect synthesis) in logic synthesis compilers may lead to unexpected behaviors in target applications, posing security risks. Therefore, it is crucial to thoroughly test logic synthesis compilers to eliminate such defects.Despite several Hardware Design Language (HDL) code generators (e.g., Verismith) have been proposed to find defects in logic synthesis compilers, the effectiveness of these generators is still limited by the simple code generation strategy and the monogeneity of the generated HDL code.This paper proposes LegoHDL, a novel method to generate syntax valid HDL code for comprehensively testing FPGA logic synthesis compilers.LegoHDL can generate more complex and diverse defect-trigger HDL code (e.g., Verilog, VHDL, and SystemVerilog) by leveraging the guidance of abstract syntax tree and the extensive function block libraries of cyber-physical systems. Extensive experiments show that the diversity and defect-trigger capability of HDL code generated by LegoHDL are significantly better than the state-of-the-art method (i.e., Verismith).In three months, LegoHDL has reported 20 new defects--many of which are deep and important; 16 of them have been confirmed.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
PsycoLLM: Enhancing LLM for Psychological Understanding and Evaluation
Authors:
Jinpeng Hu,
Tengteng Dong,
Luo Gang,
Hui Ma,
Peng Zou,
Xiao Sun,
Dan Guo,
Xun Yang,
Meng Wang
Abstract:
Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In t…
▽ More
Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In this paper, we propose a specialized psychological large language model (LLM), named PsycoLLM, trained on a proposed high-quality psychological dataset, including single-turn QA, multi-turn dialogues and knowledge-based QA. Specifically, we construct multi-turn dialogues through a three-step pipeline comprising multi-turn QA generation, evidence judgment, and dialogue refinement. We augment this process with real-world psychological case backgrounds extracted from online platforms, enhancing the relevance and applicability of the generated data. Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis. The experimental results on the benchmark illustrate the effectiveness of PsycoLLM, which demonstrates superior performance compared to other LLMs.
△ Less
Submitted 6 December, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Authors:
Yue Zhou,
Henry Peng Zou,
Barbara Di Eugenio,
Yang Zhang
Abstract:
We find that language models have difficulties generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits an aligned language model for malicious output. Specifically, we query the model to generate a fallacious y…
▽ More
We find that language models have difficulties generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits an aligned language model for malicious output. Specifically, we query the model to generate a fallacious yet deceptively real procedure for the harmful behavior. Since a fallacious procedure is generally considered fake and thus harmless by LLMs, it helps bypass the safeguard mechanism. Yet the output is factually harmful since the LLM cannot fabricate fallacious solutions but proposes truthful ones. We evaluate our approach over five safety-aligned large language models, comparing four previous jailbreak methods, and show that our approach achieves competitive performance with more harmful outputs. We believe the findings could be extended beyond model safety, such as self-verification and hallucination.
△ Less
Submitted 23 September, 2024; v1 submitted 30 June, 2024;
originally announced July 2024.
-
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Authors:
Jiangshu Du,
Yibo Wang,
Wenting Zhao,
Zhongfen Deng,
Shuaiqi Liu,
Renze Lou,
Henry Peng Zou,
Pranav Narayanan Venkit,
Nan Zhang,
Mukund Srinath,
Haoran Ranran Zhang,
Vipul Gupta,
Yinghui Li,
Tao Li,
Fei Wang,
Qin Liu,
Tianlin Liu,
Pengzhi Gao,
Congying Xia,
Chen Xing,
Jiayang Cheng,
Zhaowei Wang,
Ying Su,
Raj Sanjay Shah,
Ruohao Guo
, et al. (15 additional authors not shown)
Abstract:
This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…
▽ More
This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?
This study focuses on the topic of LLMs assist NLP Researchers, particularly examining the effectiveness of LLM in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with "deficiency" labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.
△ Less
Submitted 2 October, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas: A Survey
Authors:
Chengyuan Deng,
Yiqun Duan,
Xin Jin,
Heng Chang,
Yijun Tian,
Han Liu,
Yichen Wang,
Kuofeng Gao,
Henry Peng Zou,
Yiqiao Jin,
Yijia Xiao,
Shenghao Wu,
Zongxing Xie,
Weimin Lyu,
Sihong He,
Lu Cheng,
Haohan Wang,
Jun Zhuang
Abstract:
Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an…
▽ More
Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, and data privacy, to emerging problems like truthfulness and social norms. We critically analyze existing research aimed at understanding, examining, and mitigating these ethical risks. Our survey underscores integrating ethical standards and societal values into the development of LLMs, thereby guiding the development of responsible and ethically aligned language models.
△ Less
Submitted 21 October, 2024; v1 submitted 8 June, 2024;
originally announced June 2024.
-
Mixed Supervised Graph Contrastive Learning for Recommendation
Authors:
Weizhi Zhang,
Liangwei Yang,
Zihe Song,
Henry Peng Zou,
Ke Xu,
Yuanjie Zhu,
Philip S. Yu
Abstract:
Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss…
▽ More
Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss and the contrastive loss. This decoupled design can cause inconsistent optimization direction from different losses, which leads to longer convergence time and even sub-optimal performance. Besides, the self-supervised contrastive loss falls short in alleviating the data sparsity issue in RecSys as it learns to differentiate users/items from different views without providing extra supervised collaborative filtering signals during augmentations. In this paper, we propose Mixed Supervised Graph Contrastive Learning for Recommendation (MixSGCL) to address these concerns. MixSGCL originally integrates the training of recommendation and unsupervised contrastive losses into a supervised contrastive learning loss to align the two tasks within one optimization direction. To cope with the data sparsity issue, instead unsupervised augmentation, we further propose node-wise and edge-wise mixup to mine more direct supervised collaborative filtering signals based on existing user-item interactions. Extensive experiments on three real-world datasets demonstrate that MixSGCL surpasses state-of-the-art methods, achieving top performance on both accuracy and efficiency. It validates the effectiveness of MixSGCL with our coupled design on supervised graph contrastive learning.
△ Less
Submitted 25 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction
Authors:
Henry Peng Zou,
Vinay Samuel,
Yue Zhou,
Weizhi Zhang,
Liancheng Fang,
Zihe Song,
Philip S. Yu,
Cornelia Caragea
Abstract:
Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction.…
▽ More
Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting the implicit ones, lack product images, are often not publicly available, and lack an in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first, publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to include implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k testing data across five domains. We also explore the application of multimodal large language models (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. The contributions of this work include the development and release of ImplicitAVE, and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at https://github.com/HenryPengZou/ImplicitAVE
△ Less
Submitted 19 July, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM
Authors:
Henry Peng Zou,
Gavin Heqing Yu,
Ziwei Fan,
Dan Bu,
Han Liu,
Peng Dai,
Dongmei Jia,
Cornelia Caragea
Abstract:
In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To…
▽ More
In e-commerce, accurately extracting product attribute values from multimodal data is crucial for improving user experience and operational efficiency of retailers. However, previous approaches to multimodal attribute value extraction often struggle with implicit attribute values embedded in images or text, rely heavily on extensive labeled data, and can easily confuse similar attribute values. To address these issues, we introduce EIVEN, a data- and parameter-efficient generative framework that pioneers the use of multimodal LLM for implicit attribute value extraction. EIVEN leverages the rich inherent knowledge of a pre-trained LLM and vision encoder to reduce reliance on labeled data. We also introduce a novel Learning-by-Comparison technique to reduce model confusion by enforcing attribute value comparison and difference identification. Additionally, we construct initial open-source datasets for multimodal implicit attribute value extraction. Our extensive experiments reveal that EIVEN significantly outperforms existing methods in extracting implicit attribute values while requiring less labeled data.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Age of Information Optimization and State Error Analysis for Correlated Multi-Process Multi-Sensor Systems
Authors:
Egemen Erbayat,
Ali Maatouk,
Peng Zou,
Suresh Subramaniam
Abstract:
In this paper, we examine a multi-sensor system where each sensor may monitor more than one time-varying information process and send status updates to a remote monitor over a common channel. We consider that each sensor's status update may contain information about more than one information process in the system subject to the system's constraints. To investigate the impact of this correlation on…
▽ More
In this paper, we examine a multi-sensor system where each sensor may monitor more than one time-varying information process and send status updates to a remote monitor over a common channel. We consider that each sensor's status update may contain information about more than one information process in the system subject to the system's constraints. To investigate the impact of this correlation on the overall system's performance, we conduct an analysis of both the average Age of Information (AoI) and source state estimation error at the monitor. Building upon this analysis, we subsequently explore the impact of the packet arrivals, correlation probabilities, and rate of processes' state change on the system's performance. Next, we consider the case where sensors have limited sensing abilities and distribute a portion of their sensing abilities across the different processes. We optimize this distribution to minimize the total AoI of the system. Interestingly, we show that monitoring multiple processes from a single source may not always be beneficial. Our results also reveal that the optimal sensing distribution for diverse arrival rates may exhibit a rapid regime switch, rather than smooth transitions, after crossing critical system values. This highlights the importance of identifying these critical thresholds to ensure effective system performance.
△ Less
Submitted 20 August, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
ByteComposer: a Human-like Melody Composition Method based on Language Model Agent
Authors:
Xia Liang,
Xingjian Du,
Jiaju Lin,
Pei Zou,
Yuan Wan,
Bilei Zhu
Abstract:
Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps : "Conception Analysis - Draft Composition - Self-Eval…
▽ More
Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps : "Conception Analysis - Draft Composition - Self-Evaluation and Modification - Aesthetic Selection". This framework seamlessly blends the interactive and knowledge-understanding features of LLMs with existing symbolic music generation models, thereby achieving a melody composition agent comparable to human creators. We conduct extensive experiments on GPT4 and several open-source large language models, which substantiate our framework's effectiveness. Furthermore, professional music composers were engaged in multi-dimensional evaluations, the final results demonstrated that across various facets of music composition, ByteComposer agent attains the level of a novice melody composer.
△ Less
Submitted 6 March, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification
Authors:
Henry Peng Zou,
Yue Zhou,
Cornelia Caragea,
Doina Caragea
Abstract:
The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge,…
▽ More
The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them unrealistic for real-time use in disaster events. To address this challenge, we present a fine-grained disaster tweet classification model under the semi-supervised, few-shot learning setting where only a small number of annotated data is required. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled data and large amounts of unlabeled data, mimicking the early stage of a disaster. Through integrating effective semi-supervised learning ideas and incorporating TextMixUp, CrisisMatch achieves performance improvement on two disaster datasets of 11.2\% on average. Further analyses are also provided for the influence of the number of labeled data and out-of-domain results.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
JointMatch: A Unified Approach for Diverse and Collaborative Pseudo-Labeling to Semi-Supervised Text Classification
Authors:
Henry Peng Zou,
Cornelia Caragea
Abstract:
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data. However, existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation. In this paper, we propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning an…
▽ More
Semi-supervised text classification (SSTC) has gained increasing attention due to its ability to leverage unlabeled data. However, existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation. In this paper, we propose JointMatch, a holistic approach for SSTC that addresses these challenges by unifying ideas from recent semi-supervised learning and the task of learning with noise. JointMatch adaptively adjusts classwise thresholds based on the learning status of different classes to mitigate model bias towards current easy classes. Additionally, JointMatch alleviates error accumulation by utilizing two differently initialized networks to teach each other in a cross-labeling manner. To maintain divergence between the two networks for mutual learning, we introduce a strategy that weighs more disagreement data while also allowing the utilization of high-quality agreement data for training. Experimental results on benchmark datasets demonstrate the superior performance of JointMatch, achieving a significant 5.13% improvement on average. Notably, JointMatch delivers impressive results even in the extremely-scarce-label setting, obtaining 86% accuracy on AG News with only 5 labels per class. We make our code available at https://github.com/HenryPengZou/JointMatch.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank
Authors:
Henry Peng Zou,
Yue Zhou,
Weizhi Zhang,
Cornelia Caragea
Abstract:
During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approache…
▽ More
During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support. Emergency relief organizations leverage such information to acquire timely crisis circumstances and expedite rescue operations. While existing works utilize such information to build models for crisis event analysis, fully-supervised approaches require annotating vast amounts of data and are impractical due to limited response time. On the other hand, semi-supervised models can be biased, performing moderately well for certain classes while performing extremely poorly for others, resulting in substantially negative effects on disaster monitoring and rescue. In this paper, we first study two recent debiasing methods on semi-supervised crisis tweet classification. Then we propose a simple but effective debiasing method, DeCrisisMB, that utilizes a Memory Bank to store and perform equal sampling for generated pseudo-labels from each class at each training iteration. Extensive experiments are conducted to compare different debiasing methods' performance and generalization ability in both in-distribution and out-of-distribution settings. The results demonstrate the superior performance of our proposed method. Our code is available at https://github.com/HenryPengZou/DeCrisisMB.
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
A ModelOps-based Framework for Intelligent Medical Knowledge Extraction
Authors:
Hongxin Ding,
Peinie Zou,
Zhiyuan Wang,
Junfeng Zhao,
Yasha Wang,
Qiang Zhou
Abstract:
Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts such as doctors, to utilize knowledge extracti…
▽ More
Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts such as doctors, to utilize knowledge extraction. To address these issues, we propose a ModelOps-based intelligent medical knowledge extraction framework that offers a low-code system for model selection, training, evaluation and optimization. Specifically, the framework includes a dataset abstraction mechanism based on multi-layer callback functions, a reusable model training, monitoring and management mechanism. We also propose a model recommendation method based on dataset similarity, which helps users quickly find potentially suitable models for a given dataset. Our framework provides convenience for researchers to develop models and simplifies model access for non-AI experts such as doctors.
△ Less
Submitted 4 October, 2023;
originally announced October 2023.
-
PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System
Authors:
Xiang Lyu,
Yuhang Cao,
Qing Wang,
Jingjing Yin,
Yuguang Yang,
Pengpeng Zou,
Yanni Hu,
Heng Lu
Abstract:
Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges due to factors such as speaker overlap, speaker variability, background noise, and reverberation. In this study, we propose PP-MeT system, a real-world personal…
▽ More
Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges due to factors such as speaker overlap, speaker variability, background noise, and reverberation. In this study, we propose PP-MeT system, a real-world personalized prompt based meeting transcription system, which consists of a clustering system, target-speaker voice activity detection (TS-VAD), and TS-ASR. Specifically, we utilize target-speaker embedding as a prompt in TS-VAD and TS-ASR modules in our proposed system. In constrast with previous system, we fully leverage pre-trained models for system initialization, thereby bestowing our approach with heightened generalizability and precision. Experiments on M2MeT2.0 Challenge dataset show that our system achieves a cp-CER of 11.27% on the test set, ranking first in both fixed and open training conditions.
△ Less
Submitted 28 September, 2023;
originally announced September 2023.
-
How Costly Was That (In)Decision?
Authors:
Peng Zou,
Ali Maatouk,
Jin Zhang,
Suresh Subramaniam
Abstract:
In this paper, we introduce a new metric, named Penalty upon Decision (PuD), for measuring the impact of communication delays and state changes at the source on a remote decision maker. Specifically, the metric quantifies the performance degradation at the decision maker's side due to delayed, erroneous, and (possibly) missed decisions. We clarify the rationale for the metric and derive closed-for…
▽ More
In this paper, we introduce a new metric, named Penalty upon Decision (PuD), for measuring the impact of communication delays and state changes at the source on a remote decision maker. Specifically, the metric quantifies the performance degradation at the decision maker's side due to delayed, erroneous, and (possibly) missed decisions. We clarify the rationale for the metric and derive closed-form expressions for its average in M/GI/1 and M/GI/1/1 with blocking settings. Numerical results are then presented to support our expressions and to compare the infinite and zero buffer regimes. Interestingly, comparing these two settings sheds light on a buffer length design challenge that is essential to minimize the average PuD.
△ Less
Submitted 24 April, 2023;
originally announced April 2023.
-
Spatial-temporal Transformer for Affective Behavior Analysis
Authors:
Peng Zou,
Rui Wang,
Kehua Wen,
Yasi Peng,
Xiao Sun
Abstract:
The in-the-wild affective behavior analysis has been an important study. In this paper, we submit our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes V-A Estimation, Facial Expression Classification and AU Detection Sub-challenges. We propose a Transformer Encoder with Multi-Head Attention framework to learn the distribution of both…
▽ More
The in-the-wild affective behavior analysis has been an important study. In this paper, we submit our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes V-A Estimation, Facial Expression Classification and AU Detection Sub-challenges. We propose a Transformer Encoder with Multi-Head Attention framework to learn the distribution of both the spatial and temporal features. Besides, there are virious effective data augmentation strategies employed to alleviate the problems of sample imbalance during model training. The results fully demonstrate the effectiveness of our proposed model based on the Aff-Wild2 dataset.
△ Less
Submitted 19 March, 2023;
originally announced March 2023.
-
Hybrid Multimodal Feature Extraction, Mining and Fusion for Sentiment Analysis
Authors:
Jia Li,
Ziyang Zhang,
Junjie Lang,
Yueqi Jiang,
Liuwei An,
Peng Zou,
Yangyang Xu,
Sheng Gao,
Jie Lin,
Chunxiao Fan,
Xiao Sun,
Meng Wang
Abstract:
In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic,…
▽ More
In this paper, we present our solutions for the Multimodal Sentiment Analysis Challenge (MuSe) 2022, which includes MuSe-Humor, MuSe-Reaction and MuSe-Stress Sub-challenges. The MuSe 2022 focuses on humor detection, emotional reactions and multimodal emotional stress utilizing different modalities and data sets. In our work, different kinds of multimodal features are extracted, including acoustic, visual, text and biological features. These features are fused by TEMMA and GRU with self-attention mechanism frameworks. In this paper, 1) several new audio features, facial expression features and paragraph-level text embeddings are extracted for accuracy improvement. 2) we substantially improve the accuracy and reliability of multimodal sentiment prediction by mining and blending the multimodal features. 3) effective data augmentation strategies are applied in model training to alleviate the problem of sample imbalance and prevent the model from learning biased subject characters. For the MuSe-Humor sub-challenge, our model obtains the AUC score of 0.8932. For the MuSe-Reaction sub-challenge, the Pearson's Correlations Coefficient of our approach on the test set is 0.3879, which outperforms all other participants. For the MuSe-Stress sub-challenge, our approach outperforms the baseline in both arousal and valence on the test dataset, reaching a final combined result of 0.5151.
△ Less
Submitted 12 August, 2022; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Multi scale Feature Extraction and Fusion for Online Knowledge Distillation
Authors:
Panpan Zou,
Yinglei Teng,
Tao Niu
Abstract:
Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect the further exploration of the representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distilla…
▽ More
Online knowledge distillation conducts knowledge transfer among all student models to alleviate the reliance on pre-trained models. However, existing online methods rely heavily on the prediction distributions and neglect the further exploration of the representational knowledge. In this paper, we propose a novel Multi-scale Feature Extraction and Fusion method (MFEF) for online knowledge distillation, which comprises three key components: Multi-scale Feature Extraction, Dual-attention and Feature Fusion, towards generating more informative feature maps for distillation. The multiscale feature extraction exploiting divide-and-concatenate in channel dimension is proposed to improve the multi-scale representation ability of feature maps. To obtain more accurate information, we design a dual-attention to strengthen the important channel and spatial regions adaptively. Moreover, we aggregate and fuse the former processed feature maps via feature fusion to assist the training of student models. Extensive experiments on CIF AR-10, CIF AR-100, and CINIC-10 show that MFEF transfers more beneficial representational knowledge for distillation and outperforms alternative methods among various network architectures
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
Asymptotic Soft Cluster Pruning for Deep Neural Networks
Authors:
Tao Niu,
Yinglei Teng,
Panpan Zou
Abstract:
Filter pruning method introduces structural sparsity by removing selected filters and is thus particularly effective for reducing complexity. Previous works empirically prune networks from the point of view that filter with smaller norm contributes less to the final results. However, such criteria has been proven sensitive to the distribution of filters, and the accuracy may hard to recover since…
▽ More
Filter pruning method introduces structural sparsity by removing selected filters and is thus particularly effective for reducing complexity. Previous works empirically prune networks from the point of view that filter with smaller norm contributes less to the final results. However, such criteria has been proven sensitive to the distribution of filters, and the accuracy may hard to recover since the capacity gap is fixed once pruned. In this paper, we propose a novel filter pruning method called Asymptotic Soft Cluster Pruning (ASCP), to identify the redundancy of network based on the similarity of filters. Each filter from over-parameterized network is first distinguished by clustering, and then reconstructed to manually introduce redundancy into it. Several guidelines of clustering are proposed to better preserve feature extraction ability. After reconstruction, filters are allowed to be updated to eliminate the effect caused by mistakenly selected. Besides, various decaying strategies of the pruning rate are adopted to stabilize the pruning process and improve the final performance as well. By gradually generating more identical filters within each cluster, ASCP can remove them through channel addition operation with almost no accuracy drop. Extensive experiments on CIFAR-10 and ImageNet datasets show that our method can achieve competitive results compared with many state-of-the-art algorithms.
△ Less
Submitted 16 June, 2022;
originally announced June 2022.
-
Community Trend Prediction on Heterogeneous Graph in E-commerce
Authors:
Jiahao Yuan,
Zhao Li,
Pengcheng Zou,
Xuan Gao,
Jinwei Pan,
Wendi Ji,
Xiaoling Wang
Abstract:
In online shopping, ever-changing fashion trends make merchants need to prepare more differentiated products to meet the diversified demands, and e-commerce platforms need to capture the market trend with a prophetic vision. For the trend prediction, the attribute tags, as the essential description of items, can genuinely reflect the decision basis of consumers. However, few existing works explore…
▽ More
In online shopping, ever-changing fashion trends make merchants need to prepare more differentiated products to meet the diversified demands, and e-commerce platforms need to capture the market trend with a prophetic vision. For the trend prediction, the attribute tags, as the essential description of items, can genuinely reflect the decision basis of consumers. However, few existing works explore the attribute trend in the specific community for e-commerce. In this paper, we focus on the community trend prediction on the item attribute and propose a unified framework that combines the dynamic evolution of two graph patterns to predict the attribute trend in a specific community. Specifically, we first design a communityattribute bipartite graph at each time step to learn the collaboration of different communities. Next, we transform the bipartite graph into a hypergraph to exploit the associations of different attribute tags in one community. Lastly, we introduce a dynamic evolution component based on the recurrent neural networks to capture the fashion trend of attribute tags. Extensive experiments on three real-world datasets in a large e-commerce platform show the superiority of the proposed approach over several strong alternatives and demonstrate the ability to discover the community trend in advance.
△ Less
Submitted 24 February, 2022;
originally announced February 2022.
-
An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic
Authors:
Tao Niu,
Yinglei Teng,
Zhu Han,
Panpan Zou
Abstract:
Recently, the applications of deep neural network (DNN) have been very prominent in many fields such as computer vision (CV) and natural language processing (NLP) due to its superior feature extraction performance. However, the high-dimension parameter model and large-scale mathematical calculation restrict the execution efficiency, especially for Internet of Things (IoT) devices. Different from t…
▽ More
Recently, the applications of deep neural network (DNN) have been very prominent in many fields such as computer vision (CV) and natural language processing (NLP) due to its superior feature extraction performance. However, the high-dimension parameter model and large-scale mathematical calculation restrict the execution efficiency, especially for Internet of Things (IoT) devices. Different from the previous cloud/edge-only pattern that brings huge pressure for uplink communication and device-only fashion that undertakes unaffordable calculation strength, we highlight the collaborative computation between the device and edge for DNN models, which can achieve a good balance between the communication load and execution accuracy. Specifically, a systematic on-demand co-inference framework is proposed to exploit the multi-branch structure, in which the pre-trained Alexnet is right-sized through \emph{early-exit} and partitioned at an intermediate DNN layer. The integer quantization is enforced to further compress transmission bits. As a result, we establish a new Deep Reinforcement Learning (DRL) optimizer-Soft Actor Critic for discrete (SAC-d), which generates the \emph{exit point}, \emph{partition point}, and \emph{compressing bits} by soft policy iterations. Based on the latency and accuracy aware reward design, such an optimizer can well adapt to the complex environment like dynamic wireless channel and arbitrary CPU processing, and is capable of supporting the 5G URLLC. Real-world experiment on Raspberry Pi 4 and PC shows the outperformance of the proposed solution.
△ Less
Submitted 9 January, 2022;
originally announced January 2022.
-
Beyond the Longest Letter-duplicated Subsequence Problem
Authors:
Wenfeng Lai,
Adiesha Liyanage,
Binhai Zhu,
Peng Zou
Abstract:
Given a sequence $S$ of length $n$, a letter-duplicated subsequence is a subsequence of $S$ in the form of $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ with $x_i\inΣ$, $x_j\neq x_{j+1}$ and $d_i\geq 2$ for all $i$ in $[k]$ and $j$ in $[k-1]$. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of $S$ can be easily obtained. In this paper, we focus on two variants of this…
▽ More
Given a sequence $S$ of length $n$, a letter-duplicated subsequence is a subsequence of $S$ in the form of $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$ with $x_i\inΣ$, $x_j\neq x_{j+1}$ and $d_i\geq 2$ for all $i$ in $[k]$ and $j$ in $[k-1]$. A linear time algorithm for computing the longest letter-duplicated subsequence (LLDS) of $S$ can be easily obtained. In this paper, we focus on two variants of this problem. We first consider the constrained version when $Σ$ is unbounded, each letter appears in $S$ at least 6 times and all the letters in $Σ$ must appear in the solution. We show that the problem is NP-hard (a further twist indicates that the problem does not admit any polynomial time approximation). The reduction is from possibly the simplest version of SAT that is NP-complete, $(\leq 2,1,\leq 3)$-SAT, where each variable appears at most twice positively and exact once negatively, and each clause contains at most three literals and some clauses must contain exactly two literals. (We hope that this technique will serve as a general tool to help us proving the NP-hardness for some more tricky sequence problems involving only one sequence -- much harder than with at least two input sequences, which we apply successfully at the end of the paper on some extra variations of the LLDS problem.) We then show that when each letter appears in $S$ at most 3 times, then the problem admits a factor $1.5-O(\frac{1}{n})$ approximation. Finally, we consider the weighted version, where the weight of a block $x_i^{d_i} (d_i\geq 2)$ could be any positive function which might not grow with $d_i$. We give a non-trivial $O(n^2)$ time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of $S$ whose weight is maximized.
△ Less
Submitted 4 January, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
MELONS: generating melody with long-term structure using transformers and structure graph
Authors:
Yi Zou,
Pei Zou,
Yi Zhao,
Kaixiang Zhang,
Ran Zhang,
Xiaorui Wang
Abstract:
The creation of long melody sequences requires effective expression of coherent musical structure. However, there is no clear representation of musical structure. Recent works on music generation have suggested various approaches to deal with the structural information of music, but generating a full-song melody with clear long-term structure remains a challenge. In this paper, we propose MELONS,…
▽ More
The creation of long melody sequences requires effective expression of coherent musical structure. However, there is no clear representation of musical structure. Recent works on music generation have suggested various approaches to deal with the structural information of music, but generating a full-song melody with clear long-term structure remains a challenge. In this paper, we propose MELONS, a melody generation framework based on a graph representation of music structure which consists of eight types of bar-level relations. MELONS adopts a multi-step generation method with transformer-based networks by factoring melody generation into two sub-problems: structure generation and structure conditional melody generation. Experimental results show that MELONS can produce structured melodies with high quality and rich contents.
△ Less
Submitted 3 November, 2021; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Overage and Staleness Metrics for Status Update Systems
Authors:
Peng Zou,
Jin Zhang,
Xianglin Wei,
Suresh Subramaniam
Abstract:
Status update systems consist of sensors that take measurements of a physical parameter and transmit them to a remote receiver. Age of Information (AoI) has been studied extensively as a metric for the freshness of information in such systems with and without an enforced hard or soft deadline. In this paper, we propose three metrics for status update systems to measure the ability of different que…
▽ More
Status update systems consist of sensors that take measurements of a physical parameter and transmit them to a remote receiver. Age of Information (AoI) has been studied extensively as a metric for the freshness of information in such systems with and without an enforced hard or soft deadline. In this paper, we propose three metrics for status update systems to measure the ability of different queuing systems to meet a threshold requirement for the AoI. The {\em overage probability} is defined as the probability that the age of the most recent update packet held by the receiver is larger than the threshold. The {\em stale update probability} is the probability that an update is stale, i.e., its age has exceeded the deadline, when it is delivered to the receiver. Finally, the {\em average overage} is defined as the time average of the overage (i.e., age beyond the threshold), and is a measure of the average ``staleness'' of the update packets held by the receiver. We investigate these metrics in three typical status update queuing systems -- M/G/1/1, M/G/1/$2^*$, and M/M/1. Numerical results show the performances for these metrics under different parameter settings and different service distributions. The differences between the average overage and average AoI are also shown. Our results demonstrate that a lower bound exists for the stale update probability when the buffer size is limited. Further, we observe that the overage probability decreases and the stale update probability increases as the update arrival rate increases.
△ Less
Submitted 9 October, 2021; v1 submitted 28 September, 2021;
originally announced September 2021.
-
Distant Supervision for E-commerce Query Segmentation via Attention Network
Authors:
Zhao Li,
Donghui Ding,
Pengcheng Zou,
Yu Gong,
Xi Chen,
Ji Zhang,
Jianliang Gao,
Youxi Wu,
Yucong Duan
Abstract:
The booming online e-commerce platforms demand highly accurate approaches to segment queries that carry the product requirements of consumers. Recent works have shown that the supervised methods, especially those based on deep learning, are attractive for achieving better performance on the problem of query segmentation. However, the lack of labeled data is still a big challenge for training a dee…
▽ More
The booming online e-commerce platforms demand highly accurate approaches to segment queries that carry the product requirements of consumers. Recent works have shown that the supervised methods, especially those based on deep learning, are attractive for achieving better performance on the problem of query segmentation. However, the lack of labeled data is still a big challenge for training a deep segmentation network, and the problem of Out-of-Vocabulary (OOV) also adversely impacts the performance of query segmentation. Different from query segmentation task in an open domain, e-commerce scenario can provide external documents that are closely related to these queries. Thus, to deal with the two challenges, we employ the idea of distant supervision and design a novel method to find contexts in external documents and extract features from these contexts. In this work, we propose a BiLSTM-CRF based model with an attention module to encode external features, such that external contexts information, which can be utilized naturally and effectively to help query segmentation. Experiments on two datasets show the effectiveness of our approach compared with several kinds of baselines.
△ Less
Submitted 8 November, 2020;
originally announced November 2020.
-
On Age and Value of Information in Status Update Systems
Authors:
Peng Zou,
Omur Ozel,
Suresh Subramaniam
Abstract:
Motivated by the inherent value of packets arising in many cyber-physical applications (e.g., due to precision of the information content or an alarm message), we consider status update systems with update packets carrying values as well as their generation time stamps. Once generated, a status update packet has a random initial value and a deterministic deadline after which it is not useful (ulti…
▽ More
Motivated by the inherent value of packets arising in many cyber-physical applications (e.g., due to precision of the information content or an alarm message), we consider status update systems with update packets carrying values as well as their generation time stamps. Once generated, a status update packet has a random initial value and a deterministic deadline after which it is not useful (ultimate staleness). In our model, value of a packet decreases in time (even after reception) starting from its generation to ultimate staleness when it vanishes. The value of information (VoI) at the receiver is additive in that the VoI is the sum of the current values of all packets held by the receiver. We investigate various queuing disciplines under potential dependence between value and service time and provide closed form expressions for average VoI at the receiver. Numerical results illustrate the average VoI for different scenarios and the contrast between average age of information (AoI) and average VoI.
△ Less
Submitted 31 March, 2020;
originally announced March 2020.
-
Maintaining Information Freshness in Power-Efficient Status Update Systems
Authors:
Parisa Rafiee,
Peng Zou,
Omur Ozel,
Suresh Subramaniam
Abstract:
This paper is motivated by emerging edge computing systems which consist of sensor nodes that acquire and process information and then transmit status updates to an edge receiver for possible further processing. As power is a scarce resource at the sensor nodes, the system is modeled as a tandem computation-transmission queue with power-efficient computing. Jobs arrive at the computation server wi…
▽ More
This paper is motivated by emerging edge computing systems which consist of sensor nodes that acquire and process information and then transmit status updates to an edge receiver for possible further processing. As power is a scarce resource at the sensor nodes, the system is modeled as a tandem computation-transmission queue with power-efficient computing. Jobs arrive at the computation server with rate $λ$ as a Poisson process with no available data buffer. The computation server can be in one of three states: (i) OFF: the server is turned off and no jobs are observed or processed, (ii) ON-Idle: the server is turned on but there is no job in the server, (iii) ON-Busy: the server is turned on and a job is processed in the server. These states cost zero, one and $p_c$ units of power, respectively. Under a long-term power constraint, the computation server switches from one state to another in sequence: first a deterministic $T_o$ time units in OFF state, then waiting for a job arrival in ON-Idle state and then in ON-Busy state for an independent identically distributed compute time duration. The transmission server has a single unit data buffer to save incoming packets and applies last come first serve with discarding as well as a packet deadline to discard a sitting packet for maintaining information freshness, which is measured by the Age of Information (AoI). Additionally, there is a monotonic functional relation between the mean time spent in ON-Busy state and the mean transmission time. We obtain closed-form expressions for average AoI and average peak AoI. Our numerical results illustrate various regimes of operation for best AoI performances optimized over packet deadlines with relation to power efficiency.
△ Less
Submitted 30 March, 2020;
originally announced March 2020.
-
Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms
Authors:
Manuel Lafond,
Binhai Zhu,
Peng Zou
Abstract:
Recently, due to the genomic sequence analysis in several types of cancer, the genomic data based on {\em copy number profiles} ({\em CNP} for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific gene or segment of interest.
In this paper, we present two streams of results. The first is the nega…
▽ More
Recently, due to the genomic sequence analysis in several types of cancer, the genomic data based on {\em copy number profiles} ({\em CNP} for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific gene or segment of interest.
In this paper, we present two streams of results. The first is the negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem posed by Qingge et al. in 2018. It was shown by Qingge et al. that the problem is NP-hard if the duplications are tandem and they left the open question of whether the problem remains NP-hard if arbitrary duplications are used. We answer this question affirmatively in this paper; in fact, we prove that it is NP-hard to even obtain a constant factor approximation. We also prove that the parameterized version is W[1]-hard, answering another open question by Qingge et al.
The other result is positive and is based on a new (and more general) problem regarding CNP's. The \emph{Copy Number Profile Conforming (CNPC)} problem is formally defined as follows: given two CNP's $C_1$ and $C_2$, compute two strings $S_1$ and $S_2$ with $cnp(S_1)=C_1$ and $cnp(S_2)=C_2$ such that the distance between $S_1$ and $S_2$, $d(S_1,S_2)$, is minimized. Here, $d(S_1,S_2)$ is a very general term, which means it could be any genome rearrangement distance (like reversal, transposition, and tandem duplication, etc). We make the first step by showing that if $d(S_1,S_2)$ is measured by the breakpoint distance then the problem is polynomially solvable.
△ Less
Submitted 11 February, 2020;
originally announced February 2020.