VarGes: Improving Variation in Co-Speech 3D Gesture Generation via StyleCLIPS

Authors: Ming Meng, Ke Mu, Yonggui Zhu, Zhe Zhu, Haoyu Sun, Heyang Yan, Zhaoxin Fan

Abstract: Generating expressive and diverse human gestures from audio is crucial in fields like human-computer interaction, virtual reality, and animation. Though existing methods have achieved remarkable performance, they often exhibit limitations due to constrained dataset diversity and the restricted amount of information derived from audio inputs. To address these challenges, we present VarGes, a novel variation-driven framework designed to enhance co-speech gesture generation by integrating visual stylistic cues while maintaining naturalness. Our approach begins with the Variation-Enhanced Feature Extraction (VEFE) module, which seamlessly incorporates style-reference video data into a 3D human pose estimation network to extract StyleCLIPS, thereby enriching the input with stylistic information. Subsequently, we employ the Variation-Compensation Style Encoder (VCSE), a transformer-style encoder equipped with an additive attention mechanism pooling layer, to robustly encode diverse StyleCLIPS representations and effectively manage stylistic variations. Finally, the Variation-Driven Gesture Predictor (VDGP) module fuses MFCC audio features with StyleCLIPS encodings via cross-attention, injecting this fused data into a cross-conditional autoregressive model to modulate 3D human gesture generation based on audio input and stylistic clues. The efficacy of our approach is validated on benchmark datasets, where it outperforms existing methods in terms of gesture diversity and naturalness. The code and video results will be made publicly available upon acceptance: https://github.com/mookerr/VarGES/

Submitted 18 February, 2025; v1 submitted 15 February, 2025; originally announced February 2025.

arXiv:2410.04511 [pdf, other]

Realizing Video Summarization from the Path of Language-based Semantic Understanding

Authors: Kuan-Chen Mu, Zhi-Yi Chin, Wei-Chen Chiu

Abstract: The recent development of Video-based Large Language Models (VideoLLMs) has significantly advanced video summarization by aligning video features and, in some cases, audio features with Large Language Models (LLMs). Each of these VideoLLMs possesses unique strengths and weaknesses. Many recent methods have required extensive fine-tuning to overcome the limitations of these models, which can be resource-intensive. In this work, we observe that the strengths of one VideoLLM can complement the weaknesses of another. Leveraging this insight, we propose a novel video summarization framework inspired by the Mixture of Experts (MoE) paradigm, which operates as an inference-time algorithm without requiring any form of fine-tuning. Our approach integrates multiple VideoLLMs to generate comprehensive and coherent textual summaries. It effectively combines visual and audio content, provides detailed background descriptions, and excels at identifying keyframes, which enables more semantically meaningful retrieval compared to traditional computer vision approaches that rely solely on visual information, all without the need for additional fine-tuning. Moreover, the resulting summaries enhance performance in downstream tasks such as summary video generation, either through keyframe selection or in combination with text-to-image models. Our language-driven approach offers a semantically rich alternative to conventional methods and provides flexibility to incorporate newer VideoLLMs, enhancing adaptability and performance in video summarization tasks.

Submitted 6 October, 2024; originally announced October 2024.

arXiv:2310.19138 [pdf, other]

Backward and Forward Inference in Interacting Independent-Cascade Processes: A Scalable and Convergent Message-Passing Approach

Authors: Nouman Khan, Kangle Mu, Mehrdad Moharrami, Vijay Subramanian

Abstract: We study the problems of estimating the past and future evolutions of two diffusion processes that spread concurrently on a network. Specifically, given a known network $G=(V, \overrightarrow{E})$ and a (possibly noisy) snapshot $\mathcal{O}_n$ of its state taken at (a possibly unknown) time $W$, we wish to determine the posterior distributions of the initial state of the network and the infection times of its nodes. These distributions are useful in finding source nodes of epidemics and rumors ($\textit{backward inference}$) and in estimating the spread of a fixed set of source nodes ($\textit{forward inference}$). To model the interaction between the two processes, we study an extension of the independent-cascade (IC) model where, when a node gets infected with either process, its susceptibility to the other one changes. First, we derive the exact joint probability of the initial state of the network and the observation-snapshot $\mathcal{O}_n$. Then, using the machinery of factor-graphs, factor-graph transformations, and the generalized distributive-law, we derive a Belief-Propagation (BP) based algorithm that is scalable to large networks and can converge on graphs of arbitrary topology (at a likely expense in approximation accuracy).

Submitted 29 October, 2023; originally announced October 2023.

arXiv:2209.10299 [pdf, other]

DPCN: Towards Deadline-aware Payment Channel Networks

Authors: Wenhui Wang, Ke Mu, Xuetao Wei

Abstract: Payment channels are a class of techniques designed to solve the scalability problem of blockchain. By establishing channels off the blockchain to form payment channel networks (PCNs), users can make instant payments without interacting with the blockchain, avoiding the problems of long transaction consensus delays and high transaction fees. Recently, the optimization of PCNs has mainly focused on improving the network throughput via multi-path routing. However, the transaction's atomicity comes at a non-trivial cost for transaction completion latency that affects user experience in deadline-sensitive applications of PCNs. In this paper, we propose a new and systematic framework, DPCN, to consider the deadlines of transactions for payment channel networks while improving the success ratio of transactions. DPCN is enabled via a synergy of three components: (1) a deadline-based dynamic transaction split mechanism that splits the transaction according to current network status and the transaction's deadline; (2) deadline-aware transaction scheduling that prioritizes near-deadline transactions; (3) a deadline-aware transaction congestion avoidance algorithm, which uses a path window to balance transactions with different deadlines. Our extensive experiments show that, compared with existing methods, DPCN can meet the needs of transactions with different deadlines and ensure a higher success ratio for transactions in payment channel networks.

Submitted 13 October, 2022; v1 submitted 21 September, 2022; originally announced September 2022.
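
The second DPCN component, scheduling that prioritizes near-deadline transactions, can be illustrated with a minimal earliest-deadline-first sketch. This is not the authors' implementation: the `Tx` record, its fields, and the `schedule` function are hypothetical names introduced here, and dropping already-expired transactions is an assumption rather than something the abstract states.

```python
import heapq
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Tx:
    """Hypothetical transaction record; only the deadline takes part in ordering."""
    deadline: float
    tx_id: str = field(compare=False)
    amount: float = field(compare=False)


def schedule(txs: List[Tx], now: float) -> List[str]:
    """Return transaction ids in earliest-deadline-first order.

    Assumption: transactions whose deadline has already passed are dropped.
    """
    heap = [t for t in txs if t.deadline > now]
    heapq.heapify(heap)  # min-heap keyed on deadline
    order = []
    while heap:
        order.append(heapq.heappop(heap).tx_id)
    return order


pending = [Tx(5.0, "a", 1.0), Tx(2.0, "b", 1.0), Tx(9.0, "c", 1.0), Tx(0.5, "d", 1.0)]
print(schedule(pending, now=1.0))
```

A heap keeps the nearest deadline at the front in logarithmic time per operation, which is one common way to realize deadline-based prioritization; the actual DPCN scheduler may weigh other factors.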

arXiv:2107.08873 [pdf, other]

RingFed: Reducing Communication Costs in Federated Learning on Non-IID Data

Authors: Guang Yang, Ke Mu, Chunhe Song, Zhijia Yang, Tierui Gong

Abstract: Federated learning is a widely used distributed deep learning framework that protects the privacy of each client by exchanging model parameters rather than raw data. However, federated learning suffers from high communication costs, as a considerable number of model parameters need to be transmitted many times during the training process, making the approach inefficient, especially when the communication network bandwidth is limited. This article proposes RingFed, a novel framework to reduce communication overhead during the training process of federated learning. Rather than transmitting parameters between the central server and each client, as in original federated learning, in the proposed RingFed, the updated parameters are transmitted between each client in turn, and only the final result is transmitted to the central server, thereby reducing the communication overhead substantially. After several local updates, clients first send their parameters to another proximal client, not to the central server directly, to preaggregate. Experiments on two different public datasets show that RingFed has fast convergence, high model accuracy, and low communication cost.

Submitted 19 July, 2021; originally announced July 2021.
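
The ring-style communication pattern described in the RingFed abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's algorithm: the quadratic local objective, the `mix` pre-aggregation weight, and the function names `local_update` and `ring_round` are all hypothetical.

```python
from typing import List, Tuple


def local_update(params: List[float], target: List[float],
                 lr: float = 0.5, steps: int = 2) -> List[float]:
    """Toy local training: gradient steps on 0.5 * ||params - target||^2.

    The quadratic objective is a stand-in (assumption) for real client training.
    """
    for _ in range(steps):
        params = [p - lr * (p - t) for p, t in zip(params, target)]
    return params


def ring_round(global_params: List[float], client_targets: List[List[float]],
               mix: float = 0.5) -> Tuple[List[float], int, int]:
    """One round: parameters travel client to client; only the last client uploads.

    Returns (final parameters, client-to-client messages, client-to-server messages).
    """
    carried = list(global_params)
    client_msgs = 0
    for i, target in enumerate(client_targets):
        local = local_update(list(carried), target)
        # Pre-aggregate the received parameters with the local result
        # (the fixed mixing weight is an assumption of this sketch).
        carried = [mix * c + (1.0 - mix) * l for c, l in zip(carried, local)]
        if i < len(client_targets) - 1:
            client_msgs += 1  # pass along the ring to the next client
    server_msgs = 1  # only the final result goes to the central server
    return carried, client_msgs, server_msgs


params, c2c, c2s = ring_round([0.0, 0.0], [[1.0, 2.0]] * 4)
print(c2c, c2s)
```

With four clients, the sketch sends three client-to-client messages and a single upload to the server per round, whereas a server-centric scheme would need one upload per client.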

arXiv:2009.00833 [pdf, other]

Intrinsic Relationship Reasoning for Small Object Detection

Authors: Kui Fu, Jia Li, Lin Ma, Kai Mu, Yonghong Tian

Abstract: The small objects in images and videos are usually not independent individuals. Instead, they more or less present some semantic and spatial layout relationships with each other. Modeling and inferring such intrinsic relationships can thereby be beneficial for small object detection. In this paper, we propose a novel context reasoning approach for small object detection which models and infers the intrinsic semantic and spatial layout relationships between objects. Specifically, we first construct a semantic module to model the sparse semantic relationships based on the initial regional features, and a spatial layout module to model the sparse spatial layout relationships based on their position and shape information, respectively. Both of them are then fed into a context reasoning module for integrating the contextual information with respect to the objects and their relationships, which is further fused with the original regional visual features for classification and regression. Experimental results reveal that the proposed approach can effectively boost the small object detection performance.

Submitted 2 September, 2020; originally announced September 2020.

arXiv:1504.06700 [pdf, ps, other]

Preferential Multi-Context Systems

Authors: Kedian Mu, Kewen Wang, Lian Wen

Abstract: Multi-context systems (MCS) presented by Brewka and Eiter can be considered as a promising way to interlink decentralized and heterogeneous knowledge contexts. In this paper, we propose preferential multi-context systems (PMCS), which provide a framework for incorporating a total preorder relation over contexts in a multi-context system. In a given PMCS, its contexts are divided into several parts according to the total preorder relation over them; moreover, only information flows from a context to ones of the same part or less preferred parts are allowed to occur. As such, the first $l$ preferred parts of a PMCS always fully capture the information exchange between contexts of these parts, and then compose another meaningful PMCS, termed the $l$-section of that PMCS. We generalize the equilibrium semantics for an MCS to the (maximal) $l_{\leq}$-equilibrium, which represents belief states at least acceptable for the $l$-section of a PMCS. We also investigate inconsistency analysis in PMCS and related computational complexity issues.

Submitted 25 April, 2015; originally announced April 2015.

MSC Class: 68T30; ACM Class: I.2.4

arXiv:1406.6102 [pdf, other]

doi 10.1017/S1471068414000611

Random Logic Programs: Linear Model

Authors: Kewen Wang, Lian Wen, Kedian Mu

Abstract: This paper proposes a model, the linear model, for randomly generating logic programs with low density of rules and investigates statistical properties of such random logic programs. It is mathematically shown that the average number of answer sets for a random program converges to a constant when the number of atoms approaches infinity. Several experimental results are also reported, which justify the suitability of the linear model. It is also experimentally shown that, under this model, the size distribution of answer sets for random programs tends to a normal distribution when the number of atoms is sufficiently large.

Submitted 23 June, 2014; originally announced June 2014.

Comments: 33 pages. To appear in: Theory and Practice of Logic Programming

Report number: GUICTWK2014-1

Journal ref: Theory and Practice of Logic Programming 15 (2014) 818-853

Showing 1–8 of 8 results for author: Mu, K