Showing 1–13 of 13 results for author: Zhuang, N

  1. arXiv:2506.08490  [pdf, ps, other]

    cs.CL

    Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework

    Authors: Xiao Wei, Xiaobao Wang, Ning Zhuang, Chenyang Wang, Longbiao Wang, Jianwu Dang

    Abstract: Intent detection aims to identify user intents from natural language inputs, where supervised methods rely heavily on labeled in-domain (IND) data and struggle with out-of-domain (OOD) intents, limiting their practical applicability. Generalized Intent Discovery (GID) addresses this by leveraging unlabeled OOD data to discover new intents without additional annotation. However, existing methods fo…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures, 7 tables, IJCAI 2025

  2. arXiv:2412.09920  [pdf, other]

    cs.CV

    Precision-Enhanced Human-Object Contact Detection via Depth-Aware Perspective Interaction and Object Texture Restoration

    Authors: Yuxiao Wang, Wenpeng Neng, Zhenao Wei, Yu Lei, Weiying Xue, Nan Zhuang, Yanwu Xu, Xinyu Jiang, Qi Liu

    Abstract: Human-object contact (HOT) is designed to accurately identify the areas where humans and objects come into contact. Current methods frequently fail to account for scenarios where objects are frequently blocking the view, resulting in inaccurate identification of contact areas. To tackle this problem, we suggest using a perspective interaction HOT detector called PIHOT, which utilizes a depth map g…

    Submitted 16 December, 2024; v1 submitted 13 December, 2024; originally announced December 2024.

    Comments: Accepted by AAAI 2025

  3. arXiv:2410.05954  [pdf, other]

    cs.CV cs.LG

    Pyramidal Flow Matching for Efficient Video Generative Modeling

    Authors: Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, Zhouchen Lin

    Abstract: Video generation requires modeling a vast spatiotemporal space, which demands significant computational resources and data usage. To reduce the complexity, the prevailing approaches employ a cascaded architecture to avoid direct training with full resolution latent. Despite reducing computational demands, the separate optimization of each sub-stage hinders knowledge sharing and sacrifices flexibil…

    Submitted 15 March, 2025; v1 submitted 8 October, 2024; originally announced October 2024.

    Comments: ICLR 2025

  4. arXiv:2403.07652  [pdf, other]

    cs.LG cs.CL

    Harder Tasks Need More Experts: Dynamic Routing in MoE Models

    Authors: Quzhe Huang, Zhenwei An, Nan Zhuang, Mingxu Tao, Chen Zhang, Yang Jin, Kun Xu, Kun Xu, Liwei Chen, Songfang Huang, Yansong Feng

    Abstract: In this paper, we introduce a novel dynamic expert selection framework for Mixture of Experts (MoE) models, aiming to enhance computational efficiency and model performance by adjusting the number of activated experts based on input difficulty. Unlike traditional MoE approaches that rely on fixed Top-K routing, which activates a predetermined number of experts regardless of the input's complexity,…

    Submitted 12 March, 2024; originally announced March 2024.

  5. arXiv:2403.01840  [pdf, other]

    cs.CV cs.AI

    FreeA: Human-object Interaction Detection using Free Annotation Labels

    Authors: Qi Liu, Yuxiao Wang, Xinyu Jiang, Wolin Liang, Zhenao Wei, Yu Lei, Nan Zhuang, Weiying Xue

    Abstract: Recent human-object interaction (HOI) detection methods depend on extensively annotated image datasets, which require a significant amount of manpower. In this paper, we propose a novel self-adaptive, language-driven HOI detection method, termed FreeA. This method leverages the adaptability of the text-image model to generate latent HOI labels without requiring manual annotation. Specifically, Fre…

    Submitted 16 May, 2025; v1 submitted 4 March, 2024; originally announced March 2024.

  6. arXiv:2007.13332  [pdf, other]

    cs.CV

    Few-shot Knowledge Transfer for Fine-grained Cartoon Face Generation

    Authors: Nan Zhuang, Cheng Yang

    Abstract: In this paper, we are interested in generating fine-grained cartoon faces for various groups. We assume that one of these groups consists of sufficient training data while the others only contain few samples. Although the cartoon faces of these groups share similar style, the appearances in various groups could still have some specific characteristics, which makes them differ from each other. A ma…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: Technical Report

  7. arXiv:1912.01728  [pdf, other]

    cs.CL cs.SD eess.AS

    Fast Intent Classification for Spoken Language Understanding

    Authors: Akshit Tyagi, Varun Sharma, Rahul Gupta, Lynn Samson, Nan Zhuang, Zihang Wang, Bill Campbell

    Abstract: Spoken Language Understanding (SLU) systems consist of several machine learning components operating together (e.g. intent classification, named entity recognition and resolution). Deep learning models have obtained state of the art results on several of these tasks, largely attributed to their better modeling capacity. However, an increase in modeling capacity comes with added costs of higher lat…

    Submitted 14 February, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

    Comments: Accepted as a conference paper at ICASSP 20

  8. arXiv:1905.04293  [pdf, other]

    cs.CV

    Differential Recurrent Neural Network and its Application for Human Activity Recognition

    Authors: Naifan Zhuang, Guo-Jun Qi, The Duc Kieu, Kien A. Hua

    Abstract: The Long Short-Term Memory (LSTM) recurrent neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. It has the potential to model any sequential time-series data, where the current hidden state has to be considered in the context of the past hidden states. This property makes LSTM an ide…

    Submitted 9 May, 2019; originally announced May 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1504.06678, arXiv:1804.04192

  9. arXiv:1805.12176  [pdf, other]

    cs.LG cs.MM stat.ML

    Deep Segment Hash Learning for Music Generation

    Authors: Kevin Joslyn, Naifan Zhuang, Kien A. Hua

    Abstract: Music generation research has grown in popularity over the past decade, thanks to the deep learning revolution that has redefined the landscape of artificial intelligence. In this paper, we propose a novel approach to music generation inspired by musical segment concatenation methods and hash learning algorithms. Given a segment of music, we use a deep recurrent neural network and ranking-based ha…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: 16 pages, 4 figures

  10. arXiv:1805.01290  [pdf, other]

    cs.CV

    Multi-task Learning of Cascaded CNN for Facial Attribute Classification

    Authors: Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang

    Abstract: Recently, facial attribute classification (FAC) has attracted significant attention in the computer vision community. Great progress has been made along with the availability of challenging FAC datasets. However, conventional FAC methods usually firstly pre-process the input images (i.e., perform face detection and alignment) and then predict facial attributes. These methods ignore the inherent de…

    Submitted 3 May, 2018; originally announced May 2018.

  11. Multi-label Learning Based Deep Transfer Neural Network for Facial Attribute Classification

    Authors: Ni Zhuang, Yan Yan, Si Chen, Hanzi Wang, Chunhua Shen

    Abstract: Deep Neural Network (DNN) has recently achieved outstanding performance in a variety of computer vision tasks, including facial attribute classification. The great success of classifying facial attributes with DNN often relies on a massive amount of labelled data. However, in real-world applications, labelled data are only provided for some commonly used attributes (such as age, gender); whereas,…

    Submitted 3 May, 2018; originally announced May 2018.

  12. arXiv:1804.04192  [pdf, other]

    cs.CV

    Deep Differential Recurrent Neural Networks

    Authors: Naifan Zhuang, The Duc Kieu, Guo-Jun Qi, Kien A. Hua

    Abstract: Due to the special gating schemes of Long Short-Term Memory (LSTM), LSTMs have shown greater potential to process complex sequential information than the traditional Recurrent Neural Network (RNN). The conventional LSTM, however, fails to take into consideration the impact of salient spatio-temporal dynamics present in the sequential input data. This problem was first addressed by the differential…

    Submitted 11 April, 2018; originally announced April 2018.

  13. arXiv:1504.06678  [pdf, ps, other]

    cs.CV

    Differential Recurrent Neural Networks for Action Recognition

    Authors: Vivek Veeriah, Naifan Zhuang, Guo-Jun Qi

    Abstract: The long short-term memory (LSTM) neural network is capable of processing complex sequential information since it utilizes special gating schemes for learning representations from long input sequences. It has the potential to model any sequential time-series data, where the current hidden state has to be considered in the context of the past hidden states. This property makes LSTM an ideal choice…

    Submitted 24 April, 2015; originally announced April 2015.