Showing 1–42 of 42 results for author: Pham, X

Searching in archive cs.
  1. arXiv:2506.21920  [pdf, ps, other]

    cs.CV

    SepFormer: Coarse-to-fine Separator Regression Network for Table Structure Recognition

    Authors: Nam Quan Nguyen, Xuan Phong Pham, Tuan-Anh Tran

    Abstract: The automated reconstruction of the logical arrangement of tables from image data, termed Table Structure Recognition (TSR), is fundamental for semantic data extraction. Recently, researchers have explored a wide range of techniques to tackle this problem, demonstrating significant progress. Each table is a set of vertical and horizontal separators. Following this realization, we present SepFormer…

    Submitted 27 June, 2025; originally announced June 2025.

  2. arXiv:2503.22281  [pdf, other]

    cs.CV

    Divide to Conquer: A Field Decomposition Approach for Multi-Organ Whole-Body CT Image Registration

    Authors: Xuan Loc Pham, Mathias Prokop, Bram van Ginneken, Alessa Hering

    Abstract: Image registration is an essential technique for the analysis of Computed Tomography (CT) images in clinical practice. However, existing methodologies are predominantly tailored to a specific organ of interest and often exhibit lower performance on other organs, thus limiting their generalizability and applicability. Multi-organ registration addresses these limitations, but the simultaneous alignm…

    Submitted 28 March, 2025; originally announced March 2025.

  3. arXiv:2503.20418  [pdf, ps, other]

    cs.CV

    ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On

    Authors: Ji Woo Hong, Tri Ton, Trung X. Pham, Gwanhyeong Koo, Sunjae Yoon, Chang D. Yoo

    Abstract: This paper introduces ITA-MDT, the Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On (IVTON), designed to overcome the limitations of previous approaches by leveraging the Masked Diffusion Transformer (MDT) for improved handling of both global garment context and fine-grained details. The IVTON task involves seamlessly superimposing a garment from one im…

    Submitted 1 June, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

    Comments: CVPR 2025, Project Page: https://jiwoohong93.github.io/ita-mdt/

  4. arXiv:2502.11915  [pdf, other]

    cs.AI math.HO

    On the robustness of ChatGPT in teaching Korean Mathematics

    Authors: Phuong-Nam Nguyen, Quang Nguyen-The, An Vu-Minh, Diep-Anh Nguyen, Xuan-Lam Pham

    Abstract: ChatGPT, an Artificial Intelligence model, has the potential to revolutionize education. However, its effectiveness in solving non-English questions remains uncertain. This study evaluates ChatGPT's robustness using 586 Korean mathematics questions. ChatGPT achieves 66.72% accuracy, correctly answering 391 out of 586 questions. We also assess its ability to rate mathematics questions based on elev…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 21 pages, 12 figures, includes statistical analysis of ChatGPT's robustness in solving and rating multilingual mathematics questions. Focus on Korean CSAT Mathematics. Evaluates AI accuracy, rating effectiveness, and topic analysis

    ACM Class: I.2.7; K.3.1; G.3

  5. arXiv:2502.09164  [pdf, other]

    cs.CV cs.LG

    E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization

    Authors: Trung X. Pham, Zhang Kang, Ji Woo Hong, Xuran Zheng, Chang D. Yoo

    Abstract: We propose E-MD3C ($\underline{E}$fficient $\underline{M}$asked $\underline{D}$iffusion Transformer with Disentangled $\underline{C}$onditions and $\underline{C}$ompact $\underline{C}$ollector), a highly efficient framework for zero-shot object image customization. Unlike prior works reliant on resource-intensive Unet architectures, our approach employs lightweight masked diffusion transformers op…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: 16 pages, 14 figures

  6. arXiv:2411.18552  [pdf, other]

    cs.CV

    FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

    Authors: Haosen Yang, Adrian Bulat, Isma Hadji, Hai X. Pham, Xiatian Zhu, Georgios Tzimiropoulos, Brais Martinez

    Abstract: Diffusion models are proficient at generating high-quality images. They are however effective only when operating at the resolution used during training. Inference at a scaled resolution leads to repetitive patterns and structural distortions. Retraining at higher resolutions quickly becomes prohibitive. Thus, methods enabling pre-existing diffusion models to operate at flexible test-time resoluti…

    Submitted 27 November, 2024; originally announced November 2024.

  7. arXiv:2410.02130  [pdf, other]

    cs.SD cs.CV eess.AS

    MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

    Authors: Trung X. Pham, Tri Ton, Chang D. Yoo

    Abstract: We introduce MDSGen, a novel framework for vision-guided open-domain sound generation optimized for model parameter size, memory consumption, and inference speed. This framework incorporates two key innovations: (1) a redundant video feature removal module that filters out unnecessary visual information, and (2) a temporal-aware masking strategy that leverages temporal context for enhanced audio g…

    Submitted 13 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

    Comments: ICLR 2025

  8. arXiv:2405.01054  [pdf, other]

    cs.RO cs.CV cs.LG

    Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone Racing

    Authors: Zhongzheng Qiao, Xuan Huy Pham, Savitha Ramasamy, Xudong Jiang, Erdal Kayacan, Andriy Sarabakha

    Abstract: In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures, in 2024 International Joint Conference on Neural Networks (IJCNN)

  9. arXiv:2402.01516  [pdf, other]

    cs.CV

    Cross-view Masked Diffusion Transformers for Person Image Synthesis

    Authors: Trung X. Pham, Zhang Kang, Chang D. Yoo

    Abstract: We present X-MDPT ($\underline{Cross}$-view $\underline{M}$asked $\underline{D}$iffusion $\underline{P}$rediction $\underline{T}$ransformers), a novel diffusion model designed for pose-guided human image generation. X-MDPT distinguishes itself by employing masked diffusion transformers that operate on latent patches, a departure from the commonly-used Unet structures in existing works. The model c…

    Submitted 3 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  10. arXiv:2401.13594  [pdf, other]

    cs.CL cs.AI

    Graph Guided Question Answer Generation for Procedural Question-Answering

    Authors: Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez

    Abstract: In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural t…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to EACL 2024 as long paper. 25 pages including appendix

    MSC Class: I.2.7

  11. arXiv:2311.18508  [pdf, other]

    eess.IV cs.CV

    DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Joshua Tian Jin Tee, Trung X. Pham, Jinqiu Sun, Chang D. Yoo, In So Kweon, Yanning Zhang

    Abstract: It is well known the adversarial optimization of GAN-based image super-resolution (SR) methods makes the preceding SR model generate unpleasant and undesirable artifacts, leading to large distortion. We attribute the cause of such distortions to the poor calibration of the discriminator, which hampers its ability to provide meaningful feedback to the generator for learning high-quality images. To…

    Submitted 30 November, 2023; originally announced November 2023.

  12. arXiv:2310.14030  [pdf, other]

    cs.RO

    Visual Tracking Nonlinear Model Predictive Control Method for Autonomous Wind Turbine Inspection

    Authors: Abdelhakim Amer, Mohit Mehndiratta, Jonas le Fevre Sejersen, Huy Xuan Pham, Erdal Kayacan

    Abstract: Automated visual inspection of on-and offshore wind turbines using aerial robots provides several benefits, namely, a safe working environment by circumventing the need for workers to be suspended high above the ground, reduced inspection time, preventive maintenance, and access to hard-to-reach areas. A novel nonlinear model predictive control (NMPC) framework alongside a global wind turbine path…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: 8 pages, accepted for publication at ICAR conference

  13. arXiv:2305.18547  [pdf, other]

    cs.CV

    Learning from Multi-Perception Features for Real-Word Image Super-resolution

    Authors: Axi Niu, Kang Zhang, Trung X. Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang

    Abstract: Currently, there are two popular approaches for addressing real-world image super-resolution problems: degradation-estimation-based and blind-based methods. However, degradation-estimation-based methods may be inaccurate in estimating the degradation, making them less applicable to real-world LR images. On the other hand, blind-based methods are often limited by their fixed single perception infor…

    Submitted 26 May, 2023; originally announced May 2023.

  14. arXiv:2302.12831  [pdf, other]

    eess.IV cs.CV

    CDPMSR: Conditional Diffusion Probabilistic Models for Single Image Super-Resolution

    Authors: Axi Niu, Kang Zhang, Trung X. Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

    Abstract: Diffusion probabilistic models (DPM) have been widely adopted in image-to-image translation to generate high-quality images. Prior attempts at applying the DPM to image super-resolution (SR) have shown that iteratively refining a pure Gaussian noise with a conditional image using a U-Net trained on denoising at various-level noises can help obtain a satisfied high-resolution image for the low-reso…

    Submitted 14 February, 2023; originally announced February 2023.

    Comments: 4 pages, 4 figures

  15. arXiv:2211.09861  [pdf, other]

    cs.CV

    Self-Supervised Visual Representation Learning via Residual Momentum

    Authors: Trung X. Pham, Axi Niu, Zhang Kang, Sultan Rizky Madjid, Ji Woo Hong, Daehyeok Kim, Joshua Tian Jin Tee, Chang D. Yoo

    Abstract: Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in representation between the online encoder (student) and the momentum encoder (teacher), which hinder…

    Submitted 21 November, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 18 pages, 16 figures

  16. arXiv:2210.08282  [pdf, other]

    cs.CV

    LAD: A Hybrid Deep Learning System for Benign Paroxysmal Positional Vertigo Disorders Diagnostic

    Authors: Trung Xuan Pham, Jin Woong Choi, Rusty John Lloyd Mina, Thanh Nguyen, Sultan Rizky Madjid, Chang Dong Yoo

    Abstract: Herein, we introduce "Look and Diagnose" (LAD), a hybrid deep learning-based system that aims to support doctors in the medical field in diagnosing effectively the Benign Paroxysmal Positional Vertigo (BPPV) disorder. Given the body postures of the patient in the Dix-Hallpike and lateral head turns test, the visual information of both eyes is captured and fed into LAD for analyzing and classifying…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE Access 2022, 13 pages, 14 figures

  17. arXiv:2207.14131  [pdf, other]

    cs.RO

    PencilNet: Zero-Shot Sim-to-Real Transfer Learning for Robust Gate Perception in Autonomous Drone Racing

    Authors: Huy Xuan Pham, Andriy Sarabakha, Mykola Odnoshyvkin, Erdal Kayacan

    Abstract: In autonomous and mobile robotics, one of the main challenges is the robust on-the-fly perception of the environment, which is often unknown and dynamic, like in autonomous drone racing. In this work, we propose a novel deep neural network-based perception method for racing gate detection -- PencilNet -- which relies on a lightweight neural network backbone on top of a pencil filter. This approach…

    Submitted 28 July, 2022; originally announced July 2022.

    Comments: accepted for publication by IEEE RA-L/IROS 2022

  18. arXiv:2204.02120  [pdf, other]

    cs.RO

    Event-based Navigation for Autonomous Drone Racing with Sparse Gated Recurrent Network

    Authors: Kristoffer Fogh Andersen, Huy Xuan Pham, Halil Ibrahim Ugurlu, Erdal Kayacan

    Abstract: Event-based vision has already revolutionized the perception task for robots by promising faster response, lower energy consumption, and lower bandwidth without introducing motion blur. In this work, a novel deep learning method based on gated recurrent units utilizing sparse convolutions for detecting gates in a race track is proposed using event-based vision for the autonomous drone racing probl…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted to present at the 20th European Control Conference (ECC)

  19. arXiv:2203.17248  [pdf, other]

    cs.LG cs.AI

    Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo

    Authors: Chaoning Zhang, Kang Zhang, Trung X. Pham, Axi Niu, Zhinan Qiao, Chang D. Yoo, In So Kweon

    Abstract: Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most pop…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR2022

  20. arXiv:2203.16262  [pdf, other]

    cs.LG cs.AI

    How Does SimSiam Avoid Collapse Without Negative Samples? A Unified Understanding with Self-supervised Contrastive Learning

    Authors: Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X. Pham, Chang D. Yoo, In So Kweon

    Abstract: To avoid collapse in self-supervised learning (SSL), a contrastive loss is widely used but often requires a large number of negative samples. Without negative samples yet achieving competitive performance, a recent work has attracted significant attention for providing a minimalist simple Siamese (SimSiam) method to avoid collapse. However, the reason for how it avoids collapse without negative sa…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: accepted on ICLR 2022

  21. arXiv:2203.05735  [pdf, other]

    cs.LG

    An Efficient Video Streaming Architecture with QoS Control for Virtual Desktop Infrastructure in Cloud Computing

    Authors: Huu-Quoc Nguyen, Tien-Dung Nguyen, Van-Nam Pham, Xuan-Qui Pham, Quang-Thai Ngo, Eui-Nam Huh

    Abstract: In virtual desktop infrastructure (VDI) environments, the remote display protocol has a big responsibility to transmit video data from a data center-hosted desktop to the endpoint. The protocol must ensure a high level of client perceived end-to-end quality of service (QoS) under heavy work load conditions. Each remote display protocol works differently depending on the network and which applicati…

    Submitted 10 March, 2022; originally announced March 2022.

    Comments: 26 pages, Multimedia Tools and Applications Journal

  22. arXiv:2202.10336  [pdf, other]

    cs.CY cs.AI cs.LG

    Artificial Intelligence for the Metaverse: A Survey

    Authors: Thien Huynh-The, Quoc-Viet Pham, Xuan-Qui Pham, Thanh Thi Nguyen, Zhu Han, Dong-Seong Kim

    Abstract: Along with the massive growth of the Internet from the 1990s until now, various innovative technologies have been created to bring users breathtaking experiences with more virtual interactions in cyberspace. Many virtual environments with thousands of services and applications, from social networks to virtual gaming worlds, have been developed with immersive experience and digital transformation,…

    Submitted 14 February, 2022; originally announced February 2022.

  23. arXiv:2108.00475  [pdf, other]

    cs.CV eess.IV

    Self-supervised Learning with Local Attention-Aware Feature

    Authors: Trung X. Pham, Rusty John Lloyd Mina, Dias Issa, Chang D. Yoo

    Abstract: In this work, we propose a novel methodology for self-supervised learning for generating global and local attention-aware visual features. Our approach is based on training a model to differentiate between specific image transformations of an input sample and the patched images. Utilizing this approach, the proposed method is able to outperform the previous best competitor by 1.03% on the Tiny-Ima…

    Submitted 1 August, 2021; originally announced August 2021.

    Comments: 5 pages, 4 figures

  24. arXiv:2102.02547  [pdf, other]

    cs.CV

    CHEF: Cross-modal Hierarchical Embeddings for Food Domain Retrieval

    Authors: Hai X. Pham, Ricardo Guerrero, Jiatong Li, Vladimir Pavlovic

    Abstract: Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort in understanding the individual entities and their different roles in the construction of these data instances. In this work, we endeavour to discover the entities and their corresponding importance in cooking recipes automatically as a visual-linguistic association problem. More specifically, we intr…

    Submitted 4 February, 2021; originally announced February 2021.

    Comments: 22 pages, accepted in AAAI 2021

  25. arXiv:2012.01345  [pdf, other]

    cs.CV cs.IR cs.LG

    Cross-Modal Retrieval and Synthesis (X-MRS): Closing the Modality Gap in Shared Representation Learning

    Authors: Ricardo Guerrero, Hai Xuan Pham, Vladimir Pavlovic

    Abstract: Computational food analysis (CFA) naturally requires multi-modal evidence of a particular food, e.g., images, recipe text, etc. A key to making CFA possible is multi-modal shared representation learning, which aims to create a joint representation of the multiple views (text and image) of the data. In this work we propose a method for food domain cross-modal shared representation learning that pre…

    Submitted 30 September, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

  26. arXiv:2010.09623  [pdf, other]

    cs.CL

    An Empirical Study for Vietnamese Constituency Parsing with Pre-training

    Authors: Tuan-Vi Tran, Xuan-Thien Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

    Abstract: In this work, we use a span-based approach for Vietnamese constituency parsing. Our method follows the self-attention encoder architecture and a chart decoder using a CKY-style inference algorithm. We present analyses of the experiment results of the comparison of our empirical method using pre-training models XLM-Roberta and PhoBERT on both Vietnamese datasets VietTreebank and NIIVTB1. The result…

    Submitted 19 October, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

  27. arXiv:1909.06720  [pdf, other]

    cs.CV

    Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution

    Authors: Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D. Yoo

    Abstract: This paper considers an architecture referred to as Cascade Region Proposal Network (Cascade RPN) for improving the region-proposal quality and detection performance by \textit{systematically} addressing the limitation of the conventional RPN that \textit{heuristically defines} the anchors and \textit{aligns} the features to the anchors. First, instead of using multiple anchors with predefined sca…

    Submitted 4 December, 2019; v1 submitted 14 September, 2019; originally announced September 2019.

    Comments: To appear in NeurIPS 2019 (spotlight)

  28. arXiv:1811.00690  [pdf, other]

    cs.RO

    A Multi-Robotic System for Environmental Cleaning

    Authors: Chuong Le, Huy Xuan Pham, Hung Manh La

    Abstract: There is a lot of waste in an industrial environment that could cause harmful effects to both the products and the workers resulting in product defects, itchy eyes or chronic obstructive pulmonary disease, etc. While automative cleaning robots could be used, the environment is often too big for one robot to clean alone in addition to the fact that it does not have adequate stored dirt capacity. We…

    Submitted 1 November, 2018; originally announced November 2018.

  29. arXiv:1810.01641  [pdf, other]

    cs.CV

    PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report

    Authors: Andrey Ignatov, Radu Timofte, Thang Van Vu, Tung Minh Luu, Trung X Pham, Cao Van Nguyen, Yongwoo Kim, Jae-Seok Choi, Munchurl Kim, Jie Huang, Jiewen Ran, Chen Xing, Xingguang Zhou, Pengfei Zhu, Mingrui Geng, Yawei Li, Eirikur Agustsson, Shuhang Gu, Luc Van Gool, Etienne de Stoutz, Nikolay Kobyshev, Kehui Nie, Yan Zhao, Gen Li, Tong Tong, et al. (23 additional authors not shown)

    Abstract: This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants were solving the classical image super-resolution problem with a bicubic downscaling factor of 4. The second track was aimed at real-world photo enhancement, and the goal was to map lo…

    Submitted 3 October, 2018; originally announced October 2018.

  30. arXiv:1803.07926  [pdf, other]

    cs.RO

    A Distributed Control Framework of Multiple Unmanned Aerial Vehicles for Dynamic Wildfire Tracking

    Authors: Huy Xuan Pham, Hung Manh La, David Feil-Seifer, Matthew Dean

    Abstract: Wild-land fire fighting is a hazardous job. A key task for firefighters is to observe the "fire front" to chart the progress of the fire and areas that will likely spread next. Lack of information of the fire front causes many accidents. Using Unmanned Aerial Vehicles (UAVs) to cover wildfire is promising because it can replace humans in hazardous fire tracking and significantly reduce operation c…

    Submitted 19 March, 2018; originally announced March 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1704.02630

  31. arXiv:1803.07716  [pdf, other]

    cs.CV

    Generative Adversarial Talking Head: Bringing Portraits to Life with a Weakly Supervised Neural Network

    Authors: Hai X. Pham, Yuting Wang, Vladimir Pavlovic

    Abstract: This paper presents Generative Adversarial Talking Head (GATH), a novel deep generative neural network that enables fully automatic facial expression synthesis of an arbitrary portrait with continuous action unit (AU) coefficients. Specifically, our model directly manipulates image pixels to make the unseen subject in the still photo express various emotions controlled by values of facial AU coeff…

    Submitted 28 March, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

    Comments: Fix typos, add youtube link of supplementary video

  32. arXiv:1803.07250  [pdf, other]

    cs.RO

    Cooperative and Distributed Reinforcement Learning of Drones for Field Coverage

    Authors: Huy Xuan Pham, Hung Manh La, David Feil-Seifer, Aria Nefian

    Abstract: This paper proposes a distributed Multi-Agent Reinforcement Learning (MARL) algorithm for a team of Unmanned Aerial Vehicles (UAVs). The proposed MARL algorithm allows UAVs to learn cooperatively to provide a full coverage of an unknown field of interest while minimizing the overlapping sections among their field of views. Two challenges in MARL for such a system are discussed in the paper: firstl…

    Submitted 16 September, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

  33. arXiv:1801.05086  [pdf, other]

    cs.RO

    Autonomous UAV Navigation Using Reinforcement Learning

    Authors: Huy X. Pham, Hung M. La, David Feil-Seifer, Luan V. Nguyen

    Abstract: Unmanned aerial vehicles (UAV) are commonly used for missions in unknown environments, where an exact mathematical model of the environment may not be available. This paper provides a framework for using reinforcement learning to allow the UAV to navigate successfully in such environments. We conducted our simulation and real implementation to show how the UAVs can successfully learn to navigate t…

    Submitted 15 January, 2018; originally announced January 2018.

  34. arXiv:1711.10124  [pdf, ps, other]

    cs.CL

    Vietnamese Semantic Role Labelling

    Authors: Phuong Le-Hong, Thai Hoang Pham, Xuan Khoai Pham, Thi Minh Huyen Nguyen, Thi Luong Nguyen, Minh Hiep Nguyen

    Abstract: In this paper, we study semantic role labelling (SRL), a subtask of semantic parsing of natural language sentences and its application for the Vietnamese language. We present our effort in building Vietnamese PropBank, the first Vietnamese SRL corpus and a software system for labelling semantic roles of Vietnamese texts. In particular, we present a novel constituent extraction algorithm in the arg…

    Submitted 27 November, 2017; originally announced November 2017.

    Comments: Accepted to the VNU Journal of Science

  35. arXiv:1710.00920  [pdf, other]

    cs.CV

    End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech

    Authors: Hai X. Pham, Yuting Wang, Vladimir Pavlovic

    Abstract: We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms. Our deep neural network directly maps an input sequence of speech audio to a series of micro facial action unit activations and head rotations to drive a 3D blendshape face model. In particular, our deep model is able to learn the latent representations of time-varying contextual informati…

    Submitted 7 December, 2017; v1 submitted 2 October, 2017; originally announced October 2017.

  36. arXiv:1709.07104  [pdf, ps, other]

    cs.CL

    On the Use of Machine Translation-Based Approaches for Vietnamese Diacritic Restoration

    Authors: Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong

    Abstract: This paper presents an empirical study of two machine translation-based approaches for Vietnamese diacritic restoration problem, including phrase-based and neural-based machine translation models. This is the first work that applies neural-based machine translation method to this problem and gives a thorough comparison to the phrase-based machine translation method which is the current state-of-th…

    Submitted 26 October, 2017; v1 submitted 20 September, 2017; originally announced September 2017.

    Comments: 4 pages, 2 figures, 4 tables, accepted to IALP 2017

  37. arXiv:1708.07241  [pdf, other]

    cs.CL

    NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit

    Authors: Thai-Hoang Pham, Xuan-Khoai Pham, Tuan-Anh Nguyen, Phuong Le-Hong

    Abstract: This paper demonstrates neural network-based toolkit namely NNVLP for essential Vietnamese language processing tasks including part-of-speech (POS) tagging, chunking, named entity recognition (NER). Our toolkit is a combination of bidirectional Long Short-Term Memory (Bi-LSTM), Convolutional Neural Network (CNN), Conditional Random Field (CRF), using pre-trained word embeddings as input, which ach…

    Submitted 19 October, 2017; v1 submitted 23 August, 2017; originally announced August 2017.

    Comments: 4 pages, 5 figures, 6 tables, accepted to IJCNLP 2017

  38. arXiv:1705.04038  [pdf, ps, other]

    cs.CL

    Building a Semantic Role Labelling System for Vietnamese

    Authors: Thai-Hoang Pham, Xuan-Khoai Pham, Phuong Le-Hong

    Abstract: Semantic role labelling (SRL) is a task in natural language processing which detects and classifies the semantic arguments associated with the predicates of a sentence. It is an important step towards understanding the meaning of a natural language. There exists SRL systems for well-studied languages like English, Chinese or Japanese but there is not any such system for the Vietnamese language. In…

    Submitted 11 May, 2017; originally announced May 2017.

    Comments: 8 pages, ICDIM 2015

  39. arXiv:1704.07938  [pdf]

    cs.LG

    An ensemble-based online learning algorithm for streaming data

    Authors: Tien Thanh Nguyen, Thi Thu Thuy Nguyen, Xuan Cuong Pham, Alan Wee-Chung Liew, James C. Bezdek

    Abstract: In this study, we introduce an ensemble-based approach for online machine learning. The ensemble of base classifiers in our approach is obtained by learning Naive Bayes classifiers on different training sets which are generated by projecting the original training set to lower dimensional space. We propose a mechanism to learn sequences of data using data chunks paradigm. The experiments conducted…

    Submitted 25 April, 2017; originally announced April 2017.

    Comments: 19 pages, 3 figures

  40. arXiv:1704.02630  [pdf, other]

    cs.RO

    A Distributed Control Framework for a Team of Unmanned Aerial Vehicles for Dynamic Wildfire Tracking

    Authors: Huy X. Pham, Hung M. La, David Feil-Seifer, Matthew Deans

    Abstract: Wildland fire fighting is a very dangerous job, and the lack of information of the fire front is one of main reasons that causes many accidents. Using unmanned aerial vehicle (UAV) to cover wildfire is promising because it can replace human in hazardous fire tracking and save operation costs significantly. In this paper we propose a distributed control framework designed for a team of UAVs that ca…

    Submitted 9 April, 2017; originally announced April 2017.

  41. arXiv:1703.05411  [pdf]

    cs.LG stat.ML

    Aggregation of Classifiers: A Justifiable Information Granularity Approach

    Authors: Tien Thanh Nguyen, Xuan Cuong Pham, Alan Wee-Chung Liew, Witold Pedrycz

    Abstract: In this study, we introduce a new approach to combine multi-classifiers in an ensemble system. Instead of using numeric membership values encountered in fixed combining rules, we construct interval membership values associated with each class prediction at the level of meta-data of observation by using concepts of information granules. In the proposed method, uncertainty (diversity) of findings pr…

    Submitted 15 March, 2017; originally announced March 2017.

    Comments: 33 pages, 3 figures

  42. arXiv:1507.02779  [pdf, other]

    cs.CV

    Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes

    Authors: Hai X. Pham, Chongyu Chen, Luc N. Dao, Vladimir Pavlovic, Jianfei Cai, Tat-jen Cham

    Abstract: We introduce a novel robust hybrid 3D face tracking framework from RGBD video streams, which is capable of tracking head pose and facial actions without pre-calibration or intervention from a user. In particular, we emphasize on improving the tracking performance in instances where the tracked subject is at a large distance from the cameras, and the quality of point cloud deteriorates severely. Th…

    Submitted 10 July, 2015; originally announced July 2015.

    Comments: 10 pages, 8 figures, 4 tables