-
DCTM: Dilated Convolutional Transformer Model for Multimodal Engagement Estimation in Conversation
Authors:
Vu Ngoc Tu,
Van Thong Huynh,
Hyung-Jeong Yang,
M. Zaigham Zaheer,
Shah Nawaz,
Karthik Nandakumar,
Soo-Hyung Kim
Abstract:
Conversational engagement estimation is posed as a regression problem, entailing the identification of the favorable attention and involvement of the participants in a conversation. This task is crucial for gaining insight into human interaction dynamics and behavior patterns within a conversation. In this research, we introduce a dilated convolutional Transformer for modeling and estimating human engagement in the MULTIMEDIATE 2023 competition. Our proposed system surpasses the baseline models, with a noteworthy 7% improvement on the test set and 4% on the validation set. Moreover, we employ different modality fusion mechanisms and show that, for this type of data, simple concatenation with self-attention fusion achieves the best performance.
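As a generic illustration of the dilated convolution named in the title, the sketch below applies a causal 1-D dilated convolution to a sequence and shows how stacking layers with dilations 1, 2 and 4 enlarges the receptive field. This is not the authors' DCTM implementation, whose layer configuration the abstract does not give; the function names `dilated_conv1d` and `receptive_field` are hypothetical.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Causal 1-D dilated convolution with implicit zero left-padding.

    out[t] = sum_k kernel[k] * x[t - k * dilation], with indices < 0 treated as 0,
    so the output has the same length as the input.
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # taps are spaced `dilation` steps apart
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input steps that influence one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    # A unit impulse at t = 0 makes the spread of each layer easy to see.
    h: List[float] = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    for d in (1, 2, 4):
        h = dilated_conv1d(h, kernel=[1.0, 1.0], dilation=d)
    print(h)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(receptive_field(kernel_size=2, dilations=(1, 2, 4)))  # 8
```

With kernel size 2, three layers cover 8 time steps while using only 6 weights in total; doubling the dilation per layer is what makes the receptive field grow exponentially with depth rather than linearly.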
Submitted 31 July, 2023;
originally announced August 2023.
-
Multi-scale Transformer-based Network for Emotion Recognition from Multi Physiological Signals
Authors:
Tu Vu,
Van Thong Huynh,
Soo-Hyung Kim
Abstract:
This paper presents an efficient multi-scale Transformer-based approach for emotion recognition from physiological data, a task that has gained widespread attention in the research community because of the vast amount of information that can be extracted from these signals using modern sensors and machine learning techniques. Our approach applies a multi-modal technique combined with data scaling to establish the relationship between internal body signals and human emotions. Additionally, we use Transformer and Gaussian transformation techniques to improve signal encoding effectiveness and overall performance. Our model achieves an RMSE of 1.45 on the CASE dataset of the EPiC competition.
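The reported score is a root-mean-square error between predicted and annotated emotion values. A minimal sketch of the plain RMSE formula follows; the exact aggregation used by the EPiC competition (for example, averaging per subject or per video) is not stated in the abstract, so this is only the basic metric, and the toy values are made up for illustration.

```python
import math
from typing import Sequence


def rmse(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Root-mean-square error: sqrt(mean((p - y)^2))."""
    if len(predictions) != len(targets) or not predictions:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - y) ** 2 for p, y in zip(predictions, targets)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    pred = [5.0, 6.0, 4.0, 7.0]
    true = [6.0, 4.0, 4.0, 8.0]
    print(round(rmse(pred, true), 4))  # 1.2247
```

Because errors are squared before averaging, a single large miss (here the 2-point error) dominates the score, which is why RMSE penalizes occasional large deviations more than mean absolute error does.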
Submitted 7 May, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
The 1st Agriculture-Vision Challenge: Methods and Results
Authors:
Mang Tik Chiu,
Xingqian Xu,
Kai Wang,
Jennifer Hobbs,
Naira Hovakimyan,
Thomas S. Huang,
Honghui Shi,
Yunchao Wei,
Zilong Huang,
Alexander Schwing,
Robert Brunner,
Ivan Dozier,
Wyatt Dozier,
Karen Ghandilyan,
David Wilson,
Hyunseong Park,
Junhee Kim,
Sungho Kim,
Qinghui Liu,
Michael C. Kampffmeyer,
Robert Jenssen,
Arnt B. Salberg,
Alexandre Barbosa,
Rodrigo Trevisan,
Bingchen Zhao, et al. (17 additional authors not shown)
Abstract:
The first Agriculture-Vision Challenge aims to encourage research in developing novel and effective algorithms for agricultural pattern recognition from aerial images, especially for the semantic segmentation task associated with our challenge dataset. Around 57 participating teams from various countries competed to achieve state-of-the-art results in aerial agriculture semantic segmentation. The challenge used the Agriculture-Vision Challenge Dataset, which comprises 21,061 aerial and multi-spectral farmland images. This paper provides a summary of notable methods and results in the challenge. Our submission server and leaderboard will remain open to researchers who are interested in this challenge dataset and task; the link can be found here.
Submitted 23 April, 2020; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Eye Semantic Segmentation with a Lightweight Model
Authors:
Van Thong Huynh,
Soo-Hyung Kim,
Guee-Sang Lee,
Hyung-Jeong Yang
Abstract:
In this paper, we present a multi-class eye segmentation method that can run within hardware limitations for real-time inference. Our approach includes three major stages: converting the input to a grayscale image, segmenting three distinct eye regions with a deep network, and removing incorrect areas with heuristic filters. Our model is based on an encoder-decoder structure whose key component is the depthwise convolution operation, which reduces the computation cost. We experiment on OpenEDS, a large-scale dataset of eye images captured by a head-mounted display with two synchronized eye-facing cameras. We achieved a mean intersection over union (mIoU) of 94.85% with a model of size 0.4 megabytes. The source code is available at https://github.com/th2l/Eye_VR_Segmentation
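To show why depthwise convolution lowers computation cost, the sketch below compares weight and multiply-accumulate (MAC) counts for a standard k×k convolution and for the common depthwise-separable factorization (a k×k depthwise convolution followed by a 1×1 pointwise convolution). The abstract does not say the authors pair the depthwise layer with exactly this pointwise step, and the layer sizes used here (k = 3, 32 to 64 channels, 64×64 output) are assumed for illustration; the function names are hypothetical. Bias terms are ignored.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def macs(weights: int, out_h: int, out_w: int) -> int:
    """MACs when each weight is applied once per output spatial position."""
    return weights * out_h * out_w


if __name__ == "__main__":
    k, c_in, c_out, h, w = 3, 32, 64, 64, 64  # assumed layer sizes
    std = conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep)                          # 18432 2336
    print(macs(std, h, w), macs(sep, h, w))  # 75497472 9568256
    print(round(sep / std, 4))               # 0.1267
```

The ratio of the two costs is 1/c_out + 1/k², here 1/64 + 1/9 ≈ 0.127, so for 3×3 kernels the separable form needs roughly 8× fewer weights and MACs, which is the kind of saving that makes a sub-megabyte segmentation model feasible.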
Submitted 4 November, 2019;
originally announced November 2019.