Search | arXiv e-print repository

ViToSA: Audio-Based Toxic Spans Detection on Vietnamese Speech Utterances

Authors: Huy Ba Do, Vy Le-Phuong Huynh, Luan Thanh Nguyen

Abstract: Toxic speech on online platforms is a growing concern, impacting user experience and online safety. While text-based toxicity detection is well-studied, audio-based approaches remain underexplored, especially for low-resource languages like Vietnamese. This paper introduces ViToSA (Vietnamese Toxic Spans Audio), the first dataset for toxic spans detection in Vietnamese speech, comprising 11,000 audio samples (25 hours) with accurate human-annotated transcripts. We propose a pipeline that combines ASR and toxic spans detection for fine-grained identification of toxic content. Our experiments show that fine-tuning ASR models on ViToSA significantly reduces WER when transcribing toxic speech, while the text-based toxic spans detection (TSD) models outperform existing baselines. These findings establish a novel benchmark for Vietnamese audio-based toxic spans detection, paving the way for future research in speech content moderation.

Submitted 31 May, 2025; originally announced June 2025.

Comments: Accepted for presentation at INTERSPEECH 2025

arXiv:2411.18790 [pdf, ps, other]

Fast Schulze Voting Using Quickselect

Authors: Arushi Arora, David Eppstein, Randy Le Huynh

Abstract: The Schulze voting method aggregates voter preference data using maxmin-weight graph paths, achieving the Condorcet property that a candidate who would win every head-to-head contest will also win the overall election. Once the voter preferences among $m$ candidates have been arranged into an $m\times m$ matrix of pairwise election outcomes, a previous algorithm of Sornat, Vassilevska Williams and Xu (EC '21) determines the Schulze winner in randomized expected time $O(m^2\log^4 m)$. We improve this to randomized expected time $O(m^2\log m)$ using a modified version of quickselect.

Submitted 27 November, 2024; originally announced November 2024.
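
Illustrative note on the entry above: the maxmin-weight ("widest") paths mentioned in the abstract can be sketched with the textbook cubic-time computation. This is a baseline sketch only, not the quickselect-based algorithm of the paper and not the algorithm of Sornat, Vassilevska Williams and Xu. The function name `schulze_winners`, the convention that `d[i][j]` counts voters preferring candidate `i` to candidate `j`, and the choice to use winning votes as edge weights are all assumptions made for this example.

```python
def schulze_winners(d):
    """Return indices of Schulze winners for a pairwise preference matrix.

    d[i][j] is assumed to be the number of voters who prefer candidate i
    to candidate j (a hypothetical input convention for this sketch).
    """
    m = len(d)
    # Edge i -> j exists only if i beats j head-to-head; its weight is
    # the number of voters supporting that victory (0 means no edge).
    p = [
        [d[i][j] if i != j and d[i][j] > d[j][i] else 0 for j in range(m)]
        for i in range(m)
    ]
    # Floyd-Warshall-style pass: p[i][j] becomes the largest bottleneck
    # (the minimum edge weight) over all paths from i to j.
    for k in range(m):
        for i in range(m):
            if i == k:
                continue
            for j in range(m):
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))
    # A candidate wins if its strongest path to every rival is at least
    # as strong as the rival's strongest path back.
    return [
        i for i in range(m)
        if all(p[i][j] >= p[j][i] for j in range(m) if j != i)
    ]


if __name__ == "__main__":
    # Three candidates; candidate 0 beats both rivals head-to-head.
    example = [
        [0, 2, 2],
        [1, 0, 2],
        [1, 1, 0],
    ]
    print(schulze_winners(example))
```

The triple loop costs $O(m^3)$ time, which is the kind of bound the near-quadratic algorithms cited in the abstract improve upon.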

arXiv:2403.17458 [pdf, ps, other]

Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice

Authors: Jake Hesford, Daniel Cheng, Alan Wan, Larry Huynh, Seungho Kim, Hyoungshick Kim, Jin B. Hong

Abstract: We empirically compare recent intrusion detection systems (IDSs) to help users choose the most appropriate solution for their requirements. Our results show that no single solution is best; performance depends on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, the BoT_IoT and Stratosphere IoT datasets both capture IoT-related attacks, but the deep neural network performed best on the BoT_IoT dataset while HELAD performed best on the Stratosphere IoT dataset. So although a deep neural network solution had the highest average F1 scores on the tested datasets, it is not always the best-performing one. We further discuss difficulties in using IDSs from the literature and project repositories, which complicated drawing definitive conclusions regarding IDS selection.

Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

Comments: 10 pages

MSC Class: 68M25; 68M20 ACM Class: C.4; D.m

arXiv:2308.11269 [pdf, other]

Quantum-Inspired Machine Learning: a Survey

Authors: Larry Huynh, Jin Hong, Ajmal Mian, Hajime Suzuki, Yanqiu Wu, Seyit Camtepe

Abstract: Quantum-inspired Machine Learning (QiML) is a burgeoning field, receiving global attention from researchers for its potential to leverage principles of quantum mechanics within classical computational frameworks. However, current review literature often presents a superficial exploration of QiML, focusing instead on the broader Quantum Machine Learning (QML) field. In response to this gap, this survey provides an integrated and comprehensive examination of QiML, exploring QiML's diverse research domains including tensor network simulations, dequantized algorithms, and others, showcasing recent advancements, practical applications, and illuminating potential future research avenues. Further, a concrete definition of QiML is established by analyzing various prior interpretations of the term and their inherent ambiguities. As QiML continues to evolve, we anticipate a wealth of future developments drawing from quantum mechanics, quantum computing, and classical machine learning, enriching the field further. This survey serves as a guide for researchers and practitioners alike, providing a holistic understanding of QiML's current landscape and future directions.

Submitted 8 September, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 59 pages, 13 figures, 9 tables. - Edited for spelling, grammar, and corrected minor typos in formulas - Adjusted wording in places for better clarity - Corrected contact info - Added Table 1 to clarify variables used in dequantized algs. - Added subsections in QVAS discussing QCBMs and TN-based VQC models - Included additional references as requested by authors to ensure a more exhaustive survey

MSC Class: 68Q09 ACM Class: A.1; I.5.4

arXiv:2208.04717 [pdf, other]

Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis

Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

Abstract: We present CG-NeRF, a cascaded and generalizable neural radiance fields method for view synthesis. Recent generalizing view synthesis methods can render high-quality novel views using a set of nearby input views. However, the rendering speed is still slow due to the uniform point sampling used by neural radiance fields. Existing scene-specific methods can train and render novel views efficiently but cannot generalize to unseen data. Our approach addresses the problems of fast and generalizing view synthesis by proposing two novel modules: a coarse radiance fields predictor and a convolution-based neural renderer. This architecture infers consistent scene geometry based on the implicit neural fields and renders new views efficiently using a single GPU. We first train CG-NeRF on multiple 3D scenes of the DTU dataset, and the network can produce high-quality and accurate novel views on unseen real and synthetic data using only photometric losses. Moreover, our method can leverage a denser set of reference images of a single scene to produce accurate novel views without relying on additional explicit representations while still maintaining the high-speed rendering of the pre-trained model. Experimental results show that CG-NeRF outperforms state-of-the-art generalizable neural rendering methods on various synthetic and real datasets.

Submitted 19 November, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

Comments: Accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2203.01994 [pdf, other]

Fast Neural Architecture Search for Lightweight Dense Prediction Networks

Authors: Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

Abstract: We present LDP, a lightweight dense prediction neural architecture search (NAS) framework. Starting from a pre-defined generic backbone, LDP applies the novel Assisted Tabu Search for efficient architecture exploration. LDP is fast and suitable for various dense estimation problems, unlike previous NAS methods that are either computationally demanding or deployed only for a single subtask. The performance of LDP is evaluated on monocular depth estimation, semantic segmentation, and image super-resolution tasks on diverse datasets, including NYU-Depth-v2, KITTI, Cityscapes, COCO-stuff, DIV2K, Set5, Set14, BSD100, and Urban100. Experiments show that the proposed framework yields consistent improvements on all tested dense prediction tasks, while being $5\%-315\%$ more compact in terms of the number of model parameters than prior art.

Submitted 9 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 15 pages, 11 figures, 8 tables. arXiv admin note: substantial text overlap with arXiv:2108.11105

arXiv:2108.12502 [pdf, other]

doi 10.1145/3460418.3479320

StressNAS: Affect State and Stress Detection Using Neural Architecture Search

Authors: Lam Huynh, Tri Nguyen, Thu Nguyen, Susanna Pirttikangas, Pekka Siirtola

Abstract: Smartwatches have rapidly evolved towards capabilities to accurately capture physiological signals. As an appealing application, stress detection attracts many studies due to its potential benefits to human health. It is propitious to investigate the applicability of deep neural networks (DNNs) to enhance human decision-making through physiological signals. However, manually engineering DNNs proves a tedious task, especially in stress detection, due to the complex nature of this phenomenon. To this end, we propose an optimized deep neural network training scheme based on neural architecture search that uses only wrist-worn data from WESAD. Experiments show that our approach outperforms traditional ML methods by 8.22% and 6.02% in the three-state and two-state classifiers, respectively, using the combination of WESAD wrist signals. Moreover, the proposed method can minimize the need for human-designed DNNs while improving performance by 4.39% (three-state) and 8.99% (binary).

Submitted 26 August, 2021; originally announced August 2021.

Comments: 5 pages, 2 figures

arXiv:2108.11105 [pdf, other]

Lightweight Monocular Depth with a Novel Neural Architecture Search Method

Authors: Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila

Abstract: This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models. Unlike previous neural architecture search (NAS) approaches, where finding optimized networks is computationally highly demanding, the introduced novel Assisted Tabu Search leads to efficient architecture exploration. Moreover, we construct the search space on a pre-defined backbone network to balance layer diversity and search space size. The LiDNAS method outperforms the state-of-the-art NAS approach, proposed for disparity and depth estimation, in terms of search efficiency and output model performance. The LiDNAS optimized models achieve results superior to compact depth estimation state-of-the-art on NYU-Depth-v2, KITTI, and ScanNet, while being 7%-500% more compact in size, i.e., the number of model parameters.

Submitted 25 August, 2021; originally announced August 2021.

Comments: 11 pages, 10 figures

arXiv:2108.11098 [pdf, other]

Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss

Authors: Lam Huynh, Matteo Pedone, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila

Abstract: Deep neural networks have recently thrived on single image depth estimation. That being said, current developments on this topic highlight an apparent compromise between accuracy and network size. This work proposes an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection. Specifically, we utilize a sparse set of keypoints to train a FuSaNet model that consists of two major components: Fusion-Net and Saliency-Net. In addition, we introduce a normalized Hessian loss term invariant to scaling and shear along the depth direction, which is shown to substantially improve the accuracy. The proposed method achieves state-of-the-art results on NYU-Depth-v2 and KITTI while using a model 3.1-38.4 times smaller, in terms of the number of parameters, than baseline approaches. Experiments on SUN-RGBD further demonstrate the generalizability of the proposed method.

Submitted 25 August, 2021; originally announced August 2021.

Comments: 11 pages, 7 figures

arXiv:2104.02773 [pdf, other]

A New Dimension in Testimony: Relighting Video with Reflectance Field Exemplars

Authors: Loc Huynh, Bipin Kishore, Paul Debevec

Abstract: We present a learning-based method for estimating the 4D reflectance field of a person given video footage of the same subject illuminated under a flat-lit environment. For training data, we use one light at a time to illuminate the subject and capture the reflectance field data in a variety of poses and viewpoints. We estimate the lighting environment of the input video footage and use the subject's reflectance field to create synthetic images of the subject illuminated by the input lighting environment. We then train a deep convolutional neural network to regress the reflectance field from the synthetic images. We also use a differentiable renderer to provide feedback for the network by matching the relit images with the input video frames. This semi-supervised training scheme allows the neural network to handle unseen poses in the dataset as well as compensate for the lighting estimation error. We evaluate our method on video footage of real Holocaust survivors and show that our method outperforms the state-of-the-art methods in both realism and speed.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2012.10296 [pdf, other]

Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion

Authors: Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila

Abstract: In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance. Unlike existing depth completion methods, our approach performs well on extremely sparse and unevenly distributed point clouds, which makes it agnostic to the source of the 3D points. We achieve this by introducing a novel multi-scale 3D point fusion network that is both lightweight and efficient. We demonstrate its versatility on two different depth estimation problems where the 3D points have been acquired with conventional structure-from-motion and LiDAR. In both cases, our network performs on par with state-of-the-art depth completion methods and achieves significantly higher accuracy when only a small number of points is used while being more compact in terms of the number of parameters. We show that our method outperforms some contemporary deep learning based multi-view stereo and structure-from-motion methods both in accuracy and in compactness.

Submitted 25 August, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

Comments: 10 pages, 9 figures

arXiv:2011.14398 [pdf, other]

RGBD-Net: Predicting color and depth images for novel views synthesis

Authors: Phong Nguyen-Ha, Animesh Karnewar, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

Abstract: We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network. The former predicts depth maps of the target views by using adaptive depth scaling, while the latter leverages the predicted depths and renders spatially and temporally consistent target images. In the experimental evaluation on standard datasets, RGBD-Net not only outperforms the state-of-the-art by a clear margin, but it also generalizes well to new scenes without per-scene optimization. Moreover, we show that RGBD-Net can be optionally trained without depth supervision while still retaining high-quality rendering. Thanks to the depth regression network, RGBD-Net can also be used for creating dense 3D point clouds that are more accurate than those produced by some state-of-the-art multi-view stereo methods.

Submitted 9 July, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

Comments: 19 pages, 15 figures. Code will be available at: https://github.com/phongnhhn92/RGBDNet

arXiv:2010.06034 [pdf, other]

doi 10.1016/j.media.2021.102002

Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy

Authors: Sharib Ali, Mariia Dmitrieva, Noha Ghatwary, Sophia Bano, Gorkem Polat, Alptekin Temizel, Adrian Krenzer, Amar Hekalo, Yun Bo Guo, Bogdan Matuszewski, Mourad Gridach, Irina Voiculescu, Vishnusai Yoganand, Arnav Chavan, Aryan Raj, Nhan T. Nguyen, Dat Q. Tran, Le Duy Huynh, Nicolas Boutry, Shahadate Rezvy, Haijian Chen, Yoon Ho Choi, Anand Subramanian, Velmurugan Balasubramanian, Xiaohong W. Gao, et al. (12 additional authors not shown)

Abstract: The Endoscopy Computer Vision Challenge (EndoCV) is a crowd-sourcing initiative to address eminent problems in developing reliable computer aided detection and diagnosis endoscopy systems and suggest a pathway for clinical translation of technologies. Whilst endoscopy is a widely used diagnostic and treatment tool for hollow-organs, there are several core challenges often faced by endoscopists, mainly: 1) presence of multi-class artefacts that hinder their visual interpretation, and 2) difficulty in identifying subtle precancerous precursors and cancer abnormalities. Artefacts often affect the robustness of deep learning methods applied to the gastrointestinal tract organs as they can be confused with tissue of interest. EndoCV2020 challenges are designed to address research questions in these remits. In this paper, we present a summary of methods developed by the top 17 teams and provide an objective comparison of state-of-the-art methods and methods designed by the participants for two sub-challenges: i) artefact detection and segmentation (EAD2020), and ii) disease detection and segmentation (EDD2020). Multi-center, multi-organ, multi-class, and multi-modal clinical endoscopy datasets were compiled for both EAD2020 and EDD2020 sub-challenges. The out-of-sample generalization ability of detection algorithms was also evaluated. Whilst most teams focused on accuracy improvements, only a few methods hold credibility for clinical usability. The best performing teams provided solutions to tackle class imbalance, and variabilities in size, origin, modality and occurrences by exploring data augmentation, data fusion, and optimal class thresholding techniques.

Submitted 17 February, 2021; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 32 pages

arXiv:2004.04548 [pdf, other]

Sequential View Synthesis with Transformer

Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila

Abstract: This paper addresses the problem of novel view synthesis by means of neural rendering, where we are interested in predicting the novel view at an arbitrary camera pose based on a given set of input images from other viewpoints. Using the known query pose and input poses, we create an ordered set of observations that leads to the target view. Thus, the problem of single novel view synthesis is reformulated as a sequential view prediction task. In this paper, the proposed Transformer-based Generative Query Network (T-GQN) extends the neural-rendering methods by adding two new concepts. First, we use multi-view attention learning between context images to obtain multiple implicit scene representations. Second, we introduce a sequential rendering decoder to predict an image sequence, including the target view, based on the learned representations. Finally, we evaluate our model on various challenging datasets and demonstrate that our model not only gives consistent predictions but also doesn't require any retraining for finetuning.

Submitted 22 September, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: Code is available at: https://github.com/phongnhhn92/TransformerGQN; Supplementary material: https://bit.ly/3kEgnzU

arXiv:2004.02760 [pdf, other]

Guiding Monocular Depth Estimation Using Depth-Attention Volume

Authors: Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila

Abstract: Recovering the scene depth from a single image is an ill-posed problem that requires additional priors, often referred to as monocular depth cues, to disambiguate different 3D interpretations. In recent works, those priors have been learned in an end-to-end manner from large datasets by using deep neural networks. In this paper, we propose guiding depth estimation to favor planar structures, which are ubiquitous especially in indoor environments. This is achieved by incorporating a non-local coplanarity constraint into the network with a novel attention mechanism called depth-attention volume (DAV). Experiments on two popular indoor datasets, namely NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results while using only a fraction of the number of parameters needed by the competing methods.

Submitted 16 August, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 30 pages

arXiv:2001.09193 [pdf, other]

doi 10.1016/j.media.2021.102166

VerSe: A Vertebrae Labelling and Segmentation Benchmark for Multi-detector CT Images

Authors: Anjany Sekuboyina, Malek E. Husseini, Amirhossein Bayat, Maximilian Löffler, Hans Liebl, Hongwei Li, Giles Tetteh, Jan Kukačka, Christian Payer, Darko Štern, Martin Urschler, Maodong Chen, Dalong Cheng, Nikolas Lessmann, Yujin Hu, Tianfu Wang, Dong Yang, Daguang Xu, Felix Ambellan, Tamaz Amiranashvili, Moritz Ehlke, Hans Lamecker, Sebastian Lehnert, Marilia Lirio, Nicolás Pérez de Olaguer, et al. (44 additional authors not shown)

Abstract: Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to considerable variations in anatomy and acquisition protocols and due to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms towards labelling and segmentation of vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared and 4505 vertebrae have individually been annotated at voxel-level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the results of this evaluation and further investigate the performance-variation at vertebra-level, scan-level, and at different fields-of-view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.

Submitted 5 April, 2022; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: Challenge report for the VerSe 2019 and 2020. Published in Medical Image Analysis (DOI: https://doi.org/10.1016/j.media.2021.102166)

Journal ref: Medical Image Analysis, Volume 73, October 2021, 102166

arXiv:1904.05124 [pdf, other]

Predicting Novel Views Using Generative Adversarial Query Network

Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Janne Heikkila

Abstract: The problem of predicting a novel view of the scene using an arbitrary number of observations is a challenging problem for computers as well as for humans. This paper introduces the Generative Adversarial Query Network (GAQN), a general learning framework for novel view synthesis that combines Generative Query Network (GQN) and Generative Adversarial Networks (GANs). The conventional GQN encodes input views into a latent representation that is used to generate a new view through a recurrent variational decoder. The proposed GAQN builds on this work by adding two novel aspects: First, we extend the current GQN architecture with an adversarial loss function for improving the visual quality and convergence speed. Second, we introduce a feature-matching loss function for stabilizing the training procedure. The experiments demonstrate that GAQN is able to produce high-quality results and faster convergence compared to the conventional approach.

Submitted 10 April, 2019; originally announced April 2019.

Comments: 12 pages, 4 figures, accepted for presentation at the Scandinavian Conference on Image Analysis 2019

arXiv:1604.00025 [pdf, other]

doi 10.1109/NTMS.2009.5384673

doi 10.1007/978-90-481-3662-9_77

doi 10.1007/978-90-481-3662-9_73

A Java Data Security Framework (JDSF) and its Case Studies

Authors: Serguei A. Mokhov, Lee Wei Huynh, Jian Li, Farid Rassai

Abstract: We present the design of something we call Confidentiality, Integrity and Authentication Sub-Frameworks, which are a part of a more general Java Data Security Framework (JDSF) designed to support various aspects related to data security (confidentiality, origin authentication, integrity, and SQL randomization). The JDSF was originally designed in 2007 for use in two use cases, MARF and HSQLDB, to allow a plug-in-like implementation and verification of various security aspects and their generalization. The JDSF project explores secure data storage issues from the point of view of data security in the two projects. A variety of common security aspects and tasks were considered in order to extract a spectrum of possible parameters these aspects require for the design of an extensible framework API and its implementation. A particular challenge being tackled is the aggregation of diverse approaches and algorithms into a common set of Java APIs that covers all, or at least the most common, aspects while keeping the framework as simple as possible. As a part of the framework, we provide the mentioned sub-frameworks' APIs to allow for the common algorithm implementations of the confidentiality, integrity, and authentication aspects for MARF's and HSQLDB's database(s). At the same time we perform a detailed overview of the related work and literature on data and database security that we considered as a possible input to the design of the JDSF.

Submitted 31 March, 2016; originally announced April 2016.

Comments: a 2007 project report; parts appeared in various conferences; includes index

arXiv:1505.00073 [pdf, other]

Bijective Deformations in $\mathbb{R}^n$ via Integral Curve Coordinates

Authors: Lisa Huynh, Yotam Gingold

Abstract: We introduce Integral Curve Coordinates, which identify each point in a bounded domain with a parameter along an integral curve of the gradient of a function $f$ on that domain; suitable functions have exactly one critical point, a maximum, in the domain, and the gradient of the function on the boundary points inward. Because every integral curve intersects the boundary exactly once, Integral Curve Coordinates provide a natural bijective mapping from one domain to another given a bijection of the boundary. Our approach can be applied to shapes in any dimension, provided that the boundary of the shape (or cage) is topologically equivalent to an $n$-sphere. We present a simple algorithm for generating a suitable function space for $f$ in any dimension. We demonstrate our approach in 2D and describe a practical (simple and robust) algorithm for tracing integral curves on a (piecewise-linear) triangulated regular grid.

Submitted 30 April, 2015; originally announced May 2015.

MSC Class: 37E30 ACM Class: I.3.5

arXiv:0906.0065 [pdf, other]

Managing Distributed MARF with SNMP

Authors: Serguei A. Mokhov, Lee Wei Huynh, Jian Li

Abstract: This project focuses on researching and prototyping an extension of the Distributed MARF such that its services can be managed through the most popular and familiar management protocol, SNMP. The rationale for using SNMP rather than MARF's proprietary management protocols is that it can be integrated with common network service and device management, so administrators can manage MARF nodes via an already familiar protocol, as well as monitor their performance, gather statistics, set desired configuration, etc., perhaps using the same management tools they have been using for other network devices and application servers.

Submitted 26 July, 2009; v1 submitted 30 May, 2009; originally announced June 2009.

Comments: 39 pages, 16 figures, TOC, index. A large portion of this report has been published at PDPTA'08. This 2007 report is a successor of the original DMARF work documented at arXiv:0905.2459; v2 adds missing .ind file for the index

ACM Class: C.2.4; I.5; I.2.6; D.2.10; D.2.11; D.2.5; D.2.2; I.2.7

Journal ref: Proceedings of PDPTA'08 (2008), Volume 2, pp. 948-954
