Towards a Decentralised Application-Centric Orchestration Framework in the Cloud-Edge Continuum

Authors: Amjad Ullah, Andras Markus, Hacı İsmail Aslan, Tamas Kiss, Jozsef Kovacs, James Deslauriers, Amy L. Murphy, Yiming Wang, Odej Kao

Abstract: The efficient management of complex distributed applications in the Cloud-Edge continuum, including their deployment on heterogeneous computing resources and run-time operations, presents significant challenges. Resource management solutions -- also called orchestrators -- play a pivotal role by automating and managing tasks such as resource discovery, optimisation, application deployment, and lifecycle management, whilst ensuring the desired system performance. This paper introduces Swarmchestrate, a decentralised, application-centric orchestration framework inspired by the self-organising principles of Swarms. Swarmchestrate addresses the end-to-end management of distributed applications, from submission to optimal resource allocation across cloud and edge providers, as well as dynamic reconfiguration. Our initial findings include the implementation of the application deployment phase within a Cloud-Edge simulation environment, demonstrating the potential of Swarmchestrate. The results offer valuable insight into the coordination of resource offerings between various providers and optimised resource allocation, providing a foundation for designing scalable and efficient infrastructures.

Submitted 1 April, 2025; originally announced April 2025.

Comments: Accepted for publication in the 9th IEEE International Conference on Fog and Edge Computing 2025

arXiv:2401.15312 [pdf, other]

How We Refute Claims: Automatic Fact-Checking through Flaw Identification and Explanation

Authors: Wei-Yu Kao, An-Zi Yen

Abstract: Automated fact-checking is a crucial task in the governance of internet content. Although various studies utilize advanced models to tackle this issue, a significant gap persists in addressing complex real-world rumors and deceptive claims. To address this challenge, this paper explores the novel task of flaw-oriented fact-checking, including aspect generation and flaw identification. We also introduce RefuteClaim, a new framework designed specifically for this task. Given the absence of an existing dataset, we present FlawCheck, a dataset created by extracting and transforming insights from expert reviews into relevant aspects and identified flaws. The experimental results underscore the efficacy of RefuteClaim, particularly in classifying and elucidating false claims.

Submitted 27 January, 2024; originally announced January 2024.

arXiv:2401.12019 [pdf, other]

Stereo-Matching Knowledge Distilled Monocular Depth Estimation Filtered by Multiple Disparity Consistency

Authors: Woonghyun Ka, Jae Young Lee, Jaehyun Choi, Junmo Kim

Abstract: In stereo-matching knowledge distillation methods of the self-supervised monocular depth estimation, the stereo-matching network's knowledge is distilled into a monocular depth network through pseudo-depth maps. In these methods, the learning-based stereo-confidence network is generally utilized to identify errors in the pseudo-depth maps to prevent transferring the errors. However, the learning-based stereo-confidence networks should be trained with ground truth (GT), which is not feasible in a self-supervised setting. In this paper, we propose a method to identify and filter errors in the pseudo-depth map using multiple disparity maps by checking their consistency without the need for GT and a training process. Experimental results show that the proposed method outperforms the previous methods and works well on various configurations by filtering out erroneous areas where the stereo-matching is vulnerable, especially such as textureless regions, occlusion boundaries, and reflective surfaces.

Submitted 22 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: ICASSP 2024. The first two authors contributed equally

arXiv:2401.12001 [pdf, other]

Modeling Stereo-Confidence Out of the End-to-End Stereo-Matching Network via Disparity Plane Sweep

Authors: Jae Young Lee, Woonghyun Ka, Jaehyun Choi, Junmo Kim

Abstract: We propose a novel stereo-confidence that can be measured externally to various stereo-matching networks, offering an alternative input modality choice of the cost volume for learning-based approaches, especially in safety-critical systems. Grounded in the foundational concepts of disparity definition and the disparity plane sweep, the proposed stereo-confidence method is built upon the idea that any shift in a stereo-image pair should be updated in a corresponding amount shift in the disparity map. Based on this idea, the proposed stereo-confidence method can be summarized in three folds. 1) Using the disparity plane sweep, multiple disparity maps can be obtained and treated as a 3-D volume (predicted disparity volume), like the cost volume is constructed. 2) One of these disparity maps serves as an anchor, allowing us to define a desirable (or ideal) disparity profile at every spatial point. 3) By comparing the desirable and predicted disparity profiles, we can quantify the level of matching ambiguity between left and right images for confidence measurement. Extensive experimental results using various stereo-matching networks and datasets demonstrate that the proposed stereo-confidence method not only shows competitive performance on its own but also consistent performance improvements when it is used as an input modality for learning-based stereo-confidence methods.

Submitted 22 January, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: AAAI 2024. The first two authors contributed equally

arXiv:2210.07496 [pdf, ps, other]

On the size of maximal binary codes with 2, 3, and 4 distances

Authors: Alexander Barg, Alexey Glazyrin, Wei-Jiun Kao, Ching-Yi Lai, Pin-Chieh Tseng, Wei-Hsuan Yu

Abstract: We address the maximum size of binary codes and binary constant weight codes with few distances. Previous works established a number of bounds for these quantities as well as the exact values for a range of small code lengths. As our main results, we determine the exact size of maximal binary codes with two distances for all lengths $n\ge 6$ as well as the exact size of maximal binary constant weight codes with 2,3, and 4 distances for several values of the weight and for all but small lengths.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Main text 23 pp. and Appendix 17pp

arXiv:2204.03219 [pdf, other]

DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores

Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

Abstract: Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, it would be desirable if there are accurate MOS prediction models for automatic evaluation. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. And a proposed module is added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on BVCC dataset. And the zero shot transfer result on BC2019 dataset is significantly improved. DDOS also wins second place in Interspeech 2022 VoiceMOS challenge in terms of system-level score.

Submitted 15 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Accepted to Interspeech 2022. Code will be available in the future

arXiv:2204.00352 [pdf, other]

On the Efficiency of Integrating Self-supervised Learning and Meta-learning for User-defined Few-shot Keyword Spotting

Authors: Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee

Abstract: User-defined keyword spotting is a task to detect new spoken terms defined by users. This can be viewed as a few-shot learning problem since it is unreasonable for users to define their desired keywords by providing many examples. To solve this problem, previous works try to incorporate self-supervised learning models or apply meta-learning algorithms. But it is unclear whether self-supervised learning and meta-learning are complementary and which combination of the two types of approaches is most effective for few-shot keyword discovery. In this work, we systematically study these questions by utilizing various self-supervised learning models and combining them with a wide variety of meta-learning algorithms. Our result shows that HuBERT combined with Matching network achieves the best result and is robust to the changes of few-shot examples.

Submitted 5 October, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: Accepted by SLT 2022

arXiv:2202.03822 [pdf, other]

A Novel Plug-in Module for Fine-Grained Visual Classification

Authors: Po-Yung Chou, Cheng-Hung Lin, Wen-Chung Kao

Abstract: Visual classification can be divided into coarse-grained and fine-grained classification. Coarse-grained classification represents categories with a large degree of dissimilarity, such as the classification of cats and dogs, while fine-grained classification represents classifications with a large degree of similarity, such as cat species, bird species, and the makes or models of vehicles. Unlike coarse-grained visual classification, fine-grained visual classification often requires professional experts to label data, which makes data more expensive. To meet this challenge, many approaches propose to automatically find the most discriminative regions and use local features to provide more precise features. These approaches only require image-level annotations, thereby reducing the cost of annotation. However, most of these methods require two- or multi-stage architectures and cannot be trained end-to-end. Therefore, we propose a novel plug-in module that can be integrated to many common backbones, including CNN-based or Transformer-based networks to provide strongly discriminative regions. The plugin module can output pixel-level feature maps and fuse filtered features to enhance fine-grained visual classification. Experimental results show that the proposed plugin module outperforms state-of-the-art approaches and significantly improves the accuracy to 92.77\% and 92.83\% on CUB200-2011 and NABirds, respectively. We have released our source code in Github https://github.com/chou141253/FGVC-PIM.git.

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2201.07436 [pdf, other]

Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

Authors: Doyeon Kim, Woonghyun Ka, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim

Abstract: Depth estimation from a single image is an important task that can be applied to various fields in computer vision, and has grown rapidly with the development of convolutional neural networks. In this paper, we propose a novel structure and training strategy for monocular depth estimation to further improve the prediction accuracy of the network. We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder to generate an estimated depth map while considering local connectivity. By constructing connected paths between multi-scale local features and the global decoding stream with our proposed selective feature fusion module, the network can integrate both representations and recover fine details. In addition, the proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity. Furthermore, we improve the depth-specific augmentation method by utilizing an important observation in depth estimation to enhance the model. Our network achieves state-of-the-art performance over the challenging depth dataset NYU Depth V2. Extensive experiments have been conducted to validate and show the effectiveness of the proposed approach. Finally, our model shows better generalisation ability and robustness than other comparative models.

Submitted 29 October, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: 11 pages, 5 figures

arXiv:2111.05113 [pdf, other]

Membership Inference Attacks Against Self-supervised Speech Models

Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

Abstract: Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experiment results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage with high Area Under the Curve (AUC) in both utterance-level and speaker-level. Furthermore, we also conduct several ablation studies to understand the factors that contribute to the success of MIA.

Submitted 15 August, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: Accepted to Interspeech 2022. Code will be available in the future

arXiv:2104.03017 [pdf, other]

Utilizing Self-supervised Representations for MOS Prediction

Authors: Wei-Cheng Tseng, Chien-yu Huang, Wei-Tsung Kao, Yist Y. Lin, Hung-yi Lee

Abstract: Speech quality assessment has been a critical issue in speech processing for decades. Existing automatic evaluations usually require clean references or parallel ground truth data, which is infeasible when the amount of data soars. Subjective tests, on the other hand, do not need any additional clean or parallel data and correlates better to human perception. However, such a test is expensive and time-consuming because crowd work is necessary. It thus becomes highly desired to develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data. In this paper, we use self-supervised pre-trained models for MOS prediction. We show their representations can distinguish between clean and noisy audios. Then, we fine-tune these pre-trained models followed by simple linear layers in an end-to-end manner. The experiment results showed that our framework outperforms the two previous state-of-the-art models by a significant improvement on Voice Conversion Challenge 2018 and achieves comparable or superior performance on Voice Conversion Challenge 2016. We also conducted an ablation study to further investigate how each module benefits the task. The experiment results are implemented and reproducible with publicly available toolkits.

Submitted 20 September, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: In Proceedings of Interspeech 2021. We acknowledge the support of AWS Machine Learning Research Awards program. Source code available at https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/mos_prediction

arXiv:2103.07162 [pdf, other]

Is BERT a Cross-Disciplinary Knowledge Learner? A Surprising Finding of Pre-trained Models' Transferability

Authors: Wei-Tsung Kao, Hung-Yi Lee

Abstract: This paper investigates whether the power of the models pre-trained on text data, such as BERT, can be transferred to general token sequence classification applications. To verify pre-trained models' transferability, we test the pre-trained models on text classification tasks with meanings of tokens mismatches, and real-world non-text token sequence classification data, including amino acid, DNA, and music. We find that even on non-text data, the models pre-trained on text converge faster, perform better than the randomly initialized models, and only slightly worse than the models using task-specific knowledge. We also find that the representations of the text and non-text pre-trained models share non-trivial similarities.

Submitted 19 April, 2022; v1 submitted 12 March, 2021; originally announced March 2021.

Comments: Findings of EMNLP 2021

arXiv:2001.09309 [pdf, other]

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT

Authors: Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-Yi Lee

Abstract: Although Bidirectional Encoder Representations from Transformers (BERT) have achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box. A variety of previous works have tried to lift the veil of BERT and understand each layer's functionality. In this paper, we found that surprisingly the output layer of BERT can reconstruct the input sentence by directly taking each layer of BERT as input, even though the output layer has never seen the input other than the final hidden layer. This fact remains true across a wide variety of BERT-based models, even when some layers are duplicated. Based on this observation, we propose a quite simple method to boost the performance of BERT. By duplicating some layers in the BERT-based models to make it deeper (no extra training required in this step), they obtain better performance in the downstream tasks after fine-tuning.

Submitted 15 February, 2021; v1 submitted 25 January, 2020; originally announced January 2020.

Comments: 7 pages, 8 figures, 3 tables

arXiv:1903.12258 [pdf, other]

Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market

Authors: Rosdyana Mangir Irawan Kusuma, Trang-Thi Ho, Wei-Chun Kao, Yu-Yen Ou, Kai-Lung Hua

Abstract: Stock market prediction is still a challenging problem because there are many factors effect to the stock market price such as company news and performance, industry performance, investor sentiment, social media sentiment and economic factors. This work explores the predictability in the stock market using Deep Convolutional Network and candlestick charts. The outcome is utilized to design a decision support framework that can be used by traders to provide suggested indications of future stock price direction. We perform this work using various types of neural networks like convolutional neural network, residual network and visual geometry group network. From stock market historical data, we converted it to candlestick charts. Finally, these candlestick charts will be feed as input for training a Convolutional Neural Network model. This Convolutional Neural Network model will help us to analyze the patterns inside the candlestick chart and predict the future movements of stock market. The effectiveness of our method is evaluated in stock market prediction with a promising results 92.2% and 92.1% accuracy for Taiwan and Indonesian stock market dataset respectively. The constructed model have been implemented as a web-based system freely available at http://140.138.155.216/deepcandle/ for predicting stock market using candlestick chart and deep learning neural networks.

Submitted 25 February, 2019; originally announced March 2019.

Comments: conference, 13 pages, 3 figures

Showing 1–14 of 14 results for author: Kao, W