-
Transfer Learning for Assessing Heavy Metal Pollution in Seaports Sediments
Authors:
Tin Lai,
Farnaz Farid,
Yueyang Kuan,
Xintian Zhang
Abstract:
Detecting heavy metal pollution in soils and seaports is vital for regional environmental monitoring. The Pollution Load Index (PLI), an international standard, is commonly used to assess heavy metal containment. However, the conventional PLI assessment involves laborious procedures and data analysis of sediment samples. To address this challenge, we propose a deep-learning-based model that simpli…
▽ More
Detecting heavy metal pollution in soils and seaports is vital for regional environmental monitoring. The Pollution Load Index (PLI), an international standard, is commonly used to assess heavy metal containment. However, the conventional PLI assessment involves laborious procedures and data analysis of sediment samples. To address this challenge, we propose a deep-learning-based model that simplifies the heavy metal assessment process. Our model tackles the issue of data scarcity in the water-sediment domain, which is traditionally plagued by challenges in data collection and varying standards across nations. By leveraging transfer learning, we develop an accurate quantitative assessment method for predicting PLI. Our approach allows the transfer of learned features across domains with different sets of features. We evaluate our model using data from six major ports in New South Wales, Australia: Port Yamba, Port Newcastle, Port Jackson, Port Botany, Port Kembla, and Port Eden. The results demonstrate significantly lower Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) of approximately 0.5 and 0.03, respectively, compared to other models. Our model performance is up to 2 orders of magnitude than other baseline models. Our proposed model offers an innovative, accessible, and cost-effective approach to predicting water quality, benefiting marine life conservation, aquaculture, and industrial pollution monitoring.
△ Less
Submitted 27 June, 2025;
originally announced June 2025.
-
Target Attack Backdoor Malware Analysis and Attribution
Authors:
Anthony Cheuk Tung Lai,
Vitaly Kamluk,
Alan Ho,
Ping Fan Ke,
Byron Wai
Abstract:
Backdoor Malware are installed by an attacker on the victim's server(s) for authorized access. A customized backdoor is weaponized to execute unauthorized system, database and application commands to access the user credentials and confidential digital assets. Recently, we discovered and analyzed a targeted persistent module backdoor in Web Server in an online business company that was undetectabl…
▽ More
Backdoor Malware are installed by an attacker on the victim's server(s) for authorized access. A customized backdoor is weaponized to execute unauthorized system, database and application commands to access the user credentials and confidential digital assets. Recently, we discovered and analyzed a targeted persistent module backdoor in Web Server in an online business company that was undetectable by their deployed Anti-Virus software for a year. This led us to carry out research to detect this specific type of persistent module backdoor installed in Web servers. Other than typical Malware static analysis, we carry out analysis with binary similarity, strings, and command obfuscation over the backdoor, resulting in the Target Attack Backdoor Malware Analysis Matrix (TABMAX) for organizations to detect this sophisticated target attack backdoor instead of a general one which can be detected by Anti-Virus detectors. Our findings show that backdoor malware can be designed with different APIs, commands, strings, and query language on top of preferred libraries used by typical Malware.
△ Less
Submitted 5 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
An Attack-Driven Incident Response and Defense System (ADIRDS)
Authors:
Anthony Cheuk Tung Lai,
Siu Ming Yiu,
Ping Fan Ke,
Alan Ho
Abstract:
One of the major goals of incident response is to help an organization or a system owner to quickly identify and halt the attacks to minimize the damages (and financial loss) to the system being attacked. Typical incident responses rely very much on the log information captured by the system during the attacks and if needed, may need to isolate the victim from the network to avoid further destruct…
▽ More
One of the major goals of incident response is to help an organization or a system owner to quickly identify and halt the attacks to minimize the damages (and financial loss) to the system being attacked. Typical incident responses rely very much on the log information captured by the system during the attacks and if needed, may need to isolate the victim from the network to avoid further destructive attacks. However, there are real cases that there are insufficient log records/information for the incident response team to identify the attacks and their origins while the attacked system cannot be stopped due to service requirements (zero downtime online systems) such as online gaming sites. Typical incident response procedures and industrial standards do not provide an adequate solution to address this scenario. In this paper, being motivated by a real case, we propose a solution, called "Attack-Driven Incident Response and Defense System (ADIRDS)" to tackle this problem. ADIRDS is an online monitoring system to run with the real system. By modeling the real system as a graph, critical nodes/assets of the system are closely monitored. Instead of relying on the original logging system, evidence will be collected from the attack technique perspectives. To migrate the risks, realistic honeypots with very similar business context as the real system are deployed to trap the attackers. We successfully apply this system to a real case. Based on our experiments, we verify that our new approach of designing the realistic honeypots is effective, 38 unique attacker's IP addresses were captured. We also compare the performance of our realistic honey with both low and high interactive honeypots proposed in the literature, the results found that our proposed honeypot can successfully cheat the attackers to attack our honeypot, which verifies that our honeypot is more effective.
△ Less
Submitted 4 February, 2025;
originally announced February 2025.
-
Ransomware IR Model: Proactive Threat Intelligence-Based Incident Response Strategy
Authors:
Anthony Cheuk Tung Lai,
Ping Fan Ke,
Alan Ho
Abstract:
Ransomware impact different organizations for years, it causes huge monetary, reputation loss and operation impact. Other than typical data encryption by ransomware, attackers can request ransom from the victim organizations via data extortion, otherwise, attackers will publish stolen data publicly in their ransomware dashboard forum and data-sharing platforms. However, there is no clear and prove…
▽ More
Ransomware impact different organizations for years, it causes huge monetary, reputation loss and operation impact. Other than typical data encryption by ransomware, attackers can request ransom from the victim organizations via data extortion, otherwise, attackers will publish stolen data publicly in their ransomware dashboard forum and data-sharing platforms. However, there is no clear and proven published incident response strategy to satisfy different business priorities and objectives under ransomware attack in detail. In this paper, we quote one of our representative front-line ransomware incident response experiences for Company X. Organization and incident responder can reference our established model strategy and implement proactive threat intelligence-based incident response architecture if one is under ransomware attack, which helps to respond the incident more effectively and speedy.
△ Less
Submitted 3 February, 2025;
originally announced February 2025.
-
Grasping by parallel shape matching
Authors:
Wenzheng Zhang,
Fahira Afzal Maken,
Tin Lai,
Fabio Ramos
Abstract:
Grasping is essential in robotic manipulation, yet challenging due to object and gripper diversity and real-world complexities. Traditional analytic approaches often have long optimization times, while data-driven methods struggle with unseen objects. This paper formulates the problem as a rigid shape matching between gripper and object, which optimizes with Annealed Stein Iterative Closest Point…
▽ More
Grasping is essential in robotic manipulation, yet challenging due to object and gripper diversity and real-world complexities. Traditional analytic approaches often have long optimization times, while data-driven methods struggle with unseen objects. This paper formulates the problem as a rigid shape matching between gripper and object, which optimizes with Annealed Stein Iterative Closest Point (AS-ICP) and leverages GPU-based parallelization. By incorporating the gripper's tool center point and the object's center of mass into the cost function and using a signed distance field of the gripper for collision checking, our method achieves robust grasps with low computational time. Experiments with the Kinova KG3 gripper show an 87.3% success rate and 0.926 s computation time across various objects and settings, highlighting its potential for real-world applications.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
Chronic Disease Diagnoses Using Behavioral Data
Authors:
Di Wang,
Yidan Hu,
Eng Sing Lee,
Hui Hwang Teong,
Ray Tian Rui Lai,
Wai Han Hoi,
Chunyan Miao
Abstract:
Early detection of chronic diseases is beneficial to healthcare by providing a golden opportunity for timely interventions. Although numerous prior studies have successfully used machine learning (ML) models for disease diagnoses, they highly rely on medical data, which are scarce for most patients in the early stage of the chronic diseases. In this paper, we aim to diagnose hyperglycemia (diabete…
▽ More
Early detection of chronic diseases is beneficial to healthcare by providing a golden opportunity for timely interventions. Although numerous prior studies have successfully used machine learning (ML) models for disease diagnoses, they highly rely on medical data, which are scarce for most patients in the early stage of the chronic diseases. In this paper, we aim to diagnose hyperglycemia (diabetes), hyperlipidemia, and hypertension (collectively known as 3H) using own collected behavioral data, thus, enable the early detection of 3H without using medical data collected in clinical settings. Specifically, we collected daily behavioral data from 629 participants over a 3-month study period, and trained various ML models after data preprocessing. Experimental results show that only using the participants' uploaded behavioral data, we can achieve accurate 3H diagnoses: 80.2\%, 71.3\%, and 81.2\% for diabetes, hyperlipidemia, and hypertension, respectively. Furthermore, we conduct Shapley analysis on the trained models to identify the most influential features for each type of diseases. The identified influential features are consistent with those reported in the literature.
△ Less
Submitted 4 October, 2024;
originally announced October 2024.
-
Self-supervised transformer-based pre-training method with General Plant Infection dataset
Authors:
Zhengle Wang,
Ruifeng Wang,
Minjuan Wang,
Tianyun Lai,
Man Zhang
Abstract:
Pest and disease classification is a challenging issue in agriculture. The performance of deep learning models is intricately linked to training data diversity and quantity, posing issues for plant pest and disease datasets that remain underdeveloped. This study addresses these challenges by constructing a comprehensive dataset and proposing an advanced network architecture that combines Contrasti…
▽ More
Pest and disease classification is a challenging issue in agriculture. The performance of deep learning models is intricately linked to training data diversity and quantity, posing issues for plant pest and disease datasets that remain underdeveloped. This study addresses these challenges by constructing a comprehensive dataset and proposing an advanced network architecture that combines Contrastive Learning and Masked Image Modeling (MIM). The dataset comprises diverse plant species and pest categories, making it one of the largest and most varied in the field. The proposed network architecture demonstrates effectiveness in addressing plant pest and disease recognition tasks, achieving notable detection accuracy. This approach offers a viable solution for rapid, efficient, and cost-effective plant pest and disease detection, thereby reducing agricultural production costs. Our code and dataset will be publicly available to advance research in plant pest and disease recognition the GitHub repository at https://github.com/WASSER2545/GPID-22
△ Less
Submitted 20 July, 2024;
originally announced July 2024.
-
SarcNet: A Novel AI-based Framework to Automatically Analyze and Score Sarcomere Organizations in Fluorescently Tagged hiPSC-CMs
Authors:
Huyen Le,
Khiet Dang,
Tien Lai,
Nhung Nguyen,
Mai Tran,
Hieu Pham
Abstract:
Quantifying sarcomere structure organization in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) is crucial for understanding cardiac disease pathology, improving drug screening, and advancing regenerative medicine. Traditional methods, such as manual annotation and Fourier transform analysis, are labor-intensive, error-prone, and lack high-throughput capabilities. In this st…
▽ More
Quantifying sarcomere structure organization in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) is crucial for understanding cardiac disease pathology, improving drug screening, and advancing regenerative medicine. Traditional methods, such as manual annotation and Fourier transform analysis, are labor-intensive, error-prone, and lack high-throughput capabilities. In this study, we present a novel deep learning-based framework that leverages cell images and integrates cell features to automatically evaluate the sarcomere structure of hiPSC-CMs from the onset of differentiation. This framework overcomes the limitations of traditional methods through automated, high-throughput analysis, providing consistent, reliable results while accurately detecting complex sarcomere patterns across diverse samples. The proposed framework contains the SarcNet, a linear layers-added ResNet-18 module, to output a continuous score ranging from one to five that captures the level of sarcomere structure organization. It is trained and validated on an open-source dataset of hiPSC-CMs images with the endogenously GFP-tagged alpha-actinin-2 structure developed by the Allen Institute for Cell Science (AICS). SarcNet achieves a Spearman correlation of 0.831 with expert evaluations, demonstrating superior performance and an improvement of 0.075 over the current state-of-the-art approach, which uses Linear Regression. Our results also show a consistent pattern of increasing organization from day 18 to day 32 of differentiation, aligning with expert evaluations. By integrating the quantitative features calculated directly from the images with the visual features learned during the deep learning model, our framework offers a more comprehensive and accurate assessment, thereby enhancing the further utility of hiPSC-CMs in medical research and therapy development.
△ Less
Submitted 28 October, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Verifying Cake-Cutting, Faster
Authors:
Noah Bertram,
Tean Lai,
Justin Hsu
Abstract:
Envy-free cake-cutting protocols procedurally divide an infinitely divisible good among a set of agents so that no agent prefers another's allocation to their own. These protocols are highly complex and difficult to prove correct. Recently, Bertram, Levinson, and Hsu introduced a language called Slice for describing and verifying cake-cutting protocols. Slice programs can be translated to formulas…
▽ More
Envy-free cake-cutting protocols procedurally divide an infinitely divisible good among a set of agents so that no agent prefers another's allocation to their own. These protocols are highly complex and difficult to prove correct. Recently, Bertram, Levinson, and Hsu introduced a language called Slice for describing and verifying cake-cutting protocols. Slice programs can be translated to formulas encoding envy-freeness, which are solved by SMT. While Slice works well on smaller protocols, it has difficulty scaling to more complex cake-cutting protocols.
We improve Slice in two ways. First, we show any protocol execution in Slice can be replicated using piecewise uniform valuations. We then reduce Slice's constraint formulas to formulas within the theory of linear real arithmetic, showing that verifying envy-freeness is efficiently decidable. Second, we design and implement a linear type system which enforces that no two agents receive the same part of the good. We implement our methods and verify a range of challenging examples, including the first nontrivial four-agent protocol.
△ Less
Submitted 30 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Single-pixel 3D imaging based on fusion temporal data of single photon detector and millimeter-wave radar
Authors:
Tingqin Lai,
Xiaolin Liang,
Yi Zhu,
Xinyi Wu,
Lianye Liao,
Xuelin Yuan,
Ping Su,
Shihai Sun
Abstract:
Recently, there has been increased attention towards 3D imaging using single-pixel single-photon detection (also known as temporal data) due to its potential advantages in terms of cost and power efficiency. However, to eliminate the symmetry blur in the reconstructed images, a fixed background is required. This paper proposes a fusion-data-based 3D imaging method that utilizes a single-pixel sing…
▽ More
Recently, there has been increased attention towards 3D imaging using single-pixel single-photon detection (also known as temporal data) due to its potential advantages in terms of cost and power efficiency. However, to eliminate the symmetry blur in the reconstructed images, a fixed background is required. This paper proposes a fusion-data-based 3D imaging method that utilizes a single-pixel single-photon detector and a millimeter-wave radar to capture temporal histograms of a scene from multiple perspectives. Subsequently, the 3D information can be reconstructed from the one-dimensional fusion temporal data by using Artificial Neural Network (ANN). Both the simulation and experimental results demonstrate that our fusion method effectively eliminates symmetry blur and improves the quality of the reconstructed images.
△ Less
Submitted 20 October, 2023;
originally announced December 2023.
-
Open-Set Multivariate Time-Series Anomaly Detection
Authors:
Thomas Lai,
Thi Kieu Khanh Ho,
Narges Armanfard
Abstract:
Numerous methods for time-series anomaly detection (TSAD) have emerged in recent years, most of which are unsupervised and assume that only normal samples are available during the training phase, due to the challenge of obtaining abnormal data in real-world scenarios. Still, limited samples of abnormal data are often available, albeit they are far from representative of all possible anomalies. Sup…
▽ More
Numerous methods for time-series anomaly detection (TSAD) have emerged in recent years, most of which are unsupervised and assume that only normal samples are available during the training phase, due to the challenge of obtaining abnormal data in real-world scenarios. Still, limited samples of abnormal data are often available, albeit they are far from representative of all possible anomalies. Supervised methods can be utilized to classify normal and seen anomalies, but they tend to overfit to the seen anomalies present during training, hence, they fail to generalize to unseen anomalies. We propose the first algorithm to address the open-set TSAD problem, called Multivariate Open-Set Time-Series Anomaly Detector (MOSAD), that leverages only a few shots of labeled anomalies during the training phase in order to achieve superior anomaly detection performance compared to both supervised and unsupervised TSAD algorithms. MOSAD is a novel multi-head TSAD framework with a shared representation space and specialized heads, including the Generative head, the Discriminative head, and the Anomaly-Aware Contrastive head. The latter produces a superior representation space for anomaly detection compared to conventional supervised contrastive learning. Extensive experiments on three real-world datasets establish MOSAD as a new state-of-the-art in the TSAD field.
△ Less
Submitted 7 August, 2024; v1 submitted 18 October, 2023;
originally announced October 2023.
-
Interpretable Medical Imagery Diagnosis with Self-Attentive Transformers: A Review of Explainable AI for Health Care
Authors:
Tin Lai
Abstract:
Recent advancements in artificial intelligence (AI) have facilitated its widespread adoption in primary medical services, addressing the demand-supply imbalance in healthcare. Vision Transformers (ViT) have emerged as state-of-the-art computer vision models, benefiting from self-attention modules. However, compared to traditional machine-learning approaches, deep-learning models are complex and ar…
▽ More
Recent advancements in artificial intelligence (AI) have facilitated its widespread adoption in primary medical services, addressing the demand-supply imbalance in healthcare. Vision Transformers (ViT) have emerged as state-of-the-art computer vision models, benefiting from self-attention modules. However, compared to traditional machine-learning approaches, deep-learning models are complex and are often treated as a "black box" that can cause uncertainty regarding how they operate. Explainable Artificial Intelligence (XAI) refers to methods that explain and interpret machine learning models' inner workings and how they come to decisions, which is especially important in the medical domain to guide the healthcare decision-making process. This review summarises recent ViT advancements and interpretative approaches to understanding the decision-making process of ViT, enabling transparency in medical diagnosis applications.
△ Less
Submitted 1 September, 2023;
originally announced September 2023.
-
Path Signatures for Diversity in Probabilistic Trajectory Optimisation
Authors:
Lucas Barcelos,
Tin Lai,
Rafael Oliveira,
Paulo Borges,
Fabio Ramos
Abstract:
Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where mult…
▽ More
Motion planning can be cast as a trajectory optimisation problem where a cost is minimised as a function of the trajectory being generated. In complex environments with several obstacles and complicated geometry, this optimisation problem is usually difficult to solve and prone to local minima. However, recent advancements in computing hardware allow for parallel trajectory optimisation where multiple solutions are obtained simultaneously, each initialised from a different starting point. Unfortunately, without a strategy preventing two solutions to collapse on each other, naive parallel optimisation can suffer from mode collapse diminishing the efficiency of the approach and the likelihood of finding a global solution. In this paper we leverage on recent advances in the theory of rough paths to devise an algorithm for parallel trajectory optimisation that promotes diversity over the range of solutions, therefore avoiding mode collapses and achieving better global properties. Our approach builds on path signatures and Hilbert space representations of trajectories, and connects parallel variational inference for trajectory estimation with diversity promoting kernels. We empirically demonstrate that this strategy achieves lower average costs than competing alternatives on a range of problems, from 2D navigation to robotic manipulators operating in cluttered environments.
△ Less
Submitted 8 August, 2023;
originally announced August 2023.
-
Psy-LLM: Scaling up Global Mental Health Psychological Services with AI-based Large Language Models
Authors:
Tin Lai,
Yukun Shi,
Zicong Du,
Jiajie Wu,
Ken Fu,
Yichao Dou,
Ziqi Wang
Abstract:
The demand for psychological counselling has grown significantly in recent years, particularly with the global outbreak of COVID-19, which has heightened the need for timely and professional mental health support. Online psychological counselling has emerged as the predominant mode of providing services in response to this demand. In this study, we propose the Psy-LLM framework, an AI-based assist…
▽ More
The demand for psychological counselling has grown significantly in recent years, particularly with the global outbreak of COVID-19, which has heightened the need for timely and professional mental health support. Online psychological counselling has emerged as the predominant mode of providing services in response to this demand. In this study, we propose the Psy-LLM framework, an AI-based assistive tool leveraging Large Language Models (LLMs) for question-answering in psychological consultation settings to ease the demand for mental health professions. Our framework combines pre-trained LLMs with real-world professional Q\&A from psychologists and extensively crawled psychological articles. The Psy-LLM framework serves as a front-end tool for healthcare professionals, allowing them to provide immediate responses and mindfulness activities to alleviate patient stress. Additionally, it functions as a screening tool to identify urgent cases requiring further assistance. We evaluated the framework using intrinsic metrics, such as perplexity, and extrinsic evaluation metrics, with human participant assessments of response helpfulness, fluency, relevance, and logic. The results demonstrate the effectiveness of the Psy-LLM framework in generating coherent and relevant answers to psychological questions. This article discusses the potential and limitations of using large language models to enhance mental health support through AI technologies.
△ Less
Submitted 1 September, 2023; v1 submitted 22 July, 2023;
originally announced July 2023.
-
Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis
Authors:
Tin Lai,
Farnaz Farid,
Abubakar Bello,
Fariza Sabrina
Abstract:
The Internet of Things (IoT) integrates more than billions of intelligent devices over the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detecti…
▽ More
The Internet of Things (IoT) integrates more than billions of intelligent devices over the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device readings, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. Rather than using one single machine learning model, ensemble learning combines the predictive power from multiple models, enhancing their predictive accuracy in heterogeneous datasets rather than using one single machine learning model. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate their high predictive power when compared to traditional methods.
△ Less
Submitted 20 July, 2023;
originally announced July 2023.
-
Real-time Aerial Detection and Reasoning on Embedded-UAVs
Authors:
Tin Lai
Abstract:
We present a unified pipeline architecture for a real-time detection system on an embedded system for UAVs. Neural architectures have been the industry standard for computer vision. However, most existing works focus solely on concatenating deeper layers to achieve higher accuracy with run-time performance as the trade-off. This pipeline of networks can exploit the domain-specific knowledge on aer…
▽ More
We present a unified pipeline architecture for a real-time detection system on an embedded system for UAVs. Neural architectures have been the industry standard for computer vision. However, most existing works focus solely on concatenating deeper layers to achieve higher accuracy with run-time performance as the trade-off. This pipeline of networks can exploit the domain-specific knowledge on aerial pedestrian detection and activity recognition for the emerging UAV applications of autonomous surveying and activity reporting. In particular, our pipeline architectures operate in a time-sensitive manner, have high accuracy in detecting pedestrians from various aerial orientations, use a novel attention map for multi-activities recognition, and jointly refine its detection with temporal information. Numerically, we demonstrate our model's accuracy and fast inference speed on embedded systems. We empirically deployed our prototype hardware with full live feeds in a real-world open-field environment.
△ Less
Submitted 21 May, 2023;
originally announced May 2023.
-
Hybrid Fuzzy-Crisp Clustering Algorithm: Theory and Experiments
Authors:
Akira R. Kinjo,
Daphne Teck Ching Lai
Abstract:
With the membership function being strictly positive, the conventional fuzzy c-means clustering method sometimes causes imbalanced influence when clusters of vastly different sizes exist. That is, an outstandingly large cluster drags to its center all the other clusters, however far they are separated. To solve this problem, we propose a hybrid fuzzy-crisp clustering algorithm based on a target fu…
▽ More
With the membership function being strictly positive, the conventional fuzzy c-means clustering method sometimes causes imbalanced influence when clusters of vastly different sizes exist. That is, an outstandingly large cluster drags to its center all the other clusters, however far they are separated. To solve this problem, we propose a hybrid fuzzy-crisp clustering algorithm based on a target function combining linear and quadratic terms of the membership function. In this algorithm, the membership of a data point to a cluster is automatically set to exactly zero if the data point is ``sufficiently'' far from the cluster center. In this paper, we present a new algorithm for hybrid fuzzy-crisp clustering along with its geometric interpretation. The algorithm is tested on twenty simulated data generated and five real-world datasets from the UCI repository and compared with conventional fuzzy and crisp clustering methods. The proposed algorithm is demonstrated to outperform the conventional methods on imbalanced datasets and can be competitive on more balanced datasets.
△ Less
Submitted 25 March, 2023;
originally announced March 2023.
-
Sequential Graph Attention Learning for Predicting Dynamic Stock Trends (Student Abstract)
Authors:
Tzu-Ya Lai,
Wen Jung Cheng,
Jun-En Ding
Abstract:
The stock market is characterized by a complex relationship between companies and the market. This study combines a sequential graph structure with attention mechanisms to learn global and local information within temporal time. Specifically, our proposed "GAT-AGNN" module compares model performance across multiple industries as well as within single industries. The results show that the proposed…
▽ More
The stock market is characterized by a complex relationship between companies and the market. This study combines a sequential graph structure with attention mechanisms to learn global and local information within temporal time. Specifically, our proposed "GAT-AGNN" module compares model performance across multiple industries as well as within single industries. The results show that the proposed framework outperforms the state-of-the-art methods in predicting stock trends across multiple industries on Taiwan Stock datasets.
△ Less
Submitted 15 January, 2023;
originally announced January 2023.
-
Ensemble Transfer Learning for Multilingual Coreference Resolution
Authors:
Tuan Manh Lai,
Heng Ji
Abstract:
Entity coreference resolution is an important research problem with many applications, including information extraction and question answering. Coreference resolution for English has been studied extensively. However, there is relatively little work for other languages. A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data. To overcome…
▽ More
Entity coreference resolution is an important research problem with many applications, including information extraction and question answering. Coreference resolution for English has been studied extensively. However, there is relatively little work for other languages. A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data. To overcome this challenge, we design a simple but effective ensemble-based framework that combines various transfer learning (TL) techniques. We first train several models using different TL methods. Then, during inference, we compute the unweighted average scores of the models' predictions to extract the final set of predicted clusters. Furthermore, we also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts. Leveraging the idea that the coreferential links naturally exist between anchor texts pointing to the same article, our method builds a sizeable distantly-supervised dataset for the target language that consists of tens of thousands of documents. We can pre-train a model on the pseudo-labeled dataset before finetuning it on the final target dataset. Experimental results on two benchmark datasets, OntoNotes and SemEval, confirm the effectiveness of our methods. Our best ensembles consistently outperform the baseline approach of simple training by up to 7.68% in the F1 score. These ensembles also achieve new state-of-the-art results for three languages: Arabic, Dutch, and Spanish.
△ Less
Submitted 22 January, 2023;
originally announced January 2023.
-
Profiling Obese Subgroups in National Health and Nutritional Status Survey Data using Machine Learning Techniques: A Case Study from Brunei Darussalam
Authors:
Usman Khalil,
Owais Ahmed Malik,
Daphne Teck Ching Lai,
Ong Sok King
Abstract:
National Health and Nutritional Status Survey (NHANSS) is conducted annually by the Ministry of Health in Negara Brunei Darussalam to assess the population health and nutritional patterns and characteristics. The main aim of this study was to discover meaningful patterns (groups) from the obese sample of NHANSS data by applying data reduction and interpretation techniques. The mixed nature of the…
▽ More
National Health and Nutritional Status Survey (NHANSS) is conducted annually by the Ministry of Health in Negara Brunei Darussalam to assess the population health and nutritional patterns and characteristics. The main aim of this study was to discover meaningful patterns (groups) from the obese sample of NHANSS data by applying data reduction and interpretation techniques. The mixed nature of the variables (qualitative and quantitative) in the data set added novelty to the study. Accordingly, the Categorical Principal Component (CATPCA) technique was chosen to interpret the meaningful results. The relationships between obesity and the lifestyle factors like demography, socioeconomic status, physical activity, dietary behavior, history of blood pressure, diabetes, etc., were determined based on the principal components generated by CATPCA. The results were validated with the help of the split method technique to counter verify the authenticity of the generated groups. Based on the analysis and results, two subgroups were found in the data set, and the salient features of these subgroups have been reported. These results can be proposed for the betterment of the healthcare industry.
△ Less
Submitted 9 November, 2022;
originally announced November 2022.
-
Adaptive Data Depth via Multi-Armed Bandits
Authors:
Tavor Z. Baharav,
Tze Leung Lai
Abstract:
Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often h…
▽ More
Data depth, introduced by Tukey (1975), is an important tool in data science, robust statistics, and computational geometry. One chief barrier to its broader practical utility is that many common measures of depth are computationally intensive, requiring on the order of $n^d$ operations to exactly compute the depth of a single point within a data set of $n$ points in $d$-dimensional space. Often however, we are not directly interested in the absolute depths of the points, but rather in their relative ordering. For example, we may want to find the most central point in a data set (a generalized median), or to identify and remove all outliers (points on the fringe of the data set with low depth). With this observation, we develop a novel and instance-adaptive algorithm for adaptive data depth computation by reducing the problem of exactly computing $n$ depths to an $n$-armed stochastic multi-armed bandit problem which we can efficiently solve. We focus our exposition on simplicial depth, developed by Liu (1990), which has emerged as a promising notion of depth due to its interpretability and asymptotic properties. We provide general instance-dependent theoretical guarantees for our proposed algorithms, which readily extend to many other common measures of data depth including majority depth, Oja depth, and likelihood depth. When specialized to the case where the gaps in the data follow a power law distribution with parameter $α<2$, we show that we can reduce the complexity of identifying the deepest point in the data set (the simplicial median) from $O(n^d)$ to $\tilde{O}(n^{d-(d-1)α/2})$, where $\tilde{O}$ suppresses logarithmic factors. We corroborate our theoretical results with numerical experiments on synthetic data, showing the practical utility of our proposed methods.
△ Less
Submitted 9 November, 2022; v1 submitted 7 November, 2022;
originally announced November 2022.
-
Achieving mouse-level strategic evasion performance using real-time computational planning
Authors:
German Espinosa,
Gabrielle E. Wink,
Alexander T. Lai,
Daniel A. Dombeck,
Malcolm A. MacIver
Abstract:
Planning is an extraordinary ability in which the brain imagines and then enacts evaluated possible futures. Using traditional planning models, computer scientists have attempted to replicate this capacity with some level of success but ultimately face a reoccurring limitation: as the plan grows in steps, the number of different possible futures makes it intractable to determine the right sequence…
▽ More
Planning is an extraordinary ability in which the brain imagines and then enacts evaluated possible futures. Using traditional planning models, computer scientists have attempted to replicate this capacity with some level of success but ultimately face a reoccurring limitation: as the plan grows in steps, the number of different possible futures makes it intractable to determine the right sequence of actions to reach a goal state. Based on prior theoretical work on how the ecology of an animal governs the value of spatial planning, we developed a more efficient biologically-inspired planning algorithm, TLPPO. This algorithm allows us to achieve mouselevel predator evasion performance with orders of magnitude less computation than a widespread algorithm for planning in the situations of partial observability that typify predator-prey interactions. We compared the performance of a real-time agent using TLPPO against the performance of live mice, all tasked with evading a robot predator. We anticipate these results will be helpful to planning algorithm users and developers, as well as to areas of neuroscience where robot-animal interaction can provide a useful approach to studying the basis of complex behaviors.
△ Less
Submitted 8 November, 2022; v1 submitted 4 November, 2022;
originally announced November 2022.
-
Comparative analysis of real bugs in open-source Machine Learning projects -- A Registered Report
Authors:
Tuan Dung Lai,
Anj Simmons,
Scott Barnett,
Jean-Guy Schneider,
Rajesh Vasa
Abstract:
Background: Machine Learning (ML) systems rely on data to make predictions, the systems have many added components compared to traditional software systems such as the data processing pipeline, serving pipeline, and model training. Existing research on software maintenance has studied the issue-reporting needs and resolution process for different types of issues, such as performance and security i…
▽ More
Background: Machine Learning (ML) systems rely on data to make predictions, the systems have many added components compared to traditional software systems such as the data processing pipeline, serving pipeline, and model training. Existing research on software maintenance has studied the issue-reporting needs and resolution process for different types of issues, such as performance and security issues. However, ML systems have specific classes of faults, and reporting ML issues requires domain-specific information. Because of the different characteristics between ML and traditional Software Engineering systems, we do not know to what extent the reporting needs are different, and to what extent these differences impact the issue resolution process. Objective: Our objective is to investigate whether there is a discrepancy in the distribution of resolution time between ML and non-ML issues and whether certain categories of ML issues require a longer time to resolve based on real issue reports in open-source applied ML projects. We further investigate the size of fix of ML issues and non-ML issues. Method: We extract issues reports, pull requests and code files in recent active applied ML projects from Github, and use an automatic approach to filter ML and non-ML issues. We manually label the issues using a known taxonomy of deep learning bugs. We measure the resolution time and size of fix of ML and non-ML issues on a controlled sample and compare the distributions for each category of issue.
△ Less
Submitted 20 September, 2022;
originally announced September 2022.
-
A Review on Visual-SLAM: Advancements from Geometric Modelling to Learning-based Semantic Scene Understanding
Authors:
Tin Lai
Abstract:
Simultaneous Localisation and Mapping (SLAM) is one of the fundamental problems in autonomous mobile robots where a robot needs to reconstruct a previously unseen environment while simultaneously localising itself with respect to the map. In particular, Visual-SLAM uses various sensors from the mobile robot for collecting and sensing a representation of the map. Traditionally, geometric model-base…
▽ More
Simultaneous Localisation and Mapping (SLAM) is one of the fundamental problems in autonomous mobile robots where a robot needs to reconstruct a previously unseen environment while simultaneously localising itself with respect to the map. In particular, Visual-SLAM uses various sensors from the mobile robot for collecting and sensing a representation of the map. Traditionally, geometric model-based techniques were used to tackle the SLAM problem, which tends to be error-prone under challenging environments. Recent advancements in computer vision, such as deep learning techniques, have provided a data-driven approach to tackle the Visual-SLAM problem. This review summarises recent advancements in the Visual-SLAM domain using various learning-based methods. We begin by providing a concise overview of the geometric model-based approaches, followed by technical reviews on the current paradigms in SLAM. Then, we present the various learning-based approaches to collecting sensory inputs from mobile robots and performing scene understanding. The current paradigms in deep-learning-based semantic understanding are discussed and placed under the context of Visual-SLAM. Finally, we discuss challenges and further opportunities in the direction of learning-based approaches in Visual-SLAM.
△ Less
Submitted 12 September, 2022;
originally announced September 2022.
-
Discover Life Skills for Planning with Bandits via Observing and Learning How the World Works
Authors:
Tin Lai
Abstract:
We propose a novel approach for planning agents to compose abstract skills via observing and learning from historical interactions with the world. Our framework operates in a Markov state-space model via a set of actions under unknown pre-conditions. We formulate skills as high-level abstract policies that propose action plans based on the current state. Each policy learns new plans by observing t…
▽ More
We propose a novel approach for planning agents to compose abstract skills via observing and learning from historical interactions with the world. Our framework operates in a Markov state-space model via a set of actions under unknown pre-conditions. We formulate skills as high-level abstract policies that propose action plans based on the current state. Each policy learns new plans by observing the states' transitions while the agent interacts with the world. Such an approach automatically learns new plans to achieve specific intended effects, but the success of such plans is often dependent on the states in which they are applicable. Therefore, we formulate the evaluation of such plans as infinitely many multi-armed bandit problems, where we balance the allocation of resources on evaluating the success probability of existing arms and exploring new options. The result is a planner capable of automatically learning robust high-level skills under a noisy environment; such skills implicitly learn the action pre-condition without explicit knowledge. We show that this planning approach is experimentally very competitive in high-dimensional state space domains.
△ Less
Submitted 17 July, 2022;
originally announced July 2022.
-
Translation between Molecules and Natural Language
Authors:
Carl Edwards,
Tuan Lai,
Kevin Ros,
Garrett Honke,
Kyunghyun Cho,
Heng Ji
Abstract:
We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), wh…
▽ More
We present $\textbf{MolT5}$ $-$ a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. $\textbf{MolT5}$ allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since $\textbf{MolT5}$ pretrains models on single-modal data, it helps overcome the chemistry domain shortcoming of data scarcity. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that $\textbf{MolT5}$-based models are able to generate outputs, both molecules and captions, which in many cases are high quality.
△ Less
Submitted 3 November, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Water and Sediment Analyse Using Predictive Models
Authors:
Xiaoting Xu,
Tin Lai,
Sayka Jahan,
Farnaz Farid
Abstract:
The increasing prevalence of marine pollution during the past few decades motivated recent research to help ease the situation. Typical water quality assessment requires continuous monitoring of water and sediments at remote locations with labour intensive laboratory tests to determine the degree of pollution. We propose an automated framework where we formalise a predictive model using Machine Le…
▽ More
The increasing prevalence of marine pollution during the past few decades motivated recent research to help ease the situation. Typical water quality assessment requires continuous monitoring of water and sediments at remote locations with labour intensive laboratory tests to determine the degree of pollution. We propose an automated framework where we formalise a predictive model using Machine Learning to infer the water quality and level of pollution using collected water and sediments samples. One commonly encountered difficulty performing statistical analysis with water and sediment is the limited amount of data samples and incomplete dataset due to the sparsity of sample collection location. To this end, we performed extensive investigation on various data imputation methods' performance in water and sediment datasets with various data missing rates. Empirically, we show that our best model archives an accuracy of 75% after accounting for 57% of missing data. Experimentally, we show that our model would assist in assessing water quality screening based on possibly incomplete real-world data.
△ Less
Submitted 3 March, 2022;
originally announced March 2022.
-
L4KDE: Learning for KinoDynamic Tree Expansion
Authors:
Tin Lai,
Weiming Zhi,
Tucker Hermans,
Fabio Ramos
Abstract:
We present the Learning for KinoDynamic Tree Expansion (L4KDE) method for kinodynamic planning. Tree-based planning approaches, such as rapidly exploring random tree (RRT), are the dominant approach to finding globally optimal plans in continuous state-space motion planning. Central to these approaches is tree-expansion, the procedure in which new nodes are added into an ever-expanding tree. We st…
▽ More
We present the Learning for KinoDynamic Tree Expansion (L4KDE) method for kinodynamic planning. Tree-based planning approaches, such as rapidly exploring random tree (RRT), are the dominant approach to finding globally optimal plans in continuous state-space motion planning. Central to these approaches is tree-expansion, the procedure in which new nodes are added into an ever-expanding tree. We study the kinodynamic variants of tree-based planning, where we have known system dynamics and kinematic constraints. In the interest of quickly selecting nodes to connect newly sampled coordinates, existing methods typically cannot optimise to find nodes that have low cost to transition to sampled coordinates. Instead, they use metrics like Euclidean distance between coordinates as a heuristic for selecting candidate nodes to connect to the search tree. We propose L4KDE to address this issue. L4KDE uses a neural network to predict transition costs between queried states, which can be efficiently computed in batch, providing much higher quality estimates of transition cost compared to commonly used heuristics while maintaining almost-surely asymptotic optimality guarantee. We empirically demonstrate the significant performance improvement provided by L4KDE on a variety of challenging system dynamics, with the ability to generalise across different instances of the same model class, and in conjunction with a suite of modern tree-based motion planners.
△ Less
Submitted 17 September, 2023; v1 submitted 2 March, 2022;
originally announced March 2022.
-
Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking
Authors:
Tuan Manh Lai,
Heng Ji,
ChengXiang Zhai
Abstract:
Entity linking (EL) is the task of linking entity mentions in a document to referent entities in a knowledge base (KB). Many previous studies focus on Wikipedia-derived KBs. There is little work on EL over Wikidata, even though it is the most extensive crowdsourced KB. The scale of Wikidata can open up many new real-world applications, but its massive number of entities also makes EL challenging.…
▽ More
Entity linking (EL) is the task of linking entity mentions in a document to referent entities in a knowledge base (KB). Many previous studies focus on Wikipedia-derived KBs. There is little work on EL over Wikidata, even though it is the most extensive crowdsourced KB. The scale of Wikidata can open up many new real-world applications, but its massive number of entities also makes EL challenging. To effectively narrow down the search space, we propose a novel candidate retrieval paradigm based on entity profiling. Wikidata entities and their textual fields are first indexed into a text search engine (e.g., Elasticsearch). During inference, given a mention and its context, we use a sequence-to-sequence (seq2seq) model to generate the profile of the target entity, which consists of its title and description. We use the profile to query the indexed search engine to retrieve candidate entities. Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary, enabling us to further design a highly effective hybrid method for candidate retrieval. Combined with a simple cross-attention reranker, our complete EL framework achieves state-of-the-art results on three Wikidata-based datasets and strong performance on TACKBP-2010.
△ Less
Submitted 14 March, 2022; v1 submitted 27 February, 2022;
originally announced February 2022.
-
A Novel Skeleton-Based Human Activity Discovery Using Particle Swarm Optimization with Gaussian Mutation
Authors:
Parham Hadikhani,
Daphne Teck Ching Lai,
Wee-Hong Ong
Abstract:
Human activity discovery aims to cluster the activities performed by humans without any prior information on what defines each activity. Most methods presented in human activity recognition are supervised, where there are labeled inputs to train the system. In reality, it is difficult to label activities data because of its huge volume and the variety of human activities. This paper proposes an un…
▽ More
Human activity discovery aims to cluster the activities performed by humans without any prior information on what defines each activity. Most methods presented in human activity recognition are supervised, where there are labeled inputs to train the system. In reality, it is difficult to label activities data because of its huge volume and the variety of human activities. This paper proposes an unsupervised framework to perform human activity discovery in 3D skeleton sequences. First, an approach for data pre-processing is presented. In this stage, important frames are selected based on kinetic energy. Next, the displacement of joints, statistical displacements, angles, and orientation features are extracted to represent the activities information. Since not all extracted features have useful information, the dimension of features is reduced using PCA. Most methods proposed for human activity discovery are not fully unsupervised. They use pre-segmented videos before categorizing activities. To deal with this, we have used a sliding time window to segment the time series of activities with some overlapping. Then, activities are discovered by our proposed Hybrid Particle swarm optimization (PSO) with Gaussian Mutation and K-means (HPGMK) algorithm to provide diverse solutions. PSO is used due to its straightforward idea and powerful global search capability which can identify the ideal solution in a few iterations. Finally, k-means is applied to the outcome centroids from each iteration of the PSO to overcome the slow convergence rate of PSO. The experiment results on five datasets show that the proposed framework has superior performance in discovering activities compared to the other state-of-the-art methods and has increased accuracy of at least 4% on average.
△ Less
Submitted 18 October, 2022; v1 submitted 14 January, 2022;
originally announced January 2022.
-
Learning Non-Stationary Time-Series with Dynamic Pattern Extractions
Authors:
Xipei Wang,
Haoyu Zhang,
Yuanbo Zhang,
Meng Wang,
Jiarui Song,
Tin Lai,
Matloob Khushi
Abstract:
The era of information explosion had prompted the accumulation of a tremendous amount of time-series data, including stationary and non-stationary time-series data. State-of-the-art algorithms have achieved a decent performance in dealing with stationary temporal data. However, traditional algorithms that tackle stationary time-series do not apply to non-stationary series like Forex trading. This…
▽ More
The era of information explosion had prompted the accumulation of a tremendous amount of time-series data, including stationary and non-stationary time-series data. State-of-the-art algorithms have achieved a decent performance in dealing with stationary temporal data. However, traditional algorithms that tackle stationary time-series do not apply to non-stationary series like Forex trading. This paper investigates applicable models that can improve the accuracy of forecasting future trends of non-stationary time-series sequences. In particular, we focus on identifying potential models and investigate the effects of recognizing patterns from historical data. We propose a combination of \rebuttal{the} seq2seq model based on RNN, along with an attention mechanism and an enriched set features extracted via dynamic time warping and zigzag peak valley indicators. Customized loss functions and evaluating metrics have been designed to focus more on the predicting sequence's peaks and valley points. Our results show that our model can predict 4-hour future trends with high accuracy in the Forex dataset, which is crucial in realistic scenarios to assist foreign exchange trading decision making. We further provide evaluations of the effects of various loss functions, evaluation metrics, model variants, and components on model performance.
△ Less
Submitted 20 November, 2021;
originally announced November 2021.
-
sbp-env: Sampling-based Motion Planners' Testing Environment
Authors:
Tin Lai
Abstract:
Sampling-based motion planners' testing environment (sbp-env) is a full feature framework to quickly test different sampling-based algorithms for motion planning. sbp-env focuses on the flexibility of tinkering with different aspects of the framework, and had divided the main planning components into two categories (i) samplers and (ii) planners.
The focus of motion planning research had been ma…
▽ More
Sampling-based motion planners' testing environment (sbp-env) is a full feature framework to quickly test different sampling-based algorithms for motion planning. sbp-env focuses on the flexibility of tinkering with different aspects of the framework, and had divided the main planning components into two categories (i) samplers and (ii) planners.
The focus of motion planning research had been mainly on (i) improving the sampling efficiency (with methods such as heuristic or learned distribution) and (ii) the algorithmic aspect of the planner using different routines to build a connected graph. Therefore, by separating the two components one can quickly swap out different components to test novel ideas.
△ Less
Submitted 27 October, 2021; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Rapid Replanning in Consecutive Pick-and-Place Tasks with Lazy Experience Graph
Authors:
Tin Lai,
Fabio Ramos
Abstract:
In an environment where a manipulator needs to execute multiple consecutive tasks, the act of object manoeuvre will change the underlying configuration space, affecting all subsequent tasks. Previously free configurations might now be occupied by the manoeuvred objects, and previously occupied space might now open up new paths. We propose Lazy Tree-based Replanner (LTR*) -- a novel hybrid planner…
▽ More
In an environment where a manipulator needs to execute multiple consecutive tasks, the act of object manoeuvre will change the underlying configuration space, affecting all subsequent tasks. Previously free configurations might now be occupied by the manoeuvred objects, and previously occupied space might now open up new paths. We propose Lazy Tree-based Replanner (LTR*) -- a novel hybrid planner that inherits the rapid planning nature of existing anytime incremental sampling-based planners. At the same time, it allows subsequent tasks to leverage prior experience via a lazy experience graph. Previous experience is summarised in a lazy graph structure, and LTR* is formulated to be robust and beneficial regardless of the extent of changes in the workspace. Our hybrid approach attains a faster speed in obtaining an initial solution than existing roadmap-based planners and often with a lower cost in trajectory length. Subsequent tasks can utilise the lazy experience graph to speed up finding a solution and take advantage of the optimised graph to minimise the cost objective. We provide proofs of probabilistic completeness and almost-surely asymptotic optimal guarantees. Experimentally, we show that in repeated pick-and-place tasks, LTR* attains a high gain in performance when planning for subsequent tasks.
△ Less
Submitted 3 September, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
BERT might be Overkill: A Tiny but Effective Biomedical Entity Linker based on Residual Convolutional Neural Networks
Authors:
Tuan Lai,
Heng Ji,
ChengXiang Zhai
Abstract:
Biomedical entity linking is the task of linking entity mentions in a biomedical document to referent entities in a knowledge base. Recently, many BERT-based models have been introduced for the task. While these models have achieved competitive results on many datasets, they are computationally expensive and contain about 110M parameters. Little is known about the factors contributing to their imp…
▽ More
Biomedical entity linking is the task of linking entity mentions in a biomedical document to referent entities in a knowledge base. Recently, many BERT-based models have been introduced for the task. While these models have achieved competitive results on many datasets, they are computationally expensive and contain about 110M parameters. Little is known about the factors contributing to their impressive performance and whether the over-parameterization is needed. In this work, we shed some light on the inner working mechanisms of these large BERT-based models. Through a set of probing experiments, we have found that the entity linking performance only changes slightly when the input word order is shuffled or when the attention scope is limited to a fixed window size. From these observations, we propose an efficient convolutional neural network with residual connections for biomedical entity linking. Because of the sparse connectivity and weight sharing properties, our model has a small number of parameters and is highly efficient. On five public datasets, our model achieves comparable or even better linking accuracy than the state-of-the-art BERT-based models while having about 60 times fewer parameters.
△ Less
Submitted 6 September, 2021;
originally announced September 2021.
-
Parallelised Diffeomorphic Sampling-based Motion Planning
Authors:
Tin Lai,
Weiming Zhi,
Tucker Hermans,
Fabio Ramos
Abstract:
We propose Parallelised Diffeomorphic Sampling-based Motion Planning (PDMP). PDMP is a novel parallelised framework that uses bijective and differentiable mappings, or diffeomorphisms, to transform sampling distributions of sampling-based motion planners, in a manner akin to normalising flows. Unlike normalising flow models which use invertible neural network structures to represent these diffeomo…
▽ More
We propose Parallelised Diffeomorphic Sampling-based Motion Planning (PDMP). PDMP is a novel parallelised framework that uses bijective and differentiable mappings, or diffeomorphisms, to transform sampling distributions of sampling-based motion planners, in a manner akin to normalising flows. Unlike normalising flow models which use invertible neural network structures to represent these diffeomorphisms, we develop them from gradient information of desired costs, and encode desirable behaviour, such as obstacle avoidance. These transformed sampling distributions can then be used for sampling-based motion planning. A particular example is when we wish to imbue the sampling distribution with knowledge of the environment geometry, such that drawn samples are less prone to be in collisions. To this end, we propose to learn a continuous occupancy representation from environment occupancy data, such that gradients of the representation defines a valid diffeomorphism and is amenable to fast parallel evaluation. We use this to "morph" the sampling distribution to draw far fewer collision-prone samples. PDMP is able to leverage gradient information of costs, to inject specifications, in a manner similar to optimisation-based motion planning methods, but relies on drawing from a sampling distribution, retaining the tendency to find more global solutions, thereby bridging the gap between trajectory optimisation and sampling-based planning methods.
△ Less
Submitted 22 September, 2021; v1 submitted 26 August, 2021;
originally announced August 2021.
-
A Unified Transformer-based Framework for Duplex Text Normalization
Authors:
Tuan Manh Lai,
Yang Zhang,
Evelina Bakhturina,
Boris Ginsburg,
Heng Ji
Abstract:
Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or ITN, ranging from weighted finite-state transducers to neural networks. Despite their impressive performance, these methods aim to tackle only one of the two ta…
▽ More
Text normalization (TN) and inverse text normalization (ITN) are essential preprocessing and postprocessing steps for text-to-speech synthesis and automatic speech recognition, respectively. Many methods have been proposed for either TN or ITN, ranging from weighted finite-state transducers to neural networks. Despite their impressive performance, these methods aim to tackle only one of the two tasks but not both. As a result, in a complete spoken dialog system, two separate models for TN and ITN need to be built. This heterogeneity increases the technical complexity of the system, which in turn increases the cost of maintenance in a production setting. Motivated by this observation, we propose a unified framework for building a single neural duplex system that can simultaneously handle TN and ITN. Combined with a simple but effective data augmentation method, our systems achieve state-of-the-art results on the Google TN dataset for English and Russian. They can also reach over 95% sentence-level accuracy on an internal English TN dataset without any additional fine-tuning. In addition, we also create a cleaned dataset from the Spoken Wikipedia Corpora for German and report the performance of our systems on the dataset. Overall, experimental results demonstrate the proposed duplex text normalization framework is highly effective and applicable to a range of domains and languages
△ Less
Submitted 22 August, 2021;
originally announced August 2021.
-
End-to-end Neural Coreference Resolution Revisited: A Simple yet Effective Baseline
Authors:
Tuan Manh Lai,
Trung Bui,
Doo Soon Kim
Abstract:
Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning. Despite improving the coreference resolution performance by a large margin, these extensions add substantial extra complexity to the original model. Motivated…
▽ More
Since the first end-to-end neural coreference resolution model was introduced, many extensions to the model have been proposed, ranging from using higher-order inference to directly optimizing evaluation metrics using reinforcement learning. Despite improving the coreference resolution performance by a large margin, these extensions add substantial extra complexity to the original model. Motivated by this observation and the recent advances in pre-trained Transformer language models, we propose a simple yet effective baseline for coreference resolution. Even though our model is a simplified version of the original neural coreference resolution model, it achieves impressive performance, outperforming all recent extended works on the public English OntoNotes benchmark. Our work provides evidence for the necessity of carefully justifying the complexity of existing or newly proposed models, as introducing a conceptual or practical simplification to an existing model can still yield competitive results.
△ Less
Submitted 8 February, 2022; v1 submitted 4 July, 2021;
originally announced July 2021.
-
Learning ODEs via Diffeomorphisms for Fast and Robust Integration
Authors:
Weiming Zhi,
Tin Lai,
Lionel Ott,
Edwin V. Bonilla,
Fabio Ramos
Abstract:
Advances in differentiable numerical integrators have enabled the use of gradient descent techniques to learn ordinary differential equations (ODEs). In the context of machine learning, differentiable solvers are central for Neural ODEs (NODEs), a class of deep learning models with continuous depth, rather than discrete layers. However, these integrators can be unsatisfactorily slow and inaccurate…
▽ More
Advances in differentiable numerical integrators have enabled the use of gradient descent techniques to learn ordinary differential equations (ODEs). In the context of machine learning, differentiable solvers are central for Neural ODEs (NODEs), a class of deep learning models with continuous depth, rather than discrete layers. However, these integrators can be unsatisfactorily slow and inaccurate when learning systems of ODEs from long sequences, or when solutions of the system vary at widely different timescales in each dimension. In this paper we propose an alternative approach to learning ODEs from data: we represent the underlying ODE as a vector field that is related to another base vector field by a differentiable bijection, modelled by an invertible neural network. By restricting the base ODE to be amenable to integration, we can drastically speed up and improve the robustness of integration. We demonstrate the efficacy of our method in training and evaluating continuous neural networks models, as well as in learning benchmark ODE systems. We observe improvements of up to two orders of magnitude when integrating learned ODEs with GPUs computation.
△ Less
Submitted 4 July, 2021;
originally announced July 2021.
-
Joint Biomedical Entity and Relation Extraction with Knowledge-Enhanced Collective Inference
Authors:
Tuan Lai,
Heng Ji,
ChengXiang Zhai,
Quan Hung Tran
Abstract:
Compared to the general news domain, information extraction (IE) from biomedical text requires much broader domain knowledge. However, many previous IE methods do not utilize any external knowledge during inference. Due to the exponential growth of biomedical publications, models that do not go beyond their fixed set of parameters will likely fall behind. Inspired by how humans look up relevant in…
▽ More
Compared to the general news domain, information extraction (IE) from biomedical text requires much broader domain knowledge. However, many previous IE methods do not utilize any external knowledge during inference. Due to the exponential growth of biomedical publications, models that do not go beyond their fixed set of parameters will likely fall behind. Inspired by how humans look up relevant information to comprehend a scientific text, we present a novel framework that utilizes external knowledge for joint entity and relation extraction named KECI (Knowledge-Enhanced Collective Inference). Given an input text, KECI first constructs an initial span graph representing its initial understanding of the text. It then uses an entity linker to form a knowledge graph containing relevant background knowledge for the the entity mentions in the text. To make the final predictions, KECI fuses the initial span graph and the knowledge graph into a more refined graph using an attention mechanism. KECI takes a collective approach to link mention spans to entities by integrating global relational information into local representations using graph convolutional networks. Our experimental results show that the framework is highly effective, achieving new state-of-the-art results in two different benchmark datasets: BioRelEx (binding interaction detection) and ADE (adverse drug event extraction). For example, KECI achieves absolute improvements of 4.59% and 4.91% in F1 scores over the state-of-the-art on the BioRelEx entity and relation extraction tasks.
△ Less
Submitted 31 May, 2021; v1 submitted 27 May, 2021;
originally announced May 2021.
-
A Context-Dependent Gated Module for Incorporating Symbolic Semantics into Event Coreference Resolution
Authors:
Tuan Lai,
Heng Ji,
Trung Bui,
Quan Hung Tran,
Franck Dernoncourt,
Walter Chang
Abstract:
Event coreference resolution is an important research problem with many applications. Despite the recent remarkable success of pretrained language models, we argue that it is still highly beneficial to utilize symbolic features for the task. However, as the input for coreference resolution typically comes from upstream components in the information extraction pipeline, the automatically extracted…
▽ More
Event coreference resolution is an important research problem with many applications. Despite the recent remarkable success of pretrained language models, we argue that it is still highly beneficial to utilize symbolic features for the task. However, as the input for coreference resolution typically comes from upstream components in the information extraction pipeline, the automatically extracted symbolic features can be noisy and contain errors. Also, depending on the specific context, some features can be more informative than others. Motivated by these observations, we propose a novel context-dependent gated module to adaptively control the information flows from the input symbolic features. Combined with a simple noisy training method, our best models achieve state-of-the-art results on two datasets: ACE 2005 and KBP 2016.
△ Less
Submitted 4 April, 2021;
originally announced April 2021.
-
Rapidly-exploring Random Forest: Adaptively Exploits Local Structure with Generalised Multi-Trees Motion Planning
Authors:
Tin Lai
Abstract:
Sampling-based motion planners perform exceptionally well in robotic applications that operate in high-dimensional space. However, most works often constrain the planning workspace rooted at some fixed locations, do not adaptively reason on strategy in narrow passages, and ignore valuable local structure information. In this paper, we propose Rapidly-exploring Random Forest (RRF*) -- a generalised…
▽ More
Sampling-based motion planners perform exceptionally well in robotic applications that operate in high-dimensional space. However, most works often constrain the planning workspace rooted at some fixed locations, do not adaptively reason on strategy in narrow passages, and ignore valuable local structure information. In this paper, we propose Rapidly-exploring Random Forest (RRF*) -- a generalised multi-trees motion planner that combines the rapid exploring property of tree-based methods and adaptively learns to deploys a Bayesian local sampling strategy in regions that are deemed to be bottlenecks. Local sampling exploits the local-connectivity of spaces via Markov Chain random sampling, which is updated sequentially with a Bayesian proposal distribution to learns the local structure from past observations. The trees selection problem is formulated as a multi-armed bandit problem, which efficiently allocates resources on the most promising tree to accelerate planning runtime. RRF* learns the region that is difficult to perform tree extensions and adaptively deploys local sampling in those regions to maximise the benefit of exploiting local structure. We provide rigorous proofs of completeness and optimal convergence guarantees, and we experimentally demonstrate that the effectiveness of RRF*'s adaptive multi-trees approach allows it to performs well in a wide range of problems.
△ Less
Submitted 7 March, 2021;
originally announced March 2021.
-
AutoNLU: An On-demand Cloud-based Natural Language Understanding System for Enterprises
Authors:
Nham Le,
Tuan Lai,
Trung Bui,
Doo Soon Kim
Abstract:
With the renaissance of deep learning, neural networks have achieved promising results on many natural language understanding (NLU) tasks. Even though the source codes of many neural network models are publicly available, there is still a large gap from open-sourced models to solving real-world problems in enterprises. Therefore, to fill this gap, we introduce AutoNLU, an on-demand cloud-based sys…
▽ More
With the renaissance of deep learning, neural networks have achieved promising results on many natural language understanding (NLU) tasks. Even though the source codes of many neural network models are publicly available, there is still a large gap from open-sourced models to solving real-world problems in enterprises. Therefore, to fill this gap, we introduce AutoNLU, an on-demand cloud-based system with an easy-to-use interface that covers all common use-cases and steps in developing an NLU model. AutoNLU has supported many product teams within Adobe with different use-cases and datasets, quickly delivering them working models. To demonstrate the effectiveness of AutoNLU, we present two case studies. i) We build a practical NLU model for handling various image-editing requests in Photoshop. ii) We build powerful keyphrase extraction models that achieve state-of-the-art results on two public benchmarks. In both cases, end users only need to write a small amount of code to convert their datasets into a common format used by AutoNLU.
△ Less
Submitted 26 November, 2020;
originally announced November 2020.
-
Anticipatory Navigation in Crowds by Probabilistic Prediction of Pedestrian Future Movements
Authors:
Weiming Zhi,
Tin Lai,
Lionel Ott,
Fabio Ramos
Abstract:
Critical for the coexistence of humans and robots in dynamic environments is the capability for agents to understand each other's actions, and anticipate their movements. This paper presents Stochastic Process Anticipatory Navigation (SPAN), a framework that enables nonholonomic robots to navigate in environments with crowds, while anticipating and accounting for the motion patterns of pedestrians…
▽ More
Critical for the coexistence of humans and robots in dynamic environments is the capability for agents to understand each other's actions, and anticipate their movements. This paper presents Stochastic Process Anticipatory Navigation (SPAN), a framework that enables nonholonomic robots to navigate in environments with crowds, while anticipating and accounting for the motion patterns of pedestrians. To this end, we learn a predictive model to predict continuous-time stochastic processes to model future movement of pedestrians. Anticipated pedestrian positions are used to conduct chance constrained collision-checking, and are incorporated into a time-to-collision control problem. An occupancy map is also integrated to allow for probabilistic collision-checking with static obstacles. We demonstrate the capability of SPAN in crowded simulation environments, as well as with a real-world pedestrian dataset.
△ Less
Submitted 12 November, 2020;
originally announced November 2020.
-
Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering
Authors:
Quan Tran,
Nhan Dam,
Tuan Lai,
Franck Dernoncourt,
Trung Le,
Nham Le,
Dinh Phung
Abstract:
Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, human tends to rely on simil…
▽ More
Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, human tends to rely on similar situations and/or associations in the past. Hence arguably, a promising approach to make the model transparent is to design it in a way such that the model explicitly connects the current sample with the seen ones, and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidences to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e. TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications.
△ Less
Submitted 5 November, 2020;
originally announced November 2020.
-
Robust Hierarchical Planning with Policy Delegation
Authors:
Tin Lai,
Philippe Morere
Abstract:
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation. This framework, the Markov Intent Process, features a collection of skills which are each specialised to perform a single task well. Skills are aware of their intended effects and are able to analyse planning goals to delegate planning to the best-suited skill. This principle dynamically crea…
▽ More
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation. This framework, the Markov Intent Process, features a collection of skills which are each specialised to perform a single task well. Skills are aware of their intended effects and are able to analyse planning goals to delegate planning to the best-suited skill. This principle dynamically creates a hierarchy of plans, in which each skill plans for sub-goals for which it is specialised. The proposed planning method features on-demand execution---skill policies are only evaluated when needed. Plans are only generated at the highest level, then expanded and optimised when the latest state information is available. The high-level plan retains the initial planning intent and previously computed skills, effectively reducing the computation needed to adapt to environmental changes. We show this planning approach is experimentally very competitive to classic planning and reinforcement learning techniques on a variety of domains, both in terms of solution length and planning time.
△ Less
Submitted 25 October, 2020;
originally announced October 2020.
-
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
Authors:
Tuan Manh Lai,
Trung Bui,
Doo Soon Kim,
Quan Hung Tran
Abstract:
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant…
▽ More
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such kind of annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.
△ Less
Submitted 22 October, 2020;
originally announced October 2020.
-
Learning to Plan Optimally with Flow-based Motion Planner
Authors:
Tin Lai,
Fabio Ramos
Abstract:
Sampling-based motion planning is the predominant paradigm in many real-world robotic applications, but its performance is immensely dependent on the quality of the samples. The majority of traditional planners are inefficient as they use uninformative sampling distributions as opposed to exploiting structures and patterns in the problem to guide better sampling strategies. Moreover, most current…
▽ More
Sampling-based motion planning is the predominant paradigm in many real-world robotic applications, but its performance is immensely dependent on the quality of the samples. The majority of traditional planners are inefficient as they use uninformative sampling distributions as opposed to exploiting structures and patterns in the problem to guide better sampling strategies. Moreover, most current learning-based planners are susceptible to posterior collapse or mode collapse due to the sparsity and highly varying nature of C-Space and motion plan configurations. In this work, we introduce a conditional normalising flow based distribution learned through previous experiences to improve sampling of these methods. Our distribution can be conditioned on the current problem instance to provide an informative prior for sampling configurations within promising regions. When we train our sampler with an expert planner, the resulting distribution is often near-optimal, and the planner can find a solution faster, with less invalid samples, and less initial cost. The normalising flow based distribution uses simple invertible transformations that are very computationally efficient, and our optimisation formulation explicitly avoids mode collapse in contrast to other existing learning-based planners. Finally, we provide a formulation and theoretical foundation to efficiently sample from the distribution; and demonstrate experimentally that, by using our normalising flow based distribution, a solution can be found faster, with less samples and better overall runtime performance.
△ Less
Submitted 21 October, 2020;
originally announced October 2020.
-
ISA: An Intelligent Shopping Assistant
Authors:
Tuan Manh Lai,
Trung Bui,
Nedim Lipka
Abstract:
Despite the growth of e-commerce, brick-and-mortar stores are still the preferred destinations for many people. In this paper, we present ISA, a mobile-based intelligent shopping assistant that is designed to improve shopping experience in physical stores. ISA assists users by leveraging advanced techniques in computer vision, speech processing, and natural language processing. An in-store user on…
▽ More
Despite the growth of e-commerce, brick-and-mortar stores are still the preferred destinations for many people. In this paper, we present ISA, a mobile-based intelligent shopping assistant that is designed to improve shopping experience in physical stores. ISA assists users by leveraging advanced techniques in computer vision, speech processing, and natural language processing. An in-store user only needs to take a picture or scan the barcode of the product of interest, and then the user can talk to the assistant about the product. The assistant can also guide the user through the purchase process or recommend other similar products to the user. We take a data-driven approach in building the engines of ISA's natural language processing component, and the engines achieve good performance.
△ Less
Submitted 23 September, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room
Authors:
Lena Maier-Hein,
Martin Wagner,
Tobias Ross,
Annika Reinke,
Sebastian Bodenstedt,
Peter M. Full,
Hellena Hempe,
Diana Mindroc-Filimon,
Patrick Scholz,
Thuy Nuong Tran,
Pierangela Bruno,
Anna Kisilenko,
Benjamin Müller,
Tornike Davitashvili,
Manuela Capek,
Minu Tizabi,
Matthias Eisenmann,
Tim J. Adler,
Janek Gröhl,
Melanie Schellenberg,
Silvia Seidlitz,
T. Y. Emmy Lai,
Bünyamin Pekdemir,
Veith Roethlingshoefer,
Fabian Both
, et al. (8 additional authors not shown)
Abstract:
Image-based tracking of medical instruments is an integral part of surgical data science applications. Previous research has addressed the tasks of detecting, segmenting and tracking medical instruments based on laparoscopic video data. However, the proposed methods still tend to fail when applied to challenging images and do not generalize well to data they have not been trained on. This paper in…
▽ More
Image-based tracking of medical instruments is an integral part of surgical data science applications. Previous research has addressed the tasks of detecting, segmenting and tracking medical instruments based on laparoscopic video data. However, the proposed methods still tend to fail when applied to challenging images and do not generalize well to data they have not been trained on. This paper introduces the Heidelberg Colorectal (HeiCo) data set - the first publicly available data set enabling comprehensive benchmarking of medical instrument detection and segmentation algorithms with a specific emphasis on method robustness and generalization capabilities. Our data set comprises 30 laparoscopic videos and corresponding sensor data from medical devices in the operating room for three different types of laparoscopic surgery. Annotations include surgical phase labels for all video frames as well as information on instrument presence and corresponding instance-wise segmentation masks for surgical instruments (if any) in more than 10,000 individual frames. The data has successfully been used to organize international competitions within the Endoscopic Vision Challenges 2017 and 2019.
△ Less
Submitted 23 February, 2021; v1 submitted 7 May, 2020;
originally announced May 2020.
-
A Simple but Effective BERT Model for Dialog State Tracking on Resource-Limited Systems
Authors:
Tuan Manh Lai,
Quan Hung Tran,
Trung Bui,
Daisuke Kihara
Abstract:
In a task-oriented dialog system, the goal of dialog state tracking (DST) is to monitor the state of the conversation from the dialog history. Recently, many deep learning based methods have been proposed for the task. Despite their impressive performance, current neural architectures for DST are typically heavily-engineered and conceptually complex, making it difficult to implement, debug, and ma…
▽ More
In a task-oriented dialog system, the goal of dialog state tracking (DST) is to monitor the state of the conversation from the dialog history. Recently, many deep learning based methods have been proposed for the task. Despite their impressive performance, current neural architectures for DST are typically heavily-engineered and conceptually complex, making it difficult to implement, debug, and maintain them in a production setting. In this work, we propose a simple but effective DST model based on BERT. In addition to its simplicity, our approach also has a number of other advantages: (a) the number of parameters does not grow with the ontology size (b) the model can operate in situations where the domain ontology may change dynamically. Experimental results demonstrate that our BERT-based model outperforms previous methods by a large margin, achieving new state-of-the-art results on the standard WoZ 2.0 dataset. Finally, to make the model small and fast enough for resource-restricted systems, we apply the knowledge distillation method to compress our model. The final compressed model achieves comparable results with the original model while being 8x smaller and 7x faster.
△ Less
Submitted 8 February, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.