-
Analyzing Security and Privacy Challenges in Generative AI Usage Guidelines for Higher Education
Authors:
Bei Yi Ng,
Jiarui Li,
Xinyuan Tong,
Kevin Ye,
Gauthami Yenne,
Varun Chandrasekaran,
Jingjie Li
Abstract:
Educators and learners worldwide are embracing the rise of Generative Artificial Intelligence (GenAI) as it reshapes higher education. However, GenAI also raises significant privacy and security concerns, as models and privacy-sensitive user data, such as student records, may be misused by service providers. Unfortunately, end-users often have little awareness of or control over how these models o…
▽ More
Educators and learners worldwide are embracing the rise of Generative Artificial Intelligence (GenAI) as it reshapes higher education. However, GenAI also raises significant privacy and security concerns, as models and privacy-sensitive user data, such as student records, may be misused by service providers. Unfortunately, end-users often have little awareness of or control over how these models operate. To address these concerns, universities are developing institutional policies to guide GenAI use while safeguarding security and privacy. This work examines these emerging policies and guidelines, with a particular focus on the often-overlooked privacy and security dimensions of GenAI integration in higher education, alongside other academic values. Through a qualitative analysis of GenAI usage guidelines from universities across 12 countries, we identify key challenges and opportunities institutions face in providing effective privacy and security protections, including the need for GenAI safeguards tailored specifically to the academic context.
△ Less
Submitted 25 June, 2025;
originally announced June 2025.
-
Troika algorithm: approximate optimization for accurate clique partitioning and clustering of weighted networks
Authors:
Samin Aref,
Boris Ng
Abstract:
Clique partitioning is a fundamental network clustering task, with applications in a wide range of computational sciences. It involves identifying an optimal partition of the nodes for a real-valued weighted graph according to the edge weights. An optimal partition is one that maximizes the sum of within-cluster edge weights over all possible node partitions. This paper introduces a novel approxim…
▽ More
Clique partitioning is a fundamental network clustering task, with applications in a wide range of computational sciences. It involves identifying an optimal partition of the nodes for a real-valued weighted graph according to the edge weights. An optimal partition is one that maximizes the sum of within-cluster edge weights over all possible node partitions. This paper introduces a novel approximation algorithm named Troika to solve this NP-hard problem in small to mid-sized networks for instances of theoretical and practical relevance. Troika uses a branch-and-cut scheme for branching on node triples to find a partition that is within a user-specified optimality gap tolerance. Troika offers advantages over alternative methods like integer programming solvers and heuristics for clique partitioning. Unlike existing heuristics, Troika returns solutions within a guaranteed proximity to global optimality. And our results indicate that Troika is faster than using the state-of-the-art integer programming solver Gurobi for most benchmark instances. Besides its advantages for solving the clique partitioning problem, we demonstrate the applications of Troika in community detection and portfolio analysis. Troika returns partitions with higher proximity to optimal compared to eight modularity-based community detection algorithms. When used on networks of correlations among stocks, Troika reveals the dynamic changes in the structure of portfolio networks including downturns from the 2008 financial crisis and the reaction to the COVID-19 pandemic. Our comprehensive results based on benchmarks from the literature and new real and random networks point to Troika as a reliable and accurate method for solving clique partitioning instances with up to 5000 edges on standard hardware.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
Proof of Useful Intelligence (PoUI): Blockchain Consensus Beyond Energy Waste
Authors:
Zan-Kai Chong,
Hiroyuki Ohsaki,
Bryan Ng
Abstract:
Blockchain technology enables secure, transparent data management in decentralized systems, supporting applications from cryptocurrencies like Bitcoin to tokenizing real-world assets like property. Its scalability and sustainability hinge on consensus mechanisms balancing security and efficiency. Proof of Work (PoW), used by Bitcoin, ensures security through energy-intensive computations but deman…
▽ More
Blockchain technology enables secure, transparent data management in decentralized systems, supporting applications from cryptocurrencies like Bitcoin to tokenizing real-world assets like property. Its scalability and sustainability hinge on consensus mechanisms balancing security and efficiency. Proof of Work (PoW), used by Bitcoin, ensures security through energy-intensive computations but demands significant resources. Proof of Stake (PoS), as in Ethereum post-Merge, selects validators based on staked cryptocurrency, offering energy efficiency but risking centralization from wealth concentration. With AI models straining computational resources, we propose Proof of Useful Intelligence (PoUI), a hybrid consensus mechanism. In PoUI, workers perform AI tasks like language processing or image analysis to earn coins, which are staked to secure the network, blending security with practical utility. Decentralized nodes--job posters, market coordinators, workers, and validators --collaborate via smart contracts to manage tasks and rewards.
△ Less
Submitted 24 April, 2025;
originally announced April 2025.
-
Fast Computation of the Discrete Fourier Transform Rectangular Index Coefficients
Authors:
Saulo Queiroz,
João P. Vilela,
Benjamin Koon Kei Ng,
Chan-Tong Lam,
Edmundo Monteiro
Abstract:
In~\cite{sic-magazine-2025}, the authors show that the square index coefficients (SICs) of the $N$-point discrete Fourier transform (DFT) -- that is, the coefficients $X_{k\sqrt{N}}$ for $k = 0, 1, \ldots, \sqrt{N} - 1$ -- can be losslessly compressed from $N$ to $\sqrt{N}$ points, thereby accelerating the computation of these specific DFT coefficients accordingly. Following up on that, in this ar…
▽ More
In~\cite{sic-magazine-2025}, the authors show that the square index coefficients (SICs) of the $N$-point discrete Fourier transform (DFT) -- that is, the coefficients $X_{k\sqrt{N}}$ for $k = 0, 1, \ldots, \sqrt{N} - 1$ -- can be losslessly compressed from $N$ to $\sqrt{N}$ points, thereby accelerating the computation of these specific DFT coefficients accordingly. Following up on that, in this article we generalize SICs into what we refer to as rectangular index coefficients (RICs) of the DFT, formalized as $X_{kL}, k=0,1,\cdots,C-1$, in which the integers $C$ and $L$ are generic roots of $N$ such that $N=LC$. We present an algorithm to compress the $N$-point input signal $\mathbf{x}$ into a $C$-point signal $\mathbf{\hat{x}}$ at the expense of $\mathcal{O}(N)$ complex sums and no complex multiplication. We show that a DFT on $\mathbf{\hat{x}}$ is equivalent to a DFT on the RICs of $\mathbf{x}$. In cases where specific frequencies of $\mathbf{x}$ are of interest -- as in harmonic analysis -- one can conveniently adjust the signal parameters (e.g., frequency resolution) to align the RICs with those frequencies, and use the proposed algorithm to compute them significantly faster. If $N$ is a power of two -- as required by the fast Fourier transform (FFT) algorithm -- then $C$ can be any power of two in the range $[2, N/2]$ and one can use our algorithm along with FFT to compute all RICs in $\mathcal{O}(C\log C)$ time complexity.
△ Less
Submitted 3 May, 2025; v1 submitted 16 April, 2025;
originally announced April 2025.
-
Feature-based Graph Attention Networks Improve Online Continual Learning
Authors:
Adjovi Sim,
Zhengkui Wang,
Aik Beng Ng,
Shalini De Mello,
Simon See,
Wonmin Byeon
Abstract:
Online continual learning for image classification is crucial for models to adapt to new data while retaining knowledge of previously learned tasks. This capability is essential to address real-world challenges involving dynamic environments and evolving data distributions. Traditional approaches predominantly employ Convolutional Neural Networks, which are limited to processing images as grids an…
▽ More
Online continual learning for image classification is crucial for models to adapt to new data while retaining knowledge of previously learned tasks. This capability is essential to address real-world challenges involving dynamic environments and evolving data distributions. Traditional approaches predominantly employ Convolutional Neural Networks, which are limited to processing images as grids and primarily capture local patterns rather than relational information. Although the emergence of transformer architectures has improved the ability to capture relationships, these models often require significantly larger resources. In this paper, we present a novel online continual learning framework based on Graph Attention Networks (GATs), which effectively capture contextual relationships and dynamically update the task-specific representation via learned attention weights. Our approach utilizes a pre-trained feature extractor to convert images into graphs using hierarchical feature maps, representing information at varying levels of granularity. These graphs are then processed by a GAT and incorporate an enhanced global pooling strategy to improve classification performance for continual learning. In addition, we propose the rehearsal memory duplication technique that improves the representation of the previous tasks while maintaining the memory budget. Comprehensive evaluations on benchmark datasets, including SVHN, CIFAR10, CIFAR100, and MiniImageNet, demonstrate the superiority of our method compared to the state-of-the-art methods.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
Optimal Signal Decomposition-based Multi-Stage Learning for Battery Health Estimation
Authors:
Vijay Babu Pamshetti,
Wei Zhang,
King Jet Tseng,
Bor Kiat Ng,
Qingyu Yan
Abstract:
Battery health estimation is fundamental to ensure battery safety and reduce cost. However, achieving accurate estimation has been challenging due to the batteries' complex nonlinear aging patterns and capacity regeneration phenomena. In this paper, we propose OSL, an optimal signal decomposition-based multi-stage machine learning for battery health estimation. OSL treats battery signals optimally…
▽ More
Battery health estimation is fundamental to ensure battery safety and reduce cost. However, achieving accurate estimation has been challenging due to the batteries' complex nonlinear aging patterns and capacity regeneration phenomena. In this paper, we propose OSL, an optimal signal decomposition-based multi-stage machine learning for battery health estimation. OSL treats battery signals optimally. It uses optimized variational mode decomposition to extract decomposed signals capturing different frequency bands of the original battery signals. It also incorporates a multi-stage learning process to analyze both spatial and temporal battery features effectively. An experimental study is conducted with a public battery aging dataset. OSL demonstrates exceptional performance with a mean error of just 0.26%. It significantly outperforms comparison algorithms, both those without and those with suboptimal signal decomposition and analysis. OSL considers practical battery challenges and can be integrated into real-world battery management systems, offering a good impact on battery monitoring and optimization.
△ Less
Submitted 23 January, 2025;
originally announced January 2025.
-
LLM-Net: Democratizing LLMs-as-a-Service through Blockchain-based Expert Networks
Authors:
Zan-Kai Chong,
Hiroyuki Ohsaki,
Bryan Ng
Abstract:
The centralization of Large Language Models (LLMs) development has created significant barriers to AI advancement, limiting the democratization of these powerful technologies. This centralization, coupled with the scarcity of high-quality training data and mounting complexity of maintaining comprehensive expertise across rapidly expanding knowledge domains, poses critical challenges to the continu…
▽ More
The centralization of Large Language Models (LLMs) development has created significant barriers to AI advancement, limiting the democratization of these powerful technologies. This centralization, coupled with the scarcity of high-quality training data and mounting complexity of maintaining comprehensive expertise across rapidly expanding knowledge domains, poses critical challenges to the continued growth of LLMs. While solutions like Retrieval-Augmented Generation (RAG) offer potential remedies, maintaining up-to-date expert knowledge across diverse domains remains a significant challenge, particularly given the exponential growth of specialized information. This paper introduces LLMs Networks (LLM-Net), a blockchain-based framework that democratizes LLMs-as-a-Service through a decentralized network of specialized LLM providers. By leveraging collective computational resources and distributed domain expertise, LLM-Net incorporates fine-tuned expert models for various specific domains, ensuring sustained knowledge growth while maintaining service quality through collaborative prompting mechanisms. The framework's robust design includes blockchain technology for transparent transaction and performance validation, establishing an immutable record of service delivery. Our simulation, built on top of state-of-the-art LLMs such as Claude 3.5 Sonnet, Llama 3.1, Grok-2, and GPT-4o, validates the effectiveness of the reputation-based mechanism in maintaining service quality by selecting high-performing respondents (LLM providers). Thereby it demonstrates the potential of LLM-Net to sustain AI advancement through the integration of decentralized expertise and blockchain-based accountability.
△ Less
Submitted 1 February, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Authors:
Peijun Bao,
Chenqi Kong,
Zihao Shao,
Boon Poh Ng,
Meng Hwa Er,
Alex C. Kot
Abstract:
Given a natural language query, video moment retrieval aims to localize the described temporal moment in an untrimmed video. A major challenge of this task is its heavy dependence on labor-intensive annotations for training. Unlike existing works that directly train models on manually curated data, we propose a novel paradigm to reduce annotation costs: pretraining the model on unlabeled, real-wor…
▽ More
Given a natural language query, video moment retrieval aims to localize the described temporal moment in an untrimmed video. A major challenge of this task is its heavy dependence on labor-intensive annotations for training. Unlike existing works that directly train models on manually curated data, we propose a novel paradigm to reduce annotation costs: pretraining the model on unlabeled, real-world videos. To support this, we introduce Video Moment Retrieval Pretraining (Vid-Morp), a large-scale dataset collected with minimal human intervention, consisting of over 50K videos captured in the wild and 200K pseudo annotations. Direct pretraining on these imperfect pseudo annotations, however, presents significant challenges, including mismatched sentence-video pairs and imprecise temporal boundaries. To address these issues, we propose the ReCorrect algorithm, which comprises two main phases: semantics-guided refinement and memory-consensus correction. The semantics-guided refinement enhances the pseudo labels by leveraging semantic similarity with video frames to clean out unpaired data and make initial adjustments to temporal boundaries. In the following memory-consensus correction phase, a memory bank tracks the model predictions, progressively correcting the temporal boundaries based on consensus within the memory. Comprehensive experiments demonstrate ReCorrect's strong generalization abilities across multiple downstream settings. Zero-shot ReCorrect achieves over 75% and 80% of the best fully-supervised performance on two benchmarks, while unsupervised ReCorrect reaches about 85% on both. The code, dataset, and pretrained models are available at https://github.com/baopj/Vid-Morp.
△ Less
Submitted 1 December, 2024;
originally announced December 2024.
-
Real-Time AI-Driven People Tracking and Counting Using Overhead Cameras
Authors:
Ishrath Ahamed,
Chamith Dilshan Ranathunga,
Dinuka Sandun Udayantha,
Benny Kai Kiat Ng,
Chau Yuen
Abstract:
Accurate people counting in smart buildings and intelligent transportation systems is crucial for energy management, safety protocols, and resource allocation. This is especially critical during emergencies, where precise occupant counts are vital for safe evacuation. Existing methods struggle with large crowds, often losing accuracy with even a few additional people. To address this limitation, t…
▽ More
Accurate people counting in smart buildings and intelligent transportation systems is crucial for energy management, safety protocols, and resource allocation. This is especially critical during emergencies, where precise occupant counts are vital for safe evacuation. Existing methods struggle with large crowds, often losing accuracy with even a few additional people. To address this limitation, this study proposes a novel approach combining a new object tracking algorithm, a novel counting algorithm, and a fine-tuned object detection model. This method achieves 97% accuracy in real-time people counting with a frame rate of 20-27 FPS on a low-power edge computer.
△ Less
Submitted 15 November, 2024;
originally announced November 2024.
-
Enhancing Trust in Clinically Significant Prostate Cancer Prediction with Multiple Magnetic Resonance Imaging Modalities
Authors:
Benjamin Ng,
Chi-en Amy Tai,
E. Zhixuan Zeng,
Alexander Wong
Abstract:
In the United States, prostate cancer is the second leading cause of deaths in males with a predicted 35,250 deaths in 2024. However, most diagnoses are non-lethal and deemed clinically insignificant which means that the patient will likely not be impacted by the cancer over their lifetime. As a result, numerous research studies have explored the accuracy of predicting clinical significance of pro…
▽ More
In the United States, prostate cancer is the second leading cause of deaths in males with a predicted 35,250 deaths in 2024. However, most diagnoses are non-lethal and deemed clinically insignificant which means that the patient will likely not be impacted by the cancer over their lifetime. As a result, numerous research studies have explored the accuracy of predicting clinical significance of prostate cancer based on magnetic resonance imaging (MRI) modalities and deep neural networks. Despite their high performance, these models are not trusted by most clinical scientists as they are trained solely on a single modality whereas clinical scientists often use multiple magnetic resonance imaging modalities during their diagnosis. In this paper, we investigate combining multiple MRI modalities to train a deep learning model to enhance trust in the models for clinically significant prostate cancer prediction. The promising performance and proposed training pipeline showcase the benefits of incorporating multiple MRI modalities for enhanced trust and accuracy.
△ Less
Submitted 7 November, 2024;
originally announced November 2024.
-
On-Device LLMs for SMEs: Challenges and Opportunities
Authors:
Jeremy Stephen Gabriel Yee,
Pai Chet Ng,
Zhengkui Wang,
Ian McLoughlin,
Aik Beng Ng,
Simon See
Abstract:
This paper presents a systematic review of the infrastructure requirements for deploying Large Language Models (LLMs) on-device within the context of small and medium-sized enterprises (SMEs), focusing on both hardware and software perspectives. From the hardware viewpoint, we discuss the utilization of processing units like GPUs and TPUs, efficient memory and storage solutions, and strategies for…
▽ More
This paper presents a systematic review of the infrastructure requirements for deploying Large Language Models (LLMs) on-device within the context of small and medium-sized enterprises (SMEs), focusing on both hardware and software perspectives. From the hardware viewpoint, we discuss the utilization of processing units like GPUs and TPUs, efficient memory and storage solutions, and strategies for effective deployment, addressing the challenges of limited computational resources typical in SME settings. From the software perspective, we explore framework compatibility, operating system optimization, and the use of specialized libraries tailored for resource-constrained environments. The review is structured to first identify the unique challenges faced by SMEs in deploying LLMs on-device, followed by an exploration of the opportunities that both hardware innovations and software adaptations offer to overcome these obstacles. Such a structured review provides practical insights, contributing significantly to the community by enhancing the technological resilience of SMEs in integrating LLMs.
△ Less
Submitted 22 October, 2024; v1 submitted 21 October, 2024;
originally announced October 2024.
-
A Multimodal Vision Foundation Model for Clinical Dermatology
Authors:
Siyuan Yan,
Zhen Yu,
Clare Primiero,
Cristina Vico-Alonso,
Zhonghua Wang,
Litao Yang,
Philipp Tschandl,
Ming Hu,
Lie Ju,
Gin Tan,
Vincent Tang,
Aik Beng Ng,
David Powell,
Paul Bonnington,
Simon See,
Elisabetta Magnaterra,
Peter Ferguson,
Jennifer Nguyen,
Pascale Guitera,
Jose Banuls,
Monika Janda,
Victoria Mar,
Harald Kittler,
H. Peter Soyer,
Zongyuan Ge
Abstract:
Diagnosing and treating skin diseases require advanced visual skills across domains and the ability to synthesize information from multiple imaging modalities. While current deep learning models excel at specific tasks like skin cancer diagnosis from dermoscopic images, they struggle to meet the complex, multimodal requirements of clinical practice. Here, we introduce PanDerm, a multimodal dermato…
▽ More
Diagnosing and treating skin diseases require advanced visual skills across domains and the ability to synthesize information from multiple imaging modalities. While current deep learning models excel at specific tasks like skin cancer diagnosis from dermoscopic images, they struggle to meet the complex, multimodal requirements of clinical practice. Here, we introduce PanDerm, a multimodal dermatology foundation model pretrained through self-supervised learning on over 2 million real-world skin disease images from 11 clinical institutions across 4 imaging modalities. We evaluated PanDerm on 28 diverse benchmarks, including skin cancer screening, risk stratification, differential diagnosis of common and rare skin conditions, lesion segmentation, longitudinal monitoring, and metastasis prediction and prognosis. PanDerm achieved state-of-the-art performance across all evaluated tasks, often outperforming existing models when using only 10% of labeled data. We conducted three reader studies to assess PanDerm's potential clinical utility. PanDerm outperformed clinicians by 10.2% in early-stage melanoma detection through longitudinal analysis, improved clinicians' skin cancer diagnostic accuracy by 11% on dermoscopy images, and enhanced non-dermatologist healthcare providers' differential diagnosis by 16.5% across 128 skin conditions on clinical photographs. These results demonstrate PanDerm's potential to improve patient care across diverse clinical scenarios and serve as a model for developing multimodal foundation models in other medical specialties, potentially accelerating the integration of AI support in healthcare. The code can be found at https://github.com/SiyuanYan1/PanDerm.
△ Less
Submitted 13 April, 2025; v1 submitted 19 October, 2024;
originally announced October 2024.
-
What Roles Can Spatial Modulation and Space Shift Keying Play in LEO Satellite-Assisted Communications?
Authors:
Chaorong Zhang,
Qingying Wu,
Yuyan Liu,
Benjamin K. Ng,
Chan-Tong Lam
Abstract:
In recent years, the rapid evolution of satellite communications play a pivotal role in addressing the ever-increasing demand for global connectivity, among which the Low Earth Orbit (LEO) satellites attract a great amount of attention due to their low latency and high data throughput capabilities. Based on this, we explore spatial modulation (SM) and space shift keying (SSK) designs as pivotal te…
▽ More
In recent years, the rapid evolution of satellite communications play a pivotal role in addressing the ever-increasing demand for global connectivity, among which the Low Earth Orbit (LEO) satellites attract a great amount of attention due to their low latency and high data throughput capabilities. Based on this, we explore spatial modulation (SM) and space shift keying (SSK) designs as pivotal techniques to enhance spectral efficiency (SE) and bit-error rate (BER) performance in the LEO satellite-assisted multiple-input multiple-output (MIMO) systems. The various performance analysis of these designs are presented in this paper, revealing insightful findings and conclusions through analytical methods and Monte Carlo simulations with perfect and imperfect channel state information (CSI) estimation. The results provide a comprehensive analysis of the merits and trade-offs associated with the investigated schemes, particularly in terms of BER, computational complexity, and SE. This analysis underscores the potential of both schemes as viable candidates for future 6G LEO satellite-assisted wireless communication systems.
△ Less
Submitted 10 January, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
-
Interactive Example-based Explanations to Improve Health Professionals' Onboarding with AI for Human-AI Collaborative Decision Making
Authors:
Min Hun Lee,
Renee Bao Xuan Ng,
Silvana Xinyi Choo,
Shamala Thilarajah
Abstract:
A growing research explores the usage of AI explanations on user's decision phases for human-AI collaborative decision-making. However, previous studies found the issues of overreliance on `wrong' AI outputs. In this paper, we propose interactive example-based explanations to improve health professionals' onboarding with AI for their better reliance on AI during AI-assisted decision-making. We imp…
▽ More
A growing research explores the usage of AI explanations on user's decision phases for human-AI collaborative decision-making. However, previous studies found the issues of overreliance on `wrong' AI outputs. In this paper, we propose interactive example-based explanations to improve health professionals' onboarding with AI for their better reliance on AI during AI-assisted decision-making. We implemented an AI-based decision support system that utilizes a neural network to assess the quality of post-stroke survivors' exercises and interactive example-based explanations that systematically surface the nearest neighborhoods of a test/task sample from the training set of the AI model to assist users' onboarding with the AI model. To investigate the effect of interactive example-based explanations, we conducted a study with domain experts, health professionals to evaluate their performance and reliance on AI. Our interactive example-based explanations during onboarding assisted health professionals in having a better reliance on AI and making a higher ratio of making `right' decisions and a lower ratio of `wrong' decisions than providing only feature-based explanations during the decision-support phase. Our study discusses new challenges of assisting user's onboarding with AI for human-AI collaborative decision-making.
△ Less
Submitted 24 September, 2024;
originally announced September 2024.
-
RIS-Assisted Received Adaptive Spatial Modulation for Wireless Communications
Authors:
Chaorong Zhang,
Hui Xu,
Benjamin K. Ng,
Chan-Tong Lam,
Ke Wang
Abstract:
A novel wireless transmission scheme, as named the reconfigurable intelligent surface (RIS)-assisted received adaptive spatial modulation (RASM) scheme, is proposed in this paper. In this scheme, the adaptive spatial modulation (ASM)-based antennas selection works at the receiver by employing the characteristics of the RIS in each time slot, where the signal-to-noise ratio at specific selected ant…
▽ More
A novel wireless transmission scheme, as named the reconfigurable intelligent surface (RIS)-assisted received adaptive spatial modulation (RASM) scheme, is proposed in this paper. In this scheme, the adaptive spatial modulation (ASM)-based antennas selection works at the receiver by employing the characteristics of the RIS in each time slot, where the signal-to-noise ratio at specific selected antennas can be further enhanced with near few powers. Besides for the bits from constellation symbols, the extra bits can be mapped into the indices of receive antenna combinations and conveyed to the receiver through the ASM-based antenna-combination selection, thus providing higher spectral efficiency. To explicitly present the RASM scheme, the analytical performance of bit error rate of it is discussed in this paper. As a trade-off selection, the proposed scheme shows higher spectral efficiency and remains the satisfactory error performance. Simulation and analytical results demonstrate the better performance and exhibit more potential to apply in practical wireless communication.
△ Less
Submitted 24 February, 2025; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Computational Complexity-Constrained Spectral Efficiency Analysis for 6G Waveforms
Authors:
Saulo Queiroz,
João P. Vilela,
Benjamin Koon Kei Ng,
Chan-Tong Lam,
Edmundo Monteiro
Abstract:
In this work, we present a tutorial on how to account for the computational time complexity overhead of signal processing in the spectral efficiency (SE) analysis of wireless waveforms. Our methodology is particularly relevant in scenarios where achieving higher SE entails a penalty in complexity, a common trade-off present in 6G candidate waveforms. We consider that SE derives from the data rate,…
▽ More
In this work, we present a tutorial on how to account for the computational time complexity overhead of signal processing in the spectral efficiency (SE) analysis of wireless waveforms. Our methodology is particularly relevant in scenarios where achieving higher SE entails a penalty in complexity, a common trade-off present in 6G candidate waveforms. We consider that SE derives from the data rate, which is impacted by time-dependent overheads. Thus, neglecting the computational complexity overhead in the SE analysis grants an unfair advantage to more computationally complex waveforms, as they require larger computational resources to meet a signal processing runtime below the symbol period. We demonstrate our points with two case studies. In the first, we refer to IEEE 802.11a-compliant baseband processors from the literature to show that their runtime significantly impacts the SE perceived by upper layers. In the second case study, we show that waveforms considered less efficient in terms of SE can outperform their more computationally expensive counterparts if provided with equivalent high-performance computational resources. Based on these cases, we believe our tutorial can address the comparative SE analysis of waveforms that operate under different computational resource constraints.
△ Less
Submitted 3 February, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Nutrition Estimation for Dietary Management: A Transformer Approach with Depth Sensing
Authors:
Zhengyi Kwan,
Wei Zhang,
Zhengkui Wang,
Aik Beng Ng,
Simon See
Abstract:
Nutrition estimation is crucial for effective dietary management and overall health and well-being. Existing methods often struggle with sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network designed for nutrition estimation that utilizes both RGB and depth information from food images. We have designed and implemented a multi-scale encoder an…
▽ More
Nutrition estimation is crucial for effective dietary management and overall health and well-being. Existing methods often struggle with sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network designed for nutrition estimation that utilizes both RGB and depth information from food images. We have designed and implemented a multi-scale encoder and decoder, along with two types of feature fusion modules, specialized for estimating five nutritional factors. These modules effectively balance the efficiency and effectiveness of feature extraction with flexible usage of our customized attention mechanisms and fusion strategies. Our experimental study shows that NuNet outperforms its variants and existing solutions significantly for nutrition estimation. It achieves an error rate of 15.65%, the lowest known to us, largely due to our multi-scale architecture and fusion modules. This research holds practical values for dietary management with huge potential for transnational research and deployment and could inspire other applications involving multiple data types with varying degrees of importance.
△ Less
Submitted 3 June, 2024;
originally announced June 2024.
-
WiDRa -- Enabling Millimeter-Level Differential Ranging Accuracy in Wi-Fi Using Carrier Phase
Authors:
Vishnu V. Ratnam,
Bilal Sadiq,
Hao Chen,
Wei Sun,
Shunyao Wu,
Boon L. Ng,
Jianzhong,
Zhang
Abstract:
Although Wi-Fi is an ideal technology for many ranging applications, the performance of current methods is limited by the system bandwidth, leading to low accuracy of $\sim 1$ m. For many applications, measuring differential range, viz., the change in the range between adjacent measurements, is sufficient. Correspondingly, this work proposes WiDRa - a Wi-Fi based Differential Ranging solution that…
▽ More
Although Wi-Fi is an ideal technology for many ranging applications, the performance of current methods is limited by the system bandwidth, leading to low accuracy of $\sim 1$ m. For many applications, measuring differential range, viz., the change in the range between adjacent measurements, is sufficient. Correspondingly, this work proposes WiDRa - a Wi-Fi based Differential Ranging solution that provides differential range estimates by using the sum-carrier-phase information. The proposed method is not limited by system bandwidth and can track range changes even smaller than the carrier wavelength. The proposed method is first theoretically justified, while taking into consideration the various hardware impairments affecting Wi-Fi chips. In the process, methods to isolate the sum-carrier phase from the hardware impairments are proposed. Extensive simulation results show that WiDRa can achieve a differential range estimation root-mean-square-error (RMSE) of $\approx 1$ mm in channels with a Rician-factor $\geq 7$ (a $100 \times$ improvement to existing methods). The proposed methods are also validated on off-the-shelf Wi-Fi hardware to demonstrate feasibility, where they achieve an RMSE of $< 1$ mm in the differential range. Finally, limitations of current investigation and future directions of exploration are suggested, to further tap into the potential of WiDRa.
△ Less
Submitted 20 May, 2024;
originally announced May 2024.
-
Continuous Predictive Modeling of Clinical Notes and ICD Codes in Patient Health Records
Authors:
Mireia Hernandez Caralt,
Clarence Boon Liang Ng,
Marek Rei
Abstract:
Electronic Health Records (EHR) serve as a valuable source of patient information, offering insights into medical histories, treatments, and outcomes. Previous research has developed systems for detecting applicable ICD codes that should be assigned while writing a given EHR document, mainly focusing on discharge summaries written at the end of a hospital stay. In this work, we investigate the pot…
▽ More
Electronic Health Records (EHR) serve as a valuable source of patient information, offering insights into medical histories, treatments, and outcomes. Previous research has developed systems for detecting applicable ICD codes that should be assigned while writing a given EHR document, mainly focusing on discharge summaries written at the end of a hospital stay. In this work, we investigate the potential of predicting these codes for the whole patient stay at different time points during their stay, even before they are officially assigned by clinicians. The development of methods to predict diagnoses and treatments earlier in advance could open opportunities for predictive medicine, such as identifying disease risks sooner, suggesting treatments, and optimizing resource allocation. Our experiments show that predictions regarding final ICD codes can be made already two days after admission and we propose a custom model that improves performance on this early prediction task.
△ Less
Submitted 5 July, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
LingML: Linguistic-Informed Machine Learning for Enhanced Fake News Detection
Authors:
Jasraj Singh,
Fang Liu,
Hong Xu,
Bee Chin Ng,
Wei Zhang
Abstract:
Nowadays, Information spreads at an unprecedented pace in social media and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with…
▽ More
Nowadays, Information spreads at an unprecedented pace in social media and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news but are far from perfect with challenging problems like limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with linguistics input and we propose LingML, linguistic-informed ML, for fake news detection. We conducted an experimental study with a popular dataset on fake news during the pandemic. The experiment results show that our proposed solution is highly effective. There are fewer than two errors out of every ten attempts with only linguistic input used in ML and the knowledge is highly explainable. When linguistics input is integrated with advanced large-scale ML models for natural language processing, our solution outperforms existing ones with 1.8% average error rate. LingML creates a new path with linguistics to push the frontier of effective and efficient fake news detection. It also sheds light on real-world multi-disciplinary applications requiring both ML and domain expertise to achieve optimal performance.
△ Less
Submitted 7 May, 2024;
originally announced May 2024.
-
Low-Cost Generation and Evaluation of Dictionary Example Sentences
Authors:
Bill Cai,
Clarence Boon Liang Ng,
Daniel Tan,
Shelvia Hotama
Abstract:
Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. Rapid advancements in foundationa…
▽ More
Dictionary example sentences play an important role in illustrating word definitions and usage, but manually creating quality sentences is challenging. Prior works have demonstrated that language models can be trained to generate example sentences. However, they relied on costly customized models and word sense datasets for generation and evaluation of their work. Rapid advancements in foundational models present the opportunity to create low-cost, zero-shot methods for the generation and evaluation of dictionary example sentences. We introduce a new automatic evaluation metric called OxfordEval that measures the win-rate of generated sentences against existing Oxford Dictionary sentences. OxfordEval shows high alignment with human judgments, enabling large-scale automated quality evaluation. We experiment with various LLMs and configurations to generate dictionary sentences across word classes. We complement this with a novel approach of using masked language models to identify and select sentences that best exemplify word meaning. The eventual model, FM-MLM, achieves over 85.1% win rate against Oxford baseline sentences according to OxfordEval, compared to 39.8% win rate for prior model-generated sentences.
△ Less
Submitted 9 April, 2024;
originally announced April 2024.
-
Towards Automated Generation of Smart Grid Cyber Range for Cybersecurity Experiments and Training
Authors:
Daisuke Mashima,
Muhammad M. Roomi,
Bennet Ng,
Zbigniew Kalbarczyk,
S. M. Suhail Hussain,
Ee-chien Chang
Abstract:
Assurance of cybersecurity is crucial to ensure dependability and resilience of smart power grid systems. In order to evaluate the impact of potential cyber attacks, to assess deployability and effectiveness of cybersecurity measures, and to enable hands-on exercise and training of personals, an interactive, virtual environment that emulates the behaviour of a smart grid system, namely smart grid…
▽ More
Assurance of cybersecurity is crucial to ensure dependability and resilience of smart power grid systems. In order to evaluate the impact of potential cyber attacks, to assess deployability and effectiveness of cybersecurity measures, and to enable hands-on exercise and training of personals, an interactive, virtual environment that emulates the behaviour of a smart grid system, namely smart grid cyber range, has been demanded by industry players as well as academia. A smart grid cyber range is typically implemented as a combination of cyber system emulation, which allows interactivity, and physical system (i.e., power grid) simulation that are tightly coupled for consistent cyber and physical behaviours. However, its design and implementation require intensive expertise and efforts in cyber and physical aspects of smart power systems as well as software/system engineering. While many industry players, including power grid operators, device vendors, research and education sectors are interested, availability of the smart grid cyber range is limited to a small number of research labs. To address this challenge, we have developed a framework for modelling a smart grid cyber range using an XML-based language, called SG-ML, and for "compiling" the model into an operational cyber range with minimal engineering efforts. The modelling language includes standardized schema from IEC 61850 and IEC 61131, which allows industry players to utilize their existing configurations. The SG-ML framework aims at making a smart grid cyber range available to broader user bases to facilitate cybersecurity R\&D and hands-on exercises.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Collaborative Active Learning in Conditional Trust Environment
Authors:
Zan-Kai Chong,
Hiroyuki Ohsaki,
Bryan Ng
Abstract:
In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses pri…
▽ More
In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses privacy and security concerns by eliminating the need for direct model and data disclosure; (b) it enables the use of different data sources and insights without direct data exchange; and (c) it promotes cost-effectiveness and resource efficiency through shared labeling costs. To realize these benefits, we introduce a collaborative active learning framework designed to fulfill the aforementioned objectives. We validate the effectiveness of the proposed framework through simulations. The results demonstrate that collaboration leads to higher AUC scores compared to independent efforts, highlighting the framework's ability to overcome the limitations of individual models. These findings support the use of collaborative approaches in active learning, emphasizing their potential to enhance outcomes through collective expertise and shared resources. Our work provides a foundation for further research on collaborative active learning and its practical applications in various domains where data privacy, cost efficiency, and model performance are critical considerations.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Maximum Channel Coding Rate of Finite Block Length MIMO Faster-Than-Nyquist Signaling
Authors:
Zichao Zhang,
Melda Yuksel,
Halim Yanikomeroglu,
Benjamin K. Ng,
Chan-Tong Lam
Abstract:
The pursuit of higher data rates and efficient spectrum utilization in modern communication technologies necessitates novel solutions. In order to provide insights into improving spectral efficiency and reducing latency, this study investigates the maximum channel coding rate (MCCR) of finite block length (FBL) multiple-input multiple-output (MIMO) faster-than-Nyquist (FTN) channels. By optimizing…
▽ More
The pursuit of higher data rates and efficient spectrum utilization in modern communication technologies necessitates novel solutions. In order to provide insights into improving spectral efficiency and reducing latency, this study investigates the maximum channel coding rate (MCCR) of finite block length (FBL) multiple-input multiple-output (MIMO) faster-than-Nyquist (FTN) channels. By optimizing power allocation, we derive the system's MCCR expression. Simulation results are compared with the existing literature to reveal the benefits of FTN in FBL transmission.
△ Less
Submitted 14 January, 2025; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Improve Cost Efficiency of Active Learning over Noisy Dataset
Authors:
Zan-Kai Chong,
Hiroyuki Ohsaki,
Bryan Ng
Abstract:
Active learning is a learning strategy whereby the machine learning algorithm actively identifies and labels data points to optimize its learning. This strategy is particularly effective in domains where an abundance of unlabeled data exists, but the cost of labeling these data points is prohibitively expensive. In this paper, we consider cases of binary classification, where acquiring a positive…
▽ More
Active learning is a learning strategy whereby the machine learning algorithm actively identifies and labels data points to optimize its learning. This strategy is particularly effective in domains where an abundance of unlabeled data exists, but the cost of labeling these data points is prohibitively expensive. In this paper, we consider cases of binary classification, where acquiring a positive instance incurs a significantly higher cost compared to that of negative instances. For example, in the financial industry, such as in money-lending businesses, a defaulted loan constitutes a positive event leading to substantial financial loss. To address this issue, we propose a shifted normal distribution sampling function that samples from a wider range than typical uncertainty sampling. Our simulation underscores that our proposed sampling function limits both noisy and positive label selection, delivering between 20% and 32% improved cost efficiency over different test datasets.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Joint Phase-Time Arrays: A Paradigm for Frequency-Dependent Analog Beamforming in 6G
Authors:
Vishnu V. Ratnam,
Jianhua Mo,
Ahmad AlAmmouri,
Boon L. Ng,
Jianzhong,
Zhang,
Andreas F. Molisch
Abstract:
Hybrid beamforming is an attractive solution to build cost-effective and energy-efficient transceivers for millimeter-wave and terahertz systems. However, conventional hybrid beamforming techniques rely on analog components that generate a frequency flat response such as phase-shifters and switches, which limits the flexibility of the achievable beam patterns. As a novel alternative, this paper pr…
▽ More
Hybrid beamforming is an attractive solution to build cost-effective and energy-efficient transceivers for millimeter-wave and terahertz systems. However, conventional hybrid beamforming techniques rely on analog components that generate a frequency flat response such as phase-shifters and switches, which limits the flexibility of the achievable beam patterns. As a novel alternative, this paper proposes a new class of hybrid beamforming called Joint phase-time arrays (JPTA), that additionally use true-time delay elements in the analog beamforming to create frequency-dependent analog beams. Using as an example two important frequency-dependent beam behaviors, the numerous benefits of such flexibility are exemplified. Subsequently, the JPTA beamformer design problem to generate any desired beam behavior is formulated and near-optimal algorithms to the problem are proposed. Simulations show that the proposed algorithms can outperform heuristics solutions for JPTA beamformer update. Furthermore, it is shown that JPTA can achieve the two exemplified beam behaviors with one radio-frequency chain, while conventional hybrid beamforming requires the radio-frequency chains to scale with the number of antennas to achieve similar performance. Finally, a wide range of problems to further tap into the potential of JPTA are also listed as future directions.
△ Less
Submitted 18 December, 2023;
originally announced December 2023.
-
GMC-Pos: Graph-Based Multi-Robot Coverage Positioning Method
Authors:
Khattiya Pongsirijinda,
Zhiqiang Cao,
Muhammad Shalihan,
Benny Kai Kiat Ng,
Billy Pik Lik Lau,
Chau Yuen,
U-Xuan Tan
Abstract:
Nowadays, several real-world tasks require adequate environment coverage for maintaining communication between multiple robots, for example, target search tasks, environmental monitoring, and post-disaster rescues. In this study, we look into a situation where there are a human operator and multiple robots, and we assume that each human or robot covers a certain range of areas. We want them to max…
▽ More
Nowadays, several real-world tasks require adequate environment coverage for maintaining communication between multiple robots, for example, target search tasks, environmental monitoring, and post-disaster rescues. In this study, we look into a situation where there are a human operator and multiple robots, and we assume that each human or robot covers a certain range of areas. We want them to maximize their area of coverage collectively. Therefore, in this paper, we propose the Graph-Based Multi-Robot Coverage Positioning Method (GMC-Pos) to find strategic positions for robots that maximize the area coverage. Our novel approach consists of two main modules: graph generation and node selection. Firstly, graph generation represents the environment using a weighted connected graph. Then, we present a novel generalized graph-based distance and utilize it together with the graph degrees to be the conditions for node selection in a recursive manner. Our method is deployed in three environments with different settings. The results show that it outperforms the benchmark method by 15.13% to 24.88% regarding the area coverage percentage.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
Towards Intelligent Network Management: Leveraging AI for Network Service Detection
Authors:
Khuong N. Nguyen,
Abhishek Sehgal,
Yuming Zhu,
Junsu Choi,
Guanbo Chen,
Hao Chen,
Boon Loong Ng,
Charlie Zhang
Abstract:
As the complexity and scale of modern computer networks continue to increase, there has emerged an urgent need for precise traffic analysis, which plays a pivotal role in cutting-edge wireless connectivity technologies. This study focuses on leveraging Machine Learning methodologies to create an advanced network traffic classification system. We introduce a novel data-driven approach that excels i…
▽ More
As the complexity and scale of modern computer networks continue to increase, there has emerged an urgent need for precise traffic analysis, which plays a pivotal role in cutting-edge wireless connectivity technologies. This study focuses on leveraging Machine Learning methodologies to create an advanced network traffic classification system. We introduce a novel data-driven approach that excels in identifying various network service types in real-time, by analyzing patterns within the network traffic. Our method organizes similar kinds of network traffic into distinct categories, referred to as network services, based on latency requirement. Furthermore, it decomposes the network traffic stream into multiple, smaller traffic flows, with each flow uniquely carrying a specific service. Our ML models are trained on a dataset comprised of labeled examples representing different network service types collected on various Wi-Fi network conditions. Upon evaluation, our system demonstrates a remarkable accuracy in distinguishing the network services. These results emphasize the substantial promise of integrating Artificial Intelligence in wireless technologies. Such an approach encourages more efficient energy consumption, enhances Quality of Service assurance, and optimizes the allocation of network resources, thus laying a solid groundwork for the development of advanced intelligent networks.
△ Less
Submitted 14 October, 2023;
originally announced October 2023.
-
MIMO Asynchronous MAC with Faster-than-Nyquist (FTN) Signaling
Authors:
Zichao Zhang,
Melda Yuksel,
Halim Yanikomeroglu,
Benjamin K. Ng,
Chan-Tong Lam
Abstract:
Faster-than-Nyquist (FTN) signaling is a nonorthogonal transmission technique, which brings in intentional inter-symbol interference. This way it can significantly enhance spectral efficiency for practical pulse shapes such as the root raised cosine pulses. This paper proposes an achievable rate region for the multiple antenna (MIMO) asynchronous multiple access channel (aMAC) with FTN signaling.…
▽ More
Faster-than-Nyquist (FTN) signaling is a nonorthogonal transmission technique, which brings in intentional inter-symbol interference. This way it can significantly enhance spectral efficiency for practical pulse shapes such as the root raised cosine pulses. This paper proposes an achievable rate region for the multiple antenna (MIMO) asynchronous multiple access channel (aMAC) with FTN signaling. The scheme applies waterfilling in the spatial domain and precoding in time. Waterfilling in space provides better power allocation and precoding helps mitigate inter-symbol interference due to asynchronous transmission and FTN. The results show that the gains due to asynchronous transmission and FTN are more emphasized in MIMO aMAC than in single antenna aMAC. Moreover, FTN improves single-user rates, and asynchronous transmission improves the sum-rate, due to better inter-user interference management.
△ Less
Submitted 20 May, 2023;
originally announced May 2023.
-
Bit-Interleaved Multiple Access: Improved Fairness, Reliability, and Latency for Massive IoT Networks
Authors:
Ferdi Kara,
Hakan Kaya,
Halim Yanikomeroglu,
Chan-Tong Lam,
Ben K. Ng
Abstract:
In this paper, we propose bit-interleaved multiple access (BIMA) to enable Internet-of-Things (IoT) networks where a massive connection is required with limited resource blocks. First, by providing a true power allocation (PA) constraint for conventional NOMA with practical constraints, we demonstrate that it cannot support massive connections. To this end, we propose BIMA where there are no stric…
▽ More
In this paper, we propose bit-interleaved multiple access (BIMA) to enable Internet-of-Things (IoT) networks where a massive connection is required with limited resource blocks. First, by providing a true power allocation (PA) constraint for conventional NOMA with practical constraints, we demonstrate that it cannot support massive connections. To this end, we propose BIMA where there are no strict PA constraints, unlike conventional NOMA, thus allowing a high number of devices. We provide a comprehensive analytical framework for BIMA for all key performance indicators (KPIs) (i.e., ergodic capacity [EC], outage probability [OP], and bit error rate [BER]). We evaluate Jain's fairness index and proportional fairness index in terms of all KPIs. Based on the extensive computer simulations, we reveal that BIMA outperforms conventional NOMA significantly, with a performance gain of up to 20-30dB. This performance gain becomes greater when more devices are supported. BIMA provides a full diversity order and enables the implementation of an arbitrary number of devices and modulation orders, which is crucial for IoT networks in dense areas. BIMA guarantees a fairness system where none of the devices gets a severe performance and the sum-rate is shared in a fair manner among devices by guarantying QoS satisfaction. Finally, we provide an intense complexity and latency analysis and demonstrate that BIMA provides lower latency compared to conventional NOMA since it allows parallel computing at the receivers and no iterative operations are required. We show that BIMA reduces latency by up to 350\% for specific devices and 170\% on average.
△ Less
Submitted 12 April, 2023;
originally announced April 2023.
-
Scientific Computing Algorithms to Learn Enhanced Scalable Surrogates for Mesh Physics
Authors:
Brian R. Bartoldson,
Yeping Hu,
Amar Saini,
Jose Cadena,
Yucheng Fu,
Jie Bao,
Zhijie Xu,
Brenda Ng,
Phan Nguyen
Abstract:
Data-driven modeling approaches can produce fast surrogates to study large-scale physics problems. Among them, graph neural networks (GNNs) that operate on mesh-based data are desirable because they possess inductive biases that promote physical faithfulness, but hardware limitations have precluded their application to large computational domains. We show that it is \textit{possible} to train a cl…
▽ More
Data-driven modeling approaches can produce fast surrogates to study large-scale physics problems. Among them, graph neural networks (GNNs) that operate on mesh-based data are desirable because they possess inductive biases that promote physical faithfulness, but hardware limitations have precluded their application to large computational domains. We show that it is \textit{possible} to train a class of GNN surrogates on 3D meshes. We scale MeshGraphNets (MGN), a subclass of GNNs for mesh-based physics modeling, via our domain decomposition approach to facilitate training that is mathematically equivalent to training on the whole domain under certain conditions. With this, we were able to train MGN on meshes with \textit{millions} of nodes to generate computational fluid dynamics (CFD) simulations. Furthermore, we show how to enhance MGN via higher-order numerical integration, which can reduce MGN's error and training time. We validated our methods on an accompanying dataset of 3D $\text{CO}_2$-capture CFD simulations on a 3.1M-node mesh. This work presents a practical path to scaling MGN for real-world applications.
△ Less
Submitted 1 April, 2023;
originally announced April 2023.
-
Modelling Temporal Document Sequences for Clinical ICD Coding
Authors:
Clarence Boon Liang Ng,
Diogo Santos,
Marek Rei
Abstract:
Past studies on the ICD coding problem focus on predicting clinical codes primarily based on the discharge summary. This covers only a small fraction of the notes generated during each hospital stay and leaves potential for improving performance by analysing all the available clinical notes. We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical no…
▽ More
Past studies on the ICD coding problem focus on predicting clinical codes primarily based on the discharge summary. This covers only a small fraction of the notes generated during each hospital stay and leaves potential for improving performance by analysing all the available clinical notes. We propose a hierarchical transformer architecture that uses text across the entire sequence of clinical notes in each hospital stay for ICD coding, and incorporates embeddings for text metadata such as their position, time, and type of note. While using all clinical notes increases the quantity of data substantially, superconvergence can be used to reduce training costs. We evaluate the model on the MIMIC-III dataset. Our model exceeds the prior state-of-the-art when using only discharge summaries as input, and achieves further performance improvements when all clinical notes are used as input.
△ Less
Submitted 24 February, 2023;
originally announced February 2023.
-
Location-based Activity Behavior Deviation Detection for Nursing Home using IoT Devices
Authors:
Billy Pik Lik Lau,
Zann Koh,
Yuren Zhou,
Benny Kai Kiat Ng,
Chau Yuen,
Mui Lang Low
Abstract:
With the advancement of the Internet of Things(IoT) and pervasive computing applications, it provides a better opportunity to understand the behavior of the aging population. However, in a nursing home scenario, common sensors and techniques used to track an elderly living alone are not suitable.
In this paper, we design a location-based tracking system for a four-story nursing home - The Salvatio…
▽ More
With the advancement of the Internet of Things(IoT) and pervasive computing applications, it provides a better opportunity to understand the behavior of the aging population. However, in a nursing home scenario, common sensors and techniques used to track an elderly living alone are not suitable.
In this paper, we design a location-based tracking system for a four-story nursing home - The Salvation Army, Peacehaven Nursing Home in Singapore. The main challenge here is to identify the group activity among the nursing home's residents and to detect if they have any deviated activity behavior. We propose a location-based deviated activity behavior detection system to detect deviated activity behavior by leveraging data fusion technique. In order to compute the features for data fusion, an adaptive method is applied for extracting the group and individual activity time and generate daily hybrid norm for each of the residents. Next, deviated activity behavior detection is executed by considering the difference between daily norm patterns and daily input data for each resident. Lastly, the deviated activity behavior among the residents are classified using a rule-based classification approach.
Through the implementation, there are 44.4% of the residents do not have deviated activity behavior , while 37% residents involved in one deviated activity behavior and 18.6% residents have two or more deviated activity behaviors.
△ Less
Submitted 25 January, 2023;
originally announced January 2023.
-
Image Projective Transformation Rectification with Synthetic Data for Smartphone-captured Chest X-ray Photos Classification
Authors:
Chak Fong Chong,
Yapeng Wang,
Benjamin Ng,
Wuman Luo,
Xu Yang
Abstract:
Classification on smartphone-captured chest X-ray (CXR) photos to detect pathologies is challenging due to the projective transformation caused by the non-ideal camera position. Recently, various rectification methods have been proposed for different photo rectification tasks such as document photos, license plate photos, etc. Unfortunately, we found that none of them is suitable for CXR photos, d…
▽ More
Classification on smartphone-captured chest X-ray (CXR) photos to detect pathologies is challenging due to the projective transformation caused by the non-ideal camera position. Recently, various rectification methods have been proposed for different photo rectification tasks such as document photos, license plate photos, etc. Unfortunately, we found that none of them is suitable for CXR photos, due to their specific transformation type, image appearance, annotation type, etc. In this paper, we propose an innovative deep learning-based Projective Transformation Rectification Network (PTRN) to automatically rectify CXR photos by predicting the projective transformation matrix. To the best of our knowledge, it is the first work to predict the projective transformation matrix as the learning goal for photo rectification. Additionally, to avoid the expensive collection of natural data, synthetic CXR photos are generated under the consideration of natural perturbations, extra screens, etc. We evaluate the proposed approach in the CheXphoto smartphone-captured CXR photos classification competition hosted by the Stanford University Machine Learning Group, our approach won first place with a huge performance improvement (ours 0.850, second-best 0.762, in AUC). A deeper study demonstrates that the use of PTRN successfully achieves the classification performance on the spatially transformed CXR photos to the same level as on the high-quality digital CXR images, indicating PTRN can eliminate all negative impacts of projective transformation on the CXR photos.
△ Less
Submitted 30 November, 2022; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot Document-Level Question Answering
Authors:
Tavish McDonald,
Brian Tsan,
Amar Saini,
Juanita Ordonez,
Luis Gutierrez,
Phan Nguyen,
Blake Mason,
Brenda Ng
Abstract:
Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA s…
▽ More
Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question and answer). However, data curation for document QA is uniquely challenging because the context (i.e. answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from extracted texts to form well-posed contexts; (3) QA to extract knowledge from contexts to return high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
△ Less
Submitted 11 December, 2023; v1 submitted 4 October, 2022;
originally announced October 2022.
-
Cut-Paste Consistency Learning for Semi-Supervised Lesion Segmentation
Authors:
Boon Peng Yap,
Beng Koon Ng
Abstract:
Semi-supervised learning has the potential to improve the data-efficiency of training data-hungry deep neural networks, which is especially important for medical image analysis tasks where labeled data is scarce. In this work, we present a simple semi-supervised learning method for lesion segmentation tasks based on the ideas of cut-paste augmentation and consistency regularization. By exploiting…
▽ More
Semi-supervised learning has the potential to improve the data-efficiency of training data-hungry deep neural networks, which is especially important for medical image analysis tasks where labeled data is scarce. In this work, we present a simple semi-supervised learning method for lesion segmentation tasks based on the ideas of cut-paste augmentation and consistency regularization. By exploiting the mask information available in the labeled data, we synthesize partially labeled samples from the unlabeled images so that the usual supervised learning objective (e.g., binary cross entropy) can be applied. Additionally, we introduce a background consistency term to regularize the training on the unlabeled background regions of the synthetic images. We empirically verify the effectiveness of the proposed method on two public lesion segmentation datasets, including an eye fundus photograph dataset and a brain CT scan dataset. The experiment results indicate that our method achieves consistent and superior performance over other self-training and consistency-based methods without introducing sophisticated network components.
△ Less
Submitted 1 October, 2022;
originally announced October 2022.
-
Update Compression for Deep Neural Networks on the Edge
Authors:
Bo Chen,
Ali Bakhshi,
Gustavo Batista,
Brian Ng,
Tat-Jun Chin
Abstract:
An increasing number of artificial intelligence (AI) applications involve the execution of deep neural networks (DNNs) on edge devices. Many practical reasons motivate the need to update the DNN model on the edge device post-deployment, such as refining the model, concept drift, or outright change in the learning task. In this paper, we consider the scenario where retraining can be done on the ser…
▽ More
An increasing number of artificial intelligence (AI) applications involve the execution of deep neural networks (DNNs) on edge devices. Many practical reasons motivate the need to update the DNN model on the edge device post-deployment, such as refining the model, concept drift, or outright change in the learning task. In this paper, we consider the scenario where retraining can be done on the server side based on a copy of the DNN model, with only the necessary data transmitted to the edge to update the deployed model. However, due to bandwidth constraints, we want to minimise the transmission required to achieve the update. We develop a simple approach based on matrix factorisation to compress the model update -- this differs from compressing the model itself. The key idea is to preserve existing knowledge in the current model and optimise only small additional parameters for the update which can be used to reconstitute the model on the edge. We compared our method to similar techniques used in federated learning; our method usually requires less than half of the update size of existing methods to achieve the same accuracy.
△ Less
Submitted 21 April, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Beam Management with Orientation and RSRP using Deep Learning for Beyond 5G Systems
Authors:
Khuong N. Nguyen,
Anum Ali,
Jianhua Mo,
Boon Loong Ng,
Vutha Va,
Jianzhong Charlie Zhang
Abstract:
Beam management (BM), i.e., the process of finding and maintaining a suitable transmit and receive beam pair, can be challenging, particularly in highly dynamic scenarios. Side-information, e.g., orientation, from on-board sensors can assist the user equipment (UE) BM. In this work, we use the orientation information coming from the inertial measurement unit (IMU) for effective BM. We use a data-d…
▽ More
Beam management (BM), i.e., the process of finding and maintaining a suitable transmit and receive beam pair, can be challenging, particularly in highly dynamic scenarios. Side-information, e.g., orientation, from on-board sensors can assist the user equipment (UE) BM. In this work, we use the orientation information coming from the inertial measurement unit (IMU) for effective BM. We use a data-driven strategy that fuses the reference signal received power (RSRP) with orientation information using a recurrent neural network (RNN). Simulation results show that the proposed strategy performs much better than the conventional BM and an orientation-assisted BM strategy that utilizes particle filter in another study. Specifically, the proposed data-driven strategy improves the beam-prediction accuracy up to 34% and increases mean RSRP by up to 4.2 dB when the UE orientation changes quickly.
△ Less
Submitted 4 February, 2022;
originally announced February 2022.
-
Sub-Chain Beam for mmWave Devices: A Trade-off between Power Saving and Beam Correspondence
Authors:
Jianhua Mo,
Daehee Park,
Boon Loong Ng,
Vutha Va,
Anum Ali,
Chonghwa Seo,
Jianzhong Charlie Zhang
Abstract:
Beam correspondence, or downlink-uplink (DL-UL) beam reciprocity, refers to the assumption that the best beams in the DL are also the best beams in the UL. This is an important assumption that allows the existing beam management framework in 5G to rely heavily on DL beam sweeping and avoid UL beam sweeping: UL beams are inferred from the measurements of the DL reference signals. Beam correspondenc…
▽ More
Beam correspondence, or downlink-uplink (DL-UL) beam reciprocity, refers to the assumption that the best beams in the DL are also the best beams in the UL. This is an important assumption that allows the existing beam management framework in 5G to rely heavily on DL beam sweeping and avoid UL beam sweeping: UL beams are inferred from the measurements of the DL reference signals. Beam correspondence holds when the radio configurations are symmetric in the DL and UL. However, as mmWave technology matures, the DL and the UL face different constraints often breaking the beam correspondence. For example, power constraints may require a UE to activate only a portion of its antenna array for UL transmission, while still activating the full array for DL reception. Meanwhile, if the UL beam with sub-array, named as sub-chain beam in this paper, has a similar radiation pattern as the DL beam, the beam correspondence can still hold. This paper proposes methods for sub-chain beam codebook design to achieve a trade-off between the power saving and beam correspondence.
△ Less
Submitted 22 December, 2021;
originally announced December 2021.
-
Latent Space Simulation for Carbon Capture Design Optimization
Authors:
Brian Bartoldson,
Rui Wang,
Yucheng Fu,
David Widemann,
Sam Nguyen,
Jie Bao,
Zhijie Xu,
Brenda Ng
Abstract:
The CO2 capture efficiency in solvent-based carbon capture systems (CCSs) critically depends on the gas-solvent interfacial area (IA), making maximization of IA a foundational challenge in CCS design. While the IA associated with a particular CCS design can be estimated via a computational fluid dynamics (CFD) simulation, using CFD to derive the IAs associated with numerous CCS designs is prohibit…
▽ More
The CO2 capture efficiency in solvent-based carbon capture systems (CCSs) critically depends on the gas-solvent interfacial area (IA), making maximization of IA a foundational challenge in CCS design. While the IA associated with a particular CCS design can be estimated via a computational fluid dynamics (CFD) simulation, using CFD to derive the IAs associated with numerous CCS designs is prohibitively costly. Fortunately, previous works such as Deep Fluids (DF) (Kim et al., 2019) show that large simulation speedups are achievable by replacing CFD simulators with neural network (NN) surrogates that faithfully mimic the CFD simulation process. This raises the possibility of a fast, accurate replacement for a CFD simulator and therefore efficient approximation of the IAs required by CCS design optimization. Thus, here, we build on the DF approach to develop surrogates that can successfully be applied to our complex carbon-capture CFD simulations. Our optimized DF-style surrogates produce large speedups (4000x) while obtaining IA relative errors as low as 4% on unseen CCS configurations that lie within the range of training configurations. This hints at the promise of NN surrogates for our CCS design optimization problem. Nonetheless, DF has inherent limitations with respect to CCS design (e.g., limited transferability of trained models to new CCS packings). We conclude with ideas to address these challenges.
△ Less
Submitted 21 December, 2021;
originally announced December 2021.
-
Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images
Authors:
Boon Peng Yap,
Beng Koon Ng
Abstract:
We explore the value of weak labels in learning transferable representations for medical images. Compared to hand-labeled datasets, weak or inexact labels can be acquired in large quantities at significantly lower cost and can provide useful training signals for data-hungry models such as deep neural networks. We consider weak labels in the form of pseudo-labels and propose a semi-weakly supervise…
▽ More
We explore the value of weak labels in learning transferable representations for medical images. Compared to hand-labeled datasets, weak or inexact labels can be acquired in large quantities at significantly lower cost and can provide useful training signals for data-hungry models such as deep neural networks. We consider weak labels in the form of pseudo-labels and propose a semi-weakly supervised contrastive learning (SWCL) framework for representation learning using semi-weakly annotated images. Specifically, we train a semi-supervised model to propagate labels from a small dataset consisting of diverse image-level annotations to a large unlabeled dataset. Using the propagated labels, we generate a patch-level dataset for pretraining and formulate a multi-label contrastive learning objective to capture position-specific features encoded in each patch. We empirically validate the transfer learning performance of SWCL on seven public retinal fundus datasets, covering three disease classification tasks and two anatomical structure segmentation tasks. Our experiment results suggest that, under very low data regime, large-scale ImageNet pretraining on improved architecture remains a very strong baseline, and recently proposed self-supervised methods falter in segmentation tasks, possibly due to the strong invariant constraint imposed. Our method surpasses all prior self-supervised methods and standard cross-entropy training, while closing the gaps with ImageNet pretraining.
△ Less
Submitted 4 August, 2021;
originally announced August 2021.
-
Relative Localization of Mobile Robots with Multiple Ultra-WideBand Ranging Measurements
Authors:
Zhiqiang Cao,
Ran Liu,
Chau Yuen,
Achala Athukorala,
Benny Kai Kiat Ng,
Muraleetharan Mathanraj,
U-Xuan Tan
Abstract:
Relative localization between autonomous robots without infrastructure is crucial to achieve their navigation, path planning, and formation in many applications, such as emergency response, where acquiring a prior knowledge of the environment is not possible. The traditional Ultra-WideBand (UWB)-based approach provides a good estimation of the distance between the robots, but obtaining the relativ…
▽ More
Relative localization between autonomous robots without infrastructure is crucial to achieve their navigation, path planning, and formation in many applications, such as emergency response, where acquiring a prior knowledge of the environment is not possible. The traditional Ultra-WideBand (UWB)-based approach provides a good estimation of the distance between the robots, but obtaining the relative pose (including the displacement and orientation) remains challenging. We propose an approach to estimate the relative pose between a group of robots by equipping each robot with multiple UWB ranging nodes. We determine the pose between two robots by minimizing the residual error of the ranging measurements from all UWB nodes. To improve the localization accuracy, we propose to utilize the odometry constraints through a sliding window-based optimization. The optimized pose is then fused with the odometry in a particle filtering for pose tracking among a group of mobile robots. We have conducted extensive experiments to validate the effectiveness of the proposed approach.
△ Less
Submitted 30 July, 2021; v1 submitted 19 July, 2021;
originally announced July 2021.
-
Action Forecasting with Feature-wise Self-Attention
Authors:
Yan Bin Ng,
Basura Fernando
Abstract:
We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos while a self-attention model is used to attend on relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking techniques is employed. We classify observed actions accurately using an auxiliary…
▽ More
We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos while a self-attention model is used to attend on relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking techniques is employed. We classify observed actions accurately using an auxiliary classifier which helps to understand what has happened so far. Then the decoder generates actions for the future based on the output of the recurrent encoder and the self-attention model. Experimentally, we validate each component of our architecture where we see that the impact of self-attention to identify relevant feature dimensions, temporal masking, and observed auxiliary classifier. We evaluate our method on two standard action forecasting benchmarks and obtain state-of-the-art results.
△ Less
Submitted 18 July, 2021;
originally announced July 2021.
-
The Study of Urban Residential's Public Space Activeness using Space-centric Approach
Authors:
Billy Pik Lik Lau,
Benny Kai Kiat Ng,
Chau Yuen,
Bige Tuncer,
Keng Hua Chong
Abstract:
With the advancement of the Internet of Things (IoT) and communication platform, large scale sensor deployment can be easily implemented in an urban city to collect various information. To date, there are only a handful of research studies about understanding the usage of urban public spaces. Leveraging IoT, various sensors have been deployed in an urban residential area to monitor and study publi…
▽ More
With the advancement of the Internet of Things (IoT) and communication platform, large scale sensor deployment can be easily implemented in an urban city to collect various information. To date, there are only a handful of research studies about understanding the usage of urban public spaces. Leveraging IoT, various sensors have been deployed in an urban residential area to monitor and study public space utilization patterns. In this paper, we propose a data processing system to generate space-centric insights about the utilization of an urban residential region of multiple points of interest (PoIs) that consists of 190,000m$^2$ real estate. We identify the activeness of each PoI based on the spectral clustering, and then study their corresponding static features, which are composed of transportation, commercial facilities, population density, along with other characteristics. Through the heuristic features inferring, the residential density and commercial facilities are the most significant factors affecting public place utilization.
△ Less
Submitted 11 January, 2021; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Urban Space Insights Extraction using Acoustic Histogram Information
Authors:
Nipun Wijerathne,
Billy Pik Lik Lau,
Benny Kai Kiat Ng,
Chau Yuen
Abstract:
Urban data mining can be identified as a highly potential area that can enhance the smart city services towards better sustainable development especially in the urban residential activity tracking. While existing human activity tracking systems have demonstrated the capability to unveil the hidden aspects of citizens' behavior, they often come with a high implementation cost and require a large co…
▽ More
Urban data mining can be identified as a highly potential area that can enhance the smart city services towards better sustainable development especially in the urban residential activity tracking. While existing human activity tracking systems have demonstrated the capability to unveil the hidden aspects of citizens' behavior, they often come with a high implementation cost and require a large communication bandwidth. In this paper, we study the implementation of low-cost analogue sound sensors to detect outdoor activities and estimate the raining period in an urban residential area. The analogue sound sensors are transmitted to the cloud every 5 minutes in histogram format, which consists of sound data sampled every 100ms (10Hz). We then use wavelet transformation (WT) and principal component analysis (PCA) to generate a more robust and consistent feature set from the histogram. After that, we performed unsupervised clustering and attempt to understand the individual characteristics of each cluster to identify outdoor residential activities. In addition, on-site validation has been conducted to show the effectiveness of our approach.
△ Less
Submitted 14 December, 2020; v1 submitted 10 December, 2020;
originally announced December 2020.
-
Attend and Decode: 4D fMRI Task State Decoding Using Attention Models
Authors:
Sam Nguyen,
Brenda Ng,
Alan D. Kaplan,
Priyadip Ray
Abstract:
Functional magnetic resonance imaging (fMRI) is a neuroimaging modality that captures the blood oxygen level in a subject's brain while the subject either rests or performs a variety of functional tasks under different conditions. Given fMRI data, the problem of inferring the task, known as task state decoding, is challenging due to the high dimensionality (hundreds of million sampling points per…
▽ More
Functional magnetic resonance imaging (fMRI) is a neuroimaging modality that captures the blood oxygen level in a subject's brain while the subject either rests or performs a variety of functional tasks under different conditions. Given fMRI data, the problem of inferring the task, known as task state decoding, is challenging due to the high dimensionality (hundreds of million sampling points per datum) and complex spatio-temporal blood flow patterns inherent in the data. In this work, we propose to tackle the fMRI task state decoding problem by casting it as a 4D spatio-temporal classification problem. We present a novel architecture called Brain Attend and Decode (BAnD), that uses residual convolutional neural networks for spatial feature extraction and self-attention mechanisms for temporal modeling. We achieve significant performance gain compared to previous works on a 7-task benchmark from the large-scale Human Connectome Project-Young Adult (HCP-YA) dataset. We also investigate the transferability of BAnD's extracted features on unseen HCP tasks, either by freezing the spatial feature extraction layers and retraining the temporal model, or finetuning the entire model. The pre-trained features from BAnD are useful on similar tasks while finetuning them yields competitive results on unseen tasks/conditions.
△ Less
Submitted 19 January, 2021; v1 submitted 10 April, 2020;
originally announced April 2020.
-
Dependently Typed Knowledge Graphs
Authors:
Zhangsheng Lai,
Aik Beng Ng,
Liang Ze Wong,
Simon See,
Shaowei Lin
Abstract:
Reasoning over knowledge graphs is traditionally built upon a hierarchy of languages in the Semantic Web Stack. Starting from the Resource Description Framework (RDF) for knowledge graphs, more advanced constructs have been introduced through various syntax extensions to add reasoning capabilities to knowledge graphs. In this paper, we show how standardized semantic web technologies (RDF and its q…
▽ More
Reasoning over knowledge graphs is traditionally built upon a hierarchy of languages in the Semantic Web Stack. Starting from the Resource Description Framework (RDF) for knowledge graphs, more advanced constructs have been introduced through various syntax extensions to add reasoning capabilities to knowledge graphs. In this paper, we show how standardized semantic web technologies (RDF and its query language SPARQL) can be reproduced in a unified manner with dependent type theory. In addition to providing the basic functionalities of knowledge graphs, dependent types add expressiveness in encoding both entities and queries, explainability in answers to queries through witnesses, and compositionality and automation in the construction of witnesses. Using the Coq proof assistant, we demonstrate how to build and query dependently typed knowledge graphs as a proof of concept for future works in this direction.
△ Less
Submitted 8 March, 2020;
originally announced March 2020.
-
Cluster Pruning: An Efficient Filter Pruning Method for Edge AI Vision Applications
Authors:
Chinthaka Gamanayake,
Lahiru Jayasinghe,
Benny Ng,
Chau Yuen
Abstract:
Even though the Convolutional Neural Networks (CNN) has shown superior results in the field of computer vision, it is still a challenging task to implement computer vision algorithms in real-time at the edge, especially using a low-cost IoT device due to high memory consumption and computation complexities in a CNN. Network compression methodologies such as weight pruning, filter pruning, and quan…
▽ More
Even though the Convolutional Neural Networks (CNN) has shown superior results in the field of computer vision, it is still a challenging task to implement computer vision algorithms in real-time at the edge, especially using a low-cost IoT device due to high memory consumption and computation complexities in a CNN. Network compression methodologies such as weight pruning, filter pruning, and quantization are used to overcome the above mentioned problem. Even though filter pruning methodology has shown better performances compared to other techniques, irregularity of the number of filters pruned across different layers of a CNN might not comply with majority of the neural computing hardware architectures. In this paper, a novel greedy approach called cluster pruning has been proposed, which provides a structured way of removing filters in a CNN by considering the importance of filters and the underlying hardware architecture. The proposed methodology is compared with the conventional filter pruning algorithm on Pascal-VOC open dataset, and Head-Counting dataset, which is our own dataset developed to detect and count people entering a room. We benchmark our proposed method on three hardware architectures, namely CPU, GPU, and Intel Movidius Neural Computer Stick (NCS) using the popular SSD-MobileNet and SSD-SqueezeNet neural network architectures used for edge-AI vision applications. Results demonstrate that our method outperforms the conventional filter pruning methodology, using both datasets on above mentioned hardware architectures. Furthermore, a low cost IoT hardware setup consisting of an Intel Movidius-NCS is proposed to deploy an edge-AI application using our proposed pruning methodology.
△ Less
Submitted 5 March, 2020;
originally announced March 2020.
-
Understanding Crowd Behaviors in a Social Event by Passive WiFi Sensing and Data Mining
Authors:
Yuren Zhou,
Billy Pik Lik Lau,
Zann Koh,
Chau Yuen,
Benny Kai Kiat Ng
Abstract:
Understanding crowd behaviors in a large social event is crucial for event management. Passive WiFi sensing, by collecting WiFi probe requests sent from mobile devices, provides a better way to monitor crowds compared with people counters and cameras in terms of free interference, larger coverage, lower cost, and more information on people's movement. In existing studies, however, not enough atten…
▽ More
Understanding crowd behaviors in a large social event is crucial for event management. Passive WiFi sensing, by collecting WiFi probe requests sent from mobile devices, provides a better way to monitor crowds compared with people counters and cameras in terms of free interference, larger coverage, lower cost, and more information on people's movement. In existing studies, however, not enough attention has been paid to the thorough analysis and mining of collected data. Especially, the power of machine learning has not been fully exploited. In this paper, therefore, we propose a comprehensive data analysis framework to fully analyze the collected probe requests to extract three types of patterns related to crowd behaviors in a large social event, with the help of statistics, visualization, and unsupervised machine learning. First, trajectories of the mobile devices are extracted from probe requests and analyzed to reveal the spatial patterns of the crowds' movement. Hierarchical agglomerative clustering is adopted to find the interconnections between different locations. Next, k-means and k-shape clustering algorithms are applied to extract temporal visiting patterns of the crowds by days and locations, respectively. Finally, by combining with time, trajectories are transformed into spatiotemporal patterns, which reveal how trajectory duration changes over the length and how the overall trends of crowd movement change over time. The proposed data analysis framework is fully demonstrated using real-world data collected in a large social event. Results show that one can extract comprehensive patterns from data collected by a network of passive WiFi sensors.
△ Less
Submitted 4 February, 2020;
originally announced February 2020.
-
Forecasting future action sequences with attention: a new approach to weakly supervised action forecasting
Authors:
Yan Bin Ng,
Basura Fernando
Abstract:
Future human action forecasting from partial observations of activities is an important problem in many practical applications such as assistive robotics, video surveillance and security. We present a method to forecast actions for the unseen future of the video using a neural machine translation technique that uses encoder-decoder architecture. The input to this model is the observed RGB video, a…
▽ More
Future human action forecasting from partial observations of activities is an important problem in many practical applications such as assistive robotics, video surveillance and security. We present a method to forecast actions for the unseen future of the video using a neural machine translation technique that uses encoder-decoder architecture. The input to this model is the observed RGB video, and the objective is to forecast the correct future symbolic action sequence. Unlike prior methods that make action predictions for some unseen percentage of video one for each frame, we predict the complete action sequence that is required to accomplish the activity. We coin this task action sequence forecasting. To cater for two types of uncertainty in the future predictions, we propose a novel loss function. We show a combination of optimal transport and future uncertainty losses help to improve results.
We extend our action sequence forecasting model to perform weakly supervised action forecasting on two challenging datasets, the Breakfast and the 50Salads. Specifically, we propose a model to predict actions of future unseen frames without using frame level annotations during training. Using Fisher vector features, our supervised model outperforms the state-of-the-art action forecasting model by 0.83% and 7.09% on the Breakfast and the 50Salads datasets respectively. Our weakly supervised model is only 0.6% behind the most recent state-of-the-art supervised model and obtains comparable results to other published fully supervised methods, and sometimes even outperforms them on the Breakfast dataset. Most interestingly, our weakly supervised model outperforms prior models by 1.04% leveraging on proposed weakly supervised architecture, and effective use of attention mechanism and loss functions.
△ Less
Submitted 3 February, 2022; v1 submitted 10 December, 2019;
originally announced December 2019.