-
HLStrans: Dataset for LLM-Driven C-to-HLS Hardware Code Synthesis
Authors:
Qingyun Zou,
Nuo Chen,
Yao Chen,
Bingsheng He,
WengFei Wong
Abstract:
High-level synthesis (HLS) enables software developers to describe and implement hardware at a higher level of abstraction by using C/C++ instead of traditional hardware description languages to automatically generate FPGA-ready designs. However, generating HLS code significantly differs from standard C/C++: it disallows certain coding idioms, relies on specialized libraries, and critically requires fine-grained transformations and the insertion of optimization directives (pragmas) to achieve high performance. Large language models (LLMs) have shown promise in automating such transformations, yet existing open-source datasets lack sufficient complexity and optimization diversity. To address this gap, we introduce the HLStrans dataset, a comprehensive collection of 137 distinct real-world programs, each annotated with a variety of C-to-HLS transformations that yield over 23K labeled design variants. These include a broad spectrum of pragmas and code-level optimizations. We benchmark state-of-the-art LLMs on this dataset to evaluate their ability to generate synthesizable, high-performance HLS code. As part of an ongoing effort, we plan to expand the HLStrans dataset in both scale and program variety, further empowering research at the intersection of AI and hardware synthesis.
Submitted 6 July, 2025;
originally announced July 2025.
-
A Principled Path to Fitted Distributional Evaluation
Authors:
Sungee Hong,
Jiayi Wang,
Zhengling Qi,
Raymond Ka Wai Wong
Abstract:
In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used fitted-Q evaluation -- developed for expectation-based reinforcement learning -- to the distributional OPE setting. We refer to this extension as fitted distributional evaluation (FDE). While only a few related approaches exist, there remains no unified framework for designing FDE methods. To fill this gap, we present a set of guiding principles for constructing theoretically grounded FDE methods. Building on these principles, we develop several new FDE methods with convergence analysis and provide theoretical justification for existing methods, even in non-tabular environments. Extensive experiments, including simulations on linear quadratic regulators and Atari games, demonstrate the superior performance of the FDE methods.
Submitted 24 June, 2025;
originally announced June 2025.
-
Diffusion Transformer-based Universal Dose Denoising for Pencil Beam Scanning Proton Therapy
Authors:
Yuzhen Ding,
Jason Holmes,
Hongying Feng,
Martin Bues,
Lisa A. McGee,
Jean-Claude M. Rwigema,
Nathan Y. Yu,
Terence S. Sio,
Sameer R. Keole,
William W. Wong,
Steven E. Schild,
Jonathan B. Ashman,
Sujay A. Vora,
Daniel J. Ma,
Samir H. Patel,
Wei Liu
Abstract:
Purpose: Intensity-modulated proton therapy (IMPT) offers precise tumor coverage while sparing organs at risk (OARs) in head and neck (H&N) cancer. However, its sensitivity to anatomical changes requires frequent adaptation through online adaptive radiation therapy (oART), which depends on fast, accurate dose calculation via Monte Carlo (MC) simulations. Reducing particle count accelerates MC but degrades accuracy. To address this, denoising low-statistics MC dose maps is proposed to enable fast, high-quality dose generation.
Methods: We developed a diffusion transformer-based denoising framework. IMPT plans and 3D CT images from 80 H&N patients were used to generate noisy and high-statistics dose maps using MCsquare (1 min and 10 min per plan, respectively). Data were standardized into uniform chunks with zero-padding, normalized, and transformed into quasi-Gaussian distributions. Testing was done on 10 H&N, 10 lung, 10 breast, and 10 prostate cancer cases, preprocessed identically. The model was trained with noisy dose maps and CT images as input and high-statistics dose maps as ground truth, using a combined loss of mean square error (MSE), residual loss, and regional MAE (focusing on top/bottom 10% dose voxels). Performance was assessed via MAE, 3D Gamma passing rate, and DVH indices.
Results: The model achieved MAEs of 0.195 (H&N), 0.120 (lung), 0.172 (breast), and 0.376 Gy[RBE] (prostate). 3D Gamma passing rates exceeded 92% (3%/2mm) across all sites. DVH indices for clinical target volumes (CTVs) and OARs closely matched the ground truth.
Conclusion: A diffusion transformer-based denoising framework was developed and, though trained only on H&N data, generalizes well across multiple disease sites.
Submitted 4 June, 2025;
originally announced June 2025.
-
Data Mining-Based Techniques for Software Fault Localization
Authors:
Peggy Cellier,
Mireille Ducassé,
Sébastien Ferré,
Olivier Ridoux,
W. Eric Wong
Abstract:
This chapter illustrates the basic concepts of fault localization using a data mining technique. It utilizes the Trityp program to illustrate the general method. Formal concept analysis and association rules are two well-known methods for symbolic data mining. In their original inception, they both consider data in the form of an object-attribute table. The chapter considers a debugging process in which a program is tested against different test cases. Two attributes, PASS and FAIL, represent the outcome of the test case. The chapter extends the analysis of data mining for fault localization to multiple-fault situations. It addresses how data mining can be further applied to fault localization for GUI components. Unlike traditional software, GUI test cases are usually event sequences, and each individual event has a unique corresponding event handler.
Submitted 23 May, 2025;
originally announced May 2025.
-
Likelihood-Free Adaptive Bayesian Inference via Nonparametric Distribution Matching
Authors:
Wenhui Sophia Lu,
Wing Hung Wong
Abstract:
When the likelihood is analytically unavailable and computationally intractable, approximate Bayesian computation (ABC) has emerged as a widely used methodology for approximate posterior inference; however, it suffers from severe computational inefficiency in high-dimensional settings or under diffuse priors. To overcome these limitations, we propose Adaptive Bayesian Inference (ABI), a framework that bypasses traditional data-space discrepancies and instead compares distributions directly in posterior space through nonparametric distribution matching. By leveraging a novel Marginally-augmented Sliced Wasserstein (MSW) distance on posterior measures and exploiting its quantile representation, ABI transforms the challenging problem of measuring divergence between posterior distributions into a tractable sequence of one-dimensional conditional quantile regression tasks. Moreover, we introduce a new adaptive rejection sampling scheme that iteratively refines the posterior approximation by updating the proposal distribution via generative density estimation. Theoretically, we establish parametric convergence rates for the trimmed MSW distance and prove that the ABI posterior converges to the true posterior as the tolerance threshold vanishes. Through extensive empirical evaluation, we demonstrate that ABI significantly outperforms data-based Wasserstein ABC, summary-based ABC, and state-of-the-art likelihood-free simulators, especially in high-dimensional or dependent observation regimes.
Submitted 7 May, 2025;
originally announced May 2025.
-
FashionM3: Multimodal, Multitask, and Multiround Fashion Assistant based on Unified Vision-Language Model
Authors:
Kaicheng Pang,
Xingxing Zou,
Waikeung Wong
Abstract:
Fashion styling and personalized recommendations are pivotal in modern retail, contributing substantial economic value in the fashion industry. With the advent of vision-language models (VLM), new opportunities have emerged to enhance retailing through natural language and visual interactions. This work proposes FashionM3, a multimodal, multitask, and multiround fashion assistant, built upon a VLM fine-tuned for fashion-specific tasks. It helps users discover satisfying outfits by offering multiple capabilities including personalized recommendation, alternative suggestion, product image generation, and virtual try-on simulation. Fine-tuned on the novel FashionRec dataset, comprising 331,124 multimodal dialogue samples across basic, personalized, and alternative recommendation tasks, FashionM3 delivers contextually personalized suggestions with iterative refinement through multiround interactions. Quantitative and qualitative evaluations, alongside user studies, demonstrate FashionM3's superior performance in recommendation effectiveness and practical value as a fashion assistant.
Submitted 23 April, 2025;
originally announced April 2025.
-
The Power of Amortization on Scheduling with Explorable Uncertainty
Authors:
Alison Hsiang-Hsuan Liu,
Fu-Hong Liu,
Prudence W. H. Wong,
Xiao-Ou Zhang
Abstract:
In this work, we study a scheduling problem with explorable uncertainty. Each job comes with an upper limit of its processing time, which could be potentially reduced by testing the job, which also takes time. The objective is to schedule all jobs on a single machine with a minimum total completion time. The challenge lies in deciding which jobs to test and the order of testing/processing jobs.
The online problem was first introduced with unit testing time and later generalized to variable testing times. For this general setting, the upper bounds of the competitive ratio are shown to be $4$ and $3.3794$ for deterministic and randomized online algorithms, respectively, while the lower bounds for unit testing time still stand: $1.8546$ (deterministic) and $1.6257$ (randomized).
We continue the study of the variable testing times setting. We first enhance the analysis framework and improve the competitive ratio of the deterministic algorithm from $4$ to $1+\sqrt{2} \approx 2.4143$. Using the new analysis framework, we propose a new deterministic algorithm that further improves the competitive ratio to $2.316513$. The new framework also enables us to develop a randomized algorithm improving the expected competitive ratio from $3.3794$ to $2.152271$.
Submitted 17 March, 2025;
originally announced March 2025.
-
Integration of Explainable AI Techniques with Large Language Models for Enhanced Interpretability for Sentiment Analysis
Authors:
Thivya Thogesan,
Anupiya Nugaliyadde,
Kok Wai Wong
Abstract:
Interpretability remains a key difficulty in sentiment analysis with Large Language Models (LLMs), particularly in high-stakes applications where it is crucial to comprehend the rationale behind predictions. This research addresses this by introducing a technique that applies SHAP (Shapley Additive Explanations) by breaking down LLMs into components such as the embedding layer, encoder, decoder, and attention layer to provide a layer-by-layer understanding of sentiment prediction. The approach offers a clearer overview of how the model interprets and categorises sentiment. The method is evaluated using the Stanford Sentiment Treebank (SST-2) dataset, which shows how different sentences affect different layers. The effectiveness of layer-wise SHAP analysis in clarifying sentiment-specific token attributions is demonstrated by experimental evaluations, which provide a notable enhancement over current whole-model explainability techniques. These results highlight how the suggested approach could improve the reliability and transparency of LLM-based sentiment analysis in crucial applications.
Submitted 14 March, 2025;
originally announced March 2025.
-
Faithful, Unfaithful or Ambiguous? Multi-Agent Debate with Initial Stance for Summary Evaluation
Authors:
Mahnaz Koupaee,
Jake W. Vincent,
Saab Mansour,
Igor Shalyminov,
Han He,
Hwanjun Song,
Raphael Shu,
Jianfeng He,
Yi Nian,
Amy Wing-mei Wong,
Kyu J. Han,
Hang Su
Abstract:
Faithfulness evaluators based on large language models (LLMs) are often fooled by the fluency of the text and struggle with identifying errors in the summaries. We propose an approach to summary faithfulness evaluation in which multiple LLM-based agents are assigned initial stances (regardless of what their belief might be) and forced to come up with a reason to justify the imposed belief, thus engaging in a multi-round debate to reach an agreement. The uniformly distributed initial assignments result in a greater diversity of stances, leading to more meaningful debates and ultimately more errors identified. Furthermore, by analyzing recent faithfulness evaluation datasets, we observe that a summary is not always either faithful or unfaithful to the source document. We therefore introduce a new dimension, ambiguity, and a detailed taxonomy to identify such special cases. Experiments demonstrate that our approach can help identify ambiguities and achieves even stronger performance on non-ambiguous summaries.
Submitted 13 February, 2025; v1 submitted 12 February, 2025;
originally announced February 2025.
-
A Zero-Shot LLM Framework for Automatic Assignment Grading in Higher Education
Authors:
Calvin Yeung,
Jeff Yu,
King Chau Cheung,
Tat Wing Wong,
Chun Man Chan,
Kin Chi Wong,
Keisuke Fujii
Abstract:
Automated grading has become an essential tool in education technology due to its ability to efficiently assess large volumes of student work, provide consistent and unbiased evaluations, and deliver immediate feedback to enhance learning. However, current systems face significant limitations, including the need for large datasets in few-shot learning methods, a lack of personalized and actionable feedback, and an overemphasis on benchmark performance rather than student experience. To address these challenges, we propose a Zero-Shot Large Language Model (LLM)-Based Automated Assignment Grading (AAG) system. This framework leverages prompt engineering to evaluate both computational and explanatory student responses without requiring additional training or fine-tuning. The AAG system delivers tailored feedback that highlights individual strengths and areas for improvement, thereby enhancing student learning outcomes. Our study demonstrates the system's effectiveness through comprehensive evaluations, including survey responses from higher education students that indicate significant improvements in motivation, understanding, and preparedness compared to traditional grading methods. The results validate the AAG system's potential to transform educational assessment by prioritizing learning experiences and providing scalable, high-quality feedback.
Submitted 24 January, 2025;
originally announced January 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Pinching Antennas: Principles, Applications and Challenges
Authors:
Zheng Yang,
Ning Wang,
Yanshi Sun,
Zhiguo Ding,
Robert Schober,
George K. Karagiannidis,
Vincent W. S. Wong,
Octavia A. Dobre
Abstract:
Flexible-antenna systems, such as fluid antennas and movable antennas, have been recognized as key enabling technologies for sixth-generation (6G) wireless networks, as they can intelligently reconfigure the effective channel gains of the users and hence significantly improve their data transmission capabilities. However, existing flexible-antenna systems have been designed to combat small-scale fading in non-line-of-sight (NLoS) conditions. As a result, they lack the ability to establish line-of-sight links, which are typically 100 times stronger than NLoS links. In addition, existing flexible-antenna systems have limited flexibility, where adding/removing an antenna is not straightforward. This article introduces an innovative flexible-antenna system called pinching antennas, which are realized by applying small dielectric particles to waveguides. We first describe the basics of pinching-antenna systems and their ability to provide strong LoS links by deploying pinching antennas close to the users as well as their capability to scale up/down the antenna system. We then focus on communication scenarios with different numbers of waveguides and pinching antennas, where innovative approaches to implement multiple-input multiple-output and non-orthogonal multiple access are discussed. In addition, promising 6G-related applications of pinching antennas, including integrated sensing and communication and next-generation multiple access, are presented. Finally, important directions for future research, such as waveguide deployment and channel estimation, are highlighted.
Submitted 18 January, 2025;
originally announced January 2025.
-
Analogue Forecast System for Daily Precipitation Prediction Using Autoencoder Feature Extraction: Application in Hong Kong
Authors:
Yee Chun Tsoi,
Yu Ting Kwok,
Ming Chun Lam,
Wai Kin Wong
Abstract:
In the Hong Kong Observatory, the Analogue Forecast System (AFS) for precipitation has been providing useful reference in predicting possible daily rainfall scenarios for the next 9 days, by identifying historical cases with similar weather patterns to the latest output from the deterministic model of the European Centre for Medium-Range Weather Forecasts (ECMWF). Recent advances in machine learning allow more sophisticated models to be trained using historical data and the patterns of high-impact weather events to be represented more effectively. As such, an enhanced AFS has been developed using the deep learning technique autoencoder. The datasets of the fifth generation of the ECMWF Reanalysis (ERA5) are utilised where more meteorological elements in higher horizontal, vertical and temporal resolutions are available as compared to the previous ECMWF reanalysis products used in the existing AFS. The enhanced AFS features four major steps in generating the daily rain class forecasts: (1) preprocessing of gridded ERA5 and ECMWF model forecast, (2) feature extraction by the pretrained autoencoder, (3) application of optimised feature weightings based on historical cases, and (4) calculation of the final rain class from a weighted ensemble of top analogues. The enhanced AFS demonstrates a consistent and superior performance over the existing AFS, especially in capturing heavy rain cases, during the verification period from 2019 to 2022. This paper presents the detailed formulation of the enhanced AFS and discusses its advantages and limitations in supporting precipitation forecasting in Hong Kong.
Submitted 6 January, 2025;
originally announced January 2025.
-
An AI-powered Bayesian generative modeling approach for causal inference in observational studies
Authors:
Qiao Liu,
Wing Hung Wong
Abstract:
Causal inference in observational studies with high-dimensional covariates presents significant challenges. We introduce CausalBGM, an AI-powered Bayesian generative modeling approach that captures the causal relationship among covariates, treatment, and outcome variables. The core innovation of CausalBGM lies in its ability to estimate the individual treatment effect (ITE) by learning individual-specific distributions of a low-dimensional latent feature set (e.g., latent confounders) that drives changes in both treatment and outcome. This approach not only effectively mitigates confounding effects but also provides comprehensive uncertainty quantification, offering reliable and interpretable causal effect estimates at the individual level. CausalBGM adopts a Bayesian model and uses a novel iterative algorithm to update the model parameters and the posterior distribution of latent features until convergence. This framework leverages the power of AI to capture complex dependencies among variables while adhering to the Bayesian principles. Extensive experiments demonstrate that CausalBGM consistently outperforms state-of-the-art methods, particularly in scenarios with high-dimensional covariates and large-scale datasets. Its Bayesian foundation ensures statistical rigor, providing robust and well-calibrated posterior intervals. By addressing key limitations of existing methods, CausalBGM emerges as a robust and promising framework for advancing causal inference in modern applications in fields such as genomics, healthcare, and social sciences. CausalBGM is maintained at the website https://causalbgm.readthedocs.io/.
Submitted 1 January, 2025;
originally announced January 2025.
-
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context
Authors:
Shuai Lyu,
Rongchen Zhang,
Zeqi Ma,
Fangjian Liao,
Dongmei Mo,
Waikeung Wong
Abstract:
Few-shot defect multi-classification (FSDMC) is an emerging trend in quality control within industrial manufacturing. However, current FSDMC research often lacks generalizability due to its focus on specific datasets. Additionally, defect classification heavily relies on contextual information within images, and existing methods fall short of effectively extracting this information. To address these challenges, we propose a general FSDMC framework called MVREC, which offers two primary advantages: (1) MVREC extracts general features for defect instances by incorporating the pre-trained AlphaCLIP model. (2) It utilizes a region-context framework to enhance defect features by leveraging mask region input and multi-view context augmentation. Furthermore, Few-shot Zip-Adapter(-F) classifiers within the model are introduced to cache the visual features of the support set and perform few-shot classification. We also introduce MVTec-FS, a new FSDMC benchmark based on MVTec AD, which includes 1228 defect images with instance-level mask annotations and 46 defect types. Extensive experiments conducted on MVTec-FS and four additional datasets demonstrate its effectiveness in general defect classification and its ability to incorporate contextual information to improve classification performance. Code: https://github.com/ShuaiLYU/MVREC
Submitted 30 March, 2025; v1 submitted 22 December, 2024;
originally announced December 2024.
-
T-Edge: Trusted Heterogeneous Edge Computing
Authors:
Jiamin Shen,
Yao Chen,
Weng-Fai Wong,
Ee-Chien Chang
Abstract:
Heterogeneous computing, which incorporates GPUs, NPUs, and FPGAs, is increasingly utilized to improve the efficiency of computer systems. However, this shift has given rise to significant security and privacy concerns, especially when the execution platform is remote. One way to tackle these challenges is to establish a trusted and isolated environment for remote program execution, while maintaining minimal overhead and flexibility. While CPU-based trusted execution has been extensively explored and found commercial success, extension to heterogeneous computing systems remains a challenge. This paper proposes a practical trusted execution environment design for ARM/FPGA System-on-Chip platforms, leveraging TrustZone's unique characteristics. The design features a dedicated security controller within the ARM TrustZone, overseeing FPGA reconfiguration and managing communication between CPU cores and FPGA fabrics. This design involves a provisioning service that enables application users to establish trust in the FPGA fabric within cloud-based computing resources provided by the platform owner, running applications developed by third-party developers and hardware manufactured by the device manufacturer. To ensure the security of our proposed system, we employ an automated protocol verifier, ProVerif, to validate its compliance with essential security requirements. Furthermore, we demonstrate the practicality of our system model by implementing a prototype application on the Xilinx MPSoC development board.
Submitted 18 December, 2024;
originally announced December 2024.
-
Software Fault Localization Based on Multi-objective Feature Fusion and Deep Learning
Authors:
Xiaolei Hu,
Dongcheng Li,
W. Eric Wong,
Ya Zou
Abstract:
Software fault localization remains challenging due to limited feature diversity and low precision in traditional methods. This paper proposes a novel approach that integrates multi-objective optimization with deep learning models to improve both accuracy and efficiency in fault localization (FL). By framing feature selection as a multi-objective optimization problem (MOP), we extract and fuse three critical fault-related feature sets (spectrum-based, mutation-based, and text-based features) into a comprehensive feature fusion model. These features are then embedded within a deep learning architecture, comprising a multilayer perceptron (MLP) and gated recurrent network (GRN), which together enhance localization accuracy and generalizability. Experiments on the Defects4J benchmark dataset with 434 faults show that the proposed algorithm reduces processing time by 78.2% compared to single-objective methods. Additionally, our MLP and GRN models achieve a 94.2% improvement in localization accuracy compared to traditional FL methods, outperforming the state-of-the-art deep learning-based FL method by 7.67%. Further validation using the PROMISE dataset demonstrates the generalizability of the proposed model, showing a 4.6% accuracy improvement in cross-project tests over the state-of-the-art deep learning-based FL method.
Submitted 25 November, 2024;
originally announced November 2024.
-
Many happy returns: machine learning to support platelet issuing and waste reduction in hospital blood banks
Authors:
Joseph Farrington,
Samah Alimam,
Martin Utley,
Kezhi Li,
Wai Keong Wong
Abstract:
Efforts to reduce platelet wastage in hospital blood banks have focused on ordering policies, but the predominant practice of issuing the oldest unit first may not be optimal when some units are returned unused. We propose a novel, machine learning (ML)-guided issuing policy to increase the likelihood of returned units being reissued before expiration. Our ML model trained to predict returns on 17,297 requests for platelets gave AUROC 0.74 on 9,353 held-out requests. Prior to ML model development we built a simulation of the blood bank operation that incorporated returns to understand the scale of benefits of such a model. Using our trained model in the simulation gave an estimated reduction in wastage of 14%. Our partner hospital is considering adopting our approach, which would be particularly beneficial for hospitals with higher return rates and where units have a shorter remaining useful life on arrival.
Submitted 22 November, 2024;
originally announced November 2024.
-
BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery
Authors:
Peter St. John,
Dejun Lin,
Polina Binder,
Malcolm Greaves,
Vega Shah,
John St. John,
Adrian Lange,
Patrick Hsu,
Rajesh Illango,
Arvind Ramanathan,
Anima Anandkumar,
David H Brookes,
Akosua Busia,
Abhishaike Mahajan,
Stephen Malina,
Neha Prasad,
Sam Sinai,
Lindsay Edwards,
Thomas Gaudelet,
Cristian Regep,
Martin Steinegger,
Burkhard Rost,
Alexander Brace,
Kyle Hippe,
Luca Naef
, et al. (68 additional authors not shown)
Abstract:
Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) training on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100s, BioNeMo Framework trains a three billion parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
Submitted 12 June, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
Analysis Methodology for Age of Information under Sequence Based Scheduling
Authors:
Fang Liu,
Wing Shing Wong,
Yuan-Hsun Lo,
Yijin Zhang,
Chung Shue Chen
Abstract:
We focus on the Age of Information (AoI) performance in a system where each user generates packets periodically to send to a common access point (AP) for status updating. To avoid heavy overhead, we assume that channel sensing, feedback information from the AP, and time synchronization are not available in the system. We adopt a multi-access scheme called the sequence scheme, where each user is assigned a periodic binary sequence to schedule their transmissions. In our previous work [18], we thoroughly studied the AoI performance under the sequence scheme when the period of the schedule sequences, $L$, is equal to the status generating period, $T$. The results can be extended to the case where $T>L$. However, the case of $T<L$ is not covered by [18]. Therefore, in this paper, we concentrate on analyzing the AoI performance in the case of $T<L$, which is more challenging and requires different approaches. We conduct an in-depth analysis of this case and develop a mathematical tool based on integer partitions to facilitate the analysis. We derive low-complexity closed-form expressions for two scenarios under $T<L$. Based on the obtained analytical results, we propose an algorithm to optimize the construction parameters of the sequence scheme. Finally, we compare our proposed sequence scheme with two commonly used baselines, and show that our proposed scheme outperforms the baselines in terms of AoI performance while consuming less energy.
Submitted 14 November, 2024;
originally announced November 2024.
-
Metaheuristics is All You Need
Authors:
Eliuvish Cuicizion,
Haowen Xu,
Weng Kee Wong
Abstract:
Optimization plays an important role in tackling public health problems. Animal instincts can be used effectively to solve complex public health management issues by providing optimal or approximately optimal solutions to complicated optimization problems common in public health. The BAT algorithm is an exemplary member of a class of nature-inspired metaheuristic optimization algorithms and is designed to outperform existing metaheuristic algorithms in terms of efficiency and accuracy. Its inspiration comes from the foraging behavior of a group of microbats that use echolocation to find their target in the surrounding environment. In recent years, the BAT algorithm has been extensively used by researchers in the area of optimization, and many variants of the BAT algorithm have been developed to improve its performance and extend its application to diverse disciplines. This paper first reviews the basic BAT algorithm and its variants, including their applications in various fields. As a specific application, we apply the BAT algorithm to a biostatistical estimation problem and show it has some clear advantages over existing algorithms.
Submitted 21 March, 2025; v1 submitted 25 October, 2024;
originally announced November 2024.
-
Super-Resolution without High-Resolution Labels for Black Hole Simulations
Authors:
Thomas Helfer,
Thomas D. P. Edwards,
Jessica Dafflon,
Kaze W. K. Wong,
Matthew Lyle Olson
Abstract:
Generating high-resolution simulations is key for advancing our understanding of one of the universe's most violent events: Black Hole mergers. However, generating Black Hole simulations is limited by prohibitive computational costs and scalability issues, reducing the simulation's fidelity and resolution achievable within reasonable time frames and resources. In this work, we introduce a novel method that circumvents these limitations by applying a super-resolution technique without directly needing high-resolution labels, leveraging the Hamiltonian and momentum constraints, fundamental equations in general relativity that govern the dynamics of spacetime. We demonstrate that our method achieves a reduction in constraint violation by one to two orders of magnitude and generalizes effectively to out-of-distribution simulations.
Submitted 3 November, 2024;
originally announced November 2024.
-
Many-Objective Search-Based Coverage-Guided Automatic Test Generation for Deep Neural Networks
Authors:
Dongcheng Li,
W. Eric Wong,
Hu Liu,
Man Zhao
Abstract:
To ensure the reliability of DNN systems and address the test generation problem for neural networks, this paper proposes a fuzzing test generation technique based on many-objective optimization algorithms. Traditional fuzz testing employs random search, which lowers testing efficiency and tends to generate numerous invalid test cases. By utilizing many-objective optimization techniques, effective test cases can be generated. To achieve high test coverage, this paper proposes several improvement strategies. The frequency-based fuzz sampling strategy assigns priorities based on the frequency of selection of initial data, avoiding the repetitive selection of the same data and enhancing the quality of initial data better than random sampling strategies. To address the issue that global search may yield tests that do not satisfy semantic constraints, a local search strategy based on Monte Carlo tree search is proposed to enhance the algorithm's local search capabilities. Furthermore, we improve the diversity of the population and the algorithm's global search capability by updating SPEA2's external archive based on a decomposition-based archiving strategy. To validate the effectiveness of the proposed approach, experiments were conducted on several public datasets and various neural network models. The results reveal that, compared to random and clustering-based sampling, the frequency-based fuzz sampling strategy provides a greater improvement in coverage rate in the later stages of iterations. On complex networks like VGG16, the improved SPEA2 algorithm increased the coverage rate by about 12% across several coverage metrics, and by approximately 40% on LeNet series networks. The experimental results also indicate that the newly generated test cases not only exhibit higher coverage rates but also generate adversarial samples that reveal model errors.
Submitted 1 November, 2024;
originally announced November 2024.
-
Fourier Amplitude and Correlation Loss: Beyond Using L2 Loss for Skillful Precipitation Nowcasting
Authors:
Chiu-Wai Yan,
Shi Quan Foo,
Van Hoan Trinh,
Dit-Yan Yeung,
Ka-Hing Wong,
Wai-Kin Wong
Abstract:
Deep learning approaches have been widely adopted for precipitation nowcasting in recent years. Previous studies mainly focus on proposing new model architectures to improve pixel-wise metrics. However, they frequently result in blurry predictions which provide limited utility to forecasting operations. In this work, we propose a new Fourier Amplitude and Correlation Loss (FACL) which consists of two novel loss terms: Fourier Amplitude Loss (FAL) and Fourier Correlation Loss (FCL). FAL regularizes the Fourier amplitude of the model prediction and FCL complements the missing phase information. The two loss terms work together to replace the traditional $L_2$ losses such as MSE and weighted MSE for the spatiotemporal prediction problem on signal-based data. Our method is generic, parameter-free and efficient. Extensive experiments using one synthetic dataset and three radar echo datasets demonstrate that our method improves perceptual metrics and meteorology skill scores, with a small trade-off to pixel-wise accuracy and structural similarity. Moreover, to improve the error margin in meteorological skill scores such as Critical Success Index (CSI) and Fractions Skill Score (FSS), we propose and adopt the Regional Histogram Divergence (RHD), a distance metric that considers the patch-wise similarity between signal-based imagery patterns with tolerance to local transforms. Code is available at https://github.com/argenycw/FACL
Submitted 30 October, 2024;
originally announced October 2024.
-
Accelerated Bayesian parameter estimation and model selection for gravitational waves with normalizing flows
Authors:
Alicja Polanska,
Thibeau Wouters,
Peter T. H. Pang,
Kaze K. W. Wong,
Jason D. McEwen
Abstract:
We present an accelerated pipeline, based on high-performance computing techniques and normalizing flows, for joint Bayesian parameter estimation and model selection and demonstrate its efficiency in gravitational wave astrophysics. We integrate the Jim inference toolkit, a normalizing flow-enhanced Markov chain Monte Carlo (MCMC) sampler, with the learned harmonic mean estimator. Our Bayesian evidence estimates run on $1$ GPU are consistent with traditional nested sampling techniques run on $16$ CPU cores, while reducing the computation time by factors of $5\times$ and $15\times$ for $4$-dimensional and $11$-dimensional gravitational wave inference problems, respectively. Our code is available in well-tested and thoroughly documented open-source packages, ensuring accessibility and reproducibility for the wider research community.
Submitted 31 October, 2024; v1 submitted 28 October, 2024;
originally announced October 2024.
-
Enabling Energy-Efficient Deployment of Large Language Models on Memristor Crossbar: A Synergy of Large and Small
Authors:
Zhehui Wang,
Tao Luo,
Cheng Liu,
Weichen Liu,
Rick Siow Mong Goh,
Weng-Fai Wong
Abstract:
Large language models (LLMs) have garnered substantial attention due to their promising applications in diverse domains. Nevertheless, the increasing size of LLMs comes with a significant surge in the computational requirements for training and deployment. Memristor crossbars have emerged as a promising solution, which demonstrated a small footprint and remarkably high energy efficiency in computer vision (CV) models. Memristors possess higher density compared to conventional memory technologies, making them highly suitable for effectively managing the extreme model size associated with LLMs. However, deploying LLMs on memristor crossbars faces three major challenges. First, the size of LLMs increases rapidly, already surpassing the capabilities of state-of-the-art memristor chips. Second, LLMs often incorporate multi-head attention blocks, which involve non-weight stationary multiplications that traditional memristor crossbars cannot support. Third, while memristor crossbars excel at performing linear operations, they are not capable of executing complex nonlinear operations in LLMs such as softmax and layer normalization. To address these challenges, we present a novel architecture for the memristor crossbar that enables the deployment of a state-of-the-art LLM on a single chip or package, eliminating the energy and time inefficiencies associated with off-chip communication. Our testing on BERT_Large showed negligible accuracy loss. Compared to traditional memristor crossbars, our architecture achieves enhancements of up to 39X in area overhead and 18X in energy consumption. Compared to modern TPU/GPU systems, our architecture demonstrates at least a 68X reduction in the area-delay product and a significant 69% energy consumption reduction.
Submitted 21 October, 2024;
originally announced October 2024.
-
Multiset Combinatorial Gray Codes with Application to Proximity Sensor Networks
Authors:
Chung Shue Chen,
Wing Shing Wong,
Yuan-Hsun Lo,
Tsai-Lien Wong
Abstract:
We investigate coding schemes that map source symbols into multisets of an alphabet set. Such a formulation of source coding is an alternative approach to the traditional framework and is inspired by an object tracking problem over proximity sensor networks. We define a \textit{multiset combinatorial Gray code} as a multiset code with fixed multiset cardinality that possesses the combinatorial Gray code characteristic. For source codes that are organized as a grid, namely an integer lattice, we propose a solution by first constructing a mapping from the grid to the alphabet set; the codes are then defined as the images of rectangular blocks in the grid of fixed dimensions. We refer to the mapping as a \textit{color mapping} and the code as a \textit{color multiset code}. We propose the idea of product multiset code that enables us to construct codes for high dimensional grids based on 1-dimensional (1D) grids. We provide a detailed analysis of color multiset codes on 1D grids, focusing on codes that require the minimal number of colors. To illustrate the application of such a coding scheme, we consider an object tracking problem on 2D grids and show its efficiency, which comes from exploiting transmission parallelism. Some numerical results are presented to conclude the paper.
Submitted 20 October, 2024;
originally announced October 2024.
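The color multiset code described in the abstract above can be illustrated on a 1D grid. The following is a minimal sketch, not the paper's construction: the color sequence, the block length `w`, and the helper names `window_multisets` and `swap_distance` are all hypothetical choices made for illustration. Each codeword is the multiset of colors covered by a block of `w` consecutive cells. Sliding the block by one cell drops one color and adds one, so neighbouring codewords differ in at most one element; the harder requirement, checked here by hand for one small example, is that all codewords are distinct.

```python
from collections import Counter

def window_multisets(colors, w):
    """Multiset (stored as a sorted tuple) of every length-w block of `colors`."""
    return [tuple(sorted(colors[i:i + w])) for i in range(len(colors) - w + 1)]

def swap_distance(a, b):
    """Number of elements of multiset a that must be replaced to obtain multiset b."""
    return sum((Counter(a) - Counter(b)).values())

# Hypothetical color mapping on a 1D grid of 7 cells, using 3 colors.
colors = [0, 0, 1, 1, 2, 2, 0]
codes = window_multisets(colors, 2)

# All codewords are distinct, so a block position can be decoded from its multiset.
all_distinct = len(set(codes)) == len(codes)

# Adjacent block positions differ by exactly one replaced element here.
steps = [swap_distance(a, b) for a, b in zip(codes, codes[1:])]
```

Storing each multiset as a sorted tuple keeps the codewords hashable, which makes the distinctness check a one-line set comparison.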
-
A Hybrid Sampling and Multi-Objective Optimization Approach for Enhanced Software Defect Prediction
Authors:
Jie Zhang,
Dongcheng Li,
W. Eric Wong,
Shengrong Wang
Abstract:
Accurate early prediction of software defects is essential to maintain software quality and reduce maintenance costs. However, the field of software defect prediction (SDP) faces challenges such as class imbalances, high-dimensional feature spaces, and suboptimal prediction accuracy. To mitigate these challenges, this paper introduces a novel SDP framework that integrates hybrid sampling techniques, specifically Borderline SMOTE and Tomek Links, with a suite of multi-objective optimization algorithms, including NSGA-II, MOPSO, and MODE. The proposed model applies feature fusion through multi-objective optimization, enhancing both the generalization capability and stability of the predictions. Furthermore, the integration of parallel processing for these optimization algorithms significantly boosts the computational efficiency of the model. Comprehensive experiments conducted on datasets from NASA and PROMISE repositories demonstrate that the proposed hybrid sampling and multi-objective optimization approach improves data balance, eliminates redundant features, and enhances prediction accuracy. The experimental results also highlight the robustness of the feature fusion approach, confirming its superiority over existing state-of-the-art techniques in terms of predictive performance and applicability across diverse datasets.
Submitted 13 October, 2024;
originally announced October 2024.
-
Smart Contract Vulnerability Detection based on Static Analysis and Multi-Objective Search
Authors:
Dongcheng Li,
W. Eric Wong,
Xiaodan Wang,
Sean Pan,
Liang-Seng Koh
Abstract:
This paper introduces a method for detecting vulnerabilities in smart contracts using static analysis and a multi-objective optimization algorithm. We focus on four types of vulnerabilities: reentrancy, call stack overflow, integer overflow, and timestamp dependencies. Initially, smart contracts are compiled into an abstract syntax tree to analyze relationships between contracts and functions, including calls, inheritance, and data flow. These analyses are transformed into static evaluations and intermediate representations that reveal internal relations. Based on these representations, we examine each contract's functions, variables, and data dependencies to detect the specified vulnerabilities. To enhance detection accuracy and coverage, we apply a multi-objective optimization algorithm to the static analysis process. This involves assigning initial numeric values to input data and monitoring changes in statement coverage and detection accuracy. Using coverage and accuracy as fitness values, we calculate the Pareto front and crowding distance values to select the best individuals for the new parent population, iterating until optimization criteria are met. We validate our approach using an open-source dataset collected from Etherscan, containing 6,693 smart contracts. Experimental results show that our method outperforms state-of-the-art tools in terms of coverage, accuracy, efficiency, and effectiveness in detecting the targeted vulnerabilities.
Submitted 30 September, 2024;
originally announced October 2024.
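The selection step mentioned in the abstract above (Pareto front plus crowding distance over coverage and accuracy) can be sketched generically. This is a textbook NSGA-II-style formulation offered as an assumption about what such a step might look like, not the authors' implementation; the function names and the sample (coverage, accuracy) pairs are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def crowding_distance(front):
    """Per-point crowding distance; boundary points on each objective get infinity."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all points share this objective value
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist

# Hypothetical (coverage, accuracy) fitness pairs for five candidate inputs.
population = [(0.9, 0.5), (0.7, 0.8), (0.6, 0.6), (0.5, 0.9), (0.9, 0.4)]
front = pareto_front(population)
distances = crowding_distance(front)
```

In NSGA-II-style selection, individuals are ranked first by front membership and then by larger crowding distance, which favors candidates in sparsely populated regions of the objective space.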
-
Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model
Authors:
Kaiwen Tang,
Zhanglu Yan,
Weng-Fai Wong
Abstract:
For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models (SLMs) targeted for deployment in resource-constrained devices where energy efficiency is a significant concern. Spiking neural networks (SNNs) offer a promising solution due to their energy efficiency, and there are already works on realizing transformer-based models on SNNs. However, key operations like softmax and layer normalization (LN) are difficult to implement on neuromorphic hardware, and many of these early works sidestepped them. To address these challenges, we introduce Sorbet, a transformer-based spiking language model that is more neuromorphic hardware-compatible. Sorbet incorporates a novel shifting-based softmax called PTsoftmax and a power normalization method using bit-shifting (BSPN), both designed to replace the respective energy-intensive operations. By leveraging knowledge distillation and model quantization, Sorbet achieved a highly compressed binary weight model that maintains competitive performance while significantly reducing energy consumption. We validate Sorbet's effectiveness through extensive testing on the GLUE benchmark and a series of ablation studies, demonstrating its potential as an energy-efficient solution for language model inference.
Submitted 4 September, 2024;
originally announced September 2024.
-
Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology
Authors:
Aidan Gilson,
Xuguang Ai,
Thilaka Arunachalam,
Ziyou Chen,
Ki Xiong Cheong,
Amisha Dave,
Cameron Duic,
Mercy Kibe,
Annette Kaminaka,
Minali Prasad,
Fares Siddig,
Maxwell Singer,
Wendy Wong,
Qiao Jin,
Tiarnan D. L. Keenan,
Xia Hu,
Emily Y. Chew,
Zhiyong Lu,
Hua Xu,
Ron A. Adelman,
Yih-Chung Tham,
Qingyu Chen
Abstract:
Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augment Generation (RAG) is popular to address this issue, few studies implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieves relevant documents to augment LLMs during inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses including over 500 references of LLMs with and without RAG on 100 questions with 10 healthcare professionals. The evaluation focuses on factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total. Of these, 45.3% were hallucinated, 34.1% contained minor errors, and 20.6% were correct. In contrast, LLMs with RAG significantly improved accuracy (54.5% being correct) and reduced error rates (18.8% with minor hallucinations and 26.7% with errors). 62.5% of the top 10 documents retrieved by RAG were selected as the top references in the LLM response, with an average ranking of 4.9. The use of RAG also improved evidence attribution (increasing from 1.85 to 2.49 on a 5-point scale, P<0.001), albeit with slight decreases in accuracy (from 3.52 to 3.23, P=0.03) and completeness (from 3.47 to 3.27, P=0.17). The results demonstrate that LLMs frequently exhibited hallucinated and erroneous evidence in the responses, raising concerns for downstream applications in the medical domain. RAG substantially reduced the proportion of such evidence but encountered challenges.
Submitted 20 September, 2024;
originally announced September 2024.
-
Reconsidering the energy efficiency of spiking neural networks
Authors:
Zhanglu Yan,
Zhenyu Bai,
Weng-Fai Wong
Abstract:
Spiking Neural Networks (SNNs) promise higher energy efficiency over conventional Quantized Artificial Neural Networks (QNNs) due to their event-driven, spike-based computation. However, prevailing energy evaluations often oversimplify, focusing on computational aspects while neglecting critical overheads like comprehensive data movement and memory access. Such simplifications can lead to misleading conclusions regarding the true energy benefits of SNNs. This paper presents a rigorous re-evaluation. We establish a fair baseline by mapping rate-encoded SNNs with $T$ timesteps to functionally equivalent QNNs with $\lceil \log_2(T+1) \rceil$ bits. This ensures both models have comparable representational capacities, as well as similar hardware requirements, enabling meaningful energy comparisons. We introduce a detailed analytical energy model encompassing core computation and data movement (sparse and dense activations, weights). Using this model, we systematically explore a wide parameter space, including intrinsic network characteristics ($T$, spike rate $s_r$, QNN sparsity $\gamma$, model size $N$, weight bit-level) and hardware characteristics (memory system and network-on-chip). Our analysis identifies specific operational regimes where SNNs genuinely offer superior energy efficiency. For example, under typical neuromorphic hardware conditions, SNNs with moderate time windows ($T \in [5,10]$) require an average spike rate ($s_r$) below 6.4% to outperform equivalent QNNs. These insights guide the design of genuinely energy-efficient neural network solutions.
△ Less
Submitted 3 July, 2025; v1 submitted 29 August, 2024;
originally announced September 2024.
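The bit-width mapping stated in the abstract above can be illustrated with a small sketch. It assumes only what the abstract states: a rate-encoded neuron observed over $T$ timesteps can emit between 0 and $T$ spikes, so $T+1$ distinct values need $\lceil \log_2(T+1) \rceil$ bits. The function name `equivalent_qnn_bits` is hypothetical and is not taken from the paper.

```python
def equivalent_qnn_bits(timesteps: int) -> int:
    """Bits needed to encode the T+1 possible spike counts (0..T).

    Computes ceil(log2(T+1)) with integer arithmetic: for n >= 1,
    ceil(log2(n)) == (n - 1).bit_length(), and here n = T + 1.
    """
    if timesteps < 1:
        raise ValueError("timesteps must be at least 1")
    return timesteps.bit_length()


if __name__ == "__main__":
    # Print the equivalent bit-width for a few time-window lengths.
    for t in (1, 5, 7, 10):
        print(t, equivalent_qnn_bits(t))
```

Under this mapping, the time windows $T \in [5,10]$ mentioned in the abstract correspond to 3-bit or 4-bit QNN activations.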
-
Task-Augmented Cross-View Imputation Network for Partial Multi-View Incomplete Multi-Label Classification
Authors:
Lian Zhao,
Jie Wen,
Xiaohuan Lu,
Wai Keung Wong,
Jiang Long,
Wulin Xie
Abstract:
In real-world scenarios, multi-view multi-label learning often encounters the challenge of incomplete training data due to limitations in data collection and unreliable annotation processes. The absence of multi-view features impairs the comprehensive understanding of samples, omitting crucial details essential for classification. To address this issue, we present a task-augmented cross-view imputation network (TACVI-Net) for the purpose of handling partial multi-view incomplete multi-label classification. Specifically, we employ a two-stage network to derive highly task-relevant features to recover the missing views. In the first stage, we leverage the information bottleneck theory to obtain a discriminative representation of each view by extracting task-relevant information through a view-specific encoder-classifier architecture. In the second stage, an autoencoder based multi-view reconstruction network is utilized to extract high-level semantic representation of the augmented features and recover the missing data, thereby aiding the final classification task. Extensive experiments on five datasets demonstrate that our TACVI-Net outperforms other state-of-the-art methods.
Submitted 24 March, 2025; v1 submitted 12 September, 2024;
originally announced September 2024.
-
Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior
Authors:
Chaoxing Huang,
Ziqiang Yu,
Zijian Gao,
Qiuyi Shen,
Queenie Chan,
Vincent Wai-Sun Wong,
Winnie Chiu-Wing Chu,
Weitian Chen
Abstract:
Fatty acids can potentially serve as biomarkers for evaluating metabolic disorders and inflammatory conditions, and quantifying the double bonds is the key to revealing fatty acid information. This study presents an assessment of a deep learning approach utilizing Deep Image Prior (DIP) for the quantification of double bonds and methylene-interrupted double bonds of triglyceride derived from chemical-shift encoded multi-echo gradient echo images, all achieved without the necessity for network training. The methodology implemented a cost function grounded in signal constraints to iteratively refine the neural network's parameters on a single slice of images. Validation procedures encompassed both phantom experiments and in-vivo scans. The outcomes showed concordance between the quantified values and the established reference standards, notably exemplified by a Pearson correlation coefficient of 0.96 (p = 0.0005) derived from the phantom experiments. The results in a water-oil phantom also demonstrate the quantification reliability of the DIP method under the condition of a relatively low fat signal. Furthermore, the in-vivo assessments yielded consistent quantification results that closely mirrored previously published findings concerning subcutaneous fat. In summary, the study underscores the potential of Deep Image Prior in enabling the quantification of double bonds and methylene-interrupted double bonds from chemical-shift encoded multi-echo magnetic resonance imaging (MRI) data, suggesting potential avenues for future research and clinical applications in the field.
Submitted 29 October, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Enhancing Diagnostic Accuracy in Rare and Common Fundus Diseases with a Knowledge-Rich Vision-Language Model
Authors:
Meng Wang,
Tian Lin,
Aidi Lin,
Kai Yu,
Yuanyuan Peng,
Lianyu Wang,
Cheng Chen,
Ke Zou,
Huiyu Liang,
Man Chen,
Xue Yao,
Meiqin Zhang,
Binwei Huang,
Chaoxin Zheng,
Peixin Zhang,
Wei Chen,
Yilong Luo,
Yifan Chen,
Honghe Xia,
Tingkun Shi,
Qi Zhang,
Jinming Guo,
Xiaolin Chen,
Jingcheng Wang,
Yih Chung Tham
, et al. (24 additional authors not shown)
Abstract:
Previous foundation models for fundus images were pre-trained with limited disease categories and knowledge base. Here we introduce a knowledge-rich vision-language model (RetiZero) that leverages knowledge from more than 400 fundus diseases. For RetiZero's pretraining, we compiled 341,896 fundus images paired with texts, sourced from public datasets, ophthalmic literature, and online resources, encompassing a diverse range of diseases across multiple ethnicities and countries. RetiZero exhibits remarkable performance in several downstream tasks, including zero-shot disease recognition, image-to-image retrieval, AI-assisted clinical diagnosis, few-shot fine-tuning, and internal- and cross-domain disease identification. In zero-shot scenarios, RetiZero achieves Top-5 accuracies of 0.843 for 15 diseases and 0.756 for 52 diseases. For image retrieval, it achieves Top-5 scores of 0.950 and 0.886 for the same sets, respectively. AI-assisted clinical diagnosis results show that RetiZero's Top-3 zero-shot performance surpasses the average of 19 ophthalmologists from Singapore, China, and the United States. RetiZero substantially enhances clinicians' accuracy in diagnosing fundus diseases, particularly rare ones. These findings underscore the value of integrating RetiZero into clinical settings, where various fundus diseases are encountered.
Submitted 10 April, 2025; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering
Authors:
Mitchell Black,
Lucy Lin,
Amir Nayyeri,
Weng-Keen Wong
Abstract:
Effective resistance is a distance between vertices of a graph that is both theoretically interesting and useful in applications. We study a variant of effective resistance called the biharmonic distance. While the effective resistance measures how well-connected two vertices are, we prove several theoretical results supporting the idea that the biharmonic distance measures how important an edge is to the global topology of the graph. Our theoretical results connect the biharmonic distance to well-known measures of connectivity of a graph like its total resistance and sparsity. Based on these results, we introduce two clustering algorithms using the biharmonic distance. Finally, we introduce a further generalization of the biharmonic distance that we call the $k$-harmonic distance. We empirically study the utility of biharmonic and $k$-harmonic distance for edge centrality and graph clustering.
Submitted 17 February, 2025; v1 submitted 4 June, 2024;
originally announced June 2024.
-
SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification
Authors:
Zhanglu Yan,
Zhenyu Bai,
Tulika Mitra,
Weng-Fai Wong
Abstract:
Heart disease is one of the leading causes of death worldwide. Given its high risk and often asymptomatic nature, real-time continuous monitoring is essential. Unlike traditional artificial neural networks (ANNs), spiking neural networks (SNNs) are well-known for their energy efficiency, making them ideal for wearable devices and energy-constrained edge computing platforms. However, current energy measurements of SNN implementations for detecting heart diseases typically rely on empirical values, often overlooking hardware overhead. Additionally, the integrate-and-fire activations in SNNs require multiple memory accesses and repeated computations, which can further compromise energy efficiency. In this paper, we propose sparrowSNN, a redesign of the standard SNN workflow from a hardware perspective, and present a dedicated ASIC design for SNNs, optimized for ultra-low power wearable devices used in heartbeat classification. Using the MIT-BIH dataset, our SNN achieves a state-of-the-art accuracy of 98.29% for SNNs, with energy consumption of 31.39nJ per inference and power usage of 6.1uW, giving sparrowSNN the highest accuracy and the lowest energy use among comparable systems. We also compare the energy-to-accuracy trade-offs between SNNs and quantized ANNs, offering recommendations and insights on how best to use SNNs.
Submitted 6 May, 2024;
originally announced June 2024.
-
World Models for General Surgical Grasping
Authors:
Hongbin Lin,
Bin Li,
Chun Wai Wong,
Juan Rojas,
Xiangyu Chu,
Kwok Wai Samuel Au
Abstract:
Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements because they relied mainly on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework, "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor policy for surgical grasping, enhancing both generality and robustness. In particular, a novel method is proposed to estimate the values and uncertainties of depth pixels for a rigid-link object's inaccurate region based on the empirical prior of the object's size; both depth and mask images of task objects are encoded to a single compact 3-channel image (size: 64x64x3) by dynamically zooming in the mask regions, minimizing the information loss. The learned controller's effectiveness is extensively evaluated in simulation and on a real robot. Our learned visuomotor policy handles: i) unseen objects, including 5 types of target grasping objects and a robot gripper, in unstructured real-world surgery environments, and ii) disturbances in perception and control. Ours is the first work to achieve a unified surgical control system that grasps diverse surgical objects using different robot grippers on real robots in complex surgery scenes (average success rate: 69%). Our system also demonstrates significant robustness across 6 conditions including background variation, target disturbance, camera pose variation, kinematic control error, image noise, and re-grasping after the gripped target object drops from the gripper. Videos and code can be found on our project page: https://linhongbin.github.io/gas/.
Submitted 28 May, 2024;
originally announced May 2024.
-
Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression
Authors:
Sisi Shao,
Junhyung Park,
Weng Kee Wong
Abstract:
General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO) as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, but can also produce better results or produce results when the other routines cannot converge. In the latter case, it can also identify the source of the problem or problems. We highlight the advantages of PSO using four examples, where: (1) PSO reveals that some parameters in a generalized distribution are unidentified when this is not apparent or computationally manifested using routines in R or SAS; (2) PSO can produce estimation results for the log-binomial regressions when current routines may not; (3) PSO provides flexibility in the link function for binomial regression with LASSO penalty, which is unsupported by standard packages like GLM and GENMOD in Stata and SAS, respectively, and (4) PSO provides superior MLE estimates for an EE-IW distribution compared with those from the traditional statistical methods that rely on moments.
Submitted 20 May, 2024;
originally announced May 2024.
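As a rough illustration of how PSO can be used for maximum likelihood estimation, the sketch below minimizes a negative log-likelihood with a basic global-best particle swarm. It is not the authors' implementation: the function names `pso_minimize` and `exponential_nll`, the swarm settings (inertia 0.7, acceleration coefficients 1.5), and the toy exponential data are all assumptions chosen for the example.

```python
import math
import random


def pso_minimize(objective, lower, upper, n_particles=30, n_iters=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Basic global-best PSO over a box [lower, upper]; returns (position, value)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    dim = len(lower)
    pos = [[rng.uniform(lower[d], upper[d]) for d in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # each particle's best position so far
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                # Clamp to the search box so the objective stays well defined.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lower[d]), upper[d])
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


# Toy data (made up for illustration), treated as exponentially distributed.
DATA = [0.5, 1.2, 0.3, 2.0, 1.0]


def exponential_nll(params):
    """Negative log-likelihood of an exponential distribution with the given rate."""
    rate = params[0]
    return -len(DATA) * math.log(rate) + rate * sum(DATA)


if __name__ == "__main__":
    estimate, nll = pso_minimize(exponential_nll, lower=[0.01], upper=[10.0])
    print("estimated rate:", estimate[0], "closed-form MLE:", len(DATA) / sum(DATA))
```

The exponential model is used only because its MLE has a closed form (the reciprocal of the sample mean), which makes the swarm's answer easy to check; the models discussed in the abstract are considerably harder.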
-
NOVA: NoC-based Vector Unit for Mapping Attention Layers on a CNN Accelerator
Authors:
Mohit Upadhyay,
Rohan Juneja,
Weng-Fai Wong,
Li-Shiuan Peh
Abstract:
Attention mechanisms are becoming increasingly popular, being used in neural network models in multiple domains such as natural language processing (NLP) and vision applications, especially at the edge. However, attention layers are difficult to map onto existing neuro accelerators since they have a much higher density of non-linear operations, which leads to inefficient utilization of today's vector units. This work introduces NOVA, a NoC-based Vector Unit that can perform non-linear operations within the NoC of the accelerators, and can be overlaid onto existing neuro accelerators to map attention layers at the edge. Our results show that the NOVA architecture is up to 37.8x more power-efficient than state-of-the-art hardware approximators when running existing attention-based neural networks.
Submitted 7 May, 2024;
originally announced May 2024.
-
Table-Lookup MAC: Scalable Processing of Quantised Neural Networks in FPGA Soft Logic
Authors:
Daniel Gerlinghoff,
Benjamin Chen Ming Choong,
Rick Siow Mong Goh,
Weng-Fai Wong,
Tao Luo
Abstract:
Recent advancements in neural network quantisation have yielded remarkable outcomes, with three-bit networks reaching state-of-the-art full-precision accuracy in complex tasks. These achievements present valuable opportunities for accelerating neural networks by computing in reduced precision. Implementing it on FPGAs can take advantage of bit-level reconfigurability, which is not available on conventional CPUs and GPUs. Simultaneously, the high data intensity of neural network processing has inspired computing-in-memory paradigms, including on FPGA platforms. By programming the effects of trained model weights as lookup operations in soft logic, the transfer of weight data from memory units can be avoided, alleviating the memory bottleneck. However, previous methods scale poorly: their high logic utilisation limits them to small networks/sub-networks of binary models with low accuracy. In this paper, we introduce Table Lookup Multiply-Accumulate (TLMAC) as a framework to compile and optimise quantised neural networks for scalable lookup-based processing. TLMAC clusters and maps unique groups of weights to lookup-based processing elements, enabling highly parallel computation while taking advantage of parameter redundancy. Further place and route algorithms are proposed to reduce LUT utilisation and routing congestion. We demonstrate that TLMAC significantly improves the scalability of previous related works. Our efficient logic mapping and high degree of reuse enable entire ImageNet-scale quantised models with full-precision accuracy to be implemented using lookup-based computing on one commercially available FPGA.
Submitted 17 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1112 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 16 December, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Unsupervised Contrastive Learning for Robust RF Device Fingerprinting Under Time-Domain Shift
Authors:
Jun Chen,
Weng-Keen Wong,
Bechir Hamdaoui
Abstract:
Radio Frequency (RF) device fingerprinting has been recognized as a potential technology for enabling automated wireless device identification and classification. However, it faces a key challenge due to the domain shift that could arise from variations in the channel conditions and environmental settings, potentially degrading the accuracy of RF-based device classification when testing and training data are collected in different domains. This paper introduces a novel solution that leverages contrastive learning to mitigate this domain shift problem. Contrastive learning, a state-of-the-art self-supervised learning approach from deep learning, learns a distance metric such that positive pairs are closer (i.e. more similar) in the learned metric space than negative pairs. When applied to RF fingerprinting, our model treats RF signals from the same transmission as positive pairs and those from different transmissions as negative pairs. Through experiments on wireless and wired RF datasets collected over several days, we demonstrate that our contrastive learning approach captures domain-invariant features, diminishing the effects of domain-specific variations. Our results show large and consistent improvements in accuracy (10.8% to 27.8%) over baseline models, thus underscoring the effectiveness of contrastive learning in improving device classification under domain shift.
Submitted 6 March, 2024;
originally announced March 2024.
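The contrastive objective described in the abstract above, which pulls positive pairs together and pushes negative pairs apart, can be sketched with an InfoNCE-style loss. The abstract does not say which contrastive loss the authors use, so the choice of InfoNCE, the cosine similarity, the temperature of 0.1, and the two-dimensional toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to the positive
    than to every negative, large otherwise."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    # Positive embedding close to the anchor: the loss is near zero.
    print(info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))
    # Positive embedding far from the anchor while a negative is close: large loss.
    print(info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]]))
```

In the setting of the abstract, the anchor and positive would be embeddings of RF signals from the same transmission, and the negatives would come from different transmissions.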
-
Block-MDS QC-LDPC Codes for Information Reconciliation in Key Distribution
Authors:
Lev Tauz,
Debarnab Mitra,
Jayanth Shreekumar,
Murat Can Sarihan,
Chee Wei Wong,
Lara Dolecek
Abstract:
Quantum key distribution (QKD) is a popular protocol that provides information theoretically secure keys to multiple parties. Two important post-processing steps of QKD are 1) the information reconciliation (IR) step, where parties reconcile mismatches in generated keys through classical communication, and 2) the privacy amplification (PA) step, where parties distill their common key into a new secure key that the adversary has little to no information about. In general, these two steps have been abstracted as two distinct problems. In this work, we consider a new technique of performing the IR and PA steps jointly through sampling that relaxes the requirement on the IR step, allowing for more success in key creation. We provide a novel LDPC code construction known as Block-MDS QC-LDPC codes that can utilize the relaxed requirement by creating LDPC codes with pre-defined sub-matrices of full-rank. We demonstrate through simulations that our technique of sampling can provide notable gains in successfully creating secret keys.
Submitted 29 February, 2024;
originally announced March 2024.
-
Detecting misinformation through Framing Theory: the Frame Element-based Model
Authors:
Guan Wang,
Rebecca Frederick,
Jinglong Duan,
William Wong,
Verica Rupar,
Weihua Li,
Quan Bai
Abstract:
In this paper, we delve into the rapidly evolving challenge of misinformation detection, with a specific focus on the nuanced manipulation of narrative frames - an under-explored area within the AI community. The potential for Generative AI models to generate misleading narratives underscores the urgency of this problem. Drawing from communication and framing theories, we posit that the presentation or 'framing' of accurate information can dramatically alter its interpretation, potentially leading to misinformation. We highlight this issue through real-world examples, demonstrating how shifts in narrative frames can transmute fact-based information into misinformation. To tackle this challenge, we propose an innovative approach leveraging the power of pre-trained Large Language Models and deep neural networks to detect misinformation originating from accurate facts portrayed under different frames. These advanced AI techniques offer unprecedented capabilities in identifying complex patterns within unstructured data critical for examining the subtleties of narrative frames. The objective of this paper is to bridge a significant research gap in the AI domain, providing valuable insights and methodologies for tackling framing-induced misinformation, thus contributing to the advancement of responsible and trustworthy AI technologies. We conduct several experiments, and the results demonstrate the varying impact of the elements of framing theory, supporting the rationale for applying framing theory to improve performance in misinformation detection.
Submitted 19 February, 2024;
originally announced February 2024.
-
Integrating Deep Learning and Synthetic Biology: A Co-Design Approach for Enhancing Gene Expression via N-terminal Coding Sequences
Authors:
Zhanglu Yan,
Weiran Chu,
Yuhua Sheng,
Kaiwen Tang,
Shida Wang,
Yanfeng Liu,
Weng-Fai Wong
Abstract:
N-terminal coding sequence (NCS) influences gene expression by impacting the translation initiation rate. The NCS optimization problem is to find an NCS that maximizes gene expression. The problem is important in genetic engineering. However, current methods for NCS optimization such as rational design and statistics-guided approaches are labor-intensive and yield only relatively small improvements. This paper introduces a deep learning/synthetic biology co-designed few-shot training workflow for NCS optimization. Our method utilizes k-nearest encoding followed by word2vec to encode the NCS, then performs feature extraction using attention mechanisms, before constructing a time-series network for predicting gene expression intensity, and finally a direct search algorithm identifies the optimal NCS with limited training data. We took green fluorescent protein (GFP) expressed by Bacillus subtilis as a reporting protein of NCSs, and employed the fluorescence enhancement factor as the metric of NCS optimization. Within just six iterative experiments, our model generated an NCS (MLD62) that increased average GFP expression by 5.41-fold, outperforming the state-of-the-art NCS designs. Extending our findings beyond GFP, we showed that our engineered NCS (MLD62) can effectively boost the production of N-acetylneuraminic acid by enhancing the expression of the crucial rate-limiting GNA1 gene, demonstrating its practical utility. We have open-sourced our NCS expression database and experimental procedures for public use.
Submitted 20 February, 2024;
originally announced February 2024.
-
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Authors:
Liyan Tang,
Igor Shalyminov,
Amy Wing-mei Wong,
Jon Burnsky,
Jake W. Vincent,
Yu'an Yang,
Siffi Singh,
Song Feng,
Hwanjun Song,
Hang Su,
Lijia Sun,
Yi Zhang,
Saab Mansour,
Kathleen McKeown
Abstract:
Single document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. We provide binary sentence-level human annotations of the factual consistency of these summaries along with detailed explanations of factually inconsistent sentences. Our analysis shows that existing LLMs hallucinate significant amounts of factual errors in the dialogue domain, regardless of the model's size. On the other hand, when LLMs, including GPT-4, serve as binary factual evaluators, they perform poorly and can be outperformed by prevailing state-of-the-art specialized factuality evaluation metrics. Finally, we conducted an analysis of hallucination types with a curated error taxonomy. We find that there are diverse errors and error distributions in model-generated summaries and that non-LLM based metrics can capture all error types better than LLM-based evaluators.
Submitted 31 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Efficient Generative Modeling via Penalized Optimal Transport Network
Authors:
Wenhui Sophia Lu,
Chenyang Zhong,
Wing Hung Wong
Abstract:
The generation of synthetic data with distributions that faithfully emulate the underlying data-generating mechanism holds paramount significance. Wasserstein Generative Adversarial Networks (WGANs) have emerged as a prominent tool for this task; however, due to the delicate equilibrium of the minimax formulation and the instability of Wasserstein distance in high dimensions, WGAN often manifests the pathological phenomenon of mode collapse. This results in generated samples that converge to a restricted set of outputs and fail to adequately capture the tail behaviors of the true distribution. Such limitations can lead to serious downstream consequences. To this end, we propose the Penalized Optimal Transport Network (POTNet), a versatile deep generative model based on the marginally-penalized Wasserstein (MPW) distance. Through the MPW distance, POTNet effectively leverages low-dimensional marginal information to guide the overall alignment of joint distributions. Furthermore, our primal-based framework enables direct evaluation of the MPW distance, thus eliminating the need for a critic network. This formulation circumvents training instabilities inherent in adversarial approaches and avoids the need for extensive parameter tuning. We derive a non-asymptotic bound on the generalization error of the MPW loss and establish convergence rates of the generative distribution learned by POTNet. Our theoretical analysis together with extensive empirical evaluations demonstrate the superior performance of POTNet in accurately capturing underlying data structures, including their tail behaviors and minor modalities. Moreover, our model achieves orders of magnitude speedup during the sampling stage compared to state-of-the-art alternatives, which enables computationally efficient large-scale synthetic data generation.
Submitted 7 January, 2025; v1 submitted 16 February, 2024;
originally announced February 2024.
-
Distributional Off-policy Evaluation with Bellman Residual Minimization
Authors:
Sungee Hong,
Zhengling Qi,
Raymond K. W. Wong
Abstract:
We study distributional off-policy evaluation (OPE), whose goal is to learn the distribution of the return for a target policy using offline data generated by a different policy. The theoretical foundation of much existing work relies on supremum-extended statistical distances, such as the supremum-Wasserstein distance, which are hard to estimate. In contrast, we study the more manageable expectation-extended statistical distances and provide a novel theoretical justification for their validity in learning the return distribution. Based on this attractive property, we propose a new method called Energy Bellman Residual Minimizer (EBRM) for distributional OPE. We provide corresponding in-depth theoretical analyses. We establish a finite-sample error bound for the EBRM estimator under the realizability assumption. Furthermore, we introduce a variant of our method based on a multi-step extension, which improves the error bound for non-realizable settings. Notably, unlike prior distributional OPE methods, the theoretical guarantees of our method do not require the completeness assumption.
Submitted 12 March, 2025; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Towards Optimal Grammars for RNA Structures
Authors:
Evarista Onokpasa,
Sebastian Wild,
Prudence W. H. Wong
Abstract:
In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) for joint compression of RNA sequence and structure, stochastic context-free grammars are the best known compressors and (b) grammars with better compression ability also perform better in ab initio structure prediction. Previous grammars were manually curated by human experts. In this work, we develop a framework for automatic and systematic search algorithms for stochastic grammars with better compression (and prediction) ability for RNA. We perform an exhaustive search of small grammars and identify grammars that surpass the performance of human-expert grammars.
Submitted 29 January, 2024;
originally announced January 2024.