-
A Survey on Vision-Language-Action Models for Autonomous Driving
Authors:
Sicong Jiang,
Zilin Huang,
Kangan Qian,
Ziang Luo,
Tianze Zhu,
Yang Zhong,
Yihong Tang,
Menglin Kong,
Yunlong Wang,
Siwen Jiao,
Hao Ye,
Zihao Sheng,
Xin Zhao,
Tuopu Wen,
Zheng Fu,
Sikai Chen,
Kun Jiang,
Diange Yang,
Seongjin Choi,
Lijun Sun
Abstract:
The rapid progress of multimodal large language models (MLLM) has paved the way for Vision-Language-Action (VLA) paradigms, which integrate visual perception, natural language understanding, and control within a single policy. Researchers in autonomous driving are actively adapting these methods to the vehicle domain. Such models promise autonomous vehicles that can interpret high-level instructions, reason about complex traffic scenes, and make their own decisions. However, the literature remains fragmented and is rapidly expanding. This survey offers the first comprehensive overview of VLA for Autonomous Driving (VLA4AD). We (i) formalize the architectural building blocks shared across recent work, (ii) trace the evolution from early explainer to reasoning-centric VLA models, and (iii) compare over 20 representative models according to VLA's progress in the autonomous driving domain. We also consolidate existing datasets and benchmarks, highlighting protocols that jointly measure driving safety, accuracy, and explanation quality. Finally, we detail open challenges - robustness, real-time efficiency, and formal verification - and outline future directions of VLA4AD. This survey provides a concise yet complete reference for advancing interpretable socially aligned autonomous vehicles. Github repo is available at https://github.com/JohnsonJiang1996/Awesome-VLA4AD.
Submitted 30 June, 2025;
originally announced June 2025.
-
Style-Aligned Image Composition for Robust Detection of Abnormal Cells in Cytopathology
Authors:
Qiuyi Qi,
Xin Li,
Ming Kong,
Zikang Xu,
Bingdi Chen,
Qiang Zhu,
S Kevin Zhou
Abstract:
Challenges such as the lack of high-quality annotations, long-tailed data distributions, and inconsistent staining styles pose significant obstacles to training neural networks to detect abnormal cells in cytopathology robustly. This paper proposes a style-aligned image composition (SAIC) method that composes high-fidelity and style-preserved pathological images to enhance the effectiveness and robustness of detection models. Without additional training, SAIC first selects an appropriate candidate from the abnormal cell bank based on attribute guidance. Then, it employs a high-frequency feature reconstruction to achieve a style-aligned and high-fidelity composition of abnormal cells and pathological backgrounds. Finally, it introduces a large vision-language model to filter high-quality synthesis images. Experimental results demonstrate that incorporating SAIC-synthesized images effectively enhances the performance and robustness of abnormal cell detection for tail categories and styles, thereby improving overall detection performance. The comprehensive quality evaluation further confirms the generalizability and practicality of SAIC in clinical application scenarios. Our code will be released at https://github.com/Joey-Qi/SAIC.
Submitted 26 June, 2025;
originally announced June 2025.
-
AI Magnetic Levitation (Maglev) Conveyor for Automated Assembly Production
Authors:
Ray Wai Man Kong
Abstract:
Efficiency, speed, and precision are essential in modern manufacturing. The AI Maglev Conveyor system, which combines magnetic levitation (maglev) technology with artificial intelligence (AI), revolutionizes automated production processes. This system reduces maintenance costs and downtime by eliminating friction, enhancing operational efficiency. It transports goods swiftly with minimal energy consumption, optimizing resource use and supporting sustainability. AI integration enables real-time monitoring and adaptive control, allowing businesses to respond to production demand fluctuations and streamline supply chain operations.
The AI Maglev Conveyor offers smooth, silent operation, accommodating diverse product types and sizes for flexible manufacturing without extensive reconfiguration. AI algorithms optimize routing, reduce cycle times, and improve throughput, creating an agile production line adaptable to market changes.
This applied research paper introduces the Maglev Conveyor system, featuring an electromagnetic controller and multiple movers to enhance automation. It offers cost savings as an alternative to setups using six-axis robots or linear motors, with precise adjustments for robotic arm loading. Operating at high speeds minimizes treatment time for delicate components while maintaining precision. Its adaptable design accommodates various materials, facilitating integration of processing stations alongside electronic product assembly. Positioned between linear-axis and robotic systems in cost, the Maglev Conveyor is ideal for flat parts requiring minimal travel, transforming production efficiency across industries. The paper explores the system's technical advantages, flexibility, cost reductions, and overall benefits.
Submitted 5 June, 2025;
originally announced June 2025.
-
Dynamic Modes as Time Representation for Spatiotemporal Forecasting
Authors:
Menglin Kong,
Vincent Zhihao Zheng,
Xudong Wang,
Lijun Sun
Abstract:
This paper introduces a data-driven time embedding method for modeling long-range seasonal dependencies in spatiotemporal forecasting tasks. The proposed approach employs Dynamic Mode Decomposition (DMD) to extract temporal modes directly from observed data, eliminating the need for explicit timestamps or hand-crafted time features. These temporal modes serve as time representations that can be seamlessly integrated into deep spatiotemporal forecasting models. Unlike conventional embeddings such as time-of-day indicators or sinusoidal functions, our method captures complex multi-scale periodicity through spectral analysis of spatiotemporal data. Extensive experiments on urban mobility, highway traffic, and climate datasets demonstrate that the DMD-based embedding consistently improves long-horizon forecasting accuracy, reduces residual correlation, and enhances temporal generalization. The method is lightweight, model-agnostic, and compatible with any architecture that incorporates time covariates.
Submitted 1 June, 2025;
originally announced June 2025.
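As a rough illustration of the idea in the entry above, the following is a minimal sketch of standard exact Dynamic Mode Decomposition used to build time features. It is not the authors' implementation: the function name `dmd_time_features`, the `rank` parameter, and the choice to stack real and imaginary parts as features are all assumptions made for this example.

```python
import numpy as np

def dmd_time_features(X, rank):
    """Return DMD eigenvalues and real-valued time features.

    X is a (n_series, n_steps) array of observations; `rank` is the SVD
    truncation rank. Both names are illustrative, not taken from the paper.
    """
    X1, X2 = X[:, :-1], X[:, 1:]                  # snapshot pairs (x_t, x_{t+1})
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    # Reduced linear operator A_tilde = U^H X2 V S^{-1}
    A_tilde = (U.conj().T @ X2 @ Vh.conj().T) / s
    eigvals = np.linalg.eigvals(A_tilde)
    # Each eigenvalue lambda defines a temporal mode lambda**t
    t = np.arange(X.shape[1])
    dynamics = eigvals[None, :] ** t[:, None]     # (n_steps, rank), complex
    features = np.concatenate([dynamics.real, dynamics.imag], axis=1)
    return eigvals, features
```

In this sketch the period of each mode can be read off as 2π divided by the absolute argument of its eigenvalue, so no explicit timestamps are required to recover a seasonal cycle from the data.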
-
Aggregation Buffer: Revisiting DropEdge with a New Parameter Block
Authors:
Dooho Lee,
Myeong Kong,
Sagad Hamid,
Cheonwoo Lee,
Jaemin Yoo
Abstract:
We revisit DropEdge, a data augmentation technique for GNNs which randomly removes edges to expose diverse graph structures during training. While being a promising approach to effectively reduce overfitting on specific connections in the graph, we observe that its potential performance gain in supervised learning tasks is significantly limited. To understand why, we provide a theoretical analysis showing that the limited performance of DropEdge comes from the fundamental limitation that exists in many GNN architectures. Based on this analysis, we propose Aggregation Buffer, a parameter block specifically designed to improve the robustness of GNNs by addressing the limitation of DropEdge. Our method is compatible with any GNN model, and shows consistent performance improvements on multiple datasets. Moreover, our method effectively addresses well-known problems such as degree bias or structural disparity as a unifying solution. Code and datasets are available at https://github.com/dooho00/agg-buffer.
Submitted 27 May, 2025;
originally announced May 2025.
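For readers unfamiliar with the augmentation revisited in the entry above, here is a minimal sketch of random edge removal. It assumes an edge list stored as a `(2, E)` integer array and drops each edge independently with a fixed probability, which is one common variant; the function name and parameters are hypothetical, and the Aggregation Buffer block itself is not shown.

```python
import numpy as np

def drop_edge(edge_index, drop_rate, rng):
    """Randomly remove edges from a graph for one training step.

    edge_index: (2, E) integer array of (source, target) pairs.
    drop_rate:  probability of removing each edge independently.
    rng:        a numpy Generator, e.g. np.random.default_rng(0).
    """
    # Keep an edge when its uniform draw in [0, 1) is at least drop_rate
    keep = rng.random(edge_index.shape[1]) >= drop_rate
    return edge_index[:, keep]
```

A training loop would typically call this once per epoch so that the model sees a different subgraph each time, while evaluation uses the full edge list.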
-
Large Language Models for Data Synthesis
Authors:
Yihong Tang,
Menglin Kong,
Lijun Sun
Abstract:
Generating synthetic data that faithfully captures the statistical structure of real-world distributions is a fundamental challenge in data modeling. Classical approaches often depend on strong parametric assumptions or manual structural design and struggle in high-dimensional or heterogeneous domains. Recent progress in Large Language Models (LLMs) reveals their potential as flexible, high-dimensional priors over real-world distributions. However, when applied to data synthesis, standard LLM-based sampling is inefficient, constrained by fixed context limits, and fails to ensure statistical alignment. Given this, we introduce LLMSynthor, a general framework for data synthesis that transforms LLMs into structure-aware simulators guided by distributional feedback. LLMSynthor treats the LLM as a nonparametric copula simulator for modeling high-order dependencies and introduces LLM Proposal Sampling to generate grounded proposal distributions that improve sampling efficiency without requiring rejection. By minimizing discrepancies in the summary statistics space, the iterative synthesis loop aligns real and synthetic data while gradually uncovering and refining the latent generative structure. We evaluate LLMSynthor in both controlled and real-world settings using heterogeneous datasets in privacy-sensitive domains (e.g., e-commerce, population, and mobility) that encompass both structured and unstructured formats. The synthetic data produced by LLMSynthor shows high statistical fidelity, practical utility, and cross-data adaptability, positioning it as a valuable tool across economics, social science, urban studies, and beyond.
Submitted 20 May, 2025;
originally announced May 2025.
-
Distilling Multi-view Diffusion Models into 3D Generators
Authors:
Hao Qin,
Luyuan Chen,
Ming Kong,
Mengxu Lu,
Qiang Zhu
Abstract:
We introduce DD3G, a formulation that Distills a multi-view Diffusion model (MV-DM) into a 3D Generator using Gaussian splatting. DD3G compresses and integrates extensive visual and spatial geometric knowledge from the MV-DM by simulating its ordinary differential equation (ODE) trajectory, ensuring the distilled generator generalizes better than those trained solely on 3D data. Unlike previous amortized optimization approaches, we align the MV-DM and 3D generator representation spaces to transfer the teacher's probabilistic flow to the student, thus avoiding inconsistencies in optimization objectives caused by probabilistic sampling. The introduction of probabilistic flow and the coupling of various attributes in 3D Gaussians introduce challenges in the generation process. To tackle this, we propose PEPD, a generator consisting of Pattern Extraction and Progressive Decoding phases, which enables efficient fusion of probabilistic flow and converts a single image into 3D Gaussians within 0.06 seconds. Furthermore, to reduce knowledge loss and overcome sparse-view supervision, we design a joint optimization objective that ensures the quality of generated samples through explicit supervision and implicit verification. Leveraging existing 2D generation models, we compile 120k high-quality RGBA images for distillation. Experiments on synthetic and public datasets demonstrate the effectiveness of our method. Our project is available at: https://qinbaigao.github.io/DD3G_project/
Submitted 2 April, 2025; v1 submitted 1 April, 2025;
originally announced April 2025.
-
Innovative Automated Stretch Elastic Waistband Sewing Machine for Garment Manufacturing
Authors:
Prof Dr Ray Wai Man Kong
Abstract:
This applied research on the development of the Automated Stretch Elastic Waistband Sewing Machine represents a significant advancement in garment manufacturing, addressing the industry's need for increased efficiency, precision, and adaptability. This machine integrates innovative features such as a sensor-based automatic waistband expansion system, synchronized sewing speed and rolling wheel speed, and a differential feed top-loading mechanism. These enhancements streamline the sewing process, reduce manual intervention, and ensure consistent product quality. The machine's design incorporates both 3-wheel and 2-wheel rolling systems, each optimized for different elastic band dimensions and elongation factors. The 3-wheel rolling system accommodates a larger maximum boundary, while the 2-wheel rolling system offers a tighter operational range, providing flexibility to meet diverse manufacturing requirements. The Automated Stretch Elastic Waistband Sewing Machine is designed to control the pulling-apart force so that the elastic waistband does not break. It sets a new standard for quality and innovation, empowering manufacturers to meet the demands of a competitive market with precision and ease.
Submitted 24 March, 2025;
originally announced March 2025.
-
A Mixed-Integer Linear Programming (MILP) for Garment Line Balancing
Authors:
Ray Wai Man Kong,
Ding Ning,
Theodore Ho Tin Kong
Abstract:
This applied research article explores the application of Mixed-Integer Linear Programming (MILP) to address line-balancing challenges in the garment industry, focusing on optimizing production processes under multiple constraints. By integrating MILP with Lean Methodology principles, the study demonstrates significant improvements in operational efficiency and cost-effectiveness. The case study, conducted in collaboration with Prof Dr Ray WM Kong, highlights the successful implementation of MILP using IBM CPLEX Studio to optimize production order quantities across online and offline operations. The results reveal a remarkable reduction in labour costs, exceeding 50%, while effectively managing resource capacity and demand constraints. This study not only validates the theoretical underpinnings of MILP in resolving line-balancing issues but also underscores its practical applicability in modernizing garment production. The findings contribute valuable insights into the potential of advanced optimization techniques to enhance competitiveness and sustainability in the garment industry. This abstract succinctly captures the essence of the research, emphasizing the methodology, results, and significance of the study.
Submitted 21 February, 2025;
originally announced February 2025.
-
Exploiting Epistemic Uncertainty in Cold-Start Recommendation Systems
Authors:
Yang Xiang,
Li Fan,
Chenke Yin,
Menglin Kong,
Chengtao Ji
Abstract:
The cold-start problem remains a significant challenge in recommendation systems based on generative models. Current methods primarily focus on enriching embeddings or inputs by gathering more data, often overlooking the effectiveness of how existing training knowledge is utilized. This inefficiency can lead to missed opportunities for improving cold-start recommendations. To address this, we propose the use of epistemic uncertainty, which reflects a lack of certainty about the optimal model, as a tool to measure and enhance the efficiency with which a recommendation system leverages available knowledge. By considering epistemic uncertainty as a reducible component of overall uncertainty, we introduce a new approach to refine model performance. The effectiveness of this approach is validated through extensive offline experiments on publicly available datasets, demonstrating its superior performance and robustness in tackling the cold-start problem.
Submitted 29 April, 2025; v1 submitted 22 February, 2025;
originally announced February 2025.
-
Online Clustering of Dueling Bandits
Authors:
Zhiyong Wang,
Jiahang Sun,
Mingze Kong,
Jize Xie,
Qinghua Hu,
John C. S. Lui,
Zhongxiang Dai
Abstract:
The contextual multi-armed bandit (MAB) is a widely used framework for problems requiring sequential decision-making under uncertainty, such as recommendation systems. In applications involving a large number of users, the performance of contextual MAB can be significantly improved by facilitating collaboration among multiple users. This has been achieved by the clustering of bandits (CB) methods, which adaptively group the users into different clusters and achieve collaboration by allowing the users in the same cluster to share data. However, classical CB algorithms typically rely on numerical reward feedback, which may not be practical in certain real-world applications. For instance, in recommendation systems, it is more realistic and reliable to solicit preference feedback between pairs of recommended items rather than absolute rewards. To address this limitation, we introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback. We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions. Both algorithms are supported by rigorous theoretical analyses, demonstrating that user collaboration leads to improved regret bounds. Extensive empirical evaluations on synthetic and real-world datasets further validate the effectiveness of our methods, establishing their potential in real-world applications involving multiple users with preference-based feedback.
Submitted 4 February, 2025;
originally announced February 2025.
-
Meta-Prompt Optimization for LLM-Based Sequential Decision Making
Authors:
Mingze Kong,
Zhiyong Wang,
Yao Shu,
Zhongxiang Dai
Abstract:
Large language models (LLMs) have recently been employed as agents to solve sequential decision-making tasks such as Bayesian optimization and multi-armed bandits (MAB). These works usually adopt an LLM for sequential action selection by providing it with a fixed, manually designed meta-prompt. However, numerous previous works have found that the prompt has a significant impact on the performance of the LLM, which calls for a method to automatically optimize the meta-prompt for LLM-based agents. Unfortunately, the non-stationarity in the reward observations during LLM-based sequential decision-making makes meta-prompt optimization highly challenging. To address this challenge, we draw inspiration from adversarial bandit algorithms, which are inherently capable of handling non-stationary reward observations. Building on this foundation, we propose our EXPonential-weight algorithm for prompt Optimization (EXPO) to automatically optimize the task description and meta-instruction in the meta-prompt for LLM-based agents. We also extend EXPO to additionally optimize the exemplars (i.e., history of interactions) in the meta-prompt to further enhance the performance, hence introducing our EXPO-ES algorithm. We use extensive experiments to show that our algorithms significantly improve the performance of LLM-based sequential decision-making.
Submitted 2 February, 2025;
originally announced February 2025.
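The entry above builds on adversarial bandit algorithms with exponential weights. The sketch below shows the textbook EXP3 update over a fixed set of arms, where an arm could stand for a candidate meta-prompt. It is not the EXPO algorithm from the paper; `exp3`, `reward_fn`, and `gamma` are names chosen for this example, and rewards are assumed to lie in [0, 1].

```python
import numpy as np

def exp3(reward_fn, n_arms, n_rounds, gamma, rng):
    """Run textbook EXP3 and return the final sampling distribution.

    reward_fn(arm, t) must return a reward in [0, 1]; it may change over
    time, which is the non-stationary setting EXP3 is designed for.
    """
    weights = np.ones(n_arms)
    for t in range(n_rounds):
        # Mix the weight-proportional distribution with uniform exploration
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(arm, t)
        # Importance-weighted estimate keeps the update unbiased
        estimate = reward / probs[arm]
        weights[arm] *= np.exp(gamma * estimate / n_arms)
        weights /= weights.max()   # rescale only; probabilities are unchanged
    return (1 - gamma) * weights / weights.sum() + gamma / n_arms
```

Because only the pulled arm is updated and its reward is divided by its sampling probability, the method needs no assumption that rewards are drawn from a fixed distribution.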
-
Line Balancing in the Modern Garment Industry
Authors:
Ray Wai Man Kong,
Ding Ning,
Theodore Ho Tin Kong
Abstract:
This article presents applied research on line balancing within the modern garment industry, focusing on the significant impact of intelligent hanger systems and hanger lines on the stitching process, in line with Lean Methodology principles for garment modernization. Without the implementation of line balancing technology, the garment manufacturing process using hanger systems cannot improve output rates. The case study demonstrates that implementing intelligent line balancing in a straightforward practical setup facilitates lean practices combined with a digitalization system and automation. This approach illustrates how to enhance output and reduce accumulated work in progress.
Submitted 13 February, 2025; v1 submitted 1 February, 2025;
originally announced February 2025.
-
GeomGS: LiDAR-Guided Geometry-Aware Gaussian Splatting for Robot Localization
Authors:
Jaewon Lee,
Mangyu Kong,
Minseong Park,
Euntai Kim
Abstract:
Mapping and localization are crucial problems in robotics and autonomous driving. Recent advances in 3D Gaussian Splatting (3DGS) have enabled precise 3D mapping and scene understanding by rendering photo-realistic images. However, existing 3DGS methods often struggle to accurately reconstruct a 3D map that reflects the actual scale and geometry of the real world, which degrades localization performance. To address these limitations, we propose a novel 3DGS method called Geometry-Aware Gaussian Splatting (GeomGS). This method fully integrates LiDAR data into 3D Gaussian primitives via a probabilistic approach, as opposed to approaches that only use LiDAR as initial points or introduce simple constraints for Gaussian points. To this end, we introduce a Geometric Confidence Score (GCS), which identifies the structural reliability of each Gaussian point. The GCS is optimized simultaneously with Gaussians under probabilistic distance constraints to construct a precise structure. Furthermore, we propose a novel localization method that fully utilizes both the geometric and photometric properties of GeomGS. Our GeomGS demonstrates state-of-the-art geometric and localization performance across several benchmarks, while also improving photometric performance.
Submitted 23 January, 2025;
originally announced January 2025.
-
Citation Structural Diversity: A Novel and Concise Metric Combining Structure and Semantics for Literature Evaluation
Authors:
Mingyue Kong,
Yinglong Zhang,
Likun Sheng,
Kaifeng Hong
Abstract:
As academic research becomes increasingly diverse, traditional literature evaluation methods face significant limitations, particularly in capturing the complexity of academic dissemination and the multidimensional impacts of literature. To address these challenges, this paper introduces a novel literature evaluation model of citation structural diversity, with a focus on assessing its feasibility as an evaluation metric. By refining the citation network and incorporating both citation structural features and semantic information, the study examines the influence of the proposed model of citation structural diversity on citation volume and long-term academic impact. The findings reveal that literature with higher citation structural diversity demonstrates notable advantages in both citation frequency and sustained academic influence. Through data grouping and a decade-long citation trend analysis, the potential application of this model in literature evaluation is further validated. This research offers a fresh perspective on optimizing literature evaluation methods and emphasizes the distinct advantages of citation structural diversity in measuring interdisciplinarity.
Submitted 4 January, 2025;
originally announced January 2025.
-
Design a New Pulling Gear for the Automated Pant Bottom Hem Sewing Machine
Authors:
Ray Wai Man Kong,
Theodore Ho Tin Kong,
Miao Yi,
Zerui Zhang
Abstract:
Automated machinery design for garment manufacturing is essential for improving productivity, consistency, and quality. This paper focuses on the development of a new pulling gear for automated pant bottom hem sewing machines. Traditionally, these machines require manual intervention to guide the bottom hem sewing process, which often leads to inconsistent stitch quality and alignment. While twin-needle sewing machines can create twin lines for the bottom hem, they typically lack sufficient pulling force to adequately handle the fabric of the pants' bottom hem. The innovative design of the pulling gear aims to address this issue by providing the necessary pulling force for the bottom hem of eyelet pants. The research and design discussed in this article seek to solve technical challenges, eliminate the need for skilled manual operators, and enhance overall productivity. This improvement ensures smooth and precise feeding of fabric pieces in the automated twin-needle sewing machine, ultimately improving the consistency and quality of the stitching. By integrating this innovation, garment manufacturers can boost productivity, reduce reliance on skilled manual labour, and optimize the output of the production process, thereby reaping the benefits of automation in the garment manufacturing industry.
Submitted 18 November, 2024;
originally announced November 2024.
-
DGS-SLAM: Gaussian Splatting SLAM in Dynamic Environment
Authors:
Mangyu Kong,
Jaewon Lee,
Seongwon Lee,
Euntai Kim
Abstract:
We introduce Dynamic Gaussian Splatting SLAM (DGS-SLAM), the first dynamic SLAM framework built on the foundation of Gaussian Splatting. While recent advancements in dense SLAM have leveraged Gaussian Splatting to enhance scene representation, most approaches assume a static environment, making them vulnerable to photometric and geometric inconsistencies caused by dynamic objects. To address these challenges, we integrate Gaussian Splatting SLAM with a robust filtering process to handle dynamic objects throughout the entire pipeline, including Gaussian insertion and keyframe selection. Within this framework, to further improve the accuracy of dynamic object removal, we introduce a robust mask generation method that enforces photometric consistency across keyframes, reducing noise from inaccurate segmentation and artifacts such as shadows. Additionally, we propose the loop-aware window selection mechanism, which utilizes unique keyframe IDs of 3D Gaussians to detect loops between the current and past frames, facilitating joint optimization of the current camera poses and the Gaussian map. DGS-SLAM achieves state-of-the-art performance in both camera tracking and novel view synthesis on various dynamic SLAM benchmarks, proving its effectiveness in handling real-world dynamic scenes.
Submitted 16 November, 2024;
originally announced November 2024.
-
Lean Methodology for Garment Modernization
Authors:
Ray Wai Man Kong,
Theodore Ho Tin Kong,
Tianxu Huang
Abstract:
Lean Methodology for Garment Modernization. This article presents the lean methodology for modernizing garment manufacturing, focusing on lean thinking, lean practices, automation development, VSM, and CRP, and how to integrate them effectively. While isolated automation of specific operations can improve efficiency and reduce cycle time, it does not necessarily enhance overall garment output and efficiency. To achieve these broader improvements, it is essential to consider the entire production line and process using VSM and CRP to optimize production and center balance. This approach can increase efficiency, and reduce manufacturing costs, labor time, and lead time, ultimately adding value to the company and factory.
Submitted 10 October, 2024; v1 submitted 10 October, 2024;
originally announced October 2024.
-
Design and Experimental Study of Vacuum Suction Grabbing Technology to Grasp Fabric Piece
Authors:
Ray Wai Man Kong,
Mingyi Liu,
Theodore Ho Tin Kong
Abstract:
Vacuum Suction Grabbing Technology. The primary objective of this study was to design the grabbing technique used to determine the vacuum suction gripper and its design parameters for the pocket welting operation in apparel manufacturing. It presents the application of vacuum suction in grabbing technology, a technique that has revolutionized the handling and manipulation of various fabric materials across the garment industry. Vacuum suction, being non-intrusive and non-invasive, offers several advantages compared to traditional grabbing methods. It is particularly useful in scenarios where soft woven fabric and air-impermeable fabric items need to be handled with utmost care. The paper delves into the working principles of vacuum suction, its various components, and the underlying physics involved. Furthermore, it explores the various applications of vacuum suction in garment industry automation. The paper also highlights the challenges and limitations of vacuum suction technology and suggests potential areas for further research and development.
Submitted 8 October, 2024; v1 submitted 18 August, 2024;
originally announced August 2024.
-
Fast Global Localization on Neural Radiance Field
Authors:
Mangyu Kong,
Seongwon Lee,
Jaewon Lee,
Euntai Kim
Abstract:
Neural Radiance Fields (NeRF) presented a novel way to represent scenes, allowing for high-quality 3D reconstruction from 2D images. Following its remarkable achievements, global localization within NeRF maps is an essential task for enabling a wide range of applications. Recently, Loc-NeRF demonstrated a localization approach that combines traditional Monte Carlo Localization with NeRF, showing promising results for using NeRF as an environment map. However, despite its advancements, Loc-NeRF encounters the challenge of a time-intensive ray rendering process, which can be a significant limitation in practical applications. To address this issue, we introduce Fast Loc-NeRF, which leverages a coarse-to-fine approach to enable more efficient and accurate NeRF map-based global localization. Specifically, Fast Loc-NeRF matches rendered pixels and observed images at multiple resolutions, from low to high. As a result, it speeds up the costly particle update process while maintaining precise localization results. Additionally, to reject abnormal particles, we propose particle rejection weighting, which estimates the uncertainty of particles by exploiting NeRF's characteristics and considers it in the particle weighting process. Our Fast Loc-NeRF sets new state-of-the-art localization performance on several benchmarks, demonstrating its accuracy and efficiency.
Submitted 14 March, 2025; v1 submitted 17 June, 2024;
originally announced June 2024.
-
Probabilistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation
Authors:
Xianzhou Zeng,
Hao Qin,
Ming Kong,
Luyuan Chen,
Qiang Zhu
Abstract:
The accuracy and robustness of 3D human pose estimation (HPE) are limited by 2D pose detection errors and the ill-posed nature of 2D-to-3D lifting, which have drawn great attention to Multi-Hypothesis HPE (MH-HPE) research. Most existing MH-HPE methods are based on generative models, which are computationally expensive and difficult to train. In this study, we propose a Probabilistic Restoration 3D Human Pose Estimation framework (PRPose) that can be integrated with any lightweight single-hypothesis model. Specifically, PRPose employs a weakly supervised approach to fit the hidden probability distribution of the 2D-to-3D lifting process in the Single-Hypothesis HPE model and then reverse-maps the distribution to the 2D pose input through an adaptive noise sampling strategy to generate reasonable multi-hypothesis samples effectively. Extensive experiments on 3D HPE benchmarks (Human3.6M and MPI-INF-3DHP) highlight the effectiveness and efficiency of PRPose. Code is available at: https://github.com/xzhouzeng/PRPose.
Submitted 3 May, 2024;
originally announced May 2024.
-
False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy
Authors:
Jiyong Oh,
Junhaeng Lee,
Woongchan Byun,
Minsang Kong,
Sang Hun Lee
Abstract:
Recent studies have focused on enhancing the performance of 3D object detection models. Among various approaches, ground-truth sampling has been proposed as an augmentation technique to address the challenges posed by limited ground-truth data. However, an inherent issue with ground-truth sampling is its tendency to increase false positives. Therefore, this study aims to overcome the limitations of ground-truth sampling and improve the performance of 3D object detection models by developing a new augmentation technique called false-positive sampling. False-positive sampling involves retraining the model using point clouds that are identified as false positives in the model's predictions. We propose an algorithm that utilizes both ground-truth and false-positive sampling and an algorithm for building the false-positive sample database. Additionally, we analyze the principles behind the performance enhancement due to false-positive sampling. Our experiments demonstrate that models utilizing false-positive sampling show a reduction in false positives and exhibit improved object detection performance. On the KITTI and Waymo Open datasets, models with false-positive sampling surpass the baseline models by a large margin.
Submitted 19 May, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
Landslide Surface Displacement Prediction Based on VSXC-LSTM Algorithm
Authors:
Menglin Kong,
Ruichen Li,
Fan Liu,
Xingquan Li,
Juan Cheng,
Muzhou Hou,
Cong Cao
Abstract:
Landslides are natural disasters that can easily threaten local ecology and people's lives and property. In this paper, we conduct modelling research on real unidirectional surface displacement data of recent landslides in the research area and propose a time series prediction framework named VMD-SegSigmoid-XGBoost-ClusterLSTM (VSXC-LSTM) based on variational mode decomposition, which can predict the landslide surface displacement more accurately. The model performs well on the test set. Except for the random item subsequence, which is hard to fit, the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of the trend item subsequence and the periodic item subsequence are both less than 0.1, and the RMSE is as low as 0.006 for the periodic item prediction module based on XGBoost (accepted at ICANN 2023).
Submitted 24 July, 2023;
originally announced July 2023.
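The two error metrics quoted in the abstract above can be written out as a minimal sketch. It assumes the standard textbook definitions and reports MAPE as a fraction rather than a percentage (which seems consistent with the "less than 0.1" figure, though the abstract does not say); the function names and sample values are illustrative and not taken from the paper.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%).

    Assumes no observed value is zero, since each error is divided by it.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return sum(ratios) / len(ratios)


# Illustrative displacement values only (not data from the paper).
observed = [2.0, 4.0]
predicted = [1.0, 5.0]
print(rmse(observed, predicted), mape(observed, predicted))
```

Because MAPE divides by the observed value, it is undefined when an observation is zero, which is one reason the two metrics are usually reported together.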
-
DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning
Authors:
Menglin Kong,
Ri Su,
Shaojie Zhao,
Muzhou Hou
Abstract:
Recommendation algorithms based on multi-task learning (MTL) are the major method for Internet operators to understand users and predict their behaviors in multi-behavior platform scenarios. Task correlation is an important consideration in MTL; traditional models use shared-bottom models and gating experts to realize shared representation learning and information differentiation. However, the relationships between real-world tasks are often more complex, and existing methods do not handle information sharing properly. In this paper, we propose a Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously. DEPHN constructs the experts at the bottom of the model by using different feature interaction methods to improve the generalization ability of the shared information flow. With regard to the model's ability to differentiate between the information flows of different tasks, DEPHN uses feature explicit mapping and a virtual gradient coefficient for expert gating during the training process, and adaptively adjusts the learning intensity of the gated unit by considering the difference of gating values and task correlation. Extensive experiments on artificial and real-world datasets demonstrate that our proposed method can capture task correlation in complex situations and achieve better performance than baseline models (accepted at IJCNN 2023).
Submitted 24 July, 2023;
originally announced July 2023.
-
FaFCNN: A General Disease Classification Framework Based on Feature Fusion Neural Networks
Authors:
Menglin Kong,
Shaojie Zhao,
Juan Cheng,
Xingquan Li,
Ri Su,
Muzhou Hou,
Cong Cao
Abstract:
There are two fundamental problems in applying deep learning/machine learning methods to disease classification tasks: one is the insufficient number and poor quality of training samples; the other is how to effectively fuse features from multiple sources and thus train robust classification models. To address these problems, inspired by the way humans learn knowledge, we propose the Feature-aware Fusion Correlation Neural Network (FaFCNN), which introduces a feature-aware interaction module and a feature alignment module based on domain adversarial learning. This is a general framework for disease classification, and FaFCNN improves the way existing methods obtain sample correlation features. The experimental results show that training with augmented features obtained from a pre-trained gradient boosting decision tree yields larger performance gains than random-forest-based methods. On the low-quality dataset with a large amount of missing data in our setup, FaFCNN consistently obtains the best performance compared to competitive baselines. In addition, extensive experiments demonstrate the robustness of the proposed method and the effectiveness of each component of the model (accepted at IEEE SMC 2023).
Submitted 24 July, 2023;
originally announced July 2023.
-
DADIN: Domain Adversarial Deep Interest Network for Cross Domain Recommender Systems
Authors:
Menglin Kong,
Muzhou Hou,
Shaojie Zhao,
Feng Liu,
Ri Su,
Yinghao Chen
Abstract:
Click-Through Rate (CTR) prediction is one of the main tasks of a recommendation system: it estimates how likely a user is to click on different items in order to produce the recommendation results. Cross-domain CTR prediction models have been proposed to overcome the problems of data sparsity, long-tail distribution of user-item interactions, and cold start of items or users. To make knowledge transfer from the source domain to the target domain smoother, an innovative deep learning cross-domain CTR prediction model, Domain Adversarial Deep Interest Network (DADIN), is proposed to convert the cross-domain recommendation task into a domain adaptation problem. The joint distribution alignment of the two domains is realized by introducing domain-agnostic layers and a specially designed loss, which is optimized together with the CTR prediction loss through adversarial training. The Area Under Curve (AUC) of DADIN is 0.08% higher than the most competitive baseline on the Huawei dataset and 0.71% higher than its competitors on the Amazon dataset, achieving state-of-the-art results on two real datasets. The ablation study shows that introducing the adversarial method leads to AUC improvements of 2.34% on the Huawei dataset and 16.67% on the Amazon dataset.
Submitted 19 May, 2023;
originally announced May 2023.
-
A Unified Framework for Contrastive Learning from a Perspective of Affinity Matrix
Authors:
Wenbin Li,
Meihao Kong,
Xuesong Yang,
Lei Wang,
Jing Huo,
Yang Gao,
Jiebo Luo
Abstract:
In recent years, a variety of contrastive-learning-based unsupervised visual representation learning methods have been designed and have achieved great success in many visual tasks. Generally, these methods can be roughly classified into four categories: (1) standard contrastive methods with an InfoNCE-like loss, such as MoCo and SimCLR; (2) non-contrastive methods with only positive pairs, such as BYOL and SimSiam; (3) whitening-regularization-based methods, such as W-MSE and VICReg; and (4) consistency-regularization-based methods, such as CO2. In this study, we present a new unified contrastive learning representation framework (named UniCLR) suitable for all four kinds of methods above, from a novel perspective of a basic affinity matrix. Moreover, three variants, i.e., SimAffinity, SimWhitening and SimTrace, are presented based on UniCLR. In addition, a simple symmetric loss, as a new consistency regularization term, is proposed based on this framework. By symmetrizing the affinity matrix, we can effectively accelerate the convergence of the training process. Extensive experiments have been conducted to show that (1) the proposed UniCLR framework can achieve results on par with or even better than the state of the art, (2) the proposed symmetric loss can significantly accelerate the convergence of models, and (3) SimTrace can avoid the mode collapse problem by maximizing the trace of a whitened affinity matrix without relying on asymmetry designs or stop-gradients.
Submitted 26 November, 2022;
originally announced November 2022.
-
Playing Lottery Tickets in Style Transfer Models
Authors:
Meihao Kong,
Jing Huo,
Wenbin Li,
Jing Wu,
Yu-Kun Lai,
Yang Gao
Abstract:
Style transfer has achieved great success and attracted a wide range of attention from both academic and industrial communities due to its flexible application scenarios. However, the dependence on a pretty large VGG-based autoencoder leads to existing style transfer models having high parameter complexities, which limits their applications on resource-constrained devices. Compared with many other tasks, the compression of style transfer models has been less explored. Recently, the lottery ticket hypothesis (LTH) has shown great potential in finding extremely sparse matching subnetworks which can achieve on par or even better performance than the original full networks when trained in isolation. In this work, we for the first time perform an empirical study to verify whether such trainable matching subnetworks also exist in style transfer models. Specifically, we take two most popular style transfer models, i.e., AdaIN and SANet, as the main testbeds, which represent global and local transformation based style transfer methods respectively. We carry out extensive experiments and comprehensive analysis, and draw the following conclusions. (1) Compared with fixing the VGG encoder, style transfer models can benefit more from training the whole network together. (2) Using iterative magnitude pruning, we find the matching subnetworks at 89.2% sparsity in AdaIN and 73.7% sparsity in SANet, which demonstrates that style transfer models can play lottery tickets too. (3) The feature transformation module should also be pruned to obtain a much sparser model without affecting the existence and quality of the matching subnetworks. (4) Besides AdaIN and SANet, other models such as LST, MANet, AdaAttN and MCCNet can also play lottery tickets, which shows that LTH can be generalized to various style transfer models.
Submitted 10 April, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
String Abstractions for Qubit Mapping
Authors:
Blake Gerard,
Martin Kong
Abstract:
One of the key compilation steps in Quantum Computing (QC) is to determine an initial logical to physical mapping of the qubits used in a quantum circuit. The impact of the starting qubit layout can vastly affect later scheduling and placement decisions of QASM operations, yielding higher values on critical performance metrics (gate count and circuit depth) as a result of quantum compilers introducing SWAP operations to meet the underlying physical neighboring and connectivity constraints of the quantum device.
In this paper we introduce a novel qubit mapping approach, string-based qubit mapping. The key insight is to prioritize the mapping of logical qubits that appear in the longest repeating non-overlapping substrings of qubit pairs accessed. This mapping method is complemented by allocating qubits according to their global frequency usage. We evaluate and compare our new mapping scheme against two quantum compilers (QISKIT and TKET) and two device topologies, the IBM Manhattan (65 qubits) and the IBM Kolkata (27 qubits). Our results demonstrate that combining both mapping mechanisms often achieves better results than either one individually, allowing us to best QISKIT and TKET baselines, yielding between 13% and 17% average improvement in several group sizes, up to 32% circuit depth reduction and 63% gate volume improvement.
Submitted 5 November, 2021;
originally announced November 2021.
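The string-based idea in the abstract above can be illustrated with a small sketch. It uses the classic dynamic program for the longest repeated non-overlapping substring, applied to a sequence of qubit pairs, and then ranks qubits from that substring first and the rest by global usage frequency. The function names, the tie-breaking rule, and the sample circuit are assumptions made for illustration; the paper's actual algorithm may differ.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def longest_repeated_nonoverlapping(seq: Sequence[Pair]) -> List[Pair]:
    """Longest contiguous run of pairs that occurs twice in seq without overlap."""
    n = len(seq)
    # dp[i][j]: length of the common run ending at positions i and j (1-based, i < j),
    # grown only while the two occurrences stay disjoint.
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if seq[i - 1] == seq[j - 1] and dp[i - 1][j - 1] < j - i:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return list(seq[best_end - best_len:best_end])


def priority_order(seq: Sequence[Pair]) -> List[int]:
    """Hypothetical ranking: qubits in the repeated run first, then by frequency."""
    core = {q for pair in longest_repeated_nonoverlapping(seq) for q in pair}
    freq = Counter(q for pair in seq for q in pair)
    return sorted(freq, key=lambda q: (q not in core, -freq[q], q))


# Illustrative sequence of two-qubit gate operands (not from the paper).
gates = [(0, 1), (1, 2), (2, 3), (0, 1), (1, 2), (4, 5)]
print(longest_repeated_nonoverlapping(gates))
print(priority_order(gates))
```

The dynamic program runs in quadratic time and space in the number of gates, which is fine for a sketch but would likely need a suffix-structure-based method for large circuits.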
-
Bach Style Music Authoring System based on Deep Learning
Authors:
Minghe Kong,
Lican Huang
Abstract:
With continuous improvements across the field of artificial intelligence, deep-learning-based artificial intelligence is gaining momentum in the field of music. The purpose of this paper is to design a Bach-style music authoring system based on deep learning. We use an LSTM neural network to train on serialized and standardized music feature data. Through repeated experiments, we find the optimal LSTM model, which can generate imitations of Bach's music. Finally, the generated music is comprehensively evaluated through online audition and a Turing test. The pieces produced by the music generation system built in this article are very close to the style of Bach's original music, and it is relatively difficult for ordinary people to distinguish music composed by Bach from music created by the AI.
Submitted 6 October, 2021;
originally announced October 2021.
-
Lachesis: Scalable Asynchronous BFT on DAG Streams
Authors:
Quan Nguyen,
Andre Cronje,
Michael Kong,
Egor Lysenko,
Alex Guzev
Abstract:
This paper consolidates the core technologies and key concepts of our novel Lachesis consensus protocol and Fantom Opera platform, which is permissionless, leaderless and EVM compatible.
We introduce our new protocol, so-called Lachesis, for distributed networks achieving Byzantine fault tolerance (BFT)~\cite{lachesis01}. Each node in Lachesis protocol operates on a local block DAG, namely \emph{OPERA DAG}. Aiming for a low time to finality (TTF) for transactions, our general model considers DAG streams of high speed but asynchronous events. We integrate Proof-of-Stake (PoS) into a DAG model in Lachesis protocol to improve performance and security. Our general model of trustless system leverages participants' stake as their validating power~\cite{stakedag}. Lachesis's consensus algorithm uses Lamport timestamps, graph layering and concurrent common knowledge to guarantee a consistent total ordering of event blocks and transactions. In addition, Lachesis protocol allows dynamic participation of new nodes into Opera network. Lachesis optimizes DAG storage and processing time by splitting local history into checkpoints (so-called epochs). We also propose a model to improve stake decentralization, and network safety and liveness ~\cite{stairdag}.
Built on our novel Lachesis protocol, Fantom's Opera platform is a public, leaderless, asynchronous BFT, layer-1 blockchain, with guaranteed deterministic finality. Hence, Lachesis protocol is suitable for distributed ledgers by leveraging asynchronous partially ordered sets with logical time ordering instead of blockchains. We also present our proofs in a model that can be applied to abstract asynchronous distributed systems.
Submitted 4 August, 2021;
originally announced August 2021.
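The abstract above names Lamport timestamps as one ingredient of the consensus algorithm. As background, here is a minimal sketch of a textbook Lamport logical clock; it is not Lachesis code, and the class and method names are illustrative assumptions.

```python
class LamportClock:
    """Textbook Lamport logical clock for one node (illustrative, not Lachesis)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance the clock for a local event and return the new timestamp."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending counts as an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Merge rule: jump past both the local clock and the sender's timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Two nodes exchanging messages.
a, b = LamportClock(), LamportClock()
a.tick()                      # a: 1
stamp = a.send()              # a: 2
b.receive(stamp)              # b: 3
b.tick()                      # b: 4
print(a.receive(b.send()))    # b sends at 5, so a jumps to 6
```

The merge rule guarantees that if one event causally precedes another, its timestamp is smaller. The converse does not hold, so a full ordering protocol needs additional structure beyond the clock alone.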
-
Trip-ROMA: Self-Supervised Learning with Triplets and Random Mappings
Authors:
Wenbin Li,
Xuesong Yang,
Meihao Kong,
Lei Wang,
Jing Huo,
Yang Gao,
Jiebo Luo
Abstract:
Contrastive self-supervised learning (SSL) methods, such as MoCo and SimCLR, have achieved great success in unsupervised visual representation learning. They rely on a large number of negative pairs and thus require either large memory banks or large batches. Some recent non-contrastive SSL methods, such as BYOL and SimSiam, attempt to discard negative pairs and have also shown remarkable performance. To avoid collapsed solutions caused by not using negative pairs, these methods require non-trivial asymmetry designs. However, in small data regimes, we can not obtain a sufficient number of negative pairs or effectively avoid the over-fitting problem when negatives are not used at all. To address this situation, we argue that negative pairs are still important but one is generally sufficient for each positive pair. We show that a simple Triplet-based loss (Trip) can achieve surprisingly good performance without requiring large batches or asymmetry designs. Moreover, to alleviate the over-fitting problem in small data regimes and further enhance the effect of Trip, we propose a simple plug-and-play RandOm MApping (ROMA) strategy by randomly mapping samples into other spaces and requiring these randomly projected samples to satisfy the same relationship indicated by the triplets. Integrating the triplet-based loss with random mapping, we obtain the proposed method Trip-ROMA. Extensive experiments, including unsupervised representation learning and unsupervised few-shot learning, have been conducted on ImageNet-1K and seven small datasets. They successfully demonstrate the effectiveness of Trip-ROMA and consistently show that ROMA can further effectively boost other SSL methods. Code is available at https://github.com/WenbinLee/Trip-ROMA.
Submitted 23 August, 2023; v1 submitted 21 July, 2021;
originally announced July 2021.
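To make the triplet idea in the abstract above concrete, the sketch below shows a standard margin-based triplet loss and a simple way to re-evaluate it under several random linear maps. Both are generic stand-ins: the exact form of the Trip loss and of the ROMA strategy is not given in the abstract, so the margin, the Gaussian projection matrices, and all function names here are assumptions.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard hinge triplet loss: pull the positive closer than the negative."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def random_matrix(dim_in: int, dim_out: int, rng: random.Random) -> List[List[float]]:
    """Random Gaussian linear map (an assumed stand-in for a random mapping)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix: List[List[float]], vec: Vector) -> List[float]:
    """Apply a linear map to a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def randomly_mapped_triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                                 n_maps: int = 4, dim_out: int = 8,
                                 margin: float = 1.0, seed: int = 0) -> float:
    """Average the triplet loss over several random projections of the same triplet."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_maps):
        m = random_matrix(len(anchor), dim_out, rng)
        total += triplet_loss(project(m, anchor), project(m, positive),
                              project(m, negative), margin)
    return total / n_maps


print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # satisfied triplet
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # violated triplet
```

Each triplet needs only one negative, which is the property the abstract emphasizes. The random-map variant asks the same ordering to hold in several projected spaces.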
-
Exploring the Impact of Affine Loop Transformations in Qubit Allocation
Authors:
Martin Kong
Abstract:
Most quantum compiler transformations and qubit allocation techniques to date are either peep-hole focused or rely on sliding windows that depend on a number of external parameters. Thus, global optimization criteria are still lacking. In this paper we explore the synergies and impact of affine loop transformations in the context of qubit allocation and mapping. With this goal in mind, we have implemented a domain specific language and source-to-source compiler for quantum circuits that can be directly described with affine relations. We conduct an extensive evaluation spanning 8 quantum circuits taken from the literature, 3 distinct coupling graphs, 4 affine transformations (including the Pluto dependence distance minimization and Feautrier's minimum latency algorithms), and 4 qubit allocators. Our results demonstrate that affine transformations using global optimization criteria can cooperate effectively in several scenarios with quantum qubit mapping algorithms to reduce the circuit depth, size and allocation time.
Submitted 22 October, 2020;
originally announced October 2020.
-
Deep Sequential Feature Learning in Clinical Image Classification of Infectious Keratitis
Authors:
Yesheng Xu,
Ming Kong,
Wenjia Xie,
Runping Duan,
Zhengqing Fang,
Yuxiao Lin,
Qiang Zhu,
Siliang Tang,
Fei Wu,
Yu-Feng Yao
Abstract:
Infectious keratitis is the most common entities of corneal diseases, in which pathogen grows in the cornea leading to inflammation and destruction of the corneal tissues. Infectious keratitis is a medical emergency, for which a rapid and accurate diagnosis is needed for speedy initiation of prompt and precise treatment to halt the disease progress and to limit the extent of corneal damage; otherwise it may develop sight-threatening and even eye-globe-threatening condition. In this paper, we propose a sequential-level deep learning model to effectively discriminate the distinction and subtlety of infectious corneal disease via the classification of clinical images. In this approach, we devise an appropriate mechanism to preserve the spatial structures of clinical images and disentangle the informative features for clinical image classification of infectious keratitis. In competition with 421 ophthalmologists, the performance of the proposed sequential-level deep model achieved 80.00% diagnostic accuracy, far better than the 49.27% diagnostic accuracy achieved by ophthalmologists over 120 test images.
Submitted 4 June, 2020;
originally announced June 2020.
-
OV: Validity-based Optimistic Smart Contracts
Authors:
Quan Nguyen,
Andre Cronje,
Michael Kong
Abstract:
Smart contract (SC) platforms form blocks of transactions into a chain and execute them via user-defined smart contracts. In conventional platforms like Bitcoin and Ethereum, the transactions within a block are executed \emph{sequentially} by the miner and are then validated \emph{sequentially} by the validators to reach consensus about the final state of the block.
To leverage advances in multicore hardware, this paper explores the next generation of smart contract platforms that enable concurrent execution of such contracts. Reasoning about the validity of object states is challenging in concurrent smart contracts. We examine a programming model to support \emph{optimistic} execution of SCTs. We introduce a novel programming language, called OV, and a Solidity API to ease the programming of optimistic smart contracts. The OV language, together with static checking, helps in reasoning about a crucial property of optimistically executed smart contracts -- the validity of object states in trustless systems.
Submitted 8 April, 2020;
originally announced April 2020.
-
Fast Stochastic Peer Selection in Proof-of-Stake Protocols
Authors:
Quan Nguyen,
Andre Cronje,
Michael Kong
Abstract:
The problem of peer selection, which randomly selects a peer from a set, is commonplace in Proof-of-Stake (PoS) protocols. In PoS, peers are chosen randomly with probability proportional to the amount of stake that they possess. This paper presents an approach that relates PoS peer selection to roulette-wheel selection, which is frequently used in genetic and evolutionary algorithms and in complex network modelling. In particular, we introduce the use of the stochastic acceptance algorithm [6] for fast peer selection. Roulette-wheel selection based on stochastic acceptance [6] achieves O(1) complexity, whereas search-based algorithms may take O(N) or O(log N) time in a network of N peers.
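As an illustration of the idea described in this abstract, the following is a minimal Python sketch of roulette-wheel selection by stochastic acceptance. It is not the authors' implementation: the function name `select_peer`, the `stakes` list, and the optional `max_stake` argument are hypothetical names chosen for this example. A candidate peer is drawn uniformly at random and accepted with probability equal to its stake divided by the largest stake; otherwise the draw is repeated.

```python
import random


def select_peer(stakes, max_stake=None, rng=random):
    """Return index i with probability stakes[i] / sum(stakes).

    Uses stochastic acceptance: draw a uniform candidate and accept it
    with probability stakes[i] / max_stake, retrying on rejection.
    `stakes` and `max_stake` are illustrative names, not from the paper.
    """
    if not stakes:
        raise ValueError("stakes must be non-empty")
    if max_stake is None:
        # O(N) once; a caller can cache this value to keep each draw cheap.
        max_stake = max(stakes)
    if max_stake <= 0:
        raise ValueError("at least one stake must be positive")

    n = len(stakes)
    while True:
        i = rng.randrange(n)  # uniform candidate peer
        # rng.random() is in [0, 1), so a zero-stake peer is never accepted
        # and a peer holding max_stake is always accepted.
        if rng.random() * max_stake < stakes[i]:
            return i


if __name__ == "__main__":
    stakes = [10, 30, 60]
    picks = [select_peer(stakes) for _ in range(10_000)]
    print([picks.count(i) / len(picks) for i in range(len(stakes))])
```

The expected number of draws per selection is the maximum stake divided by the mean stake, so the sketch behaves as a constant-time procedure only when the largest stake is not far above the average and the maximum is known in advance.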
Submitted 11 November, 2019;
originally announced November 2019.
-
StairDag: Cross-DAG Validation For Scalable BFT Consensus
Authors:
Quan Nguyen,
Andre Cronje,
Michael Kong,
Alex Kampa,
George Samman
Abstract:
This paper introduces a new consensus protocol, called \emph{StairDag}, for fast consensus in DAG-based trustless systems. In StairDag, we propose a new approach to creating a local block DAG, namely \emph{x-DAG} (cross-DAG), on each node. The \emph{StairDag} protocol is based on our Proof-of-Stake StakeDag framework \cite{stakedag}, which distinguishes participants into users and validators by their stake. Both users and validators can create and validate event blocks. Unlike StakeDag's DAG, x-DAG ensures that each new block has parent blocks from both users and validators to achieve more safety and liveness. Our protocol leverages a pool of validators to expose more validating power to new blocks for faster consensus in a leaderless asynchronous system. Further, our framework allows participants to join as observers or monitors, who can retrieve the DAG for post-validation but do not participate in on-chain validation.
Submitted 4 September, 2019; v1 submitted 29 August, 2019;
originally announced August 2019.
-
StakeDag: Stake-based Consensus For Scalable Trustless Systems
Authors:
Quan Nguyen,
Andre Cronje,
Michael Kong,
Alex Kampa,
George Samman
Abstract:
Trustless systems, such as those empowered by blockchain, provide trust in the system regardless of the trust of its participants, who may be honest or malicious. Proof-of-stake (PoS) protocols and DAG-based approaches have emerged as a better alternative to proof of work (PoW) for consensus. This paper introduces a new model, called \emph{StakeDag}, which aims for PoS consensus in a DAG-based trustless system. We address a general model of trustless systems in which participants are distinguished by their stake or trust: users and validators. Users are normal participants with no assumed trust, and validators are high-profile participants with established trust. We then propose a new family of stake-based consensus protocols $\mathfrak{S}$, operating on the DAG as in the Lachesis protocol~\cite{lachesis01}. Specifically, we propose a stake-based protocol $S_φ$ that leverages participants' stake as validating weights to achieve more secure distributed systems with practical Byzantine fault tolerance (pBFT) in a leaderless asynchronous Directed Acyclic Graph (DAG). We then present a general model of staking for asynchronous DAG-based distributed systems.
Submitted 5 July, 2019;
originally announced July 2019.
-
A Performance Vocabulary for Affine Loop Transformations
Authors:
Martin Kong,
Louis-Noël Pouchet
Abstract:
Modern polyhedral compilers excel at aggressively optimizing codes with static control parts, but the state of practice for finding high-performance polyhedral transformations, especially for different hardware targets, still largely involves auto-tuning. In this work we propose a novel polyhedral scheduling technique that aims to reduce the need for auto-tuning while allowing the construction of customizable and specific transformation strategies. We design constraints and objectives that model several crucial aspects of performance, such as stride optimization or the trade-off between parallelism and reuse, while taking into account important architectural features of the target machine. The developed set of objectives embodies a Performance Vocabulary for loop transformations. The goal is to use this vocabulary, consisting of performance idioms, to construct transformation recipes adapted to a number of program classes. We evaluate our work using the PolyBench/C benchmark suite and experimentally validate it against large optimization spaces generated with the Pluto compiler on a 10-core Intel Core-i9 (Skylake-X). Our results show that we can achieve comparable or superior performance to Pluto on the majority of benchmarks, without implementing tiling in the source code or using experimental auto-tuning.
Submitted 9 April, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
-
Vandal: A Scalable Security Analysis Framework for Smart Contracts
Authors:
Lexi Brent,
Anton Jurisevic,
Michael Kong,
Eric Liu,
Francois Gauthier,
Vincent Gramoli,
Ralph Holz,
Bernhard Scholz
Abstract:
The rise of modern blockchains has facilitated the emergence of smart contracts: autonomous programs that live and run on the blockchain. Smart contracts have seen a rapid climb to prominence, with applications predicted in law, business, commerce, and governance.
Smart contracts are commonly written in a high-level language such as Ethereum's Solidity, and translated to compact low-level bytecode for deployment on the blockchain. Once deployed, the bytecode is autonomously executed, usually by a Turing-complete virtual machine. As with all programs, smart contracts can be highly vulnerable to malicious attacks due to deficient programming methodologies, languages, and toolchains, including buggy compilers. At the same time, smart contracts are also high-value targets, often commanding large amounts of cryptocurrency. Hence, developers and auditors need security frameworks capable of analysing low-level bytecode to detect potential security vulnerabilities.
In this paper, we present Vandal: a security analysis framework for Ethereum smart contracts. Vandal consists of an analysis pipeline that converts low-level Ethereum Virtual Machine (EVM) bytecode to semantic logic relations. Users of the framework can express security analyses in a declarative fashion: a security analysis is expressed in a logic specification written in the Soufflé language. We conduct a large-scale empirical study for a set of common smart contract security vulnerabilities, and show the effectiveness and efficiency of Vandal. Vandal is both fast and robust, successfully analysing over 95% of all 141k unique contracts with an average runtime of 4.15 seconds, outperforming the current state-of-the-art tools (Oyente, EthIR, Mythril, and Rattle) under equivalent conditions.
Submitted 11 September, 2018;
originally announced September 2018.