-
Robust Optimal Task Planning to Maximize Battery Life
Authors:
Jiachen Li,
Chu Jian,
Feiyang Zhao,
Shihao Li,
Wei Li,
Dongmei Chen
Abstract:
This paper proposes a control-oriented optimization platform for autonomous mobile robots (AMRs), focusing on extending battery life while ensuring task completion. The requirement of fast AMR task planning while maintaining minimum battery state of charge, thus maximizing the battery life, yields a bilinear optimization problem. The McCormick envelope technique is proposed to linearize the bilinear term. A novel planning algorithm with relaxed constraints is also developed to handle parameter uncertainties robustly and efficiently. Simulation results are provided to demonstrate the utility of the proposed methods in reducing battery degradation while satisfying task completion requirements.
Submitted 12 June, 2025;
originally announced June 2025.
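The McCormick envelope named in the abstract above replaces a bilinear term w = x*y with four linear inequalities built from the variable bounds. The following is a minimal sketch of that standard relaxation, not the paper's own formulation; the function name `mccormick_bounds` and its parameters are hypothetical and chosen only for illustration.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Return (lower, upper) McCormick bounds on w = x * y at the point (x, y).

    Assumes x_lo <= x <= x_hi and y_lo <= y <= y_hi.
    """
    # Two linear under-estimators of x * y; the tighter one is the maximum.
    lower = max(
        x_lo * y + x * y_lo - x_lo * y_lo,
        x_hi * y + x * y_hi - x_hi * y_hi,
    )
    # Two linear over-estimators of x * y; the tighter one is the minimum.
    upper = min(
        x_hi * y + x * y_lo - x_hi * y_lo,
        x_lo * y + x * y_hi - x_lo * y_hi,
    )
    return lower, upper


# Usage: with x in [0, 2] and y in [1, 3], the true product at (1, 2) is 2.
print(mccormick_bounds(1, 2, 0, 2, 1, 3))  # (1, 3)
```

In a linear program, the same four inequalities would be imposed on a new variable w in place of the product. The relaxation is exact at the corners of the box and loosest in its interior, so tighter variable bounds give a tighter envelope.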
-
PO-GVINS: Tightly Coupled GNSS-Visual-Inertial Integration with Pose-Only Representation
Authors:
Zhuo Xu,
Feng Zhu,
Zihang Zhang,
Chang Jian,
Jiarui Lv,
Yuantai Zhang,
Xiaohong Zhang
Abstract:
Accurate and reliable positioning is crucial for perception, decision-making, and other high-level applications in autonomous driving, unmanned aerial vehicles, and intelligent robots. Given the inherent limitations of standalone sensors, integrating heterogeneous sensors with complementary capabilities is one of the most effective approaches to achieving this goal. In this paper, we propose a filtering-based, tightly coupled global navigation satellite system (GNSS)-visual-inertial positioning framework with a pose-only formulation applied to the visual-inertial system (VINS), termed PO-GVINS. Specifically, the multiple-view imaging used in current VINS requires a prior on the 3D features and then jointly estimates camera poses and 3D feature positions, which inevitably introduces feature linearization error and leads to dimensional explosion. In contrast, the pose-only (PO) formulation, which has been shown to be equivalent to multiple-view imaging and has been applied in visual reconstruction, represents feature depth using two camera poses, so the 3D feature position is removed from the state vector, avoiding the aforementioned difficulties. Inspired by this, we first apply the PO formulation in our VINS, i.e., PO-VINS. GNSS raw measurements are then incorporated with integer ambiguity resolved to achieve accurate and drift-free estimation. Extensive experiments demonstrate that the proposed PO-VINS significantly outperforms the multi-state constrained Kalman filter (MSCKF). By incorporating GNSS measurements, PO-GVINS achieves accurate, drift-free state estimation, making it a robust solution for positioning in challenging environments.
Submitted 16 January, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
Unlocking TriLevel Learning with Level-Wise Zeroth Order Constraints: Distributed Algorithms and Provable Non-Asymptotic Convergence
Authors:
Yang Jiao,
Kai Yang,
Chengtao Jian
Abstract:
Trilevel learning (TLL) has found diverse applications in machine learning, ranging from robust hyperparameter optimization to domain adaptation. However, existing research primarily focuses on scenarios where TLL can be addressed with first order information available at each level, which is inadequate in many situations involving zeroth order constraints, such as when black-box models are employed. Moreover, in trilevel learning, data may be distributed across various nodes, necessitating strategies to address TLL problems without centralizing data on servers to uphold data privacy. To this end, an effective distributed trilevel zeroth order learning framework, DTZO, is proposed in this work to address TLL problems with level-wise zeroth order constraints in a distributed manner. The proposed DTZO is versatile and can be adapted to a wide range of (grey-box) TLL problems with partial zeroth order constraints. In DTZO, the cascaded polynomial approximation can be constructed without relying on gradients or sub-gradients, leveraging a novel cut, i.e., the zeroth order cut. Furthermore, we theoretically carry out the non-asymptotic convergence rate analysis for the proposed DTZO in achieving the $ε$-stationary point. Extensive experiments have been conducted to demonstrate and validate the superior performance of the proposed DTZO, e.g., it achieves an improvement in performance of up to approximately 40%.
Submitted 9 December, 2024;
originally announced December 2024.
-
Tri-Level Navigator: LLM-Empowered Tri-Level Learning for Time Series OOD Generalization
Authors:
Chengtao Jian,
Kai Yang,
Yang Jiao
Abstract:
Out-of-Distribution (OOD) generalization in machine learning is a burgeoning area of study. Its primary goal is to enhance the adaptability and resilience of machine learning models when faced with new, unseen, and potentially adversarial data that significantly diverges from their original training datasets. In this paper, we investigate time series OOD generalization via pre-trained Large Language Models (LLMs). We first propose a novel Tri-level learning framework for Time Series OOD generalization, termed TTSO, which considers both sample-level and group-level uncertainties. This formulation offers a fresh theoretical perspective for formulating and analyzing the OOD generalization problem. In addition, we provide a theoretical analysis to show that this method is well motivated. We then develop a stratified localization algorithm tailored for this tri-level optimization problem, theoretically demonstrating the guaranteed convergence of the proposed algorithm. Our analysis also reveals that the iteration complexity to obtain an $ε$-stationary point is bounded by $\mathcal{O}(\frac{1}{ε^{2}})$. Extensive experiments on real-world datasets have been conducted to elucidate the effectiveness of the proposed method.
Submitted 1 November, 2024; v1 submitted 9 October, 2024;
originally announced October 2024.
-
Revisiting Vision-Language Features Adaptation and Inconsistency for Social Media Popularity Prediction
Authors:
Chih-Chung Hsu,
Chia-Ming Lee,
Yu-Fan Lin,
Yi-Shiuan Chou,
Chih-Yu Jian,
Chi-Han Tsai
Abstract:
Social media popularity (SMP) prediction is a complex task involving multi-modal data integration. While pre-trained vision-language models (VLMs) like CLIP have been widely adopted for this task, their effectiveness in capturing the unique characteristics of social media content remains unexplored. This paper critically examines the applicability of CLIP-based features in SMP prediction, focusing on the overlooked phenomenon of semantic inconsistency between images and text in social media posts. Through extensive analysis, we demonstrate that this inconsistency increases with post popularity, challenging the conventional use of VLM features. We provide a comprehensive investigation of semantic inconsistency across different popularity intervals and analyze the impact of VLM feature adaptation on SMP tasks. Our experiments reveal that incorporating inconsistency measures and adapted text features significantly improves model performance, achieving an SRC of 0.729 and an MAE of 1.227. These findings not only enhance SMP prediction accuracy but also provide crucial insights for developing more targeted approaches in social media analysis.
Submitted 29 June, 2024;
originally announced July 2024.
-
Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat
Authors:
Chih-Chung Hsu,
Chih-Yu Jian,
Eng-Shen Tu,
Chia-Ming Lee,
Guan-Lin Chen
Abstract:
This paper addresses the challenges associated with hyperspectral image (HSI) reconstruction from miniaturized satellites, which often suffer from stripe effects and are computationally resource-limited. We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require only relatively few training samples for efficient and robust HSI reconstruction in the presence of the stripe effect and under noisy transmission conditions. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders, facilitating rapid compressed sensing for stripe-like HSI, which exactly matches the moderate design of miniaturized satellites with a push-broom scanning mechanism. This contrasts with optimization-based models that demand high-precision floating-point operations, making them difficult to deploy on edge devices. Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing. Furthermore, based on the novel two-streamed architecture, an efficient HSI restoration decoder is proposed for the receiver side, allowing for edge-device reconstruction without needing a sophisticated central server. This is particularly crucial as an increasing number of miniaturized satellites necessitates significant computing resources on the ground station. Extensive experiments validate the superior performance of our approach, offering new and vital capabilities for existing miniaturized satellite systems.
Submitted 24 April, 2024;
originally announced April 2024.
-
Provably Convergent Federated Trilevel Learning
Authors:
Yang Jiao,
Kai Yang,
Tiancheng Wu,
Chengtao Jian,
Jianwei Huang
Abstract:
Trilevel learning, also called trilevel optimization (TLO), has been recognized as a powerful modelling tool for hierarchical decision processes and is widely applied in many machine learning applications, such as robust neural architecture search, hyperparameter optimization, and domain adaptation. Tackling TLO problems has presented a great challenge due to their nested decision-making structure. In addition, existing works on TLO face the following key challenges: 1) they all focus on the non-distributed setting, which may lead to privacy breaches; 2) they do not offer any non-asymptotic convergence analysis, which characterizes how fast an algorithm converges. To address the aforementioned challenges, this paper proposes an asynchronous federated trilevel optimization method to solve TLO problems. The proposed method utilizes $μ$-cuts to construct a hyper-polyhedral approximation for the TLO problem and solves it in an asynchronous manner. We demonstrate that the proposed $μ$-cuts are applicable to not only convex functions but also a wide range of non-convex functions that meet the $μ$-weakly convex assumption. Furthermore, we theoretically analyze the non-asymptotic convergence rate for the proposed method by showing that its iteration complexity to obtain an $ε$-stationary point is upper bounded by $\mathcal{O}(\frac{1}{ε^2})$. Extensive experiments on real-world datasets have been conducted to elucidate the superiority of the proposed method, e.g., it has a faster convergence rate with a maximum acceleration of approximately 80%.
Submitted 21 January, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
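For orientation, the nested decision-making structure mentioned in the abstract above is commonly written as three optimization problems, each constrained by the solution of the next. The form below is the generic textbook statement and may differ from the paper's exact formulation; the symbols $x_1, x_2, x_3$ and $f_1, f_2, f_3$ are notation assumed here for illustration.

```latex
\begin{aligned}
\min_{x_1}\; & f_1(x_1, x_2, x_3) \\
\text{s.t.}\; & x_2 \in \arg\min_{x_2'} f_2(x_1, x_2', x_3) \\
& \quad \text{s.t.}\; x_3 \in \arg\min_{x_3'} f_3(x_1, x_2', x_3')
\end{aligned}
```

Read this way, an iteration complexity of $\mathcal{O}(\frac{1}{ε^2})$ means the number of iterations needed to reach an $ε$-stationary point grows at most proportionally to $1/ε^2$, so halving $ε$ raises the bound by a factor of four.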
-
DFGET: Displacement-Field Assisted Graph Energy Transmitter for Gland Instance Segmentation
Authors:
Caiqing Jian,
Yongbin Qin,
Lihui Wang
Abstract:
Gland instance segmentation is an essential but challenging task in the diagnosis and treatment of adenocarcinoma. The existing models usually achieve gland instance segmentation through multi-task learning and boundary loss constraints. However, how to deal with the problems of gland adhesion and inaccurate boundaries in segmenting complex samples remains a challenge. In this work, we propose a displacement-field assisted graph energy transmitter (DFGET) framework to solve these problems. Specifically, a novel message passing scheme based on anisotropic diffusion is developed to update the node features, which can distinguish isomorphic graphs and improve the expressivity of graph nodes for complex samples. Using this graph framework, the gland semantic segmentation map and the displacement field (DF) of the graph nodes are estimated with two graph network branches. With the constraint of the DF, a graph cluster module based on diffusion theory is presented to improve the intra-class feature consistency and inter-class feature discrepancy, as well as to separate the adherent glands from the semantic segmentation maps. Extensive comparison and ablation experiments on the GlaS dataset demonstrate the superiority of DFGET and the effectiveness of the proposed anisotropic message passing scheme and clustering method. Compared to the best comparative model, DFGET increases the object-Dice and object-F1 score by 2.5% and 3.4% respectively, while decreasing the object-HD by 32.4%, achieving state-of-the-art performance.
Submitted 10 December, 2023;
originally announced December 2023.
-
Strong Baseline and Bag of Tricks for COVID-19 Detection of CT Scans
Authors:
Chih-Chung Hsu,
Chih-Yu Jian,
Chia-Ming Lee,
Chi-Han Tsai,
Sheng-Chieh Dai
Abstract:
This paper investigates the application of deep learning models for lung Computed Tomography (CT) image analysis. Traditional deep learning frameworks encounter compatibility issues due to variations in slice numbers and resolutions in CT images, which stem from the use of different machines. Commonly, individual slices are predicted and subsequently merged to obtain the final result; however, this approach lacks slice-wise feature learning and consequently results in decreased performance. We propose a novel slice selection method for each CT dataset to address this limitation, effectively filtering out uncertain slices and enhancing the model's performance. Furthermore, we introduce a spatial-slice feature learning (SSFL) technique\cite{hsu2022} that employs a conventional and efficient backbone model for slice feature training, followed by extracting one-dimensional data from the trained model for COVID and non-COVID classification using a dedicated classification model. Leveraging these experimental steps, we integrate one-dimensional features with multiple slices for channel merging and employ a 2D convolutional neural network (CNN) model for classification. In addition to the aforementioned methods, we explore various high-performance classification models, ultimately achieving promising results.
Submitted 15 March, 2023;
originally announced March 2023.
-
Asynchronous Distributed Bilevel Optimization
Authors:
Yang Jiao,
Kai Yang,
Tiancheng Wu,
Dongjin Song,
Chengtao Jian
Abstract:
Bilevel optimization plays an essential role in many machine learning tasks, ranging from hyperparameter optimization to meta-learning. Existing studies on bilevel optimization, however, focus on either centralized or synchronous distributed settings. The centralized bilevel optimization approaches require collecting massive amounts of data on a single server, which inevitably incurs significant communication expenses and may give rise to data privacy risks. Synchronous distributed bilevel optimization algorithms, on the other hand, often face the straggler problem and will immediately stop working if a few workers fail to respond. As a remedy, we propose the Asynchronous Distributed Bilevel Optimization (ADBO) algorithm. The proposed ADBO can tackle bilevel optimization problems with both nonconvex upper-level and lower-level objective functions, and its convergence is theoretically guaranteed. Furthermore, theoretical analysis reveals that the iteration complexity of ADBO to obtain the $ε$-stationary point is upper bounded by $\mathcal{O}(\frac{1}{ε^2})$. Thorough empirical studies on public datasets have been conducted to elucidate the effectiveness and efficiency of the proposed ADBO.
Submitted 23 February, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
A lightweight deep learning based cloud detection method for Sentinel-2A imagery fusing multi-scale spectral and spatial features
Authors:
Jun Li,
Zhaocong Wu,
Zhongwen Hu,
Canliang Jian,
Shaojie Luo,
Lichao Mou,
Xiao Xiang Zhu,
Matthieu Molinier
Abstract:
Clouds are a very important factor in the availability of optical remote sensing images. Recently, deep learning-based cloud detection methods have surpassed classical methods based on rules and physical models of clouds. However, most of these deep models are very large which limits their applicability and explainability, while other models do not make use of the full spectral information in multi-spectral images such as Sentinel-2. In this paper, we propose a lightweight network for cloud detection, fusing multi-scale spectral and spatial features (CD-FM3SF) and tailored for processing all spectral bands in Sentinel-2A images. The proposed method consists of an encoder and a decoder. In the encoder, three input branches are designed to handle spectral bands at their native resolution and extract multiscale spectral features. Three novel components are designed: a mixed depth-wise separable convolution (MDSC) and a shared and dilated residual block (SDRB) to extract multi-scale spatial features, and a concatenation and sum (CS) operation to fuse multi-scale spectral and spatial features with little calculation and no additional parameters. The decoder of CD-FM3SF outputs three cloud masks at the same resolution as input bands to enhance the supervision information of small, middle and large clouds. To validate the performance of the proposed method, we manually labeled 36 Sentinel-2A scenes evenly distributed over mainland China. The experiment results demonstrate that CD-FM3SF outperforms traditional cloud detection methods and state-of-the-art deep learning-based methods in both accuracy and speed.
Submitted 29 April, 2021;
originally announced May 2021.
-
Scalar Coupling Constant Prediction Using Graph Embedding Local Attention Encoder
Authors:
Caiqing Jian,
Xinyu Cheng,
Jian Zhang,
Lihui Wang
Abstract:
The scalar coupling constant (SCC) plays a key role in the analysis of the three-dimensional structure of organic matter; however, traditional SCC prediction using quantum mechanical calculations is very time-consuming. To calculate SCC efficiently and accurately, we propose a graph embedding local self-attention encoder (GELAE) model. First, a novel invariant structure representation of the coupling system in terms of bond length, bond angle, and dihedral angle is presented. Then, a local self-attention module embedded with the adjacency matrix of a graph is designed to effectively extract the features of coupling systems. Finally, the SCC is predicted with a modified classification loss function. To validate the superiority of the proposed method, we conducted a series of comparison experiments using different structure representations, different attention modules, and different losses. The experimental results demonstrate that, compared to the traditional chemical bond structure representations, the rotation and translation invariant structure representations proposed in this work can improve the SCC prediction accuracy; with the graph embedded local self-attention, the mean absolute error (MAE) of the prediction model in the validation set decreases from 0.1603 Hz to 0.1067 Hz; using the classification based loss function instead of the scaled regression loss, the MAE of the predicted SCC can be decreased to 0.0963 Hz, which is close to the quantum chemistry standard on the CHAMPS dataset.
Submitted 6 September, 2020;
originally announced September 2020.
-
Approximate Discovery of Service Nodes by Duplicate Detection in Flows
Authors:
Zhou Changling,
Xiao Jianguo,
Cui Jian,
Zhang Bei,
Li Feng
Abstract:
Knowledge about which nodes provide services is of critical importance for network administrators. Discovery of service nodes can be done by making full use of duplicate element detection in flows. Because the amount of traffic across the network is massive, especially in large ISPs or campus networks, we propose an approximate algorithm with Round-robin Buddy Bloom Filters (RBBF) for service detection using solely NetFlow data. The properties and analysis of the RBBF data structure are also given. Our method has better time/space efficiency than the conventional algorithm, with a small false positive rate. We also demonstrate the contributions through a prototype system with real-world case studies.
Submitted 14 March, 2015;
originally announced March 2015.
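The duplicate-detection idea in the abstract above can be illustrated with an ordinary Bloom filter. The sketch below is not the RBBF structure from the paper, whose round-robin buddy layout the abstract does not describe. It assumes, for illustration only, that a destination endpoint seen in more than one flow is a candidate service node; the names `BloomFilter` and `repeated_endpoints` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, small chance of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Insert item; return True if it was (probably) seen before."""
        seen = True
        for pos in self._positions(item):
            if not self.bits[pos]:
                seen = False
                self.bits[pos] = 1
        return seen


def repeated_endpoints(destinations):
    """Return destination endpoints that the filter reports as duplicates."""
    bloom = BloomFilter()
    repeated = set()
    for dst in destinations:
        if bloom.add(dst):
            repeated.add(dst)
    return repeated


# Usage: "10.0.0.1:80" is the destination of two flows, so it is reported.
flows = ["10.0.0.1:80", "10.0.0.2:5555", "10.0.0.1:80"]
print("10.0.0.1:80" in repeated_endpoints(flows))  # True
```

The filter uses a fixed amount of memory regardless of how many flows pass through it, which is the trade-off the abstract points to: time and space savings in exchange for a small false positive rate.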