-
SOAP: Style-Omniscient Animatable Portraits
Authors:
Tingting Liao,
Yujian Zheng,
Adilbek Karmanov,
Liwen Hu,
Leyang Jin,
Yuliang Xiu,
Hao Li
Abstract:
Creating animatable 3D avatars from a single image remains challenging due to style limitations (realistic, cartoon, anime) and difficulties in handling accessories or hairstyles. While 3D diffusion models advance single-view reconstruction for general objects, outputs often lack animation controls or suffer from artifacts because of the domain gap. We propose SOAP, a style-omniscient framework to…
▽ More
Creating animatable 3D avatars from a single image remains challenging due to style limitations (realistic, cartoon, anime) and difficulties in handling accessories or hairstyles. While 3D diffusion models advance single-view reconstruction for general objects, outputs often lack animation controls or suffer from artifacts because of the domain gap. We propose SOAP, a style-omniscient framework to generate rigged, topology-consistent avatars from any portrait. Our method leverages a multiview diffusion model trained on 24K 3D heads with multiple styles and an adaptive optimization pipeline to deform the FLAME mesh while maintaining topology and rigging via differentiable rendering. The resulting textured avatars support FACS-based animation, integrate with eyeballs and teeth, and preserve details like braided hair or accessories. Extensive experiments demonstrate the superiority of our method over state-of-the-art techniques for both single-view head modeling and diffusion-based generation of Image-to-3D. Our code and data are publicly available for research purposes at https://github.com/TingtingLiao/soap.
△ Less
Submitted 8 May, 2025;
originally announced May 2025.
-
Fluid Antenna-Assisted MU-MIMO Systems with Decentralized Baseband Processing
Authors:
Tianyi Liao,
Wei Guo,
Hengtao He,
Shenghui Song,
Jun Zhang,
Khaled B. Letaief
Abstract:
The fluid antenna system (FAS) has emerged as a disruptive technology, offering unprecedented degrees of freedom (DoF) for wireless communication systems. However, optimizing fluid antenna (FA) positions entails significant computational costs, especially when the number of FAs is large. To address this challenge, we introduce a decentralized baseband processing (DBP) architecture to FAS, which pa…
▽ More
The fluid antenna system (FAS) has emerged as a disruptive technology, offering unprecedented degrees of freedom (DoF) for wireless communication systems. However, optimizing fluid antenna (FA) positions entails significant computational costs, especially when the number of FAs is large. To address this challenge, we introduce a decentralized baseband processing (DBP) architecture to FAS, which partitions the FA array into clusters and enables parallel processing. Based on the DBP architecture, we formulate a weighted sum rate (WSR) maximization problem through joint beamforming and FA position design for FA-assisted multiuser multiple-input multiple-output (MU-MIMO) systems. To solve the WSR maximization problem, we propose a novel decentralized block coordinate ascent (BCA)-based algorithm that leverages matrix fractional programming (FP) and majorization-minimization (MM) methods. The proposed decentralized algorithm achieves low computational, communication, and storage costs, thus unleashing the potential of the DBP architecture. Simulation results show that our proposed algorithm under the DBP architecture reduces computational time by over 70% compared to centralized architectures with negligible WSR performance loss.
△ Less
Submitted 12 May, 2025; v1 submitted 8 May, 2025;
originally announced May 2025.
-
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions
Authors:
Ting-Hsuan Liao,
Yi Zhou,
Yu Shen,
Chun-Hao Paul Huang,
Saayan Mitra,
Jia-Bin Huang,
Uttaran Bhattacharya
Abstract:
We explore how body shapes influence human motion synthesis, an aspect often overlooked in existing text-to-motion generation methods due to the ease of learning a homogenized, canonical body shape. However, this homogenization can distort the natural correlations between different body shapes and their motion dynamics. Our method addresses this gap by generating body-shape-aware human motions fro…
▽ More
We explore how body shapes influence human motion synthesis, an aspect often overlooked in existing text-to-motion generation methods due to the ease of learning a homogenized, canonical body shape. However, this homogenization can distort the natural correlations between different body shapes and their motion dynamics. Our method addresses this gap by generating body-shape-aware human motions from natural language prompts. We utilize a finite scalar quantization-based variational autoencoder (FSQ-VAE) to quantize motion into discrete tokens and then leverage continuous body shape information to de-quantize these tokens back into continuous, detailed motion. Additionally, we harness the capabilities of a pretrained language model to predict both continuous shape parameters and motion tokens, facilitating the synthesis of text-aligned motions and decoding them into shape-aware motions. We evaluate our method quantitatively and qualitatively, and also conduct a comprehensive perceptual study to demonstrate its efficacy in generating shape-aware motions.
△ Less
Submitted 4 April, 2025;
originally announced April 2025.
-
Joint Beamforming and Antenna Position Optimization for Fluid Antenna-Assisted MU-MIMO Networks
Authors:
Tianyi Liao,
Wei Guo,
Hengtao He,
Shenghui Song,
Jun Zhang,
Khaled B. Letaief
Abstract:
The fluid antenna system (FAS) has emerged as a disruptive technology for future wireless networks, offering unprecedented degrees of freedom (DoF) through the dynamic configuration of antennas in response to propagation environment variations. The integration of fluid antennas (FAs) with multiuser multiple-input multiple-output (MU-MIMO) networks promises substantial weighted sum rate (WSR) gains…
▽ More
The fluid antenna system (FAS) has emerged as a disruptive technology for future wireless networks, offering unprecedented degrees of freedom (DoF) through the dynamic configuration of antennas in response to propagation environment variations. The integration of fluid antennas (FAs) with multiuser multiple-input multiple-output (MU-MIMO) networks promises substantial weighted sum rate (WSR) gains via joint beamforming and FA position optimization. However, the joint design is challenging due to the strong coupling between beamforming matrices and antenna positions. To address the challenge, we propose a novel block coordinate ascent (BCA)-based method in FA-assisted MU-MIMO networks. Specifically, we first employ matrix fractional programming techniques to reformulate the original complex problem into a more tractable form. Then, we solve the reformulated problem following the BCA principle, where we develop a low-complexity majorization maximization algorithm capable of optimizing all FA positions simultaneously. To further reduce the computational, storage, and interconnection costs, we propose a decentralized implementation for our proposed algorithm by utilizing the decentralized baseband processing (DBP) architecture. Simulation results demonstrate that with our proposed algorithm, the FA-assisted MU-MIMO system achieves up to a 47% WSR improvement over conventional MIMO networks equipped with fixed-position antennas. Moreover, the decentralized implementation reduces computation time by approximately 70% and has similar performance compared with the centralized implementation.
△ Less
Submitted 5 March, 2025;
originally announced March 2025.
-
Data Poisoning Attacks to Locally Differentially Private Range Query Protocols
Authors:
Ting-Wei Liao,
Chih-Hsun Lin,
Yu-Lin Tsai,
Takao Murakami,
Chia-Mu Yu,
Jun Sakuma,
Chun-Ying Huang,
Hiroaki Kikuchi
Abstract:
Local Differential Privacy (LDP) has been widely adopted to protect user privacy in decentralized data collection. However, recent studies have revealed that LDP protocols are vulnerable to data poisoning attacks, where malicious users manipulate their reported data to distort aggregated results. In this work, we present the first study on data poisoning attacks targeting LDP range query protocols…
▽ More
Local Differential Privacy (LDP) has been widely adopted to protect user privacy in decentralized data collection. However, recent studies have revealed that LDP protocols are vulnerable to data poisoning attacks, where malicious users manipulate their reported data to distort aggregated results. In this work, we present the first study on data poisoning attacks targeting LDP range query protocols, focusing on both tree-based and grid-based approaches. We identify three key challenges in executing such attacks, including crafting consistent and effective fake data, maintaining data consistency across levels or grids, and preventing server detection. To address the first two challenges, we propose novel attack methods that are provably optimal, including a tree-based attack and a grid-based attack, designed to manipulate range query results with high effectiveness. \textbf{Our key finding is that the common post-processing procedure, Norm-Sub, in LDP range query protocols can help the attacker massively amplify their attack effectiveness.} In addition, we study a potential countermeasure, but also propose an adaptive attack capable of evading this defense to address the third challenge. We evaluate our methods through theoretical analysis and extensive experiments on synthetic and real-world datasets. Our results show that the proposed attacks can significantly amplify estimations for arbitrary range queries by manipulating a small fraction of users, providing 5-10x more influence than a normal user to the estimation.
△ Less
Submitted 6 March, 2025; v1 submitted 5 March, 2025;
originally announced March 2025.
-
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
Authors:
Kaixuan Lu,
Ruiqian Zhang,
Xiao Huang,
Yuxing Xie,
Xiaogang Ning,
Hanchao Zhang,
Mengke Yuan,
Pan Zhang,
Tao Wang,
Tongkui Liao
Abstract:
Recent self-supervised learning (SSL) methods have demonstrated impressive results in learning visual representations from unlabeled remote sensing images. However, most remote sensing images predominantly consist of scenographic scenes containing multiple ground objects without explicit foreground targets, which limits the performance of existing SSL methods that focus on foreground targets. This…
▽ More
Recent self-supervised learning (SSL) methods have demonstrated impressive results in learning visual representations from unlabeled remote sensing images. However, most remote sensing images predominantly consist of scenographic scenes containing multiple ground objects without explicit foreground targets, which limits the performance of existing SSL methods that focus on foreground targets. This raises the question: Is there a method that can automatically aggregate similar objects within scenographic remote sensing images, thereby enabling models to differentiate knowledge embedded in various geospatial patterns for improved feature representation? In this work, we present the Pattern Integration and Enhancement Vision Transformer (PIEViT), a novel self-supervised learning framework designed specifically for remote sensing imagery. PIEViT utilizes a teacher-student architecture to address both image-level and patch-level tasks. It employs the Geospatial Pattern Cohesion (GPC) module to explore the natural clustering of patches, enhancing the differentiation of individual features. The Feature Integration Projection (FIP) module further refines masked token reconstruction using geospatially clustered patches. We validated PIEViT across multiple downstream tasks, including object detection, semantic segmentation, and change detection. Experiments demonstrated that PIEViT enhances the representation of internal patch features, providing significant improvements over existing self-supervised baselines. It achieves excellent results in object detection, land cover classification, and change detection, underscoring its robustness, generalization, and transferability for remote sensing image interpretation tasks.
△ Less
Submitted 9 November, 2024;
originally announced November 2024.
-
Multimodal Relational Triple Extraction with Query-based Entity Object Transformer
Authors:
Lei Hei,
Ning An,
Tingjing Liao,
Qi Ma,
Jiaqi Wang,
Feiliang Ren
Abstract:
Multimodal Relation Extraction is crucial for constructing flexible and realistic knowledge graphs. Recent studies focus on extracting the relation type with entity pairs present in different modalities, such as one entity in the text and another in the image. However, existing approaches require entities and objects given beforehand, which is costly and impractical. To address the limitation, we…
▽ More
Multimodal Relation Extraction is crucial for constructing flexible and realistic knowledge graphs. Recent studies focus on extracting the relation type with entity pairs present in different modalities, such as one entity in the text and another in the image. However, existing approaches require entities and objects given beforehand, which is costly and impractical. To address the limitation, we propose a novel task, Multimodal Entity-Object Relational Triple Extraction, which aims to extract all triples (entity span, relation, object region) from image-text pairs. To facilitate this study, we modified a multimodal relation extraction dataset MORE, which includes 21 relation types, to create a new dataset containing 20,264 triples, averaging 5.75 triples per image-text pair. Moreover, we propose QEOT, a query-based model with a selective attention mechanism, to dynamically explore the interaction and fusion of textual and visual information. In particular, the proposed method can simultaneously accomplish entity extraction, relation classification, and object detection with a set of queries. Our method is suitable for downstream applications and reduces error accumulation due to the pipeline-style approaches. Extensive experimental results demonstrate that our proposed method outperforms the existing baselines by 8.06% and achieves state-of-the-art performance.
△ Less
Submitted 16 August, 2024;
originally announced August 2024.
-
CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning
Authors:
Wei Zhu,
Yicheng Liu,
Yuping He,
Tangfei Liao,
Kang Zheng,
Xiaoqiu Xu,
Tao Wang,
Tong Lu
Abstract:
In the fields of computer vision and robotics, accurate pixel-level correspondences are essential for enabling advanced tasks such as structure-from-motion and simultaneous localization and mapping. Recent correspondence pruning methods usually focus on learning local consistency through k-nearest neighbors, which makes it difficult to capture robust context for each correspondence. We propose Cor…
▽ More
In the fields of computer vision and robotics, accurate pixel-level correspondences are essential for enabling advanced tasks such as structure-from-motion and simultaneous localization and mapping. Recent correspondence pruning methods usually focus on learning local consistency through k-nearest neighbors, which makes it difficult to capture robust context for each correspondence. We propose CorrAdaptor, a novel architecture that introduces a dual-branch structure capable of adaptively adjusting local contexts through both explicit and implicit local graph learning. Specifically, the explicit branch uses KNN-based graphs tailored for initial neighborhood identification, while the implicit branch leverages a learnable matrix to softly assign neighbors and adaptively expand the local context scope, significantly enhancing the model's robustness and adaptability to complex image variations. Moreover, we design a motion injection module to integrate motion consistency into the network to suppress the impact of outliers and refine local context learning, resulting in substantial performance improvements. The experimental results on extensive correspondence-based tasks indicate that our CorrAdaptor achieves state-of-the-art performance both qualitatively and quantitatively. The code and pre-trained models are available at https://github.com/TaoWangzj/CorrAdaptor.
△ Less
Submitted 15 August, 2024;
originally announced August 2024.
-
Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention
Authors:
Tongzhou Liao,
Barnabás Póczos
Abstract:
Graph Neural Networks (GNNs) have become important tools for machine learning on graph-structured data. In this paper, we explore the synergistic combination of graph encoding, graph rewiring, and graph attention, by introducing Graph Attention with Stochastic Structures (GRASS), a novel GNN architecture. GRASS utilizes relative random walk probabilities (RRWP) encoding and a novel decomposed vari…
▽ More
Graph Neural Networks (GNNs) have become important tools for machine learning on graph-structured data. In this paper, we explore the synergistic combination of graph encoding, graph rewiring, and graph attention, by introducing Graph Attention with Stochastic Structures (GRASS), a novel GNN architecture. GRASS utilizes relative random walk probabilities (RRWP) encoding and a novel decomposed variant (D-RRWP) to efficiently capture structural information. It rewires the input graph by superimposing a random regular graph to enhance long-range information propagation. It also employs a novel additive attention mechanism tailored for graph-structured data. Our empirical evaluations demonstrate that GRASS achieves state-of-the-art performance on multiple benchmark datasets, including a 20.3% reduction in mean absolute error on the ZINC dataset.
△ Less
Submitted 14 March, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Parallax-tolerant Image Stitching via Segmentation-guided Multi-homography Warping
Authors:
Tianli Liao,
Ce Wang,
Lei Li,
Guangen Liu,
Nan Li
Abstract:
Large parallax between images is an intractable issue in image stitching. Various warping-based methods are proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography warping guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partit…
▽ More
Large parallax between images is an intractable issue in image stitching. Various warping-based methods are proposed to address it, yet the results are unsatisfactory. In this paper, we propose a novel image stitching method using multi-homography warping guided by image segmentation. Specifically, we leverage the Segment Anything Model to segment the target image into numerous contents and partition the feature points into multiple subsets via the energy-based multi-homography fitting algorithm. The multiple subsets of feature points are used to calculate the corresponding multiple homographies. For each segmented content in the overlapping region, we select its best-fitting homography with the lowest photometric error. For each segmented content in the non-overlapping region, we calculate a weighted combination of the linearized homographies. Finally, the target image is warped via the best-fitting homographies to align with the reference image, and the final panorama is generated via linear blending. Comprehensive experimental results on the public datasets demonstrate that our method provides the best alignment accuracy by a large margin, compared with the state-of-the-art methods. The source code is available at https://github.com/tlliao/multi-homo-warp.
△ Less
Submitted 28 June, 2024;
originally announced June 2024.
-
Collective Constitutional AI: Aligning a Language Model with Public Input
Authors:
Saffron Huang,
Divya Siddarth,
Liane Lovitt,
Thomas I. Liao,
Esin Durmus,
Alex Tamkin,
Deep Ganguli
Abstract:
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs-from identifying a t…
▽ More
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs-from identifying a target population to sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first LM fine-tuned with collectively sourced public input and evaluating this model against a baseline model trained with established principles from a LM developer. Our quantitative evaluations demonstrate several benefits of our approach: the CCAI-trained model shows lower bias across nine social dimensions compared to the baseline model, while maintaining equivalent performance on language, math, and helpful-harmless evaluations. Qualitative comparisons of the models suggest that the models differ on the basis of their respective constitutions, e.g., when prompted with contentious topics, the CCAI-trained model tends to generate responses that reframe the matter positively instead of a refusal. These results demonstrate a promising, tractable pathway toward publicly informed development of language models.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder
Authors:
Tangfei Liao,
Xiaoqin Zhang,
Guobao Xiao,
Min Li,
Tao Wang,
Mang Ye
Abstract:
Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked corres…
▽ More
Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked correspondences, providing a strong initial representation for downstream tasks. Toward this objective, a modicum of true correspondences naturally serve as input, thus significantly reducing pre-training overhead. In practice, we introduce CorrMAE, an extension of the mask autoencoder framework tailored for the pre-training of correspondence pruning. CorrMAE involves two main phases, \ie correspondence learning and matching point reconstruction, guiding the reconstruction of masked correspondences through learning visible correspondence consistency. Herein, we employ a dual-branch structure with an ingenious positional encoding to reconstruct unordered and irregular correspondences. Also, a bi-level designed encoder is proposed for correspondence learning, which offers enhanced consistency learning capability and transferability. Extensive experiments have shown that the model pre-trained with our CorrMAE outperforms prior work on multiple challenging benchmarks. Meanwhile, our CorrMAE is primarily a task-driven pre-training method, and can achieve notable improvements for downstream tasks by pre-training on the targeted dataset. We hope this work can provide a starting point for correspondence pruning pre-training.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
VividDream: Generating 3D Scene with Ambient Dynamics
Authors:
Yao-Chih Lee,
Yi-Ting Chen,
Andrew Wang,
Ting-Hsuan Liao,
Brandon Y. Feng,
Jia-Bin Huang
Abstract:
We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of…
▽ More
We introduce VividDream, a method for generating explorable 4D scenes with ambient dynamics from a single input image or text prompt. VividDream first expands an input image into a static 3D point cloud through iterative inpainting and geometry merging. An ensemble of animated videos is then generated using video diffusion models with quality refinement techniques and conditioned on renderings of the static 3D scene from the sampled camera trajectories. We then optimize a canonical 4D scene representation using an animated video ensemble, with per-video motion embeddings and visibility masks to mitigate inconsistencies. The resulting 4D scene enables free-view exploration of a 3D scene with plausible ambient scene dynamics. Experiments demonstrate that VividDream can provide human viewers with compelling 4D experiences generated based on diverse real images and text prompts.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Advances in Robust Federated Learning: A Survey with Heterogeneity Considerations
Authors:
Chuan Chen,
Tianchi Liao,
Xiaojun Deng,
Zihou Wu,
Sheng Huang,
Zibin Zheng
Abstract:
In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first…
▽ More
In the field of heterogeneous federated learning (FL), the key challenge is to efficiently and collaboratively train models across multiple clients with different data distributions, model structures, task objectives, computational capabilities, and communication resources. This diversity leads to significant heterogeneity, which increases the complexity of model training. In this paper, we first outline the basic concepts of heterogeneous federated learning and summarize the research challenges in federated learning in terms of five aspects: data, model, task, device, and communication. In addition, we explore how existing state-of-the-art approaches cope with the heterogeneity of federated learning, and categorize and review these approaches at three different levels: data-level, model-level, and architecture-level. Subsequently, the paper extensively discusses privacy-preserving strategies in heterogeneous federated learning environments. Finally, the paper discusses current open issues and directions for future research, aiming to promote the further development of heterogeneous federated learning.
△ Less
Submitted 8 March, 2025; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels
Authors:
Guozhang Liu,
Ting Liu,
Mengke Yuan,
Tao Pang,
Guangxing Yang,
Hao Fu,
Tao Wang,
Tongkui Liao
Abstract:
The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to the noisy annotations in category labels of detection dataset. However, the effects and treatments of the label noises are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method t…
▽ More
The ambiguous appearance, tiny scale, and fine-grained classes of objects in remote sensing imagery inevitably lead to the noisy annotations in category labels of detection dataset. However, the effects and treatments of the label noises are underexplored in modern oriented remote sensing object detectors. To address this issue, we propose a robust oriented remote sensing object detection method through dynamic loss decay (DLD) mechanism, inspired by the two phase ``early-learning'' and ``memorization'' learning dynamics of deep neural networks on clean and noisy samples. To be specific, we first observe the end point of early learning phase termed as EL, after which the models begin to memorize the false labels that significantly degrade the detection accuracy. Secondly, under the guidance of the training indicator, the losses of each sample are ranked in descending order, and we adaptively decay the losses of the top K largest ones (bad samples) in the following epochs. Because these large losses are of high confidence to be calculated with wrong labels. Experimental results show that the method achieves excellent noise resistance performance tested on multiple public datasets such as HRSC2016 and DOTA-v1.0/v2.0 with synthetic category label noise. Our solution also has won the 2st place in the "fine-grained object detection based on sub-meter remote sensing imagery" track with noisy labels of 2023 National Big Data and Computing Intelligence Challenge.
△ Less
Submitted 14 May, 2024;
originally announced May 2024.
-
Under-actuated Robotic Gripper with Multiple Grasping Modes Inspired by Human Finger
Authors:
Jihao Li,
Tingbo Liao,
Hassen Nigatu,
Haotian Guo,
Guodong Lu,
Huixu Dong
Abstract:
Under-actuated robot grippers as a pervasive tool of robots have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, making widespread applications limited. To better relieve relevant research gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigur…
▽ More
Under-actuated robot grippers as a pervasive tool of robots have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, making widespread applications limited. To better relieve relevant research gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigurable multi-mode grasps driven by a single motor. Firstly, inspired by the changes that occurred in the contact surface with a human finger moving, we artfully design a slider-slide rail mechanism as the phalanx to achieve retraction of each finger, allowing for better performance in the enveloping grasping mode. Secondly, a reconfigurable structure is constructed to broaden the grasping range of objects' dimensions for the proposed gripper. By adjusting the configuration and gesture of each finger, the gripper can achieve five grasping modes. Thirdly, the proposed gripper is just actuated by a single motor, yet it can be capable of grasping and reconfiguring simultaneously. Finally, various experiments on grasps of slender, thin, and large-volume objects are implemented to evaluate the performance of the proposed gripper in practical scenarios, which demonstrates the excellent grasping capabilities of the gripper.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems
Authors:
Zihao Yi,
Jiarui Ouyang,
Yuwen Liu,
Tianhao Liao,
Zhe Xu,
Ying Shen
Abstract:
This survey provides a comprehensive review of research on multi-turn dialogue systems, with a particular focus on multi-turn dialogue systems based on large language models (LLMs). This paper aims to (a) give a summary of existing LLMs and approaches for adapting LLMs to downstream tasks; (b) elaborate recent advances in multi-turn dialogue systems, covering both LLM-based open-domain dialogue (O…
▽ More
This survey provides a comprehensive review of research on multi-turn dialogue systems, with a particular focus on multi-turn dialogue systems based on large language models (LLMs). This paper aims to (a) give a summary of existing LLMs and approaches for adapting LLMs to downstream tasks; (b) elaborate recent advances in multi-turn dialogue systems, covering both LLM-based open-domain dialogue (ODD) and task-oriented dialogue (TOD) systems, along with datasets and evaluation metrics; (c) discuss some future emphasis and recent research problems arising from the development of LLMs and the increasing demands on multi-turn dialogue systems.
△ Less
Submitted 27 February, 2024;
originally announced February 2024.
-
FedBRB: An Effective Solution to the Small-to-Large Scenario in Device-Heterogeneity Federated Learning
Authors:
Ziyue Xu,
Mingfeng Xu,
Tianchi Liao,
Zibin Zheng,
Chuan Chen
Abstract:
Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in exploring collaborative training of large-scale models from federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an i…
▽ More
Recently, the success of large models has demonstrated the importance of scaling up model size. This has spurred interest in exploring collaborative training of large-scale models from federated learning perspective. Due to computational constraints, many institutions struggle to train a large-scale model locally. Thus, training a larger global model using only smaller local models has become an important scenario (i.e., the \textbf{small-to-large scenario}). Although recent device-heterogeneity federated learning approaches have started to explore this area, they face limitations in fully covering the parameter space of the global model. In this paper, we propose a method called \textbf{FedBRB} (\underline{B}lock-wise \underline{R}olling and weighted \underline{B}roadcast) based on the block concept. FedBRB can uses small local models to train all blocks of the large global model, and broadcasts the trained parameters to the entire space for faster information interaction. Experiments demonstrate FedBRB yields substantial performance gains, achieving state-of-the-art results in this scenario. Moreover, FedBRB using only minimal local models can even surpass baselines using larger local models.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning
Authors:
Tangfei Liao,
Xiaoqin Zhang,
Li Zhao,
Tao Wang,
Guobao Xiao
Abstract:
Correspondence pruning aims to find correct matches (inliers) from an initial set of putative correspondences, which is a fundamental task for many applications. The process of finding is challenging, given the varying inlier ratios between scenes/image pairs due to significant visual differences. However, the performance of the existing methods is usually limited by the problem of lacking visual…
▽ More
Correspondence pruning aims to find correct matches (inliers) from an initial set of putative correspondences, which is a fundamental task for many applications. The process of finding is challenging, given the varying inlier ratios between scenes/image pairs due to significant visual differences. However, the performance of the existing methods is usually limited by the problem of lacking visual cues (\eg texture, illumination, structure) of scenes. In this paper, we propose a Visual-Spatial Fusion Transformer (VSFormer) to identify inliers and recover camera poses accurately. Firstly, we obtain highly abstract visual cues of a scene with the cross attention between local features of two-view images. Then, we model these visual cues and correspondences by a joint visual-spatial fusion module, simultaneously embedding visual cues into correspondences for pruning. Additionally, to mine the consistency of correspondences, we also design a novel module that combines the KNN-based graph and the transformer, effectively capturing both local and global contexts. Extensive experiments have demonstrated that the proposed VSFormer outperforms state-of-the-art methods on outdoor and indoor benchmarks. Our code is provided at the following repository: https://github.com/sugar-fly/VSFormer.
△ Less
Submitted 4 January, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Tokenized Model: A Blockchain-Empowered Decentralized Model Ownership Verification Platform
Authors:
Yihao Li,
Yanyi Lai,
Tianchi Liao,
Chuan Chen,
Zibin Zheng
Abstract:
With the development of practical deep learning models like generative AI, their excellent performance has brought huge economic value. For instance, ChatGPT has attracted more than 100 million users in three months. Since the model training requires a lot of data and computing power, a well-performing deep learning model is behind a huge effort and cost. Facing various model attacks, unauthorized…
▽ More
With the development of practical deep learning models like generative AI, their excellent performance has brought huge economic value. For instance, ChatGPT has attracted more than 100 million users in three months. Since the model training requires a lot of data and computing power, a well-performing deep learning model is behind a huge effort and cost. Facing various model attacks, unauthorized use and abuse from the network that threaten the interests of model owners, in addition to considering legal and other administrative measures, it is equally important to protect the model's copyright from the technical means. By using the model watermarking technology, we point out the possibility of building a unified platform for model ownership verification. Given the application history of blockchain in copyright verification and the drawbacks of a centralized third-party, this paper considers combining model watermarking technology and blockchain to build a unified model copyright protection platform. By a new solution we called Tokenized Model, it protects the model's copyright by reliable ownership record and verification mechanism. It also promotes the financial value of model by constructing the model's transaction process and contribution shares of a model. In the typical case study, we also study the various performance under usual scenario to verify the effectiveness of this platform.
△ Less
Submitted 27 November, 2023;
originally announced December 2023.
-
Leveraging Local Patch Alignment to Seam-cutting for Large Parallax Image Stitching
Authors:
Tianli Liao,
Chenyang Zhao,
Lei Li,
Heling Cao
Abstract:
Seam cutting methods have been proven effective in the composition step of image stitching, especially for images with parallax. However, current seam cutting can be seen as the subsequent step after the image alignment is settled. Its effectiveness usually depends on the fact that images can be roughly aligned such that a local region exists where an unnoticeable seam can be found. Current alignm…
▽ More
Seam cutting methods have been proven effective in the composition step of image stitching, especially for images with parallax. However, current seam cutting can be seen as the subsequent step after the image alignment is settled. Its effectiveness usually depends on the fact that images can be roughly aligned such that a local region exists where an unnoticeable seam can be found. Current alignment methods often fall short of expectations for images with large parallax, and most efforts are devoted to improving the alignment accuracy.
In this paper, we argue that by adding a simple Local Patch Alignment Module (LPAM) into the seam cutting, the final result can be efficiently improved for large parallax image stitching. Concretely, we first evaluate the quality of pixels along the estimated seam of the seam cutting method. Then, for pixels with low qualities, we separate their enclosing patches in the aligned images and locally align them by constructing modified dense correspondences via SIFT flow. Finally, we composite the aligned patches via seam cutting and merge them into the original aligned result to generate the final mosaic. Experiments show that introducing LPAM can effectively and efficiently improve the stitching results.
△ Less
Submitted 19 January, 2025; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Specific versus General Principles for Constitutional AI
Authors:
Sandipan Kundu,
Yuntao Bai,
Saurav Kadavath,
Amanda Askell,
Andrew Callahan,
Anna Chen,
Anna Goldie,
Avital Balwit,
Azalia Mirhoseini,
Brayden McLean,
Catherine Olsson,
Cassie Evraets,
Eli Tran-Johnson,
Esin Durmus,
Ethan Perez,
Jackson Kernion,
Jamie Kerr,
Kamal Ndousse,
Karina Nguyen,
Nelson Elhage,
Newton Cheng,
Nicholas Schiefer,
Nova DasSarma,
Oliver Rausch,
Robin Larson
, et al. (11 additional authors not shown)
Abstract:
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expressi…
▽ More
Human feedback can prevent overtly harmful utterances in conversational models, but may not automatically mitigate subtle problematic behaviors such as a stated desire for self-preservation or power. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles. We find this approach effectively prevents the expression of such behaviors. The success of simple principles motivates us to ask: can models learn general ethical behaviors from only a single written principle? To test this, we run experiments using a principle roughly stated as "do what's best for humanity". We find that the largest dialogue models can generalize from this short constitution, resulting in harmless assistants with no stated interest in specific motivations like power. A general principle may thus partially avoid the need for a long list of constitutions targeting potentially harmful behaviors. However, more detailed constitutions still improve fine-grained control over specific types of harms. This suggests both general and specific principles have value for steering AI safely.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
TADA! Text to Animatable Digital Avatars
Authors:
Tingting Liao,
Hongwei Yi,
Yuliang Xiu,
Jiaxaing Tang,
Yangyi Huang,
Justus Thies,
Michael J. Black
Abstract:
We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent a…
▽ More
We introduce TADA, a simple-yet-effective approach that takes textual descriptions and produces expressive 3D avatars with high-quality geometry and lifelike textures, that can be animated and rendered with traditional graphics pipelines. Existing text-based character generation methods are limited in terms of geometry and texture quality, and cannot be realistically animated due to inconsistent alignment between the geometry and the texture, particularly in the face region. To overcome these limitations, TADA leverages the synergy of a 2D diffusion model and an animatable parametric body model. Specifically, we derive an optimizable high-resolution body model from SMPL-X with 3D displacements and a texture map, and use hierarchical rendering with score distillation sampling (SDS) to create high-quality, detailed, holistic 3D avatars from text. To ensure alignment between the geometry and texture, we render normals and RGB images of the generated character and exploit their latent embeddings in the SDS training process. We further introduce various expression parameters to deform the generated character during training, ensuring that the semantics of our generated character remain consistent with the original SMPL-X model, resulting in an animatable character. Comprehensive evaluations demonstrate that TADA significantly surpasses existing approaches on both qualitative and quantitative measures. TADA enables creation of large-scale digital character assets that are ready for animation and rendering, while also being easily editable through natural language. The code will be public for research purposes.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Authors:
Yangyi Huang,
Hongwei Yi,
Yuliang Xiu,
Tingting Liao,
Jiaxiang Tang,
Deng Cai,
Justus Thies
Abstract:
Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are su…
▽ More
Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that lacks attention. Existing methods often generate overly smooth back-side surfaces with a blurry texture. But how to effectively capture all visual attributes of an individual from a single image, which are sufficient to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles) which are automatically generated via a garment parsing model and Visual Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion model (T2I) which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts + personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent & delicate texture, and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms the state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/TeCH
△ Less
Submitted 19 August, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Linguistic representations for fewer-shot relation extraction across domains
Authors:
Sireesh Gururaja,
Ritam Dutt,
Tinglong Liao,
Carolyn Rose
Abstract:
Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability…
▽ More
Recent work has demonstrated the positive impact of incorporating linguistic representations as additional context and scaffolding on the in-domain performance of several NLP tasks. We extend this work by exploring the impact of linguistic representations on cross-domain performance in a few-shot transfer setting. An important question is whether linguistic representations enhance generalizability by providing features that function as cross-domain pivots. We focus on the task of relation extraction on three datasets of procedural text in two domains, cooking and materials science. Our approach augments a popular transformer-based architecture by alternately incorporating syntactic and semantic graphs constructed by freely available off-the-shelf tools. We examine their utility for enhancing generalization, and investigate whether earlier findings, e.g. that semantic representations can be more helpful than syntactic ones, extend to relation extraction in multiple domains. We find that while the inclusion of these graphs results in significantly higher performance in few-shot transfer, both types of graph exhibit roughly equivalent utility.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Authors:
Esin Durmus,
Karina Nguyen,
Thomas I. Liao,
Nicholas Schiefer,
Amanda Askell,
Anton Bakhtin,
Carol Chen,
Zac Hatfield-Dodds,
Danny Hernandez,
Nicholas Joseph,
Liane Lovitt,
Sam McCandlish,
Orowa Sikder,
Alex Tamkin,
Janel Thamkul,
Jared Kaplan,
Jack Clark,
Deep Ganguli
Abstract:
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across dif…
▽ More
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned on country. With our framework, we run three experiments on an LLM trained to be helpful, honest, and harmless with Constitutional AI. By default, LLM responses tend to be more similar to the opinions of certain populations, such as those from the USA, and some European and South American countries, highlighting the potential for biases. When we prompt the model to consider a particular country's perspective, responses shift to be more similar to the opinions of the prompted populations, but can reflect harmful cultural stereotypes. When we translate GlobalOpinionQA questions to a target language, the model's responses do not necessarily become the most similar to the opinions of speakers of those languages. We release our dataset for others to use and build on. Our data is at https://huggingface.co/datasets/Anthropic/llm_global_opinions. We also provide an interactive visualization at https://llmglobalvalues.anthropic.com.
△ Less
Submitted 11 April, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Migrate Demographic Group For Fair GNNs
Authors:
YanMing Hu,
TianChi Liao,
JiaLong Chen,
Jing Bian,
ZiBin Zheng,
Chuan Chen
Abstract:
Graph Neural networks (GNNs) have been applied in many scenarios due to the superior performance of graph learning. However, fairness is always ignored when designing GNNs. As a consequence, biased information in training data can easily affect vanilla GNNs, causing biased results toward particular demographic groups (divided by sensitive attributes, such as race and age). There have been efforts…
▽ More
Graph Neural networks (GNNs) have been applied in many scenarios due to the superior performance of graph learning. However, fairness is always ignored when designing GNNs. As a consequence, biased information in training data can easily affect vanilla GNNs, causing biased results toward particular demographic groups (divided by sensitive attributes, such as race and age). There have been efforts to address the fairness issue. However, existing fair techniques generally divide the demographic groups by raw sensitive attributes and assume that are fixed. The biased information correlated with raw sensitive attributes will run through the training process regardless of the implemented fair techniques. It is urgent to resolve this problem for training fair GNNs. To tackle this problem, we propose a brand new framework, FairMigration, which can dynamically migrate the demographic groups instead of keeping that fixed with raw sensitive attributes. FairMigration is composed of two training stages. In the first stage, the GNNs are initially optimized by personalized self-supervised learning, and the demographic groups are adjusted dynamically. In the second stage, the new demographic groups are frozen and supervised learning is carried out under the constraints of new demographic groups and adversarial training. Extensive experiments reveal that FairMigration balances model performance and fairness well.
△ Less
Submitted 23 March, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
Challenges and Remedies to Privacy and Security in AIGC: Exploring the Potential of Privacy Computing, Blockchain, and Beyond
Authors:
Chuan Chen,
Zhenpeng Wu,
Yanyi Lai,
Wenlin Ou,
Tianchi Liao,
Zibin Zheng
Abstract:
Artificial Intelligence Generated Content (AIGC) is one of the latest achievements in AI development. The content generated by related applications, such as text, images and audio, has sparked a heated discussion. Various derived AIGC applications are also gradually entering all walks of life, bringing unimaginable impact to people's daily lives. However, the rapid development of such generative t…
▽ More
Artificial Intelligence Generated Content (AIGC) is one of the latest achievements in AI development. The content generated by related applications, such as text, images and audio, has sparked a heated discussion. Various derived AIGC applications are also gradually entering all walks of life, bringing unimaginable impact to people's daily lives. However, the rapid development of such generative tools has also raised concerns about privacy and security issues, and even copyright issues in AIGC. We note that advanced technologies such as blockchain and privacy computing can be combined with AIGC tools, but no work has yet been done to investigate their relevance and prospect in a systematic and detailed way. Therefore it is necessary to investigate how they can be used to protect the privacy and security of data in AIGC by fully exploring the aforementioned technologies. In this paper, we first systematically review the concept, classification and underlying technologies of AIGC. Then, we discuss the privacy and security challenges faced by AIGC from multiple perspectives and purposefully list the countermeasures that currently exist. We hope our survey will help researchers and industry to build a more secure and robust AIGC system.
△ Less
Submitted 1 June, 2023;
originally announced June 2023.
-
Anomaly Detection Using One-Class SVM for Logs of Juniper Router Devices
Authors:
Tat-Bao-Thien Nguyen,
Teh-Lu Liao,
Tuan-Anh Vu
Abstract:
The article deals with anomaly detection of Juniper router logs. Abnormal Juniper router logs include logs that are usually different from the normal operation, and they often reflect the abnormal operation of router devices. To prevent router devices from being damaged and help administrator to grasp the situation of error quickly, detecting abnormal operation soon is very important. In this work…
▽ More
The article deals with anomaly detection of Juniper router logs. Abnormal Juniper router logs include logs that are usually different from the normal operation, and they often reflect the abnormal operation of router devices. To prevent router devices from being damaged and help administrator to grasp the situation of error quickly, detecting abnormal operation soon is very important. In this work, we present a new way to get important features from log data of Juniper router devices and use machine learning method (basing on One-Class SVM model) for anomaly detection. One-Class SVM model requires some knowledge and comprehension about logs of Juniper router devices so that it can analyze, interpret, and test the knowledge ac-quired. We collect log data from a lot of real Juniper router devices and clas-sify them based on our knowledge. Before these logs are used for training and testing the One-Class SVM model, the feature extraction phase for these data was carried out. Finally, with the proposed method, the system errors of the routers were dectected quickly and accurately. This may help our com-pany to reduce the operation cost for the router systems.
△ Less
Submitted 20 May, 2023;
originally announced May 2023.
-
High-Fidelity Clothed Avatar Reconstruction from a Single Image
Authors:
Tingting Liao,
Xiaomei Zhang,
Yuliang Xiu,
Hongwei Yi,
Xudong Liu,
Guo-Jun Qi,
Yong Zhang,
Xuan Wang,
Xiangyu Zhu,
Zhen Lei
Abstract:
This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the…
▽ More
This paper presents a framework for efficient 3D clothed avatar reconstruction. By combining the advantages of the high accuracy of optimization-based methods and the efficiency of learning-based methods, we propose a coarse-to-fine way to realize a high-fidelity clothed avatar reconstruction (CAR) from a single image. At the first stage, we use an implicit model to learn the general shape in the canonical space of a person in a learning-based way, and at the second stage, we refine the surface detail by estimating the non-rigid deformation in the posed space in an optimization way. A hyper-network is utilized to generate a good initialization so that the convergence o f the optimization process is greatly accelerated. Extensive experiments on various datasets show that the proposed CAR successfully produces high-fidelity avatars for arbitrarily clothed humans in real scenes.
△ Less
Submitted 8 April, 2023;
originally announced April 2023.
-
Ecosystem Graphs: The Social Footprint of Foundation Models
Authors:
Rishi Bommasani,
Dilara Soylu,
Thomas I. Liao,
Kathleen A. Creel,
Percy Liang
Abstract:
Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is…
▽ More
Foundation models (e.g. ChatGPT, StableDiffusion) pervasively influence society, warranting immediate social attention. While the models themselves garner much attention, to accurately characterize their impact, we must consider the broader sociotechnical ecosystem. We propose Ecosystem Graphs as a documentation framework to transparently centralize knowledge of this ecosystem. Ecosystem Graphs is composed of assets (datasets, models, applications) linked together by dependencies that indicate technical (e.g. how Bing relies on GPT-4) and social (e.g. how Microsoft relies on OpenAI) relationships. To supplement the graph structure, each asset is further enriched with fine-grained metadata (e.g. the license or training emissions). We document the ecosystem extensively at https://crfm.stanford.edu/ecosystem-graphs/. As of March 16, 2023, we annotate 262 assets (64 datasets, 128 models, 70 applications) from 63 organizations linked by 356 dependencies. We show Ecosystem Graphs functions as a powerful abstraction and interface for achieving the minimum transparency required to address myriad use cases. Therefore, we envision Ecosystem Graphs will be a community-maintained resource that provides value to stakeholders spanning AI researchers, industry professionals, social scientists, auditors and policymakers.
△ Less
Submitted 28 March, 2023;
originally announced March 2023.
-
Text-driven Visual Synthesis with Latent Diffusion Prior
Authors:
Ting-Hsuan Liao,
Songwei Ge,
Yiran Xu,
Yao-Chih Lee,
Badour AlBahar,
Jia-Bin Huang
Abstract:
There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use…
▽ More
There has been tremendous progress in large-scale text-to-image synthesis driven by diffusion models enabling versatile downstream applications such as 3D object synthesis from texts, image editing, and customized generation. We present a generic approach using latent diffusion models as powerful image priors for various visual synthesis tasks. Existing methods that utilize such priors fail to use these models' full capabilities. To improve this, our core ideas are 1) a feature matching loss between features from different layers of the decoder to provide detailed guidance and 2) a KL divergence loss to regularize the predicted latent features and stabilize the training. We demonstrate the efficacy of our approach on three different applications, text-to-3D, StyleGAN adaptation, and layered image editing. Extensive results show our method compares favorably against baselines.
△ Less
Submitted 3 April, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
-
The Capacity for Moral Self-Correction in Large Language Models
Authors:
Deep Ganguli,
Amanda Askell,
Nicholas Schiefer,
Thomas I. Liao,
Kamilė Lukošiūtė,
Anna Chen,
Anna Goldie,
Azalia Mirhoseini,
Catherine Olsson,
Danny Hernandez,
Dawn Drain,
Dustin Li,
Eli Tran-Johnson,
Ethan Perez,
Jackson Kernion,
Jamie Kerr,
Jared Mueller,
Joshua Landau,
Kamal Ndousse,
Karina Nguyen,
Liane Lovitt,
Michael Sellitto,
Nelson Elhage,
Noemi Mercado,
Nova DasSarma
, et al. (24 additional authors not shown)
Abstract:
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability…
▽ More
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
△ Less
Submitted 18 February, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Hierarchical Motion Planning under Probabilistic Temporal Tasks and Safe-Return Constraints
Authors:
Meng Guo,
Tianjun Liao,
Junjie Wang,
Zhongkui Li
Abstract:
Safety is crucial for robotic missions within an uncertain environment. Common safety requirements such as collision avoidance are only state-dependent, which can be restrictive for complex missions. In this work, we address a more general formulation as safe-return constraints, which require the existence of a return-policy to drive the system back to a set of safe states with high probability. T…
▽ More
Safety is crucial for robotic missions within an uncertain environment. Common safety requirements such as collision avoidance are only state-dependent, which can be restrictive for complex missions. In this work, we address a more general formulation as safe-return constraints, which require the existence of a return-policy to drive the system back to a set of safe states with high probability. The robot motion is modeled as a Markov Decision Process (MDP) with probabilistic labels, which can be highly non-ergodic. The robotic task is specified as Linear Temporal Logic (LTL) formulas over these labels, such as surveillance and transportation. We first provide theoretical guarantees on the re-formulation of such safe-return constraints, and a baseline solution based on computing two complete product automata. Furthermore, to tackle the computational complexity, we propose a hierarchical planning algorithm that combines the feature-based symbolic and temporal abstraction with constrained optimization. It synthesizes simultaneously two dependent motion policies: the outbound policy minimizes the overall cost of satisfying the task with a high probability, while the return policy ensures the safe-return constraints. The problem formulation is versatile regarding the robot model, task specifications and safety constraints. The proposed hierarchical algorithm is more efficient and can solve much larger problems than the baseline solution, with only a slight loss of optimality. Numerical validations include simulations and hardware experiments of a search-and-rescue mission and a planetary exploration mission over various system sizes.
△ Less
Submitted 10 February, 2023;
originally announced February 2023.
-
ELDA: Using Edges to Have an Edge on Semantic Segmentation Based UDA
Authors:
Ting-Hsuan Liao,
Huang-Ru Liao,
Shan-Ya Yang,
Jie-En Yao,
Li-Yuan Tsao,
Hsu-Shen Liu,
Bo-Wun Cheng,
Chen-Hao Chao,
Chia-Che Chang,
Yi-Chen Lo,
Chun-Yi Lee
Abstract:
Many unsupervised domain adaptation (UDA) methods have been proposed to bridge the domain gap by utilizing domain invariant information. Most approaches have chosen depth as such information and achieved remarkable success. Despite their effectiveness, using depth as domain invariant information in UDA tasks may lead to multiple issues, such as excessively high extraction costs and difficulties in…
▽ More
Many unsupervised domain adaptation (UDA) methods have been proposed to bridge the domain gap by utilizing domain invariant information. Most approaches have chosen depth as such information and achieved remarkable success. Despite their effectiveness, using depth as domain invariant information in UDA tasks may lead to multiple issues, such as excessively high extraction costs and difficulties in achieving a reliable prediction quality. As a result, we introduce Edge Learning based Domain Adaptation (ELDA), a framework which incorporates edge information into its training process to serve as a type of domain invariant information. In our experiments, we quantitatively and qualitatively demonstrate that the incorporation of edge information is indeed beneficial and effective and enables ELDA to outperform the contemporary state-of-the-art methods on two commonly adopted benchmarks for semantic segmentation based UDA tasks. In addition, we show that ELDA is able to better separate the feature distributions of different classes. We further provide an ablation analysis to justify our design decisions.
△ Less
Submitted 16 November, 2022;
originally announced November 2022.
-
Robust Unstructured Knowledge Access in Conversational Dialogue with ASR Errors
Authors:
Yik-Cheung Tam,
Jiacheng Xu,
Jiakai Zou,
Zecheng Wang,
Tinglong Liao,
Shuhan Yuan
Abstract:
Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. In the proposed error simulator, we leverage confusion…
▽ More
Performance of spoken language understanding (SLU) can be degraded with automatic speech recognition (ASR) errors. We propose a novel approach to improve SLU robustness by randomly corrupting clean training text with an ASR error simulator, followed by self-correcting the errors and minimizing the target classification loss in a joint manner. In the proposed error simulator, we leverage confusion networks generated from an ASR decoder without human transcriptions to generate a variety of error patterns for model training. We evaluate our approach on the DSTC10 challenge targeted for knowledge-grounded task-oriented conversational dialogues with ASR errors. Experimental results show the effectiveness of our proposed approach, boosting the knowledge-seeking turn detection (KTD) F1 significantly from 0.9433 to 0.9904. Knowledge cluster classification is boosted from 0.7924 to 0.9333 in Recall@1. After knowledge document re-ranking, our approach shows significant improvement in all knowledge selection metrics, from 0.7358 to 0.7806 in Recall@1, from 0.8301 to 0.9333 in Recall@5, and from 0.7798 to 0.8460 in MRR@5 on the test set. In the recent DSTC10 evaluation, our approach demonstrates significant improvement in knowledge selection, boosting Recall@1 from 0.495 to 0.7144 compared to the official baseline. Our source code is released in GitHub https://github.com/yctam/dstc10_track2_task2.git.
△ Less
Submitted 7 November, 2022;
originally announced November 2022.
-
When Digital Economy Meets Web3.0: Applications and Challenges
Authors:
Chuan Chen,
Lei Zhang,
Yihao Li,
Tianchi Liao,
Siran Zhao,
Zibin Zheng,
Huawei Huang,
Jiajing Wu
Abstract:
With the continuous development of web technology, Web3.0 has attracted a considerable amount of attention due to its unique decentralized characteristics. The digital economy is an important driver of high-quality economic development and is currently in a rapid development stage. In the digital economy scenario, the centralized nature of the Internet and other characteristics usually bring about…
▽ More
With the continuous development of web technology, Web3.0 has attracted a considerable amount of attention due to its unique decentralized characteristics. The digital economy is an important driver of high-quality economic development and is currently in a rapid development stage. In the digital economy scenario, the centralized nature of the Internet and other characteristics usually bring about security issues such as infringement and privacy leakage. Therefore, it is necessary to investigate how to use Web3.0 technologies to solve the pain points encountered in the development of the digital economy by fully exploring the critical technologies of digital economy and Web3.0. In this paper, we discuss the aspects of Web3.0 that should be integrated with the digital economy to better find the entry point to solve the problems by examining the latest advances of Web3.0 in machine learning, finance, and data management. We hope this research will inspire those who are involved in both academia and industry, and finally help to build a favourable ecology for the digital economy.
△ Less
Submitted 29 October, 2022; v1 submitted 26 September, 2022;
originally announced October 2022.
-
HarDNet-DFUS: An Enhanced Harmonically-Connected Network for Diabetic Foot Ulcer Image Segmentation and Colonoscopy Polyp Segmentation
Authors:
Ting-Yu Liao,
Ching-Hui Yang,
Yu-Wen Lo,
Kuan-Ying Lai,
Po-Huai Shen,
Youn-Long Lin
Abstract:
We present a neural network architecture for medical image segmentation of diabetic foot ulcers and colonoscopy polyps. Diabetic foot ulcers are caused by neuropathic and vascular complications of diabetes mellitus. In order to provide a proper diagnosis and treatment, wound care professionals need to extract accurate morphological features from the foot wounds. Using computer-aided systems is a p…
▽ More
We present a neural network architecture for medical image segmentation of diabetic foot ulcers and colonoscopy polyps. Diabetic foot ulcers are caused by neuropathic and vascular complications of diabetes mellitus. In order to provide a proper diagnosis and treatment, wound care professionals need to extract accurate morphological features from the foot wounds. Using computer-aided systems is a promising approach to extract related morphological features and segment the lesions. We propose a convolution neural network called HarDNet-DFUS by enhancing the backbone and replacing the decoder of HarDNet-MSEG, which was SOTA for colonoscopy polyp segmentation in 2021. For the MICCAI 2022 Diabetic Foot Ulcer Segmentation Challenge (DFUC2022), we train HarDNet-DFUS using the DFUC2022 dataset and increase its robustness by means of five-fold cross validation, Test Time Augmentation, etc. In the validation phase of DFUC2022, HarDNet-DFUS achieved 0.7063 mean dice and was ranked third among all participants. In the final testing phase of DFUC2022, it achieved 0.7287 mean dice and was the first place winner. HarDNet-DFUS also deliver excellent performance for the colonoscopy polyp segmentation task. It achieves 0.924 mean Dice on the famous Kvasir dataset, an improvement of 1.2\% over the original HarDNet-MSEG. The codes are available on https://github.com/kytimmylai/DFUC2022 (for Diabetic Foot Ulcers Segmentation) and https://github.com/YuWenLo/HarDNet-DFUS (for Colonoscopy Polyp Segmentation).
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
Pixel-Wise Prediction based Visual Odometry via Uncertainty Estimation
Authors:
Hao-Wei Chen,
Ting-Hsuan Liao,
Hsuan-Kung Yang,
Chun-Yi Lee
Abstract:
This paper introduces pixel-wise prediction based visual odometry (PWVO), which is a dense prediction task that evaluates the values of translation and rotation for every pixel in its input observations. PWVO employs uncertainty estimation to identify the noisy regions in the input observations, and adopts a selection mechanism to integrate pixel-wise predictions based on the estimated uncertainty…
▽ More
This paper introduces pixel-wise prediction based visual odometry (PWVO), which is a dense prediction task that evaluates the values of translation and rotation for every pixel in its input observations. PWVO employs uncertainty estimation to identify the noisy regions in the input observations, and adopts a selection mechanism to integrate pixel-wise predictions based on the estimated uncertainty maps to derive the final translation and rotation. In order to train PWVO in a comprehensive fashion, we further develop a data generation workflow for generating synthetic training data. The experimental results show that PWVO is able to deliver favorable results. In addition, our analyses validate the effectiveness of the designs adopted in PWVO, and demonstrate that the uncertainty maps estimated by PWVO is capable of capturing the noises in its input observations.
△ Less
Submitted 18 August, 2022;
originally announced August 2022.
-
MSP-Former: Multi-Scale Projection Transformer for Single Image Desnowing
Authors:
Sixiang Chen,
Tian Ye,
Yun Liu,
Taodong Liao,
Jingxia Jiang,
Erkang Chen,
Peng Chen
Abstract:
Snow removal causes challenges due to its characteristic of complex degradations. To this end, targeted treatment of multi-scale snow degradations is critical for the network to learn effective snow removal. In order to handle the diverse scenes, we propose a multi-scale projection transformer (MSP-Former), which understands and covers a variety of snow degradation features in a multi-path manner,…
▽ More
Snow removal causes challenges due to its characteristic of complex degradations. To this end, targeted treatment of multi-scale snow degradations is critical for the network to learn effective snow removal. In order to handle the diverse scenes, we propose a multi-scale projection transformer (MSP-Former), which understands and covers a variety of snow degradation features in a multi-path manner, and integrates comprehensive scene context information for clean reconstruction via self-attention operation. For the local details of various snow degradations, the local capture module is introduced in parallel to assist in the rebuilding of a clean image. Such design achieves the SOTA performance on three desnowing benchmark datasets while costing the low parameters and computational complexity, providing a guarantee of practicality.
△ Less
Submitted 11 March, 2023; v1 submitted 12 July, 2022;
originally announced July 2022.
-
MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained Frames
Authors:
Xiangyu Zhu,
Tingting Liao,
Jiangjing Lyu,
Xiang Yan,
Yunfeng Wang,
Kan Guo,
Qiong Cao,
Stan Z. Li,
Zhen Lei
Abstract:
In this paper, we consider a novel problem of reconstructing a 3D human avatar from multiple unconstrained frames, independent of assumptions on camera calibration, capture space, and constrained actions. The problem should be addressed by a framework that takes multiple unconstrained images as inputs, and generates a shape-with-skinning avatar in the canonical space, finished in one feed-forward…
▽ More
In this paper, we consider a novel problem of reconstructing a 3D human avatar from multiple unconstrained frames, independent of assumptions on camera calibration, capture space, and constrained actions. The problem should be addressed by a framework that takes multiple unconstrained images as inputs, and generates a shape-with-skinning avatar in the canonical space, finished in one feed-forward pass. To this end, we present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner, by which the image features from multiple images are aligned and integrated to estimate a pixel-aligned implicit function that represents the clothed shape. To enable the training and testing of the new framework, we contribute a large-scale dataset, MVP-Human (Multi-View and multi-Pose 3D Human), which contains 400 subjects, each of which has 15 scans in different poses and 8-view images for each pose, providing 6,000 3D scans and 48,000 images in total. Overall, benefits from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames and achieves state-of-the-art performance.
△ Less
Submitted 17 May, 2023; v1 submitted 23 April, 2022;
originally announced April 2022.
-
Deep Transfer Learning with Graph Neural Network for Sensor-Based Human Activity Recognition
Authors:
Yan Yan,
Tianzheng Liao,
Jinjin Zhao,
Jiahong Wang,
Liang Ma,
Wei Lv,
Jing Xiong,
Lei Wang
Abstract:
The sensor-based human activity recognition (HAR) in mobile application scenarios is often confronted with sensor modalities variation and annotated data deficiency. Given this observation, we devised a graph-inspired deep learning approach toward the sensor-based HAR tasks, which was further used to build a deep transfer learning model toward giving a tentative solution for these two challenging…
▽ More
The sensor-based human activity recognition (HAR) in mobile application scenarios is often confronted with sensor modalities variation and annotated data deficiency. Given this observation, we devised a graph-inspired deep learning approach toward the sensor-based HAR tasks, which was further used to build a deep transfer learning model toward giving a tentative solution for these two challenging problems. Specifically, we present a multi-layer residual structure involved graph convolutional neural network (ResGCNN) toward the sensor-based HAR tasks, namely the HAR-ResGCNN approach. Experimental results on the PAMAP2 and mHealth data sets demonstrate that our ResGCNN is effective at capturing the characteristics of actions with comparable results compared to other sensor-based HAR models (with an average accuracy of 98.18% and 99.07%, respectively). More importantly, the deep transfer learning experiments using the ResGCNN model show excellent transferability and few-shot learning performance. The graph-based framework shows good meta-learning ability and is supposed to be a promising solution in sensor-based HAR tasks.
△ Less
Submitted 14 March, 2022;
originally announced March 2022.
-
Investigation of Factorized Optical Flows as Mid-Level Representations
Authors:
Hsuan-Kung Yang,
Tsu-Ching Hsiao,
Ting-Hsuan Liao,
Hsu-Shen Liu,
Li-Yuan Tsao,
Tzu-Wen Wang,
Shan-Ya Yang,
Yu-Wen Chen,
Huang-Ru Liao,
Chun-Yi Lee
Abstract:
In this paper, we introduce a new concept of incorporating factorized flow maps as mid-level representations, for bridging the perception and the control modules in modular learning based robotic frameworks. To investigate the advantages of factorized flow maps and examine their interplay with the other types of mid-level representations, we further develop a configurable framework, along with fou…
▽ More
In this paper, we introduce a new concept of incorporating factorized flow maps as mid-level representations, for bridging the perception and the control modules in modular learning based robotic frameworks. To investigate the advantages of factorized flow maps and examine their interplay with the other types of mid-level representations, we further develop a configurable framework, along with four different environments that contain both static and dynamic objects, for analyzing the impacts of factorized optical flow maps on the performance of deep reinforcement learning agents. Based on this framework, we report our experimental results on various scenarios, and offer a set of analyses to justify our hypothesis. Finally, we validate flow factorization in real world scenarios.
△ Less
Submitted 10 March, 2022; v1 submitted 9 March, 2022;
originally announced March 2022.
-
Natural Image Stitching Using Depth Maps
Authors:
Tianli Liao,
Nan Li
Abstract:
Natural image stitching (NIS) aims to create one natural-looking mosaic from two overlapping images that capture the same 3D scene from different viewing positions. Challenges inevitably arise when the scene is non-planar and the camera baseline is wide, since parallax becomes not negligible in such cases. In this paper, we propose a novel NIS method using depth maps, which generates natural-looki…
▽ More
Natural image stitching (NIS) aims to create one natural-looking mosaic from two overlapping images that capture the same 3D scene from different viewing positions. Challenges inevitably arise when the scene is non-planar and the camera baseline is wide, since parallax becomes not negligible in such cases. In this paper, we propose a novel NIS method using depth maps, which generates natural-looking mosaics against parallax in both overlapping and non-overlapping regions. Firstly, we construct a robust fitting method to filter out the outliers in feature matches and estimate the epipolar geometry between input images. Then, we draw a triangulation of the target image and estimate multiple local homographies, one per triangle, based on the locations of their vertices, the rectified depth values and the epipolar geometry. Finally, the warping image is rendered by the backward mapping of piece-wise homographies. Panorama is then produced via average blending and image inpainting. Experimental results demonstrate that the proposed method not only provides accurate alignment in the overlapping regions but also virtual naturalness in the non-overlapping region.
△ Less
Submitted 22 February, 2023; v1 submitted 13 February, 2022;
originally announced February 2022.
-
A Transfer Learning Pipeline for Educational Resource Discovery with Application in Leading Paragraph Generation
Authors:
Irene Li,
Thomas George,
Alexander Fabbri,
Tammy Liao,
Benjamin Chen,
Rina Kawamura,
Richard Zhou,
Vanessa Yan,
Swapnil Hingmire,
Dragomir Radev
Abstract:
Effective human learning depends on a wide selection of educational materials that align with the learner's current understanding of the topic. While the Internet has revolutionized human learning or education, a substantial resource accessibility barrier still exists. Namely, the excess of online information can make it challenging to navigate and discover high-quality learning materials. In this…
▽ More
Effective human learning depends on a wide selection of educational materials that align with the learner's current understanding of the topic. While the Internet has revolutionized human learning or education, a substantial resource accessibility barrier still exists. Namely, the excess of online information can make it challenging to navigate and discover high-quality learning materials. In this paper, we propose the educational resource discovery (ERD) pipeline that automates web resource discovery for novel domains. The pipeline consists of three main steps: data collection, feature extraction, and resource classification. We start with a known source domain and conduct resource discovery on two unseen target domains via transfer learning. We first collect frequent queries from a set of seed documents and search on the web to obtain candidate resources, such as lecture slides and introductory blog posts. Then we introduce a novel pretrained information retrieval deep neural network model, query-document masked language modeling (QD-MLM), to extract deep features of these candidate resources. We apply a tree-based classifier to decide whether the candidate is a positive learning resource. The pipeline achieves F1 scores of 0.94 and 0.82 when evaluated on two similar but novel target domains. Finally, we demonstrate how this pipeline can benefit an application: leading paragraph generation for surveys. This is the first study that considers various web resources for survey generation, to the best of our knowledge. We also release a corpus of 39,728 manually labeled web resources and 659 queries from NLP, Computer Vision (CV), and Statistics (STATS).
△ Less
Submitted 6 January, 2022;
originally announced January 2022.
-
CLICKER: A Computational LInguistics Classification Scheme for Educational Resources
Authors:
Swapnil Hingmire,
Irene Li,
Rena Kawamura,
Benjamin Chen,
Alexander Fabbri,
Xiangru Tang,
Yixin Liu,
Thomas George,
Tammy Liao,
Wai Pan Wong,
Vanessa Yan,
Richard Zhou,
Girish K. Palshikar,
Dragomir Radev
Abstract:
A classification scheme of a scientific subject gives an overview of its body of knowledge. It can also be used to facilitate access to research articles and other materials related to the subject. For example, the ACM Computing Classification System (CCS) is used in the ACM Digital Library search interface and also for indexing computer science papers. We observed that a comprehensive classificat…
▽ More
A classification scheme of a scientific subject gives an overview of its body of knowledge. It can also be used to facilitate access to research articles and other materials related to the subject. For example, the ACM Computing Classification System (CCS) is used in the ACM Digital Library search interface and also for indexing computer science papers. We observed that a comprehensive classification system like CCS or Mathematics Subject Classification (MSC) does not exist for Computational Linguistics (CL) and Natural Language Processing (NLP). We propose a classification scheme -- CLICKER for CL/NLP based on the analysis of online lectures from 77 university courses on this subject. The currently proposed taxonomy includes 334 topics and focuses on educational aspects of CL/NLP; it is based primarily, but not exclusively, on lecture notes from NLP courses. We discuss how such a taxonomy can help in various real-world applications, including tutoring platforms, resource retrieval, resource recommendation, prerequisite chain learning, and survey generation.
△ Less
Submitted 15 December, 2021;
originally announced December 2021.
-
Autonomous Vision-based UAV Landing with Collision Avoidance using Deep Learning
Authors:
Tianpei Liao,
Amal Haridevan,
Yibo Liu,
Jinjun Shan
Abstract:
There is a risk of collision when multiple UAVs land simultaneously without communication on the same platform. This work accomplishes vision-based autonomous landing and uses a deep-learning-based method to realize collision avoidance during the landing process.
There is a risk of collision when multiple UAVs land simultaneously without communication on the same platform. This work accomplishes vision-based autonomous landing and uses a deep-learning-based method to realize collision avoidance during the landing process.
△ Less
Submitted 17 September, 2021;
originally announced September 2021.
-
Tensor Completion via Convolutional Sparse Coding Regularization
Authors:
Zhebin Wu,
Tianchi Liao,
Chuan Chen,
Cong Liu,
Zibin Zheng,
Xiongjun Zhang
Abstract:
Tensor data often suffer from missing value problem due to the complex high-dimensional structure while acquiring them. To complete the missing information, lots of Low-Rank Tensor Completion (LRTC) methods have been proposed, most of which depend on the low-rank property of tensor data. In this way, the low-rank component of the original data could be recovered roughly. However, the shortcoming i…
▽ More
Tensor data often suffer from missing value problem due to the complex high-dimensional structure while acquiring them. To complete the missing information, lots of Low-Rank Tensor Completion (LRTC) methods have been proposed, most of which depend on the low-rank property of tensor data. In this way, the low-rank component of the original data could be recovered roughly. However, the shortcoming is that the detail information can not be fully restored, no matter the Sum of the Nuclear Norm (SNN) nor the Tensor Nuclear Norm (TNN) based methods. On the contrary, in the field of signal processing, Convolutional Sparse Coding (CSC) can provide a good representation of the high-frequency component of the image, which is generally associated with the detail component of the data. Nevertheless, CSC can not handle the low-frequency component well. To this end, we propose two novel methods, LRTC-CSC-I and LRTC-CSC-II, which adopt CSC as a supplementary regularization for LRTC to capture the high-frequency components. Therefore, the LRTC-CSC methods can not only solve the missing value problem but also recover the details. Moreover, the regularizer CSC can be trained with small samples due to the sparsity characteristic. Extensive experiments show the effectiveness of LRTC-CSC methods, and quantitative evaluation indicates that the performance of our models are superior to state-of-the-art methods.
△ Less
Submitted 6 May, 2021; v1 submitted 1 December, 2020;
originally announced December 2020.
-
On the Compressed-Oracle Technique, and Post-Quantum Security of Proofs of Sequential Work
Authors:
Kai-Min Chung,
Serge Fehr,
Yu-Hsuan Huang,
Tai-Ning Liao
Abstract:
We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows…
▽ More
We revisit the so-called compressed oracle technique, introduced by Zhandry for analyzing quantum algorithms in the quantum random oracle model (QROM). To start off with, we offer a concise exposition of the technique, which easily extends to the parallel-query QROM, where in each query-round the considered algorithm may make several queries to the QROM in parallel. This variant of the QROM allows for a more fine-grained query-complexity analysis.
Our main technical contribution is a framework that simplifies the use of (the parallel-query generalization of) the compressed oracle technique for proving query complexity results. With our framework in place, whenever applicable, it is possible to prove quantum query complexity lower bounds by means of purely classical reasoning. More than that, for typical examples the crucial classical observations that give rise to the classical bounds are sufficient to conclude the corresponding quantum bounds.
We demonstrate this on a few examples, recovering known results (like the optimality of parallel Grover), but also obtaining new results (like the optimality of parallel BHT collision search). Our main target is the hardness of finding a $q$-chain with fewer than $q$ parallel queries, i.e., a sequence $x_0, x_1,\ldots, x_q$ with $x_i = H(x_{i-1})$ for all $1 \leq i \leq q$.
The above problem of finding a hash chain is of fundamental importance in the context of proofs of sequential work. Indeed, as a concrete cryptographic application of our techniques, we prove that the "Simple Proofs of Sequential Work" proposed by Cohen and Pietrzak remains secure against quantum attacks. Such an analysis is not simply a matter of plugging in our new bound; the entire protocol needs to be analyzed in the light of a quantum attack. Thanks to our framework, this can now be done with purely classical reasoning.
△ Less
Submitted 9 July, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
NTIRE 2020 Challenge on NonHomogeneous Dehazing
Authors:
Codruta O. Ancuti,
Cosmin Ancuti,
Florin-Alexandru Vasluianu,
Radu Timofte,
Jing Liu,
Haiyan Wu,
Yuan Xie,
Yanyun Qu,
Lizhuang Ma,
Ziling Huang,
Qili Deng,
Ju-Chin Chao,
Tsung-Shan Yang,
Peng-Wen Chen,
Po-Min Hsu,
Tzu-Yi Liao,
Chung-En Sun,
Pei-Yuan Wu,
Jeonghyeok Do,
Jongmin Park,
Munchurl Kim,
Kareem Metwaly,
Xuelu Li,
Tiantong Guo,
Vishal Monga
, et al. (27 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy image). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images.…
▽ More
This paper reviews the NTIRE 2020 Challenge on NonHomogeneous Dehazing of images (restoration of rich details in hazy image). We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze free and nonhomogeneous hazy images recorded outdoor. NH-Haze is the first realistic nonhomogeneous haze dataset that provides ground truth images. The nonhomogeneous haze has been produced using a professional haze generator that imitates the real conditions of haze scenes. 168 participants registered in the challenge and 27 teams competed in the final testing phase. The proposed solutions gauge the state-of-the-art in image dehazing.
△ Less
Submitted 7 May, 2020;
originally announced May 2020.