-
MAGI-1: Autoregressive Video Generation at Scale
Authors:
Sand. ai,
Hansi Teng,
Hongyu Jia,
Lei Sun,
Lingzhi Li,
Maolin Li,
Mingqiu Tang,
Shuai Han,
Tianning Zhang,
W. Q. Zhang,
Weifeng Luo,
Xiaoyang Kang,
Yuchen Sun,
Yue Cao,
Yunpeng Huang,
Yutong Lin,
Yuxin Fang,
Zewei Tao,
Zheng Zhang,
Zhongshu Wang,
Zixun Liu,
Dai Shi,
Guoli Su,
Hanwen Sun,
Hong Pan
, et al. (14 additional authors not shown)
Abstract:
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks condition…
▽ More
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
△ Less
Submitted 19 May, 2025;
originally announced May 2025.
-
Adaptive LiDAR Odometry and Mapping for Autonomous Agricultural Mobile Robots in Unmanned Farms
Authors:
Hanzhe Teng,
Yipeng Wang,
Dimitrios Chatziparaschis,
Konstantinos Karydis
Abstract:
Unmanned and intelligent agricultural systems are crucial for enhancing agricultural efficiency and for helping mitigate the effect of labor shortage. However, unlike urban environments, agricultural fields impose distinct and unique challenges on autonomous robotic systems, such as the unstructured and dynamic nature of the environment, the rough and uneven terrain, and the resulting non-smooth r…
▽ More
Unmanned and intelligent agricultural systems are crucial for enhancing agricultural efficiency and for helping mitigate the effect of labor shortage. However, unlike urban environments, agricultural fields impose distinct and unique challenges on autonomous robotic systems, such as the unstructured and dynamic nature of the environment, the rough and uneven terrain, and the resulting non-smooth robot motion. To address these challenges, this work introduces an adaptive LiDAR odometry and mapping framework tailored for autonomous agricultural mobile robots operating in complex agricultural environments. The proposed framework consists of a robust LiDAR odometry algorithm based on dense Generalized-ICP scan matching, and an adaptive mapping module that considers motion stability and point cloud consistency for selective map updates. The key design principle of this framework is to prioritize the incremental consistency of the map by rejecting motion-distorted points and sparse dynamic objects, which in turn leads to high accuracy in odometry estimated from scan matching against the map. The effectiveness of the proposed method is validated via extensive evaluation against state-of-the-art methods on field datasets collected in real-world agricultural environments featuring various planting types, terrain types, and robot motion profiles. Results demonstrate that our method can achieve accurate odometry estimation and mapping results consistently and robustly across diverse agricultural settings, whereas other methods are sensitive to abrupt robot motion and accumulated drift in unstructured environments. Further, the computational efficiency of our method is competitive compared with other methods. The source code of the developed method and the associated field dataset are publicly available at https://github.com/UCR-Robotics/AG-LOAM.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Authors:
Weifeng Lin,
Xinyu Wei,
Renrui Zhang,
Le Zhuo,
Shitian Zhao,
Siyuan Huang,
Huan Teng,
Junlin Xie,
Yu Qiao,
Peng Gao,
Hongsheng Li
Abstract:
This paper presents a versatile image-to-image visual assistant, PixWizard, designed for image generation, manipulation, and translation based on free-from language instructions. To this end, we tackle a variety of vision tasks into a unified image-text-to-image generation framework and curate an Omni Pixel-to-Pixel Instruction-Tuning Dataset. By constructing detailed instruction templates in natu…
▽ More
This paper presents a versatile image-to-image visual assistant, PixWizard, designed for image generation, manipulation, and translation based on free-from language instructions. To this end, we tackle a variety of vision tasks into a unified image-text-to-image generation framework and curate an Omni Pixel-to-Pixel Instruction-Tuning Dataset. By constructing detailed instruction templates in natural language, we comprehensively include a large set of diverse vision tasks such as text-to-image generation, image restoration, image grounding, dense image prediction, image editing, controllable generation, inpainting/outpainting, and more. Furthermore, we adopt Diffusion Transformers (DiT) as our foundation model and extend its capabilities with a flexible any resolution mechanism, enabling the model to dynamically process images based on the aspect ratio of the input, closely aligning with human perceptual processes. The model also incorporates structure-aware and semantic-aware guidance to facilitate effective fusion of information from the input image. Our experiments demonstrate that PixWizard not only shows impressive generative and understanding abilities for images with diverse resolutions but also exhibits promising generalization capabilities with unseen tasks and human instructions. The code and related resources are available at https://github.com/AFeng-x/PixWizard
△ Less
Submitted 27 February, 2025; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Residual-NeRF: Learning Residual NeRFs for Transparent Object Manipulation
Authors:
Bardienus P. Duisterhof,
Yuemin Mao,
Si Heng Teng,
Jeffrey Ichnowski
Abstract:
Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects is a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the depth reconstruction. Recent work has shown neural radiance fields (NeRFs) work well for depth perception in scene…
▽ More
Transparent objects are ubiquitous in industry, pharmaceuticals, and households. Grasping and manipulating these objects is a significant challenge for robots. Existing methods have difficulty reconstructing complete depth maps for challenging transparent objects, leaving holes in the depth reconstruction. Recent work has shown neural radiance fields (NeRFs) work well for depth perception in scenes with transparent objects, and these depth maps can be used to grasp transparent objects with high accuracy. NeRF-based depth reconstruction can still struggle with especially challenging transparent objects and lighting conditions. In this work, we propose Residual-NeRF, a method to improve depth perception and training speed for transparent objects. Robots often operate in the same area, such as a kitchen. By first learning a background NeRF of the scene without transparent objects to be manipulated, we reduce the ambiguity faced by learning the changes with the new object. We propose training two additional networks: a residual NeRF learns to infer residual RGB values and densities, and a Mixnet learns how to combine background and residual NeRFs. We contribute synthetic and real experiments that suggest Residual-NeRF improves depth perception of transparent objects. The results on synthetic data suggest Residual-NeRF outperforms the baselines with a 46.1% lower RMSE and a 29.5% lower MAE. Real-world qualitative experiments suggest Residual-NeRF leads to more robust depth maps with less noise and fewer holes. Website: https://residual-nerf.github.io
△ Less
Submitted 9 May, 2024;
originally announced May 2024.
-
On-the-Go Tree Detection and Geometric Traits Estimation with Ground Mobile Robots in Fruit Tree Groves
Authors:
Dimitrios Chatziparaschis,
Hanzhe Teng,
Yipeng Wang,
Pamodya Peiris,
Elia Scudiero,
Konstantinos Karydis
Abstract:
By-tree information gathering is an essential task in precision agriculture achieved by ground mobile sensors, but it can be time- and labor-intensive. In this paper we present an algorithmic framework to perform real-time and on-the-go detection of trees and key geometric characteristics (namely, width and height) with wheeled mobile robots in the field. Our method is based on the fusion of 2D do…
▽ More
By-tree information gathering is an essential task in precision agriculture achieved by ground mobile sensors, but it can be time- and labor-intensive. In this paper we present an algorithmic framework to perform real-time and on-the-go detection of trees and key geometric characteristics (namely, width and height) with wheeled mobile robots in the field. Our method is based on the fusion of 2D domain-specific data (normalized difference vegetation index [NDVI] acquired via a red-green-near-infrared [RGN] camera) and 3D LiDAR point clouds, via a customized tree landmark association and parameter estimation algorithm. The proposed system features a multi-modal and entropy-based landmark correspondences approach, integrated into an underlying Kalman filter system to recognize the surrounding trees and jointly estimate their spatial and vegetation-based characteristics. Realistic simulated tests are used to evaluate our proposed algorithm's behavior in a variety of settings. Physical experiments in agricultural fields help validate our method's efficacy in acquiring accurate by-tree information on-the-go and in real-time by employing only onboard computational and sensing resources.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Climate Trends of Tropical Cyclone Intensity and Energy Extremes Revealed by Deep Learning
Authors:
Buo-Fu Chen,
Boyo Chen,
Chun-Min Hsiao,
Hsu-Feng Teng,
Cheng-Shang Lee,
Hung-Chi Kuo
Abstract:
Anthropogenic influences have been linked to tropical cyclone (TC) poleward migration, TC extreme precipitation, and an increased proportion of major hurricanes [1, 2, 3, 4]. Understanding past TC trends and variability is critical for projecting future TC impacts on human society considering the changing climate [5]. However, past trends of TC structure/energy remain uncertain due to limited obse…
▽ More
Anthropogenic influences have been linked to tropical cyclone (TC) poleward migration, TC extreme precipitation, and an increased proportion of major hurricanes [1, 2, 3, 4]. Understanding past TC trends and variability is critical for projecting future TC impacts on human society considering the changing climate [5]. However, past trends of TC structure/energy remain uncertain due to limited observations; subjective-analyzed and spatiotemporal-heterogeneous "best-track" datasets lead to reduced confidence in the assessed TC repose to climate change [6, 7]. Here, we use deep learning to reconstruct past "observations" and yield an objective global TC wind profile dataset during 1981 to 2020, facilitating a comprehensive examination of TC structure/energy. By training with uniquely labeled data integrating best tracks and numerical model analysis of 2004 to 2018 TCs, our model converts multichannel satellite imagery to a 0-750-km wind profile of axisymmetric surface winds. The model performance is verified to be sufficient for climate studies by comparing it to independent satellite-radar surface winds. Based on the new homogenized dataset, the major TC proportion has increased by ~13% in the past four decades. Moreover, the proportion of extremely high-energy TCs has increased by ~25%, along with an increasing trend (> one standard deviation of the 40-y variability) of the mean total energy of high-energy TCs. Although the warming ocean favors TC intensification, the TC track migration to higher latitudes and altered environments further affect TC structure/energy. This new deep learning method/dataset reveals novel trends regarding TC structure extremes and may help verify simulations/studies regarding TCs in the changing climate.
△ Less
Submitted 1 February, 2024;
originally announced February 2024.
-
Multimodal Dataset for Localization, Mapping and Crop Monitoring in Citrus Tree Farms
Authors:
Hanzhe Teng,
Yipeng Wang,
Xiaoao Song,
Konstantinos Karydis
Abstract:
In this work we introduce the CitrusFarm dataset, a comprehensive multimodal sensory dataset collected by a wheeled mobile robot operating in agricultural fields. The dataset offers stereo RGB images with depth information, as well as monochrome, near-infrared and thermal images, presenting diverse spectral responses crucial for agricultural research. Furthermore, it provides a range of navigation…
▽ More
In this work we introduce the CitrusFarm dataset, a comprehensive multimodal sensory dataset collected by a wheeled mobile robot operating in agricultural fields. The dataset offers stereo RGB images with depth information, as well as monochrome, near-infrared and thermal images, presenting diverse spectral responses crucial for agricultural research. Furthermore, it provides a range of navigational sensor data encompassing wheel odometry, LiDAR, inertial measurement unit (IMU), and GNSS with Real-Time Kinematic (RTK) as the centimeter-level positioning ground truth. The dataset comprises seven sequences collected in three fields of citrus trees, featuring various tree species at different growth stages, distinctive planting patterns, as well as varying daylight conditions. It spans a total operation time of 1.7 hours, covers a distance of 7.5 km, and constitutes 1.3 TB of data. We anticipate that this dataset can facilitate the development of autonomous robot systems operating in agricultural tree environments, especially for localization, mapping and crop monitoring tasks. Moreover, the rich sensing modalities offered in this dataset can also support research in a range of robotics and computer vision tasks, such as place recognition, scene understanding, object detection and segmentation, and multimodal learning. The dataset, in conjunction with related tools and resources, is made publicly available at https://github.com/UCR-Robotics/Citrus-Farm-Dataset.
△ Less
Submitted 28 September, 2023; v1 submitted 26 September, 2023;
originally announced September 2023.
-
UniMC: A Unified Framework for Long-Term Memory Conversation via Relevance Representation Learning
Authors:
Kang Zhao,
Wei Liu,
Jian Luan,
Minglei Gao,
Li Qian,
Hanlin Teng,
Bin Wang
Abstract:
Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC),…
▽ More
Open-domain long-term memory conversation can establish long-term intimacy with humans, and the key is the ability to understand and memorize long-term dialogue history information. Existing works integrate multiple models for modelling through a pipeline, which ignores the coupling between different stages. In this paper, we propose a Unified framework for Long-term Memory Conversations (UniMC), which increases the connection between different stages by learning relevance representation. Specifically, we decompose the main task into three subtasks based on probability graphs: 1) conversation summarization, 2) memory retrieval, 3) memory-augmented generation. Each subtask involves learning a representation for calculating the relevance between the query and memory, which is modelled by inserting a special token at the beginning of the decoder input. The relevance representation learning strengthens the connection across subtasks through parameter sharing and joint training. Extensive experimental results show that the proposed method consistently improves over strong baselines and yields better dialogue consistency and engagingness.
△ Less
Submitted 18 June, 2023;
originally announced June 2023.
-
TartanCalib: Iterative Wide-Angle Lens Calibration using Adaptive SubPixel Refinement of AprilTags
Authors:
Bardienus P Duisterhof,
Yaoyu Hu,
Si Heng Teng,
Michael Kaess,
Sebastian Scherer
Abstract:
Wide-angle cameras are uniquely positioned for mobile robots, by virtue of the rich information they provide in a small, light, and cost-effective form factor. An accurate calibration of the intrinsics and extrinsics is a critical pre-requisite for using the edge of a wide-angle lens for depth perception and odometry. Calibrating wide-angle lenses with current state-of-the-art techniques yields po…
▽ More
Wide-angle cameras are uniquely positioned for mobile robots, by virtue of the rich information they provide in a small, light, and cost-effective form factor. An accurate calibration of the intrinsics and extrinsics is a critical pre-requisite for using the edge of a wide-angle lens for depth perception and odometry. Calibrating wide-angle lenses with current state-of-the-art techniques yields poor results due to extreme distortion at the edge, as most algorithms assume a lens with low to medium distortion closer to a pinhole projection. In this work we present our methodology for accurate wide-angle calibration. Our pipeline generates an intermediate model, and leverages it to iteratively improve feature detection and eventually the camera parameters. We test three key methods to utilize intermediate camera models: (1) undistorting the image into virtual pinhole cameras, (2) reprojecting the target into the image frame, and (3) adaptive subpixel refinement. Combining adaptive subpixel refinement and feature reprojection significantly improves reprojection errors by up to 26.59 %, helps us detect up to 42.01 % more features, and improves performance in the downstream task of dense depth mapping. Finally, TartanCalib is open-source and implemented into an easy-to-use calibration toolbox. We also provide a translation layer with other state-of-the-art works, which allows for regressing generic models with thousands of parameters or using a more robust solver. To this end, TartanCalib is the tool of choice for wide-angle calibration. Project website and code: http://tartancalib.com.
△ Less
Submitted 5 October, 2022;
originally announced October 2022.
-
Centroid Distance Keypoint Detector for Colored Point Clouds
Authors:
Hanzhe Teng,
Dimitrios Chatziparaschis,
Xinyue Kan,
Amit K. Roy-Chowdhury,
Konstantinos Karydis
Abstract:
Keypoint detection serves as the basis for many computer vision and robotics applications. Despite the fact that colored point clouds can be readily obtained, most existing keypoint detectors extract only geometry-salient keypoints, which can impede the overall performance of systems that intend to (or have the potential to) leverage color information. To promote advances in such systems, we propo…
▽ More
Keypoint detection serves as the basis for many computer vision and robotics applications. Despite the fact that colored point clouds can be readily obtained, most existing keypoint detectors extract only geometry-salient keypoints, which can impede the overall performance of systems that intend to (or have the potential to) leverage color information. To promote advances in such systems, we propose an efficient multi-modal keypoint detector that can extract both geometry-salient and color-salient keypoints in colored point clouds. The proposed CEntroid Distance (CED) keypoint detector comprises an intuitive and effective saliency measure, the centroid distance, that can be used in both 3D space and color space, and a multi-modal non-maximum suppression algorithm that can select keypoints with high saliency in two or more modalities. The proposed saliency measure leverages directly the distribution of points in a local neighborhood and does not require normal estimation or eigenvalue decomposition. We evaluate the proposed method in terms of repeatability and computational efficiency (i.e. running time) against state-of-the-art keypoint detectors on both synthetic and real-world datasets. Results demonstrate that our proposed CED keypoint detector requires minimal computational time while attaining high repeatability. To showcase one of the potential applications of the proposed method, we further investigate the task of colored point cloud registration. Results suggest that our proposed CED detector outperforms state-of-the-art handcrafted and learning-based keypoint detectors in the evaluated scenes. The C++ implementation of the proposed method is made publicly available at https://github.com/UCR-Robotics/CED_Detector.
△ Less
Submitted 15 June, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
A High-Accuracy Unsupervised Person Re-identification Method Using Auxiliary Information Mined from Datasets
Authors:
Hehan Teng,
Tao He,
Yuchen Guo,
Guiguang Ding
Abstract:
Supervised person re-identification methods rely heavily on high-quality cross-camera training label. This significantly hinders the deployment of re-ID models in real-world applications. The unsupervised person re-ID methods can reduce the cost of data annotation, but their performance is still far lower than the supervised ones. In this paper, we make full use of the auxiliary information mined…
▽ More
Supervised person re-identification methods rely heavily on high-quality cross-camera training label. This significantly hinders the deployment of re-ID models in real-world applications. The unsupervised person re-ID methods can reduce the cost of data annotation, but their performance is still far lower than the supervised ones. In this paper, we make full use of the auxiliary information mined from the datasets for multi-modal feature learning, including camera information, temporal information and spatial information. By analyzing the style bias of cameras, the characteristics of pedestrians' motion trajectories and the positions of camera network, this paper designs three modules: Time-Overlapping Constraint (TOC), Spatio-Temporal Similarity (STS) and Same-Camera Penalty (SCP) to exploit the auxiliary information. Auxiliary information can improve the model performance and inference accuracy by constructing association constraints or fusing with visual features. In addition, this paper proposes three effective training tricks, including Restricted Label Smoothing Cross Entropy Loss (RLSCE), Weight Adaptive Triplet Loss (WATL) and Dynamic Training Iterations (DTI). The tricks achieve mAP of 72.4% and 81.1% on MARS and DukeMTMC-VideoReID, respectively. Combined with auxiliary information exploiting modules, our methods achieve mAP of 89.9% on DukeMTMC, where TOC, STS and SCP all contributed considerable performance improvements. The method proposed by this paper outperforms most existing unsupervised re-ID methods and narrows the gap between unsupervised and supervised re-ID methods. Our code is at https://github.com/tenghehan/AuxUSLReID.
△ Less
Submitted 6 May, 2022;
originally announced May 2022.
-
A Free Lunch to Person Re-identification: Learning from Automatically Generated Noisy Tracklets
Authors:
Hehan Teng,
Tao He,
Yuchen Guo,
Zhenhua Guo,
Guiguang Ding
Abstract:
A series of unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of high labor cost required to annotate re-ID datasets. But their performance is still far lower than the supervised counterparts. In the mean time, clean datasets without noise are used in these methods, which is not realistic. In this paper, we propose to tackle this problem by learning…
▽ More
A series of unsupervised video-based re-identification (re-ID) methods have been proposed to solve the problem of high labor cost required to annotate re-ID datasets. But their performance is still far lower than the supervised counterparts. In the mean time, clean datasets without noise are used in these methods, which is not realistic. In this paper, we propose to tackle this problem by learning re-ID models from automatically generated person tracklets by multiple objects tracking (MOT) algorithm. To this end, we design a tracklet-based multi-level clustering (TMC) framework to effectively learn the re-ID model from the noisy person tracklets. First, intra-tracklet isolation to reduce ID switch noise within tracklets; second, alternates between using inter-tracklet association to eliminate ID fragmentation noise and network training using the pseudo label. Extensive experiments on MARS with various manually generated noises show the effectiveness of the proposed framework. Specifically, the proposed framework achieved mAP 53.4% and rank-1 63.7% on the simulated tracklets with strongest noise, even outperforming the best existing method on clean tracklets. Based on the results, we believe that building re-ID models from automatically generated noisy tracklets is a reasonable approach and will also be an important way to make re-ID models feasible in real-world applications.
△ Less
Submitted 2 April, 2022;
originally announced April 2022.
-
Development and Testing of a Novel Automated Insect Capture Module for Sample Collection and Transfer
Authors:
Keran Ye,
Gustavo J. Correa,
Tom Guda,
Hanzhe Teng,
Anandasankar Ray,
Konstantinos Karydis
Abstract:
There exists an urgent need for efficient tools in disease surveillance to help model and predict the spread of disease. The transmission of insect-borne diseases poses a serious concern to public health officials and the medical and research community at large. In the modeling of this spread, we face bottlenecks in (1) the frequency at which we are able to sample insect vectors in environments th…
▽ More
There exists an urgent need for efficient tools in disease surveillance to help model and predict the spread of disease. The transmission of insect-borne diseases poses a serious concern to public health officials and the medical and research community at large. In the modeling of this spread, we face bottlenecks in (1) the frequency at which we are able to sample insect vectors in environments that are prone to propagating disease, (2) manual labor needed to set up and retrieve surveillance devices like traps, and (3) the return time in analyzing insect samples and determining if an infectious disease is spreading in a region. To help address these bottlenecks, we present in this paper the design, fabrication, and testing of a novel automated insect capture module (ICM) or trap that aims to improve the rate of transferring samples collected from the environment via aerial robots. The ICM features an ultraviolet light attractant, passive capture mechanism, panels which can open and close for access to insects, and a small onboard computer for automated operation and data logging. At the same time, the ICM is designed to be accessible; it is small-scale, lightweight and low-cost, and can be integrated with commercially available aerial robots. Indoor and outdoor experimentation validates ICM's feasibility in insect capturing and safe transportation. The device can help bring us one step closer toward achieving fully autonomous and scalable epidemiology by leveraging autonomous robots technology to aid the medical and research community.
△ Less
Submitted 29 August, 2020;
originally announced August 2020.
-
Online Exploration and Coverage Planning in Unknown Obstacle-Cluttered Environments
Authors:
Xinyue Kan,
Hanzhe Teng,
Konstantinos Karydis
Abstract:
Online coverage planning can be useful in applications like field monitoring and search and rescue. Without prior information of the environment, achieving resolution-complete coverage considering the non-holonomic mobility constraints in commonly-used vehicles (e.g., wheeled robots) remains a challenge. In this paper, we propose a hierarchical, hex-decomposition-based coverage planning algorithm…
▽ More
Online coverage planning can be useful in applications like field monitoring and search and rescue. Without prior information of the environment, achieving resolution-complete coverage considering the non-holonomic mobility constraints in commonly-used vehicles (e.g., wheeled robots) remains a challenge. In this paper, we propose a hierarchical, hex-decomposition-based coverage planning algorithm for unknown, obstacle-cluttered environments. The proposed approach ensures resolution-complete coverage, can be tuned to achieve fast exploration, and plans smooth paths for Dubins vehicles to follow at constant velocity in real-time. Gazebo simulations and hardware experiments with a non-holonomic wheeled robot show that our approach can successfully tradeoff between coverage and exploration speed and can outperform existing online coverage algorithms in terms of total covered area or exploration speed according to how it is tuned.
△ Less
Submitted 29 June, 2020;
originally announced June 2020.
-
Informative sample generation using class aware generative adversarial networks for classification of chest Xrays
Authors:
Behzad Bozorgtabar,
Dwarikanath Mahapatra,
Hendrik von Teng,
Alexander Pollinger,
Lukas Ebner,
Jean-Phillipe Thiran,
Mauricio Reyes
Abstract:
Training robust deep learning (DL) systems for disease detection from medical images is challenging due to limited images covering different disease types and severity. The problem is especially acute, where there is a severe class imbalance. We propose an active learning (AL) framework to select most informative samples for training our model using a Bayesian neural network. Informative samples a…
▽ More
Training robust deep learning (DL) systems for disease detection from medical images is challenging due to limited images covering different disease types and severity. The problem is especially acute, where there is a severe class imbalance. We propose an active learning (AL) framework to select most informative samples for training our model using a Bayesian neural network. Informative samples are then used within a novel class aware generative adversarial network (CAGAN) to generate realistic chest xray images for data augmentation by transferring characteristics from one class label to another. Experiments show our proposed AL framework is able to achieve state-of-the-art performance by using about $35\%$ of the full dataset, thus saving significant time and effort over conventional methods.
△ Less
Submitted 30 April, 2019; v1 submitted 24 April, 2019;
originally announced April 2019.