-
Rendering Large Volume Datasets in Unreal Engine 5: A Survey
Authors:
Markus Schlüter,
Tom Kwasnitschka,
Armin Bernstetter,
Jens Karstens
Abstract:
In this technical report, we discuss several approaches to in-core rendering of large volumetric datasets in Unreal Engine 5 (UE5). We explore the following methods: the TBRayMarcher Plugin, the Niagara Fluids Plugin, and various approaches using Sparse Volume Textures (SVT), with a particular focus on Heterogeneous Volumes (HV). We found the HV approach to be the most promising. The biggest challenge we encountered with other approaches was the need to chunk datasets so that each fits into volume textures smaller than one gigavoxel. While this enables display of the entire dataset at reasonable frame rates, it introduces noticeable artifacts at chunk borders due to incorrect lighting, as each chunk lacks information about its neighbors. After addressing some (signed) int32 overflows in the Engine's SVT-related source code by converting them to (unsigned) uint32 or int64, the SVT-based HV system allows us to render sparse datasets up to 32k x 32k x 16k voxels, provided the compressed tile data (including MIP data and padding for correct interpolation) does not exceed 4 gigavoxels. In the future, we intend to extend the existing SVT streaming functionality to support out-of-core rendering, in order to eventually overcome VRAM limitations, graphics API constraints, and the performance issues associated with 64-bit arithmetic in GPU shaders.
Submitted 10 April, 2025;
originally announced April 2025.
-
MVIP -- A Dataset and Methods for Application Oriented Multi-View and Multi-Modal Industrial Part Recognition
Authors:
Paul Koch,
Marian Schlüter,
Jörg Krüger
Abstract:
We present MVIP, a novel dataset for multi-modal and multi-view application-oriented industrial part recognition. Here we are the first to combine a calibrated RGBD multi-view dataset with additional object context such as physical properties, natural language, and super-classes. The current portfolio of available datasets offers a wide range of representations to design and benchmark related methods. In contrast to existing classification challenges, industrial recognition applications offer controlled multi-modal environments but at the same time have different problems than traditional 2D/3D classification challenges. Frequently, industrial applications must deal with a small amount or increased number of training data, visually similar parts, and varying object sizes, while requiring a robust near 100% top 5 accuracy under cost and time constraints. Current methods tackle such challenges individually, but direct adoption of these methods within industrial applications is complex and requires further research. Our main goal with MVIP is to study and push transferability of various state-of-the-art methods within related downstream tasks towards an efficient deployment of industrial classifiers. Additionally, we intend to push with MVIP research regarding several modality fusion topics, (automated) synthetic data generation, and complex data sampling -- combined in a single application-oriented benchmark.
Submitted 21 February, 2025;
originally announced February 2025.
-
Virtual Fieldwork in Immersive Environments using Game Engines
Authors:
Armin Bernstetter,
Tom Kwasnitschka,
Jens Karstens,
Markus Schlüter,
Isabella Peters
Abstract:
Fieldwork still is the first and foremost source of insight in many disciplines of the geosciences. Virtual fieldwork is an approach meant to enable scientists trained in fieldwork to apply these skills to a virtual representation of outcrops that are inaccessible to humans e.g. due to being located on the seafloor. For this purpose we develop a virtual fieldwork software in the game engine and 3D creation tool Unreal Engine. This software is developed specifically for a large, spatially immersive environment as well as virtual reality using head-mounted displays. It contains multiple options for quantitative measurements of visualized 3D model data. We visualize three distinct real-world datasets gathered by different photogrammetric and bathymetric methods as use cases and gather initial feedback from domain experts.
Submitted 17 December, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Navigating simplicity and complexity of social-ecological systems through a dialog between dynamical systems and agent-based models
Authors:
Sonja Radosavljevic,
Udita Sanga,
Maja Schlüter
Abstract:
Social-ecological systems research aims to understand the nature of social-ecological phenomena, to find ways to foster or manage conditions under which desired phenomena occur or to reduce the negative consequences of undesirable phenomena. Such challenges are often addressed using dynamical systems models (DSM) or agent-based models (ABM). Here we develop an iterative procedure for combining DSM and ABM to leverage their strengths and gain insights that surpass insights obtained by each approach separately. The procedure uses results of an ABM as inputs for a DSM development. In the following steps, results of the DSM analyses guide future analysis of the ABM and vice versa. This dialogue, more than having a tight connection between the models, enables pushing the research frontier, expanding the set of research questions and insights. We illustrate our method with the example of poverty traps and innovation in agricultural systems, but our conclusions are general and can be applied to other DSM-ABM combinations.
Submitted 24 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Revisiting the Performance-Explainability Trade-Off in Explainable Artificial Intelligence (XAI)
Authors:
Barnaby Crook,
Maximilian Schlüter,
Timo Speith
Abstract:
Within the field of Requirements Engineering (RE), the increasing significance of Explainable Artificial Intelligence (XAI) in aligning AI-supported systems with user needs, societal expectations, and regulatory standards has garnered recognition. In general, explainability has emerged as an important non-functional requirement that impacts system quality. However, the supposed trade-off between explainability and performance challenges the presumed positive influence of explainability. If meeting the requirement of explainability entails a reduction in system performance, then careful consideration must be given to which of these quality aspects takes precedence and how to compromise between them. In this paper, we critically examine the alleged trade-off. We argue that it is best approached in a nuanced way that incorporates resource availability, domain characteristics, and considerations of risk. By providing a foundation for future research and best practices, this work aims to advance the field of RE for AI.
Submitted 26 July, 2023;
originally announced July 2023.
-
The Power of Typed Affine Decision Structures: A Case Study
Authors:
Gerrit Nolte,
Maximilian Schlüter,
Alnis Murtovi,
Bernhard Steffen
Abstract:
TADS are a novel, concise white-box representation of neural networks. In this paper, we apply TADS to the problem of neural network verification, using them to generate either proofs or concise error characterizations for desirable neural network properties. In a case study, we consider the robustness of neural networks to adversarial attacks, i.e., small changes to an input that drastically change a neural network's perception, and show that TADS can be used to provide precise diagnostics on how and where robustness errors occur. We achieve these results by introducing Precondition Projection, a technique that yields a TADS describing network behavior precisely on a given subset of its input space, and combining it with PCA, a traditional, well-understood dimensionality reduction technique. We show that PCA is easily compatible with TADS. All analyses can be implemented in a straightforward fashion using the rich algebraic properties of TADS, demonstrating the utility of the TADS framework for neural network explainability and verification. While TADS do not yet scale as efficiently as state-of-the-art neural network verifiers, we show that, using PCA-based simplifications, they can still scale to medium-sized problems and yield concise explanations for potential errors that can be used for other purposes such as debugging a network or generating new training samples.
Submitted 28 April, 2023;
originally announced April 2023.
-
Towards Rigorous Understanding of Neural Networks via Semantics-preserving Transformations
Authors:
Maximilian Schlüter,
Gerrit Nolte,
Alnis Murtovi,
Bernhard Steffen
Abstract:
In this paper we present an algebraic approach to the precise and global verification and explanation of Rectifier Neural Networks, a subclass of Piece-wise Linear Neural Networks (PLNNs), i.e., networks that semantically represent piece-wise affine functions. Key to our approach is the symbolic execution of these networks that allows the construction of semantically equivalent Typed Affine Decision Structures (TADS). Due to their deterministic and sequential nature, TADS can, similarly to decision trees, be considered as white-box models and therefore as precise solutions to the model and outcome explanation problem. TADS are linear algebras which allows one to elegantly compare Rectifier Networks for equivalence or similarity, both with precise diagnostic information in case of failure, and to characterize their classification potential by precisely characterizing the set of inputs that are specifically classified or the set of inputs where two network-based classifiers differ. All phenomena are illustrated along a detailed discussion of a minimal, illustrative example: the continuous XOR function.
Submitted 28 April, 2023; v1 submitted 19 January, 2023;
originally announced January 2023.
-
Natural Synthetic Anomalies for Self-Supervised Anomaly Detection and Localization
Authors:
Hannah M. Schlüter,
Jeremy Tan,
Benjamin Hou,
Bernhard Kainz
Abstract:
We introduce a simple and intuitive self-supervision task, Natural Synthetic Anomalies (NSA), for training an end-to-end model for anomaly detection and localization using only normal training data. NSA integrates Poisson image editing to seamlessly blend scaled patches of various sizes from separate images. This creates a wide range of synthetic anomalies which are more similar to natural sub-image irregularities than previous data-augmentation strategies for self-supervised anomaly detection. We evaluate the proposed method using natural and medical images. Our experiments with the MVTec AD dataset show that a model trained to localize NSA anomalies generalizes well to detecting real-world a priori unknown types of manufacturing defects. Our method achieves an overall detection AUROC of 97.2, outperforming all previous methods that learn without the use of additional datasets. Code available at https://github.com/hmsch/natural-synthetic-anomalies.
Submitted 24 July, 2022; v1 submitted 30 September, 2021;
originally announced September 2021.
-
BackREST: A Model-Based Feedback-Driven Greybox Fuzzer for Web Applications
Authors:
François Gauthier,
Behnaz Hassanshahi,
Benjamin Selwyn-Smith,
Trong Nhan Mai,
Max Schlüter,
Micah Williams
Abstract:
Following the advent of the American Fuzzy Lop (AFL), fuzzing had a surge in popularity, and modern day fuzzers range from simple blackbox random input generators to complex whitebox concolic frameworks that are capable of deep program introspection. Web application fuzzers, however, did not benefit from the tremendous advancements in fuzzing for binary programs and remain largely blackbox in nature. This paper introduces BackREST, a fully automated, model-based, coverage- and taint-driven fuzzer that uses its feedback loops to find more critical vulnerabilities, faster (speedups between 7.4x and 25.9x). To model the server-side of web applications, BackREST automatically infers REST specifications through directed state-aware crawling. Comparing BackREST against three other web fuzzers on five large (>500 KLOC) Node.js applications shows how it consistently achieves comparable coverage while reporting more vulnerabilities than state-of-the-art. Finally, using BackREST, we uncovered nine 0-days, out of which six were not reported by any other fuzzer. All the 0-days have been disclosed and most are now public, including two in the highly popular Sequelize and Mongodb libraries.
Submitted 18 August, 2021;
originally announced August 2021.
-
Generating Annotated Training Data for 6D Object Pose Estimation in Operational Environments with Minimal User Interaction
Authors:
Paul Koch,
Marian Schlüter,
Serge Thill
Abstract:
Recently developed deep neural networks achieved state-of-the-art results in the subject of 6D object pose estimation for robot manipulation. However, those supervised deep learning methods require expensive annotated training data. Current methods for reducing those costs frequently use synthetic data from simulations, but rely on expert knowledge and suffer from the "domain gap" when shifting to the real world. Here, we present a proof of concept for a novel approach of autonomously generating annotated training data for 6D object pose estimation. This approach is designed for learning new objects in operational environments while requiring little interaction and no expertise on the part of the user. We evaluate our autonomous data generation approach in two grasping experiments, where we achieve a similar grasping success rate as related work on a non-autonomously generated data set.
Submitted 11 May, 2022; v1 submitted 17 March, 2021;
originally announced March 2021.
-
GTOPX Space Mission Benchmarks
Authors:
Martin Schlueter,
Mehdi Neshat,
Mohamed Wahib,
Masaharu Munetomo,
Markus Wagner
Abstract:
This contribution introduces the GTOPX space mission benchmark collection, which is an extension of the GTOP database published by the European Space Agency (ESA). GTOPX consists of ten individual benchmark instances representing real-world interplanetary space trajectory design problems. In regard to the original GTOP collection, GTOPX includes three new problem instances featuring mixed-integer and multi-objective properties. GTOPX enables a simplified user handling, unified benchmark function call and some minor bug corrections to the original GTOP implementation. Furthermore, GTOPX is linked from its original C++ source code to Python and Matlab based on dynamic link libraries, assuring computationally fast and accurate reproduction of the benchmark results in all three programming languages. Space mission trajectory design problems as those represented in GTOPX are known to be highly non-linear and difficult to solve. The GTOPX collection, therefore, aims particularly at researchers wishing to put advanced (meta)heuristic and hybrid optimization algorithms to the test. The goal of this paper is to provide researchers with a manual and reference to the newly available GTOPX benchmark software.
Submitted 17 February, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Deep learning with 4D spatio-temporal data representations for OCT-based force estimation
Authors:
Nils Gessert,
Marcel Bengs,
Matthias Schlüter,
Alexander Schlaefer
Abstract:
Estimating the forces acting between instruments and tissue is a challenging problem for robot-assisted minimally-invasive surgery. Recently, numerous vision-based methods have been proposed to replace electro-mechanical approaches. Moreover, optical coherence tomography (OCT) and deep learning have been used for estimating forces based on deformation observed in volumetric image data. The method demonstrated the advantage of deep learning with 3D volumetric data over 2D depth images for force estimation. In this work, we extend the problem of deep learning-based force estimation to 4D spatio-temporal data with streams of 3D OCT volumes. For this purpose, we design and evaluate several methods extending spatio-temporal deep learning to 4D which is largely unexplored so far. Furthermore, we provide an in-depth analysis of multi-dimensional image data representations for force estimation, comparing our 4D approach to previous, lower-dimensional methods. Also, we analyze the effect of temporal information and we study the prediction of short-term future force values, which could facilitate safety features. For our 4D force estimation architectures, we find that efficient decoupling of spatial and temporal processing is advantageous. We show that using 4D spatio-temporal data outperforms all previously used data representations with a mean absolute error of 10.7mN. We find that temporal information is valuable for force estimation and we demonstrate the feasibility of force prediction.
Submitted 20 May, 2020;
originally announced May 2020.
-
Spatio-Temporal Deep Learning Methods for Motion Estimation Using 4D OCT Image Data
Authors:
Marcel Bengs,
Nils Gessert,
Matthias Schlüter,
Alexander Schlaefer
Abstract:
Purpose. Localizing structures and estimating the motion of a specific target region are common problems for navigation during surgical interventions. Optical coherence tomography (OCT) is an imaging modality with a high spatial and temporal resolution that has been used for intraoperative imaging and also for motion estimation, for example, in the context of ophthalmic surgery or cochleostomy. Recently, motion estimation between a template and a moving OCT image has been studied with deep learning methods to overcome the shortcomings of conventional, feature-based methods.
Methods. We investigate whether using a temporal stream of OCT image volumes can improve deep learning-based motion estimation performance. For this purpose, we design and evaluate several 3D and 4D deep learning methods and we propose a new deep learning approach. Also, we propose a temporal regularization strategy at the model output.
Results. Using a tissue dataset without additional markers, our deep learning methods using 4D data outperform previous approaches. The best performing 4D architecture achieves an average correlation coefficient (aCC) of 98.58% compared to 85.0% of a previous 3D deep learning method. Also, our temporal regularization strategy at the output further improves 4D model performance to an aCC of 99.06%. In particular, our 4D method works well for larger motion and is robust towards image rotations and motion distortions.
Conclusions. We propose 4D spatio-temporal deep learning for OCT-based motion estimation. On a tissue dataset, we find that using 4D information for the model input improves performance while maintaining reasonable inference times. Our regularization strategy demonstrates that additional temporal information is also beneficial at the model output.
Submitted 21 April, 2020;
originally announced April 2020.
-
Zero Crossing Modulation for Communication with Temporally Oversampled 1-Bit Quantization
Authors:
Gerhard Fettweis,
Meik Dörpinghaus,
Sandra Bender,
Lukas Landau,
Peter Neuhaus,
Martin Schlüter
Abstract:
Today's communication systems typically use high resolution analog-to-digital converters (ADCs). However, considering future communication systems with data rates in the order of 100Gbit/s the ADC power consumption becomes a major factor due to the high sampling rates. A promising alternative are receivers based on 1-bit quantization and oversampling w.r.t. the signal bandwidth. Such an approach requires a redesign of modulation, receiver synchronization, and demapping. A zero crossing modulation is a natural choice as the information needs to be carried in the zero crossing time instants. The present paper provides an overview on zero crossing modulation, achievable rates, sequence mapping and demapping, 1-bit based channel parameter estimation, and continuous phase modulation as an alternative zero crossing modulation scheme.
Submitted 2 December, 2019;
originally announced December 2019.
-
Towards Automatic Lesion Classification in the Upper Aerodigestive Tract Using OCT and Deep Transfer Learning Methods
Authors:
Nils Gessert,
Matthias Schlüter,
Sarah Latus,
Veronika Volgger,
Christian Betz,
Alexander Schlaefer
Abstract:
Early detection of cancer is crucial for treatment and overall patient survival. In the upper aerodigestive tract (UADT) the gold standard for identification of malignant tissue is an invasive biopsy. Recently, non-invasive imaging techniques such as confocal laser microscopy and optical coherence tomography (OCT) have been used for tissue assessment. In particular, in a recent study experts classified lesions in the UADT with respect to their invasiveness using OCT images only. As the results were promising, automatic classification of lesions might be feasible which could assist experts in their decision making. Therefore, we address the problem of automatic lesion classification from OCT images. This task is very challenging as the available dataset is extremely small and the data quality is limited. However, as similar issues are typical in many clinical scenarios we study to what extent deep learning approaches can still be trained and used for decision support.
Submitted 10 February, 2019;
originally announced February 2019.
-
Two-path 3D CNNs for calibration of system parameters for OCT-based motion compensation
Authors:
Nils Gessert,
Martin Gromniak,
Matthias Schlüter,
Alexander Schlaefer
Abstract:
Automatic motion compensation and adjustment of an intraoperative imaging modality's field of view is a common problem during interventions. Optical coherence tomography (OCT) is an imaging modality which is used in interventions due to its high spatial resolution of few micrometers and its temporal resolution of potentially several hundred volumes per second. However, performing motion compensation with OCT is problematic due to its small field of view which might lead to tracked objects being lost quickly. We propose a novel deep learning-based approach that directly learns input parameters of motors that move the scan area for motion compensation from optical coherence tomography volumes. We design a two-path 3D convolutional neural network (CNN) architecture that takes two volumes with an object to be tracked as its input and predicts the necessary motor input parameters to compensate the object's movement. In this way, we learn the calibration between object movement and system parameters for motion compensation with arbitrary objects. Thus, we avoid error-prone hand-eye calibration and handcrafted feature tracking from classical approaches. We achieve an average correlation coefficient of 0.998 between predicted and ground-truth motor parameters which leads to sub-voxel accuracy. Furthermore, we show that our deep learning model is real-time capable for use with the system's high volume acquisition frequency.
Submitted 22 October, 2018;
originally announced October 2018.
-
A Deep Learning Approach for Pose Estimation from Volumetric OCT Data
Authors:
Nils Gessert,
Matthias Schlüter,
Alexander Schlaefer
Abstract:
Tracking the pose of instruments is a central problem in image-guided surgery. For microscopic scenarios, optical coherence tomography (OCT) is increasingly used as an imaging modality. OCT is suitable for accurate pose estimation due to its micrometer range resolution and volumetric field of view. However, OCT image processing is challenging due to speckle noise and reflection artifacts in addition to the images' 3D nature. We address pose estimation from OCT volume data with a new deep learning-based tracking framework. For this purpose, we design a new 3D convolutional neural network (CNN) architecture to directly predict the 6D pose of a small marker geometry from OCT volumes. We use a hexapod robot to automatically acquire labeled data points which we use to train 3D CNN architectures for multi-output regression. We use this setup to provide an in-depth analysis on deep learning-based pose estimation from volumes. Specifically, we demonstrate that exploiting volume information for pose estimation yields higher accuracy than relying on 2D representations with depth information. Supporting this observation, we provide quantitative and qualitative results that 3D CNNs effectively exploit the depth structure of marker objects. Regarding the deep learning aspect, we present efficient design principles for 3D CNNs, making use of insights from the 2D deep learning community. In particular, we present Inception3D as a new architecture which performs best for our application. We show that our deep learning approach reaches errors at our ground-truth label's resolution. We achieve a mean average error of $14.89 \pm 9.3\,\mu\mathrm{m}$ and $0.096 \pm 0.072^{\circ}$ for position and orientation learning, respectively.
Submitted 10 March, 2018;
originally announced March 2018.