Search | arXiv e-print repository

arXiv:2509.20503 [pdf, ps, other]

Myosotis: structured computation for attention like layer

Authors: Evgenii Egorov, Hanno Ackermann, Markus Nagel, Hong Cai

Abstract: Attention layers apply a sequence-to-sequence mapping whose parameters depend on the pairwise interactions of the input elements. However, without any structural assumptions, memory and compute scale quadratically with the sequence length. The two main ways to mitigate this are to introduce sparsity by ignoring a sufficient amount of pairwise interactions or to introduce recurrent dependence along… ▽ More Attention layers apply a sequence-to-sequence mapping whose parameters depend on the pairwise interactions of the input elements. However, without any structural assumptions, memory and compute scale quadratically with the sequence length. The two main ways to mitigate this are to introduce sparsity by ignoring a sufficient amount of pairwise interactions or to introduce recurrent dependence along them, as SSM does. Although both approaches are reasonable, they both have disadvantages. We propose a novel algorithm that combines the advantages of both concepts. Our idea is based on the efficient inversion of tree-structured matrices. △ Less

Submitted 24 September, 2025; originally announced September 2025.

arXiv:2506.03290 [pdf, other]

Learning Optical Flow Field via Neural Ordinary Differential Equation

Authors: Leyla Mirvakhabova, Hong Cai, Jisoo Jeong, Hanno Ackermann, Farhad Zanjani, Fatih Porikli

Abstract: Recent works on optical flow estimation use neural networks to predict the flow field that maps positions of one image to positions of the other. These networks consist of a feature extractor, a correlation volume, and finally several refinement steps. These refinement steps mimic the iterative refinements performed by classical optimization algorithms and are usually implemented by neural layers… ▽ More Recent works on optical flow estimation use neural networks to predict the flow field that maps positions of one image to positions of the other. These networks consist of a feature extractor, a correlation volume, and finally several refinement steps. These refinement steps mimic the iterative refinements performed by classical optimization algorithms and are usually implemented by neural layers (e.g., GRU) which are recurrently executed for a fixed and pre-determined number of steps. However, relying on a fixed number of steps may result in suboptimal performance because it is not tailored to the input data. In this paper, we introduce a novel approach for predicting the derivative of the flow using a continuous model, namely neural ordinary differential equations (ODE). One key advantage of this approach is its capacity to model an equilibrium process, dynamically adjusting the number of compute steps based on the data at hand. By following a particular neural architecture, ODE solver, and associated hyperparameters, our proposed model can replicate the exact same updates as recurrent cells used in existing works, offering greater generality. Through extensive experimental analysis on optical flow benchmarks, we demonstrate that our approach achieves an impressive improvement over baseline and existing models, all while requiring only a single refinement step. △ Less

Submitted 3 June, 2025; originally announced June 2025.

Comments: CVPRW 2025

arXiv:2412.01931 [pdf, other]

Planar Gaussian Splatting

Authors: Farhad G. Zanjani, Hong Cai, Hanno Ackermann, Leila Mirvakhabova, Fatih Porikli

Abstract: This paper presents Planar Gaussian Splatting (PGS), a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene, directly from multiple RGB images. The PGS leverages Gaussian primitives to model the scene and employ a hierarchical Gaussian mixture approach to group them. Similar Gaussians are progressively merged probabilistically in the tree-structured Gaussian… ▽ More This paper presents Planar Gaussian Splatting (PGS), a novel neural rendering approach to learn the 3D geometry and parse the 3D planes of a scene, directly from multiple RGB images. The PGS leverages Gaussian primitives to model the scene and employ a hierarchical Gaussian mixture approach to group them. Similar Gaussians are progressively merged probabilistically in the tree-structured Gaussian mixtures to identify distinct 3D plane instances and form the overall 3D scene geometry. In order to enable the grouping, the Gaussian primitives contain additional parameters, such as plane descriptors derived by lifting 2D masks from a general 2D segmentation model and surface normals. Experiments show that the proposed PGS achieves state-of-the-art performance in 3D planar reconstruction without requiring either 3D plane labels or depth supervision. In contrast to existing supervised methods that have limited generalizability and struggle under domain shift, PGS maintains its performance across datasets thanks to its neural rendering and scene-specific optimization mechanism, while also being significantly faster than existing optimization-based approaches. △ Less

Submitted 2 December, 2024; originally announced December 2024.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025

arXiv:2211.02667 [pdf, other]

Deconfounding Imitation Learning with Variational Inference

Authors: Risto Vuorio, Pim de Haan, Johann Brehmer, Hanno Ackermann, Daniel Dijkman, Taco Cohen

Abstract: Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. In previous work, to work around the confounding problem, policies have been trained using query access to the expert's policy or inverse reinforcement learning (IRL). However, both app… ▽ More Standard imitation learning can fail when the expert demonstrators have different sensory inputs than the imitating agent. This is because partial observability gives rise to hidden confounders in the causal graph. In previous work, to work around the confounding problem, policies have been trained using query access to the expert's policy or inverse reinforcement learning (IRL). However, both approaches have drawbacks as the expert's policy may not be available and IRL can be unstable in practice. Instead, we propose to train a variational inference model to infer the expert's latent information and use it to train a latent-conditional policy. We prove that using this method, under strong assumptions, the identification of the correct imitation learning policy is theoretically possible from expert demonstrations alone. In practice, we focus on a setting with less strong assumptions where we use exploration data for learning the inference model. We show in theory and practice that this algorithm converges to the correct interventional policy, solves the confounding issue, and can under certain assumptions achieve an asymptotically optimal imitation performance. △ Less

Submitted 25 August, 2024; v1 submitted 4 November, 2022; originally announced November 2022.

arXiv:2107.12309 [pdf, other]

Spatial-Temporal Transformer for Dynamic Scene Graph Generation

Authors: Yuren Cong, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn, Michael Ying Yang

Abstract: Dynamic scene graph generation aims at generating a scene graph of the given video. Compared to the task of scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames allowing for a richer semantic interpretation. In this paper, we propose Spatial-temporal Transformer (STTran), a neural network that c… ▽ More Dynamic scene graph generation aims at generating a scene graph of the given video. Compared to the task of scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames allowing for a richer semantic interpretation. In this paper, we propose Spatial-temporal Transformer (STTran), a neural network that consists of two core modules: (1) a spatial encoder that takes an input frame to extract spatial context and reason about the visual relationships within a frame, and (2) a temporal decoder which takes the output of the spatial encoder as input in order to capture the temporal dependencies between frames and infer the dynamic relationships. Furthermore, STTran is flexible to take varying lengths of videos as input without clipping, which is especially important for long videos. Our method is validated on the benchmark dataset Action Genome (AG). The experimental results demonstrate the superior performance of our method in terms of dynamic scene graphs. Moreover, a set of ablative studies is conducted and the effect of each proposed module is justified. Code available at: https://github.com/yrcong/STTran. △ Less

Submitted 8 August, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: accepted by ICCV 2021

arXiv:2105.02047 [pdf, other]

Cuboids Revisited: Learning Robust 3D Shape Fitting to Single RGB Images

Authors: Florian Kluger, Hanno Ackermann, Eric Brachmann, Michael Ying Yang, Bodo Rosenhahn

Abstract: Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only ab… ▽ More Humans perceive and construct the surrounding world as an arrangement of simple parametric models. In particular, man-made environments commonly consist of volumetric primitives such as cuboids or cylinders. Inferring these primitives is an important step to attain high-level, abstract scene descriptions. Previous approaches directly estimate shape parameters from a 2D or 3D input, and are only able to reproduce simple objects, yet unable to accurately parse more complex 3D scenes. In contrast, we propose a robust estimator for primitive fitting, which can meaningfully abstract real-world environments using cuboids. A RANSAC estimator guided by a neural network fits these primitives to 3D features, such as a depth map. We condition the network on previously detected parts of the scene, thus parsing it one-by-one. To obtain 3D features from a single RGB image, we additionally optimise a feature extraction CNN in an end-to-end manner. However, naively minimising point-to-primitive distances leads to large or spurious cuboids occluding parts of the scene behind. We thus propose an occlusion-aware distance metric correctly handling opaque scenes. The proposed algorithm does not require labour-intensive labels, such as cuboid annotations, for training. Results on the challenging NYU Depth v2 dataset demonstrate that the proposed algorithm successfully abstracts cluttered real-world 3D scene layouts. △ Less

Submitted 5 May, 2021; originally announced May 2021.

Comments: CVPR 2021

arXiv:2006.09872 [pdf, ps, other]

doi 10.1007/s10951-020-00667-2

Scheduling a Proportionate Flow Shop of Batching Machines

Authors: Christoph Hertrich, Christian Weiß, Heiner Ackermann, Sandy Heydrich, Sven O. Krumke

Abstract: In this paper we study a proportionate flow shop of batching machines with release dates and a fixed number $m \geq 2$ of machines. The scheduling problem has so far barely received any attention in the literature, but recently its importance has increased significantly, due to applications in the industrial scaling of modern bio-medicine production processes. We show that for any fixed number of… ▽ More In this paper we study a proportionate flow shop of batching machines with release dates and a fixed number $m \geq 2$ of machines. The scheduling problem has so far barely received any attention in the literature, but recently its importance has increased significantly, due to applications in the industrial scaling of modern bio-medicine production processes. We show that for any fixed number of machines, the makespan and the sum of completion times can be minimized in polynomial time. Furthermore, we show that the obtained algorithm can also be used to minimize the weighted total completion time, maximum lateness, total tardiness and (weighted) number of late jobs in polynomial time if all release dates are $0$. Previously, polynomial time algorithms have only been known for two machines. △ Less

Submitted 26 November, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

Comments: Version 2: replace initial preprint with authors' accepted manuscript

Journal ref: Journal of Scheduling 23, 575-593 (2020)

arXiv:2005.03552 [pdf, other]

doi 10.1007/s10951-022-00732-y

Online Algorithms to Schedule a Proportionate Flexible Flow Shop of Batching Machines

Authors: Christoph Hertrich, Christian Weiß, Heiner Ackermann, Sandy Heydrich, Sven O. Krumke

Abstract: This paper is the first to consider online algorithms to schedule a proportionate flexible flow shop of batching machines (PFFB). The scheduling model is motivated by manufacturing processes of individualized medicaments, which are used in modern medicine to treat some serious illnesses. We provide two different online algorithms, proving also lower bounds for the offline problem to compute their… ▽ More This paper is the first to consider online algorithms to schedule a proportionate flexible flow shop of batching machines (PFFB). The scheduling model is motivated by manufacturing processes of individualized medicaments, which are used in modern medicine to treat some serious illnesses. We provide two different online algorithms, proving also lower bounds for the offline problem to compute their competitive ratios. The first algorithm is an easy-to-implement, general local scheduling heuristic. It is 2-competitive for PFFBs with an arbitrary number of stages and for several natural scheduling objectives. We also show that for total/average flow time, no deterministic algorithm with better competitive ratio exists. For the special case with two stages and the makespan or total completion time objective, we describe an improved algorithm that achieves the best possible competitive ratio $\varphi=\frac{1+\sqrt{5}}{2}$, the golden ratio. All our results also hold for proportionate (non-flexible) flow shops of batching machines (PFB) for which this is also the first paper to study online algorithms. △ Less

Submitted 17 July, 2024; v1 submitted 7 May, 2020; originally announced May 2020.

Comments: Authors' accepted manuscript

Journal ref: Journal of Scheduling 25, 643-657 (2022)

arXiv:2001.04735 [pdf, other]

NODIS: Neural Ordinary Differential Scene Understanding

Authors: Cong Yuren, Hanno Ackermann, Wentong Liao, Michael Ying Yang, Bodo Rosenhahn

Abstract: Semantic image understanding is a challenging topic in computer vision. It requires to detect all objects in an image, but also to identify all the relations between them. Detected objects, their labels and the discovered relations can be used to construct a scene graph which provides an abstract semantic interpretation of an image. In previous works, relations were identified by solving an assign… ▽ More Semantic image understanding is a challenging topic in computer vision. It requires to detect all objects in an image, but also to identify all the relations between them. Detected objects, their labels and the discovered relations can be used to construct a scene graph which provides an abstract semantic interpretation of an image. In previous works, relations were identified by solving an assignment problem formulated as Mixed-Integer Linear Programs. In this work, we interpret that formulation as Ordinary Differential Equation (ODE). The proposed architecture performs scene graph inference by solving a neural variant of an ODE by end-to-end learning. It achieves state-of-the-art results on all three benchmark tasks: scene graph generation (SGGen), classification (SGCls) and visual relationship detection (PredCls) on Visual Genome benchmark. △ Less

Submitted 18 July, 2020; v1 submitted 14 January, 2020; originally announced January 2020.

arXiv:2001.02643 [pdf, other]

CONSAC: Robust Multi-Model Fitting by Conditional Sample Consensus

Authors: Florian Kluger, Eric Brachmann, Hanno Ackermann, Carsten Rother, Michael Ying Yang, Bodo Rosenhahn

Abstract: We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. Applications include finding multiple vanishing points in man-made scenes, fitting planes to architectural imagery, or estimating multiple rigid motions within the same sequence. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we… ▽ More We present a robust estimator for fitting multiple parametric models of the same form to noisy measurements. Applications include finding multiple vanishing points in man-made scenes, fitting planes to architectural imagery, or estimating multiple rigid motions within the same sequence. In contrast to previous works, which resorted to hand-crafted search strategies for multiple model detection, we learn the search strategy from data. A neural network conditioned on previously detected models guides a RANSAC estimator to different subsets of all measurements, thereby finding model instances one after another. We train our method supervised as well as self-supervised. For supervised training of the search strategy, we contribute a new dataset for vanishing point estimation. Leveraging this dataset, the proposed algorithm is superior with respect to other robust estimators as well as to designated vanishing point estimation algorithms. For self-supervised learning of the search, we evaluate the proposed algorithm on multi-homography estimation and demonstrate an accuracy that is superior to state-of-the-art methods. △ Less

Submitted 25 March, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

Comments: CVPR 2020

arXiv:1908.08989 [pdf, other]

Learning Disentangled Representations via Independent Subspaces

Authors: Maren Awiszus, Hanno Ackermann, Bodo Rosenhahn

Abstract: Image generating neural networks are mostly viewed as black boxes, where any change in the input can have a number of globally effective changes on the output. In this work, we propose a method for learning disentangled representations to allow for localized image manipulations. We use face images as our example of choice. Depending on the image region, identity and other facial attributes can be… ▽ More Image generating neural networks are mostly viewed as black boxes, where any change in the input can have a number of globally effective changes on the output. In this work, we propose a method for learning disentangled representations to allow for localized image manipulations. We use face images as our example of choice. Depending on the image region, identity and other facial attributes can be modified. The proposed network can transfer parts of a face such as shape and color of eyes, hair, mouth, etc.~directly between persons while all other parts of the face remain unchanged. The network allows to generate modified images which appear like realistic images. Our model learns disentangled representations by weak supervision. We propose a localized resnet autoencoder optimized using several loss functions including a loss based on the semantic segmentation, which we interpret as masks, and a loss which enforces disentanglement by decomposition of the latent space into statistically independent subspaces. We evaluate the proposed solution w.r.t. disentanglement and generated image quality. Convincing results are demonstrated using the CelebA dataset. △ Less

Submitted 26 August, 2019; originally announced August 2019.

Comments: Accepted at ICCV 2019 Workshop on Robust Subspace Learning and Applications in Computer Vision

arXiv:1907.10014 [pdf, other]

Temporally Consistent Horizon Lines

Authors: Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn

Abstract: The horizon line is an important geometric feature for many image processing and scene understanding tasks in computer vision. For instance, in navigation of autonomous vehicles or driver assistance, it can be used to improve 3D reconstruction as well as for semantic interpretation of dynamic environments. While both algorithms and datasets exist for single images, the problem of horizon line esti… ▽ More The horizon line is an important geometric feature for many image processing and scene understanding tasks in computer vision. For instance, in navigation of autonomous vehicles or driver assistance, it can be used to improve 3D reconstruction as well as for semantic interpretation of dynamic environments. While both algorithms and datasets exist for single images, the problem of horizon line estimation from video sequences has not gained attention. In this paper, we show how convolutional neural networks are able to utilise the temporal consistency imposed by video sequences in order to increase the accuracy and reduce the variance of horizon line estimates. A novel CNN architecture with an improved residual convolutional LSTM is presented for temporally consistent horizon line estimation. We propose an adaptive loss function that ensures stable training as well as accurate results. Furthermore, we introduce an extension of the KITTI dataset which contains precise horizon line labels for 43699 images across 72 video sequences. A comprehensive evaluation shows that the proposed approach consistently achieves superior performance compared with existing methods. △ Less

Submitted 9 January, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

arXiv:1904.13271 [pdf, other]

Non-Rigid Structure-From-Motion by Rank-One Basis Shapes

Authors: Sami S. Brandt, Hanno Ackermann

Abstract: In this paper, we show that the affine, non-rigid structure-from-motion problem can be solved by rank-one, thus degenerate, basis shapes. It is a natural reformulation of the classic low-rank method by Bregler et al., where it was assumed that the deformable 3D structure is generated by a linear combination of rigid basis shapes. The non-rigid shape will be decomposed into the mean shape and the d… ▽ More In this paper, we show that the affine, non-rigid structure-from-motion problem can be solved by rank-one, thus degenerate, basis shapes. It is a natural reformulation of the classic low-rank method by Bregler et al., where it was assumed that the deformable 3D structure is generated by a linear combination of rigid basis shapes. The non-rigid shape will be decomposed into the mean shape and the degenerate shapes, constructed from the right singular vectors of the low-rank decomposition. The right singular vectors are affinely back-projected into the 3D space, and the affine back-projections will also be solved as part of the factorisation. By construction, a direct interpretation for the right singular vectors of the low-rank decomposition will also follow: they can be seen as principal components, hence, the first variant of our method is referred to as Rank-1-PCA. The second variant, referred to as Rank-1-ICA, additionally estimates the orthogonal transform which maps the deformation modes into as statistically independent modes as possible. It has the advantage of pinpointing statistically dependent subspaces related to, for instance, lip movements on human faces. Moreover, in contrast to prior works, no predefined dimensionality for the subspaces is imposed. The experiments on several datasets show that the method achieves better results than the state-of-the-art, it can be computed faster, and it provides an intuitive interpretation for the deformation modes. △ Less

Submitted 30 April, 2019; originally announced April 2019.

arXiv:1811.09132 [pdf, other]

Uncalibrated Non-Rigid Factorisation by Independent Subspace Analysis

Authors: Sami Sebastian Brandt, Hanno Ackermann, Stella Grasshof

Abstract: We propose a general, prior-free approach for the uncalibrated non-rigid structure-from-motion problem for modelling and analysis of non-rigid objects such as human faces. The word general refers to an approach that recovers the non-rigid affine structure and motion from 2D point correspondences by assuming that (1) the non-rigid shapes are generated by a linear combination of rigid 3D basis shape… ▽ More We propose a general, prior-free approach for the uncalibrated non-rigid structure-from-motion problem for modelling and analysis of non-rigid objects such as human faces. The word general refers to an approach that recovers the non-rigid affine structure and motion from 2D point correspondences by assuming that (1) the non-rigid shapes are generated by a linear combination of rigid 3D basis shapes, (2) that the non-rigid shapes are affine in nature, i.e., they can be modelled as deviations from the mean, rigid shape, (3) and that the basis shapes are statistically independent. In contrast to the majority of existing works, no prior information is assumed for the structure and motion apart from the assumption the that underlying basis shapes are statistically independent. The independent 3D shape bases are recovered by independent subspace analysis (ISA). Likewise, in contrast to the most previous approaches, no calibration information is assumed for affine cameras; the reconstruction is solved up to a global affine ambiguity that makes our approach simple but efficient. In the experiments, we evaluated the method with several standard data sets including a real face expression data set of 7200 faces with 2D point correspondences and unknown 3D structure and motion for which we obtained promising results. △ Less

Submitted 22 November, 2018; originally announced November 2018.

arXiv:1709.05910 [pdf, other]

Object Recognition from very few Training Examples for Enhancing Bicycle Maps

Authors: Christoph Reinders, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn

Abstract: In recent years, data-driven methods have shown great success for extracting information about the infrastructure in urban areas. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. While large datasets have been published regarding cars, for cyclists very few labeled data is available although appearance, point of view, and posi… ▽ More In recent years, data-driven methods have shown great success for extracting information about the infrastructure in urban areas. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. While large datasets have been published regarding cars, for cyclists very few labeled data is available although appearance, point of view, and positioning of even relevant objects differ. Unfortunately, labeling data is costly and requires a huge amount of work. In this paper, we thus address the problem of learning with very few labels. The aim is to recognize particular traffic signs in crowdsourced data to collect information which is of interest to cyclists. We propose a system for object recognition that is trained with only 15 examples per class on average. To achieve this, we combine the advantages of convolutional neural networks and random forests to learn a patch-wise classifier. In the next step, we map the random forest to a neural network and transform the classifier to a fully convolutional network. Thereby, the processing of full images is significantly accelerated and bounding boxes can be predicted. Finally, we integrate data of the Global Positioning System (GPS) to localize the predictions on the map. In comparison to Faster R-CNN and other networks for object recognition or algorithms for transfer learning, we considerably reduce the required amount of labeled data. We demonstrate good performance on the recognition of traffic signs for cyclists as well as their localization in maps. △ Less

Submitted 28 May, 2018; v1 submitted 18 September, 2017; originally announced September 2017.

Comments: Submitted to IV 2018. This research was supported by German Research Foundation DFG within Priority Research Programme 1894 "Volunteered Geographic Information: Interpretation, Visualization and Social Computing"

arXiv:1707.02427 [pdf, other]

Deep Learning for Vanishing Point Detection Using an Inverse Gnomonic Projection

Authors: Florian Kluger, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn

Abstract: We present a novel approach for vanishing point detection from uncalibrated monocular images. In contrast to state-of-the-art, we make no a priori assumptions about the observed scene. Our method is based on a convolutional neural network (CNN) which does not use natural images, but a Gaussian sphere representation arising from an inverse gnomonic projection of lines detected in an image. This all… ▽ More We present a novel approach for vanishing point detection from uncalibrated monocular images. In contrast to state-of-the-art, we make no a priori assumptions about the observed scene. Our method is based on a convolutional neural network (CNN) which does not use natural images, but a Gaussian sphere representation arising from an inverse gnomonic projection of lines detected in an image. This allows us to rely on synthetic data for training, eliminating the need for labelled images. Our method achieves competitive performance on three horizon estimation benchmark datasets. We further highlight some additional use cases for which our vanishing point detection algorithm can be used. △ Less

Submitted 16 November, 2017; v1 submitted 8 July, 2017; originally announced July 2017.

Comments: Accepted for publication at German Conference on Pattern Recognition (GCPR) 2017. This research was supported by German Research Foundation DFG within Priority Research Programme 1894 "Volunteered Geographic Information: Interpretation, Visualisation and Social Computing"

arXiv:1702.00186 [pdf, other]

A Kinematic Chain Space for Monocular Motion Capture

Authors: Bastian Wandt, Hanno Ackermann, Bodo Rosenhahn

Abstract: This paper deals with motion capture of kinematic chains (e.g. human skeletons) from monocular image sequences taken by uncalibrated cameras. We present a method based on projecting an observation into a kinematic chain space (KCS). An optimization of the nuclear norm is proposed that implicitly enforces structural properties of the kinematic chain. Unlike other approaches our method does not requ… ▽ More This paper deals with motion capture of kinematic chains (e.g. human skeletons) from monocular image sequences taken by uncalibrated cameras. We present a method based on projecting an observation into a kinematic chain space (KCS). An optimization of the nuclear norm is proposed that implicitly enforces structural properties of the kinematic chain. Unlike other approaches our method does not require specific camera or object motion and is not relying on training data or previously determined constraints such as particular body lengths. The proposed algorithm is able to reconstruct scenes with limited camera motion and previously unseen motions. It is not only applicable to human skeletons but also to other kinematic chains for instance animals or industrial robots. We achieve state-of-the-art results on different benchmark data bases and real world scenes. △ Less

Submitted 1 February, 2017; originally announced February 2017.

arXiv:1701.08285 [pdf, ps, other]

doi 10.1145/2806416.2806582

Who With Whom And How?: Extracting Large Social Networks Using Search Engines

Authors: Stefan Siersdorfer, Philipp Kemkes, Hanno Ackermann, Sergej Zerr

Abstract: Social network analysis is leveraged in a variety of applications such as identifying influential entities, detecting communities with special interests, and determining the flow of information and innovations. However, existing approaches for extracting social networks from unstructured Web content do not scale well and are only feasible for small graphs. In this paper, we introduce novel methodo… ▽ More Social network analysis is leveraged in a variety of applications such as identifying influential entities, detecting communities with special interests, and determining the flow of information and innovations. However, existing approaches for extracting social networks from unstructured Web content do not scale well and are only feasible for small graphs. In this paper, we introduce novel methodologies for query-based search engine mining, enabling efficient extraction of social networks from large amounts of Web data. To this end, we use patterns in phrase queries for retrieving entity connections, and employ a bootstrapping approach for iteratively expanding the pattern set. Our experimental evaluation in different domains demonstrates that our algorithms provide high quality results and allow for scalable and efficient construction of social graphs. △ Less

Submitted 28 January, 2017; originally announced January 2017.

Journal ref: CIKM 2015 Proceedings of the 24th ACM International on Conference on Information and Knowledge Management Pages 1491-1500

arXiv:1701.06944 [pdf, other]

Motion Segmentation via Global and Local Sparse Subspace Optimization

Authors: Michael Ying Yang, Hanno Ackermann, Weiyao Lin, Sitong Feng, Bodo Rosenhahn

Abstract: In this paper, we propose a new framework for segmenting feature-based moving objects under affine subspace model. Since the feature trajectories in practice are high-dimensional and contain a lot of noise, we firstly apply the sparse PCA to represent the original trajectories with a low-dimensional global subspace, which consists of the orthogonal sparse principal vectors. Subsequently, the local… ▽ More In this paper, we propose a new framework for segmenting feature-based moving objects under affine subspace model. Since the feature trajectories in practice are high-dimensional and contain a lot of noise, we firstly apply the sparse PCA to represent the original trajectories with a low-dimensional global subspace, which consists of the orthogonal sparse principal vectors. Subsequently, the local subspace separation will be achieved via automatically searching the sparse representation of the nearest neighbors for each projected data. In order to refine the local subspace estimation result and deal with the missing data problem, we propose an error estimation to encourage the projected data that span a same local subspace to be clustered together. In the end, the segmentation of different motions is achieved through the spectral clustering on an affinity matrix, which is constructed with both the error estimation and sparse neighbors optimization. We test our method extensively and compare it with state-of-the-art methods on the Hopkins 155 dataset and Freiburg-Berkeley Motion Segmentation dataset. The results show that our method is comparable with the other motion segmentation methods, and in many cases exceed them in terms of precision and computation time. △ Less

Submitted 24 January, 2017; originally announced January 2017.

Comments: 11 pages

arXiv:1609.05834 [pdf, other]

doi 10.1016/j.isprsjprs.2017.07.010

On Support Relations and Semantic Scene Graphs

Authors: Michael Ying Yang, Wentong Liao, Hanno Ackermann, Bodo Rosenhahn

Abstract: Scene understanding is a popular and challenging topic in both computer vision and photogrammetry. Scene graph provides rich information for such scene understanding. This paper presents a novel approach to infer such relations and then to construct the scene graph. Support relations are estimated by considering important, previously ignored information: the physical stability and the prior suppor… ▽ More Scene understanding is a popular and challenging topic in both computer vision and photogrammetry. Scene graph provides rich information for such scene understanding. This paper presents a novel approach to infer such relations and then to construct the scene graph. Support relations are estimated by considering important, previously ignored information: the physical stability and the prior support knowledge between object classes. In contrast to previous methods for extracting support relations, the proposed approach generates more accurate results, and does not require a pixel-wise semantic labeling of the scene. The semantic scene graph which describes all the contextual relations within the scene is constructed using this information. To evaluate the accuracy of these graphs, multiple different measures are formulated. The proposed algorithms are evaluated using the NYUv2 database. The results demonstrate that the inferred support relations are more precise than state-of-the-art. The scene graphs are compared against ground truth graphs. △ Less

Submitted 16 November, 2017; v1 submitted 19 September, 2016; originally announced September 2016.

Comments: Accepted in ISPRS Journal of Photogrammetry and Remote Sensing

arXiv:0808.2081 [pdf, ps, other]

Concurrent Imitation Dynamics in Congestion Games

Authors: Heiner Ackermann, Petra Berenbrink, Simon Fischer, Martin Hoefer

Abstract: Imitating successful behavior is a natural and frequently applied approach to trust in when facing scenarios for which we have little or no experience upon which we can base our decision. In this paper, we consider such behavior in atomic congestion games. We propose to study concurrent imitation dynamics that emerge when each player samples another player and possibly imitates this agents' stra… ▽ More Imitating successful behavior is a natural and frequently applied approach to trust in when facing scenarios for which we have little or no experience upon which we can base our decision. In this paper, we consider such behavior in atomic congestion games. We propose to study concurrent imitation dynamics that emerge when each player samples another player and possibly imitates this agents' strategy if the anticipated latency gain is sufficiently large. Our main focus is on convergence properties. Using a potential function argument, we show that our dynamics converge in a monotonic fashion to stable states. In such a state none of the players can improve its latency by imitating somebody else. As our main result, we show rapid convergence to approximate equilibria. At an approximate equilibrium only a small fraction of agents sustains a latency significantly above or below average. In particular, imitation dynamics behave like fully polynomial time approximation schemes (FPTAS). Fixing all other parameters, the convergence time depends only in a logarithmic fashion on the number of agents. Since imitation processes are not innovative they cannot discover unused strategies. Furthermore, strategies may become extinct with non-zero probability. For the case of singleton games, we show that the probability of this event occurring is negligible. Additionally, we prove that the social cost of a stable state reached by our dynamics is not much worse than an optimal state in singleton congestion games with linear latency function. Finally, we discuss how the protocol can be extended such that, in the long run, dynamics converge to a Nash equilibrium. △ Less

Submitted 3 October, 2008; v1 submitted 14 August, 2008; originally announced August 2008.

Comments: 28 pages, 1 figure

ACM Class: F.2.2; G.2; G.3

arXiv:0805.1130 [pdf, ps, other]

On the Convergence Time of the Best Response Dynamics in Player-specific Congestion Games

Authors: Heiner Ackermann, Heiko Roeglin

Abstract: We study the convergence time of the best response dynamics in player-specific singleton congestion games. It is well known that this dynamics can cycle, although from every state a short sequence of best responses to a Nash equilibrium exists. Thus, the random best response dynamics, which selects the next player to play a best response uniformly at random, terminates in a Nash equilibrium with… ▽ More We study the convergence time of the best response dynamics in player-specific singleton congestion games. It is well known that this dynamics can cycle, although from every state a short sequence of best responses to a Nash equilibrium exists. Thus, the random best response dynamics, which selects the next player to play a best response uniformly at random, terminates in a Nash equilibrium with probability one. In this paper, we are interested in the expected number of best responses until the random best response dynamics terminates. As a first step towards this goal, we consider games in which each player can choose between only two resources. These games have a natural representation as (multi-)graphs by identifying nodes with resources and edges with players. For the class of games that can be represented as trees, we show that the best-response dynamics cannot cycle and that it terminates after O(n^2) steps where n denotes the number of resources. For the class of games represented as cycles, we show that the best response dynamics can cycle. However, we also show that the random best response dynamics terminates after O(n^2) steps in expectation. Additionally, we conjecture that in general player-specific singleton congestion games there exists no polynomial upper bound on the expected number of steps until the random best response dynamics terminates. We support our conjecture by presenting a family of games for which simulations indicate a super-polynomial convergence time. △ Less

Submitted 8 May, 2008; originally announced May 2008.

Showing 1–22 of 22 results for author: Ackermann, H