-
A Comparative Study on Self-Organization in Wireless Sensor Networks
Authors:
Michael Simon,
Salwa M. Din,
Raja Jamal Chib
Abstract:
With advancements in microelectromechanical systems, low-power integrated circuits, and wireless communications, wireless sensor networks (WSNs) have become increasingly significant [1][2]. These distributed networks enable efficient resource utilization and open doors to numerous applications, including personal healthcare, home automation, environmental monitoring, industrial automation, and defense surveillance. However, WSNs are susceptible to environmental factors in their deployment areas and may suffer damage. In such cases, the network must be reconfigured or repaired. To address these challenges and adapt to resource constraints, WSN mechanisms must exhibit self-organizing capabilities. For instance, in tasks like allocation, cooperative communication, and dynamic data collection, self-organization enhances the efficiency and robustness of WSNs across the application, network, and physical layers.
Submitted 4 January, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.
-
The Impact of Knowledge Silos on Responsible AI Practices in Journalism
Authors:
Tomás Dodds,
Astrid Vandendaele,
Felix M. Simon,
Natali Helberger,
Valeria Resendez,
Wang Ngai Yeung
Abstract:
The effective adoption of responsible AI practices in journalism requires a concerted effort to bridge different perspectives, including technological, editorial, journalistic, and managerial. Among the many challenges that could impact information sharing around responsible AI inside news organizations are knowledge silos, where information is isolated within one part of the organization and not easily shared with others. This study aims to explore if, and if so, how, knowledge silos affect the adoption of responsible AI practices in journalism through a cross-case study of four major Dutch media outlets. We examine the individual and organizational barriers to AI knowledge sharing and the extent to which knowledge silos could impede the operationalization of responsible AI initiatives inside newsrooms. To address this question, we conducted 14 semi-structured interviews with editors, managers, and journalists at de Telegraaf, de Volkskrant, the Nederlandse Omroep Stichting (NOS), and RTL Nederland. The interviews aimed to uncover insights into the existence of knowledge silos, their effects on responsible AI practice adoption, and the organizational practices influencing these dynamics. Our results emphasize the importance of creating better structures for sharing information on AI across all layers of news organizations.
Submitted 23 October, 2024; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Bi-objective trail-planning for a robot team orienteering in a hazardous environment
Authors:
Cory M. Simon,
Jeffrey Richley,
Lucas Overbey,
Darleen Perez-Lavin
Abstract:
Teams of mobile [aerial, ground, or aquatic] robots have applications in resource delivery, patrolling, information-gathering, agriculture, forest fire fighting, chemical plume source localization and mapping, and search-and-rescue. Robot teams traversing hazardous environments -- with e.g. rough terrain or seas, strong winds, or adversaries capable of attacking or capturing robots -- should plan and coordinate their trails in consideration of risks of disablement, destruction, or capture. Specifically, the robots should take the safest trails, coordinate their trails to cooperatively achieve the team-level objective with robustness to robot failures, and balance the reward from visiting locations against risks of robot losses. Herein, we consider bi-objective trail-planning for a mobile team of robots orienteering in a hazardous environment. The hazardous environment is abstracted as a directed graph whose arcs, when traversed by a robot, present known probabilities of survival. Each node of the graph offers a reward to the team if visited by a robot (which e.g. delivers a good to or images the node). We wish to search for the Pareto-optimal robot-team trail plans that maximize two [conflicting] team objectives: the expected (i) team reward and (ii) number of robots that survive the mission. A human decision-maker can then select trail plans that balance, according to their values, reward and robot survival. We implement ant colony optimization, guided by heuristics, to search for the Pareto-optimal set of robot team trail plans. As a case study, we illustrate with an information-gathering mission in an art museum.
Submitted 18 September, 2024;
originally announced September 2024.
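As a rough illustration of the bi-objective idea in the abstract above, the sketch below filters a set of candidate trail plans down to the Pareto-optimal ones, where both objectives (expected team reward and expected number of surviving robots) are maximized. It also shows how a trail's survival probability follows from multiplying per-arc survival probabilities. The function names and the numbers are hypothetical and are not taken from the paper; the paper's ant colony optimization search is not reproduced here.

```python
import math
from typing import List, Tuple


def trail_survival(arc_survival_probs: List[float]) -> float:
    """Probability a robot survives a whole trail, assuming independent arcs."""
    return math.prod(arc_survival_probs)


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the points not dominated by any other point.

    Each point is (expected_reward, expected_survivors); both are maximized.
    A point q dominates p if q is at least as good in both objectives and
    strictly better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical candidate plans: (expected reward, expected surviving robots).
plans = [(10.0, 1.2), (8.0, 2.5), (7.0, 2.0), (4.0, 2.9), (9.0, 1.0)]
front = pareto_front(plans)

# Survival probability of a hypothetical three-arc trail.
p_survive = trail_survival([0.5, 0.5, 0.5])
```

A decision-maker would then pick one plan from `front` according to how they weigh reward against robot losses.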
-
"It Might be Technically Impressive, But It's Practically Useless to us": Motivations, Practices, Challenges, and Opportunities for Cross-Functional Collaboration around AI within the News Industry
Authors:
Qing Xiao,
Xianzhe Fan,
Felix M. Simon,
Bingbing Zhang,
Motahhare Eslami
Abstract:
Recently, an increasing number of news organizations have integrated artificial intelligence (AI) into their workflows, leading to a further influx of AI technologists and data workers into the news industry. This has initiated cross-functional collaborations between these professionals and journalists. Although prior research has explored the impact of AI-related roles entering the news industry, there is a lack of studies on how internal cross-functional collaboration around AI unfolds between AI professionals and journalists within the news industry. Through interviews with 17 journalists, six AI technologists, and three AI workers with cross-functional experience from leading Chinese news organizations, we investigate the practices, challenges, and opportunities for internal cross-functional collaboration around AI in the news industry. We first study how these journalists and AI professionals perceive existing internal cross-collaboration strategies. We explore the challenges of cross-functional collaboration and provide recommendations for enhancing future cross-functional collaboration around AI in the news industry.
Submitted 12 February, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
Authors:
Mathieu Cyrille Simon,
Pascal Frossard,
Christophe De Vleeschouwer
Abstract:
This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos. We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static/dynamic variables and that improves the model expressivity through additional Normalizing Flows. A formal definition of the factors is proposed. This formalism leads to the derivation of sufficient conditions for the ground truth factors to be identifiable, and to the introduction of a novel theoretically grounded disentanglement constraint that can be directly and efficiently incorporated into our new framework. The experiments show that the proposed approach outperforms previous complex state-of-the-art techniques in scenarios where the dynamics of a scene are influenced by its content.
Submitted 10 August, 2024;
originally announced August 2024.
-
SimGen: Simulator-conditioned Driving Scene Generation
Authors:
Yunsong Zhou,
Michael Simon,
Zhenghao Peng,
Sicheng Mo,
Hongzi Zhu,
Minyi Guo,
Bolei Zhou
Abstract:
Controllable synthetic data generation can substantially lower the annotation cost of training data. Prior works use diffusion models to generate driving images conditioned on the 3D object layout. However, those models are trained on small-scale datasets like nuScenes, which lack appearance and layout diversity. Moreover, overfitting often happens, where the trained models can only generate images based on the layout data from the validation set of the same dataset. In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world. It uses a novel cascade diffusion pipeline to address challenging sim-to-real gaps and multi-condition conflicts. A driving video dataset DIVA is collected to enhance the generative diversity of SimGen, which contains over 147.5 hours of real-world driving videos from 73 locations worldwide and simulated driving data from the MetaDrive simulator. SimGen achieves superior generation quality and diversity while preserving controllability based on the text prompt and the layout pulled from a simulator. We further demonstrate the improvements brought by SimGen for synthetic data augmentation on the BEV detection and segmentation task and showcase its capability in safety-critical data generation.
Submitted 7 December, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Constraint Model for the Satellite Image Mosaic Selection Problem
Authors:
Manuel Combarro Simón,
Pierre Talbot,
Grégoire Danoy,
Jedrzej Musial,
Mohammed Alswaitti,
Pascal Bouvry
Abstract:
Satellite imagery solutions are widely used to study and monitor different regions of the Earth. However, a single satellite image can cover only a limited area. In cases where a larger area of interest is studied, several images must be stitched together to create a single larger image, called a mosaic, that can cover the area. Today, with the increasing number of satellite images available for commercial use, selecting the images to build the mosaic is challenging, especially when the user wants to optimize one or more parameters, such as the total cost and the cloud coverage percentage in the mosaic. More precisely, for this problem the input is an area of interest, several satellite images intersecting the area, a list of requirements relative to the image and the mosaic, such as cloud coverage percentage and image resolution, and a list of objectives to optimize. We contribute constraint and mixed integer linear programming formulations of this new problem, which we call the satellite image mosaic selection problem and which is a multi-objective extension of the polygon cover problem. We propose a dataset of realistic and challenging instances, where the images were captured by the satellite constellations SPOT, Pléiades and Pléiades Neo. We evaluate and compare the two proposed models and show their efficiency for large instances, up to 200 images.
Submitted 7 December, 2023;
originally announced December 2023.
-
A tutorial on the Bayesian statistical approach to inverse problems
Authors:
Faaiq G. Waqar,
Swati Patel,
Cory M. Simon
Abstract:
Inverse problems are ubiquitous in the sciences and engineering. Two categories of inverse problems concerning a physical system are (1) estimate parameters in a model of the system from observed input-output pairs and (2) given a model of the system, reconstruct the input to it that caused some observed output. Applied inverse problems are challenging because a solution may (i) not exist, (ii) not be unique, or (iii) be sensitive to measurement noise contaminating the data.
Bayesian statistical inversion (BSI) is an approach to tackle ill-posed and/or ill-conditioned inverse problems. Advantageously, BSI provides a "solution" that (i) quantifies uncertainty by assigning a probability to each possible value of the unknown parameter/input and (ii) incorporates prior information and beliefs about the parameter/input.
Herein, we provide a tutorial of BSI for inverse problems, by way of illustrative examples dealing with heat transfer from ambient air to a cold lime fruit. First, we use BSI to infer a parameter in a dynamic model of the lime temperature from measurements of the lime temperature over time. Second, we use BSI to reconstruct the initial condition of the lime from a measurement of its temperature later in time. We demonstrate the incorporation of prior information, visualize the posterior distributions of the parameter/initial condition, and show posterior samples of lime temperature trajectories from the model. Our tutorial aims to reach a wide range of scientists and engineers.
Submitted 15 April, 2023;
originally announced April 2023.
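To make the first task in the abstract above concrete (inferring a model parameter from temperature measurements), here is a minimal sketch of Bayesian inversion on a parameter grid. Everything specific is an assumption for illustration and not taken from the tutorial: the first-order warming model `lime_temp`, the time constant `lam`, the temperatures, the noise level `sigma`, the flat prior, and the noise-free synthetic data.

```python
import math


def lime_temp(t, lam, T0=6.0, T_air=20.0):
    """Assumed first-order model: the lime warms from T0 toward T_air.

    t and lam share a time unit (e.g. hours); temperatures are in deg C.
    """
    return T_air + (T0 - T_air) * math.exp(-t / lam)


def grid_posterior(times, temps, lam_grid, sigma=0.5):
    """Posterior over lam on a grid: Gaussian likelihood, flat prior."""
    log_post = []
    for lam in lam_grid:
        sse = sum((y - lime_temp(t, lam)) ** 2 for t, y in zip(times, temps))
        log_post.append(-0.5 * sse / sigma**2)  # log-likelihood + constant
    m = max(log_post)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    return [w / z for w in weights]  # normalized to sum to 1


# Synthetic, noise-free "measurements" generated with a known time constant.
true_lam = 1.5
times = [0.5, 1.0, 2.0, 4.0]
temps = [lime_temp(t, true_lam) for t in times]

lam_grid = [0.5 + k / 10 for k in range(31)]  # candidates from 0.5 to 3.5
post = grid_posterior(times, temps, lam_grid)
lam_map = lam_grid[post.index(max(post))]  # posterior mode
```

The list `post` assigns a probability to each candidate value of the parameter, which is the uncertainty quantification the abstract refers to. An informative prior would enter as an extra term added to each entry of `log_post`.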
-
Space Trusted Autonomy Readiness Levels
Authors:
Kerianne L. Hobbs,
Joseph B. Lyons,
Martin S. Feather,
Benjamen P Bycroft,
Sean Phillips,
Michelle Simon,
Mark Harter,
Kenneth Costello,
Yuri Gawdiak,
Stephen Paine
Abstract:
Technology Readiness Levels are a mainstay for organizations that fund, develop, test, acquire, or use technologies. Technology Readiness Levels provide a standardized assessment of a technology's maturity and enable consistent comparison among technologies. They inform decisions throughout a technology's development life cycle, from concept, through development, to use. A variety of alternative Readiness Levels have been developed, including Algorithm Readiness Levels, Manufacturing Readiness Levels, Human Readiness Levels, Commercialization Readiness Levels, Machine Learning Readiness Levels, and Technology Commitment Levels. However, while Technology Readiness Levels have been increasingly applied to emerging disciplines, there are unique challenges to assessing the rapidly developing capabilities of autonomy. This paper adopts the moniker of Space Trusted Autonomy Readiness Levels to identify a two-dimensional scale of readiness and trust appropriate for the special challenges of assessing autonomy technologies that seek space use. It draws inspiration from other readiness levels' definitions, and from the rich field of trust and trustworthiness. The Space Trusted Autonomy Readiness Levels were developed by a collaborative Space Trusted Autonomy subgroup, which was created from The Space Science and Technology Partnership Forum between the United States Space Force, the National Aeronautics and Space Administration, and the National Reconnaissance Office.
Submitted 24 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
LiMoSeg: Real-time Bird's Eye View based LiDAR Motion Segmentation
Authors:
Sambit Mohapatra,
Mona Hodaei,
Senthil Yogamani,
Stefan Milz,
Heinrich Gotzig,
Martin Simon,
Hazem Rashed,
Patrick Maeder
Abstract:
Moving object detection and segmentation is an essential task in the Autonomous Driving pipeline. Detecting and isolating static and moving components of a vehicle's surroundings are particularly crucial in path planning and localization tasks. This paper proposes a novel real-time architecture for motion segmentation of Light Detection and Ranging (LiDAR) data. We use three successive scans of LiDAR data in 2D Bird's Eye View (BEV) representation to perform pixel-wise classification as static or moving. Furthermore, we propose a novel data augmentation technique to reduce the significant class imbalance between static and moving objects. We achieve this by artificially synthesizing moving objects by cutting and pasting static vehicles. We demonstrate a low latency of 8 ms on a commonly used automotive embedded platform, namely Nvidia Jetson Xavier. To the best of our knowledge, this is the first work directly performing motion segmentation in LiDAR BEV space. We provide quantitative results on the challenging SemanticKITTI dataset, and qualitative results are provided in https://youtu.be/2aJ-cL8b0LI.
Submitted 22 January, 2022; v1 submitted 8 November, 2021;
originally announced November 2021.
-
Trustworthy Artificial Intelligence and Process Mining: Challenges and Opportunities
Authors:
Andrew Pery,
Majid Rafiei,
Michael Simon,
Wil M. P. van der Aalst
Abstract:
The premise of this paper is that compliance with Trustworthy AI governance best practices and regulatory frameworks is an inherently fragmented process spanning across diverse organizational units, external stakeholders, and systems of record, resulting in process uncertainties and in compliance gaps that may expose organizations to reputational and regulatory risks. Moreover, there are complexities associated with meeting the specific dimensions of Trustworthy AI best practices such as data governance, conformance testing, quality assurance of AI model behaviors, transparency, accountability, and confidentiality requirements. These processes involve multiple steps, hand-offs, re-works, and human-in-the-loop oversight. In this paper, we demonstrate that process mining can provide a useful framework for gaining fact-based visibility to AI compliance process execution, surfacing compliance bottlenecks, and providing for an automated approach to analyze, remediate and monitor uncertainty in AI regulatory compliance processes.
Submitted 6 October, 2021;
originally announced October 2021.
-
Parsimonious Edge Computing to Reduce Microservice Resource Usage
Authors:
Mathieu Simon,
Alessandro Spallina,
Loic Dubocquet,
Andrea Araldo
Abstract:
Cloud Computing (CC) is the most prevalent paradigm under which services are provided over the Internet. The most relevant feature for its success is its capability to promptly scale service based on user demand. When scaling, the main objective is to maximize service performance as much as possible. Moreover, resources in the Cloud are usually so abundant that they can be assumed infinite from the service point of view: an application provider can have as many servers as it wants, as long as it pays for them. This model has some limitations. First, energy efficiency is not among the first criteria for scaling decisions, which has raised concerns about the environmental effects of today's wild computations in the Cloud. Moreover, it is not viable for Edge Computing (EC), a paradigm in which computational resources are distributed up to the very edge of the network, i.e., co-located with base stations or access points. In edge nodes, resources are limited, which requires different, parsimonious scaling strategies. In this work, we design a scaling strategy aimed to instantiate, parsimoniously, a number of microservices sufficient to guarantee a certain Quality of Service (QoS) target. We implement such a strategy in a Kubernetes/Docker environment. The strategy is based on a simple Proportional-Integral-Derivative (PID) controller. In this paper we describe the system design and a preliminary performance evaluation.
Submitted 6 September, 2021;
originally announced September 2021.
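The abstract above bases its scaling strategy on a PID controller. The following is a minimal sketch of how a discrete PID loop could turn a measured QoS value into a replica count. It is not the authors' implementation: the class name `PIDScaler`, the choice of latency as the QoS metric, the gains, and the rounding rule are all assumptions, and no Kubernetes API is called.

```python
from dataclasses import dataclass


@dataclass
class PIDScaler:
    """Hypothetical discrete PID controller that adjusts a replica count."""

    kp: float
    ki: float
    kd: float
    target: float  # assumed QoS target, e.g. latency in ms
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, measured: float, replicas: int, dt: float = 1.0) -> int:
        # Positive error means QoS is worse than the target (latency too high).
        error = measured - self.target
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        adjustment = (
            self.kp * error + self.ki * self.integral + self.kd * derivative
        )
        # Never scale below one replica.
        return max(1, replicas + round(adjustment))


# Proportional-only example: 8 ms over target adds 4 replicas.
scaler = PIDScaler(kp=0.5, ki=0.0, kd=0.0, target=100.0)
replicas = scaler.step(measured=108.0, replicas=2)
# 4 ms under target removes 2 replicas.
replicas_after = scaler.step(measured=96.0, replicas=replicas)
```

Because the controller removes replicas whenever the measurement is better than the target, it tends toward the smallest replica count that still meets the target, which matches the parsimonious goal stated in the abstract.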
-
Application and Benchmark of SPH for Modeling the Impact in Thermal Spraying
Authors:
Stefan Rhys Jeske,
Jan Bender,
Kirsten Bobzin,
Hendrik Heinemann,
Kevin Jasutyn,
Marek Simon,
Oleg Mokrov,
Rahul Sharma,
Uwe Reisgen
Abstract:
The properties of a thermally sprayed coating, such as its durability or thermal conductivity, depend on its microstructure, which is in turn directly related to the particle impact process. To simulate this process we present a 3D Smoothed Particle Hydrodynamics (SPH) model, which represents the molten droplet as an incompressible fluid, while a semi-implicit Enthalpy-Porosity method is applied for the mushy zone during solidification. In addition, we present an implicit correction for SPH simulations, based on well-known approaches, from which we can observe improved performance and simulation stability. We apply our SPH method to the impact and solidification of Al$_2$O$_3$ droplets onto a free slip substrate and perform a rigorous quantitative comparison of our method with the commercial software Ansys Fluent using the Volume of Fluid (VOF) approach, while taking identical physical effects into consideration. The results are evaluated in depth and we discuss the applicability of either method for the simulation of thermal spray deposition. We show that SPH is an excellent method for solving this free surface problem accurately and efficiently.
Submitted 1 September, 2021;
originally announced September 2021.
-
Quantitative Evaluation of SPH in TIG Spot Welding
Authors:
Stefan Rhys Jeske,
Marek Simon,
Oleksii Semenov,
Jan Kruska,
Oleg Mokrov,
Rahul Sharma,
Uwe Reisgen,
Jan Bender
Abstract:
While the application of the Smoothed Particle Hydrodynamics (SPH) method for the modeling of welding processes has become increasingly popular in recent years, little is yet known about the quantitative predictive capability of this method. We propose a novel SPH model for the simulation of the tungsten inert gas (TIG) spot welding process and conduct a thorough comparison between our SPH implementation and two Finite Element Method (FEM) based models. In order to be able to quantitatively compare the results of our SPH simulation method with grid based methods we additionally propose an improved particle to grid interpolation method based on linear least-squares with an optional hole-filling pass which accounts for missing particles. We show that SPH is able to yield excellent results, especially given the observed deviations between the investigated FEM methods and as such, we validate the accuracy of the method for an industrially relevant engineering application.
Submitted 1 September, 2021;
originally announced September 2021.
-
A Benchmark for Spray from Nearby Cutting Vehicles
Authors:
Stefanie Walz,
Mario Bijelic,
Florian Kraus,
Werner Ritter,
Martin Simon,
Igor Doric
Abstract:
Current driver assistance systems and autonomous driving stacks are limited to well-defined environment conditions and geofenced areas. To increase driving safety in adverse weather conditions, broadening the application spectrum of autonomous driving and driver assistance systems is necessary. In order to enable this development, reproducible benchmarking methods are required to quantify the expected distortions. In this publication, a testing methodology for disturbances from spray is presented. It introduces a novel lightweight and configurable spray setup alongside an evaluation scheme to assess the disturbances caused by spray. The analysis covers an automotive RGB camera and two different LiDAR systems, as well as downstream detection algorithms based on YOLOv3 and PV-RCNN. In a common scenario of a closely cutting vehicle, the distortions severely affect the perception stack for up to four seconds, showing the necessity of benchmarking the influences of spray.
Submitted 24 August, 2021;
originally announced August 2021.
-
Conference proceedings KI4Industry AI for SMEs -- The online congress for practical entry into AI for SMEs
Authors:
Michael Arnemann,
Per Olof Beckemeier,
Thomas Bertram,
Michael Eder,
Maximilian Erschig,
Matthias Feiner,
Francisco Javier Fernandez Garcia,
Frederic Foerster,
Ruediger Haas,
Martin Kipfmueller,
Jan Kotschenreuther,
Bernd Langer,
Ivan Lozada Rodriguez,
Thomas Meibert,
Simon Ottenhaus,
Stefan Paschek,
Lars Pfotzer,
Michael M. Roth,
Tim Schanz,
Philip Scherer,
Janine Schwienke,
Martin Simon,
Robin Tenscher-Philipp
Abstract:
The Institute of Materials and Processes, IMP, of the University of Applied Sciences in Karlsruhe, Germany in cooperation with VDI Verein Deutscher Ingenieure e.V, AEN Automotive Engineering Network and their cooperation partners present their competences of AI-based solution approaches in the production engineering field. The online congress KI 4 Industry on November 12 and 13, 2020, showed what opportunities the use of artificial intelligence offers for medium-sized manufacturing companies, SMEs, and where potential fields of application lie. The main purpose of KI 4 Industry is to increase the transfer of knowledge, research and technology from universities to small and medium-sized enterprises, to demystify the term AI and to encourage companies to use AI-based solutions in their own value chain or in their products.
Submitted 5 August, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
StickyPillars: Robust and Efficient Feature Matching on Point Clouds using Graph Neural Networks
Authors:
Kai Fischer,
Martin Simon,
Florian Oelsner,
Stefan Milz,
Horst-Michael Gross,
Patrick Maeder
Abstract:
Robust point cloud registration in real-time is an important prerequisite for many mapping and localization algorithms. Traditional methods like ICP tend to fail without good initialization, with insufficient overlap, or in the presence of dynamic objects. Modern deep learning based registration approaches present much better results, but suffer from a heavy run-time. We overcome these drawbacks by introducing StickyPillars, a fast, accurate and extremely robust deep middle-end 3D feature matching method on point clouds. It uses graph neural networks and performs context aggregation on sparse 3D key-points with the aid of transformer based multi-head self and cross-attention. The network output is used as the cost for an optimal transport problem whose solution yields the final matching probabilities. The system does not rely on hand-crafted feature descriptors or heuristic matching strategies. We present state-of-the-art accuracy results on the registration problem demonstrated on the KITTI dataset while being four times faster than leading deep methods. Furthermore, we integrate our matching system into a LiDAR odometry pipeline yielding most accurate results on the KITTI odometry dataset. Finally, we demonstrate robustness on KITTI odometry. Our method remains stable in accuracy where state-of-the-art procedures fail on frame drops and higher speeds.
Submitted 19 February, 2021; v1 submitted 10 February, 2020;
originally announced February 2020.
-
Video-based Bottleneck Detection utilizing Lagrangian Dynamics in Crowded Scenes
Authors:
Maik Simon,
Markus Küchhold,
Tobias Senst,
Erik Bochinski,
Thomas Sikora
Abstract:
Avoiding bottleneck situations in crowds is critical for the safety and comfort of people at large events or in public transportation. Based on the work of Lagrangian motion analysis, we propose a novel video-based bottleneck detector that identifies characteristic stowage patterns in crowd movements captured by optical flow fields. The Lagrangian framework allows complex time-dependent crowd-motion dynamics to be assessed at large temporal scales near the bottleneck by two-dimensional Lagrangian fields. In particular, we propose long-term temporally filtered Finite Time Lyapunov Exponent (FTLE) fields that provide a more global segmentation of the crowd movements and allow capturing their deformations when a crowd is passing a bottleneck. Finally, these deformations are used for an automatic spatio-temporal detection of such situations. The performance of the proposed approach is shown in extensive evaluations on the existing Jülich and AGORASET datasets, which we have updated with ground truth data for spatio-temporal bottleneck analysis.
Submitted 21 August, 2019;
originally announced August 2019.
-
WoodScape: A multi-task, multi-camera fisheye dataset for autonomous driving
Authors:
Senthil Yogamani,
Ciaran Hughes,
Jonathan Horgan,
Ganesh Sistu,
Padraig Varley,
Derek O'Dea,
Michal Uricar,
Stefan Milz,
Martin Simon,
Karl Amende,
Christian Witt,
Hazem Rashed,
Sumanth Chennupati,
Sanjaya Nayak,
Saquib Mansoor,
Xavier Perroton,
Patrick Perez
Abstract:
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality and in particular automotive applications. In spite of their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. We release the first extensive fisheye automotive dataset, WoodScape, named after Robert Wood who invented the fisheye camera in 1906. WoodScape comprises four surround view cameras and nine tasks including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images and annotations for other tasks are provided for over 100,000 images. With WoodScape, we would like to encourage the community to adapt computer vision models for fisheye cameras instead of using naive rectification.
Submitted 2 July, 2021; v1 submitted 4 May, 2019;
originally announced May 2019.
-
Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
Authors:
Martin Simon,
Karl Amende,
Andrea Kraus,
Jens Honer,
Timo Sämann,
Hauke Kaulbersch,
Stefan Milz,
Horst Michael Gross
Abstract:
Accurate detection of 3D objects is a fundamental problem in computer vision and has an enormous impact on autonomous cars, augmented/virtual reality and many applications in robotics. In this work we present a novel fusion of a neural network based state-of-the-art 3D detector and visual semantic segmentation in the context of autonomous driving. Additionally, we introduce the Scale-Rotation-Translation score (SRTs), a fast and highly parameterizable evaluation metric for comparison of object detections, which speeds up our inference time by up to 20% and halves training time. In addition, we apply state-of-the-art online multi-target feature tracking on the object measurements to further increase accuracy and robustness utilizing temporal information. Our experiments on KITTI show that we achieve the same results as the state of the art in all related categories, while maintaining the performance and accuracy trade-off and still running in real-time. Furthermore, our model is the first one that fuses visual semantics with 3D object detection.
Submitted 16 April, 2019;
originally announced April 2019.
-
Points2Pix: 3D Point-Cloud to Image Translation using conditional Generative Adversarial Networks
Authors:
Stefan Milz,
Martin Simon,
Kai Fischer,
Maximillian Pöpperl
Abstract:
We present the first approach for 3D point-cloud to image translation based on conditional Generative Adversarial Networks (cGAN). The model handles multi-modal information sources from different domains, i.e. raw point-sets and images. The generator is capable of processing three conditions, where the point-cloud is encoded as a raw point-set and a camera projection. An image background patch is used as a constraint to bias environmental texturing. A global approximation function within the generator is directly applied on the point-cloud (Point-Net). Hence, the representative learning model incorporates global 3D characteristics directly at the latent feature space. Conditions are used to bias the background and the viewpoint of the generated image. This opens up new ways of augmenting or texturing 3D data, aiming at the generation of fully individual images. We successfully evaluated our method on the KITTI and SunRGBD datasets with an outstanding object detection inception score.
Submitted 16 September, 2019; v1 submitted 26 January, 2019;
originally announced January 2019.
-
Efficient Semantic Segmentation for Visual Bird's-eye View Interpretation
Authors:
Timo Sämann,
Karl Amende,
Stefan Milz,
Christian Witt,
Martin Simon,
Johannes Petzold
Abstract:
The ability to perform semantic segmentation in real-time capable applications with limited hardware is of great importance. One such application is the interpretation of the visual bird's-eye view, which requires the semantic segmentation of the four omnidirectional camera images. In this paper, we present an efficient semantic segmentation that sets new standards in terms of runtime and hardware requirements. Our two main contributions are the decrease of the runtime by parallelizing the ArgMax layer and the reduction of hardware requirements by applying the channel pruning method to the ENet model.
Submitted 29 November, 2018;
originally announced November 2018.
-
Rademacher Generalization Bounds for Classifier Chains
Authors:
Moura Simon,
Amini Massih-Reza,
Louhichi Sana,
Clausel Marianne
Abstract:
In this paper, we propose a new framework to study the generalization property of classifier chains trained over observations associated with multiple and interdependent class labels. The results are based on large deviation inequalities for Lipschitz functions of weakly dependent sequences proposed by Rio in 2000. We believe that the resulting generalization error bound brings many advantages and could be adapted to other frameworks that consider interdependent outputs. First, it explicitly exhibits the dependencies between class labels. Secondly, it provides insights into the effect of the order of the chain on the generalization performance of the algorithm. Finally, the two dependency coefficients that appear in the bound could also be used to design new strategies to decide the order of the chain.
Submitted 26 July, 2018;
originally announced July 2018.
-
Complex-YOLO: Real-time 3D Object Detection on Point Clouds
Authors:
Martin Simon,
Stefan Milz,
Karl Amende,
Horst-Michael Gross
Abstract:
Lidar based 3D object detection is indispensable for autonomous driving, because it directly links to environmental understanding and therefore builds the base for prediction and motion planning. The capacity to perform inference on highly sparse 3D data in real-time is an ill-posed problem for lots of other application areas besides automated vehicles, e.g. augmented reality, personal robotics or industrial automation. We introduce Complex-YOLO, a state-of-the-art real-time 3D object detection network on point clouds only. In this work, we describe a network that expands YOLOv2, a fast 2D standard object detector for RGB images, by a specific complex regression strategy to estimate multi-class 3D boxes in Cartesian space. Thus, we propose a specific Euler-Region-Proposal Network (E-RPN) to estimate the pose of the object by adding an imaginary and a real fraction to the regression network. This results in a closed complex space and avoids singularities, which occur with single angle estimations. The E-RPN helps the network to generalize well during training. Our experiments on the KITTI benchmark suite show that we outperform current leading methods for 3D object detection specifically in terms of efficiency. We achieve state-of-the-art results for cars, pedestrians and cyclists while being more than five times faster than the fastest competitor. Further, our model is capable of estimating all eight KITTI classes, including vans, trucks or sitting pedestrians, simultaneously with high accuracy.
Submitted 24 September, 2018; v1 submitted 16 March, 2018;
originally announced March 2018.
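The complex angle regression mentioned in the Complex-YOLO abstract above can be illustrated with a minimal sketch. It assumes the heading angle is represented by a real and an imaginary part and recovered with `atan2`; the function names `encode_yaw`, `decode_yaw` and `angular_error` are hypothetical and are not taken from the paper or its code.

```python
import math

def encode_yaw(yaw):
    """Represent a heading angle as the (real, imaginary) parts of a unit complex number."""
    return math.cos(yaw), math.sin(yaw)

def decode_yaw(re, im):
    """Recover the angle in [-pi, pi] from (possibly unnormalized) real and imaginary outputs."""
    return math.atan2(im, re)

def angular_error(a, b):
    """Smallest absolute difference between two angles, accounting for wrap-around."""
    d = (a - b + math.pi) % (2 * math.pi) - math.pi
    return abs(d)

# Two nearly identical headings on either side of the +/-pi wrap-around.
yaw_a = math.pi - 0.01
yaw_b = -math.pi + 0.01

# Regressing the raw angle sees a jump of almost 2*pi between these headings.
naive_gap = abs(yaw_a - yaw_b)

# In the (real, imaginary) plane the two targets are close neighbours.
re_a, im_a = encode_yaw(yaw_a)
re_b, im_b = encode_yaw(yaw_b)
complex_gap = math.hypot(re_a - re_b, im_a - im_b)
```

In this sketch the raw-angle target is discontinuous at the wrap-around while the two-component target is not, which is one way to read the abstract's remark about avoiding singularities of single angle estimation.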
-
Monocular Fisheye Camera Depth Estimation Using Sparse LiDAR Supervision
Authors:
Varun Ravi Kumar,
Stefan Milz,
Martin Simon,
Christian Witt,
Karl Amende,
Johannes Petzold,
Senthil Yogamani,
Timo Pech
Abstract:
Near field depth estimation around a self driving car is an important function that can be achieved by four wide angle fisheye cameras having a field of view of over 180 degrees. Depth estimation based on convolutional neural networks (CNNs) produces state-of-the-art results, but progress is hindered because depth annotation cannot be obtained manually. Synthetic datasets are commonly used but they have limitations. For instance, they do not capture the extensive variability in the appearance of objects like vehicles present in real datasets. There is also a domain shift while performing inference on natural images, illustrated by many attempts to handle the domain adaptation explicitly. In this work, we explore an alternate approach of training using sparse LiDAR data as ground truth for depth estimation for fisheye cameras. We built our own dataset using our self driving car setup which has a 64 beam Velodyne LiDAR and four wide angle fisheye cameras. To handle the difference in view points of LiDAR and fisheye camera, an occlusion resolution mechanism was implemented. We started with Eigen's multiscale convolutional network architecture and improved it by modifying the activation function and optimizer. We obtained promising results on our dataset with RMSE values comparable to the state-of-the-art results obtained on KITTI.
Submitted 24 September, 2018; v1 submitted 16 March, 2018;
originally announced March 2018.
-
Her2 Challenge Contest: A Detailed Assessment of Automated Her2 Scoring Algorithms in Whole Slide Images of Breast Cancer Tissues
Authors:
Talha Qaiser,
Abhik Mukherjee,
Chaitanya Reddy Pb,
Sai Dileep Munugoti,
Vamsi Tallam,
Tomi Pitkäaho,
Taina Lehtimäki,
Thomas Naughton,
Matt Berseth,
Aníbal Pedraza,
Ramakrishnan Mukundan,
Matthew Smith,
Abhir Bhalerao,
Erik Rodner,
Marcel Simon,
Joachim Denzler,
Chao-Hui Huang,
Gloria Bueno,
David Snead,
Ian Ellis,
Mohammad Ilyas,
Nasir Rajpoot
Abstract:
Evaluating expression of the Human epidermal growth factor receptor 2 (Her2) by visual examination of immunohistochemistry (IHC) on invasive breast cancer (BCa) is a key part of the diagnostic assessment of BCa due to its recognised importance as a predictive and prognostic marker in clinical practice. However, visual scoring of Her2 is subjective and consequently prone to inter-observer variability. Given the prognostic and therapeutic implications of Her2 scoring, a more objective method is required. In this paper, we report on a recent automated Her2 scoring contest, held in conjunction with the annual PathSoc meeting in Nottingham in June 2016, aimed at systematically comparing and advancing the state-of-the-art Artificial Intelligence (AI) based automated methods for Her2 scoring. The contest dataset comprised digitised whole slide images (WSI) of sections from 86 cases of invasive breast carcinoma stained with both Haematoxylin & Eosin (H&E) and IHC for Her2. The contesting algorithms automatically predicted scores of the IHC slides for an unseen subset of the dataset and the predicted scores were compared with the 'ground truth' (a consensus score from at least two experts). We also report on a simple Man vs Machine contest for the scoring of Her2 and show that the automated methods could beat the pathology experts on this contest dataset. This paper presents a benchmark for comparing the performance of automated algorithms for scoring of Her2. It also demonstrates the enormous potential of automated algorithms in assisting the pathologist with objective IHC scoring.
Submitted 24 July, 2017; v1 submitted 23 May, 2017;
originally announced May 2017.
-
Generalized orderless pooling performs implicit salient matching
Authors:
Marcel Simon,
Yang Gao,
Trevor Darrell,
Joachim Denzler,
Erik Rodner
Abstract:
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to "alpha-pooling", allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions made by these approaches. We identify parts of training images having the highest influence on the prediction of a given test image. It allows for justifying decisions to users and also for analyzing the influence of semantic parts. For example, we can show that the higher capacity VGG16 model focuses much more on the bird's head than, e.g., the lower-capacity VGG-M model when recognizing fine-grained bird categories. Both contributions allow us to analyze the difference when moving between average and bilinear pooling. In addition, experiments show that our generalized approach can outperform both across a variety of standard datasets.
Submitted 20 July, 2017; v1 submitted 1 May, 2017;
originally announced May 2017.
-
ImageNet pre-trained models with batch normalization
Authors:
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most state-of-the-art approaches. In this paper, we present a new set of pre-trained models with popular state-of-the-art architectures for the Caffe framework. The first release includes Residual Networks (ResNets) with generation script as well as the batch-normalization-variants of AlexNet and VGG19. All models outperform previous models with the same architecture. The models and training code are available at http://www.inf-cv.uni-jena.de/Research/CNN+Models.html and https://github.com/cvjena/cnn-models
Submitted 6 December, 2016; v1 submitted 5 December, 2016;
originally announced December 2016.
-
Fine-grained Recognition in the Noisy Wild: Sensitivity Analysis of Convolutional Neural Networks Approaches
Authors:
Erik Rodner,
Marcel Simon,
Robert B. Fisher,
Joachim Denzler
Abstract:
In this paper, we study the sensitivity of CNN outputs with respect to image transformations and noise in the area of fine-grained recognition. In particular, we answer the following questions: (1) how sensitive are CNNs with respect to image transformations encountered during wild image capture?; (2) how can we predict CNN sensitivity?; and (3) can we increase the robustness of CNNs with respect to image degradations? To answer the first question, we provide an extensive empirical sensitivity analysis of commonly used CNN architectures (AlexNet, VGG19, GoogleNet) across various types of image degradations. This allows for predicting CNN performance for new domains comprising images of lower quality or captured from a different viewpoint. We also show how the sensitivity of CNN outputs can be predicted for single images. Furthermore, we demonstrate that input layer dropout or pre-filtering during test time only reduces CNN sensitivity for high levels of degradation.
Experiments for fine-grained recognition tasks reveal that VGG19 is more robust to severe image degradations than AlexNet and GoogleNet. However, small intensity noise can lead to dramatic changes in CNN performance even for VGG19.
Submitted 21 October, 2016;
originally announced October 2016.
-
Neither Quick Nor Proper -- Evaluation of QuickProp for Learning Deep Neural Networks
Authors:
Clemens-Alexander Brust,
Sven Sickert,
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Neural networks and especially convolutional neural networks are of great interest in current computer vision research. However, many techniques, extensions, and modifications have been published in the past, which are not yet used by current approaches. In this paper, we study the application of a method called QuickProp for training of deep neural networks. In particular, we apply QuickProp during learning and testing of fully convolutional networks for the task of semantic segmentation. We compare QuickProp empirically with gradient descent, which is the current standard method. Experiments suggest that QuickProp cannot compete with standard gradient descent techniques for complex computer vision tasks like semantic segmentation.
Submitted 15 June, 2016; v1 submitted 14 June, 2016;
originally announced June 2016.
-
Fine-grained Recognition Datasets for Biodiversity Analysis
Authors:
Erik Rodner,
Marcel Simon,
Gunnar Brehm,
Stephanie Pietsch,
J. Wolfgang Wägele,
Joachim Denzler
Abstract:
In the following paper, we present and discuss challenging applications for fine-grained visual classification (FGVC): biodiversity and species analysis. We not only give details about two challenging new datasets suitable for computer vision research with up to 675 highly similar classes, but also present first results with localized features using convolutional neural networks (CNN). We conclude with a list of challenging new research directions in the area of visual classification for biodiversity research.
Submitted 3 July, 2015;
originally announced July 2015.
-
Neural Activation Constellations: Unsupervised Part Model Discovery with Convolutional Networks
Authors:
Marcel Simon,
Erik Rodner
Abstract:
Part models of object categories are essential for challenging recognition tasks, where differences in categories are subtle and only reflected in appearances of small parts of the object. We present an approach that is able to learn part models in a completely unsupervised manner, without part annotations and even without given bounding boxes during learning. The key idea is to find constellations of neural activation patterns computed using convolutional neural networks. In our experiments, we outperform existing approaches for fine-grained recognition on the CUB200-2011, NA birds, Oxford PETS, and Oxford Flowers datasets in case no part or bounding box annotations are available and achieve state-of-the-art performance for the Stanford Dog dataset. We also show the benefits of neural constellation models as a data augmentation technique for fine-tuning. Furthermore, our paper unites the areas of generic and fine-grained classification, since our approach is suitable for both scenarios. The source code of our method is available online at http://www.inf-cv.uni-jena.de/part_discovery
Submitted 5 December, 2015; v1 submitted 30 April, 2015;
originally announced April 2015.
-
Convolutional Patch Networks with Spatial Prior for Road Detection and Urban Scene Understanding
Authors:
Clemens-Alexander Brust,
Sven Sickert,
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Classifying single image patches is important in many different applications, such as road detection or scene understanding. In this paper, we present convolutional patch networks, which are convolutional networks learned to distinguish different image patches and which can be used for pixel-wise labeling. We also show how to incorporate spatial information of the patch as an input to the network, which allows for learning spatial priors for certain categories jointly with an appearance model. In particular, we focus on road detection and urban scene understanding, two application areas where we are able to achieve state-of-the-art results on the KITTI as well as on the LabelMeFacade dataset.
Furthermore, our paper offers a guideline for people working in the area and desperately wandering through all the painstaking details that render training CNs on image patches extremely difficult.
Submitted 23 February, 2015;
originally announced February 2015.
-
Part Detector Discovery in Deep Convolutional Neural Networks
Authors:
Marcel Simon,
Erik Rodner,
Joachim Denzler
Abstract:
Current fine-grained classification approaches often rely on a robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task due to the large variation of appearance and pose. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery and localization without the necessity to actually train the network on the current dataset. Our approach called "part detector discovery" (PDD) is based on analyzing the gradient maps of the network outputs and finding activation centers spatially related to annotated semantic parts or bounding boxes.
This allows us not just to obtain excellent performance on the CUB200-2011 dataset, but in contrast to previous approaches also to perform detection and bird classification jointly without requiring a given bounding box annotation during testing and ground-truth parts during training. The code is available at http://www.inf-cv.uni-jena.de/part_discovery and https://github.com/cvjena/PartDetectorDisovery.
Submitted 14 November, 2014; v1 submitted 12 November, 2014;
originally announced November 2014.
-
The dynamics of iterated transportation simulations
Authors:
Kai Nagel,
Marcus Rickert,
Patrice M. Simon,
Martin Pieck
Abstract:
Iterating between a router and a traffic micro-simulation is an increasingly accepted method for doing traffic assignment. This paper, after pointing out that the analytical theory of simulation-based assignment to date is insufficient for some practical cases, presents results of simulation studies from a real world study. Specifically, we look into the issues of uniqueness, variability, and robustness and validation. Regarding uniqueness, despite some cautionary notes from a theoretical point of view, we find no indication of "meta-stable" states for the iterations. Variability, however, is considerable. By variability we mean the variation of the simulation of a given plan set by just changing the random seed. We then show results from three different micro-simulations under the same iteration scenario in order to test for the robustness of the results under different implementations. We find the results encouraging, also when comparing to reality and with a traditional assignment result.
Keywords: dynamic traffic assignment (DTA); traffic micro-simulation; TRANSIMS; large-scale simulations; urban planning
Submitted 22 February, 2000;
originally announced February 2000.