-
It's all Relative: Monocular 3D Human Pose Estimation from Weakly Supervised Data
Authors:
Matteo Ruggero Ronchi,
Oisin Mac Aodha,
Robert Eng,
Pietro Perona
Abstract:
We address the problem of 3D human pose estimation from 2D input images using only weakly supervised training data. Despite showing considerable success for 2D pose estimation, the application of supervised machine learning to 3D pose estimation in real-world images is currently hampered by the lack of varied training images with corresponding 3D poses. Most existing 3D pose estimation algorithms train on data that has either been collected in carefully controlled studio settings or has been generated synthetically. Instead, we take a different approach and propose a 3D human pose estimation algorithm that only requires relative estimates of depth at training time. Such a training signal, although noisy, can be easily collected from crowd annotators and is of sufficient quality to enable successful training and evaluation of 3D pose algorithms. Our results are competitive with fully supervised regression-based approaches on the Human3.6M dataset, despite using significantly weaker training data. Our proposed algorithm opens the door to using existing widespread 2D datasets for 3D pose estimation by allowing fine-tuning with noisy relative constraints, resulting in more accurate 3D poses.
Submitted 27 July, 2018; v1 submitted 17 May, 2018;
originally announced May 2018.
-
Benchmarking and Error Diagnosis in Multi-Instance Pose Estimation
Authors:
Matteo Ruggero Ronchi,
Pietro Perona
Abstract:
We propose a new method to analyze the impact of errors in algorithms for multi-instance pose estimation and a principled benchmark that can be used to compare them. We define and characterize three classes of errors (localization, scoring, and background), and study how they are influenced by instance attributes and how they affect an algorithm's performance. Our technique is applied to compare the two leading methods for human pose estimation on the COCO Dataset, and to measure the sensitivity of pose estimation with respect to instance size, type and number of visible keypoints, clutter due to multiple instances, and the relative score of instances. The performance of algorithms, and the types of error they make, are highly dependent on all these variables, but mostly on the number of keypoints and the clutter. The analysis and software tools we propose offer a novel and insightful approach for understanding the behavior of pose estimation algorithms and an effective method for measuring their strengths and weaknesses.
Submitted 4 August, 2017; v1 submitted 17 July, 2017;
originally announced July 2017.
-
High-Frequency Modeling and Simulation of a Single-Phase Three-Winding Transformer Including Taps in Regulating Winding
Authors:
Bjorn Gustavsen,
Alvaro Portillo,
Rodrigo Ronchi,
Asgeir Mjelve
Abstract:
Transformer terminal equivalents obtained via admittance measurements are suitable for simulating high-frequency transient interaction between the transformer and the network. This paper augments the terminal equivalent approach with a measurement-based voltage transfer function model which permits calculation of voltages at internal points in the regulating winding. The approach is demonstrated for a single-phase three-winding transformer in tap position Nom+ with inclusion of three internal points in the regulating winding that represent the mid-point and the two extreme ends. The terminal equivalent modeling makes use of additional common-mode measurements to avoid the error magnification that would otherwise result from the ungrounded tertiary winding. The final model is used in a time domain simulation where ground-fault initiation results in a resonant voltage build-up in the winding. It is shown that the peak value of the resonant overvoltage can be higher than during the lightning impulse test under unfavorable network conditions. Additional measurements show that the selected tap position affects the terminal behavior of the transformer, changing the frequency and peak value of the lower resonance point in the voltage transfer between windings.
Submitted 17 November, 2016;
originally announced November 2016.
-
A Rotation Invariant Latent Factor Model for Moveme Discovery from Static Poses
Authors:
Matteo Ruggero Ronchi,
Joon Sik Kim,
Yisong Yue
Abstract:
We tackle the problem of learning a rotation invariant latent factor model when the training data consists of lower-dimensional projections of the original feature space. The main goal is the discovery of a set of 3-D basis poses that can characterize the manifold of primitive human motions, or movemes, from a training set of 2-D projected poses obtained from still images taken at various camera angles. The proposed technique for basis discovery is data-driven rather than hand-designed. The learned representation is rotation invariant, and can reconstruct any training instance from multiple viewing angles. We apply our method to modeling human poses in sports (via the Leeds Sports Dataset), and demonstrate the effectiveness of the learned bases in a range of applications such as activity classification, inference of dynamics from a single frame, and synthetic representation of movements.
Submitted 23 September, 2016;
originally announced September 2016.
-
Describing Common Human Visual Actions in Images
Authors:
Matteo Ruggero Ronchi,
Pietro Perona
Abstract:
Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many actions is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common 'visual actions', obtained by analyzing the largest on-line verb lexicon currently available for English (VerbNet) and human sentences used to describe images in MS COCO. Second, a complete set of annotations for those 'visual actions', composed of subject-object and associated verb, which we call COCO-a (a for 'actions'). COCO-a is larger than existing action datasets in terms of number of actions and instances of these actions, and is unique because it is data-driven, rather than experimenter-biased. Other unique features are that it is exhaustive, and that all subjects and objects are localized. A statistical analysis of the accuracy of our annotations and of each action, interaction and subject-object combination is provided.
Submitted 6 June, 2015;
originally announced June 2015.