Showing 1–44 of 44 results for author: Hilton, A

  1. arXiv:2507.21261  [pdf, ps, other]

    cs.CV

    HDR Environment Map Estimation with Latent Diffusion Models

    Authors: Jack Hilliard, Adrian Hilton, Jean-Yves Guillemaut

    Abstract: We advance the field of HDR environment map estimation from a single-view image by establishing a novel approach leveraging the Latent Diffusion Model (LDM) to produce high-quality environment maps that can plausibly light mirror-reflective surfaces. A common issue when using the ERP representation, the format used by the vast majority of approaches, is distortions at the poles and a seam at the s…

    Submitted 28 July, 2025; originally announced July 2025.

  2. arXiv:2506.14560  [pdf, ps, other]

    cs.CV cs.LG

    Risk Estimation of Knee Osteoarthritis Progression via Predictive Multi-task Modelling from Efficient Diffusion Model using X-ray Images

    Authors: David Butler, Adrian Hilton, Gustavo Carneiro

    Abstract: Medical imaging plays a crucial role in assessing knee osteoarthritis (OA) risk by enabling early detection and disease monitoring. Recent machine learning methods have improved risk estimation (i.e., predicting the likelihood of disease progression) and predictive modelling (i.e., the forecasting of future outcomes based on current data) using medical images, but clinical adoption remains limited…

    Submitted 17 June, 2025; originally announced June 2025.

  3. arXiv:2504.03047  [pdf, other]

    cs.CV

    Attention-Aware Multi-View Pedestrian Tracking

    Authors: Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut

    Abstract: In spite of the recent advancements in multi-object tracking, occlusion poses a significant challenge. Multi-camera setups have been used to address this challenge by providing a comprehensive coverage of the scene. Recent multi-view pedestrian detection models have highlighted the potential of an early-fusion strategy, projecting feature maps of all views to a common ground plane or the Bird's Ey…

    Submitted 3 April, 2025; originally announced April 2025.

  4. arXiv:2503.14475  [pdf, other]

    cs.GR cs.CV

    Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation

    Authors: Umar Farooq, Jean-Yves Guillemaut, Adrian Hilton, Marco Volino

    Abstract: The field of Novel View Synthesis has been revolutionized by 3D Gaussian Splatting (3DGS), which enables high-quality scene reconstruction that can be rendered in real-time. 3DGS-based techniques typically suffer from high GPU memory and disk storage requirements which limits their practical application on consumer-grade devices. We propose Opti3DGS, a novel frequency-modulated coarse-to-fine opti…

    Submitted 18 March, 2025; originally announced March 2025.

  5. arXiv:2503.10982  [pdf, other]

    cs.CV

    Enhanced Multi-View Pedestrian Detection Using Probabilistic Occupancy Volume

    Authors: Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut

    Abstract: Occlusion poses a significant challenge in pedestrian detection from a single view. To address this, multi-view detection systems have been utilized to aggregate information from multiple perspectives. Recent advances in multi-view detection utilized an early-fusion strategy that strategically projects the features onto the ground plane, where detection analysis is performed. A promising approach…

    Submitted 13 March, 2025; originally announced March 2025.

  6. arXiv:2502.18150  [pdf, other]

    cs.CV

    Realistic Clothed Human and Object Joint Reconstruction from a Single Image

    Authors: Ayushi Dutta, Marco Pesavento, Marco Volino, Adrian Hilton, Armin Mustafa

    Abstract: Recent approaches to jointly reconstruct 3D humans and objects from a single RGB image represent 3D shapes with template-based or coarse models, which fail to capture details of loose clothing on human bodies. In this paper, we introduce a novel implicit approach for jointly reconstructing realistic 3D clothed humans and objects from a monocular view. For the first time, we model both the human an…

    Submitted 8 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  7. arXiv:2501.18509  [pdf, other]

    cs.CV

    Reframing Dense Action Detection (RefDense): A Paradigm Shift in Problem Solving & a Novel Optimization Strategy

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: Dense action detection involves detecting multiple co-occurring actions while action classes are often ambiguous and represent overlapping concepts. We argue that handling the dual challenge of temporal and class overlaps is too complex to effectively be tackled by a single network. To address this, we propose to decompose the task of detecting dense ambiguous actions into detecting dense, unambig…

    Submitted 11 March, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

    Comments: Computer Vision

  8. arXiv:2410.13566  [pdf, other]

    cs.CV cs.GR

    360U-Former: HDR Illumination Estimation with Panoramic Adapted Vision Transformers

    Authors: Jack Hilliard, Adrian Hilton, Jean-Yves Guillemaut

    Abstract: Recent illumination estimation methods have focused on enhancing the resolution and improving the quality and diversity of the generated textures. However, few have explored tailoring the neural network architecture to the Equirectangular Panorama (ERP) format utilised in image-based lighting. Consequently, high dynamic range images (HDRI) results usually exhibit a seam at the side borders and tex…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: Accepted at AIM Workshop 2024 at ECCV 2024, 18 pages, 6 figures

  9. arXiv:2407.10586  [pdf, other]

    cs.CV

    COSMU: Complete 3D human shape from monocular unconstrained images

    Authors: Marco Pesavento, Marco Volino, Adrian Hilton

    Abstract: We present a novel framework to reconstruct complete 3D human shapes from a given target image by leveraging monocular unconstrained images. The objective of this work is to reproduce high-quality details in regions of the reconstructed human body that are not visible in the input target. The proposed methodology addresses the limitations of existing approaches for reconstructing 3D human shapes f…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV24

  10. arXiv:2406.14412  [pdf, other]

    cs.CV

    Benchmarking Monocular 3D Dog Pose Estimation Using In-The-Wild Motion Capture Data

    Authors: Moira Shooter, Charles Malleson, Adrian Hilton

    Abstract: We introduce a new benchmark analysis focusing on 3D canine pose estimation from monocular in-the-wild images. A multi-modal dataset 3DDogs-Lab was captured indoors, featuring various dog breeds trotting on a walkway. It includes data from optical marker-based mocap systems, RGBD cameras, IMUs, and a pressure mat. While providing high-quality motion data, the presence of optical markers and limite…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures, including supplementary, CV4Animals Workshop 2024 (CVPRW)

  11. arXiv:2406.06499  [pdf, other]

    cs.CV cs.HC

    NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative

    Authors: Asmar Nadeem, Faegheh Sardari, Robert Dawes, Syed Sameed Husain, Adrian Hilton, Armin Mustafa

    Abstract: Existing video captioning benchmarks and models lack causal-temporal narrative, which is sequences of events linked through cause and effect, unfolding over time and driven by characters or agents. This lack of narrative restricts models' ability to generate text descriptions that capture the causal and temporal dynamics inherent in video content. To address this gap, we propose NarrativeBridge, a…

    Submitted 15 February, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: International Conference on Learning Representations (ICLR) 2025

  12. arXiv:2406.06187  [pdf, other]

    cs.CV

    An Effective-Efficient Approach for Dense Multi-Label Action Detection

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: Unlike the sparse label action detection task, where a single action occurs in each timestamp of a video, in a dense multi-label scenario, actions can overlap. To address this challenging task, it is necessary to simultaneously learn (i) temporal dependencies and (ii) co-occurrence action relationships. Recent approaches model temporal information by extracting multi-scale features through hierarc…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 14 pages. arXiv admin note: substantial text overlap with arXiv:2308.05051

  13. arXiv:2406.04251  [pdf, other]

    cs.CV

    Improving Gaussian Splatting with Localized Points Management

    Authors: Haosen Yang, Chenhao Zhang, Wenqing Wang, Marco Volino, Adrian Hilton, Li Zhang, Xiatian Zhu

    Abstract: Point management is critical for optimizing 3D Gaussian Splatting models, as point initiation (e.g., via structure from motion) is often distributionally inappropriate. Typically, Adaptive Density Control (ADC) algorithm is adopted, leveraging view-averaged gradient magnitude thresholding for point densification, opacity thresholding for pruning, and regular all-points opacity reset. We reveal tha…

    Submitted 19 April, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: CVPR 2025 (Highlight). Github: https://happy-hsy.github.io/projects/LPM/

  14. arXiv:2405.10690  [pdf, other]

    cs.CV

    CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However, we argue that while cross-modal learning is beneficial for detecting audible-visible events, in the weakly supervised scenario, it negatively impacts…

    Submitted 15 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted at ECCV 2024

  15. arXiv:2403.10357  [pdf, other]

    cs.CV cs.GR

    ANIM: Accurate Neural Implicit Model for Human Reconstruction from a single RGB-D image

    Authors: Marco Pesavento, Yuanlu Xu, Nikolaos Sarafianos, Robert Maier, Ziyan Wang, Chun-Han Yao, Marco Volino, Edmond Boyer, Adrian Hilton, Tony Tung

    Abstract: Recent progress in human shape learning, shows that neural implicit models are effective in generating 3D human surfaces from limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover fine geometric details such as face, hands or cloth wrinkles. They are also easily prone to depth ambiguities that result in distorted geometries alon…

    Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR24; Project page: https://marcopesavento.github.io/ANIM/

  16. arXiv:2310.16754  [pdf, other]

    cs.CV

    CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

    Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

    Abstract: In the context of Audio Visual Question Answering (AVQA) tasks, the audio visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings; the audio-visual (AV) information passing through the network isn't aligned on Spatial and Temporal levels; and, inter-modal (audio and visual) Semantic information is often n…

    Submitted 27 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted to IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  17. arXiv:2308.05051  [pdf, other]

    cs.CV

    PAT: Position-Aware Transformer for Dense Multi-Label Action Detection

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism in transformers loses the temporal positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-att…

    Submitted 9 August, 2023; originally announced August 2023.

  18. End-to-End Latency Optimization of Multi-view 3D Reconstruction for Disaster Response

    Authors: Xiaojie Zhang, Mingjun Li, Andrew Hilton, Amitangshu Pal, Soumyabrata Dey, Saptarshi Debroy

    Abstract: In order to plan rapid response during disasters, first responder agencies often adopt `bring your own device' (BYOD) model with inexpensive mobile edge devices (e.g., drones, robots, tablets) for complex video analytics applications, e.g., 3D reconstruction of a disaster scene. Unlike simpler video applications, widely used Multi-view Stereo (MVS) based 3D reconstruction applications (e.g., openM…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: 2022 10th IEEE International Conference on Mobile Cloud Computing, Services, and Engineering (MobileCloud)

  19. arXiv:2303.14829  [pdf, other]

    cs.CV

    SEM-POS: Grammatically and Semantically Correct Video Captioning

    Authors: Asmar Nadeem, Adrian Hilton, Robert Dawes, Graham Thomas, Armin Mustafa

    Abstract: Generating grammatically and semantically correct captions in video captioning is a challenging task. The captions generated from the existing methods are either word-by-word that do not align with grammatical structure or miss key information from the input videos. To address these issues, we introduce a novel global-local fusion network, with a Global-Local Fusion Block (GLFB) that encodes and f…

    Submitted 4 April, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

  20. arXiv:2209.06401  [pdf, ps, other]

    math.CO cs.DM

    Ryser's Theorem for Symmetric $ρ$-latin Squares

    Authors: Amin Bahmanian, A. J. W. Hilton

    Abstract: Let $L$ be an $n\times n$ array whose top left $r\times r$ subarray is filled with $k$ different symbols, each occurring at most once in each row and at most once in each column. We establish necessary and sufficient conditions that ensure the remaining cells of $L$ can be filled such that each symbol occurs at most once in each row and at most once in each column, $L$ is symmetric with respect to…

    Submitted 14 September, 2025; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: 15 pages. arXiv admin note: text overlap with arXiv:2201.04793

    MSC Class: 05B15; 05C70; 05C15

  21. arXiv:2208.10738  [pdf, other]

    cs.CV

    Super-resolution 3D Human Shape from a Single Low-Resolution Image

    Authors: Marco Pesavento, Marco Volino, Adrian Hilton

    Abstract: We propose a novel framework to reconstruct super-resolution human shape from a single low-resolution input image. The approach overcomes limitations of existing approaches that reconstruct 3D human shape from a single image, which require high-resolution images together with auxiliary data such as surface normal or a parametric model to reconstruct high-detail shape. The proposed framework repres…

    Submitted 23 August, 2022; originally announced August 2022.

  22. arXiv:2203.03291  [pdf, other]

    eess.AS cs.SD eess.IV

    Visually Supervised Speaker Detection and Localization via Microphone Array

    Authors: Davide Berghi, Adrian Hilton, Philip J. B. Jackson

    Abstract: Active speaker detection (ASD) is a multi-modal task that aims to identify who, if anyone, is speaking from a set of candidates. Current audio-visual approaches for ASD typically rely on visually pre-extracted face tracks (sequences of consecutive face crops) and the respective monaural audio. However, their recall rate is often low as only the visible faces are included in the set of candidates.…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Erratum: Due to a bug in the evaluation script, the correct average distance (aD) metric is here reported in yellow. The analysis remains unchanged from the original paper as the trend between the old and new measures are perfectly monotonic. The bug was caused by an incorrect normalization factor

    Journal ref: IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021

  23. arXiv:2108.13739  [pdf, other]

    cs.CV cs.LG cs.MM

    Super-Resolution Appearance Transfer for 4D Human Performances

    Authors: Marco Pesavento, Marco Volino, Adrian Hilton

    Abstract: A common problem in the 4D reconstruction of people from multi-view video is the quality of the captured dynamic texture appearance which depends on both the camera resolution and capture volume. Typically the requirement to frame cameras to capture the volume of a dynamic performance ($>50m^3$) results in the person occupying only a small proportion $<$ 10% of the field of view. Even with ultra h…

    Submitted 31 August, 2021; originally announced August 2021.

  24. arXiv:2108.13697  [pdf, other]

    cs.CV cs.AI cs.LG

    Attention-based Multi-Reference Learning for Image Super-Resolution

    Authors: Marco Pesavento, Marco Volino, Adrian Hilton

    Abstract: This paper proposes a novel Attention-based Multi-Reference Super-resolution network (AMRSR) that, given a low-resolution image, learns to adaptively transfer the most similar texture from multiple reference images to the super-resolution output whilst maintaining spatial coherence. The use of multiple reference images together with attention-based sampling is demonstrated to achieve significantly…

    Submitted 31 August, 2021; originally announced August 2021.

  25. arXiv:2108.00249  [pdf, other]

    cs.CV cs.AI cs.GR

    SyDog: A Synthetic Dog Dataset for Improved 2D Pose Estimation

    Authors: Moira Shooter, Charles Malleson, Adrian Hilton

    Abstract: Estimating the pose of animals can facilitate the understanding of animal motion which is fundamental in disciplines such as biomechanics, neuroscience, ethology, robotics and the entertainment industry. Human pose estimation models have achieved high performance due to the huge amount of training data available. Achieving the same results for animal pose estimation is challenging due to the lack…

    Submitted 31 July, 2021; originally announced August 2021.

    Comments: 5 pages, 1 figure, Poster presentation at the Computer Vision for Animal Behavior Tracking and Modeling (CV4Animals:) Workshop in conjunction with CVPR 2021

  26. arXiv:2104.09283  [pdf, other]

    cs.CV

    Multi-person Implicit Reconstruction from a Single Image

    Authors: Armin Mustafa, Akin Caliskan, Lourdes Agapito, Adrian Hilton

    Abstract: We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image. Existing multi-person methods suffer from two main drawbacks: they are often model-based and therefore cannot capture accurate 3D models of people with loose clothing and hair; or they require manual intervention to resolve occlusions or interactions. Our…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: To appear in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021

  27. arXiv:2104.09259  [pdf, other]

    cs.CV

    Temporal Consistency Loss for High Resolution Textured and Clothed 3DHuman Reconstruction from Monocular Video

    Authors: Akin Caliskan, Armin Mustafa, Adrian Hilton

    Abstract: We present a novel method to learn temporally consistent 3D reconstruction of clothed people from a monocular video. Recent methods for 3D human reconstruction from monocular video using volumetric, implicit or parametric human shape models, produce per frame reconstructions giving temporally inconsistent output and limited performance when applied to video. In this paper, we introduce an approach…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: To appear in Dynavis Workshop, CVPR 2021

  28. arXiv:2009.14162  [pdf, other]

    cs.CV

    Multi-View Consistency Loss for Improved Single-Image 3D Reconstruction of Clothed People

    Authors: Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton

    Abstract: We present a novel method to improve the accuracy of the 3D reconstruction of clothed human shape from a single image. Recent work has introduced volumetric, implicit and model-based shape learning frameworks for reconstruction of objects and people from one or more images. However, the accuracy and completeness for reconstruction of clothed people is limited due to the large variation in shape re…

    Submitted 29 September, 2020; originally announced September 2020.

    Comments: Accepted to Asian Conference on Computer Vision 2020 (ACCV)

  29. arXiv:2009.05235  [pdf, ps, other]

    cs.CV

    Spectral Analysis Network for Deep Representation Learning and Image Clustering

    Authors: Jinghua Wang, Adrian Hilton, Jianmin Jiang

    Abstract: Deep representation learning is a crucial procedure in multimedia analysis and attracts increasing attention. Most of the popular techniques rely on convolutional neural network and require a large amount of labeled data in the training procedure. However, it is time consuming or even impossible to obtain the label information in some tasks due to cost limitation. Thus, it is necessary to develop…

    Submitted 11 September, 2020; originally announced September 2020.

    Journal ref: ICME2019

  30. arXiv:2003.06656  [pdf, other]

    eess.AS cs.SD eess.IV

    Audio-Visual Spatial Aligment Requirements of Central and Peripheral Object Events

    Authors: Davide Berghi, Hanne Stenzel, Marco Volino, Adrian Hilton, Philip J. B. Jackson

    Abstract: Immersive audio-visual perception relies on the spatial integration of both auditory and visual information which are heterogeneous sensing modalities with different fields of reception and spatial resolution. This study investigates the perceived coherence of audiovisual object events presented either centrally or peripherally with horizontally aligned/misaligned sound. Various object events were…

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: Two-pages poster abstract

    Journal ref: IEEE VR 2020

  31. arXiv:1911.03926  [pdf, ps, other]

    cs.PL

    Gemini: A Functional Programming Language for Hardware Description

    Authors: Aditya Srinivasan, Andrew D. Hilton

    Abstract: This paper presents Gemini, a functional programming language for hardware description that provides features such as parametric polymorphism, recursive datatypes, higher-order functions, and type inference for higher expressivity compared to modern hardware description languages. Gemini demonstrates the theory and implementation of novel type-theoretical concepts through its unique type system co…

    Submitted 10 November, 2019; originally announced November 2019.

  32. arXiv:1910.01241  [pdf, other]

    cs.CV

    Learning Dense Wide Baseline Stereo Matching for People

    Authors: Akin Caliskan, Armin Mustafa, Evren Imre, Adrian Hilton

    Abstract: Existing methods for stereo work on narrow baseline image pairs giving limited performance between wide baseline views. This paper proposes a framework to learn and estimate dense stereo for people from wide baseline image pairs. A synthetic people stereo patch dataset (S2P2) is introduced to learn wide baseline dense stereo matching for people. The proposed framework not only learns human specifi…

    Submitted 2 October, 2019; originally announced October 2019.

    Comments: To appear in 3D Reconstruction in the Wild Workshop, ICCV 2019

  33. arXiv:1908.03030  [pdf, other]

    cs.CV

    Semantic Estimation of 3D Body Shape and Pose using Minimal Cameras

    Authors: Andrew Gilbert, Matthew Trumble, Adrian Hilton, John Collomosse

    Abstract: We aim to simultaneously estimate the 3D articulated pose and high fidelity volumetric occupancy of human performance, from multiple viewpoint video (MVV) with as few as two views. We use a multi-channel symmetric 3D convolutional encoder-decoder with a dual loss to enforce the learning of a latent embedding that enables inference of skeletal joint positions and a volumetric reconstruction of the…

    Submitted 7 September, 2020; v1 submitted 8 August, 2019; originally announced August 2019.

  34. EdgeNet: Semantic Scene Completion from a Single RGB-D Image

    Authors: Aloisio Dourado, Teofilo Emidio de Campos, Hansung Kim, Adrian Hilton

    Abstract: Semantic scene completion is the task of predicting a complete 3D representation of volumetric occupancy with corresponding semantic labels for a scene from a single point of view. Previous works on Semantic Scene Completion from RGB-D data used either only depth or depth with colour by projecting the 2D image into the 3D volume resulting in a sparse data representation. In this work, we present a…

    Submitted 6 September, 2020; v1 submitted 7 August, 2019; originally announced August 2019.

    Comments: 10 pages, 5 figures Accepted at ICPR 2020

    ACM Class: I.4.6; I.4.8

  35. arXiv:1907.09905  [pdf, other]

    cs.CV

    U4D: Unsupervised 4D Dynamic Scene Understanding

    Authors: Armin Mustafa, Chris Russell, Adrian Hilton

    Abstract: We introduce the first approach to solve the challenging problem of unsupervised 4D visual scene understanding for complex dynamic scenes with multiple interacting people from multi-view video. Our approach simultaneously estimates a detailed model that includes a per-pixel semantically and temporally coherent reconstruction, together with instance-level segmentation exploiting photo-consistency,…

    Submitted 23 July, 2019; originally announced July 2019.

    Comments: To appear in IEEE International Conference in Computer Vision ICCV 2019

  36. arXiv:1907.08195  [pdf, other]

    cs.CV

    Temporally Coherent General Dynamic Scene Reconstruction

    Authors: Armin Mustafa, Marco Volino, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton

    Abstract: Existing techniques for dynamic scene reconstruction from multiple wide-baseline cameras primarily focus on reconstruction in controlled environments, with fixed calibrated cameras and strong prior constraints. This paper introduces a general approach to obtain a 4D representation of complex dynamic scenes from multi-view wide-baseline static or moving cameras without prior knowledge of the scene…

    Submitted 3 August, 2020; v1 submitted 18 July, 2019; originally announced July 2019.

    Comments: Submitted to IJCV 2019. arXiv admin note: substantial text overlap with arXiv:1603.03381

  37. arXiv:1807.01950  [pdf, other]

    cs.CV

    Volumetric performance capture from minimal camera viewpoints

    Authors: Andrew Gilbert, Marco Volino, John Collomosse, Adrian Hilton

    Abstract: We present a convolutional autoencoder that enables high fidelity volumetric reconstructions of human performance to be captured from multi-view video comprising only a small set of camera views. Our method yields similar end-to-end reconstruction error to that of a probabilistic visual hull computed using significantly more (double or more) viewpoints. We use a deep prior implicitly learned by th…

    Submitted 10 July, 2018; v1 submitted 5 July, 2018; originally announced July 2018.

  38. arXiv:1807.01511  [pdf, other]

    cs.CV

    Deep Autoencoder for Combined Human Pose Estimation and body Model Upscaling

    Authors: Matthew Trumble, Andrew Gilbert, Adrian Hilton, John Collomosse

    Abstract: We present a method for simultaneously estimating 3D human pose and body shape from a sparse set of wide-baseline camera views. We train a symmetric convolutional autoencoder with a dual loss that enforces learning of a latent representation that encodes skeletal joint positions, and at the same time learns a deep representation of volumetric body shape. We harness the latter to up-scale input vol…

    Submitted 4 July, 2018; originally announced July 2018.

  39. arXiv:1804.11276  [pdf, other]

    cs.CV

    4D Temporally Coherent Light-field Video

    Authors: Armin Mustafa, Marco Volino, Jean-yves Guillemaut, Adrian Hilton

    Abstract: Light-field video has recently been used in virtual and augmented reality applications to increase realism and immersion. However, existing light-field methods are generally limited to static scenes due to the requirement to acquire a dense scene representation. The large amount of data and the absence of methods to infer temporal coherence pose major challenges in storage, compression and editing…

    Submitted 30 April, 2018; originally announced April 2018.

    Comments: Published in 3D Vision (3DV) 2017

  40. arXiv:1802.04735  [pdf, other]

    cs.CV

    Semantic Scene Completion Combining Colour and Depth: preliminary experiments

    Authors: Andre Bernardes Soares Guedes, Teofilo Emidio de Campos, Adrian Hilton

    Abstract: Semantic scene completion is the task of producing a complete 3D voxel representation of volumetric occupancy with semantic labels for a scene from a single-view observation. We built upon the recent work of Song et al. (CVPR 2017), who proposed SSCnet, a method that performs scene completion and semantic labelling in a single end-to-end 3D convolutional network. SSCnet uses only depth maps as inp…

    Submitted 13 February, 2018; originally announced February 2018.

    Comments: 5 pages, 2 figures

  41. arXiv:1708.07218  [pdf]

    cs.SD

    Object-Based Audio Rendering

    Authors: Philip Jackson, Filippo Fazi, Frank Melchior, Trevor Cox, Adrian Hilton, Chris Pike, Jon Francombe, Andreas Franck, Philip Coleman, Dylan Menzies-Gow, James Woodcock, Yan Tang, Qingju Liu, Rick Hughes, Marcos Simon Galvez, Teo de Campos, Hansung Kim, Hanne Stenzel

    Abstract: Apparatus and methods are disclosed for performing object-based audio rendering on a plurality of audio objects which define a sound scene, each audio object comprising at least one audio signal and associated metadata. The apparatus comprises: a plurality of renderers each capable of rendering one or more of the audio objects to output rendered audio data; and object adapting means for adapting o…

    Submitted 23 August, 2017; originally announced August 2017.

    Comments: This is a transcript of GB Patent Application No: GB1609316.3, filed in the UK by the University of Surrey on 23 May 2016. It describes an intelligent system for customising, personalising and perceptually monitoring the rendering of an object-based audio stream for an arbitrary connected system of loudspeakers to optimize the listening experience as the producer intended. 30 pages, 5 figures

  42. arXiv:1608.00571  [pdf]

    cs.DC cs.OS cs.PL

    TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization

    Authors: Blake A. Hechtman, Andrew D. Hilton, Daniel J. Sorin

    Abstract: We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms. On platforms with multiple CPUs, Cilk's "work-first" principle underlies how task-parallel applications can achieve performance, but work-first is a poor fit for GPUs. We build upon work-first to create the "work-together" principle that addresses the specific strengths and w…

    Submitted 1 August, 2016; originally announced August 2016.

  43. arXiv:1603.03381  [pdf, other]

    cs.CV

    Temporally coherent 4D reconstruction of complex dynamic scenes

    Authors: Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton

    Abstract: This paper presents an approach for reconstruction of 4D temporally coherent models of complex dynamic scenes. No prior knowledge is required of scene structure or camera calibration allowing reconstruction from multiple moving cameras. Sparse-to-dense temporal correspondence is integrated with joint multi-view segmentation and reconstruction to obtain a complete 4D representation of static and dy…

    Submitted 28 March, 2016; v1 submitted 10 March, 2016; originally announced March 2016.

    Comments: To appear in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016. Video available at: https://www.youtube.com/watch?v=bm_P13_-DsQ

  44. arXiv:1509.09294  [pdf, other]

    cs.CV

    General Dynamic Scene Reconstruction from Multiple View Video

    Authors: Armin Mustafa, Hansung Kim, Jean-Yves Guillemaut, Adrian Hilton

    Abstract: This paper introduces a general approach to dynamic scene reconstruction from multiple moving cameras without prior knowledge or limiting constraints on the scene structure, appearance, or illumination. Existing techniques for dynamic scene reconstruction from multiple wide-baseline camera views primarily focus on accurate reconstruction in controlled environments, where the cameras are fixed and…

    Submitted 30 September, 2015; originally announced September 2015.