Showing 1–46 of 46 results for author: Okutomi, M

Searching in archive cs.
  1. arXiv:2507.05193  [pdf, ps, other]

    eess.IV cs.CV

    RAM-W600: A Multi-Task Wrist Dataset and Benchmark for Rheumatoid Arthritis

    Authors: Songxiao Yang, Haolin Wang, Yao Fu, Ye Tian, Tamotsu Kamishima, Masayuki Ikebe, Yafei Ou, Masatoshi Okutomi

    Abstract: Rheumatoid arthritis (RA) is a common autoimmune disease that has been the focus of research in computer-aided diagnosis (CAD) and disease monitoring. In clinical settings, conventional radiography (CR) is widely used for the screening and evaluation of RA due to its low cost and accessibility. The wrist is a critical region for the diagnosis of RA. However, CAD research in this area remains limit…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2503.14219  [pdf, other]

    cs.CV eess.IV

    Segmentation-Guided Neural Radiance Fields for Novel Street View Synthesis

    Authors: Yizhou Li, Yusuke Monno, Masatoshi Okutomi, Yuuichi Tanaka, Seiichi Kataoka, Teruaki Kosiba

    Abstract: Recent advances in Neural Radiance Fields (NeRF) have shown great potential in 3D reconstruction and novel view synthesis, particularly for indoor and small-scale scenes. However, extending NeRF to large-scale outdoor environments presents challenges such as transient objects, sparse cameras and textures, and varying lighting conditions. In this paper, we propose a segmentation-guided enhancement…

    Submitted 18 March, 2025; originally announced March 2025.

    Comments: Presented at VISAPP2025. Project page: http://www.ok.sc.e.titech.ac.jp/res/NVS/index.html

  3. arXiv:2501.02269  [pdf, other]

    cs.CV

    TDM: Temporally-Consistent Diffusion Model for All-in-One Real-World Video Restoration

    Authors: Yizhou Li, Zihua Liu, Yusuke Monno, Masatoshi Okutomi

    Abstract: In this paper, we propose the first diffusion-based all-in-one video restoration method that utilizes the power of a pre-trained Stable Diffusion and a fine-tuned ControlNet. Our method can restore various types of video degradation with a single unified model, overcoming the limitation of standard methods that require specific models for each restoration task. Our contributions include an efficie…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: MMM2025

  4. arXiv:2409.00665  [pdf, other]

    cs.CV

    Disparity Estimation Using a Quad-Pixel Sensor

    Authors: Zhuofeng Wu, Doehyung Lee, Zihua Liu, Kazunori Yoshizaki, Yusuke Monno, Masatoshi Okutomi

    Abstract: A quad-pixel (QP) sensor is increasingly integrated into commercial mobile cameras. The QP sensor has a unit of 2$\times$2 four photodiodes under a single microlens, generating multi-directional phase shifting when out-focus blurs occur. Similar to a dual-pixel (DP) sensor, the phase shifting can be regarded as stereo disparity and utilized for depth estimation. Based on this, we propose a QP disp…

    Submitted 1 September, 2024; originally announced September 2024.

  5. arXiv:2405.18863  [pdf, other]

    cs.CV

    Neural Radiance Fields for Novel View Synthesis in Monocular Gastroscopy

    Authors: Zijie Jiang, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Kenji Miki

    Abstract: Enabling the synthesis of arbitrarily novel viewpoint images within a patient's stomach from pre-captured monocular gastroscopic images is a promising topic in stomach diagnosis. Typical methods to achieve this objective integrate traditional 3D reconstruction techniques, including structure-from-motion (SfM) and Poisson surface reconstruction. These methods produce explicit 3D representations, su…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted for EMBC 2024

  6. arXiv:2404.00149  [pdf, other]

    cs.CV

    VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection

    Authors: Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi

    Abstract: Monocular 3D object detection poses a significant challenge in 3D scene understanding due to its inherently ill-posed nature in monocular depth estimation. Existing methods heavily rely on supervised learning using abundant 3D labels, typically obtained through expensive and labor-intensive annotation on LiDAR point clouds. To tackle this problem, we propose a novel weakly supervised 3D object det…

    Submitted 29 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  7. arXiv:2402.18181  [pdf, other]

    cs.CV

    CFDNet: A Generalizable Foggy Stereo Matching Network with Contrastive Feature Distillation

    Authors: Zihua Liu, Yizhou Li, Masatoshi Okutomi

    Abstract: Stereo matching under foggy scenes remains a challenging task since the scattering effect degrades the visibility and results in less distinctive features for dense correspondence matching. While some previous learning-based methods integrated a physical scattering function for simultaneous stereo-matching and dehazing, simply removing fog might not aid depth estimation because the fog itself can…

    Submitted 29 February, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Journal ref: IEEE International Conference on Robotics and Automation (ICRA2024)

  8. arXiv:2402.18178  [pdf, other]

    cs.CV

    Reflection Removal Using Recurrent Polarization-to-Polarization Network

    Authors: Wenjiao Bian, Yusuke Monno, Masatoshi Okutomi

    Abstract: This paper addresses reflection removal, which is the task of separating reflection components from a captured image and deriving the image with only transmission components. Considering that the existence of the reflection changes the polarization state of a scene, some existing methods have exploited polarized images for reflection removal. While these methods apply polarized images as the input…

    Submitted 28 February, 2024; originally announced February 2024.

    Journal ref: ICASSP 2024

  9. arXiv:2402.18175  [pdf, other]

    cs.CV eess.IV

    Self-Supervised Spatially Variant PSF Estimation for Aberration-Aware Depth-from-Defocus

    Authors: Zhuofeng Wu, Yusuke Monno, Masatoshi Okutomi

    Abstract: In this paper, we address the task of aberration-aware depth-from-defocus (DfD), which takes account of spatially variant point spread functions (PSFs) of a real camera. To effectively obtain the spatially variant PSFs of a real camera without requiring any ground-truth PSFs, we propose a novel self-supervised learning method that leverages the pair of real sharp and blurred images, which can be e…

    Submitted 28 February, 2024; originally announced February 2024.

    Journal ref: International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

  10. Digging Into Normal Incorporated Stereo Matching

    Authors: Zihua Liu, Songyan Zhang, Zhicheng Wang, Masatoshi Okutomi

    Abstract: Despite the remarkable progress facilitated by learning-based stereo-matching algorithms, disparity estimation in low-texture, occluded, and bordered regions still remains a bottleneck that limits the performance. To tackle these challenges, geometric guidance like plane information is necessary as it provides intuitive guidance about disparity consistency and affinity similarity. In this paper, w…

    Submitted 28 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 30th ACM International Conference on Multimedia (ACMMM2022), pp.6050-6060, October 2022

  11. arXiv:2312.14650  [pdf, other]

    cs.CV

    Global Occlusion-Aware Transformer for Robust Stereo Matching

    Authors: Zihua Liu, Yizhou Li, Masatoshi Okutomi

    Abstract: Despite the remarkable progress facilitated by learning-based stereo-matching algorithms, the performance in the ill-conditioned regions, such as the occluded regions, remains a bottleneck. Due to the limited receptive field, existing CNN-based methods struggle to handle these ill-conditioned regions effectively. To address this issue, this paper introduces a novel attention-based stereo-matching…

    Submitted 22 December, 2023; originally announced December 2023.

    Journal ref: Proceedings of IEEE/CVF Winter Conference on Applications of Computer Vision (WACV2024)

  12. arXiv:2311.07600  [pdf, other]

    cs.CV

    Polarimetric PatchMatch Multi-View Stereo

    Authors: Jinyu Zhao, Jumpei Oishi, Yusuke Monno, Masatoshi Okutomi

    Abstract: PatchMatch Multi-View Stereo (PatchMatch MVS) is one of the popular MVS approaches, owing to its balanced accuracy and efficiency. In this paper, we propose Polarimetric PatchMatch multi-view Stereo (PolarPMS), which is the first method exploiting polarization cues to PatchMatch MVS. The key of PatchMatch MVS is to generate depth and normal hypotheses, which form local 3D planes and slanted stereo…

    Submitted 10 November, 2023; originally announced November 2023.

  13. arXiv:2309.01296  [pdf, other]

    cs.CV

    EMR-MSF: Self-Supervised Recurrent Monocular Scene Flow Exploiting Ego-Motion Rigidity

    Authors: Zijie Jiang, Masatoshi Okutomi

    Abstract: Self-supervised monocular scene flow estimation, aiming to understand both 3D structures and 3D motions from two temporally consecutive monocular images, has received increasing attention for its simple and economical sensor setup. However, the accuracy of current methods suffers from the bottleneck of less-efficient network architecture and lack of motion rigidity for regularization. In this pape…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: To appear at ICCV 2023

  14. Polarimetric Multi-View Inverse Rendering

    Authors: Jinyu Zhao, Yusuke Monno, Masatoshi Okutomi

    Abstract: A polarization camera has great potential for 3D reconstruction since the angle of polarization (AoP) and the degree of polarization (DoP) of reflected light are related to an object's surface normal. In this paper, we propose a novel 3D reconstruction method called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that effectively exploits geometric, photometric, and polarimetric cues…

    Submitted 24 December, 2022; originally announced December 2022.

    Comments: Paper accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (2022). arXiv admin note: substantial text overlap with arXiv:2007.08830

  15. arXiv:2210.13321  [pdf, other]

    cs.CV

    Dual-Pixel Raindrop Removal

    Authors: Yizhou Li, Yusuke Monno, Masatoshi Okutomi

    Abstract: Removing raindrops in images has been addressed as a significant task for various computer vision applications. In this paper, we propose the first method using a Dual-Pixel (DP) sensor to better address the raindrop removal. Our key observation is that raindrops attached to a glass window yield noticeable disparities in DP's left-half and right-half images, while almost no disparity exists for in…

    Submitted 24 October, 2022; originally announced October 2022.

    Comments: Accepted by BMVC2022 (Oral)

  16. arXiv:2209.06027  [pdf, other]

    eess.IV cs.CV

    Two-Step Color-Polarization Demosaicking Network

    Authors: Vy Nguyen, Masayuki Tanaka, Yusuke Monno, Masatoshi Okutomi

    Abstract: Polarization information of light in a scene is valuable for various image processing and computer vision tasks. A division-of-focal-plane polarimeter is a promising approach to capture the polarization images of different orientations in one shot, while it requires color-polarization demosaicking. In this paper, we propose a two-step color-polarization demosaicking network (TCPDNet), which consis…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Accepted in ICIP2022. Project page: http://www.ok.sc.e.titech.ac.jp/res/PolarDem/TCPDNet.html

  17. arXiv:2204.03929  [pdf, other]

    cs.CV

    Deep Hyperspectral-Depth Reconstruction Using Single Color-Dot Projection

    Authors: Chunyu Li, Yusuke Monno, Masatoshi Okutomi

    Abstract: Depth reconstruction and hyperspectral reflectance reconstruction are two active research topics in computer vision and image processing. Conventionally, these two topics have been studied separately using independent imaging setups and there is no existing method which can acquire depth and spectral reflectance simultaneously in one shot without using special hardware. In this paper, we propose a…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Accepted by CVPR 2022. Project homepage: http://www.ok.sc.e.titech.ac.jp/res/DHD/

  18. arXiv:2203.01557  [pdf, other]

    cs.CV

    Self-Supervised Ego-Motion Estimation Based on Multi-Layer Fusion of RGB and Inferred Depth

    Authors: Zijie Jiang, Hajime Taira, Naoyuki Miyashita, Masatoshi Okutomi

    Abstract: In existing self-supervised depth and ego-motion estimation methods, ego-motion estimation is usually limited to only leveraging RGB information. Recently, several methods have been proposed to further improve the accuracy of self-supervised ego-motion estimation by fusing information from other modalities, e.g., depth, acceleration, and angular velocity. However, they rarely focus on how differen…

    Submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to ICRA 2022. Code will be available at https://github.com/Beniko95J/MLF-VO

  19. arXiv:2111.03615  [pdf, other]

    cs.CV eess.IV

    Single Image Deraining Network with Rain Embedding Consistency and Layered LSTM

    Authors: Yizhou Li, Yusuke Monno, Masatoshi Okutomi

    Abstract: Single image deraining is typically addressed as residual learning to predict the rain layer from an input rainy image. For this purpose, an encoder-decoder network draws wide attention, where the encoder is required to encode a high-quality rain embedding which determines the performance of the subsequent decoding stage to reconstruct the rain layer. However, most of existing studies ignore the s…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: Accepted by WACV2022, January 2022

  20. arXiv:2107.13263  [pdf, other]

    cs.CV

    Learning-Based Depth and Pose Estimation for Monocular Endoscope with Loss Generalization

    Authors: Aji Resindra Widya, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

    Abstract: Gastroendoscopy has been a clinical standard for diagnosing and treating conditions that affect a part of a patient's digestive system, such as the stomach. Despite the fact that gastroendoscopy has a lot of advantages for patients, there exist some challenges for practitioners, such as the lack of 3D perception, including the depth and the endoscope pose information. Such challenges make navigati…

    Submitted 28 July, 2021; originally announced July 2021.

    Comments: Accepted for EMBC 2021

  21. arXiv:2107.11196  [pdf, other]

    cs.CV

    Multi-Modal Pedestrian Detection with Large Misalignment Based on Modal-Wise Regression and Multi-Modal IoU

    Authors: Napat Wanchaitanawong, Masayuki Tanaka, Takashi Shibata, Masatoshi Okutomi

    Abstract: The combined use of multiple modalities enables accurate pedestrian detection under poor lighting conditions by using the high visibility areas from these modalities together. The vital assumption for the combination use is that there is no or only a weak misalignment between the two modalities. In general, however, this assumption often breaks in actual situations. Due to this assumption's breakd…

    Submitted 23 July, 2021; originally announced July 2021.

    Comments: Accepted by MVA2021

  22. arXiv:2107.10524  [pdf, other]

    cs.CV

    Geometric Data Augmentation Based on Feature Map Ensemble

    Authors: Takashi Shibata, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: Deep convolutional networks have become the mainstream in computer vision applications. Although CNNs have been successful in many computer vision tasks, they are not free from drawbacks. The performance of a CNN is dramatically degraded by geometric transformation, such as large rotations. In this paper, we propose a novel CNN architecture that can improve the robustness against geometric transformati…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted to ICIP2021

  23. arXiv:2107.03068  [pdf, other]

    cs.CV

    Video-Based Camera Localization Using Anchor View Detection and Recursive 3D Reconstruction

    Authors: Hajime Taira, Koki Onbe, Naoyuki Miyashita, Masatoshi Okutomi

    Abstract: In this paper, we introduce a new camera localization strategy designed for image sequences captured in challenging industrial situations such as industrial parts inspection. To deal with peculiar appearances that hurt the standard 3D reconstruction pipeline, we exploit pre-knowledge of the scene by selecting key frames in the sequence (called anchors) which are roughly connected to a certain locati…

    Submitted 7 July, 2021; originally announced July 2021.

    Comments: This paper has been accepted and will appear in the proceedings of the 17th International Conference on Machine Vision Applications (MVA2021)

  24. arXiv:2104.07308  [pdf, other]

    cs.CV

    Spectral MVIR: Joint Reconstruction of 3D Shape and Spectral Reflectance

    Authors: Chunyu Li, Yusuke Monno, Masatoshi Okutomi

    Abstract: Reconstructing an object's high-quality 3D shape with inherent spectral reflectance property, beyond typical device-dependent RGB albedos, opens the door to applications requiring a high-fidelity 3D model in terms of both geometry and photometry. In this paper, we propose a novel Multi-View Inverse Rendering (MVIR) method called Spectral MVIR for jointly reconstructing the 3D shape and the spectra…

    Submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted by ICCP 2021. Project homepage: http://www.ok.sc.e.titech.ac.jp/res/MVIR/smvir.html

  25. arXiv:2101.09657  [pdf, other]

    cs.CV

    VIO-Aided Structure from Motion Under Challenging Environments

    Authors: Zijie Jiang, Hajime Taira, Naoyuki Miyashita, Masatoshi Okutomi

    Abstract: In this paper, we present a robust and efficient Structure from Motion pipeline for accurate 3D reconstruction under challenging environments by leveraging the camera pose information from a visual-inertial odometry. Specifically, we propose a geometric verification method to filter out mismatches by considering the prior geometric configuration of candidate image pairs. Furthermore, we introduce…

    Submitted 26 January, 2021; v1 submitted 24 January, 2021; originally announced January 2021.

    Comments: This manuscript was accepted and presented at the 22nd IEEE International Conference on Industrial Technology (ICIT2021)

  26. arXiv:2012.10083  [pdf, other]

    eess.IV cs.CV

    Spectral Reflectance Estimation Using Projector with Unknown Spectral Power Distribution

    Authors: Hironori Hidaka, Yusuke Monno, Masatoshi Okutomi

    Abstract: A lighting-based multispectral imaging system using an RGB camera and a projector is one of the most practical and low-cost systems to acquire multispectral observations for estimating the scene's spectral reflectance information. However, existing projector-based systems assume that the spectral power distribution (SPD) of each projector primary is known, which requires additional equipment such…

    Submitted 18 December, 2020; originally announced December 2020.

    Comments: Presented at CIC2020. Projector's SPD data is available at http://www.ok.sc.e.titech.ac.jp/res/PCSSfM/pro-cam_reflectance.html

  27. arXiv:2011.10232  [pdf, other]

    cs.CV cs.GR eess.IV

    Deep Snapshot HDR Imaging Using Multi-Exposure Color Filter Array

    Authors: Takeru Suda, Masayuki Tanaka, Yusuke Monno, Masatoshi Okutomi

    Abstract: In this paper, we propose a deep snapshot high dynamic range (HDR) imaging framework that can effectively reconstruct an HDR image from the RAW data captured using a multi-exposure color filter array (ME-CFA), which consists of a mosaic pattern of RGB filters with different exposure levels. To effectively learn the HDR image reconstruction network, we introduce the idea of luminance normalization…

    Submitted 20 November, 2020; originally announced November 2020.

    Comments: Accepted at ACCV2020 (Oral). Project page: http://www.ok.sc.e.titech.ac.jp/res/DSHDR/

  28. arXiv:2011.06788  [pdf, other]

    cs.CV

    Adaptive Future Frame Prediction with Ensemble Network

    Authors: Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki

    Abstract: Future frame prediction in videos is a challenging problem because videos include complicated movements and large appearance changes. Learning-based future frame prediction approaches have been proposed in the literature. A common limitation of the existing learning-based approaches is a mismatch of training data and test data. In the future frame prediction task, we can obtain the ground tru…

    Submitted 15 November, 2020; v1 submitted 13 November, 2020; originally announced November 2020.

    Comments: Accepted at 25th International Conference on Pattern Recognition Workshop (ICPRW 2020)

  29. arXiv:2010.08092  [pdf, other]

    cs.CV

    Human Segmentation with Dynamic LiDAR Data

    Authors: Tao Zhong, Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: Consecutive LiDAR scans compose dynamic 3D sequences, which contain more abundant information than a single frame. Similar to the development history of image and video perception, dynamic 3D sequence perception starts to come into sight after inspiring research on static 3D data perception. This work proposes a spatio-temporal neural network for human segmentation with the dynamic LiDAR point clo…

    Submitted 15 October, 2020; originally announced October 2020.

  30. arXiv:2007.14292  [pdf, other]

    eess.IV cs.CV

    Monochrome and Color Polarization Demosaicking Using Edge-Aware Residual Interpolation

    Authors: Miki Morimatsu, Yusuke Monno, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: A division-of-focal-plane or microgrid image polarimeter enables us to acquire a set of polarization images in one shot. Since the polarimeter consists of an image sensor equipped with a monochrome or color polarization filter array (MPFA or CPFA), the demosaicking process to interpolate missing pixel values plays a crucial role in obtaining high-quality polarization images. In this paper, we prop…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: Accepted in ICIP2020. Dataset and code are available at http://www.ok.sc.e.titech.ac.jp/res/PolarDem/index.html

  31. arXiv:2007.08830  [pdf, other]

    cs.CV

    Polarimetric Multi-View Inverse Rendering

    Authors: Jinyu Zhao, Yusuke Monno, Masatoshi Okutomi

    Abstract: A polarization camera has great potential for 3D reconstruction since the angle of polarization (AoP) of reflected light is related to an object's surface normal. In this paper, we propose a novel 3D reconstruction method called Polarimetric Multi-View Inverse Rendering (Polarimetric MVIR) that effectively exploits geometric, photometric, and polarimetric cues extracted from input multi-view color…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Paper accepted in ECCV 2020

  32. arXiv:2006.10383  [pdf, other]

    cs.CV

    3D Pipe Network Reconstruction Based on Structure from Motion with Incremental Conic Shape Detection and Cylindrical Constraint

    Authors: Sho Kagami, Hajime Taira, Naoyuki Miyashita, Akihiko Torii, Masatoshi Okutomi

    Abstract: Pipe inspection is a critical task for many industries and infrastructure of a city. The 3D information of a pipe can be used for revealing the deformation of the pipe surface and position of the camera during the inspection. In this paper, we propose a 3D pipe reconstruction system using sequential images captured by a monocular endoscopic camera. Our work extends a state-of-the-art incremental S…

    Submitted 3 July, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: This manuscript was accepted and presented at the 29th IEEE International Symposium on Industrial Electronics (ISIE2020)

  33. arXiv:2006.08145  [pdf, other]

    cs.CV eess.IV

    Classifying degraded images over various levels of degradation

    Authors: Kazuki Endo, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: Classification for degraded images having various levels of degradation is very important in practical applications. This paper proposes a convolutional neural network to classify degraded images by using a restoration network and an ensemble learning. The results demonstrate that the proposed network can classify degraded images over various levels of degradation well. This paper also reveals how…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted by the 27th IEEE International Conference on Image Processing (ICIP 2020)

  34. arXiv:2004.12288  [pdf, other]

    cs.CV

    Stomach 3D Reconstruction Based on Virtual Chromoendoscopic Image Generation

    Authors: Aji Resindra Widya, Yusuke Monno, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

    Abstract: Gastric endoscopy is a standard clinical process that enables medical practitioners to diagnose various lesions inside a patient's stomach. If any lesion is found, it is very important to perceive the location of the lesion relative to the global view of the stomach. Our previous research showed that this could be addressed by reconstructing the whole stomach shape from chromoendoscopic images usi…

    Submitted 26 April, 2020; originally announced April 2020.

    Comments: Accepted for main conference in EMBC 2020

  35. arXiv:2003.05093  [pdf, other]

    cs.CV

    Learning-Based Human Segmentation and Velocity Estimation Using Automatic Labeled LiDAR Sequence for Training

    Authors: Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki

    Abstract: In this paper, we propose an automatic labeled sequential data generation pipeline for human segmentation and velocity estimation with point clouds. Considering the impact of deep neural networks, state-of-the-art network architectures have been proposed for human recognition using point clouds captured by Light Detection and Ranging (LiDAR). However, one disadvantage is that legacy datasets may o…

    Submitted 10 March, 2020; originally announced March 2020.

    Comments: Please check the following URL for more information. http://www.ok.sc.e.titech.ac.jp/res/LHD/

  36. arXiv:1908.08185  [pdf, other]

    cs.CV cs.GR eess.IV

    Pro-Cam SSfM: Projector-Camera System for Structure and Spectral Reflectance from Motion

    Authors: Chunyu Li, Yusuke Monno, Hironori Hidaka, Masatoshi Okutomi

    Abstract: In this paper, we propose a novel projector-camera system for practical and low-cost acquisition of a dense object 3D model with the spectral reflectance property. In our system, we use a standard RGB camera and leverage an off-the-shelf projector as active illumination for both the 3D reconstruction and the spectral reflectance estimation. We first reconstruct the 3D points while estimating the p…

    Submitted 21 August, 2019; originally announced August 2019.

    Comments: Accepted by ICCV 2019. Project homepage: http://www.ok.sc.e.titech.ac.jp/res/PCSSfM/

  37. arXiv:1908.04598  [pdf, other]

    cs.CV

    Is This The Right Place? Geometric-Semantic Pose Verification for Indoor Visual Localization

    Authors: Hajime Taira, Ignacio Rocco, Jiri Sedlar, Masatoshi Okutomi, Josef Sivic, Tomas Pajdla, Torsten Sattler, Akihiko Torii

    Abstract: Visual localization in large and complex indoor scenes, dominated by weakly textured rooms and repeating geometric patterns, is a challenging problem with high practical relevance for applications such as Augmented Reality and robotics. To handle the ambiguities arising in this scenario, a common strategy is, first, to generate multiple estimates for the camera pose from which a given query image…

    Submitted 2 September, 2019; v1 submitted 13 August, 2019; originally announced August 2019.

  38. arXiv:1905.12988  [pdf, other]

    cs.CV eess.IV

    3D Reconstruction of Whole Stomach from Endoscope Video Using Structure-from-Motion

    Authors: Aji Resindra Widya, Yusuke Monno, Kosuke Imahori, Masatoshi Okutomi, Sho Suzuki, Takuji Gotoda, Kenji Miki

    Abstract: Gastric endoscopy is a common clinical practice that enables medical doctors to diagnose the stomach inside a body. In order to identify the location of a gastric lesion, such as early gastric cancer, within the stomach, this work addresses reconstructing the 3D shape of a whole stomach with color texture information generated from a standard monocular endoscope video. Previous works have tried to recons…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 5 pages, 4 figures, accepted in EMBC 2019

  39. arXiv:1903.05501  [pdf, other]

    cs.LG cs.AI stat.ML

    Improving Transparency of Deep Neural Inference Process

    Authors: Hiroshi Kuwajima, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: Deep learning techniques have advanced rapidly in recent years and are becoming a necessary component of widespread systems. However, the inference process of deep learning is a black box and is not well suited to safety-critical systems, which must exhibit high transparency. In this paper, to address this black-box limitation, we develop a simple analysis method which consists of 1) structural feature analysi…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: 11 pages, 14 figures, 1 table. This is a pre-print of an article accepted in "Progress in Artificial Intelligence" on 26 Feb 2019. The final authenticated version will be available online soon

  40. arXiv:1902.05341  [pdf]

    cs.CV

    Automatic Labeled LiDAR Data Generation based on Precise Human Model

    Authors: Wonjik Kim, Masayuki Tanaka, Masatoshi Okutomi, Yoko Sasaki

    Abstract: Following improvements in deep neural networks, state-of-the-art networks have been proposed for human recognition using point clouds captured by LiDAR. However, the performance of these networks strongly depends on the training data. An issue with collecting training data is labeling. Labeling by humans is necessary to obtain the ground truth label; however, labeling requires huge costs. Therefor…

    Submitted 14 February, 2019; originally announced February 2019.

    Comments: Accepted at ICRA2019

  41. arXiv:1812.09629  [pdf, other]

    cs.CV

    Estimation and Restoration of Compositional Degradation Using Convolutional Neural Networks

    Authors: Kazutaka Uchida, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: Image restoration from a single image degradation type, such as blurring, hazing, random noise, and compression has been investigated for decades. However, image degradations in practice are often a mixture of several types of degradation. Such compositional degradations complicate restoration because they require the differentiation of different degradation types and levels. In this paper, we pro…

    Submitted 22 December, 2018; originally announced December 2018.

  42. arXiv:1809.09297  [pdf, other]

    cs.CV

    Gradient-Based Low-Light Image Enhancement

    Authors: Masayuki Tanaka, Takashi Shibata, Masatoshi Okutomi

    Abstract: Low-light image enhancement is a highly demanded image processing technique, especially for consumer digital cameras and cameras on mobile phones. In this paper, a gradient-based low-light image enhancement algorithm is proposed. The key is to enhance the gradients of dark regions, because the human visual system is more sensitive to gradients than to absolute values. In addition, we involve the i…

    Submitted 24 September, 2018; originally announced September 2018.

  43. arXiv:1809.03757  [pdf, other]

    cs.CV

    Non-blind Image Restoration Based on Convolutional Neural Network

    Authors: Kazutaka Uchida, Masayuki Tanaka, Masatoshi Okutomi

    Abstract: Blind image restoration processors based on convolutional neural network (CNN) are intensively researched because of their high performance. However, they are too sensitive to the perturbation of the degradation model. They easily fail to restore the image whose degradation model is slightly different from the trained degradation model. In this paper, we propose a non-blind CNN-based image restora…

    Submitted 11 September, 2018; originally announced September 2018.

    Comments: Accepted by IEEE 7th Global Conference on Consumer Electronics, 2018

  44. arXiv:1805.03879  [pdf, other]

    cs.CV

    Structure-from-Motion using Dense CNN Features with Keypoint Relocalization

    Authors: Aji Resindra Widya, Akihiko Torii, Masatoshi Okutomi

    Abstract: Structure from Motion (SfM) using imagery that involves extreme appearance changes is yet a challenging task due to a loss of feature repeatability. Using feature correspondences obtained by matching densely extracted convolutional neural network (CNN) features significantly improves the SfM reconstruction capability. However, the reconstruction accuracy is limited by the spatial resolution of the…

    Submitted 10 May, 2018; v1 submitted 10 May, 2018; originally announced May 2018.

  45. arXiv:1803.10368  [pdf, other]

    cs.CV

    InLoc: Indoor Visual Localization with Dense Matching and View Synthesis

    Authors: Hajime Taira, Masatoshi Okutomi, Torsten Sattler, Mircea Cimpoi, Marc Pollefeys, Josef Sivic, Tomas Pajdla, Akihiko Torii

    Abstract: We seek to predict the 6 degree-of-freedom (6DoF) pose of a query photograph with respect to a large indoor 3D map. The contributions of this work are three-fold. First, we develop a new large-scale visual localization method targeted for indoor environments. The method proceeds along three steps: (i) efficient retrieval of candidate poses that ensures scalability to large-scale environments, (ii)…

    Submitted 8 April, 2018; v1 submitted 27 March, 2018; originally announced March 2018.

  46. arXiv:1707.09092  [pdf, ps, other]

    cs.CV

    Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions

    Authors: Torsten Sattler, Will Maddern, Carl Toft, Akihiko Torii, Lars Hammarstrand, Erik Stenborg, Daniel Safari, Masatoshi Okutomi, Marc Pollefeys, Josef Sivic, Fredrik Kahl, Tomas Pajdla

    Abstract: Visual localization enables autonomous vehicles to navigate in their surroundings and augmented reality applications to link virtual to real worlds. Practical visual localization approaches need to be robust to a wide variety of viewing conditions, including day-night changes, as well as weather and seasonal variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera pose estimate…

    Submitted 4 April, 2018; v1 submitted 27 July, 2017; originally announced July 2017.

    Comments: Accepted to CVPR 2018 as a spotlight