Showing 1–20 of 20 results for author: Kanamori, Y

  1. arXiv:2507.00475  [pdf, ps, other]

    cs.SD eess.AS

    AudioBERTScore: Objective Evaluation of Environmental Sound Synthesis Based on Similarity of Audio embedding Sequences

    Authors: Minoru Kishi, Ryosuke Sakai, Shinnosuke Takamichi, Yusuke Kanamori, Yuki Okamoto

    Abstract: We propose a novel objective evaluation metric for synthesized audio in text-to-audio (TTA), aiming to improve the performance of TTA models. In TTA, subjective evaluation of the synthesized sound is important, but conducting it incurs monetary costs. Therefore, objective evaluation metrics such as mel-cepstral distortion are used, but the correlation between these objective metrics and subjecti…

    Submitted 1 July, 2025; originally announced July 2025.

  2. arXiv:2506.23582  [pdf, ps, other]

    cs.SD eess.AS

    RELATE: Subjective evaluation dataset for automatic evaluation of relevance between text and audio

    Authors: Yusuke Kanamori, Yuki Okamoto, Taisei Takano, Shinnosuke Takamichi, Yuki Saito, Hiroshi Saruwatari

    Abstract: In text-to-audio (TTA) research, the relevance between input text and output audio is an important evaluation aspect. Traditionally, it has been evaluated from both subjective and objective perspectives. However, subjective evaluation is costly in terms of money and time, and the correlation between objective evaluation and subjective evaluation scores is unclear. In this study, we construct RELA…

    Submitted 30 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025

  3. arXiv:2506.23553  [pdf, ps, other]

    eess.AS cs.SD

    Human-CLAP: Human-perception-based contrastive language-audio pretraining

    Authors: Taisei Takano, Yuki Okamoto, Yusuke Kanamori, Yuki Saito, Ryotaro Nagase, Hiroshi Saruwatari

    Abstract: Contrastive language-audio pretraining (CLAP) is widely used for audio generation and recognition tasks. For example, CLAPScore, which utilizes the similarity of CLAP embeddings, has been a major metric for the evaluation of the relevance between audio and text in text-to-audio. However, the relationship between CLAPScore and human subjective evaluation scores remains unclear. We show that CL…

    Submitted 30 June, 2025; originally announced June 2025.

  4. arXiv:2505.04052  [pdf, ps, other]

    cs.GR cs.CV

    Person-In-Situ: Scene-Consistent Human Image Insertion with Occlusion-Aware Pose Control

    Authors: Shun Masuda, Yuki Endo, Yoshihiro Kanamori

    Abstract: Compositing human figures into scene images has broad applications in areas such as entertainment and advertising. However, existing methods often cannot handle occlusion of the inserted person by foreground objects and unnaturally place the person in the frontmost layer. Moreover, they offer limited control over the inserted person's pose. To address these challenges, we propose two methods. Both…

    Submitted 6 May, 2025; originally announced May 2025.

  5. arXiv:2505.04050  [pdf, ps, other]

    cs.GR cs.CV

    TerraFusion: Joint Generation of Terrain Geometry and Texture Using Latent Diffusion Models

    Authors: Kazuki Higo, Toshiki Kanai, Yuki Endo, Yoshihiro Kanamori

    Abstract: 3D terrain models are essential in fields such as video game development and film production. Since surface color often correlates with terrain geometry, capturing this relationship is crucial to achieving realism. However, most existing methods generate either a heightmap or a texture, without sufficiently accounting for the inherent correlation. In this paper, we propose a method that jointly ge…

    Submitted 6 May, 2025; originally announced May 2025.

  6. arXiv:2503.17197  [pdf, other]

    cs.CV

    FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy

    Authors: Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori

    Abstract: Recovering high-quality 3D facial textures from single-view 2D images is a challenging task, especially under constraints of limited data and complex facial details such as makeup, wrinkles, and occlusions. In this paper, we introduce FreeUV, a novel ground-truth-free UV texture recovery framework that eliminates the need for annotated or synthetic UV data. FreeUV leverages pre-trained stable diff…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: CVPR 2025. Project: https://yangxingchao.github.io/FreeUV-page/

  7. arXiv:2502.13987  [pdf, other]

    cs.GR eess.IV

    SelfAge: Personalized Facial Age Transformation Using Self-reference Images

    Authors: Taishi Ito, Yuki Endo, Yoshihiro Kanamori

    Abstract: Age transformation of facial images is a technique that edits a person's age-related appearance while preserving identity. Existing deep learning-based methods can reproduce natural age transformations; however, they only reproduce averaged transitions and fail to account for individual-specific appearances influenced by their life histories. In this paper, we propose the first diffusion model-…

    Submitted 18 February, 2025; originally announced February 2025.

  8. arXiv:2411.00356  [pdf, other]

    cs.GR cs.CV

    All-frequency Full-body Human Image Relighting

    Authors: Daichi Tajima, Yoshihiro Kanamori, Yuki Endo

    Abstract: Relighting of human images enables post-photography editing of lighting effects in portraits. The current mainstream approach uses neural networks to approximate lighting effects without explicitly accounting for the principle of physical shading. As a result, it often has difficulty representing high-frequency shadows and shading. In this paper, we propose a two-stage relighting method that can r…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: project page: https://github.com/majita06/All-frequency_Full-body_Human_Image_Relighting

  9. arXiv:2410.12242  [pdf, other]

    cs.CV cs.GR

    EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View

    Authors: Zhaorong Wang, Yoshihiro Kanamori, Yuki Endo

    Abstract: Generalizable neural radiance field (NeRF) enables neural-based digital human rendering without per-scene retraining. When combined with human prior knowledge, high-quality human rendering can be achieved even with sparse input views. However, the inference of these methods is still slow, as a large number of neural network queries on each ray are required to ensure the rendering quality. Moreover…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: project page: https://github.com/LarsPh/EG-HumanNeRF

  10. arXiv:2405.16443  [pdf, other]

    cs.CV cs.GR

    3D View Optimization for Improving Image Aesthetics

    Authors: Taichi Uchida, Yoshihiro Kanamori, Yuki Endo

    Abstract: Achieving aesthetically pleasing photography necessitates attention to multiple factors, including composition and capture conditions, which pose challenges to novices. Prior research has explored the enhancement of photo aesthetics post-capture through 2D manipulation techniques; however, these approaches offer limited search space for aesthetics. We introduce a pioneering method that employs 3D…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 10 pages

  11. arXiv:2403.17761  [pdf, other]

    cs.CV cs.GR

    Makeup Prior Models for 3D Facial Makeup Estimation and Applications

    Authors: Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori

    Abstract: In this work, we introduce two types of makeup prior models to extend existing 3D face prior models: PCA-based and StyleGAN2-based priors. The PCA-based prior model is a linear model that is easy to construct and is computationally efficient. However, it retains only low-frequency information. Conversely, the StyleGAN2-based model can represent high-frequency information with relatively higher com…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024. Project: https://yangxingchao.github.io/makeup-priors-page

  12. arXiv:2401.02804  [pdf, other]

    cs.CV cs.GR

    DiffBody: Diffusion-based Pose and Shape Editing of Human Images

    Authors: Yuta Okuyama, Yuki Endo, Yoshihiro Kanamori

    Abstract: Pose and body shape editing in a human image has received increasing attention. However, current methods often struggle with dataset biases and deteriorate realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits with identity preservation. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and cha…

    Submitted 7 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted to WACV 2024, project page: https://www.cgg.cs.tsukuba.ac.jp/~okuyama/pub/diffbody/

  13. arXiv:2305.16759  [pdf, other]

    cs.CV cs.GR

    StyleHumanCLIP: Text-guided Garment Manipulation for StyleGAN-Human

    Authors: Takato Yoshikawa, Yuki Endo, Yoshihiro Kanamori

    Abstract: This paper tackles text-guided control of StyleGAN for editing garments in full-body human images. Existing StyleGAN-based methods struggle to handle the rich diversity of garments, body shapes, and poses. We propose a framework for text-guided full-body human image synthesis via an attention-based latent code mapper, which enables more disentangled control of StyleGAN than existing mappers. O…

    Submitted 20 March, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: VISIAPP 2024, project page: https://www.cgg.cs.tsukuba.ac.jp/~yoshikawa/pub/style_human_clip/

  14. arXiv:2302.13279  [pdf, other]

    cs.CV cs.GR

    Makeup Extraction of 3D Representation via Illumination-Aware Image Decomposition

    Authors: Xingchao Yang, Takafumi Taketomi, Yoshihiro Kanamori

    Abstract: Facial makeup enriches the beauty of not only real humans but also virtual characters; therefore, makeup for 3D facial models is highly in demand in productions. However, painting directly on 3D faces and capturing real-world makeup are costly, and extracting makeup from 2D images often struggles with shading effects and occlusions. This paper presents the first method for extracting makeup for 3D…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: Eurographics 2023

  15. arXiv:2110.07272  [pdf, other]

    cs.GR

    Relighting Humans in the Wild: Monocular Full-Body Human Relighting with Domain Adaptation

    Authors: Daichi Tajima, Yoshihiro Kanamori, Yuki Endo

    Abstract: Modern supervised approaches for human image relighting rely on training data generated from 3D human models. However, such datasets are often small (e.g., Light Stage data with a small number of individuals) or limited to diffuse materials (e.g., commercial 3D scanned human models). Thus, human relighting techniques suffer from poor generalization capability and synthetic-to-real doma…

    Submitted 14 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: Accepted to Pacific Graphics 2021, project page: http://www.cgg.cs.tsukuba.ac.jp/~tajima/pub/relighting_in_the_wild/

  16. Diversifying Semantic Image Synthesis and Editing via Class- and Layer-wise VAEs

    Authors: Yuki Endo, Yoshihiro Kanamori

    Abstract: Semantic image synthesis is a process for generating photorealistic images from a single semantic mask. To enrich the diversity of multimodal image synthesis, previous methods have controlled the global appearance of an output image by learning a single latent space. However, a single latent code is often insufficient for capturing various object styles because object appearance depends on multipl…

    Submitted 29 June, 2021; v1 submitted 25 June, 2021; originally announced June 2021.

    Comments: Accepted to Pacific Graphics 2020, codes available at https://github.com/endo-yuki-t/DiversifyingSMIS

  17. arXiv:2103.14877  [pdf, other]

    cs.CV cs.GR

    Few-shot Semantic Image Synthesis Using StyleGAN Prior

    Authors: Yuki Endo, Yoshihiro Kanamori

    Abstract: This paper tackles a challenging problem of generating photorealistic images from semantic layouts in few-shot scenarios where annotated training pairs are hardly available but pixel-wise annotation is quite costly. We present a training strategy that performs pseudo labeling of semantic masks using the StyleGAN prior. Our key idea is to construct a simple mapping between the StyleGAN feature and…

    Submitted 12 May, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

    Comments: The source codes are available at https://github.com/endo-yuki-t/Fewshot-SMIS

  18. arXiv:1910.07192  [pdf, other]

    cs.GR cs.CV

    Animating Landscape: Self-Supervised Learning of Decoupled Motion and Appearance for Single-Image Video Synthesis

    Authors: Yuki Endo, Yoshihiro Kanamori, Shigeru Kuriyama

    Abstract: Automatic generation of a high-quality video from a single image remains a challenging task despite the recent advances in deep generative models. This paper proposes a method that can create a high-resolution, long-term animation using convolutional neural networks (CNNs) from a single landscape image where we mainly focus on skies and waters. Our key observation is that the motion (e.g., moving…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Published at SIGGRAPH Asia 2019 (ACM Transactions on Graphics)

  19. Relighting Humans: Occlusion-Aware Inverse Rendering for Full-Body Human Images

    Authors: Yoshihiro Kanamori, Yuki Endo

    Abstract: Relighting of human images has various applications in image synthesis. For relighting, we must infer albedo, shape, and illumination from a human portrait. Previous techniques rely on human faces for this inference, based on spherical harmonics (SH) lighting. However, because they often ignore light occlusion, inferred shapes are biased and relit images are unnaturally bright particularly at holl…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: Published at SIGGRAPH Asia 2018 (ACM Transactions on Graphics). Project page with codes, pretrained models, and human model lists is at http://kanamori.cs.tsukuba.ac.jp/projects/relighting_human/

  20. arXiv:1004.0599  [pdf]

    cs.CR

    Quantum Three-Pass protocol: Key distribution using quantum superposition states

    Authors: Yoshito Kanamori, Seong-Moo Yoo

    Abstract: This letter proposes a novel key distribution protocol with no key exchange in advance, which is as secure as the BB84 quantum key distribution protocol. Our protocol utilizes a photon in a superposition state for single-bit data transmission instead of a classical electrical/optical signal. The security of this protocol relies on the fact that an arbitrary quantum state cannot be cloned, known as th…

    Submitted 5 April, 2010; originally announced April 2010.

    Comments: 7 pages

    Journal ref: International Journal of Network Security & Its Applications 1.2 (2009) 64-70