Showing 1–25 of 25 results for author: Simo-Serra, E

  1. arXiv:2503.15783  [pdf, other]

    cs.CL cs.AI

    Grammar and Gameplay-aligned RL for Game Description Generation with LLMs

    Authors: Tsunehiko Tanaka, Edgar Simo-Serra

    Abstract: Game Description Generation (GDG) is the task of generating a game description written in a Game Description Language (GDL) from natural language text. Previous studies have explored generation methods leveraging the contextual understanding capabilities of Large Language Models (LLMs); however, accurately reproducing the game features of the game descriptions remains a challenge. In this paper, w…

    Submitted 19 March, 2025; originally announced March 2025.

  2. arXiv:2412.18421  [pdf, other]

    cs.CV

    Fashionability-Enhancing Outfit Image Editing with Conditional Diffusion Models

    Authors: Qice Qin, Yuki Hirakawa, Ryotaro Shimizu, Takuya Furusawa, Edgar Simo-Serra

    Abstract: Image generation in the fashion domain has predominantly focused on preserving body characteristics or following input prompts, but little attention has been paid to improving the inherent fashionability of the output images. This paper presents a novel diffusion model-based approach that generates fashion images with improved fashionability while maintaining control over key attributes. Key compo…

    Submitted 24 December, 2024; originally announced December 2024.

    Comments: 11 pages, 6 figures

  3. arXiv:2409.19051  [pdf, other]

    cs.CV cs.AI cs.MM

    Multimodal Markup Document Models for Graphic Design Completion

    Authors: Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

    Abstract: This paper presents multimodal markup document models (MarkupDM) that can generate both markup language and images within interleaved multimodal documents. Unlike existing vision-and-language multimodal models, our MarkupDM tackles unique challenges critical to graphic design tasks: generating partial images that contribute to the overall appearance, often involving transparency and varying sizes,…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: Project page: https://cyberagentailab.github.io/MarkupDM/

  4. Grammar-based Game Description Generation using Large Language Models

    Authors: Tsunehiko Tanaka, Edgar Simo-Serra

    Abstract: Game Description Language (GDL) provides a standardized way to express diverse games in a machine-readable format, enabling automated game simulation and evaluation. While previous research has explored game description generation using search-based methods, generating GDL descriptions from natural language remains a challenging task. This paper presents a novel framework that leverages Large Lan…

    Submitted 22 January, 2025; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at the IEEE Transactions on Games

    Journal ref: IEEE Transactions on Games, 2024 (early access)

  5. arXiv:2405.19056  [pdf, other]

    cs.GR

    Neural Scene Baking for Permutation Invariant Transparency Rendering with Real-time Global Illumination

    Authors: Ziyang Zhang, Edgar Simo-Serra

    Abstract: Neural rendering provides a fundamentally new way to render photorealistic images. Similar to traditional light-baking methods, neural rendering utilizes neural networks to bake representations of scenes, materials, and lights into latent vectors learned from path-tracing ground truths. However, existing neural rendering algorithms typically use G-buffers to provide position, normal, and texture i…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by Computational Visual Media

  6. arXiv:2402.03923  [pdf, ps, other]

    cs.LG

    Return-Aligned Decision Transformer

    Authors: Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, Tetsuro Morimura, Edgar Simo-Serra

    Abstract: Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. It is increasingly important to adjust the performance of AI agents to meet human requirements, for example, in applications like video games and education tools. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the t…

    Submitted 19 June, 2025; v1 submitted 6 February, 2024; originally announced February 2024.

  7. Visual Grounding of Whole Radiology Reports for 3D CT Images

    Authors: Akimichi Ichinose, Taro Hatsutani, Keigo Nakamura, Yoshiro Kitamura, Satoshi Iizuka, Edgar Simo-Serra, Shoji Kido, Noriyuki Tomiyama

    Abstract: Building a large-scale training dataset is an essential problem in the development of medical image recognition systems. Visual grounding techniques, which automatically associate objects in images with corresponding descriptions, can facilitate labeling of a large number of images. However, visual grounding of radiology reports for CT images remains challenging, because so many kinds of anomalies a…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 14 pages, 7 figures. Accepted at MICCAI 2023

    Journal ref: Medical Image Computing and Computer Assisted Intervention Lecture Notes in Computer Science 14224 (2023) 611-621

  8. arXiv:2312.04779  [pdf, other]

    eess.IV cs.CV cs.LG

    Image Synthesis-based Late Stage Cancer Augmentation and Semi-Supervised Segmentation for MRI Rectal Cancer Staging

    Authors: Saeko Sasuga, Akira Kudo, Yoshiro Kitamura, Satoshi Iizuka, Edgar Simo-Serra, Atsushi Hamabe, Masayuki Ishii, Ichiro Takemasa

    Abstract: Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires a high level of radiologist skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from se…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 10 pages, 7 figures, Accepted to Data Augmentation, Labeling, and Imperfections (DALI) at MICCAI 2022

  9. arXiv:2309.14759  [pdf, other]

    cs.GR cs.CV

    Diffusion-based Holistic Texture Rectification and Synthesis

    Authors: Guoqing Hao, Satoshi Iizuka, Kensho Hara, Edgar Simo-Serra, Hirokatsu Kataoka, Kazuhiro Fukui

    Abstract: We present a novel framework for rectifying occlusions and distortions in degraded texture samples from natural images. Traditional texture synthesis approaches focus on generating textures from pristine samples, which necessitate meticulous preparation by humans and are often unattainable in most natural images. These challenges stem from the frequent occlusions and distortions of texture samples…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: SIGGRAPH Asia 2023 Conference Paper

  10. arXiv:2308.10111  [pdf, other]

    cs.CV cs.GR

    Controllable Multi-domain Semantic Artwork Synthesis

    Authors: Yuantian Huang, Satoshi Iizuka, Edgar Simo-Serra, Kazuhiro Fukui

    Abstract: We present a novel framework for multi-domain synthesis of artwork from semantic layouts. One of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art synthesis. To address this problem, we propose a dataset, which we call ArtSem, that contains 40,000 images of artwork from 4 different domains with their corresponding semantic label maps. We…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: 15 pages, accepted by CVMJ, to appear

  11. arXiv:2303.18248  [pdf, other]

    cs.CV

    Towards Flexible Multi-modal Document Models

    Authors: Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

    Abstract: Creative workflows for generating graphical documents involve complex inter-related tasks, such as aligning elements, choosing appropriate fonts, or employing aesthetically harmonious colors. In this work, we attempt to build a holistic model that can jointly solve many different design tasks. Our model, which we denote by FlexDM, treats vector graphic documents as a set of multi-modal elements…

    Submitted 31 March, 2023; originally announced March 2023.

    Comments: To be published in CVPR2023 (highlight), project page: https://cyberagentailab.github.io/flex-dm

  12. arXiv:2303.08137  [pdf, other]

    cs.CV cs.GR

    LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

    Authors: Naoto Inoue, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

    Abstract: Controllable layout generation aims at synthesizing a plausible arrangement of element bounding boxes with optional constraints, such as type or position of a specific element. In this work, we try to solve a broad range of layout generation tasks in a single model that is based on discrete state-space diffusion models. Our model, named LayoutDM, naturally handles the structured layout data in the d…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: To be published in CVPR2023, project page: https://cyberagentailab.github.io/layout-dm/

  13. arXiv:2212.11541  [pdf, other]

    cs.CV cs.MM

    Generative Colorization of Structured Mobile Web Pages

    Authors: Kotaro Kikuchi, Naoto Inoue, Mayu Otani, Edgar Simo-Serra, Kota Yamaguchi

    Abstract: Color is a critical design factor for web pages, affecting important factors such as viewer emotions and the overall trust and satisfaction of a website. Effective coloring requires design knowledge and expertise, but if this process could be automated through data-driven modeling, efficient exploration and alternative workflows would be possible. However, this direction remains underexplored due…

    Submitted 23 January, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

    Comments: Accepted to WACV 2023

  14. arXiv:2212.00567  [pdf, other]

    cs.CV cs.RO

    P2Net: A Post-Processing Network for Refining Semantic Segmentation of LiDAR Point Cloud based on Consistency of Consecutive Frames

    Authors: Yutaka Momma, Weimin Wang, Edgar Simo-Serra, Satoshi Iizuka, Ryosuke Nakamura, Hiroshi Ishikawa

    Abstract: We present a lightweight post-processing method to refine the semantic segmentation results of point cloud sequences. Most existing methods usually segment frame by frame and encounter the inherent ambiguity of the problem: based on a measurement in a single frame, labels are sometimes difficult to predict even for humans. To remedy this problem, we propose to explicitly train a network to refine…

    Submitted 1 December, 2022; originally announced December 2022.

    Comments: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  15. Constrained Graphic Layout Generation via Latent Optimization

    Authors: Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi

    Abstract: It is common in graphic design that humans visually arrange various elements according to their design intent and semantics. For example, a title text almost always appears on top of other elements in a document. In this work, we generate graphic layouts that can flexibly incorporate such design semantics, either specified implicitly or explicitly by a user. We optimize using the latent space of an off…

    Submitted 2 August, 2021; originally announced August 2021.

    Comments: Accepted by ACM Multimedia 2021

  16. arXiv:2009.13798  [pdf, other]

    eess.IV cs.AI

    Automatic Segmentation, Localization, and Identification of Vertebrae in 3D CT Images Using Cascaded Convolutional Neural Networks

    Authors: Naoto Masuzawa, Yoshiro Kitamura, Keigo Nakamura, Satoshi Iizuka, Edgar Simo-Serra

    Abstract: This paper presents a method for automatic segmentation, localization, and identification of vertebrae in arbitrary 3D CT images. Many previous works do not perform the three tasks simultaneously even though requiring a priori knowledge of which part of the anatomy is visible in the 3D CT images. Our method tackles all these tasks in a single multi-stage framework without any assumptions. In the f…

    Submitted 29 September, 2020; originally announced September 2020.

  17. arXiv:2009.08692  [pdf, other]

    cs.CV cs.GR

    DeepRemaster: Temporal Source-Reference Attention Networks for Comprehensive Video Enhancement

    Authors: Satoshi Iizuka, Edgar Simo-Serra

    Abstract: The remastering of vintage film comprises a diversity of sub-tasks including super-resolution, noise removal, and contrast enhancement which aim to restore the deteriorated film medium to its original state. Additionally, due to the technical limitations of the time, most vintage film is either recorded in black and white, or has low quality colors, for which colorization becomes necessary. In…

    Submitted 18 September, 2020; originally announced September 2020.

    Comments: Accepted to SIGGRAPH Asia 2019. Project page: http://iizuka.cs.tsukuba.ac.jp/projects/remastering/

  18. arXiv:2009.08674  [pdf, other]

    cs.CV

    TopNet: Topology Preserving Metric Learning for Vessel Tree Reconstruction and Labelling

    Authors: Deepak Keshwani, Yoshiro Kitamura, Satoshi Ihara, Satoshi Iizuka, Edgar Simo-Serra

    Abstract: Reconstructing Portal Vein and Hepatic Vein trees from contrast enhanced abdominal CT scans is a prerequisite for preoperative liver surgery simulation. Existing deep learning based methods treat vascular tree reconstruction as a semantic segmentation problem. However, vessels such as hepatic and portal vein look very similar locally and need to be traced to their source for robust label assignmen…

    Submitted 18 September, 2020; originally announced September 2020.

    Comments: Accepted in MICCAI 2020

    Report number: 603

  19. arXiv:2003.11211  [pdf, other]

    cs.CV

    Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval

    Authors: Shuhei Yokoo, Kohei Ozaki, Edgar Simo-Serra, Satoshi Iizuka

    Abstract: We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature-space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which include extreme viewpoint changes such as having to retrie…

    Submitted 25 March, 2020; originally announced March 2020.

    Comments: 10 pages, 5 figures

  20. arXiv:1909.04021  [pdf, other]

    cs.CV cs.LG stat.ML

    Understanding the Effects of Pre-Training for Object Detectors via Eigenspectrum

    Authors: Yosuke Shinya, Edgar Simo-Serra, Taiji Suzuki

    Abstract: ImageNet pre-training has been regarded as essential for training accurate object detectors for a long time. Recently, it has been shown that object detectors trained from randomly initialized weights can be on par with those fine-tuned from ImageNet pre-trained models. However, the effects of pre-training and the differences caused by pre-training are still not fully understood. In this paper, we…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: ICCV 2019 Workshop on Neural Architects (Oral)

  21. arXiv:1908.11506  [pdf, other]

    eess.IV cs.CV

    Virtual Thin Slice: 3D Conditional GAN-based Super-resolution for CT Slice Interval

    Authors: Akira Kudo, Yoshiro Kitamura, Yuanzhong Li, Satoshi Iizuka, Edgar Simo-Serra

    Abstract: Many CT slice images are stored with large slice intervals to reduce storage size in clinical practice. This leads to low resolution perpendicular to the slice images (i.e., z-axis), which is insufficient for 3D visualization or image analysis. In this paper, we present a novel architecture based on conditional Generative Adversarial Networks (cGANs) with the goal of generating high resolution ima…

    Submitted 1 September, 2019; v1 submitted 29 August, 2019; originally announced August 2019.

    Comments: 10 pages, 6 figures, Accepted to Machine Learning for Medical Image Reconstruction (MLMIR) at MICCAI 2019

  22. arXiv:1703.08966  [pdf, other]

    cs.CV

    Mastering Sketching: Adversarial Augmentation for Structured Prediction

    Authors: Edgar Simo-Serra, Satoshi Iizuka, Hiroshi Ishikawa

    Abstract: We present an integral framework for training sketch simplification networks that convert challenging rough sketches into clean line drawings. Our approach augments a simplification network with a discriminator network, training both networks jointly so that the discriminator network discerns whether a line drawing is real training data or the output of the simplification network, which in turn…

    Submitted 27 March, 2017; originally announced March 2017.

    Comments: 12 pages, 14 figures

  23. arXiv:1604.08164  [pdf, other]

    cs.CV

    Understanding Human-Centric Images: From Geometry to Fashion

    Authors: Edgar Simo-Serra

    Abstract: Understanding humans from photographs has always been a fundamental goal of computer vision. In this thesis we have developed a hierarchy of tools that cover a wide range of topics with the objective of understanding humans from monocular RGB images: from low level feature point descriptors to high level fashion-aware conditional random fields models. In order to build these high level models it is…

    Submitted 13 December, 2015; originally announced April 2016.

    Comments: PhD Thesis, May 2015. BarcelonaTech. 169 pages

  24. arXiv:1509.02130  [pdf, other]

    cs.CV

    Structured Prediction with Output Embeddings for Semantic Image Annotation

    Authors: Ariadna Quattoni, Arnau Ramisa, Pranava Swaroop Madhyastha, Edgar Simo-Serra, Francesc Moreno-Noguer

    Abstract: We address the task of annotating images with semantic tuples. Solving this problem requires an algorithm which is able to deal with hundreds of classes for each argument of the tuple. In such contexts, data sparsity becomes a key challenge, as there will be a large number of classes for which only a few examples are available. We propose handling this by incorporating feature representations of b…

    Submitted 7 September, 2015; originally announced September 2015.

    Comments: 11 pages

  25. arXiv:1412.6537  [pdf, other]

    cs.CV

    Fracking Deep Convolutional Image Descriptors

    Authors: Edgar Simo-Serra, Eduard Trulls, Luis Ferraz, Iasonas Kokkinos, Francesc Moreno-Noguer

    Abstract: In this paper we propose a novel framework for learning local image descriptors in a discriminative manner. For this purpose we explore a siamese architecture of Deep Convolutional Neural Networks (CNN), with a Hinge embedding loss on the L2 distance between descriptors. Since a siamese architecture uses pairs rather than single image patches to train, there exist a large number of positive sample…

    Submitted 25 February, 2015; v1 submitted 19 December, 2014; originally announced December 2014.