Search | arXiv e-print repository

arXiv:2507.06261 [pdf, ps, other]

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Authors: Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, Luke Marris, Sam Petulla, Colin Gaffney, Asaf Aharoni, Nathan Lintz, Tiago Cardal Pais, Henrik Jacobsson, Idan Szpektor, Nan-Jiang Jiang, Krishna Haridasan, Ahmed Omran, Nikunj Saunshi, Dara Bahri, Gaurav Mishra, Eric Chu , et al. (3264 additional authors not shown)

Abstract: In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal unde… ▽ More In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving. △ Less

Submitted 11 July, 2025; v1 submitted 7 July, 2025; originally announced July 2025.

Comments: 72 pages, 17 figures

arXiv:2505.19500 [pdf, ps, other]

Objective, Absolute and Hue-aware Metrics for Intrinsic Image Decomposition on Real-World Scenes: A Proof of Concept

Authors: Shogo Sato, Masaru Tsuchida, Mariko Yamaguchi, Takuhiro Kaneko, Kazuhiko Murasaki, Taiga Yoshida, Ryuichi Tanida

Abstract: Intrinsic image decomposition (IID) is the task of separating an image into albedo and shade. In real-world scenes, it is difficult to quantitatively assess IID quality due to the unavailability of ground truth. The existing method provides the relative reflection intensities based on human-judged annotations. However, these annotations have challenges in subjectivity, relative evaluation, and hue… ▽ More Intrinsic image decomposition (IID) is the task of separating an image into albedo and shade. In real-world scenes, it is difficult to quantitatively assess IID quality due to the unavailability of ground truth. The existing method provides the relative reflection intensities based on human-judged annotations. However, these annotations have challenges in subjectivity, relative evaluation, and hue non-assessment. To address these, we propose a concept of quantitative evaluation with a calculated albedo from a hyperspectral imaging and light detection and ranging (LiDAR) intensity. Additionally, we introduce an optional albedo densification approach based on spectral similarity. This paper conducted a concept verification in a laboratory environment, and suggested the feasibility of an objective, absolute, and hue-aware assessment. (This paper is accepted by IEEE ICIP 2025. ) △ Less

Submitted 26 May, 2025; originally announced May 2025.

Comments: copyright 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2407.09511 [pdf, other]

doi 10.1145/3641517.3664388

Controlling the color appearance of objects by optimizing the illumination spectrum

Authors: Mariko Yamaguchi, Masaru Tsuchida, Takahiro Matsumoto, Tetsuro Tokunaga, Takayoshi Mochizuki

Abstract: We have developed an innovative lighting system that changes specific target colors while keeping the lights appearing naturally white. By precisely controlling the spectral power distribution (SPD) of illumination and harnessing the unique phenomenon of metamerism, our system achieves unique color variations in ways you've never seen before. Our system calculates the optimal SPDs of illumination… ▽ More We have developed an innovative lighting system that changes specific target colors while keeping the lights appearing naturally white. By precisely controlling the spectral power distribution (SPD) of illumination and harnessing the unique phenomenon of metamerism, our system achieves unique color variations in ways you've never seen before. Our system calculates the optimal SPDs of illumination for given materials to intensively induce metamerism, and then synthesizes the illumination using various colors of LEDs. We successfully demonstrated the system's implementation at Paris Fashion Week 2024. As models step onto the stage, their dresses initiate a captivating transformation. Our system altering the colors of the dresses, showcasing an impressive transition from one stunning color to another. △ Less

Submitted 17 June, 2024; originally announced July 2024.

ACM Class: J.5

Journal ref: SIGGRAPH Emerging Technologies (2024)

arXiv:2303.17750 [pdf, other]

Design by Contract Framework for Quantum Software

Authors: Masaomi Yamaguchi, Nobukazu Yoshioka

Abstract: To realize reliable quantum software, techniques to automatically ensure the quantum software's correctness have recently been investigated. However, they primarily focus on fixed quantum circuits rather than the procedure of building quantum circuits. Despite being a common approach, the correctness of building circuits using different parameters following the same procedure is not guaranteed. To… ▽ More To realize reliable quantum software, techniques to automatically ensure the quantum software's correctness have recently been investigated. However, they primarily focus on fixed quantum circuits rather than the procedure of building quantum circuits. Despite being a common approach, the correctness of building circuits using different parameters following the same procedure is not guaranteed. To this end, we propose a design-by-contract framework for quantum software. Our framework provides a python-embedded language to write assertions on the input and output states of all quantum circuits built by certain procedures. Additionally, it provides a method to write assertions about the statistical processing of measurement results to ensure the procedure's correctness for obtaining the final result. These assertions are automatically checked using a quantum computer simulator. For evaluation, we implemented our framework and wrote assertions for some widely used quantum algorithms. Consequently, we found that our framework has sufficient expressive power to verify the whole procedure of quantum software. △ Less

Submitted 30 March, 2023; originally announced March 2023.

arXiv:2108.13783 [pdf, other]

Synbit: Synthesizing Bidirectional Programs using Unidirectional Sketches

Authors: Masaomi Yamaguchi, Kazutaka Matsuda, Cristina David, Meng Wang

Abstract: We propose a technique for synthesizing bidirectional programs from the corresponding unidirectional code plus a few input/output examples. The core ideas are: (1) constructing a sketch using the given unidirectional program as a specification, and (2) filling the sketch in a modular fashion by exploiting the properties of bidirectional programs. These ideas are enabled by our choice of programmin… ▽ More We propose a technique for synthesizing bidirectional programs from the corresponding unidirectional code plus a few input/output examples. The core ideas are: (1) constructing a sketch using the given unidirectional program as a specification, and (2) filling the sketch in a modular fashion by exploiting the properties of bidirectional programs. These ideas are enabled by our choice of programming language, HOBiT, which is specifically designed to maintain the unidirectional program structure in bidirectional programming, and keep the parts that control bidirectional behavior modular. To evaluate our approach, we implemented it in a tool called Synbit and used it to generate bidirectional programs for intricate microbenchmarks, as well as for a few larger, more realistic problems. We also compared Synbit to a state-of-the-art unidirectional synthesis tool on the task of synthesizing backward computations. △ Less

Submitted 6 October, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: The is the full version of a paper "Synbit: Synthesizing Bidirectional Programs using Unidirectional Sketches" accepted by OOPSLA 2021. The accepted paper is available at https://doi.org/10.1145/3485482

arXiv:2108.12165 [pdf, ps, other]

LassoLayer: Nonlinear Feature Selection by Switching One-to-one Links

Authors: Akihito Sudo, Teng Teck Hou, Masaki Yamaguchi, Yoshinori Tone

Abstract: Along with the desire to address more complex problems, feature selection methods have gained in importance. Feature selection methods can be classified into wrapper method, filter method, and embedded method. Being a powerful embedded feature selection method, Lasso has attracted the attention of many researchers. However, as a linear approach, the applicability of Lasso has been limited. In this… ▽ More Along with the desire to address more complex problems, feature selection methods have gained in importance. Feature selection methods can be classified into wrapper method, filter method, and embedded method. Being a powerful embedded feature selection method, Lasso has attracted the attention of many researchers. However, as a linear approach, the applicability of Lasso has been limited. In this work, we propose LassoLayer that is one-to-one connected and trained by L1 optimization, which work to drop out unnecessary units for prediction. For nonlinear feature selections, we build LassoMLP: the network equipped with LassoLayer as its first layer. Because we can insert LassoLayer in any network structure, it can harness the strength of neural network suitable for tasks where feature selection is needed. We evaluate LassoMLP in feature selection with regression and classification tasks. LassoMLP receives features including considerable numbers of noisy factors that is harmful for overfitting. In the experiments using MNIST dataset, we confirm that LassoMLP outperforms the state-of-the-art method. △ Less

Submitted 27 August, 2021; originally announced August 2021.

arXiv:1907.08338 [pdf, other]

Batch Uniformization for Minimizing Maximum Anomaly Score of DNN-based Anomaly Detection in Sounds

Authors: Yuma Koizumi, Shoichiro Saito, Masataka Yamaguchi, Shin Murata, Noboru Harada

Abstract: Use of an autoencoder (AE) as a normal model is a state-of-the-art technique for unsupervised-anomaly detection in sounds (ADS). The AE is trained to minimize the sample mean of the anomaly score of normal sounds in a mini-batch. One problem with this approach is that the anomaly score of rare-normal sounds becomes higher than that of frequent-normal sounds, because the sample mean is strongly aff… ▽ More Use of an autoencoder (AE) as a normal model is a state-of-the-art technique for unsupervised-anomaly detection in sounds (ADS). The AE is trained to minimize the sample mean of the anomaly score of normal sounds in a mini-batch. One problem with this approach is that the anomaly score of rare-normal sounds becomes higher than that of frequent-normal sounds, because the sample mean is strongly affected by frequent-normal samples, resulting in preferentially decreasing the anomaly score of frequent-normal samples. To decrease anomaly scores for both frequent- and rare-normal sounds, we propose batch uniformization, a training method for unsupervised-ADS for minimizing a weighted average of the anomaly score on each sample in a mini-batch. We used the reciprocal of the probabilistic density of each sample as the weight, more intuitively, a large weight is given for rare-normal sounds. Such a weight works to give a constant anomaly score for both frequent- and rare-normal sounds. Since the probabilistic density is unknown, we estimate it by using the kernel density estimation on each training mini-batch. Verification- and objective-experiments show that the proposed batch uniformization improves the performance of unsupervised-ADS. △ Less

Submitted 18 July, 2019; originally announced July 2019.

Comments: 5 pages, to appear in IEEE WASPAA 2019

arXiv:1902.07707 [pdf, ps, other]

An Energy-efficient Time-domain Analog VLSI Neural Network Processor Based on a Pulse-width Modulation Approach

Authors: Masatoshi Yamaguchi, Goki Iwamoto, Hakaru Tamukoh, Takashi Morie

Abstract: A time-domain analog-weighted-sum calculation model based on a pulse-width modulation (PWM) approach is proposed. The proposed calculation model can be applied to any types of network structure including multi-layer feedforward networks. We also propose very large-scale integrated (VLSI) circuits to implement the proposed model. Unlike the conventional analog voltage or current mode circuits used… ▽ More A time-domain analog-weighted-sum calculation model based on a pulse-width modulation (PWM) approach is proposed. The proposed calculation model can be applied to any types of network structure including multi-layer feedforward networks. We also propose very large-scale integrated (VLSI) circuits to implement the proposed model. Unlike the conventional analog voltage or current mode circuits used in computing-in-memory circuits, our time-domain analog circuits use transient operation in charging/discharging processes to capacitors. Since the circuits can be designed without operational amplifiers, they can be operated with extremely low power consumption. However, they have to use very high-resistance devices, on the order of giga-ohms. We designed a CMOS VLSI chip to verify weighted-sum operation based on the proposed model with binary weights, which realizes the BinaryConnect model. In the chip, memory cells of static-random-access memory (SRAM) are used for synaptic connection weights. High-resistance operation was realized by using the subthreshold operation region of MOS transistors unlike the ordinary computing-in-memory circuits. The chip was designed and fabricated using a 250-nm fabrication technology. Measurement results showed that energy efficiency for the weighted-sum calculation was 300~TOPS/W (Tera-Operations Per Second per Watt), which is more than one order of magnitude higher than that in state-of-the-art digital AI processors, even though the minimum width of interconnection used in this chip was several times larger than that in such digital processors. If state-of-the-art VLSI technology is used to implement the proposed model, an energy efficiency of more than 1,000~TOPS/W will be possible. For practical applications, development of emerging analog memory devices such as ferroelectric-gate field effect transistors (FeFETs) is necessary. △ Less

Submitted 16 February, 2019; originally announced February 2019.

Comments: arXiv admin note: text overlap with arXiv:1810.06819

arXiv:1812.05796 [pdf, other]

AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Translation

Authors: Masataka Yamaguchi, Yuma Koizumi, Noboru Harada

Abstract: We tackle unsupervised anomaly detection (UAD), a problem of detecting data that significantly differ from normal data. UAD is typically solved by using density estimation. Recently, deep neural network (DNN)-based density estimators, such as Normalizing Flows, have been attracting attention. However, one of their drawbacks is the difficulty in adapting them to the change in the normal data's dist… ▽ More We tackle unsupervised anomaly detection (UAD), a problem of detecting data that significantly differ from normal data. UAD is typically solved by using density estimation. Recently, deep neural network (DNN)-based density estimators, such as Normalizing Flows, have been attracting attention. However, one of their drawbacks is the difficulty in adapting them to the change in the normal data's distribution. To address this difficulty, we propose AdaFlow, a new DNN-based density estimator that can be easily adapted to the change of the distribution. AdaFlow is a unified model of a Normalizing Flow and Adaptive Batch-Normalizations, a module that enables DNNs to adapt to new distributions. AdaFlow can be adapted to a new distribution by just conducting forward propagation once per sample; hence, it can be used on devices that have limited computational resources. We have confirmed the effectiveness of the proposed model through an anomaly detection in a sound task. We also propose a method of applying AdaFlow to the unpaired cross-domain translation problem, in which one has to train a cross-domain translation model with only unpaired samples. We have confirmed that our model can be used for the cross-domain translation problem through experiments on image datasets. △ Less

Submitted 13 March, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: Accepted to ICASSP2019

arXiv:1710.11549 [pdf, other]

Melody Generation for Pop Music via Word Representation of Musical Properties

Authors: Andrew Shin, Leopold Crestel, Hiroharu Kato, Kuniaki Saito, Katsunori Ohnishi, Masataka Yamaguchi, Masahiro Nakawaki, Yoshitaka Ushiku, Tatsuya Harada

Abstract: Automatic melody generation for pop music has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melody has turned out to be highly challenging due to a number of factors. Representation of multivariate property of notes has been one of the primary challenges. It is also difficult to remain in the permissible spectrum of musical variety, out… ▽ More Automatic melody generation for pop music has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melody has turned out to be highly challenging due to a number of factors. Representation of multivariate property of notes has been one of the primary challenges. It is also difficult to remain in the permissible spectrum of musical variety, outside of which would be perceived as a plain random play without auditory pleasantness. Observing the conventional structure of pop music poses further challenges. In this paper, we propose to represent each note and its properties as a unique `word,' thus lessening the prospect of misalignments between the properties, as well as reducing the complexity of learning. We also enforce regularization policies on the range of notes, thus encouraging the generated melody to stay close to what humans would find easy to follow. Furthermore, we generate melody conditioned on song part information, thus replicating the overall structure of a full song. Experimental results demonstrate that our model can generate auditorily pleasant songs that are more indistinguishable from human-written ones than previous models. △ Less

Submitted 31 October, 2017; originally announced October 2017.

Comments: submitted to ICLR 2018

arXiv:1704.07945 [pdf, ps, other]

Spatio-temporal Person Retrieval via Natural Language Queries

Authors: Masataka Yamaguchi, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada

Abstract: In this paper, we address the problem of spatio-temporal person retrieval from multiple videos using a natural language query, in which we output a tube (i.e., a sequence of bounding boxes) which encloses the person described by the query. For this problem, we introduce a novel dataset consisting of videos containing people annotated with bounding boxes for each second and with five natural langua… ▽ More In this paper, we address the problem of spatio-temporal person retrieval from multiple videos using a natural language query, in which we output a tube (i.e., a sequence of bounding boxes) which encloses the person described by the query. For this problem, we introduce a novel dataset consisting of videos containing people annotated with bounding boxes for each second and with five natural language descriptions. To retrieve the tube of the person described by a given natural language query, we design a model that combines methods for spatio-temporal human detection and multimodal retrieval. We conduct comprehensive experiments to compare a variety of tube and text representations and multimodal retrieval methods, and present a strong baseline in this task as well as demonstrate the efficacy of our tube representation and multimodal feature embedding technique. Finally, we demonstrate the versatility of our model by applying it to two other important tasks. △ Less

Submitted 22 August, 2017; v1 submitted 25 April, 2017; originally announced April 2017.

Comments: Accepted to ICCV2017

arXiv:1603.09046 [pdf, other]

Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning

Authors: Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada

Abstract: The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de-facto standard for image captioning task. However, since CNN features are originally designed for classification task, it is mostly concerned with the main conspicuous element of the image, and often fails to correctly convey informa… ▽ More The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de-facto standard for image captioning task. However, since CNN features are originally designed for classification task, it is mostly concerned with the main conspicuous element of the image, and often fails to correctly convey information on local, secondary elements. We propose to incorporate coding with vector of locally aggregated descriptors (VLAD) on spatial pyramid for CNN features of sub-regions in order to generate image representations that better reflect the local information of the images. Our results show that our method of compact VLAD coding can match CNN features with as little as 3% of dimensionality and, when combined with spatial pyramid, it results in image captions that more accurately take local elements into account. △ Less

Submitted 30 March, 2016; originally announced March 2016.

Comments: submitted to ECCV2016

Showing 1–12 of 12 results for author: Yamaguchi, M