-
Uniformity First: Uniformity-aware Test-time Adaptation of Vision-language Models against Image Corruption
Authors:
Kazuki Adachi,
Shin'ya Yamaguchi,
Tomoki Hamagami
Abstract:
Pre-trained vision-language models such as contrastive language-image pre-training (CLIP) have demonstrated remarkable generalizability, which has enabled a wide range of applications represented by zero-shot classification. However, vision-language models still suffer when they face datasets with large gaps from training ones, i.e., distribution shifts. We found that CLIP is especially vulnerable to sensor degradation, a type of realistic distribution shift caused by sensor conditions such as weather, light, or noise. Collecting a new dataset from a test distribution for fine-tuning is highly costly since sensor degradation occurs unexpectedly and takes a wide variety of forms. Thus, we investigate test-time adaptation (TTA) of zero-shot classification, which enables on-the-fly adaptation to the test distribution with unlabeled test data. Existing TTA methods for CLIP mainly focus on modifying image and text embeddings or predictions to address distribution shifts. Although these methods can adapt to domain shifts, such as fine-grained label spaces or different renditions in input images, they fail to adapt to distribution shifts caused by sensor degradation. We found that this is because image embeddings are "corrupted" in terms of uniformity, a measure related to the amount of information. To make models robust to sensor degradation, we propose a novel method called uniformity-aware information-balanced TTA (UnInfo). To address the corruption of image embeddings, we introduce uniformity-aware confidence maximization, information-aware loss balancing, and knowledge distillation from the exponential moving average (EMA) teacher. Through experiments, we demonstrate that our UnInfo improves accuracy under sensor degradation by retaining information in terms of uniformity.
Submitted 19 May, 2025;
originally announced May 2025.
-
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Authors:
Rei Higuchi,
Ryotaro Kawata,
Naoki Nishikawa,
Kazusato Oko,
Shoichiro Yamaguchi,
Sosuke Kobayashi,
Seiya Tokui,
Kohei Hayashi,
Daisuke Okanohara,
Taiji Suzuki
Abstract:
The ability to acquire latent semantics is one of the key properties that determines the performance of language models. One convenient approach to invoke this ability is to prepend metadata (e.g. URLs, domains, and styles) at the beginning of texts in the pre-training data, making it easier for the model to access latent semantics before observing the entire text. Previous studies have reported that this technique actually improves the performance of trained models in downstream tasks; however, this improvement has been observed only in specific downstream tasks, without consistent enhancement in average next-token prediction loss. To understand this phenomenon, we closely investigate how prepending metadata during pre-training affects model performance by examining its behavior using artificial data. Interestingly, we found that this approach produces both positive and negative effects on the downstream tasks. We demonstrate that the effectiveness of the approach depends on whether latent semantics can be inferred from the downstream task's prompt. Specifically, through investigations using data generated by probabilistic context-free grammars, we show that training with metadata helps improve the model's performance when the given context is long enough to infer the latent semantics. In contrast, the technique negatively impacts performance when the context lacks the necessary information to make an accurate posterior inference.
Submitted 24 April, 2025;
originally announced April 2025.
-
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Authors:
Shin'ya Yamaguchi,
Dewei Feng,
Sekitoshi Kanai,
Kazuki Adachi,
Daiki Chijiwa
Abstract:
Contrastive language image pre-training (CLIP) is an essential component of building modern vision-language foundation models. While CLIP demonstrates remarkable zero-shot performance on downstream tasks, the multi-modal feature spaces still suffer from a modality gap, which is a gap between image and text feature clusters and limits downstream task performance. Although existing works attempt to address the modality gap by modifying pre-training or fine-tuning, they struggle with heavy training costs with large datasets or degradations of zero-shot performance. This paper presents CLIP-Refine, a post-pre-training method for CLIP models at a phase between pre-training and fine-tuning. CLIP-Refine aims to align the feature space with 1 epoch training on small image-text datasets without zero-shot performance degradations. To this end, we introduce two techniques: random feature alignment (RaFA) and hybrid contrastive-distillation (HyCD). RaFA aligns the image and text features to follow a shared prior distribution by minimizing the distance to random reference vectors sampled from the prior. HyCD updates the model with hybrid soft labels generated by combining ground-truth image-text pair labels and outputs from the pre-trained CLIP model. This contributes to achieving both maintaining the past knowledge and learning new knowledge to align features. Our extensive experiments with multiple classification and retrieval tasks show that CLIP-Refine succeeds in mitigating the modality gap and improving the zero-shot performance.
Submitted 17 April, 2025;
originally announced April 2025.
-
Boundary Scattering and Non-invertible Symmetries in 1+1 Dimensions
Authors:
Soichiro Shimamori,
Satoshi Yamaguchi
Abstract:
Recent studies by Copetti, Córdova and Komatsu have revealed that when non-invertible symmetries are spontaneously broken, the conventional crossing relation of the S-matrix is modified by the effects of the corresponding topological quantum field theory (TQFT). In this paper, we extend these considerations to $(1+1)$-dimensional quantum field theories (QFTs) with boundaries. In the presence of a boundary, one can define not only the bulk S-matrix but also the boundary S-matrix, which is subject to a consistency condition known as the boundary crossing relation. We show that when the boundary is weakly-symmetric under the non-invertible symmetry, the conventional boundary crossing relation also receives a modification due to the TQFT effects. As a concrete example of the boundary scattering, we analyze kink scattering in the gapped theory obtained from the $Φ_{(1,3)}$-deformation of a minimal model. We explicitly construct the boundary S-matrix that satisfies the Ward-Takahashi identities associated with non-invertible symmetries.
Submitted 11 April, 2025;
originally announced April 2025.
-
$K$-theoretic computation of the Atiyah(-Patodi)-Singer index of lattice Dirac operators
Authors:
Shoto Aoki,
Hidenori Fukaya,
Mikio Furuta,
Shinichiroh Matsuo,
Tetsuya Onogi,
Satoshi Yamaguchi
Abstract:
We show that the Wilson Dirac operator in lattice gauge theory can be identified as a mathematical object in $K$-theory and that its associated spectral flow is equal to the index. In comparison to the standard lattice Dirac operator index, our formulation does not require the Ginsparg-Wilson relation and has broader applicability to systems with boundaries and to the mod-two version of the indices in general dimensions. We numerically verify that the $K$ and $KO$ group formulas reproduce the known index theorems in continuum theory. We examine the Atiyah-Singer index on a flat two-dimensional torus and, for the first time, demonstrate that the Atiyah-Patodi-Singer index with nontrivial curved boundaries, as well as the mod-two versions, can be computed on a lattice.
Submitted 15 April, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
On the fundamental group of the regular part of Fujiki's compact Kahler symplectic orbifolds
Authors:
Shun Yamaguchi
Abstract:
We calculate the fundamental group of the regular part of certain compact Kahler symplectic orbifolds constructed by Fujiki, called Fujiki's examples. We determine which one is an irreducible symplectic orbifold among Fujiki's examples. This answers a question posed by A. Perego.
Submitted 30 March, 2025;
originally announced March 2025.
-
Proximity-Induced Nodal Metal in an Extremely Underdoped CuO$_2$ Plane in Triple-Layer Cuprates
Authors:
Shin-ichiro Ideta,
Shintaro Adachi,
Takashi Noji,
Shunpei Yamaguchi,
Nae Sasaki,
Shigeyuki Ishida,
Shin-ichi Uchida,
Takenori Fujii,
Takao Watanabe,
Wen O. Wang,
Brian Moritz,
Thomas P. Devereaux,
Masashi Arita,
Chung-Yu Mou,
Teppei Yoshida,
Kiyohisa Tanaka,
Ting-Kuo Lee,
Atsushi Fujimori
Abstract:
ARPES studies have established that the high-$T_c$ cuprates with single and double CuO$_2$ layers evolve from the Mott insulator to the pseudogap state with a Fermi arc, on which the superconducting (SC) gap opens. In four- to six-layer cuprates, on the other hand, small hole Fermi pockets are formed in the innermost CuO$_2$ planes, indicating antiferromagnetism. Here, we performed ARPES studies on the triple-layer Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+δ}$ over a wide doping range, and found that, although the doping level of the inner CuO$_2$ plane was extremely low in underdoped samples, the $d$-wave SC gap was enhanced to the unprecedentedly large value of $Δ_0\sim$100 meV at the antinode and persisted well above $T_{c}$ without the appearance of a Fermi arc, indicating a robust "nodal metal". We attribute the nodal metallic behavior to the unique local environment of the clean inner CuO$_2$ plane in the triple-layer cuprates, sandwiched by two nearly optimally doped outer CuO$_2$ planes and hence subject to a strong proximity effect from both sides. In the nodal metal, quasiparticle peaks showed electron-hole symmetry, suggesting $d$-wave pairing fluctuations. Thus the proximity effect on the innermost CuO$_2$ plane is the strongest in the triple-layer cuprates, which explains why the $T_c$ reaches the maximum at the layer number of three in every multi-layer cuprate family.
Submitted 21 February, 2025;
originally announced February 2025.
-
Zero-shot Concept Bottleneck Models
Authors:
Shin'ya Yamaguchi,
Kosuke Nishida,
Daiki Chijiwa,
Yasutoshi Ida
Abstract:
Concept bottleneck models (CBMs) are inherently interpretable and intervenable neural network models, which explain their final label prediction by the intermediate prediction of high-level semantic concepts. However, they require target task training to learn input-to-concept and concept-to-label mappings, incurring target dataset collections and training resources. In this paper, we present zero-shot concept bottleneck models (Z-CBMs), which predict concepts and labels in a fully zero-shot manner without training neural networks. Z-CBMs utilize a large-scale concept bank, which is composed of millions of vocabulary items extracted from the web, to describe arbitrary input in various domains. For the input-to-concept mapping, we introduce concept retrieval, which dynamically finds input-related concepts by the cross-modal search on the concept bank. In the concept-to-label inference, we apply concept regression to select essential concepts from the retrieved concepts by sparse linear regression. Through extensive experiments, we confirm that our Z-CBMs provide interpretable and intervenable concepts without any additional training. Code will be available at https://github.com/yshinya6/zcbm.
Submitted 13 February, 2025;
originally announced February 2025.
-
The distance in Morrey spaces to $C^{\infty}_{\mathrm{comp}}$
Authors:
Satoshi Yamaguchi
Abstract:
In this paper we characterize the distance between the function $f$ and the set $C^{\infty}_{\mathrm{comp}}(\mathbb{R}^d)$ in generalized Morrey spaces $L_{p,φ}(\mathbb{R}^d)$ with variable growth condition. We also prove that the bi-dual of $\overline{C^{\infty}_{\mathrm{comp}}(\mathbb{R}^d)}^{L_{p,φ}(\mathbb{R}^d)}$ is $L_{p,φ}(\mathbb{R}^d)$. As an application of the characterization of the distance we show the boundedness of Calderón-Zygmund operators on $\overline{C^{\infty}_{\mathrm{comp}}(\mathbb{R}^d)}^{L_{p,φ}(\mathbb{R}^d)}$. By the duality we also see that these operators are bounded on its dual and bi-dual spaces.
Submitted 29 January, 2025;
originally announced January 2025.
-
Transfer Learning Strategies for Pathological Foundation Models: A Systematic Evaluation in Brain Tumor Classification
Authors:
Ken Enda,
Yoshitaka Oda,
Zen-ichi Tanei,
Kenichi Satoh,
Hiroaki Motegi,
Terasaka Shunsuke,
Shigeru Yamaguchi,
Takahiro Ogawa,
Wang Lei,
Masumi Tsuda,
Shinya Tanaka
Abstract:
Foundation models pretrained on large-scale pathology datasets have shown promising results across various diagnostic tasks. Here, we present a systematic evaluation of transfer learning strategies for brain tumor classification using these models. We analyzed 254 cases comprising five major tumor types: glioblastoma, astrocytoma, oligodendroglioma, primary central nervous system lymphoma, and metastatic tumors. Comparing state-of-the-art foundation models with conventional approaches, we found that foundation models demonstrated robust classification performance with as few as 10 patches per case, despite the traditional assumption that extensive per-case image sampling is necessary. Furthermore, our evaluation revealed that simple transfer learning strategies like linear probing were sufficient, while fine-tuning often degraded model performance. These findings suggest a paradigm shift from "training encoders on extensive pathological data" to "querying pre-trained encoders with labeled datasets", providing practical implications for implementing AI-assisted diagnosis in clinical pathology.
Submitted 7 April, 2025; v1 submitted 19 January, 2025;
originally announced January 2025.
-
Exotic massive fermionic systems with huge vacuum degeneracy at boundaries
Authors:
Hiroki Kawakami,
Satoshi Yamaguchi
Abstract:
We investigate a massive non-relativistic fermionic system exhibiting exotic features. When the mass parameter is set to zero, the system acquires the fermionic subsystem symmetry. Introducing the mass term explicitly breaks this symmetry, resulting in a trivially gapped system in the absence of boundaries. We demonstrate that with the introduction of boundaries, the system remains gapped, but it has a huge vacuum degeneracy. The residual entropy is proportional to the area of the boundary.
Submitted 14 January, 2025;
originally announced January 2025.
-
$η$ invariant of massive Wilson Dirac operator and the index
Authors:
Shoto Aoki,
Hidenori Fukaya,
Mikio Furuta,
Shinichiroh Matsuo,
Tetsuya Onogi,
Satoshi Yamaguchi
Abstract:
We revisit the lattice index theorem from the perspective of $K$-theory. The standard definition given by the overlap Dirac operator equals the $η$ invariant of the Wilson Dirac operator with a negative mass. This equality is not coincidental but reflects a mathematically profound significance known as the suspension isomorphism of $K$-groups. Specifically, we identify the Wilson Dirac operator as an element of the $K^1$ group, which is characterized by the $η$-invariant. Furthermore, we prove that, at sufficiently small but finite lattice spacings, this $η$-invariant equals the index of the continuum Dirac operator. Our results indicate that the Ginsparg-Wilson relation and the associated exact chiral symmetry are not essential for understanding gauge field topology in lattice gauge theory.
Submitted 27 January, 2025; v1 submitted 6 January, 2025;
originally announced January 2025.
-
SyncViolinist: Music-Oriented Violin Motion Generation Based on Bowing and Fingering
Authors:
Hiroki Nishizawa,
Keitaro Tanaka,
Asuka Hirata,
Shugo Yamaguchi,
Qi Feng,
Masatoshi Hamanaka,
Shigeo Morishima
Abstract:
Automatically generating realistic musical performance motion can greatly enhance digital media production, often involving collaboration between professionals and musicians. However, capturing the intricate body, hand, and finger movements required for accurate musical performances is challenging. Existing methods often fall short due to the complex mapping between audio and motion, typically requiring additional inputs like scores or MIDI data. In this work, we present SyncViolinist, a multi-stage end-to-end framework that generates synchronized violin performance motion solely from audio input. Our method overcomes the challenge of capturing both global and fine-grained performance features through two key modules: a bowing/fingering module and a motion generation module. The bowing/fingering module extracts detailed playing information from the audio, which the motion generation module uses to create precise, coordinated body motions reflecting the temporal granularity and nature of the violin performance. We demonstrate the effectiveness of SyncViolinist with significantly improved qualitative and quantitative results from unseen violin performance audio, outperforming state-of-the-art methods. Extensive subjective evaluations involving professional violinists further validate our approach. The code and dataset are available at https://github.com/Kakanat/SyncViolinist.
Submitted 11 December, 2024;
originally announced December 2024.
-
Anomalies and D-branes in the Dabholkar-Park background
Authors:
Hiroki Wada,
Satoshi Yamaguchi
Abstract:
We consider D-branes in the Dabholkar-Park (DP) background, a $9$d orientifold theory obtained by gauging symmetry in the type IIB string theory compactified on a circle. Using anomalies in the world-sheet theory, we provide physical insights into the classification of stable D-branes by relative KR-theory. The nature, such as stability, of D-branes wrapping along the compactified circle can be extracted from information about $1$d Majorana fermions on the boundary of the world-sheet. These Majorana fermions need to be introduced to consistently perform the GSO projection and the orientifold. We also construct D-brane states in the DP background. The spectrum of D-branes characterized by the relative KR-theory is correctly reproduced from the D-brane states.
Submitted 29 April, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
Tappy Plugin for Figma: Predicting Tap Success Rates of User-Interface Elements under Development for Smartphones
Authors:
Shota Yamanaka,
Hiroki Usuba,
Junichi Sato,
Naomi Sasaya,
Fumiya Yamashita,
Shuji Yamaguchi
Abstract:
Tapping buttons and hyperlinks on smartphones is a fundamental operation, but users sometimes fail to tap user-interface (UI) elements. Such mistakes degrade usability, and thus it is important for designers to configure UI elements so that users can accurately select them. To support designers in setting a UI element with an intended tap success rate, we developed a plugin for Figma, which is modern software for developing webpages and applications for smartphones, based on our previously launched web-based application, Tappy. This plugin converts the size of a UI element from pixels to mm and then computes the tap success rates based on the Dual Gaussian Distribution Model. We have made this plugin freely available to external users, so readers can install the Tappy plugin for Figma by visiting its installation page (https://www.figma.com/community/plugin/1425006564066437139/tappy) or from their desktop Figma software.
Submitted 1 November, 2024;
originally announced November 2024.
-
Test-time Adaptation for Regression by Subspace Alignment
Authors:
Kazuki Adachi,
Shin'ya Yamaguchi,
Atsutoshi Kumagai,
Tomoki Hamagami
Abstract:
This paper investigates test-time adaptation (TTA) for regression, where a regression model pre-trained in a source domain is adapted to an unknown target distribution with unlabeled target data. Although regression is one of the fundamental tasks in machine learning, most of the existing TTA methods have classification-specific designs, which assume that models output class-categorical predictions, whereas regression models typically output only single scalar values. To enable TTA for regression, we adopt a feature alignment approach, which aligns the feature distributions between the source and target domains to mitigate the domain gap. However, we found that naive feature alignment employed in existing TTA methods for classification is ineffective or even worse for regression because the features are distributed in a small subspace and many of the raw feature dimensions have little significance to the output. For an effective feature alignment in TTA for regression, we propose Significant-subspace Alignment (SSA). SSA consists of two components: subspace detection and dimension weighting. Subspace detection finds the feature subspace that is representative and significant to the output. Then, the feature alignment is performed in the subspace during TTA. Meanwhile, dimension weighting raises the importance of the dimensions of the feature subspace that have greater significance to the output. We experimentally show that SSA outperforms various baselines on real-world datasets.
Submitted 22 January, 2025; v1 submitted 4 October, 2024;
originally announced October 2024.
-
Explanation Bottleneck Models
Authors:
Shin'ya Yamaguchi,
Kosuke Nishida
Abstract:
Recent concept-based interpretable models have succeeded in providing meaningful explanations by pre-defined concept sets. However, the dependency on the pre-defined concepts restricts the application because of the limited number of concepts for explanations. This paper proposes a novel interpretable deep neural network called explanation bottleneck models (XBMs). XBMs generate a text explanation from the input without pre-defined concepts and then predict a final task prediction based on the generated explanation by leveraging pre-trained vision-language encoder-decoder models. To achieve both the target task performance and the explanation quality, we train XBMs through the target task loss with the regularization penalizing the explanation decoder via the distillation from the frozen pre-trained decoder. Our experiments, including a comparison to state-of-the-art concept bottleneck models, confirm that XBMs provide accurate and fluent natural language explanations without pre-defined concept sets. Code is available at https://github.com/yshinya6/xbm/.
Submitted 18 February, 2025; v1 submitted 26 September, 2024;
originally announced September 2024.
-
DisasterNeedFinder: Understanding the Information Needs in the 2024 Noto Earthquake (Comprehensive Explanation)
Authors:
Kota Tsubouchi,
Shuji Yamaguchi,
Keijirou Saitou,
Akihisa Soemori,
Masato Morita,
Shigeki Asou
Abstract:
We propose and demonstrate the DisasterNeedFinder (DNF) framework in order to provide appropriate information support for the Noto Peninsula Earthquake. In the event of a large-scale disaster, it is essential to accurately capture the ever-changing information needs. However, it is difficult to obtain appropriate information from the chaotic situation on the ground. Therefore, as a data-driven approach, we aim to pick up precise information needs at the site by integrally analyzing the location information of disaster victims and search information. It is difficult to make a clear estimation of information needs by just analyzing search history information in disaster areas, due to the large amount of noise and the small number of users. Therefore, we measure the magnitude of information needs not by the volume of searches but by the degree of abnormality in searches, which enables an appropriate understanding of the information needs of the disaster victims. DNF has been continuously clarifying the information needs of disaster areas since the disaster struck, and has been recognized as a new approach to support disaster areas by being featured in the major Japanese media on several occasions.
Submitted 11 September, 2024;
originally announced September 2024.
-
Evaluating Time-Series Training Dataset through Lens of Spectrum in Deep State Space Models
Authors:
Sekitoshi Kanai,
Yasutoshi Ida,
Kazuki Adachi,
Mihiro Uchida,
Tsukasa Yoshida,
Shin'ya Yamaguchi
Abstract:
This study investigates a method to evaluate time-series datasets in terms of the performance of deep neural networks (DNNs) with state space models (deep SSMs) trained on the dataset. SSMs have attracted attention as components inside DNNs to address time-series data. Since deep SSMs have powerful representation capacities, training datasets play a crucial role in solving a new task. However, the effectiveness of training datasets cannot be known until deep SSMs are actually trained on them. This can increase the cost of data collection for new tasks, as a trial-and-error process of data collection and time-consuming training are needed to achieve the necessary performance. To advance the practical use of deep SSMs, the metric of datasets to estimate the performance early in the training can be one key element. To this end, we introduce the concept of data evaluation methods used in system identification. In system identification of linear dynamical systems, the effectiveness of datasets is evaluated by using the spectrum of input signals. We introduce this concept to deep SSMs, which are nonlinear dynamical systems. We propose the K-spectral metric, which is the sum of the top-K spectra of signals inside deep SSMs, by focusing on the fact that each layer of a deep SSM can be regarded as a linear dynamical system. Our experiments show that the K-spectral metric has a large absolute value of the correlation coefficient with the performance and can be used to evaluate the quality of training datasets.
Submitted 29 August, 2024;
originally announced August 2024.
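As a rough illustration of the "sum of the top-K spectra" idea in the abstract above, the sketch below computes a power spectrum of a single 1-D signal with a naive DFT and adds up its K largest values. The normalization, the choice of signal, and the function names `power_spectrum` and `k_spectral_metric` are assumptions made for this example; the abstract does not specify how the metric is computed inside a deep SSM.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive DFT power spectrum |X_k|^2 / N of a real 1-D signal (assumed normalization)."""
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def k_spectral_metric(signal, k):
    """Sum of the k largest power-spectrum values (illustrative definition only)."""
    return sum(sorted(power_spectrum(signal), reverse=True)[:k])


# A pure cosine concentrates its power in two frequency bins.
cosine_wave = [math.cos(2 * math.pi * t / 8) for t in range(8)]
top2 = k_spectral_metric(cosine_wave, 2)
```

With this normalization, the spectrum values sum to the signal energy, so a small K already captures most of the energy of a narrow-band signal.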
-
Correlation between $T_{\mathrm{c}}$ and the Pseudogap Observed in the Optical Spectra of High $T_{\mathrm{c}}$ Superconducting Cuprates
Authors:
Setsuko Tajima,
Yuhta Itoh,
Katsuya Mizutamari,
Shigeki Miyasaka,
Masamichi Nakajima,
Nae Sasaki,
Shunpei Yamaguchi,
Kei-ichi Harada,
Takao Watanabe
Abstract:
We studied the temperature dependences of the optical spectra for optimally and underdoped Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+z}$ single crystals. Similarly to the other cuprates' cases, a gap-like conductivity suppression was observed with reducing the temperature from above $T_{\mathrm{c}}$, creating a peak in the conductivity spectrum. The conductivity peak energy was insensitive to the doping level, namely $T_{\mathrm{c}}$, which suggests that this gap is not a superconducting gap but is related to the pseudogap. Comparing the data of various mono-, double-, and triple-layer cuprates, we found a clear correlation between the optimal $T_{\mathrm{c}}$ of each material and the pseudogap-related conductivity peak energy.
Submitted 6 August, 2024;
originally announced August 2024.
-
The index of lattice Dirac operators and $K$-theory
Authors:
Shoto Aoki,
Hidenori Fukaya,
Mikio Furuta,
Shinichiroh Matsuo,
Tetsuya Onogi,
Satoshi Yamaguchi
Abstract:
We mathematically show an equality between the index of a Dirac operator on a flat continuum torus and the $η$ invariant of a lattice Dirac operator known as the Wilson Dirac operator with a negative mass when the lattice spacing is sufficiently small. Unlike the standard approach, our formulation using $K$-theory does not require modified chiral symmetry on the lattice. We prove that a one-parameter family of continuum massive Dirac operators and the corresponding Wilson Dirac operators belong to the same equivalence class of the $K^1$ group at a finite lattice spacing. Their indices, which are evaluated by the spectral flow or equivalently by the $η$ invariant at a finite mass, are proved to be equal.
Submitted 2 June, 2025; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Effects of vortex and antivortex excitations in underdoped Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+δ}$ bulk single crystals
Authors:
Takao Watanabe,
Kenta Kosugi,
Nae Sasaki,
Shunpei Yamaguchi,
Takenori Fujii,
Ken Hayama,
Itsuhiro Kakeya,
Toshimitsu Ito
Abstract:
The observation of vortex and anti-vortex effects in bulk crystals can prove the existence of phase-disordered superconductivity in the bulk. To gain insights into the mechanisms that govern the superconducting transition in copper oxide high-transition temperature ($T_c$) superconductors, this study investigated the transport properties of underdoped Bi$_2$Sr$_2$Ca$_2$Cu$_3$O$_{10+δ}$ (Bi-2223) bulk single crystals. The $I$-$V$ characteristics results and the typical tailing behavior owing to the temperature dependence of in-plane resistivity ($ρ_{ab}$) were consistent with the Kosterlitz-Thouless (KT) transition characteristics. Thus, with increasing temperature, copper oxide high-$T_c$ superconductors transitioned to their normal state owing to destruction of their phase correlations, although a finite Cooper pair density was prevalent at $T_c$. Further, magnetization measurements were performed to determine the temperature dependence of the irreversible magnetic field $B_{irr}$. Consequently, the mechanism governing the KT transition-like superconducting transition in this bulk system was elucidated. These results support the extreme strong-coupling models for high-$T_c$ superconductivity in cuprates.
Submitted 10 October, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching
Authors:
Shohei Enomoto,
Naoya Hasegawa,
Kazuki Adachi,
Taku Sasaki,
Shin'ya Yamaguchi,
Satoshi Suzuki,
Takeharu Eda
Abstract:
Deep neural networks have achieved remarkable success in a variety of computer vision applications. However, accuracy degrades when the data distribution shifts between training and testing. As a solution to this problem, Test-time Adaptation (TTA) has been well studied because of its practicality. Although TTA methods increase accuracy under distribution shift by updating the model at test time, using high-uncertainty predictions is known to degrade accuracy. Since the input image is the root of the distribution shift, we incorporate a new perspective on enhancing the input image into TTA methods to reduce the prediction's uncertainty. We hypothesize that enhancing the input image reduces the prediction's uncertainty and increases the accuracy of TTA methods. On the basis of our hypothesis, we propose a novel method: Test-time Enhancer and Classifier Adaptation (TECA). In TECA, the classification model is combined with an image enhancement model that transforms input images into recognition-friendly ones, and these models are updated by existing TTA methods. Furthermore, we found that the prediction from the enhanced image does not always have lower uncertainty than the prediction from the original image. Thus, we propose logit switching, which compares the uncertainty measure of these predictions and outputs the lower one. In our experiments, we evaluate TECA with various TTA methods and show that TECA reduces the prediction's uncertainty and increases the accuracy of TTA methods despite having no hyperparameters and little parameter overhead.
Submitted 26 March, 2024;
originally announced March 2024.
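The logit switching step described in the abstract above can be sketched as follows. This is a minimal illustration that assumes softmax entropy as the uncertainty measure (the abstract does not name one); `switch_logits` and its arguments are hypothetical names, not the authors' API.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; used here as the assumed uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def switch_logits(logits_original, logits_enhanced):
    """Return whichever logits give the lower-entropy prediction."""
    h_original = entropy(softmax(logits_original))
    h_enhanced = entropy(softmax(logits_enhanced))
    # Ties fall back to the prediction from the original image.
    return logits_enhanced if h_enhanced < h_original else logits_original
```

Because the rule only compares two scalars per sample, it adds no tunable hyperparameter, which is consistent with the abstract's claim of having none.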
-
Test-time Similarity Modification for Person Re-identification toward Temporal Distribution Shift
Authors:
Kazuki Adachi,
Shohei Enomoto,
Taku Sasaki,
Shin'ya Yamaguchi
Abstract:
Person re-identification (re-id), which aims to retrieve images of the same person in a given image from a database, is one of the most practical image recognition applications. In the real world, however, the environments that the images are taken from change over time. This causes a distribution shift between training and testing and degrades the performance of re-id. To maintain re-id performance, models should continue adapting to the test environment's temporal changes. Test-time adaptation (TTA), which aims to adapt models to the test environment with only unlabeled test data, is a promising way to handle this problem because TTA can adapt models instantly in the test environment. However, the previous TTA methods are designed for classification and cannot be directly applied to re-id. This is because the set of people's identities in the dataset differs between training and testing in re-id, whereas the set of classes is fixed in the current TTA methods designed for classification. To improve re-id performance in changing test environments, we propose TEst-time similarity Modification for Person re-identification (TEMP), a novel TTA method for re-id. TEMP is the first fully TTA method for re-id, which does not require any modification to pre-training. Inspired by TTA methods that refine the prediction uncertainty in classification, we aim to refine the uncertainty in re-id. However, the uncertainty cannot be computed in the same way as classification in re-id since it is an open-set task, which does not share person labels between training and testing. Hence, we propose re-id entropy, an alternative uncertainty measure for re-id computed based on the similarity between the feature vectors. Experiments show that the re-id entropy can measure the uncertainty on re-id and TEMP improves the performance of re-id in online settings where the distribution changes over time.
Submitted 20 March, 2024;
originally announced March 2024.
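One plausible way to build an uncertainty measure from feature similarities, in the spirit of the re-id entropy mentioned in the abstract above, is to turn a query's cosine similarities to gallery features into a distribution and take its entropy. The softmax form, the `temperature` parameter, and the function names are assumptions for this sketch; the abstract states only that the measure is computed from similarities between feature vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_entropy(query, gallery, temperature=0.1):
    """Entropy of the softmax over query-to-gallery similarities (assumed form)."""
    scaled = [cosine_similarity(query, g) / temperature for g in gallery]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A query matching one gallery item clearly is less uncertain than an ambiguous one.
confident = similarity_entropy([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
ambiguous = similarity_entropy([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]])
```

Unlike classification entropy, this quantity needs no shared label set between training and testing, which is the open-set constraint the abstract highlights.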
-
Adaptive Random Feature Regularization on Fine-tuning Deep Neural Networks
Authors:
Shin'ya Yamaguchi,
Sekitoshi Kanai,
Kazuki Adachi,
Daiki Chijiwa
Abstract:
While fine-tuning is a de facto standard method for training deep neural networks, it still suffers from overfitting when using small target datasets. Previous methods improve fine-tuning performance by maintaining knowledge of the source datasets or introducing regularization terms such as contrastive loss. However, these methods require auxiliary source information (e.g., source labels or datasets) or heavy additional computations. In this paper, we propose a simple method called adaptive random feature regularization (AdaRand). AdaRand helps the feature extractors of training models to adaptively change the distribution of feature vectors for downstream classification tasks without auxiliary source information and with reasonable computation costs. To this end, AdaRand minimizes the gap between feature vectors and random reference vectors that are sampled from class conditional Gaussian distributions. Furthermore, AdaRand dynamically updates the conditional distribution to follow the currently updated feature extractors and balance the distance between classes in feature spaces. Our experiments show that AdaRand outperforms the other fine-tuning regularization, which requires auxiliary source information and heavy computation costs.
Submitted 15 March, 2024;
originally announced March 2024.
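The regularizer described in the abstract above, which pulls feature vectors toward random references drawn from class-conditional Gaussians, might look roughly like the sketch below. The isotropic covariance, the squared-distance penalty, and the names `sample_reference` and `random_feature_penalty` are assumptions; the dynamic update of the class-conditional distributions is omitted because the abstract does not give its form.

```python
import random


def sample_reference(class_mean, sigma, rng):
    """Draw one reference vector from an isotropic Gaussian N(class_mean, sigma^2 I)."""
    return [m + sigma * rng.gauss(0.0, 1.0) for m in class_mean]


def random_feature_penalty(features, labels, class_means, sigma=1.0, seed=0):
    """Mean squared distance between each feature and a sampled reference of its class."""
    rng = random.Random(seed)
    total = 0.0
    for feature, label in zip(features, labels):
        reference = sample_reference(class_means[label], sigma, rng)
        total += sum((f - r) ** 2 for f, r in zip(feature, reference))
    return total / len(features)


# With sigma = 0 the references collapse onto the class means.
penalty = random_feature_penalty(
    features=[[1.0, 0.0], [0.0, 2.0]],
    labels=[0, 1],
    class_means={0: [1.0, 0.0], 1: [0.0, 0.0]},
    sigma=0.0,
)
```

In a training loop this penalty would be added to the classification loss with some weight, which is again a choice this sketch leaves open.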
-
On the Limitation of Diffusion Models for Synthesizing Training Datasets
Authors:
Shin'ya Yamaguchi,
Takuma Fukuda
Abstract:
Synthetic samples from diffusion models are promising for training discriminative models as replications of real training datasets. However, we found that the synthetic datasets degrade classification performance relative to real datasets even when using state-of-the-art diffusion models. This means that modern diffusion models do not perfectly represent the data distribution for the purpose of replicating datasets for training discriminative tasks. This paper investigates the gap between synthetic and real samples by analyzing the synthetic samples reconstructed from real samples through the diffusion and reverse process. By varying the time step at which the reverse process starts in the reconstruction, we can control the trade-off between the information in the original real data and the information added by diffusion models. Through assessing the reconstructed samples and trained models, we found that the synthetic data become concentrated in modes of the training data distribution as the reverse step increases, and thus they have difficulty covering the outer edges of the distribution. Our findings imply that modern diffusion models are insufficient to replicate the training data distribution perfectly, and there is room for improvement in generative modeling for the replication of training datasets.
Submitted 21 November, 2023;
originally announced November 2023.
-
TRAIL Team Description Paper for RoboCup@Home 2023
Authors:
Chikaha Tsuji,
Dai Komukai,
Mimo Shirasaka,
Hikaru Wada,
Tsunekazu Omija,
Aoi Horo,
Daiki Furuta,
Saki Yamaguchi,
So Ikoma,
Soshi Tsunashima,
Masato Kobayashi,
Koki Ishimoto,
Yuya Ikeda,
Tatsuya Matsushima,
Yusuke Iwasawa,
Yutaka Matsuo
Abstract:
Our team, TRAIL, consists of AI/ML laboratory members from The University of Tokyo. We leverage our extensive research experience in state-of-the-art machine learning to build general-purpose in-home service robots. We previously participated in two competitions using Human Support Robot (HSR): RoboCup@Home Japan Open 2020 (DSPL) and World Robot Summit 2020, equivalent to RoboCup World Tournament. Throughout the competitions, we showed that a data-driven approach is effective for performing in-home tasks. Aiming for further development of building a versatile and fast-adaptable system, in RoboCup @Home 2023, we unify three technologies that have recently been evaluated as components in the fields of deep learning and robot learning into a real household robot system. In addition, to stimulate research all over the RoboCup@Home community, we build a platform that manages data collected from each site belonging to the community around the world, taking advantage of the characteristics of the community.
Submitted 5 October, 2023;
originally announced October 2023.
-
Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples
Authors:
Shin'ya Yamaguchi
Abstract:
Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.
Submitted 27 September, 2023;
originally announced September 2023.
-
Adversarial Finetuning with Latent Representation Constraint to Mitigate Accuracy-Robustness Tradeoff
Authors:
Satoshi Suzuki,
Shin'ya Yamaguchi,
Shoichiro Takeda,
Sekitoshi Kanai,
Naoki Makishima,
Atsushi Ando,
Ryo Masumura
Abstract:
This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.
Submitted 31 August, 2023;
originally announced August 2023.
-
Regularizing Neural Networks with Meta-Learning Generative Models
Authors:
Shin'ya Yamaguchi,
Daiki Chijiwa,
Sekitoshi Kanai,
Atsutoshi Kumagai,
Hisashi Kashima
Abstract:
This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is because the synthetic samples do not perfectly represent class categories in real data and uniform sampling does not necessarily provide useful samples for tasks. In this paper, we present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid the degradation of generative data augmentation, MGR utilizes synthetic samples in the regularization term for feature extractors instead of in the loss function, e.g., cross-entropy. These synthetic samples are dynamically determined to minimize the validation losses through meta-learning. We observed that MGR can avoid the performance degradation of naïve generative data augmentation and boost the baselines. Experiments on six datasets showed that MGR is effective particularly when datasets are smaller and stably outperforms baselines.
Submitted 23 October, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics
Authors:
Kenta Oono,
Nontawat Charoenphakdee,
Kotatsu Bito,
Zhengyan Gao,
Hideyoshi Igata,
Masashi Yoshikawa,
Yoshiaki Ota,
Hiroki Okui,
Kei Akita,
Shoichiro Yamaguchi,
Yohei Sugawara,
Shin-ichi Maeda,
Kunihiko Miyoshi,
Yuki Saito,
Koki Tsuda,
Hiroshi Maruyama,
Kohei Hayashi
Abstract:
Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental well-being. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose the Virtual Human Generative Model (VHGM), a novel deep generative model capable of estimating over 2,000 attributes across healthcare, lifestyle, and personality domains. VHGM leverages masked modeling to learn the joint distribution of attributes, enabling accurate predictions and robust conditional sampling. We deploy VHGM as a web service, showcasing its versatility in driving diverse healthcare applications aimed at improving user well-being. Through extensive quantitative evaluations, we demonstrate VHGM's superior performance in attribute imputation and high-quality sample generation compared to existing baselines. This work highlights VHGM as a powerful tool for personalized healthcare and lifestyle management, with broad implications for data-driven health solutions.
Submitted 29 January, 2025; v1 submitted 18 June, 2023;
originally announced June 2023.
-
Toward Data Efficient Model Merging between Different Datasets without Performance Degradation
Authors:
Masanori Yamada,
Tomoya Yamashita,
Shin'ya Yamaguchi,
Daiki Chijiwa
Abstract:
Model merging is attracting attention as a novel method for creating a new model by combining the weights of different trained models. While previous studies reported that model merging works well for models trained on a single dataset with different random seeds, model merging between different datasets remains unsolved. In this paper, we attempt to reveal the difficulty in merging such models trained on different datasets and alleviate it. Our empirical analyses show that, in contrast to the single-dataset scenarios, dataset information needs to be accessed to achieve high accuracy when merging models trained on different datasets. However, the requirement to use full datasets not only incurs significant computational costs but also becomes a major limitation when integrating models developed and shared by others. To address this, we demonstrate that dataset reduction techniques, such as coreset selection and dataset condensation, effectively reduce the data requirement for model merging. In our experiments with SPLIT-CIFAR10 model merging, the accuracy is significantly improved by 31% when using the full dataset and 24% when using the sampled subset compared with not using the dataset.
Submitted 20 September, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
-
A Survey on Multi-Resident Activity Recognition in Smart Environments
Authors:
Farhad MortezaPour Shiri,
Thinagaran Perumal,
Norwati Mustapha,
Raihani Mohamed,
Mohd Anuaruddin Bin Ahmadon,
Shingo Yamaguchi
Abstract:
Human activity recognition (HAR) is a rapidly growing field that utilizes smart devices, sensors, and algorithms to automatically classify and identify the actions of individuals within a given environment. These systems have a wide range of applications, including assisting with caring tasks, increasing security, and improving energy efficiency. However, there are several challenges that must be addressed in order to effectively utilize HAR systems in multi-resident environments. One of the key challenges is accurately associating sensor observations with the identities of the individuals involved, which can be particularly difficult when residents are engaging in complex and collaborative activities. This paper provides a brief overview of the design and implementation of HAR systems, including a summary of the various data collection devices and approaches used for human activity identification. It also reviews previous research on the use of these systems in multi-resident environments and offers conclusions on the current state of the art in the field.
Submitted 20 April, 2025; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Non-invertible symmetries and boundaries in four dimensions
Authors:
Masataka Koide,
Yuta Nagoya,
Satoshi Yamaguchi
Abstract:
We study quantum field theories with boundary by utilizing non-invertible symmetries. We consider three kinds of boundary conditions of the four-dimensional $\mathbb{Z}_2$ lattice gauge theory at the critical point as examples. The weights of the elements on the boundary are determined so that these boundary conditions are related by the Kramers-Wannier-Wegner (KWW) duality. In other words, it is required that the KWW duality defects ending on the boundary are topological. Moreover, we obtain the ratios of the hemisphere partition functions with these boundary conditions; this result constrains the boundary renormalization group flows under the assumption of the conjectured g-theorem in four dimensions.
Submitted 25 September, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Phase structure of linear quiver gauge theories from anomaly matching
Authors:
Okuto Morikawa,
Hiroki Wada,
Satoshi Yamaguchi
Abstract:
We consider the phase structure of the linear quiver gauge theory, using the 't Hooft anomaly matching condition. This theory is characterized by the length $K$ of the quiver diagram. When $K$ is even, the symmetry and its anomaly are the same as those of massless QCD. Therefore, one can expect that the spontaneous symmetry breaking similar to the chiral symmetry breaking occurs. On the other hand, when $K$ is odd, the anomaly matching condition is satisfied by the massless composite fermions. We also consider the thermal partition function under the twisted boundary conditions. When $K$ is even, from the anomaly at finite temperature, we estimate the relation between the critical temperatures associated with the confinement/deconfinement and the breaking of the global symmetry. Finally we discuss the anomaly matching at finite temperature when $K$ is odd.
Submitted 6 February, 2023; v1 submitted 22 November, 2022;
originally announced November 2022.
-
Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization
Authors:
Tran Van Sang,
Mhd Irvan,
Rie Shigetomi Yamaguchi,
Toshiyuki Nakata
Abstract:
Natural Gradient Descent (NGD) is a second-order neural network training method that preconditions gradient descent with the inverse of the Fisher Information Matrix (FIM). Although NGD provides an efficient preconditioner, it is impractical because inverting the FIM is computationally expensive. This paper proposes a new NGD variant named Component-Wise Natural Gradient Descent (CW-NGD). CW-NGD is composed of two steps. Similar to several existing works, the first step treats the FIM as a block-diagonal matrix whose diagonal blocks correspond to the FIM of each layer's weights. In the second step, unique to CW-NGD, we analyze the layer's structure and further decompose the layer's FIM into smaller segments whose derivatives are approximately independent. As a result, individual layers' FIMs are approximated in a block-diagonal form that is trivial to invert. The segment decomposition strategy varies by layer structure. Specifically, we analyze dense and convolutional layers and design decomposition strategies for each. In an experiment training a network containing these two types of layers, we empirically show that CW-NGD requires fewer iterations to converge than state-of-the-art first-order and second-order methods.
Submitted 11 October, 2022;
originally announced October 2022.
-
SL$(2,\mathbb{Z})$ action on quantum field theories with U(1) subsystem symmetry
Authors:
Satoshi Yamaguchi
Abstract:
We consider the SL$(2,\mathbb{Z})$ action on quantum field theories with U(1) subsystem symmetry in five dimensions. This is an analog of the SL$(2,\mathbb{Z})$ action considered in arXiv:hep-th/0307041. We show that the exotic level 1 BF theory and the exotic level 1 Chern-Simons theories are trivial and almost trivial, respectively. Using this fact, we define the S operation and the T operation. These operations form the SL$(2,\mathbb{Z})$ group up to a possible invertible phase that is unity within the space-times treated in this paper. We also demonstrate the SL$(2,\mathbb{Z})$ action on the $\varphi$ theory as an example.
Submitted 13 January, 2023; v1 submitted 28 August, 2022;
originally announced August 2022.
-
One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training
Authors:
Sekitoshi Kanai,
Shin'ya Yamaguchi,
Masanori Yamada,
Hiroshi Takahashi,
Kentaro Ohno,
Yasutoshi Ida
Abstract:
This paper proposes a new loss function for adversarial training. Since adversarial training has difficulties, e.g., the need for high model capacity, focusing on important data points by weighting the cross-entropy loss has attracted much attention. However, such importance-aware methods are vulnerable to sophisticated attacks, e.g., Auto-Attack. This paper experimentally reveals that the cause of their vulnerability is their small margins between the logit for the true label and the logits for the other labels. Since neural networks classify data points based on the logits, logit margins should be large enough that attacks cannot flip the largest logit. Compared with cross-entropy loss, importance-aware methods do not increase the logit margins of important samples but decrease those of less-important samples. To increase the logit margins of important samples, we propose switching one-vs-the-rest loss (SOVR), which switches from cross-entropy to one-vs-the-rest loss for important samples that have small logit margins. We prove that, for a simple problem, one-vs-the-rest loss increases logit margins twice as much as the weighted cross-entropy loss. We experimentally confirm that SOVR increases the logit margins of important samples, unlike existing methods, and achieves better robustness against Auto-Attack than importance-aware methods.
Submitted 26 April, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Meta-ticket: Finding optimal subnetworks for few-shot learning within randomly initialized neural networks
Authors:
Daiki Chijiwa,
Shin'ya Yamaguchi,
Atsutoshi Kumagai,
Yasutoshi Ida
Abstract:
Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs with only a few data points. The main challenge is how to avoid overfitting, since over-parameterized NNs can easily overfit to such small datasets. Previous work (e.g. MAML by Finn et al. 2017) tackles this challenge by meta-learning, which learns how to learn from a few data points by using various tasks. On the other hand, one conventional approach to avoid overfitting is restricting hypothesis spaces by endowing NNs with sparse structures, like convolution layers in computer vision. However, although such manually designed sparse structures are sample-efficient for sufficiently large datasets, they are still insufficient for few-shot learning. The following questions then naturally arise: (1) Can we find sparse structures effective for few-shot learning by meta-learning? (2) What benefits will it bring in terms of meta-generalization? In this work, we propose a novel meta-learning approach, called Meta-ticket, to find optimal sparse subnetworks for few-shot learning within randomly initialized NNs. We empirically validated that Meta-ticket successfully discovers sparse subnetworks that can learn specialized features for each given task. Due to this task-wise adaptation ability, Meta-ticket achieves superior meta-generalization compared to MAML-based methods, especially with large NNs. The code is available at: https://github.com/dchiji-ntt/meta-ticket
Submitted 9 February, 2023; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation to Multiple Image Corruptions
Authors:
Kazuki Adachi,
Shin'ya Yamaguchi,
Atsutoshi Kumagai
Abstract:
Real-world image recognition systems often face corrupted input images, which cause distribution shifts and degrade the performance of models. These systems often use a single prediction model in a central server and process images sent from various environments, such as cameras distributed in cities or cars. Such single models face images corrupted in heterogeneous ways at test time. Thus, they need to adapt instantly to the multiple corruptions during testing rather than being re-trained at a high cost. Test-time adaptation (TTA), which aims to adapt models without accessing the training dataset, is one of the settings that can address this problem. Existing TTA methods indeed work well on a single corruption. However, their adaptation ability is limited when multiple types of corruption occur, which is the more realistic case. We hypothesize that this is because the distribution shift is more complicated and adaptation becomes more difficult under multiple corruptions. In fact, we experimentally found that a larger distribution gap remains after TTA. To address the distribution gap during testing, we propose a novel TTA method named Covariance-Aware Feature alignment (CAFe). We empirically show that CAFe outperforms prior TTA methods on image corruptions, including multiple types of corruptions.
Submitted 29 June, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Transfer Learning with Pre-trained Conditional Generative Models
Authors:
Shin'ya Yamaguchi,
Sekitoshi Kanai,
Atsutoshi Kumagai,
Daiki Chijiwa,
Hisashi Kashima
Abstract:
Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods always assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, holding these assumptions is difficult in practical settings because the target task rarely has the same labels as the source task, the source dataset access is restricted due to storage costs and privacy, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with an artificial dataset synthesized by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation.
Submitted 20 February, 2025; v1 submitted 27 April, 2022;
originally announced April 2022.
-
Learning-based Collision-free Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs
Authors:
Tomoki Ando,
Hiroto Iino,
Hiroki Mori,
Ryota Torishima,
Kuniyuki Takahashi,
Shoichiro Yamaguchi,
Daisuke Okanohara,
Tetsuya Ogata
Abstract:
We propose a new method for collision-free planning using Conditional Generative Adversarial Networks (cGANs) to transform between the robot's joint space and a latent space that captures only collision-free areas of the joint space, conditioned by an obstacle map. Generating multiple plausible trajectories is convenient in applications such as the manipulation of a robot arm because it enables the selection of trajectories that avoid collision with the robot or the surrounding environment. In the proposed method, various trajectories that avoid obstacles can be generated by connecting the start and goal states with arbitrary line segments in this generated latent space. Our method provides this collision-free latent space, after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated and an actual UR5e 6-DoF robotic arm. We confirmed that different trajectories could be generated depending on the optimization conditions.
Submitted 5 February, 2023; v1 submitted 26 February, 2022;
originally announced February 2022.
-
Collision-free Path Planning in the Latent Space through cGANs
Authors:
Tomoki Ando,
Hiroki Mori,
Ryota Torishima,
Kuniyuki Takahashi,
Shoichiro Yamaguchi,
Daisuke Okanohara,
Tetsuya Ogata
Abstract:
We present a new method for collision-free path planning with cGANs that maps the latent space to only the collision-free areas of the robot joint space. Our method simply provides this collision-free latent space, after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated two-link robot arm.
Submitted 15 February, 2022;
originally announced February 2022.
-
Learning Robust Convolutional Neural Networks with Relevant Feature Focusing via Explanations
Authors:
Kazuki Adachi,
Shin'ya Yamaguchi
Abstract:
Existing image recognition techniques based on convolutional neural networks (CNNs) generally assume that the training and test datasets are sampled i.i.d. from the same distribution. However, this assumption is easily broken in the real world because of the distribution shift that occurs when the co-occurrence relations between objects and backgrounds in input images change. Under this type of distribution shift, CNNs learn to focus on features that are not task-relevant, such as backgrounds in the training data, and their accuracy on the test data degrades. To tackle this problem, we propose relevant feature focusing (ReFF). ReFF detects task-relevant features and regularizes CNNs via explanation outputs (e.g., Grad-CAM). Since ReFF is composed of post-hoc explanation modules, it can be easily applied to off-the-shelf CNNs. Furthermore, ReFF requires no additional inference cost at test time because it is only used for regularization during training. We demonstrate that CNNs trained with ReFF focus on features relevant to the target task and that ReFF improves the test-time accuracy.
Submitted 23 March, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
On elementary moves of singular Legendrian knots
Authors:
Sara Yamaguchi,
Noboru Ito
Abstract:
We have two results. First, we give 96 generating sets of oriented singular Reidemeister moves; this answers a question by Bataineh, Khaled, Elhamdadi, and Hajij, who gave a generating set of oriented singular Reidemeister moves using their computation. Second, in the theory of plane curves and Legendrian knots introduced by V. I. Arnold, we determine, diagrammatically and explicitly, which moves survive as moves of Legendrian singular knots and fronts.
Submitted 11 January, 2022;
originally announced January 2022.
-
A physicist-friendly reformulation of the mod-two Atiyah-Patodi-Singer index
Authors:
Hidenori Fukaya,
Mikio Furuta,
Yoshiyuki Matsuki,
Shinichiroh Matsuo,
Tetsuya Onogi,
Satoshi Yamaguchi,
Mayuko Yamashita
Abstract:
A gauge anomaly in 4 dimensions can be viewed as a current inflow into an extra dimension, where the total phase of the fermion partition function is given in a gauge-invariant way by the Atiyah-Patodi-Singer (APS) eta-invariant of a 5-dimensional Dirac operator. However, this formalism requires a non-local boundary condition, under which the physical roles of the edge/bulk modes are unclear and it is not obvious how the causality of the theory is maintained. In this work, we consider a special case where the Dirac operator is in a real representation and its eta-invariant becomes the mod-two type APS index. We propose a physicist-friendly reformulation of the mod-two index using the domain-wall fermion formalism, which naturally describes how the global anomaly is canceled between edge and bulk.
Submitted 22 November, 2021;
originally announced November 2021.
-
Gapless edge modes in (4+1)-dimensional topologically massive tensor gauge theory and anomaly inflow for subsystem symmetry
Authors:
Satoshi Yamaguchi
Abstract:
We consider (4+1)-dimensional topologically massive tensor gauge theory. This theory is an analog of the (2+1)-dimensional topologically massive Maxwell-Chern-Simons theory. If the space has a boundary, we find that a (3+1)-dimensional gapless theory appears at the boundary. This gapless theory is a chiral version of the (3+1)-dimensional $\varphi$ theory. This gapless theory is protected by the anomaly inflow mechanism of subsystem symmetry. We also consider the corner of our topologically massive tensor gauge theory, and find that an infinite number of (1+1)-dimensional chiral bosons appear at the corner.
Submitted 14 February, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
Gauge Kinetic Mixing and Dark Topological Defects
Authors:
Takashi Hiramatsu,
Masahiro Ibe,
Motoo Suzuki,
Soma Yamaguchi
Abstract:
We discuss how the topological defects in the dark sector affect the Standard Model sector when the dark photon has a kinetic mixing with the QED photon. In particular, we consider the dark photon appearing in the successive gauge symmetry breaking, $\mathrm{SU}(2)\to \mathrm{U}(1) \to \mathbb{Z}_2$, where the remaining $\mathbb{Z}_2$ is the center of $\mathrm{SU(2)}$. In this model, the monopole is trapped into the cosmic strings and forms the so-called bead solution. As we will discuss, the dark cosmic string induces the QED magnetic flux inside the dark string through the kinetic mixing. The dark monopole, on the other hand, does not induce the QED magnetic flux in the U(1) symmetric phase, even in the presence of the kinetic mixing. Finally, we show that the dark bead solution induces a spherically symmetric QED magnetic flux through the kinetic mixing. The induced flux looks like the QED magnetic monopole viewed from a distance, although QED satisfies the Bianchi identity everywhere, which we call a pseudo magnetic monopole.
Submitted 26 September, 2021;
originally announced September 2021.
-
Non-invertible topological defects in 4-dimensional $\mathbb{Z}_2$ pure lattice gauge theory
Authors:
Masataka Koide,
Yuta Nagoya,
Satoshi Yamaguchi
Abstract:
We explore topological defects in the 4-dimensional pure $\mathbb{Z}_2$ lattice gauge theory. This theory has 1-form $\mathbb{Z}_{2}$ center symmetry as well as the Kramers-Wannier-Wegner (KWW) duality. We construct the KWW duality topological defects in a way similar to the construction by Aasen, Mong, and Fendley (arXiv:1601.07185) for the 2-dimensional Ising model. These duality defects turn out to be non-invertible. We also construct the 1-form $\mathbb{Z}_{2}$ symmetry defects as well as the junctions among KWW duality defects and 1-form $\mathbb{Z}_{2}$ center symmetry defects. The crossing relations among these defects are derived. The expectation values of some configurations of these topological defects are calculated by using these crossing relations.
Submitted 1 November, 2021; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Pruning Randomly Initialized Neural Networks with Iterative Randomization
Authors:
Daiki Chijiwa,
Shin'ya Yamaguchi,
Yasutoshi Ida,
Kenji Umakoshi,
Tomohiro Inoue
Abstract:
Pruning the weights of randomly initialized neural networks plays an important role in the context of the lottery ticket hypothesis. Ramanujan et al. (2020) empirically showed that pruning the weights alone, without optimizing the weight values, can achieve remarkable performance. However, to achieve the same level of performance as weight optimization, the pruning approach requires more parameters in the networks before pruning and thus more memory space. To overcome this parameter inefficiency, we introduce a novel framework to prune randomly initialized neural networks while iteratively randomizing weight values (IteRand). Theoretically, we prove an approximation theorem in our framework, which indicates that the randomizing operations are provably effective in reducing the required number of parameters. We also empirically demonstrate the parameter efficiency in multiple experiments on CIFAR-10 and ImageNet. The code is available at: https://github.com/dchiji-ntt/iterand
Submitted 5 April, 2022; v1 submitted 17 June, 2021;
originally announced June 2021.