Search | arXiv e-print repository

Exploring the Relationship Between Diversity and Quality in Ad Text Generation

Authors: Yoichi Aoki, Soichiro Murakami, Ukyo Honda, Akihiko Kato

Abstract: In natural language generation for advertising, creating diverse and engaging ad texts is crucial for capturing a broad audience and avoiding advertising fatigue. Regardless of the importance of diversity, the impact of the diversity-enhancing methods in ad text generation -- mainly tested on tasks such as summarization and machine translation -- has not been thoroughly explored. Ad text generation significantly differs from these tasks owing to the text style and requirements. This research explores the relationship between diversity and ad quality in ad text generation by considering multiple factors, such as diversity-enhancing methods, their hyperparameters, input-output formats, and the models.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2505.10998 [pdf, other]

Investigating the axial structure of the nucleon based on large-volume lattice QCD at the physical point

Authors: Ryutaro Tsuji, Yasumichi Aoki, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Shoichi Sasaki, Kohei Sato, Eigo Shintani, Hiromasa Watanabe, Takeshi Yamazaki

Abstract: We present a short summary for the calculations of the nucleon $\textit{isovector}$ form factors, which are relevant to improving the accuracy of the current neutrino oscillation experiments. The calculations are carried out with two of three sets of the $2+1$ flavor lattice QCD configurations generated at the physical point in large spatial volumes by the PACS Collaboration. The two gauge configurations are generated with the six stout-smeared $O(a)$ improved Wilson quark action and Iwasaki gauge action at the lattice spacing of $0.09$ fm and $0.06$ fm. We summarize the results for three form factors as well as the nucleon axial-vector ($g_A$), induced pseudoscalar ($g_P^*$) and pion-nucleon ($g_{πNN}$) couplings. Although our couplings agree with the experimental data, a firm conclusion should be drawn only after a continuum limit extrapolation is taken. We investigate the partially conserved axial-vector current (PCAC) relation in the context of the nucleon correlation functions. The low-energy relations arising from the PCAC relation can be used to verify whether the lattice QCD data correctly reproduce the physics in the continuum within the statistical accuracy. It is demonstrated that our $\textit{new analysis}$ reduces the systematic uncertainty for the induced pseudoscalar and pseudoscalar form factors to a greater extent than the $\textit{traditional analysis}$, and the results offer a theoretical insight into the pion-pole dominance model. Finally, we examine the applicable $q^2$ region for the low-energy relations.

Submitted 16 May, 2025; originally announced May 2025.

Comments: 63 pages, 33 figures

Report number: UTHEP-803, UTCCS-P-166, HUPD-2504, KEK-TH-2716

arXiv:2505.08658 [pdf, ps, other]

A novel view of the flavor-singlet spectrum from multi-flavor QCD on the lattice

Authors: Yasumichi Aoki, Tatsumi Aoyama, Ed Bennett, Toshihide Maskawa, Kohtaroh Miura, Hiroshi Ohki, Enrico Rinaldi, Akihiro Shibata, Koichi Yamawaki, Takeshi Yamazaki

Abstract: SU(3) gauge theories with increasing number of light fermions are the templates of strongly interacting sectors and studying their low-energy dynamics and spectrum is important, both for understanding the strong dynamics of QCD itself, but also for discovering viable UV completions of beyond the Standard Model physics. In order to contrast many-flavors strongly interacting theories with QCD on a quantitative footing, we use Lattice Field Theory simulations. We focus on the study of the flavor-singlet spectrum in the scalar and pseudoscalar channels: this is an interesting probe of the dynamics of the strongly interacting sector, as reminded by the QCD case with the $f_0(500)$ ($σ$) and $η^\prime$ mesons. The hierarchy of the spectrum of a strongly coupled new gauge sector of the Standard Model defines the potential reach of future colliders for new physics discoveries. In addition to a novel hierarchy with light scalars, introducing many light flavors at fixed number of colors can influence the dynamics of the lightest flavor-singlet pseudoscalar. We present a complete lattice study of both these flavor-singlet channels on high-statistics gauge ensembles generated by the LatKMI collaboration with 4, 8, and 12 copies of light mass-degenerate fermions. We also present other hadron masses on the lightest ensemble for $N_f=8$ generated by the LatKMI collaboration and discuss the chiral extrapolation of the spectrum in this particular theory. We contrast the results to $N_f=4$ simulations and previous results of $N_f=12$ simulations.

Submitted 13 May, 2025; originally announced May 2025.

Comments: 60 pages, 44 figures

Report number: RIKEN-iTHEMS-Report-22, UTHEP-804, UTCCS-P-167, KEK-TH-2721

arXiv:2505.06854 [pdf, other]

Method for high-precision determination of the nucleon axial structure using lattice QCD: Removing $πN$-state contamination

Authors: Yasumichi Aoki, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Shoichi Sasaki, Kohei Sato, Eigo Shintani, Ryutaro Tsuji, Hiromasa Watanabe, Takeshi Yamazaki

Abstract: We performed a precise calculation of physical quantities related to the axial structure of the nucleon using 2+1 flavor lattice QCD gauge configuration (PACS10 configuration) generated at the physical point with lattice volume larger than $(10\;{\mathrm{fm}})^4$ by the PACS Collaboration. The nucleon matrix element of the axial-vector current has two types of the nucleon form factors, the axial-vector ($F_A$) form factor and the induced pseudoscalar ($F_P$) form factor. Recently lattice QCD simulations have succeeded in reproducing the experimental value of the axial-vector coupling, $g_A$, determined from $F_A(q^2)$ at zero momentum transfer $q^2=0$, at a percent level of statistical accuracy. However, the $F_P$ form factor so far has not reproduced the experimental values well due to strong $πN$ excited-state contamination. Therefore, we proposed a simple subtraction method for removing the so-called leading $πN$-state contribution, and succeeded in reproducing the values obtained by two experiments of muon capture on the proton and pion electro-production for $F_P(q^2)$. The novel approach can also be applied to the nucleon pseudoscalar matrix element to determine the pseudoscalar ($G_P$) form factor with the help of the axial Ward-Takahashi identity. The resulting form factors, $F_P(q^2)$ and $G_P(q^2)$, are in good agreement with the prediction of the pion-pole dominance model. In the new analysis, the induced pseudoscalar coupling $g_P^\ast$ and the pion-nucleon coupling $g_{πNN}$ can be evaluated with a few percent accuracy including systematic uncertainties using existing data calculated at two lattice spacings.

Submitted 11 May, 2025; originally announced May 2025.

Comments: 45 pages, 31 figures

Report number: UTHEP-802, UTCCS-P-165, HUPD-2503, KEK-TH-2715
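
For readers unfamiliar with the low-energy relations mentioned in the two nucleon axial-structure abstracts above, the following block writes out the relations usually meant by these terms. It is a sketch under assumed conventions (Euclidean $q^2 \ge 0$, a degenerate light-quark mass $m$, and a dimensionful $F_P$) that the abstracts themselves do not state, so normalizations in the papers may differ.

```latex
% Assumed conventions (not given in the abstracts): Euclidean q^2 >= 0,
% nucleon mass M_N, pion mass m_\pi, degenerate light-quark mass m.
\begin{align}
  % Generalized Goldberger-Treiman relation implied by the axial
  % Ward-Takahashi identity
  2 M_N F_A(q^2) - q^2 F_P(q^2) &= 2 m\, G_P(q^2), \\
  % Pion-pole dominance (PPD) form of the induced pseudoscalar form factor
  F_P^{\mathrm{PPD}}(q^2) &= \frac{2 M_N F_A(q^2)}{q^2 + m_\pi^2}, \\
  % Inserting the PPD form into the first relation gives the PPD form of G_P
  2 m\, G_P^{\mathrm{PPD}}(q^2) &= \frac{2 M_N F_A(q^2)\, m_\pi^2}{q^2 + m_\pi^2}.
\end{align}
```

The third line follows from the first two by direct substitution, which is why agreement of both $F_P$ and $G_P$ with pion-pole dominance is a consistency check rather than two independent statements.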

arXiv:2504.18447 [pdf, other]

Iterative Event-based Motion Segmentation by Variational Contrast Maximization

Authors: Ryo Yamaki, Shintaro Shiba, Guillermo Gallego, Yoshimitsu Aoki

Abstract: Event cameras provide rich signals that are suitable for motion estimation since they respond to changes in the scene. As any visual changes in the scene produce event data, it is paramount to classify the data into different motions (i.e., motion segmentation), which is useful for various tasks such as object detection and visual servoing. We propose an iterative motion segmentation method, by classifying events into background (e.g., dominant motion hypothesis) and foreground (independent motion residuals), thus extending the Contrast Maximization framework. Experimental results demonstrate that the proposed method successfully classifies event clusters both for public and self-recorded datasets, producing sharp, motion-compensated edge-like images. The proposed method achieves state-of-the-art accuracy on moving object detection benchmarks with an improvement of over 30%, and demonstrates its possibility of applying to more complex and noisy real-world scenes. We hope this work broadens the sensitivity of Contrast Maximization with respect to both motion parameters and input events, thus contributing to theoretical advancements in event-based motion segmentation estimation. https://github.com/aoki-media-lab/event_based_segmentation_vcmax

Submitted 25 April, 2025; originally announced April 2025.

Comments: 11 pages, 9 figures, 3 tables, CVPR Workshop 2025
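
The Contrast Maximization framework named in the abstract above can be illustrated with a toy one-dimensional example. This is a minimal sketch of the general idea only (warp events along a candidate motion, accumulate them into an image, score the image by its variance), not the authors' variational or iterative method; the function names, the synthetic events, and the grid search over candidate velocities are all assumptions made for illustration.

```python
from statistics import pvariance


def warped_histogram(events, velocity, width):
    """Warp each event (x, t) back to t = 0 along `velocity` and count hits per pixel."""
    counts = [0] * width
    for x, t in events:
        xw = int(round(x - velocity * t))
        if 0 <= xw < width:  # ignore events warped outside the 1D image
            counts[xw] += 1
    return counts


def contrast(events, velocity, width):
    """Contrast score: variance of the image of warped events."""
    return pvariance(warped_histogram(events, velocity, width))


def best_velocity(events, candidates, width):
    """Grid search: return the candidate velocity with the highest contrast."""
    return max(candidates, key=lambda v: contrast(events, v, width))


# Hypothetical synthetic data: three point features moving at 3 px/s,
# each firing 20 events at 0.1 s intervals.
TRUE_VELOCITY = 3.0
events = [
    (p + TRUE_VELOCITY * (k * 0.1), k * 0.1)
    for p in (10, 25, 40)
    for k in range(20)
]
candidates = [float(v) for v in range(-4, 5)]

estimate = best_velocity(events, candidates, width=64)
print(estimate)
```

At the correct velocity every event of a feature lands in the same pixel, so the histogram is sharpest and its variance is largest; wrong candidates smear the events over several pixels. Practical implementations typically replace the grid search with gradient-based optimization over richer motion models.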

arXiv:2504.16983 [pdf, other]

Baryon Number Violation: From Nuclear Matrix Elements to BSM Physics

Authors: Leah J. Broussard, Andreas Crivellin, Martin Hoferichter, Sergey Syritsyn, Yasumichi Aoki, Joshua L. Barrow, Arnau Bas i Beneito, Zurab Berezhiani, Nicola Fulvio Calabria, Svjetlana Fajfer, Susan Gardner, Julian Heeck, Cailian Jiang, Luca Naterop, Alexey A. Petrov, Robert Shrock, Adrian Thompson, Ubirajara van Kolck, Michael L. Wagman, Linyan Wan, John Womersley, Jun-Sik Yoo

Abstract: Processes that violate baryon number, most notably proton decay and $n\bar n$ transitions, are promising probes of physics beyond the Standard Model (BSM) needed to understand the lack of antimatter in the Universe. To interpret current and forthcoming experimental limits, theory input from nuclear matrix elements to UV complete models enters. Thus, an interplay of experiment, effective field theory, lattice QCD, and BSM model building is required to develop strategies to accurately extract information from current and future data and maximize the impact and sensitivity of next-generation experiments. Here, we briefly summarize the main results and discussions from the workshop "INT-25-91W: Baryon Number Violation: From Nuclear Matrix Elements to BSM Physics," held at the Institute for Nuclear Theory, University of Washington, Seattle, WA, January 13-17, 2025.

Submitted 23 April, 2025; originally announced April 2025.

Comments: 42 pages, summary of INT workshop "INT-25-91W: Baryon Number Violation: From Nuclear Matrix Elements to BSM Physics"

Report number: INT-PUB-25-009, LA-UR-25-23551, PSI-PR-25-07, YITP-SB-2025-08, ZU-TH 23/25

arXiv:2504.04428 [pdf, other]

Formula-Supervised Sound Event Detection: Pre-Training Without Real Data

Authors: Yuto Shibata, Keitaro Tanaka, Yoshiaki Bando, Keisuke Imoto, Hirokatsu Kataoka, Yoshimitsu Aoki

Abstract: In this paper, we propose a novel formula-driven supervised learning (FDSL) framework for pre-training an environmental sound analysis model by leveraging acoustic signals parametrically synthesized through formula-driven methods. Specifically, we outline detailed procedures and evaluate their effectiveness for sound event detection (SED). The SED task, which involves estimating the types and timings of sound events, is particularly challenged by the difficulty of acquiring a sufficient quantity of accurately labeled training data. Moreover, it is well known that manually annotated labels often contain noises and are significantly influenced by the subjective judgment of annotators. To address these challenges, we propose a novel pre-training method that utilizes a synthetic dataset, Formula-SED, where acoustic data are generated solely based on mathematical formulas. The proposed method enables large-scale pre-training by using the synthesis parameters applied at each time step as ground truth labels, thereby eliminating label noise and bias. We demonstrate that large-scale pre-training with Formula-SED significantly enhances model accuracy and accelerates training, as evidenced by our results in the DESED dataset used for DCASE2023 Challenge Task 4. The project page is at https://yutoshibata07.github.io/Formula-SED/

Submitted 6 April, 2025; originally announced April 2025.

Comments: Accepted by ICASSP 2025

arXiv:2504.04029 [pdf, other]

Simultaneous Motion And Noise Estimation with Event Cameras

Authors: Shintaro Shiba, Yoshimitsu Aoki, Guillermo Gallego

Abstract: Event cameras are emerging vision sensors, whose noise is challenging to characterize. Existing denoising methods for event cameras consider other tasks such as motion estimation separately (i.e., sequentially after denoising). However, motion is an intrinsic part of event data, since scene edges cannot be sensed without motion. This work proposes, to the best of our knowledge, the first method that simultaneously estimates motion in its various forms (e.g., ego-motion, optical flow) and noise. The method is flexible, as it allows replacing the 1-step motion estimation of the widely-used Contrast Maximization framework with any other motion estimator, such as deep neural networks. The experiments show that the proposed method achieves state-of-the-art results on the E-MLB denoising benchmark and competitive results on the DND21 benchmark, while showing its efficacy on motion estimation and intensity reconstruction tasks. We believe that the proposed approach contributes to strengthening the theory of event-data denoising, as well as impacting practical denoising use-cases, as we release the code upon acceptance. Project page: https://github.com/tub-rip/ESMD

Submitted 4 April, 2025; originally announced April 2025.

Comments: 13 pages, 13 figures, 6 tables

arXiv:2503.23519 [pdf, other]

BoundMatch: Boundary detection applied to semi-supervised segmentation for urban-driving scenes

Authors: Haruya Ishikawa, Yoshimitsu Aoki

Abstract: Semi-supervised semantic segmentation (SS-SS) aims to mitigate the heavy annotation burden of dense pixel labeling by leveraging abundant unlabeled images alongside a small labeled set. While current teacher-student consistency regularization methods achieve strong results, they often overlook a critical challenge: the precise delineation of object boundaries. In this paper, we propose BoundMatch, a novel multi-task SS-SS framework that explicitly integrates semantic boundary detection into the consistency regularization pipeline. Our core mechanism, Boundary Consistency Regularized Multi-Task Learning (BCRM), enforces prediction agreement between teacher and student models on both segmentation masks and detailed semantic boundaries. To further enhance performance and sharpen contours, BoundMatch incorporates two lightweight fusion modules: Boundary-Semantic Fusion (BSF) injects learned boundary cues into the segmentation decoder, while Spatial Gradient Fusion (SGF) refines boundary predictions using mask gradients, leading to higher-quality boundary pseudo-labels. This framework is built upon SAMTH, a strong teacher-student baseline featuring a Harmonious Batch Normalization (HBN) update strategy for improved stability. Extensive experiments on diverse datasets including Cityscapes, BDD100K, SYNTHIA, ADE20K, and Pascal VOC show that BoundMatch achieves competitive performance against state-of-the-art methods while significantly improving boundary-specific evaluation metrics. We also demonstrate its effectiveness in realistic large-scale unlabeled data scenarios and on lightweight architectures designed for mobile deployment.

Submitted 30 March, 2025; originally announced March 2025.

Comments: 15 pages, 7 figures

arXiv:2503.21190 [pdf, other]

Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering

Authors: Erika Mori, Yue Qiu, Hirokatsu Kataoka, Yoshimitsu Aoki

Abstract: Social intelligence, the ability to interpret emotions, intentions, and behaviors, is essential for effective communication and adaptive responses. As robots and AI systems become more prevalent in caregiving, healthcare, and education, the demand for AI that can interact naturally with humans grows. However, creating AI that seamlessly integrates multiple modalities, such as vision and speech, remains a challenge. Current video-based methods for social intelligence rely on general video recognition or emotion recognition techniques and often overlook the unique elements inherent in human interactions. To address this, we propose the Looped Video Debating (LVD) framework, which integrates Large Language Models (LLMs) with visual information, such as facial expressions and body movements, to enhance the transparency and reliability of question-answering tasks involving human interaction videos. Our results on the Social-IQ 2.0 benchmark show that LVD achieves state-of-the-art performance without fine-tuning. Furthermore, supplementary human annotations on existing datasets provide insights into the model's accuracy, guiding future improvements in AI-driven social intelligence.

Submitted 27 March, 2025; originally announced March 2025.

arXiv:2503.20180 [pdf, other]

In-orbit Performance of the Soft X-ray Imaging Telescope Xtend aboard XRISM

Authors: Hiroyuki Uchida, Koji Mori, Hiroshi Tomida, Hiroshi Nakajima, Hirofumi Noda, Takaaki Tanaka, Hiroshi Murakami, Hiromasa Suzuki, Shogo Benjamin Kobayashi, Tomokage Yoneyama, Kouichi Hagino, Kumiko Kawabata Nobukawa, Hideki Uchiyama, Masayoshi Nobukawa, Hironori Matsumoto, Takeshi Go Tsuru, Makoto Yamauchi, Isamu Hatsukade, Hirokazu Odaka, Takayoshi Kohmura, Kazutaka Yamaoka, Tessei Yoshida, Yoshiaki Kanemaru, Daiki Ishi, Tadayasu Dotani, et al. (40 additional authors not shown)

Abstract: We present a summary of the in-orbit performance of the soft X-ray imaging telescope Xtend onboard the XRISM mission, based on in-flight observation data, including first-light celestial objects, calibration sources, and results from the cross-calibration campaign with other currently-operating X-ray observatories. XRISM/Xtend has a large field of view of $38.5'\times38.5'$, covering an energy range of 0.4--13 keV, as demonstrated by the first-light observation of the galaxy cluster Abell 2319. It also features an energy resolution of 170--180 eV at 6 keV, which meets the mission requirement and enables to resolve He-like and H-like Fe K$α$ lines. Throughout the observation during the performance verification phase, we confirm that two issues identified in SXI onboard the previous Hitomi mission -- light leakage and crosstalk events -- are addressed and suppressed in the case of Xtend. A joint cross-calibration observation of the bright quasar 3C273 results in an effective area measured to be $\sim420$ cm$^{2}[email protected] keV and $\sim310$ cm$^{2}[email protected] keV, which matches values obtained in ground tests. We also continuously monitor the health of Xtend by analyzing overclocking data, calibration source spectra, and day-Earth observations: the readout noise is stable and low, and contamination is negligible even one year after launch. A low background level compared to other major X-ray instruments onboard satellites, combined with the largest grasp ($Ω_{\rm eff}\sim60$ ${\rm cm^2~degree^2}$) of Xtend, will not only support Resolve analysis, but also enable significant scientific results on its own. This includes near future follow-up observations and transient searches in the context of time-domain and multi-messenger astrophysics.

Submitted 25 March, 2025; originally announced March 2025.

Comments: 16 pages, 20 figures, 2 tables, accepted for publication in the PASJ XRISM special issue

arXiv:2503.06760 [pdf, ps, other]

New CCD Driving Technique to Suppress Anomalous Charge Intrusion from Outside the Imaging Area for Soft X-ray Imager of Xtend onboard XRISM

Authors: Hirofumi Noda, Mio Aoyagi, Koji Mori, Hiroshi Tomida, Hiroshi Nakajima, Takaaki Tanaka, Hiromasa Suzuki, Hiroshi Murakami, Hiroyuki Uchida, Takeshi G. Tsuru, Keitaro Miyazaki, Kohei Kusunoki, Yoshiaki Kanemaru, Yuma Aoki, Kumiko Nobukawa, Masayoshi Nobukawa, Kohei Shima, Marina Yoshimoto, Kazunori Asakura, Hironori Matsumoto, Tomokage Yoneyama, Shogo B. Kobayashi, Kouichi Hagino, Hideki Uchiyama, Kiyoshi Hayashida

Abstract: The Soft X-ray Imager (SXI) is an X-ray CCD camera of the Xtend system onboard the X-Ray Imaging and Spectroscopy Mission (XRISM), which was successfully launched on September 7, 2023 (JST). During ground cooling tests of the CCDs in 2020/2021, using the flight-model detector housing, electronic boards, and a mechanical cooler, we encountered an unexpected issue. Anomalous charges appeared outside the imaging area of the CCDs and intruded into the imaging area, causing pulse heights to stick to the maximum value over a wide region. Although this issue has not occurred in subsequent tests or in orbit so far, it could seriously affect the imaging and spectroscopic performance of the SXI if it were to happen in the future. Through experiments with non-flight-model detector components, we successfully reproduced the issue and identified that the anomalous charges intrude via the potential structure created by the charge injection electrode at the top of the imaging area. To prevent anomalous charge intrusion and maintain imaging and spectroscopic performance that satisfies the requirements, even if this issue occurs in orbit, we developed a new CCD driving technique. This technique is different from the normal operation in terms of potential structure and its changes during imaging and charge injection. In this paper, we report an overview of the anomalous charge issue, the related potential structures, the development of the new CCD driving technique to prevent the issue, the imaging and spectroscopic performance of the new technique, and the results of experiments to investigate the cause of anomalous charges.

Submitted 9 March, 2025; originally announced March 2025.

Comments: 13 pages, 8 figures, Accepted for publication in JATIS XRISM special issue

arXiv:2503.00389 [pdf, other]

BGM2Pose: Active 3D Human Pose Estimation with Non-Stationary Sounds

Authors: Yuto Shibata, Yusuke Oumi, Go Irie, Akisato Kimura, Yoshimitsu Aoki, Mariko Isogawa

Abstract: We propose BGM2Pose, a non-invasive 3D human pose estimation method using arbitrary music (e.g., background music) as active sensing signals. Unlike existing approaches that significantly limit practicality by employing intrusive chirp signals within the audible range, our method utilizes natural music that causes minimal discomfort to humans. Estimating human poses from standard music presents significant challenges. In contrast to sound sources specifically designed for measurement, regular music varies in both volume and pitch. These dynamic changes in signals caused by music are inevitably mixed with alterations in the sound field resulting from human motion, making it hard to extract reliable cues for pose estimation. To address these challenges, BGM2Pose introduces a Contrastive Pose Extraction Module that employs contrastive learning and hard negative sampling to eliminate musical components from the recorded data, isolating the pose information. Additionally, we propose a Frequency-wise Attention Module that enables the model to focus on subtle acoustic variations attributable to human movement by dynamically computing attention across frequency bands. Experiments suggest that our method outperforms the existing methods, demonstrating substantial potential for real-world applications. Our datasets and code will be made publicly available.

Submitted 1 March, 2025; originally announced March 2025.

arXiv:2502.08303 [pdf, ps, other]

Lattice gauge ensembles and data management

Authors: Yasumichi Aoki, Ed Bennett, Ryan Bignell, Kadir Utku Can, Takumi Doi, Steven Gottlieb, Rajan Gupta, Georg von Hippel, Issaku Kanamori, Andrey Kotov, Giannis Koutsou, Agostino Patella, Giovanni Pederiva, Christian Schmidt, Takeshi Yamazaki, Yi-Bo Yang

Abstract: We summarize the status of lattice QCD ensemble generation efforts and their data management characteristics. Namely, these proceedings combine the contributions to a dedicated parallel session during the 41st International Symposium on Lattice Field Theory (Lattice 2024), during which representatives of 16 lattice QCD collaborations provided details on their simulation program, with focus on plans for publication, data management, and storage requirements. The parallel session was organized by the International Lattice Data Grid (ILDG), following an open call to the lattice QCD community for participation in the session.

Submitted 12 February, 2025; originally announced February 2025.

Comments: 20 pages. Proceedings summarizing the contributions to the "Lattice Data" session, held during the 41st International Symposium on Lattice Field theory (LATTICE2024), July 28th - August 3rd, 2024, Liverpool, UK

arXiv:2502.08030 [pdf, ps, other]

Soft X-ray Imager of the Xtend system onboard XRISM

Authors: Hirofumi Noda, Koji Mori, Hiroshi Tomida, Hiroshi Nakajima, Takaaki Tanaka, Hiroshi Murakami, Hiroyuki Uchida, Hiromasa Suzuki, Shogo Benjamin Kobayashi, Tomokage Yoneyama, Kouichi Hagino, Kumiko Nobukawa, Hideki Uchiyama, Masayoshi Nobukawa, Hironori Matsumoto, Takeshi Go Tsuru, Makoto Yamauchi, Isamu Hatsukade, Hirokazu Odaka, Takayoshi Kohmura, Kazutaka Yamaoka, Tessei Yoshida, Yoshiaki Kanemaru, Junko Hiraga, Tadayasu Dotani, et al. (35 additional authors not shown)

Abstract: The Soft X-ray Imager (SXI) is the X-ray charge-coupled device (CCD) camera for the soft X-ray imaging telescope Xtend installed on the X-ray Imaging and Spectroscopy Mission (XRISM), which was adopted as a recovery mission for the Hitomi X-ray satellite and was successfully launched on 2023 September 7 (JST). In order to maximize the science output of XRISM, we set the requirements for Xtend and find that the CCD set employed in the Hitomi/SXI or similar, i.e., a $2 \times 2$ array of back-illuminated CCDs with a $200~μ$m-thick depletion layer, would be practically best among available choices, when used in combination with the X-ray mirror assembly. We design the XRISM/SXI, based on the Hitomi/SXI, to have a wide field of view of $38' \times 38'$ in the $0.4-13$ keV energy range. We incorporated several significant improvements from the Hitomi/SXI into the CCD chip design to enhance the optical-light blocking capability and to increase the cosmic-ray tolerance, reducing the degradation of charge-transfer efficiency in orbit. By the time of the launch of XRISM, the imaging and spectroscopic capabilities of the SXI have been extensively studied in on-ground experiments with the full flight-model configuration or equivalent setups and confirmed to meet the requirements. The optical blocking capability, the cooling and temperature control performance, and the transmissivity and quantum efficiency to incident X-rays of the CCDs are also all confirmed to meet the requirements. Thus, we successfully complete the pre-flight development of the SXI for XRISM.

Submitted 11 February, 2025; originally announced February 2025.

Comments: 14 pages, 11 figures, 3 tables, Accepted for publication in PASJ XRISM special issue

arXiv:2501.15494 [pdf, other]

Three flavor QCD phase transition with Möbius domain wall fermions

Authors: Yu Zhang, Yasumichi Aoki, Shoji Hashimoto, Issaku Kanamori, Takashi Kaneko, Yoshifumi Nakamura

Abstract: We present an updated study of the $N_f=3$ QCD phase transition using Möbius domain wall fermions. Simulations were performed on $N_t=12$ lattices with aspect ratios ranging from 2 to 4 for various quark masses, at a lattice spacing of $a=0.1361(20)$ fm, corresponding to a temperature of 121(2) MeV. To clarify the nature of the phase transition, a large-volume lattice, $48^3 \times 12\times 16$, was added to analyze the volume dependence of disconnected chiral susceptibility. By examining the chiral condensate, disconnected chiral susceptibility, and Binder cumulant, and incorporating results from $24^3 \times 12 \times 16$ and $36^3 \times 12 \times 16$ lattices reported in earlier studies, we observe that the transition is consistent with a crossover at a quark mass of approximately $m_f^{\mathrm{\overline {MS}}}(2\, \mathrm{GeV}) \sim 4$ MeV at this temperature. Furthermore, we discuss the effects of residual chiral symmetry breaking on the chiral condensate and disconnected chiral susceptibility for different sizes in the 5th direction.

Submitted 8 February, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

Comments: The abstract and conclusions have been revised to align with the consensus of all authors, addressing concerns about clarity and tone

arXiv:2501.13490 [pdf, other]

A proposal for removing $πN$-state contamination from the nucleon induced pseudoscalar form factor in lattice QCD

Authors: Shoichi Sasaki, Yasumichi Aoki, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Kohei Sato, Eigo Shintani, Ryutaro Tsuji, Hiromasa Watanabe, Takeshi Yamazaki

Abstract: In the PACS10 project, the PACS collaboration has generated three sets of the PACS10 gauge configurations at the physical point with lattice volume larger than $(10\;{\rm fm})^4$ and three different lattice spacings. The isovector nucleon form factors had been already calculated by using two sets of the PACS10 gauge configurations. In our strategy, the smearing parameters of the nucleon interpolation operator were highly optimized to eliminate as much as possible the contribution of excited states in the nucleon two-point function. This strategy was quite successful in calculations of the electric ($G_E$), magnetic ($G_M$) and axial-vector ($F_A$) form factors, while the induced pseudoscalar ($F_P$) and pseudoscalar ($G_P$) form factors remained strongly affected by residual contamination of $πN$-state contribution. In this work, we propose a simple method to remove the $πN$-state contamination from the $F_P$ form factor, and then evaluate the induced pseudoscalar charge $g_P^\ast$ and the pion-nucleon coupling $g_{πNN}$ from existing data in a new analysis. Applying this method to the $G_P$ form factor is also considered with a help of the axial Ward-Takahashi identity.

Submitted 21 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

Comments: 10 pages, 11 figures, v2: typos corrected, Proceedings of the 41st International Symposium on Lattice Field Theory (Lattice 2024), July 28th - August 3rd, 2024, University of Liverpool, UK

Report number: KEK-TH-2683

arXiv:2501.13429 [pdf, other]

Proton decay matrix elements on PACS configurations

Authors: Ryutaro Tsuji, Yasumichi Aoki, Yoshinobu Kuramashi, Eigo Shintani

Abstract: We report the preliminary results of lattice computation for the proton decay matrix elements in $N_f=2+1$ physical point with Wilson-clover fermion. We perform it on the PACS configurations of $64^4$ lattice volume with lattice spacing $a=0.085$ fm, and carefully estimate the systematic uncertainties, especially for the excited state contamination and associated error of the renormalization constant with Regularization Independent (RI, Rome-Southampton) scheme. Our preliminary results of the twelve relevant transition modes in proton decay matrix element and comparison with other lattice results are presented.

Submitted 23 January, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

Comments: 9 pages, 9 figures, Proceedings of the 41st International Symposium on Lattice Field Theory (Lattice 2024), July 28th - August 3rd, 2024, University of Liverpool, UK

Report number: KEK-TH-2682

arXiv:2501.12675 [pdf, other]

Symmetry of screening masses of mesons in two-flavor lattice QCD at high temperatures

Authors: Yasumichi Aoki, Hidenori Fukaya, Shoji Hashimoto, Issaku Kanamori, Yoshifumi Nakamura, Christian Rohrhofer, Kei Suzuki, David Ward

Abstract: We investigate spatial two-point correlation functions of mesonic operators in two-flavor lattice QCD at high temperatures. The simulated temperatures over the range $T \in [147, 330]$ MeV, where the critical temperature is estimated around 165 MeV. To ensure a good control of the chiral symmetry we employ the Möbius domain-wall fermion action for two degenerate flavors of quarks. With a lattice cut off $a^{-1}\sim 2.6$ GeV, the residual mass is reduced to 0.14 MeV. With the energy spectrum obtained from the screening mass at incremental values of the temperature range, we examine the $SU(2)_L\times SU(2)_R$ chiral symmetry, the anomalous axial $U(1)$ as well as an enhanced symmetry which exchanges the spin degrees of freedom. We also study how the data approaches the perturbative prediction given by twice the Matsubara frequency of free quarks.

Submitted 28 January, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

Comments: 25 pages + 2 page appendix, 13 figures, v2 added author name to arXiv metadata, v3 added citations, additional footnotes, and reorganized appendix plots

arXiv:2501.09278 [pdf, other]

Text-guided Synthetic Geometric Augmentation for Zero-shot 3D Understanding

Authors: Kohei Torimi, Ryosuke Yamada, Daichi Otsuka, Kensho Hara, Yuki M. Asano, Hirokatsu Kataoka, Yoshimitsu Aoki

Abstract: Zero-shot recognition models require extensive training data for generalization. However, in zero-shot 3D classification, collecting 3D data and captions is costly and labor-intensive, posing a significant barrier compared to 2D vision. Recent advances in generative models have achieved unprecedented realism in synthetic data production, and recent research shows the potential for using generated data as training data. Here, naturally raising the question: Can synthetic 3D data generated by generative models be used as expanding limited 3D datasets? In response, we present a synthetic 3D dataset expansion method, Text-guided Geometric Augmentation (TeGA). TeGA is tailored for language-image-3D pretraining, which achieves SoTA in zero-shot 3D classification, and uses a generative text-to-3D model to enhance and extend limited 3D datasets. Specifically, we automatically generate text-guided synthetic 3D data and introduce a consistency filtering strategy to discard noisy samples where semantics and geometric shapes do not match with text. In the experiment to double the original dataset size using TeGA, our approach demonstrates improvements over the baselines, achieving zero-shot performance gains of 3.0% on Objaverse-LVIS, 4.6% on ScanObjectNN, and 8.7% on ModelNet40. These results demonstrate that TeGA effectively bridges the 3D data gap, enabling robust zero-shot 3D classification even with limited real training data and paving the way for zero-shot 3D vision application.

Submitted 17 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

arXiv:2501.03509 [pdf, other]

Quark number susceptibility and conserved charge fluctuation for (2+1)-flavor QCD with Möbius domain wall fermions

Authors: Jishnu Goswami, Yasumichi Aoki, Hidenori Fukaya, Shoji Hashimoto, Issaku Kanamori, Takashi Kaneko, Yoshifumi Nakamura, Yu Zhang

Abstract: We present quark number susceptibilities and conserved charge fluctuations for (2+1)-flavor QCD using Möbius Domain Wall fermions with a pion mass of $135~\rm{MeV}$. Our results are compared with hadron resonance gas models below the QCD transition temperature and with $\mathcal{O}(g^2)$ perturbation theory at high temperatures. Additionally, we compare our findings with results from staggered fermion discretizations. Furthermore, we also present results of leading order Kurtosis of electric charge and strangeness fluctuations.

Submitted 6 January, 2025; originally announced January 2025.

Comments: 10 pages, 4 figures, Prepared for the proceedings of LATTICE2024 held at University of Liverpool, Liverpool, UK

arXiv:2412.06574 [pdf, other]

Study of symmetries in finite temperature $N_f=2$ QCD with Möbius Domain Wall Fermions

Authors: David Ward, Sinya Aoki, Yasumichi Aoki, Hidenori Fukaya, Shoji Hashimoto, Issaku Kanamori, Takashi Kaneko, Jishnu Goswami, Yu Zhang

Abstract: We report on the ongoing study of symmetry of $N_f=2$ QCD around the critical temperature. Our simulations of $N_f = 2$ QCD employ the Möbius domain-wall fermion action with residual mass $\sim 1\mbox{MeV}$ or less, maintaining a good chiral symmetry. Using the screening masses from the two point spatial correlators we compare the mass difference between channels connected through various symmetry transformations. Our analysis focuses on restoration of the $SU(2)_L\times SU(2)_R$ as well as anomalously broken axial $U(1)_A$. We also present additional study of a potential $SU(2)_{CS}$ symmetry which may emerge at sufficiently high temperatures.

Submitted 30 January, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

Comments: 9 pages, 4 figures, Proceedings for 41st International Symposium on Lattice Field Theory (Lattice 2024) - University of Liverpool, Liverpool, UK - July 28th - August 3rd 2024. arXiv admin note: text overlap with arXiv:2401.07514. v2 Added citations and reduced page number, v3 Finalized version -- reduced pages, added references and removed figures

arXiv:2412.01113 [pdf, other]

Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Arithmetic Reasoning

Authors: Keito Kudo, Yoichi Aoki, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Ana Brassard, Keisuke Sakaguchi, Kentaro Inui

Abstract: This study investigates the internal reasoning process of language models during arithmetic multi-step reasoning, motivated by the question of when they internally form their answers during reasoning. Particularly, we inspect whether the answer is determined before or after chain-of-thought (CoT) begins to determine whether models follow a post-hoc Think-to-Talk mode or a step-by-step Talk-to-Think mode of explanation. Through causal probing experiments in controlled arithmetic reasoning tasks, we found systematic internal reasoning patterns across models in our case study; for example, single-step subproblems are solved before CoT begins, and more complicated multi-step calculations are performed during CoT.

Submitted 17 April, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

arXiv:2411.16784 [pdf, other]

Studies of nucleon isovector structure with the PACS10 superfine lattice

Authors: Ryutaro Tsuji, Yasumichi Aoki, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Shoichi Sasaki, Kohei Sato, Eigo Shintani, Hiromasa Watanabe, Takeshi Yamazaki

Abstract: We present the results for the nucleon axial-vector, induced pseudoscalar and pion-nucleon couplings obtained from 2+1 flavor lattice QCD at the physical point with a large spatial extent of about 10 fm. Our calculations are performed with the PACS10 gauge configurations generated by the PACS Collaboration with the six stout-smeared $O(a)$ improved Wilson-clover quark action and Iwasaki gauge action at $β$ = 1.82, 2.00 and 2.20 corresponding to lattice spacings of 0.09 fm (coarse), 0.06 fm (fine) and 0.04 fm (superfine), respectively. We first evaluate the value of the nucleon axial-vector coupling. In addition, the induced pseudoscalar and pion-nucleon couplings from the induced pseudoscalar form factor are also investigated. Combining the results obtained from the all of our coarse, fine and superfine lattices, we finally discuss the systematic uncertainties in our calculation based on the comparison with both of the experimental values and lattice QCD results provided by the other collaborations.

Submitted 24 January, 2025; v1 submitted 25 November, 2024; originally announced November 2024.

Comments: 9 pages, 2 figures, Proceedings of the 41st International Symposium on Lattice Field Theory (Lattice 2024), July 28th - August 3rd, 2024, University of Liverpool, UK. arXiv admin note: text overlap with arXiv:2401.05340

Report number: KEK-TH-2681

arXiv:2411.07165 [pdf, other]

Acoustic-based 3D Human Pose Estimation Robust to Human Position

Authors: Yusuke Oumi, Yuto Shibata, Go Irie, Akisato Kimura, Yoshimitsu Aoki, Mariko Isogawa

Abstract: This paper explores the problem of 3D human pose estimation from only low-level acoustic signals. The existing active acoustic sensing-based approach for 3D human pose estimation implicitly assumes that the target user is positioned along a line between loudspeakers and a microphone. Because reflection and diffraction of sound by the human body cause subtle acoustic signal changes compared to sound obstruction, the existing model degrades its accuracy significantly when subjects deviate from this line, limiting its practicality in real-world scenarios. To overcome this limitation, we propose a novel method composed of a position discriminator and reverberation-resistant model. The former predicts the standing positions of subjects and applies adversarial learning to extract subject position-invariant features. The latter utilizes acoustic signals before the estimation target time as references to enhance robustness against the variations in sound arrival times due to diffraction and reflection. We construct an acoustic pose estimation dataset that covers diverse human locations and demonstrate through experiments that our proposed method outperforms existing approaches.

Submitted 8 November, 2024; originally announced November 2024.

Comments: Accepted at BMVC2024

arXiv:2411.04268 [pdf, other]

FLAG Review 2024

Authors: Y. Aoki, T. Blum, S. Collins, L. Del Debbio, M. Della Morte, P. Dimopoulos, X. Feng, M. Golterman, Steven Gottlieb, R. Gupta, G. Herdoiza, P. Hernandez, A. Jüttner, T. Kaneko, E. Lunghi, S. Meinel, C. Monahan, A. Nicholson, T. Onogi, P. Petreczky, A. Portelli, A. Ramos, S. R. Sharpe, J. N. Simone, S. Sint, et al. (6 additional authors not shown)

Abstract: We review lattice results related to pion, kaon, $D$-meson, $B$-meson, and nucleon physics with the aim of making them easily accessible to the nuclear and particle physics communities. More specifically, we report on the determination of the light-quark masses, the form factor $f_+(0)$ arising in the semileptonic $K \to π$ transition at zero momentum transfer, as well as the decay-constant ratio $f_K/f_π$ and its consequences for the CKM matrix elements $V_{us}$ and $V_{ud}$. We review the determination of the $B_K$ parameter of neutral kaon mixing as well as the additional four $B$ parameters that arise in theories of physics beyond the Standard Model. For the heavy-quark sector, we provide results for $m_c$ and $m_b$ as well as those for the decay constants, form factors, and mixing parameters of charmed and bottom mesons and baryons. These are the heavy-quark quantities most relevant for the determination of CKM matrix elements and the global CKM unitarity-triangle fit. We review the status of lattice determinations of the strong coupling constant $α_s$. We review the determinations of nucleon charges from the matrix elements of both isovector and flavour-diagonal axial, scalar and tensor local quark bilinears, and momentum fraction, helicity moment and the transversity moment from one-link quark bilinears. We also review determinations of scale-setting quantities. Finally, in this review we have added a new section on the general definition of the low-energy limit of the Standard Model.

Submitted 17 January, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

Comments: 435 pages, 53 Figures, 190 tables. arXiv admin note: substantial text overlap with arXiv:2111.09849, arXiv:1902.08191, some corrections and updated references

Report number: CERN-TH-2024-192

arXiv:2410.00511 [pdf, other]

Pre-training with Synthetic Patterns for Audio

Authors: Yuchi Ishikawa, Tatsuya Komatsu, Yoshimitsu Aoki

Abstract: In this paper, we propose to pre-train audio encoders using synthetic patterns instead of real audio data. Our proposed framework consists of two key elements. The first one is Masked Autoencoder (MAE), a self-supervised learning framework that learns from reconstructing data from randomly masked counterparts. MAEs tend to focus on low-level information such as visual patterns and regularities within data. Therefore, it is unimportant what is portrayed in the input, whether it be images, audio mel-spectrograms, or even synthetic patterns. This leads to the second key element, which is synthetic data. Synthetic data, unlike real audio, is free from privacy and licensing infringement issues. By combining MAEs and synthetic patterns, our framework enables the model to learn generalized feature representations without real data, while addressing the issues related to real audio. To evaluate the efficacy of our framework, we conduct extensive experiments across a total of 13 audio tasks and 17 synthetic datasets. The experiments provide insights into which types of synthetic patterns are effective for audio. Our results demonstrate that our framework achieves performance comparable to models pre-trained on AudioSet-2M and partially outperforms image-based pre-training methods.

Submitted 1 October, 2024; originally announced October 2024.

Comments: Submitted to ICASSP'25

arXiv:2409.20221 [pdf, other]

doi 10.7566/JPSJ.93.044706

Linear Magnetoresistance and Type-I Superconductivity in $β$-IrSn$_4$

Authors: Nazir Ahmad, Shunsuke Shimada, Takumi Hasegawa, Hiroto Suzuki, Md Asif Afzal, Naoki Nakamura, Ryuji Higashinaka, Tatsuma D. Matsuda, Yuji Aoki

Abstract: Layered material $β$-IrSn$_4$ ($I4_1/acd$, $D^{20}_{4h}$, #142), whose electron bands have symmetry-enforced Dirac points, was investigated using high-quality single crystals. It exhibits a pronounced linear field-dependence of magnetoresistance (LMR), which cannot be explained by currently existing models. Structures in the field-angle dependence of magnetoresistance and Hall resistivity are attributable to the Fermi surface topology; the presence of open orbits is inferred. At the superconducting (SC) transition, the specific-heat jump exhibits a significant increase in applied fields, revealing the type-I SC nature. This feature is attributable to the high Fermi velocity of linearly dispersive multibands. To clarify the mechanism of the puzzling LMR, investigations into the topological nature of those multibands in applied fields are highly desired.

Submitted 30 September, 2024; originally announced September 2024.

Journal ref: J. Phys. Soc. Jpn. 93, 044706 (2024)

arXiv:2409.06665 [pdf, other]

Data Collection-free Masked Video Modeling

Authors: Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki

Abstract: Pre-training video transformers generally requires a large amount of data, presenting significant challenges in terms of data collection costs and concerns related to privacy, licensing, and inherent biases. Synthesizing data is one of the promising ways to solve these issues, yet pre-training solely on synthetic data has its own challenges. In this paper, we introduce an effective self-supervised learning framework for videos that leverages readily available and less costly static images. Specifically, we define the Pseudo Motion Generator (PMG) module that recursively applies image transformations to generate pseudo-motion videos from images. These pseudo-motion videos are then leveraged in masked video modeling. Our approach is applicable to synthetic images as well, thus entirely freeing video pre-training from data collection costs and other concerns in real data. Through experiments in action recognition tasks, we demonstrate that this framework allows effective learning of spatio-temporal features through pseudo-motion videos, significantly improving over existing methods which also use static images and partially outperforming those using both real and synthetic videos. These results uncover fragments of what video transformers learn through masked video modeling.

Submitted 10 September, 2024; originally announced September 2024.

Comments: ECCV 2024

arXiv:2409.04692 [pdf, other]

doi 10.1016/j.cma.2025.117772

Data-driven multifidelity topology design with multi-channel variational auto-encoder for concurrent optimization of multiple design variable fields

Authors: Hiroki Kawabe, Kentaro Yaji, Yuichiro Aoki

Abstract: The objective of this study is to establish a gradient-free topology optimization framework that facilitates more global solution searches to avoid entrapping in undesirable local optima, especially in problems with strong non-linearity. The framework utilizes a data-driven multifidelity topology design, where solution candidates resulting from low-fidelity optimization problems are iteratively updated by a variational auto-encoder (VAE) and high-fidelity (HF) evaluation. A key step in the solution update involves constructing HF models by extruding VAE-generated material distributions to a constant thickness (the HF modeling parameter) across all candidates, which limits exploration of the parameter space and requires extensive parametric studies outside the optimization loop. To achieve comprehensive optimization in a single run, we propose a multi-channel image data architecture that stores material distributions and HF modeling parameters in separate channels, allowing simultaneous optimization of the HF parameter space. We demonstrated the efficacy of the proposed framework by solving a maximum stress minimization problem, characterized by strong non-linearity due to its minimax formulation.

Submitted 6 September, 2024; originally announced September 2024.

arXiv:2409.00768 [pdf, other]

Rethinking Image Super-Resolution from Training Data Perspectives

Authors: Go Ohtani, Ryu Tadokoro, Ryosuke Yamada, Yuki M. Asano, Iro Laina, Christian Rupprecht, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka, Yoshimitsu Aoki

Abstract: In this work, we investigate the understudied effect of the training data used for image super-resolution (SR). Most commonly, novel SR methods are developed and benchmarked on common training datasets such as DIV2K and DF2K. However, we investigate and rethink the training data from the perspectives of diversity and quality, thereby addressing the question of "How important is SR training for SR models?". To this end, we propose an automated image evaluation pipeline. With this, we stratify existing high-resolution image datasets and larger-scale image datasets such as ImageNet and PASS to compare their performances. We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance. We hope that the proposed simple-yet-effective dataset curation pipeline will inform the construction of SR datasets in the future and yield overall better models.

Submitted 1 September, 2024; originally announced September 2024.

Comments: Accepted to ECCV2024

arXiv:2407.18507 [pdf, ps, other]

Feasibility study of upper atmosphere density measurement on the ISS by observations of the CXB transmitted through the Earth rim

Authors: Takumi Kishimoto, Kumiko K. Nobukawa, Ayaki Takeda, Takeshi G. Tsuru, Satoru Katsuda, Nakazawa Kazuhiro, Koji Mori, Masayoshi Nobukawa, Hiroyuki Uchida, Yoshihisa Kawabe, Satoru Kuwano, Eisuke Kurogi, Yamato Ito, Yuma Aoki

Abstract: Measurements of the upper atmosphere at ~100 km are important to investigate climate change, space weather forecasting, and the interaction between the Sun and the Earth. Atmospheric occultations of cosmic X-ray sources are an effective technique to measure the neutral density in the upper atmosphere. We are developing the instrument SUIM dedicated to continuous observations of atmospheric occultations. SUIM will be mounted on a platform on the exterior of the International Space Station for six months and pointed at the Earth's rim to observe atmospheric absorption of the cosmic X-ray background (CXB). In this paper, we conducted a feasibility study of SUIM by estimating the CXB statistics and the fraction of the non-X-ray background (NXB) in the observed data. The estimated CXB statistics are enough to evaluate the atmospheric absorption of CXB for every 15 km of altitude. On the other hand, the NXB will be dominant in the X-ray spectra of SUIM. Assuming that the NXB per detection area of SUIM is comparable to that of the soft X-ray Imager onboard Hitomi, the NXB level will be much higher than the CXB one and account for ~80% of the total SUIM spectra.

Submitted 26 July, 2024; originally announced July 2024.

Comments: 5 pages, 5 figures, Proceedings of SPIE Astronomical Telescopes and Instrumentation 2024

arXiv:2407.16922 [pdf, ps, other]

SUIM project: measuring the upper atmosphere from the ISS by observations of the CXB transmitted through the Earth rim

Authors: Kumiko K. Nobukawa, Ayaki Takeda, Satoru Katsuda, Takeshi G. Tsuru, Kazuhiro Nakazawa, Koji Mori, Hiroyuki Uchida, Masayoshi Nobukawa, Eisuke Kurogi, Takumi Kishimoto, Reo Matsui, Yuma Aoki, Yamato Ito, Satoru Kuwano, Tomitaka Tanaka, Mizuki Uenomachi, Masamune Matsuda, Takaya Yamawaki, Takayoshi Kohmura

Abstract: The upper atmosphere at the altitude of 60-110 km, the mesosphere and lower thermosphere (MLT), has the least observational data of all atmospheres due to the difficulties of in-situ observations. Previous studies demonstrated that atmospheric occultation of cosmic X-ray sources is an effective technique to investigate the MLT. Aiming to measure the atmospheric density of the MLT continuously, we are developing an X-ray camera, "Soipix for observing Upper atmosphere as Iss experiment Mission (SUIM)", dedicated to atmospheric observations. SUIM will be installed on the exposed area of the International Space Station (ISS) and face the ram direction of the ISS to point toward the Earth rim. Observing the cosmic X-ray background (CXB) transmitted through the atmosphere, we will measure the absorption column density via spectroscopy and thus obtain the density of the upper atmosphere. The X-ray camera is composed of a slit collimator and two X-ray SOI-CMOS pixel sensors (SOIPIX), and will stand on its own and make observations, controlled by a CPU-embedded FPGA "Zynq". We plan to install the SUIM payload on the ISS in 2025 during the solar maximum. In this paper, we report the overview and the development status of this project.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 5 pages, 2 figures, Proceedings of SPIE Astronomical Telescopes and Instrumentation 2024

arXiv:2407.02999 [pdf, other]

doi 10.1103/PhysRevLett.133.016401

Fermi Surface Nesting Driving the RKKY Interaction in the Centrosymmetric Skyrmion Magnet Gd2PdSi3

Authors: Yuyang Dong, Yosuke Arai, Kenta Kuroda, Masayuki Ochi, Natsumi Tanaka, Yuxuan Wan, Matthew D. Watson, Timur K. Kim, Cephise Cacho, Makoto Hashimoto, Donghui Lu, Yuji Aoki, Tatsuma D. Matsuda, Takeshi Kondo

Abstract: The magnetic skyrmions generated in a centrosymmetric crystal were recently first discovered in Gd2PdSi3. In light of this, we observe the electronic structure by angle-resolved photoemission spectroscopy (ARPES) and unveil its direct relationship with the magnetism in this compound. The Fermi surface and band dispersions are demonstrated to have a good agreement with the density functional theory (DFT) calculations carried out with careful consideration of the crystal superstructure. Most importantly, we find that the three-dimensional Fermi surface has extended nesting which matches well the q-vector of the magnetic order detected by recent scattering measurements. The consistency we find among ARPES, DFT, and the scattering measurements suggests the Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction involving itinerant electrons to be the formation mechanism of skyrmions in Gd2PdSi3.

Submitted 3 July, 2024; originally announced July 2024.

Journal ref: Phys. Rev. Lett. 133, 016401 (2024)

arXiv:2406.19911 [pdf, other]

Status of Xtend telescope onboard X-Ray Imaging and Spectroscopy Mission (XRISM)

Authors: Koji Mori, Hiroshi Tomida, Hiroshi Nakajima, Takashi Okajima, Hirofumi Noda, Hiroyuki Uchida, Hiromasa Suzuki, Shogo Benjamin Kobayashi, Tomokage Yoneyama, Kouichi Hagino, Kumiko Nobukawa, Takaaki Tanaka, Hiroshi Murakami, Hideki Uchiyama, Masayoshi Nobukawa, Hironori Matsumoto, Takeshi Tsuru, Makoto Yamauchi, Isamu Hatsukade, Hirokazu Odaka, Takayoshi Kohmura, Kazutaka Yamaoka, Manabu Ishida, Yoshitomo Maeda, Takayuki Hayashi, et al. (38 additional authors not shown)

Abstract: Xtend is one of the two telescopes onboard the X-ray imaging and spectroscopy mission (XRISM), which was launched on September 7th, 2023. Xtend comprises the Soft X-ray Imager (SXI), an X-ray CCD camera, and the X-ray Mirror Assembly (XMA), a thin-foil-nested conically approximated Wolter-I optics. A large field of view of $38^{\prime}\times38^{\prime}$ over the energy range from 0.4 to 13 keV is realized by the combination of the SXI and XMA with a focal length of 5.6 m. The SXI employs four P-channel, back-illuminated type CCDs with a thick depletion layer of 200 $μ$m. The four CCD chips are arranged in a 2$\times$2 grid and cooled down to $-110$ $^{\circ}$C with a single-stage Stirling cooler. Before the launch of XRISM, we conducted a month-long spacecraft thermal vacuum test. The performance verification of the SXI was successfully carried out in a course of multiple thermal cycles of the spacecraft. About a month after the launch of XRISM, the SXI was carefully activated and the soundness of its functionality was checked by a step-by-step process. Commissioning observations followed the initial operation. We here present pre- and post-launch results verifying the Xtend performance. All the in-orbit performances are consistent with those measured on ground and satisfy the mission requirement. Extensive calibration studies are ongoing.

Submitted 28 June, 2024; originally announced June 2024.

Comments: 10 pages, 8 figures. Proceedings of SPIE Astronomical Telescopes and Instrumentation 2024

arXiv:2406.19910 [pdf, other]

Initial operations of the Soft X-ray Imager onboard XRISM

Authors: Hiromasa Suzuki, Tomokage Yoneyama, Shogo B. Kobayashi, Hirofumi Noda, Hiroyuki Uchida, Kumiko K. Nobukawa, Kouichi Hagino, Koji Mori, Hiroshi Tomida, Hiroshi Nakajima, Takaaki Tanaka, Hiroshi Murakami, Hideki Uchiyama, Masayoshi Nobukawa, Yoshiaki Kanemaru, Yoshinori Otsuka, Haruhiko Yokosu, Wakana Yonemaru, Hanako Nakano, Kazuhiro Ichikawa, Reo Takemoto, Tsukasa Matsushima, Marina Yoshimoto, Mio Aoyagi, Kohei Shima, et al. (30 additional authors not shown)

Abstract: XRISM (X-Ray Imaging and Spectroscopy Mission) is an astronomical satellite with the capability of high-resolution spectroscopy with the X-ray microcalorimeter, Resolve, and wide field-of-view imaging with the CCD camera, Xtend. Xtend consists of the mirror assembly (XMA: X-ray Mirror Assembly) and detector (SXI: Soft X-ray Imager). The SXI is composed of CCDs, analog and digital electronics, and a mechanical cooler. After the successful launch on September 6th, 2023 (UT) and subsequent critical operations, the mission instruments were turned on and set up. The CCDs have been kept at the designed operating temperature of $-110^\circ$C after the electronics and cooling system were successfully set up. During the initial operation phase, which continued for more than a month after the critical operations, we verified the observation procedure, stability of the cooling system, all the observation options with different imaging areas and/or timing resolutions, and time-tagged and automated operations including those for South Atlantic Anomaly passages. We optimized the operation procedure and observation parameters including the cooler settings, imaging areas for the small window modes, and event selection algorithm. We summarize our policy and procedure of the initial operations for the SXI. We also report on a couple of issues we faced during the initial operations and lessons learned from them.

Submitted 14 February, 2025; v1 submitted 28 June, 2024; originally announced June 2024.

Comments: 14 pages, 8 figures, accepted for publication in JATIS

arXiv:2406.16078 [pdf, other]

First Heuristic Then Rational: Dynamic Use of Heuristics in Language Model Reasoning

Authors: Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Shusaku Sone, Masaya Taniguchi, Keisuke Sakaguchi, Kentaro Inui

Abstract: Multi-step reasoning instruction, such as chain-of-thought prompting, is widely adopted to explore better language model (LM) performance. We report on the systematic strategy that LMs employ in such a multi-step reasoning process. Our controlled experiments reveal that LMs rely more heavily on heuristics, such as lexical overlap, in the earlier stages of reasoning, where more reasoning steps remain to reach a goal. Conversely, their reliance on heuristics decreases as LMs progress closer to the final answer through multiple reasoning steps. This suggests that LMs can backtrack only a limited number of future steps and dynamically combine heuristic strategies with rational ones in tasks involving multi-step reasoning.

Submitted 7 October, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: This paper is accepted at EMNLP 2024

arXiv:2404.08504 [pdf, other]

3D Human Scan With A Moving Event Camera

Authors: Kai Kohyama, Shintaro Shiba, Yoshimitsu Aoki

Abstract: Capturing a 3D human body is one of the important tasks in computer vision with a wide range of applications such as virtual reality and sports analysis. However, conventional frame cameras are limited by their temporal resolution and dynamic range, which imposes constraints in real-world application setups. Event cameras have the advantages of high temporal resolution and high dynamic range (HDR), but the development of event-based methods is necessary to handle data with different characteristics. This paper proposes a novel event-based method for 3D pose estimation and human mesh recovery. Prior work on event-based human mesh recovery requires frames (images) as well as event data. The proposed method solely relies on events; it carves 3D voxels by moving the event camera around a stationary body, reconstructs the human pose and mesh by attenuated rays, and fits statistical body models, preserving high-frequency details. The experimental results show that the proposed method outperforms conventional frame-based methods in the estimation accuracy of both pose and body mesh. We also demonstrate results in challenging situations where a conventional camera has motion blur. This is the first to demonstrate event-only human mesh recovery, and we hope that it is the first step toward achieving robust and accurate 3D human body scanning from vision sensors. https://florpeng.github.io/event-based-human-scan/

Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop On Computer Vision For Mixed Reality (CV4MR), Seattle, 2024

arXiv:2403.12530 [pdf, other]

PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation

Authors: Haruya Ishikawa, Takumi Iida, Yoshinori Konishi, Yoshimitsu Aoki

Abstract: Generating annotations for bird's-eye-view (BEV) segmentation presents significant challenges due to the scenes' complexity and the high manual annotation cost. In this work, we address these challenges by leveraging the abundance of unlabeled data available. We propose the Perspective Cue Training (PCT) framework, a novel training framework that utilizes pseudo-labels generated from unlabeled perspective images using publicly available semantic segmentation models trained on large street-view datasets. PCT applies a perspective view task head to the image encoder shared with the BEV segmentation head, effectively utilizing the unlabeled data to be trained with the generated pseudo-labels. Since image encoders are present in nearly all camera-based BEV segmentation architectures, PCT is flexible and applicable to various existing BEV architectures. PCT can be applied to various settings where unlabeled data is available. In this paper, we applied PCT for semi-supervised learning (SSL) and unsupervised domain adaptation (UDA). Additionally, we introduce strong input perturbation through Camera Dropout (CamDrop) and feature perturbation via BEV Feature Dropout (BFD), which are crucial for enhancing SSL capabilities using our teacher-student framework. Our comprehensive approach is simple and flexible but yields significant improvements over various baselines for SSL and UDA, achieving competitive performances even against the current state-of-the-art.

Submitted 15 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures; Accepted to IROS 2024

arXiv:2403.11197 [pdf, other]

TAG: Guidance-free Open-Vocabulary Semantic Segmentation

Authors: Yasufumi Kawano, Yoshimitsu Aoki

Abstract: Semantic segmentation is a crucial task in computer vision, where each pixel in an image is classified into a category. However, traditional methods face significant challenges, including the need for pixel-level annotations and extensive training. Furthermore, because supervised learning uses a limited set of predefined categories, models typically struggle with rare classes and cannot recognize new ones. Unsupervised and open-vocabulary segmentation, proposed to tackle these issues, faces challenges, including the inability to assign specific class labels to clusters and the necessity of user-provided text queries for guidance. In this context, we propose a novel approach, TAG, which achieves Training, Annotation, and Guidance-free open-vocabulary semantic segmentation. TAG utilizes pre-trained models such as CLIP and DINO to segment images into meaningful categories without additional training or dense annotations. It retrieves class labels from an external database, providing flexibility to adapt to new scenarios. Our TAG achieves state-of-the-art results on PascalVOC, PascalContext and ADE20K for open-vocabulary segmentation without given class names, e.g., an improvement of +15.3 mIoU on PascalVOC. All code and data will be released at https://github.com/Valkyrja3607/TAG.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 18 pages

arXiv:2403.11194 [pdf, other]

MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation

Authors: Yasufumi Kawano, Yoshimitsu Aoki

Abstract: Semantic segmentation is essential in computer vision for various applications, yet traditional approaches face significant challenges, including the high cost of annotation and extensive training for supervised learning. Additionally, due to the limited predefined categories in supervised learning, models typically struggle with infrequent classes and are unable to predict novel classes. To address these limitations, we propose MaskDiffusion, an innovative approach that leverages pretrained frozen Stable Diffusion to achieve open-vocabulary semantic segmentation without the need for additional training or annotation, leading to improved performance compared to similar methods. We also demonstrate the superior performance of MaskDiffusion in handling open vocabularies, including fine-grained and proper noun-based categories, thus expanding the scope of segmentation applications. Overall, our MaskDiffusion shows significant qualitative and quantitative improvements in contrast to other comparable unsupervised segmentation methods, e.g., on the Potsdam dataset (+10.5 mIoU compared to GEM) and COCO-Stuff (+14.8 mIoU compared to DiffSeg). All code and data will be released at https://github.com/Valkyrja3607/MaskDiffusion.

Submitted 17 March, 2024; originally announced March 2024.

Comments: 19 pages

arXiv:2401.14022 [pdf, other]

Axial U(1) symmetry near the pseudocritical temperature in $N_f=2+1$ lattice QCD with chiral fermions

Authors: Sinya Aoki, Yasumichi Aoki, Hidenori Fukaya, Shoji Hashimoto, Issaku Kanamori, Takashi Kaneko, Yoshifumi Nakamura, Christian Rohrhofer, Kei Suzuki, David Ward

Abstract: We study the $U(1)_A$ anomaly at high temperatures of $N_f=2+1$ lattice QCD with chiral fermions. Gauge ensembles are generated with Möbius domain-wall (MDW) fermions, and the measurements are reweighted to those with overlap fermions. We report on the results for the Dirac spectra, the $U(1)_A$ susceptibility, and the topological susceptibility in the temperature range of $T=136$, $153$, $175$, and $204$ MeV, where the up and down quark masses are set to be near the physical point as well as at lighter or heavier masses.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 10 pages, 4 figures, 1 table; Proceedings of the 40th International Symposium on Lattice Field Theory (Lattice 2023), July 31st - August 4th, 2023, Fermilab, Batavia, Illinois, USA

Report number: KEK-CP-0399, OU-HET-1216

arXiv:2401.07514 [pdf, other]

Study of Chiral Symmetry and $U(1)_A$ using Spatial Correlators for $N_f=2+1$ QCD at finite temperature with Domain Wall Fermions

Authors: David Ward, Sinya Aoki, Yasumichi Aoki, Hidenori Fukaya, Shoji Hashimoto, Issaku Kanamori, Takashi Kaneko, Jishnu Goswami, Yu Zhang

Abstract: Based on simulations of 2+1 flavor lattice QCD with Möbius domain wall fermions at high temperatures, we compute a series of spatial correlation functions to study the screening masses in mesonic states. We compare these masses with the symmetry relations for various quark masses and lattice sizes at temperatures above the critical point. Using these spatial correlation functions we examine the $SU(2)_L \times SU(2)_R$ symmetry as well as the anomalously broken axial $U(1)_A$ symmetry. Additionally we explore a possible and emergent chiral-spin symmetry $SU(2)_{CS}$.

Submitted 24 January, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

Comments: 10 pages, 5 figures, Contribution to the 40th International Symposium on Lattice Field Theory (Lattice 2023)

arXiv:2401.06459 [pdf, other]

Chiral susceptibility and axial U(1) anomaly near the (pseudo-)critical temperature

Authors: S. Aoki, Y. Aoki, H. Fukaya, S. Hashimoto, I. Kanamori, T. Kaneko, Y. Nakamura, K. Suzuki, D. Ward

Abstract: We investigate relations between the chiral susceptibility and axial $U(1)$ anomaly in lattice QCD at high temperatures. Employing the exactly chiral symmetric Dirac operator, we separate the purely axial $U(1)$ breaking effect in the connected and disconnected chiral susceptibilities in a theoretically clean manner. Preliminary results for two-flavor lattice QCD near the critical temperature are presented.

Submitted 26 March, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 9 pages, 3 figures, LATTICE2023, minor corrections

Report number: OU-HET-1211, KEK-CP-0395

arXiv:2401.05340 [pdf, other]

Discretization effects on nucleon root-mean-square radii from lattice QCD at the physical point

Authors: Ryutaro Tsuji, Yasumichi Aoki, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Shoichi Sasaki, Eigo Shintani, Takeshi Yamazaki

Abstract: We present results for the axial-vector coupling and root-mean-square (RMS) radii of the nucleon obtained from 2+1 flavor lattice QCD at the physical point with a large spatial extent of about 10 fm. Our calculations are performed with the PACS10 gauge configurations generated by the PACS Collaboration with the six stout-smeared $O(a)$ improved Wilson-clover quark action and Iwasaki gauge action at $β$ = 1.82 and 2.00 corresponding to lattice spacings of 0.085 fm and 0.063 fm, respectively. We first evaluate the value of the axial-vector coupling of the nucleon ($g_A$). In addition, the isovector electric, magnetic and axial radii and magnetic moment from the corresponding form factors are also determined. Combining the results at $β=1.82$ and $2.00$, we finally discuss the finite lattice spacing effect. It was found that the effect on $g_A$ is kept smaller than the statistical error of 2% while the effect on the isovector radii was observed as a possible discretization error of about 10%, regardless of the channel.

Submitted 29 November, 2023; originally announced January 2024.

Comments: 7 pages, 1 figure, Proceeding for the 40th International Symposium on Lattice Field Theory, 31 July 2023 - 4 August 2023, Chicago, USA

arXiv:2401.05066 [pdf, other]

Exploring the QCD phase diagram with three flavors of Möbius domain wall fermions

Authors: Yu Zhang, Yasumichi Aoki, Shoji Hashimoto, Issaku Kanamori, Takashi Kaneko, Yoshifumi Nakamura

Abstract: We present an update on the study of the QCD phase transition with 3 flavors of Möbius domain wall fermions at zero baryon density. We performed simulations on lattices of size $36^3\times12\times16$ and $24^3\times12\times32$ with a variety of quark masses at a fixed lattice spacing $a=0.1361(20)$ fm, which corresponds to a temperature of 121(2) MeV. By analyzing the chiral condensate, chiral susceptibilities and Binder cumulant on $36^3\times12\times16$ lattices together with the result obtained from our previous study on $24^3\times12\times16$ lattices, we identified a crossover occurring at quark mass around $m_q^{\mathrm{\overline {MS}}}(2\, \mathrm{GeV}) \sim 3-4$ MeV for this temperature. Besides, we show the effects of residual chiral symmetry breaking on chiral condensate and chiral susceptibilities between $L_s=16$ and 32.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 10 pages, 6 figures, Contribution to the 40th International Symposium on Lattice Field Theory (Lattice 2023), July 31 to August 4, 2023, Fermilab, USA

arXiv:2311.10345 [pdf, other]

Nucleon form factors in $N_f=2+1$ lattice QCD at the physical point : finite lattice spacing effect on the root-mean-square radii

Authors: Ryutaro Tsuji, Yasumichi Aoki, Ken-Ichi Ishikawa, Yoshinobu Kuramashi, Shoichi Sasaki, Kohei Sato, Eigo Shintani, Hiromasa Watanabe, Takeshi Yamazaki

Abstract: We present results for the nucleon form factors: electric ($G_E$), magnetic ($G_M$), axial ($F_A$), induced pseudoscalar ($F_P$) and pseudoscalar ($G_P$) form factors, using the second PACS10 ensemble that is one of three sets of $2+1$ flavor lattice QCD configurations at physical quark masses in large spatial volumes (exceeding $(10\ \mathrm{fm})^3$). The second PACS10 gauge configurations are generated by the PACS Collaboration with the six stout-smeared $O(a)$ improved Wilson quark action and Iwasaki gauge action at the second gauge coupling $β=2.00$ corresponding to the lattice spacing of $a=0.063$ fm. We determine the isovector electric, magnetic and axial radii and magnetic moment from the corresponding form factors, as well as the axial-vector coupling $g_A$. Combining our previous results for the coarser lattice spacing [E. Shintani et al., Phys. Rev. D99 (2019) 014510; Phys. Rev. D102 (2020) 019902 (erratum)], the finite lattice spacing effects on the isovector radii, magnetic moment and axial-vector coupling are investigated using the difference between the two results. It was found that the effect on $g_A$ is kept smaller than the statistical error of 2% while the effect on the isovector radii was observed as a possible discretization error of about 10%, regardless of the channel. We also report the partially conserved axial vector current (PCAC) relation using a set of nucleon three-point correlation functions in order to verify the effect by $O(a)$-improvement of the axial-vector current.

Submitted 5 June, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: 100 pages, 48 figures

Report number: UTHEP-783, UTCCS-P-149, HUPD-2307, YITP-23-143

arXiv:2311.00434 [pdf, other]

doi 10.1109/TPAMI.2023.3328188

Event-based Background-Oriented Schlieren

Authors: Shintaro Shiba, Friedhelm Hamann, Yoshimitsu Aoki, Guillermo Gallego

Abstract: Schlieren imaging is an optical technique to observe the flow of transparent media, such as air or water, without any particle seeding. However, conventional frame-based techniques require both high spatial and temporal resolution cameras, which impose bright illumination and expensive computation limitations. Event cameras offer potential advantages (high dynamic range, high temporal resolution, and data efficiency) to overcome such limitations due to their bio-inspired sensing principle. This paper presents a novel technique for perceiving air convection using events and frames by providing the first theoretical analysis that connects event data and schlieren. We formulate the problem as a variational optimization one combining the linearized event generation model with a physically-motivated parameterization that estimates the temporal derivative of the air density. The experiments with accurately aligned frame- and event camera data reveal that the proposed method enables event cameras to obtain on par results with existing frame-based optical flow techniques. Moreover, the proposed method works under dark conditions where frame-based schlieren fails, and also enables slow-motion analysis by leveraging the event camera's advantages. Our work pioneers and opens a new stack of event camera applications, as we publish the source code as well as the first schlieren dataset with high-quality frame and event data. https://github.com/tub-rip/event_based_bos

Submitted 1 November, 2023; originally announced November 2023.

Comments: Accepted at IEEE T-PAMI

Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, Oct. 2023

arXiv:2307.08000 [pdf, other]

Magnetic-domain-dependent pseudogap induced by Fermi surface nesting in a centrosymmetric skyrmion magnet

Authors: Yuyang Dong, Yuto Kinoshita, Masayuki Ochi, Ryu Nakachi, Ryuji Higashinaka, Satoru Hayami, Yuxuan Wan, Yosuke Arai, Soonsang Huh, Makoto Hashimoto, Donghui Lu, Masashi Tokunaga, Yuji Aoki, Tatsuma D. Matsuda, Takeshi Kondo

Abstract: Skyrmions in non-centrosymmetric materials are believed to occur due to the Dzyaloshinskii-Moriya interaction. In contrast, the skyrmion formation mechanism in centrosymmetric materials remains elusive. Among those, Gd-based compounds are the prototype compounds; however, their electronic structure is not uncovered, even though it should be the foundation for elucidating the skyrmion mechanism. Here, we reveal the intrinsic electronic structure of GdRu2Si2 for the first time by magnetic domain selective measurements of angle-resolved photoemission spectroscopy (ARPES). In particular, we find the robust Fermi surface (FS) nesting, consistent with the q-vector detected by the previous resonant X-ray scattering (RXS) measurements. Most importantly, we find that the pseudogap is opened at the nested portions of FS at low temperatures. The momentum locations of the pseudogap vary for different magnetic domains, most likely having a direct relationship with the screw-type spin modulation that changes direction for each domain. Intriguingly, the anomalous pseudogap disconnects the FS to generate Fermi arcs with 2-fold symmetry. These results indicate the significance of Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction, in which itinerant electrons mediate to stabilize the local magnetic moment, as the mechanism for the magnetism in the Gd-based skyrmion magnets. Our data also predict that the momentum space where the pseudogap opens is doubled (or Fermi arcs shrink) and thereby stabilizes the skyrmion phase under a magnetic field. Furthermore, we demonstrate the flexible nature of magnetism in GdRu2Si2 by manipulating magnetic domains with a magnetic field and temperature cyclings, providing a possibility of future application for data storage and processing device with centrosymmetric skyrmion magnets.

Submitted 16 July, 2023; originally announced July 2023.

arXiv:2306.05657 [pdf, other]

doi 10.1103/PhysRevD.109.074503

$B \to D^*\ellν_\ell$ semileptonic form factors from lattice QCD with Möbius domain-wall quarks

Authors: Y. Aoki, B. Colquhoun, H. Fukaya, S. Hashimoto, T. Kaneko, R. Kellermann, J. Koponen, E. Kou

Abstract: We calculate the form factors for the $B \to D^*\ellν_\ell$ decay in 2+1 flavor lattice QCD. For all quark flavors, we employ the Möbius domain-wall action, which preserves chiral symmetry to a good precision. Our gauge ensembles are generated at three lattice cutoffs $a^{-1} \sim 2.5$, 3.6 and 4.5 GeV with pion masses as low as $M_π\sim 230$ MeV. The physical lattice size $L$ satisfies the condition $M_πL \geq 4$ to control finite volume effects (FVEs), while we simulate a smaller size at the smallest $M_π$ to directly examine FVEs. The bottom quark masses are chosen in a range from the physical charm quark mass to $0.7 a^{-1}$ to control discretization effects. We extrapolate the form factors to the continuum limit and physical quark masses based on heavy meson chiral perturbation theory at next-to-leading order. Then the recoil parameter dependence is parametrized using a model independent form leading to our estimate of the decay rate ratio between the tau ($\ell = τ$) and light lepton ($\ell = e,μ$) channels $R(D^*) = 0.252(22)$ in the Standard Model. A simultaneous fit with recent data from the Belle experiment yields $|V_{cb}| = 39.19(91)\times 10^{-3}$, which is consistent with previous exclusive determinations, and shows good consistency in the kinematical distribution of the differential decay rate between the lattice and experimental data.

Submitted 4 June, 2024; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 45 pages, 19 figures; v2: version published in PRD

Report number: KEK-CP-393, OU-HET-1186

Showing 1–50 of 264 results for author: Aoki, Y