Showing 1–50 of 98 results for author: Shimodaira, H

  1. arXiv:2505.15428  [pdf, ps, other]

    cs.CL

    Likelihood Variance as Text Importance for Resampling Texts to Map Language Models

    Authors: Momose Oyama, Ryo Kishino, Hiroaki Yamagiwa, Hidetoshi Shimodaira

    Abstract: We address the computational cost of constructing a model map, which embeds diverse language models into a common space for comparison via KL divergence. The map relies on log-likelihoods over a large text set, making the cost proportional to the number of texts. To reduce this cost, we propose a resampling method that selects important texts with weights proportional to the variance of log-likeli…

    Submitted 21 May, 2025; originally announced May 2025.

  2. arXiv:2505.15353  [pdf, ps, other]

    cs.CL

    Revealing Language Model Trajectories via Kullback-Leibler Divergence

    Authors: Ryo Kishino, Yusuke Takase, Momose Oyama, Hiroaki Yamagiwa, Hidetoshi Shimodaira

    Abstract: A recently proposed method enables efficient estimation of the KL divergence between language models, including models with different architectures, by assigning coordinates based on log-likelihood vectors. To better understand the behavior of this metric, we systematically evaluate KL divergence across a wide range of conditions using publicly available language models. Our analysis covers compar…

    Submitted 21 May, 2025; originally announced May 2025.

  3. arXiv:2505.14099  [pdf, ps, other]

    cs.CL cs.IR

    Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering

    Authors: Yihua Zhu, Qianying Liu, Akiko Aizawa, Hidetoshi Shimodaira

    Abstract: Knowledge Base Question Answering (KBQA) aims to answer natural language questions using structured knowledge from KBs. While LLM-only approaches offer generalization, they suffer from outdated knowledge, hallucinations, and lack of transparency. Chain-based KG-RAG methods address these issues by incorporating external KBs, but are limited to simple chain-structured questions due to the absence of…

    Submitted 20 May, 2025; originally announced May 2025.

  4. arXiv:2503.02343  [pdf, other]

    cs.CL cs.LG

    DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability

    Authors: Yunzhen He, Yusuke Takase, Yoichi Ishibashi, Hidetoshi Shimodaira

    Abstract: Large Language Models (LLMs) are increasingly being used in real-world applications. However, concerns about the reliability of the content they generate persist, as it frequently deviates from factual correctness or exhibits deficiencies in logical reasoning. This paper proposes a novel decoding strategy aimed at enhancing both factual accuracy and inferential reasoning without requiring any modi…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: Source code is available at https://github.com/githubhyz/DeLTa

  5. arXiv:2502.16173  [pdf, ps, other]

    cs.CL

    Mapping 1,000+ Language Models via the Log-Likelihood Vector

    Authors: Momose Oyama, Hiroaki Yamagiwa, Yusuke Takase, Hidetoshi Shimodaira

    Abstract: To compare autoregressive language models at scale, we propose using log-likelihood vectors computed on a predefined text set as model features. This approach has a solid theoretical basis: when treated as model coordinates, their squared Euclidean distance approximates the Kullback-Leibler divergence of text-generation probabilities. Our method is highly scalable, with computational cost growing…

    Submitted 31 May, 2025; v1 submitted 22 February, 2025; originally announced February 2025.

    Comments: ACL 2025

  6. arXiv:2412.12569  [pdf, ps, other]

    cs.CL

    Quantifying Lexical Semantic Shift via Unbalanced Optimal Transport

    Authors: Ryo Kishino, Hiroaki Yamagiwa, Ryo Nagata, Sho Yokoi, Hidetoshi Shimodaira

    Abstract: Lexical semantic change detection aims to identify shifts in word meanings over time. While existing methods using embeddings from a diachronic corpus pair estimate the degree of change for target words, they offer limited insight into changes at the level of individual usage instances. To address this, we apply Unbalanced Optimal Transport (UOT) to sets of contextualized word embeddings, capturin…

    Submitted 31 May, 2025; v1 submitted 17 December, 2024; originally announced December 2024.

    Comments: ACL 2025

  7. arXiv:2411.00680  [pdf, other]

    cs.CL cs.LG stat.ML

    Zipfian Whitening

    Authors: Sho Yokoi, Han Bao, Hiroto Kurita, Hidetoshi Shimodaira

    Abstract: The word embedding space in neural models is skewed, and correcting this can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that the word frequencies are uniform; in reality, word frequencies follow a highly non-uniform distribution, known as Zipf's law. Surprisingly, simply performing PCA whi…

    Submitted 1 November, 2024; originally announced November 2024.

    Comments: NeurIPS 2024

  8. arXiv:2409.19919  [pdf, other]

    cs.CL

    Understanding Higher-Order Correlations Among Semantic Components in Embeddings

    Authors: Momose Oyama, Hiroaki Yamagiwa, Hidetoshi Shimodaira

    Abstract: Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-or…

    Submitted 9 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: EMNLP 2024

  9. arXiv:2409.11253  [pdf, other]

    cs.CL

    Norm of Mean Contextualized Embeddings Determines their Variance

    Authors: Hiroaki Yamagiwa, Hidetoshi Shimodaira

    Abstract: Contextualized embeddings vary by context, even for the same token, and form a distribution in the embedding space. To analyze this distribution, we focus on the norm of the mean embedding and the variance of the embeddings. In this study, we first demonstrate that these values follow the well-known formula for variance in statistics and provide an efficient sequential computation method. Then, by…

    Submitted 17 December, 2024; v1 submitted 17 September, 2024; originally announced September 2024.

    Comments: COLING 2025
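
The "well-known formula for variance" mentioned in the abstract above is presumably the standard identity: mean squared norm = squared norm of the mean + variance. The listing truncates the abstract, so the following is only a minimal sketch of that identity on synthetic vectors. The function names and the random data are illustrative assumptions, not taken from the paper, and the paper's sequential computation method is not reproduced here.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(dim)]


def decompose(vectors):
    """Return (mean of squared norms, squared norm of the mean, variance).

    Variance here is the mean squared distance to the mean vector
    (an assumption about the definition; the abstract is truncated).
    """
    n = len(vectors)
    mu = mean_vector(vectors)
    mean_sq_norm = sum(sq_norm(v) for v in vectors) / n
    variance = sum(
        sq_norm([x - m for x, m in zip(v, mu)]) for v in vectors
    ) / n
    return mean_sq_norm, sq_norm(mu), variance


# Synthetic stand-ins for the contextualized embeddings of one token.
random.seed(0)
embeddings = [[random.gauss(1.0, 0.5) for _ in range(8)] for _ in range(100)]

mean_sq, norm_mean_sq, var = decompose(embeddings)
# The identity: mean_sq equals norm_mean_sq + var up to rounding error.
gap = abs(mean_sq - (norm_mean_sq + var))
```

For a fixed mean squared norm, the identity forces a trade-off: a larger norm of the mean embedding leaves less room for variance, which is consistent with the title of the entry.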

  10. arXiv:2406.19287  [pdf, other]

    astro-ph.HE

    Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition

    Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He , et al. (118 additional authors not shown)

    Abstract: We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields the resul…

    Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, accepted for publication in PRL

  11. arXiv:2406.19286  [pdf, other]

    astro-ph.HE

    Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array

    Authors: Telescope Array Collaboration, R. U. Abbasi, Y. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, N. Hayashida, H. He , et al. (118 additional authors not shown)

    Abstract: We use a new method to estimate the injected mass composition of ultrahigh energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale struc…

    Submitted 3 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures, accepted for publication in PRD

  12. arXiv:2406.18094  [pdf, other]

    cs.CL

    Shimo Lab at "Discharge Me!": Discharge Summarization by Prompt-Driven Concatenation of Electronic Health Record Sections

    Authors: Yunzhen He, Hiroaki Yamagiwa, Hidetoshi Shimodaira

    Abstract: In this paper, we present our approach to the shared task "Discharge Me!" at the BioNLP Workshop 2024. The primary goal of this task is to reduce the time and effort clinicians spend on writing detailed notes in the electronic health record (EHR). Participants develop a pipeline to generate the "Brief Hospital Course" and "Discharge Instructions" sections from the EHR. Our approach involves a firs…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: BioNLP @ ACL2024

  13. arXiv:2406.10984  [pdf, other]

    cs.CL

    Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings

    Authors: Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira

    Abstract: Cosine similarity is widely used to measure the similarity between two embeddings, while interpretations based on angle and correlation coefficient are common. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA), and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-t…

    Submitted 17 December, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: COLING 2025
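
The reading of cosine similarity as a sum over axes, described in the abstract above, rests on an elementary fact: once both vectors are scaled to unit length, their cosine similarity is the sum of per-axis products. The sketch below only illustrates that decomposition. The toy vectors stand in for ICA-transformed embeddings (no ICA is performed here), and all names are hypothetical rather than taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def axis_contributions(u, v):
    """Per-axis products of the normalized vectors.

    Their sum equals the cosine similarity of u and v.
    """
    u_hat, v_hat = normalize(u), normalize(v)
    return [a * b for a, b in zip(u_hat, v_hat)]


def cosine(u, v):
    """Cosine similarity computed the usual way, for comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 4-dimensional vectors; each axis plays the role of one component.
u = [0.9, 0.1, -0.2, 0.0]
v = [0.8, -0.3, 0.1, 0.4]

contributions = axis_contributions(u, v)
total = sum(contributions)
```

With interpretable axes, each entry of `contributions` can be inspected on its own, so a high similarity can be attributed to the specific axes that produce it.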

  14. arXiv:2406.08612  [pdf, other]

    astro-ph.HE

    Observation of Declination Dependence in the Cosmic Ray Energy Spectrum

    Authors: The Telescope Array Collaboration, R. U. Abbasi, T. Abu-Zayyad, M. Allen, J. W. Belz, D. R. Bergman, I. Buckland, W. Campbell, B. G. Cheon, K. Endo, A. Fedynitch, T. Fujii, K. Fujisue, K. Fujita, M. Fukushima, G. Furlich, Z. Gerber, N. Globus, W. Hanlon, N. Hayashida, H. He, K. Hibino, R. Higuchi, D. Ikeda, T. Ishii , et al. (101 additional authors not shown)

    Abstract: We report on an observation of the difference between northern and southern skies of the ultrahigh energy cosmic ray energy spectrum with a significance of $\sim 8\sigma$. We use measurements from the two largest experiments: the Telescope Array observing the northern hemisphere and the Pierre Auger Observatory viewing the southern hemisphere. Since the comparison of two measurements fr…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 6 figures

  15. Predicting drug-gene relations via analogy tasks with word embeddings

    Authors: Hiroaki Yamagiwa, Ryoma Hashimoto, Kiwamu Arakane, Ken Murakami, Shou Soeda, Momose Oyama, Yihua Zhu, Mariko Okada, Hidetoshi Shimodaira

    Abstract: Natural language processing (NLP) is utilized in a wide range of fields, where words in text are typically transformed into feature vectors called embeddings. BioConceptVec is a specific example of embeddings tailored for biology, trained on approximately 30 million PubMed abstracts using models such as skip-gram. Generally, word embeddings are known to solve analogy tasks through simple vector ar…

    Submitted 27 May, 2025; v1 submitted 3 June, 2024; originally announced June 2024.

    Journal ref: Sci Rep 15, 17240 (2025)

  16. arXiv:2404.12718  [pdf]

    cs.CV cs.LG

    Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers

    Authors: Hisashi Shimodaira

    Abstract: In this paper, we propose a method to improve prediction accuracy of semantic segmentation methods as follows: (1) construct a neural network that has pre-processing layers based on a convolutional autoencoder ahead of a semantic segmentation network, and (2) train the entire network initialized by the weights of the pre-trained autoencoder. We applied this method to the fully convolutional networ…

    Submitted 9 July, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: The changes from the previous version: References [14] and [17] are added in page 2372. Summary of results and discussion (6) are added in page 2383. The new version has been reviewed by AAIML Journal. Reviewer1: The manuscript presents a solid contribution and is well written. The reviewer2: The work is novel and the results are promising

    Journal ref: Advances in Artificial Intelligence and Machine Learning; Research 4 (2) 2369-2386; Published 29-06-2024

  17. arXiv:2401.06112  [pdf, other]

    cs.CL

    Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings

    Authors: Hiroaki Yamagiwa, Yusuke Takase, Hidetoshi Shimodaira

    Abstract: Word embedding is one of the most important components in natural language processing, but interpreting high-dimensional embeddings remains a challenging problem. To address this problem, Independent Component Analysis (ICA) is identified as an effective solution. ICA-transformed word embeddings reveal interpretable semantic axes; however, the order of these axes is arbitrary. In this study, we f…

    Submitted 9 October, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: EMNLP 2024 Findings (short)

  18. arXiv:2401.05967  [pdf, other]

    cs.CL cs.IR

    Block-Diagonal Orthogonal Relation and Matrix Entity for Knowledge Graph Embedding

    Authors: Yihua Zhu, Hidetoshi Shimodaira

    Abstract: The primary aim of Knowledge Graph embeddings (KGE) is to learn low-dimensional representations of entities and relations for predicting missing facts. While rotation-based methods like RotatE and QuatE perform well in KGE, they face two challenges: limited model flexibility requiring proportional increases in relation size with entity dimension, and difficulties in generalizing the model for high…

    Submitted 2 October, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: EMNLP2024 findings (Long)

  19. arXiv:2309.11852  [pdf, other]

    cs.CL

    Knowledge Sanitization of Large Language Models

    Authors: Yoichi Ishibashi, Hidetoshi Shimodaira

    Abstract: We explore a knowledge sanitization approach to mitigate the privacy concerns associated with large language models (LLMs). LLMs trained on a large corpus of Web data can memorize and potentially reveal sensitive or confidential information, raising critical security concerns. Our technique efficiently fine-tunes these models using the Low-Rank Adaptation (LoRA) method, prompting them to generate…

    Submitted 2 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  20. arXiv:2305.13175  [pdf, other]

    cs.CL

    Discovering Universal Geometry in Embeddings with ICA

    Authors: Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira

    Abstract: This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expres…

    Submitted 2 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 29 pages, EMNLP 2023

  21. arXiv:2305.13015  [pdf, other]

    cs.CL

    3D Rotation and Translation for Hyperbolic Knowledge Graph Embedding

    Authors: Yihua Zhu, Hidetoshi Shimodaira

    Abstract: The main objective of Knowledge Graph (KG) embeddings is to learn low-dimensional representations of entities and relations, enabling the prediction of missing facts. A significant challenge in achieving better KG embeddings lies in capturing relation patterns, including symmetry, antisymmetry, inversion, commutative composition, non-commutative composition, hierarchy, and multiplicity. This study…

    Submitted 3 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 19 pages, EACL2024 main

  22. arXiv:2212.09663  [pdf, other]

    cs.CL

    Norm of Word Embedding Encodes Information Gain

    Authors: Momose Oyama, Sho Yokoi, Hidetoshi Shimodaira

    Abstract: Distributed representations of words encode lexical semantic information, but what type of information is encoded and how? Focusing on the skip-gram with negative-sampling method, we found that the squared norm of static word embedding encodes the information gain conveyed by the word; the information gain is defined by the Kullback-Leibler divergence of the co-occurrence distribution of the word…

    Submitted 2 November, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: 23 pages, EMNLP 2023

  23. arXiv:2211.06229  [pdf, other]

    cs.CL

    Improving word mover's distance by leveraging self-attention matrix

    Authors: Hiroaki Yamagiwa, Sho Yokoi, Hidetoshi Shimodaira

    Abstract: Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt t…

    Submitted 2 November, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Comments: 24 pages, accepted to EMNLP 2023 Findings

  24. arXiv:2205.05115  [pdf, other]

    physics.ao-ph astro-ph.IM hep-ex

    First High-speed Video Camera Observations of a Lightning Flash Associated with a Downward Terrestrial Gamma-ray Flash

    Authors: R. U. Abbasi, M. M. F. Saba, J. W. Belz, P. R. Krehbiel, W. Rison, N. Kieu, D. R. da Silva, Dan Rodeheffer, M. A. Stanley, J. Remington, J. Mazich, R. LeVon, K. Smout, A. Petrizze, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, D. R. Bergman, S. A. Blake, I. Buckland, B. G. Cheon, M. Chikawa, T. Fujii , et al. (127 additional authors not shown)

    Abstract: In this paper, we present the first high-speed video observation of a cloud-to-ground lightning flash and its associated downward-directed Terrestrial Gamma-ray Flash (TGF). The optical emission of the event was observed by a high-speed video camera running at 40,000 frames per second in conjunction with the Telescope Array Surface Detector, Lightning Mapping Array, interferometer, electric-field…

    Submitted 9 August, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    Journal ref: Geophysical Research Letters, 50, e2023GL102958 (2023)

  25. Search for Spatial Correlations of Neutrinos with Ultra-High-Energy Cosmic Rays

    Authors: The ANTARES collaboration, A. Albert, S. Alves, M. André, M. Anghinolfi, M. Ardid, S. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, B. Belhorma, M. Bendahman, V. Bertin, S. Biagi, M. Bissinger, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Brânzaş, R. Bruijn, J. Brunner, J. Busto, B. Caiffi, D. Calvo , et al. (1025 additional authors not shown)

    Abstract: For several decades, the origin of ultra-high-energy cosmic rays (UHECRs) has been an unsolved question of high-energy astrophysics. One approach for solving this puzzle is to correlate UHECRs with high-energy neutrinos, since neutrinos are a direct probe of hadronic interactions of cosmic rays and are not deflected by magnetic fields. In this paper, we present three different approaches for corre…

    Submitted 23 August, 2022; v1 submitted 18 January, 2022; originally announced January 2022.

    Comments: 39 pages, 7 figures, 4 tables; updated source files including xml authorlist

    Report number: FERMILAB-PUB-22-033-AD-PPD-SCD-TD

    Journal ref: ApJ 934 164 (2022)

  26. arXiv:2112.13951  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Improving Nonparametric Classification via Local Radial Regression with an Application to Stock Prediction

    Authors: Ruixing Cao, Akifumi Okuno, Kei Nakagawa, Hidetoshi Shimodaira

    Abstract: For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. Well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which take label average over a ball around the query, are consistent but asymptotically biased particularly for a large radius of the ball. To eradicate such b…

    Submitted 21 July, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: 23pages, 10 figures, first two authors (R. Cao and A. Okuno) contributed equally to this work

  27. arXiv:2111.09962  [pdf, other]

    astro-ph.HE hep-ex physics.ao-ph

    Observation of Variations in Cosmic Ray Single Count Rates During Thunderstorms and Implications for Large-Scale Electric Field Changes

    Authors: R. U. Abbasi, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, R. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon, M. Hayashi , et al. (140 additional authors not shown)

    Abstract: We present the first observation by the Telescope Array Surface Detector (TASD) of the effect of thunderstorms on the development of cosmic ray single count rate intensity over a 700 km$^{2}$ area. Observations of variations in the secondary low-energy cosmic ray counting rate, using the TASD, allow us to study the electric field inside thunderstorms, on a large scale, as it progresses on top of t…

    Submitted 18 November, 2021; originally announced November 2021.

  28. arXiv:2110.14827  [pdf, other]

    astro-ph.HE

    Indications of a Cosmic Ray Source in the Perseus-Pisces Supercluster

    Authors: Telescope Array Collaboration, R. U. Abbasi, T. Abu-Zayyad, M. Allen, Y. Arai, R. Arimura, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, I. Buckland, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, R. Fukushima, G. Furlich, N. Globus, R. Gonzalez, W. Hanlon , et al. (135 additional authors not shown)

    Abstract: The Telescope Array Collaboration has observed an excess of events with $E \ge 10^{19.4} ~{\rm eV}$ in the data which is centered at (RA, dec) = ($19^\circ$, $35^\circ$). This is near the center of the Perseus-Pisces supercluster (PPSC). The PPSC is about $70 ~{\rm Mpc}$ distant and is the closest supercluster in the Northern Hemisphere (other than the Virgo supercluster of which we are a part). A…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: 8 pages, 4 figures, 1 table

  29. arXiv:2105.08585  [pdf, other]

    cs.CL

    Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings

    Authors: Masahiro Naito, Sho Yokoi, Geewook Kim, Hidetoshi Shimodaira

    Abstract: It is well-known that typical word embedding methods such as Word2Vec and GloVe have the property that the meaning can be composed by adding up the embeddings (additive compositionality). Several theories have been proposed to explain additive compositionality, but the following questions remain unanswered: (Q1) The assumptions of those theories do not hold for the practical word embedding. (Q2) O…

    Submitted 19 December, 2022; v1 submitted 18 May, 2021; originally announced May 2021.

    Comments: 13pages; v1: accepted at ACL-IJCNLP 2021 Student Research Workshop; v2: minor revision

    MSC Class: 68T50

  30. arXiv:2103.01750  [pdf, other]

    cs.SI physics.data-an physics.soc-ph stat.ME

    Nonparametric estimation of the preferential attachment function from one network snapshot

    Authors: Thong Pham, Paul Sheridan, Hidetoshi Shimodaira

    Abstract: Preferential attachment is commonly invoked to explain the emergence of those heavy-tailed degree distributions characteristic of growing network representations of diverse real-world phenomena. Experimentally confirming this hypothesis in real-world growing networks is an important frontier in network science research. Conventional preferential attachment estimation methods require that a growing…

    Submitted 21 June, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: 26 pages, 11 figures

  31. arXiv:2103.01086  [pdf, ps, other]

    astro-ph.IM physics.ins-det

    Surface detectors of the TAx4 experiment

    Authors: Telescope Array Collaboration, R. U. Abbasi, M. Abe, T. Abu-Zayyad, M. Allen, Y. Arai, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, R. Fukushima, G. Furlich, W. Hanlon, M. Hayashi, N. Hayashida, K. Hibino , et al. (124 additional authors not shown)

    Abstract: Telescope Array (TA) is the largest ultrahigh energy cosmic-ray (UHECR) observatory in the Northern Hemisphere. It explores the origin of UHECRs by measuring their energy spectrum, arrival-direction distribution, and mass composition using a surface detector (SD) array covering approximately 700 km$^2$ and fluorescence detector (FD) stations. TA has found evidence for a cluster of cosmic rays with…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: 26 pages, 17 figures, submitted to Nuclear Inst. and Methods in Physics Research, A

  32. arXiv:2009.14327  [pdf, other]

    physics.ao-ph astro-ph.HE hep-ex

    Observations of the Origin of Downward Terrestrial Gamma-Ray Flashes

    Authors: J. W. Belz, P. R. Krehbiel, J. Remington, M. A. Stanley, R. U. Abbasi, R. LeVon, W. Rison, D. Rodeheffer, the Telescope Array Scientific Collaboration: T. Abu-Zayyad, M. Allen, E. Barcikowski, D. R. Bergman, S. A. Blake, M. Byrne, R. Cady, B. G. Cheon, M. Chikawa, A. di Matteo, T. Fujii, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich , et al. (116 additional authors not shown)

    Abstract: In this paper we report the first close, high-resolution observations of downward-directed terrestrial gamma-ray flashes (TGFs) detected by the large-area Telescope Array cosmic ray observatory, obtained in conjunction with broadband VHF interferometer and fast electric field change measurements of the parent discharge. The results show that the TGFs occur during strong initial breakdown pulses (I…

    Submitted 12 October, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: Typo fixed and reference added. Manuscript is 36 pages. Supplemental Information is 42 pages. This paper is to be published in the Journal of Geophysical Research: Atmospheres. Online data repository: Open Science Framework DOI: 10.17605/OSF.IO/Z3XDA

  33. Search for Large-scale Anisotropy on Arrival Directions of Ultra-high-energy Cosmic Rays Observed with the Telescope Array Experiment

    Authors: Telescope Array Collaboration, R. U. Abbasi, M. Abe, T. Abu-Zayyad, M. Allen, R. Azuma, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, A. di Matteo, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, W. Hanlon, M. Hayashi, N. Hayashida, K. Hibino , et al. (121 additional authors not shown)

    Abstract: Motivated by the detection of a significant dipole structure in the arrival directions of ultrahigh-energy cosmic rays above 8 EeV reported by the Pierre Auger Observatory (Auger), we search for a large-scale anisotropy using data collected with the surface detector array of the Telescope Array Experiment (TA). With 11 years of TA data, a dipole structure in a projection of the right ascension is…

    Submitted 27 July, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

    Comments: 6 pages, 3 figures, 1 table, Proofed title. Added journal reference and DOI

    Journal ref: The Astrophysical Journal Letters 898, L28 (2020)

  34. Measurement of the Proton-Air Cross Section with Telescope Array's Black Rock Mesa and Long Ridge Fluorescence Detectors, and Surface Array in Hybrid Mode

    Authors: R. U. Abbasi, M. Abe, T. Abu-Zayyad, M. Allen, R. Azuma, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, A. di Matteo, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, W. Hanlon, M. Hayashi, N. Hayashida, K. Hibino, R. Higuchi , et al. (120 additional authors not shown)

    Abstract: Ultra high energy cosmic rays provide the highest known energy source in the universe to measure proton cross sections. Though conditions for collecting such data are less controlled than an accelerator environment, current generation cosmic ray observatories have large enough exposures to collect significant statistics for a reliable measurement for energies above what can be attained in the lab.…

    Submitted 8 June, 2020; originally announced June 2020.

    Journal ref: Phys. Rev. D 102, 062004 (2020)

  35. arXiv:2005.07312  [pdf, other]

    astro-ph.HE

    Evidence for a Supergalactic Structure of Magnetic Deflection Multiplets of Ultra-High Energy Cosmic Rays

    Authors: Telescope Array Collaboration, R. U. Abbasi, M. Abe, T. Abu-Zayyad, M. Allen, R. Azuma, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, A. di Matteo, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, W. Hanlon, M. Hayashi, N. Hayashida, K. Hibino , et al. (119 additional authors not shown)

    Abstract: Evidence for a large-scale supergalactic cosmic ray multiplet (arrival directions correlated with energy) structure is reported for ultra-high energy cosmic ray (UHECR) energies above 10$^{19}$ eV using seven years of data from the Telescope Array (TA) surface detector and updated to 10 years. Previous energy-position correlation studies have made assumptions regarding magnetic field shapes and st…

    Submitted 2 July, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

  36. arXiv:2005.00670  [pdf, other]

    cs.LG cs.CL cs.CV cs.HC stat.ML

    Stochastic Neighbor Embedding of Multimodal Relational Data for Image-Text Simultaneous Visualization

    Authors: Morihiro Mizutani, Akifumi Okuno, Geewook Kim, Hidetoshi Shimodaira

    Abstract: Multimodal relational data analysis has become of increasing importance in recent years, for exploring across different domains of data, such as images and their text tags obtained from social networking services (e.g., Flickr). A variety of data analysis methods have been developed for visualization; to give an example, t-Stochastic Neighbor Embedding (t-SNE) computes low-dimensional feature vect…

    Submitted 1 May, 2020; originally announced May 2020.

    Comments: 20 pages, 23 figures

  37. arXiv:2002.03054  [pdf, other]

    stat.ML cs.LG

    Extrapolation Towards Imaginary $0$-Nearest Neighbour and Its Improved Convergence Rate

    Authors: Akifumi Okuno, Hidetoshi Shimodaira

    Abstract: $k$-nearest neighbour ($k$-NN) is one of the simplest and most widely-used methods for supervised classification, that predicts a query's label by taking weighted ratio of observed labels of $k$ objects nearest to the query. The weights and the parameter $k \in \mathbb{N}$ regulate its bias-variance trade-off, and the trade-off implicitly affects the convergence rate of the excess risk for the $k…

    Submitted 10 November, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

    Comments: 27 pages (with Supplementary Material), 4 figures, NeurIPS2020

  38. Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder

    Authors: JinHong Lu, Hiroshi Shimodaira

    Abstract: This study investigates the direct use of speech waveforms to predict head motion for speech-driven head-motion synthesis, whereas the use of spectral features such as MFCC as basic input features together with additional features such as energy and F0 is common in the literature. We show that, rather than combining different features that originate from waveforms, it is more effective to use wave…

    Submitted 2 November, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

    Comments: head motion synthesis, speech-driven animation, deep canonically correlated autoencoder

    Journal ref: Proc. Interspeech 2020, 1301-1305

  39. arXiv:1910.06134  [pdf, other]

    cs.LG stat.ML

    More Powerful Selective Kernel Tests for Feature Selection

    Authors: Jen Ning Lim, Makoto Yamada, Wittawat Jitkrittum, Yoshikazu Terada, Shigeyuki Matsui, Hidetoshi Shimodaira

    Abstract: Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning on the selection procedure to account for how we have used the data to generate our hypotheses, and prevent information to be used again after selection. Many…

    Submitted 29 February, 2020; v1 submitted 14 October, 2019; originally announced October 2019.

    Comments: Accepted to AISTATS 2020

  40. arXiv:1910.00213  [pdf, other]

    physics.soc-ph cs.SI physics.data-an stat.AP stat.CO

    Joint Estimation of the Non-parametric Transitivity and Preferential Attachment Functions in Scientific Co-authorship Networks

    Authors: Masaaki Inoue, Thong Pham, Hidetoshi Shimodaira

    Abstract: We propose a statistical method to estimate simultaneously the non-parametric transitivity and preferential attachment functions in a growing network, in contrast to conventional methods that either estimate each function in isolation or assume some functional form for them. Our model is shown to be a good fit to two real-world co-authorship networks and be able to bring to light intriguing detail…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: 24 pages, 10 figures

  41. arXiv:1908.02573  [pdf, other]

    cs.SI cs.LG stat.ME stat.ML

    Hyperlink Regression via Bregman Divergence

    Authors: Akifumi Okuno, Hidetoshi Shimodaira

    Abstract: A collection of $U \: (\in \mathbb{N})$ data vectors is called a $U$-tuple, and the association strength among the vectors of a tuple is termed as the \emph{hyperlink weight}, that is assumed to be symmetric with respect to permutation of the entries in the index. We herein propose Bregman hyperlink regression (BHLR), which learns a user-specified symmetric similarity function such that it predict…

    Submitted 28 March, 2020; v1 submitted 21 July, 2019; originally announced August 2019.

    Comments: 41 pages, 14 figures

  42. arXiv:1907.10585  [pdf, other]

    eess.SP cs.LG eess.AS

    A neural network based post-filter for speech-driven head motion synthesis

    Authors: JinHong Lu, Hiroshi Shimodaira

    Abstract: Despite the fact that neural networks are widely used for speech-driven head motion synthesis, it is well-known that the output of neural networks is noisy or discontinuous due to the limited capability of deep neural networks in predicting human motion. Thus, post-processing is required to obtain smooth head motion trajectories for animation. It is common to apply a linear filter or consider keyf…

    Submitted 24 July, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

  43. arXiv:1906.07893  [pdf]

    cs.CV

    Extended probabilistic Rand index and the adjustable moving window-based pixel-pair sampling method

    Authors: Hisashi Shimodaira

    Abstract: The probabilistic Rand (PR) index has the following three problems: It lacks variations in its value over images; the normalized probabilistic Rand (NPR) index to address this is theoretically unclear, and the sampling method of pixel-pairs was not proposed concretely. In this paper, we propose methods for solving these problems. First, we propose extended probabilistic Rand (EPR) index that consi…

    Submitted 18 June, 2019; originally announced June 2019.

    Comments: 9 pages, 9 figures and 9 tables

  44. arXiv:1905.10573  [pdf, other]

    stat.ME

    Selective inference after feature selection via multiscale bootstrap

    Authors: Yoshikazu Terada, Hidetoshi Shimodaira

    Abstract: It is common to show the confidence intervals or $p$-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature selection, and thus they have difficulties in ha…

    Submitted 31 May, 2022; v1 submitted 25 May, 2019; originally announced May 2019.

    Comments: The article was accepted for publication in Annals of the Institute of Statistical Mathematics (http://www.ism.ac.jp/editsec/aism/). The title has changed (The old title is "Selective inference after variable selection via multiscale bootstrap"). 27 pages, 11 figures

  45. Search for Ultra-High-Energy Neutrinos with the Telescope Array Surface Detector

    Authors: R. U. Abbasi, M. Abe, T. Abu-Zayyad, M. Allen, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, A. di Matteo, T. Fujii, K. Fujisue, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, W. Hanlon, M. Hayashi, Y. Hayashi, N. Hayashida, K. Hibino, K. Honda, et al. (112 additional authors not shown)

    Abstract: We present an upper limit on the flux of ultra-high-energy down-going neutrinos for $E > 10^{18}\ \mbox{eV}$ derived with the nine years of data collected by the Telescope Array surface detector (05-11-2008 -- 05-10-2017). The method is based on the multivariate analysis technique, so-called Boosted Decision Trees (BDT). Proton-neutrino classifier is built upon 16 observables related to both the p…

    Submitted 12 May, 2020; v1 submitted 9 May, 2019; originally announced May 2019.

    Comments: 10 pages, 4 figures, accepted to JETP

    Journal ref: JETP Vol. 158 (8(2)) (2020)

  46. Search for point sources of ultra-high energy photons with the Telescope Array surface detector

    Authors: Telescope Array Collaboration, R. U. Abbasi, M. Abe, T. Abu-Zayyad, M. Allen, R. Azuma, E. Barcikowski, J. W. Belz, D. R. Bergman, S. A. Blake, R. Cady, B. G. Cheon, J. Chiba, M. Chikawa, A. diMatteo, T. Fujii, K. Fujita, R. Fujiwara, M. Fukushima, G. Furlich, W. Hanlon, M. Hayashi, Y. Hayashi, N. Hayashida, K. Hibino, et al. (114 additional authors not shown)

    Abstract: The surface detector (SD) of the Telescope Array (TA) experiment allows one to indirectly detect photons with energies of order $10^{18}$ eV and higher and to separate photons from the cosmic-ray background. In this paper we present the results of a blind search for point sources of ultra-high energy (UHE) photons in the Northern sky using the TA SD data. The photon-induced extensive air showers (…

    Submitted 9 March, 2020; v1 submitted 30 March, 2019; originally announced April 2019.

    Comments: accepted to MNRAS, 11 pages, 4 figures, 2 tables; results in text-file format are supplemented to paper source

    Report number: INR-TH-2019-005

    Journal ref: MNRAS 492 (2020), 3984

  47. arXiv:1902.10409  [pdf, other]

    cs.LG stat.ML

    Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities

    Authors: Geewook Kim, Akifumi Okuno, Kazuki Fukui, Hidetoshi Shimodaira

    Abstract: We propose $\textit{weighted inner product similarity}$ (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and indefinite kerne…

    Submitted 1 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: 8 pages, 2 figures, IJCAI 2019

  48. arXiv:1902.08440  [pdf, other]

    stat.ML cs.LG

    Robust Graph Embedding with Noisy Link Weights

    Authors: Akifumi Okuno, Hidetoshi Shimodaira

    Abstract: We propose $β$-graph embedding for robustly learning feature vectors from data vectors and noisy link weights. A newly introduced empirical moment $β$-score reduces the influence of contamination and robustly measures the difference between the underlying correct expected weights of links and the specified generative model. The proposed method is computationally tractable; we employ a minibatch-ba…

    Submitted 22 February, 2019; originally announced February 2019.

    Comments: 14 pages (with Supplementary Material), 3 figures, AISTATS2019

  49. arXiv:1902.07954  [pdf, other]

    stat.ME math.ST stat.ML

    An information criterion for auxiliary variable selection in incomplete data analysis

    Authors: Shinpei Imori, Hidetoshi Shimodaira

    Abstract: Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint distribution of primary and auxiliary variables, it is possible to improve the estimation of parametr…

    Submitted 9 March, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

  50. arXiv:1902.04964  [pdf, other]

    stat.AP q-bio.PE

    Selective Inference for Testing Trees and Edges in Phylogenetics

    Authors: Hidetoshi Shimodaira, Yoshikazu Terada

    Abstract: Selective inference is considered for testing trees and edges in phylogenetic tree selection from molecular sequences. This improves the previously proposed approximately unbiased test by adjusting the selection bias when testing many trees and edges at the same time. The newly proposed selective inference $p$-value is useful for testing selected edges to claim that they are significantly supporte…

    Submitted 24 May, 2019; v1 submitted 13 February, 2019; originally announced February 2019.

    Journal ref: Frontiers in Ecology and Evolution 7:174, 2019