-
Likelihood Variance as Text Importance for Resampling Texts to Map Language Models
Authors:
Momose Oyama,
Ryo Kishino,
Hiroaki Yamagiwa,
Hidetoshi Shimodaira
Abstract:
We address the computational cost of constructing a model map, which embeds diverse language models into a common space for comparison via KL divergence. The map relies on log-likelihoods over a large text set, making the cost proportional to the number of texts. To reduce this cost, we propose a resampling method that selects important texts with weights proportional to the variance of log-likelihoods across models for each text. Our method significantly reduces the number of required texts while preserving the accuracy of KL divergence estimates. Experiments show that it achieves comparable performance to uniform sampling with about half as many texts, and also facilitates efficient incorporation of new models into an existing map. These results enable scalable and efficient construction of language model maps.
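A minimal sketch of the resampling idea described in this abstract, assuming a matrix of log-likelihoods with one row per model and one column per text. The function name `resample_texts`, sampling with replacement, and the inverse-probability weights are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def resample_texts(loglik, n_samples, seed=0):
    """Draw text indices with probability proportional to the variance of
    log-likelihoods across models (rows = models, columns = texts)."""
    loglik = np.asarray(loglik, dtype=float)
    n_texts = loglik.shape[1]
    variance = loglik.var(axis=0)          # one variance per text
    probs = variance / variance.sum()      # sampling distribution over texts
    rng = np.random.default_rng(seed)
    idx = rng.choice(n_texts, size=n_samples, replace=True, p=probs)
    # Assumption: inverse-probability weights to offset the non-uniform draw.
    weights = 1.0 / (n_texts * probs[idx])
    return idx, weights

# Toy data: 5 hypothetical models scored on 100 hypothetical texts.
toy_loglik = np.random.default_rng(1).normal(size=(5, 100))
idx, weights = resample_texts(toy_loglik, n_samples=20)
```

Texts on which the models disagree most are drawn most often; the weights are one plausible way to keep weighted averages comparable to those from uniform sampling.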
Submitted 21 May, 2025;
originally announced May 2025.
-
Revealing Language Model Trajectories via Kullback-Leibler Divergence
Authors:
Ryo Kishino,
Yusuke Takase,
Momose Oyama,
Hiroaki Yamagiwa,
Hidetoshi Shimodaira
Abstract:
A recently proposed method enables efficient estimation of the KL divergence between language models, including models with different architectures, by assigning coordinates based on log-likelihood vectors. To better understand the behavior of this metric, we systematically evaluate KL divergence across a wide range of conditions using publicly available language models. Our analysis covers comparisons between pretraining checkpoints, fine-tuned and base models, and layers via the logit lens. We find that trajectories of language models, as measured by KL divergence, exhibit a spiral structure during pretraining and thread-like progressions across layers. Furthermore, we show that, in terms of diffusion exponents, model trajectories in the log-likelihood space are more constrained than those in weight space.
Submitted 21 May, 2025;
originally announced May 2025.
-
Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering
Authors:
Yihua Zhu,
Qianying Liu,
Akiko Aizawa,
Hidetoshi Shimodaira
Abstract:
Knowledge Base Question Answering (KBQA) aims to answer natural language questions using structured knowledge from KBs. While LLM-only approaches offer generalization, they suffer from outdated knowledge, hallucinations, and lack of transparency. Chain-based KG-RAG methods address these issues by incorporating external KBs, but are limited to simple chain-structured questions due to the absence of planning and logical structuring. Inspired by semantic parsing methods, we propose PDRR: a four-stage framework consisting of Predict, Decompose, Retrieve, and Reason. Our method first predicts the question type and decomposes the question into structured triples. It then retrieves relevant information from KBs and guides the LLM as an agent to reason over and complete the decomposed triples. Experimental results demonstrate that PDRR consistently outperforms existing methods across various LLM backbones and achieves superior performance on both chain-structured and non-chain complex questions.
Submitted 20 May, 2025;
originally announced May 2025.
-
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability
Authors:
Yunzhen He,
Yusuke Takase,
Yoichi Ishibashi,
Hidetoshi Shimodaira
Abstract:
Large Language Models (LLMs) are increasingly being used in real-world applications. However, concerns about the reliability of the content they generate persist, as it frequently deviates from factual correctness or exhibits deficiencies in logical reasoning. This paper proposes a novel decoding strategy aimed at enhancing both factual accuracy and inferential reasoning without requiring any modifications to the architecture or pre-trained parameters of LLMs. Our approach adjusts next-token probabilities by analyzing the trajectory of logits from lower to higher layers in Transformers and applying linear regression. We find that this Decoding by Logit Trajectory-based approach (DeLTa) effectively reinforces factuality and reasoning while mitigating incorrect generation. Experiments on TruthfulQA demonstrate that DeLTa attains up to a 4.9% improvement over the baseline. Furthermore, it enhances performance by up to 8.1% on StrategyQA and 7.3% on GSM8K, both of which demand strong reasoning capabilities.
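A rough sketch of what regressing on a logit trajectory could look like, assuming per-layer logits for a single position are already available as an array. The function names, the choice of fitting all layers, and the extrapolation target are hypothetical; the abstract does not specify how DeLTa selects layers or combines the prediction with the original distribution.

```python
import numpy as np

def extrapolate_logits(layer_logits, target_layer):
    """layer_logits: (n_layers, vocab) logits from successive layers.
    Returns per-token logits predicted at `target_layer` by a straight-line fit."""
    layer_logits = np.asarray(layer_logits, dtype=float)
    layers = np.arange(layer_logits.shape[0])
    # Fit logit = slope * layer + intercept independently for each token.
    slope, intercept = np.polyfit(layers, layer_logits, deg=1)
    return slope * target_layer + intercept

def softmax(z):
    z = z - z.max()                        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy trajectory: 3 layers, 3 tokens, each token's logit exactly linear in depth.
toy_logits = np.array([[0.0, 1.0, 2.0],
                       [0.5, 1.0, 1.5],
                       [1.0, 1.0, 1.0]])
predicted = extrapolate_logits(toy_logits, target_layer=3)
probs = softmax(predicted)
```

In this toy case the token whose logit rises with depth ends up with the highest extrapolated probability.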
Submitted 4 March, 2025;
originally announced March 2025.
-
Mapping 1,000+ Language Models via the Log-Likelihood Vector
Authors:
Momose Oyama,
Hiroaki Yamagiwa,
Yusuke Takase,
Hidetoshi Shimodaira
Abstract:
To compare autoregressive language models at scale, we propose using log-likelihood vectors computed on a predefined text set as model features. This approach has a solid theoretical basis: when treated as model coordinates, their squared Euclidean distance approximates the Kullback-Leibler divergence of text-generation probabilities. Our method is highly scalable, with computational cost growing linearly in both the number of models and text samples, and is easy to implement as the required features are derived from cross-entropy loss. Applying this method to over 1,000 language models, we constructed a "model map," providing a new perspective on large-scale model analysis.
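A small sketch of treating log-likelihood vectors as model coordinates, assuming a matrix with one row per model and one column per text. The double centering and the omission of any scaling constant are assumptions made for illustration; the abstract only states that the squared Euclidean distance approximates the KL divergence.

```python
import numpy as np

def squared_distance_matrix(loglik):
    """Pairwise squared Euclidean distances between models' log-likelihood vectors."""
    L = np.asarray(loglik, dtype=float)
    # Assumption: remove per-model and per-text offsets before comparing.
    Q = L - L.mean(axis=1, keepdims=True)
    Q = Q - Q.mean(axis=0, keepdims=True)
    sq = (Q ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (Q @ Q.T)
    return np.maximum(D, 0.0)              # clip tiny negatives from rounding

# Toy data: 4 hypothetical models, 50 hypothetical texts.
toy_loglik = np.random.default_rng(0).normal(size=(4, 50))
D = squared_distance_matrix(toy_loglik)
```

Each row of the matrix costs one pass of a model over the text set, which matches the linear scaling in models and texts that the abstract mentions.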
Submitted 31 May, 2025; v1 submitted 22 February, 2025;
originally announced February 2025.
-
Quantifying Lexical Semantic Shift via Unbalanced Optimal Transport
Authors:
Ryo Kishino,
Hiroaki Yamagiwa,
Ryo Nagata,
Sho Yokoi,
Hidetoshi Shimodaira
Abstract:
Lexical semantic change detection aims to identify shifts in word meanings over time. While existing methods using embeddings from a diachronic corpus pair estimate the degree of change for target words, they offer limited insight into changes at the level of individual usage instances. To address this, we apply Unbalanced Optimal Transport (UOT) to sets of contextualized word embeddings, capturing semantic change through the excess and deficit in the alignment between usage instances. In particular, we propose Sense Usage Shift (SUS), a measure that quantifies changes in the usage frequency of a word sense at each usage instance. By leveraging SUS, we demonstrate that several challenges in semantic change detection can be addressed in a unified manner, including quantifying instance-level semantic change and word-level tasks such as measuring the magnitude of semantic change and the broadening or narrowing of meaning.
Submitted 31 May, 2025; v1 submitted 17 December, 2024;
originally announced December 2024.
-
Zipfian Whitening
Authors:
Sho Yokoi,
Han Bao,
Hiroto Kurita,
Hidetoshi Shimodaira
Abstract:
The word embedding space in neural models is skewed, and correcting this can improve task performance. We point out that most approaches for modeling, correcting, and measuring the symmetry of an embedding space implicitly assume that the word frequencies are uniform; in reality, word frequencies follow a highly non-uniform distribution, known as Zipf's law. Surprisingly, simply performing PCA whitening weighted by the empirical word frequency that follows Zipf's law significantly improves task performance, surpassing established baselines. From a theoretical perspective, both our approach and existing methods can be clearly categorized: word representations are distributed according to an exponential family with either uniform or Zipfian base measures. By adopting the latter approach, we can naturally emphasize informative low-frequency words in terms of their vector norm, which becomes evident from the information-geometric perspective, and in terms of the loss functions for imbalanced classification. Additionally, our theory corroborates that popular natural language processing methods, such as skip-gram negative sampling, WhiteningBERT, and headless language models, work well just because their word embeddings encode the empirical word frequency into the underlying probabilistic model.
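A minimal sketch of frequency-weighted PCA whitening as described in this abstract, assuming an embedding matrix with one row per word and a vector of empirical word frequencies. The function name `zipfian_whiten` and the toy 1/rank frequencies are illustrative assumptions.

```python
import numpy as np

def zipfian_whiten(X, freq):
    """Whiten embeddings X (words x dims) using word-frequency weights."""
    X = np.asarray(X, dtype=float)
    p = np.asarray(freq, dtype=float)
    p = p / p.sum()                        # empirical word probabilities
    mu = p @ X                             # frequency-weighted mean
    Xc = X - mu
    cov = (Xc * p[:, None]).T @ Xc         # frequency-weighted covariance
    eigval, eigvec = np.linalg.eigh(cov)
    # Scale each principal direction by 1/sqrt(eigenvalue).
    return Xc @ (eigvec / np.sqrt(eigval))

# Toy data: 200 hypothetical words with frequencies proportional to 1/rank.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
freq = 1.0 / np.arange(1, 201)
Z = zipfian_whiten(X, freq)
p = freq / freq.sum()
```

Setting all frequencies equal recovers ordinary PCA whitening, which corresponds to the uniform assumption the abstract criticizes.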
Submitted 1 November, 2024;
originally announced November 2024.
-
Understanding Higher-Order Correlations Among Semantic Components in Embeddings
Authors:
Momose Oyama,
Hiroaki Yamagiwa,
Hidetoshi Shimodaira
Abstract:
Independent Component Analysis (ICA) offers interpretable semantic components of embeddings. While ICA theory assumes that embeddings can be linearly decomposed into independent components, real-world data often do not satisfy this assumption. Consequently, non-independencies remain between the estimated components, which ICA cannot eliminate. We quantified these non-independencies using higher-order correlations and demonstrated that when the higher-order correlation between two components is large, it indicates a strong semantic association between them, along with many words sharing common meanings with both components. The entire structure of non-independencies was visualized using a maximum spanning tree of semantic components. These findings provide deeper insights into embeddings through ICA.
Submitted 9 October, 2024; v1 submitted 29 September, 2024;
originally announced September 2024.
-
Norm of Mean Contextualized Embeddings Determines their Variance
Authors:
Hiroaki Yamagiwa,
Hidetoshi Shimodaira
Abstract:
Contextualized embeddings vary by context, even for the same token, and form a distribution in the embedding space. To analyze this distribution, we focus on the norm of the mean embedding and the variance of the embeddings. In this study, we first demonstrate that these values follow the well-known formula for variance in statistics and provide an efficient sequential computation method. Then, by observing embeddings from intermediate layers of several Transformer models, we found a strong trade-off relationship between the norm and the variance: as the mean embedding becomes closer to the origin, the variance increases. This trade-off is likely influenced by the layer normalization mechanism used in Transformer models. Furthermore, when the sets of token embeddings are treated as clusters, we show that the variance of the entire embedding set can theoretically be decomposed into the within-cluster variance and the between-cluster variance. We found experimentally that as the layers of Transformer models deepen, the embeddings move farther from the origin, the between-cluster variance relatively decreases, and the within-cluster variance relatively increases. These results are consistent with existing studies on the anisotropy of the embedding spaces across layers.
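The two identities mentioned in this abstract can be checked numerically. The sketch below uses random vectors in place of real contextualized embeddings and hypothetical cluster labels in place of token identities; the function names are assumptions.

```python
import numpy as np

def variance_terms(X):
    """Return (mean squared norm, squared norm of the mean, variance) of rows of X."""
    mean = X.mean(axis=0)
    mean_sq_norm = (X ** 2).sum(axis=1).mean()
    variance = ((X - mean) ** 2).sum(axis=1).mean()
    return mean_sq_norm, (mean ** 2).sum(), variance

def within_between(X, labels):
    """Split the total variance into within-cluster and between-cluster parts."""
    n = len(X)
    grand_mean = X.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        within += ((Xc - mc) ** 2).sum() / n
        between += len(Xc) * ((mc - grand_mean) ** 2).sum() / n
    return within, between

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, size=(60, 5))      # stand-in for embeddings
labels = np.repeat([0, 1, 2], 20)          # stand-in for token identities
mean_sq_norm, norm_of_mean_sq, variance = variance_terms(X)
within, between = within_between(X, labels)
```

Here `mean_sq_norm` equals `norm_of_mean_sq + variance`, and `variance` equals `within + between`, up to floating-point error. With the mean squared norm held roughly fixed, the first identity is what makes a smaller mean norm go together with a larger variance.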
Submitted 17 December, 2024; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Isotropy of cosmic rays beyond $10^{20}$ eV favors their heavy mass composition
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He,
et al. (118 additional authors not shown)
Abstract:
We report an estimation of the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The composition is inferred from an energy-dependent sky distribution of UHECR events observed by the Telescope Array surface detector by comparing it to the Large Scale Structure of the local Universe. In the case of negligible extra-galactic magnetic fields the results are consistent with a relatively heavy injected composition at E ~ 10 EeV that becomes lighter up to E ~ 100 EeV, while the composition at E > 100 EeV is very heavy. The latter is true even in the presence of highest experimentally allowed extra-galactic magnetic fields, while the composition at lower energies can be light if a strong EGMF is present. The effect of the uncertainty in the galactic magnetic field on these results is subdominant.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Mass composition of ultra-high energy cosmic rays from distribution of their arrival directions with the Telescope Array
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
Y. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
N. Hayashida,
H. He,
et al. (118 additional authors not shown)
Abstract:
We use a new method to estimate the injected mass composition of ultra-high energy cosmic rays (UHECRs) at energies higher than 10 EeV. The method is based on comparison of the energy-dependent distribution of cosmic ray arrival directions as measured by the Telescope Array experiment (TA) with that calculated in a given putative model of UHECR under the assumption that sources trace the large-scale structure (LSS) of the Universe. As we report in the companion letter, the TA data show large deflections with respect to the LSS which can be explained, assuming small extra-galactic magnetic fields (EGMF), by an intermediate composition changing to a heavy one (iron) in the highest energy bin. Here we show that these results are robust to uncertainties in UHECR injection spectra, the energy scale of the experiment and galactic magnetic fields (GMF). The assumption of weak EGMF, however, strongly affects this interpretation at all but the highest energies E > 100 EeV, where the remarkable isotropy of the data implies a heavy injected composition even in the case of strong EGMF. This result also holds if UHECR sources are as rare as $2 \times 10^{-5}$ Mpc$^{-3}$, which is the conservative lower limit for the source number density.
Submitted 3 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Shimo Lab at "Discharge Me!": Discharge Summarization by Prompt-Driven Concatenation of Electronic Health Record Sections
Authors:
Yunzhen He,
Hiroaki Yamagiwa,
Hidetoshi Shimodaira
Abstract:
In this paper, we present our approach to the shared task "Discharge Me!" at the BioNLP Workshop 2024. The primary goal of this task is to reduce the time and effort clinicians spend on writing detailed notes in the electronic health record (EHR). Participants develop a pipeline to generate the "Brief Hospital Course" and "Discharge Instructions" sections from the EHR. Our approach involves a first step of extracting the relevant sections from the EHR. We then add explanatory prompts to these sections and concatenate them with separate tokens to create the input text. To train a text generation model, we perform LoRA fine-tuning on the ClinicalT5-large model. On the final test data, our approach achieved a ROUGE-1 score of $0.394$, which is comparable to the top solutions.
Submitted 26 June, 2024;
originally announced June 2024.
-
Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings
Authors:
Hiroaki Yamagiwa,
Momose Oyama,
Hidetoshi Shimodaira
Abstract:
Cosine similarity is widely used to measure the similarity between two embeddings, while interpretations based on angle and correlation coefficient are common. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA), and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-transformed embeddings exhibit sparsity, enhancing the interpretability of each axis, and the semantic similarity defined by the product of the components represents the shared meaning between the two embeddings along each axis. The effectiveness of this approach is demonstrated through intuitive numerical examples and thorough numerical experiments. By deriving the probability distributions that govern each component and the product of components, we propose a method for selecting statistically significant axes.
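The decomposition described in this abstract can be illustrated with a few lines of code. The sketch assumes the two vectors stand in for ICA-transformed embeddings (no ICA is run here), and the function name `axiswise_similarity` is hypothetical.

```python
import numpy as np

def axiswise_similarity(x, y):
    """Per-axis products of the normalized vectors; their sum is the cosine similarity."""
    x_hat = x / np.linalg.norm(x)
    y_hat = y / np.linalg.norm(y)
    return x_hat * y_hat

# Toy vectors standing in for two ICA-transformed embeddings.
x = np.array([3.0, 0.0, 4.0])
y = np.array([4.0, 0.0, 3.0])
per_axis = axiswise_similarity(x, y)
cosine = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
```

Here `per_axis` is `[0.48, 0.0, 0.48]` and sums to the cosine similarity of 0.96; an axis contributes only when both embeddings have a nonzero component on it, which is why sparsity aids interpretation.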
Submitted 17 December, 2024; v1 submitted 16 June, 2024;
originally announced June 2024.
-
Observation of Declination Dependence in the Cosmic Ray Energy Spectrum
Authors:
The Telescope Array Collaboration,
R. U. Abbasi,
T. Abu-Zayyad,
M. Allen,
J. W. Belz,
D. R. Bergman,
I. Buckland,
W. Campbell,
B. G. Cheon,
K. Endo,
A. Fedynitch,
T. Fujii,
K. Fujisue,
K. Fujita,
M. Fukushima,
G. Furlich,
Z. Gerber,
N. Globus,
W. Hanlon,
N. Hayashida,
H. He,
K. Hibino,
R. Higuchi,
D. Ikeda,
T. Ishii,
et al. (101 additional authors not shown)
Abstract:
We report on an observation of the difference between northern and southern skies of the ultrahigh energy cosmic ray energy spectrum with a significance of ${\sim}8\sigma$. We use measurements from the two largest experiments: the Telescope Array, observing the northern hemisphere, and the Pierre Auger Observatory, viewing the southern hemisphere. Since the comparison of two measurements from different observatories introduces the issue of possible systematic differences between detectors and analyses, we validate the methodology of the comparison by examining the region of the sky where the apertures of the two observatories overlap. Although the spectra differ in this region, we find that there is only a $1.8\sigma$ difference between the spectrum measurements when anisotropic regions are removed and a fiducial cut in the aperture is applied.
Submitted 12 June, 2024;
originally announced June 2024.
-
Predicting drug-gene relations via analogy tasks with word embeddings
Authors:
Hiroaki Yamagiwa,
Ryoma Hashimoto,
Kiwamu Arakane,
Ken Murakami,
Shou Soeda,
Momose Oyama,
Yihua Zhu,
Mariko Okada,
Hidetoshi Shimodaira
Abstract:
Natural language processing (NLP) is utilized in a wide range of fields, where words in text are typically transformed into feature vectors called embeddings. BioConceptVec is a specific example of embeddings tailored for biology, trained on approximately 30 million PubMed abstracts using models such as skip-gram. Generally, word embeddings are known to solve analogy tasks through simple vector arithmetic. For example, subtracting the vector for man from that of king and then adding the vector for woman yields a point that lies closer to queen in the embedding space. In this study, we demonstrate that BioConceptVec embeddings, along with our own embeddings trained on PubMed abstracts, contain information about drug-gene relations and can predict target genes from a given drug through analogy computations. We also show that categorizing drugs and genes using biological pathways improves performance. Furthermore, we illustrate that vectors derived from known relations in the past can predict unknown future relations in datasets divided by year. Despite the simplicity of implementing analogy tasks as vector additions, our approach demonstrated performance comparable to that of large language models such as GPT-4 in predicting drug-gene relations.
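A toy sketch of the analogy computation described in this abstract: a relation vector is estimated from known drug-gene pairs and added to a new drug's embedding, and the nearest gene by cosine similarity is returned. The synthetic embeddings, the mean-offset relation vector, and the function name `predict_gene` are assumptions for illustration and do not use BioConceptVec.

```python
import numpy as np

def predict_gene(drug_vec, relation_vec, gene_matrix):
    """Return the index of the gene closest (by cosine) to drug + relation."""
    query = drug_vec + relation_vec
    sims = gene_matrix @ query / (
        np.linalg.norm(gene_matrix, axis=1) * np.linalg.norm(query)
    )
    return int(np.argmax(sims))

# Synthetic embeddings: drug i targets gene i and differs from it by a fixed offset.
rng = np.random.default_rng(0)
genes = rng.normal(size=(6, 8))
offset = rng.normal(size=8)
drugs = genes - offset

# Estimate the relation vector from four known pairs, then test on a held-out drug.
relation = (genes[:4] - drugs[:4]).mean(axis=0)
predicted = predict_gene(drugs[5], relation, genes)
```

Because the toy data share one exact offset, the held-out drug maps to its own gene; real embeddings are noisier, which is where the pathway-based grouping mentioned in the abstract would come in.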
Submitted 27 May, 2025; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers
Authors:
Hisashi Shimodaira
Abstract:
In this paper, we propose a method to improve the prediction accuracy of semantic segmentation methods as follows: (1) construct a neural network that has pre-processing layers based on a convolutional autoencoder ahead of a semantic segmentation network, and (2) train the entire network initialized by the weights of the pre-trained autoencoder. We applied this method to the fully convolutional network (FCN) and experimentally compared its prediction accuracy on the Cityscapes dataset. The Mean IoU of the proposed target model with the He normal initialization is 18.7% higher than that of FCN with the He normal initialization. In addition, those of the modified models of the target model are significantly higher than that of FCN with the He normal initialization. The accuracy and loss curves during training showed that these gains result from improved generalization ability. All of these results provide strong evidence that the proposed method is significantly effective in improving the prediction accuracy of FCN. The proposed method has the following features: it is comparatively simple, whereas its effect on the generalization ability and prediction accuracy of FCN is significant; it adds very few parameters, but the increase in computation time is substantial. In principle, the proposed method can be applied to other semantic segmentation methods. For semantic segmentation, at present, there is no effective way to improve the prediction accuracy of existing methods. No one has published a method that is the same as or similar to ours, and no one has used such a method in practice. Therefore, we believe that our method is useful in practice and worthy of being widely known and used.
Submitted 9 July, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
Axis Tour: Word Tour Determines the Order of Axes in ICA-transformed Embeddings
Authors:
Hiroaki Yamagiwa,
Yusuke Takase,
Hidetoshi Shimodaira
Abstract:
Word embedding is one of the most important components in natural language processing, but interpreting high-dimensional embeddings remains a challenging problem. To address this problem, Independent Component Analysis (ICA) is identified as an effective solution. ICA-transformed word embeddings reveal interpretable semantic axes; however, the order of these axes is arbitrary. In this study, we focus on this property and propose a novel method, Axis Tour, which optimizes the order of the axes. Inspired by Word Tour, a one-dimensional word embedding method, we aim to improve the clarity of the word embedding space by maximizing the semantic continuity of the axes. Furthermore, we show through experiments on downstream tasks that Axis Tour yields better or comparable low-dimensional embeddings compared to both PCA and ICA.
Submitted 9 October, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Block-Diagonal Orthogonal Relation and Matrix Entity for Knowledge Graph Embedding
Authors:
Yihua Zhu,
Hidetoshi Shimodaira
Abstract:
The primary aim of Knowledge Graph embeddings (KGE) is to learn low-dimensional representations of entities and relations for predicting missing facts. While rotation-based methods like RotatE and QuatE perform well in KGE, they face two challenges: limited model flexibility requiring proportional increases in relation size with entity dimension, and difficulties in generalizing the model for higher-dimensional rotations. To address these issues, we introduce OrthogonalE, a novel KGE model employing matrices for entities and block-diagonal orthogonal matrices with Riemannian optimization for relations. This approach enhances the generality and flexibility of KGE models. The experimental results indicate that our new KGE model, OrthogonalE, is both general and flexible, significantly outperforming state-of-the-art KGE models while substantially reducing the number of relation parameters.
Submitted 2 October, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Knowledge Sanitization of Large Language Models
Authors:
Yoichi Ishibashi,
Hidetoshi Shimodaira
Abstract:
We explore a knowledge sanitization approach to mitigate the privacy concerns associated with large language models (LLMs). LLMs trained on a large corpus of Web data can memorize and potentially reveal sensitive or confidential information, raising critical security concerns. Our technique efficiently fine-tunes these models using the Low-Rank Adaptation (LoRA) method, prompting them to generate harmless responses such as "I don't know" when queried about specific information. Experimental results in a closed-book question-answering task show that our straightforward method not only minimizes particular knowledge leakage but also preserves the overall performance of LLMs. These two advantages strengthen the defense against extraction attacks and reduce the emission of harmful content such as hallucinations.
Submitted 2 March, 2024; v1 submitted 21 September, 2023;
originally announced September 2023.
-
Discovering Universal Geometry in Embeddings with ICA
Authors:
Hiroaki Yamagiwa,
Momose Oyama,
Hidetoshi Shimodaira
Abstract:
This study utilizes Independent Component Analysis (ICA) to unveil a consistent semantic structure within embeddings of words or images. Our approach extracts independent semantic components from the embeddings of a pre-trained model by leveraging anisotropic information that remains after the whitening process in Principal Component Analysis (PCA). We demonstrate that each embedding can be expressed as a composition of a few intrinsic interpretable axes and that these semantic axes remain consistent across different languages, algorithms, and modalities. The discovery of a universal semantic structure in the geometric patterns of embeddings enhances our understanding of the representations in embeddings.
Submitted 2 November, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
3D Rotation and Translation for Hyperbolic Knowledge Graph Embedding
Authors:
Yihua Zhu,
Hidetoshi Shimodaira
Abstract:
The main objective of Knowledge Graph (KG) embeddings is to learn low-dimensional representations of entities and relations, enabling the prediction of missing facts. A significant challenge in achieving better KG embeddings lies in capturing relation patterns, including symmetry, antisymmetry, inversion, commutative composition, non-commutative composition, hierarchy, and multiplicity. This study introduces a novel model called 3H-TH (3D Rotation and Translation in Hyperbolic space) that captures these relation patterns simultaneously. In contrast, previous attempts have not achieved satisfactory performance across all the mentioned properties at the same time. The experimental results demonstrate that the new model outperforms existing state-of-the-art models in terms of accuracy, hierarchy property, and other relation patterns in low-dimensional space, while performing similarly in high-dimensional space.
Submitted 3 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Norm of Word Embedding Encodes Information Gain
Authors:
Momose Oyama,
Sho Yokoi,
Hidetoshi Shimodaira
Abstract:
Distributed representations of words encode lexical semantic information, but what type of information is encoded and how? Focusing on the skip-gram with negative-sampling method, we found that the squared norm of static word embedding encodes the information gain conveyed by the word; the information gain is defined by the Kullback-Leibler divergence of the co-occurrence distribution of the word to the unigram distribution. Our findings are explained by the theoretical framework of the exponential family of probability distributions and confirmed through precise experiments that remove spurious correlations arising from word frequency. This theory also extends to contextualized word embeddings in language models or any neural networks with the softmax output layer. We also demonstrate that both the KL divergence and the squared norm of embedding provide a useful metric of the informativeness of a word in tasks such as keyword extraction, proper-noun discrimination, and hypernym discrimination.
Submitted 2 November, 2023; v1 submitted 19 December, 2022;
originally announced December 2022.
-
Improving word mover's distance by leveraging self-attention matrix
Authors:
Hiroaki Yamagiwa,
Sho Yokoi,
Hidetoshi Shimodaira
Abstract:
Measuring the semantic similarity between two sentences is still an important task. The word mover's distance (WMD) computes the similarity via the optimal alignment between the sets of word embeddings. However, WMD does not utilize word order, making it challenging to distinguish sentences with significant overlaps of similar words, even if they are semantically very different. Here, we attempt to improve WMD by incorporating the sentence structure represented by BERT's self-attention matrix (SAM). The proposed method is based on the Fused Gromov-Wasserstein distance, which simultaneously considers the similarity of the word embedding and the SAM for calculating the optimal transport between two sentences. Experiments demonstrate the proposed method enhances WMD and its variants in paraphrase identification with near-equivalent performance in semantic textual similarity. Our code is available at \url{https://github.com/ymgw55/WSMD}.
Submitted 2 November, 2023; v1 submitted 11 November, 2022;
originally announced November 2022.
-
First High-speed Video Camera Observations of a Lightning Flash Associated with a Downward Terrestrial Gamma-ray Flash
Authors:
R. U. Abbasi,
M. M. F. Saba,
J. W. Belz,
P. R. Krehbiel,
W. Rison,
N. Kieu,
D. R. da Silva,
Dan Rodeheffer,
M. A. Stanley,
J. Remington,
J. Mazich,
R. LeVon,
K. Smout,
A. Petrizze,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
D. R. Bergman,
S. A. Blake,
I. Buckland,
B. G. Cheon,
M. Chikawa,
T. Fujii
, et al. (127 additional authors not shown)
Abstract:
In this paper, we present the first high-speed video observation of a cloud-to-ground lightning flash and its associated downward-directed Terrestrial Gamma-ray Flash (TGF). The optical emission of the event was observed by a high-speed video camera running at 40,000 frames per second in conjunction with the Telescope Array Surface Detector, Lightning Mapping Array, interferometer, electric-field fast antenna, and the National Lightning Detection Network. The cloud-to-ground flash associated with the observed TGF was formed by a fast downward leader followed by a very intense return stroke peak current of -154 kA. The TGF occurred while the downward leader was below cloud base, and even when it was halfway in its propagation to ground. The suite of gamma-ray and lightning instruments, timing resolution, and source proximity offer us detailed information and therefore a unique look at the TGF phenomena.
Submitted 9 August, 2023; v1 submitted 10 May, 2022;
originally announced May 2022.
-
Search for Spatial Correlations of Neutrinos with Ultra-High-Energy Cosmic Rays
Authors:
The ANTARES collaboration,
A. Albert,
S. Alves,
M. André,
M. Anghinolfi,
M. Ardid,
S. Ardid,
J. -J. Aubert,
J. Aublin,
B. Baret,
S. Basa,
B. Belhorma,
M. Bendahman,
V. Bertin,
S. Biagi,
M. Bissinger,
J. Boumaaza,
M. Bouta,
M. C. Bouwhuis,
H. Brânzaş,
R. Bruijn,
J. Brunner,
J. Busto,
B. Caiffi,
D. Calvo
, et al. (1025 additional authors not shown)
Abstract:
For several decades, the origin of ultra-high-energy cosmic rays (UHECRs) has been an unsolved question of high-energy astrophysics. One approach for solving this puzzle is to correlate UHECRs with high-energy neutrinos, since neutrinos are a direct probe of hadronic interactions of cosmic rays and are not deflected by magnetic fields. In this paper, we present three different approaches for correlating the arrival directions of neutrinos with the arrival directions of UHECRs. The neutrino data is provided by the IceCube Neutrino Observatory and ANTARES, while the UHECR data with energies above $\sim$50 EeV is provided by the Pierre Auger Observatory and the Telescope Array. All experiments provide increased statistics and improved reconstructions with respect to our previous results reported in 2015. The first analysis uses a high-statistics neutrino sample optimized for point-source searches to search for excesses of neutrinos clustering in the vicinity of UHECR directions. The second analysis searches for an excess of UHECRs in the direction of the highest-energy neutrinos. The third analysis searches for an excess of pairs of UHECRs and highest-energy neutrinos on different angular scales. None of the analyses has found a significant excess, and previously reported over-fluctuations are reduced in significance. Based on these results, we further constrain the neutrino flux spatially correlated with UHECRs.
Submitted 23 August, 2022; v1 submitted 18 January, 2022;
originally announced January 2022.
-
Improving Nonparametric Classification via Local Radial Regression with an Application to Stock Prediction
Authors:
Ruixing Cao,
Akifumi Okuno,
Kei Nakagawa,
Hidetoshi Shimodaira
Abstract:
For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. Well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which take label average over a ball around the query, are consistent but asymptotically biased particularly for a large radius of the ball. To eradicate such bias, local polynomial regression (LPoR) and multiscale $k$-NN (MS-$k$-NN) learn the bias term by local regression around the query and extrapolate it to the query itself. However, their theoretical optimality has been shown for the limit of the infinite number of training samples. For correcting the asymptotic bias with fewer observations, this paper proposes a \emph{local radial regression (LRR)} and its logistic regression variant called \emph{local radial logistic regression~(LRLR)}, by combining the advantages of LPoR and MS-$k$-NN. The idea is quite simple: we fit the local regression to observed labels by taking only the radial distance as the explanatory variable and then extrapolate the estimated label probability to zero distance. The usefulness of the proposed method is shown theoretically and experimentally. We prove the convergence rate of the $L^2$ risk for LRR with reference to MS-$k$-NN, and our numerical experiments, including real-world datasets of daily stock indices, demonstrate that LRLR outperforms LPoR and MS-$k$-NN.
Submitted 21 July, 2022; v1 submitted 27 December, 2021;
originally announced December 2021.
-
Observation of Variations in Cosmic Ray Single Count Rates During Thunderstorms and Implications for Large-Scale Electric Field Changes
Authors:
R. U. Abbasi,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
R. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon,
M. Hayashi
, et al. (140 additional authors not shown)
Abstract:
We present the first observation by the Telescope Array Surface Detector (TASD) of the effect of thunderstorms on the development of cosmic ray single count rate intensity over a 700 km$^{2}$ area. Observations of variations in the secondary low-energy cosmic ray counting rate, using the TASD, allow us to study the electric field inside thunderstorms, on a large scale, as it progresses on top of the 700 km$^{2}$ detector, without dealing with the limitation of narrow exposure in time and space using balloons and aircraft detectors. In this work, variations in the cosmic ray intensity (single count rate) using the TASD, were studied and found to be on average at the $\sim(0.5-1)\%$ and up to 2\% level. These observations were found to be both in excess and in deficit. They were also found to be correlated with lightning in addition to thunderstorms. These variations lasted for tens of minutes; their footprint on the ground ranged from 6 to 24 km in diameter and moved in the same direction as the thunderstorm. With the use of simple electric field models inside the cloud and between cloud to ground, the observed variations in the cosmic ray single count rate were recreated using CORSIKA simulations. Depending on the electric field model used and the direction of the electric field in that model, the electric field magnitude that reproduces the observed low-energy cosmic ray single count rate variations was found to be approximately between 0.2-0.4 GV. This in turn allows us to get a reasonable insight on the electric field and its effect on cosmic ray air showers inside thunderstorms.
Submitted 18 November, 2021;
originally announced November 2021.
-
Indications of a Cosmic Ray Source in the Perseus-Pisces Supercluster
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
R. Arimura,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
I. Buckland,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
R. Fukushima,
G. Furlich,
N. Globus,
R. Gonzalez,
W. Hanlon
, et al. (135 additional authors not shown)
Abstract:
The Telescope Array Collaboration has observed an excess of events with $E \ge 10^{19.4} ~{\rm eV}$ in the data which is centered at (RA, dec) = ($19^\circ$, $35^\circ$). This is near the center of the Perseus-Pisces supercluster (PPSC). The PPSC is about $70 ~{\rm Mpc}$ distant and is the closest supercluster in the Northern Hemisphere (other than the Virgo supercluster of which we are a part). A Li-Ma oversampling analysis with $20^\circ$-radius circles indicates an excess in the arrival direction of events with a local significance of about 4 standard deviations. The probability of having such excess close to the PPSC by chance is estimated to be 3.5 standard deviations. This result indicates that a cosmic ray source likely exists in that supercluster.
Submitted 27 October, 2021;
originally announced October 2021.
-
Revisiting Additive Compositionality: AND, OR and NOT Operations with Word Embeddings
Authors:
Masahiro Naito,
Sho Yokoi,
Geewook Kim,
Hidetoshi Shimodaira
Abstract:
It is well-known that typical word embedding methods such as Word2Vec and GloVe have the property that the meaning can be composed by adding up the embeddings (additive compositionality). Several theories have been proposed to explain additive compositionality, but the following questions remain unanswered: (Q1) The assumptions of those theories do not hold for the practical word embedding. (Q2) Ordinary additive compositionality can be seen as an AND operation of word meanings, but it is not well understood how other operations, such as OR and NOT, can be computed by the embeddings. We address these issues by the idea of frequency-weighted centering at its core. This paper proposes a post-processing method for bridging the gap between practical word embedding and the assumption of theory about additive compositionality as an answer to (Q1). It also gives a method for taking OR or NOT of the meaning by linear operation of word embedding as an answer to (Q2). Moreover, we confirm experimentally that the accuracy of AND operation, i.e., the ordinary additive compositionality, can be improved by our post-processing method (3.5x improvement in top-100 accuracy) and that OR and NOT operations can be performed correctly.
Submitted 19 December, 2022; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Nonparametric estimation of the preferential attachment function from one network snapshot
Authors:
Thong Pham,
Paul Sheridan,
Hidetoshi Shimodaira
Abstract:
Preferential attachment is commonly invoked to explain the emergence of those heavy-tailed degree distributions characteristic of growing network representations of diverse real-world phenomena. Experimentally confirming this hypothesis in real-world growing networks is an important frontier in network science research. Conventional preferential attachment estimation methods require that a growing network be observed across at least two snapshots in time. Numerous publicly available growing network datasets are, however, only available as single snapshots, leaving the applied network scientist with no means of measuring preferential attachment in these cases. We propose a nonparametric method, called PAFit-oneshot, for estimating preferential attachment in a growing network from one snapshot. PAFit-oneshot corrects for a previously unnoticed bias that arises when estimating preferential attachment values only for degrees observed in the single snapshot. Our work provides a means of measuring preferential attachment in a large number of publicly available one-snapshot networks. As a demonstration, we estimated preferential attachment in three such networks, and found sublinear preferential attachment in all cases. PAFit-oneshot is implemented in the R package PAFit.
Submitted 21 June, 2021; v1 submitted 1 March, 2021;
originally announced March 2021.
-
Surface detectors of the TAx4 experiment
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
M. Abe,
T. Abu-Zayyad,
M. Allen,
Y. Arai,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
R. Fukushima,
G. Furlich,
W. Hanlon,
M. Hayashi,
N. Hayashida,
K. Hibino
, et al. (124 additional authors not shown)
Abstract:
Telescope Array (TA) is the largest ultrahigh energy cosmic-ray (UHECR) observatory in the Northern Hemisphere. It explores the origin of UHECRs by measuring their energy spectrum, arrival-direction distribution, and mass composition using a surface detector (SD) array covering approximately 700 km$^2$ and fluorescence detector (FD) stations. TA has found evidence for a cluster of cosmic rays with energies greater than 57 EeV. In order to confirm this evidence with more data, it is necessary to increase the data collection rate. We have begun building an expansion of TA that we call TAx4. In this paper, we explain the motivation, design, technical features, and expected performance of the TAx4 SD. We also present TAx4's current status and examples of the data that have already been collected.
Submitted 1 March, 2021;
originally announced March 2021.
-
Observations of the Origin of Downward Terrestrial Gamma-Ray Flashes
Authors:
J. W. Belz,
P. R. Krehbiel,
J. Remington,
M. A. Stanley,
R. U. Abbasi,
R. LeVon,
W. Rison,
D. Rodeheffer,
the Telescope Array Scientific Collaboration,
:,
T. Abu-Zayyad,
M. Allen,
E. Barcikowski,
D. R. Bergman,
S. A. Blake,
M. Byrne,
R. Cady,
B. G. Cheon,
M. Chikawa,
A. di Matteo,
T. Fujii,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich
, et al. (116 additional authors not shown)
Abstract:
In this paper we report the first close, high-resolution observations of downward-directed terrestrial gamma-ray flashes (TGFs) detected by the large-area Telescope Array cosmic ray observatory, obtained in conjunction with broadband VHF interferometer and fast electric field change measurements of the parent discharge. The results show that the TGFs occur during strong initial breakdown pulses (IBPs) in the first few milliseconds of negative cloud-to-ground and low-altitude intracloud flashes, and that the IBPs are produced by a newly-identified streamer-based discharge process called fast negative breakdown. The observations indicate the relativistic runaway electron avalanches (RREAs) responsible for producing the TGFs are initiated by embedded spark-like transient conducting events (TCEs) within the fast streamer system, and potentially also by individual fast streamers themselves. The TCEs are inferred to be the cause of impulsive sub-pulses that are characteristic features of classic IBP sferics. Additional development of the avalanches would be facilitated by the enhanced electric field ahead of the advancing front of the fast negative breakdown. In addition to showing the nature of IBPs and their enigmatic sub-pulses, the observations also provide a possible explanation for the unsolved question of how the streamer to leader transition occurs during the initial negative breakdown, namely as a result of strong currents flowing in the final stage of successive IBPs, extending backward through both the IBP itself and the negative streamer breakdown preceding the IBP.
Submitted 12 October, 2020; v1 submitted 29 September, 2020;
originally announced September 2020.
-
Search for Large-scale Anisotropy on Arrival Directions of Ultra-high-energy Cosmic Rays Observed with the Telescope Array Experiment
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
M. Abe,
T. Abu-Zayyad,
M. Allen,
R. Azuma,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
A. di Matteo,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
W. Hanlon,
M. Hayashi,
N. Hayashida,
K. Hibino
, et al. (121 additional authors not shown)
Abstract:
Motivated by the detection of a significant dipole structure in the arrival directions of ultrahigh-energy cosmic rays above 8 EeV reported by the Pierre Auger Observatory (Auger), we search for a large-scale anisotropy using data collected with the surface detector array of the Telescope Array Experiment (TA). With 11 years of TA data, a dipole structure in a projection of the right ascension is fitted with an amplitude of 3.3 ± 1.9% and a phase of 131 ± 33 degrees. The corresponding 99% confidence-level upper limit on the amplitude is 7.3%. At the current level of statistics, the fitted result is compatible with both an isotropic distribution and the dipole structure reported by Auger.
Submitted 27 July, 2020; v1 submitted 30 June, 2020;
originally announced July 2020.
-
Measurement of the Proton-Air Cross Section with Telescope Array's Black Rock Mesa and Long Ridge Fluorescence Detectors, and Surface Array in Hybrid Mode
Authors:
R. U. Abbasi,
M. Abe,
T. Abu-Zayyad,
M. Allen,
R. Azuma,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
A. di Matteo,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
W. Hanlon,
M. Hayashi,
N. Hayashida,
K. Hibino,
R. Higuchi
, et al. (120 additional authors not shown)
Abstract:
Ultra high energy cosmic rays provide the highest known energy source in the universe to measure proton cross sections. Though conditions for collecting such data are less controlled than an accelerator environment, current generation cosmic ray observatories have large enough exposures to collect significant statistics for a reliable measurement for energies above what can be attained in the lab. Cosmic ray measurements of cross section use atmospheric calorimetry to measure depth of air shower maximum ($X_{\mathrm{max}}$), which is related to the primary particle's energy and mass. The tail of the $X_{\mathrm{max}}$ distribution is assumed to be dominated by showers generated by protons, allowing measurement of the inelastic proton-air cross section. In this work the proton-air inelastic cross section measurement, $σ^{\mathrm{inel}}_{\mathrm{p-air}}$, using data observed by Telescope Array's Black Rock Mesa and Long Ridge fluorescence detectors and surface detector array in hybrid mode is presented. $σ^{\mathrm{inel}}_{\mathrm{p-air}}$ is observed to be $520.1 \pm 35.8$[Stat.] $^{+25.0}_{-40}$[Sys.]~mb at $\sqrt{s} = 73$ TeV. The total proton-proton cross section is subsequently inferred from Glauber formalism and is found to be $σ^{\mathrm{tot}}_{\mathrm{pp}} = 139.4 ^{+23.4}_{-21.3}$ [Stat.]$ ^{+15.0}_{-24.0}$[Sys.]~mb.
Submitted 8 June, 2020;
originally announced June 2020.
-
Evidence for a Supergalactic Structure of Magnetic Deflection Multiplets of Ultra-High Energy Cosmic Rays
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
M. Abe,
T. Abu-Zayyad,
M. Allen,
R. Azuma,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
A. di Matteo,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
W. Hanlon,
M. Hayashi,
N. Hayashida,
K. Hibino
, et al. (119 additional authors not shown)
Abstract:
Evidence for a large-scale supergalactic cosmic ray multiplet (arrival directions correlated with energy) structure is reported for ultra-high energy cosmic ray (UHECR) energies above 10$^{19}$ eV using seven years of data from the Telescope Array (TA) surface detector and updated to 10 years. Previous energy-position correlation studies have made assumptions regarding magnetic field shapes and strength, and UHECR composition. Here the assumption tested is that, since the supergalactic plane is a fit to the average matter density of the local Large Scale Structure (LSS), UHECR sources and intervening extragalactic magnetic fields are correlated with this plane. This supergalactic deflection hypothesis is tested by the entire field-of-view (FOV) behavior of the strength of intermediate-scale energy-angle correlations. These multiplets are measured in spherical cap section bins (wedges) of the FOV to account for coherent and random magnetic fields. The structure found is consistent with supergalactic deflection, the previously published energy spectrum anisotropy results of TA (the hotspot and coldspot), and toy-model simulations of a supergalactic magnetic sheet. The seven year data post-trial significance of this supergalactic structure of multiplets appearing by chance, on an isotropic sky, is found by Monte Carlo simulation to be 4.2$σ$. The ten years of data post-trial significance is 4.1$σ$. Furthermore, the starburst galaxy M82 is shown to be a possible source of the TA Hotspot, and an estimate of the supergalactic magnetic field using UHECR measurements is presented.
Submitted 2 July, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Stochastic Neighbor Embedding of Multimodal Relational Data for Image-Text Simultaneous Visualization
Authors:
Morihiro Mizutani,
Akifumi Okuno,
Geewook Kim,
Hidetoshi Shimodaira
Abstract:
Multimodal relational data analysis has become increasingly important in recent years for exploring across different domains of data, such as images and their text tags obtained from social networking services (e.g., Flickr). A variety of data analysis methods have been developed for visualization; for example, t-Stochastic Neighbor Embedding (t-SNE) computes low-dimensional feature vectors so that their similarities keep those of the observed data vectors. However, t-SNE is designed only for a single domain of data and not for multimodal data; this paper aims at visualizing multimodal relational data consisting of data vectors in multiple domains with relations across these vectors. By extending t-SNE, we herein propose Multimodal Relational Stochastic Neighbor Embedding (MR-SNE), which (1) first computes augmented relations, where we observe the relations across domains and compute those within each of the domains via the observed data vectors, and (2) jointly embeds the augmented relations into a low-dimensional space. Through visualization of the Flickr and Animal with Attributes 2 datasets, the proposed MR-SNE is compared with other graph embedding-based approaches; MR-SNE demonstrates promising performance.
Submitted 1 May, 2020;
originally announced May 2020.
-
Extrapolation Towards Imaginary $0$-Nearest Neighbour and Its Improved Convergence Rate
Authors:
Akifumi Okuno,
Hidetoshi Shimodaira
Abstract:
$k$-nearest neighbour ($k$-NN) is one of the simplest and most widely-used methods for supervised classification, that predicts a query's label by taking weighted ratio of observed labels of $k$ objects nearest to the query. The weights and the parameter $k \in \mathbb{N}$ regulate its bias-variance trade-off, and the trade-off implicitly affects the convergence rate of the excess risk for the $k$-NN classifier; several existing studies considered selecting optimal $k$ and weights to obtain faster convergence rate. Whereas $k$-NN with non-negative weights has been developed widely, it was also proved that negative weights are essential for eradicating the bias terms and attaining optimal convergence rate. In this paper, we propose a novel multiscale $k$-NN (MS-$k$-NN), that extrapolates unweighted $k$-NN estimators from several $k \ge 1$ values to $k=0$, thus giving an imaginary 0-NN estimator. Our method implicitly computes optimal real-valued weights that are adaptive to the query and its neighbour points. We theoretically prove that the MS-$k$-NN attains the improved rate, which coincides with the existing optimal rate under some conditions.
Submitted 10 November, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder
Authors:
JinHong Lu,
Hiroshi Shimodaira
Abstract:
This study investigates the direct use of speech waveforms to predict head motion for speech-driven head-motion synthesis, whereas the use of spectral features such as MFCC as basic input features together with additional features such as energy and F0 is common in the literature. We show that, rather than combining different features that originate from waveforms, it is more effective to use waveforms directly to predict the corresponding head motion. The challenge with the waveform-based approach is that waveforms contain a large amount of information irrelevant to predicting head motion, which hinders the training of neural networks. To overcome the problem, we propose a canonical-correlation-constrained autoencoder (CCCAE), where hidden layers are trained to not only minimise the error but also maximise the canonical correlation with head motion. Compared with an MFCC-based system, the proposed system shows comparable performance in objective evaluation, and better performance in subjective evaluation.
Submitted 2 November, 2020; v1 submitted 5 February, 2020;
originally announced February 2020.
-
More Powerful Selective Kernel Tests for Feature Selection
Authors:
Jen Ning Lim,
Makoto Yamada,
Wittawat Jitkrittum,
Yoshikazu Terada,
Shigeyuki Matsui,
Hidetoshi Shimodaira
Abstract:
Refining one's hypotheses in the light of data is a common scientific practice; however, the dependency on the data introduces selection bias and can lead to specious statistical analysis. An approach for addressing this is via conditioning on the selection procedure to account for how we have used the data to generate our hypotheses, and to prevent information from being used again after selection. Many selective inference (a.k.a. post-selection inference) algorithms typically take this approach but will "over-condition" for the sake of tractability. While this practice yields well-calibrated statistical tests with controlled false positive rates (FPR), it can incur a major loss in power. In our work, we extend two recent proposals for selecting features using the Maximum Mean Discrepancy and Hilbert Schmidt Independence Criterion to condition on the minimal conditioning event. We show how recent advances in multiscale bootstrap make conditioning on the minimal selection event possible and demonstrate our proposal over a range of synthetic and real world experiments. Our results show that our proposed test is indeed more powerful in most scenarios.
Submitted 29 February, 2020; v1 submitted 14 October, 2019;
originally announced October 2019.
-
Joint Estimation of the Non-parametric Transitivity and Preferential Attachment Functions in Scientific Co-authorship Networks
Authors:
Masaaki Inoue,
Thong Pham,
Hidetoshi Shimodaira
Abstract:
We propose a statistical method to estimate simultaneously the non-parametric transitivity and preferential attachment functions in a growing network, in contrast to conventional methods that either estimate each function in isolation or assume some functional form for them. Our model is shown to be a good fit to two real-world co-authorship networks and be able to bring to light intriguing details of the preferential attachment and transitivity phenomena that would be unavailable under traditional methods. We also introduce a method to quantify the amount of contributions of those phenomena in the growth process of a network based on the probabilistic dynamic process induced by the model formula. Applying this method, we found that transitivity dominated PA in both co-authorship networks. This suggests the importance of indirect relations in scientific creative processes. The proposed methods are implemented in the R package FoFaF.
Submitted 1 October, 2019;
originally announced October 2019.
-
Hyperlink Regression via Bregman Divergence
Authors:
Akifumi Okuno,
Hidetoshi Shimodaira
Abstract:
A collection of $U \: (\in \mathbb{N})$ data vectors is called a $U$-tuple, and the association strength among the vectors of a tuple is termed the \emph{hyperlink weight}, which is assumed to be symmetric with respect to permutation of the entries in the index. We herein propose Bregman hyperlink regression (BHLR), which learns a user-specified symmetric similarity function such that it predicts the tuple's hyperlink weight from data vectors stored in the $U$-tuple. BHLR is a simple and general framework for hyper-relational learning that minimizes Bregman-divergence (BD) between the hyperlink weights and estimated similarities defined for the corresponding tuples; BHLR encompasses various existing methods, such as logistic regression ($U=1$), Poisson regression ($U=1$), link prediction ($U=2$), and those for representation learning, such as graph embedding ($U=2$), matrix factorization ($U=2$), tensor factorization ($U \geq 2$), and their variants equipped with arbitrary BD. Nonlinear functions (e.g., neural networks) can be employed for the similarity functions. However, there are theoretical challenges in that some of the different tuples of BHLR may share data vectors, unlike the i.i.d. setting of classical regression. We address these theoretical issues, and prove that BHLR equipped with arbitrary BD and $U \in \mathbb{N}$ is (P-1) statistically consistent, that is, it asymptotically recovers the underlying true conditional expectation of hyperlink weights given data vectors, and (P-2) computationally tractable, that is, it is efficiently computed by stochastic optimization algorithms using a novel generalized minibatch sampling procedure for hyper-relational data. Consequently, theoretical guarantees for BHLR including several existing methods, that have been examined experimentally, are provided in a unified manner.
Submitted 28 March, 2020; v1 submitted 21 July, 2019;
originally announced August 2019.
-
A neural network based post-filter for speech-driven head motion synthesis
Authors:
JinHong Lu,
Hiroshi Shimodaira
Abstract:
Despite the fact that neural networks are widely used for speech-driven head motion synthesis, it is well-known that the output of neural networks is noisy or discontinuous due to the limited capability of deep neural networks in predicting human motion. Thus, post-processing is required to obtain smooth head motion trajectories for animation. It is common to apply a linear filter or consider keyframes as post-processing. However, neither approach is optimal as there is always a trade-off between smoothness and accuracy. We propose to employ a neural network trained in a way that it is capable of reconstructing the head motions, in order to overcome this limitation. In the objective evaluation, this filter is proved to be good at de-noising data involving types of noise (dropout or Gaussian noise). Objective metrics also demonstrate the improvement of the joined head motion's smoothness after being processed by our proposed filter. A detailed analysis reveals that our proposed filter learns the characteristic of head motions. The subjective evaluation shows that participants were unable to distinguish the synthesised head motions with our proposed filter from ground truth, which was preferred over the Gaussian filter and moving average.
Submitted 24 July, 2019; v1 submitted 24 July, 2019;
originally announced July 2019.
-
Extended probabilistic Rand index and the adjustable moving window-based pixel-pair sampling method
Authors:
Hisashi Shimodaira
Abstract:
The probabilistic Rand (PR) index has the following three problems: it lacks variation in its value over images; the normalized probabilistic Rand (NPR) index introduced to address this is theoretically unclear; and the sampling method of pixel-pairs was not proposed concretely. In this paper, we propose methods for solving these problems. First, we propose an extended probabilistic Rand (EPR) index that considers not only similarity but also dissimilarity between segmentations. The EPR index provides an effective range twice as wide as that of the PR index. Second, we propose an adjustable moving window-based pixel-pair sampling (AWPS) method in which each pixel-pair is sampled adjustably by considering granularities of ground truth segmentations. Results of experiments show that the proposed methods work effectively and efficiently.
Submitted 18 June, 2019;
originally announced June 2019.
-
Selective inference after feature selection via multiscale bootstrap
Authors:
Yoshikazu Terada,
Hidetoshi Shimodaira
Abstract:
It is common to show the confidence intervals or $p$-values of selected features, or predictor variables in regression, but they often involve selection bias. The selective inference approach solves this bias by conditioning on the selection event. Most existing studies of selective inference consider a specific algorithm, such as Lasso, for feature selection, and thus they have difficulties in handling more complicated algorithms. Moreover, existing studies often consider unnecessarily restrictive events, leading to over-conditioning and lower statistical power. Our novel and widely-applicable resampling method via multiscale bootstrap addresses these issues to compute an approximately unbiased selective $p$-value for the selected features. As a simplification of the proposed method, we also develop a simpler method via the classical bootstrap. We prove that the $p$-value computed by our multiscale bootstrap method is more accurate than the classical bootstrap method. Furthermore, numerical experiments demonstrate that our algorithm works well even for more complicated feature selection methods such as non-convex regularization.
Submitted 31 May, 2022; v1 submitted 25 May, 2019;
originally announced May 2019.
-
Search for Ultra-High-Energy Neutrinos with the Telescope Array Surface Detector
Authors:
R. U. Abbasi,
M. Abe,
T. Abu-Zayyad,
M. Allen,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
A. di Matteo,
T. Fujii,
K. Fujisue,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
W. Hanlon,
M. Hayashi,
Y. Hayashi,
N. Hayashida,
K. Hibino,
K. Honda
, et al. (112 additional authors not shown)
Abstract:
We present an upper limit on the flux of ultra-high-energy down-going neutrinos for $E > 10^{18}\ \mbox{eV}$ derived with nine years of data collected by the Telescope Array surface detector (05-11-2008 -- 05-10-2017). The method is based on a multivariate analysis technique, Boosted Decision Trees (BDT). A proton-neutrino classifier is built upon 16 observables related to both the properties of the shower front and the lateral distribution function.
Submitted 12 May, 2020; v1 submitted 9 May, 2019;
originally announced May 2019.
-
Search for point sources of ultra-high energy photons with the Telescope Array surface detector
Authors:
Telescope Array Collaboration,
R. U. Abbasi,
M. Abe,
T. Abu-Zayyad,
M. Allen,
R. Azuma,
E. Barcikowski,
J. W. Belz,
D. R. Bergman,
S. A. Blake,
R. Cady,
B. G. Cheon,
J. Chiba,
M. Chikawa,
A. diMatteo,
T. Fujii,
K. Fujita,
R. Fujiwara,
M. Fukushima,
G. Furlich,
W. Hanlon,
M. Hayashi,
Y. Hayashi,
N. Hayashida,
K. Hibino
, et al. (114 additional authors not shown)
Abstract:
The surface detector (SD) of the Telescope Array (TA) experiment allows one to indirectly detect photons with energies of order $10^{18}$ eV and higher and to separate photons from the cosmic-ray background. In this paper we present the results of a blind search for point sources of ultra-high energy (UHE) photons in the Northern sky using the TA SD data. The photon-induced extensive air showers (EAS) are separated from the hadron-induced EAS background by means of a multivariate classifier based upon 16 parameters that characterize the air shower events. No significant evidence for the photon point sources is found. The upper limits are set on the flux of photons from each particular direction in the sky within the TA field of view, according to the experiment's angular resolution for photons. Average 95% C.L. upper limits for the point-source flux of photons with energies greater than $10^{18}$, $10^{18.5}$, $10^{19}$, $10^{19.5}$ and $10^{20}$ eV are $0.094$, $0.029$, $0.010$, $0.0073$ and $0.0058$ km$^{-2}$yr$^{-1}$ respectively. For the energies higher than $10^{18.5}$ eV, the photon point-source limits are set for the first time. Numerical results for each given direction in each energy range are provided as a supplement to this paper.
Submitted 9 March, 2020; v1 submitted 30 March, 2019;
originally announced April 2019.
-
Representation Learning with Weighted Inner Product for Universal Approximation of General Similarities
Authors:
Geewook Kim,
Akifumi Okuno,
Kazuki Fukui,
Hidetoshi Shimodaira
Abstract:
We propose $\textit{weighted inner product similarity}$ (WIPS) for neural network-based graph embedding. In addition to the parameters of neural networks, we optimize the weights of the inner product by allowing positive and negative values. Despite its simplicity, WIPS can approximate arbitrary general similarities including positive definite, conditionally positive definite, and indefinite kernels. WIPS is free from similarity model selection, since it can learn any similarity models such as cosine similarity, negative Poincaré distance and negative Wasserstein distance. Our experiments show that the proposed method can learn high-quality distributed representations of nodes from real datasets, leading to an accurate approximation of similarities as well as high performance in inductive tasks.
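As a rough illustration of the idea in this abstract, the sketch below computes an inner product whose per-dimension weights may be positive or negative. It assumes the similarity takes the form $\sum_k \lambda_k u_k v_k$ on already-computed feature vectors; the function name `weighted_inner_product` and the example vectors are hypothetical, and the neural network and the training of the weights described in the paper are not shown.

```python
import numpy as np

def weighted_inner_product(u, v, weights):
    """Return sum_k weights[k] * u[k] * v[k] (illustrative form, an assumption)."""
    u, v, weights = (np.asarray(a, dtype=float) for a in (u, v, weights))
    return float(np.sum(weights * u * v))

# Hypothetical feature vectors for two nodes.
u = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

# All-ones weights reduce to the ordinary inner product.
plain = weighted_inner_product(u, v, np.ones(3))

# A negative weight on the last dimension gives an indefinite similarity.
mixed = weighted_inner_product(u, v, np.array([1.0, 1.0, -1.0]))
```

With all weights equal to one the function matches `u @ v`; allowing negative weights is what lets the similarity leave the class of positive definite kernels.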
Submitted 1 June, 2019; v1 submitted 27 February, 2019;
originally announced February 2019.
-
Robust Graph Embedding with Noisy Link Weights
Authors:
Akifumi Okuno,
Hidetoshi Shimodaira
Abstract:
We propose $β$-graph embedding for robustly learning feature vectors from data vectors and noisy link weights. A newly introduced empirical moment $β$-score reduces the influence of contamination and robustly measures the difference between the underlying correct expected weights of links and the specified generative model. The proposed method is computationally tractable; we employ a minibatch-based efficient stochastic algorithm and prove that this algorithm locally minimizes the empirical moment $β$-score. We conduct numerical experiments on synthetic and real-world datasets.
Submitted 22 February, 2019;
originally announced February 2019.
-
An information criterion for auxiliary variable selection in incomplete data analysis
Authors:
Shinpei Imori,
Hidetoshi Shimodaira
Abstract:
Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not observed. Utilizing a parametric model of joint distribution of primary and auxiliary variables, it is possible to improve the estimation of parametric model for the primary variables when the auxiliary variables are closely related to the primary variables. However, the estimation accuracy reduces when the auxiliary variables are irrelevant to the primary variables. For selecting useful auxiliary variables, we formulate the problem as model selection, and propose an information criterion for predicting primary variables by leveraging auxiliary variables. The proposed information criterion is an asymptotically unbiased estimator of the Kullback-Leibler divergence for complete data of primary variables under some reasonable conditions. We also clarify an asymptotic equivalence between the proposed information criterion and a variant of leave-one-out cross validation. Performance of our method is demonstrated via a simulation study and a real data example.
Submitted 9 March, 2019; v1 submitted 21 February, 2019;
originally announced February 2019.
-
Selective Inference for Testing Trees and Edges in Phylogenetics
Authors:
Hidetoshi Shimodaira,
Yoshikazu Terada
Abstract:
Selective inference is considered for testing trees and edges in phylogenetic tree selection from molecular sequences. This improves the previously proposed approximately unbiased test by adjusting the selection bias when testing many trees and edges at the same time. The newly proposed selective inference $p$-value is useful for testing selected edges to claim that they are significantly supported if $p>1-α$, whereas the non-selective $p$-value is still useful for testing candidate trees to claim that they are rejected if $p<α$. The selective $p$-value controls the type-I error conditioned on the selection event, whereas the non-selective $p$-value controls it unconditionally. The selective and non-selective approximately unbiased $p$-values are computed from two geometric quantities called signed distance and mean curvature of the region representing tree or edge of interest in the space of probability distributions. These two geometric quantities are estimated by fitting a model of scaling-law to the non-parametric multiscale bootstrap probabilities. Our general method is applicable to a wider class of problems; phylogenetic tree selection is an example of model selection, and it is interpreted as the variable selection of multiple regression, where each edge corresponds to each predictor. Our method is illustrated in a previously controversial phylogenetic analysis of human, rabbit and mouse.
Submitted 24 May, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.