Search | arXiv e-print repository

arXiv:1912.02686 [pdf, ps, other]

Binarized Canonical Polyadic Decomposition for Knowledge Graph Completion

Authors: Koki Kishimoto, Katsuhiko Hayashi, Genki Akai, Masashi Shimbo

Abstract: Methods based on vector embeddings of knowledge graphs have been actively pursued as a promising approach to knowledge graph completion.However, embedding models generate storage-inefficient representations, particularly when the number of entities and relations, and the dimensionality of the real-valued embedding vectors are large. We present a binarized CANDECOMP/PARAFAC(CP) decomposition algori… ▽ More Methods based on vector embeddings of knowledge graphs have been actively pursued as a promising approach to knowledge graph completion.However, embedding models generate storage-inefficient representations, particularly when the number of entities and relations, and the dimensionality of the real-valued embedding vectors are large. We present a binarized CANDECOMP/PARAFAC(CP) decomposition algorithm, which we refer to as B-CP, where real-valued parameters are replaced by binary values to reduce model size. Moreover, we show that a fast score computation technique can be developed with bitwise operations. We prove that B-CP is fully expressive by deriving a bound on the size of its embeddings. Experimental results on several benchmark datasets demonstrate that the proposed method successfully reduces model size by more than an order of magnitude while maintaining task performance at the same level as the real-valued CP model. △ Less

Submitted 4 December, 2019; originally announced December 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1902.02970

arXiv:1902.02970 [pdf, ps, other]

Binarized Knowledge Graph Embeddings

Authors: Koki Kishimoto, Katsuhiko Hayashi, Genki Akai, Masashi Shimbo, Kazunori Komatani

Abstract: Tensor factorization has become an increasingly popular approach to knowledge graph completion(KGC), which is the task of automatically predicting missing facts in a knowledge graph. However, even with a simple model like CANDECOMP/PARAFAC(CP) tensor decomposition, KGC on existing knowledge graphs is impractical in resource-limited environments, as a large amount of memory is required to store par… ▽ More Tensor factorization has become an increasingly popular approach to knowledge graph completion(KGC), which is the task of automatically predicting missing facts in a knowledge graph. However, even with a simple model like CANDECOMP/PARAFAC(CP) tensor decomposition, KGC on existing knowledge graphs is impractical in resource-limited environments, as a large amount of memory is required to store parameters represented as 32-bit or 64-bit floating point numbers. This limitation is expected to become more stringent as existing knowledge graphs, which are already huge, keep steadily growing in scale. To reduce the memory requirement, we present a method for binarizing the parameters of the CP tensor decomposition by introducing a quantization function to the optimization problem. This method replaces floating point-valued parameters with binary ones after training, which drastically reduces the model size at run time. We investigate the trade-off between the quality and size of tensor factorization models for several KGC benchmark datasets. In our experiments, the proposed method successfully reduced the model size by more than an order of magnitude while maintaining the task performance. Moreover, a fast score computation technique can be developed with bitwise operations. △ Less

Submitted 8 February, 2019; originally announced February 2019.

arXiv:1808.08361 [pdf, other]

Data-dependent Learning of Symmetric/Antisymmetric Relations for Knowledge Base Completion

Authors: Hitoshi Manabe, Katsuhiko Hayashi, Masashi Shimbo

Abstract: Embedding-based methods for knowledge base completion (KBC) learn representations of entities and relations in a vector space, along with the scoring function to estimate the likelihood of relations between entities. The learnable class of scoring functions is designed to be expressive enough to cover a variety of real-world relations, but this expressive comes at the cost of an increased number o… ▽ More Embedding-based methods for knowledge base completion (KBC) learn representations of entities and relations in a vector space, along with the scoring function to estimate the likelihood of relations between entities. The learnable class of scoring functions is designed to be expressive enough to cover a variety of real-world relations, but this expressive comes at the cost of an increased number of parameters. In particular, parameters in these methods are superfluous for relations that are either symmetric or antisymmetric. To mitigate this problem, we propose a new L1 regularizer for Complex Embeddings, which is one of the state-of-the-art embedding-based methods for KBC. This regularizer promotes symmetry or antisymmetry of the scoring function on a relation-by-relation basis, in accordance with the observed data. Our empirical evaluation shows that the proposed method outperforms the original Complex Embeddings and other baseline methods on the FB15k dataset. △ Less

Submitted 25 August, 2018; originally announced August 2018.

Comments: In AAAI 2018

arXiv:1806.03945 [pdf, ps, other]

A Fast and Easy Regression Technique for k-NN Classification Without Using Negative Pairs

Authors: Yutaro Shigeto, Masashi Shimbo, Yuji Matsumoto

Abstract: This paper proposes an inexpensive way to learn an effective dissimilarity function to be used for $k$-nearest neighbor ($k$-NN) classification. Unlike Mahalanobis metric learning methods that map both query (unlabeled) objects and labeled objects to new coordinates by a single transformation, our method learns a transformation of labeled objects to new points in the feature space whereas query ob… ▽ More This paper proposes an inexpensive way to learn an effective dissimilarity function to be used for $k$-nearest neighbor ($k$-NN) classification. Unlike Mahalanobis metric learning methods that map both query (unlabeled) objects and labeled objects to new coordinates by a single transformation, our method learns a transformation of labeled objects to new points in the feature space whereas query objects are kept in their original coordinates. This method has several advantages over existing distance metric learning methods: (i) In experiments with large document and image datasets, it achieves $k$-NN classification accuracy better than or at least comparable to the state-of-the-art metric learning methods. (ii) The transformation can be learned efficiently by solving a standard ridge regression problem. For document and image datasets, training is often more than two orders of magnitude faster than the fastest metric learning methods tested. This speed-up is also due to the fact that the proposed method eliminates the optimization over "negative" object pairs, i.e., objects whose class labels are different. (iii) The formulation has a theoretical justification in terms of reducing hubness in data. △ Less

Submitted 28 August, 2020; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: Earlier version of this paper appeared in PAKDD 2017. This version corrects an error in Eq. (6)

arXiv:1507.00825 [pdf, other]

Ridge Regression, Hubness, and Zero-Shot Learning

Authors: Yutaro Shigeto, Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Yuji Matsumoto

Abstract: This paper discusses the effect of hubness in zero-shot learning, when ridge regression is used to find a mapping between the example space to the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neigh… ▽ More This paper discusses the effect of hubness in zero-shot learning, when ridge regression is used to find a mapping between the example space to the label space. Contrary to the existing approach, which attempts to find a mapping from the example space to the label space, we show that mapping labels into the example space is desirable to suppress the emergence of hubs in the subsequent nearest neighbor search step. Assuming a simple data model, we prove that the proposed approach indeed reduces hubness. This was verified empirically on the tasks of bilingual lexicon extraction and image labeling: hubness was reduced with both of these tasks and the accuracy was improved accordingly. △ Less

Submitted 3 July, 2015; originally announced July 2015.

Comments: To be presented at ECML/PKDD 2015

arXiv:1301.0725 [pdf, other]

The Sum-over-Forests density index: identifying dense regions in a graph

Authors: Mathieu Senelle, Silvia Garcia-Diez, Amin Mantrach, Masashi Shimbo, Marco Saerens, François Fouss

Abstract: This work introduces a novel nonparametric density index defined on graphs, the Sum-over-Forests (SoF) density index. It is based on a clear and intuitive idea: high-density regions in a graph are characterized by the fact that they contain a large amount of low-cost trees with high outdegrees while low-density regions contain few ones. Therefore, a Boltzmann probability distribution on the counta… ▽ More This work introduces a novel nonparametric density index defined on graphs, the Sum-over-Forests (SoF) density index. It is based on a clear and intuitive idea: high-density regions in a graph are characterized by the fact that they contain a large amount of low-cost trees with high outdegrees while low-density regions contain few ones. Therefore, a Boltzmann probability distribution on the countable set of forests in the graph is defined so that large (high-cost) forests occur with a low probability while short (low-cost) forests occur with a high probability. Then, the SoF density index of a node is defined as the expected outdegree of this node in a non-trivial tree of the forest, thus providing a measure of density around that node. Following the matrix-forest theorem, and a statistical physics framework, it is shown that the SoF density index can be easily computed in closed form through a simple matrix inversion. Experiments on artificial and real data sets show that the proposed index performs well on finding dense regions, for graphs of various origins. △ Less

Submitted 4 January, 2013; originally announced January 2013.

arXiv:1212.1666 [pdf, other]

doi 10.1016/j.physa.2013.09.016

Developments in the theory of randomized shortest paths with a comparison of graph node distances

Authors: Ilkka Kivimäki, Masashi Shimbo, Marco Saerens

Abstract: There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has risen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory… ▽ More There have lately been several suggestions for parametrized distances on a graph that generalize the shortest path distance and the commute time or resistance distance. The need for developing such distances has risen from the observation that the above-mentioned common distances in many situations fail to take into account the global structure of the graph. In this article, we develop the theory of one family of graph node distances, known as the randomized shortest path dissimilarity, which has its foundation in statistical physics. We show that the randomized shortest path dissimilarity can be easily computed in closed form for all pairs of nodes of a graph. Moreover, we come up with a new definition of a distance measure that we call the free energy distance. The free energy distance can be seen as an upgrade of the randomized shortest path dissimilarity as it defines a metric, in addition to which it satisfies the graph-geodetic property. The derivation and computation of the free energy distance are also straightforward. We then make a comparison between a set of generalized distances that interpolate between the shortest path distance and the commute time, or resistance distance. This comparison focuses on the applicability of the distances in graph node clustering and classification. The comparison, in general, shows that the parametrized distances perform well in the tasks. In particular, we see that the results obtained with the free energy distance are among the best in all the experiments. △ Less

Submitted 3 October, 2013; v1 submitted 7 December, 2012; originally announced December 2012.

Comments: 30 pages, 4 figures, 3 tables

Showing 1–7 of 7 results for author: Shimbo, M