
Showing 1–27 of 27 results for author: Yamanishi, K

  1. arXiv:2502.18709 [pdf, ps, other]

    cs.LG stat.ML

    Bandit and Delayed Feedback in Online Structured Prediction

    Authors: Yuki Shibukawa, Taira Tsuchiya, Shinsaku Sakaue, Kenji Yamanishi

    Abstract: Online structured prediction is a task of sequentially predicting outputs with complex structures based on inputs and past observations, encompassing online classification. Recent studies showed that in the full information setup, we can achieve finite bounds on the surrogate regret, i.e., the extra target loss relative to the best possible surrogate loss. In practice, however, full information fe…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 33 pages

  2. arXiv:2412.01163 [pdf, other]

    cs.LG cs.IT stat.ML

    Graph Community Augmentation with GMM-based Modeling in Latent Space

    Authors: Shintaro Fukushima, Kenji Yamanishi

    Abstract: This study addresses the issue of graph generation with generative models. In particular, we are concerned with the graph community augmentation problem, which refers to the problem of generating unseen or unfamiliar graphs with a new community out of the probability distribution estimated with a given graph dataset. Graph community augmentation means that the generated graphs have a new community…

    Submitted 2 December, 2024; originally announced December 2024.

    Comments: IEEE Copyright. Accepted to the 24th IEEE International Conference on Data Mining (ICDM). 10 pages

  3. arXiv:2409.08387 [pdf, ps, other]

    math.ST cs.IT stat.ML

    Foundation of Calculating Normalized Maximum Likelihood for Continuous Probability Models

    Authors: Atsushi Suzuki, Kota Fukuzawa, Kenji Yamanishi

    Abstract: The normalized maximum likelihood (NML) code length is widely used as a model selection criterion based on the minimum description length principle, where the model with the shortest NML code length is selected. A common method to calculate the NML code length is to use the sum (for a discrete model) or integral (for a continuous model) of a function defined by the distribution of the maximum like…

    Submitted 12 September, 2024; originally announced September 2024.

  4. arXiv:2403.18269 [pdf, other]

    stat.ML cs.IT cs.LG

    Clustering Change Sign Detection by Fusing Mixture Complexity

    Authors: Kento Urano, Ryo Yuki, Kenji Yamanishi

    Abstract: This paper proposes an early detection method for cluster structural changes. Cluster structure refers to discrete structural characteristics, such as the number of clusters, when data are represented using finite mixture models, such as Gaussian mixture models. We focus on scenarios in which the cluster structure changes gradually over time. For finite mixture models, the concept of mixture com…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages

  5. arXiv:2311.18694 [pdf, other]

    stat.ML cs.IT cs.LG

    Balancing Summarization and Change Detection in Graph Streams

    Authors: Shintaro Fukushima, Kenji Yamanishi

    Abstract: This study addresses the issue of balancing graph summarization and graph change detection. Graph summarization compresses large-scale graphs into smaller ones. However, a question remains: to what extent should the original graph be compressed? We address this problem from the perspective of graph change detection, aiming to detect statistically significant changes using a stream of summary g…

    Submitted 12 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: 6 pages. Accepted to the 23rd IEEE International Conference on Data Mining (ICDM2023)

  6. arXiv:2307.09259 [pdf, other]

    cs.LG cs.CG cs.CV

    Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds

    Authors: Naoki Nishikawa, Yuichi Ike, Kenji Yamanishi

    Abstract: Machine learning for point clouds has been attracting much attention, with many applications in various fields, such as shape recognition and material science. To enhance the accuracy of such machine learning methods, it is often effective to incorporate global topological features, which are typically extracted by persistent homology. In the calculation of persistent homology for a point cloud…

    Submitted 24 December, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

    Comments: 24 pages with 6 figures

  7. arXiv:2305.07971 [pdf, ps, other]

    stat.ML cs.LG

    Tight and fast generalization error bound of graph embedding in metric space

    Authors: Atsushi Suzuki, Atsushi Nitanda, Taiji Suzuki, Jing Wang, Feng Tian, Kenji Yamanishi

    Abstract: Recent studies have experimentally shown that effective and efficient graph embedding, which aims to obtain vertex representations reflecting the graph's structure in the metric space, can be achieved in non-Euclidean metric spaces. Specifically, graph embedding in hyperbolic space has succeeded experimentally in embedding graphs with a hierarchical tree structure, e.g., data in natural language…

    Submitted 13 May, 2023; originally announced May 2023.

  8. arXiv:2302.12127 [pdf, other]

    cs.LG stat.ML

    Detecting Signs of Model Change with Continuous Model Selection Based on Descriptive Dimensionality

    Authors: Kenji Yamanishi, So Hirai

    Abstract: We address the issue of detecting changes of models that lie behind a data stream. The model refers to integer-valued structural information, such as the number of free parameters in a parametric model. Specifically, we are concerned with the problem of how we can detect signs of model changes before they are actualized. To this end, we employ continuous model selection on the basis o…

    Submitted 23 February, 2023; originally announced February 2023.

  9. arXiv:2105.10475 [pdf, ps, other]

    cs.LG

    Generalization Error Bound for Hyperbolic Ordinal Embedding

    Authors: Atsushi Suzuki, Atsushi Nitanda, Jing Wang, Linchuan Xu, Marc Cavazza, Kenji Yamanishi

    Abstract: Hyperbolic ordinal embedding (HOE) represents entities as points in hyperbolic space so that they agree as well as possible with given constraints of the form "entity i is more similar to entity j than to entity k". It has been experimentally shown that HOE can effectively obtain representations of hierarchical data, such as a knowledge base and a citation network, owing to hyperbolic space's expo…

    Submitted 21 May, 2021; originally announced May 2021.

  10. arXiv:2011.09465 [pdf, other]

    stat.ML cs.IT cs.LG

    Detecting Hierarchical Changes in Latent Variable Models

    Authors: Shintaro Fukushima, Kenji Yamanishi

    Abstract: This paper addresses the issue of detecting hierarchical changes in latent variable models (HCDL) from data streams. There are three different levels of changes for latent variable models: 1) the first level is the change in the data distribution for fixed latent variables, 2) the second is the change in the distribution over latent variables, and 3) the third is the change in the number of latent variabl…

    Submitted 22 November, 2020; v1 submitted 18 November, 2020; originally announced November 2020.

    Comments: 12 pages. Accepted to the 20th IEEE International Conference on Data Mining (ICDM2020)

  11. arXiv:2008.07720 [pdf, other]

    cs.LG cs.CL stat.ML

    Word2vec Skip-gram Dimensionality Selection via Sequential Normalized Maximum Likelihood

    Authors: Pham Thuc Hung, Kenji Yamanishi

    Abstract: In this paper, we propose a novel information-criteria-based approach to selecting the dimensionality of the word2vec Skip-gram (SG). From the perspective of probability theory, SG can be regarded as implicit probability distribution estimation under the assumption that there exists a true contextual distribution among words. Therefore, we apply information criteria with the aim of selecting the…

    Submitted 24 August, 2020; v1 submitted 17 August, 2020; originally announced August 2020.

  12. arXiv:2007.15897 [pdf, other]

    eess.IV cs.CV cs.LG

    A Novel Global Spatial Attention Mechanism in Convolutional Neural Network for Medical Image Classification

    Authors: Linchuan Xu, Jun Huang, Atsushi Nitanda, Ryo Asaoka, Kenji Yamanishi

    Abstract: Spatial attention has been introduced to convolutional neural networks (CNNs) to improve both their performance and interpretability in visual tasks, including image classification. The essence of spatial attention is to learn a weight map that represents the relative importance of activations within the same layer or channel. All existing attention mechanisms are local attentions in the se…

    Submitted 31 July, 2020; originally announced July 2020.

  13. arXiv:2007.15179 [pdf, other]

    stat.AP cs.IT stat.ME

    Detecting Change Signs with Differential MDL Change Statistics for COVID-19 Pandemic Analysis

    Authors: Kenji Yamanishi, Linchuan Xu, Ryo Yuki, Shintaro Fukushima, Chuan-hao Lin

    Abstract: We are concerned with the issue of detecting changes and their signs from a data stream. For example, given a time series of COVID-19 cases in a region, we may raise early warning signals of outbreaks by detecting signs of changes in the cases. We propose a novel methodology to address this issue. The key idea is to employ a new information-theoretic notion, which we call the differential minim…

    Submitted 19 February, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

  14. arXiv:2007.12160 [pdf, other]

    stat.ML cs.LG

    Online Robust and Adaptive Learning from Data Streams

    Authors: Shintaro Fukushima, Atsushi Nitanda, Kenji Yamanishi

    Abstract: In online learning from non-stationary data streams, it is necessary to learn in a way that is robust to outliers and to adapt quickly to changes in the underlying data-generating mechanism. In this paper, we refer to the former attribute of online learning algorithms as robustness and to the latter as adaptivity. There is an obvious tradeoff between the two attributes. It is a fundamental issue to quantify and…

    Submitted 27 September, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

    Comments: 42 pages

  15. arXiv:2007.07467 [pdf, ps, other]

    cs.LG stat.ML

    Mixture Complexity and Its Application to Gradual Clustering Change Detection

    Authors: Shunki Kyoya, Kenji Yamanishi

    Abstract: In model-based clustering using finite mixture models, it is a significant challenge to determine the number of clusters (cluster size). It has conventionally been taken to equal the number of mixture components (mixture size); however, this may not be valid in the presence of overlaps or weight biases. In this study, we propose to continuously measure the cluster size in a mixture model by a new concept called mixt…

    Submitted 14 July, 2020; originally announced July 2020.

  16. arXiv:1910.11540 [pdf, other]

    cs.LG cs.IT stat.ML

    Descriptive Dimensionality and Its Characterization of MDL-based Learning and Change Detection

    Authors: Kenji Yamanishi

    Abstract: This paper introduces a new notion of dimensionality of probabilistic models from an information-theoretic viewpoint. We call it the "descriptive dimension" (Ddim). We show that Ddim coincides with the number of independent parameters for the parametric class and can further be extended to real-valued dimensionality when a number of models are mixed. The paper then derives the rate of convergence…

    Submitted 25 October, 2019; originally announced October 2019.

  17. arXiv:1905.00699 [pdf, ps, other]

    physics.soc-ph stat.AP

    Long-tailed distributions of inter-event times as mixtures of exponential distributions

    Authors: Makoto Okada, Kenji Yamanishi, Naoki Masuda

    Abstract: Inter-event times of various human behaviors are apparently non-Poissonian and obey long-tailed distributions as opposed to exponential distributions, which correspond to Poisson processes. It has been suggested that human individuals may switch between different states, in each of which they are regarded as generating events that obey a Poisson process. If this is the case, inter-event times should app…

    Submitted 26 February, 2020; v1 submitted 30 April, 2019; originally announced May 2019.

    Comments: 2 figures, 4 tables. SI and code are available at https://github.com/naokimas/exp_mixture_model

    Journal ref: Royal Society Open Science, 7, 191643 (2020)

  18. arXiv:1810.03825 [pdf, other]

    stat.ML cs.LG

    Adaptive Minimax Regret against Smooth Logarithmic Losses over High-Dimensional $\ell_1$-Balls via Envelope Complexity

    Authors: Kohei Miyaguchi, Kenji Yamanishi

    Abstract: We develop a new theoretical framework, the envelope complexity, to analyze the minimax regret with logarithmic loss functions and derive a Bayesian predictor that adaptively achieves the minimax regret over high-dimensional $\ell_1$-balls within a factor of two. The prior is newly derived for achieving the minimax regret and is called the spike-and-tails (ST) prior after its shape. Th…

    Submitted 13 October, 2018; v1 submitted 9 October, 2018; originally announced October 2018.

  19. arXiv:1805.10487 [pdf, other]

    stat.ML cs.LG

    Stable Geodesic Update on Hyperbolic Space and its Application to Poincare Embeddings

    Authors: Yosuke Enokida, Atsushi Suzuki, Kenji Yamanishi

    Abstract: A hyperbolic space has been shown to be more capable of modeling complex networks than a Euclidean space. This paper proposes an explicit update rule along geodesics in a hyperbolic space. The convergence of our algorithm is theoretically guaranteed, and its convergence rate is better than that of the conventional Euclidean gradient descent algorithm. Moreover, our algorithm avoids the "bias" problem of e…

    Submitted 26 May, 2018; originally announced May 2018.

  20. arXiv:1804.09904 [pdf, other]

    stat.ML cs.LG

    High-dimensional Penalty Selection via Minimum Description Length Principle

    Authors: Kohei Miyaguchi, Kenji Yamanishi

    Abstract: We tackle the problem of penalty selection of regularization on the basis of the minimum description length (MDL) principle. In particular, we consider the case in which the design space of the penalty function is high-dimensional. In this situation, the luckiness-normalized-maximum-likelihood (LNML) minimization approach is favorable, because LNML quantifies the goodness of regularized models with any forms of…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: Preprint before review

  21. arXiv:1801.03705 [pdf, ps, other]

    math.ST

    Exact Calculation of Normalized Maximum Likelihood Code Length Using Fourier Analysis

    Authors: Atsushi Suzuki, Kenji Yamanishi

    Abstract: The normalized maximum likelihood code length has been widely used in model selection, and its favorable properties, such as its consistency and the upper bound of its statistical risk, have been demonstrated. This paper proposes a novel methodology for calculating the normalized maximum likelihood code length on the basis of Fourier analysis. Our methodology provides an efficient non-asymptotic c…

    Submitted 11 January, 2018; originally announced January 2018.

  22. Grafting for Combinatorial Boolean Model using Frequent Itemset Mining

    Authors: Taito Lee, Shin Matsushima, Kenji Yamanishi

    Abstract: This paper introduces the combinatorial Boolean model (CBM), which is defined as the class of linear combinations of conjunctions of Boolean attributes. This paper addresses the issue of learning CBM from labeled data. CBM is highly interpretable, but naïve learning of it requires exponentially large computation time with respect to the data dimension and sample size. To overcome this com…

    Submitted 13 November, 2017; v1 submitted 7 November, 2017; originally announced November 2017.

    Journal ref: Data Min Knowl Disc 34, 101-123 (2020)

  23. arXiv:1709.00925 [pdf, other]

    cs.IT

    Upper Bound on Normalized Maximum Likelihood Codes for Gaussian Mixture Models

    Authors: So Hirai, Kenji Yamanishi

    Abstract: This paper shows that the normalized maximum likelihood (NML) code-length calculated in [1] is an upper bound on the NML code-length strictly calculated for the Gaussian Mixture Model. When we use this upper bound on the NML code-length, we must change the scale of the data sequence to satisfy the restricted domain. However, we also show that the algorithm for model selection is essentially univer…

    Submitted 18 November, 2018; v1 submitted 4 September, 2017; originally announced September 2017.

  24. arXiv:1603.07094 [pdf, ps, other]

    stat.ML cs.LG

    Predicting Glaucoma Visual Field Loss by Hierarchically Aggregating Clustering-based Predictors

    Authors: Motohide Higaki, Kai Morino, Hiroshi Murata, Ryo Asaoka, Kenji Yamanishi

    Abstract: This study addresses the issue of predicting glaucomatous visual field loss from patient disease datasets. Our goal is to accurately predict the progress of the disease in individual patients. As very few measurements are available for each patient, it is difficult to produce good predictors for individuals. A recently proposed clustering-based method enhances the power of prediction using pat…

    Submitted 23 March, 2016; originally announced March 2016.

  25. arXiv:1205.3549 [pdf, ps, other]

    cs.LG

    Normalized Maximum Likelihood Coding for Exponential Family with Its Applications to Optimal Clustering

    Authors: So Hirai, Kenji Yamanishi

    Abstract: We are concerned with the issue of how to calculate the normalized maximum likelihood (NML) code-length. The problem is that the normalization term of the NML code-length may diverge when the data domain is continuous and unbounded, and a straightforward computation of it is highly expensive when the data domain is finite. In previous works, it has been investigated how to calculate the NML code-length for…

    Submitted 16 May, 2012; v1 submitted 15 May, 2012; originally announced May 2012.

  26. arXiv:1110.2899 [pdf, ps, other]

    stat.ML cs.LG cs.SI physics.soc-ph

    Discovering Emerging Topics in Social Streams via Link Anomaly Detection

    Authors: Toshimitsu Takahashi, Ryota Tomioka, Kenji Yamanishi

    Abstract: Detection of emerging topics is now receiving renewed interest, motivated by the rapid growth of social networks. Conventional term-frequency-based approaches may not be appropriate in this context, because the information exchanged includes not only text but also images, URLs, and videos. We focus on the social aspects of these networks, that is, the links between users that are generated dynamicall…

    Submitted 13 October, 2011; originally announced October 2011.

    Comments: 10 pages, 6 figures

  27. Document Classification Using a Finite Mixture Model

    Authors: Hang Li, Kenji Yamanishi

    Abstract: We propose a new method of classifying documents into categories. The simple method of conducting hypothesis testing over word-based distributions in categories suffers from the data sparseness problem. In order to address this difficulty, Guthrie et al. have developed a method using distributions based on hard clustering of words, i.e., in which a word is assigned to a single cluster and words…

    Submitted 6 May, 1997; originally announced May 1997.

    Comments: LaTeX file, uses aclap.sty and epsf.sty, 9 pages, to appear in ACL/EACL-97
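Entries 3, 21, 23 and 25 above all concern calculating the NML code length, whose normalization term is a sum over all possible data sequences for a discrete model (an integral for a continuous one). As a minimal sketch, assuming the simplest discrete case, a Bernoulli model on binary sequences, the normalizer can be computed exactly by grouping sequences by their number of ones. This is not the method of any listed paper; the function names `bernoulli_nml_normalizer` and `bernoulli_nml_code_length` are hypothetical, and code lengths are in nats (natural logarithm).

```python
import math


def _xlogy(x: float, y: float) -> float:
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_nml_normalizer(n: int) -> float:
    """Exact NML normalizer C_n for the Bernoulli model on length-n binary sequences.

    C_n = sum_{k=0}^{n} binom(n, k) * (k/n)^k * ((n-k)/n)^(n-k), i.e. the maximized
    likelihood of every sequence, grouped by its number of ones k. Each term is
    evaluated in log space so that large n does not overflow a float.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    total = 0.0
    for k in range(n + 1):
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_max_lik = _xlogy(k, k / n) + _xlogy(n - k, (n - k) / n)
        total += math.exp(log_binom + log_max_lik)
    return total


def bernoulli_nml_code_length(x: list[int]) -> float:
    """NML code length (in nats) of a 0/1 sequence x under the Bernoulli model:
    -log p(x | theta_hat(x)) + log C_n, where theta_hat(x) is the fraction of ones."""
    n = len(x)
    if n == 0:
        raise ValueError("x must be non-empty")
    k = sum(x)
    neg_log_max_lik = -(_xlogy(k, k / n) + _xlogy(n - k, (n - k) / n))
    return neg_log_max_lik + math.log(bernoulli_nml_normalizer(n))


# Usage: code length of a short sequence, converted from nats to bits.
print(bernoulli_nml_code_length([1, 0, 1, 1, 0, 1]) / math.log(2))
```

Summing exp(-code length) over all 2^n sequences gives exactly 1, which is what makes NML a valid code. For a continuous, unbounded data domain the sum becomes an integral that can diverge, which is the difficulty entry 25 describes.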