
Showing 1–3 of 3 results for author: Hessen, D J

Searching in archive cs.
  1. arXiv:2405.20895

    cs.CL

    A comparison of correspondence analysis with PMI-based word embedding methods

    Authors: Qianqian Qi, Ayoub Bagheri, David J. Hessen, Peter G. M. van der Heijden

    Abstract: Popular word embedding methods such as GloVe and Word2Vec are related to the factorization of the pointwise mutual information (PMI) matrix. In this paper, we link correspondence analysis (CA) to the factorization of the PMI matrix. CA is a dimensionality reduction method that uses singular value decomposition (SVD), and we show that CA is mathematically close to the weighted factorization of the…

    Submitted 8 November, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  2. Improving information retrieval through correspondence analysis instead of latent semantic analysis

    Authors: Qianqian Qi, David J. Hessen, Peter G. M. van der Heijden

    Abstract: Both latent semantic analysis (LSA) and correspondence analysis (CA) are dimensionality reduction techniques that use singular value decomposition (SVD) for information retrieval. Theoretically, the results of LSA display both the association between documents and terms, and marginal effects; in comparison, CA only focuses on the associations between documents and terms. Marginal effects are usual…

    Submitted 14 March, 2023; originally announced March 2023.

    Journal ref: Journal of Intelligent Information Systems 2023

  3. A comparison of latent semantic analysis and correspondence analysis of document-term matrices

    Authors: Qianqian Qi, David J. Hessen, Tejaswini Deoskar, Peter G. M. van der Heijden

    Abstract: Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition (SVD) for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-…

    Submitted 25 November, 2022; v1 submitted 25 July, 2021; originally announced August 2021.

    Journal ref: Nat. Lang. Eng. 30 (2024) 722-752
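The abstracts above relate CA to the PMI matrix and contrast CA with LSA. The Python sketch below is for illustration only and is not the authors' code. It assumes numpy and a small hypothetical document-term count matrix N, and ca_and_pmi is a hypothetical helper. The sketch computes CA by taking an SVD of the standardized residuals (P - rc^T) / sqrt(rc^T). Subtracting the independence term rc^T removes the row and column marginal effects that an SVD of the raw matrix (as in LSA) keeps. Each residual equals sqrt(r_i c_j) * (exp(PMI_ij) - 1). Because exp(x) - 1 is approximately x for small x, this gives one way to see why CA is close to a weighted factorization of the PMI matrix.

```python
import numpy as np


def ca_and_pmi(N, k=2):
    """Hypothetical helper: CA of a count table via SVD, plus the table's PMI matrix.

    Assumes every row and column of N has a positive total.
    """
    P = N / N.sum()                     # correspondence matrix (joint proportions)
    r = P.sum(axis=1)                   # row masses (documents)
    c = P.sum(axis=0)                   # column masses (terms)
    E = np.outer(r, c)                  # proportions expected under independence
    S = (P - E) / np.sqrt(E)            # standardized residuals that CA decomposes
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal row coordinates: D_r^{-1/2} U Sigma, truncated to k dimensions
    rows = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    with np.errstate(divide="ignore"):
        pmi = np.log(P / E)             # -inf where a count is zero
    return S, pmi, rows, sv


if __name__ == "__main__":
    # Hypothetical 3-document x 3-term count matrix
    N = np.array([[5, 1, 0], [2, 3, 1], [0, 1, 4]], dtype=float)
    S, pmi, rows, sv = ca_and_pmi(N)
    print(np.round(rows, 3))
    print(np.round(sv, 3))
```

For this example matrix, S matches sqrt(E) * (exp(pmi) - 1) at every entry, including the zero cells where PMI is -inf. The last singular value is numerically zero because the residual matrix has rank at most min(rows, cols) - 1. The sum of squared singular values (the total inertia) equals the table's chi-squared statistic divided by its grand total.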