
Showing 1–7 of 7 results for author: Lupu, M

Searching in archive cs.
  1. arXiv:2001.04484

    cs.CL cs.IR cs.LG

    On the Replicability of Combining Word Embeddings and Retrieval Models

    Authors: Luca Papariello, Alexandros Bampoulidis, Mihai Lupu

    Abstract: We replicate recent experiments attempting to demonstrate an attractive hypothesis about the use of the Fisher kernel framework and mixture models for aggregating word embeddings towards document representations and the use of these representations in document classification, clustering, and retrieval. Specifically, the hypothesis was that the use of a mixture model of von Mises-Fisher (VMF) distr…

    Submitted 13 January, 2020; originally announced January 2020.

  2. arXiv:1902.09897

    cs.CR cs.DB

    An Abstract View on the De-anonymization Process

    Authors: Alexandros Bampoulidis, Mihai Lupu

    Abstract: In recent years, the availability of datasets containing personal but anonymized information has been continuously increasing. Extensive research has revealed that such datasets are vulnerable to privacy breaches: de-anonymization methods can reveal sensitive information about individuals. Here, we provide a taxonomy of the research in de-anonymization.

    Submitted 26 February, 2019; originally announced February 2019.

  3. arXiv:1711.06196

    cs.CL cs.IR

    Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian

    Authors: Navid Rekabsaz, Mihai Lupu, Allan Hanbury, Andres Duque

    Abstract: We explore the use of unsupervised methods in Cross-Lingual Word Sense Disambiguation (CL-WSD), applied to the English-to-Persian language pair. Our proposed approach targets languages with scarce resources (low-density languages) by exploiting word embeddings and the semantic similarity of words in context. We evaluate the approach on a recent evaluation benchmark and compare it with the state-of-the-art unsu…

    Submitted 21 March, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

  4. arXiv:1707.06598

    cs.IR cs.CL

    Toward Incorporation of Relevant Documents in word2vec

    Authors: Navid Rekabsaz, Bhaskar Mitra, Mihai Lupu, Allan Hanbury

    Abstract: Recent advances in neural word embedding provide significant benefit to various information retrieval tasks. However, as shown by recent studies, adapting the embedding models to the needs of IR tasks can bring considerable further improvements. The embedding models in general define term relatedness by exploiting the terms' co-occurrences in short-window contexts. An alternative (and well-stu…

    Submitted 4 April, 2018; v1 submitted 20 July, 2017; originally announced July 2017.

    Comments: Neu-IR Workshop at the ACM Conference on Research and Development in Information Retrieval (NeuIR-SIGIR 2017)

  5. arXiv:1703.05123

    cs.IR cs.CL

    Character-based Neural Embeddings for Tweet Clustering

    Authors: Svitlana Vakulenko, Lyndon Nixon, Mihai Lupu

    Abstract: In this paper we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations related to vocabulary explosion in word-based models and allows seamless processing of multilingual content. Our evaluation results and code are available online at https://github.com/vendi12/tweet2vec_clus…

    Submitted 16 March, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Accepted at the SocialNLP 2017 workshop held in conjunction with EACL 2017, April 3, 2017, Valencia, Spain

  6. Volatility Prediction using Financial Disclosures Sentiments with Word Embedding-based IR Models

    Authors: Navid Rekabsaz, Mihai Lupu, Artem Baklanov, Allan Hanbury, Alexander Duer, Linda Anderson

    Abstract: Volatility prediction, an essential concept in financial markets, has recently been addressed using sentiment analysis methods. We investigate the sentiment of annual disclosures of companies in stock markets to forecast volatility. We specifically explore the use of recent Information Retrieval (IR) term weighting models that are effectively extended by related terms using word embeddings. In par…

    Submitted 28 September, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

  7. arXiv:1606.06086

    cs.CL cs.IR

    Uncertainty in Neural Network Word Embedding: Exploration of Threshold for Similarity

    Authors: Navid Rekabsaz, Mihai Lupu, Allan Hanbury

    Abstract: Word embedding, especially with its recent developments, promises a quantification of the similarity between terms. However, it is not clear to what extent this similarity value can be genuinely meaningful and useful for subsequent tasks. We explore to what degree the similarity score obtained from the models is really indicative of term relatedness. We first observe and quantify the uncertainty factor of th…

    Submitted 4 April, 2018; v1 submitted 20 June, 2016; originally announced June 2016.

    Comments: Neu-IR Workshop at the ACM Conference on Research and Development in Information Retrieval (NeuIR-SIGIR 2016)