Showing 1–2 of 2 results for author: Gour, P

  1. arXiv:2011.04096  [pdf, other]

    cs.CL cs.AI

    Metrics also Disagree in the Low Scoring Range: Revisiting Summarization Evaluation Metrics

    Authors: Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu

    Abstract: In text summarization, evaluating the efficacy of automatic metrics without human judgments has become recently popular. One exemplar work concludes that automatic metrics strongly disagree when ranking high-scoring summaries. In this paper, we revisit their experiments and find that their observations stem from the fact that metrics disagree in ranking summaries from any narrow scoring range. We…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted at COLING 2020

  2. arXiv:2010.07100  [pdf, other]

    cs.CL cs.IR cs.LG

    Re-evaluating Evaluation in Text Summarization

    Authors: Manik Bhandari, Pranav Gour, Atabak Ashfaq, Pengfei Liu, Graham Neubig

    Abstract: Automated evaluation metrics as a stand-in for manual evaluation are an essential part of the development of text-generation tasks such as text summarization. However, while the field has progressed, our standard metrics have not -- for nearly 20 years ROUGE has been the standard evaluation in most summarization papers. In this paper, we make an attempt to re-evaluate the evaluation method for tex…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: Accepted at EMNLP 2020