Search | arXiv e-print repository

arXiv:cs/0206007 [pdf, ps, other]

Using the Annotated Bibliography as a Resource for Indicative Summarization

Authors: Min-Yen Kan, Judith L. Klavans, Kathleen R. McKeown

Abstract: We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated bibliographies cover certain aspects of summarization that have not been well-covered by other summary corpora, and motivate why they constitute an important form to study for information retrieval. We detail our methodology for collecting the corpus, and overview our document feature markup that we introduced to facilitate summary analysis. We present the characteristics of the corpus, methods of collection, and show its use in finding the distribution of types of information included in indicative summaries and their relative ordering within the summaries.

Submitted 4 June, 2002; originally announced June 2002.

Comments: 8 pages, 3 figures

ACM Class: I.2.7

Journal ref: Proceedings of LREC 2002, Las Palmas, Spain. pp. 1746-1752

arXiv:cs/0107019 [pdf, ps, other]

Applying Natural Language Generation to Indicative Summarization

Authors: Min-Yen Kan, Kathleen R. McKeown, Judith L. Klavans

Abstract: The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implemented content planner uses the topicality document feature to create indicative multidocument query-based summaries.

Submitted 16 July, 2001; v1 submitted 16 July, 2001; originally announced July 2001.

Comments: 8 pages, published in Proc. of 8th European Workshop on NLG

ACM Class: I.2.7

arXiv:cs/9810014 [pdf, ps, other]

Resources for Evaluation of Summarization Techniques

Authors: Judith L. Klavans, Kathleen R. McKeown, Min-Yen Kan, Susan Lee

Abstract: We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with construction of the test set which apply broadly to the construction of evaluation resources for language technologies.

Submitted 13 October, 1998; originally announced October 1998.

Comments: LaTeX source, 5 pages, US Letter, uses lrec98.sty

ACM Class: I.2.7

Journal ref: in Proc. of First International Conference on Language Resources and Evaluation, Rubio, Gallardo, Castro, and Tejada (eds.), Granada, Spain, 1998

arXiv:cs/9809020 [pdf, ps]

Linear Segmentation and Segment Significance

Authors: Min-Yen Kan, Judith L. Klavans, Kathleen R. McKeown

Abstract: We present a new method for discovering a segmental discourse structure of a document while categorizing segment function. We demonstrate how retrieval of noun phrases and pronominal forms, along with a zero-sum weighting scheme, determines topicalized segmentation. Furthermore, we use term distribution to aid in identifying the role that the segment performs in the document. Finally, we present results of evaluation in terms of precision and recall which surpass earlier approaches.

Submitted 15 September, 1998; originally announced September 1998.

Comments: 9 pages, US Letter, 4 figures. Software License can be found at http://www.cs.columbia.edu/nlp/licenses/segmenterLicenseDownload.html

ACM Class: I.2.7

Journal ref: Proceedings of 6th International Workshop of Very Large Corpora (WVLC-6), Montreal, Quebec, Canada: Aug. 1998. pp. 197-205

arXiv:cmp-lg/9702014 [pdf, ps, other]

Building a Generation Knowledge Source using Internet-Accessible Newswire

Authors: Dragomir R. Radev, Kathleen R. McKeown

Abstract: In this paper, we describe a method for automatic creation of a knowledge source for text generation using information extraction over the Internet. We present a prototype system called PROFILE which uses a client-server architecture to extract noun-phrase descriptions of entities such as people, places, and organizations. The system serves two purposes: as an information extraction tool, it allows users to search for textual descriptions of entities; as a utility to generate functional descriptions (FD), it is used in a functional-unification based generation system. We present an evaluation of the approach and its applications to natural language generation and summarization.

Submitted 25 February, 1997; originally announced February 1997.

Comments: 8 pages, uses epsf

Journal ref: To appear in Proceedings of the 5th Conference on Applied Natural Language Processing, Washington DC, 31 March - 3 April, 1997.

arXiv:cmp-lg/9610002 [pdf, ps]

Gathering Statistics to Aspectually Classify Sentences with a Genetic Algorithm

Authors: Eric V. Siegel, Kathleen R. McKeown

Abstract: This paper presents a method for large corpus analysis to semantically classify an entire clause. In particular, we use cooccurrence statistics among similar clauses to determine the aspectual class of an input clause. The process examines linguistic features of clauses that are relevant to aspectual classification. A genetic algorithm determines what combinations of linguistic features to use for this task.

Submitted 21 October, 1996; originally announced October 1996.

Comments: postscript, 9 pages, Proceedings of the Second International Conference on New Methods in Language Processing, Oflazer and Somers ed.

arXiv:cmp-lg/9408007 [pdf, ps]

Emergent Linguistic Rules from Inducing Decision Trees: Disambiguating Discourse Clue Words

Authors: Eric V. Siegel, Kathleen R. McKeown

Abstract: We apply decision tree induction to the problem of discourse clue word sense disambiguation with a genetic algorithm. The automatic partitioning of the training set which is intrinsic to decision tree induction gives rise to linguistically viable rules.

Submitted 13 August, 1994; originally announced August 1994.

Journal ref: AAAI94 proceedings

Showing 1–7 of 7 results for author: McKeown, K R