-
arXiv:cs/0206007
Using the Annotated Bibliography as a Resource for Indicative Summarization
Abstract: We report on a language resource consisting of 2000 annotated bibliography entries, which is being analyzed as part of our research on indicative document summarization. We show how annotated bibliographies cover certain aspects of summarization that have not been well-covered by other summary corpora, and motivate why they constitute an important form to study for information retrieval. We deta…
Submitted 4 June, 2002; originally announced June 2002.
Comments: 8 pages, 3 figures
ACM Class: I.2.7
Journal ref: Proceedings of LREC 2002, Las Palmas, Spain. pp. 1746-1752
-
arXiv:cs/0107019
Applying Natural Language Generation to Indicative Summarization
Abstract: The task of creating indicative summaries that help a searcher decide whether to read a particular document is a difficult task. This paper examines the indicative summarization task from a generation perspective, by first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implemente…
Submitted 16 July, 2001; v1 submitted 16 July, 2001; originally announced July 2001.
Comments: 8 pages, published in Proc. of 8th European Workshop on NLG
ACM Class: I.2.7
-
arXiv:cs/9810014
Resources for Evaluation of Summarization Techniques
Abstract: We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with…
Submitted 13 October, 1998; originally announced October 1998.
Comments: LaTeX source, 5 pages, US Letter, uses lrec98.sty
ACM Class: I.2.7
Journal ref: in Proc. of First International Conference on Language Resources and Evaluation, Rubio, Gallardo, Castro, and Tejada (eds.), Granada, Spain, 1998
-
Linear Segmentation and Segment Significance
Abstract: We present a new method for discovering a segmental discourse structure of a document while categorizing segment function. We demonstrate how retrieval of noun phrases and pronominal forms, along with a zero-sum weighting scheme, determines topicalized segmentation. Furthermore, we use term distribution to aid in identifying the role that the segment performs in the document. Finally, we present…
Submitted 15 September, 1998; originally announced September 1998.
Comments: 9 pages, US Letter, 4 figures. Software License can be found at http://www.cs.columbia.edu/nlp/licenses/segmenterLicenseDownload.html
ACM Class: I.2.7
Journal ref: Proceedings of 6th International Workshop of Very Large Corpora (WVLC-6), Montreal, Quebec, Canada: Aug. 1998. pp. 197-205
-
Building a Generation Knowledge Source using Internet-Accessible Newswire
Abstract: In this paper, we describe a method for automatic creation of a knowledge source for text generation using information extraction over the Internet. We present a prototype system called PROFILE which uses a client-server architecture to extract noun-phrase descriptions of entities such as people, places, and organizations. The system serves two purposes: as an information extraction tool, it all…
Submitted 25 February, 1997; originally announced February 1997.
Comments: 8 pages, uses epsf
Journal ref: To appear in Proceedings of the 5th Conference on Applied Natural Language Processing, Washington DC, 31 March - 3 April, 1997.
-
Gathering Statistics to Aspectually Classify Sentences with a Genetic Algorithm
Abstract: This paper presents a method for large corpus analysis to semantically classify an entire clause. In particular, we use cooccurrence statistics among similar clauses to determine the aspectual class of an input clause. The process examines linguistic features of clauses that are relevant to aspectual classification. A genetic algorithm determines what combinations of linguistic features to use f…
Submitted 21 October, 1996; originally announced October 1996.
Comments: postscript, 9 pages, Proceedings of the Second International Conference on New Methods in Language Processing, Oflazer and Somers (eds.)
-
Emergent Linguistic Rules from Inducing Decision Trees: Disambiguating Discourse Clue Words
Abstract: We apply decision tree induction to the problem of discourse clue word sense disambiguation with a genetic algorithm. The automatic partitioning of the training set which is intrinsic to decision tree induction gives rise to linguistically viable rules.
Submitted 13 August, 1994; originally announced August 1994.
Journal ref: AAAI94 proceedings