-
Multimodal Biomarkers for Schizophrenia: Towards Individual Symptom Severity Estimation
Authors:
Gowtham Premananth,
Philip Resnik,
Sonia Bansal,
Deanna L. Kelly,
Carol Espy-Wilson
Abstract:
Studies on schizophrenia assessments using deep learning typically treat it as a classification task to detect the presence or absence of the disorder, oversimplifying the condition and reducing its clinical applicability. This traditional approach overlooks the complexity of schizophrenia, limiting its practical value in healthcare settings. This study shifts the focus to individual symptom severity estimation using a multimodal approach that integrates speech, video, and text inputs. We develop unimodal models for each modality and a multimodal framework to improve accuracy and robustness. By capturing a more detailed symptom profile, this approach can help enhance diagnostic precision and support personalized treatment, offering a scalable and objective tool for mental health assessment.
Submitted 4 June, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
-
Conversational User-AI Intervention: A Study on Prompt Rewriting for Improved LLM Response Generation
Authors:
Rupak Sarkar,
Bahareh Sarrafzadeh,
Nirupama Chandrasekaran,
Nagu Rangan,
Philip Resnik,
Longqi Yang,
Sujay Kumar Jauhar
Abstract:
Human-LLM conversations are becoming increasingly pervasive in people's professional and personal lives, yet many users still struggle to elicit helpful responses from LLM chatbots. One of the reasons for this issue is users' lack of understanding in crafting effective prompts that accurately convey their information needs. Meanwhile, the existence of real-world conversational datasets on the one hand, and the text understanding faculties of LLMs on the other, present a unique opportunity to study this problem, and its potential solutions, at scale. Thus, in this paper we present the first LLM-centric study of real human-AI chatbot conversations, focused on investigating aspects in which user queries fall short of expressing information needs, and the potential of using LLMs to rewrite suboptimal user prompts. Our findings demonstrate that rephrasing ineffective prompts can elicit better responses from a conversational system, while preserving the user's original intent. Notably, the performance of rewrites improves in longer conversations, where contextual inferences about user needs can be made more accurately. Additionally, we observe that LLMs often need to -- and inherently do -- make plausible assumptions about a user's intentions and goals when interpreting prompts. Our findings largely hold true across conversational domains, user intents, and LLMs of varying sizes and families, indicating the promise of using prompt rewriting as a solution for better human-AI interactions.
Submitted 20 March, 2025;
originally announced March 2025.
-
Understanding Common Ground Misalignment in Goal-Oriented Dialog: A Case-Study with Ubuntu Chat Logs
Authors:
Rupak Sarkar,
Neha Srikanth,
Taylor Hudson,
Rachel Rudinger,
Claire Bonial,
Philip Resnik
Abstract:
While it is commonly accepted that maintaining common ground plays a role in conversational success, little prior research exists connecting conversational grounding to success in task-oriented conversations. We study failures of grounding in the Ubuntu IRC dataset, where participants use text-only communication to resolve technical issues. We find that disruptions in conversational flow often stem from a misalignment in common ground, driven by a divergence in beliefs and assumptions held by participants. These disruptions, which we call conversational friction, significantly correlate with task success. We find that although LLMs can identify overt cases of conversational friction, they struggle with subtler and more context-dependent instances requiring pragmatic or domain-specific reasoning.
Submitted 16 March, 2025;
originally announced March 2025.
-
Large Language Models are Biased Because They Are Large Language Models
Authors:
Philip Resnik
Abstract:
This position paper's primary goal is to provoke thoughtful discussion about the relationship between bias and fundamental properties of large language models. I do this by seeking to convince the reader that harmful biases are an inevitable consequence arising from the design of any large language model as LLMs are currently formulated. To the extent that this is true, it suggests that the problem of harmful bias cannot be properly addressed without a serious reconsideration of AI driven by LLMs, going back to the foundational assumptions underlying their design.
Submitted 13 March, 2025; v1 submitted 18 June, 2024;
originally announced June 2024.
-
The Prompt Report: A Systematic Survey of Prompt Engineering Techniques
Authors:
Sander Schulhoff,
Michael Ilie,
Nishant Balepur,
Konstantine Kahadze,
Amanda Liu,
Chenglei Si,
Yinheng Li,
Aayush Gupta,
HyoJung Han,
Sevien Schulhoff,
Pranav Sandeep Dulepet,
Saurav Vidyadhara,
Dayeon Ki,
Sweta Agrawal,
Chau Pham,
Gerson Kroiz,
Feileen Li,
Hudson Tao,
Ashay Srivastava,
Hevander Da Costa,
Saloni Gupta,
Megan L. Rogers,
Inna Goncearenco,
Giuseppe Sarli,
Igor Galynker, et al. (6 additional authors not shown)
Abstract:
Generative Artificial Intelligence (GenAI) systems are increasingly being deployed across diverse industries and research domains. Developers and end-users interact with these systems through the use of prompting and prompt engineering. Although prompt engineering is a widely adopted and extensively researched area, it suffers from conflicting terminology and a fragmented ontological understanding of what constitutes an effective prompt due to its relatively recent emergence. We establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications. We present a detailed vocabulary of 33 terms, a taxonomy of 58 LLM prompting techniques, and 40 techniques for other modalities. Additionally, we provide best practices and guidelines for prompt engineering, including advice for prompting state-of-the-art (SOTA) LLMs such as ChatGPT. We further present a meta-analysis of the entire literature on natural language prefix-prompting. As a culmination of these efforts, this paper presents the most comprehensive survey on prompt engineering to date.
Submitted 26 February, 2025; v1 submitted 6 June, 2024;
originally announced June 2024.
-
TopicGPT: A Prompt-based Topic Modeling Framework
Authors:
Chau Minh Pham,
Alexander Hoyle,
Simeng Sun,
Philip Resnik,
Mohit Iyyer
Abstract:
Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics in a text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.
Submitted 1 April, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Words, Subwords, and Morphemes: What Really Matters in the Surprisal-Reading Time Relationship?
Authors:
Sathvik Nair,
Philip Resnik
Abstract:
An important assumption that comes with using LLMs on psycholinguistic data has gone unverified. LLM-based predictions are based on subword tokenization, not decomposition of words into morphemes. Does that matter? We carefully test this by comparing surprisal estimates using orthographic, morphological, and BPE tokenization against reading time data. Our results replicate previous findings and provide evidence that in the aggregate, predictions using BPE tokenization do not suffer relative to morphological and orthographic segmentation. However, a finer-grained analysis points to potential issues with relying on BPE-based tokenization, as well as providing promising results involving morphologically-aware surprisal estimates and suggesting a new method for evaluating morphological prediction.
Submitted 26 October, 2023;
originally announced October 2023.
-
A multi-modal approach for identifying schizophrenia using cross-modal attention
Authors:
Gowtham Premananth,
Yashish M. Siriwardena,
Philip Resnik,
Carol Espy-Wilson
Abstract:
This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectively, which were then used to compute high-level coordination features that served as the inputs to the audio and video modalities. Context-independent text embeddings extracted from transcriptions of speech were used as the input for the text modality. The multi-modal system is developed by fusing a segment-to-session-level classifier for video and audio modalities with a text model based on a Hierarchical Attention Network (HAN) with cross-modal attention. The proposed multi-modal system outperforms the previous state-of-the-art multi-modal system by 8.53% in the weighted average F1 score.
Submitted 18 April, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Using co-sharing to identify use of mainstream news for promoting potentially misleading narratives
Authors:
Pranav Goel,
Jon Green,
David Lazer,
Philip Resnik
Abstract:
Much of the research quantifying volume and spread of online misinformation measures the construct at the source level, identifying a set of specific unreliable domains that account for a relatively small share of news consumption. This source-level dichotomy obscures the potential for users to repurpose factually true information from reliable sources to advance misleading narratives. We demonstrate this potentially far more prevalent form of misinformation by identifying articles from reliable sources that are frequently co-shared with (shared by users who also shared) "fake" news on social media, and concurrently extracting narratives present in fake news content and claims fact-checked as false. Specifically in this study, we use Twitter/X data from May 2018 to November 2021 matched to a U.S. voter file. We find that narratives present in misinformation content are significantly more likely to occur in co-shared articles than in articles from the same reliable sources that are not co-shared, consistent with users using information from mainstream sources to enhance the credibility and reach of potentially misleading claims.
Submitted 10 June, 2025; v1 submitted 11 August, 2023;
originally announced August 2023.
-
Natural Language Decompositions of Implicit Content Enable Better Text Representations
Authors:
Alexander Hoyle,
Rupak Sarkar,
Pranav Goel,
Philip Resnik
Abstract:
When people interpret text, they rely on inferences that go beyond the observed language itself. Inspired by this observation, we introduce a method for the analysis of text that takes implicitly communicated content explicitly into account. We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed, then validate the plausibility of the generated content via human judgments. Incorporating these explicit representations of implicit content proves useful in multiple problem settings that involve the human interpretation of utterances: assessing the similarity of arguments, making sense of a body of opinion data, and modeling legislative behavior. Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP and particularly its applications to social science.
Submitted 24 February, 2025; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Using Open-Ended Stressor Responses to Predict Depressive Symptoms across Demographics
Authors:
Carlos Aguirre,
Mark Dredze,
Philip Resnik
Abstract:
Stressors are related to depression, but this relationship is complex. We investigate the relationship between open-ended text responses about stressors and depressive symptoms across gender and racial/ethnic groups. First, we use topic models and other NLP tools to find thematic and vocabulary differences when reporting stressors across demographic groups. We train language models using self-reported stressors to predict depressive symptoms, finding a relationship between stressors and depression. Finally, we find that differences in stressors translate to downstream performance differences across demographic groups.
Submitted 15 November, 2022;
originally announced November 2022.
-
Are Neural Topic Models Broken?
Authors:
Alexander Hoyle,
Pranav Goel,
Rupak Sarkar,
Philip Resnik
Abstract:
Recently, the relationship between automated and human evaluation of topic models has been called into question. Method developers have staked the efficacy of new topic model variants on automated measures, and their failure to approximate human preferences places these models on uncertain ground. Moreover, existing evaluation paradigms are often divorced from real-world use.
Motivated by content analysis as a dominant real-world use case for topic modeling, we analyze two related aspects of topic models that affect their effectiveness and trustworthiness in practice for that purpose: the stability of their estimates and the extent to which the model's discovered categories align with human-determined categories in the data. We find that neural topic models fare worse in both respects compared to an established classical method. We take a step toward addressing both issues in tandem by demonstrating that a straightforward ensembling method can reliably outperform the members of the ensemble.
Submitted 28 October, 2022;
originally announced October 2022.
-
Is Automated Topic Model Evaluation Broken?: The Incoherence of Coherence
Authors:
Alexander Hoyle,
Pranav Goel,
Denis Peskov,
Andrew Hian-Cheong,
Jordan Boyd-Graber,
Philip Resnik
Abstract:
Topic model evaluation, like evaluation of other unsupervised methods, can be contentious. However, the field has coalesced around automated estimates of topic coherence, which rely on the frequency of word co-occurrences in a reference corpus. Contemporary neural topic models surpass classical ones according to these metrics. At the same time, topic model evaluation suffers from a validation gap: automated coherence, developed for classical models, has not been validated using human experimentation for neural models. In addition, a meta-analysis of topic modeling literature reveals a substantial standardization gap in automated topic modeling benchmarks. To address the validation gap, we compare automated coherence with the two most widely accepted human judgment tasks: topic rating and word intrusion. To address the standardization gap, we systematically evaluate a dominant classical model and two state-of-the-art neural models on two commonly used datasets. Automated evaluations declare a winning model when corresponding human evaluations do not, calling into question the validity of fully automatic evaluations independent of human judgments.
Submitted 27 October, 2021; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Towards Clinical Encounter Summarization: Learning to Compose Discharge Summaries from Prior Notes
Authors:
Han-Chin Shing,
Chaitanya Shivade,
Nima Pourdamghani,
Feng Nan,
Philip Resnik,
Douglas Oard,
Parminder Bhatia
Abstract:
The records of a clinical encounter can be extensive and complex, thus placing a premium on tools that can extract and summarize relevant information. This paper introduces the task of generating discharge summaries for a clinical encounter. Summaries in this setting need to be faithful, traceable, and scale to multiple long documents, motivating the use of extract-then-abstract summarization cascades. We introduce two new measures, faithfulness and hallucination rate, for evaluation in this task, which complement existing measures for fluency and informativeness. Results across seven medical sections and five models show that a summarization architecture that supports traceability yields promising results, and that a sentence-rewriting approach performs consistently on the measure used for faithfulness (faithfulness-adjusted $F_3$) over a diverse range of generated sections.
Submitted 27 April, 2021;
originally announced April 2021.
-
Improving Neural Topic Models using Knowledge Distillation
Authors:
Alexander Hoyle,
Pranav Goel,
Philip Resnik
Abstract:
Topic models are often used to identify human-interpretable topics to help make sense of large document collections. We use knowledge distillation to combine the best attributes of probabilistic topic models and pretrained transformers. Our modular method can be straightforwardly applied with any neural topic model to improve topic quality, which we demonstrate using two models having disparate architectures, obtaining state-of-the-art topic coherence. We show that our adaptable framework not only improves performance in the aggregate over all estimated topics, as is commonly reported, but also in head-to-head comparisons of aligned topics.
Submitted 5 October, 2020;
originally announced October 2020.
-
Assigning Medical Codes at the Encounter Level by Paying Attention to Documents
Authors:
Han-Chin Shing,
Guoli Wang,
Philip Resnik
Abstract:
The vast majority of research in computer assisted medical coding focuses on coding at the document level, but a substantial proportion of medical coding in the real world involves coding at the level of clinical encounters, each of which is typically represented by a potentially large set of documents. We introduce encounter-level document attention networks, which use hierarchical attention to explicitly take the hierarchical structure of encounter documentation into account. Experimental evaluation demonstrates improvements in coding accuracy as well as facilitation of human reviewers in their ability to identify which documents within an encounter play a role in determining the encounter level codes.
Submitted 15 November, 2019;
originally announced November 2019.
-
Assessing Composition in Sentence Vector Representations
Authors:
Allyson Ettinger,
Ahmed Elgohary,
Colin Phillips,
Philip Resnik
Abstract:
An important component of achieving language understanding is mastering the composition of sentence meaning, but an immediate challenge to solving this problem is the opacity of sentence vector representations produced by current neural sentence composition models. We present a method to address this challenge, developing tasks that directly target compositional meaning information in sentence vector representations with a high degree of precision and control. To enable the creation of these controlled tasks, we introduce a specialized sentence generation system that produces large, annotated sentence sets meeting specified syntactic, semantic and lexical constraints. We describe the details of the method and generation system, and then present results of experiments applying our method to probe for compositional information in embeddings from a number of existing sentence composition models. We find that the method is able to extract useful information about the differing capacities of these models, and we discuss the implications of our results with respect to these systems' capturing of sentence information. We make available for public use the datasets used for these experiments, as well as the generation system.
Submitted 11 September, 2018;
originally announced September 2018.
-
Parser for Abstract Meaning Representation using Learning to Search
Authors:
Sudha Rao,
Yogarshi Vyas,
Hal Daume III,
Philip Resnik
Abstract:
We develop a novel technique to parse English sentences into Abstract Meaning Representation (AMR) using SEARN, a Learning to Search approach, by modeling the concept and the relation learning in a unified framework. We evaluate our parser on multiple datasets from varied domains and show an absolute improvement of 2% to 6% over the state-of-the-art. Additionally we show that using the most frequent concept gives us a baseline that is stronger than the state-of-the-art for concept prediction. We plan to release our parser for public use.
Submitted 26 October, 2015;
originally announced October 2015.
-
Two Algorithms for Finding $k$ Shortest Paths of a Weighted Pushdown Automaton
Authors:
Ke Wu,
Philip Resnik
Abstract:
We introduce efficient algorithms for finding the $k$ shortest paths of a weighted pushdown automaton (WPDA), a compact representation of a weighted set of strings with potential applications in parsing and machine translation. Both of our algorithms are derived from the same weighted deductive logic description of the execution of a WPDA using different search strategies. Experimental results show our Algorithm 2 adds very little overhead vs. the single shortest path algorithm, even with a large $k$.
Submitted 5 February, 2013; v1 submitted 4 December, 2012;
originally announced December 2012.
-
Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language
Authors:
P. Resnik
Abstract:
This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness.
Submitted 26 May, 2011;
originally announced May 2011.
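The abstract above describes similarity in an IS-A taxonomy in terms of shared information content. A minimal sketch of one common reading of that idea follows: the similarity of two concepts is taken to be the information content, -log p(c), of their most informative shared subsumer. The formula is not spelled out in the abstract, and the toy taxonomy, the counts, and the function names are all hypothetical illustrations rather than the article's data or code.

```python
import math

# Hypothetical toy IS-A taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {"entity": 0, "animal": 10, "artifact": 10, "dog": 40, "cat": 30, "car": 10}


def ancestors(concept):
    """Return the concept itself plus every concept that subsumes it."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def concept_probabilities():
    """Estimate p(c); an occurrence of a concept also counts for each subsumer."""
    totals = {c: 0 for c in PARENT}
    for concept, count in COUNTS.items():
        for subsumer in ancestors(concept):
            totals[subsumer] += count
    n = sum(COUNTS.values())
    return {c: totals[c] / n for c in PARENT}


def similarity(c1, c2, prob):
    """Information content (-log p) of the most informative shared subsumer."""
    shared = set(ancestors(c1)) & set(ancestors(c2))
    return max(-math.log(prob[c]) for c in shared)


if __name__ == "__main__":
    prob = concept_probabilities()
    print(similarity("dog", "cat", prob))  # shared subsumer "animal" is informative
    print(similarity("dog", "car", prob))  # only the root is shared
```

In this toy setup, "dog" and "cat" share the subsumer "animal", which carries some information, while "dog" and "car" share only the root, whose probability is 1 and whose information content is therefore zero. An edge-counting measure would instead depend on path length in the taxonomy, which is the contrast the abstract draws.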
-
Tagger Evaluation Given Hierarchical Tag Sets
Authors:
I. Dan Melamed,
Philip Resnik
Abstract:
We present methods for evaluating human and automatic taggers that extend current practice in three ways. First, we show how to evaluate taggers that assign multiple tags to each test instance, even if they do not assign probabilities. Second, we show how to accommodate a common property of manually constructed "gold standards" that are typically used for objective evaluation, namely that there is often more than one correct answer. Third, we show how to measure performance when the set of possible tags is tree-structured in an IS-A hierarchy. To illustrate how our methods can be used to measure inter-annotator agreement, we show how to compute the kappa coefficient over hierarchical tag sets.
Submitted 9 August, 2000;
originally announced August 2000.
-
Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text
Authors:
Philip Resnik
Abstract:
Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention.
Submitted 7 August, 1998;
originally announced August 1998.
-
Evaluating Multilingual Gisting of Web Pages
Authors:
Philip Resnik
Abstract:
We describe a prototype system for multilingual gisting of Web pages, and present an evaluation methodology based on the notion of gisting as decision support. This evaluation paradigm is straightforward, rigorous, permits fair comparison of alternative approaches, and should easily generalize to evaluation in other situations where the user is faced with decision-making on the basis of information in restricted or alternative form.
Submitted 7 April, 1997;
originally announced April 1997.
-
Semi-Automatic Acquisition of Domain-Specific Translation Lexicons
Authors:
Philip Resnik,
I. Dan Melamed
Abstract:
We investigate the utility of an algorithm for translation lexicon acquisition (SABLE), used previously on a very large corpus to acquire general translation lexicons, when that algorithm is applied to a much smaller corpus to produce candidates for domain-specific translation lexicons.
Submitted 27 March, 1997;
originally announced March 1997.
-
Using Information Content to Evaluate Semantic Similarity in a Taxonomy
Authors:
Philip Resnik
Abstract:
This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).
Submitted 29 November, 1995;
originally announced November 1995.
-
Disambiguating Noun Groupings with Respect to WordNet Senses
Authors:
Philip Resnik
Abstract:
Word groupings useful for language processing tasks are increasingly available, as thesauri appear on-line, and as distributional word clustering techniques improve. However, for many tasks, one is interested in relationships among word senses, not words. This paper presents a method for automatic sense disambiguation of nouns appearing within sets of related nouns (the kind of data one finds in on-line thesauri, or as the output of distributional clustering algorithms). Disambiguation is performed with respect to WordNet senses, which are fairly fine-grained; however, the method also permits the assignment of higher-level WordNet categories rather than sense labels. The method is illustrated primarily by example, though results of a more rigorous evaluation are also presented.
Submitted 29 November, 1995;
originally announced November 1995.
-
A Rule-Based Approach To Prepositional Phrase Attachment Disambiguation
Authors:
Eric Brill,
Philip Resnik
Abstract:
In this paper, we describe a new corpus-based approach to prepositional phrase attachment disambiguation, and present results comparing performance of this algorithm with other corpus-based approaches to this problem.
Submitted 25 October, 1994;
originally announced October 1994.