Search | arXiv e-print repository

LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

Authors: Ho Yin 'Sam' Ng, Ting-Yao Hsu, Aashish Anantha Ramakrishnan, Branislav Kveton, Nedim Lipka, Franck Dernoncourt, Dongwon Lee, Tong Yu, Sungchul Kim, Ryan A. Rossi, Ting-Hao 'Kenneth' Huang

Abstract: Figure captions are crucial for helping readers understand and remember a figure's key message. Many models have been developed to generate these captions, helping authors compose better quality captions more easily. Yet, authors almost always need to revise generic AI-generated captions to match their writing style and the domain's style, highlighting the need for personalization. Despite languag… ▽ More Figure captions are crucial for helping readers understand and remember a figure's key message. Many models have been developed to generate these captions, helping authors compose better quality captions more easily. Yet, authors almost always need to revise generic AI-generated captions to match their writing style and the domain's style, highlighting the need for personalization. Despite language models' personalization (LaMP) advances, these technologies often focus on text-only settings and rarely address scenarios where both inputs and profiles are multimodal. This paper introduces LaMP-Cap, a dataset for personalized figure caption generation with multimodal figure profiles. For each target figure, LaMP-Cap provides not only the needed inputs, such as figure images, but also up to three other figures from the same document--each with its image, caption, and figure-mentioning paragraphs--as a profile to characterize the context. Experiments with four LLMs show that using profile information consistently helps generate captions closer to the original author-written ones. Ablation studies reveal that images in the profile are more helpful than figure-mentioning paragraphs, highlighting the advantage of using multimodal profiles over text-only ones. △ Less

Submitted 17 June, 2025; v1 submitted 6 June, 2025; originally announced June 2025.

Comments: The LaMP-CAP dataset is publicly available at: https://github.com/Crowd-AI-Lab/lamp-cap

arXiv:2502.11767 [pdf, ps, other]

From Selection to Generation: A Survey of LLM-based Active Learning

Authors: Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li, Ryan Aponte, Hanjia Lyu, Joe Barrow, Hongjie Chen, Franck Dernoncourt, Branislav Kveton, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Sungchul Kim, Zhengmian Hu, Yue Zhao, Nedim Lipka, Seunghyun Yoon, Ting-Hao Kenneth Huang, Zichao Wang , et al. (9 additional authors not shown)

Abstract: Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the incre… ▽ More Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications. △ Less

Submitted 31 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

Comments: ACL 2025

arXiv:2502.11267 [pdf, other]

doi 10.1145/3706598.3714319

Prompting in the Dark: Assessing Human Performance in Prompt Engineering for Data Labeling When Gold Labels Are Absent

Authors: Zeyu He, Saniya Naphade, Ting-Hao 'Kenneth' Huang

Abstract: Millions of users prompt large language models (LLMs) for various tasks, but how good are people at prompt engineering? Do users actually get closer to their desired outcome over multiple iterations of their prompts? These questions are crucial when no gold-standard labels are available to measure progress. This paper investigates a scenario in LLM-powered data labeling, "prompting in the dark," w… ▽ More Millions of users prompt large language models (LLMs) for various tasks, but how good are people at prompt engineering? Do users actually get closer to their desired outcome over multiple iterations of their prompts? These questions are crucial when no gold-standard labels are available to measure progress. This paper investigates a scenario in LLM-powered data labeling, "prompting in the dark," where users iteratively prompt LLMs to label data without using manually-labeled benchmarks. We developed PromptingSheet, a Google Sheets add-on that enables users to compose, revise, and iteratively label data through spreadsheets. Through a study with 20 participants, we found that prompting in the dark was highly unreliable-only 9 participants improved labeling accuracy after four or more iterations. Automated prompt optimization tools like DSPy also struggled when few gold labels were available. Our findings highlight the importance of gold labels and the needs, as well as the risks, of automated support in human prompt engineering, providing insights for future tool design. △ Less

Submitted 16 February, 2025; originally announced February 2025.

Comments: Accepted By CHI 2025

arXiv:2502.07058 [pdf, other]

Using Contextually Aligned Online Reviews to Measure LLMs' Performance Disparities Across Language Varieties

Authors: Zixin Tang, Chieh-Yang Huang, Tsung-Che Li, Ho Yin Sam Ng, Hen-Hsen Huang, Ting-Hao 'Kenneth' Huang

Abstract: A language can have different varieties. These varieties can affect the performance of natural language processing (NLP) models, including large language models (LLMs), which are often trained on data from widely spoken varieties. This paper introduces a novel and cost-effective approach to benchmark model performance across language varieties. We argue that international online review platforms,… ▽ More A language can have different varieties. These varieties can affect the performance of natural language processing (NLP) models, including large language models (LLMs), which are often trained on data from widely spoken varieties. This paper introduces a novel and cost-effective approach to benchmark model performance across language varieties. We argue that international online review platforms, such as Booking.com, can serve as effective data sources for constructing datasets that capture comments in different language varieties from similar real-world scenarios, like reviews for the same hotel with the same rating using the same language (e.g., Mandarin Chinese) but different language varieties (e.g., Taiwan Mandarin, Mainland Mandarin). To prove this concept, we constructed a contextually aligned dataset comprising reviews in Taiwan Mandarin and Mainland Mandarin and tested six LLMs in a sentiment analysis task. Our results show that LLMs consistently underperform in Taiwan Mandarin. △ Less

Submitted 20 March, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Comments: Accepted by 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), theme track

arXiv:2501.19353 [pdf, other]

Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023

Authors: Ting-Yao E. Hsu, Yi-Li Hsu, Shaurya Rohatgi, Chieh-Yang Huang, Ho Yin Sam Ng, Ryan Rossi, Sungchul Kim, Tong Yu, Lun-Wei Ku, C. Lee Giles, Ting-Hao K. Huang

Abstract: Since the SciCap datasets launch in 2021, the research community has made significant progress in generating captions for scientific figures in scholarly articles. In 2023, the first SciCap Challenge took place, inviting global teams to use an expanded SciCap dataset to develop models for captioning diverse figure types across various academic fields. At the same time, text generation models advan… ▽ More Since the SciCap datasets launch in 2021, the research community has made significant progress in generating captions for scientific figures in scholarly articles. In 2023, the first SciCap Challenge took place, inviting global teams to use an expanded SciCap dataset to develop models for captioning diverse figure types across various academic fields. At the same time, text generation models advanced quickly, with many powerful pre-trained large multimodal models (LMMs) emerging that showed impressive capabilities in various vision-and-language tasks. This paper presents an overview of the first SciCap Challenge and details the performance of various models on its data, capturing a snapshot of the fields state. We found that professional editors overwhelmingly preferred figure captions generated by GPT-4V over those from all other models and even the original captions written by authors. Following this key finding, we conducted detailed analyses to answer this question: Have advanced LMMs solved the task of generating captions for scientific figures? △ Less

Submitted 18 February, 2025; v1 submitted 31 January, 2025; originally announced January 2025.

Comments: Accepted to TACL 2025

arXiv:2501.18210 [pdf, other]

doi 10.1145/3706598.3713379

Hashtag Re-Appropriation for Audience Control on Recommendation-Driven Social Media Xiaohongshu (rednote)

Authors: Ruyuan Wan, Lingbo Tong, Tiffany Knearem, Toby Jia-Jun Li, Ting-Hao 'Kenneth' Huang, Qunfang Wu

Abstract: Algorithms have played a central role in personalized recommendations on social media. However, they also present significant obstacles for content creators trying to predict and manage their audience reach. This issue is particularly challenging for marginalized groups seeking to maintain safe spaces. Our study explores how women on Xiaohongshu (rednote), a recommendation-driven social platform,… ▽ More Algorithms have played a central role in personalized recommendations on social media. However, they also present significant obstacles for content creators trying to predict and manage their audience reach. This issue is particularly challenging for marginalized groups seeking to maintain safe spaces. Our study explores how women on Xiaohongshu (rednote), a recommendation-driven social platform, proactively re-appropriate hashtags (e.g., #Baby Supplemental Food) by using them in posts unrelated to their literal meaning. The hashtags were strategically chosen from topics that would be uninteresting to the male audience they wanted to block. Through a mixed-methods approach, we analyzed the practice of hashtag re-appropriation based on 5,800 collected posts and interviewed 24 active users from diverse backgrounds to uncover users' motivations and reactions towards the re-appropriation. This practice highlights how users can reclaim agency over content distribution on recommendation-driven platforms, offering insights into self-governance within algorithmic-centered power structures. △ Less

Submitted 3 March, 2025; v1 submitted 30 January, 2025; originally announced January 2025.

arXiv:2501.06317 [pdf, other]

Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing

Authors: Ho Yin, Ng, Ting-Yao Hsu, Jiyoo Min, Sungchul Kim, Ryan A. Rossi, Tong Yu, Hyunggu Jung, Ting-Hao 'Kenneth' Huang

Abstract: Figures and their captions play a key role in scientific publications. However, despite their importance, many captions in published papers are poorly crafted, largely due to a lack of attention by paper authors. While prior AI research has explored caption generation, it has mainly focused on reader-centered use cases, where users evaluate generated captions rather than actively integrating them… ▽ More Figures and their captions play a key role in scientific publications. However, despite their importance, many captions in published papers are poorly crafted, largely due to a lack of attention by paper authors. While prior AI research has explored caption generation, it has mainly focused on reader-centered use cases, where users evaluate generated captions rather than actively integrating them into their writing. This paper addresses this gap by investigating how paper authors incorporate AI-generated captions into their writing process through a user study involving 18 participants. Each participant rewrote captions for two figures from their own recently published work, using captions generated by state-of-the-art AI models as a resource. By analyzing video recordings of the writing process through interaction analysis, we observed that participants often began by copying and refining AI-generated captions. Paper writers favored longer, detail-rich captions that integrated textual and visual elements but found current AI models less effective for complex figures. These findings highlight the nuanced and diverse nature of figure caption composition, revealing design opportunities for AI systems to better support the challenges of academic writing. △ Less

Submitted 10 January, 2025; originally announced January 2025.

Comments: This paper will appear at AAAI 2025 Workshop (2nd AI4Research Workshop: Towards a Knowledge-grounded Scientific Research Lifecycle)

arXiv:2501.02552 [pdf, other]

Multi-LLM Collaborative Caption Generation in Scientific Documents

Authors: Jaeyoung Kim, Jongho Lee, Hong-Jun Choi, Ting-Yao Hsu, Chieh-Yang Huang, Sungchul Kim, Ryan Rossi, Tong Yu, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang, Sungchul Choi

Abstract: Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as either an image-to-text or text summarization problem. This limitation hinders the generation of high-quality captions that fully capture the necessary details. Mo… ▽ More Scientific figure captioning is a complex task that requires generating contextually appropriate descriptions of visual content. However, existing methods often fall short by utilizing incomplete information, treating the task solely as either an image-to-text or text summarization problem. This limitation hinders the generation of high-quality captions that fully capture the necessary details. Moreover, existing data sourced from arXiv papers contain low-quality captions, posing significant challenges for training large language models (LLMs). In this paper, we introduce a framework called Multi-LLM Collaborative Figure Caption Generation (MLBCAP) to address these challenges by leveraging specialized LLMs for distinct sub-tasks. Our approach unfolds in three key modules: (Quality Assessment) We utilize multimodal LLMs to assess the quality of training data, enabling the filtration of low-quality captions. (Diverse Caption Generation) We then employ a strategy of fine-tuning/prompting multiple LLMs on the captioning task to generate candidate captions. (Judgment) Lastly, we prompt a prominent LLM to select the highest quality caption from the candidates, followed by refining any remaining inaccuracies. Human evaluations demonstrate that informative captions produced by our approach rank better than human-written captions, highlighting its effectiveness. Our code is available at https://github.com/teamreboott/MLBCAP △ Less

Submitted 5 January, 2025; originally announced January 2025.

Comments: Accepted to AAAI 2025 AI4Research Workshop

arXiv:2410.03457 [pdf, other]

CoCoLoFa: A Dataset of News Comments with Common Logical Fallacies Written by LLM-Assisted Crowds

Authors: Min-Hsuan Yeh, Ruyuan Wan, Ting-Hao 'Kenneth' Huang

Abstract: Detecting logical fallacies in texts can help users spot argument flaws, but automating this detection is not easy. Manually annotating fallacies in large-scale, real-world text data to create datasets for developing and validating detection models is costly. This paper introduces CoCoLoFa, the largest known logical fallacy dataset, containing 7,706 comments for 648 news articles, with each commen… ▽ More Detecting logical fallacies in texts can help users spot argument flaws, but automating this detection is not easy. Manually annotating fallacies in large-scale, real-world text data to create datasets for developing and validating detection models is costly. This paper introduces CoCoLoFa, the largest known logical fallacy dataset, containing 7,706 comments for 648 news articles, with each comment labeled for fallacy presence and type. We recruited 143 crowd workers to write comments embodying specific fallacy types (e.g., slippery slope) in response to news articles. Recognizing the complexity of this writing task, we built an LLM-powered assistant into the workers' interface to aid in drafting and refining their comments. Experts rated the writing quality and labeling validity of CoCoLoFa as high and reliable. BERT-based models fine-tuned using CoCoLoFa achieved the highest fallacy detection (F1=0.86) and classification (F1=0.87) performance on its test set, outperforming the state-of-the-art LLMs. Our work shows that combining crowdsourcing and LLMs enables us to more effectively construct datasets for complex linguistic phenomena that crowd workers find challenging to produce on their own. △ Less

Submitted 4 October, 2024; originally announced October 2024.

Comments: In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

arXiv:2409.12949 [pdf, ps, other]

doi 10.1109/TRO.2025.3577037

A Learning-based Quadcopter Controller with Extreme Adaptation

Authors: Dingqi Zhang, Antonio Loquercio, Jerry Tang, Ting-Hao Wang, Jitendra Malik, Mark W. Mueller

Abstract: This paper introduces a learning-based low-level controller for quadcopters, which adaptively controls quadcopters with significant variations in mass, size, and actuator capabilities. Our approach leverages a combination of imitation learning and reinforcement learning, creating a fast-adapting and general control framework for quadcopters that eliminates the need for precise model estimation or… ▽ More This paper introduces a learning-based low-level controller for quadcopters, which adaptively controls quadcopters with significant variations in mass, size, and actuator capabilities. Our approach leverages a combination of imitation learning and reinforcement learning, creating a fast-adapting and general control framework for quadcopters that eliminates the need for precise model estimation or manual tuning. The controller estimates a latent representation of the vehicle's system parameters from sensor-action history, enabling it to adapt swiftly to diverse dynamics. Extensive evaluations in simulation demonstrate the controller's ability to generalize to unseen quadcopter parameters, with an adaptation range up to 16 times broader than the training set. In real-world tests, the controller is successfully deployed on quadcopters with mass differences of 3.7 times and propeller constants varying by more than 100 times, while also showing rapid adaptation to disturbances such as off-center payloads and motor failures. These results highlight the potential of our controller in extreme adaptation to simplify the design process and enhance the reliability of autonomous drone operations in unpredictable environments. The video and code are at: https://github.com/muellerlab/xadapt_ctrl △ Less

Submitted 8 June, 2025; v1 submitted 19 September, 2024; originally announced September 2024.

Comments: Accepted for the Transaction on Robotics (T-RO), April 2025

arXiv:2408.06494 [pdf, other]

What Color Scheme is More Effective in Assisting Readers to Locate Information in a Color-Coded Article?

Authors: Ho Yin Ng, Zeyu He, Ting-Hao 'Kenneth' Huang

Abstract: Color coding, a technique assigning specific colors to cluster information types, has proven advantages in aiding human cognitive activities, especially reading and comprehension. The rise of Large Language Models (LLMs) has streamlined document coding, enabling simple automatic text labeling with various schemes. This has the potential to make color-coding more accessible and benefit more users.… ▽ More Color coding, a technique assigning specific colors to cluster information types, has proven advantages in aiding human cognitive activities, especially reading and comprehension. The rise of Large Language Models (LLMs) has streamlined document coding, enabling simple automatic text labeling with various schemes. This has the potential to make color-coding more accessible and benefit more users. However, the impact of color choice on information seeking is understudied. We conducted a user study assessing various color schemes' effectiveness in LLM-coded text documents, standardizing contrast ratios to approximately 5.55:1 across schemes. Participants performed timed information-seeking tasks in color-coded scholarly abstracts. Results showed non-analogous and yellow-inclusive color schemes improved performance, with the latter also being more preferred by participants. These findings can inform better color scheme choices for text annotation. As LLMs advance document coding, we advocate for more research focusing on the "color" aspect of color-coding techniques. △ Less

Submitted 26 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

Comments: This paper will appear at IEEE VIS 2024

arXiv:2406.12787 [pdf, other]

Generating Educational Materials with Different Levels of Readability using LLMs

Authors: Chieh-Yang Huang, Jing Wei, Ting-Hao 'Kenneth' Huang

Abstract: This study introduces the leveled-text generation task, aiming to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B, to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting signific… ▽ More This study introduces the leveled-text generation task, aiming to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B, to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting significantly improves performance in readability manipulation and information preservation. LLaMA-2 70B performs better in achieving the desired difficulty range, while GPT-3.5 maintains original meaning. However, manual inspection highlights concerns such as misinformation introduction and inconsistent edit distribution. These findings emphasize the need for further research to ensure the quality of generated educational content. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: In2Writing 2024

arXiv:2404.17025 [pdf, other]

How Does Conversation Length Impact User's Satisfaction? A Case Study of Length-Controlled Conversations with LLM-Powered Chatbots

Authors: Shih-Hong Huang, Ya-Fang Lin, Zeyu He, Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

Abstract: Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary… ▽ More Users can discuss a wide range of topics with large language models (LLMs), but they do not always prefer solving problems or getting information through lengthy conversations. This raises an intriguing HCI question: How does instructing LLMs to engage in longer or shorter conversations affect conversation quality? In this paper, we developed two Slack chatbots using GPT-4 with the ability to vary conversation lengths and conducted a user study. Participants asked the chatbots both highly and less conversable questions, engaging in dialogues with 0, 3, 5, and 7 conversational turns. We found that the conversation quality does not differ drastically across different conditions, while participants had mixed reactions. Our study demonstrates LLMs' ability to change conversation length and the potential benefits for users resulting from such changes, but we caution that changes in text form may not necessarily imply changes in quality or content. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.17784 [pdf, other]

doi 10.1145/3613905.3650738

SciCapenter: Supporting Caption Composition for Scientific Figures with Machine-Generated Captions and Ratings

Authors: Ting-Yao Hsu, Chieh-Yang Huang, Shih-Hong Huang, Ryan Rossi, Sungchul Kim, Tong Yu, C. Lee Giles, Ting-Hao K. Huang

Abstract: Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific… ▽ More Crafting effective captions for figures is important. Readers heavily depend on these captions to grasp the figure's message. However, despite a well-developed set of AI technologies for figures and captions, these have rarely been tested for usefulness in aiding caption writing. This paper introduces SciCapenter, an interactive system that puts together cutting-edge AI technologies for scientific figure captions to aid caption composition. SciCapenter generates a variety of captions for each figure in a scholarly article, providing scores and a comprehensive checklist to assess caption quality across multiple critical aspects, such as helpfulness, OCR mention, key takeaways, and visual properties reference. Users can directly edit captions in SciCapenter, resubmit for revised evaluations, and iteratively refine them. A user study with Ph.D. students indicates that SciCapenter significantly lowers the cognitive load of caption writing. Participants' feedback further offers valuable design insights for future systems aiming to enhance caption writing. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: CHI EA '24: Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems

arXiv:2402.16795 [pdf, other]

doi 10.1145/3613904.3642834

If in a Crowdsourced Data Annotation Pipeline, a GPT-4

Authors: Zeyu He, Chieh-Yang Huang, Chien-Kuang Cornelia Ding, Shaurya Rohatgi, Ting-Hao 'Kenneth' Huang

Abstract: Recent studies indicated GPT-4 outperforms online crowd workers in data labeling accuracy, notably workers from Amazon Mechanical Turk (MTurk). However, these studies were criticized for deviating from standard crowdsourcing practices and emphasizing individual workers' performances over the whole data-annotation process. This paper compared GPT-4 and an ethical and well-executed MTurk pipeline, w… ▽ More Recent studies indicated GPT-4 outperforms online crowd workers in data labeling accuracy, notably workers from Amazon Mechanical Turk (MTurk). However, these studies were criticized for deviating from standard crowdsourcing practices and emphasizing individual workers' performances over the whole data-annotation process. This paper compared GPT-4 and an ethical and well-executed MTurk pipeline, with 415 workers labeling 3,177 sentence segments from 200 scholarly articles using the CODA-19 scheme. Two worker interfaces yielded 127,080 labels, which were then used to infer the final labels through eight label-aggregation algorithms. Our evaluation showed that despite best practices, MTurk pipeline's highest accuracy was 81.5%, whereas GPT-4 achieved 83.6%. Interestingly, when combining GPT-4's labels with crowd labels collected via an advanced worker interface for aggregation, 2 out of the 8 algorithms achieved an even higher accuracy (87.5%, 87.0%). Further analysis suggested that, when the crowd's and GPT-4's labeling strengths are complementary, aggregating them could increase labeling accuracy. △ Less

Submitted 28 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Accepted By CHI 2024

arXiv:2311.16521 [pdf, other]

Inspo: Writing with Crowds Alongside AI

Authors: Chieh-Yang Huang, Sanjana Gautam, Shannon McClellan Brooks, Ya-Fang Lin, Tiffany Knearem, Ting-Hao 'Kenneth' Huang

Abstract: The use of artificial intelligence (AI) to support creative writing has bloomed in recent years. However, it is less well understood how AI compares to on-demand human support. We explored how writers interact with both AI and crowd worker writing assistants in creative writing. We replicated the interface of the prior crowd-writing system, Heteroglossia, and developed Inspo, a text editor allowin… ▽ More The use of artificial intelligence (AI) to support creative writing has bloomed in recent years. However, it is less well understood how AI compares to on-demand human support. We explored how writers interact with both AI and crowd worker writing assistants in creative writing. We replicated the interface of the prior crowd-writing system, Heteroglossia, and developed Inspo, a text editor allowing users to request suggestions from AI models and crowd workers. In a one-week deployment study involving eight creative writers, we examined how often participants selected crowd workers when fluent AI text generators were also available. Findings showed a consistent decline in crowd worker usage, with participants favoring AI due to its faster responses and more consistent quality. We conclude with suggestions for future systems, recommending designs that account for the unique strengths and weaknesses of human versus AI assistants, strategies to address automation bias, and sociocultural views of writing. △ Less

Submitted 19 October, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.15405 [pdf, other]

GPT-4 as an Effective Zero-Shot Evaluator for Scientific Figure Captions

Authors: Ting-Yao Hsu, Chieh-Yang Huang, Ryan Rossi, Sungchul Kim, C. Lee Giles, Ting-Hao K. Huang

Abstract: There is growing interest in systems that generate captions for scientific figures. However, assessing these systems output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method fo… ▽ More There is growing interest in systems that generate captions for scientific figures. However, assessing these systems output poses a significant challenge. Human evaluation requires academic expertise and is costly, while automatic evaluation depends on often low-quality author-written captions. This paper investigates using large language models (LLMs) as a cost-effective, reference-free method for evaluating figure captions. We first constructed SCICAP-EVAL, a human evaluation dataset that contains human judgments for 3,600 scientific figure captions, both original and machine-made, for 600 arXiv figures. We then prompted LLMs like GPT-4 and GPT-3 to score (1-6) each caption based on its potential to aid reader understanding, given relevant context such as figure-mentioning paragraphs. Results show that GPT-4, used as a zero-shot evaluator, outperformed all other models and even surpassed assessments made by Computer Science and Informatics undergraduates, achieving a Kendall correlation score of 0.401 with Ph.D. students rankings △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: To Appear in EMNLP 2023 Findings

arXiv:2310.15129 [pdf, other]

Location-Aware Visual Question Generation with Lightweight Models

Authors: Nicholas Collin Suwono, Justin Chih-Yao Chen, Tun Min Hung, Ting-Hao Kenneth Huang, I-Bin Liao, Yung-Hui Li, Lun-Wei Ku, Shao-Hua Sun

Abstract: This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and s… ▽ More This work introduces a novel task, location-aware visual question generation (LocaVQG), which aims to generate engaging questions from data relevant to a particular geographical location. Specifically, we represent such location-aware information with surrounding images and a GPS coordinate. To tackle this task, we present a dataset generation pipeline that leverages GPT-4 to produce diverse and sophisticated questions. Then, we aim to learn a lightweight model that can address the LocaVQG task and fit on an edge device, such as a mobile phone. To this end, we propose a method which can reliably generate engaging questions from location-aware information. Our proposed method outperforms baselines regarding human evaluation (e.g., engagement, grounding, coherence) and automatic evaluation metrics (e.g., BERTScore, ROUGE-2). Moreover, we conduct extensive ablation studies to justify our proposed techniques for both generating the dataset and solving the task. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2308.04346 [pdf, other]

Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles

Authors: Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao `Kenneth' Huang, Shomir Wilson

Abstract: We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to i… ▽ More We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader's perception of a country. These findings emphasize the critical role of public perception in shaping AI's impact on society and the need to correct biases in AI systems. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2306.04820 [pdf, other]

Good Data, Large Data, or No Data? Comparing Three Approaches in Developing Research Aspect Classifiers for Biomedical Papers

Authors: Shreya Chandrasekhar, Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

Abstract: The rapid growth of scientific publications, particularly during the COVID-19 pandemic, emphasizes the need for tools to help researchers efficiently comprehend the latest advancements. One essential part of understanding scientific literature is research aspect classification, which categorizes sentences in abstracts to Background, Purpose, Method, and Finding. In this study, we investigate the i… ▽ More The rapid growth of scientific publications, particularly during the COVID-19 pandemic, emphasizes the need for tools to help researchers efficiently comprehend the latest advancements. One essential part of understanding scientific literature is research aspect classification, which categorizes sentences in abstracts to Background, Purpose, Method, and Finding. In this study, we investigate the impact of different datasets on model performance for the crowd-annotated CODA-19 research aspect classification task. Specifically, we explore the potential benefits of using the large, automatically curated PubMed 200K RCT dataset and evaluate the effectiveness of large language models (LLMs), such as LLaMA, GPT-3, ChatGPT, and GPT-4. Our results indicate that using the PubMed 200K RCT dataset does not improve performance for the CODA-19 task. We also observe that while GPT-4 performs well, it does not outperform the SciBERT model fine-tuned on the CODA-19 dataset, emphasizing the importance of a dedicated and task-aligned datasets dataset for the target task. Our code is available at https://github.com/Crowd-AI-Lab/CODA-19-exp. △ Less

Submitted 7 June, 2023; originally announced June 2023.

Comments: BioNLP workshop 2023

arXiv:2305.09770 [pdf, other]

ConvXAI: Delivering Heterogeneous AI Explanations via Conversations to Support Human-AI Scientific Writing

Authors: Hua Shen, Chieh-Yang Huang, Tongshuang Wu, Ting-Hao 'Kenneth' Huang

Abstract: Despite a surge collection of XAI methods, users still struggle to obtain required AI explanations. Previous research suggests chatbots as dynamic solutions, but the effective design of conversational XAI agents for practical human needs remains under-explored. This paper focuses on Conversational XAI for AI-assisted scientific writing tasks. Drawing from human linguistic theories and formative st… ▽ More Despite a surge collection of XAI methods, users still struggle to obtain required AI explanations. Previous research suggests chatbots as dynamic solutions, but the effective design of conversational XAI agents for practical human needs remains under-explored. This paper focuses on Conversational XAI for AI-assisted scientific writing tasks. Drawing from human linguistic theories and formative studies, we identify four design rationales: "multifaceted", "controllability", "mix-initiative", "context-aware drill-down". We incorporate them into an interactive prototype, ConvXAI, which facilitates heterogeneous AI explanations for scientific writing through dialogue. In two studies with 21 users, ConvXAI outperforms a GUI-based baseline on improving human-perceived understanding and writing improvement. The paper further discusses the practical human usage patterns in interacting with ConvXAI for scientific co-writing. △ Less

Submitted 27 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: CSCW 2023 Demo. ConvXAI system code: https://github.com/huashen218/convxai.git

arXiv:2304.01002 [pdf, other]

Does Human Collaboration Enhance the Accuracy of Identifying LLM-Generated Deepfake Texts?

Authors: Adaku Uchendu, Jooyoung Lee, Hua Shen, Thai Le, Ting-Hao 'Kenneth' Huang, Dongwon Lee

Abstract: Advances in Large Language Models (e.g., GPT-4, LLaMA) have improved the generation of coherent sentences resembling human writing on a large scale, resulting in the creation of so-called deepfake texts. However, this progress poses security and privacy concerns, necessitating effective solutions for distinguishing deepfake texts from human-written ones. Although prior works studied humans' abilit… ▽ More Advances in Large Language Models (e.g., GPT-4, LLaMA) have improved the generation of coherent sentences resembling human writing on a large scale, resulting in the creation of so-called deepfake texts. However, this progress poses security and privacy concerns, necessitating effective solutions for distinguishing deepfake texts from human-written ones. Although prior works studied humans' ability to detect deepfake texts, none has examined whether "collaboration" among humans improves the detection of deepfake texts. In this study, to address this gap of understanding on deepfake texts, we conducted experiments with two groups: (1) nonexpert individuals from the AMT platform and (2) writing experts from the Upwork platform. The results demonstrate that collaboration among humans can potentially improve the detection of deepfake texts for both groups, increasing detection accuracies by 6.36% for non-experts and 12.76% for experts, respectively, compared to individuals' detection accuracies. We further analyze the explanations that humans used for detecting a piece of text as deepfake text, and find that the strongest indicator of deepfake texts is their lack of coherence and consistency. Our study provides useful insights for future tools and framework designs to facilitate the collaborative human detection of deepfake texts. The experiment datasets and AMT implementations are available at: https://github.com/huashen218/llm-deepfake-human-study.git △ Less

Submitted 9 October, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: Accepted at The 11th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2023)

arXiv:2303.17710 [pdf, other]

What Types of Questions Require Conversation to Answer? A Case Study of AskReddit Questions

Authors: Shih-Hong Huang, Chieh-Yang Huang, Ya-Fang Lin, Ting-Hao 'Kenneth' Huang

Abstract: The proliferation of automated conversational systems such as chatbots, spoken-dialogue systems, and smart speakers, has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-defined questions rather than to support users in exploring complex, ill-defined questions. In this paper, we aim to push the boundaries of conversational systems… ▽ More The proliferation of automated conversational systems such as chatbots, spoken-dialogue systems, and smart speakers, has significantly impacted modern digital life. However, these systems are primarily designed to provide answers to well-defined questions rather than to support users in exploring complex, ill-defined questions. In this paper, we aim to push the boundaries of conversational systems by examining the types of nebulous, open-ended questions that can best be answered through conversation. We first sampled 500 questions from one million open-ended requests posted on AskReddit, and then recruited online crowd workers to answer eight inquiries about these questions. We also performed open coding to categorize the questions into 27 different domains. We found that the issues people believe require conversation to resolve satisfactorily are highly social and personal. Our work provides insights into how future research could be geared to align with users' needs. △ Less

Submitted 3 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: To appear in CHI 2023 Late-Breaking Work

arXiv:2302.12324 [pdf, other]

Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization

Authors: Chieh-Yang Huang, Ting-Yao Hsu, Ryan Rossi, Ani Nenkova, Sungchul Kim, Gromit Yeuk-Yin Chan, Eunyee Koh, Clyde Lee Giles, Ting-Hao 'Kenneth' Huang

Abstract: Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be… ▽ More Good figure captions help paper readers understand complex scientific figures. Unfortunately, even published papers often have poorly written captions. Automatic caption generation could aid paper writers by providing good starting captions that can be refined for better quality. Prior work often treated figure caption generation as a vision-to-language task. In this paper, we show that it can be more effectively tackled as a text summarization task in scientific documents. We fine-tuned PEGASUS, a pre-trained abstractive summarization model, to specifically summarize figure-referencing paragraphs (e.g., "Figure 3 shows...") into figure captions. Experiments on large-scale arXiv figures show that our method outperforms prior vision methods in both automatic and human evaluations. We further conducted an in-depth investigation focused on two key challenges: (i) the common presence of low-quality author-written captions and (ii) the lack of clear standards for good captions. Our code and data are available at: https://github.com/Crowd-AI-Lab/Generating-Figure-Captions-as-a-Text-Summarization-Task. △ Less

Submitted 11 August, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted by INLG-2023

arXiv:2302.09122 [pdf, other]

Conveying the Predicted Future to Users: A Case Study of Story Plot Prediction

Authors: Chieh-Yang Huang, Saniya Naphade, Kavya Laalasa Karanam, Ting-Hao 'Kenneth' Huang

Abstract: Creative writing is hard: Novelists struggle with writer's block daily. While automatic story generation has advanced recently, it is treated as a "toy task" for advancing artificial intelligence rather than helping people. In this paper, we create a system that produces a short description that narrates a predicted plot using existing story generation approaches. Our goal is to assist writers in… ▽ More Creative writing is hard: Novelists struggle with writer's block daily. While automatic story generation has advanced recently, it is treated as a "toy task" for advancing artificial intelligence rather than helping people. In this paper, we create a system that produces a short description that narrates a predicted plot using existing story generation approaches. Our goal is to assist writers in crafting a consistent and compelling story arc. We conducted experiments on Amazon Mechanical Turk (AMT) to examine the quality of the generated story plots in terms of consistency and storiability. The results show that short descriptions produced by our frame-enhanced GPT-2 (FGPT-2) were rated as the most consistent and storiable among all models; FGPT-2's outputs even beat some random story snippets written by humans. Next, we conducted a preliminary user study using a story continuation task where AMT workers were given access to machine-generated story plots and asked to write a follow-up story. FGPT-2 could positively affect the writing process, though people favor other baselines more. Our study shed some light on the possibilities of future creative writing support systems beyond the scope of completing sentences. Our code is available at: https://github.com/appleternity/Story-Plot-Generation. △ Less

Submitted 17 February, 2023; originally announced February 2023.

Comments: To appear in the AAAI 2023 Workshop- Creative AI Across Modalities

arXiv:2302.02463 [pdf, other]

Nationality Bias in Text Generation

Authors: Pranav Narayanan Venkit, Sanjana Gautam, Ruchi Panchanadikar, Ting-Hao 'Kenneth' Huang, Shomir Wilson

Abstract: Little attention is placed on analyzing nationality bias in language models, especially when nationality is highly used as a factor in increasing the performance of social NLP models. This paper examines how a text generation model, GPT-2, accentuates pre-existing societal biases about country-based demonyms. We generate stories using GPT-2 for various nationalities and use sensitivity analysis to… ▽ More Little attention is placed on analyzing nationality bias in language models, especially when nationality is highly used as a factor in increasing the performance of social NLP models. This paper examines how a text generation model, GPT-2, accentuates pre-existing societal biases about country-based demonyms. We generate stories using GPT-2 for various nationalities and use sensitivity analysis to explore how the number of internet users and the country's economic status impacts the sentiment of the stories. To reduce the propagation of biases through large language models (LLM), we explore the debiasing method of adversarial triggering. Our results show that GPT-2 demonstrates significant bias against countries with lower internet users, and adversarial triggering effectively reduces the same. △ Less

Submitted 14 February, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Paper accepted in the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL2023)

arXiv:2212.03969 [pdf, other]

Too Slow to Be Useful? On Incorporating Humans in the Loop of Smart Speakers

Authors: Shih-Hong Huang, Chieh-Yang Huang, Yuxin Deng, Hua Shen, Szu-Chi Kuan, Ting-Hao 'Kenneth' Huang

Abstract: Real-time crowd-powered systems, such as Chorus/Evorus, VizWiz, and Apparition, have shown how incorporating humans into automated systems could supplement where the automatic solutions fall short. However, one unspoken bottleneck of applying such architectures to more scenarios is the longer latency of including humans in the loop of automated systems. For the applications that have hard constrai… ▽ More Real-time crowd-powered systems, such as Chorus/Evorus, VizWiz, and Apparition, have shown how incorporating humans into automated systems could supplement where the automatic solutions fall short. However, one unspoken bottleneck of applying such architectures to more scenarios is the longer latency of including humans in the loop of automated systems. For the applications that have hard constraints in turnaround times, human-operated components' longer latency and large speed variation seem to be apparent deal breakers. This paper explicates and quantifies these limitations by using a human-powered text-based backend to hold conversations with users through a voice-only smart speaker. Smart speakers must respond to users' requests within seconds, so the workers behind the scenes only have a few seconds to compose answers. We measured the end-to-end system latency and the conversation quality with eight pairs of participants, showing the challenges and superiority of such systems. △ Less

Submitted 7 December, 2022; originally announced December 2022.

Comments: This document is the extended technical report of the "Too Slow to Be Useful? On Incorporating Humans in the Loop of Smart Speakers" paper by the authors. The paper was accepted by the Works-in-Progress and Demonstration track of the 10th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2022 WiP/Demo) https://youtu.be/iMDsX52VWGY

arXiv:2211.07441 [pdf, other]

Multi-VQG: Generating Engaging Questions for Multiple Images

Authors: Min-Hsuan Yeh, Vicent Chen, Ting-Hao 'Kenneth' Haung, Lun-Wei Ku

Abstract: Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for questio… ▽ More Generating engaging content has drawn much recent attention in the NLP community. Asking questions is a natural way to respond to photos and promote awareness. However, most answers to questions in traditional question-answering (QA) datasets are factoids, which reduce individuals' willingness to answer. Furthermore, traditional visual question generation (VQG) confines the source data for question generation to single images, resulting in a limited ability to comprehend time-series information of the underlying event. In this paper, we propose generating engaging questions from multiple images. We present MVQG, a new dataset, and establish a series of baselines, including both end-to-end and dual-stage architectures. Results show that building stories behind the image sequence enables models to generate engaging questions, which confirms our assumption that people typically construct a picture of the event in their minds before asking questions. These results open up an exciting challenge for visual-and-language models to implicitly construct a story behind a series of photos to allow for creativity and experience sharing and hence draw attention to downstream applications. △ Less

Submitted 17 November, 2022; v1 submitted 14 November, 2022; originally announced November 2022.

Comments: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

arXiv:2205.09327 [pdf, other]

Let's Talk! Striking Up Conversations via Conversational Visual Question Generation

Authors: Shih-Han Chan, Tsai-Lun Yang, Yun-Wei Chu, Chi-Yang Hsu, Ting-Hao Huang, Yu-Shian Chiu, Lun-Wei Ku

Abstract: An engaging and provocative question can open up a great conversation. In this work, we explore a novel scenario: a conversation agent views a set of the user's photos (for example, from social media platforms) and asks an engaging question to initiate a conversation with the user. The existing vision-to-question models mostly generate tedious and obvious questions, which might not be ideals conve… ▽ More An engaging and provocative question can open up a great conversation. In this work, we explore a novel scenario: a conversation agent views a set of the user's photos (for example, from social media platforms) and asks an engaging question to initiate a conversation with the user. The existing vision-to-question models mostly generate tedious and obvious questions, which might not be ideals conversation starters. This paper introduces a two-phase framework that first generates a visual story for the photo set and then uses the story to produce an interesting question. The human evaluation shows that our framework generates more response-provoking questions for starting conversations than other vision-to-question baselines. △ Less

Submitted 19 May, 2022; originally announced May 2022.

Comments: Accepted as a full talk paper on AAAI-DEEPDIAL'21

arXiv:2204.06382 [pdf, ps, other]

Empathy-Centric Design At Scale

Authors: Andrea Mauri, Yen-Chia Hsu, Marco Brambilla, Aisling Ann O'Kane, Ting-Hao 'Kenneth' Huang, Himanshu Verma

Abstract: EmpathiCH aims at bringing together and blend different expertise to develop new research agenda in the context of "Empathy-Centric Design at Scale". The main research question is to investigate how new technologies can contribute to the elicitation of empathy across and within multiple stakeholders at scale; and how empathy can be used to design solutions to societal problems that are not only ef… ▽ More EmpathiCH aims at bringing together and blend different expertise to develop new research agenda in the context of "Empathy-Centric Design at Scale". The main research question is to investigate how new technologies can contribute to the elicitation of empathy across and within multiple stakeholders at scale; and how empathy can be used to design solutions to societal problems that are not only effective but also balanced, inclusive, and aware of their effect on society. Through presentations, participatory sessions, and a living experiment -- where data about the peoples' interactions is collected throughout the event -- we aim to make this workshop the ideal venue to foster collaboration, build networks, and shape the future direction of "Empathy-Centric Design at Scale". △ Less

Submitted 13 April, 2022; originally announced April 2022.

Comments: accepted at Workshops at the 2022 CHI Conference on Human Factors in Computing Systems (CHI 2022)

arXiv:2203.08788 [pdf, other]

Are Shortest Rationales the Best Explanations for Human Understanding?

Authors: Hua Shen, Tongshuang Wu, Wenbo Guo, Ting-Hao 'Kenneth' Huang

Abstract: Existing self-explaining models typically favor extracting the shortest possible rationales - snippets of an input text "responsible for" corresponding output - to explain the model prediction, with the assumption that shorter rationales are more intuitive to humans. However, this assumption has yet to be validated. Is the shortest rationale indeed the most human-understandable? To answer this que… ▽ More Existing self-explaining models typically favor extracting the shortest possible rationales - snippets of an input text "responsible for" corresponding output - to explain the model prediction, with the assumption that shorter rationales are more intuitive to humans. However, this assumption has yet to be validated. Is the shortest rationale indeed the most human-understandable? To answer this question, we design a self-explaining model, LimitedInk, which allows users to extract rationales at any target length. Compared to existing baselines, LimitedInk achieves compatible end-task performance and human-annotated rationale agreement, making it a suitable representation of the recent class of self-explaining models. We use LimitedInk to conduct a user study on the impact of rationale length, where we ask human judges to predict the sentiment label of documents based only on LimitedInk-generated rationales with different lengths. We show rationales that are too short do not help humans predict labels better than randomly masked text, suggesting the need for more careful design of the best human rationales. △ Less

Submitted 16 March, 2022; originally announced March 2022.

Comments: To appear in ACL 2022 main conference

arXiv:2110.11624 [pdf, other]

SciCap: Generating Captions for Scientific Figures

Authors: Ting-Yao Hsu, C. Lee Giles, Ting-Hao 'Kenneth' Huang

Abstract: Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may decrease understanding. In this paper, we propose an end-to-end neural framework to automatically generate informative, high-quality captions for scientif… ▽ More Researchers use figures to communicate rich, complex information in scientific papers. The captions of these figures are critical to conveying effective messages. However, low-quality figure captions commonly occur in scientific articles and may decrease understanding. In this paper, we propose an end-to-end neural framework to automatically generate informative, high-quality captions for scientific figures. To this end, we introduce SCICAP, a large-scale figure-caption dataset based on computer science arXiv papers published between 2010 and 2020. After pre-processing - including figure-type classification, sub-figure identification, text normalization, and caption text selection - SCICAP contained more than two million figures extracted from over 290,000 papers. We then established baseline models that caption graph plots, the dominant (19.2%) figure type. The experimental results showed both opportunities and steep challenges of generating captions for scientific figures. △ Less

Submitted 25 October, 2021; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: To Appear in EMNLP 2021 Findings. The dataset is available at: https://github.com/tingyaohsu/SciCap

arXiv:2110.02007 [pdf, other]

doi 10.1016/j.patter.2022.100449

Empowering Local Communities Using Artificial Intelligence

Authors: Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang, Himanshu Verma, Andrea Mauri, Illah Nourbakhsh, Alessandro Bozzon

Abstract: Artificial Intelligence (AI) is increasingly used to analyze large amounts of data in various practices, such as object recognition. We are specifically interested in using AI-powered systems to engage local communities in developing plans or solutions for pressing societal and environmental concerns. Such local contexts often involve multiple stakeholders with different and even contradictory age… ▽ More Artificial Intelligence (AI) is increasingly used to analyze large amounts of data in various practices, such as object recognition. We are specifically interested in using AI-powered systems to engage local communities in developing plans or solutions for pressing societal and environmental concerns. Such local contexts often involve multiple stakeholders with different and even contradictory agendas, resulting in mismatched expectations of these systems' behaviors and desired outcomes. There is a need to investigate if AI models and pipelines can work as expected in different contexts through co-creation and field deployment. Based on case studies in co-creating AI-powered systems with local people, we explain challenges that require more attention and provide viable paths to bridge AI research with citizen needs. We advocate for developing new collaboration approaches and mindsets that are needed to co-create AI-powered systems in multi-stakeholder contexts to address local concerns. △ Less

Submitted 26 April, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: This manuscript is peer-reviewed and accepted by the Patterns journal

arXiv:2109.00122 [pdf, other]

FinQA: A Dataset of Numerical Reasoning over Financial Data

Authors: Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, William Yang Wang

Abstract: The sheer volume of financial statements makes it difficult for humans to access and analyze a business's financials. Robust numerical reasoning likewise faces unique challenges in this domain. In this work, we focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. In contrast to existing tasks on general domain, the finance… ▽ More The sheer volume of financial statements makes it difficult for humans to access and analyze a business's financials. Robust numerical reasoning likewise faces unique challenges in this domain. In this work, we focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. In contrast to existing tasks on general domain, the finance domain includes complex numerical reasoning and understanding of heterogeneous representations. To facilitate analytical progress, we propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. We also annotate the gold reasoning programs to ensure full explainability. We further introduce baselines and conduct comprehensive experiments in our dataset. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge and in complex multi-step numerical reasoning on that knowledge. Our dataset -- the first of its kind -- should therefore enable significant, new community research into complex application domains. The dataset and code are publicly available\url{https://github.com/czyssrs/FinQA}. △ Less

Submitted 7 May, 2022; v1 submitted 31 August, 2021; originally announced September 2021.

Comments: EMNLP 2021

arXiv:2106.12027 [pdf, other]

ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Authors: Yanjun Gao, Ting-hao Huang, Rebecca J. Passonneau

Abstract: Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence i… ▽ More Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation as a graph edit task. Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies. The full processing pipeline includes modules for graph construction, graph editing, and sentence generation from the output graph. We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition, and MinWiki, a subset of MinWikiSplit. ABCD achieves comparable performance as two parsing baselines on MinWiki. On DeSSE, which has a more even balance of complex sentence types, our model achieves higher accuracy on the number of atomic sentences than an encoder-decoder baseline. Results include a detailed error analysis. △ Less

Submitted 22 June, 2021; originally announced June 2021.

Comments: To appear in the proceeding of 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021) Main Conference

arXiv:2105.06950 [pdf, other]

Plot and Rework: Modeling Storylines for Visual Storytelling

Authors: Chi-Yang Hsu, Yun-Wei Chu, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Abstract: Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST,… ▽ More Writing a coherent and engaging story is not easy. Creative writers use their knowledge and worldview to put disjointed elements together to form a coherent storyline, and work and rework iteratively toward perfection. Automated visual storytelling (VIST) models, however, make poor use of external knowledge and iterative generation when attempting to create stories. This paper introduces PR-VIST, a framework that represents the input image sequence as a story graph in which it finds the best path to form a storyline. PR-VIST then takes this path and learns to generate the final story via an iterative training process. This framework produces stories that are superior in terms of diversity, coherence, and humanness, per both automatic and human evaluations. An ablation study shows that both plotting and reworking contribute to the model's superiority. △ Less

Submitted 7 July, 2021; v1 submitted 14 May, 2021; originally announced May 2021.

Comments: 9 pages, ACL-IJCNLP 2021 Findings

arXiv:2104.05604 [pdf, other]

Semantic Frame Forecast

Authors: Chieh-Yang Huang, Ting-Hao 'Kenneth' Huang

Abstract: This paper introduces semantic frame forecast, a task that predicts the semantic frames that will occur in the next 10, 100, or even 1,000 sentences in a running story. Prior work focused on predicting the immediate future of a story, such as one to a few sentences ahead. However, when novelists write long stories, generating a few sentences is not enough to help them gain high-level insight to de… ▽ More This paper introduces semantic frame forecast, a task that predicts the semantic frames that will occur in the next 10, 100, or even 1,000 sentences in a running story. Prior work focused on predicting the immediate future of a story, such as one to a few sentences ahead. However, when novelists write long stories, generating a few sentences is not enough to help them gain high-level insight to develop the follow-up story. In this paper, we formulate a long story as a sequence of "story blocks," where each block contains a fixed number of sentences (e.g., 10, 100, or 200). This formulation allows us to predict the follow-up story arc beyond the scope of a few sentences. We represent a story block using the term frequencies (TF) of semantic frames in it, normalized by each frame's inverse document frequency (IDF). We conduct semantic frame forecast experiments on 4,794 books from the Bookcorpus and 7,962 scientific abstracts from CODA-19, with block sizes ranging from 5 to 1,000 sentences. The results show that automated models can forecast the follow-up story blocks better than the random, prior, and replay baselines, indicating the task's feasibility. We also learn that the models using the frame representation as features outperform all the existing approaches when the block size is over 150 sentences. The human evaluation also shows that the proposed frame representation, when visualized as word clouds, is comprehensible, representative, and specific to humans. Our code is available at https://github.com/appleternity/FrameForecasting. △ Less

Submitted 12 April, 2021; originally announced April 2021.

Comments: 9 pages, NAACL 2021

arXiv:2103.14973 [pdf, other]

Explaining the Road Not Taken

Authors: Hua Shen, Ting-Hao 'Kenneth' Huang

Abstract: It is unclear if existing interpretations of deep neural network models respond effectively to the needs of users. This paper summarizes the common forms of explanations (such as feature attribution, decision rules, or probes) used in over 200 recent papers about natural language processing (NLP), and compares them against user questions collected in the XAI Question Bank. We found that although u… ▽ More It is unclear if existing interpretations of deep neural network models respond effectively to the needs of users. This paper summarizes the common forms of explanations (such as feature attribution, decision rules, or probes) used in over 200 recent papers about natural language processing (NLP), and compares them against user questions collected in the XAI Question Bank. We found that although users are interested in explanations for the road not taken -- namely, why the model chose one result and not a well-defined, seemly similar legitimate counterpart -- most model interpretations cannot answer these questions. △ Less

Submitted 30 March, 2021; v1 submitted 27 March, 2021; originally announced March 2021.

Comments: Accepted by The 2021 ACM CHI Workshop on Operationalizing Human-Centered Perspectives in Explainable AI (CHI 2021 HCXAI Workshop). For associated website, see https://human-centered-exnlp.github.io

arXiv:2010.02179 [pdf, other]

Assessing the Helpfulness of Learning Materials with Inference-Based Learner-Like Agent

Authors: Yun-Hsuan Jen, Chieh-Yang Huang, Mei-Hua Chen, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Abstract: Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs.little; briefly vs.shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice… ▽ More Many English-as-a-second language learners have trouble using near-synonym words (e.g., small vs.little; briefly vs.shortly) correctly, and often look for example sentences to learn how two nearly synonymous terms differ. Prior work uses hand-crafted scores to recommend sentences but has difficulty in adopting such scores to all the near-synonyms as near-synonyms differ in various ways. We notice that the helpfulness of the learning material would reflect on the learners' performance. Thus, we propose the inference-based learner-like agent to mimic learner behavior and identify good learning materials by examining the agent's performance. To enable the agent to behave like a learner, we leverage entailment modeling's capability of inferring answers from the provided materials. Experimental results show that the proposed agent is equipped with good learner-like behavior to achieve the best performance in both fill-in-the-blank (FITB) and good example sentence selection tasks. We further conduct a classroom user study with college ESL learners. The results of the user study show that the proposed agent can find out example sentences that help students learn more easily and efficiently. Compared to other models, the proposed agent improves the score of more than 17% of students after learning. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: 9 pages, to appear in EMNLP 2020 as a long paper

arXiv:2008.11721 [pdf, other]

How Useful Are the Machine-Generated Interpretations to General Users? A Human Evaluation on Guessing the Incorrectly Predicted Labels

Authors: Hua Shen, Ting-Hao Kenneth Huang

Abstract: Explaining to users why automated systems make certain mistakes is important and challenging. Researchers have proposed ways to automatically produce interpretations for deep neural network models. However, it is unclear how useful these interpretations are in helping users figure out why they are getting an error. If an interpretation effectively explains to users how the underlying deep neural n… ▽ More Explaining to users why automated systems make certain mistakes is important and challenging. Researchers have proposed ways to automatically produce interpretations for deep neural network models. However, it is unclear how useful these interpretations are in helping users figure out why they are getting an error. If an interpretation effectively explains to users how the underlying deep neural network model works, people who were presented with the interpretation should be better at predicting the model's outputs than those who were not. This paper presents an investigation on whether or not showing machine-generated visual interpretations helps users understand the incorrectly predicted labels produced by image classifiers. We showed the images and the correct labels to 150 online crowd workers and asked them to select the incorrectly predicted labels with or without showing them the machine-generated visual interpretations. The results demonstrated that displaying the visual interpretations did not increase, but rather decreased, the average guessing accuracy by roughly 10%. △ Less

Submitted 27 August, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

Comments: Accepted by The 8th AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2020) https://github.com/huashen218/GuessWrongLabel

arXiv:2005.06111 [pdf, other]

Project RISE: Recognizing Industrial Smoke Emissions

Authors: Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang, Ting-Yao Hu, Paul Dille, Sean Prendi, Ryan Hoffman, Anastasia Tsuhlares, Jessica Pachuta, Randy Sargent, Illah Nourbakhsh

Abstract: Industrial smoke emissions pose a significant concern to human health. Prior works have shown that using Computer Vision (CV) techniques to identify smoke as visual evidence can influence the attitude of regulators and empower citizens to pursue environmental justice. However, existing datasets are not of sufficient quality nor quantity to train the robust CV models needed to support air quality a… ▽ More Industrial smoke emissions pose a significant concern to human health. Prior works have shown that using Computer Vision (CV) techniques to identify smoke as visual evidence can influence the attitude of regulators and empower citizens to pursue environmental justice. However, existing datasets are not of sufficient quality nor quantity to train the robust CV models needed to support air quality advocacy. We introduce RISE, the first large-scale video dataset for Recognizing Industrial Smoke Emissions. We adopted a citizen science approach to collaborate with local community members to annotate whether a video clip has smoke emissions. Our dataset contains 12,567 clips from 19 distinct views from cameras that monitored three industrial facilities. These daytime clips span 30 days over two years, including all four seasons. We ran experiments using deep neural networks to establish a strong performance baseline and reveal smoke recognition challenges. Our survey study discussed community feedback, and our data analysis displayed opportunities for integrating citizen scientists and crowd workers into the application of Artificial Intelligence for Social Impact. △ Less

Submitted 29 April, 2024; v1 submitted 12 May, 2020; originally announced May 2020.

Comments: Accepted by AAAI 2021

arXiv:2005.02367 [pdf, other]

CODA-19: Using a Non-Expert Crowd to Annotate Research Aspects on 10,000+ Abstracts in the COVID-19 Open Research Dataset

Authors: Ting-Hao 'Kenneth' Huang, Chieh-Yang Huang, Chien-Kuang Cornelia Ding, Yen-Chia Hsu, C. Lee Giles

Abstract: This paper introduces CODA-19, a human-annotated dataset that codes the Background, Purpose, Method, Finding/Contribution, and Other sections of 10,966 English abstracts in the COVID-19 Open Research Dataset. CODA-19 was created by 248 crowd workers from Amazon Mechanical Turk within 10 days, and achieved labeling quality comparable to that of experts. Each abstract was annotated by nine different… ▽ More This paper introduces CODA-19, a human-annotated dataset that codes the Background, Purpose, Method, Finding/Contribution, and Other sections of 10,966 English abstracts in the COVID-19 Open Research Dataset. CODA-19 was created by 248 crowd workers from Amazon Mechanical Turk within 10 days, and achieved labeling quality comparable to that of experts. Each abstract was annotated by nine different workers, and the final labels were acquired by majority vote. The inter-annotator agreement (Cohen's kappa) between the crowd and the biomedical expert (0.741) is comparable to inter-expert agreement (0.788). CODA-19's labels have an accuracy of 82.2% when compared to the biomedical expert's labels, while the accuracy between experts was 85.0%. Reliable human annotations help scientists access and integrate the rapidly accelerating coronavirus literature, and also serve as the battery of AI/NLP research, but obtaining expert annotations can be slow. We demonstrated that a non-expert crowd can be rapidly employed at scale to join the fight against COVID-19. △ Less

Submitted 17 September, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

Comments: Accepted by the NLP COVID-19 Workshop at ACL 2020. (The data, code, and model are available at: https://github.com/windx0303/CODA-19)

arXiv:2001.04519 [pdf, other]

doi 10.1145/3313831.3376715

Heteroglossia: In-Situ Story Ideation with the Crowd

Authors: Chieh-Yang Huang, Shih-Hong Huang, Ting-Hao 'Kenneth' Huang

Abstract: Ideation is essential for creative writing. Many authors struggle to come up with ideas throughout the writing process, yet modern writing tools fail to provide on-the-spot assistance for writers when they get stuck. This paper introduces Heteroglossia, an add-on for Google Docs that allows writers to elicit story ideas from the online crowd using their text editors. Writers can share snippets of… ▽ More Ideation is essential for creative writing. Many authors struggle to come up with ideas throughout the writing process, yet modern writing tools fail to provide on-the-spot assistance for writers when they get stuck. This paper introduces Heteroglossia, an add-on for Google Docs that allows writers to elicit story ideas from the online crowd using their text editors. Writers can share snippets of their working drafts and ask the crowd to provide follow-up story ideas based on it. Heteroglossia employs a strategy called "role play", where each worker is assigned a fictional character in a story and asked to brainstorm plot ideas from that character's perspective. Our deployment with two experienced story writers shows that Heteroglossia is easy to use and can generate interesting ideas. Heteroglossia allows us to gain insight into how future technologies can be developed to support ideation in creative writing △ Less

Submitted 15 January, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

Comments: Accepted by CHI 2020. Video Promotion: https://www.youtube.com/watch?v=i0G-tq3d8c0

ACM Class: H.5; H.4; I.7

arXiv:1912.11936 [pdf, other]

Smell Pittsburgh: Engaging Community Citizen Science for Air Quality

Authors: Yen-Chia Hsu, Jennifer Cross, Paul Dille, Michael Tasota, Beatrice Dias, Randy Sargent, Ting-Hao 'Kenneth' Huang, Illah Nourbakhsh

Abstract: Urban air pollution has been linked to various human health concerns, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently conc… ▽ More Urban air pollution has been linked to various human health concerns, including cardiopulmonary diseases. Communities who suffer from poor air quality often rely on experts to identify pollution sources due to the lack of accessible tools. Taking this into account, we developed Smell Pittsburgh, a system that enables community members to report odors and track where these odors are frequently concentrated. All smell report data are publicly accessible online. These reports are also sent to the local health department and visualized on a map along with air quality data from monitoring stations. This visualization provides a comprehensive overview of the local pollution landscape. Additionally, with these reports and air quality data, we developed a model to predict upcoming smell events and send push notifications to inform communities. We also applied regression analysis to identify statistically significant effects of push notifications on user engagement. Our evaluation of this system demonstrates that engaging residents in documenting their experiences with pollution odors can help identify local air pollution patterns, and can empower communities to advocate for better air quality. All citizen-contributed smell data are publicly accessible and can be downloaded from https://smellpgh.org. △ Less

Submitted 20 November, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

Comments: Accepted by ACM Transactions on Interactive Intelligent Systems on 2020. This is an extended version of the arXiv:1810.11143, which was accepted by the ACM IUI 2019 conference. arXiv admin note: substantial text overlap with arXiv:1810.11143

arXiv:1912.01496 [pdf, other]

Knowledge-Enriched Visual Storytelling

Authors: Chao-Chun Hsu, Zi-Yuan Chen, Chi-Yang Hsu, Chih-Chia Li, Tzu-Yuan Lin, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Abstract: Stories are diverse and highly personalized, resulting in a large possible output space for story generation. Existing end-to-end approaches produce monotonous stories because they are limited to the vocabulary and knowledge in a single training dataset. This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to… ▽ More Stories are diverse and highly personalized, resulting in a large possible output space for story generation. Existing end-to-end approaches produce monotonous stories because they are limited to the vocabulary and knowledge in a single training dataset. This paper introduces KG-Story, a three-stage framework that allows the story generation model to take advantage of external Knowledge Graphs to produce interesting stories. KG-Story distills a set of representative words from the input prompts, enriches the word set by using external knowledge graphs, and finally generates stories based on the enriched word set. This distill-enrich-generate framework allows the use of external resources not only for the enrichment phase, but also for the distillation and generation phases. In this paper, we show the superiority of KG-Story for visual storytelling, where the input prompt is a sequence of five photos and the output is a short story. Per the human ranking evaluation, stories generated by KG-Story are on average ranked better than that of the state-of-the-art systems. Our code and output stories are available at https://github.com/zychen423/KE-VIST. △ Less

Submitted 3 December, 2019; originally announced December 2019.

Comments: AAAI 2020

arXiv:1910.09621 [pdf, other]

On Automating Conversations

Authors: Ting-Hao 'Kenneth' Huang

Abstract: From 2016 to 2018, we developed and deployed Chorus, a system that blends real-time human computation with artificial intelligence (AI) and has real-world, open conversations with users. We took a top-down approach that started with a working crowd-powered system, Chorus, and then created a framework, Evorus, that enables Chorus to automate itself over time. Over our two-year deployment, more than… ▽ More From 2016 to 2018, we developed and deployed Chorus, a system that blends real-time human computation with artificial intelligence (AI) and has real-world, open conversations with users. We took a top-down approach that started with a working crowd-powered system, Chorus, and then created a framework, Evorus, that enables Chorus to automate itself over time. Over our two-year deployment, more than 420 users talked with Chorus, having over 2,200 conversation sessions. This line of work demonstrated how a crowd-powered conversational assistant can be automated over time, and more importantly, how such a system can be deployed to talk with real users to help them with their everyday tasks. This position paper discusses two sets of challenges that we explored during the development and deployment of Chorus and Evorus: the challenges that come from being an "agent" and those that arise from the subset of conversations that are more difficult to automate. △ Less

Submitted 24 October, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: An invited position paper at the "Artificial Intelligence and Work: AAAI 2019 Fall Symposium" (AAAI-FSS 2019), Washington, DC, November 7-9, 2019

arXiv:1910.08814 [pdf, ps, other]

On Using Chatbots to Promote Smoking Cessation Among Adolescents of Low Socioeconomic Status

Authors: Patricia Simon, Suchitra Krishnan-Sarin, Ting-Hao 'Kenneth' Huang

Abstract: Reducing youth tobacco use is critical for improving child health since tobacco use is associated with respiratory problems, and nicotine may interfere with healthy brain development. While tobacco regulation has contributed to declines in cigarette use among youth, these declines have occurred more quickly for youth of high socioeconomic status (SES) compared to youth of low SES. A major barrier… ▽ More Reducing youth tobacco use is critical for improving child health since tobacco use is associated with respiratory problems, and nicotine may interfere with healthy brain development. While tobacco regulation has contributed to declines in cigarette use among youth, these declines have occurred more quickly for youth of high socioeconomic status (SES) compared to youth of low SES. A major barrier to smoking cessation for adolescents of low SES is coordination of access and transportation to in-person treatment sessions. Low-SES youth may have family obligations that limit their ability to access in-person treatment. At the same time, mobile use among adolescents is high: 85% have smartphones. Additionally, adolescents engage in texting at high rates, suggesting that they are well-suited for mobile instant messaging interventions. Mobile interventions have shown promise for youth, but their use remains low. Thus, more research is needed to develop effective and engaging mobile interventions to increase quit rates. In this paper, we provide a brief review of approaches to adolescent smoking cessation and describe the promise of chatbots for smoking cessation. △ Less

Submitted 19 October, 2019; originally announced October 2019.

Comments: Selected for round-table discussion in Artificial Intelligence and Work: AAAI 2019 Fall Symposium (AAAI FSS 2019)

arXiv:1909.05725 [pdf, other]

doi 10.15346/hc.v6i1.7

InstructableCrowd: Creating IF-THEN Rules for Smartphones via Conversations with the Crowd

Authors: Ting-Hao 'Kenneth' Huang, Amos Azaria, Oscar J. Romero, Jeffrey P. Bigham

Abstract: Natural language interfaces have become a common part of modern digital life. Chatbots utilize text-based conversations to communicate with users; personal assistants on smartphones such as Google Assistant take direct speech commands from their users; and speech-controlled devices such as Amazon Echo use voice as their only input mode. In this paper, we introduce InstructableCrowd, a crowd-powere… ▽ More Natural language interfaces have become a common part of modern digital life. Chatbots utilize text-based conversations to communicate with users; personal assistants on smartphones such as Google Assistant take direct speech commands from their users; and speech-controlled devices such as Amazon Echo use voice as their only input mode. In this paper, we introduce InstructableCrowd, a crowd-powered system that allows users to program their devices via conversation. The user verbally expresses a problem to the system, in which a group of crowd workers collectively respond and program relevant multi-part IF-THEN rules to help the user. The IF-THEN rules generated by InstructableCrowd connect relevant sensor combinations (e.g., location, weather, device acceleration, etc.) to useful effectors (e.g., text messages, device alarms, etc.). Our study showed that non-programmers can use the conversational interface of InstructableCrowd to create IF-THEN rules that have similar quality compared with the rules created manually. InstructableCrowd generally illustrates how users may converse with their devices, not only to trigger simple voice commands, but also to personalize their increasingly powerful and complicated devices. △ Less

Submitted 12 September, 2019; originally announced September 2019.

Comments: Published at Human Computation (2019) 6:1:113-146

Journal ref: Human Computation (2019) 6:1:113-146

arXiv:1906.01764 [pdf, other]

Visual Story Post-Editing

Authors: Ting-Yao Hsu, Chieh-Yang Huang, Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang

Abstract: We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. The dataset, VIST-Edit, includes 14,905 human edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions. We… ▽ More We introduce the first dataset for human edits of machine-generated visual stories and explore how these collected edits may be used for the visual story post-editing task. The dataset, VIST-Edit, includes 14,905 human edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, each aligned to 5 human-edited versions. We establish baselines for the task, showing how a relatively small set of human edits can be leveraged to boost the performance of large visual storytelling models. We also discuss the weak correlation between automatic evaluation scores and human ratings, motivating the need for new automatic metrics. △ Less

Submitted 4 June, 2019; originally announced June 2019.

Comments: Accepted by ACL 2019

arXiv:1903.02230 [pdf, other]

doi 10.1145/3308558.3314131

Dixit: Interactive Visual Storytelling via Term Manipulation

Authors: Chao-Chun Hsu, Yu-Hua Chen, Zi-Yuan Chen, Hsin-Yu Lin, Ting-Hao 'Kenneth' Huang, Lun-Wei Ku

Abstract: In this paper, we introduce Dixit, an interactive visual storytelling system that the user interacts with iteratively to compose a short story for a photo sequence. The user initiates the process by uploading a sequence of photos. Dixit first extracts text terms from each photo which describe the objects (e.g., boy, bike) or actions (e.g., sleep) in the photo, and then allows the user to add new t… ▽ More In this paper, we introduce Dixit, an interactive visual storytelling system that the user interacts with iteratively to compose a short story for a photo sequence. The user initiates the process by uploading a sequence of photos. Dixit first extracts text terms from each photo which describe the objects (e.g., boy, bike) or actions (e.g., sleep) in the photo, and then allows the user to add new terms or remove existing terms. Dixit then generates a short story based on these terms. Behind the scenes, Dixit uses an LSTM-based model trained on image caption data and FrameNet to distill terms from each image and utilizes a transformer decoder to compose a context-coherent story. Users change images or terms iteratively with Dixit to create the most ideal story. Dixit also allows users to manually edit and rate stories. The proposed procedure opens up possibilities for interpretable and controllable visual storytelling, allowing users to understand the story formation rationale and to intervene in the generation process. △ Less

Submitted 31 May, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: WWW'19 Demo, demo video: https://www.youtube.com/watch?v=CUu1MOwnveI

Showing 1–50 of 62 results for author: Ting-Hao