-
Curatr: A Platform for Semantic Analysis and Curation of Historical Literary Texts
Authors:
Susan Leavy,
Gerardine Meaney,
Karen Wade,
Derek Greene
Abstract:
The increasing availability of digital collections of historical and contemporary literature presents a wealth of possibilities for new research in the humanities. The scale and diversity of such collections, however, present particular challenges in identifying and extracting relevant content. This paper presents Curatr, an online platform for the exploration and curation of literature with machine learning-supported semantic search, designed within the context of digital humanities scholarship. The platform provides a text mining workflow that combines neural word embeddings with expert domain knowledge to enable the generation of thematic lexicons, allowing researchers to curate relevant sub-corpora from a large corpus of 18th- and 19th-century digitised texts.
Submitted 13 June, 2023;
originally announced June 2023.
-
Supporting Serendipity: Opportunities and Challenges for Human-AI Collaboration in Qualitative Analysis
Authors:
Jialun Aaron Jiang,
Kandrea Wade,
Casey Fiesler,
Jed R. Brubaker
Abstract:
Qualitative inductive methods are widely used in CSCW and HCI research for their ability to generatively discover deep and contextualized insights, but these inherently manual and human-resource-intensive processes are often infeasible for analyzing large corpora. Researchers have been increasingly interested in ways to apply qualitative methods to "big" data problems, hoping to achieve more generalizable results from larger amounts of data while preserving the depth and richness of qualitative methods. In this paper, we describe a study of qualitative researchers' work practices and their challenges, with an eye towards whether this is an appropriate domain for human-AI collaboration and what successful collaborations might entail. Our findings characterize participants' diverse methodological practices and nuanced collaboration dynamics, and identify areas where they might benefit from AI-based tools. While participants highlight the messiness and uncertainty of qualitative inductive analysis, they still want full agency over the process and believe that AI should not interfere. Our study provides a deep investigation of task delegability in human-AI collaboration in the context of qualitative analysis, and offers directions for the design of AI assistance that honor serendipity, human agency, and ambiguity.
Submitted 6 February, 2021;
originally announced February 2021.
-
Mitigating Gender Bias in Machine Learning Data Sets
Authors:
Susan Leavy,
Gerardine Meaney,
Karen Wade,
Derek Greene
Abstract:
Artificial Intelligence has the capacity to amplify and perpetuate societal biases and presents profound ethical implications for society. Gender bias has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact.
Submitted 18 May, 2020; v1 submitted 14 May, 2020;
originally announced May 2020.