Search | arXiv e-print repository

Situated Epistemic Infrastructures: A Diagnostic Framework for Post-Coherence Knowledge

Abstract: Large Language Models (LLMs) such as ChatGPT have rendered visible the fragility of contemporary knowledge infrastructures by simulating coherence while bypassing traditional modes of citation, authority, and validation. This paper introduces the Situated Epistemic Infrastructures (SEI) framework as a diagnostic tool for analyzing how knowledge becomes authoritative across hybrid human-machine systems under post-coherence conditions. Rather than relying on stable scholarly domains or bounded communities of practice, SEI traces how credibility is mediated across institutional, computational, and temporal arrangements. Integrating insights from infrastructure studies, platform theory, and epistemology, the framework foregrounds coordination over classification, emphasizing the need for anticipatory and adaptive models of epistemic stewardship. The paper contributes to debates on AI governance, knowledge production, and the ethical design of information systems by offering a robust alternative to representationalist models of scholarly communication.

Submitted 12 August, 2025; v1 submitted 6 August, 2025; originally announced August 2025.

Comments: 22 pages including references. Draft prepared for submission to Science, Technology & Human Values

ACM Class: K.4.1; K.3; K.2

arXiv:2506.05636 [pdf, other]

Bayesian Inference for Correlated Human Experts and Classifiers

Authors: Markelle Kelly, Alex Boyd, Sam Showalter, Mark Steyvers, Padhraic Smyth

Abstract: Applications of machine learning often involve making predictions based on both model outputs and the opinions of human experts. In this context, we investigate the problem of querying experts for class label predictions, using as few human queries as possible, and leveraging the class probability estimates of pre-trained classifiers. We develop a general Bayesian framework for this problem, modeling expert correlation via a joint latent representation, enabling simulation-based inference about the utility of additional expert queries, as well as inference of posterior distributions over unobserved expert labels. We apply our approach to two real-world medical classification problems, as well as to CIFAR-10H and ImageNet-16H, demonstrating substantial reductions relative to baselines in the cost of querying human experts while maintaining high prediction accuracy.

Submitted 5 June, 2025; originally announced June 2025.

Comments: accepted to ICML 2025
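To make the querying setup above concrete, here is a minimal Python sketch of a much simpler baseline than the paper's framework. It assumes experts are conditionally independent given the true class and that each expert's confusion matrix is known; that independence is exactly the assumption the paper relaxes by modeling expert correlation through a joint latent representation. The function names, the confusion-matrix format, and the fixed query order are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical format: confusion[j][y][e] = P(expert j answers e | true class y).
Confusion = Dict[int, List[List[float]]]


def posterior(clf_probs: Sequence[float], observed: Dict[int, int],
              confusion: Confusion) -> List[float]:
    """Posterior over classes given classifier probabilities and expert answers.

    Assumes experts are conditionally independent given the true class
    (a simplification; the paper models correlation between experts).
    """
    unnorm = []
    for y, prior in enumerate(clf_probs):
        p = prior
        for j, answer in observed.items():
            p *= confusion[j][y][answer]
        unnorm.append(p)
    total = sum(unnorm)
    return [u / total for u in unnorm]


def query_until_confident(clf_probs: Sequence[float], expert_answers: Sequence[int],
                          confusion: Confusion,
                          threshold: float = 0.95) -> Tuple[List[float], int]:
    """Reveal expert answers one at a time (fixed order) until max posterior >= threshold.

    Returns the final posterior and the number of experts queried.
    """
    observed: Dict[int, int] = {}
    post = posterior(clf_probs, observed, confusion)
    for j, answer in enumerate(expert_answers):
        if max(post) >= threshold:
            break  # confident enough: stop spending expert queries
        observed[j] = answer
        post = posterior(clf_probs, observed, confusion)
    return post, len(observed)


if __name__ == "__main__":
    # Toy example (not from the paper): two classes, two equally reliable experts.
    cm = [[0.9, 0.1], [0.1, 0.9]]
    post, n = query_until_confident([0.6, 0.4], [0, 0], {0: cm, 1: cm})
    print(n, [round(p, 3) for p in post])
```

In the toy example, the posterior on class 0 rises from 0.6 to about 0.93 after one expert agrees and to about 0.99 after two, so a threshold of 0.9 would stop after a single query. The paper's framework goes further by choosing which expert to query via simulation-based inference rather than a fixed order.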

arXiv:2506.05390 [pdf, other]

doi 10.1145/3715275.3732169

Understanding Gender Bias in AI-Generated Product Descriptions

Authors: Markelle Kelly, Mohammad Tahaei, Padhraic Smyth, Lauren Wilcox

Abstract: While gender bias in large language models (LLMs) has been extensively studied in many domains, uses of LLMs in e-commerce remain largely unexamined and may reveal novel forms of algorithmic bias and harm. Our work investigates this space, developing data-driven taxonomic categories of gender bias in the context of product description generation, which we situate with respect to existing general-purpose harms taxonomies. We illustrate how AI-generated product descriptions can uniquely surface gender biases in ways that require specialized detection and mitigation approaches. Further, we quantitatively analyze issues corresponding to our taxonomic categories in two models used for this task -- GPT-3.5 and an e-commerce-specific LLM -- demonstrating that these forms of bias commonly occur in practice. Our results illuminate unique, under-explored dimensions of gender bias, such as assumptions about clothing size, stereotypical bias in which features of a product are advertised, and differences in the use of persuasive language. These insights contribute to our understanding of three types of AI harms identified by current frameworks: exclusionary norms, stereotyping, and performance disparities, particularly in the context of e-commerce.

Submitted 3 June, 2025; originally announced June 2025.

Comments: Accepted to FAccT 2025

arXiv:2502.01525 [pdf, ps, other]

Archiving and Replaying Current Web Advertisements: Challenges and Opportunities

Authors: Travis Reid, Alex H. Poole, Hyung Wook Choi, Christopher Rauch, Mat Kelly, Michael L. Nelson, Michele C. Weigle

Abstract: Although web advertisements represent an inimitable part of digital cultural heritage, serious archiving and replay challenges persist. To explore these challenges, we created a dataset of 279 archived ads. We encountered five problems in archiving and replaying them. First, prior to August 2023, Internet Archive's Save Page Now service excluded not only well-known ad services' ads, but also URLs with ad-related file and directory names. After August 2023, Save Page Now still blocked the archiving of ads loaded on a web page, but it permitted the archiving of an ad's resources if the user directly archived the URL(s) associated with the ad. Second, Brozzler's incompatibility with Chrome prevented ads from being archived. Third, during crawling and replay sessions, Google's and Amazon's ad scripts generated URLs with different random values. This precluded replay of the archived ads. Updating replay systems' fuzzy matching approach should enable the replay of these ads. Fourth, when loading Flashtalking web page ads outside of ad iframes, the ad script requested a non-existent URL. This prevented the replay of ad resources. As with Google and Amazon ads, updating replay systems' fuzzy matching approach should enable Flashtalking ads' replay. Finally, successful replay of ads loaded in iframes with the src attribute of "about:blank" depended upon a given browser's service worker implementation. A Chromium bug stopped service workers from accessing resources inside this type of iframe, which in turn prevented replay. Replacing the "about:blank" value for the iframe's src attribute with a blob URL before an ad was loaded solved this problem. Resolving these replay problems will improve the replay of ads and other dynamically loaded embedded web resources that use random values or "about:blank" iframes.

Submitted 22 September, 2025; v1 submitted 3 February, 2025; originally announced February 2025.
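The fuzzy matching mentioned above addresses a simple failure: at replay time an ad script requests a URL whose random value differs from the one captured at crawl time, so an exact-match lookup misses. A minimal Python sketch of the idea normalizes both URLs before lookup by dropping query parameters assumed to be volatile. The parameter names in `VOLATILE_PARAMS`, and the functions `fuzzy_key` and `replay_lookup`, are hypothetical; this is not the matching rule set of any particular replay system, and it does not cover the Flashtalking or "about:blank" iframe cases.

```python
from typing import Dict, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical query parameters assumed to carry per-request random values
# (cache busters, correlators); real replay systems need per-ad-service rules.
VOLATILE_PARAMS = frozenset({"rnd", "cb", "ord"})


def fuzzy_key(url: str, volatile: frozenset = VOLATILE_PARAMS) -> str:
    """Normalize a URL by dropping volatile query parameters and sorting the rest."""
    parts = urlsplit(url)
    kept = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in volatile
    )
    # Lowercase the host, keep the path, rebuild the query, drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))


def replay_lookup(archive: Dict[str, bytes], requested_url: str) -> Optional[bytes]:
    """Return the archived body whose normalized URL matches the requested one, if any."""
    index = {fuzzy_key(u): body for u, body in archive.items()}
    return index.get(fuzzy_key(requested_url))


if __name__ == "__main__":
    archive = {"https://ads.example.com/ad.js?slot=top&rnd=123": b"ad payload"}
    # Different random value and parameter order at replay time still match.
    print(replay_lookup(archive, "https://ads.example.com/ad.js?rnd=987&slot=top"))
```

The trade-off is precision: every parameter added to the volatile set makes distinct resources more likely to collide, which is why such rules are usually scoped to specific hosts or scripts rather than applied globally.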

arXiv:2501.09951 [pdf, other]

Discord's Design Encourages "Third Place" Social Media Experiences

Authors: JaeWon Kim, Thea Klein-Balajee, Ryan M. Kelly, Alexis Hiniker

Abstract: In light of the diminishing presence of physical third places -- informal gathering spaces essential for social connection -- this study explores how the social media platform Discord fosters third-place experiences. Drawing on Oldenburg's conceptual framework, we analyze how Discord's design elements support the creation of virtual third places that foster both dyadic and community-based relationships. Through 25 semi-structured interviews with active Discord users, we identified 21 design elements aligned with Oldenburg's third-place characteristics. These elements cluster around four core principles: providing themed spaces for repeated interactions, supporting user autonomy and customization, facilitating mutually engaging activities, and enabling casual, low-pressure interactions. This work contributes to understanding how intentional platform design can cultivate virtual spaces that support meaningful social connections. The findings have implications for designing future social technologies that can help address growing concerns about social isolation in an increasingly digital world.

Submitted 16 January, 2025; originally announced January 2025.

arXiv:2410.24100 [pdf, other]

Benchmark Data Repositories for Better Benchmarking

Authors: Rachel Longjohn, Markelle Kelly, Sameer Singh, Padhraic Smyth

Abstract: In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for -- and levies criticisms at -- data and benchmarking practices in machine learning, comparatively less attention has been paid to the data repositories where these datasets are stored, documented, and shared. In this paper, we analyze the landscape of these benchmark data repositories and the role they can play in improving benchmarking. This role includes addressing issues with both datasets themselves (e.g., representational harms, construct validity) and the manner in which evaluation is carried out using such datasets (e.g., overemphasis on a few datasets and metrics, lack of reproducibility). To this end, we identify and discuss a set of considerations surrounding the design and use of benchmark data repositories, with a focus on improving benchmarking practices in machine learning.

Submitted 31 October, 2024; originally announced October 2024.

Comments: Accepted to NeurIPS Datasets and Benchmarks 2024

arXiv:2407.17579 [pdf, ps, other]

doi 10.1145/3678884.3681833

Envisioning New Futures of Positive Social Technology: Beyond Paradigms of Fixing, Protecting, and Preventing

Authors: JaeWon Kim, Lindsay Popowski, Anna Fang, Cassidy Pyle, Guo Freeman, Ryan M. Kelly, Angela Y. Lee, Fannie Liu, Angela D. R. Smith, Alexandra To, Amy X. Zhang

Abstract: Social technology research today largely focuses on mitigating the negative impacts of technology and, therefore, often misses the potential of technology to enhance human connections and well-being. However, we see an opportunity to shift towards a holistic view of social technology's impact on human flourishing. We introduce Positive Social Technology (Positech), a framework that shifts emphasis toward leveraging social technologies to support and augment human flourishing. This workshop is organized around three themes relevant to Positech: 1) "Exploring Relevant and Adjacent Research" to define and widen the Positech scope with insights from related fields, 2) "Projecting the Landscape of Positech" for participants to outline the domain's key aspects, and 3) "Envisioning the Future of Positech," anchored around strategic planning towards a sustainable research community. Ultimately, this workshop will serve as a platform to shift the narrative of social technology research towards a more positive, human-centric approach. It will foster research that goes beyond fixing technologies to protect humans from harm and also pursues enriching human experiences and connections through technology.

Submitted 14 October, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.15814 [pdf, other]

Perceptions of Linguistic Uncertainty by Language Models and Humans

Authors: Catarina G Belem, Markelle Kelly, Mark Steyvers, Sameer Singh, Padhraic Smyth

Abstract: _Uncertainty expressions_ such as "probably" or "highly unlikely" are pervasive in human language. While prior work has established that there is population-level agreement in terms of how humans quantitatively interpret these expressions, there has been little inquiry into the abilities of language models in the same context. In this paper, we investigate how language models map linguistic expressions of uncertainty to numerical responses. Our approach assesses whether language models can employ theory of mind in this setting: understanding the uncertainty of another agent about a particular statement, independently of the model's own certainty about that statement. We find that 7 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner. However, we observe systematically different behavior depending on whether a statement is actually true or false. This sensitivity indicates that language models are substantially more susceptible to bias based on their prior knowledge (as compared to humans). These findings raise important questions and have broad implications for human-AI and AI-AI communication.

Submitted 7 November, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

Comments: Accepted at EMNLP 2024 (Main)

arXiv:2406.01076 [pdf, other]

Estimating Canopy Height at Scale

Authors: Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke

Abstract: We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, uses a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE/RMSE of 2.43/4.73 m overall and 4.45/6.72 m for trees taller than five meters, a substantial improvement over existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.

Submitted 3 June, 2024; originally announced June 2024.

Comments: ICML Camera-Ready, 17 pages, 14 figures, 7 tables
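The abstract above reports error as MAE/RMSE, both overall and on a tall-tree subset. A minimal Python sketch of those two metrics follows; the function name `mae_rmse` and the choice to define the subset by ground-truth height (rather than predicted height) are assumptions, since the abstract does not specify how the subset is selected. The toy numbers are illustrative and not from the paper.

```python
import math
from typing import Optional, Sequence, Tuple


def mae_rmse(pred: Sequence[float], truth: Sequence[float],
             min_height: Optional[float] = None) -> Tuple[float, float]:
    """Mean absolute error and root-mean-square error, in the units of the inputs.

    If min_height is given, only samples whose ground-truth height exceeds it are
    scored (an assumed reading of the "trees taller than five meters" subset).
    """
    errors = [p - t for p, t in zip(pred, truth)
              if min_height is None or t > min_height]
    if not errors:
        raise ValueError("no samples to evaluate")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


if __name__ == "__main__":
    pred = [1.0, 2.0, 10.0]   # toy predicted heights (m), not from the paper
    truth = [2.0, 2.0, 6.0]   # toy ground-truth heights (m)
    print(mae_rmse(pred, truth))                  # all samples
    print(mae_rmse(pred, truth, min_height=5.0))  # tall-tree subset only
```

RMSE is always at least as large as MAE, and the gap widens when a few errors are large, so reporting both indicates whether error is spread evenly or concentrated in outliers.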

arXiv:2405.16426 [pdf, other]

Segmentation of Maya hieroglyphs through fine-tuned foundation models

Authors: FNU Shivam, Megan Leight, Mary Kate Kelly, Claire Davis, Kelsey Clodfelter, Jacob Thrasher, Yenumula Reddy, Prashnna Gyawali

Abstract: The study of Maya hieroglyphic writing unlocks the rich history of cultural and societal knowledge embedded within this ancient civilization's visual narrative. Artificial Intelligence (AI) offers a novel lens through which we can translate these inscriptions, with the potential to allow non-specialists to read these texts and to aid in the decipherment of those hieroglyphs which continue to elude comprehensive interpretation. Toward this, we leverage a foundation model to segment Maya hieroglyphs from an open-source digital library dedicated to Maya artifacts. Despite the promise of publicly available foundation segmentation models, their effectiveness in accurately segmenting Maya hieroglyphs was initially limited. Addressing this challenge, our study involved the meticulous curation of image and label pairs with the assistance of experts in Maya art and history, enabling the fine-tuning of these foundation models. This process significantly enhanced model performance, illustrating the potential of fine-tuning approaches and the value of our expanding dataset. We plan to open-source this dataset to encourage future research, and eventually to help make the hieroglyphic texts legible to a broader community, particularly Maya heritage community members.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2404.08611 [pdf, other]

Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network

Authors: Xin Tie, Muheon Shin, Changhee Lee, Scott B. Perlman, Zachary Huemann, Amy J. Weisman, Sharon M. Castellino, Kara M. Kelly, Kathleen M. McCarten, Adina L. Alazraki, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw

Abstract: Purpose: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. Materials and Methods: This retrospective study included baseline (PET1) and interim (PET2) PET/CT images from 297 patients enrolled in two Children's Oncology Group clinical trials (AHOD1331 and AHOD0831). LAS-Net incorporates longitudinal cross-attention, allowing relevant features from PET1 to inform the analysis of PET2. Model performance was evaluated using Dice coefficients for PET1 and detection F1 scores for PET2. Additionally, we extracted and compared quantitative PET metrics, including metabolic tumor volume (MTV) and total lesion glycolysis (TLG) in PET1, as well as qPET and ΔSUVmax in PET2, against physician measurements. We quantified their agreement using Spearman's ρ correlations and employed bootstrap resampling for statistical analysis. Results: LAS-Net detected residual lymphoma in PET2 with an F1 score of 0.606 (precision/recall: 0.615/0.600), outperforming all comparator methods (P<0.01). For baseline segmentation, LAS-Net achieved a mean Dice score of 0.772. In PET quantification, LAS-Net's measurements of qPET, ΔSUVmax, MTV and TLG were strongly correlated with physician measurements, with Spearman's ρ of 0.78, 0.80, 0.93 and 0.96, respectively. The performance remained high, with a slight decrease, in an external testing cohort. Conclusion: LAS-Net demonstrated significant improvements in quantifying PET metrics across serial scans, highlighting the value of longitudinal awareness in evaluating multi-time-point imaging datasets.

Submitted 30 September, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: There are 6 figures and 4 tables in the main text. The supplementary material is appended to the main text.
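The two agreement measures named in the abstract above, the Dice coefficient for segmentation overlap and Spearman's ρ for agreement with physician measurements, are standard and can be sketched from scratch in Python. The masks are assumed to be flattened binary voxel arrays, and the function names are hypothetical; the sketch does not reproduce the paper's lesion-level detection F1 or its bootstrap analysis.

```python
import math
from typing import List, Sequence


def dice(pred_mask: Sequence[bool], true_mask: Sequence[bool]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    size = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    return 1.0 if size == 0 else 2.0 * inter / size  # both empty: define as perfect


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)  # undefined if either input is constant
```

Because Spearman's ρ depends only on ranks, it rewards a model whose measurements order patients the same way a physician does, even if the absolute values (for example, MTV in mL) are systematically offset.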

arXiv:2404.06784 [pdf]

Statistical evaluation of 571 GaAs quantum point contact transistors showing the 0.7 anomaly in quantized conductance using millikelvin cryogenic on-chip multiplexing

Authors: Pengcheng Ma, Kaveh Delfanazari, Reuben K. Puddy, Jiahui Li, Moda Cao, Teng Yi, Jonathan P. Griffiths, Harvey E. Beere, David A. Ritchie, Michael J. Kelly, Charles G. Smith

Abstract: The mass production and the practical number of cryogenic quantum devices producible in a single chip are limited by the number of electrical contact pads and by the wiring of the cryostat or dilution refrigerator. It is, therefore, beneficial to compare the measurements of hundreds of devices fabricated in a single chip in one cooldown process to promote the scalability, integrability, reliability, and reproducibility of quantum devices and to save evaluation time, cost and energy. Here, we use a cryogenic on-chip multiplexer architecture and investigate the statistics of the 0.7 anomaly observed on the first three plateaus of the quantized conductance of semiconductor quantum point contact (QPC) transistors. Each chip contains 256 split-gate field-effect QPC transistors (QFETs), with two 16-branch multiplexed source-drain and gate pads, allowing individual transistors to be selected, addressed and controlled through an electrostatic gate voltage process. A total of 1280 quantum transistors with nano-scale dimensions are patterned in 5 different chips of GaAs heterostructures. From the measurements of 571 functioning QPCs taken at temperatures T = 1.4 K and T = 40 mK, we find that neither the spontaneous polarisation model nor the Kondo effect fits our results. Furthermore, some of the features in our data largely agree with the van Hove model with short-range interactions. Our approach provides further insight into the quantum mechanical properties and microscopic origin of the 0.7 anomaly in QPCs, paving the way for the development of semiconducting quantum circuits and integrated cryogenic electronics for scalable quantum logic control, readout, synthesis, and processing applications.

Submitted 10 April, 2024; originally announced April 2024.
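The device counts in the abstract above fit together as 16 × 16 = 256 addressable QPCs per chip and 5 × 256 = 1280 in total, of which 571 (about 44.6%) were reported as functioning. The short Python sketch below checks that arithmetic and illustrates one plausible addressing scheme in which one 16-branch multiplexer selects a source-drain branch and the other selects a gate branch. This row/column mapping and the name `mux_address` are assumptions for illustration; the abstract does not describe the actual wiring.

```python
from typing import Tuple

BRANCHES = 16      # 16-branch multiplexers for source-drain and gate (from the abstract)
CHIPS = 5          # number of chips (from the abstract)
FUNCTIONING = 571  # functioning QPCs measured (from the abstract)

DEVICES_PER_CHIP = BRANCHES * BRANCHES    # 16 x 16 = 256 addressable QPCs
TOTAL_DEVICES = CHIPS * DEVICES_PER_CHIP  # 5 x 256 = 1280


def mux_address(index: int) -> Tuple[int, int]:
    """Map a QPC index on one chip to a (source-drain branch, gate branch) pair.

    Hypothetical row/column layout; the actual on-chip wiring may differ.
    """
    if not 0 <= index < DEVICES_PER_CHIP:
        raise ValueError(f"index must be in [0, {DEVICES_PER_CHIP})")
    return divmod(index, BRANCHES)


if __name__ == "__main__":
    print(DEVICES_PER_CHIP, TOTAL_DEVICES)
    print(mux_address(17))
    print(f"functioning fraction: {FUNCTIONING / TOTAL_DEVICES:.1%}")
```

Under this kind of scheme, two multiplexers with 16 branches each need far fewer bond pads than 256 individually wired devices, which is the scalability argument the abstract makes.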

arXiv:2403.18827 [pdf, other]

Bridging Generative Networks with the Common Model of Cognition

Authors: Robert L. West, Spencer Eckler, Brendan Conway-Smith, Nico Turcas, Eilene Tomkins-Flanagan, Mary Alexandria Kelly

Abstract: This article presents a theoretical framework for adapting the Common Model of Cognition to large generative network models within the field of artificial intelligence. This can be accomplished by restructuring modules within the Common Model into shadow production systems that are peripheral to a central production system, which handles higher-level reasoning based on the shadow productions' output. Implementing this novel structure within the Common Model allows for a seamless connection between cognitive architectures and generative neural networks.

Submitted 25 January, 2024; originally announced March 2024.

arXiv:2403.11164 [pdf, other]

doi 10.1145/3613904.3642919

The Effects of Generative AI on Design Fixation and Divergent Thinking

Authors: Samangi Wadinambiarachchi, Ryan M. Kelly, Saumya Pareek, Qiushi Zhou, Eduardo Velloso

Abstract: Generative AI systems have been heralded as tools for augmenting human creativity and inspiring divergent thinking, though with little empirical evidence for these claims. This paper explores the effects of exposure to AI-generated images on measures of design fixation and divergent thinking in a visual ideation task. Through a between-participants experiment (N=60), we found that support from an AI image generator during ideation leads to higher fixation on an initial example. Participants who used AI produced fewer ideas, with less variety and lower originality compared to a baseline. Our qualitative analysis suggests that the effectiveness of co-ideation with AI rests on participants' chosen approach to prompt creation and on the strategies used by participants to generate ideas in response to the AI's suggestions. We discuss opportunities for designing generative AI systems for ideation support and incorporating these AI tools into ideation workflows.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted at the CHI Conference on Human Factors in Computing Systems (CHI '24), 18 pages, 15 figures

arXiv:2402.01040 [pdf]

Everyday Uses of Music Listening and Music Technologies by Caregivers and People with Dementia: Survey and Focus Group Study

Authors: Dianna Vidas, Romina Carrasco, Ryan M. Kelly, Jenny Waycott, Jeanette Tamplin, Kate McMahon, Libby M. Flynn, Phoebe A. Stretton-Smith, Tanara Vieira Sousa, Felicity A. Baker

Abstract: Music is a valuable non-pharmacological tool that provides benefits for people with dementia, and there is interest in designing technologies to support music use in dementia care. To ensure music technologies are appropriately designed for supporting caregivers and people living with dementia, there remains a need to better understand how music is currently used in everyday care at home. We aimed to understand how people with dementia and their caregivers use music technologies in everyday caring, as well as challenges they experience using music and technology. This study used a mixed-methods design. A survey was completed by 77 caregivers and people with dementia to understand their use of music and technology. Of these, 18 survey respondents (12 family caregivers, 6 people living with dementia) participated in focus groups about their experiences of using music and technology in care. Transcripts were analysed with reflexive thematic analysis. Most survey respondents used music often in their daily lives, reporting a range of music technologies such as CDs, radio, and streaming. Focus groups highlighted benefits and challenges of music technologies in everyday care. Participants used music and music technologies to regulate mood, provide joy, facilitate social connection, encourage reminiscence, provide continuity before and after diagnosis, and make caregiving easier. Challenges of using music technology in care included difficulties staying up to date with evolving technology, and low self-efficacy for technology use expressed by people living with dementia. Evidently, people living with dementia and their caregivers use music technologies to support their everyday care needs. Results suggest opportunities to design technologies enabling easier access to music and supporting people living with dementia with recreational and therapeutic music listening and music-based activities.

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.11628 [pdf]

Older Adults Imagining Future Technologies in Participatory Design Workshops: Supporting Continuity in the Pursuit of Meaningful Activities

Authors: Wei Zhao, Ryan M. Kelly, Melissa J. Rogerson, Jenny Waycott

Abstract: Recent innovations in digital technology offer significant opportunities for older adults to engage in meaningful activities. To investigate older adults' perceptions of using existing and emerging technologies for meaningful activities, we conducted three participatory design workshops and follow-up interviews with adults aged over 65. The workshops encompassed discussions on existing technologies for meaningful activities, demonstrations of emerging technologies such as VR, AR, and AI, and design activities including prototyping and storyboarding. Our findings show that while participants had diverse interpretations of meaningful activities, they sought to use technologies to support continuity in the pursuit of these activities. Specifically, participants highlighted the importance of safe aging at home, which provides a pathway for meaningful activities in later life. We further discuss participants' discerning attitudes when assessing the use of different technologies for meaningful activities and several values and attributes they desire when envisioning future technologies, including simplicity, positivity, proactivity, and integration.

Submitted 23 May, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2310.15177 [pdf, other]

A Neuro-mimetic Realization of the Common Model of Cognition via Hebbian Learning and Free Energy Minimization

Authors: Alexander Ororbia, Mary Alexandria Kelly

Abstract: Over the last few years, large neural generative models, capable of synthesizing semantically rich passages of text or producing complex images, have emerged as a popular representation of what has come to be known as "generative artificial intelligence" (generative AI). Beyond opening the door to new opportunities as well as challenges for the domain of statistical machine learning, the rising popularity of generative AI brings with it interesting questions for Cognitive Science, which seeks to discover the nature of the processes that underpin minds and brains as well as to understand how such functionality might be acquired and instantiated in biological (or artificial) substrates. With this goal in mind, we argue that a promising research program lies in the crafting of cognitive architectures, a long-standing tradition of the field, cast fundamentally in terms of neuro-mimetic generative building blocks. Concretely, we discuss the COGnitive Neural GENerative system, such an architecture that casts the Common Model of Cognition in terms of Hebbian adaptation operating in service of optimizing a variational free energy functional.

Submitted 3 November, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

Comments: Additional section on Hopfield functionals and CogNGen's full free energy; basal ganglia sub-circuit diagram integrated

arXiv:2310.12369 [pdf, other]

On Identifying Points of Semantic Shift Across Domains

Authors: Hyung Wook Choi, Mat Kelly

Abstract: The semantics used for particular terms in an academic field organically evolve over time. Tracking this evolution through inspection of published literature has either been done from the perspective of linguistic scholars or has concentrated on term evolution within a single domain of study. In this paper, we performed a case study to identify semantic evolution across different domains and identify examples of inter-domain semantic shifts. We initially used keywords as the basis of our search and executed an iterative process of following citations to find the initial mention of the concepts in the field. We found that a select set of keywords like "semaphore", "polymorphism", and "ontology" were mentioned within Computer Science literature, and we used citations to trace the seminal study that borrowed each term from its original field. We marked these events as semantic evolution points. Through this manual investigation method, we can identify term evolution across different academic fields. This study reports our initial findings that will seed future automated and computational methods of incorporating concepts from additional academic fields.

Submitted 18 October, 2023; originally announced October 2023.

Comments: In 17th International Conference on Metadata and Semantics Research, October 2023

arXiv:2310.08371 [pdf, other]

doi 10.1155/1970/9353816

Worst-Case Morphs using Wasserstein ALI and Improved MIPGAN

Authors: Una M. Kelly, Meike Nauta, Lu Liu, Luuk J. Spreeuwers, Raymond N. J. Veldhuis

Abstract: A morph is a combination of two separate facial images and contains identity information of two different people. When used in an identity document, both people can be authenticated by a biometric Face Recognition (FR) system. Morphs can be generated using either a landmark-based approach or approaches based on deep learning such as Generative Adversarial Networks (GAN). In a recent paper, we introduced a worst-case upper bound on how challenging morphing attacks can be for an FR system. The closer morphs are to this upper bound, the bigger the challenge they pose to FR. We introduced an approach with which it was possible to generate morphs that approximate this upper bound for a known FR system (white box), but not for unknown (black box) FR systems. In this paper, we introduce a morph generation method that can approximate worst-case morphs even when the FR system is not known. A key contribution is that we include the goal of generating difficult morphs during training. Our method is based on Adversarially Learned Inference (ALI) and uses concepts from Wasserstein GANs trained with Gradient Penalty, which were introduced to stabilise the training of GANs. We include these concepts to achieve similar improvement in training stability and call the resulting method Wasserstein ALI (WALI). We finetune WALI using loss functions designed specifically to improve the ability to manipulate identity information in facial images and show how it can generate morphs that are more challenging for FR systems than landmark- or GAN-based morphs.
We also show how our findings can be used to improve MIPGAN, an existing StyleGAN-based morph generator.

Submitted 13 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.10066 [pdf, other]

doi 10.1007/s10278-024-00985-3

Automatic Personalized Impression Generation for PET Reports Using Large Language Models

Authors: Xin Tie, Muheon Shin, Ali Pirasteh, Nevein Ibrahim, Zachary Huemann, Sharon M. Castellino, Kara M. Kelly, John Garrett, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw

Abstract: In this study, we aimed to determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encodes the reading physician's identity, allowing models to learn physician-specific reporting styles. Our corpus comprised 37,370 retrospective PET reports collected from our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, with the most aligned metrics selecting the model for expert evaluation. In a subset of data, model-generated impressions and original clinical impressions were assessed by three NM physicians according to 6 quality dimensions (3-point scale) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis. Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman's rank correlations (0.568 and 0.563) with physician preferences. Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08 out of 5.
Physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P=0.41). In conclusion, personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting.

Submitted 17 October, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 25 pages in total. 6 figures and 3 tables in the main body. The manuscript has been submitted to a journal for potential publication

Journal ref: J Digit Imaging. Inform. Med. (2024)
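The PET-report abstract states only that bootstrap resampling was used for statistical analysis. As a hedged illustration of that general technique (not the authors' actual procedure, which the abstract does not describe), the sketch below computes a percentile bootstrap confidence interval for a difference in mean utility scores between two groups. The function name `bootstrap_mean_diff_ci`, its defaults, and the rating lists are hypothetical, not data from the study.

```python
import random
import statistics


def bootstrap_mean_diff_ci(x, y, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(x) - mean(y).

    Each group is resampled independently with replacement; this is a generic
    sketch, not the analysis used in the paper.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        xs = rng.choices(x, k=len(x))  # resample group x with replacement
        ys = rng.choices(y, k=len(y))  # resample group y with replacement
        diffs.append(statistics.fmean(xs) - statistics.fmean(ys))
    diffs.sort()
    lo = diffs[int(round((alpha / 2) * n_resamples))]
    hi = diffs[int(round((1 - alpha / 2) * n_resamples)) - 1]
    return lo, hi


# Hypothetical 5-point utility ratings, for illustration only
own_style = [4, 5, 4, 3, 4, 5, 4, 4]
other_style = [4, 4, 3, 4, 5, 4, 4, 3]
print(bootstrap_mean_diff_ci(own_style, other_style))
```

An interval that contains zero would be consistent with "comparable" utility between the two groups; resampling each group separately is one common choice, and paired designs would instead resample matched pairs.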

arXiv:2305.09064 [pdf, other]

doi 10.1145/3593013.3594111

Capturing Humans' Mental Models of AI: An Item Response Theory Approach

Authors: Markelle Kelly, Aakriti Kumar, Padhraic Smyth, Mark Steyvers

Abstract: Improving our understanding of how humans perceive AI teammates is an important foundation for our general understanding of human-AI teams. Extending relevant work from cognitive science, we propose a framework based on item response theory for modeling these perceptions. We apply this framework to real-world experiments, in which each participant works alongside another person or an AI agent in a question-answering setting, repeatedly assessing their teammate's performance. Using this experimental data, we demonstrate the use of our framework for testing research questions about people's perceptions of both AI agents and other people. We contrast mental models of AI teammates with those of human teammates as we characterize the dimensionality of these mental models, their development over time, and the influence of the participants' own self-perception. Our results indicate that people expect AI agents' performance to be significantly better on average than the performance of other humans, with less variation across different types of problems. We conclude with a discussion of the implications of these findings for human-AI interaction.

Submitted 15 May, 2023; originally announced May 2023.

Comments: FAccT 2023
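The framework above builds on item response theory (IRT), which models the probability of a correct answer as a function of a latent ability and per-item parameters. The sketch below uses the common two-parameter logistic (2PL) form and a grid-search maximum-likelihood ability estimate; whether the paper uses this exact parameterization is an assumption, and the names `p_correct_2pl` and `estimate_theta` and the item parameters are hypothetical. In the paper's setting, `theta` could loosely stand for a teammate's perceived ability.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL item response function: probability that a respondent with latent
    ability `theta` answers an item with discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def estimate_theta(responses, items, grid=None):
    """Maximum-likelihood ability estimate by grid search.

    responses: sequence of 1 (correct) / 0 (incorrect)
    items: sequence of (a, b) pairs, assumed known here
    """
    if grid is None:
        grid = [i / 100 for i in range(-400, 401)]  # theta in [-4, 4]

    def log_likelihood(theta):
        ll = 0.0
        for r, (a, b) in zip(responses, items):
            p = p_correct_2pl(theta, a, b)
            ll += math.log(p) if r else math.log(1.0 - p)
        return ll

    return max(grid, key=log_likelihood)


# Hypothetical items: easy, medium, hard (equal discrimination)
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
strong = estimate_theta([1, 1, 0], items)
weak = estimate_theta([1, 0, 0], items)
print(strong, weak)
```

When `theta` equals an item's difficulty `b`, the 2PL probability of success is exactly 0.5; answering more items correctly moves the estimate upward.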

arXiv:2210.01196 [pdf, other]

doi 10.1007/978-3-031-21756-2_25

Aggregator Reuse and Extension for Richer Web Archive Interaction

Authors: Mat Kelly

Abstract: Memento aggregators enable users to query multiple web archives for captures of a URI in time through a single HTTP endpoint. While this one-to-many access point is useful for researchers and end-users, aggregators are in a position to provide additional functionality to end-users beyond black-box aggregation. This paper identifies the state of the art of Memento aggregation, abstracts its processes, highlights shortcomings, and offers systematic enhancements.

Submitted 3 October, 2022; originally announced October 2022.

Comments: 16 pages; preprint accepted to the Proceedings of the 24th International Conference on Asia-Pacific Digital Libraries (ICADL 2022)

arXiv:2209.15154 [pdf, other]

Variable-Based Calibration for Machine Learning Classifiers

Authors: Markelle Kelly, Padhraic Smyth

Abstract: The deployment of machine learning classifiers in high-stakes domains requires well-calibrated confidence scores for model predictions. In this paper we introduce the notion of variable-based calibration to characterize calibration properties of a model with respect to a variable of interest, generalizing traditional score-based metrics such as expected calibration error (ECE). In particular, we find that models with near-perfect ECE can exhibit significant miscalibration as a function of features of the data. We demonstrate this phenomenon both theoretically and in practice on multiple well-known datasets, and show that it can persist after the application of existing calibration methods. To mitigate this issue, we propose strategies for detection, visualization, and quantification of variable-based calibration error. We then examine the limitations of current score-based calibration methods and explore potential modifications. Finally, we discuss the implications of these findings, emphasizing that an understanding of calibration beyond simple aggregate measures is crucial for endeavors such as fairness and model interpretability.

Submitted 5 April, 2023; v1 submitted 29 September, 2022; originally announced September 2022.
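The abstract's central point, that near-perfect ECE can hide miscalibration along a data feature, can be illustrated with a toy example. The sketch below computes standard equal-width binned ECE over confidence scores, then a simple per-group calibration gap when examples are grouped by a categorical variable instead. The function names, the grouping-by-category scheme, and the toy data are assumptions for illustration, not the paper's definition of variable-based calibration error.

```python
import numpy as np


def ece(conf, correct, n_bins=10):
    """Expected calibration error with equal-width bins over confidence."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # bin index per score; interior edges only, so conf == 1.0 lands in the last bin
    idx = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(total)


def variable_calibration_error(conf, correct, variable):
    """Weighted |accuracy - mean confidence| with examples grouped by a
    categorical feature value rather than by confidence (illustrative only)."""
    conf = np.asarray(conf, dtype=float)
    correct = np.asarray(correct, dtype=float)
    variable = np.asarray(variable)
    total = 0.0
    for v in np.unique(variable):
        mask = variable == v
        total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(total)


# Toy data: constant confidence 0.75, overall accuracy 0.75,
# but group 0 is overconfident and group 1 is underconfident.
conf = [0.75] * 8
correct = [1, 0, 1, 0, 1, 1, 1, 1]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(ece(conf, correct), variable_calibration_error(conf, correct, group))  # prints 0.0 0.25
```

Here the aggregate ECE is exactly zero, yet each group is off by 0.25, which is the kind of feature-dependent miscalibration the abstract describes.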

arXiv:2207.09534 [pdf, other]

doi 10.1109/VIS54862.2022.00028

VisQuiz: Exploring Feedback Mechanisms to Improve Graphical Perception

Authors: Ryan Birchfield, Maddison Caten, Errica Cheng, Madyson Kelly, Truman Larson, Hoan Phan Pham, Yiren Ding, Noëlle Rakotondravony, Lane Harrison

Abstract: Graphical perception studies are a key element of visualization research, forming the basis of design recommendations and contributing to our understanding of how people make sense of visualizations. However, graphical perception studies typically include only brief training sessions, and the impact of longer and more in-depth feedback remains unclear. In this paper, we explore the design and evaluation of VisQuiz, a feedback mechanism for graphical perception tasks. Using a quiz-like metaphor, we design feedback for a typical visualization comparison experiment, showing participants their answer alongside the correct answer in an animated sequence in each trial. We extend this quiz metaphor to include summary feedback after each stage of the experiment, providing additional moments for participants to reflect on their performance. To evaluate VisQuiz, we conduct a between-subjects experiment with three stages of 40 trials each and a control condition that received only summary feedback. Results from n = 80 participants show that once participants started receiving trial feedback (Stage 2), they performed significantly better with bubble charts than those in the control condition. This effect carried over when feedback was removed (Stage 3). Results also suggest an overall trend of improved performance due to feedback. We discuss these findings in the context of other visualization literacy efforts, and possible future work at the intersection of visualization, feedback, and learning.
Experiment data and analysis scripts are available at https://osf.io/jys5d/

Submitted 2 October, 2023; v1 submitted 19 July, 2022; originally announced July 2022.

Comments: 5 pages, 5 figures, short paper

Journal ref: Proceedings of IEEE Visualization conference 2023

arXiv:2204.00619 [pdf, other]

Maze Learning using a Hyperdimensional Predictive Processing Cognitive Architecture

Authors: Alexander Ororbia, M. Alex Kelly

Abstract: We present the COGnitive Neural GENerative system (CogNGen), a cognitive architecture that combines two neurobiologically plausible computational models: predictive processing and hyperdimensional/vector-symbolic models. We draw inspiration from architectures such as ACT-R and Spaun/Nengo. CogNGen is in broad agreement with these, providing a level of detail between ACT-R's high-level symbolic description of human cognition and Spaun's low-level neurobiological description, and it lays the groundwork for designing agents that learn continually from diverse tasks and model human performance at larger scales than is possible with current systems. We test CogNGen on four maze-learning tasks, including those that test memory and planning, and find that CogNGen matches the performance of deep reinforcement learning models and exceeds it on a task designed to test memory.

Submitted 8 August, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

Comments: Revisions applied to reflect the version accepted to AGI 2022. Note that this includes the appendix mentioned in the AGI 2022 proceedings publication

arXiv:2202.10194 [pdf]

doi 10.1016/j.proci.2022.07.181

Low-Dimensional High-Fidelity Kinetic Models for NOX Formation by a Compute Intensification Method

Authors: Mark Kelly, Harry Dunne, Gilles Bourque, Stephen Dooley

Abstract: A novel compute intensification methodology for the construction of low-dimensional, high-fidelity "compact" kinetic models for NOX formation is designed and demonstrated. The method adapts the data-intensive Machine Learned Optimization of Chemical Kinetics (MLOCK) algorithm for compact model generation by using a Latin Square method for virtual reaction network generation. A set of logical rules is defined that constructs a minimally sized virtual reaction network comprising three additional nodes (N, NO, NO2). This NOX virtual reaction network is appended to a pre-existing compact model for methane combustion comprising fifteen nodes. The resulting eighteen-node virtual reaction network is processed by the coded MLOCK algorithm to produce a plethora of compact model candidates for NOX formation during methane combustion. MLOCK automatically: populates the terms of the virtual reaction network with candidate inputs; measures the success of the resulting compact model candidates (in reproducing a broad set of gas turbine industry-defined performance targets); selects regions of the input parameter space showing models of best performance; refines the input parameters to give better performance; and makes an ultimate selection of the best performing model or models. By this method, it is shown that a number of compact model candidates exist that show fidelities in excess of 75% in reproducing industry-defined performance targets, with one model valid to >75% across fuel/air equivalence ratios of 0.5-1.0.
However, to meet the full fuel/air equivalence ratio performance envelope defined by industry, we show that with this minimal virtual reaction network, two further compact models are required.

Submitted 21 February, 2022; originally announced February 2022.

Comments: arXiv admin note: text overlap with arXiv:2202.08021

arXiv:2202.08021 [pdf]

doi 10.1016/j.combustflame.2023.112755

Toward Development of Machine Learned Techniques for Production of Compact Kinetic Models

Authors: Mark Kelly, Mark Fortune, Gilles Bourque, Stephen Dooley

Abstract: Chemical kinetic models are an essential component in the development and optimisation of combustion devices through their coupling to multi-dimensional simulations such as computational fluid dynamics (CFD). Low-dimensional kinetic models that retain good fidelity to reality are needed, but producing them requires considerable human time and expert knowledge. Here, we present a novel automated compute intensification methodology to produce overly-reduced and optimised (compact) chemical kinetic models. This algorithm, termed Machine Learned Optimisation of Chemical Kinetics (MLOCK), systematically perturbs each of the four sub-models of a chemical kinetic model to discover what combinations of terms result in a good model. A virtual reaction network comprised of n species is first obtained using conventional mechanism reduction. To counteract the imposed decrease in model performance, the weights (virtual reaction rate constants) of important connections (virtual reactions) between each node (species) of the virtual reaction network are numerically optimised to replicate selected calculations across four sequential phases. The first version of MLOCK (MLOCK1.0) simultaneously perturbs all three virtual Arrhenius reaction rate constant parameters for important connections and assesses the suitability of the new parameters through objective error functions, which quantify the error in each compact model candidate's calculation of the optimisation targets, which may be comprised of detailed model calculations and/or experimental data.
MLOCK1.0 is demonstrated by creating compact models for the archetypal case of methane-air combustion. It is shown that the NUGMECH1.0 detailed model comprised of 2,789 species is reliably compacted to 15 species (nodes), whilst retaining an overall fidelity of ~87% to the detailed model calculations, outperforming the prior state of the art.

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2111.15416 [pdf, other]

Worst-Case Morphs: a Theoretical and a Practical Approach

Authors: Una M. Kelly, Raymond Veldhuis, Luuk Spreeuwers

Abstract: Face Recognition (FR) systems have been shown to be vulnerable to morphing attacks. We examine exactly how challenging morphs can become. By showing a worst-case construction in the embedding space of an FR system and using a mapping from embedding space back to image space, we generate images showing that this theoretical upper bound can be approximated if the FR system is known. The resulting morphs can also successfully fool unseen FR systems and are useful for exploring and understanding the weaknesses of FR systems. Our method contributes to gaining more insight into the vulnerability of FR systems.

Submitted 19 September, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

arXiv:2111.03910 [pdf]

FAIR Metadata: A Community-driven Vocabulary Application

Authors: Christopher B. Rauch, Mat Kelly, John A. Kunze, Jane Greenberg

Abstract: FAIR metadata is critical to supporting FAIR data overall. Transparency, community engagement, and flexibility are key aspects of FAIR that apply to metadata. This paper presents YAMZ (Yet Another Metadata Zoo), a community-driven vocabulary application that supports FAIR. The history of YAMZ and its original features are reviewed, followed by a presentation of recent innovations and a discussion of how YAMZ supports FAIR principles. The conclusion identifies next steps and key outputs.

Submitted 6 November, 2021; originally announced November 2021.

ACM Class: H.3.7

arXiv:2109.13915 [pdf]

Modeling Ephraim Chambers' Knowledge Structure from a Naive Standpoint

Authors: Scott McClellan, Mat Kelly, Jane Greenberg

Abstract: In the preface to his Cyclopaedia, published in 1728, Ephraim Chambers offers readers a systematized structure of his attempt to produce a universal repository of human knowledge. Divided into an interconnected taxonomic tree and domain vocabulary, this structure forms the basis of one effort from the Metadata Research Center to study historical ontologies. The knowledge structure is being encoded into a Simple Knowledge Organization System (SKOS) form as well as a Web Ontology Language (OWL) version. This paper explores the expressive and functional differences between these SKOS and OWL versions of Chambers' knowledge structure. As part of this goal, the research focused on the construction and application of rules in each system to produce a more computationally ready representation of Chambers' structure in SKOS, which is more thesaurus-like, and OWL, which represents additional ontological nuances. First, studying the various textual aspects at the semantic, syntactic, and typographic levels allowed the relationships between terms to emerge, and from these relationships, rules governing the expression of connections between elements were developed. Second, because SKOS and OWL each express different logical relationships, their possibilities and limitations offer grounds for further analyzing the resulting knowledge structures, even though both stemmed from the same source: Chambers' text.
Lastly, this paper examines rule making and expression in light of Paul Grice's theory of conversational implicature to understand how a naive agent formulates and applies these rules to a knowledge structure.

Submitted 28 September, 2021; originally announced September 2021.

Comments: NASKO 2021 Conference. 9 pages, 3 figures

ACM Class: I.7

arXiv:2109.06317 [pdf]

Project Pipeline: Preservation, Persistence, and Performance

Authors: Jane Greenberg, Christopher B. Rauch, Mat Kelly

Abstract: Preservation pipelines demonstrate extended value when digitized content is also computation ready. Expanding this to historical controlled vocabularies published in analog format requires additional steps if they are to be fully leveraged for research. This paper reports on work addressing this challenge. We report on a pipeline and project progress addressing three key goals: 1) transforming the 1910 Library of Congress Subject Headings (LCSH) to the Simple Knowledge Organization System (SKOS) linked data standard, 2) implementing persistent identifiers (PIDs) and launching our prototype ARK resolver, and 3) importing the 1910 LCSH into the Helping Interdisciplinary Vocabulary Engineering (HIVE) System to support automatic metadata generation and scholarly analysis of the historical record. The discussion considers the implications of our work in the broader context of preservation, and the conclusion summarizes our work and identifies next steps.

Submitted 18 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: 5 pages, 2 figures. 17th International Conference on Digital Preservation (iPRES) 2021, Beijing, China

arXiv:2105.07308 [pdf, other]

Towards a Predictive Processing Implementation of the Common Model of Cognition

Authors: Alexander Ororbia, M. A. Kelly

Abstract: In this article, we present a cognitive architecture that is built from powerful yet simple neural models. Specifically, we describe an implementation of the common model of cognition grounded in neural generative coding and holographic associative memory. The proposed system creates the groundwork for developing agents that learn continually from diverse tasks as well as model human performance at larger scales than is possible with existing cognitive architectures.

Submitted 18 May, 2021; v1 submitted 15 May, 2021; originally announced May 2021.

Comments: 6 pages, 2 figures
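Both CogNGen abstracts mention vector-symbolic models and holographic associative memory without specifying the operators. A well-known instance is Plate's holographic reduced representations (HRRs), in which two vectors are bound by circular convolution and approximately unbound by circular correlation. The sketch below assumes that scheme; whether CogNGen uses exactly these operators, and the dimensionality `D` and helper names, are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 1024  # hypervector dimensionality (assumed)


def random_vector(d=D):
    # i.i.d. Gaussian entries with variance 1/d, as in Plate-style HRRs
    return rng.normal(0.0, 1.0 / np.sqrt(d), d)


def bind(a, b):
    # circular convolution, computed in the frequency domain
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))


def unbind(trace, cue):
    # circular correlation: an approximate inverse of binding
    return np.real(np.fft.ifft(np.fft.fft(trace) * np.conj(np.fft.fft(cue))))


def cosine(x, y):
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


role, filler, distractor = random_vector(), random_vector(), random_vector()
memory = bind(role, filler)          # store the role-filler association
retrieved = unbind(memory, role)     # query the memory with the role
print(cosine(retrieved, filler), cosine(retrieved, distractor))
```

Retrieval is noisy (the retrieved vector is only roughly aligned with the stored filler), so HRR-based memories typically follow unbinding with a clean-up step that picks the most similar known vector.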

arXiv:2102.12899 [pdf, other]

Mobility for Cellular-Connected UAVs: challenges for the network provider

Authors: Erika Fonseca, Boris Galkin, Marvin Kelly, Luiz A. DaSilva, Ivana Dusparic

Abstract: Unmanned Aerial Vehicle (UAV) technology is becoming more prevalent and more diverse in its application. 5G and beyond networks must enable UAV connectivity. This will require the network operator to consider this new type of user in the planning and operation of the network. This work presents the challenges an operator will encounter and should consider in the future as UAVs become users of the network. We analyse the 3GPP specifications, the existing research literature, and a publicly available UAV connectivity dataset, to describe the challenges. We classify these challenges into network planning and network optimisation categories. We discuss the challenge of planning network coverage for flying users and the PCI collision and confusion issues that can be aggravated by these users. In discussing network optimisation challenges, we introduce Automatic Neighbouring Relation (ANR) and handover challenges, specifically the number of neighbours in the Neighbour Relation Table (NRT) and their potential deletion and block-listing, the high frequency of handovers, and the possibility that the UAV disconnects because of handover issues. We discuss possible approaches to address the presented challenges and use a real-world dataset to support our findings about these challenges and their importance.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 6 pages, 4 figures

arXiv:2011.13114 [pdf]

doi 10.1109/BigData50022.2020.9378268

A Computational Approach to Historical Ontologies

Authors: Mat Kelly, Jane Greenberg, Christopher B. Rauch, Sam Grabus, Joan P. Boone, John A. Kunze, Peter Melville Logan

Abstract: This paper presents a use case exploring the application of the Archival Resource Key (ARK) persistent identifier for promoting and maintaining ontologies. In particular, we look at improving computation with an in-house ontology server in the context of temporally aligned vocabularies. This effort demonstrates the utility of ARKs in preparing historical ontologies for computational archival science.

Submitted 25 November, 2020; originally announced November 2020.

Comments: 6 pages, 5 figures. To be published in Proceedings of the 2020 IEEE International Conference on Big Data (IEEE Big Data 2020)

ACM Class: H.3.7

Journal ref: 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 2020, pp. 1878-1883

arXiv:2011.03236 [pdf, other]

Experimental Evaluation of a UAV User QoS from a Two-Tier 3.6GHz Spectrum Network

Authors: Boris Galkin, Erika Fonseca, Gavin Lee, Conor Duff, Marvin Kelly, Edward Emmanuel, Ivana Dusparic

Abstract: Unmanned Aerial Vehicle (UAV) technology is becoming increasingly used in a variety of applications such as video surveillance and deliveries. To enable safe and efficient use of UAVs, the devices will need to be connected into cellular networks. Existing research on UAV cellular connectivity shows that UAVs encounter significant issues with existing networks, such as strong interference and antenna misalignment. In this work, we perform a novel measurement campaign of the performance of a UAV user when it connects to an experimental two-tier cellular network in two different areas of Dublin city's Smart Docklands, which includes massive MIMO macrocells and wirelessly-backhauled small cells. We measure Reference Signal Received Power (RSRP), Reference Signal Received Quality (RSRQ), Signal to Interference and Noise Ratio (SINR), the downlink throughput, and the small cell handover rate. Our results show that increasing the UAV height reduces the performance in both tiers, due to issues such as antenna misalignment. The small cell tier, however, can maintain relatively stable performance across the entire range of UAV heights, suggesting that UAV users can successfully connect to small cells during their flight. Furthermore, we demonstrate that while the UAV handover rate significantly fluctuates at different heights, the overall observed handover rates are very low. Our results highlight the potential for small cells in urban areas to provide connectivity to UAVs.

Submitted 9 April, 2021; v1 submitted 6 November, 2020; originally announced November 2020.

arXiv:2006.02487 [pdf, other]

Visualizing Webpage Changes Over Time

Authors: Abigail Mabe, Dhruv Patel, Maheedhar Gunnam, Surbhi Shankar, Mat Kelly, Sawood Alam, Michael L. Nelson, Michele C. Weigle

Abstract: We report on the development of TMVis, a web service to provide visualizations of how individual webpages have changed over time. We leverage past research on summarizing collections of webpages with thumbnail-sized screenshots and on choosing a small number of representative past archived webpages from a large collection. We offer four visualizations: image grid, image slider, timeline, and animated GIF. Embed codes for the image grid and image slider can be produced to include these on separate webpages. The animated GIF can be downloaded as an image file for the same purpose. This tool can be used to allow scholars from various disciplines, as well as the general public, to explore the temporal nature of web archives. We hope that these visualizations will just be the beginning and will provide a starting point for others to expand these types of offerings for users of web archives.

Submitted 3 June, 2020; originally announced June 2020.

Comments: 13 pages

arXiv:1909.08663 [pdf, other]

Do We Need Neural Models to Explain Human Judgments of Acceptability?

Authors: Wang Jing, M. A. Kelly, David Reitter

Abstract: Native speakers can judge whether a sentence is an acceptable instance of their language. Acceptability provides a means of evaluating whether computational language models are processing language in a human-like manner. We test the ability of computational language models, simple language features, and word embeddings to predict native English speakers' judgments of acceptability on English-language essays written by non-native speakers. We find that much of the sentence acceptability variance can be captured by a combination of features including misspellings, word order, and word similarity (Pearson's r = 0.494). While predictive neural models fit acceptability judgments well (r = 0.527), we find that a 4-gram model with statistical smoothing is just as good (r = 0.528). Thanks to incorporating a count of misspellings, our 4-gram model surpasses both the previous unsupervised state-of-the-art (Lau et al., 2015; r = 0.472) and the average non-expert native speaker (r = 0.46). Our results demonstrate that acceptability is well captured by n-gram statistics and simple language features.

Submitted 9 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

Comments: 10 pages (8 pages + 2 pages of references), 1 figure, 7 tables

arXiv:1907.12214 [pdf, other]

A Case Study on Automated Fuzz Target Generation for Large Codebases

Authors: Matthew Kelly, Christoph Treude, Alex Murray

Abstract: Fuzz Testing is a largely automated testing technique that provides random and unexpected input to a program in an attempt to trigger failure conditions. Much of the research conducted thus far into Fuzz Testing has focused on developing improvements to available Fuzz Testing tools and frameworks in order to improve efficiency. In this paper, however, we instead look at a way in which we can reduce the amount of developer time required to integrate Fuzz Testing to help maintain an existing codebase. We accomplish this with a new technique for automatically generating Fuzz Targets, the modified versions of programs on which Fuzz Testing tools operate. We evaluated three different Fuzz Testing solutions on the codebase of our industry partner and found that a fully automated solution finds significantly more bugs relative to the developer time required to implement it. Our research is an important step towards increasing the prevalence of Fuzz Testing by making it simpler to integrate a Fuzz Testing solution for maintaining an existing codebase.

Submitted 29 July, 2019; originally announced July 2019.

Comments: To appear as an industry-track paper at ESEM 2019, the 13th International Symposium on Empirical Software Engineering and Measurement
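The fuzz-testing entry above describes feeding random and unexpected input to a program to trigger failure conditions. As a rough illustration only, the following Python sketch is a naive random fuzzer driving a single fuzz target; it is not the paper's automated fuzz-target generation technique. The target `parse_length_prefixed`, its deliberate bug, and the `fuzz` helper are all hypothetical.

```python
import random


def parse_length_prefixed(data: bytes) -> bytes:
    """Hypothetical fuzz target: byte 0 is a length, followed by that many payload bytes."""
    if not data:
        return b""
    length = data[0]
    payload = data[1:1 + length]
    if length:
        # Deliberate bug: reads the last declared byte without checking
        # that the input actually contains `length` payload bytes.
        _last = payload[length - 1]
    return payload


def fuzz(target, iterations: int = 1000, max_len: int = 8, seed: int = 0):
    """Call `target` on random byte strings and record the inputs that raise."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except Exception as exc:  # any uncaught exception counts as a failure
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_length_prefixed)
    print(f"{len(found)} crashing inputs, e.g. {found[0][0]!r} -> {found[0][1]}")
```

Real fuzzers (coverage-guided ones in particular) mutate inputs based on feedback rather than sampling blindly; the uniform random sampling here is only the simplest possible instance.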

arXiv:1810.02890 [pdf, other]

HG-DAgger: Interactive Imitation Learning with Human Experts

Authors: Michael Kelly, Chelsea Sidrane, Katherine Driggs-Campbell, Mykel J. Kochenderfer

Abstract: Imitation learning has proven to be useful for many real-world problems, but approaches such as behavioral cloning suffer from data mismatch and compounding error issues. One attempt to address these limitations is the DAgger algorithm, which uses the state distribution induced by the novice to sample corrective actions from the expert. Such sampling schemes, however, require the expert to provide action labels without being fully in control of the system. This can decrease safety and, when using humans as experts, is likely to degrade the quality of the collected labels due to perceived actuator lag. In this work, we propose HG-DAgger, a variant of DAgger that is more suitable for interactive imitation learning from human experts in real-world systems. In addition to training a novice policy, HG-DAgger also learns a safety threshold for a model-uncertainty-based risk metric that can be used to predict the performance of the fully trained novice in different regions of the state space. We evaluate our method on both a simulated and real-world autonomous driving task, and demonstrate improved performance over both DAgger and behavioral cloning.

Submitted 11 March, 2019; v1 submitted 5 October, 2018; originally announced October 2018.
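The HG-DAgger entry above summarizes plain DAgger as sampling states from the novice's own rollouts and labeling them with the expert's corrective actions. A minimal sketch of that loop, assuming a toy 1-D system with dynamics x_{t+1} = x_t + a_t, a hypothetical linear expert a = -0.5x, and a linear novice a = kx fitted by least squares. This is vanilla DAgger, not HG-DAgger (which additionally hands control to the human and learns a risk threshold).

```python
def expert_policy(x: float) -> float:
    # Hypothetical expert: steers the state toward zero.
    return -0.5 * x


def rollout(policy, x0: float, horizon: int) -> list[float]:
    """Roll out a policy on the toy dynamics x_{t+1} = x_t + a_t; return visited states."""
    states = []
    x = x0
    for _ in range(horizon):
        states.append(x)
        x = x + policy(x)
    return states


def fit_linear(dataset) -> float:
    """Least-squares fit of a = k * x (no intercept) over (state, action) pairs."""
    num = sum(x * a for x, a in dataset)
    den = sum(x * x for x, _ in dataset)
    return num / den if den else 0.0


def dagger(iterations: int = 3, x0: float = 1.0, horizon: int = 10):
    k = 0.0  # untrained novice: outputs a = 0 in every state
    dataset = []
    for _ in range(iterations):
        novice = lambda x, k=k: k * x  # bind the current k
        # States come from the novice's own rollout (its induced distribution)...
        visited = rollout(novice, x0, horizon)
        # ...but the action labels come from the expert; data is aggregated.
        dataset.extend((x, expert_policy(x)) for x in visited)
        k = fit_linear(dataset)
    return k, dataset


if __name__ == "__main__":
    k, data = dagger()
    print(f"novice gain k = {k}, trained on {len(data)} labeled states")
```

Because the hypothetical expert is itself linear and noise-free, the novice recovers it exactly; the point of the sketch is the data flow (novice states, expert labels, aggregation), not the learning problem.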

arXiv:1806.00871 [pdf, other]

doi 10.1145/3197026.3197045

A Framework for Aggregating Private and Public Web Archives

Authors: Mat Kelly, Michael L. Nelson, Michele C. Weigle

Abstract: Personal and private Web archives are proliferating due to the growing number of tools to create them and the realization that the Internet Archive and other public Web archives are unable to capture personalized (e.g., Facebook) and private (e.g., banking) Web pages. We introduce a framework to mitigate issues of aggregation in private, personal, and public Web archives without compromising potentially sensitive information contained in private captures. We amend Memento syntax and semantics to allow TimeMaps to be enriched with additional attributes, including the requirements for dereferencing private Web archive captures. We provide a method to involve the user further in the negotiation of archival captures in dimensions beyond time. We introduce a model for archival querying precedence and short-circuiting, as needed when aggregating private and personal Web archive captures with those from public Web archives through Memento. Negotiation of this sort is novel to Web archiving and allows for the more seamless aggregation of various types of Web archives to convey a more accurate picture of the past Web.

Submitted 3 June, 2018; originally announced June 2018.

Comments: Preprint version of the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018) full paper, accessible at the DOI

arXiv:1805.11546 [pdf, other]

Like a Baby: Visually Situated Neural Language Acquisition

Authors: Alexander G. Ororbia, Ankur Mali, Matthew A. Kelly, David Reitter

Abstract: We examine the benefits of visual context in training neural language models to perform next-word prediction. A multi-modal neural architecture is introduced that outperforms its equivalent trained on language alone with a 2% decrease in perplexity, even when no visual context is available at test time. Fine-tuning the embeddings of a pre-trained state-of-the-art bidirectional language model (BERT) in the language modeling framework yields a 3.5% improvement. The advantage for training with visual context when testing without it is robust across different languages (English, German and Spanish) and different models (GRU, LSTM, Δ-RNN, as well as those that use BERT embeddings). Thus, language models perform better when they learn like a baby, i.e., in a multi-modal environment. This finding is compatible with the theory of situated cognition: language is inseparable from its physical context.

Submitted 4 June, 2019; v1 submitted 29 May, 2018; originally announced May 2018.

Comments: Final submission (camera-ready), accepted to ACL 2019

arXiv:1703.03302 [pdf, other]

doi 10.1109/JCDL.2017.7991601

Impact of URI Canonicalization on Memento Count

Authors: Mat Kelly, Lulwah M. Alkwai, Michael L. Nelson, Michele C. Weigle, Herbert Van de Sompel

Abstract: Quantifying the captures of a URI over time is useful for researchers to identify the extent to which a Web page has been archived. Memento TimeMaps provide a format to list mementos (URI-Ms) for captures along with brief metadata, like Memento-Datetime, for each URI-M. However, when some URI-Ms are dereferenced, they simply provide a redirect to a different URI-M (instead of a unique representation at the datetime), often also present in the TimeMap. This implies that confidently obtaining an accurate count of the non-forwarding captures for a URI-R is not possible using a TimeMap alone, and that the size of a TimeMap is not equivalent to the number of representations it identifies. In this work, we discuss this particular phenomenon in depth. We also perform a breakdown of the dynamics of counting mementos for a particular URI-R (google.com) and quantify the prevalence of the various canonicalization patterns that complicate attempts at counting using only a TimeMap. For google.com, we found that 84.9% of the URI-Ms result in an HTTP redirect when dereferenced. We expand on and apply this metric to TimeMaps for seven other URI-Rs of large Web sites and thirteen academic institutions. Using a ratio metric DI for the number of URI-Ms without redirects to those requiring a redirect when dereferenced, five of the eight large web sites' and two of the thirteen academic institutions' TimeMaps had a ratio less than one, indicating that more than half of the URI-Ms in these TimeMaps result in redirects when dereferenced.

Submitted 9 March, 2017; originally announced March 2017.

Comments: 43 pages, 8 figures
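The URI canonicalization entry above defines the DI metric as the number of URI-Ms without redirects divided by the number requiring a redirect when dereferenced, so DI < 1 exactly when redirecting URI-Ms are the majority. A minimal sketch of that arithmetic, assuming the HTTP status code from dereferencing each URI-M is already known and that "requires a redirect" means a 3xx status. The helper names and the sample status lists are hypothetical; no actual TimeMap is fetched.

```python
def split_by_redirect(status_codes):
    """Count URI-Ms answered directly vs. via an HTTP redirect.

    Assumption: a URI-M "requires a redirect" if dereferencing it
    returns a 3xx status; every other status counts as non-redirecting.
    """
    redirecting = sum(1 for code in status_codes if 300 <= code < 400)
    return len(status_codes) - redirecting, redirecting


def di_ratio(non_redirecting: int, redirecting: int) -> float:
    """Ratio of non-redirecting URI-Ms to redirecting URI-Ms."""
    if redirecting == 0:
        return float("inf")  # no redirects at all in the TimeMap
    return non_redirecting / redirecting


if __name__ == "__main__":
    # Hypothetical TimeMap of five URI-Ms, three of which redirect.
    statuses = [200, 302, 302, 301, 200]
    print(di_ratio(*split_by_redirect(statuses)))  # prints 0.6666666666666666
```

As a sanity check against the abstract's google.com figure: if 84.9% of 1,000 URI-Ms redirect, then `di_ratio(151, 849)` is about 0.18, well below one.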

arXiv:1307.8067 [pdf, other]

doi 10.1007/978-3-642-40501-3_5

On the Change in Archivability of Websites Over Time

Authors: Mat Kelly, Justin F. Brunelle, Michele C. Weigle, Michael L. Nelson

Abstract: As web technologies evolve, web archivists work to keep up so that our digital history is preserved. Recent advances in web technologies have introduced client-side executed scripts that load data without a referential identifier or that require user interaction (e.g., content loading when the page has scrolled). These advances have made automating the capture of web pages more difficult. Because of the evolving schemes of publishing web pages along with the progressive capability of web preservation tools, the archivability of pages on the web has varied over time. In this paper, we show that the archivability of a web page can be deduced from the type of page being archived, which aligns with that page's accessibility with respect to dynamic content. We show concrete examples of when these technologies were introduced by referencing mementos of pages that have persisted through a long evolution of available technologies. Identifying the reasons these web pages could not be archived in the past, with respect to accessibility, serves as a guide for ensuring that long-lived content is published using good-practice methods that make it available for preservation.

Submitted 30 July, 2013; originally announced July 2013.

Comments: 12 pages, 8 figures, Theory and Practice of Digital Libraries (TPDL) 2013, Valletta, Malta

Showing 1–43 of 43 results for author: Kelly, M