
Showing 1–16 of 16 results for author: Speakman, S

  1. arXiv:2506.12576  [pdf, ps, other]

    cs.CL cs.AI

    Enabling Precise Topic Alignment in Large Language Models Via Sparse Autoencoders

    Authors: Ananya Joshi, Celia Cintas, Skyler Speakman

    Abstract: Recent work shows that Sparse Autoencoders (SAE) applied to large language model (LLM) layers have neurons corresponding to interpretable concepts. These SAE neurons can be modified to align generated outputs, but only towards pre-identified topics and with some parameter tuning. Our approach leverages the observational and modification properties of SAEs to enable alignment for any topic. This me…

    Submitted 14 June, 2025; originally announced June 2025.

  2. arXiv:2505.24539  [pdf, ps, other]

    cs.CL cs.AI

    Localizing Persona Representations in LLMs

    Authors: Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth Daly, Skyler Speakman

    Abstract: We present a study on how and where personas -- defined by distinct sets of human characteristics, values, and beliefs -- are encoded in the representation space of large language models (LLMs). Using a range of dimension reduction and pattern recognition methods, we first identify the model layers that show the greatest divergence in encoding these representations. We then analyze the activations…

    Submitted 3 June, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  3. arXiv:2401.10358  [pdf]

    physics.app-ph

    Percolation pathway switching in laser graphitized polyimide conducting tracks

    Authors: Melanie Whitfield, Larry Yip, Stuart Speakman, David Hasko

    Abstract: Laser processing has been used to create weakly conducting tracks in polyimide film. Raman spectroscopy shows that these tracks consist of nanometre sized graphitic regions contained in a carbon-rich matrix. The measured temperature dependent and electric field dependent conduction characteristics show an activated characteristic that is consistent with nearest neighbour hopping. In addition, disc…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 4 pages, 7 figures

    ACM Class: J.2

  4. arXiv:2312.08143  [pdf, other]

    cs.LG cs.AI

    Efficient Representation of the Activation Space in Deep Neural Networks

    Authors: Tanya Akumu, Celia Cintas, Girmaw Abebe Tadesse, Adebayo Oshingbesan, Skyler Speakman, Edward McFowland III

    Abstract: The representations of the activation space of deep neural networks (DNNs) are widely utilized for tasks like natural language processing, anomaly detection and speech recognition. Due to the diverse nature of these tasks and the large size of DNNs, an efficient and task-independent representation of activations becomes crucial. Empirical p-values have been used to quantify the relative strength o…

    Submitted 13 December, 2023; originally announced December 2023.

  5. arXiv:2312.02798  [pdf, other]

    cs.LG cs.CL

    Weakly Supervised Detection of Hallucinations in LLM Activations

    Authors: Miriam Rateike, Celia Cintas, John Wamburu, Tanya Akumu, Skyler Speakman

    Abstract: We propose an auditing method to identify whether a large language model (LLM) encodes patterns such as hallucinations in its internal states, which may propagate to downstream tasks. We introduce a weakly supervised auditing technique using a subset scanning approach to detect anomalous patterns in LLM activations from pre-trained models. Importantly, our method does not need knowledge of the typ…

    Submitted 5 December, 2023; originally announced December 2023.

  6. arXiv:2203.04386  [pdf, other]

    cs.LG cs.AI cs.IT eess.SP

    Model-free feature selection to facilitate automatic discovery of divergent subgroups in tabular data

    Authors: Girmaw Abebe Tadesse, William Ogallo, Celia Cintas, Skyler Speakman

    Abstract: Data-centric AI encourages the need of cleaning and understanding of data in order to achieve trustworthy AI. Existing technologies, such as AutoML, make it easier to design and train models automatically, but there is a lack of a similar level of capabilities to extract data-centric insights. Manual stratification of tabular data per a feature (e.g., gender) is limited to scale up for higher feat…

    Submitted 8 March, 2022; originally announced March 2022.

  7. arXiv:2203.00523  [pdf, other]

    cs.CV

    Towards Creativity Characterization of Generative Models via Group-based Subset Scanning

    Authors: Celia Cintas, Payel Das, Brian Quanz, Girmaw Abebe Tadesse, Skyler Speakman, Pin-Yu Chen

    Abstract: Deep generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have been employed widely in computational creativity research. However, such models discourage out-of-distribution generation to avoid spurious sample generation, thereby limiting their creativity. Thus, incorporating research on human creativity into generative deep learning techniques pre…

    Submitted 26 May, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

    Comments: Accepted to IJCAI 2022 - Creativity Track - Extended version from Synthetic Data Generation Workshop at ICLR'21 submission (arXiv:2104.00479). arXiv admin note: text overlap with arXiv:2105.12479

  8. arXiv:2201.02008  [pdf, other]

    cs.LG cs.AI eess.SP

    Sparsity-based Feature Selection for Anomalous Subgroup Discovery

    Authors: Girmaw Abebe Tadesse, William Ogallo, Catherine Wanjiru, Charles Wachira, Isaiah Onando Mulang', Vibha Anand, Aisha Walcott-Bryant, Skyler Speakman

    Abstract: Anomalous pattern detection aims to identify instances where deviation from normalcy is evident, and is widely applicable across domains. Multiple anomalous detection techniques have been proposed in the state of the art. However, there is a common lack of a principled and scalable feature selection method for efficient discovery. Existing feature selection techniques are often conducted by optimi…

    Submitted 6 January, 2022; originally announced January 2022.

  9. arXiv:2105.12479  [pdf, other]

    cs.CV cs.CR cs.LG

    Pattern Detection in the Activation Space for Identifying Synthesized Content

    Authors: Celia Cintas, Skyler Speakman, Girmaw Abebe Tadesse, Victor Akinwande, Edward McFowland III, Komminist Weldemariam

    Abstract: Generative Adversarial Networks (GANs) have recently achieved unprecedented success in photo-realistic image synthesis from low-dimensional random noise. The ability to synthesize high-quality content at a large scale brings potential risks as the generated samples may lead to misinformation that can create severe social, political, health, and business hazards. We propose SubsetGAN to identify ge…

    Submitted 27 May, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

    Comments: The paper is under consideration at Pattern Recognition Letters

  10. arXiv:2105.11160  [pdf, other]

    cs.CV cs.LG

    Out-of-Distribution Detection in Dermatology using Input Perturbation and Subset Scanning

    Authors: Hannah Kim, Girmaw Abebe Tadesse, Celia Cintas, Skyler Speakman, Kush Varshney

    Abstract: Recent advances in deep learning have led to breakthroughs in the development of automated skin disease classification. As we observe an increasing interest in these models in the dermatology space, it is crucial to address aspects such as the robustness towards input data distribution shifts. Current skin disease models could make incorrect inferences for test samples from different hardware devi…

    Submitted 2 June, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

    Comments: Under review for 6th Outlier Detection & Description Workshop

  11. arXiv:2104.00479  [pdf, other]

    cs.LG cs.AI

    Towards creativity characterization of generative models via group-based subset scanning

    Authors: Celia Cintas, Payel Das, Brian Quanz, Skyler Speakman, Victor Akinwande, Pin-Yu Chen

    Abstract: Deep generative models, such as Variational Autoencoders (VAEs), have been employed widely in computational creativity research. However, such models discourage out-of-distribution generation to avoid spurious sample generation, limiting their creativity. Thus, incorporating research on human creativity into generative deep learning techniques presents an opportunity to make their outputs more com…

    Submitted 26 May, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

    Comments: Synthetic Data Generation Workshop at ICLR'21

  12. arXiv:2011.12707  [pdf, other]

    cs.LG cs.DB

    Prediction of neonatal mortality in Sub-Saharan African countries using data-level linkage of multiple surveys

    Authors: Girmaw Abebe Tadesse, Celia Cintas, Skyler Speakman, Komminist Weldemariam

    Abstract: Existing datasets available to address crucial problems, such as child mortality and family planning discontinuation in developing countries, are not ample for data-driven approaches. This is partly due to disjoint data collection efforts employed across locations, times, and variations of modalities. On the other hand, state-of-the-art methods for small data problem are confined to image modaliti…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: 3 pages

  13. arXiv:2002.05463  [pdf, ps, other]

    cs.LG cs.CR cs.SD eess.AS stat.ML

    Identifying Audio Adversarial Examples via Anomalous Pattern Detection

    Authors: Victor Akinwande, Celia Cintas, Skyler Speakman, Srihari Sridharan

    Abstract: Audio processing models based on deep neural networks are susceptible to adversarial attacks even when the adversarial audio waveform is 99.9% similar to a benign sample. Given the wide application of DNN-based audio recognition systems, detecting the presence of adversarial examples is of high practical relevance. By applying anomalous pattern detection techniques in the activation space of these…

    Submitted 25 July, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

  14. arXiv:1908.01224  [pdf, other]

    cs.CV

    Smooth Grad-CAM++: An Enhanced Inference Level Visualization Technique for Deep Convolutional Neural Network Models

    Authors: Daniel Omeiza, Skyler Speakman, Celia Cintas, Komminist Weldemariam

    Abstract: Gaining insight into how deep convolutional neural network models perform image classification and how to explain their outputs have been a concern to computer vision researchers and decision makers. These deep models are often referred to as black box due to low comprehension of their internal workings. As an effort to developing explainable deep learning models, several methods have been propose…

    Submitted 3 August, 2019; originally announced August 2019.

    Comments: Accepted in the Intelligent Systems Conference 2019

  15. arXiv:1810.08676  [pdf, other]

    cs.LG cs.AI stat.ML

    Subset Scanning Over Neural Network Activations

    Authors: Skyler Speakman, Srihari Sridharan, Sekou Remy, Komminist Weldemariam, Edward McFowland

    Abstract: This work views neural networks as data generating systems and applies anomalous pattern detection techniques on that data in order to detect when a network is processing an anomalous input. Detecting anomalies is a critical component for multiple machine learning problems including detecting adversarial noise. More broadly, this work is a step towards giving neural networks the ability to recogni…

    Submitted 19 October, 2018; originally announced October 2018.

  16. arXiv:q-bio/0611084  [pdf, ps, other]

    q-bio.PE math.ST q-bio.QM

    A Novel Test for Host-Symbiont Codivergence Indicates Ancient Origin of Fungal Endophytes in Grasses

    Authors: Chris L. Schardl, Kelly D. Craven, Adam Lindstrom, Skyler Speakman, Arnold Stromberg, Ruriko Yoshida

    Abstract: Significant phylogenetic codivergence between plant or animal hosts ($H$) and their symbionts or parasites ($P$) indicate the importance of their interactions on evolutionary time scales. However, valid and realistic methods to test for codivergence are not fully developed. One of the systems where possible codivergence has been of interest involves the large subfamily of temperate grasses (Pooi…

    Submitted 28 August, 2008; v1 submitted 25 November, 2006; originally announced November 2006.

    Comments: 6 figures and 6 tables

    Journal ref: Systematic Biology, Volume 57, Issue 3 (2008), pp. 483-498