Search | arXiv e-print repository

doi 10.2196/59882

Analyzing Geospatial and Socioeconomic Disparities in Breast Cancer Screening Among Populations in the United States: Machine Learning Approach

Authors: Soheil Hashtarkhani, Yiwang Zhou, Fekede Asefa Kumsa, Shelley White-Means, David L Schwartz, Arash Shaban-Nejad

Abstract: Breast cancer screening plays a pivotal role in early detection and subsequent effective management of the disease, impacting patient outcomes and survival rates. This study aims to assess breast cancer screening rates nationwide in the United States and investigate the impact of social determinants of health on these screening rates. Data on mammography screening at the census tract level for 2018 and 2020 were collected from the Behavioral Risk Factor Surveillance System. We developed a large dataset of social determinants of health, comprising 13 variables for 72,337 census tracts. Spatial analysis employing Getis-Ord Gi statistics was used to identify clusters of high and low breast cancer screening rates. To evaluate the influence of these social determinants, we implemented a random forest model and compared its performance with linear regression and support vector machine models. The models were evaluated using R² and root mean squared error metrics. Shapley Additive Explanations values were subsequently used to assess the importance of variables and the direction of their influence. Geospatial analysis revealed elevated screening rates in the eastern and northern United States, while central and midwestern regions exhibited lower rates. The random forest model demonstrated superior performance, with an R² of 64.53% and a root mean squared error of 2.06, compared with linear regression and support vector machine models.
Shapley Additive Explanations values indicated that the percentage of the Black population, the number of mammography facilities within a 10-mile radius, and the percentage of the population with at least a bachelor's degree were the most influential variables, all positively associated with mammography screening rates.
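The abstract does not state the formula behind the Getis-Ord hot-spot analysis. A minimal NumPy sketch of the commonly used Gi* z-score follows, assuming a precomputed spatial weight matrix with a non-zero diagonal (Gi* counts each tract as its own neighbour). The function name, the ten screening rates, and the adjacency weights are hypothetical, not data or code from the study.

```python
import numpy as np

def getis_ord_gi_star(x, w):
    """Gi* z-scores for values x (shape n) and spatial weights w (n x n).

    w[i, j] > 0 marks j as a neighbour of i; Gi* includes i itself,
    so the diagonal should be non-zero (e.g. w[i, i] = 1).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)   # population standard deviation
    w_sum = w.sum(axis=1)                       # sum_j w_ij
    w_sq_sum = (w ** 2).sum(axis=1)             # sum_j w_ij^2
    numerator = w @ x - x_bar * w_sum
    denominator = s * np.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical: ten tracts on a line, each a neighbour of itself and adjacent tracts.
rates = np.array([10, 10, 10, 1, 1, 1, 1, 1, 1, 1])
idx = np.arange(10)
weights = (np.abs(idx[:, None] - idx[None, :]) <= 1).astype(float)
z = getis_ord_gi_star(rates, weights)
print(np.round(z, 2))
```

A large positive z-score marks a tract surrounded by high rates (a hot spot) and a large negative one a cold spot; in this toy input the second tract scores about 3.0.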

Submitted 30 January, 2025; originally announced February 2025.

Comments: 11 pages, 4 figures, 2 tables

ACM Class: I.2.1

Journal ref: JMIR Cancer 2025;11:e59882

arXiv:2501.18638 [pdf, ps, other]

Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation

Authors: Daniel Schwartz, Dmitriy Bespalov, Zhe Wang, Ninad Kulkarni, Yanjun Qi

Abstract: As large language models (LLMs) become increasingly prevalent, ensuring their robustness against adversarial misuse is crucial. This paper introduces the GAP (Graph of Attacks with Pruning) framework, an advanced approach for generating stealthy jailbreak prompts to evaluate and enhance LLM safeguards. GAP addresses limitations in existing tree-based LLM jailbreak methods by implementing an interconnected graph structure that enables knowledge sharing across attack paths. Our experimental evaluation demonstrates GAP's superiority over existing techniques, achieving a 20.8% increase in attack success rates while reducing query costs by 62.7%. GAP consistently outperforms state-of-the-art methods for attacking both open and closed LLMs, with attack success rates of >96%. Additionally, we present specialized variants like GAP-Auto for automated seed generation and GAP-VLM for multimodal attacks. GAP-generated prompts prove highly effective in improving content moderation systems, increasing true positive detection rates by 108.5% and accuracy by 183.6% when used for fine-tuning. Our implementation is available at https://github.com/dsbuddy/GAP-LLM-Safety.

Submitted 13 June, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

Comments: 14 pages, 5 figures

arXiv:2412.11286 [pdf, other]

Detecting Daily Living Gait Amid Huntington's Disease Chorea using a Foundation Deep Learning Model

Authors: Dafna Schwartz, Lori Quinn, Nora E. Fritz, Lisa M. Muratori, Jeffery M. Hausdorff, Ran Gilad Bachrach

Abstract: Wearable sensors offer a non-invasive way to collect physical activity (PA) data, with walking as a key component. Existing models often struggle to detect gait bouts in individuals with neurodegenerative diseases (NDDs) involving involuntary movements. We developed J-Net, a deep learning model inspired by U-Net, which uses a pre-trained self-supervised foundation model fine-tuned with Huntington's disease (HD) in-lab data and paired with a segmentation head for gait detection. J-Net processes wrist-worn accelerometer data to detect gait during daily living. We evaluated J-Net on in-lab and daily-living data from HD, Parkinson's disease (PD), and controls. J-Net achieved a 10-percentage-point improvement in ROC-AUC for HD over existing methods, reaching 0.97 for in-lab data. In daily-living environments, J-Net estimates showed no significant differences in median daily walking time between HD and controls (p = 0.23), in contrast to other models, which indicated counterintuitive results (p < 0.005). Walking time measured by J-Net correlated with the UHDRS-TMS clinical severity score (r = -0.52; p = 0.02), confirming its clinical relevance. Fine-tuning J-Net on PD data also improved gait detection over current methods. J-Net's architecture effectively addresses the challenges of gait detection in severe chorea and offers robust performance in daily living. The dataset and J-Net model are publicly available, providing a resource for further research into NDD-related gait impairments.
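The J-Net results are reported as ROC-AUC. As a reminder of what that number measures, here is a minimal sketch computing ROC-AUC through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen walking window scores higher than a randomly chosen non-walking window, with ties counted as one half. The window scores are hypothetical detector outputs, not J-Net predictions, and `roc_auc` is an illustrative helper name.

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """ROC-AUC as P(random positive scores higher than random negative),
    counting ties as one half. Compares all pairs, so O(n_pos * n_neg)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)

# Hypothetical per-window gait probabilities from a detector.
gait = [0.9, 0.8, 0.4]        # windows labelled as walking
no_gait = [0.3, 0.5, 0.1]     # windows labelled as not walking
auc = roc_auc(gait, no_gait)  # 8 of the 9 (walking, not walking) pairs are ranked correctly
print(auc)
```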

Submitted 15 December, 2024; originally announced December 2024.

arXiv:2408.04720 [pdf, other]

doi 10.21468/SciPostPhys.18.2.040

Learning the Simplicity of Scattering Amplitudes

Authors: Clifford Cheung, Aurélien Dersy, Matthew D. Schwartz

Abstract: The simplification and reorganization of complex expressions lies at the core of scientific progress, particularly in theoretical high-energy physics. This work explores the application of machine learning to a particular facet of this challenge: the task of simplifying scattering amplitudes expressed in terms of spinor-helicity variables. We demonstrate that an encoder-decoder transformer architecture achieves impressive simplification capabilities for expressions composed of handfuls of terms. Lengthier expressions are handled by an additional embedding network, trained using contrastive learning, which isolates subexpressions that are more likely to simplify. The resulting framework is capable of reducing expressions with hundreds of terms, a regular occurrence in quantum field theory calculations, to vastly simpler equivalent expressions. Starting from lengthy input expressions, our networks can generate the Parke-Taylor formula for five-point gluon scattering, as well as new compact expressions for five-point amplitudes involving scalars and gravitons. An interactive demonstration can be found at https://spinorhelicity.streamlit.app .
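For reference, the Parke-Taylor formula mentioned above is the compact expression for the colour-ordered maximally helicity-violating (MHV) gluon amplitude. For five gluons, with gluons i and j of negative helicity and the rest positive, it reads as follows, with the coupling constant and the momentum-conserving delta function stripped off (overall phase conventions vary between references):

```latex
A_5\big[1^{+}\cdots i^{-}\cdots j^{-}\cdots 5^{+}\big]
  = \frac{\langle i j\rangle^{4}}
         {\langle 12\rangle\langle 23\rangle\langle 34\rangle\langle 45\rangle\langle 51\rangle}
```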

Submitted 19 November, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: 25+15 pages, 9+6 figures, v2: corrected typos and extended the introduction, conclusion, sections 2.2, 2.4 and appendix F

Report number: CALT-TH 2024-031

Journal ref: SciPost Phys. 18, 040 (2025)

arXiv:2406.15936 [pdf, other]

An Automated SQL Query Grading System Using An Attention-Based Convolutional Neural Network

Authors: Donald R. Schwartz, Pablo Rivas

Abstract: Grading SQL queries can be a time-consuming, tedious, and challenging task, especially as the number of student submissions increases. Several systems have been introduced in an attempt to mitigate these challenges, but those systems have their own limitations. This paper describes our novel approach to automating the process of grading SQL queries. Unlike previous approaches, we employ a unique convolutional neural network architecture that shares parameters across different machine learning tasks, enabling the architecture to induce different knowledge representations of the data and increase its potential for understanding SQL statements.

Submitted 22 June, 2024; originally announced June 2024.

Comments: 12 pages, 8 figures, paper accepted at "The 18th International Conference on Frontiers in Education: Computer Science and Computer Engineering"

ACM Class: I.2.6; H.2.3; K.3.2

arXiv:2403.05575 [pdf]

doi 10.2196/51727

Enhancing Health Care Accessibility and Equity Through a Geoprocessing Toolbox for Spatial Accessibility Analysis: Development and Case Study

Authors: Soheil Hashtarkhani, David L Schwartz, Arash Shaban-Nejad

Abstract: Access to health care services is a critical determinant of population health and well-being. Measuring spatial accessibility to health services is essential for understanding health care distribution and addressing potential inequities. In this study, we developed a geoprocessing toolbox including Python script tools for the ArcGIS Pro environment to measure the spatial accessibility of health services using both classic and enhanced versions of the 2-step floating catchment area method. Each of our tools incorporated both distance buffers and travel time catchments to calculate accessibility scores based on users' choices. Additionally, we developed a separate tool to create travel time catchments that is compatible with both locally available network data sets and ArcGIS Online data sources. We conducted a case study focusing on the accessibility of hemodialysis services in the state of Tennessee using the 4 versions of the accessibility tools. Notably, the calculation of the target population considered age as a significant nonspatial factor influencing hemodialysis service accessibility. Weighted populations were calculated using end-stage renal disease incidence rates in different age groups. The implemented tools are made accessible through ArcGIS Online for free use by the research community. The case study revealed disparities in the accessibility of hemodialysis services, with urban areas demonstrating higher scores compared to rural and suburban regions.
These geoprocessing tools can serve as valuable decision-support resources for health care providers, organizations, and policy makers to improve equitable access to health care services. This comprehensive approach to measuring spatial accessibility can empower health care stakeholders to address health care distribution challenges effectively.
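The classic 2-step floating catchment area (2SFCA) method first divides each facility's capacity by the population inside its catchment, then sums those ratios over every facility a location can reach. A minimal NumPy sketch of the classic variant with a hard distance or travel-time threshold follows. It is not the toolbox's ArcGIS Pro code, it omits the distance-decay weights of the enhanced variant, and the function name, station counts, populations, and distances are hypothetical. Age-weighted populations, like those described in the abstract, can be passed as `demand`.

```python
import numpy as np

def two_step_fca(supply, demand, dist, d0):
    """Classic 2SFCA accessibility score for each population location.

    supply: capacity S_j of each of m facilities, shape (m,)
    demand: (possibly weighted) population P_k at each of n locations, shape (n,)
    dist:   distance or travel time from location k to facility j, shape (n, m)
    d0:     catchment threshold, in the same units as dist
    """
    supply = np.asarray(supply, dtype=float)
    demand = np.asarray(demand, dtype=float)
    within = (np.asarray(dist, dtype=float) <= d0).astype(float)  # catchment mask
    # Step 1: supply-to-population ratio R_j inside each facility's catchment.
    served = demand @ within
    ratio = np.divide(supply, served, out=np.zeros_like(supply), where=served > 0)
    # Step 2: A_k = sum of R_j over all facilities within reach of location k.
    return within @ ratio

# Hypothetical: 2 dialysis facilities (stations), 3 tracts, distances in miles.
stations = [10, 5]
population = [1000, 2000, 500]
miles = [[5, 40],
         [12, 8],
         [50, 20]]
scores = two_step_fca(stations, population, miles, d0=30)
print(scores)
```

The middle tract reaches both facilities, so its score is the sum of both facilities' ratios and is the highest of the three.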

Submitted 26 February, 2024; originally announced March 2024.

Comments: 11 pages, 5 figures

MSC Class: 68U05

Journal ref: JMIR Form Res 2024 Feb 21;8:e51727

arXiv:2308.09451 [pdf, other]

Reconstructing $S$-matrix Phases with Machine Learning

Authors: Aurélien Dersy, Matthew D. Schwartz, Alexander Zhiboedov

Abstract: An important element of the $S$-matrix bootstrap program is the relationship between the modulus of an $S$-matrix element and its phase. Unitarity relates them by an integral equation. Even in the simplest case of elastic scattering, this integral equation cannot be solved analytically and numerical approaches are required. We apply modern machine learning techniques to studying the unitarity constraint. We find that for a given modulus, when a phase exists it can generally be reconstructed to good accuracy with machine learning. Moreover, the loss of the reconstruction algorithm provides a good proxy for whether a given modulus can be consistent with unitarity at all. In addition, we study the question of whether multiple phases can be consistent with a single modulus, finding novel phase-ambiguous solutions. In particular, we find a new phase-ambiguous solution which pushes the known limit on such solutions significantly beyond the previous bound.

Submitted 18 August, 2023; originally announced August 2023.

Comments: 43 pages, 21 figures

Report number: CERN-TH-2023-161

arXiv:2307.03223 [pdf, ps, other]

Neural Network Field Theories: Non-Gaussianity, Actions, and Locality

Authors: Mehmet Demirtas, James Halverson, Anindita Maiti, Matthew D. Schwartz, Keegan Stoner

Abstract: Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as an expansion in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.
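The infinite-width statement can be checked numerically. The sketch below samples an ensemble of one-hidden-layer ReLU networks with i.i.d. standard-normal parameters (an assumed architecture, not one taken from the paper) and estimates the excess kurtosis of the output at a single input, a normalised connected four-point function that vanishes for a Gaussian (free) theory. For this particular ensemble the excess kurtosis works out to 15/N, so the printed values should shrink roughly tenfold with each tenfold increase in width.

```python
import numpy as np

def network_outputs(width, n_networks, x=1.0, seed=0):
    """Outputs f(x) of n_networks independently initialised networks
    f(x) = (1/sqrt(N)) * sum_i a_i * relu(w_i * x + b_i),
    with all parameters drawn i.i.d. from a standard normal distribution."""
    rng = np.random.default_rng(seed)
    a, w, b = rng.standard_normal((3, n_networks, width))
    hidden = np.maximum(0.0, w * x + b)
    return (a * hidden).sum(axis=1) / np.sqrt(width)

def excess_kurtosis(samples):
    """Normalised connected four-point function: zero for a Gaussian."""
    c = samples - samples.mean()
    return np.mean(c ** 4) / np.mean(c ** 2) ** 2 - 3.0

for width in (1, 10, 100):
    f = network_outputs(width, n_networks=10_000)
    print(width, round(float(excess_kurtosis(f)), 3))
```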

Submitted 13 December, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 49 pages, plus references and appendices

arXiv:2304.03472 [pdf, other]

Does Prompt-Tuning Language Model Ensure Privacy?

Authors: Shangyu Xie, Wei Dai, Esha Ghosh, Sambuddha Roy, Dan Schwartz, Kim Laine

Abstract: Prompt-tuning has received attention as an efficient tuning method in the language domain, i.e., tuning a prompt that is a few tokens long, while keeping the large language model frozen, yet achieving performance comparable to conventional fine-tuning. Considering the emerging privacy concerns with language models, we initiate the study of privacy leakage in the setting of prompt-tuning. We first describe a real-world email service pipeline to provide customized output for various users via prompt-tuning. Then we propose a novel privacy attack framework to infer users' private information by exploiting the prompt module with user-specific signals. We conduct a comprehensive privacy evaluation on the target pipeline to demonstrate the potential leakage from prompt-tuning. The results also demonstrate the effectiveness of the proposed attack.

Submitted 15 April, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: 8 pages

arXiv:2206.04115 [pdf, other]

Simplifying Polylogarithms with Machine Learning

Authors: Aurélien Dersy, Matthew D. Schwartz, Xiaoyuan Zhang

Abstract: Polylogarithmic functions, such as the logarithm or dilogarithm, satisfy a number of algebraic identities. For the logarithm, all the identities follow from the product rule. For the dilogarithm and higher-weight classical polylogarithms, the identities can involve five functions or more. In many calculations relevant to particle physics, complicated combinations of polylogarithms often arise from Feynman integrals. Although the initial expressions resulting from the integration usually simplify, it is often difficult to know which identities to apply and in what order. To address this bottleneck, we explore to what extent machine learning methods can help. We consider both a reinforcement learning approach, where the identities are analogous to moves in a game, and a transformer network approach, where the problem is viewed analogously to a language-translation task. While both methods are effective, the transformer network appears more powerful and holds promise for practical use in symbolic manipulation tasks in mathematical physics.
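The kind of identity the abstract refers to can be checked numerically. A minimal sketch verifying Euler's two-term reflection identity for the dilogarithm, Li2(x) + Li2(1 - x) = π²/6 - ln(x) ln(1 - x), using the defining power series; it illustrates the identities themselves, not the paper's reinforcement-learning or transformer methods, and the helper names are hypothetical.

```python
import math

def li2(x, terms=2000):
    """Dilogarithm Li2(x) = sum_{k>=1} x^k / k^2 via its power series (|x| < 1)."""
    return sum(x ** k / k ** 2 for k in range(1, terms + 1))

def reflection_gap(x):
    """Li2(x) + Li2(1 - x) - (pi^2/6 - ln(x) ln(1 - x)); zero by Euler's identity."""
    lhs = li2(x) + li2(1 - x)
    rhs = math.pi ** 2 / 6 - math.log(x) * math.log(1 - x)
    return lhs - rhs

for x in (0.1, 0.3, 0.5):
    print(x, reflection_gap(x))   # each gap is at floating-point rounding level
```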

Submitted 8 June, 2022; originally announced June 2022.

Comments: 41 pages, 10 figures

arXiv:2204.07066 [pdf, other]

EvoSTS Forecasting: Evolutionary Sparse Time-Series Forecasting

Authors: Ethan Jacob Moyer, Alisha Isabelle Augustin, Satvik Tripathi, Ansh Aashish Dholakia, Andy Nguyen, Isamu Mclean Isozaki, Daniel Schwartz, Edward Kim

Abstract: In this work, we highlight our novel evolutionary sparse time-series forecasting algorithm, EvoSTS. The algorithm uses evolution to prioritize the weights of a Long Short-Term Memory (LSTM) network that best minimize the reconstruction loss of a predicted signal using a learned sparse coded dictionary. In each generation of our evolutionary algorithm, a set number of children with the same initial weights are spawned. Each child undergoes a training step and adjusts its weights on the same data. Due to stochastic back-propagation, the set of children has a variety of weights with different levels of performance. The weights that best minimize the reconstruction loss with a given signal dictionary are passed to the next generation. The predictions from the best-performing weights of the first and last generation are compared. We found improvements when comparing the weights of these two generations. However, due to several confounding parameters and hyperparameter limitations, some of the weights had negligible improvements. To the best of our knowledge, this is the first attempt to use sparse coding in this way to optimize time-series forecasting model weights, such as those of an LSTM network.
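A minimal sketch of the generation loop described above: every child starts from the parent's weights, takes one stochastic gradient step, and the child with the lowest loss becomes the next parent. To keep it self-contained, a linear model and a plain mean-squared-error loss stand in for the LSTM and the sparse-dictionary reconstruction loss, and the stochasticity comes from random minibatches of the same training data (an assumption). The function name `evolve`, the toy data, and all hyperparameters are hypothetical.

```python
import numpy as np

def evolve(weights, X, y, generations=20, n_children=8, lr=0.05, batch=16, seed=0):
    """Hypothetical EvoSTS-style loop on a linear model with MSE loss."""
    rng = np.random.default_rng(seed)

    def loss(w):
        return float(np.mean((X @ w - y) ** 2))

    history = [loss(weights)]
    for _ in range(generations):
        children = []
        for _ in range(n_children):
            # Each child starts from the same parent weights; its random
            # minibatch is what makes the children diverge.
            idx = rng.choice(len(X), size=batch, replace=False)
            grad = 2.0 * X[idx].T @ (X[idx] @ weights - y[idx]) / batch
            children.append(weights - lr * grad)
        # Selection: the child with the lowest loss becomes the next parent.
        weights = min(children, key=loss)
        history.append(loss(weights))
    return weights, history

# Toy regression data standing in for a forecasting target.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * rng.standard_normal(200)

w_best, history = evolve(np.zeros(5), X, y)
print(f"initial loss {history[0]:.3f}, after last generation {history[-1]:.3f}")
```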

Submitted 14 April, 2022; originally announced April 2022.

Comments: 5 pages, 2 figures, 2 tables

arXiv:2203.14928 [pdf, other]

RAVIR: A Dataset and Methodology for the Semantic Segmentation and Quantitative Analysis of Retinal Arteries and Veins in Infrared Reflectance Imaging

Authors: Ali Hatamizadeh, Hamid Hosseini, Niraj Patel, Jinseo Choi, Cameron C. Pole, Cory M. Hoeferlin, Steven D. Schwartz, Demetri Terzopoulos

Abstract: The retinal vasculature provides important clues in the diagnosis and monitoring of systemic diseases including hypertension and diabetes. The microvascular system is primarily involved in such conditions, and the retina is the only anatomical site where the microvasculature can be directly observed. The objective assessment of retinal vessels has long been considered a surrogate biomarker for systemic vascular diseases, and with recent advancements in retinal imaging and computer vision technologies, this topic has become the subject of renewed attention. In this paper, we present a novel dataset, dubbed RAVIR, for the semantic segmentation of Retinal Arteries and Veins in Infrared Reflectance (IR) imaging. It enables the creation of deep learning-based models that distinguish extracted vessel type without extensive post-processing. We propose a novel deep learning-based methodology, denoted as SegRAVIR, for the semantic segmentation of retinal arteries and veins and the quantitative measurement of the widths of segmented vessels. Our extensive experiments validate the effectiveness of SegRAVIR and demonstrate its superior performance in comparison to state-of-the-art models. Additionally, we propose a knowledge distillation framework for the domain adaptation of RAVIR pretrained networks on color images. We demonstrate that our pretraining procedure yields new state-of-the-art benchmarks on the DRIVE, STARE, and CHASE_DB1 datasets. Dataset link: https://ravirdataset.github.io/data/
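The abstract does not describe the distillation objective. One common choice, assumed here rather than taken from the paper, is Hinton-style soft-target distillation applied per pixel: a pretrained teacher's temperature-softened class probabilities supervise a student. A minimal NumPy sketch with hypothetical logits over three classes (background, artery, vein); the function names and temperature are illustrative.

```python
import numpy as np

def log_softmax(logits, t):
    """Temperature-scaled log-softmax over the last (class) axis."""
    z = logits / t
    z = z - z.max(axis=-1, keepdims=True)            # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, t=2.0):
    """Mean per-pixel KL(teacher || student) between softened class
    distributions, scaled by t**2 as in Hinton-style distillation.
    Logits have shape (height, width, num_classes)."""
    log_p_t = log_softmax(teacher_logits, t)
    log_p_s = log_softmax(student_logits, t)
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1)
    return float(kl.mean() * t ** 2)

# Hypothetical 4x4 logits over 3 classes (background, artery, vein).
rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 4, 3))
student = teacher + 0.5 * rng.standard_normal((4, 4, 3))
print(distillation_loss(student, teacher), distillation_loss(teacher, teacher))
```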

Submitted 28 March, 2022; originally announced March 2022.

Comments: Paper accepted to IEEE Journal of Biomedical and Health Informatics (JBHI)

arXiv:2110.06948 [pdf, other]

doi 10.1007/JHEP03(2022)066

Challenges for Unsupervised Anomaly Detection in Particle Physics

Authors: Katherine Fraser, Samuel Homiller, Rashmish K. Mishra, Bryan Ostdiek, Matthew D. Schwartz

Abstract: Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals). In this paper, we study some challenges associated with variational autoencoders, such as the dependence on hyperparameters and the metric used, in the context of anomalous signal (top and $W$) jets in a QCD background. We find that the hyperparameter choices strongly affect the network performance and that the optimal parameters for one signal are non-optimal for another. In exploring the networks, we uncover a connection between the latent space of a variational autoencoder trained using mean-squared-error and the optimal transport distances within the dataset. We then show that optimal transport distances to representative events in the background dataset can be used directly for anomaly detection, with performance comparable to the autoencoders. Whether using autoencoders or optimal transport distances for anomaly detection, we find that the choices that best represent the background are not necessarily best for signal identification. These challenges with unsupervised anomaly detection bolster the case for additional exploration of semi-supervised or alternative approaches.
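A minimal sketch of using optimal transport distances to representative background events as an anomaly score. It assumes every event has the same number of particles with uniform weights, in which case the optimal transport plan reduces to a one-to-one matching that `scipy.optimize.linear_sum_assignment` solves exactly; real jets carry unequal multiplicities and transverse-momentum weights and would need a general optimal transport solver. The toy events are hypothetical points in a (rapidity, azimuth) plane, ignoring the periodicity of the azimuth, and the helper names are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_distance(event_a, event_b):
    """Earth mover's distance between two equal-size particle clouds with
    uniform weights: the cost of the optimal one-to-one matching, averaged."""
    cost = cdist(event_a, event_b)               # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)     # optimal matching
    return cost[rows, cols].mean()

def anomaly_score(event, background_reps):
    """Distance from an event to its nearest representative background event."""
    return min(ot_distance(event, ref) for ref in background_reps)

rng = np.random.default_rng(0)
# Hypothetical events: 10 particles each at (rapidity, azimuth) positions.
background_reps = [rng.normal(0.0, 0.3, size=(10, 2)) for _ in range(5)]
qcd_like = rng.normal(0.0, 0.3, size=(10, 2))
shifted = rng.normal(0.0, 0.3, size=(10, 2)) + np.array([2.0, 0.0])
print(anomaly_score(qcd_like, background_reps), anomaly_score(shifted, background_reps))
```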

Submitted 13 October, 2021; originally announced October 2021.

Comments: 22 + 2 pages, 8 figures, 2 tables

arXiv:2101.06511 [pdf, other]

Towards Searching Efficient and Accurate Neural Network Architectures in Binary Classification Problems

Authors: Yigit Alparslan, Ethan Jacob Moyer, Isamu Mclean Isozaki, Daniel Schwartz, Adam Dunlop, Shesh Dave, Edward Kim

Abstract: In recent years, deep neural networks have had great success in machine learning and pattern recognition. Architecture size for a neural network contributes significantly to the success of any neural network. In this study, we optimize the selection process by investigating different search algorithms to find a neural network architecture size that yields the highest accuracy. We apply binary search on a very well-defined binary classification network search space and compare the results to those of linear search. We also propose how to relax some of the assumptions regarding the dataset so that our solution can be generalized to any binary classification problem. We report a 100-fold running time improvement over the naive linear search when we apply the binary search method to our datasets in order to find the best architecture candidate. By finding the optimal architecture size for any binary classification problem quickly, we hope that our research contributes to discovering intelligent algorithms for optimizing architecture size selection in machine learning.
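The abstract does not spell out the search criterion. One standard way binary search can locate the best size, assumed here, is to treat validation accuracy as strictly unimodal in the number of hidden units (rising, then falling) and compare adjacent sizes, halving the interval each step. The accuracy curve below is a hypothetical stand-in for actually training a network at each size, and the function names are illustrative.

```python
from functools import lru_cache

def binary_search_peak(evaluate, lo, hi):
    """Return the size in [lo, hi] maximising evaluate, assuming evaluate is
    strictly unimodal there. Also returns the number of distinct evaluations."""
    cached = lru_cache(maxsize=None)(evaluate)   # never train the same size twice
    while lo < hi:
        mid = (lo + hi) // 2
        if cached(mid) < cached(mid + 1):
            lo = mid + 1      # still climbing: the peak lies to the right
        else:
            hi = mid          # descending or at the peak: the peak is mid or left
    return lo, cached.cache_info().misses

def toy_accuracy(hidden_units):
    """Hypothetical accuracy curve peaking at 37 hidden units."""
    return 0.95 - 1e-5 * (hidden_units - 37) ** 2

best, n_evals = binary_search_peak(toy_accuracy, 1, 256)
print(best, n_evals)   # a linear scan of 1..256 would need 256 evaluations
```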

Submitted 16 January, 2021; originally announced January 2021.

Comments: 8 pages, 11 figures

arXiv:2008.12360 [pdf, other]

doi 10.1145/3394171.3413755

Language Models as Emotional Classifiers for Textual Conversations

Authors: Connor T. Heaton, David M. Schwartz

Abstract: Emotions play a critical role in our everyday lives by altering how we perceive, process and respond to our environment. Affective computing aims to instill in computers the ability to detect and act on the emotions of human actors. A core aspect of any affective computing system is the classification of a user's emotion. In this study we present a novel methodology for classifying emotion in a conversation. At the backbone of our proposed methodology is a pre-trained Language Model (LM), which is supplemented by a Graph Convolutional Network (GCN) that propagates information over the predicate-argument structure identified in an utterance. We apply our proposed methodology on the IEMOCAP and Friends data sets, achieving state-of-the-art performance on the former and a higher accuracy on certain emotional labels on the latter. Furthermore, we examine the role context plays in our methodology by altering how much of the preceding conversation the model has access to when making a classification.
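The abstract does not specify the graph-convolution variant. A minimal NumPy sketch of the widely used Kipf-Welling layer, which propagates token features along graph edges with symmetric degree normalisation, follows; the four-token utterance, its two predicate-argument edges, and the random features standing in for language-model embeddings are all hypothetical.

```python
import numpy as np

def gcn_layer(adjacency, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])        # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))         # D^-1/2 as a vector
    norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(0.0, norm @ features @ weight)

# Hypothetical utterance "She booked the flight": tokens 0..3, with
# predicate-argument edges booked->She (ARG0) and booked->flight (ARG1).
edges = [(1, 0), (1, 3)]
adjacency = np.zeros((4, 4))
for head, dep in edges:
    adjacency[head, dep] = adjacency[dep, head] = 1.0     # treat edges as undirected

rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((4, 8))   # stand-in for LM outputs
weight = rng.standard_normal((8, 4))             # learnable layer weights
hidden = gcn_layer(adjacency, token_embeddings, weight)
print(hidden.shape)
```

Token 2 ("the") has no edges, so its output depends only on its own embedding, while "booked" mixes information from both of its arguments.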

Submitted 27 August, 2020; originally announced August 2020.

arXiv:1911.03268 [pdf, other]

Inducing brain-relevant bias in natural language processing models

Authors: Dan Schwartz, Mariya Toneva, Leila Wehbe

Abstract: Progress in natural language processing (NLP) models that estimate representations of word sequences has recently been leveraged to improve the understanding of language processing in the brain. However, these models have not been specifically designed to capture the way the brain represents language meaning. We hypothesize that fine-tuning these models to predict recordings of brain activity of people reading text will lead to representations that encode more brain-activity-relevant language information. We demonstrate that a version of BERT, a recently introduced and powerful language model, can improve the prediction of brain activity after fine-tuning. We show that the relationship between language and brain activity learned by BERT during this fine-tuning transfers across multiple participants. We also show that, for some participants, the fine-tuned representations learned from both magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) are better for predicting fMRI than the representations learned from fMRI alone, indicating that the learned representations capture brain-activity-relevant information that is not simply an artifact of the modality. While changes to language representations help the model predict brain activity, they do not harm the model's ability to perform downstream NLP tasks. Our findings are notable for research on language understanding in the brain.

Submitted 29 October, 2019; originally announced November 2019.

Comments: To be published in the proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1905.12120 [pdf, other]

Deep Dilated Convolutional Nets for the Automatic Segmentation of Retinal Vessels

Authors: Ali Hatamizadeh, Hamid Hosseini, Zhengyuan Liu, Steven D. Schwartz, Demetri Terzopoulos

Abstract: The reliable segmentation of retinal vasculature can provide the means to diagnose and monitor the progression of a variety of diseases affecting the blood vessel network, including diabetes and hypertension. We leverage the power of convolutional neural networks to devise a reliable and fully automated method that can accurately detect, segment, and analyze retinal vessels. In particular, we propose a novel, fully convolutional deep neural network with an encoder-decoder architecture that employs dilated spatial pyramid pooling with multiple dilation rates to recover the lost content in the encoder and add multiscale contextual information to the decoder. We also propose a simple yet effective way of quantifying and tracking the widths of retinal vessels through direct use of the segmentation predictions. Unlike previous deep-learning-based approaches to retinal vessel segmentation that mainly rely on patch-wise analysis, our proposed method leverages a whole-image approach during training and inference, resulting in more efficient training and faster inference through access to global content in the image. We have tested our method on two publicly available datasets, and our state-of-the-art results on both the DRIVE and CHASE-DB1 datasets attest to the effectiveness of our approach.
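The abstract names "dilated spatial pyramid pooling with multiple dilation rates" without giving details. As a rough, hypothetical illustration of the underlying idea only (not the authors' network, which is 2-D, learned, and encoder-decoder based), the sketch below applies a small kernel at several dilation rates to a 1-D signal and stacks the results. The function names, the shared kernel across branches, and the choice of rates (1, 2, 4) are all assumptions made for brevity. A k-tap kernel with dilation rate d spans k + (k - 1)(d - 1) samples, so larger rates widen the context without adding weights.

```python
import numpy as np

def effective_kernel_size(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation rate d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)

def dilated_conv1d(x: np.ndarray, w: np.ndarray, d: int) -> np.ndarray:
    """'Same'-length 1-D dilated cross-correlation (hypothetical helper).

    Zero padding keeps the output length equal to len(x); the kernel is
    exactly centred when the effective span is odd (always true for odd k).
    """
    k = len(w)
    span = effective_kernel_size(k, d)
    pad = (span - 1) // 2
    xp = np.pad(x.astype(float), (pad, span - 1 - pad))
    out = np.empty(len(x), dtype=float)
    for i in range(len(x)):
        # The k taps are d samples apart: i, i + d, ..., i + (k - 1) * d.
        out[i] = np.dot(xp[i : i + span : d], w)
    return out

def pyramid_features(x: np.ndarray, w: np.ndarray, rates=(1, 2, 4)) -> np.ndarray:
    """Run the same kernel at several dilation rates; one output row per rate.

    Assumption: a real spatial pyramid pooling block would use a separate
    learned kernel per branch and fuse the branches, e.g. with a 1x1 convolution.
    """
    return np.stack([dilated_conv1d(x, w, d) for d in rates])

# Impulse response: the non-zero outputs show where each branch "looks".
x = np.zeros(9)
x[4] = 1.0
feats = pyramid_features(x, np.ones(3))
for d, row in zip((1, 2, 4), feats):
    print(d, np.nonzero(row)[0].tolist())
```

For an impulse at position 4 and a 3-tap kernel, the rate-1 branch responds at positions 3 to 5, the rate-2 branch at 2, 4, 6, and the rate-4 branch at 0, 4, 8: the same three weights cover progressively wider neighbourhoods, which is the multiscale context the abstract refers to.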

Submitted 20 July, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

arXiv:1904.01548 [pdf, other]

doi 10.18653/v1/N19-1005

Understanding language-elicited EEG data by predicting it from a fine-tuned language model

Authors: Dan Schwartz, Tom Mitchell

Abstract: Electroencephalography (EEG) recordings of brain activity taken while participants read or listen to language are widely used within the cognitive neuroscience and psycholinguistics communities as a tool to study language comprehension. Several time-locked stereotyped EEG responses to word-presentations -- known collectively as event-related potentials (ERPs) -- are thought to be markers for semantic or syntactic processes that take place during comprehension. However, the characterization of each individual ERP in terms of what features of a stream of language trigger the response remains controversial. Improving this characterization would make ERPs a more useful tool for studying language comprehension. We take a step towards better understanding the ERPs by fine-tuning a language model to predict them. This new approach to analysis shows for the first time that all of the ERPs are predictable from embeddings of a stream of language. Prior work has only found two of the ERPs to be predictable. In addition to this analysis, we examine which ERPs benefit from sharing parameters during joint training. We find that two pairs of ERPs previously identified in the literature as being related to each other benefit from joint training, while several other pairs of ERPs that benefit from joint training are suggestive of potential relationships. Extensions of this analysis that further examine what kinds of information in the model embeddings relate to each ERP have the potential to elucidate the processes involved in human language comprehension.

Submitted 2 April, 2019; originally announced April 2019.

Comments: To appear in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics

arXiv:1712.04602 [pdf, other]

On the organization of grid and place cells: Neural de-noising via subspace learning

Authors: David M. Schwartz, O. Ozan Koyluoglu

Abstract: Place cells in the hippocampus are active when an animal visits a certain location (referred to as a place field) within an environment. Grid cells in the medial entorhinal cortex (MEC) respond at multiple locations, with firing fields that form a periodic and hexagonal tiling of the environment. The joint activity of grid and place cell populations, as a function of location, forms a neural code for space. An ensemble of codes is generated by varying grid and place cell population parameters. For each code in this ensemble, codewords are generated by stimulating a network with a discrete set of locations. In this manuscript, we develop an understanding of the relationships between coding theoretic properties of these combined populations and code construction parameters. These relationships are revisited by measuring the performances of biologically realizable algorithms implemented by networks of place and grid cell populations, as well as constraint neurons, which perform de-noising operations. Objectives of this work include the investigation of coding theoretic limitations of the mammalian neural code for location and how communication between grid and place cell networks may improve the accuracy of each population's representation. Simulations demonstrate that de-noising mechanisms analyzed here can significantly improve fidelity of this neural representation of space. Further, patterns observed in connectivity of each population of simulated cells suggest that inter-hippocampal-medial-entorhinal-cortical connectivity decreases downward along the dorsoventral axis.

Submitted 15 May, 2018; v1 submitted 12 December, 2017; originally announced December 2017.

arXiv:1603.01207 [pdf]

doi 10.46298/jdmdh.1395

From manuscript catalogues to a handbook of Syriac literature: Modeling an infrastructure for Syriaca.org

Authors: Nathan P. Gibson, David A. Michelson, Daniel L. Schwartz

Abstract: Despite increasing interest in Syriac studies and growing digital availability of Syriac texts, there is currently no up-to-date infrastructure for discovering, identifying, classifying, and referencing works of Syriac literature. The standard reference work (Baumstark's Geschichte) is over ninety years old, and the perhaps 20,000 Syriac manuscripts extant worldwide can be accessed only through disparate catalogues and databases. The present article proposes a tentative data model for Syriaca.org's New Handbook of Syriac Literature, an open-access digital publication that will serve as both an authority file for Syriac works and a guide to accessing their manuscript representations, editions, and translations. The authors hope that by publishing a draft data model they can receive feedback and incorporate suggestions into the next stage of the project.

Submitted 3 March, 2016; originally announced March 2016.

Comments: Part of special issue: Computer-Aided Processing of Intertextuality in Ancient Languages. 15 pages, 4 figures

Journal ref: Journal of Data Mining & Digital Humanities, Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages (May 30, 2017) jdmdh:1395

arXiv:1404.7173 [pdf, other]

Nonmonotonic Reasoning as a Temporal Activity

Authors: Daniel G. Schwartz

Abstract: A dynamic reasoning system (DRS) is an adaptation of a conventional formal logical system that explicitly portrays reasoning as a temporal activity, with each extralogical input to the system and each inference rule application being viewed as occurring at a distinct time step. Every DRS incorporates some well-defined logic together with a controller that serves to guide the reasoning process in response to user inputs. Logics are generic, whereas controllers are application-specific. Every controller does, nonetheless, provide an algorithm for nonmonotonic belief revision. The general notion of a DRS comprises a framework within which one can formulate the logic and algorithms for a given application and prove that the algorithms are correct, i.e., that they serve to (i) derive all salient information and (ii) preserve the consistency of the belief set. This paper illustrates the idea with ordinary first-order predicate calculus, suitably modified for the present purpose, and an example. The example revisits some classic nonmonotonic reasoning puzzles (Opus the Penguin, Nixon Diamond) and shows how these can be resolved in the context of a DRS, using an expanded version of first-order logic that incorporates typed predicate symbols. All concepts are rigorously defined and effectively computable, thereby providing the foundation for a future software implementation.

Submitted 28 April, 2014; originally announced April 2014.

Comments: Proceedings of the 15th International Workshop on Non-Monotonic Reasoning (NMR 2014), Vienna, Austria, 17-19 July 2014

arXiv:1308.5374 [pdf, other]

Dynamic Reasoning Systems

Authors: Daniel G. Schwartz

Abstract: A dynamic reasoning system (DRS) is an adaptation of a conventional formal logical system that explicitly portrays reasoning as a temporal activity, with each extralogical input to the system and each inference rule application being viewed as occurring at a distinct time step. Every DRS incorporates some well-defined logic together with a controller that serves to guide the reasoning process in response to user inputs. Logics are generic, whereas controllers are application-specific. Every controller does, nonetheless, provide an algorithm for nonmonotonic belief revision. The general notion of a DRS comprises a framework within which one can formulate the logic and algorithms for a given application and prove that the algorithms are correct, i.e., that they serve to (i) derive all salient information and (ii) preserve the consistency of the belief set. This paper illustrates the idea with ordinary first-order predicate calculus, suitably modified for the present purpose, and two examples. The latter example revisits some classic nonmonotonic reasoning puzzles (Opus the Penguin, Nixon Diamond) and shows how these can be resolved in the context of a DRS, using an expanded version of first-order logic that incorporates typed predicate symbols. All concepts are rigorously defined and effectively computable, thereby providing the foundation for a future software implementation.

Submitted 24 August, 2013; originally announced August 2013.

Showing 1–22 of 22 results for author: Schwartz, D