-
A Tale of Two Cultures: Comparing Interpersonal Information Disclosure Norms on Twitter
Authors:
Mainack Mondal,
Anju Punuru,
Tyng-Wen Scott Cheng,
Kenneth Vargas,
Chaz Gundry,
Nathan S Driggs,
Noah Schill,
Nathaniel Carlson,
Josh Bedwell,
Jaden Q Lorenc,
Isha Ghosh,
Yao Li,
Nancy Fulda,
Xinru Page
Abstract:
We present an exploration of cultural norms surrounding online disclosure of information about one's interpersonal relationships (such as information about family members, colleagues, friends, or lovers) on Twitter. The literature identifies the cultural dimension of individualism versus collectivism as a major determinant of offline communication differences in terms of emotion, topic, and content disclosed. We studied whether such differences also occur online, in the context of Twitter, by comparing tweets posted in an individualistic (U.S.) versus a collectivist (India) society. We collected more than 2 million tweets containing interpersonal relationship keywords, posted in the U.S. and India over a 3-month period. A card-sort study was used to develop a culturally sensitive, saturated taxonomy of keywords that represent interpersonal relationships (e.g., ma, mom, mother). We then developed a high-accuracy interpersonal disclosure detector based on dependency parsing (F1-score: 86%) to identify when these words refer to a personal relationship of the poster (e.g., "my mom" as opposed to "a mom"). This allowed us to identify the 400K+ tweets in our data set that actually disclose information about the poster's interpersonal relationships. We used a mixed-methods approach to analyze these tweets (e.g., comparing the amount of joy expressed about one's family) and found differences in emotion, topic, and content disclosed between tweets from the U.S. and India. Our analysis also reveals that a combination of qualitative and quantitative methods is needed to uncover these differences; using just one or the other can be misleading. This study extends the prior literature on Multi-Party Privacy and provides guidance for researchers and designers of culturally sensitive systems.
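The distinction between "my mom" and "a mom" is the kind of decision a dependency parse makes easy. The sketch below shows one plausible rule of that kind, assuming a sentence already parsed into (token, head index, dependency label) triples, where a possessive pronoun attaches to its noun as `nmod:poss` (Universal Dependencies) or `poss` (spaCy's English models). The keyword set, pronoun set, and the function `is_interpersonal_disclosure` are hypothetical illustrations, not the authors' detector.

```python
from typing import List, Tuple

# Hypothetical subset of relationship keywords (the paper's taxonomy is larger).
RELATIONSHIP_KEYWORDS = {"ma", "mom", "mother"}
# First-person possessives that tie the relationship to the poster.
FIRST_PERSON_POSSESSIVES = {"my", "our"}
# Possessive labels: UD uses "nmod:poss", spaCy's English models use "poss".
POSSESSIVE_LABELS = {"nmod:poss", "poss"}

# Each parsed token is (text, head_index, dependency_label); the root points to itself.
ParsedToken = Tuple[str, int, str]


def is_interpersonal_disclosure(parse: List[ParsedToken]) -> bool:
    """True if some relationship keyword has a first-person possessive dependent."""
    for text, head, dep in parse:
        if dep in POSSESSIVE_LABELS and text.lower() in FIRST_PERSON_POSSESSIVES:
            if parse[head][0].lower() in RELATIONSHIP_KEYWORDS:
                return True
    return False


# "my mom called": "my" is a possessive dependent of "mom" (index 1).
my_mom = [("my", 1, "nmod:poss"), ("mom", 2, "nsubj"), ("called", 2, "root")]
# "a mom called": "a" is only a determiner, so nothing is disclosed about the poster.
a_mom = [("a", 1, "det"), ("mom", 2, "nsubj"), ("called", 2, "root")]

print(is_interpersonal_disclosure(my_mom))  # True
print(is_interpersonal_disclosure(a_mom))   # False
```

A real detector would also need to handle plural possessives, misspellings, and parser errors on informal tweet text; this rule only captures the core structural cue.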
Submitted 26 September, 2023;
originally announced September 2023.
-
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate
Authors:
Vilém Zouhar,
Kalvin Chang,
Chenxuan Cui,
Nathaniel Carlson,
Nathaniel Robinson,
Mrinmaya Sachan,
David Mortensen
Abstract:
Mapping words into a fixed-dimensional vector space is the backbone of modern NLP. While most word embedding methods successfully encode semantic information, they overlook phonetic information that is crucial for many tasks. We develop three methods that use articulatory features to build phonetically informed word embeddings. To address the inconsistent evaluation of existing phonetic word embedding methods, we also contribute a task suite to fairly evaluate past, current, and future methods. We evaluate both (1) intrinsic aspects of phonetic word embeddings, such as word retrieval and correlation with sound similarity, and (2) extrinsic performance on tasks such as rhyme and cognate detection and sound analogies. We hope our task suite will promote reproducibility and inspire future phonetic embedding research.
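As a rough illustration of what "phonetically informed" can mean, the sketch below builds a word vector by averaging hand-written articulatory feature vectors of its phonemes and compares words by cosine similarity. The tiny feature table, the ARPAbet-style symbols, and the averaging scheme are simplifying assumptions; this is not one of the three methods the paper develops, and averaging ignores phoneme order.

```python
import math
from typing import Dict, List

# Hypothetical binary articulatory features (a real inventory is much richer):
# [voiced, bilabial, alveolar, stop, fricative, vowel, high, low, front]
FEATURES: Dict[str, List[float]] = {
    "B":  [1, 1, 0, 1, 0, 0, 0, 0, 0],
    "P":  [0, 1, 0, 1, 0, 0, 0, 0, 0],
    "T":  [0, 0, 1, 1, 0, 0, 0, 0, 0],
    "S":  [0, 0, 1, 0, 1, 0, 0, 0, 0],
    "AE": [1, 0, 0, 0, 0, 1, 0, 1, 1],
    "IH": [1, 0, 0, 0, 0, 1, 1, 0, 1],
}


def embed(phonemes: List[str]) -> List[float]:
    """Average the feature vectors of a word's phonemes (order is ignored)."""
    vectors = [FEATURES[p] for p in phonemes]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


bat = embed(["B", "AE", "T"])
pat = embed(["P", "AE", "T"])
sit = embed(["S", "IH", "T"])
# "bat" differs from "pat" only in voicing of the first consonant,
# so it should sit closer to "pat" than to "sit".
print(cosine(bat, pat) > cosine(bat, sit))  # True
```

An intrinsic evaluation of the kind the abstract mentions would compare such similarities against a reference measure of sound similarity across many word pairs.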
Submitted 26 March, 2024; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Partial Product Aware Machine Learning on DNA-Encoded Libraries
Authors:
Polina Binder,
Meghan Lawler,
LaShadric Grady,
Neil Carlson,
Sumudu Leelananda,
Svetlana Belyanskaya,
Joe Franklin,
Nicolas Tilmans,
Henri Palacci
Abstract:
DNA-encoded libraries (DELs) are used for rapid large-scale screening of small molecules against a protein target. These combinatorial libraries are built through several cycles of chemistry and DNA ligation, producing large sets of DNA-tagged molecules. Training machine learning models on DEL data has been shown to be effective at predicting molecules of interest that are dissimilar from those in the original DEL. Machine learning approaches to chemical property prediction rely on the assumption that the property of interest is linked to a single chemical structure. In the context of DNA-encoded libraries, this is equivalent to assuming that every chemical reaction fully yields the desired product. In practice, however, multi-step chemical synthesis sometimes generates partial molecules. Each unique DNA tag in a DEL therefore corresponds to a set of possible molecules. Here, we leverage reaction yield data to enumerate the set of possible molecules corresponding to a given DNA tag. This paper demonstrates that training a custom GNN on this richer dataset improves accuracy and generalization performance.
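The enumeration step can be pictured with a simple probabilistic model. The sketch below assumes, purely for illustration, that each synthesis cycle attaches its building block independently with a known yield, so one DNA tag maps to every subset of its building blocks, weighted by the product of per-cycle success or failure probabilities. Real DEL chemistry (and the paper's procedure) may rule out some partial products; the function `enumerate_partial_products` and the example yields are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple


def enumerate_partial_products(
    building_blocks: List[str], yields: List[float]
) -> Dict[Tuple[str, ...], float]:
    """Map each possible (partial) product to its probability, assuming independent cycles."""
    if len(building_blocks) != len(yields):
        raise ValueError("need one yield per synthesis cycle")
    outcomes: Dict[Tuple[str, ...], float] = {}
    # Each cycle either succeeds (True) or fails (False).
    for pattern in product([True, False], repeat=len(building_blocks)):
        prob = 1.0
        present = []
        for block, y, ok in zip(building_blocks, yields, pattern):
            prob *= y if ok else (1.0 - y)
            if ok:
                present.append(block)
        key = tuple(present)
        # Accumulate in case two patterns produce the same set of blocks.
        outcomes[key] = outcomes.get(key, 0.0) + prob
    return outcomes


# Hypothetical three-cycle tag with per-cycle yields.
products = enumerate_partial_products(["A", "B", "C"], [0.9, 0.8, 0.7])
for molecule, p in sorted(products.items(), key=lambda kv: -kv[1]):
    print(molecule, round(p, 3))
```

Under this model the fully formed product ("A", "B", "C") carries probability 0.9 × 0.8 × 0.7 = 0.504, so almost half the signal attributed to the tag could come from partial molecules. A model could then be trained on the weighted set for each tag rather than on the single intended product.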
Submitted 16 May, 2022;
originally announced May 2022.
-
Cluster Activation Mapping with Applications to Medical Imaging
Authors:
Sarah Ryan,
Nichole Carlson,
Harris Butler,
Tasha Fingerlin,
Lisa Maier,
Fuyong Xing
Abstract:
An open question in deep clustering is how to understand what in an image drives the cluster assignments. This visual understanding is essential for trusting the results of an inherently complex algorithm like deep learning, especially when the derived cluster assignments may be used to inform decision-making or to define new disease sub-types. In this work, we developed a novel methodology to generate CLuster Activation Mapping (CLAM), which combines an unsupervised deep clustering framework with a modification of Score-CAM, an approach for discriminative localization in the supervised setting. We evaluated our approach using a simulation study based on computed tomography scans of the lung, and applied it to 3D CT scans from a sarcoidosis population to identify new clusters of sarcoidosis based purely on CT scan presentation.
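Score-CAM weights each activation map by how much a model's score rises when the input is masked by that (normalized) map, then takes a ReLU of the weighted sum of maps. The sketch below follows that recipe with one change in the spirit of the abstract: the score is a soft assignment to a target cluster rather than a class score. Images are flattened lists, activation maps are assumed to be already upsampled to the input size, gains are normalized with a softmax, and `cluster_score` is a hypothetical stand-in for a deep clustering model. This is not the authors' CLAM implementation.

```python
import math
from typing import Callable, List

Image = List[float]  # flattened image; activation maps are assumed upsampled to this size


def normalize(a: Image) -> Image:
    """Rescale a map to [0, 1] so it can act as a soft mask."""
    lo, hi = min(a), max(a)
    if hi == lo:
        return [0.0] * len(a)
    return [(x - lo) / (hi - lo) for x in a]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_cam(image: Image, maps: List[Image], score_fn: Callable[[Image], float]) -> Image:
    baseline = score_fn([0.0] * len(image))  # score of an all-zero input
    gains = []
    for a in maps:
        masked = [x * m for x, m in zip(image, normalize(a))]
        gains.append(score_fn(masked) - baseline)  # how much this map "explains" the score
    weights = softmax(gains)
    combined = [sum(w * a[i] for w, a in zip(weights, maps)) for i in range(len(image))]
    return [max(0.0, v) for v in combined]  # ReLU


def cluster_score(img: Image) -> float:
    """Hypothetical soft assignment to a cluster that responds to the left half of the image."""
    half = len(img) // 2
    return 1.0 / (1.0 + math.exp(-(sum(img[:half]) - sum(img[half:]))))


image = [1.0, 1.0, 1.0, 1.0]
maps = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]  # left-half and right-half features
saliency = score_cam(image, maps, cluster_score)
print(saliency)  # larger values on the left half
```

The design choice here is that no gradients are needed: only forward passes through the (hypothetical) clustering model, which is what makes a Score-CAM-style approach easy to adapt from supervised classifiers to cluster assignments.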
Submitted 9 October, 2020;
originally announced October 2020.
-
Does Phase Matter For Monaural Source Separation?
Authors:
Mohit Dubey,
Garrett Kenyon,
Nils Carlson,
Austin Thresher
Abstract:
The "cocktail party" problem of fully separating multiple sources from a single-channel audio waveform remains unsolved. Current biological understanding of neural encoding suggests that phase information is preserved and utilized at every stage of the auditory pathway. However, current computational approaches primarily discard phase information in order to mask amplitude spectrograms of sound. In this paper, we investigate whether preserving phase information in spectral representations of sound provides better results in monaural separation of vocals from a musical track, using a neurally plausible sparse generative model. Our results demonstrate that preserving phase information reduces artifacts in the separated tracks, as quantified by the signal-to-artifact ratio (GSAR). Furthermore, our proposed method achieves state-of-the-art performance for source separation, as quantified by a mean signal-to-interference ratio (GSIR) of 19.46.
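Why discarding phase matters can be seen in a single time-frequency bin. When two sources with different phases overlap, a magnitude-only mask (here the ideal ratio mask |S1|/|M|, chosen as an illustrative assumption) restores the target's magnitude but inherits the mixture's phase, so the estimate still differs from the true source. The sketch below uses Python's `cmath`; it illustrates the general limitation of amplitude masking and is not the paper's sparse generative model.

```python
import cmath


def magnitude_mask_estimate(source: complex, other: complex) -> complex:
    """Estimate `source` from the mixture using a real-valued (magnitude-only) mask."""
    mixture = source + other
    if abs(mixture) == 0.0:
        return 0j
    mask = abs(source) / abs(mixture)  # real gain: changes magnitude, never phase
    return mask * mixture  # magnitude of `source`, phase of the mixture


def estimation_error(source: complex, other: complex) -> float:
    return abs(magnitude_mask_estimate(source, other) - source)


s1 = cmath.rect(1.0, 0.0)                      # unit-magnitude target at phase 0
in_phase = cmath.rect(0.5, 0.0)                # interferer with the same phase
out_of_phase = cmath.rect(0.5, cmath.pi / 2)   # interferer 90 degrees away

print(estimation_error(s1, in_phase))      # (numerically) zero: phases agree
print(estimation_error(s1, out_of_phase))  # about 0.46: the phase error remains
```

Even with a perfect mask, the residual in the second case is a phase error, not a magnitude error, which is the kind of artifact that a phase-preserving representation aims to avoid.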
Submitted 2 November, 2017;
originally announced November 2017.
-
Phase Transitions in Image Denoising via Sparsely Coding Convolutional Neural Networks
Authors:
Jacob Carroll,
Nils Carlson,
Garrett T. Kenyon
Abstract:
Neural networks are analogous in many ways to spin glasses, systems known for their rich dynamics and equally complex phase diagrams. We apply well-known techniques from the study of spin glasses to a convolutional sparsely encoding neural network and observe power-law finite-size scaling behavior in the sparsity and reconstruction error as the network denoises 32×32 RGB CIFAR-10 images. This finite-size scaling indicates the presence of a continuous phase transition at a critical value of the sparsity. By using the power-law scaling relations inherent to finite-size scaling, we can determine the optimal value of sparsity for any network size by tuning the system to the critical point and operating it at the minimum denoising error.
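Assuming a quantity Q measured at network size L follows a power law Q ≈ a·L^(-b), taking logarithms gives log Q = log a - b·log L, so the exponent is the negated slope of a straight-line fit in log-log space. The sketch below shows only that fitting step, on synthetic data generated from a known power law; `fit_power_law`, the sizes, and the values are hypothetical and do not reproduce the paper's measurements or its full scaling analysis.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(value) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic, noise-free data from Q = 2.0 * L^(-0.5).
sizes = [8, 16, 32, 64, 128]
values = [2.0 * s ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, values)
print(round(a, 3), round(b, 3))  # 2.0 0.5
```

With real, noisy measurements the fit would only approximate the exponent, and the points closest to the critical point are the ones where a clean power law is expected.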
Submitted 26 October, 2017;
originally announced October 2017.