Search | arXiv e-print repository

arXiv:2503.07540 [pdf]

AI-Enabled Knowledge Sharing for Enhanced Collaboration and Decision-Making in Non-Profit Healthcare Organizations: A Scoping Review Protocol

Authors: Maurice Ongala, Ruth Kiraka, Jyoti Choundrie, Javan Okello

Abstract: This protocol outlines a scoping review designed to systematically map the existing body of evidence on AI-enabled knowledge sharing in resource-limited non-profit healthcare organizations. The review aims to investigate how such technologies enhance collaboration and decision-making, particularly in the context of reduced external support following the cessation of USAID operations. Guided by three theoretical frameworks, namely the Resource-Based View, Dynamic Capabilities Theory, and Absorptive Capacity Theory, this study will explore the dual role of AI as a strategic resource and an enabler of organizational learning and agility. The protocol details a rigorous methodological approach based on PRISMA-ScR guidelines, encompassing a systematic search strategy across multiple databases, inclusion and exclusion criteria, and a structured data extraction process. By integrating theoretical insights with empirical evidence, this scoping review seeks to identify critical gaps in the literature and inform the design of effective, resource-optimized AI solutions in non-profit healthcare settings.

Submitted 10 March, 2025; originally announced March 2025.

Comments: 14 pages

arXiv:2501.01835 [pdf, other]

ASKCOS: an open source software suite for synthesis planning

Authors: Zhengkai Tu, Sourabh J. Choure, Mun Hong Fong, Jihye Roh, Itai Levin, Kevin Yu, Joonyoung F. Joung, Nathan Morgan, Shih-Cheng Li, Xiaoqi Sun, Huiqian Lin, Mark Murnin, Jordan P. Liles, Thomas J. Struble, Michael E. Fortunato, Mengjie Liu, William H. Green, Klavs F. Jensen, Connor W. Coley

Abstract: The advancement of machine learning and the availability of large-scale reaction datasets have accelerated the development of data-driven models for computer-aided synthesis planning (CASP) in the past decade. Here, we detail the newest version of ASKCOS, an open source software suite for synthesis planning that makes available several research advances in a freely available, practical tool. Four one-step retrosynthesis models form the basis of both interactive planning and automatic planning modes. Retrosynthetic planning is complemented by other modules for feasibility assessment and pathway evaluation, including reaction condition recommendation, reaction outcome prediction, and auxiliary capabilities such as solubility prediction and quantum mechanical descriptor prediction. ASKCOS has assisted hundreds of medicinal, synthetic, and process chemists in their day-to-day tasks, complementing expert decision making. It is our belief that CASP tools like ASKCOS are an important part of modern chemistry research, and that they offer ever-increasing utility and accessibility.

Submitted 3 January, 2025; originally announced January 2025.

arXiv:2410.03224 [pdf, other]

ScriptViz: A Visualization Tool to Aid Scriptwriting based on a Large Movie Database

Authors: Anyi Rao, Jean-Peïc Chou, Maneesh Agrawala

Abstract: Scriptwriters usually rely on their mental visualization to create a vivid story by using their imagination to see, feel, and experience the scenes they are writing. Besides mental visualization, they often refer to existing images or scenes in movies and analyze the visual elements to create a certain mood or atmosphere. In this paper, we develop ScriptViz to provide external visualization based on a large movie database for the screenwriting process. It retrieves reference visuals on the fly based on scripts' text and dialogue from a large movie database. The tool provides two types of control on visual elements that enable writers to 1) see exactly what they want with fixed visual elements and 2) see variances in uncertain elements. User evaluation among 15 scriptwriters shows that ScriptViz is able to present scriptwriters with consistent yet diverse visual possibilities, aligning closely with their scripts and helping their creation.

Submitted 4 October, 2024; originally announced October 2024.

Comments: Accepted in the 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24). Webpage: https://virtualfilmstudio.github.io/projects/scriptviz

arXiv:2409.13079 [pdf, other]

Embedding Geometries of Contrastive Language-Image Pre-Training

Authors: Jason Chuan-Chih Chou, Nahid Alam

Abstract: Since the publication of CLIP, the approach of using InfoNCE loss for contrastive pre-training has become widely popular for bridging two or more modalities. Despite its wide adoption, CLIP's original design choices of L2 normalization and cosine similarity logit have rarely been revisited. We have systematically experimented with alternative geometries and softmax logits for language-image pre-training and identified that variants with intuitive Euclidean geometry, Euclidean CLIP (EuCLIP), match or exceed the performance of CLIP and support hierarchical relationships at least as well as the more complicated hyperbolic alternative.

Submitted 19 September, 2024; originally announced September 2024.

Comments: ECCV 2024 - Beyond Euclidean Workshop

arXiv:2409.10858 [pdf, other]

Speech Recognition for Analysis of Police Radio Communication

Authors: Tejes Srivastava, Ju-Chieh Chou, Priyank Shroff, Karen Livescu, Christopher Graziul

Abstract: Police departments around the world use two-way radio for coordination. These broadcast police communications (BPC) are a unique source of information about everyday police activity and emergency response. Yet BPC are not transcribed, and their naturalistic audio properties make automatic transcription challenging. We collect a corpus of roughly 62,000 manually transcribed radio transmissions (~46 hours of audio) to evaluate the feasibility of automatic speech recognition (ASR) using modern recognition models. We evaluate the performance of off-the-shelf speech recognizers, models fine-tuned on BPC data, and customized end-to-end models. We find that both human and machine transcription are challenging in this domain. Large off-the-shelf ASR models perform poorly, but fine-tuned models can reach the approximate range of human performance. Our work suggests directions for future work, including analysis of short utterances and potential miscommunication in police radio interactions. We make our corpus and data annotation pipeline available to other researchers, to enable further research on recognition and analysis of police communication.

Submitted 16 September, 2024; originally announced September 2024.

Comments: Accepted by SLT 2024

arXiv:2409.00807 [pdf, other]

Diffusion based multi-domain neuroimaging harmonization method with preservation of anatomical details

Authors: Haoyu Lan, Bino A. Varghese, Nasim Sheikh-Bahaei, Farshid Sepehrband, Arthur W Toga, Jeiran Choupan

Abstract: Multi-center neuroimaging studies face technical variability due to batch differences across sites, which potentially hinders data aggregation and impacts study reliability. Recent efforts in neuroimaging harmonization have aimed to minimize these technical gaps and reduce technical variability across batches. While Generative Adversarial Networks (GANs) have been a prominent method for addressing image harmonization tasks, GAN-harmonized images suffer from artifacts or anatomical distortions. Given the advancements of the denoising diffusion probabilistic model, which produces high-fidelity images, we have assessed the efficacy of the diffusion model for neuroimaging harmonization. We have demonstrated the diffusion model's superior capability in harmonizing images from multiple domains, while GAN-based methods are limited to harmonizing images between two domains per model. Our experiments highlight that the learned domain invariant anatomical condition reinforces the model to accurately preserve the anatomical details while differentiating batch differences at each diffusion step. Our proposed method has been tested on two public neuroimaging datasets, ADNI1 and ABIDE II, yielding harmonization results with consistent anatomy preservation and superior FID score compared to the GAN-based methods. We have conducted multiple analyses, including extensive quantitative and qualitative evaluations against the baseline models, an ablation study showcasing the benefits of the learned conditions, and improvements in the consistency of perivascular spaces (PVS) segmentation through harmonization.

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2408.04360 [pdf]

Detecting Car Speed using Object Detection and Depth Estimation: A Deep Learning Framework

Authors: Subhasis Dasgupta, Arshi Naaz, Jayeeta Choudhury, Nancy Lahiri

Abstract: Road accidents are quite common in almost every part of the world, and the majority of fatal accidents are attributed to overspeeding vehicles. Overspeeding is usually controlled through checkpoints at various points along the road, but not all traffic police have speed-estimating devices such as LIDAR-based or radar-based guns. The current project addresses vehicle speed estimation using handheld devices, such as mobile phones or wearable cameras with a network connection, together with deep learning frameworks.

Submitted 8 August, 2024; originally announced August 2024.

Comments: This is the pre-print of the paper accepted for oral presentation and publication in the proceedings of IEEE CONIT 2024, held in Pune from June 21 to 23, 2024. The paper is 6 pages long and contains 11 figures and 1 table. This is not the final version of the paper

arXiv:2311.17686 [pdf]

AviationGPT: A Large Language Model for the Aviation Domain

Authors: Liya Wang, Jason Chou, Xin Zhou, Alex Tien, Diane M Baumgartner

Abstract: The advent of ChatGPT and GPT-4 has captivated the world with large language models (LLMs), demonstrating exceptional performance in question-answering, summarization, and content generation. The aviation industry is characterized by an abundance of complex, unstructured text data, replete with technical jargon and specialized terminology. Moreover, labeled data for model building are scarce in this domain, resulting in low usage of aviation text data. The emergence of LLMs presents an opportunity to transform this situation, but there is a lack of LLMs specifically designed for the aviation domain. To address this gap, we propose AviationGPT, which is built on open-source LLaMA-2 and Mistral architectures and continuously trained on a wealth of carefully curated aviation datasets. Experimental results reveal that AviationGPT offers users multiple advantages, including the versatility to tackle diverse natural language processing (NLP) problems (e.g., question-answering, summarization, document writing, information extraction, report querying, data cleaning, and interactive data exploration). It also provides accurate and contextually relevant responses within the aviation domain and significantly improves performance (e.g., over a 40% performance gain in tested cases). With AviationGPT, the aviation industry is better equipped to address more complex research problems and enhance the efficiency and safety of National Airspace System (NAS) operations.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.15384 [pdf, other]

Dirichlet Process-based Robust Clustering using the Median-of-Means Estimator

Authors: Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das

Abstract: Clustering stands as one of the most prominent challenges in unsupervised machine learning. Among centroid-based methods, the classic $k$-means algorithm, based on Lloyd's heuristic, is widely used. Nonetheless, it is a well-known fact that $k$-means and its variants face several challenges, including heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When data contains noise or outliers, the Median-of-Means (MoM) estimator offers a robust alternative for stabilizing centroid-based methods. On a different note, another limitation in many commonly used clustering methods is the need to specify the number of clusters beforehand. Model-based approaches, such as Bayesian nonparametric models, address this issue by incorporating infinite mixture models, which eliminate the requirement for predefined cluster counts. Motivated by these facts, in this article, we propose an efficient and automatic clustering technique by integrating the strengths of model-based and centroid-based methodologies. Our method mitigates the effect of noise on the quality of clustering while, at the same time, estimating the number of clusters. Statistical guarantees on an upper bound of clustering error, and rigorous assessment through simulated and real datasets, suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.

Submitted 29 January, 2025; v1 submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.09526 [pdf, other]

Towards Serverless Optimization with In-place Scaling

Authors: Vincent Hsieh, Jerry Chou

Abstract: Serverless computing has gained popularity due to its cost efficiency, ease of deployment, and enhanced scalability. However, in serverless environments, servers are initiated only after receiving a request, leading to increased response times. This delay is commonly known as the cold start problem. In this study, we explore the in-place scaling feature released in Kubernetes v1.27 and examine its impact on serverless computing. Our experimental results reveal improvements in request latency, with reductions ranging from 1.16 to 18.15 times across various workloads when compared to traditional cold policy.

Submitted 15 November, 2023; originally announced November 2023.

arXiv:2310.08715 [pdf, other]

Toward Joint Language Modeling for Speech Units and Text

Authors: Ju-Chieh Chou, Chung-Ming Chien, Wei-Ning Hsu, Karen Livescu, Arun Babu, Alexis Conneau, Alexei Baevski, Michael Auli

Abstract: Speech and text are two major forms of human language. The research community has been focusing on mapping speech to text or vice versa for many years. However, in the field of language modeling, very little effort has been made to model them jointly. In light of this, we explore joint language modeling for speech units and text. Specifically, we compare different speech tokenizers to transform continuous speech signals into discrete units and use different methods to construct mixed speech-text data. We introduce automatic metrics to evaluate how well the joint LM mixes speech and text. We also fine-tune the LM on downstream spoken language understanding (SLU) tasks with different modalities (speech or text) and test its performance to assess the model's learning of shared representations. Our results show that by mixing speech units and text with our proposed mixing techniques, the joint LM improves over a speech-only baseline on SLU tasks and shows zero-shot cross-modal transferability.

Submitted 12 October, 2023; originally announced October 2023.

Comments: EMNLP findings 2023

arXiv:2310.05919 [pdf, other]

Few-Shot Spoken Language Understanding via Joint Speech-Text Models

Authors: Chung-Ming Chien, Mingjiamei Zhang, Ju-Chieh Chou, Karen Livescu

Abstract: Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared representations to address the persistent challenge of limited data availability in spoken language understanding tasks. By employing a pre-trained speech-text model, we find that models fine-tuned on text can be effectively transferred to speech testing data. With as little as 1 hour of labeled speech data, our proposed approach achieves comparable performance on spoken language understanding tasks (specifically, sentiment analysis and named entity recognition) when compared to previous methods using speech-only pre-trained models fine-tuned on 10 times more data. Beyond the proof-of-concept study, we also analyze the latent representations. We find that the bottom layers of speech-text models are largely task-agnostic and align speech and text representations into a shared space, while the top layers are more task-specific.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.08030 [pdf, other]

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

Authors: Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu

Abstract: Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in real-world environments with background noise and reverberation, hampering the development of AVSE. In this work, we introduce AV2Wav, a resynthesis-based audio-visual speech enhancement approach that can generate clean speech despite the challenges of real-world training data. We obtain a subset of nearly clean speech from an audio-visual corpus using a neural quality estimator, and then train a diffusion model on this subset to generate waveforms conditioned on continuous speech representations from AV-HuBERT with noise-robust training. We use continuous rather than discrete representations to retain prosody and speaker information. With this vocoding task alone, the model can perform speech enhancement better than a masking-based baseline. We further fine-tune the diffusion model on clean/noisy utterance pairs to improve the performance. Our approach outperforms a masking-based baseline in terms of both automatic metrics and a human listening test and is close in quality to the target speech in the listening test. Audio samples can be found at https://home.ttic.edu/~jcchou/demo/avse/avse_demo.html.

Submitted 4 November, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: extended version for the accepted paper at ICASSP 2024

arXiv:2309.03790 [pdf, other]

doi 10.1145/3586183.3606807

TaleStream: Supporting Story Ideation with Trope Knowledge

Authors: Jean-Peïc Chou, Alexa F. Siu, Nedim Lipka, Ryan Rossi, Franck Dernoncourt, Maneesh Agrawala

Abstract: Story ideation is a critical part of the story-writing process. It is challenging to support computationally due to its exploratory and subjective nature. Tropes, which are recurring narrative elements across stories, are essential in stories as they shape the structure of narratives and our understanding of them. In this paper, we propose to use tropes as an intermediate representation of stories to approach story ideation. We present TaleStream, a canvas system that uses tropes as building blocks of stories while providing steerable suggestions of story ideas in the form of tropes. Our trope suggestion methods leverage data from the tvtropes.org wiki. We find that 97% of the time, trope suggestions generated by our methods provide better story ideation materials than random tropes. Our system evaluation suggests that TaleStream can support writers' creative flow and greatly facilitate story development. Tropes, as a rich lexicon of narratives with available examples, play a key role in TaleStream and hold promise for story-creation support systems.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 12 pages, 6 figures, 3 tables

ACM Class: D.2.2; H.1.2; H.5.2

arXiv:2306.13985 [pdf, other]

doi 10.1007/978-3-031-43424-2_6

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

Authors: Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta

Abstract: Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and analysis of some classifiers that are specifically designed for HDLSS data. These classifiers are free of tuning parameters and are robust, in the sense that they are devoid of any moment conditions of the underlying data distributions. It is shown that they yield perfect classification in the HDLSS asymptotic regime, under some fairly general conditions. The comparative performance of the proposed classifiers is also investigated. Our theoretical results are supported by extensive simulation studies and real data analysis, which demonstrate promising advantages of the proposed classification techniques over several widely recognized methods.

Submitted 19 February, 2025; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: Published at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2023

Journal ref: In: ECML PKDD 2023: Research Track. Lecture Notes in Computer Science, vol 14173. Springer, Cham (2023)

arXiv:2305.09556 [pdf]

Adapting Sentence Transformers for the Aviation Domain

Authors: Liya Wang, Jason Chou, Dave Rouck, Alex Tien, Diane M Baumgartner

Abstract: Learning effective sentence representations is crucial for many Natural Language Processing (NLP) tasks, including semantic search, semantic textual similarity (STS), and clustering. While multiple transformer models have been developed for sentence embedding learning, these models may not perform optimally when dealing with specialized domains like aviation, which has unique characteristics such as technical jargon, abbreviations, and unconventional grammar. Furthermore, the absence of labeled datasets makes it difficult to train models specifically for the aviation domain. To address these challenges, we propose a novel approach for adapting sentence transformers for the aviation domain. Our method is a two-stage process consisting of pre-training followed by fine-tuning. During pre-training, we use Transformers and Sequential Denoising AutoEncoder (TSDAE) with aviation text data as input to improve the initial model performance. Subsequently, we fine-tune our models using a Natural Language Inference (NLI) dataset in the Sentence Bidirectional Encoder Representations from Transformers (SBERT) architecture to mitigate overfitting issues. Experimental results on several downstream tasks show that our adapted sentence transformers significantly outperform general-purpose transformers, demonstrating the effectiveness of our approach in capturing the nuances of the aviation domain. Overall, our work highlights the importance of domain-specific adaptation in developing high-quality NLP solutions for specialized industries like aviation.

Submitted 29 November, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

arXiv:2212.03957 [pdf, other]

doi 10.1145/3539597.3570438

DeMEtRIS: Counting (near)-Cliques by Crawling

Authors: Suman K. Bera, Jayesh Choudhari, Shahrzad Haddadan, Sara Ahmadian

Abstract: We study the problem of approximately counting cliques and near cliques in a graph, where the access to the graph is only available through crawling its vertices; thus typically seeing only a small portion of it. This model, known as the random walk model or the neighborhood query model, has been introduced recently and captures real-life scenarios in which the entire graph is too massive to be stored as a whole or be scanned entirely and sampling vertices independently is non-trivial in it. We introduce DeMEtRIS: Dense Motif Estimation through Random Incident Sampling. This method provides a scalable algorithm for clique and near clique counting in the random walk model. We prove the correctness of our algorithm through rigorous mathematical analysis and extensive experiments. Both our theoretical results and our experiments show that DeMEtRIS obtains a high precision estimation by only crawling a sub-linear portion of vertices, thus we demonstrate a significant improvement over previously known results.

Submitted 7 December, 2022; originally announced December 2022.

arXiv:2210.01986 [pdf, other]

MAtt: A Manifold Attention Network for EEG Decoding

Authors: Yue-Ting Pan, Jing-Lun Chou, Chun-Shu Wei

Abstract: Recognition of electroencephalographic (EEG) signals highly affects the efficiency of non-invasive brain-computer interfaces (BCIs). While recent advances of deep-learning (DL)-based EEG decoders offer improved performances, the development of geometric learning (GL) has attracted much attention for offering exceptional robustness in decoding noisy EEG data. However, there is a lack of studies on the merged use of deep neural networks (DNNs) and geometric learning for EEG decoding. We herein propose a manifold attention network (MAtt), a novel geometric deep learning (GDL)-based model, featuring a manifold attention mechanism that characterizes spatiotemporal representations of EEG data fully on a Riemannian symmetric positive definite (SPD) manifold. The evaluation of the proposed MAtt on both time-synchronous and -asynchronous EEG datasets suggests its superiority over other leading DL methods for general EEG decoding. Furthermore, analysis of model interpretation reveals the capability of MAtt in capturing informative EEG features and handling the non-stationarity of brain dynamics.

Submitted 4 October, 2022; originally announced October 2022.

arXiv:2205.03235 [pdf]

A Review on Text-Based Emotion Detection -- Techniques, Applications, Datasets, and Future Directions

Authors: Sheetal Kusal, Shruti Patil, Jyoti Choudrie, Ketan Kotecha, Deepali Vora, Ilias Pappas

Abstract: Artificial Intelligence (AI) has been used for processing data to make decisions, interact with humans, and understand their feelings and emotions. With the advent of the internet, people share and express their thoughts on day-to-day activities and global and local events through text messaging applications. Hence, it is essential for machines to understand emotions in opinions, feedback, and textual dialogues to provide emotionally aware responses to users in today's online world. The field of text-based emotion detection (TBED) is advancing to provide automated solutions to various applications, such as business and finance, to name a few. TBED has gained a lot of attention in recent times. The paper presents a systematic review of the TBED literature published between 2005 and 2021. This review has meticulously examined 63 research papers from IEEE, Science Direct, Scopus, and Web of Science databases to address four primary research questions. It also reviews the different applications of TBED across various research domains and highlights its use. An overview of various emotion models, techniques, feature extraction methods, datasets, and research challenges with future directions has also been presented.

Submitted 26 April, 2022; originally announced May 2022.

Comments: 74 pages

arXiv:2204.08106 [pdf, other]

A New Dynamic Algorithm for Densest Subhypergraphs

Authors: Suman K. Bera, Sayan Bhattacharya, Jayesh Choudhari, Prantar Ghosh

Abstract: Computing a dense subgraph is a fundamental problem in graph mining, with a diverse set of applications ranging from electronic commerce to community detection in social networks. In many of these applications, the underlying context is better modelled as a weighted hypergraph that keeps evolving with time. This motivates the problem of maintaining the densest subhypergraph of a weighted hypergraph in a dynamic setting, where the input keeps changing via a sequence of updates (hyperedge insertions/deletions). Previously, the only known algorithm for this problem was due to Hu et al. [HWC17]. This algorithm worked only on unweighted hypergraphs, and had an approximation ratio of $(1+ε)r^2$ and an update time of $O(\text{poly} (r, \log n))$, where $r$ denotes the maximum rank of the input across all the updates. We obtain a new algorithm for this problem, which works even when the input hypergraph is weighted. Our algorithm has a significantly improved (near-optimal) approximation ratio of $(1+ε)$ that is independent of $r$, and a similar update time of $O(\text{poly} (r, \log n))$. It is the first $(1+ε)$-approximation algorithm even for the special case of weighted simple graphs. To complement our theoretical analysis, we perform experiments with our dynamic algorithm on large-scale, real-world data-sets. Our algorithm significantly outperforms the state of the art [HWC17] both in terms of accuracy and efficiency.

Submitted 17 April, 2022; originally announced April 2022.

Comments: Extended abstract appears in TheWebConf (previously WWW) 2022

arXiv:2202.05711 [pdf, ps, other]

Global Optimization of Data Pipelines in Heterogeneous Cloud Environments

Authors: Erica Lin, Luna Xu, Suraj Bramhavar, Marco Montes de Oca, Sean Gorsky, Lingyun Yi, Arianna Groetsema, Jeffrey Chou

Abstract: Modern production data processing and machine learning pipelines on the cloud are critical components for many cloud-based companies. These pipelines are typically composed of complex workflows represented by directed acyclic graphs (DAGs). Cloud environments are attractive to these workflows due to the wide range of choice with heterogeneous instances and prices that can provide the flexibility for different cost-performance needs. However, this flexibility also leads to the complexity of selecting the right resource configuration (e.g., instance type, resource demands) for each task in the DAG, while simultaneously scheduling the tasks with the selected resources to reach the optimal end-to-end performance and cost. These two decisions are often codependent, resulting in an NP-hard scheduling optimization bottleneck. Existing solutions focus solely on either problem and ignore the co-effect on the end-to-end optimum. We propose AGORA, a scheduler that considers both task-level resource allocation and execution for DAG workflows as a whole in heterogeneous cloud environments. AGORA (1) studies the characteristics of the tasks from prior runs and gives predictions on resource configurations, and (2) automatically finds the best configuration with its corresponding schedules for the entire workflow with a cost-performance objective.
We evaluate AGORA in a heterogeneous Amazon Web Services (AWS) cloud environment with multi-tenant workflows served by Airflow and demonstrate a performance improvement up to 45% and cost reduction up to 77% compared to state-of-the-art schedulers. In addition, we apply AGORA to a real-world production trace from Alibaba and show cost reduction of 65% and DAG completion time reduction of 57%.

Submitted 11 February, 2022; originally announced February 2022.

Comments: 13 pages

arXiv:2111.04494 [pdf]

Multi-Airport Delay Prediction with Transformers

Authors: Liya Wang, Alex Tien, Jason Chou

Abstract: Airport performance prediction with a reasonable look-ahead time is a challenging task and has been attempted in various prior studies. Traffic, demand, weather, and traffic management actions are all critical inputs to any prediction model. In this paper, a novel approach based on Temporal Fusion Transformer (TFT) is proposed to predict departure and arrival delays simultaneously for multiple airports at once. This approach can capture complex temporal dynamics of the inputs known at the time of prediction and then forecast selected delay metrics up to four hours into the future. When dealing with weather inputs, a self-supervised learning (SSL) model was developed to encode high-dimensional weather data into a much lower-dimensional representation to make the training of TFT more efficient and effective. The initial results show that the TFT-based delay prediction model achieves satisfactory performance measured by smaller prediction errors on a testing dataset. In addition, the interpretability analysis of the model outputs identifies the important input factors for delay prediction. The proposed approach is expected to help air traffic managers or decision makers gain insights about traffic management actions on delay mitigation and, once operationalized, provide enough lead time to plan for predicted performance degradation.

Submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.07742 [pdf, other]

Beyond Classification: Directly Training Spiking Neural Networks for Semantic Segmentation

Authors: Youngeun Kim, Joshua Chough, Priyadarshini Panda

Abstract: Spiking Neural Networks (SNNs) have recently emerged as the low-power alternative to Artificial Neural Networks (ANNs) because of their sparse, asynchronous, and binary event-driven processing. Due to their energy efficiency, SNNs have a high possibility of being deployed for real-world, resource-constrained systems such as autonomous vehicles and drones. However, owing to their non-differentiable and complex neuronal dynamics, most previous SNN optimization methods have been limited to image recognition. In this paper, we explore the SNN applications beyond classification and present semantic segmentation networks configured with spiking neurons. Specifically, we first investigate two representative SNN optimization techniques for recognition tasks (i.e., ANN-SNN conversion and surrogate gradient learning) on semantic segmentation datasets. We observe that, when converted from ANNs, SNNs suffer from high latency and low performance due to the spatial variance of features. Therefore, we directly train networks with surrogate gradient learning, resulting in lower latency and higher performance than ANN-SNN conversion. Moreover, we redesign two fundamental ANN segmentation architectures (i.e., Fully Convolutional Networks and DeepLab) for the SNN domain. We conduct experiments on two public semantic segmentation benchmarks including the PASCAL VOC2012 dataset and the DDD17 event-based dataset.
In addition to showing the feasibility of SNNs for semantic segmentation, we show that SNNs can be more robust and energy-efficient compared to their ANN counterparts in this domain.

Submitted 14 October, 2021; originally announced October 2021.

arXiv:2107.04734 [pdf, other]

Layer-wise Analysis of a Self-supervised Speech Representation Model

Authors: Ankita Pasad, Ju-Chieh Chou, Karen Livescu

Abstract: Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.

Submitted 3 December, 2022; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: Accepted to ASRU 2021. Code: https://github.com/ankitapasad/layerwise-analysis

arXiv:2106.16215 [pdf]

Algorithm For 3D-Chemotaxis Using Spiking Neural Network

Authors: Jayesh Choudhary, Vivek Saraswat, Udayan Ganguly

Abstract: In this work, we aim to devise an end-to-end spiking implementation for contour tracking in 3D media inspired by chemotaxis, where the worm reaches the region which has the given set concentration. For a planar medium, efficient contour tracking algorithms have already been devised, but a new degree of freedom poses quite a few challenges. Here we devise an algorithm based on klinokinesis - where the motion of the worm is in response to the stimuli but not proportional to it. Thus the path followed is not the shortest, but we can track the set concentration successfully. We use simple LIF neurons for the neural network implementation, considering the feasibility of its implementation in neuromorphic computing hardware.

Submitted 30 June, 2021; originally announced June 2021.

Comments: 12 pages, 8 figures, accepted for the '30th International Conference on Artificial Neural Networks, ICANN2021'

arXiv:2106.04624 [pdf, other]

SpeechBrain: A General-Purpose Speech Toolkit

Authors: Mirco Ravanelli, Titouan Parcollet, Peter Plantinga, Aku Rouhe, Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan Zhong, Ju-Chieh Chou, Sung-Lin Yeh, Szu-Wei Fu, Chien-Feng Liao, Elena Rastorgueva, François Grondin, William Aris, Hwidong Na, Yan Gao, Renato De Mori, Yoshua Bengio

Abstract: SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the research and development of neural speech processing technologies by being simple, flexible, user-friendly, and well-documented. This paper describes the core architecture designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines. SpeechBrain achieves competitive or state-of-the-art performance in a wide range of speech benchmarks. It also provides training recipes, pretrained models, and inference scripts for popular speech datasets, as well as tutorials which allow anyone with basic Python proficiency to familiarize themselves with speech technologies.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Preprint

arXiv:2012.06522 [pdf, other]

Online Coresets for Clustering with Bregman Divergences

Authors: Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit

Abstract: We present algorithms that create coresets in an online setting for clustering problems according to a wide subset of Bregman divergences. Notably, our coresets have a small additive error, similar in magnitude to the lightweight coresets of Bachem et al. (2018), and take update time $O(d)$ for every incoming point, where $d$ is the dimension of the point. Our first algorithm gives online coresets of size $\tilde{O}(\mbox{poly}(k,d,ε,μ))$ for $k$-clusterings according to any $μ$-similar Bregman divergence. We further extend this algorithm to show the existence of non-parametric coresets, where the coreset size is independent of $k$, the number of clusters, for the same subclass of Bregman divergences. Our non-parametric coresets are larger by a factor of $O(\log n)$ ($n$ is the number of points) and have a similar (small) additive guarantee. At the same time our coresets also function as lightweight coresets for non-parametric versions of the Bregman clustering like DP-Means. While these coresets provide additive error guarantees, they are also significantly smaller (scaling with $O(\log n)$ as opposed to $O(d^d)$ for points in $R^d$) than the (relative-error) coresets obtained in Bachem et al. (2015) for DP-Means. While our non-parametric coresets are existential, we give an algorithmic version under certain assumptions.

Submitted 11 December, 2020; originally announced December 2020.

Comments: Work in Progress

arXiv:2008.10828 [pdf, other]

Efficient Hierarchical Clustering for Classification and Anomaly Detection

Authors: Ishita Doshi, Sreekalyan Sajjalla, Jayesh Choudhari, Rushi Bhatt, Anirban Dasgupta

Abstract: We address the problem of large scale real-time classification of content posted on social networks, along with the need to rapidly identify novel spam types. Obtaining manual labels for user-generated content using editorial labeling and taxonomy development lags behind the rate at which new content types need to be classified. We propose a class of hierarchical clustering algorithms that can be used both for efficient and scalable real-time multiclass classification as well as in detecting new anomalies in user-generated content. Our methods have low query time, linear space usage, and come with theoretical guarantees with respect to a specific hierarchical clustering cost function (Dasgupta, 2016). We compare our solutions against a range of classification techniques and demonstrate excellent empirical performance.

Submitted 25 August, 2020; originally announced August 2020.

Comments: 19 pages, 2 figures, 9 tables

ACM Class: H.3.3; I.5.3; I.7.0; E.1

arXiv:2006.01225 [pdf, ps, other]

Streaming Coresets for Symmetric Tensor Factorization

Authors: Rachit Chhaya, Jayesh Choudhari, Anirban Dasgupta, Supratim Shit

Abstract: Factorizing tensors has recently become an important optimization module in a number of machine learning pipelines, especially in latent variable models. We show how to do this efficiently in the streaming setting. Given a set of $n$ vectors, each in $\mathbb{R}^d$, we present algorithms to select a sublinear number of these vectors as a coreset, while guaranteeing that the CP decomposition of the $p$-moment tensor of the coreset approximates the corresponding decomposition of the $p$-moment tensor computed from the full data. We introduce two novel algorithmic techniques: online filtering and kernelization. Using these two, we present six algorithms that achieve different tradeoffs of coreset size, update time and working space, beating or matching various state of the art algorithms. In the case of matrices ($2$-ordered tensor), our online row sampling algorithm guarantees $(1 \pm ε)$ relative error spectral approximation. We show applications of our algorithms in learning single topic modeling.

Submitted 13 July, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

Comments: Accepted at ICML 2020. Included algorithm with improved update time and fixed minor bugs

arXiv:1906.11405 [pdf, other]

BioGen: Automated Biography Generation

Authors: Heer Ambavi, Ayush Garg, Ayush Garg, Nitiksha, Mridul Sharma, Rohit Sharma, Jayesh Choudhari, Mayank Singh

Abstract: A biography of a person is the detailed description of several life events including his education, work, relationships, and death. Wikipedia, the free web-based encyclopedia, consists of millions of manually curated biographies of eminent politicians, film and sports personalities, etc. However, manual curation efforts, even though efficient, suffer from significant delays. In this work, we propose an automatic biography generation framework BioGen. BioGen generates a short collection of biographical sentences clustered into multiple events of life. Evaluation results show that biographies generated by BioGen are significantly closer to manually written biographies in Wikipedia. A working model of this framework is available at nlpbiogen.herokuapp.com/home/.

Submitted 26 June, 2019; originally announced June 2019.

Comments: Accepted at JCDL 2019

arXiv:1905.04000 [pdf, other]

doi 10.1109/TVCG.2019.2934433

An Incremental Dimensionality Reduction Method for Visualizing Streaming Multidimensional Data

Authors: Takanori Fujiwara, Jia-Kai Chou, Shilpika, Panpan Xu, Liu Ren, Kwan-Liu Ma

Abstract: Dimensionality reduction (DR) methods are commonly used for analyzing and visualizing multidimensional data. However, when data is a live streaming feed, conventional DR methods cannot be directly used because of their computational complexity and inability to preserve the projected data positions at previous time points. In addition, the problem becomes even more challenging when the dynamic data records have a varying number of dimensions as often found in real-world applications. This paper presents an incremental DR solution. We enhance an existing incremental PCA method in several ways to ensure its usability for visualizing streaming multidimensional data. First, we use geometric transformation and animation methods to help preserve a viewer's mental map when visualizing the incremental results. Second, to handle data dimension variants, we use an optimization method to estimate the projected data positions, and also convey the resulting uncertainty in the visualization. We demonstrate the effectiveness of our design with two case studies using real-world datasets.

Submitted 15 October, 2019; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: This is the author's version of the article that has been published in IEEE Transactions on Visualization and Computer Graphics. The final version of this record is available at: 10.1109/TVCG.2019.2934433

ACM Class: I.3.8

arXiv:1904.10937 [pdf, other]

Generated Loss and Augmented Training of MNIST VAE

Authors: Jason Chou

Abstract: The variational autoencoder (VAE) framework is a popular option for training unsupervised generative models, featuring ease of training and latent representation of data. The objective function of VAE does not guarantee to achieve the latter, however, and failure to do so leads to a frequent failure mode called posterior collapse. Even in successful cases, VAEs often result in low-precision reconstructions and generated samples. The introduction of the KL-divergence weight $β$ can help steer the model clear of posterior collapse, but its tuning is often a trial-and-error process with no guiding metrics. Here we test the idea of using the total VAE loss of generated samples (generated loss) as the proxy metric for generation quality, the related hypothesis that VAE reconstruction from the mean latent vector tends to be a more typical example of its class than the original, and the idea of exploiting this property by augmenting training data with generated variants (augmented training). The results are mixed, but repeated encoding and decoding indeed result in qualitatively and quantitatively more typical examples from both convolutional and fully-connected MNIST VAEs, suggesting that it may be an inherent property of the VAE framework.

Submitted 24 April, 2019; originally announced April 2019.

ACM Class: I.2.6

arXiv:1904.10446 [pdf, other]

Generated Loss, Augmented Training, and Multiscale VAE

Authors: Jason Chou, Gautam Hathi

Abstract: The variational autoencoder (VAE) framework remains a popular option for training unsupervised generative models, especially for discrete data where generative adversarial networks (GANs) require a workaround to create gradients for the generator. In our work modeling US postal addresses, we show that our discrete VAE with tree recursive architecture demonstrates limited capability of capturing field correlations within structured data, even after overcoming the challenge of posterior collapse with scheduled sampling and tuning of the KL-divergence weight $β$. Worse, VAE seems to have difficulty mapping its generated samples to the latent space, as their VAE loss lags behind or even increases during the training process. Motivated by this observation, we show that augmenting training data with generated variants (augmented training) and training a VAE with multiple values of $β$ simultaneously (multiscale VAE) both improve the generation quality of VAE. Despite their differences in motivation and emphasis, we show that augmented training and multiscale VAE are actually connected and have similar effects on the model.

Submitted 23 April, 2019; originally announced April 2019.

ACM Class: I.2.6

arXiv:1904.05742 [pdf, other]

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

Authors: Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee

Abstract: Recently, voice conversion (VC) without parallel data has been successfully adapted to the multi-target scenario, in which a single model is trained to convert the input voice to many different speakers. However, such a model suffers from the limitation that it can only convert the voice to the speakers in the training data, which narrows down the applicable scenarios of VC. In this paper, we propose a novel one-shot VC approach which is able to perform VC with only one example utterance each from the source and target speakers, and the source and target speakers do not even need to be seen during training. This is achieved by disentangling speaker and content representations with instance normalization (IN). Objective and subjective evaluations show that our model is able to generate a voice similar to the target speaker. In addition to the performance measurement, we also demonstrate that this model is able to learn meaningful speaker representations without any supervision.

Submitted 22 August, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: Interspeech 2019

arXiv:1904.04990 [pdf, other]

Identifying Sub-Phenotypes of Acute Kidney Injury using Structured and Unstructured Electronic Health Record Data with Memory Networks

Authors: Zhenxing Xu, Jingyuan Chou, Xi Sheryl Zhang, Yuan Luo, Tamara Isakova, Prakash Adekkanattu, Jessica S. Ancker, Guoqian Jiang, Richard C. Kiefer, Jennifer A. Pacheco, Luke V. Rasmussen, Jyotishman Pathak, Fei Wang

Abstract: Acute Kidney Injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable in-time interventions and treatments. However, AKI is highly heterogeneous, thus identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I has an average age of $63.03 \pm 17.25$ years and is characterized by mild loss of kidney excretory function (Serum Creatinine (SCr) $1.55 \pm 0.34$ mg/dL, estimated Glomerular Filtration Rate Test (eGFR) $107.65 \pm 54.98$ mL/min/1.73 m$^2$). These patients are more likely to develop stage I AKI. Sub-phenotype II has an average age of $66.81 \pm 10.43$ years and is characterized by severe loss of kidney excretory function (SCr $1.96 \pm 0.49$ mg/dL, eGFR $82.19 \pm 55.92$ mL/min/1.73 m$^2$). These patients are more likely to develop stage III AKI.
Sub-phenotype III has an average age of $65.07 \pm 11.32$ years and is characterized by moderate loss of kidney excretory function; these patients are thus more likely to develop stage II AKI (SCr $1.69 \pm 0.32$ mg/dL, eGFR $93.97 \pm 56.53$ mL/min/1.73 m$^2$). Both SCr and eGFR are significantly different across the three sub-phenotypes under statistical testing with post hoc analysis, and the conclusion still holds after age adjustment.

Submitted 22 December, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

arXiv:1902.05193 [pdf, other]

doi 10.46298/lmcs-17(4:22)2021

An application of parallel cut elimination in multiplicative linear logic to the Taylor expansion of proof nets

Authors: Jules Chouquet, Lionel Vaux Auclair

Abstract: We examine some combinatorial properties of parallel cut elimination in multiplicative linear logic (MLL) proof nets. We show that, provided we impose a constraint on some paths, we can bound the size of all the nets satisfying this constraint and reducing to a fixed resultant net. This result gives a sufficient condition for an infinite weighted sum of nets to reduce into another sum of nets, while keeping coefficients finite. We moreover show that our constraints are stable under reduction. Our approach is motivated by the quantitative semantics of linear logic: many models have been proposed, whose structure reflects the Taylor expansion of multiplicative exponential linear logic (MELL) proof nets into infinite sums of differential nets. In order to simulate one cut elimination step in MELL, it is necessary to reduce an arbitrary number of cuts in the differential nets of its Taylor expansion. It turns out our results apply to differential nets, because their cut elimination is essentially multiplicative. We moreover show that the set of differential nets that occur in the Taylor expansion of an MELL net automatically satisfies our constraints. Interestingly, our nets are untyped: we only rely on the sequentiality of linear logic nets and the dynamics of cut elimination. The paths on which we impose bounds are the switching paths involved in the Danos-Regnier criterion for sequentiality. In order to accommodate multiplicative units and weakenings, our nets come equipped with jumps: each weakening node is connected to some other node. Our constraint can then be summed up as a bound on both the length of switching paths and the number of weakenings that jump to a common node.

Submitted 17 December, 2021; v1 submitted 13 February, 2019; originally announced February 2019.

Journal ref: Logical Methods in Computer Science, Volume 17, Issue 4 (December 20, 2021) lmcs:5196

arXiv:1810.07259 [pdf, ps, other]

Nearly Optimal Space Efficient Algorithm for Depth First Search

Authors: Jayesh Choudhari, Manoj Gupta, Shivdutt Sharma

Abstract: We design a space-efficient algorithm for performing depth-first search (DFS) traversal of a graph in $O(m+n\log^* n)$ time using $O(n)$ bits of space. While a normal DFS algorithm results in a DFS-tree (in case the graph is connected), our space bounds do not permit us even to store such a tree. However, our algorithm correctly outputs all edges of the DFS-tree. The previous best algorithm (which used $O(n)$ working space) took $O(m \log n)$ time (Asano, Izumi, Kiyomi, Konagaya, Ono, Otachi, Schweitzer, Tarui, Uehara (ISAAC 2014) and Elmasry, Hagerup, Krammer (STACS 2015)). The main open question left behind in this area was to design a faster algorithm for DFS using $O(n)$ bits of space. Our algorithm answers this open question as it has a nearly optimal running time (as the DFS takes $O(m+n)$ time even if there is no space restriction).

Submitted 16 October, 2018; originally announced October 2018.
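
For context on the unrestricted-space baseline mentioned in the abstract above: a conventional DFS runs in $O(m+n)$ time but keeps a visited set and an explicit stack of vertex identifiers, which is far more than $O(n)$ bits. The Python sketch below is that textbook baseline, emitting the DFS-tree edges as it goes. It is not the authors' space-efficient algorithm; the function name `dfs_tree_edges` and the adjacency-list input format are assumptions made purely for illustration.

```python
from typing import Dict, Hashable, Iterator, List, Tuple

Vertex = Hashable


def dfs_tree_edges(
    adj: Dict[Vertex, List[Vertex]], root: Vertex
) -> Iterator[Tuple[Vertex, Vertex]]:
    """Yield the tree edges of a DFS from `root` (textbook baseline).

    Assumption: `adj` maps each vertex to a list of its neighbours.
    The explicit stack and visited set are exactly the state that a
    space-restricted algorithm cannot afford to keep in this form.
    """
    visited = {root}
    # Each stack entry pairs a vertex with an iterator over its neighbours,
    # so every adjacency list is scanned once in total.
    stack = [(root, iter(adj.get(root, [])))]
    while stack:
        node, neighbours = stack[-1]
        for nxt in neighbours:
            if nxt not in visited:
                visited.add(nxt)
                yield (node, nxt)  # (parent, child) edge of the DFS-tree
                stack.append((nxt, iter(adj.get(nxt, []))))
                break
        else:
            # All neighbours of `node` have been explored: backtrack.
            stack.pop()


if __name__ == "__main__":
    graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
    print(list(dfs_tree_edges(graph, 1)))
```

Emitting edges as a stream, rather than building the tree, mirrors the abstract's remark that the tree itself need not be stored for its edges to be output.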

arXiv:1809.06018 [pdf, other]

Integrative Analysis of Patient Health Records and Neuroimages via Memory-based Graph Convolutional Network

Authors: Xi Sheryl Zhang, Jingyuan Chou, Fei Wang

Abstract: With the arrival of the big data era, more and more data are becoming readily available in various real-world applications and those data are usually highly heterogeneous. Taking computational medicine as an example, we have both Electronic Health Records (EHR) and medical images for each patient. For complicated diseases such as Parkinson's and Alzheimer's, both EHR and neuroimaging information are very important for disease understanding because they contain complementary aspects of the disease. However, EHR and neuroimages are completely different. So far, existing research has mainly focused on one of them. In this paper, we propose a framework, Memory-Based Graph Convolution Network (MemGCN), to perform integrative analysis with such multi-modal data. Specifically, GCN is used to extract useful information from the patients' neuroimages. The information contained in the patient EHRs before the acquisition of each brain image is captured by a memory network because of its sequential nature. The information contained in each brain image is combined with the information read out from the memory network to infer the disease state at the image acquisition timestamp. To further enhance the analytical power of MemGCN, we also designed a multi-hop strategy that allows multiple reads and updates of the memory to be performed at each iteration. We conduct experiments using the patient data from the Parkinson's Progression Markers Initiative (PPMI) with the task of classification of Parkinson's Disease (PD) cases versus controls. We demonstrate that superior classification performance can be achieved with our proposed framework, compared with existing approaches involving a single type of data.

Submitted 7 May, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

arXiv:1809.04487 [pdf, other]

Discovering Topical Interactions in Text-based Cascades using Hidden Markov Hawkes Processes

Authors: Srikanta Bedathur, Indrajit Bhattacharya, Jayesh Choudhari, Anirban Dasgupta

Abstract: Social media conversations unfold based on complex interactions between users, topics and time. While recent models have been proposed to capture network strengths between users, users' topical preferences and temporal patterns between posting and response times, interaction patterns between topics have not been studied. We propose the Hidden Markov Hawkes Process (HMHP) that incorporates topical Markov Chains within Hawkes processes to jointly model topical interactions along with user-user and user-topic patterns. We propose a Gibbs sampling algorithm for HMHP that jointly infers the network strengths, diffusion paths, the topics of the posts as well as the topic-topic interactions. We show using experiments on real and semi-synthetic data that HMHP is able to generalize better and recover the network strengths, topics and diffusion paths more accurately than state-of-the-art baselines. More interestingly, HMHP finds insightful interactions between topics in real tweets which no existing model is able to do.

Submitted 12 September, 2018; originally announced September 2018.

Comments: Accepted as a short paper at ICDM-2018

arXiv:1808.03113 [pdf, other]

Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-GAN over Phoneme Posteriorgram Sequences

Authors: Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-yi Lee, Lin-shan Lee

Abstract: Speaking rate refers to the average number of phonemes within some unit time, while the rhythmic patterns refer to duration distributions for realizations of different phonemes within different phonetic structures. Both are key components of prosody in speech, which is different for different speakers. Models like cycle-consistent adversarial network (Cycle-GAN) and variational auto-encoder (VAE) have been successfully applied to voice conversion tasks without parallel data. However, due to the neural network architectures and feature vectors chosen for these approaches, the length of the predicted utterance has to be fixed to that of the input utterance, which limits the flexibility in mimicking the speaking rates and rhythmic patterns for the target speaker. On the other hand, sequence-to-sequence learning models have been used to remove the above length constraint, but parallel training data are needed. In this paper, we propose an approach utilizing a sequence-to-sequence model trained with unsupervised Cycle-GAN to perform the transformation between the phoneme posteriorgram sequences for different speakers. In this way, the length constraint mentioned above is removed to offer rhythm-flexible voice conversion without requiring parallel data. Preliminary evaluation on two datasets showed very encouraging results.

Submitted 9 August, 2018; originally announced August 2018.

Comments: 8 pages, 6 figures, Submitted to SLT 2018

arXiv:1804.02812 [pdf, other]

Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations

Authors: Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee

Abstract: Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker. In this paper, we propose an adversarial learning framework for voice conversion, with which a single model can be trained to convert the voice to many different speakers, all without parallel data, by separating the speaker characteristics from the linguistic content in speech signals. An autoencoder is first trained to extract speaker-independent latent representations and speaker embedding separately using another auxiliary speaker classifier to regularize the latent representation. The decoder then takes the speaker-independent latent representation and the target speaker embedding as the input to generate the voice of the target speaker with the linguistic content of the source utterance. The quality of decoder output is further improved by patching with the residual signal produced by another pair of generator and discriminator. A target speaker set size of 20 was tested in the preliminary experiments, and very good voice quality was obtained. Conventional voice conversion metrics are reported. We also show that the speaker information has been properly reduced from the latent representations.

Submitted 24 June, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: Accepted to Interspeech 2018

arXiv:1710.02633 [pdf, other]

doi 10.1109/SMC.2017.8123136

Radiation Pattern Synthesis Using Hybrid Fourier-Woodward-Lawson-Neural Networks for Reliable MIMO Antenna Systems

Authors: Elies Ghayoula, Ridha Ghayoula, Jaouhar Fattahi, Emil Pricop, Jean-Yves Chouinard, Ammar Bouallegue

Abstract: In this paper, we implement a hybrid of Woodward-Lawson-Neural Networks and the weighted Fourier method to synthesize antenna arrays. The neural network (NN) is applied here to simplify the modeling of MIMO antenna arrays by assessing phases. The main problem is to find optimal weights of the linear antenna array elements that give a radiation pattern with minimum sidelobe level (SLL) and hence improve the antenna array performance. To this end, an antenna array for reliable Multiple-Input Multiple-Output (MIMO) applications operating at 2.45 GHz is implemented. To validate the suggested method, many examples of uniformly excited array patterns are given with the main beam placed in the direction of the useful signal. The Woodward-Lawson-Neural Networks synthesis method makes it possible to derive analytical equations for the synthesis of an antenna array and highlights the flexibility between the system's input and output parameters. The performance of this hybrid optimization shows how well the system is suited to wireless communication and how it helps reduce interference.

Submitted 7 October, 2017; originally announced October 2017.

Comments: Accepted at the IEEE SMC 2017

arXiv:1705.10923 [pdf, other]

Saving Critical Nodes with Firefighters is FPT

Authors: Jayesh Choudhari, Anirban Dasgupta, Neeldhara Misra, M. S. Ramanujan

Abstract: We consider the problem of firefighting to save a critical subset of nodes. The firefighting game is a turn-based game played on a graph, where the fire spreads to vertices in a breadth-first manner from a source, and firefighters can be placed on yet unburnt vertices on alternate rounds to block the fire. In this work, we consider the problem of saving a critical subset of nodes from catching fire, given a total budget on the number of firefighters. We show that the problem is para-NP-hard when parameterized by the size of the critical set. We also show that it is fixed-parameter tractable on general graphs when parameterized by the number of firefighters. We also demonstrate improved running times on trees and establish that the problem is unlikely to admit a polynomial kernelization (even when restricted to trees). Our work is the first to exploit the connection between the firefighting problem and the notions of important separators and tight separator sequences. Finally, we consider the spreading model of the firefighting game, a closely related problem, and show that the problem of saving a critical set parameterized by the number of firefighters is W[2]-hard, which contrasts with our FPT result for the non-spreading model.

Submitted 30 May, 2017; originally announced May 2017.

Comments: 21 pages, Accepted at ICALP-2017
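
The firefighting game described in the abstract above can be simulated directly. The Python sketch below assumes one common formulation of the non-spreading model: the fire starts at a source, and in each round at most one firefighter is placed on an unburnt vertex before the fire spreads to every unprotected neighbour of the vertices that caught fire in the previous round. The function name `critical_set_saved`, the one-firefighter-per-round schedule, and the input format are illustrative assumptions; this only checks a given strategy and is not the authors' FPT algorithm.

```python
from typing import Dict, Hashable, Iterable, List, Set

Vertex = Hashable


def critical_set_saved(
    adj: Dict[Vertex, List[Vertex]],
    source: Vertex,
    placements: List[Vertex],
    critical: Iterable[Vertex],
) -> bool:
    """Return True if no critical vertex burns under the given strategy.

    Assumptions (hypothetical model details, not from the paper):
    - `placements[i]` is the vertex protected in round i (one per round);
    - a protected vertex never burns and never passes the fire on.
    """
    burning: Set[Vertex] = {source}
    protected: Set[Vertex] = set()
    frontier: Set[Vertex] = {source}  # vertices that caught fire last round
    round_no = 0
    while frontier:
        # Firefighter's turn: protect one unburnt vertex, if any are left.
        if round_no < len(placements):
            v = placements[round_no]
            if v in burning or v in protected:
                raise ValueError(f"cannot place a firefighter on {v!r}")
            protected.add(v)
        round_no += 1
        # Fire's turn: breadth-first spread by one level.
        frontier = {
            w
            for u in frontier
            for w in adj.get(u, [])
            if w not in burning and w not in protected
        }
        burning |= frontier
    return burning.isdisjoint(set(critical))


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(critical_set_saved(path, 0, [2], {3}))
```

Under these assumptions the number of firefighters used is `len(placements)`, which corresponds to the budget parameter mentioned in the abstract.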

arXiv:1703.00797 [pdf]

A Simple, Fast and Fully Automated Approach for Midline Shift Measurement on Brain Computed Tomography

Authors: Huan-Chih Wang, Shih-Hao Ho, Furen Xiao, Jen-Hai Chou

Abstract: Brain CT has become a standard imaging tool for emergent evaluation of brain condition, and measurement of midline shift (MLS) is one of the most important features to address for brain CT assessment. We present a simple method to estimate MLS and propose a new alternative parameter to MLS: the ratio of MLS over the maximal width of intracranial region (MLS/ICWMAX). Three neurosurgeons and our automated system were asked to measure MLS and MLS/ICWMAX in the same sets of axial CT images obtained from 41 patients admitted to ICU under neurosurgical service. A weighted midline (WML) was plotted based on individual pixel intensities, with higher weight given to the darker portions. The MLS could then be measured as the distance between the WML and ideal midline (IML) near the foramen of Monro. The average processing time to output an automatic MLS measurement was around 10 seconds. Our automated system achieved an overall accuracy of 90.24% when the CT images were calibrated automatically, and performed better when the calibrations of head rotation were done manually (accuracy: 92.68%). MLS/ICWMAX and MLS both yielded the same confusion matrices and produced similar ROC curve results. We demonstrated a simple, fast and accurate automated system of MLS measurement and introduced a new parameter (MLS/ICWMAX) as a good alternative to MLS in terms of estimating the degree of brain deformation, especially when non-DICOM images (e.g. JPEG) are more easily accessed.

Submitted 2 March, 2017; originally announced March 2017.

arXiv:1007.0060 [pdf]

Cryptanalysis on Four Two-Party Authentication Protocols

Authors: Yalin Chen, Jue-Sam Chou*, Chun-Hui Huang

Abstract: In this paper, we analyze four authentication protocols of Bindu et al., Goriparthi et al., Wang et al. and Hölbl et al. After investigation, we reveal several weaknesses of these schemes. First, Bindu et al.'s protocol suffers from an insider impersonation attack if a malicious user obtains a lost smart card. Second, both Goriparthi et al.'s and Wang et al.'s protocols cannot withstand a DoS attack in the password change phase, i.e. an attacker can involve the phase to make the user's password unusable in subsequent authentications. Third, Hölbl et al.'s protocol is vulnerable to an insider attack since a legal but malevolent user can deduce KGC's secret key.

Submitted 30 June, 2010; originally announced July 2010.

Comments: 5 pages

MSC Class: 68P25 (data encryption)

Journal ref: (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 2, 2010

arXiv:1007.0057 [pdf]

Comments on Five Smart Card Based Password Authentication Protocols

Authors: Yalin Chen, Jue-Sam Chou*, Chun-Hui Huang

Abstract: In this paper, we use the ten security requirements proposed by Liao et al. for a smart card based authentication protocol to examine five recent works in this area. After analysis, we found that the protocols of Juang et al., Hsiang et al., Kim et al., and Li et al. all suffer from an offline password guessing attack if the smart card is lost, and the protocol of Xu et al. is subject to an insider impersonation attack.

Submitted 30 June, 2010; originally announced July 2010.

Comments: 4 pages

Journal ref: (IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 2, 2010

arXiv:0710.4681 [pdf]

A Quality-of-Service Mechanism for Interconnection Networks in System-on-Chips

Authors: Wolf-Dietrich Weber, Joe Chou, Ian Swarbrick, Drew Wingard

Abstract: As Moore's Law continues to fuel the ability to build ever increasingly complex system-on-chips (SoCs), achieving performance goals is rising as a critical challenge to completing designs. In particular, the system interconnect must efficiently service a diverse set of data flows with widely ranging quality-of-service (QoS) requirements. However, the known solutions for off-chip interconnects such as large-scale networks are not necessarily applicable to the on-chip environment. Latency and memory constraints for on-chip interconnects are quite different from larger-scale interconnects. This paper introduces a novel on-chip interconnect arbitration scheme. We show how this scheme can be distributed across a chip for high-speed implementation. We compare the performance of the arbitration scheme with other known interconnect arbitration schemes. Existing schemes typically focus heavily on either low latency of service for some initiators, or alternatively on guaranteed bandwidth delivery for other initiators. Our scheme allows service latency on some initiators to be traded off smoothly against jitter bounds on other initiators, while still delivering bandwidth guarantees. This scheme is a subset of the QoS controls that are available in the SonicsMX (SMX) product.

Submitted 25 October, 2007; originally announced October 2007.

Comments: Submitted on behalf of EDAA (http://www.edaa.com/)

Journal ref: In Design, Automation and Test in Europe - DATE'05, Munich, Germany (2005)

Showing 1–47 of 47 results for author: Chou*, J