Search | arXiv e-print repository

Lane-Wise Highway Anomaly Detection

Authors: Mei Qiu, William Lorenz Reindl, Yaobin Chen, Stanley Chien, Shu Hu

Abstract: This paper proposes a scalable and interpretable framework for lane-wise highway traffic anomaly detection, leveraging multi-modal time series data extracted from surveillance cameras. Unlike traditional sensor-dependent methods, our approach uses AI-powered vision models to extract lane-specific features, including vehicle count, occupancy, and truck percentage, without relying on costly hardware… ▽ More This paper proposes a scalable and interpretable framework for lane-wise highway traffic anomaly detection, leveraging multi-modal time series data extracted from surveillance cameras. Unlike traditional sensor-dependent methods, our approach uses AI-powered vision models to extract lane-specific features, including vehicle count, occupancy, and truck percentage, without relying on costly hardware or complex road modeling. We introduce a novel dataset containing 73,139 lane-wise samples, annotated with four classes of expert-validated anomalies: three traffic-related anomalies (lane blockage and recovery, foreign object intrusion, and sustained congestion) and one sensor-related anomaly (camera angle shift). Our multi-branch detection system integrates deep learning, rule-based logic, and machine learning to improve robustness and precision. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods in precision, recall, and F1-score, providing a cost-effective and scalable solution for real-world intelligent transportation systems. △ Less

Submitted 5 May, 2025; originally announced May 2025.

arXiv:2502.14803 [pdf, other]

Planning, scheduling, and execution on the Moon: the CADRE technology demonstration mission

Authors: Gregg Rabideau, Joseph Russino, Andrew Branch, Nihal Dhamani, Tiago Stegun Vaquero, Steve Chien, Jean-Pierre de la Croix, Federico Rossi

Abstract: NASA's Cooperative Autonomous Distributed Robotic Exploration (CADRE) mission, slated for flight to the Moon's Reiner Gamma region in 2025/2026, is designed to demonstrate multi-agent autonomous exploration of the Lunar surface and sub-surface. A team of three robots and a base station will autonomously explore a region near the lander, collecting the data required for 3D reconstruction of the sur… ▽ More NASA's Cooperative Autonomous Distributed Robotic Exploration (CADRE) mission, slated for flight to the Moon's Reiner Gamma region in 2025/2026, is designed to demonstrate multi-agent autonomous exploration of the Lunar surface and sub-surface. A team of three robots and a base station will autonomously explore a region near the lander, collecting the data required for 3D reconstruction of the surface with no human input; and then autonomously perform distributed sensing with multi-static ground penetrating radars (GPR), driving in formation while performing coordinated radar soundings to create a map of the subsurface. At the core of CADRE's software architecture is a novel autonomous, distributed planning, scheduling, and execution (PS&E) system. The system coordinates the robots' activities, planning and executing tasks that require multiple robots' participation while ensuring that each individual robot's thermal and power resources stay within prescribed bounds, and respecting ground-prescribed sleep-wake cycles. The system uses a centralized-planning, distributed-execution paradigm, and a leader election mechanism ensures robustness to failures of individual agents. In this paper, we describe the architecture of CADRE's PS&E system; discuss its design rationale; and report on verification and validation (V&V) testing of the system on CADRE's hardware in preparation for deployment on the Moon. △ Less

Submitted 20 February, 2025; originally announced February 2025.

Comments: To be presented at AAMAS 2025

arXiv:2501.13375 [pdf, other]

Bridging The Multi-Modality Gaps of Audio, Visual and Linguistic for Speech Enhancement

Authors: Meng-Ping Lin, Jen-Cheng Hou, Chia-Wei Chen, Shao-Yi Chien, Jun-Cheng Chen, Xugang Lu, Yu Tsao

Abstract: Speech enhancement (SE) aims to improve the quality and intelligibility of speech in noisy environments. Recent studies have shown that incorporating visual cues in audio signal processing can enhance SE performance. Given that human speech communication naturally involves audio, visual, and linguistic modalities, it is reasonable to expect additional improvements by integrating linguistic informa… ▽ More Speech enhancement (SE) aims to improve the quality and intelligibility of speech in noisy environments. Recent studies have shown that incorporating visual cues in audio signal processing can enhance SE performance. Given that human speech communication naturally involves audio, visual, and linguistic modalities, it is reasonable to expect additional improvements by integrating linguistic information. However, effectively bridging these modality gaps, particularly during knowledge transfer remains a significant challenge. In this paper, we propose a novel multi-modal learning framework, termed DLAV-SE, which leverages a diffusion-based model integrating audio, visual, and linguistic information for audio-visual speech enhancement (AVSE). Within this framework, the linguistic modality is modeled using a pretrained language model (PLM), which transfers linguistic knowledge to the audio-visual domain through a cross-modal knowledge transfer (CMKT) mechanism during training. After training, the PLM is no longer required at inference, as its knowledge is embedded into the AVSE model through the CMKT process. We conduct a series of SE experiments to evaluate the effectiveness of our approach. Results show that the proposed DLAV-SE system significantly improves speech quality and reduces generative artifacts, such as phonetic confusion, compared to state-of-the-art (SOTA) methods. Furthermore, visualization analyses confirm that the CMKT method enhances the generation quality of the AVSE outputs. These findings highlight both the promise of diffusion-based methods for advancing AVSE and the value of incorporating linguistic information to further improve system performance. △ Less

Submitted 26 May, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

arXiv:2411.06297 [pdf, other]

Adaptive Aspect Ratios with Patch-Mixup-ViT-based Vehicle ReID

Authors: Mei Qiu, Lauren Ann Christopher, Stanley Chien, Lingxi Li

Abstract: Vision Transformers (ViTs) have shown exceptional performance in vehicle re-identification (ReID) tasks. However, non-square aspect ratios of image or video inputs can negatively impact re-identification accuracy. To address this challenge, we propose a novel, human perception driven, and general ViT-based ReID framework that fuses models trained on various aspect ratios. Our key contributions are… ▽ More Vision Transformers (ViTs) have shown exceptional performance in vehicle re-identification (ReID) tasks. However, non-square aspect ratios of image or video inputs can negatively impact re-identification accuracy. To address this challenge, we propose a novel, human perception driven, and general ViT-based ReID framework that fuses models trained on various aspect ratios. Our key contributions are threefold: (i) We analyze the impact of aspect ratios on performance using the VeRi-776 and VehicleID datasets, providing guidance for input settings based on the distribution of original image aspect ratios. (ii) We introduce patch-wise mixup strategy during ViT patchification (guided by spatial attention scores) and implement uneven stride for better alignment with object aspect ratios. (iii) We propose a dynamic feature fusion ReID network to enhance model robustness. Our method outperforms state-of-the-art transformer-based approaches on both datasets, with only a minimal increase in inference time per image. △ Less

Submitted 9 November, 2024; originally announced November 2024.

arXiv:2409.14430 [pdf, other]

Pomo3D: 3D-Aware Portrait Accessorizing and More

Authors: Tzu-Chieh Liu, Chih-Ting Liu, Shao-Yi Chien

Abstract: We propose Pomo3D, a 3D portrait manipulation framework that allows free accessorizing by decomposing and recomposing portraits and accessories. It enables the avatars to attain out-of-distribution (OOD) appearances of simultaneously wearing multiple accessories. Existing methods still struggle to offer such explicit and fine-grained editing; they either fail to generate additional objects on give… ▽ More We propose Pomo3D, a 3D portrait manipulation framework that allows free accessorizing by decomposing and recomposing portraits and accessories. It enables the avatars to attain out-of-distribution (OOD) appearances of simultaneously wearing multiple accessories. Existing methods still struggle to offer such explicit and fine-grained editing; they either fail to generate additional objects on given portraits or cause alterations to portraits (e.g., identity shift) when generating accessories. This restriction presents a noteworthy obstacle as people typically seek to create charming appearances with diverse and fashionable accessories in the virtual universe. Our approach provides an effective solution to this less-addressed issue. We further introduce the Scribble2Accessories module, enabling Pomo3D to create 3D accessories from user-drawn accessory scribble maps. Moreover, we design a bias-conscious mapper to mitigate biased associations present in real-world datasets. In addition to object-level manipulation above, Pomo3D also offers extensive editing options on portraits, including global or local editing of geometry and texture and avatar stylization, elevating 3D editing of neural portraits to a more comprehensive level. △ Less

Submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.13953 [pdf, other]

Training Large ASR Encoders with Differential Privacy

Authors: Geeticka Chauhan, Steve Chien, Om Thakkar, Abhradeep Thakurta, Arun Narayanan

Abstract: Self-supervised learning (SSL) methods for large speech models have proven to be highly effective at ASR. With the interest in public deployment of large pre-trained models, there is a rising concern for unintended memorization and leakage of sensitive data points from the training data. In this paper, we apply differentially private (DP) pre-training to a SOTA Conformer-based encoder, and study i… ▽ More Self-supervised learning (SSL) methods for large speech models have proven to be highly effective at ASR. With the interest in public deployment of large pre-trained models, there is a rising concern for unintended memorization and leakage of sensitive data points from the training data. In this paper, we apply differentially private (DP) pre-training to a SOTA Conformer-based encoder, and study its performance on a downstream ASR task assuming the fine-tuning data is public. This paper is the first to apply DP to SSL for ASR, investigating the DP noise tolerance of the BEST-RQ pre-training method. Notably, we introduce a novel variant of model pruning called gradient-based layer freezing that provides strong improvements in privacy-utility-compute trade-offs. Our approach yields a LibriSpeech test-clean/other WER (%) of 3.78/ 8.41 with ($10$, 1e^-9)-DP for extrapolation towards low dataset scales, and 2.81/ 5.89 with (10, 7.9e^-11)-DP for extrapolation towards high scales. △ Less

Submitted 20 September, 2024; originally announced September 2024.

Comments: In proceedings of the IEEE Spoken Language Technologies Workshop, 2024

arXiv:2407.09966 [pdf, other]

Optimizing ROI Benefits Vehicle ReID in ITS

Authors: Mei Qiu, Lauren Ann Christopher, Lingxi Li, Stanley Chien, Yaobin Chen

Abstract: Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where the effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidenc… ▽ More Vehicle re-identification (ReID) is a computer vision task that matches the same vehicle across different cameras or viewpoints in a surveillance system. This is crucial for Intelligent Transportation Systems (ITS), where the effectiveness is influenced by the regions from which vehicle images are cropped. This study explores whether optimal vehicle detection regions, guided by detection confidence scores, can enhance feature matching and ReID tasks. Using our framework with multiple Regions of Interest (ROIs) and lane-wise vehicle counts, we employed YOLOv8 for detection and DeepSORT for tracking across twelve Indiana Highway videos, including two pairs of videos from non-overlapping cameras. Tracked vehicle images were cropped from inside and outside the ROIs at five-frame intervals. Features were extracted using pre-trained models: ResNet50, ResNeXt50, Vision Transformer, and Swin-Transformer. Feature consistency was assessed through cosine similarity, information entropy, and clustering variance. Results showed that features from images cropped inside ROIs had higher mean cosine similarity values compared to those involving one image inside and one outside the ROIs. The most significant difference was observed during night conditions (0.7842 inside vs. 0.5 outside the ROI with Swin-Transformer) and in cross-camera scenarios (0.75 inside-inside vs. 0.52 inside-outside the ROI with Vision Transformer). Information entropy and clustering variance further supported that features in ROIs are more consistent. These findings suggest that strategically selected ROIs can enhance tracking performance and ReID accuracy in ITS. △ Less

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.04688 [pdf, other]

Enhancing Vehicle Re-identification and Matching for Weaving Analysis

Authors: Mei Qiu, Wei Lin, Stanley Chien, Lauren Christopher, Yaobin Chen, Shu Hu

Abstract: Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantit… ▽ More Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantitative insights into lane-specific weaving behaviors. Our experimental results confirm the efficacy of this approach, delivering critical data that can assist transportation authorities in enhancing traffic control and roadway infrastructure. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.15686 [pdf, other]

Transport-Level Encryption in Datacenter Networks

Authors: Tianyi Gao, Xinshu Ma, Suhas Narreddy, Eugenio Luo, Steven W. D. Chien, Michio Honda

Abstract: Cloud applications need network data encryption to isolate from other tenants and protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDT, a protocol design for emerging datacenter transport protocols to integrate data encryption while using existing NIC offloading designed for TLS over TCP. Therefore, SDT could enable a deployment path of new transpo… ▽ More Cloud applications need network data encryption to isolate from other tenants and protect their data from potential eavesdroppers in the network infrastructure. This paper presents SDT, a protocol design for emerging datacenter transport protocols to integrate data encryption while using existing NIC offloading designed for TLS over TCP. Therefore, SDT could enable a deployment path of new transport protocols in data-centers without giving up hardware offloading. △ Less

Submitted 24 September, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2404.15212 [pdf, other]

Real-time Lane-wise Traffic Monitoring in Optimal ROIs

Authors: Mei Qiu, Wei Lin, Lauren Ann Christopher, Stanley Chien, Yaobin Chen, Shu Hu

Abstract: In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automa… ▽ More In the US, thousands of Pan, Tilt, and Zoom (PTZ) traffic cameras monitor highway conditions. There is a great interest in using these highway cameras to gather valuable road traffic data to support traffic analysis and decision-making for highway safety and efficient traffic management. However, there are too many cameras for a few human traffic operators to effectively monitor, so a fully automated solution is desired. This paper introduces a novel system that learns the locations of highway lanes and traffic directions from these camera feeds automatically. It collects real-time, lane-specific traffic data continuously, even adjusting for changes in camera angle or zoom. This facilitates efficient traffic analysis, decision-making, and improved highway safety. △ Less

Submitted 28 March, 2024; originally announced April 2024.

arXiv:2404.06135 [pdf, other]

Efficient Concertormer for Image Deblurring and Beyond

Authors: Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

Abstract: The Transformer architecture has achieved remarkable success in natural language processing and high-level vision tasks over the past few years. However, the inherent complexity of self-attention is quadratic to the size of the image, leading to unaffordable computational costs for high-resolution vision tasks. In this paper, we introduce Concertormer, featuring a novel Concerto Self-Attention (CS… ▽ More The Transformer architecture has achieved remarkable success in natural language processing and high-level vision tasks over the past few years. However, the inherent complexity of self-attention is quadratic to the size of the image, leading to unaffordable computational costs for high-resolution vision tasks. In this paper, we introduce Concertormer, featuring a novel Concerto Self-Attention (CSA) mechanism designed for image deblurring. The proposed CSA divides self-attention into two distinct components: one emphasizes generally global and another concentrates on specifically local correspondence. By retaining partial information in additional dimensions independent from the self-attention calculations, our method effectively captures global contextual representations with complexity linear to the image size. To effectively leverage the additional dimensions, we present a Cross-Dimensional Communication module, which linearly combines attention maps and thus enhances expressiveness. Moreover, we amalgamate the two-staged Transformer design into a single stage using the proposed gated-dconv MLP architecture. While our primary objective is single-image motion deblurring, extensive quantitative and qualitative evaluations demonstrate that our approach performs favorably against the state-of-the-art methods in other tasks, such as deraining and deblurring with JPEG artifacts. The source codes and trained models will be made available to the public. △ Less

Submitted 3 December, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2401.14576 [pdf]

iFast: Host-Side Logging for Scientific Applications

Authors: Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, Stefano Markidis, Michio Honda

Abstract: We have seen an increase in the heterogeneity of storage technologies potentially available to scientific applications, such as burst buffers, managed cloud parallel file systems (PFS), and object stores. However, those applications cannot easily utilize those technologies, because they are designed for traditional HPC systems that offer very high remote storage and network bandwidth. We present i… ▽ More We have seen an increase in the heterogeneity of storage technologies potentially available to scientific applications, such as burst buffers, managed cloud parallel file systems (PFS), and object stores. However, those applications cannot easily utilize those technologies, because they are designed for traditional HPC systems that offer very high remote storage and network bandwidth. We present iFast, a new distributed host-side logging approach to transparently accelerating scientific applications. iFast has a strong emphasis on deployability, supporting unmodified MPI applications with unmodified MPI implementations while preserving the crash consistency semantics. We evaluate iFast on traditional HPC, cloud HPC, local cluster, and a hybrid of both, using three scientific applications. iFast reduces end-to-end execution time by 13-26% for popular scientific applications on the cloud. We show for the first time, how an application on a recent production HPC system can write data to S3 storage through fully fledged MPI-IO, in a readily shareable format. △ Less

Submitted 2 August, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Submitted to VLDB 2025

arXiv:2309.07173 [pdf, other]

Using Unsupervised and Supervised Learning and Digital Twin for Deep Convective Ice Storm Classification

Authors: Jason Swope, Steve Chien, Emily Dunkel, Xavier Bosch-Lluis, Qing Yue, William Deal

Abstract: Smart Ice Cloud Sensing (SMICES) is a small-sat concept in which a primary radar intelligently targets ice storms based on information collected by a lookahead radiometer. Critical to the intelligent targeting is accurate identification of storm/cloud types from eight bands of radiance collected by the radiometer. The cloud types of interest are: clear sky, thin cirrus, cirrus, rainy anvil, and co… ▽ More Smart Ice Cloud Sensing (SMICES) is a small-sat concept in which a primary radar intelligently targets ice storms based on information collected by a lookahead radiometer. Critical to the intelligent targeting is accurate identification of storm/cloud types from eight bands of radiance collected by the radiometer. The cloud types of interest are: clear sky, thin cirrus, cirrus, rainy anvil, and convection core. We describe multi-step use of Machine Learning and Digital Twin of the Earth's atmosphere to derive such a classifier. First, a digital twin of Earth's atmosphere called a Weather Research Forecast (WRF) is used generate simulated lookahead radiometer data as well as deeper "science" hidden variables. The datasets simulate a tropical region over the Caribbean and a non-tropical region over the Atlantic coast of the United States. A K-means clustering over the scientific hidden variables was utilized by human experts to generate an automatic labelling of the data - mapping each physical data point to cloud types by scientists informed by mean/centroids of hidden variables of the clusters. Next, classifiers were trained with the inputs of the simulated radiometer data and its corresponding label. The classifiers of a random decision forest (RDF), support vector machine (SVM), Gaussian naïve bayes, feed forward artificial neural network (ANN), and a convolutional neural network (CNN) were trained. Over the tropical dataset, the best performing classifier was able to identify non-storm and storm clouds with over 80% accuracy in each class for a held-out test set. Over the non-tropical dataset, the best performing classifier was able to classify non-storm clouds with over 90% accuracy and storm clouds with over 40% accuracy. Additionally both sets of classifiers were shown to be resilient to instrument noise. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.07717 [pdf, other]

Real-time Automatic M-mode Echocardiography Measurement with Panel Attention from Local-to-Global Pixels

Authors: Ching-Hsun Tseng, Shao-Ju Chien, Po-Shen Wang, Shin-Jye Lee, Wei-Huan Hu, Bin Pu, Xiao-jun Zeng

Abstract: Motion mode (M-mode) recording is an essential part of echocardiography to measure cardiac dimension and function. However, the current diagnosis cannot build an automatic scheme, as there are three fundamental obstructs: Firstly, there is no open dataset available to build the automation for ensuring constant results and bridging M-mode echocardiography with real-time instance segmentation (RIS);… ▽ More Motion mode (M-mode) recording is an essential part of echocardiography to measure cardiac dimension and function. However, the current diagnosis cannot build an automatic scheme, as there are three fundamental obstructs: Firstly, there is no open dataset available to build the automation for ensuring constant results and bridging M-mode echocardiography with real-time instance segmentation (RIS); Secondly, the examination is involving the time-consuming manual labelling upon M-mode echocardiograms; Thirdly, as objects in echocardiograms occupy a significant portion of pixels, the limited receptive field in existing backbones (e.g., ResNet) composed from multiple convolution layers are inefficient to cover the period of a valve movement. Existing non-local attentions (NL) compromise being unable real-time with a high computation overhead or losing information from a simplified version of the non-local block. Therefore, we proposed RAMEM, a real-time automatic M-mode echocardiography measurement scheme, contributes three aspects to answer the problems: 1) provide MEIS, a dataset of M-mode echocardiograms for instance segmentation, to enable consistent results and support the development of an automatic scheme; 2) propose panel attention, local-to-global efficient attention by pixel-unshuffling, embedding with updated UPANets V2 in a RIS scheme toward big object detection with global receptive field; 3) develop and implement AMEM, an efficient algorithm of automatic M-mode echocardiography measurement enabling fast and accurate automatic labelling among diagnosis. The experimental results show that RAMEM surpasses existing RIS backbones (with non-local attention) in PASCAL 2012 SBD and human performances in real-time MEIS tested. The code of MEIS and dataset are available at https://github.com/hanktseng131415go/RAME. △ Less

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.16141 [pdf, other]

Pupil Learning Mechanism

Authors: Rua-Huan Tsaih, Yu-Hang Chien, Shih-Yi Chien

Abstract: Studies on artificial neural networks rarely address both vanishing gradients and overfitting issues. In this study, we follow the pupil learning procedure, which has the features of interpreting, picking, understanding, cramming, and organizing, to derive the pupil learning mechanism (PLM) by which to modify the network structure and weights of 2-layer neural networks (2LNNs). The PLM consists of… ▽ More Studies on artificial neural networks rarely address both vanishing gradients and overfitting issues. In this study, we follow the pupil learning procedure, which has the features of interpreting, picking, understanding, cramming, and organizing, to derive the pupil learning mechanism (PLM) by which to modify the network structure and weights of 2-layer neural networks (2LNNs). The PLM consists of modules for sequential learning, adaptive learning, perfect learning, and less-overfitted learning. Based upon a copper price forecasting dataset, we conduct an experiment to validate the PLM module design modules, and an experiment to evaluate the performance of PLM. The empirical results indeed approve the PLM module design and show the superiority of the proposed PLM model over the linear regression model and the conventional backpropagation-based 2LNN model. △ Less

Submitted 30 July, 2023; originally announced July 2023.

arXiv:2306.04931 [pdf]

A New Scoring Method for the Evaluation of Vehicle Road Departure Detection Systems

Authors: Dan Shen, Lingxi Li, Stanley Chien, Yaobin Chen, Rini Sherony

Abstract: Road departure detection systems (RDDSs) for eliminating unintentional road departure collisions have been developed and equipped on some commercial vehicles in recent years. In order to provide a standardized and objective performance evaluation of RDDSs without the affections of systems complex nature of RDDSs and the design requirements, this paper proposes the development of the scoring method… ▽ More Road departure detection systems (RDDSs) for eliminating unintentional road departure collisions have been developed and equipped on some commercial vehicles in recent years. In order to provide a standardized and objective performance evaluation of RDDSs without the affections of systems complex nature of RDDSs and the design requirements, this paper proposes the development of the scoring method for evaluating vehicle RDDSs. Both flat road edge and vertical road edge are considered in the proposed scoring method, which combines two key variables: 1) the lateral distance of vehicle from road edge when RDW triggers; 2) the lateral distance of vehicle from road edge when RKA triggers. Two main criteria of road departure warning (RDW) and Road Keeping Assistance (RKA) are used to describe the performance of RDDSs. △ Less

Submitted 8 June, 2023; originally announced June 2023.

arXiv:2303.08544 [pdf, other]

Joint Security-vs-QoS Game Theoretical Optimization for Intrusion Response Mechanisms for Future Network Systems

Authors: Arash Bozorgchenani, Charilaos C. Zarakovitis, Su Fong Chien, Qiang Ni, Antonios Gouglidis, Wissam Mallouli, Heng Siong Lim

Abstract: Network connectivity exposes the network infrastructure and assets to vulnerabilities that attackers can exploit. Protecting network assets against attacks requires the application of security countermeasures. Nevertheless, employing countermeasures incurs costs, such as monetary costs, along with time and energy to prepare and deploy the countermeasures. Thus, an Intrusion Response System (IRS) s… ▽ More Network connectivity exposes the network infrastructure and assets to vulnerabilities that attackers can exploit. Protecting network assets against attacks requires the application of security countermeasures. Nevertheless, employing countermeasures incurs costs, such as monetary costs, along with time and energy to prepare and deploy the countermeasures. Thus, an Intrusion Response System (IRS) shall consider security and QoS costs when dynamically selecting the countermeasures to address the detected attacks. This has motivated us to formulate a joint Security-vs-QoS optimization problem to select the best countermeasures in an IRS. The problem is then transformed into a matching game-theoretical model. Considering the monetary costs and attack coverage constraints, we first derive the theoretical upper bound for the problem and later propose stable matching-based solutions to address the trade-off. The performance of the proposed solution, considering different settings, is validated over a series of simulations. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: 12 pages, 8 figures

arXiv:2303.03177 [pdf, other]

Pre-trained Model Representations and their Robustness against Noise for Speech Emotion Analysis

Authors: Vikramjit Mitra, Vasudha Kowtha, Hsiang-Yun Sherry Chien, Erdrin Azemi, Carlos Avendano

Abstract: Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Speech models, such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT), have enabled generating lexical and acoustic representations to benefit speech recognition applications. We investigated the… ▽ More Pre-trained model representations have demonstrated state-of-the-art performance in speech recognition, natural language processing, and other applications. Speech models, such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT), have enabled generating lexical and acoustic representations to benefit speech recognition applications. We investigated the use of pre-trained model representations for estimating dimensional emotions, such as activation, valence, and dominance, from speech. We observed that while valence may rely heavily on lexical representations, activation and dominance rely mostly on acoustic information. In this work, we used multi-modal fusion representations from pre-trained models to generate state-of-the-art speech emotion estimation, and we showed a 100% and 30% relative improvement in concordance correlation coefficient (CCC) on valence estimation compared to standard acoustic and lexical baselines. Finally, we investigated the robustness of pre-trained model representations against noise and reverberation degradation and noticed that lexical and acoustic representations are impacted differently. We discovered that lexical representations are more robust to distortions compared to acoustic representations, and demonstrated that knowledge distillation from a multi-modal model helps to improve the noise-robustness of acoustic-based models. △ Less

Submitted 3 March, 2023; originally announced March 2023.

Comments: 5 pages, conference

arXiv:2303.00654 [pdf, other]

doi 10.1613/jair.1.14649

How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta

Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP t… ▽ More ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP to real world complex ML models are still few and far between. The adoption of DP is hindered by limited practical guidance of what DP protection entails, what privacy guarantees to aim for, and the difficulty of achieving good privacy-utility-computation trade-offs for ML models. Tricks for tuning and maximizing performance are scattered among papers or stored in the heads of practitioners. Furthermore, the literature seems to present conflicting evidence on how and whether to apply architectural adjustments and which components are "safe" to use with DP. This work is a self-contained guide that gives an in-depth overview of the field of DP ML and presents information about achieving the best possible DP ML model with rigorous privacy guarantees. Our target audience is both researchers and practitioners. Researchers interested in DP for ML will benefit from a clear overview of current advances and areas for improvement. We include theory-focused sections that highlight important topics such as privacy accounting and its assumptions, and convergence. For a practitioner, we provide a background in DP theory and a clear step-by-step guide for choosing an appropriate privacy definition and approach, implementing DP training, potentially updating the model architecture, and tuning hyperparameters. For both researchers and practitioners, consistently and fully reporting privacy guarantees is critical, and so we propose a set of specific best practices for stating guarantees. △ Less

Submitted 31 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Journal ref: Journal of Artificial Intelligence Research 77 (2023) 1113-1201

arXiv:2302.14832 [pdf]

Planetary Exploration Horizon 2061 Report Chapter 5: Enabling technologies for planetary exploration

Authors: Manuel Grande, Linli Guo, Michel Blanc, Advenit Makaya, Sami Asmar, David Atkinson, Anne Bourdon, Pascal Chabert, Steve Chien, John Day, Alberto G. Fairen, Anthony Freeman, Antonio Genova, Alain Herique, Wlodek Kofman, Joseph Lazio, Olivier Mousis, Gian Gabriele Ori, Victor Parro, Robert Preston, Jose A Rodriguez-Manfredi, Veerle Sterken, Keith Stephenson, Joshua Vander Hook, Hunter Waite , et al. (1 additional authors not shown)

Abstract: The main objective of this chapter is to present an overview of the different areas of key technologies that will be needed to fly the technically most challenging of the representative missions identified in chapter 4 (the Pillar 2 Horizon 2061 report). It starts with a description of the future scientific instruments which will address the key questions of Horizon 2061 described in chapter 3 (th… ▽ More The main objective of this chapter is to present an overview of the different areas of key technologies that will be needed to fly the technically most challenging of the representative missions identified in chapter 4 (the Pillar 2 Horizon 2061 report). It starts with a description of the future scientific instruments which will address the key questions of Horizon 2061 described in chapter 3 (the Pillar 1 Horizon 2061 report) and the new technologies that the next generations of space instruments will require (section 2). From there, the chapter follows the line of logical development and implementation of a planetary mission: section 3 describes some of the novel mission architectures that will be needed and how they will articulate interplanetary spacecraft and science platforms; section 4 summarizes the system-level technologies needed: power, propulsion, navigation, communication, advanced autonomy on board planetary spacecraft; section 5 describes the diversity of specialized science platforms that will be needed to survive, operate and return scientific data from the extreme environments that future missions will target; section 6 describes the new technology developments that will be needed for long-duration missions and semi-permanent settlements; finally, section 7 attempts to anticipate on the disruptive technologies that should emerge and progressively prevail in the decades to come to meet the long-term needs of future planetary missions. △ Less

Submitted 28 February, 2023; originally announced February 2023.

Comments: 100 pages, 23 figures, Horizon 2061 is a science-driven, foresight exercise, for future scientific investigations

arXiv:2301.08248 [pdf, other]

Enabling Astronaut Self-Scheduling using a Robust Advanced Modelling and Scheduling system: an assessment during a Mars analogue mission

Authors: Michael Saint-Guillain, Jean Vanderdonckt, Nicolas Burny, Vladimir Pletser, Tiago Vaquero, Steve Chien, Alexander Karl, Jessica Marquez, John Karasinski, Cyril Wain, Audrey Comein, Ignacio S. Casla, Jean Jacobs, Julien Meert, Cheyenne Chamart, Sirga Drouet, Julie Manon

Abstract: Human long duration exploration missions (LDEMs) raise a number of technological challenges. This paper addresses the question of the crew autonomy: as the distances increase, the communication delays and constraints tend to prevent the astronauts from being monitored and supported by a real time ground control. Eventually, future planetary missions will necessarily require a form of astronaut sel… ▽ More Human long duration exploration missions (LDEMs) raise a number of technological challenges. This paper addresses the question of the crew autonomy: as the distances increase, the communication delays and constraints tend to prevent the astronauts from being monitored and supported by a real time ground control. Eventually, future planetary missions will necessarily require a form of astronaut self-scheduling. We study the usage of a computer decision-support tool by a crew of analog astronauts, during a Mars simulation mission conducted at the Mars Desert Research Station (MDRS, Mars Society) in Utah. The proposed tool, called Romie, belongs to the new category of Robust Advanced Modelling and Scheduling (RAMS) systems. It allows the crew members (i) to visually model their scientific objectives and constraints, (ii) to compute near-optimal operational schedules while taking uncertainty into account, (iii) to monitor the execution of past and current activities, and (iv) to modify scientific objectives/constraints w.r.t. unforeseen events and opportunistic science. In this study, we empirically measure how the astronauts, who are novice planners, perform at using such a tool when self-scheduling under the realistic assumptions of a simulated Martian planetary habitat. △ Less

Submitted 14 January, 2023; originally announced January 2023.

arXiv:2212.12660 [pdf]

Risk assessment and mitigation of e-scooter crashes with naturalistic driving data

Authors: Avinash Prabu, Zhengming Zhang, Renran Tian, Stanley Chien, Lingxi Li, Yaobin Chen, Rini Sherony

Abstract: Recently, e-scooter-involved crashes have increased significantly but little information is available about the behaviors of on-road e-scooter riders. Most existing e-scooter crash research was based on retrospectively descriptive media reports, emergency room patient records, and crash reports. This paper presents a naturalistic driving study with a focus on e-scooter and vehicle encounters. The… ▽ More Recently, e-scooter-involved crashes have increased significantly but little information is available about the behaviors of on-road e-scooter riders. Most existing e-scooter crash research was based on retrospectively descriptive media reports, emergency room patient records, and crash reports. This paper presents a naturalistic driving study with a focus on e-scooter and vehicle encounters. The goal is to quantitatively measure the behaviors of e-scooter riders in different encounters to help facilitate crash scenario modeling, baseline behavior modeling, and the potential future development of in-vehicle mitigation algorithms. The data was collected using an instrumented vehicle and an e-scooter rider wearable system, respectively. A three-step data analysis process is developed. First, semi-automatic data labeling extracts e-scooter rider images and non-rider human images in similar environments to train an e-scooter-rider classifier. Then, a multi-step scene reconstruction pipeline generates vehicle and e-scooter trajectories in all encounters. The final step is to model e-scooter rider behaviors and e-scooter-vehicle encounter scenarios. A total of 500 vehicle to e-scooter interactions are analyzed. The variables pertaining to the same are also discussed in this paper. △ Less

Submitted 15 January, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

arXiv:2212.12436 [pdf, other]

doi 10.1109/ITSC55140.2022.9921953

SceNDD: A Scenario-based Naturalistic Driving Dataset

Authors: Avinash Prabu, Nitya Ranjan, Lingxi Li, Renran Tian, Stanley Chien, Yaobin Chen, Rini Sherony

Abstract: In this paper, we propose SceNDD: a scenario-based naturalistic driving dataset that is built upon data collected from an instrumented vehicle in downtown Indianapolis. The data collection was completed in 68 driving sessions with different drivers, where each session lasted about 20--40 minutes. The main goal of creating this dataset is to provide the research community with real driving scenario… ▽ More In this paper, we propose SceNDD: a scenario-based naturalistic driving dataset that is built upon data collected from an instrumented vehicle in downtown Indianapolis. The data collection was completed in 68 driving sessions with different drivers, where each session lasted about 20--40 minutes. The main goal of creating this dataset is to provide the research community with real driving scenarios that have diverse trajectories and driving behaviors. The dataset contains ego-vehicle's waypoints, velocity, yaw angle, as well as non-ego actor's waypoints, velocity, yaw angle, entry-time, and exit-time. Certain flexibility is provided to users so that actors, sensors, lanes, roads, and obstacles can be added to the existing scenarios. We used a Joint Probabilistic Data Association (JPDA) tracker to detect non-ego vehicles on the road. We present some preliminary results of the proposed dataset and a few applications associated with it. The complete dataset is expected to be released by early 2023. △ Less

Submitted 22 December, 2022; originally announced December 2022.

Comments: Conference: 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC). Link: https://ieeexplore.ieee.org/document/9921953

arXiv:2212.11979 [pdf]

A Wearable Data Collection System for Studying Micro-Level E-Scooter Behavior in Naturalistic Road Environment

Authors: Avinash Prabu, Dan Shen, Renran Tian, Stanley Chien, Lingxi Li, Yaobin Chen, Rini Sherony

Abstract: As one of the most popular micro-mobility options, e-scooters are spreading in hundreds of big cities and college towns in the US and worldwide. In the meantime, e-scooters are also posing new challenges to traffic safety. In general, e-scooters are suggested to be ridden in bike lanes/sidewalks or share the road with cars at the maximum speed of about 15-20 mph, which is more flexible and much fa… ▽ More As one of the most popular micro-mobility options, e-scooters are spreading in hundreds of big cities and college towns in the US and worldwide. In the meantime, e-scooters are also posing new challenges to traffic safety. In general, e-scooters are suggested to be ridden in bike lanes/sidewalks or share the road with cars at the maximum speed of about 15-20 mph, which is more flexible and much faster than the pedestrains and bicyclists. These features make e-scooters challenging for human drivers, pedestrians, vehicle active safety modules, and self-driving modules to see and interact. To study this new mobility option and address e-scooter riders' and other road users' safety concerns, this paper proposes a wearable data collection system for investigating the micro-level e-Scooter motion behavior in a Naturalistic road environment. An e-Scooter-based data acquisition system has been developed by integrating LiDAR, cameras, and GPS using the robot operating system (ROS). Software frameworks are developed to support hardware interfaces, sensor operation, sensor synchronization, and data saving. The integrated system can collect data continuously for hours, meeting all the requirements including calibration accuracy and capability of collecting the vehicle and e-Scooter encountering data. △ Less

Submitted 22 December, 2022; originally announced December 2022.

Comments: Conference: Fast-zero'21, Kanazawa, Japan Date of publication: Sep 2021 Publisher: JSAE

Journal ref: https://tech.jsae.or.jp/paperinfo/en/content/conf2021-02.11/

arXiv:2211.02625 [pdf, other]

MAEEG: Masked Auto-encoder for EEG Representation Learning

Authors: Hsiang-Yun Sherry Chien, Hanlin Goh, Christopher M. Sandino, Joseph Y. Cheng

Abstract: Decoding information from bio-signals such as EEG, using machine learning has been a challenge due to the small data-sets and difficulty to obtain labels. We propose a reconstruction-based self-supervised learning model, the masked auto-encoder for EEG (MAEEG), for learning EEG representations by learning to reconstruct the masked EEG features using a transformer architecture. We found that MAEEG… ▽ More Decoding information from bio-signals such as EEG, using machine learning has been a challenge due to the small data-sets and difficulty to obtain labels. We propose a reconstruction-based self-supervised learning model, the masked auto-encoder for EEG (MAEEG), for learning EEG representations by learning to reconstruct the masked EEG features using a transformer architecture. We found that MAEEG can learn representations that significantly improve sleep stage classification (~5% accuracy increase) when only a small number of labels are given. We also found that input sample lengths and different ways of masking during reconstruction-based SSL pretraining have a huge effect on downstream model performance. Specifically, learning to reconstruct a larger proportion and more concentrated masked signal results in better performance on sleep classification. Our findings provide insight into how reconstruction-based SSL could help representation learning for EEG. △ Less

Submitted 27 October, 2022; originally announced November 2022.

Comments: 10 pages, 5 figures, accepted by Workshop on Learning from Time Series for Health, NeurIPS2022 as poster presentation

arXiv:2207.03334 [pdf, other]

Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation

Authors: Vikramjit Mitra, Hsiang-Yun Sherry Chien, Vasudha Kowtha, Joseph Yitan Cheng, Erdrin Azemi

Abstract: Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seem to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical… ▽ More Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seem to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical information can be obtained from pre-trained acoustic models, where the learned representations can improve valence estimation from speech. We investigate the use of pre-trained model representations to improve valence estimation from acoustic speech signal. We also explore fusion of representations to improve emotion estimation across all three emotion dimensions: activation, valence and dominance. Additionally, we investigate if representations from pre-trained models can be distilled into models trained with low-level features, resulting in models with a less number of parameters. We show that fusion of pre-trained model embeddings result in a 79% relative improvement in concordance correlation coefficient CCC on valence estimation compared to standard acoustic feature baseline (mel-filterbank energies), while distillation from pre-trained model embeddings to lower-dimensional representations yielded a relative 12% improvement. Such performance gains were observed over two evaluation sets, indicating that our proposed architecture generalizes across those evaluation sets. We report new state-of-the-art "text-free" acoustic-only dimensional emotion estimation $CCC$ values on two MSP-Podcast evaluation sets. △ Less

Submitted 2 July, 2022; originally announced July 2022.

Comments: 5 pages, 3 figures, Interspeech 2022

arXiv:2206.06878 [pdf, other]

doi 10.1145/3534678.3539159

Temporal Multimodal Multivariate Learning

Authors: Hyoshin Park, Justice Darko, Niharika Deshpande, Venktesh Pandey, Hui Su, Masahiro Ono, Dedrick Barkely, Larkin Folsom, Derek Posselt, Steve Chien

Abstract: We introduce temporal multimodal multivariate learning, a new family of decision making models that can indirectly learn and transfer online information from simultaneous observations of a probability distribution with more than one peak or more than one outcome variable from one time stage to another. We approximate the posterior by sequentially removing additional uncertainties across different… ▽ More We introduce temporal multimodal multivariate learning, a new family of decision making models that can indirectly learn and transfer online information from simultaneous observations of a probability distribution with more than one peak or more than one outcome variable from one time stage to another. We approximate the posterior by sequentially removing additional uncertainties across different variables and time, based on data-physics driven correlation, to address a broader class of challenging time-dependent decision-making problems under uncertainty. Extensive experiments on real-world datasets ( i.e., urban traffic data and hurricane ensemble forecasting data) demonstrate the superior performance of the proposed targeted decision-making over the state-of-the-art baseline prediction methods across various settings. △ Less

Submitted 14 June, 2022; originally announced June 2022.

Comments: 11 pages, 12 figures, SIGKDD Conference on Knowledge Discovery and Data Mining,

ACM Class: F.4.1

arXiv:2204.09606 [pdf, other]

Detecting Unintended Memorization in Language-Model-Fused ASR

Authors: W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

Abstract: End-to-end (E2E) models are often being accompanied by language models (LMs) via shallow fusion for boosting their overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual seque… ▽ More End-to-end (E2E) models are often being accompanied by language models (LMs) via shallow fusion for boosting their overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual sequences (which we call canaries) in the LM training data when one has only black-box (query) access to LM-fused speech recognizer, as opposed to direct access to the LM. On a production-grade Conformer RNN-T E2E model fused with a Transformer LM, we show that detecting memorization of singly-occurring canaries from the LM training data of 300M examples is possible. Motivated to protect privacy, we also show that such memorization gets significantly reduced by per-example gradient-clipped LM training without compromising overall quality. △ Less

Submitted 28 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Interspeech 2022

arXiv:2201.12328 [pdf, other]

Toward Training at ImageNet Scale with Differential Privacy

Authors: Alexey Kurakin, Shuang Song, Steve Chien, Roxana Geambasu, Andreas Terzis, Abhradeep Thakurta

Abstract: Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set ou… ▽ More Differential privacy (DP) is the de facto standard for training machine learning (ML) models, including neural networks, while ensuring the privacy of individual examples in the training set. Despite a rich literature on how to train ML models with differential privacy, it remains extremely challenging to train real-life, large neural networks with both reasonable accuracy and privacy. We set out to investigate how to do this, using ImageNet image classification as a poster example of an ML task that is very challenging to resolve accurately with DP right now. This paper shares initial lessons from our effort, in the hope that it will inspire and inform other researchers to explore DP training at scale. We show approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better in the DP setting. Combined, the methods we discuss let us train a Resnet-18 with DP to $47.9\%$ accuracy and privacy parameters $ε= 10, δ= 10^{-6}$. This is a significant improvement over "naive" DP training of ImageNet models, but a far cry from the $75\%$ accuracy that can be obtained by the same network without privacy. The model we use was pretrained on the Places365 data set as a starting point. We share our code at https://github.com/google-research/dp-imagenet, calling for others to build upon this new baseline to further improve DP at scale. △ Less

Submitted 8 February, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: 25 pages, 7 figures. Code available at https://github.com/google-research/dp-imagenet

arXiv:2112.12496 [pdf, other]

FedFR: Joint Optimization Federated Framework for Generic and Personalized Face Recognition

Authors: Chih-Ting Liu, Chien-Yi Wang, Shao-Yi Chien, Shang-Hong Lai

Abstract: Current state-of-the-art deep learning based face recognition (FR) models require a large number of face identities for central training. However, due to the growing privacy awareness, it is prohibited to access the face images on user devices to continually improve face recognition models. Federated Learning (FL) is a technique to address the privacy issue, which can collaboratively optimize the… ▽ More Current state-of-the-art deep learning based face recognition (FR) models require a large number of face identities for central training. However, due to the growing privacy awareness, it is prohibited to access the face images on user devices to continually improve face recognition models. Federated Learning (FL) is a technique to address the privacy issue, which can collaboratively optimize the model without sharing the data between clients. In this work, we propose a FL based framework called FedFR to improve the generic face representation in a privacy-aware manner. Besides, the framework jointly optimizes personalized models for the corresponding clients via the proposed Decoupled Feature Customization module. The client-specific personalized model can serve the need of optimized face recognition experience for registered identities at the local device. To the best of our knowledge, we are the first to explore the personalized face recognition in FL setup. The proposed framework is validated to be superior to previous approaches on several generic and personalized face recognition benchmarks with diverse FL scenarios. The source codes and our proposed personalized FR benchmark under FL setup are available at https://github.com/jackie840129/FedFR. △ Less

Submitted 21 March, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

Comments: This paper was accepted by AAAI 2022 Conference on Artificial Intelligence and selected as an oral paper

arXiv:2112.06879 [pdf, other]

Multi-Robot On-site Shared Analytics Information and Computing

Authors: Joshua Vander Hook, Federico Rossi, Tiago Vaquero, Martina Troesch, Marc Sanchez Net, Joshua Schoolcraft, Jean-Pierre de la Croix, Steve Chien

Abstract: Computation load-sharing across a network of heterogeneous robots is a promising approach to increase robots capabilities and efficiency as a team in extreme environments. However, in such environments, communication links may be intermittent and connections to the cloud or internet may be nonexistent. In this paper we introduce a communication-aware, computation task scheduling problem for multi-… ▽ More Computation load-sharing across a network of heterogeneous robots is a promising approach to increase robots capabilities and efficiency as a team in extreme environments. However, in such environments, communication links may be intermittent and connections to the cloud or internet may be nonexistent. In this paper we introduce a communication-aware, computation task scheduling problem for multi-robot systems and propose an integer linear program (ILP) that optimizes the allocation of computational tasks across a network of heterogeneous robots, accounting for the networked robots' computational capabilities and for available (and possibly time-varying) communication links. We consider scheduling of a set of inter-dependent required and optional tasks modeled by a dependency graph. We present a consensus-backed scheduling architecture for shared-world, distributed systems. We validate the ILP formulation and the distributed implementation in different computation platforms and in simulated scenarios with a bias towards lunar or planetary exploration scenarios. Our results show that the proposed implementation can optimize schedules to allow a threefold increase the amount of rewarding tasks performed (e.g., science measurements) compared to an analogous system with no computational load-sharing. △ Less

Submitted 13 December, 2021; originally announced December 2021.

Comments: 14 pages, 11 figures. Extended version of journal submission in preparation

arXiv:2112.03570 [pdf, other]

Membership Inference Attacks From First Principles

Authors: Nicholas Carlini, Steve Chien, Milad Nasr, Shuang Song, Andreas Terzis, Florian Tramer

Abstract: A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instea… ▽ More A membership inference attack allows an adversary to query a trained machine learning model to predict whether or not a particular example was contained in the model's training dataset. These attacks are currently evaluated using average-case "accuracy" metrics that fail to characterize whether the attack can confidently identify any members of the training set. We argue that attacks should instead be evaluated by computing their true-positive rate at low (e.g., <0.1%) false-positive rates, and find most prior attacks perform poorly when evaluated in this way. To address this we develop a Likelihood Ratio Attack (LiRA) that carefully combines multiple ideas from the literature. Our attack is 10x more powerful at low false-positive rates, and also strictly dominates prior attacks on existing metrics. △ Less

Submitted 12 April, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

arXiv:2111.13876 [pdf, other]

Learning Discriminative Shrinkage Deep Networks for Image Deconvolution

Authors: Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien, Ming-Hsuan Yang

Abstract: Most existing methods usually formulate the non-blind deconvolution problem into a maximum-a-posteriori framework and address it by manually designing kinds of regularization terms and data terms of the latent clear images. However, explicitly designing these two terms is quite challenging and usually leads to complex optimization problems which are difficult to solve. In this paper, we propose an… ▽ More Most existing methods usually formulate the non-blind deconvolution problem into a maximum-a-posteriori framework and address it by manually designing kinds of regularization terms and data terms of the latent clear images. However, explicitly designing these two terms is quite challenging and usually leads to complex optimization problems which are difficult to solve. In this paper, we propose an effective non-blind deconvolution approach by learning discriminative shrinkage functions to implicitly model these terms. In contrast to most existing methods that use deep convolutional neural networks (CNNs) or radial basis functions to simply learn the regularization term, we formulate both the data term and regularization term and split the deconvolution model into data-related and regularization-related sub-problems according to the alternating direction method of multipliers. We explore the properties of the Maxout function and develop a deep CNN model with a Maxout layer to learn discriminative shrinkage functions to directly approximate the solutions of these two sub-problems. Moreover, given the fast-Fourier-transform-based image restoration usually leads to ringing artifacts while conjugate-gradient-based approach is time-consuming, we develop the Conjugate Gradient Network to restore the latent clear images effectively and efficiently. Experimental results show that the proposed method performs favorably against the state-of-the-art ones in terms of efficiency and accuracy. △ Less

Submitted 20 July, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

arXiv:2111.12885 [pdf, other]

A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads

Authors: Yu-Sheng Lin, Wei-Chao Chen. Chia-Lin Yang, Shao-Yi Chien

Abstract: We propose a dense tensor accelerator called VectorMesh, a scalable, memory-efficient architecture that can support a wide variety of DNN and computer vision workloads. Its building block is a tile execution unit~(TEU), which includes dozens of processing elements~(PEs) and SRAM buffers connected through a butterfly network. A mesh of FIFOs between the TEUs facilitates data exchange between tiles… ▽ More We propose a dense tensor accelerator called VectorMesh, a scalable, memory-efficient architecture that can support a wide variety of DNN and computer vision workloads. Its building block is a tile execution unit~(TEU), which includes dozens of processing elements~(PEs) and SRAM buffers connected through a butterfly network. A mesh of FIFOs between the TEUs facilitates data exchange between tiles and promote local data to global visibility. Our design performs better according to the roofline model for CNN, GEMM, and spatial matching algorithms compared to state-of-the-art architectures. It can reduce global buffer and DRAM fetches by 2-22 times and up to 5 times, respectively. △ Less

Submitted 24 November, 2021; originally announced November 2021.

arXiv:2111.10970 [pdf, other]

doi 10.1109/AERO53065.2022.9843352

Operations for Autonomous Spacecraft

Authors: Rebecca Castano, Tiago Vaquero, Federico Rossi, Vandi Verma, Ellen Van Wyk, Dan Allard, Bennett Huffmann, Erin M. Murphy, Nihal Dhamani, Robert A. Hewitt, Scott Davidoff, Rashied Amini, Anthony Barrett, Julie Castillo-Rogez, Steve A. Chien, Mathieu Choukroun, Alain Dadaian, Raymond Francis, Benjamin Gorr, Mark Hofstadter, Mitch Ingham, Cristina Sorice, Iain Tierney

Abstract: Onboard autonomy technologies such as planning and scheduling, identification of scientific targets, and content-based data summarization, will lead to exciting new space science missions. However, the challenge of operating missions with such onboard autonomous capabilities has not been studied to a level of detail sufficient for consideration in mission concepts. These autonomy capabilities will… ▽ More Onboard autonomy technologies such as planning and scheduling, identification of scientific targets, and content-based data summarization, will lead to exciting new space science missions. However, the challenge of operating missions with such onboard autonomous capabilities has not been studied to a level of detail sufficient for consideration in mission concepts. These autonomy capabilities will require changes to current operations processes, practices, and tools. We have developed a case study to assess the changes needed to enable operators and scientists to operate an autonomous spacecraft by facilitating a common model between the ground personnel and the onboard algorithms. We assess the new operations tools and workflows necessary to enable operators and scientists to convey their desired intent to the spacecraft, and to be able to reconstruct and explain the decisions made onboard and the state of the spacecraft. Mock-ups of these tools were used in a user study to understand the effectiveness of the processes and tools in enabling a shared framework of understanding, and in the ability of the operators and scientists to effectively achieve mission science objectives. △ Less

Submitted 21 November, 2021; originally announced November 2021.

Comments: 16 pages, 18 Figures, 1 Table, to be published in IEEE Aerospace 2022 (AeroConf 2022)

Journal ref: Proceedings of the 2022 IEEE Aerospace Conference (IEEE AERO 2022), 1-20

arXiv:2110.02017 [pdf]

doi 10.1364/OL.447986

In-plane subwavelength near field optical capsule for lab-on-a-chip optical nano-tweezer

Authors: Oleg V. Minin, Shuo-Chih Chien, Wei-Yu Chen, Cheng-Yang Liu, Igor V. Minin

Abstract: In this letter, we propose a new proof-of-concept of optical nano-tweezer on the basis of a pair of dielectric rectangular rods capable of generating a novel class of controlled finite-volume near field light capsules. The finite-difference time-domain simulations of light spatial structure and optical trapping forces of the gold nanoparticle immersed in water demonstrate the physical concept of a… ▽ More In this letter, we propose a new proof-of-concept of optical nano-tweezer on the basis of a pair of dielectric rectangular rods capable of generating a novel class of controlled finite-volume near field light capsules. The finite-difference time-domain simulations of light spatial structure and optical trapping forces of the gold nanoparticle immersed in water demonstrate the physical concept of an in-plane subwavelength optical capsule, integrated with the microfluidic mesoscale device. It is shown that refractive index and distance between dielectric rectangular rods can control the shape and axial position of the optical capsule. Such an in-plane wavelength-scaled structure provides a new path for manipulating absorbing nano-particles including bio-particles in a compact planar architecture and should thus open promising perspectives in lab-on-a-chip domains. △ Less

Submitted 3 October, 2021; originally announced October 2021.

Comments: 5 pages, 5 Figures

MSC Class: 65Z05 ACM Class: J.2

Journal ref: Optics Letters 47, 794-797 (2022)

arXiv:2107.09802 [pdf, other]

Private Alternating Least Squares: Practical Private Matrix Completion with Tighter Rates

Authors: Steve Chien, Prateek Jain, Walid Krichene, Steffen Rendle, Shuang Song, Abhradeep Thakurta, Li Zhang

Abstract: We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on bench… ▽ More We study the problem of differentially private (DP) matrix completion under user-level privacy. We design a joint differentially private variant of the popular Alternating-Least-Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of number of items, users), and ii) the best known privacy/utility trade-off both theoretically, as well as on benchmark data sets. In particular, we provide the first global convergence analysis of ALS with noise introduced to ensure DP, and show that, in comparison to the best known alternative (the Private Frank-Wolfe algorithm by Jain et al. (2018)), our error bounds scale significantly better with respect to the number of items and users, which is critical in practical problems. Extensive validation on standard benchmarks demonstrate that the algorithm, in combination with carefully designed sampling procedures, is significantly more accurate than existing techniques, thus promising to be the first practical DP embedding model. △ Less

Submitted 20 July, 2021; originally announced July 2021.

arXiv:2107.06676 [pdf, other]

Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain

Authors: Martin Svedin, Artur Podobas, Steven W. D. Chien, Stefano Markidis

Abstract: One of the most promising approaches for data analysis and exploration of large data sets is Machine Learning techniques that are inspired by brain models. Such methods use alternative learning rules potentially more efficiently than established learning rules. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problem… ▽ More One of the most promising approaches for data analysis and exploration of large data sets is Machine Learning techniques that are inspired by brain models. Such methods use alternative learning rules potentially more efficiently than established learning rules. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problems: we discuss the BCPNN and an HPC implementation, called StreamBrain, its computational cost, suitability to HPC systems. As an example, we use StreamBrain to analyze the Higgs Boson dataset from High Energy Physics and discriminate between background and signal classes in collisions of high-energy particle colliders. Overall, we reach up to 69.15% accuracy and 76.4% Area Under the Curve (AUC) performance. △ Less

Submitted 17 August, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

Comments: Accepted for publication at The 2nd Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S 2021)

arXiv:2106.10465 [pdf, other]

Interactive Object Segmentation with Dynamic Click Transform

Authors: Chun-Tse Lin, Wei-Chih Tu, Chih-Ting Liu, Shao-Yi Chien

Abstract: In the interactive segmentation, users initially click on the target object to segment the main body and then provide corrections on mislabeled regions to iteratively refine the segmentation masks. Most existing methods transform these user-provided clicks into interaction maps and concatenate them with image as the input tensor. Typically, the interaction maps are determined by measuring the dist… ▽ More In the interactive segmentation, users initially click on the target object to segment the main body and then provide corrections on mislabeled regions to iteratively refine the segmentation masks. Most existing methods transform these user-provided clicks into interaction maps and concatenate them with image as the input tensor. Typically, the interaction maps are determined by measuring the distance of each pixel to the clicked points, ignoring the relation between clicks and mislabeled regions. We propose a Dynamic Click Transform Network~(DCT-Net), consisting of Spatial-DCT and Feature-DCT, to better represent user interactions. Spatial-DCT transforms each user-provided click with individual diffusion distance according to the target scale, and Feature-DCT normalizes the extracted feature map to a specific distribution predicted from the clicked points. We demonstrate the effectiveness of our proposed method and achieve favorable performance compared to the state-of-the-art on three standard benchmark datasets. △ Less

Submitted 19 June, 2021; originally announced June 2021.

Comments: This paper was accepted by IEEE International Conference on Image Processing (ICIP) 2021

arXiv:2106.07204 [pdf, other]

Hard Samples Rectification for Unsupervised Cross-domain Person Re-identification

Authors: Chih-Ting Liu, Man-Yu Lee, Tsai-Shien Chen, Shao-Yi Chien

Abstract: Person re-identification (re-ID) has received great success with the supervised learning methods. However, the task of unsupervised cross-domain re-ID is still challenging. In this paper, we propose a Hard Samples Rectification (HSR) learning scheme which resolves the weakness of original clustering-based methods being vulnerable to the hard positive and negative samples in the target unlabelled d… ▽ More Person re-identification (re-ID) has received great success with the supervised learning methods. However, the task of unsupervised cross-domain re-ID is still challenging. In this paper, we propose a Hard Samples Rectification (HSR) learning scheme which resolves the weakness of original clustering-based methods being vulnerable to the hard positive and negative samples in the target unlabelled dataset. Our HSR contains two parts, an inter-camera mining method that helps recognize a person under different views (hard positive) and a part-based homogeneity technique that makes the model discriminate different persons but with similar appearance (hard negative). By rectifying those two hard cases, the re-ID model can learn effectively and achieve promising results on two large-scale benchmarks. △ Less

Submitted 14 June, 2021; originally announced June 2021.

Comments: This paper was accepted by IEEE International Conference on Image Processing (ICIP) 2021

arXiv:2106.05373 [pdf, other]

doi 10.1145/3468044.3468052

StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

Authors: Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis

Abstract: The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In… ▽ More The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In this paper, we introduce StreamBrain -- a framework that allows neural networks based on BCPNN to be practically deployed in High-Performance Computing systems. StreamBrain is a domain-specific language (DSL), similar in concept to existing machine learning (ML) frameworks, and supports backends for CPUs, GPUs, and even FPGAs. We empirically demonstrate that StreamBrain can train the well-known ML benchmark dataset MNIST within seconds, and we are the first to demonstrate BCPNN on STL-10 size networks. We also show how StreamBrain can be used to train with custom floating-point formats and illustrate the impact of using different bfloat variations on BCPNN using FPGAs. △ Less

Submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted for publication at the International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021)

arXiv:2106.04979 [pdf]

Benchmarking the Nvidia GPU Lineage: From Early K80 to Modern A100 with Asynchronous Memory Transfers

Authors: Martin Svedin, Steven W. D. Chien, Gibson Chikafa, Niclas Jansson, Artur Podobas

Abstract: For many, Graphics Processing Units (GPUs) provides a source of reliable computing power. Recently, Nvidia introduced its 9th generation HPC-grade GPUs, the Ampere 100, claiming significant performance improvements over previous generations, particularly for AI-workloads, as well as introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non… ▽ More For many, Graphics Processing Units (GPUs) provides a source of reliable computing power. Recently, Nvidia introduced its 9th generation HPC-grade GPUs, the Ampere 100, claiming significant performance improvements over previous generations, particularly for AI-workloads, as well as introducing new architectural features such as asynchronous data movement. But how well does the A100 perform on non-AI benchmarks, and can we expect the A100 to deliver the application improvements we have grown used to with previous GPU generations? In this paper, we benchmark the A100 GPU and compare it to four previous generations of GPUs, with particular focus on empirically quantifying our derived performance expectations, and -- should those expectations be undelivered -- investigate whether the introduced data-movement features can offset any eventual loss in performance? We find that the A100 delivers less performance increase than previous generations for the well-known Rodinia benchmark suite; we show that some of these performance anomalies can be remedied through clever use of the new data-movement features, which we microbenchmark and demonstrate where (and more importantly, how) they should be used. △ Less

Submitted 3 July, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: 7 pages

arXiv:2106.03719 [pdf, other]

Incremental False Negative Detection for Contrastive Learning

Authors: Tsai-Shien Chen, Wei-Chih Hung, Hung-Yu Tseng, Shao-Yi Chien, Ming-Hsuan Yang

Abstract: Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationship among instances and sometimes undesirably repels the anchor from the semantically similar samples, termed as "false negatives". In this work, we show that… ▽ More Self-supervised learning has recently shown great potential in vision tasks through contrastive learning, which aims to discriminate each image, or instance, in the dataset. However, such instance-level learning ignores the semantic relationship among instances and sometimes undesirably repels the anchor from the semantically similar samples, termed as "false negatives". In this work, we show that the unfavorable effect from false negatives is more significant for the large-scale datasets with more semantic concepts. To address the issue, we propose a novel self-supervised contrastive learning framework that incrementally detects and explicitly removes the false negative samples. Specifically, following the training process, our method dynamically detects increasing high-quality false negatives considering that the encoder gradually improves and the embedding space becomes more semantically structural. Next, we discuss two strategies to explicitly remove the detected false negatives during contrastive learning. Extensive experiments show that our framework outperforms other self-supervised contrastive learning methods on multiple benchmarks in a limited resource setup. △ Less

Submitted 16 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: ICLR 2022

arXiv:2105.10678 [pdf, other]

Video-based Person Re-identification without Bells and Whistles

Authors: Chih-Ting Liu, Jun-Cheng Chen, Chu-Song Chen, Shao-Yi Chien

Abstract: Video-based person re-identification (Re-ID) aims at matching the video tracklets with cropped video frames for identifying the pedestrians under different cameras. However, there exists severe spatial and temporal misalignment for those cropped tracklets due to the imperfect detection and tracking results generated with obsolete methods. To address this issue, we present a simple re-Detect and Li… ▽ More Video-based person re-identification (Re-ID) aims at matching the video tracklets with cropped video frames for identifying the pedestrians under different cameras. However, there exists severe spatial and temporal misalignment for those cropped tracklets due to the imperfect detection and tracking results generated with obsolete methods. To address this issue, we present a simple re-Detect and Link (DL) module which can effectively reduce those unexpected noise through applying the deep learning-based detection and tracking on the cropped tracklets. Furthermore, we introduce an improved model called Coarse-to-Fine Axial-Attention Network (CF-AAN). Based on the typical Non-local Network, we replace the non-local module with three 1-D position-sensitive axial attentions, in addition to our proposed coarse-to-fine structure. With the developed CF-AAN, compared to the original non-local operation, we can not only significantly reduce the computation cost but also obtain the state-of-the-art performance (91.3% in rank-1 and 86.5% in mAP) on the large-scale MARS dataset. Meanwhile, by simply adopting our DL module for data alignment, to our surprise, several baseline models can achieve better or comparable results with the current state-of-the-arts. Besides, we discover the errors not only for the identity labels of tracklets but also for the evaluation protocol for the test data of MARS. We hope that our work can help the community for the further development of invariant representation without the hassle of the spatial and temporal alignment and dataset noise. The code, corrected labels, evaluation protocol, and the aligned data will be available at https://github.com/jackie840129/CF-AAN. △ Less

Submitted 22 September, 2021; v1 submitted 22 May, 2021; originally announced May 2021.

Comments: This paper was accepted by CVPR 2021 Biometrics Workshop

arXiv:2105.05944 [pdf, other]

Slower is Better: Revisiting the Forgetting Mechanism in LSTM for Slower Information Decay

Authors: Hsiang-Yun Sherry Chien, Javier S. Turek, Nicole Beckage, Vy A. Vo, Christopher J. Honey, Ted L. Willke

Abstract: Sequential information contains short- to long-range dependencies; however, learning long-timescale information has been a challenge for recurrent neural networks. Despite improvements in long short-term memory networks (LSTMs), the forgetting mechanism results in the exponential decay of information, limiting their capacity to capture long-timescale information. Here, we propose a power law forge… ▽ More Sequential information contains short- to long-range dependencies; however, learning long-timescale information has been a challenge for recurrent neural networks. Despite improvements in long short-term memory networks (LSTMs), the forgetting mechanism results in the exponential decay of information, limiting their capacity to capture long-timescale information. Here, we propose a power law forget gate, which instead learns to forget information along a slower power law decay function. Specifically, the new gate learns to control the power law decay factor, p, allowing the network to adjust the information decay rate according to task demands. Our experiments show that an LSTM with power law forget gates (pLSTM) can effectively capture long-range dependencies beyond hundreds of elements on image classification, language modeling, and categorization tasks, improving performance over the vanilla LSTM. We also inspected the revised forget gate by varying the initialization of p, setting p to a fixed value, and ablating cells in the pLSTM network. The results show that the information decay can be controlled by the learnable decay factor p, which allows pLSTM to achieve its superior performance. Altogether, we found that LSTM with the proposed forget gate can learn long-term dependencies, outperforming other recurrent networks in multiple domains; such gating mechanism can be integrated into other architectures for improving the learning of long timescale information in recurrent neural networks. △ Less

Submitted 12 May, 2021; originally announced May 2021.

Comments: 16 pages, 10 figures

arXiv:2012.06717 [pdf, other]

Mapping the Timescale Organization of Neural Language Models

Authors: Hsiang-Yun Sherry Chien, Jinhan Zhang, Christopher. J. Honey

Abstract: In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefo… ▽ More In the human brain, sequences of language input are processed within a distributed and hierarchical architecture, in which higher stages of processing encode contextual information over longer timescales. In contrast, in recurrent neural networks which perform natural language processing, we know little about how the multiple timescales of contextual information are functionally organized. Therefore, we applied tools developed in neuroscience to map the "processing timescales" of individual units within a word-level LSTM language model. This timescale-mapping method assigned long timescales to units previously found to track long-range syntactic dependencies. Additionally, the mapping revealed a small subset of the network (less than 15% of units) with long timescales and whose function had not previously been explored. We next probed the functional organization of the network by examining the relationship between the processing timescale of units and their network connectivity. We identified two classes of long-timescale units: "controller" units composed a densely interconnected subnetwork and strongly projected to the rest of the network, while "integrator" units showed the longest timescales in the network, and expressed projection profiles closer to the mean projection profile. Ablating integrator and controller units affected model performance at different positions within a sentence, suggesting distinctive functions of these two sets of units. Finally, we tested the generalization of these results to a character-level LSTM model and models with different architectures. In summary, we demonstrated a model-free technique for mapping the timescale organization in recurrent neural networks, and we applied this method to reveal the timescale and functional organization of neural language models. △ Less

Submitted 17 March, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

Comments: 23 pages, 4 main figures, 10 appendix figures; published as a conference paper at ICLR 2021

arXiv:2012.01874 [pdf, other]

How to Exploit the Transferability of Learned Image Compression to Conventional Codecs

Authors: Jan P. Klopp, Keng-Chi Liu, Liang-Gee Chen, Shao-Yi Chien

Abstract: Lossy image compression is often limited by the simplicity of the chosen loss measure. Recent research suggests that generative adversarial networks have the ability to overcome this limitation and serve as a multi-modal loss, especially for textures. Together with learned image compression, these two techniques can be used to great effect when relaxing the commonly employed tight measures of dist… ▽ More Lossy image compression is often limited by the simplicity of the chosen loss measure. Recent research suggests that generative adversarial networks have the ability to overcome this limitation and serve as a multi-modal loss, especially for textures. Together with learned image compression, these two techniques can be used to great effect when relaxing the commonly employed tight measures of distortion. However, convolutional neural network based algorithms have a large computational footprint. Ideally, an existing conventional codec should stay in place, which would ensure faster adoption and adhering to a balanced computational envelope. As a possible avenue to this goal, in this work, we propose and investigate how learned image coding can be used as a surrogate to optimize an image for encoding. The image is altered by a learned filter to optimise for a different performance measure or a particular task. Extending this idea with a generative adversarial network, we show how entire textures are replaced by ones that are less costly to encode but preserve sense of detail. Our approach can remodel a conventional codec to adjust for the MS-SSIM distortion with over 20% rate improvement without any decoding overhead. On task-aware image compression, we perform favourably against a similar but codec-specific approach. △ Less

Submitted 6 March, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: 10 pages, 5 figures

arXiv:2011.08733 [pdf, ps, other]

Using Explainable Scheduling for the Mars 2020 Rover Mission

Authors: Jagriti Agrawal, Amruta Yelamanchili, Steve Chien

Abstract: Understanding the reasoning behind the behavior of an automated scheduling system is essential to ensure that it will be trusted and consequently used to its full capabilities in critical applications. In cases where a scheduler schedules activities in an invalid location, it is usually easy for the user to infer the missing constraint by inspecting the schedule with the invalid activity to determ… ▽ More Understanding the reasoning behind the behavior of an automated scheduling system is essential to ensure that it will be trusted and consequently used to its full capabilities in critical applications. In cases where a scheduler schedules activities in an invalid location, it is usually easy for the user to infer the missing constraint by inspecting the schedule with the invalid activity to determine the missing constraint. If a scheduler fails to schedule activities because constraints could not be satisfied, determining the cause can be more challenging. In such cases it is important to understand which constraints caused the activities to fail to be scheduled and how to alter constraints to achieve the desired schedule. In this paper, we describe such a scheduling system for NASA's Mars 2020 Perseverance Rover, as well as Crosscheck, an explainable scheduling tool that explains the scheduler behavior. The scheduling system and Crosscheck are the baseline for operational use to schedule activities for the Mars 2020 rover. As we describe, the scheduler generates a schedule given a set of activities and their constraints and Crosscheck: (1) provides a visual representation of the generated schedule; (2) analyzes and explains why activities failed to schedule given the constraints provided; and (3) provides guidance on potential constraint relaxations to enable the activities to schedule in future scheduler runs. △ Less

Submitted 17 November, 2020; originally announced November 2020.

Comments: Submitted to the International Workshop of Explainable AI Planning (XAIP) at the International Conference on Automated Planning and Scheduling (ICAPS) 2020

arXiv:2010.05810 [pdf, other]

Viewpoint-Aware Channel-Wise Attentive Network for Vehicle Re-Identification

Authors: Tsai-Shien Chen, Man-Yu Lee, Chih-Ting Liu, Shao-Yi Chien

Abstract: Vehicle re-identification (re-ID) matches images of the same vehicle across different cameras. It is fundamentally challenging because the dramatically different appearance caused by different viewpoints would make the framework fail to match two vehicles of the same identity. Most existing works solved the problem by extracting viewpoint-aware feature via spatial attention mechanism, which, yet,… ▽ More Vehicle re-identification (re-ID) matches images of the same vehicle across different cameras. It is fundamentally challenging because the dramatically different appearance caused by different viewpoints would make the framework fail to match two vehicles of the same identity. Most existing works solved the problem by extracting viewpoint-aware feature via spatial attention mechanism, which, yet, usually suffers from noisy generated attention map or otherwise requires expensive keypoint labels to improve the quality. In this work, we propose Viewpoint-aware Channel-wise Attention Mechanism (VCAM) by observing the attention mechanism from a different aspect. Our VCAM enables the feature learning framework channel-wisely reweighing the importance of each feature maps according to the "viewpoint" of input vehicle. Extensive experiments validate the effectiveness of the proposed method and show that we perform favorably against state-of-the-arts methods on the public VeRi-776 dataset and obtain promising results on the 2020 AI City Challenge. We also conduct other experiments to demonstrate the interpretability of how our VCAM practically assists the learning framework. △ Less

Submitted 12 October, 2020; originally announced October 2020.

Comments: CVPR Workshop 2020

arXiv:2010.01892 [pdf, other]

Joint Pruning & Quantization for Extremely Sparse Neural Networks

Authors: Po-Hsiang Yu, Sih-Sian Wu, Jan P. Klopp, Liang-Gee Chen, Shao-Yi Chien

Abstract: We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low cost and low power accelerator hardware. In a practical scenario, there are particularly many applications for dense prediction tasks, hence we choose stereo depth estimation as target. We propose a two stage pruning and quantizat… ▽ More We investigate pruning and quantization for deep neural networks. Our goal is to achieve extremely high sparsity for quantized networks to enable implementation on low cost and low power accelerator hardware. In a practical scenario, there are particularly many applications for dense prediction tasks, hence we choose stereo depth estimation as target. We propose a two stage pruning and quantization pipeline and introduce a Taylor Score alongside a new fine-tuning mode to achieve extreme sparsity without sacrificing performance. Our evaluation does not only show that pruning and quantization should be investigated jointly, but also shows that almost 99% of memory demand can be cut while hardware costs can be reduced up to 99.9%. In addition, to compare with other works, we demonstrate that our pruning stage alone beats the state-of-the-art when applied to ResNet on CIFAR10 and ImageNet. △ Less

Submitted 5 October, 2020; originally announced October 2020.

Comments: 13 page, 16 figures

Showing 1–50 of 85 results for author: Chien, S