-
iMedic: Towards Smartphone-based Self-Auscultation Tool for AI-Powered Pediatric Respiratory Assessment
Authors:
Seung Gyu Jeong,
Sung Woo Nam,
Seong Kwan Jung,
Seong-Eun Kim
Abstract:
Respiratory auscultation is crucial for early detection of pediatric pneumonia, a condition that can quickly worsen without timely intervention. In areas with limited physician access, effective auscultation is challenging. We present a smartphone-based system that leverages built-in microphones and advanced deep learning algorithms to detect abnormal respiratory sounds indicative of pneumonia risk. Our end-to-end deep learning framework employs domain generalization to integrate a large electronic stethoscope dataset with a smaller smartphone-derived dataset, enabling robust feature learning for accurate respiratory assessments without expensive equipment. The accompanying mobile application guides caregivers in collecting high-quality lung sound samples and provides immediate feedback on potential pneumonia risks. User studies show strong classification performance and high acceptance, demonstrating the system's ability to facilitate proactive interventions and reduce preventable childhood pneumonia deaths. By seamlessly integrating into ubiquitous smartphones, this approach offers a promising avenue for more equitable and comprehensive remote pediatric care.
Submitted 22 April, 2025;
originally announced April 2025.
-
Kanana: Compute-efficient Bilingual Language Models
Authors:
Kanana LLM Team,
Yunju Bak,
Hojin Lee,
Minho Ryu,
Jiyeon Ham,
Seungjae Jung,
Daniel Wontae Nam,
Taegyeong Eo,
Donghun Lee,
Doohae Jung,
Boseop Kim,
Nayeon Kim,
Jaesun Park,
Hyunho Kim,
Hyunwoong Ko,
Changmin Lee,
Kyoung-Woon On,
Seulye Baeg,
Junrae Cho,
Sunghee Jung,
Jieun Kang,
EungGyun Kim,
Eunhwa Kim,
Byeongil Ko,
Daniel Lee
, et al. (4 additional authors not shown)
Abstract:
We introduce Kanana, a series of bilingual language models that demonstrate exceeding performance in Korean and competitive performance in English. The computational cost of Kanana is significantly lower than that of state-of-the-art models of similar size. The report details the techniques employed during pre-training to achieve compute-efficient yet competitive models, including high-quality data filtering, staged pre-training, depth up-scaling, and pruning and distillation. Furthermore, the report outlines the methodologies utilized during the post-training of the Kanana models, encompassing supervised fine-tuning and preference optimization, aimed at enhancing their capability for seamless interaction with users. Lastly, the report elaborates on plausible approaches used for language model adaptation to specific scenarios, such as embedding, retrieval-augmented generation, and function calling. The Kanana model series spans from 2.1B to 32.5B parameters, with the 2.1B models (base, instruct, embedding) publicly released to promote research on Korean language models.
Submitted 28 February, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
Large Language Model-based Decision-making for COLREGs and the Control of Autonomous Surface Vehicles
Authors:
Klinsmann Agyei,
Pouria Sarhadi,
Wasif Naeem
Abstract:
In the field of autonomous surface vehicles (ASVs), devising decision-making and obstacle avoidance solutions that address maritime COLREGs (Collision Regulations), primarily defined for human operators, has long been a pressing challenge. Recent advancements in explainable Artificial Intelligence (AI) and machine learning have shown promise in enabling human-like decision-making. Notably, significant developments have occurred in the application of Large Language Models (LLMs) to the decision-making of complex systems, such as self-driving cars. The textual and somewhat ambiguous nature of COLREGs (from an algorithmic perspective), however, poses challenges that align well with the capabilities of LLMs, suggesting that LLMs may become increasingly suitable for this application soon. This paper presents and demonstrates the first application of LLM-based decision-making and control for ASVs. The proposed method establishes a high-level decision-maker that uses online collision risk indices and key measurements to make decisions for safe manoeuvres. A tailored design and runtime structure is developed to support training and real-time action generation on a realistic ASV model. Local planning and control algorithms are integrated to execute the commands for waypoint following and collision avoidance at a lower level. To the authors' knowledge, this study represents the first attempt to apply explainable AI to the dynamic control problem of maritime systems recognising the COLREGs rules, opening new avenues for research in this challenging area. Results obtained across multiple test scenarios demonstrate the system's ability to maintain online COLREGs compliance, accurate waypoint tracking, and feasible control, while providing human-interpretable reasoning for each decision.
Submitted 8 April, 2025; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Traceable random numbers from a nonlocal quantum advantage
Authors:
Gautam A. Kavuri,
Jasper Palfree,
Dileep V. Reddy,
Yanbao Zhang,
Joshua C. Bienfang,
Michael D. Mazurek,
Mohammad A. Alhejji,
Aliza U. Siddiqui,
Joseph M. Cavanagh,
Aagam Dalal,
Carlos Abellán,
Waldimar Amaya,
Morgan W. Mitchell,
Katherine E. Stange,
Paul D. Beale,
Luís T. A. N. Brandão,
Harold Booth,
René Peralta,
Sae Woo Nam,
Richard P. Mirin,
Martin J. Stevens,
Emanuel Knill,
Lynden K. Shalm
Abstract:
The unpredictability of random numbers is fundamental to both digital security and applications that fairly distribute resources. However, existing random number generators have limitations: the generation processes cannot be fully traced, audited, and certified to be unpredictable. The algorithmic steps used in pseudorandom number generators are auditable, but they cannot guarantee that their outputs were a priori unpredictable given knowledge of the initial seed. Device-independent quantum random number generators can ensure that the source of randomness was unknown beforehand, but the steps used to extract the randomness are vulnerable to tampering. Here, for the first time, we demonstrate a fully traceable random number generation protocol based on device-independent techniques. Our protocol extracts randomness from unpredictable non-local quantum correlations, and uses distributed intertwined hash chains to cryptographically trace and verify the extraction process. This protocol is at the heart of a public, traceable, and certifiable quantum randomness beacon that we have launched. Over the first 40 days of operation, we completed the protocol in 7434 out of 7454 attempts, a success rate of 99.7%. Each time the protocol succeeded, the beacon emitted a pulse of 512 bits of traceable randomness. The bits are certified to be uniform with error times actual success probability bounded by $2^{-64}$. The generation of certifiable and traceable randomness represents one of the first public services that operates with an entanglement-derived advantage over comparable classical approaches.
Submitted 7 November, 2024;
originally announced November 2024.
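The tracing idea in this abstract can be illustrated with a plain hash chain. The sketch below is a simplification and not the beacon's protocol: it uses a single SHA-256 chain rather than the distributed intertwined chains the abstract mentions, and the record contents, the all-zero genesis value, and the function names are assumptions made for illustration. It also rechecks the reported success rate.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting value, for illustration only


def chain_hash(prev_hash: bytes, payload: bytes) -> bytes:
    """Commit to a payload together with the hash of everything before it."""
    return hashlib.sha256(prev_hash + payload).digest()


def build_chain(payloads, genesis=GENESIS):
    """Return the running hash after each payload is appended."""
    hashes = []
    prev = genesis
    for payload in payloads:
        prev = chain_hash(prev, payload)
        hashes.append(prev)
    return hashes


def verify_chain(payloads, hashes, genesis=GENESIS):
    """Recompute the chain and compare it with the published hashes."""
    return len(payloads) == len(hashes) and build_chain(payloads, genesis) == hashes


# Hypothetical records for one run of a protocol.
records = [b"request", b"raw measurement data", b"extracted 512 bits"]
hashes = build_chain(records)

# Success rate reported in the abstract: 7434 of 7454 attempts.
success_rate = 7434 / 7454
```

Because each hash depends on all earlier records, altering any one record changes every later hash, which is what makes after-the-fact tampering detectable by anyone who holds the published hashes.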
-
Unveiling Population Heterogeneity in Health Risks Posed by Environmental Hazards Using Regression-Guided Neural Network
Authors:
Jong Woo Nam,
Eun Young Choi,
Jennifer A. Ailshire,
Yao-Yi Chiang
Abstract:
Environmental hazards place certain individuals at disproportionately higher risks. As these hazards increasingly endanger human health, precise identification of the most vulnerable population subgroups is critical for public health. Moderated multiple regression (MMR) offers a straightforward method for investigating this by adding interaction terms between the exposure to a hazard and other population characteristics to a linear regression model. However, when the vulnerabilities are hidden within a cross-section of many characteristics, MMR is often limited in its capabilities to find any meaningful discoveries. Here, we introduce a hybrid method, named regression-guided neural networks (ReGNN), which utilizes artificial neural networks (ANNs) to non-linearly combine predictors, generating a latent representation that interacts with a focal predictor (i.e. variable measuring exposure to an environmental hazard). We showcase the use of ReGNN for investigating the population heterogeneity in the health effects of exposure to air pollution (PM2.5) on cognitive functioning scores. We demonstrate that population heterogeneity that would otherwise be hidden using traditional MMR can be found using ReGNN by comparing its results to the fit results of the traditional MMR models. In essence, ReGNN is a novel tool that enhances traditional regression models by effectively summarizing and quantifying an individual's susceptibility to health risks.
Submitted 20 September, 2024;
originally announced September 2024.
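As a reading aid for the abstract above, the two model families can be written side by side. The MMR form is the standard interaction-term regression; the ReGNN form is only a plausible reading of the abstract (a network output interacting with the focal predictor), and the symbols $f_\theta$, $\gamma_j$, and $\delta_j$ are notation assumed here, not taken from the paper.

```latex
% Moderated multiple regression: focal exposure x_i, covariates z_{ij}
y_i = \beta_0 + \beta_1 x_i + \sum_j \gamma_j z_{ij}
      + \sum_j \delta_j \, x_i z_{ij} + \varepsilon_i ,
\qquad
\frac{\partial y_i}{\partial x_i} = \beta_1 + \sum_j \delta_j z_{ij}

% Assumed ReGNN-style form: a neural network f_\theta summarizes the
% covariates into a latent score that moderates the effect of x_i
y_i = \beta_0 + \beta_1 x_i + \beta_2 f_\theta(z_i)
      + \beta_3 \, x_i f_\theta(z_i) + \varepsilon_i ,
\qquad
\frac{\partial y_i}{\partial x_i} = \beta_1 + \beta_3 f_\theta(z_i)
```

In both forms the effect of the exposure varies across individuals; the difference is that MMR restricts that variation to a linear function of the covariates, while the network can combine them non-linearly.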
-
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Authors:
Eunseop Yoon,
Hee Suk Yoon,
SooHwan Eom,
Gunsoo Han,
Daniel Wontae Nam,
Daejin Jo,
Kyoung-Woon On,
Mark A. Hasegawa-Johnson,
Sungwoong Kim,
Chang D. Yoo
Abstract:
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens, and the confidence of the discriminator is used to assign continuous rewards to each token considering the context. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level or token-level discrete rewards on open-ended generation benchmarks.
Submitted 8 December, 2024; v1 submitted 23 July, 2024;
originally announced July 2024.
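The contrast between discrete and continuous token rewards can be sketched as follows. This is an illustration under stated assumptions, not the TLCR method itself: the abstract says only that discriminator confidence is used to assign continuous rewards, so the mapping `2 * sigmoid(logit) - 1`, the `margin` threshold, and both function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def discrete_token_rewards(logits, margin=1.0):
    """Baseline: collapse each token to +1, -1, or 0 (assumed thresholding)."""
    return [1 if z > margin else -1 if z < -margin else 0 for z in logits]


def continuous_token_rewards(logits):
    """Assumed mapping: discriminator confidence rescaled to (-1, 1)."""
    return [2.0 * sigmoid(z) - 1.0 for z in logits]


# Hypothetical per-token discriminator logits for one generated sequence.
token_logits = [3.0, 1.2, 0.0, -0.4, -2.5]
discrete = discrete_token_rewards(token_logits)
continuous = continuous_token_rewards(token_logits)
```

With the discrete scheme, the logits 3.0 and 1.2 both become +1; the continuous scheme keeps the difference in confidence between them.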
-
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
Authors:
Kiyoon Jeong,
Woojun Lee,
Woongchan Nam,
Minjeong Ma,
Pilsung Kang
Abstract:
This report presents the ECO (Ensembled Clip score and cOnsensus score) pipeline from team DSBA LAB, a new framework for evaluating and ranking captions for a given image. ECO selects the caption that most accurately describes the image. This is made possible by combining an Ensembled CLIP score, which considers the semantic alignment between the image and captions, with a Consensus score that accounts for the essentialness of the captions. Using this framework, we achieved notable success in the CVPR 2024 Workshop Challenge on Caption Re-ranking Evaluation at the New Frontiers for Zero-Shot Image Captioning Evaluation (NICE). Specifically, we secured third place based on the CIDEr metric, second in both the SPICE and METEOR metrics, and first in the ROUGE-L and all BLEU Score metrics. The code and configuration for the ECO framework are available at https://github.com/DSBA-Lab/ECO .
Submitted 13 June, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Binary Classifier Optimization for Large Language Model Alignment
Authors:
Seungjae Jung,
Gunsoo Han,
Daniel Wontae Nam,
Kyoung-Woon On
Abstract:
In real-world services such as ChatGPT, aligning models based on user feedback is crucial for improving model performance. However, due to the simplicity and convenience of providing feedback, users typically offer only basic binary signals, such as 'thumbs-up' or 'thumbs-down'. Most existing alignment research, on the other hand, relies on preference-based approaches that require both positive and negative responses as a pair. We propose Binary Classifier Optimization (BCO), a technique that effectively aligns LLMs using only binary feedback. BCO trains a binary classifier, where the logit serves as an implicit reward, effectively minimizing the Direct Preference Optimization (DPO) loss. We demonstrate that the binary cross-entropy loss employed in classifier training acts as an upper bound for the DPO loss. Additionally, a novel reward shift technique further minimizes the gap between the losses. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO; and second, on a Likert-5 scale annotation dataset which stems from real users' queries. Our model consistently demonstrates effective and robust alignment across four base LLMs and three different datasets, showcasing the strength of our approach to learning from binary signals.
Submitted 9 June, 2025; v1 submitted 6 April, 2024;
originally announced April 2024.
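The classifier view described in this abstract can be sketched in a few lines. The sketch assumes the implicit reward takes the DPO-style form `beta * (log pi(y|x) - log pi_ref(y|x))` and treats the reward shift as a plain constant; the abstract does not say how the shift is computed, so `shift`, `beta`, and the function names are illustrative assumptions.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Assumed DPO-style implicit reward: scaled log-probability ratio."""
    return beta * (logp_policy - logp_ref)


def bco_loss(reward: float, thumbs_up: bool, shift: float = 0.0) -> float:
    """Binary cross-entropy with the shifted reward used as the classifier logit."""
    logit = reward - shift
    return -log_sigmoid(logit) if thumbs_up else -log_sigmoid(-logit)


# Hypothetical sequence log-probabilities under the policy and reference model.
r = implicit_reward(logp_policy=-12.0, logp_ref=-15.0)
loss_if_liked = bco_loss(r, thumbs_up=True)
loss_if_disliked = bco_loss(r, thumbs_up=False)
```

Each example needs only a single response and a thumbs-up or thumbs-down label, which is the practical difference from paired-preference losses.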
-
Appearance Debiased Gaze Estimation via Stochastic Subject-Wise Adversarial Learning
Authors:
Suneung Kim,
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
Recently, appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques. Despite such progress, most methods aim to infer gaze vectors from images directly, which causes overfitting to person-specific appearance factors. In this paper, we address these challenges and propose a novel framework: Stochastic subject-wise Adversarial gaZE learning (SAZE), which trains a network to generalize the appearance of subjects. We design a Face generalization Network (Fgen-Net) using a face-to-gaze encoder, a face identity classifier, and a proposed adversarial loss. The proposed loss generalizes face appearance factors so that the identity classifier infers a uniform probability distribution. In addition, the Fgen-Net is trained by a learning mechanism that optimizes the network by reselecting a subset of subjects at every training step to avoid overfitting. Our experimental results verify the robustness of the method in that it yields state-of-the-art performance, achieving 3.89 and 4.42 on the MPIIGaze and EyeDiap datasets, respectively. Furthermore, we demonstrate the positive generalization effect by conducting further experiments using face images involving different styles generated from the generative model.
Submitted 24 January, 2024;
originally announced January 2024.
-
Classifying Proposals of Decentralized Autonomous Organizations Using Large Language Models
Authors:
Christian Ziegler,
Marcos Miranda,
Guangye Cao,
Gustav Arentoft,
Doo Wan Nam
Abstract:
Our study demonstrates the effective use of Large Language Models (LLMs) for automating the classification of complex datasets. We specifically target proposals of Decentralized Autonomous Organizations (DAOs), as the classification of this data requires the understanding of context and, therefore, depends on human expertise, leading to high costs associated with the task. The study applies an iterative approach to specify categories and further refine them and the prompt in each iteration, which led to an accuracy rate of 95% in classifying a set of 100 proposals. With this, we demonstrate the potential of LLMs to automate data labeling tasks that depend on textual context effectively.
Submitted 3 July, 2024; v1 submitted 13 January, 2024;
originally announced January 2024.
-
Towards Better Visualizing the Decision Basis of Networks via Unfold and Conquer Attribution Guidance
Authors:
Jung-Ho Hong,
Woo-Jeoung Nam,
Kyu-Sung Jeon,
Seong-Whan Lee
Abstract:
Revealing the transparency of Deep Neural Networks (DNNs) has been widely studied to describe the decision mechanisms of network inner structures. In this paper, we propose a novel post-hoc framework, Unfold and Conquer Attribution Guidance (UCAG), which enhances the explainability of the network decision by spatially scrutinizing the input features with respect to the model confidence. Addressing the phenomenon of missing detailed descriptions, UCAG sequentially complies with the confidence of slices of the image, leading to providing an abundant and clear interpretation. Therefore, it is possible to enhance the representation ability of explanation by preserving the detailed descriptions of assistant input features, which are commonly overwhelmed by the main meaningful regions. We conduct numerous evaluations to validate the performance in several metrics: i) deletion and insertion, ii) (energy-based) pointing games, and iii) positive and negative density maps. Experimental results, including qualitative comparisons, demonstrate that our method outperforms the existing methods with the nature of clear and detailed explanations and applicability.
Submitted 6 July, 2025; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Programmable Superconducting Optoelectronic Single-Photon Synapses with Integrated Multi-State Memory
Authors:
Bryce A. Primavera,
Saeed Khan,
Richard P. Mirin,
Sae Woo Nam,
Jeffrey M. Shainline
Abstract:
The co-location of memory and processing is a core principle of neuromorphic computing. A local memory device for synaptic weight storage has long been recognized as an enabling element for large-scale, high-performance neuromorphic hardware. In this work, we demonstrate programmable superconducting synapses with integrated memories for use in superconducting optoelectronic neural systems. Superconducting nanowire single-photon detectors and Josephson junctions are combined into programmable synaptic circuits that exhibit single-photon sensitivity, memory cells with more than 400 internal states, leaky integration of input spike events, and 0.4 fJ programming energies (including cooling power). These results are attractive for implementing a variety of supervised and unsupervised learning algorithms and lay the foundation for a new hardware platform optimized for large-scale spiking network accelerators.
Submitted 10 November, 2023;
originally announced November 2023.
-
Hexa: Self-Improving for Knowledge-Grounded Dialogue System
Authors:
Daejin Jo,
Daniel Wontae Nam,
Gunsoo Han,
Kyoung-Woon On,
Taehwan Kwon,
Seungeun Rho,
Sungwoong Kim
Abstract:
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the generative performances of intermediate steps without the ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves the performances on the task of knowledge-grounded dialogue generation.
Submitted 2 April, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Effortless Integration of Memory Management into Open-Domain Conversation Systems
Authors:
Eunbi Choi,
Kyoung-Woon On,
Gunsoo Han,
Sungwoong Kim,
Daniel Wontae Nam,
Daejin Jo,
Seung Eun Rho,
Taehwan Kwon,
Minjoon Seo
Abstract:
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propose an automated dataset creation method for memory management. Our method 1) requires little cost for data construction, 2) does not affect performance in other tasks, and 3) reduces external memory. We show that our proposed model BlenderBot3-M^3, which is multi-task trained with memory management, outperforms BlenderBot3 with a relative 4% performance gain in terms of F1 score.
Submitted 23 May, 2023;
originally announced May 2023.
-
Multiplexed gradient descent: Fast online training of modern datasets on hardware neural networks without backpropagation
Authors:
Adam N. McCaughan,
Bakhrom G. Oripov,
Natesh Ganesh,
Sae Woo Nam,
Andrew Dienstfrey,
Sonia M. Buckley
Abstract:
We present multiplexed gradient descent (MGD), a gradient descent framework designed to easily train analog or digital neural networks in hardware. MGD utilizes zero-order optimization techniques for online training of hardware neural networks. We demonstrate its ability to train neural networks on modern machine learning datasets, including CIFAR-10 and Fashion-MNIST, and compare its performance to backpropagation. Assuming realistic timescales and hardware parameters, our results indicate that these optimization techniques can train a network on emerging hardware platforms orders of magnitude faster than the wall-clock time of training via backpropagation on a standard GPU, even in the presence of imperfect weight updates or device-to-device variations in the hardware. We additionally describe how it can be applied to existing hardware as part of chip-in-the-loop training, or integrated directly at the hardware level. Crucially, the MGD framework is highly flexible, and its gradient descent process can be optimized to compensate for specific hardware limitations such as slow parameter-update speeds or limited input bandwidth.
Submitted 5 March, 2023;
originally announced March 2023.
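The zero-order optimization mentioned in this abstract can be illustrated with a simultaneous-perturbation update: perturb all parameters at once, measure the change in cost, and correlate that change with the perturbation to estimate the gradient. The sketch below is a generic instance of that family, not the MGD framework as published; the quadratic cost, step sizes, and names are assumptions, and a software loop stands in for what would be hardware measurements.

```python
import random


def cost(theta, target):
    """Toy cost standing in for a measured network loss."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def zero_order_step(theta, target, rng, eps=1e-3, lr=0.05):
    """One update using only two cost evaluations, no backpropagation."""
    # Random +/-1 perturbation applied to every parameter simultaneously.
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    perturbed = [t + eps * s for t, s in zip(theta, signs)]
    delta_cost = cost(perturbed, target) - cost(theta, target)
    # Correlating the cost change with each sign gives a gradient estimate.
    return [t - lr * delta_cost * s / eps for t, s in zip(theta, signs)]


rng = random.Random(0)
target = [1.0, -2.0, 0.5]
theta = [0.0, 0.0, 0.0]
initial_cost = cost(theta, target)
for _ in range(500):
    theta = zero_order_step(theta, target, rng)
final_cost = cost(theta, target)
```

The update needs only forward evaluations of the cost, which is why this style of training is attractive for hardware where gradients cannot be read out directly.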
-
Reinforcement Learning-Enhanced Control Barrier Functions for Robot Manipulators
Authors:
Stephen McIlvanna,
Nhat Nguyen Minh,
Yuzhu Sun,
Mien Van,
Wasif Naeem
Abstract:
In this paper, we present the implementation of a Control Barrier Function (CBF) using a quadratic program (QP) formulation that provides obstacle avoidance for a robotic manipulator arm system. CBF is a control technique that has emerged and developed over the past decade and has been extensively explored in the literature on its mathematical foundations, proof of set invariance and potential applications for a variety of safety-critical control systems. In this work, we look at the design of CBF for robotic manipulator obstacle avoidance, discuss the selection of the CBF parameters, and present a Reinforcement Learning (RL) scheme to assist with finding parameter values that provide the most efficient trajectory to successfully avoid obstacles of different sizes. We then create a dataset across a range of scenarios and use it to train a Neural Network (NN) model that can be used within the control scheme to allow the system to efficiently adapt to different obstacle scenarios. Computer simulations (based on Matlab/Simulink) demonstrate the effectiveness of the proposed algorithm.
Submitted 21 November, 2022;
originally announced November 2022.
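A CBF-QP safety filter can be sketched compactly when there is a single barrier constraint, because the QP then has a closed-form solution. The sketch below makes several assumptions that are not from the paper: a single-integrator model (velocity is commanded directly) instead of manipulator dynamics, one circular obstacle, and a fixed gain `alpha`, which stands in for the kind of CBF parameter the abstract says is tuned with RL.

```python
def cbf_filter(x, u_nom, center, radius, alpha=1.0):
    """Return the control closest to u_nom that satisfies the CBF condition.

    Model: x_dot = u. Barrier: h(x) = |x - center|^2 - radius^2 (h >= 0 is safe).
    Condition: grad_h(x) . u + alpha * h(x) >= 0.
    Solves min |u - u_nom|^2 subject to that single linear constraint.
    Assumes x is not exactly at the obstacle center (the gradient would vanish).
    """
    diff = [xi - ci for xi, ci in zip(x, center)]
    h = sum(d * d for d in diff) - radius ** 2
    grad = [2.0 * d for d in diff]
    slack = sum(g * u for g, u in zip(grad, u_nom)) + alpha * h
    if slack >= 0.0:
        # The nominal command already satisfies the constraint.
        return list(u_nom)
    # Otherwise project onto the constraint boundary along grad_h.
    scale = -slack / sum(g * g for g in grad)
    return [u + scale * g for u, g in zip(u_nom, grad)]


# Moving away from the obstacle: command passes through unchanged.
u_away = cbf_filter((2.0, 0.0), (1.0, 0.0), (0.0, 0.0), 1.0)
# Moving toward the obstacle: approach speed is reduced to the allowed limit.
u_toward = cbf_filter((2.0, 0.0), (-1.0, 0.0), (0.0, 0.0), 1.0)
```

A larger `alpha` lets the system approach the obstacle faster before the filter intervenes, so the choice of this parameter trades efficiency against conservatism.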
-
LECO: Learnable Episodic Count for Task-Specific Intrinsic Reward
Authors:
Daejin Jo,
Sungwoong Kim,
Daniel Wontae Nam,
Taehwan Kwon,
Seungeun Rho,
Jongmin Kim,
Donghoon Lee
Abstract:
Episodic count has been widely used to design a simple yet effective intrinsic motivation for reinforcement learning with a sparse reward. However, the use of episodic count in a high-dimensional state space as well as over a long episode time requires thorough state compression and fast hashing, which hinders rigorous exploitation of it in such hard and complex exploration environments. Moreover, the interference from task-irrelevant observations in the episodic count may cause its intrinsic motivation to overlook important task-related changes of states, and novelty defined in an episodic manner can lead to repeatedly revisiting familiar states across episodes. In order to resolve these issues, in this paper, we propose a learnable hash-based episodic count, which we name LECO, that efficiently performs as a task-specific intrinsic reward in hard exploration problems. In particular, the proposed intrinsic reward consists of the episodic novelty and the task-specific modulation, where the former employs a vector quantized variational autoencoder to automatically obtain the discrete state codes for fast counting while the latter regulates the episodic novelty by learning a modulator to optimize the task-specific extrinsic reward. The proposed LECO specifically enables the automatic transition from exploration to exploitation during reinforcement learning. We experimentally show that, in contrast to the previous exploration methods, LECO successfully solves hard exploration problems and also scales to large state spaces through the most difficult tasks in MiniGrid and DMLab environments.
Submitted 11 October, 2022;
originally announced October 2022.
-
A Survey of Recent Machine Learning Solutions for Ship Collision Avoidance and Mission Planning
Authors:
Pouria Sarhadi,
Wasif Naeem,
Nikolaos Athanasopoulos
Abstract:
Machine Learning (ML) techniques have gained significant traction as a means of improving the autonomy of marine vehicles over the last few years. This article surveys the recent ML approaches utilised for ship collision avoidance (COLAV) and mission planning. Following an overview of the ever-expanding ML exploitation for maritime vehicles, key topics in the mission planning of ships are outlined. Notable papers with direct and indirect applications to the COLAV subject are technically reviewed and compared. Critiques, challenges, and future directions are also identified. The outcome clearly demonstrates the thriving research in this field, even though commercial marine ships incorporating machine intelligence able to perform autonomously under all operating conditions are still a long way off.
Submitted 15 July, 2022; v1 submitted 6 July, 2022;
originally announced July 2022.
-
Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection
Authors:
Joo-Yeon Lee,
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
Video Anomaly Detection (VAD) has traditionally been tackled with two main methodologies: the reconstruction-based approach and the prediction-based one. As the reconstruction-based methods learn to generalize the input image, the model merely learns an identity function and suffers strongly from the so-called generalizing issue. On the other hand, since the prediction-based ones learn to predict a future frame given several previous frames, they are less sensitive to the generalizing issue. However, it is still uncertain if the model can learn the spatio-temporal context of a video. Our intuition is that the understanding of the spatio-temporal context of a video plays a vital role in VAD as it provides precise information on how the appearance of an event in a video clip changes. Hence, to fully exploit the context information for anomaly detection in video, we designed a transformer model with three different contextual prediction streams: masked, whole and partial. By learning to predict the missing frames of consecutive normal frames, our model can effectively learn various normality patterns in the video, which leads to a high reconstruction error in abnormal cases that do not fit the learned context. To verify the effectiveness of our approach, we assess our model on the public benchmark datasets UCSD Pedestrian 2, CUHK Avenue and ShanghaiTech, and evaluate the performance with the anomaly score metric of reconstruction error. The results demonstrate that our proposed approach achieves competitive performance compared to the existing video anomaly detection methods.
Submitted 17 June, 2022;
originally announced June 2022.
-
Illuminating Salient Contributions in Neuron Activation with Attribution Equilibrium
Authors:
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
With the remarkable success of deep neural networks, there is a growing interest in research aimed at providing clear interpretations of their decision-making processes. In this paper, we introduce Attribution Equilibrium, a novel method to decompose output predictions into fine-grained attributions, balancing positive and negative relevance for clearer visualization of the evidence behind a network decision. We carefully analyze conventional approaches to decision explanation and present a different perspective on the conservation of evidence. We define the evidence as a gap between positive and negative influences among gradient-derived initial contribution maps. Then, we incorporate antagonistic elements and a user-defined criterion for the degree of positive attribution during propagation. Additionally, we consider the role of inactivated neurons in the propagation rule, thereby enhancing the discernment of less relevant elements such as the background. We conduct various assessments in a verified experimental environment with PASCAL VOC 2007, MS COCO 2014, and ImageNet datasets. The results demonstrate that our method outperforms existing attribution methods both qualitatively and quantitatively in identifying the key input features that influence model decisions.
Submitted 28 October, 2024; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Few-Shot Object Detection with Proposal Balance Refinement
Authors:
Sueyeon Kim,
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
Few-shot object detection has gained significant attention in recent years as it has the potential to greatly reduce the reliance on large amounts of manually annotated bounding boxes. While most existing few-shot object detection literature primarily focuses on bounding box classification by obtaining as discriminative feature embeddings as possible, we emphasize the necessity of handling the lack of intersection-over-union (IoU) variations induced by a biased distribution of novel samples. In this paper, we analyze the IoU imbalance that is caused by the relatively high number of low-quality region proposals, and reveal that it plays a critical role in improving few-shot learning capabilities. The well-known two stage fine-tuning technique causes insufficient quality and quantity of the novel positive samples, which hinders the effective object detection of unseen novel classes. To alleviate this issue, we present a few-shot object detection model with proposal balance refinement, a simple yet effective approach in learning object proposals using an auxiliary sequential bounding box refinement process. This process enables the detector to be optimized on the various IoU scores through additional novel class samples. To fully exploit our sequential stage architecture, we revise the fine-tuning strategy and expose the Region Proposal Network to the novel classes in order to provide increased learning opportunities for the region-of-interest (RoI) classifiers and regressors. Our extensive assessments on PASCAL VOC and COCO demonstrate that our framework substantially outperforms other existing few-shot object detection approaches.
Submitted 22 April, 2022;
originally announced April 2022.
-
Demonstration of Superconducting Optoelectronic Single-Photon Synapses
Authors:
Saeed Khan,
Bryce A. Primavera,
Jeff Chiles,
Adam N. McCaughan,
Sonia M. Buckley,
Alexander N. Tait,
Adriana Lita,
John Biesecker,
Anna Fox,
David Olaya,
Richard P. Mirin,
Sae Woo Nam,
Jeffrey M. Shainline
Abstract:
Superconducting optoelectronic hardware is being explored as a path towards artificial spiking neural networks with unprecedented scales of complexity and computational ability. Such hardware combines integrated-photonic components for few-photon, light-speed communication with superconducting circuits for fast, energy-efficient computation. Monolithic integration of superconducting and photonic devices is necessary for the scaling of this technology. In the present work, superconducting-nanowire single-photon detectors are monolithically integrated with Josephson junctions for the first time, enabling the realization of superconducting optoelectronic synapses. We present circuits that perform analog weighting and temporal leaky integration of single-photon presynaptic signals. Synaptic weighting is implemented in the electronic domain so that binary, single-photon communication can be maintained. Records of recent synaptic activity are locally stored as current in superconducting loops. Dendritic and neuronal nonlinearities are implemented with a second stage of Josephson circuitry. The hardware presents great design flexibility, with demonstrated synaptic time constants spanning four orders of magnitude (hundreds of nanoseconds to milliseconds). The synapses are responsive to presynaptic spike rates exceeding 10 MHz and consume approximately 33 aJ of dynamic power per synapse event before accounting for cooling. In addition to neuromorphic hardware, these circuits introduce new avenues towards realizing large-scale single-photon-detector arrays for diverse imaging, sensing, and quantum communication applications.
Submitted 20 April, 2022;
originally announced April 2022.
-
Style-Guided Domain Adaptation for Face Presentation Attack Detection
Authors:
Young-Eun Kim,
Woo-Jeoung Nam,
Kyungseo Min,
Seong-Whan Lee
Abstract:
Domain adaptation (DA) or domain generalization (DG) for face presentation attack detection (PAD) has attracted attention recently with its robustness against unseen attack scenarios. Existing DA/DG-based PAD methods, however, have not yet fully explored the domain-specific style information that can provide knowledge regarding attack styles (e.g., materials, background, illumination and resolution). In this paper, we introduce a novel Style-Guided Domain Adaptation (SGDA) framework for inference-time adaptive PAD. Specifically, Style-Selective Normalization (SSN) is proposed to explore the domain-specific style information within the high-order feature statistics. The proposed SSN enables the adaptation of the model to the target domain by reducing the style difference between the target and the source domains. Moreover, we carefully design Style-Aware Meta-Learning (SAML) to boost the adaptation ability, which simulates the inference-time adaptation with style selection process on virtual test domain. In contrast to previous domain adaptation approaches, our method does not require either additional auxiliary models (e.g., domain adaptors) or the unlabeled target domain during training, which makes our method more practical to PAD task. To verify our experiments, we utilize the public datasets: MSU-MFSD, CASIA-FASD, OULU-NPU and Idiap REPLAYATTACK. In most assessments, the result demonstrates a notable gap of performance compared to the conventional DA/DG-based PAD methods.
Submitted 19 June, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
ShapeY: Measuring Shape Recognition Capacity Using Nearest Neighbor Matching
Authors:
Jong Woo Nam,
Amanda S. Rios,
Bartlett W. Mel
Abstract:
Object recognition in humans depends primarily on shape cues. We have developed a new approach to measuring the shape recognition performance of a vision system based on nearest neighbor view matching within the system's embedding space. Our performance benchmark, ShapeY, allows for precise control of task difficulty, by enforcing that view matching span a specified degree of 3D viewpoint change and/or appearance change. As a first test case we measured the performance of ResNet50 pre-trained on ImageNet. Matching error rates were high. For example, a 27 degree change in object pitch led ResNet50 to match the incorrect object 45% of the time. Appearance changes were also highly disruptive. Examination of false matches indicates that ResNet50's embedding space is severely "tangled". These findings suggest ShapeY can be a useful tool for charting the progress of artificial vision systems towards human-level shape recognition capabilities.
Submitted 15 November, 2021;
originally announced November 2021.
-
Improving Interpretability of Deep Neural Networks in Medical Diagnosis by Investigating the Individual Units
Authors:
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
As interpretability has been pointed out as the obstacle to the adoption of Deep Neural Networks (DNNs), there is an increasing interest in solving the transparency issue to guarantee the impressive performance. In this paper, we demonstrate the efficiency of recent attribution techniques to explain the diagnostic decision by visualizing the significant factors in the input image. By utilizing the characteristics of objectness that DNNs have learned, fully decomposing the network prediction yields a clear localization of the target lesion. To verify our work, we conduct our experiments on chest X-ray diagnosis with publicly accessible datasets. As an intuitive assessment metric for explanations, we report the Intersection over Union (IoU) between the visual explanation and the bounding boxes of lesions. Experimental results show that recently proposed attribution methods localize the evidence for the diagnostic decision more accurately than the traditionally used CAM. Furthermore, we analyze the inconsistency of intentions between humans and DNNs, which is easily obscured by high performance. By visualizing the relevant factors, it is possible to confirm that the criterion for decision is in line with the learning strategy. Our analysis of unmasking machine intelligence demonstrates the necessity of explainability in medical diagnostic decisions.
Submitted 19 July, 2021;
originally announced July 2021.
-
GMAC: A Distributional Perspective on Actor-Critic Framework
Authors:
Daniel Wontae Nam,
Younghoon Kim,
Chan Y. Park
Abstract:
In this paper, we devise a distributional framework on actor-critic as a solution to distributional instability, action type restriction, and conflation between samples and statistics. We propose a new method that minimizes the Cramér distance with the multi-step Bellman target distribution generated from a novel Sample-Replacement algorithm denoted SR($λ$), which learns the correct value distribution under multiple Bellman operations. Parameterizing a value distribution with Gaussian Mixture Model further improves the efficiency and the performance of the method, which we name GMAC. We empirically show that GMAC captures the correct representation of value distributions and improves the performance of a conventional actor-critic method with low computational cost, in both discrete and continuous action spaces using Arcade Learning Environment (ALE) and PyBullet environment.
Submitted 15 July, 2021; v1 submitted 24 May, 2021;
originally announced May 2021.
-
PHIDL: Python CAD layout and geometry creation for nanolithography
Authors:
A. N. McCaughan,
A. M. Tait,
S. M. Buckley,
D. M. Oh,
J. T. Chiles,
J. M. Shainline,
S. W. Nam
Abstract:
Computer-aided design (CAD) has become a critical element in the creation of nanopatterned structures and devices. In particular, with the increased adoption of easy-to-learn programming languages like Python there has been a significant rise in the amount of lithographic geometries generated through scripting and programming. However, there are currently unaddressed gaps in usability for open-source CAD tools -- especially those in the GDSII design space -- that prevent wider adoption by scientists and students who might otherwise benefit from scripted design. For example, constructing relations between adjacent geometries is often much more difficult than necessary -- spacing a resonator structure a few micrometers from a readout structure often requires manually-coding the placement arithmetic. While inconveniences like this can be overcome by writing custom functions, they are often significant barriers to entry for new users or those less familiar with programming. To help streamline the design process and reduce barrier to entry for scripting designs, we have developed PHIDL, an open-source GDSII-based CAD tool for Python 2 and 3.
Submitted 1 March, 2021;
originally announced March 2021.
-
Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations
Authors:
Woo-Jeoung Nam,
Jaesik Choi,
Seong-Whan Lee
Abstract:
The clear transparency of Deep Neural Networks (DNNs) is hampered by complex internal structures and nonlinear transformations along deep hierarchies. In this paper, we propose a new attribution method, Relative Sectional Propagation (RSP), for fully decomposing the output predictions with the characteristics of class-discriminative attributions and clear objectness. We carefully revisit some shortcomings of backpropagation-based attribution methods, which are trade-off relations in decomposing DNNs. We define hostile factor as an element that interferes with finding the attributions of the target and propagate it in a distinguishable way to overcome the non-suppressed nature of activated neurons. As a result, it is possible to assign the bi-polar relevance scores of the target (positive) and hostile (negative) attributions while maintaining each attribution aligned with the importance. We also present the purging techniques to prevent the decrement of the gap between the relevance scores of the target and hostile attributions during backward propagation by eliminating the conflicting units to channel attribution map. Therefore, our method makes it possible to decompose the predictions of DNNs with clearer class-discriminativeness and detailed elucidations of activation neurons compared to the conventional attribution methods. In a verified experimental environment, we report the results of the assessments: (i) Pointing Game, (ii) mIoU, and (iii) Model Sensitivity with PASCAL VOC 2007, MS COCO 2014, and ImageNet datasets. The results demonstrate that our method outperforms existing backward decomposition methods, including distinctive and intuitive visualizations.
Submitted 12 December, 2020; v1 submitted 6 December, 2020;
originally announced December 2020.
-
Rotation Invariant Aerial Image Retrieval with Group Convolutional Metric Learning
Authors:
Hyunseung Chung,
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
Remote sensing image retrieval (RSIR) is the process of ranking database images depending on the degree of similarity compared to the query image. As the complexity of RSIR increases due to the diversity in shooting range, angle, and location of remote sensors, there is an increasing demand for methods to address these issues and improve retrieval performance. In this work, we introduce a novel method for retrieving aerial images by merging group convolution with attention mechanism and metric learning, resulting in robustness to rotational variations. For refinement and emphasis on important features, we applied channel attention in each group convolution stage. By utilizing the characteristics of group convolution and channel-wise attention, it is possible to acknowledge the equality among rotated but identically located images. The training procedure has two main steps: (i) training the network with Aerial Image Dataset (AID) for classification, (ii) fine-tuning the network with triplet-loss for retrieval with Google Earth South Korea and NWPU-RESISC45 datasets. Results show that the proposed method performance exceeds other state-of-the-art retrieval methods in both rotated and original environments. Furthermore, we utilize class activation maps (CAM) to visualize the distinct difference of main features between our method and baseline, resulting in better adaptability in rotated environments.
Submitted 19 October, 2020;
originally announced October 2020.
-
A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching
Authors:
Jae-Hyun Park,
Woo-Jeoung Nam,
Seong-Whan Lee
Abstract:
In this paper, we propose a novel method to precisely match two aerial images that were obtained in different environments via a two-stream deep network. By internally augmenting the target image, the network considers the two-stream with the three input images and reflects the additional augmented pair in the training. As a result, the training process of the deep network is regularized and the network becomes robust for the variance of aerial images. Furthermore, we introduce an ensemble method that is based on the bidirectional network, which is motivated by the isomorphic nature of the geometric transformation. We obtain two global transformation parameters without any additional network or parameters, which alleviate asymmetric matching results and enable significant improvement in performance by fusing two outcomes. For the experiment, we adopt aerial images from Google Earth and the International Society for Photogrammetry and Remote Sensing (ISPRS). To quantitatively assess our result, we apply the probability of correct keypoints (PCK) metric, which measures the degree of matching. The qualitative and quantitative results show the sizable gap of performance compared to the conventional methods for matching the aerial images. All code and our trained model, as well as the dataset are available online.
Submitted 4 February, 2020;
originally announced February 2020.
-
GeoCMS : Towards a Geo-Tagged Media Management System
Authors:
Jang You Park,
YongHee Jung,
Wei Ding,
Kwang Woo Nam
Abstract:
In this paper, we propose the design and implementation of a new geotagged media management system. A large amount of geo-tagged media data is generated daily by users' smartphones, mobile devices, dash cams and cameras. Geotagged media, such as geovideos and geophotos, can be captured with spatio-temporal information such as time, location, visible area, camera direction, moving direction and visible distance. Due to the increase in geo-tagged multimedia data, research on efficiently managing and mining geo-tagged multimedia is expected to become a new area in databases and data mining. This paper proposes a geo-tagged media management system called Open GeoCMS (Geotagged media Contents Management System). Open GeoCMS is a new framework to manage geotagged media data on the web. Our framework supports various types, namely moving point, moving photo (a sequence of photos taken by a drone), moving double and moving video. GeoCMS also has a label viewer and editor system for photos and videos. Open GeoCMS has been developed as an open source system.
Submitted 9 January, 2020;
originally announced January 2020.
-
Fast Mining of Spatial Frequent Wordset from Social Database
Authors:
Yongmi Lee,
Kwang Woo Nam,
Keun Ho Ryu
Abstract:
In this paper, we propose an algorithm that extracts spatial frequent patterns to explain the relative characteristics of a specific location from the available social data. This paper proposes a spatial social data model which includes spatial social data, spatial support, spatial frequent patterns, spatial partition, and spatial clustering; these concepts are used for describing the exploration algorithm of spatial frequent patterns. With these defined concepts as the foundation, an SFP-tree structure that maintains not only the frequent words but also the frequent cells was proposed, and an SFP-growth algorithm that explores the frequent patterns on the basis of this SFP-tree was proposed.
Submitted 26 December, 2019; v1 submitted 19 December, 2019;
originally announced December 2019.
-
Measuring similarity between geo-tagged videos using largest common view
Authors:
Wei Ding,
KwangSoo Yang,
Kwang Woo Nam
Abstract:
This paper presents a novel problem of discovering similar trajectories based on the field of view (FoV) of video data. The problem is important for many societal applications such as grouping moving objects, classifying geo-images, and identifying interesting trajectory patterns. Prior work considers only either spatial locations or the spatial relationship between two line segments. However, these approaches are limited in finding similar moving objects with common views. In this paper, we propose a new algorithm that can group both spatial locations and points of view to identify similar trajectories. We also propose novel methods that reduce the computational cost of the proposed work. Experimental results using real-world datasets demonstrate that the proposed approach outperforms prior work and reduces the computational cost.
Submitted 28 April, 2019;
originally announced May 2019.
-
Relative Attributing Propagation: Interpreting the Comparative Contributions of Individual Units in Deep Neural Networks
Authors:
Woo-Jeoung Nam,
Shir Gur,
Jaesik Choi,
Lior Wolf,
Seong-Whan Lee
Abstract:
As Deep Neural Networks (DNNs) have demonstrated superhuman performance in a variety of fields, there is an increasing interest in understanding the complex internal mechanisms of DNNs. In this paper, we propose Relative Attributing Propagation (RAP), which decomposes the output predictions of DNNs with a new perspective of separating the relevant (positive) and irrelevant (negative) attributions according to the relative influence between the layers. The relevance of each neuron is identified with respect to its degree of contribution, separated into positive and negative, while preserving the conservation rule. Considering the relevance assigned to neurons in terms of relative priority, RAP allows each neuron to be assigned with a bi-polar importance score concerning the output: from highly relevant to highly irrelevant. Therefore, our method makes it possible to interpret DNNs with much clearer and attentive visualizations of the separated attributions than the conventional explaining methods. To verify that the attributions propagated by RAP correctly account for each meaning, we utilize the evaluation metrics: (i) Outside-inside relevance ratio, (ii) Segmentation mIOU and (iii) Region perturbation. In all experiments and metrics, we present a sizable gap in comparison to the existing literature. Our source code is available at https://github.com/wjNam/Relative_Attributing_Propagation.
Submitted 13 November, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Superconducting Optoelectronic Neurons II: Receiver Circuits
Authors:
Jeffrey M. Shainline,
Sonia M. Buckley,
Adam N. McCaughan,
Manuel Castellanos-Beltran,
Christine A. Donnelly,
Michael L. Schneider,
Richard P. Mirin,
Sae Woo Nam
Abstract:
Circuits using superconducting single-photon detectors and Josephson junctions to perform signal reception, synaptic weighting, and integration are investigated. The circuits convert photon-detection events into flux quanta, the number of which is determined by the synaptic weight. The current from many synaptic connections is inductively coupled to a superconducting loop that implements the neuronal threshold operation. Designs are presented for synapses and neurons that perform integration as well as detect coincidence events for temporal coding. Both excitatory and inhibitory connections are demonstrated. It is shown that a neuron with a single integration loop can receive input from 1000 such synaptic connections, and neurons of similar design could employ many loops for dendritic processing.
Submitted 15 May, 2018; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Circuit designs for superconducting optoelectronic loop neurons
Authors:
Jeffrey M. Shainline,
Sonia M. Buckley,
Adam N. McCaughan,
Jeff Chiles,
Richard P. Mirin,
Sae Woo Nam
Abstract:
Optical communication achieves high fanout and short delay advantageous for information integration in neural systems. Superconducting detectors enable signaling with single photons for maximal energy efficiency. We present designs of superconducting optoelectronic neurons based on superconducting single-photon detectors, Josephson junctions, semiconductor light sources, and multi-planar dielectric waveguides. These circuits achieve complex synaptic and neuronal functions with high energy efficiency, leveraging the strengths of light for communication and superconducting electronics for computation. The neurons send few-photon signals to synaptic connections. These signals communicate neuronal firing events as well as update synaptic weights. Spike-timing-dependent plasticity is implemented with a single photon triggering each step of the process. Microscale light-emitting diodes and waveguide networks enable connectivity from a neuron to thousands of synaptic connections, and the use of light for communication enables synchronization of neurons across an area limited only by the distance light can travel within the period of a network oscillation. Experimentally, each of the requisite circuit elements has been demonstrated, yet a hardware platform combining them all has not been attempted. Compared to digital logic or quantum computing, device tolerances are relaxed. For this neural application, optical sources providing incoherent pulses with 10,000 photons produced with efficiency of 10$^{-3}$ operating at 20\,MHz at 4.2\,K are sufficient to enable a massively scalable neural computing platform with connectivity comparable to the brain and thirty thousand times higher speed.
Submitted 7 September, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Superconducting Optoelectronic Neurons V: Networks and Scaling
Authors:
Jeffrey M. Shainline,
Jeff Chiles,
Sonia M. Buckley,
Adam N. McCaughan,
Richard P. Mirin,
Sae Woo Nam
Abstract:
Networks of superconducting optoelectronic neurons are investigated for their near-term technological potential and long-term physical limitations. Networks with short average path length, high clustering coefficient, and power-law degree distribution are designed using a growth model that assigns connections between new and existing nodes based on spatial distance as well as degree of existing nodes. The network construction algorithm is scalable to arbitrary levels of network hierarchy and achieves systems with fractal spatial properties and efficient wiring. By modeling the physical size of superconducting optoelectronic neurons, we calculate the area of these networks. A system with 8100 neurons and 330,430 total synapses will fit on a 1\,cm $\times$ 1\,cm die. Systems of millions of neurons with hundreds of millions of synapses will fit on a 300\,mm wafer. For multi-wafer assemblies, communication at light speed enables a neuronal pool the size of a large data center comprising 100 trillion neurons with coherent oscillations at 1\,MHz. Assuming a power law frequency distribution, as is necessary for self-organized criticality, we calculate the power consumption of the networks. We find the use of single photons for communication and superconducting circuits for computation leads to power density low enough to be cooled by liquid $^4$He for networks of any scale.
Submitted 15 May, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
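As a rough check on the scale quoted in the abstract above, the distance light covers in one oscillation period can be computed directly. This is a back-of-the-envelope sketch, not the authors' calculation: the function name and the `refractive_index` parameter are hypothetical, and the default assumes propagation in vacuum (a waveguide or fiber would shorten the distance).

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact SI value, vacuum


def light_travel_distance(frequency_hz: float, refractive_index: float = 1.0) -> float:
    """Distance (m) light travels during one period of an oscillation.

    refractive_index is an illustrative assumption; 1.0 means vacuum.
    """
    period_s = 1.0 / frequency_hz
    return SPEED_OF_LIGHT_M_PER_S / refractive_index * period_s


# Coherent oscillations at 1 MHz, the frequency quoted in the abstract.
distance_1mhz_m = light_travel_distance(1e6)
```

Under these assumptions the result is roughly 300 m, which is consistent with the abstract's picture of a neuronal pool the size of a large data center.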
-
Superconducting Optoelectronic Neurons IV: Transmitter Circuits
Authors:
Jeffrey M. Shainline,
Adam N. McCaughan,
Sonia M. Buckley,
Richard P. Mirin,
Sae Woo Nam
Abstract:
A superconducting optoelectronic neuron will produce a small current pulse upon reaching threshold. We present an amplifier chain that converts this small current pulse to a voltage pulse sufficient to produce light from a semiconductor diode. This light is the signal used to communicate between neurons in the network. The amplifier chain comprises a thresholding Josephson junction, a relaxation oscillator Josephson junction, a superconducting thin-film current-gated current amplifier, and a superconducting thin-film current-gated voltage amplifier. We analyze the performance of the elements in the amplifier chain in the time domain to calculate the energy consumption per photon created for several values of light-emitting diode capacitance and efficiency. The speed of the amplification sequence allows neuronal firing up to at least 20\,MHz with power density low enough to be cooled easily with standard $^4$He cryogenic systems operating at 4.2\,K.
Submitted 8 May, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Superconducting Optoelectronic Neurons III: Synaptic Plasticity
Authors:
Jeffrey M. Shainline,
Adam N. McCaughan,
Sonia M. Buckley,
Christine A. Donnelly,
Manuel Castellanos-Beltran,
Michael L. Schneider,
Richard P. Mirin,
Sae Woo Nam
Abstract:
As a means of dynamically reconfiguring the synaptic weight of a superconducting optoelectronic loop neuron, a superconducting flux storage loop is inductively coupled to the synaptic current bias of the neuron. A standard flux memory cell is used to achieve a binary synapse, and loops capable of storing many flux quanta are used to enact multi-stable synapses. Circuits are designed to implement supervised learning wherein current pulses add or remove flux from the loop to strengthen or weaken the synaptic weight. Designs are presented for circuits with hundreds of intermediate synaptic weights between minimum and maximum strengths. Circuits for implementing unsupervised learning are modeled using two photons to strengthen and two photons to weaken the synaptic weight via Hebbian and anti-Hebbian learning rules, and techniques are proposed to control the learning rate. Implementation of short-term plasticity, homeostatic plasticity, and metaplasticity in loop neurons is discussed.
Submitted 3 July, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Superconducting Optoelectronic Neurons I: General Principles
Authors:
Jeffrey M. Shainline,
Sonia M. Buckley,
Adam N. McCaughan,
Jeff Chiles,
Richard P. Mirin,
Sae Woo Nam
Abstract:
The design of neural hardware is informed by the prominence of differentiated processing and information integration in cognitive systems. The central role of communication leads to the principal assumption of the hardware platform: signals between neurons should be optical to enable fanout and communication with minimal delay. The requirement of energy efficiency leads to the utilization of superconducting detectors to receive single-photon signals. We discuss the potential of superconducting optoelectronic hardware to achieve the spatial and temporal information integration advantageous for cognitive processing, and we consider physical scaling limits based on light-speed communication. We introduce the superconducting optoelectronic neurons and networks that are the subject of the subsequent papers in this series.
Submitted 24 May, 2018; v1 submitted 4 May, 2018;
originally announced May 2018.
-
Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution
Authors:
Il Jun Ahn,
Woo Hyun Nam
Abstract:
Recently, various deep-neural-network (DNN)-based approaches have been proposed for single-image super-resolution (SISR). Despite their promising results on major structure regions such as edges and lines, they still suffer from limited performance on texture regions that consist of very complex and fine patterns. This is because, during the acquisition of a low-resolution (LR) image via down-sampling, these regions lose most of the high-frequency information necessary to represent the texture details. In this paper, we present a novel texture enhancement framework for SISR to effectively improve the spatial resolution in the texture regions as well as edges and lines. We call our method the high-resolution (HR) style transfer algorithm. Our framework consists of three steps: (i) generate an initial HR image from an interpolated LR image via an SISR algorithm, (ii) generate an HR style image from the initial HR image via down-scaling and tiling, and (iii) combine the HR style image with the initial HR image via a customized style transfer algorithm. Here, the HR style image is obtained by down-scaling the initial HR image and then repetitively tiling it into an image of the same size as the HR image. This down-scaling and tiling process comes from the idea that texture regions are often composed of small regions that are similar in appearance, albeit sometimes different in scale. This process creates an HR style image that is rich in details, which can be used to restore high-frequency texture details back into the initial HR image via the style transfer algorithm. Experimental results on a number of texture datasets show that our proposed HR style transfer algorithm provides more visually pleasing results compared with competitive methods.
Submitted 30 November, 2016;
originally announced December 2016.
-
Superconducting optoelectronic circuits for neuromorphic computing
Authors:
Jeffrey M. Shainline,
Sonia M. Buckley,
Richard P. Mirin,
Sae Woo Nam
Abstract:
Neural networks have proven effective for solving many difficult computational problems. Implementing complex neural networks in software is very computationally expensive. To explore the limits of information processing, it will be necessary to implement new hardware platforms with large numbers of neurons, each with a large number of connections to other neurons. Here we propose a hybrid semiconductor-superconductor hardware platform for the implementation of neural networks and large-scale neuromorphic computing. The platform combines semiconducting few-photon light-emitting diodes with superconducting-nanowire single-photon detectors to behave as spiking neurons. These processing units are connected via a network of optical waveguides, and variable weights of connection can be implemented using several approaches. The use of light as a signaling mechanism overcomes fanout and parasitic constraints on electrical signals while simultaneously introducing physical degrees of freedom which can be employed for computation. The use of supercurrents achieves the low power density necessary to scale to systems with enormous entropy. The proposed processing units can operate at speeds of at least $20$ MHz with fully asynchronous activity, light-speed-limited latency, and power densities on the order of 1 mW/cm$^2$ for neurons with 700 connections operating at full speed at 2 K. The processing units achieve an energy efficiency of $\approx 20$ aJ per synapse event. By leveraging multilayer photonics with deposited waveguides and superconductors with feature sizes $>$ 100 nm, this approach could scale to systems with massive interconnectivity and complexity for advanced computing as well as explorations of information processing capacity in systems with an enormous number of information-bearing microstates.
Submitted 10 November, 2016; v1 submitted 30 September, 2016;
originally announced October 2016.
-
Local Decorrelation For Improved Detection
Authors:
Woonhyun Nam,
Piotr Dollár,
Joon Hee Han
Abstract:
Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.
Submitted 3 November, 2014; v1 submitted 4 June, 2014;
originally announced June 2014.
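The idea of removing correlations in local neighborhoods, as described in the abstract above, can be illustrated with a PCA projection of small patches. This is a sketch under assumptions rather than the paper's transform: the function names, the 3x3 patch size, and the choice of a plain eigendecomposition of the patch covariance are all illustrative.

```python
import numpy as np


def extract_patches(channel: np.ndarray, k: int) -> np.ndarray:
    """Collect every k-by-k patch of a 2-D feature channel as a flat row vector."""
    h, w = channel.shape
    return np.asarray([
        channel[i:i + k, j:j + k].ravel()
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ])


def decorrelate_patches(channel: np.ndarray, k: int = 3):
    """Project local patches onto the eigenvectors of their covariance.

    After projection the patch coordinates are mutually uncorrelated.
    """
    patches = extract_patches(channel, k)
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / (len(centered) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, so eigh applies
    return centered @ eigvecs, eigvals


# Synthetic channel with strong horizontal correlation between neighbors.
rng = np.random.default_rng(0)
channel = np.cumsum(rng.normal(size=(32, 32)), axis=1)
projected, eigvals = decorrelate_patches(channel, k=3)
projected_cov = np.cov(projected, rowvar=False)
```

In this toy setting `projected_cov` is diagonal up to rounding error, so a single-feature (orthogonal) split on any projected coordinate no longer competes with correlated neighbors.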
-
On the capacity limit of wireless channels under colored scattering
Authors:
Wooseok Nam,
Dongwoon Bai,
Jungwon Lee,
Inyup Kang
Abstract:
It has been generally believed that the multiple-input multiple-output (MIMO) channel capacity grows linearly with the size of antenna arrays. In terms of degrees of freedom, linear transmit and receive arrays of length $L$ in a scattering environment of total angular spread $|Ω|$ asymptotically have $|Ω| L$ degrees of freedom. In this paper, it is claimed that the linear increase in degrees of freedom may not be attained when scattered electromagnetic fields in the underlying scattering environment are statistically correlated. After introducing a model of correlated scattering, which is referred to as the colored scattering model, we derive the number of degrees of freedom. Unlike the uncorrelated case, the number of degrees of freedom in the colored scattering channel is asymptotically limited by $|Ω| \cdot \min \{L, 1/Γ\}$, where $Γ$ is a parameter determining the extent of correlation. In other words, for very large arrays in the colored scattering environment, degrees of freedom can get saturated to an intrinsic limit rather than increasing linearly with the array size.
Submitted 30 November, 2012;
originally announced December 2012.
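The two asymptotic expressions in the abstract above, $|Ω| L$ for uncorrelated scattering and $|Ω| \cdot \min \{L, 1/Γ\}$ for colored scattering, can be compared numerically. The helper below is only an illustration of those formulas; the function and parameter names are hypothetical, and the sample values are arbitrary.

```python
from typing import Optional


def degrees_of_freedom(angular_spread: float, array_length: float,
                       gamma: Optional[float] = None) -> float:
    """Asymptotic degrees of freedom from the abstract's formulas.

    gamma=None selects the uncorrelated case, |Omega| * L.
    Otherwise returns |Omega| * min(L, 1/gamma), with gamma > 0.
    """
    if gamma is None:
        return angular_spread * array_length
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return angular_spread * min(array_length, 1.0 / gamma)


uncorrelated = degrees_of_freedom(2.0, 10.0)            # 2 * 10
saturated = degrees_of_freedom(2.0, 10.0, gamma=0.25)   # 2 * min(10, 4)
unsaturated = degrees_of_freedom(2.0, 10.0, gamma=0.05) # 2 * min(10, 20)
```

With these sample values the colored-scattering limit only bites once $L$ exceeds $1/Γ$: the array of length 10 saturates at 8 degrees of freedom for $Γ = 0.25$ but keeps the full 20 for $Γ = 0.05$.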
-
Least-squares based iterative multipath super-resolution technique
Authors:
Wooseok Nam,
Seung-Hyun Kong
Abstract:
In this paper, we study the problem of multipath channel estimation for direct sequence spread spectrum signals. To resolve multipath components arriving within a short interval, we propose a new algorithm called the least-squares based iterative multipath super-resolution (LIMS). Compared to conventional super-resolution techniques, such as the multiple signal classification (MUSIC) and the estimation of signal parameters via rotation invariance techniques (ESPRIT), our algorithm has several appealing features. In particular, even in critical situations where the conventional super-resolution techniques are not very powerful due to limited data or the correlation between path coefficients, the LIMS algorithm can produce successful results. In addition, due to its iterative nature, the LIMS algorithm is suitable for recursive multipath tracking, whereas the conventional super-resolution techniques may not be. Through numerical simulations, we show that the LIMS algorithm can resolve the first arrival path among closely arriving independently faded multipaths with a much lower mean square error than can conventional early-late discriminator based techniques.
Submitted 29 March, 2011;
originally announced April 2011.
-
Capacity of the Gaussian Two-way Relay Channel to within 1/2 Bit
Authors:
Wooseok Nam,
Sae-Young Chung,
Yong H. Lee
Abstract:
In this paper, a Gaussian two-way relay channel, where two source nodes exchange messages with each other through a relay, is considered. We assume that all nodes operate in full-duplex mode and there is no direct channel between the source nodes. We propose an achievable scheme composed of nested lattice codes for the uplink and structured binning for the downlink. We show that the scheme achieves within 1/2 bit from the cut-set bound for all channel parameters and becomes asymptotically optimal as the signal to noise ratios increase.
Submitted 14 February, 2009;
originally announced February 2009.
-
Nested Lattice Codes for Gaussian Relay Networks with Interference
Authors:
Wooseok Nam,
Sae-Young Chung,
Yong H. Lee
Abstract:
In this paper, a class of relay networks is considered. We assume that, at a node, outgoing channels to its neighbors are orthogonal, while incoming signals from neighbors can interfere with each other. We are interested in the multicast capacity of these networks. As a subclass, we first focus on Gaussian relay networks with interference and find an achievable rate using a lattice coding scheme. It is shown that there is a constant gap between our achievable rate and the information theoretic cut-set bound. This is similar to the recent result by Avestimehr, Diggavi, and Tse, who showed such an approximate characterization of the capacity of general Gaussian relay networks. However, our achievability uses a structured code instead of a random one. Using the same idea used in the Gaussian case, we also consider linear finite-field symmetric networks with interference and characterize the capacity using a linear coding scheme.
Submitted 14 February, 2009;
originally announced February 2009.