-
Debiasing pipeline improves deep learning model generalization for X-ray based lung nodule detection
Authors:
Michael Horry,
Subrata Chakraborty,
Biswajeet Pradhan,
Manoranjan Paul,
Jing Zhu,
Hui Wen Loh,
Prabal Datta Barua,
U. Rajendra Acharya
Abstract:
Lung cancer is the leading cause of cancer death worldwide and a good prognosis depends on early diagnosis. Unfortunately, screening programs for the early diagnosis of lung cancer are uncommon. This is in part due to the at-risk groups being located in rural areas far from medical facilities. Reaching these populations would require a scaled approach that combines mobility, low cost, speed, accuracy, and privacy. We can resolve these issues by combining the chest X-ray imaging mode with a federated deep-learning approach, provided that the federated model is trained on homogeneous data to ensure that no single data source can adversely bias the model at any point in time. In this study we show that an image pre-processing pipeline that homogenizes and debiases chest X-ray images can improve both internal classification and external generalization, paving the way for a low-cost and accessible deep learning-based clinical system for lung cancer screening. An evolutionary pruning mechanism is used to train a nodule detection deep learning model on the most informative images from a publicly available lung nodule X-ray dataset. Histogram equalization is used to remove systematic differences in image brightness and contrast. Model training is performed using all combinations of lung field segmentation, close cropping, and rib suppression operators. We show that this pre-processing pipeline results in deep learning models that successfully generalize to an independent lung nodule dataset, using ablation studies to assess the contribution of each operator in this pipeline. By stripping chest X-ray images of known confounding variables through lung field segmentation, along with suppression of signal noise from the bone structure, we can train a highly accurate deep learning lung nodule detection algorithm with outstanding generalization accuracy of 89% to nodule samples in unseen data.
Submitted 24 January, 2022;
originally announced January 2022.
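The abstract above names histogram equalization as the step that removes systematic brightness and contrast differences between images. As a rough illustration only, the following is a minimal sketch of textbook global histogram equalization on an 8-bit grayscale image stored as a list of rows. The function name `equalize_histogram` and the pure-Python representation are assumptions made for this example; the abstract does not say which variant or library the authors used.

```python
def equalize_histogram(image, levels=256):
    """Globally equalize a grayscale image given as rows of ints in [0, levels - 1].

    Illustrative sketch of the textbook method, not the paper's pipeline.
    """
    # Count how often each intensity occurs.
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1

    # Cumulative distribution of intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to redistribute.
        return [list(row) for row in image]

    # Map each intensity so that the output CDF is approximately linear.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[px] for px in row] for row in image]


# A dim, low-contrast 2x2 image is stretched over the full 0..255 range.
dim = [[10, 20], [30, 40]]
equalized = equalize_histogram(dim)
```

Because the mapping depends only on each image's own intensity ranking, two scans of similar anatomy taken at different exposure settings end up with similar output histograms, which is the homogenizing effect the abstract describes.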
-
A Study of Experiment-based Radio Frequency Electromagnetic Field Exposure Evidence on Stochastic Nature of A Massive MIMO System
Authors:
Tian Hong Loh,
David Cheadle,
Fabien Heliot,
Ayodeji Sunday,
Micheal Dieudonne
Abstract:
In this paper, a massive multiple-input-multiple-output (mMIMO) testbed that is capable of mimicking realistic 5G new radio (NR) base station (BS) beamforming performance has been utilised to gather experiment-based evidence of 5G BS RF-EMF exposure within a real-world indoor environment. The mMIMO testbed has up to 128 RF channels with user-programmable software defined radio (SDR) capability. The stochastic nature of the 5G NR mMIMO system has been statistically assessed by evaluating the spatial variation of the RF-EMF exposure surrounding the mMIMO testbed when taking into account different beam profiles and data rates. Several other factors that influence the RF-EMF of the mMIMO system have also been considered.
Submitted 17 December, 2021;
originally announced December 2021.
-
Application of artificial intelligence techniques for automated detection of myocardial infarction: A review
Authors:
Javad Hassannataj Joloudari,
Sanaz Mojrian,
Issa Nodehi,
Amir Mashmool,
Zeynab Kiani Zadegan,
Sahar Khanjani Shirkharkolaie,
Roohallah Alizadehsani,
Tahereh Tamadon,
Samiyeh Khosravi,
Mitra Akbari Kohnehshari,
Edris Hassannatajjeloudari,
Danial Sharifrazi,
Amir Mosavi,
Hui Wen Loh,
Ru-San Tan,
U Rajendra Acharya
Abstract:
Myocardial infarction (MI) results in heart muscle injury due to receiving insufficient blood flow. MI is the most common cause of mortality in middle-aged and elderly individuals around the world. To diagnose MI, clinicians need to interpret electrocardiography (ECG) signals, which requires expertise and is subject to observer bias. Artificial intelligence-based methods can be utilized to screen for or diagnose MI automatically using ECG signals. In this work, we conducted a comprehensive assessment of artificial intelligence-based approaches for MI detection based on ECG as well as other biophysical signals, including machine learning (ML) and deep learning (DL) models. The performance of traditional ML methods relies on handcrafted features and manual selection of ECG signals, whereas DL models can automate these tasks. The review observed that deep convolutional neural networks (DCNNs) yielded excellent classification performance for MI diagnosis, which explains why they have become prevalent in recent years. To our knowledge, this is the first comprehensive survey of artificial intelligence techniques employed for MI diagnosis using ECG and other biophysical signals.
Submitted 21 February, 2022; v1 submitted 5 July, 2021;
originally announced July 2021.
-
Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment
Authors:
Youngnam Lee,
Dongmin Shin,
HyunBin Loh,
Jaemin Lee,
Piljae Chae,
Junghyun Cho,
Seoyon Park,
Jinhwan Lee,
Jineon Baek,
Byungsoo Kim,
Youngduck Choi
Abstract:
Student dropout prediction provides an opportunity to improve student engagement, which maximizes the overall effectiveness of learning experiences. However, research on student dropout has mainly been conducted on school dropout or course dropout, and study session dropout in a mobile learning environment has not been considered thoroughly. In this paper, we investigate the study session dropout prediction problem in a mobile learning environment. First, we define the concepts of the study session, study session dropout, and the study session dropout prediction task in a mobile learning environment. Based on these definitions, we propose a novel Transformer-based model for predicting study session dropout, DAS: Deep Attentive Study Session Dropout Prediction in Mobile Learning Environment. DAS has an encoder-decoder structure which is composed of stacked multi-head attention and point-wise feed-forward networks. The deep attentive computations in DAS are capable of capturing complex relations among dynamic student interactions. To the best of our knowledge, this is the first attempt to investigate study session dropout in a mobile learning environment. Empirical evaluations on a large-scale dataset show that DAS achieves the best performance with a significant improvement in area under the receiver operating characteristic curve compared to baseline models.
Submitted 1 February, 2021; v1 submitted 14 February, 2020;
originally announced February 2020.
-
Creating A Neural Pedagogical Agent by Jointly Learning to Review and Assess
Authors:
Youngnam Lee,
Youngduck Choi,
Junghyun Cho,
Alexander R. Fabbri,
Hyunbin Loh,
Chanyou Hwang,
Yongku Lee,
Sang-Wook Kim,
Dragomir Radev
Abstract:
Machine learning plays an increasing role in intelligent tutoring systems as both the amount of data available and specialization among students grow. Nowadays, these systems are frequently deployed on mobile applications. Users on such mobile education platforms are dynamic, frequently being added, accessing the application with varying levels of focus, and changing while using the service. The education material itself, on the other hand, is often static and is an exhaustible resource whose use in tasks such as problem recommendation must be optimized. The ability to update user models with respect to educational material in real-time is thus essential; however, existing approaches require time-consuming re-training of user features whenever new data is added. In this paper, we introduce a neural pedagogical agent for real-time user modeling in the task of predicting user response correctness, a central task for mobile education applications. Our model, inspired by work in natural language processing on sequence modeling and machine translation, updates user features in real-time via bidirectional recurrent neural networks with an attention mechanism over embedded question-response pairs. We experiment on the mobile education application SantaTOEIC, which has 559k users, 66M response data points as well as a set of 10k study problems each expert-annotated with topic tags and gathered since 2016. Our model outperforms existing approaches over several metrics in predicting user response correctness, notably out-performing other methods on new users without large question-response histories. Additionally, our attention mechanism and annotated tag set allow us to create an interpretable education platform, with a smart review system that addresses the aforementioned issue of varied user attention and problem exhaustion.
Submitted 1 July, 2019; v1 submitted 26 June, 2019;
originally announced June 2019.
-
High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems
Authors:
Rachata Ausavarungnirun,
Gabriel H. Loh,
Lavanya Subramanian,
Kevin Chang,
Onur Mutlu
Abstract:
When multiple processor cores (CPUs) and a GPU integrated together on the same chip share the off-chip DRAM, requests from the GPU can heavily interfere with requests from the CPUs, leading to low system performance and starvation of cores. Unfortunately, state-of-the-art memory scheduling algorithms are ineffective at solving this problem due to the very large amount of GPU memory traffic, unless a very large and costly request buffer is employed to provide these algorithms with enough visibility across the global request stream.
Previously-proposed memory controller (MC) designs use a single monolithic structure to perform three main tasks. First, the MC attempts to schedule together requests to the same DRAM row to increase row buffer hit rates. Second, the MC arbitrates among the requesters (CPUs and GPU) to optimize for overall system throughput, average response time, fairness and quality of service. Third, the MC manages the low-level DRAM command scheduling to complete requests while ensuring compliance with all DRAM timing and power constraints. This paper proposes a fundamentally new approach, called the Staged Memory Scheduler (SMS), which decouples the three primary MC tasks into three significantly simpler structures that together improve system performance and fairness. Our evaluation shows that SMS provides 41.2% performance improvement and fairness improvement compared to the best previous state-of-the-art technique, while enabling a design that is significantly less complex and more power-efficient to implement.
Submitted 30 April, 2018;
originally announced April 2018.
-
Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance
Authors:
Rachata Ausavarungnirun,
Saugata Ghose,
Onur Kayıran,
Gabriel H. Loh,
Chita R. Das,
Mahmut T. Kandemir,
Onur Mutlu
Abstract:
In a modern GPU architecture, all threads within a warp execute the same instruction in lockstep. For a memory instruction, this can lead to memory divergence: the memory requests for some threads are serviced early, while the remaining requests incur long latencies. This divergence stalls the warp, as it cannot execute the next instruction until all requests from the current instruction complete. In this work, we make three new observations. First, GPGPU warps exhibit heterogeneous memory divergence behavior at the shared cache: some warps have most of their requests hit in the cache, while other warps see most of their requests miss. Second, a warp retains the same divergence behavior for long periods of execution. Third, requests going to the shared cache can incur queuing delays as large as hundreds of cycles, exacerbating the effects of memory divergence. We propose a set of techniques, collectively called Memory Divergence Correction (MeDiC), that reduce the negative performance impact of memory divergence and cache queuing. MeDiC delivers an average speedup of 21.8%, and 20.1% higher energy efficiency, over a state-of-the-art GPU cache management mechanism across 15 different GPGPU applications.
Submitted 29 April, 2018;
originally announced April 2018.
-
CODA: Enabling Co-location of Computation and Data for Near-Data Processing
Authors:
Hyojong Kim,
Ramyad Hadidi,
Lifeng Nai,
Hyesoon Kim,
Nuwan Jayasena,
Yasuko Eckert,
Onur Kayiran,
Gabriel H. Loh
Abstract:
Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory modules introduces a new challenge. In today's systems, where no computation occurs in memory modules, the physical address space is interleaved at a fine granularity among all memory modules to help improve the utilization of processor-memory interfaces by distributing the memory traffic. However, this is at odds with efficient use of NDP, which requires careful placement of data in memory modules such that near-data computations and their exclusively used data can be localized in individual memory modules, while distributing shared data among memory modules to reduce hotspots. In order to address this new challenge, we propose a set of techniques that (1) enable collections of OS pages to either be fine-grain interleaved among memory modules (as is done today) or to be placed contiguously on individual memory modules (as is desirable for NDP private data), and (2) decide whether to localize or distribute each memory object based on its anticipated access pattern and steer computations to the memory where the data they access is located. Our evaluations across a wide range of workloads show that the proposed mechanism improves performance by 31% and reduces remote data accesses by 38% over a baseline system that cannot exploit compute-data affinity characteristics.
Submitted 25 October, 2017;
originally announced October 2017.
-
Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing
Authors:
Rachata Ausavarungnirun,
Chris Fallin,
Xiangyao Yu,
Kevin Kai-Wei Chang,
Greg Nazario,
Reetuparna Das,
Gabriel H. Loh,
Onur Mutlu
Abstract:
Hierarchical ring networks, which hierarchically connect multiple levels of rings, have been proposed in the past to improve the scalability of ring interconnects, but past hierarchical ring designs sacrifice some of the key benefits of rings by introducing more complex in-ring buffering and buffered flow control. Our goal in this paper is to design a new hierarchical ring interconnect that can maintain most of the simplicity of traditional ring designs (no in-ring buffering or buffered flow control) while achieving scalability as high as that of more complex buffered hierarchical ring designs. Our design, called HiRD (Hierarchical Rings with Deflection), includes features that allow us to mostly maintain the simplicity of traditional ring topologies while providing higher energy efficiency and scalability. First, HiRD does not have any buffering or buffered flow control within individual rings, and requires only a small amount of buffering between the ring hierarchy levels. When inter-ring buffers are full, our design simply deflects flits so that they circle the ring and try again, which eliminates the need for in-ring buffering. Second, we introduce two simple mechanisms that provide an end-to-end delivery guarantee within the entire network without impacting the critical path or latency of the vast majority of network traffic. HiRD attains equal or better performance at better energy efficiency than multiple versions of both a previous hierarchical ring design and a traditional single ring design. We also analyze our design's characteristics and injection and delivery guarantees. We conclude that HiRD can be a compelling design point that allows higher energy efficiency and scalability while retaining the simplicity and appeal of conventional ring-based designs.
Submitted 18 February, 2016;
originally announced February 2016.
-
Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism
Authors:
Kevin K. Chang,
Gabriel H. Loh,
Mithuna Thottethodi,
Yasuko Eckert,
Mike O'Connor,
Srilatha Manne,
Lisa Hsu,
Lavanya Subramanian,
Onur Mutlu
Abstract:
Die-stacked DRAM has been proposed for use as a large, high-bandwidth, last-level cache with hundreds or thousands of megabytes of capacity. Not all workloads (or phases) can productively utilize this much cache space, however. Unfortunately, the unused (or under-used) cache continues to consume power due to leakage in the peripheral circuitry and periodic DRAM refresh. Dynamically adjusting the available DRAM cache capacity could largely eliminate this energy overhead. However, the current proposed DRAM cache organization introduces new challenges for dynamic cache resizing. The organization differs from a conventional SRAM cache organization because it places entire cache sets and their tags within a single bank to reduce on-chip area and power overhead. Hence, resizing a DRAM cache requires remapping sets from the powered-down banks to active banks.
In this paper, we propose CRUNCH (Cache Resizing Using Native Consistent Hashing), a hardware data remapping scheme inspired by consistent hashing, an algorithm originally proposed to uniformly and dynamically distribute Internet traffic across a changing population of web servers. CRUNCH provides a load-balanced remapping of data from the powered-down banks alone to the active banks, without requiring sets from all banks to be remapped, unlike naive schemes to achieve load balancing. CRUNCH remaps only sets from the powered-down banks, so it achieves this load balancing with low bank power-up/down transition latencies. CRUNCH's combination of good load balancing and low transition latencies provides a substrate to enable efficient DRAM cache resizing.
Submitted 1 February, 2016;
originally announced February 2016.
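The CRUNCH abstract above relies on the key property of consistent hashing: when a bank is powered down, only the sets that lived on that bank need to move. The following is a small software sketch of generic consistent hashing that demonstrates that property. It is not the CRUNCH hardware mechanism; the class name `BankRing`, the use of SHA-256, the 64 ring points per bank, and the eight-bank configuration are all assumptions chosen for illustration.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    """Stable 64-bit hash (the built-in hash() of str is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class BankRing:
    """Hypothetical consistent-hash ring mapping cache sets to DRAM banks."""

    def __init__(self, banks, points_per_bank=64):
        # Each bank owns several points on the ring so load spreads evenly.
        ring = sorted(
            (_hash(f"bank{b}#{v}"), b) for b in banks for v in range(points_per_bank)
        )
        self._positions = [pos for pos, _ in ring]
        self._owners = [bank for _, bank in ring]
        self._banks = set(banks)
        self._active = set(banks)

    def power_down(self, bank):
        self._active.discard(bank)

    def power_up(self, bank):
        if bank not in self._banks:
            raise ValueError(f"unknown bank: {bank}")
        self._active.add(bank)

    def bank_for(self, set_index):
        """Walk clockwise from the set's hash to the first point of an active bank."""
        if not self._active:
            raise RuntimeError("no active banks")
        start = bisect_right(self._positions, _hash(f"set{set_index}"))
        n = len(self._owners)
        for step in range(n):
            bank = self._owners[(start + step) % n]
            if bank in self._active:
                return bank


ring = BankRing(banks=range(8))
before = {s: ring.bank_for(s) for s in range(1024)}
ring.power_down(3)
after = {s: ring.bank_for(s) for s in range(1024)}
moved = [s for s in before if before[s] != after[s]]
```

In this sketch every entry of `moved` was previously mapped to bank 3, while sets on the other seven banks keep their placement. A naive `set_index % num_active_banks` mapping would instead reshuffle sets across all banks whenever the bank count changes.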