
Showing 1–50 of 67 results for author: Kung, H

  1. arXiv:2504.09072  [pdf, other]

    cs.AR cs.LG

    MGS: Markov Greedy Sums for Accurate Low-Bitwidth Floating-Point Accumulation

    Authors: Vikas Natesh, H. T. Kung, David Kong

    Abstract: We offer a novel approach, MGS (Markov Greedy Sums), to improve the accuracy of low-bitwidth floating-point dot products in neural network computations. In conventional 32-bit floating-point summation, adding values with different exponents may lead to loss of precision in the mantissa of the smaller term, which is right-shifted to align with the larger term's exponent. Such shifting (a.k.a. 'swam…

    Submitted 12 April, 2025; originally announced April 2025.

  2. arXiv:2504.09064  [pdf, other]

    cs.LG cs.AI

    PQS (Prune, Quantize, and Sort): Low-Bitwidth Accumulation of Dot Products in Neural Network Computations

    Authors: Vikas Natesh, H. T. Kung

    Abstract: We present PQS, which uses three techniques together - Prune, Quantize, and Sort - to achieve low-bitwidth accumulation of dot products in neural network computations. In conventional quantized (e.g., 8-bit) dot products, partial results are accumulated into wide (e.g., 32-bit) accumulators to avoid overflows when accumulating intermediate partial sums. However, such wide accumulators increase mem…

    Submitted 11 April, 2025; originally announced April 2025.

  3. arXiv:2504.00254  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    ElaLoRA: Elastic & Learnable Low-Rank Adaptation for Efficient Model Fine-Tuning

    Authors: Huandong Chang, Zicheng Ma, Mingyuan Ma, Zhenting Qi, Andrew Sabot, Hong Jiang, H. T. Kung

    Abstract: Low-Rank Adaptation (LoRA) has become a widely adopted technique for fine-tuning large-scale pre-trained models with minimal parameter updates. However, existing methods rely on fixed ranks or focus solely on either rank pruning or expansion, failing to adapt ranks dynamically to match the importance of different layers during training. In this work, we propose ElaLoRA, an adaptive low-rank adapta…

    Submitted 31 March, 2025; originally announced April 2025.

  4. arXiv:2503.12099  [pdf, other]

    quant-ph cond-mat.supr-con

    Automatic Characterization of Fluxonium Superconducting Qubits Parameters with Deep Transfer Learning

    Authors: Huan-Hsuan Kung, Chen-Yu Liu, Qian-Rui Lee, Chiang-Yuan Hu, Yu-Chi Chang, Ching-Yeh Chen, Daw-Wei Wang, Yen-Hsiang Lin

    Abstract: Accurate determination of qubit parameters is critical for the successful implementation of quantum information and computation applications. In solid-state systems, the parameters of individual qubits vary across the entire system, requiring time-consuming measurements and manual fitting processes for characterization. Recently developed superconducting qubits, such as fluxonium or 0-pi qubits, off…

    Submitted 15 March, 2025; originally announced March 2025.

    Comments: 11 pages, 7 figures

  5. arXiv:2503.08478  [pdf, other]

    cs.CV

    NullFace: Training-Free Localized Face Anonymization

    Authors: Han-Wei Kung, Tuomas Varanka, Terence Sim, Nicu Sebe

    Abstract: Privacy concerns around the ever-increasing number of cameras are growing in today's digital age. Although existing anonymization methods are able to obscure identity information, they often struggle to preserve the utility of the images. In this work, we introduce a training-free method for face anonymization that preserves key non-identity-related attributes. Our approach utilizes a pre-trained t…

    Submitted 11 March, 2025; originally announced March 2025.

  6. arXiv:2412.02884  [pdf, other]

    cs.NE

    Were You Helpful -- Predicting Helpful Votes from Amazon Reviews

    Authors: Emin Kirimlioglu, Harrison Kung, Dominic Orlando

    Abstract: This project investigates factors that influence the perceived helpfulness of Amazon product reviews through machine learning techniques. After extensive feature analysis and correlation testing, we identified key metadata characteristics that serve as strong predictors of review helpfulness. While we initially explored natural language processing approaches using TextBlob for sentiment analysis,…

    Submitted 3 December, 2024; originally announced December 2024.

  7. arXiv:2411.04335  [pdf, other]

    cs.CV

    GazeGen: Gaze-Driven User Interaction for Visual Content Generation

    Authors: He-Yen Hsieh, Ziyun Li, Sai Qian Zhang, Wei-Te Mark Ting, Kao-Den Chang, Barbara De Salvo, Chiao Liu, H. T. Kung

    Abstract: We present GazeGen, a user interaction system that generates visual content (images and videos) for locations indicated by the user's eye gaze. GazeGen allows intuitive manipulation of visual content by targeting regions of interest with gaze. Using advanced techniques in object detection and generative AI, GazeGen performs gaze-controlled image adding/deleting, repositioning, and surface style ch…

    Submitted 17 November, 2024; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: 12 pages, 10 figures

  8. arXiv:2411.00762  [pdf, other]

    cs.CV cs.CR

    Face Anonymization Made Simple

    Authors: Han-Wei Kung, Tuomas Varanka, Sanjay Saha, Terence Sim, Nicu Sebe

    Abstract: Current face anonymization techniques often depend on identity loss calculated by face recognition models, which can be inaccurate and unreliable. Additionally, many methods require supplementary data such as facial landmarks and masks to guide the synthesis process. In contrast, our approach uses diffusion models with only a reconstruction loss, eliminating the need for facial landmarks or masks…

    Submitted 1 November, 2024; originally announced November 2024.

  9. arXiv:2410.21730  [pdf, other]

    cs.AR cs.AI cs.ET cs.LG

    Efficient Reprogramming of Memristive Crossbars for DNNs: Weight Sorting and Bit Stucking

    Authors: Matheus Farias, H. T. Kung

    Abstract: We introduce a novel approach to reduce the number of times required for reprogramming memristors on bit-sliced compute-in-memory crossbars for deep neural networks (DNNs). Our idea addresses the limited non-volatile memory endurance, which restricts the number of times memristors can be reprogrammed. To reduce reprogramming demands, we employ two techniques: (1) we organize weights into sorted section…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: 5 pages, 10 figures

  10. arXiv:2410.11298  [pdf, other]

    cs.AR cs.AI cs.ET cs.LG

    Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars

    Authors: Matheus Farias, H. T. Kung

    Abstract: We introduce $\textit{sorted weight sectioning}$ (SWS): a weight allocation algorithm that places sorted deep neural network (DNN) weight sections on bit-sliced compute-in-memory (CIM) crossbars to reduce analog-to-digital converter (ADC) energy consumption. Data conversions are the most energy-intensive process in crossbar operation. SWS effectively reduces this cost by leveraging (1) small weights…

    Submitted 29 October, 2024; v1 submitted 15 October, 2024; originally announced October 2024.

    Comments: 5 pages, 4 figures

  11. arXiv:2407.20175  [pdf, other]

    cs.CV

    Towards Localized Fine-Grained Control for Facial Expression Generation

    Authors: Tuomas Varanka, Huai-Qian Khor, Yante Li, Mengting Wei, Hanwei Kung, Nicu Sebe, Guoying Zhao

    Abstract: Generative models have surged in popularity recently due to their ability to produce high-quality images and video. However, steering these models to produce images with specific attributes and precise control remains challenging. Humans, particularly their faces, are central to content generation due to their ability to convey rich expressions and intent. Current generative models mostly generate…

    Submitted 25 July, 2024; originally announced July 2024.

  12. arXiv:2407.08798  [pdf, other]

    cond-mat.mtrl-sci

    Electronically-driven switching of topology in LaSbTe

    Authors: J. Bannies, M. Michiardi, H. -H. Kung, S. Godin, J. W. Simonson, M. Oudah, M. Zonno, S. Gorovikov, S. Zhdanovich, I. S. Elfimov, A. Damascelli, M. C. Aronson

    Abstract: In the past two decades, various classes of topological materials have been discovered, spanning topological insulators, semimetals, and metals. While the observation and understanding of the topology of a material has been a primary focus so far, the precise and easy control of topology in a single material remains largely unexplored. Here, we demonstrate full experimental control over the topolo…

    Submitted 11 July, 2024; originally announced July 2024.

  13. arXiv:2403.16451  [pdf, other]

    cs.LG cs.AI

    DeepMachining: Online Prediction of Machining Errors of Lathe Machines

    Authors: Xiang-Li Lu, Hwai-Jung Hsu, Che-Wei Chou, H. T. Kung, Chen-Hsin Lee, Sheng-Mao Cheng

    Abstract: We describe DeepMachining, a deep learning-based AI system for online prediction of machining errors of lathe machine operations. We have built and evaluated DeepMachining based on manufacturing data from factories. Specifically, we first pretrain a deep learning model for a given lathe machine's operations to learn the salient features of machining states. Then, we fine-tune the pretrained model…

    Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  14. arXiv:2402.15504  [pdf, other]

    cs.CV cs.AI

    Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

    Authors: Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen

    Abstract: Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models. First, current personalization techniques fail to reliably extend to multiple concepts --…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Preprint; Project Page: https://danielchyeh.github.io/Gen4Gen/

  15. arXiv:2311.11402  [pdf, ps, other]

    cond-mat.supr-con cond-mat.mtrl-sci

    Discovery of Superconductivity and Electron-Phonon Drag in the Non-Centrosymmetric Weyl Semimetal LaRhGe$_3$

    Authors: Mohamed Oudah, Hsiang-Hsi Kung, Samikshya Sahu, Niclas Heinsdorf, Armin Schulz, Kai Philippi, Marta-Villa De Toro Sanchez, Yipeng Cai, Kenji Kojima, Andreas P. Schnyder, Hidenori Takagi, Bernhard Keimer, Doug A. Bonn, Alannah M. Hallas

    Abstract: We present an exploration of the effect of electron-phonon coupling and broken inversion symmetry on the electronic and thermal properties of the semimetal LaRhGe$_3$. Our transport measurements reveal evidence for electron-hole compensation at low temperatures, resulting in a large magnetoresistance of 3000% at 1.8 K and 14 T. The carrier concentration is on the order of $10^{21}\rm{/cm}^3$ with…

    Submitted 29 May, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  16. Spin-Mediated Direct Photon Scattering by Plasmons in BiTeI

    Authors: A. C. Lee, S. Sarkar, K. Du, H. -H. Kung, C. J. Won, K. Wang, S. -W. Cheong, S. Maiti, G. Blumberg

    Abstract: We use polarization resolved Raman spectroscopy to demonstrate that for a 3D giant Rashba system the bulk plasmon collective mode can directly couple to the Raman response even in the long wavelength $\mathbf q \rightarrow 0$ limit. Although conventional theory predicts the plasmon spectral weight to be suppressed as the square of its quasi-momentum and thus negligibly weak in the Raman spectra, w…

    Submitted 18 February, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

    Comments: Editors' Suggestion

    Journal ref: Phys. Rev. B 109, L041111 (2024)

  17. arXiv:2310.03170  [pdf]

    cond-mat.supr-con

    Critical Role of Disorder for Superconductivity in the Series of Epitaxial Ti(O,N) Films

    Authors: Fengmiao Li, Oliver Dicks, Myung-Geun Han, Solveig Aamlid, Giorgio Levy, Ronny Sutarto, Chong Liu, Hsiang-Hsi Kung, Oleksandr Foyevstov, Simon Godin, Bruce A. Davidson, Andrea Damascelli, Yimei Zhu, Christoph Heil, Ilya Elfimov, George A. Sawatzky, Ke Zou

    Abstract: Realizing experimental control of superconductivity is of paramount importance to advancing both basic research and technological applications. Disorder, generally existing in most superconductors, intricately interacts with Cooper pairs and also impacts the performance of quantum devices. In this paper, we report the study of a series of Ti(O,N) crystalline films prepared via molecular beam epita…

    Submitted 24 November, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  18. arXiv:2307.03930  [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels

    Authors: Vikas Natesh, Andrew Sabot, H. T. Kung, Mark Ting

    Abstract: We propose Rosko -- row skipping outer products -- for deriving sparse matrix multiplication (SpMM) kernels in reducing computation and memory access requirements of deep neural networks (DNNs). Rosko allows skipping of entire row computations during program execution with low sparsity-management overheads. We analytically derive sparse CPU kernels that adapt to given hardware characteristics to e…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Rosko's CPU implementation can be found at https://github.com/vnatesh/Rosko

  19. Electronic and Vibrational Excitations on the Surface of the Three-Dimensional Topological Insulator Bi$_2$Te$_{3-x}$Se$_{x}$ (x = 0, 2, 3)

    Authors: A. Lee, H. -H. Kung, Xueyun Wang, S. -W. Cheong, G. Blumberg

    Abstract: We study surface states in the three-dimensional topological insulators Bi$_2$Te$_{3-x}$Se$_{x}$ (x = 0, 2, 3) by polarization resolved resonant Raman spectroscopy. By tracking the spectral intensity of the surface phonon modes with respect to the incident photon energy, we show that the surface phonons are qualitatively similar to their bulk counterparts. Using the resonant Raman excitation profi…

    Submitted 14 January, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

  20. arXiv:2304.05544  [pdf, other]

    cs.LG cs.AR cs.PF cs.PL

    MEMA Runtime Framework: Minimizing External Memory Accesses for TinyML on Microcontrollers

    Authors: Andrew Sabot, Vikas Natesh, H. T. Kung, Wei-Te Ting

    Abstract: We present the MEMA framework for the easy and quick derivation of efficient inference runtimes that minimize external memory accesses for matrix multiplication on TinyML systems. The framework accounts for hardware resource constraints and problem sizes in analytically determining optimized schedules and kernels that minimize memory accesses. MEMA provides a solution to a well-known problem in th…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted as a full paper by the TinyML Research Symposium 2023

  21. arXiv:2301.01947  [pdf, ps, other]

    cs.LG cs.AI cs.CV

    StitchNet: Composing Neural Networks from Pre-Trained Fragments

    Authors: Surat Teerapittayanon, Marcus Comiter, Brad McDanel, H. T. Kung

    Abstract: We propose StitchNet, a novel neural network creation paradigm that stitches together fragments (one or more consecutive network layers) from multiple pre-trained neural networks. StitchNet allows the creation of high-performing neural networks without the large compute and data requirements needed under traditional model creation processes via backpropagation training. We leverage Centered Kernel…

    Submitted 23 September, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  22. arXiv:2209.12127  [pdf, other]

    cs.LG

    SpeedLimit: Neural Architecture Search for Quantized Transformer Models

    Authors: Yuji Chai, Luke Bailey, Yunho Jin, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

    Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an u…

    Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

  23. arXiv:2207.09413  [pdf, other]

    cs.LG cs.AI cs.CV cs.DC

    SphereFed: Hyperspherical Federated Learning

    Authors: Xin Dong, Sai Qian Zhang, Ang Li, H. T. Kung

    Abstract: Federated Learning aims at training a global model from multiple decentralized devices (i.e. clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (independent identically distributed) data across multiple clients that may induce disparities of their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the…

    Submitted 19 July, 2022; originally announced July 2022.

    Comments: European Conference on Computer Vision 2022

  24. arXiv:2204.04705  [pdf, other]

    cs.LG cs.AI cs.DC

    SplitNets: Designing Neural Architectures for Efficient Distributed Computing on Head-Mounted Systems

    Authors: Xin Dong, Barbara De Salvo, Meng Li, Chiao Liu, Zhongnan Qu, H. T. Kung, Ziyun Li

    Abstract: We design deep neural networks (DNNs) and corresponding networks' splittings to distribute DNNs' workload to camera sensors and a centralized aggregator on head-mounted devices to meet system performance targets in inference accuracy and latency under the given hardware resource constraints. To achieve an optimal balance among computation, communication, and performance, a split-aware neural archi…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022

  25. arXiv:2202.09642  [pdf, other]

    cond-mat.str-el

    Anisotropy of Kondo-lattice coherence in momentum space for CeCoIn5

    Authors: Mai Ye, Hsiang-Hsi Kung, Priscila F. S. Rosa, Eric D. Bauer, Kristjan Haule, Girsh Blumberg

    Abstract: We study the electronic and phononic excitations of heavy-fermion metal CeCoIn$_5$ by polarization-resolved Raman spectroscopy to explore the Kondo-lattice coherence. Below the coherence temperature T* = 45 K, the continuum of electronic excitations in the XY scattering geometry is suppressed at frequencies below 50 cm$^{-1}$, whereas the low-frequency continuum in the X'Y' geometry exhibits n…

    Submitted 19 February, 2022; originally announced February 2022.

  26. Chiral Electronic Excitations in a Quasi-2D Rashba System BiTeI

    Authors: A. C. Lee, B. Peng, K. Du, H. -H. Kung, B. Monserrat, S. -W. Cheong, C. J. Won, G. Blumberg

    Abstract: The optical transitions between spin-polarized bands of the quasi-two dimensional Rashba system BiTeI are investigated using polarization resolved resonant Raman spectroscopy. We detect chiral excitations between states with opposite helicity and compare spectra to calculations within a three-band model. Using the resonant Raman excitation profile, we deduce the Rashba parameters and band gaps of…

    Submitted 25 April, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

  27. arXiv:2110.15456  [pdf, other]

    cs.LG cs.AR

    FAST: DNN Training Under Variable Precision Block Floating Point with Stochastic Rounding

    Authors: Sai Qian Zhang, Bradley McDanel, H. T. Kung

    Abstract: Block Floating Point (BFP) can efficiently support quantization for Deep Neural Network (DNN) training by providing a wide dynamic range via a shared exponent across a group of values. In this paper, we propose a Fast First, Accurate Second Training (FAST) system for DNNs, where the weights, activations, and gradients are represented in BFP. FAST supports matrix multiplication with variable precis…

    Submitted 28 October, 2021; originally announced October 2021.

  28. arXiv:2107.06304  [pdf, other]

    cs.LG cs.CV

    Privacy Vulnerability of Split Computing to Data-Free Model Inversion Attacks

    Authors: Xin Dong, Hongxu Yin, Jose M. Alvarez, Jan Kautz, Pavlo Molchanov, H. T. Kung

    Abstract: Mobile edge devices see increased demands in deep neural network (DNN) inference while suffering from stringent constraints in computing resources. Split computing (SC) emerges as a popular approach to the issue by executing only initial layers on devices and offloading the remaining to the cloud. Prior works usually assume that SC offers privacy benefits as only intermediate features, instead o…

    Submitted 24 October, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: A new data-free inversion method to reverse neural networks and get input from intermediate feature maps. BMVC'22

  29. arXiv:2106.11423  [pdf, other]

    cs.CV cs.GR

    Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

    Authors: Huiwen Luo, Koki Nagano, Han-Wei Kung, Mclean Goldwhite, Qingguo Xu, Zejian Wang, Lingyu Wei, Liwen Hu, Hao Li

    Abstract: We introduce a highly robust GAN-based framework for digitizing a normalized 3D avatar of a person from a single unconstrained photo. While the input image can be of a smiling person or taken in extreme lighting conditions, our method can reliably produce a high-quality textured model of a person's face in neutral expression and skin textures under diffuse lighting conditions. Cutting-edge 3D face…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2021

  30. arXiv:2105.09320  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Optical manipulation of Rashba-split 2-Dimensional Electron Gas

    Authors: M. Michiardi, F. Boschini, H. -H. Kung, M. X. Na, S. K. Y. Dufresne, A. Currie, G. Levy, S. Zhdanovich, A. K. Mills, D. J. Jones, J. L. Mi, B. B. Iversen, Ph. Hofmann, A. Damascelli

    Abstract: In spintronic devices, the two main approaches to actively control the electrons' spin degree of freedom involve either static magnetic or electric fields. An alternative avenue relies on the application of optical fields to generate spin currents, which promises to bolster spin-device performance allowing for significantly faster and more efficient spin logic. To date, research has mainly focused…

    Submitted 2 June, 2022; v1 submitted 19 May, 2021; originally announced May 2021.

    Journal ref: Nature Communications 13, 3096 (2022)

  31. arXiv:2104.11408  [pdf, other]

    cs.LG

    Neural Mean Discrepancy for Efficient Out-of-Distribution Detection

    Authors: Xin Dong, Junfeng Guo, Ang Li, Wei-Te Ting, Cong Liu, H. T. Kung

    Abstract: Various approaches have been proposed for out-of-distribution (OOD) detection by augmenting models, input examples, training sets, and optimization objectives. Deviating from existing work, we have a simple hypothesis that standard off-the-shelf models may already contain sufficient information about the training set distribution which can be leveraged for reliable OOD detection. Our empirical stu…

    Submitted 26 March, 2022; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022

  32. Extremely large magnetoresistance from electron-hole compensation in the nodal loop semimetal ZrP$_2$

    Authors: J. Bannies, E. Razzoli, M. Michiardi, H. -H. Kung, I. S. Elfimov, M. Yao, A. Fedorov, J. Fink, C. Jozwiak, A. Bostwick, E. Rotenberg, A. Damascelli, C. Felser

    Abstract: Several early transition metal dipnictides have been found to host topological semimetal states and exhibit large magnetoresistance. In this study, we use angle-resolved photoemission spectroscopy (ARPES) and magneto-transport to study the electronic properties of a new transition metal dipnictide ZrP$_2$. We find that ZrP$_2$ exhibits an extremely large and unsaturated magnetoresistance of up to…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Accepted for publication in Physical Review B

    Journal ref: Phys. Rev. B 103, 155144 (2021)

  33. arXiv:2007.06389  [pdf, other]

    cs.CV cs.LG

    Term Revealing: Furthering Quantization at Run Time on Quantized DNNs

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: We present a novel technique, called Term Revealing (TR), for furthering quantization at run time for improved performance of Deep Neural Networks (DNNs) already quantized with conventional quantization methods. TR operates on power-of-two terms in binary expressions of values. In a dot-product computation, TR dynamically selects a fixed number of largest terms to use from the values of…

    Submitted 26 July, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 13 pages, 19 figures, 4 tables. To appear in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. Update: Revised writing/figures and added more references for Section IV. Update: Revised Section IV writing/figures and added additional references on signed digit representations.

  34. arXiv:1907.08377  [pdf, other]

    cs.LG cs.AI cs.CR

    DaiMoN: A Decentralized Artificial Intelligence Model Network

    Authors: Surat Teerapittayanon, H. T. Kung

    Abstract: We introduce DaiMoN, a decentralized artificial intelligence model network, which incentivizes peer collaboration in improving the accuracy of machine learning models for a given classification problem. It is an autonomous network where peers may submit models with improved accuracy and other peers may verify the accuracy improvement. The system maintains an append-only decentralized ledger to kee…

    Submitted 19 July, 2019; originally announced July 2019.

    Comments: 2019 IEEE International Conference on Blockchain

  35. arXiv:1906.07148  [pdf, other]

    cs.LG cs.CR stat.ML

    CheckNet: Secure Inference on Untrusted Devices

    Authors: Marcus Comiter, Surat Teerapittayanon, H. T. Kung

    Abstract: We introduce CheckNet, a method for secure inference with deep neural networks on untrusted devices. CheckNet is like a checksum for neural network inference: it verifies the integrity of the inference computation performed by untrusted devices to 1) ensure the inference has actually been performed, and 2) ensure the inference has not been manipulated by an attacker. CheckNet is completely transpa…

    Submitted 17 June, 2019; originally announced June 2019.

  36. arXiv:1905.00462  [pdf, other]

    cs.LG

    Full-stack Optimization for Accelerating CNNs with FPGA Validation

    Authors: Bradley McDanel, Sai Qian Zhang, H. T. Kung, Xin Dong

    Abstract: We present a full-stack optimization framework for accelerating inference of CNNs (Convolutional Neural Networks) and validate the approach with field-programmable gate arrays (FPGA) implementations. By jointly optimizing CNN models, computing architectures, and hardware implementations, our full-stack approach achieves unprecedented performance in the trade-off space characterized by inference la…

    Submitted 1 May, 2019; originally announced May 2019.

  37. arXiv:1903.01999  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Observation of Chiral Surface Excitons in a Topological Insulator Bi$_2$Se$_3$

    Authors: H. -H. Kung, A. P. Goyal, D. L. Maslov, X. Wang, A. Lee, A. F. Kemper, S. -W. Cheong, G. Blumberg

    Abstract: The protected electron states at the boundaries or on the surfaces of topological insulators (TIs) have been the subject of intense theoretical and experimental investigations. Such states are enforced by very strong spin-orbit interaction in solids composed of heavy elements. Here, we study the composite particles -- chiral excitons -- formed by the Coulomb attraction between electrons and holes…

    Submitted 5 March, 2019; originally announced March 2019.

    Comments: 22 pages, 11 figures

    Journal ref: Proceedings of the National Academy of Sciences Feb 2019, 116 (10) 4006-4011

  38. Raman spectroscopy of $f$-electron metals: an example of CeB$_{6}$

    Authors: Mai Ye, H. -H. Kung, Priscila F. S. Rosa, Eric D. Bauer, Zachary Fisk, G. Blumberg

    Abstract: We performed an optical spectroscopy study of electronic and magnetic excitations for a rare-earth system with a single electron quasi-localized in the f-shell on an ion at a high-symmetry crystallographic site, in application to the heavy-fermion metal CeB$_{6}$. We carried out group-theoretical classification of the electronic crystal field (CF) transitions and assessed their coupling to light cross-se…

    Submitted 23 June, 2019; v1 submitted 24 February, 2019; originally announced February 2019.

    Journal ref: Phys. Rev. Materials 3, 065003 (2019)

  39. arXiv:1812.05083  [pdf, other]

    cs.CV cs.CL cs.LG

    Adversarial Learning of Semantic Relevance in Text to Image Synthesis

    Authors: Miriam Cha, Youngjune L. Gwon, H. T. Kung

    Abstract: We describe a new approach that improves the training of generative adversarial nets (GANs) for synthesizing diverse images from a text input. Our approach is based on the conditional version of GANs and expands on previous work leveraging an auxiliary task in the discriminator. Our generated images are not limited to certain classes and do not suffer from mode collapse while semantically matching…

    Submitted 5 February, 2019; v1 submitted 12 December, 2018; originally announced December 2018.

  40. arXiv:1811.04770  [pdf, other]

    cs.LG cs.AR stat.ML

    Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

    Authors: H. T. Kung, Bradley McDanel, Sai Qian Zhang

    Abstract: This paper describes a novel approach of packing sparse convolutional neural networks for their efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter ma…

    Submitted 7 November, 2018; originally announced November 2018.

    Comments: To appear in ASPLOS 2019

  41. arXiv:1809.09467  [pdf, other]

    cond-mat.str-el cond-mat.mtrl-sci

    Intrinsic Insulating Ground State in Transition Metal Dichalcogenide TiSe2

    Authors: Daniel J. Campbell, Chris Eckberg, Peter Y. Zavalij, Hsiang-Hsi Kung, Elia Razzoli, Matteo Michiardi, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Andrea Damascelli, Johnpierre Paglione

    Abstract: The transition metal dichalcogenide TiSe$_2$ has received significant research attention over the past four decades. Different studies have presented ways to suppress the 200 K charge density wave transition, vary low temperature resistivity by several orders of magnitude, and stabilize magnetism or superconductivity. Here we give the results of a new synthesis technique whereby samples were grown…

    Submitted 17 February, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

    Comments: 11 pages, 7 figures

    Journal ref: Phys. Rev. Materials 3, 053402 (2019)

  42. arXiv:1806.07467  [pdf, other]

    cs.GR

    HairNet: Single-View Hair Reconstruction using Convolutional Neural Networks

    Authors: Yi Zhou, Liwen Hu, Jun Xing, Weikai Chen, Han-Wei Kung, Xin Tong, Hao Li

    Abstract: We introduce a deep learning-based method to generate full 3D hair geometry from an unconstrained image. Our method can recover local strand details and has real-time performance. State-of-the-art hair modeling techniques rely on large hairstyle collections for nearest neighbor retrieval and then perform ad-hoc refinement. Our deep learning approach, in contrast, is highly efficient in storage and…

    Submitted 10 July, 2018; v1 submitted 19 June, 2018; originally announced June 2018.

    Comments: 21 pages, 17 figures

  43. arXiv:1802.03373  [pdf, other]

    cs.NI

    InferBeam: A Fast Beam Alignment Protocol for Millimeter-wave Networking

    Authors: Sai Qian Zhang, H. T. Kung, Youngjune Gwon

    Abstract: We introduce fast millimeter-wave base station (BS) and its antenna sector selection for user equipment based on its location. Using a conditional random field inference model with specially designed parameters, which are robust to change of environment, InferBeam allows the use of measurement samples on best beam selection at a small number of locations to infer the rest dynamically. Compared to…

    Submitted 5 March, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

  44. arXiv:1712.06066  [pdf, other]

    cond-mat.supr-con

    On the origin of critical nematic fluctuations in pnictide superconductors

    Authors: S. -F. Wu, W. -L. Zhang, L. Li, H. B. Cao, H. -H. Kung, A. S. Sefat, H. Ding, P. Richard, G. Blumberg

    Abstract: We employ polarization-resolved Raman spectroscopy to study critical nematic fluctuations in Ba(Fe$_{1-x}$Au$_x$)$_2$As$_2$ superconductors above and across well separated tetragonal to orthorhombic phase transition at temperature $T_S(x)$ and the Néel transition at $T_N(x)$. The static Raman susceptibility in $XY$ symmetry channel increases upon cooling from room temperature following the Curie-W…

    Submitted 17 December, 2017; originally announced December 2017.

  45. Anomalous magneto-elastic coupling in Au-doped BaFe2As2

    Authors: S. -F. Wu, W. -L. Zhang, L. Li, H. -B. Cao, H. -H. Kung, A. S. Sefat, H. Ding, P. Richard, G. Blumberg

    Abstract: We used polarization-resolved Raman scattering to study magneto-elastic coupling in Ba(Fe$_{1-x}$Au$_{x}$)$_2$As$_2$ crystals as a function of light Au-doping, materials for which temperatures of the structural transition ($T_S$) and of the magnetic ordering transition ($T_N$) split. We study the appearance of the $A_g$(As) phonon intensity in the $XY$ scattering geometry that is very weak just bel…

    Submitted 5 December, 2017; originally announced December 2017.

    Journal ref: Phys. Rev. B 102, 014501 (2020)

  46. arXiv:1710.07830  [pdf, other]

    cs.LG cs.CV stat.ML

    Incomplete Dot Products for Dynamic Computation Scaling in Neural Network Inference

    Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

    Abstract: We propose the use of incomplete dot products (IDP) to dynamically adjust the number of input channels used in each layer of a convolutional neural network during feedforward inference. IDP adds monotonically non-increasing coefficients, referred to as a "profile", to the channels during training. The profile orders the contribution of each channel in non-increasing order. At inference time, the n…

    Submitted 21 October, 2017; originally announced October 2017.

  47. arXiv:1709.02260  [pdf, other]

    cs.CV cs.LG

    Embedded Binarized Neural Networks

    Authors: Bradley McDanel, Surat Teerapittayanon, H. T. Kung

    Abstract: We study embedded Binarized Neural Networks (eBNNs) with the aim of allowing current binarized neural networks (BNNs) in the literature to perform feedforward inference efficiently on small embedded devices. We focus on minimizing the required memory footprint, given that these devices often have memory as small as tens of kilobytes (KB). Beyond minimizing the memory required to store weights, as…

    Submitted 6 September, 2017; originally announced September 2017.

  48. arXiv:1709.01921  [pdf, other]

    cs.CV cs.DC

    Distributed Deep Neural Networks over the Cloud, the Edge and End Devices

    Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

    Abstract: We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog) and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed c…

    Submitted 6 September, 2017; originally announced September 2017.

  49. arXiv:1709.01888  [pdf, other]

    cs.CL cs.LG

    Language Modeling by Clustering with Word Embeddings for Text Readability Assessment

    Authors: Miriam Cha, Youngjune Gwon, H. T. Kung

    Abstract: We present a clustering-based language model using word embeddings for text readability prediction. Presumably, a Euclidean semantic space hypothesis holds true for word embeddings whose training is done by observing word co-occurrences. We argue that clustering with word embeddings in the metric space should yield feature representations in a higher semantic space appropriate for text regression…

    Submitted 4 September, 2017; originally announced September 2017.

  50. arXiv:1709.01686  [pdf, other]

    cs.NE cs.CV cs.LG

    BranchyNet: Fast Inference via Early Exiting from Deep Neural Networks

    Authors: Surat Teerapittayanon, Bradley McDanel, H. T. Kung

    Abstract: Deep neural networks are state of the art methods for many learning tasks due to their ability to extract increasingly better features at each network layer. However, the improved performance of additional layers in a deep network comes at the cost of added latency and energy usage in feedforward inference. As networks continue to get deeper and larger, these costs become more prohibitive for real…

    Submitted 6 September, 2017; originally announced September 2017.