-
Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model
Authors:
Mu-Chi Chen,
Po-Hsuan Huang,
Xiangrui Ke,
Chia-Heng Tu,
Chun Jason Xue,
Shih-Hao Hung
Abstract:
Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI) with significant advancements such as OpenAI's ChatGPT, Meta's Llama, and Databricks' DBRX. This paper addresses the cost and scalability challenges encountered when constructing private LLM systems for personal or small group services, as targeted by Apple Intelligence. A Mac Studio cluster with Apple's M2 Ultra chips is established as a cost-efficient solution to host and accelerate the pretrained DBRX model with the Mixture-of-Experts (MoE) architecture. Our performance analysis reveals that parallel execution of the model's experts across two to four machine nodes significantly reduces inference time. We find that computation time for the experts is comparable to the communication time for exchanging their outputs, emphasizing the importance of network latency over bandwidth. We also observe significant management overhead due to the Apple software stack's memory management logic. Based on these findings, we develop optimization schemes to eliminate the memory management overhead. As a result, the Mac Studio cluster is 1.15 times more cost-efficient than the state-of-the-art AI supercomputer with NVIDIA H100 GPUs. In addition, we construct a performance model to estimate system performance under varying configurations, and the model provides valuable insights for designing private LLM systems.
Submitted 30 June, 2025;
originally announced June 2025.
-
TranslationCorrect: A Unified Framework for Machine Translation Post-Editing with Predictive Error Assistance
Authors:
Syed Mekael Wasti,
Shou-Yi Hung,
Christopher Collins,
En-Shiun Annie Lee
Abstract:
Machine translation (MT) post-editing and research data collection often rely on inefficient, disconnected workflows. We introduce TranslationCorrect, an integrated framework designed to streamline these tasks. TranslationCorrect combines MT generation using models like NLLB, automated error prediction using models like XCOMET or LLM APIs (providing detailed reasoning), and an intuitive post-editing interface within a single environment. It is built with human-computer interaction (HCI) principles in mind to minimize cognitive load, as confirmed by a user study. For translators, it enables efficient error correction and batch translation. For researchers, TranslationCorrect exports high-quality span-based annotations in the Error Span Annotation (ESA) format, using an error taxonomy inspired by Multidimensional Quality Metrics (MQM). These outputs are compatible with state-of-the-art error detection models and suitable for training MT or post-editing systems. Our user study confirms that TranslationCorrect significantly improves translation efficiency and user satisfaction over traditional annotation methods.
Submitted 23 June, 2025;
originally announced June 2025.
-
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
Authors:
Genta Indra Winata,
David Anugraha,
Emmy Liu,
Alham Fikri Aji,
Shou-Yi Hung,
Aditya Parashar,
Patrick Amadeus Irawan,
Ruochen Zhang,
Zheng-Xin Yong,
Jan Christian Blaise Cruz,
Niklas Muennighoff,
Seungone Kim,
Hanyang Zhao,
Sudipta Kar,
Kezia Erina Suryoraharjo,
M. Farid Adilazuarda,
En-Shiun Annie Lee,
Ayu Purwarianti,
Derry Tanti Wijaya,
Monojit Choudhury
Abstract:
High-quality datasets are fundamental to training and evaluating machine learning models, yet their creation, especially with accurate human annotations, remains a significant challenge. Many dataset paper submissions lack originality, diversity, or rigorous quality control, and these shortcomings are often overlooked during peer review. Submissions also frequently omit essential details about dataset construction and properties. While existing tools such as datasheets aim to promote transparency, they are largely descriptive and do not provide standardized, measurable methods for evaluating data quality. Similarly, metadata requirements at conferences promote accountability but are inconsistently enforced. To address these limitations, this position paper advocates for the integration of systematic, rubric-based evaluation metrics into the dataset review process, particularly as submission volumes continue to grow. We also explore scalable, cost-effective methods for synthetic data generation, including dedicated tools and LLM-as-a-judge approaches, to support more efficient evaluation. As a call to action, we introduce DataRubrics, a structured framework for assessing the quality of both human- and model-generated datasets. Leveraging recent advances in LLM-based evaluation, DataRubrics offers a reproducible, scalable, and actionable solution for dataset quality assessment, enabling both authors and reviewers to uphold higher standards in data-centric research. We also release code to support reproducibility of LLM-based evaluations at https://github.com/datarubrics/datarubrics.
Submitted 3 June, 2025; v1 submitted 2 June, 2025;
originally announced June 2025.
-
Certified randomness using a trapped-ion quantum processor
Authors:
Minzhao Liu,
Ruslan Shaydulin,
Pradeep Niroula,
Matthew DeCross,
Shih-Han Hung,
Wen Yu Kon,
Enrique Cervero-Martín,
Kaushik Chakraborty,
Omar Amer,
Scott Aaronson,
Atithi Acharya,
Yuri Alexeev,
K. Jordan Berg,
Shouvanik Chakrabarti,
Florian J. Curchod,
Joan M. Dreiling,
Neal Erickson,
Cameron Foltz,
Michael Foss-Feig,
David Hayes,
Travis S. Humble,
Niraj Kumar,
Jeffrey Larson,
Danylo Lykov,
Michael Mills,
et al. (7 additional authors not shown)
Abstract:
While quantum computers have the potential to perform a wide range of practically important tasks beyond the capabilities of classical computers, realizing this potential remains a challenge. One such task is to use an untrusted remote device to generate random bits that can be certified to contain a certain amount of entropy. Certified randomness has many applications but is fundamentally impossible to achieve solely by classical computation. In this work, we demonstrate the generation of certifiably random bits using the 56-qubit Quantinuum H2-1 trapped-ion quantum computer accessed over the internet. Our protocol leverages the classical hardness of recent random circuit sampling demonstrations: a client generates quantum "challenge" circuits using a small randomness seed, sends them to an untrusted quantum server to execute, and verifies the server's results. We analyze the security of our protocol against a restricted class of realistic near-term adversaries. Using classical verification with measured combined sustained performance of $1.1\times10^{18}$ floating-point operations per second across multiple supercomputers, we certify $71,313$ bits of entropy under this restricted adversary and additional assumptions. Our results demonstrate a step towards the practical applicability of today's quantum computers.
Submitted 26 March, 2025;
originally announced March 2025.
-
QOPS: A Compiler Framework for Quantum Circuit Simulation Acceleration with Profile Guided Optimizations
Authors:
Yu-Tsung Wu,
Po-Hsuan Huang,
Kai-Chieh Chang,
Chia-Heng Tu,
Shih-Hao Hung
Abstract:
Quantum circuit simulation is important in the evolution of quantum software and hardware. Novel algorithms can be developed and evaluated by performing quantum circuit simulations on classical computers before physical quantum computers are available. Unfortunately, compared with a physical quantum computer, a prolonged simulation time hampers the rapid development of quantum algorithms. Inspired by the feedback-directed optimization scheme used by classical compilers to improve the generated code, this work proposes a quantum compiler framework, QOPS, to enable profile-guided optimization (PGO) for quantum circuit simulation acceleration. The QOPS compiler instruments a quantum simulator to collect performance data during the circuit simulation and then generates an optimized version of the quantum circuit based on the collected data. Experimental results show that PGO can effectively shorten the simulation time on our tested benchmark programs. In particular, the simulator-specific PGO (virtual swap) can be applied to the benchmarks to accelerate the simulation by a factor of 1.19. As for the hardware-independent PGO, the brute force mechanism (turning on all available compilation flags) achieves a 21% performance improvement over the non-optimized version, whereas the PGO achieves a 16% speedup with 63 times less compilation time than the brute force approach.
Submitted 20 October, 2024; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Quantum Data Management in the NISQ Era: Extended Version
Authors:
Rihan Hai,
Shih-Han Hung,
Tim Coopmans,
Tim Littau,
Floris Geerts
Abstract:
Quantum computing has emerged as a promising tool for transforming the landscape of computing technology. Recent efforts have applied quantum techniques to classical database challenges, such as query optimization, data integration, index selection, and transaction management. In this paper, we shift focus to a critical yet underexplored area: data management for quantum computing. We are currently in the noisy intermediate-scale quantum (NISQ) era, where qubits, while promising, are fragile and still limited in scale. After differentiating quantum data from classical data, we outline current and future data management paradigms in the NISQ era and beyond. We address the data management challenges arising from the emerging demands of near-term quantum computing. Our goal is to chart a clear course for future quantum-oriented data management research, establishing it as a cornerstone for the advancement of quantum computing in the NISQ era.
Submitted 11 April, 2025; v1 submitted 21 September, 2024;
originally announced September 2024.
-
LiDAR-based Quadrotor for Slope Inspection in Dense Vegetation
Authors:
Wenyi Liu,
Yunfan Ren,
Rui Guo,
Vickie W. W. Kong,
Anthony S. P. Hung,
Fangcheng Zhu,
Yixi Cai,
Yuying Zou,
Fu Zhang
Abstract:
This work presents a LiDAR-based quadrotor system for slope inspection in dense vegetation environments. Cities like Hong Kong are vulnerable to climate hazards, which often result in landslides. To mitigate the landslide risks, the Civil Engineering and Development Department (CEDD) has constructed steel flexible debris-resisting barriers on vulnerable natural catchments to protect residents. However, it is necessary to carry out regular inspections to identify any anomalies, which may affect the proper functioning of the barriers. Traditional manual inspection methods face challenges and high costs due to steep terrain and dense vegetation. Compared to manual inspection, unmanned aerial vehicles (UAVs) equipped with LiDAR sensors and cameras have advantages such as maneuverability in complex terrain, and access to narrow areas and high spots. However, conducting slope inspections using UAVs in dense vegetation poses significant challenges. First, in terms of hardware, the overall design of the UAV must carefully consider its maneuverability in narrow spaces, flight time, and the types of onboard sensors required for effective inspection. Second, regarding software, navigation algorithms need to be designed to enable obstacle avoidance flight in dense vegetation environments. To overcome these challenges, we develop a LiDAR-based quadrotor, accompanied by a comprehensive software system. The goal is to deploy our quadrotor in field environments to achieve efficient slope inspection. To assess the feasibility of our hardware and software system, we conduct functional tests in non-operational scenarios. Subsequently, invited by CEDD, we deploy our quadrotor in six field environments, including five flexible debris-resisting barriers located in dense vegetation and one slope that experienced a landslide. These experiments demonstrated the superiority of our quadrotor in slope inspection.
Submitted 20 September, 2024;
originally announced September 2024.
-
Site-Specific Color Features of Green Coffee Beans
Authors:
Shu-Min Tan,
Shih-Hsun Hung,
Je-Chiang Tsai
Abstract:
Coffee is one of the most valuable primary commodities. Despite this, the common selection technique for green coffee beans relies on visual inspection by personnel, which is labor-intensive and subjective. Therefore, an efficient way to evaluate the quality of beans is needed. In this paper, we demonstrate a site-independent approach to find site-specific color features of the seed coat in qualified green coffee beans. We then propose two evaluation schemes for green coffee beans based on this site-specific color feature of qualified beans. Due to the site-specific properties of these color features, machine learning classifiers indicate that compared with the existing evaluation schemes of beans, our evaluation schemes have the advantages of being simple, having lower computational cost, and having universal applicability. Finally, this site-specific color feature can distinguish qualified beans from different growing sites. Moreover, this function can prevent cheating in the coffee business and is unique to our evaluation scheme of beans.
Submitted 6 September, 2024;
originally announced September 2024.
-
Find the Assembly Mistakes: Error Segmentation for Industrial Applications
Authors:
Dan Lehman,
Tim J. Schoonbeek,
Shao-Hsuan Hung,
Jacek Kustra,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Recognizing errors in assembly and maintenance procedures is valuable for industrial applications, since it can increase worker efficiency and prevent unplanned down-time. Although assembly state recognition is gaining attention, none of the current works investigate assembly error localization. Therefore, we propose StateDiffNet, which localizes assembly errors based on detecting the differences between a (correct) intended assembly state and a test image from a similar viewpoint. StateDiffNet is trained on synthetically generated image pairs, providing full control over the type of meaningful change that should be detected. The proposed approach is the first to correctly localize assembly errors taken from real ego-centric video data for both states and error types that are never presented during training. Furthermore, the deployment of change detection to this industrial application provides valuable insights and considerations into the mechanisms of state-of-the-art change detection algorithms. The code and data generation pipeline are publicly available at: https://timschoonbeek.github.io/error_seg.
Submitted 23 August, 2024;
originally announced August 2024.
-
Supervised Representation Learning towards Generalizable Assembly State Recognition
Authors:
Tim J. Schoonbeek,
Goutham Balachandran,
Hans Onvlee,
Tim Houben,
Shao-Hsuan Hung,
Jacek Kustra,
Peter H. N. de With,
Fons van der Sommen
Abstract:
Assembly state recognition facilitates the execution of assembly procedures, offering feedback to enhance efficiency and minimize errors. However, recognizing assembly states poses challenges in scalability, since parts are frequently updated, and the robustness to execution errors remains underexplored. To address these challenges, this paper proposes an approach based on representation learning and the novel intermediate-state informed loss function modification (ISIL). ISIL leverages unlabeled transitions between states and demonstrates significant improvements in clustering and classification performance for all tested architectures and losses. Despite being trained exclusively on images without execution errors, thorough analysis on error states demonstrates that our approach accurately distinguishes between correct states and states with various types of execution errors. The integration of the proposed algorithm can offer meaningful assistance to workers and mitigate unexpected losses due to procedural mishaps in industrial settings. The code is available at: https://timschoonbeek.github.io/state_rec
Submitted 21 August, 2024;
originally announced August 2024.
-
Queen: A quick, scalable, and comprehensive quantum circuit simulation for supercomputing
Authors:
Chuan-Chi Wang,
Yu-Cheng Lin,
Yan-Jie Wang,
Chia-Heng Tu,
Shih-Hao Hung
Abstract:
The state vector-based simulation offers a convenient approach to developing and validating quantum algorithms with noise-free results. However, limited by the absence of cache-aware implementations and unpolished circuit optimizations, the past simulators were severely constrained in performance, leading to stagnation in quantum computing. In this paper, we present an innovative quantum circuit simulation toolkit comprising gate optimization and simulation modules to address these performance challenges. For a comprehensive evaluation of performance and scalability, we conduct a series of specific circuit benchmarks and strong scaling tests on a DGX-A100 workstation and achieve an average 9x speedup compared to state-of-the-art simulators, including QuEST, IBM-Aer, and NVIDIA-cuQuantum. Moreover, the critical performance metric FLOPS increases by up to a factor of 8, and arithmetic intensity experiences a remarkable 96x enhancement. We believe the proposed toolkit paves the way for faster quantum circuit simulations, thereby facilitating the development of novel quantum algorithms.
Submitted 20 June, 2024;
originally announced June 2024.
-
Oracle Separation between Noisy Quantum Polynomial Time and the Polynomial Hierarchy
Authors:
Nai-Hui Chia,
Min-Hsiu Hsieh,
Shih-Han Hung,
En-Jui Kuo
Abstract:
This work investigates the oracle separation between the physically motivated complexity class of noisy quantum circuits, inspired by definitions such as those presented by Chen, Cotler, Huang, and Li (2022). We establish that with a constant error rate, separation can be achieved in terms of NP. When the error rate is $Ω(\log n/n)$, we can extend this result to the separation of PH. Notably, our oracles, in all separations, do not necessitate error correction schemes or fault tolerance, as all quantum circuits are of constant depth. This indicates that even quantum computers with minor errors, without error correction, may surpass classical complexity classes under various scenarios and assumptions. We also explore various common noise settings and present new classical hardness results, generalizing those found in studies by Raz and Tal (2022) and Bassirian, Bouland, Fefferman, Gunn, and Tal (2021), which are of independent interest.
Submitted 14 May, 2024; v1 submitted 11 May, 2024;
originally announced May 2024.
-
Quantum Data Management: From Theory to Opportunities
Authors:
Rihan Hai,
Shih-Han Hung,
Sebastian Feld
Abstract:
Quantum computing has emerged as a transformative tool for future data management. Classical problems in database domains, including query optimization, data integration, and transaction management, have recently been addressed using quantum computing techniques. This tutorial aims to establish the theoretical foundation essential for enhancing methodologies and practical implementations in this line of research. Moreover, this tutorial takes a forward-looking approach by delving into recent strides in quantum internet technologies and the nonlocality theory. We aim to shed light on the uncharted territory of future data systems tailored for the quantum internet.
Submitted 5 March, 2024;
originally announced March 2024.
-
Towards Optimizations of Quantum Circuit Simulation for Solving Max-Cut Problems with QAOA
Authors:
Yu-Cheng Lin,
Chuan-Chi Wang,
Chia-Heng Tu,
Shih-Hao Hung
Abstract:
The quantum approximate optimization algorithm (QAOA) is a popular quantum algorithm used to solve combinatorial optimization problems via approximations. QAOA can be evaluated on both physical quantum computers and virtual ones simulated by classical computers, with virtual ones being favored for their noise-free results and availability. Nevertheless, performing QAOA on virtual quantum computers suffers from slow simulation speed when solving combinatorial optimization problems that require large-scale quantum circuit simulation (QCS). In this paper, we propose techniques to accelerate QCS for QAOA by using mathematical optimizations to compress quantum operations, incorporating efficient bitwise operations to further lower the computational complexity, and leveraging different levels of parallelism in modern multi-core processors, with a case study showing their effectiveness in solving max-cut problems.
Submitted 5 December, 2023;
originally announced December 2023.
-
Global Topology of 3D Symmetric Tensor Fields
Authors:
Shih-Hsuan Hung,
Yue Zhang,
Eugene Zhang
Abstract:
There have been recent advances in the analysis and visualization of 3D symmetric tensor fields, with a focus on the robust extraction of tensor field topology. However, topological features such as degenerate curves and neutral surfaces do not live in isolation. Instead, they intriguingly interact with each other. In this paper, we introduce the notion of topological graph for 3D symmetric tensor fields to facilitate global topological analysis of such fields. The nodes of the graph include degenerate curves and regions bounded by neutral surfaces in the domain. The edges in the graph denote the adjacency information between the regions and degenerate curves. In addition, we observe that a degenerate curve can be a loop and even a knot and that two degenerate curves (whether in the same region or not) can form a link. We provide a definition and theoretical analysis of individual degenerate curves in order to help understand why knots and links may occur. Moreover, we differentiate between wedges and trisectors, thus making the analysis more detailed about degenerate curves. We incorporate this information into the topological graph. Such a graph can not only reveal the global structure in a 3D symmetric tensor field but also allow two symmetric tensor fields to be compared. We demonstrate our approach by applying it to solid mechanics and material science data sets.
Submitted 4 September, 2023;
originally announced September 2023.
-
Certified Randomness from Quantum Supremacy
Authors:
Scott Aaronson,
Shih-Han Hung
Abstract:
We propose an application for near-term quantum devices: namely, generating cryptographically certified random bits, to use (for example) in proof-of-stake cryptocurrencies. Our protocol repurposes the existing "quantum supremacy" experiments, based on random circuit sampling, that Google and USTC have successfully carried out starting in 2019. We show that, whenever the outputs of these experiments pass the now-standard Linear Cross-Entropy Benchmark (LXEB), under plausible hardness assumptions they necessarily contain $Ω(n)$ min-entropy, where $n$ is the number of qubits. To achieve a net gain in randomness, we use a small random seed to produce pseudorandom challenge circuits. In response to the challenge circuits, the quantum computer generates output strings that, after verification, can then be fed into a randomness extractor to produce certified nearly-uniform bits -- thereby "bootstrapping" from pseudorandomness to genuine randomness. We prove our protocol sound in two senses: (i) under a hardness assumption called Long List Quantum Supremacy Verification, which we justify in the random oracle model, and (ii) unconditionally in the random oracle model against an eavesdropper who could share arbitrary entanglement with the device. (Note that our protocol's output is unpredictable even to a computationally unbounded adversary who can see the random oracle.) Currently, the central drawback of our protocol is the exponential cost of verification, which in practice will limit its implementation to at most $n\sim 60$ qubits, a regime where attacks are expensive but not impossible. Modulo that drawback, our protocol appears to be the only practical application of quantum computing that both requires a QC and is physically realizable today.
Submitted 2 March, 2023;
originally announced March 2023.
-
The Computational Complexity of Quantum Determinants
Authors:
Shih-Han Hung,
En-Jui Kuo
Abstract:
In this work, we study the computational complexity of quantum determinants, a $q$-deformation of matrix permanents: Given a complex number $q$ on the unit circle in the complex plane and an $n\times n$ matrix $X$, the $q$-permanent of $X$ is defined as $$\mathrm{Per}_q(X) = \sum_{σ\in S_n} q^{\ell(σ)}X_{1,σ(1)}\ldots X_{n,σ(n)},$$ where $\ell(σ)$ is the inversion number of permutation $σ$ in the symmetric group $S_n$ on $n$ elements. The function family generalizes determinant and permanent, which correspond to the cases $q=-1$ and $q=1$ respectively.
For worst-case hardness, by Liouville's approximation theorem and facts from algebraic number theory, we show that for a primitive $m$-th root of unity $q$ with odd prime power $m=p^k$, exactly computing the $q$-permanent is $\mathsf{Mod}_p\mathsf{P}$-hard. This implies that an efficient algorithm for computing the $q$-permanent would result in a collapse of the polynomial hierarchy. Next, we show that computing the $q$-permanent can be achieved using an oracle that approximates it to within a polynomial multiplicative error and a membership oracle for a finite set of algebraic integers. From this, an efficient approximation algorithm would also imply a collapse of the polynomial hierarchy. By random self-reducibility, computing the $q$-permanent remains hard for a wide range of distributions satisfying a property called the strong autocorrelation property. Specifically, this is proved via a reduction from $1$-permanent to $q$-permanent for $O(1/n^2)$ points $z$ on the unit circle. Since the family of permanent functions shares a common algebraic structure, various techniques developed for the hardness of the permanent can be generalized to $q$-permanents.
Submitted 15 February, 2023;
originally announced February 2023.
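The definition of $\mathrm{Per}_q$ in the abstract above can be checked on small matrices with a brute-force sum over permutations. The sketch below is only an illustration of the formula (it runs in $O(n!\,n^2)$ time and is not an algorithm from the paper); the helper names `inversions` and `q_permanent` are hypothetical.

```python
from itertools import permutations

def inversions(perm):
    """Number of pairs i < j with perm[i] > perm[j], i.e. ell(sigma)."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

def q_permanent(X, q):
    """Brute-force Per_q(X) = sum over sigma of q^ell(sigma) * prod_i X[i][sigma(i)]."""
    n = len(X)
    total = 0
    for perm in permutations(range(n)):
        term = q ** inversions(perm)
        for i in range(n):
            term *= X[i][perm[i]]
        total += term
    return total
```

With `X = [[1, 2], [3, 4]]`, setting `q = 1` gives the permanent and `q = -1` gives the determinant, matching the two special cases named in the abstract.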
-
Classical verification of quantum depth
Authors:
Nai-Hui Chia,
Shih-Han Hung
Abstract:
We present two protocols for classical verification of quantum depth. Our protocols allow a purely classical verifier to distinguish devices with different quantum circuit depths even in the presence of classical computation. We show that a device with quantum circuit depth at most d will be rejected by the verifier even if the prover applies additional polynomial-time classical computation to cheat. On the other hand, the verifier accepts a device which has quantum circuit depth d' for some d'>d. In our first protocol, we introduce an additional untrusted quantum machine which shares entanglements with the target machine. Applying a robust self-test, our first protocol certifies the depth of the target machine with information theoretic security and nearly optimal separation. The protocol relies on the oracle separation problem for quantum depth by Chia, Chung and Lai [STOC 2020] and a transformation from an oracle separation problem to a two-player non-local game. Our second protocol certifies the quantum depth of a single device based on quantum hardness of learning with errors. The protocol relies on the noisy trapdoor claw-free function family and the idea of pointer chasing to force the prover to keep quantum coherence until all preceding message exchanges are completed. To our knowledge, we give the first constructions for distinguishing hybrid quantum-classical computers with different circuit depths in unrelativized models.
Submitted 9 May, 2022;
originally announced May 2022.
-
cuPSO: GPU Parallelization for Particle Swarm Optimization Algorithms
Authors:
Chuan-Chi Wang,
Chun-Yen Ho,
Chia-Heng Tu,
Shih-Hao Hung
Abstract:
Particle Swarm Optimization (PSO) is a stochastic technique for solving optimization problems. Attempts have been made to shorten the computation times of PSO-based algorithms with massive threads on GPUs (graphics processing units), where thread groups are formed to calculate the information of particles and the computed outputs for the particles are aggregated and analyzed to find the best solution. In particular, the reduction-based method is considered a common approach to handle the data aggregation and analysis for the calculated particle information. Nevertheless, based on our analysis, the reduction-based method would suffer from excessive memory accesses and thread synchronization overheads. In this paper, we propose a novel algorithm to alleviate the above overheads with atomic functions. The threads within a thread group conditionally update the calculated results atomically to the intra-group data queue, which prevents the frequent memory accesses incurred by the parallel reduction operations. Furthermore, we develop an enhanced version of the algorithm to alleviate the synchronization barrier among the thread groups, which is achieved by allowing the thread groups to run asynchronously and to update the global, lock-protected variables only occasionally, when necessary. Our experimental results show that our proposed algorithm running on the Nvidia GPU is about 200 times faster than the serial version executed by the Intel Xeon CPU. Moreover, the novel algorithm outperforms the state-of-the-art method (the parallel reduction approach) by a factor of 2.2.
Submitted 3 December, 2023; v1 submitted 3 May, 2022;
originally announced May 2022.
-
PolyNet: Polynomial Neural Network for 3D Shape Recognition with PolyShape Representation
Authors:
Mohsen Yavartanoo,
Shih-Hsuan Hung,
Reyhaneh Neshatavar,
Yue Zhang,
Kyoung Mu Lee
Abstract:
3D shape representation and its processing have substantial effects on 3D shape recognition. The polygon mesh as a 3D shape representation has many advantages in computer graphics and geometry processing. However, there are still some challenges for the existing deep neural network (DNN)-based methods on polygon mesh representation, such as handling the variations in the degree and permutations of the vertices and their pairwise distances. To overcome these challenges, we propose a DNN-based method (PolyNet) and a specific polygon mesh representation (PolyShape) with a multi-resolution structure. PolyNet contains two operations; (1) a polynomial convolution (PolyConv) operation with learnable coefficients, which learns continuous distributions as the convolutional filters to share the weights across different vertices, and (2) a polygonal pooling (PolyPool) procedure by utilizing the multi-resolution structure of PolyShape to aggregate the features in a much lower dimension. Our experiments demonstrate the strength and the advantages of PolyNet on both 3D shape classification and retrieval tasks compared to existing polygon mesh-based methods and its superiority in classifying graph representations of images. The code is publicly available from https://myavartanoo.github.io/polynet/.
Submitted 15 October, 2021;
originally announced October 2021.
-
Feature Curves and Surfaces of 3D Asymmetric Tensor Fields
Authors:
Shih-Hsuan Hung,
Yue Zhang,
Harry Yeh,
Eugene Zhang
Abstract:
3D asymmetric tensor fields have found many applications in science and engineering domains, such as fluid dynamics and solid mechanics. 3D asymmetric tensors can have complex eigenvalues, which makes their analysis and visualization more challenging than 3D symmetric tensors. Existing research in tensor field visualization focuses on 2D asymmetric tensor fields and 3D symmetric tensor fields. In this paper, we address the analysis and visualization of 3D asymmetric tensor fields. We introduce six topological surfaces and one topological curve, which lead to an eigenvalue space based on the tensor mode that we define. In addition, we identify several non-topological feature surfaces that are nonetheless physically important. Included in our analysis are the realizations that triple degenerate tensors are structurally stable and form curves, unlike the case for 3D symmetric tensors fields. Furthermore, there are two different ways of measuring the relative strengths of rotation and angular deformation in the tensor fields, unlike the case for 2D asymmetric tensor fields. We extract these feature surfaces using the A-patches algorithm. However, since three of our feature surfaces are quadratic, we develop a method to extract quadratic surfaces at any given accuracy. To facilitate the analysis of eigenvector fields, we visualize a hyperstreamline as a tree stem with the other two eigenvectors represented as thorns in the real domain or the dual-eigenvectors as leaves in the complex domain. To demonstrate the effectiveness of our analysis and visualization, we apply our approach to datasets from solid mechanics and fluid dynamics.
Submitted 4 August, 2021;
originally announced August 2021.
-
Quantum query complexity with matrix-vector products
Authors:
Andrew M. Childs,
Shih-Han Hung,
Tongyang Li
Abstract:
We study quantum algorithms that learn properties of a matrix using queries that return its action on an input vector. We show that for various problems, including computing the trace, determinant, or rank of a matrix or solving a linear system that it specifies, quantum computers do not provide an asymptotic speedup over classical computation. On the other hand, we show that for some problems, such as computing the parities of rows or columns or deciding if there are two identical rows or columns, quantum computers provide exponential speedup. We demonstrate this by showing equivalence between models that provide matrix-vector products, vector-matrix products, and vector-matrix-vector products, whereas the power of these models can vary significantly for classical computation.
Submitted 14 March, 2021; v1 submitted 22 February, 2021;
originally announced February 2021.
-
Federated Evaluation and Tuning for On-Device Personalization: System Design & Applications
Authors:
Matthias Paulik,
Matt Seigel,
Henry Mason,
Dominic Telaar,
Joris Kluivers,
Rogier van Dalen,
Chi Wai Lau,
Luke Carlson,
Filip Granqvist,
Chris Vandevelde,
Sudeep Agarwal,
Julien Freudiger,
Andrew Byde,
Abhishek Bhowmick,
Gaurav Kapoor,
Si Beaumont,
Áine Cahill,
Dominic Hughes,
Omid Javidbakht,
Fei Dong,
Rehan Rishi,
Stanley Hung
Abstract:
We describe the design of our federated task processing system. Originally, the system was created to support two specific federated tasks: evaluation and tuning of on-device ML systems, primarily for the purpose of personalizing these systems. In recent years, support for an additional federated task has been added: federated learning (FL) of deep neural networks. To our knowledge, only one other system has been described in literature that supports FL at scale. We include comparisons to that system to help discuss design decisions and attached trade-offs. Finally, we describe two specific large scale personalization use cases in detail to showcase the applicability of federated tuning to on-device personalization and to highlight application specific solutions.
Submitted 16 February, 2021;
originally announced February 2021.
-
ResPerfNet: Deep Residual Learning for Regressional Performance Modeling of Deep Neural Networks
Authors:
Chuan-Chi Wang,
Ying-Chiao Liao,
Chia-Heng Tu,
Ming-Chang Kao,
Wen-Yew Liang,
Shih-Hao Hung
Abstract:
The rapid advancement of computing technology facilitates the development of diverse deep learning applications. Unfortunately, the efficiency of parallel computing infrastructures varies widely with neural network models, which hinders the exploration of the design space to find high-performance neural network architectures on specific computing platforms for a given application. To address this challenge, we propose a deep learning-based method, ResPerfNet, which trains a residual neural network with representative datasets obtained on the target platform to predict the performance of a deep neural network. Our experimental results show that ResPerfNet can accurately predict the execution time of individual neural network layers and full network models on a variety of platforms. In particular, ResPerfNet achieves a mean absolute percentage error of 8.4% for LeNet, AlexNet and VGG16 on the NVIDIA GTX 1080Ti, which is substantially lower than that of previously published works.
Submitted 2 December, 2020;
originally announced December 2020.
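The accuracy figure in the abstract above is a mean absolute percentage error (MAPE). As a reminder of how that metric is computed, here is a minimal sketch using the standard definition; the function name `mape` and the sample timings in the usage note are made up for illustration and are not data from the paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes equal-length sequences and no zero entries in `actual`.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

For instance, measured execution times of `[100, 200]` ms against predictions of `[110, 180]` ms are each off by 10%, so `mape` reports 10.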
-
Toward Accurate Platform-Aware Performance Modeling for Deep Neural Networks
Authors:
Chuan-Chi Wang,
Ying-Chiao Liao,
Ming-Chang Kao,
Wen-Yew Liang,
Shih-Hao Hung
Abstract:
In this paper, we provide a fine-grained machine learning-based method, PerfNetV2, which improves the accuracy of our previous work for modeling neural network performance on a variety of GPU accelerators. Given an application, the proposed method can be used to predict the inference time and training time of the convolutional neural networks used in the application, which enables the system developer to optimize performance by choosing the neural networks and/or incorporating hardware accelerators to deliver satisfactory results in time. Furthermore, the proposed method is capable of predicting the performance of an unseen or non-existing device, e.g. a new GPU which has a higher operating frequency with fewer processor cores but more memory capacity. This allows a system developer to quickly search the hardware design space and/or fine-tune the system configuration. Compared to previous works, PerfNetV2 delivers more accurate results by modeling detailed host-accelerator interactions in executing the full neural networks and by improving the architecture of the machine learning model used in the predictor. Our case studies show that PerfNetV2 yields a mean absolute percentage error within 13.1% on LeNet, AlexNet, and VGG16 on the NVIDIA GTX-1080Ti, while the error rate of a previous work published in ICBD 2018 could be as large as 200%.
Submitted 30 November, 2020;
originally announced December 2020.
-
Topic Diffusion Discovery Based on Deep Non-negative Autoencoder
Authors:
Sheng-Tai Huang,
Yihuang Kang,
Shao-Min Hung,
Bowen Kuo,
I-Ling Cheng
Abstract:
Researchers have been overwhelmed by the explosion of research articles published by various research communities. Many scholarly websites, search engines, and digital libraries have been created to help researchers identify potential research topics and keep up with recent progress on research of interest. However, it is still difficult for researchers to keep track of research topic diffusion and evolution without spending a large amount of time reviewing numerous relevant and irrelevant articles. In this paper, we consider a novel topic diffusion discovery technique. Specifically, we propose using a Deep Non-negative Autoencoder with an information divergence measurement that monitors the evolutionary distance of the topic diffusion to understand how research topics change with time. The experimental results show that the proposed approach is able to identify the evolution of research topics as well as to discover topic diffusions in an online fashion.
Submitted 7 October, 2020;
originally announced October 2020.
-
Proving Quantum Programs Correct
Authors:
Kesha Hietala,
Robert Rand,
Shih-Han Hung,
Liyi Li,
Michael Hicks
Abstract:
As quantum computing progresses steadily from theory into practice, programmers will face a common problem: How can they be sure that their code does what they intend it to do? This paper presents encouraging results in the application of mechanized proof to the domain of quantum programming in the context of the SQIR development. It verifies the correctness of a range of quantum algorithms including Grover's algorithm and quantum phase estimation, a key component of Shor's algorithm. In doing so, it aims to highlight both the successes and challenges of formal verification in the quantum context and motivate the theorem proving community to target quantum computing as an application domain.
Submitted 13 July, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Thermal Analysis of a 3D Stacked High-Performance Commercial Microprocessor using Face-to-Face Wafer Bonding Technology
Authors:
Rahul Mathur,
Chien-Ju Chao,
Rossana Liu,
Nikhil Tadepalli,
Pranavi Chandupatla,
Shawn Hung,
Xiaoqing Xu,
Saurabh Sinha,
Jaydeep Kulkarni
Abstract:
3D integration technologies are seeing widespread adoption in the semiconductor industry to offset the limitations and slowdown of two-dimensional scaling. High-density 3D integration techniques such as face-to-face wafer bonding with sub-10 $μ$m pitch can enable new ways of designing SoCs using all 3 dimensions, like folding a microprocessor design across multiple 3D tiers. However, overlapping thermal hotspots can be a challenge in such 3D stacked designs due to a general increase in power density. In this work, we perform a thorough thermal simulation study on sign-off quality physical design implementation of a state-of-the-art, high-performance, out-of-order microprocessor on a 7nm process technology. The physical design of the microprocessor is partitioned and implemented in a 2-tier, 3D stacked configuration with logic blocks and memory instances in separate tiers (logic-over-memory 3D). The thermal simulation model was calibrated to temperature measurement data from a high-performance, CPU-based 2D SoC chip fabricated on the same 7nm process technology. Thermal profiles of different 3D configurations under various workload conditions are simulated and compared. We find that stacking microprocessor designs in 3D without considering thermal implications can result in maximum die temperature up to 12°C higher than their 2D counterparts under the worst-case power-indicative workload. This increase in temperature would reduce the amount of time for which a power-intensive workload can be run before throttling is required. However, logic-over-memory partitioned 3D CPU implementation can mitigate this temperature increase by half, which makes the temperature of the 3D design only 6°C higher than the 2D baseline. We conclude that using thermal aware design partitioning and improved cooling techniques can overcome the thermal challenges associated with 3D stacking.
Submitted 31 July, 2020;
originally announced July 2020.
-
Social Distancing 2.0 with Privacy-Preserving Contact Tracing to Avoid a Second Wave of COVID-19
Authors:
Yu-Chen Ho,
Yi-Hsuan Chen,
Shen-Hua Hung,
Chien-Hao Huang,
Poga Po,
Chung-Hsi Chan,
Di-Kai Yang,
Yi-Chin Tu,
Tyng-Luh Liu,
Chi-Tai Fang
Abstract:
How to avoid a second wave of COVID-19 after reopening the economy is a pressing question. The extremely high basic reproductive number $R_0$ (5.7 to 6.4, shown in new studies) of SARS-CoV-2 further complicates the challenge. Here we assess the effects of Social distancing 2.0, i.e. proximity alert (to maintain inter-personal distance) plus privacy-preserving contact tracing. To solve the dual task, we developed an open source mobile app. The app uses a Bluetooth-based, decentralized contact tracing platform over which the anonymous user ID cannot be linked by the government or a third party. Modelling results show that a 50% adoption rate of Social distancing 2.0, with privacy-preserving contact tracing, would suffice to decrease the $R_0$ to less than 1 and prevent the resurgence of the COVID-19 epidemic.
Submitted 5 August, 2020; v1 submitted 30 June, 2020;
originally announced June 2020.
-
On the Principles of Differentiable Quantum Programming Languages
Authors:
Shaopeng Zhu,
Shih-Han Hung,
Shouvanik Chakrabarti,
Xiaodi Wu
Abstract:
Variational Quantum Circuits (VQCs), or the so-called quantum neural-networks, are predicted to be one of the most important near-term quantum applications, not only because of their similar promises as classical neural-networks, but also because of their feasibility on near-term noisy intermediate-size quantum (NISQ) machines. The need for gradient information in the training procedure of VQC applications has stimulated the development of auto-differentiation techniques for quantum circuits. We propose the first formalization of this technique, not only in the context of quantum circuits but also for imperative quantum programs (e.g., with controls), inspired by the success of differentiable programming languages in classical machine learning. In particular, we overcome a few unique difficulties caused by exotic quantum features (such as quantum no-cloning) and provide a rigorous formulation of differentiation applied to bounded-loop imperative quantum programs, its code-transformation rules, as well as a sound logic to reason about their correctness. Moreover, we have implemented our code transformation in OCaml and demonstrated the resource-efficiency of our scheme both analytically and empirically. We also conduct a case study of training a VQC instance with controls, which shows the advantage of our scheme over existing auto-differentiation for quantum circuits without controls.
Submitted 2 April, 2020;
originally announced April 2020.
-
A Verified Optimizer for Quantum Circuits
Authors:
Kesha Hietala,
Robert Rand,
Shih-Han Hung,
Xiaodi Wu,
Michael Hicks
Abstract:
We present VOQC, the first fully verified optimizer for quantum circuits, written using the Coq proof assistant. Quantum circuits are expressed as programs in a simple, low-level language called SQIR, a simple quantum intermediate representation, which is deeply embedded in Coq. Optimizations and other transformations are expressed as Coq functions, which are proved correct with respect to a semantics of SQIR programs. SQIR uses a semantics of matrices of complex numbers, which is the standard for quantum computation, but treats matrices symbolically in order to reason about programs that use an arbitrary number of quantum bits. SQIR's careful design and our provided automation make it possible to write and verify a broad range of optimizations in VOQC, including full-circuit transformations from cutting-edge optimizers.
Submitted 12 November, 2020; v1 submitted 4 December, 2019;
originally announced December 2019.
-
Non-interactive classical verification of quantum computation
Authors:
Gorjan Alagic,
Andrew M. Childs,
Alex B. Grilo,
Shih-Han Hung
Abstract:
In a recent breakthrough, Mahadev constructed an interactive protocol that enables a purely classical party to delegate any quantum computation to an untrusted quantum prover. In this work, we show that this same task can in fact be performed non-interactively and in zero-knowledge.
Our protocols result from a sequence of significant improvements to the original four-message protocol of Mahadev. We begin by making the first message instance-independent and moving it to an offline setup phase. We then establish a parallel repetition theorem for the resulting three-message protocol, with an asymptotically optimal rate. This, in turn, enables an application of the Fiat-Shamir heuristic, eliminating the second message and giving a non-interactive protocol. Finally, we employ classical non-interactive zero-knowledge (NIZK) arguments and classical fully homomorphic encryption (FHE) to give a zero-knowledge variant of this construction. This yields the first purely classical NIZK argument system for QMA, a quantum analogue of NP.
We establish the security of our protocols under standard assumptions in quantum-secure cryptography. Specifically, our protocols are secure in the Quantum Random Oracle Model, under the assumption that Learning with Errors is quantumly hard. The NIZK construction also requires circuit-private FHE.
Submitted 9 March, 2020; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Compacting, Picking and Growing for Unforgetting Continual Learning
Authors:
Steven C. Y. Hung,
Cheng-Hao Tu,
Cheng-En Wu,
Chien-Hung Chen,
Yi-Ming Chan,
Chu-Song Chen
Abstract:
Continual lifelong learning is essential to many applications. In this paper, we propose a simple but effective approach to continual deep learning. Our approach leverages the principles of deep model compression, critical weights selection, and progressive network expansion. By enforcing their integration in an iterative manner, we introduce an incremental learning method that scales with the number of sequential tasks in a continual learning process. Our approach is easy to implement and has several favorable characteristics. First, it can avoid forgetting (i.e., learn new tasks while remembering all previous tasks). Second, it allows model expansion but can maintain model compactness when handling sequential tasks. In addition, through our compaction and selection/expansion mechanism, we show that the knowledge accumulated through learning previous tasks helps build a better model for new tasks compared to training the models independently for each task. Experimental results show that our approach can incrementally learn a deep model tackling multiple tasks without forgetting, while maintaining model compactness and achieving better performance than individual task training.
Submitted 30 October, 2019; v1 submitted 15 October, 2019;
originally announced October 2019.
-
Quantum algorithm for estimating volumes of convex bodies
Authors:
Shouvanik Chakrabarti,
Andrew M. Childs,
Shih-Han Hung,
Tongyang Li,
Chunhao Wang,
Xiaodi Wu
Abstract:
Estimating the volume of a convex body is a central problem in convex geometry and can be viewed as a continuous version of counting. We present a quantum algorithm that estimates the volume of an $n$-dimensional convex body within multiplicative error $ε$ using $\tilde{O}(n^{3}+n^{2.5}/ε)$ queries to a membership oracle and $\tilde{O}(n^{5}+n^{4.5}/ε)$ additional arithmetic operations. For comparison, the best known classical algorithm uses $\tilde{O}(n^{4}+n^{3}/ε^{2})$ queries and $\tilde{O}(n^{6}+n^{5}/ε^{2})$ additional arithmetic operations. To the best of our knowledge, this is the first quantum speedup for volume estimation. Our algorithm is based on a refined framework for speeding up simulated annealing algorithms that might be of independent interest. This framework applies in the setting of "Chebyshev cooling", where the solution is expressed as a telescoping product of ratios, each having bounded variance. We develop several novel techniques when implementing our framework, including a theory of continuous-space quantum walks with rigorous bounds on discretization error. To complement our quantum algorithms, we also prove that volume estimation requires $Ω(\sqrt n+1/ε)$ quantum membership queries, which rules out the possibility of exponential quantum speedup in $n$ and shows optimality of our algorithm in $1/ε$ up to poly-logarithmic factors.
Submitted 1 November, 2021; v1 submitted 11 August, 2019;
originally announced August 2019.
-
Verified Optimization in a Quantum Intermediate Representation
Authors:
Kesha Hietala,
Robert Rand,
Shih-Han Hung,
Xiaodi Wu,
Michael Hicks
Abstract:
We present sqire, a low-level language for quantum computing and verification. sqire uses a global register of quantum bits, allowing easy compilation to and from existing "quantum assembly" languages and simplifying the verification process. We demonstrate the power of sqire as an intermediate representation of quantum programs by verifying a number of useful optimizations, and we demonstrate sqire's use as a tool for general verification by proving several quantum programs correct.
Submitted 6 December, 2019; v1 submitted 12 April, 2019;
originally announced April 2019.
-
Quantitative Robustness Analysis of Quantum Programs (Extended Version)
Authors:
Shih-Han Hung,
Kesha Hietala,
Shaopeng Zhu,
Mingsheng Ying,
Michael Hicks,
Xiaodi Wu
Abstract:
Quantum computation is a topic of significant recent interest, with practical advances coming from both research and industry. A major challenge in quantum programming is dealing with errors (quantum noise) during execution. Because quantum resources (e.g., qubits) are scarce, classical error correction techniques applied at the level of the architecture are currently cost-prohibitive. But while this reality means that quantum programs are almost certain to have errors, there is as yet no principled means to reason about erroneous behavior. This paper attempts to fill this gap by developing a semantics for erroneous quantum while-programs, as well as a logic for reasoning about them. This logic permits proving a property we have identified, called $ε$-robustness, which characterizes the possible "distance" between an ideal program and an erroneous one. We have proved the logic sound and shown its utility on several case studies, notably: (1) analyzing the robustness of noisy versions of the quantum Bernoulli factory (QBF) and quantum walk (QW); (2) demonstrating the (in)effectiveness of different error correction schemes on single-qubit errors; and (3) analyzing the robustness of a fault-tolerant version of QBF.
Submitted 1 December, 2018; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Incorporating Luminance, Depth and Color Information by a Fusion-based Network for Semantic Segmentation
Authors:
Shang-Wei Hung,
Shao-Yuan Lo,
Hsueh-Ming Hang
Abstract:
Semantic segmentation has made encouraging progress in recent years due to the success of deep convolutional networks. Meanwhile, depth sensors have become prevalent, so depth maps can be acquired more easily. However, few studies focus on the RGB-D semantic segmentation task, and exploiting depth information effectively to improve performance remains a challenge. In this paper, we propose a novel solution named LDFNet, which incorporates Luminance, Depth and Color information by a fusion-based network. It includes a sub-network to process depth maps and employs luminance images to assist in processing the depth information. LDFNet outperforms the other state-of-the-art systems on the Cityscapes dataset, and its inference speed is faster than that of most existing networks. The experimental results show the effectiveness of the proposed multi-modal fusion network and its potential for practical applications.
Submitted 19 May, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.
-
Quantum algorithm for multivariate polynomial interpolation
Authors:
Jianxin Chen,
Andrew M. Childs,
Shih-Han Hung
Abstract:
How many quantum queries are required to determine the coefficients of a degree-$d$ polynomial in $n$ variables? We present and analyze quantum algorithms for this multivariate polynomial interpolation problem over the fields $\mathbb{F}_q$, $\mathbb{R}$, and $\mathbb{C}$. We show that $k_{\mathbb{C}}$ and $2k_{\mathbb{C}}$ queries suffice to achieve probability $1$ for $\mathbb{C}$ and $\mathbb{R}$, respectively, where $k_{\mathbb{C}}=\smash{\lceil\frac{1}{n+1}{n+d\choose d}\rceil}$ except for $d=2$ and four other special cases. For $\mathbb{F}_q$, we show that $\smash{\lceil\frac{d}{n+d}{n+d\choose d}\rceil}$ queries suffice to achieve probability approaching $1$ for large field order $q$. The classical query complexity of this problem is $\smash{n+d\choose d}$, so our result provides a speedup by a factor of $n+1$, $\frac{n+1}{2}$, and $\frac{n+d}{d}$ for $\mathbb{C}$, $\mathbb{R}$, and $\mathbb{F}_q$, respectively. Thus we find a much larger gap between classical and quantum algorithms than the univariate case, where the speedup is by a factor of $2$. For the case of $\mathbb{F}_q$, we conjecture that $2k_{\mathbb{C}}$ queries also suffice to achieve probability approaching $1$ for large field order $q$, although we leave this as an open problem.
Submitted 19 January, 2018; v1 submitted 14 January, 2017;
originally announced January 2017.
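The query counts quoted in the abstract above can be tabulated for small cases. The following is a minimal sketch that only evaluates the stated formulas; the function names `ceil_div` and `query_counts` are hypothetical, and the sketch does not account for the exceptions the abstract mentions ($d=2$ and four other special cases), where the count for $\mathbb{C}$ may differ.

```python
from math import comb


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def query_counts(n: int, d: int) -> dict:
    """Evaluate the query-count formulas from the abstract for a
    degree-d polynomial in n variables (illustrative helper only;
    the exceptional cases named in the abstract are not handled)."""
    classical = comb(n + d, d)             # classical query complexity
    k_c = ceil_div(classical, n + 1)       # k_C, sufficient over C
    return {
        "classical": classical,
        "complex": k_c,
        "real": 2 * k_c,                   # 2 k_C, sufficient over R
        "finite_field": ceil_div(d * classical, n + d),  # over F_q, large q
    }


if __name__ == "__main__":
    for n, d in [(2, 3), (3, 3), (5, 3)]:
        print(n, d, query_counts(n, d))
```

The ratio of the classical count to each quantum count approximates the speedup factors $n+1$, $\frac{n+1}{2}$, and $\frac{n+d}{d}$ stated in the abstract, up to rounding from the ceilings.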
-
Multiparty Quantum Private Comparsion with Individually Dishonest Third Parties for Strangers
Authors:
Shih-Min Hung,
Sheng-Liang Hwang,
Tzonelih Hwang,
Shih-Hung Kao
Abstract:
This study explores a new security problem in various state-of-the-art quantum private comparison (QPC) protocols, where a malicious third party (TP) announces fake comparison (or intermediate) results. In this case, the participants could be misled and the QPC would become fraudulent. To resolve this problem, a new level of trustworthiness for the TP is defined and a new QPC protocol is proposed, in which a second TP is introduced to monitor the first one. Once a TP announces a fake comparison (or intermediate) result, participants can detect the fraud immediately. In addition, thanks to the second TP, the proposed protocol allows strangers to compare their secrets privately, whereas the state-of-the-art QPCs require the involved clients to know each other before running the protocol.
Submitted 24 July, 2016;
originally announced July 2016.
-
Optimal quantum algorithm for polynomial interpolation
Authors:
Andrew M. Childs,
Wim van Dam,
Shih-Han Hung,
Igor E. Shparlinski
Abstract:
We consider the number of quantum queries required to determine the coefficients of a degree-d polynomial over GF(q). A lower bound shown independently by Kane and Kutin and by Meyer and Pommersheim shows that d/2+1/2 quantum queries are needed to solve this problem with bounded error, whereas an algorithm of Boneh and Zhandry shows that d quantum queries are sufficient. We show that the lower bound is achievable: d/2+1/2 quantum queries suffice to determine the polynomial with bounded error. Furthermore, we show that d/2+1 queries suffice to achieve probability approaching 1 for large q. These upper bounds improve results of Boneh and Zhandry on the insecurity of cryptographic protocols against quantum attacks. We also show that our algorithm's success probability as a function of the number of queries is precisely optimal. Furthermore, the algorithm can be implemented with gate complexity poly(log q) with negligible decrease in the success probability. We end with a conjecture about the quantum query complexity of multivariate polynomial interpolation.
Submitted 1 March, 2016; v1 submitted 30 September, 2015;
originally announced September 2015.
-
Comparison of Spearman's rho and Kendall's tau in Normal and Contaminated Normal Models
Authors:
Weichao Xu,
Yunhe Hou,
Y. S. Hung,
Yuexian Zou
Abstract:
This paper analyzes the performance of Spearman's rho (SR) and Kendall's tau (KT) with respect to samples drawn from bivariate normal and bivariate contaminated normal populations. The exact analytical formulae of the variance of SR and the covariance between SR and KT are obtained based on Childs's reduction formula for the quadrivariate normal positive orthant probabilities. Closed-form expressions for the expectations of SR and KT are established under the bivariate contaminated normal models. The bias, mean square error (MSE) and asymptotic relative efficiency (ARE) of the three estimators based on SR and KT, relative to Pearson's product moment correlation coefficient (PPMCC), are investigated in both the normal and contaminated normal models. Theoretical and simulation results suggest that, contrary to the opinion in some literature that SR and KT are equivalent, the behaviors of SR and KT are strikingly different in terms of bias, variance, mean square error, and asymptotic relative efficiency. The new findings revealed in this work provide not only deeper insights into the two most widely used rank-based correlation coefficients, but also guidance for choosing which one to use in circumstances where the PPMCC fails to apply.
Submitted 9 November, 2010;
originally announced November 2010.
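For readers unfamiliar with the two rank-based coefficients compared in the abstract above, the following is a minimal sketch using their standard textbook definitions. It assumes samples without ties; the function names `ranks`, `spearman_rho`, and `kendall_tau` are illustrative and are not taken from the paper.

```python
from itertools import combinations


def ranks(values):
    """Rank each value from 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (no ties)."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall's tau as (concordant - discordant) / total pairs (no ties)."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += 1 if s > 0 else -1   # ties are assumed absent
    return score / (n * (n - 1) / 2)


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]
    print(spearman_rho(x, y), kendall_tau(x, y))
```

Even on this small sample the two coefficients take different values, which is consistent with the abstract's point that SR and KT should not be treated as interchangeable.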