Search | arXiv e-print repository

Theory of wakefield in a transversely inhomogeneous plasma waveguide

Authors: K. V. Galaydych, P. I. Markov, G. V. Sotnikov

Abstract: Theoretical studies have been made into the relativistic drive bunch generation of a wakefield in a cylindrical waveguide filled with a transversely inhomogeneous plasma. According to the model used, the transversely inhomogeneous plasma is considered as a combination of tubular plasma and the plasma background of different density. Analytical expressions have been derived for the excited radial and axial electric field components, and for the azimuthal magnetic field component. The dispersion of the plasma waveguide under study, as well as the topography of the electromagnetic field components of the TM-eigenwaves, resonant with the bunch, have been investigated. Longitudinal and transverse amplitude distribution structures of the axial and radial wakefields have been determined. Spectrum analysis of the longitudinal and transverse wakefields has been performed with the result that their frequency content has been determined.

Submitted 19 June, 2025; originally announced June 2025.

arXiv:2505.14371

Layer-wise Quantization for Quantized Optimistic Dual Averaging

Authors: Anh Duc Nguyen, Ilia Markov, Frank Zhengqing Wu, Ali Ramezani-Kebrya, Kimon Antonakopoulos, Dan Alistarh, Volkan Cevher

Abstract: Modern deep neural networks exhibit heterogeneity across numerous layers of various types such as residuals, multi-head attention, etc., due to varying structures (dimensions, activation functions, etc.), distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds, adapting to the heterogeneities over the course of training. We then apply a new layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates, which achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150\%$ speedup over the baselines in end-to-end training time for training Wasserstein GAN on $12+$ GPUs.

Submitted 20 May, 2025; originally announced May 2025.

Comments: Accepted at the International Conference on Machine Learning (ICML 2025)

arXiv:2411.10406

How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits

Authors: Masoud Mohseni, Artur Scherer, K. Grace Johnson, Oded Wertheim, Matthew Otten, Navid Anjum Aadit, Yuri Alexeev, Kirk M. Bresniker, Kerem Y. Camsari, Barbara Chapman, Soumitra Chatterjee, Gebremedhin A. Dagnew, Aniello Esposito, Farah Fahim, Marco Fiorentino, Archit Gajjar, Abdullah Khalid, Xiangzhou Kong, Bohdan Kulchytskyy, Elica Kyoseva, Ruoyu Li, P. Aaron Lott, Igor L. Markov, Robert F. McDermott, Giacomo Pedretti, et al. (16 additional authors not shown)

Abstract: In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits and proof-of-principle error-correction on a single logical qubit. Nevertheless, despite significant progress and excitement, the path toward a full-stack scalable technology is largely unknown. There are significant outstanding quantum hardware, fabrication, software architecture, and algorithmic challenges that are either unresolved or overlooked. These issues could seriously undermine the arrival of utility-scale quantum computers for the foreseeable future. Here, we provide a comprehensive review of these scaling challenges. We show how the road to scaling could be paved by adopting existing semiconductor technology to build much higher-quality qubits, employing system engineering approaches, and performing distributed quantum computation within heterogeneous high-performance computing infrastructures. These opportunities for research and development could unlock certain promising applications, in particular, efficient quantum simulation/learning of quantum data generated by natural or engineered quantum systems.
To estimate the true cost of such promises, we provide a detailed resource and sensitivity analysis for classically hard quantum chemistry calculations on surface-code error-corrected quantum computers given current, target, and desired hardware specifications based on superconducting qubits, accounting for a realistic distribution of errors. Furthermore, we argue that, to tackle industry-scale classical optimization and machine learning problems in a cost-effective manner, heterogeneous quantum-probabilistic computing with custom-designed accelerators should be considered as a complementary path toward scalability.

Submitted 31 January, 2025; v1 submitted 15 November, 2024; originally announced November 2024.

Comments: 76 pages, 46 figures. General revision, added figures, added references, added appendices

arXiv:2410.24038

Acceleration and Focusing Electron/Positron Bunches in Plasma-Dielectric Wakefield Accelerator

Authors: Gennadiy V. Sotnikov, Kostyantyn V. Galaydych, Jay L. Hirshfield, Peter I. Markov, Ivan M. Onishchenko

Abstract: To mitigate the BBU instability and improve characteristics of accelerated bunches in Dielectric Wakefield Accelerator one can be used the isotropic plasma filling of the transport channel. Here we present the results of analytical and numerical studies of the dynamics of accelerated electron/positron and drive electron bunches under wake acceleration in a plasma DWA (PDWA) with a vacuum channel. The wake field is excited by an electron bunch in a quartz dielectric tube inserted into a cylindrical metal waveguide. The inner region of the dielectric tube is filled with plasma with a vacuum channel along the waveguide axis. At the numerical simulations the energy and spatial characteristics, efficiency, emittance, and energy spread for accelerated positron and electron bunches is studied for different radii of the vacuum channel. The transverse instability of the drive bunch in PDWA is studied analytically and numerically. The analytical studies have discovered the presence of one surface and one bulk eigenwaves, which are absent in corresponding dielectric-loaded waveguide without plasma filling. The main contribution to amplitude of transverse wakefield, responsible for stabilization of transverse motion of bunches, brings the bulk plasma eigenwave. The comparative analysis of the data resulting from analytical studies and the ones obtained by numerical simulation has demonstrated qualitative agreement between the results.

Submitted 31 October, 2024; originally announced October 2024.

Comments: 33 pages, 13 figures, AAC2024 Workshop, will be submitted to Nuclear Instruments and Methods in Physics Research

arXiv:2409.09659

Leveraging Open-Source Large Language Models for Native Language Identification

Authors: Yee Man Ng, Ilia Markov

Abstract: Native Language Identification (NLI) - the task of identifying the native language (L1) of a person based on their writing in the second language (L2) - has applications in forensics, marketing, and second language acquisition. Historically, conventional machine learning approaches that heavily rely on extensive feature engineering have outperformed transformer-based language models on this task. Recently, closed-source generative large language models (LLMs), e.g., GPT-4, have demonstrated remarkable performance on NLI in a zero-shot setting, including promising results in open-set classification. However, closed-source LLMs have many disadvantages, such as high costs and undisclosed nature of training data. This study explores the potential of using open-source LLMs for NLI. Our results indicate that open-source LLMs do not reach the accuracy levels of closed-source LLMs when used out-of-the-box. However, when fine-tuned on labeled training data, open-source LLMs can achieve performance comparable to that of commercial LLMs.

Submitted 19 January, 2025; v1 submitted 15 September, 2024; originally announced September 2024.

arXiv:2407.10994

Panza: Design and Analysis of a Fully-Local Personalized Text Writing Assistant

Authors: Armand Nicolicioiu, Eugenia Iofinova, Andrej Jovanovic, Eldar Kurtic, Mahdi Nikdan, Andrei Panferov, Ilia Markov, Nir Shavit, Dan Alistarh

Abstract: The availability of powerful open-source large language models (LLMs) opens exciting use-cases, such as using personal data to fine-tune these models to imitate a user's unique writing style. Two key requirements for such assistants are personalization - in the sense that the assistant should recognizably reflect the user's own writing style - and privacy - users may justifiably be wary of uploading extremely personal data, such as their email archive, to a third-party service. In this paper, we present a new design and evaluation for such an automated assistant, for the specific use case of email generation, which we call Panza. Panza's personalization features are based on a combination of fine-tuning using a variant of the Reverse Instructions technique together with Retrieval-Augmented Generation (RAG). We demonstrate that this combination allows us to fine-tune an LLM to reflect a user's writing style using limited data, while executing on extremely limited resources, e.g. on a free Google Colab instance. Our key methodological contribution is the first detailed study of evaluation metrics for this personalized writing task, and of how different choices of system components--the use of RAG and of different fine-tuning approaches-impact the system's performance. Additionally, we demonstrate that very little data - under 100 email samples - are sufficient to create models that convincingly imitate humans.
This finding showcases a previously-unknown attack vector in language models - that access to a small number of writing samples can allow a bad actor to cheaply create generative models that imitate a target's writing style. We are releasing the full Panza code as well as three new email datasets licensed for research use at https://github.com/IST-DASLab/PanzaMail.

Submitted 10 February, 2025; v1 submitted 24 June, 2024; originally announced July 2024.

Comments: Panza is available at https://github.com/IST-DASLab/PanzaMail

arXiv:2405.15756

Wasserstein Distances, Neuronal Entanglement, and Sparsity

Authors: Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir Shavit

Abstract: Disentangling polysemantic neurons is at the core of many current approaches to interpretability of large language models. Here we attempt to study how disentanglement can be used to understand performance, particularly under weight sparsity, a leading post-training optimization technique. We suggest a novel measure for estimating neuronal entanglement: the Wasserstein distance of a neuron's output distribution to a Gaussian. Moreover, we show the existence of a small number of highly entangled "Wasserstein Neurons" in each linear layer of an LLM, characterized by their highly non-Gaussian output distributions, their role in mapping similar inputs to dissimilar outputs, and their significant impact on model accuracy. To study these phenomena, we propose a new experimental framework for disentangling polysemantic neurons. Our framework separates each layer's inputs to create a mixture of experts where each neuron's output is computed by a mixture of neurons of lower Wasserstein distance, each better at maintaining accuracy when sparsified without retraining. We provide strong evidence that this is because the mixture of sparse experts is effectively disentangling the input-output relationship of individual neurons, in particular the difficult Wasserstein neurons.

Submitted 26 February, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 10 pages, 9 figures

arXiv:2405.13754

Grounding Toxicity in Real-World Events across Languages

Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Abstract: Social media conversations frequently suffer from toxicity, creating significant issues for users, moderators, and entire communities. Events in the real world, like elections or conflicts, can initiate and escalate toxic behavior online. Our study investigates how real-world events influence the origin and spread of toxicity in online discussions across various languages and regions. We gathered Reddit data comprising 4.5 million comments from 31 thousand posts in six different languages (Dutch, English, German, Arabic, Turkish and Spanish). We target fifteen major social and political world events that occurred between 2020 and 2023. We observe significant variations in toxicity, negative sentiment, and emotion expressions across different events and language communities, showing that toxicity is a complex phenomenon in which many different factors interact and still need to be investigated. We will release the data for further research along with our code.

Submitted 22 May, 2024; originally announced May 2024.

Comments: Paper accepted at the 29th International Conference on Natural Language & Information Systems (NLDB 2024)

arXiv:2404.18865

Truth-value judgment in language models: belief directions are context sensitive

Authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Abstract: Recent work has demonstrated that the latent spaces of large language models (LLMs) contain directions predictive of the truth of sentences. Multiple methods recover such directions and build probes that are described as getting at a model's "knowledge" or "beliefs". We investigate this phenomenon, looking closely at the impact of context on the probes. Our experiments establish where in the LLM the probe's predictions can be described as being conditional on the preceding (related) sentences. Specifically, we quantify the responsiveness of the probes to the presence of (negated) supporting and contradicting sentences, and score the probes on their consistency. We also perform a causal intervention experiment, investigating whether moving the representation of a premise along these belief directions influences the position of the hypothesis along that same direction. We find that the probes we test are generally context sensitive, but that contexts which should not affect the truth often still impact the probe outputs. Our experiments show that the type of errors depend on the layer, the (type of) model, and the kind of data. Finally, our results suggest that belief directions are (one of the) causal mediators in the inference process that incorporates in-context information.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18810

Unknown Script: Impact of Script on Cross-Lingual Transfer

Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Abstract: Cross-lingual transfer has become an effective way of transferring knowledge between languages. In this paper, we explore an often overlooked aspect in this domain: the influence of the source language of a language model on language transfer performance. We consider a case where the target language and its script are not part of the pre-trained model. We conduct a series of experiments on monolingual and multilingual models that are pre-trained on different tokenization methods to determine factors that affect cross-lingual transfer to a new language with a unique script. Our findings reveal the importance of the tokenizer as a stronger factor than the shared script, language similarity, and model size.

Submitted 7 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Paper accepted to NAACL Student Research Workshop (SRW) 2024

arXiv:2404.18726

The Constant in HATE: Analyzing Toxicity in Reddit across Topics and Languages

Authors: Wondimagegnhue Tsegaye Tufa, Ilia Markov, Piek Vossen

Abstract: Toxic language remains an ongoing challenge on social media platforms, presenting significant issues for users and communities. This paper provides a cross-topic and cross-lingual analysis of toxicity in Reddit conversations. We collect 1.5 million comment threads from 481 communities in six languages: English, German, Spanish, Turkish, Arabic, and Dutch, covering 80 topics such as Culture, Politics, and News. We thoroughly analyze how toxicity spikes within different communities in relation to specific topics. We observe consistent patterns of increased toxicity across languages for certain topics, while also noting significant variations within specific language communities.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Accepted to TRAC 2024

arXiv:2311.17614

Bunch-excited wakefield in dielectric waveguide with hollow plasma channel

Authors: K. V. Galaydych, P. I. Markov, G. V. Sotnikov

Abstract: Wakefield excitation by a single relativistic electron bunch in a plasma-dielectric accelerating structure has been studied both analytically and numerically. The structure represents a dielectric-loaded cylindrical metal waveguide, which has partially plasma-filled channel (the hollow plasma channel) to transport charged particles. Assuming the linear regime of excitation, analytical expressions have been derived for the longitudinal and radial wakefields generated by a finite-size electron bunch. Axial profiles of wakefield component amplitudes have been studied, and their mode and spectrum analyses have been performed. Furthermore, the electron bunch-driven wakefield excitation has been PIC-simulated numerically for the quasi-linear regime. The comparative analysis of the data resulting from analytical studies and the ones obtained by numerical simulation has demonstrated qualitative agreement between the results.

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.05787

Towards stable real-world equation discovery with assessing differentiating quality influence

Authors: Mikhail Masliaev, Ilya Markov, Alexander Hvatov

Abstract: This paper explores the critical role of differentiation approaches for data-driven differential equation discovery. Accurate derivatives of the input data are essential for reliable algorithmic operation, particularly in real-world scenarios where measurement quality is inevitably compromised. We propose alternatives to the commonly used finite differences-based method, notorious for its instability in the presence of noise, which can exacerbate random errors in the data. Our analysis covers four distinct methods: Savitzky-Golay filtering, spectral differentiation, smoothing based on artificial neural networks, and the regularization of derivative variation. We evaluate these methods in terms of applicability to problems, similar to the real ones, and their ability to ensure the convergence of equation discovery algorithms, providing valuable insights for robust modeling of real-world processes.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.14657

Reasoning about Ambiguous Definite Descriptions

Authors: Stefan F. Schouten, Peter Bloem, Ilia Markov, Piek Vossen

Abstract: Natural language reasoning plays an increasingly important role in improving language models' ability to solve complex language understanding tasks. An interesting use case for reasoning is the resolution of context-dependent ambiguity. But no resources exist to evaluate how well Large Language Models can use explicit reasoning to resolve ambiguity in language. We propose to use ambiguous definite descriptions for this purpose and create and publish the first benchmark dataset consisting of such phrases. Our method includes all information required to resolve the ambiguity in the prompt, which means a model does not require anything but reasoning to do well. We find this to be a challenging task for recent LLMs. Code and data available at: https://github.com/sfschouten/exploiting-ambiguity

Submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings

arXiv:2310.09259

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

Authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh

Abstract: Large Language Models (LLMs) from the GPT family have become extremely popular, leading to a race towards reducing their inference costs to allow for efficient local computation. Yet, the vast majority of existing work focuses on weight-only quantization, which can reduce runtime costs in the memory-bound one-token-at-a-time generative setting, but does not address them in compute-bound scenarios, such as batched inference or prompt processing. In this paper, we address the general quantization problem, where both weights and activations should be quantized. We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. The key feature of our scheme is that it is designed with computational efficiency in mind: we provide GPU kernels matching the QUIK format with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.4x relative to FP16 execution. We provide detailed studies for models from the OPT, LLaMA-2 and Falcon families, as well as a first instance of accurate inference using quantization plus 2:4 sparsity. Code is available at: https://github.com/IST-DASLab/QUIK.

Submitted 2 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 16 pages

arXiv:2306.09642

Cross-Domain Toxic Spans Detection

Authors: Stefan F. Schouten, Baran Barbarestani, Wondimagegnhue Tufa, Piek Vossen, Ilia Markov

Abstract: Given the dynamic nature of toxic language use, automated methods for detecting toxic spans are likely to encounter distributional shift. To explore this phenomenon, we evaluate three approaches for detecting toxic spans under cross-domain conditions: lexicon-based, rationale extraction, and fine-tuned language models. Our findings indicate that a simple method using off-the-shelf lexicons performs best in the cross-domain setup. The cross-domain error analysis suggests that (1) rationale extraction methods are prone to false negatives, while (2) language models, despite performing best for the in-domain case, recall fewer explicitly toxic words than lexicons and are prone to certain types of false positives. Our code is publicly available at: https://github.com/sfschouten/toxic-cross-domain.

Submitted 16 June, 2023; originally announced June 2023.

Comments: NLDB 2023

arXiv:2306.09633

The False Dawn: Reevaluating Google's Reinforcement Learning for Chip Macro Placement

Authors: Igor L. Markov

Abstract: Reinforcement learning (RL) for physical design of silicon chips in a Google 2021 Nature paper stirred controversy due to poorly documented claims that raised eyebrows and drew critical media coverage. The paper withheld critical methodology steps and most inputs needed to reproduce results. Our meta-analysis shows how two separate evaluations filled in the gaps and demonstrated that Google RL lags behind (i) human designers, (ii) a well-known algorithm (Simulated Annealing), and (iii) generally-available commercial software, while being slower; and in a 2023 open research contest, RL methods weren't in top 5. Crosschecked data indicate that the integrity of the Nature paper is substantially undermined owing to errors in conduct, analysis and reporting. Before publishing, Google rebuffed internal allegations of fraud, which still stand. We note policy implications and conclusions for chip design.

Submitted 28 September, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: 14 pages, 1 figure, 4 tables, 83 references

arXiv:2303.16531

RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition

Authors: Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov

Abstract: Information surrounds people in modern life. Text is a very efficient type of information that people use for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For the competitive performance, training set must contain many samples that replicate the real-world cases. While there are many high-quality datasets for English text recognition; there are no available datasets for Russian language. In this paper, we present a large-scale human-labeled dataset for Russian text recognition in-the-wild. We also publish a synthetic dataset and code to reproduce the generation process

Submitted 29 March, 2023; originally announced March 2023.

Comments: 5 pages, 6 figures, 2 tables

arXiv:2303.11580 [pdf, other]

Efficient Multi-stage Inference on Tabular Data

Authors: Daniel S Johnson, Igor L Markov

Abstract: Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via Remote Procedure Call (RPC) APIs. This approach clarifies the overall software architecture and simplifies product code by abstracting away ML internals. However, the separation adds network latency and entails additional CPU overhead. Hence, we simplify inference algorithms and embed them into the product code to reduce network communication. For public datasets and a high-performance real-time platform that deals with tabular data, we show that over half of the inputs are often amenable to such optimization, while the remainder can be handled by the original model. By applying our optimization with AutoML to both training and inference, we reduce inference latency by 1.3x, CPU resources by 30%, and network communication between application front-end and ML back-end by about 50% for a commercial end-to-end ML platform that serves millions of real-time decisions per second.

Submitted 21 July, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.03460 [pdf, other]

Ever more optimized simulations of fermionic systems on a quantum computer

Authors: Qingfeng Wang, Ze-Pei Cian, Ming Li, Igor L. Markov, Yunseong Nam

Abstract: Despite using a novel model of computation, quantum computers break down programs into elementary gates. Among such gates, entangling gates are the most expensive. In the context of fermionic simulations, we develop a suite of compilation and optimization techniques that massively reduce the entangling-gate counts. We exploit the well-studied non-quantum optimization algorithms to achieve up to 24% savings over the state of the art for several small-molecule simulations, with no loss of accuracy or hidden costs. Our methodologies straightforwardly generalize to wider classes of near-term simulations of the ground state of a fermionic system or real-time simulations probing dynamical properties of a fermionic system.

Submitted 6 March, 2023; originally announced March 2023.

arXiv:2302.14139 [pdf, other]

Scalable End-to-End ML Platforms: from AutoML to Self-serve

Authors: Igor L. Markov, Pavlos A. Apostolopoulos, Mia R. Garrard, Tanya Qie, Yin Huang, Tanvi Gupta, Anika Li, Cesar Cardoso, George Han, Ryan Maghsoudian, Norm Zhou

Abstract: ML platforms help enable intelligent data-driven applications and maintain them with limited engineering effort. Upon sufficiently broad adoption, such platforms reach economies of scale that bring greater component reuse while improving efficiency of system development and maintenance. For an end-to-end ML platform with broad adoption, scaling relies on pervasive ML automation and system integration to reach the quality we term self-serve that we define with ten requirements and six optional capabilities. With this in mind, we identify long-term goals for platform development, discuss related tradeoffs and future work. Our reasoning is illustrated on two commercially-deployed end-to-end ML platforms that host hundreds of real-time use cases -- one general-purpose and one specialized.

Submitted 3 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 1 figure, 2 tables

arXiv:2302.12360 [pdf, other]

Practical Knowledge Distillation: Using DNNs to Beat DNNs

Authors: Chung-Wei Lee, Pavlos Athanasios Apostolopulos, Igor L. Markov

Abstract: For tabular data sets, we explore data and model distillation, as well as data denoising. These techniques improve both gradient-boosting models and a specialized DNN architecture. While gradient boosting is known to outperform DNNs on tabular data, we close the gap for datasets with 100K+ rows and give DNNs an advantage on small data sets. We extend these results with input-data distillation and optimized ensembling to help DNN performance match or exceed that of gradient boosting. As a theoretical justification of our practical method, we prove its equivalence to classical cross-entropy knowledge distillation. We also qualitatively explain the superiority of DNN ensembles over XGBoost on small data sets. For an industry end-to-end real-time ML platform with 4M production inferences per second, we develop a model-training workflow based on data sampling that distills ensembles of models into a single gradient-boosting model favored for high-performance real-time inference, without performance loss. Empirical evaluation shows that the proposed combination of methods consistently improves model accuracy over prior best models across several production applications deployed worldwide.

Submitted 1 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 11 pages, 1 figure, 17 tables

arXiv:2302.02390 [pdf, other]

Quantized Distributed Training of Large Models with Convergence Guarantees

Authors: Ilia Markov, Adrian Vladu, Qi Guo, Dan Alistarh

Abstract: Communication-reduction techniques are a popular way to improve scalability in data-parallel training of deep neural networks (DNNs). The recent emergence of large language models such as GPT has created the need for new approaches to exploit data-parallelism. Among these, fully-sharded data parallel (FSDP) training is highly popular, yet it still encounters scalability bottlenecks. One reason is that applying compression techniques to FSDP is challenging: as the vast majority of the communication involves the model's weights, direct compression alters convergence and leads to accuracy loss. We present QSDP, a variant of FSDP which supports both gradient and weight quantization with theoretical guarantees, is simple to implement and has essentially no overheads. To derive QSDP we prove that a natural modification of SGD achieves convergence even when we only maintain quantized weights, and thus the domain over which we train consists of quantized points and is, therefore, highly non-convex. We validate this approach by training GPT-family models with up to 1.3 billion parameters on a multi-node cluster. Experiments show that QSDP preserves model accuracy, while completely removing the communication bottlenecks of FSDP, providing end-to-end speedups of up to 2.2x.

Submitted 5 February, 2023; originally announced February 2023.

arXiv:2301.07233 [pdf, other]

Enhancing quantum computer performance via symmetrization

Authors: Andrii Maksymov, Jason Nguyen, Yunseong Nam, Igor Markov

Abstract: Large quantum computers promise to solve some critical problems not solvable otherwise. However, modern quantum technologies suffer various imperfections such as control errors and qubit decoherence, inhibiting their potential utility. The overheads of quantum error correction are too great for near-term quantum computers, whereas error-mitigation strategies that address specific device imperfections may lose relevance as devices improve. To enhance the performance of quantum computers with high-quality qubits, we introduce a strategy based on symmetrization and nonlinear aggregation. On a commercial trapped-ion quantum computer, it improves performance of multiple practical algorithms by 100x with no qubit or gate overhead.

Submitted 17 January, 2023; originally announced January 2023.

arXiv:2210.17357 [pdf, other]

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

Authors: Mohammadreza Alimohammadi, Ilia Markov, Elias Frantar, Dan Alistarh

Abstract: Data-parallel distributed training of deep neural networks (DNN) has gained very widespread adoption, but can still experience communication bottlenecks. To address this issue, entire families of compression mechanisms have been developed, including quantization, sparsification, and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress, almost all known compression schemes apply compression uniformly across DNN layers, although layers are heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide a general framework for adapting the degree of compression across the model's layers dynamically during training, improving the overall compression, while leading to substantial speedups, without sacrificing accuracy. Our framework, called L-GreCo, is based on an adaptive algorithm, which automatically picks the optimal compression parameters for model layers guaranteeing the best compression ratio while satisfying an error constraint. Extensive experiments over image classification and language modeling tasks show that L-GreCo is effective across all existing families of compression methods, and achieves up to 2.5x training speedup and up to 5x compression improvement over efficient implementations of existing approaches, while recovering full accuracy. Moreover, L-GreCo is complementary to existing adaptive algorithms, improving their compression ratio by 50% and practical throughput by 66%.

Submitted 9 June, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.12526 [pdf, other]

Federated Calibration and Evaluation of Binary Classifiers

Authors: Graham Cormode, Igor Markov

Abstract: We address two major obstacles to practical use of supervised classifiers on distributed private data. Whether a classifier was trained by a federation of cooperating clients or trained centrally out of distribution, (1) the output scores must be calibrated, and (2) performance metrics must be evaluated -- all without assembling labels in one place. In particular, we show how to perform calibration and compute precision, recall, accuracy and ROC-AUC in the federated setting under three privacy models (i) secure aggregation, (ii) distributed differential privacy, (iii) local differential privacy. Our theorems and experiments clarify tradeoffs between privacy, accuracy, and data efficiency. They also help decide whether a given application has sufficient data to support federated calibration and evaluation.

Submitted 22 October, 2022; originally announced October 2022.

Comments: 24 pages

arXiv:2202.09483 [pdf, other]

Data-Driven Mitigation of Adversarial Text Perturbation

Authors: Rasika Bhalerao, Mohammad Al-Rubaie, Anand Bhaskar, Igor Markov

Abstract: Social networks have become an indispensable part of our lives, with billions of people producing ever-increasing amounts of text. At such scales, content policies and their enforcement become paramount. To automate moderation, questionable content is detected by Natural Language Processing (NLP) classifiers. However, high-performance classifiers are hampered by misspellings and adversarial text perturbations. In this paper, we classify intentional and unintentional adversarial text perturbation into ten types and propose a deobfuscation pipeline to make NLP models robust to such perturbations. We propose Continuous Word2Vec (CW2V), our data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words. We show that CW2V embeddings are generally more robust to text perturbations than embeddings based on character ngrams. Our robust classification pipeline combines deobfuscation and classification, using proposed defense methods and word embeddings to classify whether Facebook posts are requesting engagement such as likes. Our pipeline results in engagement bait classification that goes from 0.70 to 0.67 AUC with adversarial text perturbation, while character ngram-based word embedding methods result in downstream classification that goes from 0.76 to 0.64.

Submitted 18 February, 2022; originally announced February 2022.

arXiv:2111.12795 [pdf, other]

Picasso: Model-free Feature Visualization

Authors: Binh Vu, Igor Markov

Abstract: Today, Machine Learning (ML) applications can have access to tens of thousands of features. With such feature sets, efficiently browsing and curating subsets of most relevant features is a challenge. In this paper, we present a novel approach to visualize up to several thousands of features in a single image. The image not only shows information on individual features, but also expresses feature interactions via the relative positioning of features.

Submitted 24 November, 2021; originally announced November 2021.

arXiv:2111.08617 [pdf, other]

doi 10.1145/3528535.3565248

CGX: Adaptive System Support for Communication-Efficient Deep Learning

Authors: Ilia Markov, Hamidreza Ramezanikebrya, Dan Alistarh

Abstract: The ability to scale out training workloads has been one of the key performance enablers of deep learning. The main scaling approach is data-parallel GPU-based training, which has been boosted by hardware and software support for highly efficient point-to-point communication, and in particular via hardware bandwidth overprovisioning. Overprovisioning comes at a cost: there is an order of magnitude price difference between "cloud-grade" servers with such support, relative to their popular "consumer-grade" counterparts, although single server-grade and consumer-grade GPUs can have similar computational envelopes. In this paper, we show that the costly hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for compressed communication in ML applications, for both multi-GPU single-node training, as well as larger-scale multi-node training. CGX is based on two technical advances: At the system level, it relies on a re-developed communication stack for ML frameworks, which provides flexible, highly-efficient support for compressed communication. At the application level, it provides seamless, parameter-free integration with popular frameworks, so that end-users do not have to modify training recipes, nor significant training code. This is complemented by a layer-wise adaptive compression technique which dynamically balances compression gains with accuracy preservation. CGX integrates with popular ML frameworks, providing up to 3X speedups for multi-GPU nodes based on commodity hardware, and order-of-magnitude improvements in the multi-node setting, with negligible impact on accuracy.

Submitted 29 December, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Journal ref: Middleware 2022

arXiv:2110.07554 [pdf, other]

Looper: An end-to-end ML platform for product decisions

Authors: Igor L. Markov, Hanson Wang, Nitya Kasturi, Shaun Singh, Sze Wai Yuen, Mia Garrard, Sarah Tran, Yin Huang, Zehui Wang, Igor Glotov, Tanvi Gupta, Boshuang Huang, Peng Chen, Xiaowen Xie, Michael Belkin, Sal Uryasev, Sam Howie, Eytan Bakshy, Norm Zhou

Abstract: Modern software systems and products increasingly rely on machine learning models to make data-driven decisions based on interactions with users, infrastructure and other systems. For broader adoption, this practice must (i) accommodate product engineers without ML backgrounds, (ii) support finegrain product-metric evaluation and (iii) optimize for product goals. To address shortcomings of prior platforms, we introduce general principles for and the architecture of an ML platform, Looper, with simple APIs for decision-making and feedback collection. Looper covers the end-to-end ML lifecycle from collecting training data and model training to deployment and inference, and extends support to personalization, causal evaluation with heterogenous treatment effects, and Bayesian tuning for product goals. During the 2021 production deployment Looper simultaneously hosted 440-1,000 ML models that made 4-6 million real-time decisions per second. We sum up experiences of platform adopters and describe their learning curve.

Submitted 21 June, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: 11 pages + references, 7 figures; to appear in KDD 2022

arXiv:2109.11577 [pdf, other]

Text Ranking and Classification using Data Compression

Authors: Nitya Kasturi, Igor L. Markov

Abstract: A well-known but rarely used approach to text categorization uses conditional entropy estimates computed using data compression tools. Text affinity scores derived from compressed sizes can be used for classification and ranking tasks, but their success depends on the compression tools used. We use the Zstandard compressor and strengthen these ideas in several ways, calling the resulting language-agnostic technique Zest. In applications, this approach simplifies configuration, avoiding careful feature extraction and large ML models. Our ablation studies confirm the value of individual enhancements we introduce. We show that Zest complements and can compete with language-specific multidimensional content embeddings in production, but cannot outperform other counting methods on public datasets.

Submitted 7 December, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

Journal ref: ICBINB workshop at NeurIPS 2021
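
The compression-based affinity idea described in the abstract above can be illustrated with a minimal sketch. It is not the authors' Zest implementation: it swaps in Python's standard-library zlib for Zstandard, and the function names (compressed_size, extra_bytes, classify) and the toy corpora are hypothetical. The score is the number of extra compressed bytes a text costs when appended to a class corpus; a smaller cost suggests a closer match.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (zlib stands in for Zstandard here)."""
    return len(zlib.compress(data, 9))


def extra_bytes(corpus: str, text: str) -> int:
    """Extra compressed bytes needed to encode `text` after `corpus`.

    This is a rough stand-in for a conditional entropy estimate.
    """
    base = compressed_size(corpus.encode("utf-8"))
    joint = compressed_size((corpus + " " + text).encode("utf-8"))
    return joint - base


def classify(text: str, corpora: dict) -> str:
    """Return the label whose corpus makes `text` cheapest to compress."""
    return min(corpora, key=lambda label: extra_bytes(corpora[label], text))


# Hypothetical toy corpora, for illustration only.
CORPORA = {
    "sports": (
        "the team scored a late goal and won the football match in extra time. "
        "the striker scored twice and the goalkeeper saved a penalty kick."
    ),
    "cooking": (
        "simmer the tomato sauce with garlic and olive oil, then add fresh basil. "
        "whisk the eggs and fold in the flour before baking the cake."
    ),
}

if __name__ == "__main__":
    print(classify("the goalkeeper saved a penalty kick", CORPORA))
```

On corpora this small the scores are noisy; the sketch only shows the mechanics of deriving an affinity score from compressed sizes.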

arXiv:2108.13815 [pdf, other]

doi 10.1088/1748-0221/17/11/P11013

Acceleration and focusing of positron bunch in a dielectric waveguide accelerator with homogeneous plasma in transport channel

Authors: P. I. Markov, R. R. Kniaziev, G. V. Sotnikov

Abstract: The paper presents the results of numerical PIC simulation of positron bunch focusing during acceleration in a plasma dielectric wakefield accelerator. The wakefield was excited by a drive electron bunch in a quartz dielectric tube embedded in a cylindrical metal waveguide. The internal area of the dielectric tube was filled with radially homogeneous plasma having, in the general case, a vacuum channel along the waveguide axis. Results of the numerical PIC simulation have shown that simultaneous acceleration and focusing of a test positron bunch in the wakefield is possible. The dependence of transport and acceleration of the positron bunch on the size of the vacuum channel and the waveguide length is studied.

Submitted 11 December, 2021; v1 submitted 31 August, 2021; originally announced August 2021.

Comments: 10 pages, 9 figures

arXiv:2108.08538 [pdf, other]

doi 10.1145/3459637.3482275

Mixture-Based Correction for Position and Trust Bias in Counterfactual Learning to Rank

Authors: Ali Vardasbi, Maarten de Rijke, Ilya Markov

Abstract: In counterfactual learning to rank (CLTR) user interactions are used as a source of supervision. Since user interactions come with bias, an important focus of research in this field lies in developing methods to correct for the bias of interactions. Inverse propensity scoring (IPS) is a popular method suitable for correcting position bias. Affine correction (AC) is a generalization of IPS that corrects for position bias and trust bias. IPS and AC provably remove bias, conditioned on an accurate estimation of the bias parameters. Estimating the bias parameters, in turn, requires an accurate estimation of the relevance probabilities. This cyclic dependency introduces practical limitations in terms of sensitivity, convergence and efficiency. We propose a new correction method for position and trust bias in CLTR in which, unlike the existing methods, the correction does not rely on relevance estimation. Our proposed method, mixture-based correction (MBC), is based on the assumption that the distribution of the CTRs over the items being ranked is a mixture of two distributions: the distribution of CTRs for relevant items and the distribution of CTRs for non-relevant items. We prove that our method is unbiased. The validity of our proof is not conditioned on accurate bias parameter estimation. Our experiments show that MBC, when used in different bias settings and accompanied by different LTR algorithms, outperforms AC, the state-of-the-art method for correcting position and trust bias, in some settings, while performing on par in other settings. Furthermore, MBC is orders of magnitude more efficient than AC in terms of the training time.

Submitted 19 August, 2021; originally announced August 2021.

Comments: CIKM 2021
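
For readers unfamiliar with inverse propensity scoring (IPS), which the abstract above takes as its baseline, the following is a minimal sketch of the basic estimator, not of the paper's mixture-based correction. It assumes the examination propensity of each rank is already known; the function name ips_relevance, the log format, and the numbers are hypothetical.

```python
from collections import defaultdict


def ips_relevance(log, propensity):
    """Estimate per-document relevance from click logs with IPS.

    log: iterable of (doc_id, position, clicked) tuples.
    propensity: dict mapping position -> probability that the position is examined
                (assumed known here; estimating it is the hard part in practice).
    Each click is up-weighted by 1 / propensity of the position where it occurred,
    so clicks at rarely examined positions count for more.
    """
    weighted_clicks = defaultdict(float)
    impressions = defaultdict(int)
    for doc_id, position, clicked in log:
        impressions[doc_id] += 1
        if clicked:
            weighted_clicks[doc_id] += 1.0 / propensity[position]
    return {doc: weighted_clicks[doc] / impressions[doc] for doc in impressions}


# Hypothetical toy data: position 0 is always examined, position 1 half the time.
PROPENSITY = {0: 1.0, 1: 0.5}
LOG = [
    ("A", 0, True),
    ("A", 0, False),
    ("A", 1, True),
    ("A", 1, False),
    ("B", 1, False),
]

if __name__ == "__main__":
    print(ips_relevance(LOG, PROPENSITY))
```

Document A receives one click at each position, so its estimate is (1/1.0 + 1/0.5) / 4 = 0.75, compared with a raw click-through rate of 0.5.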

arXiv:2108.03708 [pdf, other]

Detecting Qubit-coupling Faults in Ion-trap Quantum Computers

Authors: Andrii O. Maksymov, Jason Nguyen, Vandiver Chaplin, Yunseong Nam, Igor L. Markov

Abstract: Ion-trap quantum computers offer a large number of possible qubit couplings, each of which requires individual calibration and can be misconfigured. To enhance the duty cycle of an ion trap, we develop a strategy that diagnoses individual miscalibrated couplings using only log-many tests. This strategy is validated on a commercial ion-trap quantum computer, where we illustrate the process of debugging faulty quantum gates. Our methodology provides a scalable pathway towards fault detection on larger-scale ion-trap quantum computers, confirmed by simulations up to 32 qubits.

Submitted 12 December, 2021; v1 submitted 8 August, 2021; originally announced August 2021.

Journal ref: HPCA 2022

arXiv:2108.01521 [pdf, other]

Bit-efficient Numerical Aggregation and Stronger Privacy for Trust in Federated Analytics

Authors: Graham Cormode, Igor L. Markov

Abstract: Private data generated by edge devices -- from smart phones to automotive electronics -- are highly informative when aggregated but can be damaging when mishandled. A variety of solutions are being explored but have not yet won the public's trust and full backing of mobile platforms. In this work, we propose numerical aggregation protocols that empirically improve upon prior art, while providing comparable local differential privacy guarantees. Sharing a single private bit per value supports privacy metering that enables privacy controls and guarantees that are not covered by differential privacy. We put emphasis on the ease of implementation, compatibility with existing methods, and compelling empirical performance.

Submitted 3 August, 2021; originally announced August 2021.

Comments: 15 pages

arXiv:2104.13818

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees, however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.

Submitted 1 May, 2021; v1 submitted 28 April, 2021; originally announced April 2021.

Comments: This entry is redundant and was created in error. See arXiv:1908.06077 for the latest version
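
As background for the gradient quantization discussed in the abstract above, here is a minimal sketch of uniform stochastic quantization. It is not the NUQSGD scheme (which uses nonuniform levels) and not the authors' code; scaling by the largest magnitude, the function name stochastic_quantize, and the parameter names are assumptions made for illustration. Each value is rounded up or down to a neighbouring grid level at random, with probabilities chosen so that the result equals the input in expectation.

```python
import math
import random


def stochastic_quantize(values, levels, rng):
    """Quantize `values` onto a uniform grid with `levels` steps per sign.

    Values are scaled by the largest magnitude (an assumption of this sketch),
    then rounded up with probability equal to the fractional part, which makes
    the quantized value an unbiased estimate of the original.
    """
    scale = max(abs(v) for v in values)
    if scale == 0.0:
        return [0.0 for _ in values]
    quantized = []
    for v in values:
        position = abs(v) / scale * levels      # lies in [0, levels]
        lower = math.floor(position)
        round_up = rng.random() < (position - lower)
        level = lower + (1 if round_up else 0)
        quantized.append(math.copysign(level * scale / levels, v))
    return quantized


if __name__ == "__main__":
    rng = random.Random(0)
    gradient = [1.0, -0.3, 0.0]
    print(stochastic_quantize(gradient, levels=4, rng=rng))
```

With 4 levels and a scale of 1.0, the value -0.3 is sent to either -0.25 or -0.5, and averaging many independent quantizations recovers a value close to -0.3.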

arXiv:2102.09507 [pdf, ps, other]

Regular Expressions for Fast-response COVID-19 Text Classification

Authors: Igor L. Markov, Jacqueline Liu, Adam Vagner

Abstract: Text classifiers are at the core of many NLP applications and use a variety of algorithmic approaches and software. This paper introduces infrastructure and methodologies for text classifiers based on large-scale regular expressions. In particular, we describe how Facebook determines if a given piece of text - anything from a hashtag to a post - belongs to a narrow topic such as COVID-19. To fully define a topic and evaluate classifier performance we employ human-guided iterations of keyword discovery, but do not require labeled data. For COVID-19, we build two sets of regular expressions: (1) for 66 languages, with 99% precision and recall >50%, (2) for the 11 most common languages, with precision >90% and recall >90%. Regular expressions enable low-latency queries from multiple platforms. Response to challenges like COVID-19 is fast and so are revisions. Comparisons to a DNN classifier show explainable results, higher precision and recall, and less overfitting. Our learnings can be applied to other narrow-topic classifiers.

Submitted 21 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: 10 pages, 7 tables

arXiv:2102.08465 [pdf, other]

Prioritizing Original News on Facebook

Authors: Xiuyan Ni, Shujian Bu, Igor L. Markov

Abstract: This work outlines how we prioritize original news, a critical indicator of news quality. By examining the landscape and life-cycle of news posts on our social media platform, we identify challenges of building and deploying an originality score. We pursue an approach based on normalized PageRank values and three-step clustering, and refresh the score on an hourly basis to capture the dynamics of online news. We describe a near real-time system architecture, evaluate our methodology, and deploy it to production. Our empirical results validate individual components and show that prioritizing original news increases user engagement with news and improves proprietary cumulative metrics.

Submitted 14 March, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 9 pages, 8 figures, 6 tables, 2 algorithm pseudocodes

Journal ref: CIKM 2021

arXiv:2102.05612 [pdf, other]

Personalization for Web-based Services using Offline Reinforcement Learning

Authors: Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Chad Zhou, Kittipat Virochsiri, Norm Zhou, Igor L. Markov

Abstract: Large-scale Web-based services present opportunities for improving UI policies based on observed user interactions. We address challenges of learning such policies through model-free offline Reinforcement Learning (RL) with off-policy training. Deployed in a production system for user authentication in a major social network, it significantly improves long-term objectives. We articulate practical challenges, compare several ML techniques, provide insights on training and evaluation of RL models, and discuss generalizations.

Submitted 10 February, 2021; originally announced February 2021.

Comments: 9 pages, 8 figures, 3 tables

Journal ref: 2nd Offline Reinforcement Learning Workshop at NeurIPS 2021

arXiv:2012.05615 [pdf, other]

doi 10.23919/DATE51398.2021.9474034

As Accurate as Needed, as Efficient as Possible: Approximations in DD-based Quantum Circuit Simulation

Authors: Stefan Hillmich, Richard Kueng, Igor L. Markov, Robert Wille

Abstract: Quantum computers promise to solve important problems faster than conventional computers. However, unleashing this power has been challenging. In particular, design automation runs into (1) the probabilistic nature of quantum computation and (2) exponential requirements for computational resources on non-quantum hardware. In quantum circuit simulation, Decision Diagrams (DDs) have previously shown to reduce the required memory in many important cases by exploiting redundancies in the quantum state. In this paper, we show that this reduction can be amplified by exploiting the probabilistic nature of quantum computers to achieve even more compact representations. Specifically, we propose two new DD-based simulation strategies that approximate the quantum states to attain more compact representations, while, at the same time, allowing the user to control the resulting degradation in accuracy. We also analytically prove the effect of multiple approximations on the attained accuracy and empirically show that the resulting simulation scheme enables speed-ups up to several orders of magnitudes.

Submitted 10 December, 2020; originally announced December 2020.

Comments: 6 pages, 2 figures, to be published at Design, Automation, and Test in Europe 2021

arXiv:2010.12460 [pdf, other]

Adaptive Gradient Quantization for Data-Parallel SGD

Authors: Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel Roy, Ali Ramezani-Kebrya

Abstract: Many communication-efficient variants of SGD use gradient quantization schemes. These schemes are often heuristic and fixed over the course of training. We empirically observe that the statistics of gradients of deep models change during the training. Motivated by this observation, we introduce two adaptive quantization schemes, ALQ and AMQ. In both schemes, processors update their compression schemes in parallel by efficiently computing sufficient statistics of a parametric distribution. We improve the validation accuracy by almost 2% on CIFAR-10 and 1% on ImageNet in challenging low-cost communication setups. Our adaptive methods are also significantly more robust to the choice of hyperparameters.

Submitted 23 October, 2020; originally announced October 2020.

Comments: Accepted at the conference on Neural Information Processing Systems (NeurIPS 2020)

arXiv:2008.00216 [pdf, other]

Faster Schrödinger-style simulation of quantum circuits

Authors: Aneeqa Fatima, Igor L. Markov

Abstract: Recent demonstrations of superconducting quantum computers by Google and IBM and trapped-ion computers from IonQ fueled new research in quantum algorithms, compilation into quantum circuits, and empirical algorithmics. While online access to quantum hardware remains too limited to meet the demand, simulating quantum circuits on conventional computers satisfies many needs. We advance Schrödinger-style simulation of quantum circuits that is useful standalone and as a building block in layered simulation algorithms, both cases are illustrated in our results. Our algorithmic contributions show how to simulate multiple quantum gates at once, how to avoid floating-point multiplies, how to best use instruction-level and thread-level parallelism as well as CPU cache, and how to leverage these optimizations by reordering circuit gates. While not described previously, these techniques implemented by us supported published high-performance distributed simulations up to 64 qubits. To show additional impact, we benchmark our simulator against Microsoft, IBM and Google simulators on hard circuits from Google.

Submitted 24 November, 2020; v1 submitted 1 August, 2020; originally announced August 2020.

Comments: 14 pages, 15 figures, 4 tables. Version 2 : Additional optimizations; improved simulation runtimes; profiling data; comparisons with the latest IBM QISKit simulator; dispelled apparent limitations of techniques. Version 3 : Ablation experiments and images for the code snippets

Journal ref: HPCA 2021

arXiv:2007.15285 [pdf, ps, other]

doi 10.1109/DAC18072.2020.9218555

Just Like the Real Thing: Fast Weak Simulation of Quantum Computation

Authors: Stefan Hillmich, Igor L. Markov, Robert Wille

Abstract: Quantum computers promise significant speedups in solving problems intractable for conventional computers but, despite recent progress, remain limited in scaling and availability. Therefore, quantum software and hardware development heavily rely on simulation that runs on conventional computers. Most such approaches perform strong simulation in that they explicitly compute amplitudes of quantum states. However, such information is not directly observable from a physical quantum computer because quantum measurements produce random samples from probability distributions defined by those amplitudes. In this work, we focus on weak simulation that aims to produce outputs which are statistically indistinguishable from those of error-free quantum computers. We develop algorithms for weak simulation based on quantum state representation in terms of decision diagrams. We compare them to using state-vector arrays and binary search on prefix sums to perform sampling. Empirical validation shows, for the first time, that this enables mimicking of physical quantum computers of significant scale.

Submitted 30 July, 2020; originally announced July 2020.

Comments: 6 pages, 4 figures

Journal ref: Design Automation Conference (DAC) 2020
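
The state-vector baseline named in the abstract above (sampling by binary search on prefix sums) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names `prefix_sums`, `sample_basis_state`, and `weak_simulate` are hypothetical, and the state is assumed to be a plain Python list of amplitudes whose length is a power of two.

```python
import bisect
import itertools
import random


def prefix_sums(amplitudes):
    """Return cumulative measurement probabilities |a_i|^2 (hypothetical helper)."""
    probs = [abs(a) ** 2 for a in amplitudes]
    total = sum(probs)
    # Normalize to guard against small floating-point drift in the input state.
    return list(itertools.accumulate(p / total for p in probs))


def sample_basis_state(cumulative, rng):
    """Draw one basis-state index by binary search on the prefix sums."""
    r = rng.random()  # uniform in [0, 1)
    idx = bisect.bisect_right(cumulative, r)
    # Clamp in case rounding leaves the last prefix sum slightly below 1.
    return min(idx, len(cumulative) - 1)


def weak_simulate(amplitudes, shots, seed=0):
    """Return `shots` measurement outcomes as bit strings (assumed interface)."""
    rng = random.Random(seed)
    cumulative = prefix_sums(amplitudes)
    n_qubits = (len(amplitudes) - 1).bit_length()
    return [
        format(sample_basis_state(cumulative, rng), f"0{n_qubits}b")
        for _ in range(shots)
    ]


# Usage: a two-qubit Bell state should only ever yield "00" or "11".
bell = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]
samples = weak_simulate(bell, shots=1000, seed=1)
```

Building the prefix sums costs time and memory linear in the state-vector length, which is exponential in the number of qubits; each sample afterwards needs only a logarithmic number of comparisons.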

arXiv:2005.11938 [pdf, other]

doi 10.1145/3397271.3401299

Cascade Model-based Propensity Estimation for Counterfactual Learning to Rank

Authors: Ali Vardasbi, Maarten de Rijke, Ilya Markov

Abstract: Unbiased CLTR requires click propensities to compensate for the difference between user clicks and true relevance of search results via IPS. Current propensity estimation methods assume that user click behavior follows the PBM and estimate click propensities based on this assumption. However, in reality, user clicks often follow the CM, where users scan search results from top to bottom and where each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn, hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called CM-IPS. We show that CM-IPS keeps CLTR performance close to the full-information performance in case the user clicks follow the CM, while PBM-based CLTR has a significant gap towards the full-information. The opposite is true if the user clicks follow PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks.

Submitted 25 May, 2020; originally announced May 2020.

Comments: 4 pages, 2 figures, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20)
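
The inverse propensity scoring (IPS) idea referred to in the abstract above can be illustrated with a minimal sketch. It shows only the generic reweighting step, where each click is divided by the probability that its position was examined; it is not the CM-IPS method of the paper. The function name `ips_weighted_clicks` and the propensity values are hypothetical, chosen purely for illustration.

```python
def ips_weighted_clicks(clicks, propensities):
    """Sum clicks, each up-weighted by the inverse of its examination propensity.

    clicks:       list of 0/1 click indicators, one per ranked position
    propensities: list of assumed examination probabilities in (0, 1]
    """
    if len(clicks) != len(propensities):
        raise ValueError("clicks and propensities must have the same length")
    total = 0.0
    for clicked, propensity in zip(clicks, propensities):
        if clicked:
            # A click at a rarely examined position counts for more.
            total += 1.0 / propensity
    return total


# Hypothetical position-based propensities for a five-item ranking.
example_propensities = [1.0, 0.5, 0.25, 0.2, 0.1]
# Clicks at ranks 1 and 3 contribute 1/1.0 + 1/0.25.
score = ips_weighted_clicks([1, 0, 1, 0, 0], example_propensities)
```

In this sketch the propensities depend only on rank, which matches a position-based assumption; under a cascade assumption the propensity of a position would also depend on the clicks observed above it, which is the mismatch the abstract describes.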

arXiv:2005.01588 [pdf]

Workshops on Extreme Scale Design Automation (ESDA) Challenges and Opportunities for 2025 and Beyond

Authors: R. Iris Bahar, Alex K. Jones, Srinivas Katkoori, Patrick H. Madden, Diana Marculescu, Igor L. Markov

Abstract: Integrated circuits and electronic systems, as well as design technologies, are evolving at a great rate -- both quantitatively and qualitatively. Major developments include new interconnects and switching devices with atomic-scale uncertainty, the depth and scale of on-chip integration, electronic system-level integration, the increasing significance of software, as well as more effective means of design entry, compilation, algorithmic optimization, numerical simulation, pre- and post-silicon design validation, and chip test. Application targets and key markets are also shifting substantially from desktop CPUs to mobile platforms to an Internet-of-Things infrastructure. In light of these changes in electronic design contexts and given EDA's significant dependence on such context, the EDA community must adapt to these changes and focus on the opportunities for research and commercial success. The CCC workshop series on Extreme-Scale Design Automation, organized with the support of ACM SIGDA, studied challenges faced by the EDA community as well as new and exciting opportunities currently available. This document represents a summary of the findings from these meetings.

Submitted 4 May, 2020; originally announced May 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 32 pages

Report number: ccc2014report_1

arXiv:2002.04904 [pdf, ps, other]

doi 10.1109/ASP-DAC47756.2020.9045454

Approximation of Quantum States Using Decision Diagrams

Authors: Alwin Zulehner, Stefan Hillmich, Igor L. Markov, Robert Wille

Abstract: The computational power of quantum computers poses major challenges to new design tools since representing pure quantum states typically requires exponentially large memory. As shown previously, decision diagrams can reduce these memory requirements by exploiting redundancies. In this work, we demonstrate further reductions by allowing for small inaccuracies in the quantum state representation. Such inaccuracies are legitimate since quantum computers themselves experience gate and measurement errors and since quantum algorithms are somewhat resistant to errors (even without error correction). We develop four dedicated schemes that exploit these observations and effectively approximate quantum states represented by decision diagrams. We empirically show that the proposed schemes reduce the size of decision diagrams by up to several orders of magnitude while controlling the fidelity of approximate quantum state representations.

Submitted 12 February, 2020; originally announced February 2020.

Journal ref: Asia and South Pacific Design Automation Conference 2020

arXiv:2002.00467 [pdf, other]

Safe Exploration for Optimizing Contextual Bandits

Authors: Rolf Jagerman, Ilya Markov, Maarten de Rijke

Abstract: Contextual bandit problems are a natural fit for many information retrieval tasks, such as learning to rank, text classification, recommendation, etc. However, existing learning methods for contextual bandit problems have one of two drawbacks: they either do not explore the space of all possible document rankings (i.e., actions) and, thus, may miss the optimal ranking, or they present suboptimal rankings to a user and, thus, may harm the user experience. We introduce a new learning method for contextual bandit problems, Safe Exploration Algorithm (SEA), which overcomes the above drawbacks. SEA starts by using a baseline (or production) ranking system (i.e., policy), which does not harm the user experience and, thus, is safe to execute, but has suboptimal performance and, thus, needs to be improved. Then SEA uses counterfactual learning to learn a new policy based on the behavior of the baseline policy. SEA also uses high-confidence off-policy evaluation to estimate the performance of the newly learned policy. Once the performance of the newly learned policy is at least as good as the performance of the baseline policy, SEA starts using the new policy to execute new actions, allowing it to actively explore favorable regions of the action space. This way, SEA never performs worse than the baseline policy and, thus, does not harm the user experience, while still exploring the action space and, thus, being able to find an optimal policy.
Our experiments using text classification and document retrieval confirm the above by comparing SEA (and a boundless variant called BSEA) to online and offline learning methods for contextual bandit problems.

Submitted 2 February, 2020; originally announced February 2020.

Comments: 23 pages, 3 figures

arXiv:2001.05918 [pdf, other]

Elastic Consistency: A General Consistency Model for Distributed Stochastic Gradient Descent

Authors: Giorgi Nadiradze, Ilia Markov, Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh

Abstract: Machine learning has made tremendous progress in recent years, with models matching or even surpassing humans on a series of specialized tasks. One key element behind the progress of machine learning in recent years has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Many of these models are trained employing variants of stochastic gradient descent (SGD) based optimization. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency enables us to derive convergence bounds for a variety of distributed SGD methods used in practice to train large-scale machine learning models. The proposed framework de-clutters the implementation-specific convergence analysis and provides an abstraction to derive convergence bounds. We utilize the framework to analyze a sparsification scheme for distributed SGD methods in an asynchronous setting for convex and non-convex objectives. We implement the distributed SGD variant to train deep CNN models in an asynchronous shared-memory setting. Empirical results show that error-feedback may not necessarily help in improving the convergence of sparsified asynchronous distributed SGD, which corroborates an insight suggested by our convergence analysis.

Submitted 28 June, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

arXiv:1912.07263 [pdf, other]

Focusing of Drive and Test Bunches in a Dielectric Waveguide Filled with Inhomogeneous Plasma

Authors: G. V. Sotnikov, P. I. Markov, I. N. Onishchenko

Abstract: The paper presents the results of numerical PIC-simulation of accelerated and drive bunches dynamics in a dielectric waveguide filled with radially inhomogeneous plasma. The wakefield was excited by the electron bunch in a quartz (permittivity 3.75) dielectric tube with outer and inner diameters of 1.2 mm and 1.0 mm, respectively, which was nested into a cylindrical metallic waveguide. The drive bunch characteristics were chosen to be: 5 GeV for electron energy, 3 nC for the charge, 0.2 mm - the bunch length, 0.9 mm - the bunch diameter. The accelerated bunch had the same parameters, except for the charge, which was equal to 0.3 nC. The interior of the waveguide was filled with plasma having different transverse density profiles, viz., the density profile formed in the capillary discharge, and the radially nonuniform density profile with the vacuum channel along the waveguide axis. For all the cases under study the plasma density was low, so that the plasma frequency was lower than the fundamental dielectric mode frequency. The obtained PIC-simulation data have shown that the vacuum channel in the inhomogeneous plasma cylinder improves the accelerated bunch focusing. There is the optimum vacuum-channel size value, at which the focusing turns out to be the strongest. The improvement in the accelerated bunch focusing is accompanied by the decrease in the accelerating gradient as compared with the full plasma filling of the drift channel. The best acceleration takes place in the absence of plasma; however in that case the test bunch focusing does not occur.

Submitted 16 December, 2019; originally announced December 2019.

Comments: 7 pages, 8 figures

arXiv:1908.06077 [pdf, other]

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Authors: Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

Abstract: As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees, however, for practical purposes, the authors proposed a heuristic variant which we call QSGDinf, which demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it has both stronger theoretical guarantees than QSGD, and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.

Submitted 3 May, 2021; v1 submitted 16 August, 2019; originally announced August 2019.

Comments: 42 pages, 21 figures. To appear in the Journal of Machine Learning Research (JMLR)

Showing 1–50 of 102 results for author: Markov, I