Search | arXiv e-print repository

arXiv:2506.10257 [pdf, ps, other]

Enhancing Ultrasound Molecular Imaging: Toward Real-Time RPCA-Based Filtering to Differentiate Bound and Free Microbubbles

Authors: Hoda S. Hashemi, Dongwoon Hyun, Nathan Nguyen, Jihye Baek, Arutselvan Natarajan, Farbod Tabesh, Andrew Andrzejek, Ramasamy Paulmurugan, Jeremy J. Dahl

Abstract: Ultrasound molecular imaging (UMI) is an advanced imaging modality that shows promise in detecting cancer at early stages. It uses microbubbles as contrast agents, which are functionalized to bind to cancer biomarkers overexpressed on endothelial cells. A major challenge in UMI is isolating bound microbubble signal, which represents the molecular imaging signal, from that of free-floating microbubbles, which is considered background noise. In this work, we propose a fast GPU-based method using robust principal component analysis (RPCA) to distinguish bound microbubbles from free-floating ones. We explore the method using simulations and measure the accuracy using the Dice coefficient and RMS error as functions of the number of frames used in RPCA reconstruction. Experiments using stationary and flowing microbubbles in tissue-mimicking phantoms were used to validate the method. Additionally, the method was applied to data from ten transgenic mouse models of breast cancer development, injected with B7-H3-targeted microbubbles, and two mice injected with non-targeted microbubbles. The results showed that RPCA using 20 frames achieved a Dice score of 0.95 and a computation time of 0.2 seconds, indicating that 20 frames is potentially suitable for real-time implementation. On in vivo data, RPCA using 20 frames achieved a Dice score of 0.82 with DTE, indicating good agreement between the two, given the limitations of each method.

Submitted 11 June, 2025; originally announced June 2025.
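
Note: the abstract above reports accuracy with the Dice coefficient and RMS error. As a rough illustration of what those two metrics measure, here is a minimal sketch in plain Python. It is not the authors' code: the function names `dice_coefficient` and `rms_error`, the flat-list mask representation, and the convention for two empty masks are all assumptions made for this example.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumption of this sketch: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def rms_error(estimate, reference):
    """Root-mean-square difference between two equal-length sequences."""
    if len(estimate) != len(reference) or not estimate:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((e - r) ** 2 for e, r in zip(estimate, reference))
    return math.sqrt(squared / len(estimate))
```

A Dice score of 1.0 means the two masks overlap exactly and 0.0 means they share no foreground pixels, so the reported 0.95 would correspond to near-complete overlap between the estimated and reference bound-microbubble regions.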

arXiv:2505.06027 [pdf, other]

Unilogit: Robust Machine Unlearning for LLMs Using Uniform-Target Self-Distillation

Authors: Stefan Vasilev, Christian Herold, Baohao Liao, Seyyed Hadi Hashemi, Shahram Khadivi, Christof Monz

Abstract: This paper introduces Unilogit, a novel self-distillation method for machine unlearning in Large Language Models. Unilogit addresses the challenge of selectively forgetting specific information while maintaining overall model utility, a critical task in compliance with data privacy regulations like GDPR. Unlike prior methods that rely on static hyperparameters or starting model outputs, Unilogit dynamically adjusts target logits to achieve a uniform probability for the target token, leveraging the current model's outputs for more accurate self-distillation targets. This approach not only eliminates the need for additional hyperparameters but also enhances the model's ability to approximate the golden targets. Extensive experiments on public benchmarks and an in-house e-commerce dataset demonstrate Unilogit's superior performance in balancing forget and retain objectives, outperforming state-of-the-art methods such as NPO and UnDIAL. Our analysis further reveals Unilogit's robustness across various scenarios, highlighting its practical applicability and effectiveness in achieving efficacious machine unlearning.

Submitted 9 May, 2025; originally announced May 2025.

Comments: 16 pages, 6 figures, 5 tables, under review at ACL

MSC Class: 68T50 ACM Class: I.2.7
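
Note: the abstract above describes adjusting target logits so that the target token receives a uniform probability. One possible reading of that sentence, sketched below, is to replace only the target token's logit so that its softmax probability becomes exactly 1/V for a vocabulary of size V. This is an illustration under that assumption, not the authors' implementation; `uniform_target_logits` and `softmax` are hypothetical helper names. If S is the sum of exp(z_j) over all non-target tokens, the required logit is log(S) - log(V - 1).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def uniform_target_logits(logits, target_index):
    """Return a copy of `logits` whose target entry has softmax probability 1/V.

    Assumed reading of the abstract: only the target logit is changed;
    all other logits keep the values produced by the current model.
    """
    vocab_size = len(logits)
    if vocab_size < 2:
        raise ValueError("need at least two vocabulary entries")
    others = [z for i, z in enumerate(logits) if i != target_index]
    peak = max(others)
    # log-sum-exp over the non-target logits, computed stably
    log_sum_others = peak + math.log(sum(math.exp(z - peak) for z in others))
    adjusted = list(logits)
    adjusted[target_index] = log_sum_others - math.log(vocab_size - 1)
    return adjusted
```

Because the non-target logits are left untouched, their relative probabilities are preserved, and no extra hyperparameter is needed to decide how far to push the target token down.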

arXiv:2504.14690 [pdf]

FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models

Authors: Mehrnoush Shamsfard, Zahra Saaberi, Mostafa Karimi manesh, Seyed Mohammad Hossein Hashemi, Zahra Vatankhah, Motahareh Ramezani, Niki Pourazin, Tara Zare, Maryam Azimi, Sarina Chitsaz, Sama Khoraminejad, Morteza Mahdavi Mortazavi, Mohammad Mahdi Chizari, Sahar Maleki, Seyed Soroush Majd, Mostafa Masumi, Sayed Ali Musavi Khoeini, Amir Mohseni, Sogol Alipour

Abstract: Research on evaluating and analyzing large language models (LLMs) has been extensive for resource-rich languages such as English, yet their performance in languages such as Persian has received considerably less attention. This paper introduces the FarsEval-PKBETS benchmark, a subset of the FarsEval project for evaluating large language models in Persian. This benchmark consists of 4000 questions and answers in various formats, including multiple choice, short answer and descriptive responses. It covers a wide range of domains and tasks, including medicine, law, religion, Persian language, encyclopedic knowledge, human preferences, social knowledge, ethics and bias, text generation, and respecting others' rights. This benchmark incorporates linguistic, cultural, and local considerations relevant to the Persian language and Iran. To ensure the questions are challenging for current LLMs, three models -- Llama3-70B, PersianMind, and Dorna -- were evaluated using this benchmark. Their average accuracy was below 50%, meaning they provided fully correct answers to fewer than half of the questions. These results indicate that current language models are still far from being able to solve this benchmark.

Submitted 20 April, 2025; originally announced April 2025.

Comments: 24 pages, 3 figures, 3 tables

MSC Class: 68T50 ACM Class: I.2.7; E.0

arXiv:2503.19786 [pdf, other]

Gemma 3 Technical Report

Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, et al. (191 additional authors not shown)

Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, a wider coverage of languages and longer context - at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory that tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers, and keeping the span on local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.

Submitted 25 March, 2025; originally announced March 2025.

arXiv:2503.13089 [pdf, ps, other]

ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning

Authors: Baohao Liao, Christian Herold, Seyyed Hadi Hashemi, Stefan Vasilev, Shahram Khadivi, Christof Monz

Abstract: As large language models (LLMs) scale, model compression is crucial for edge deployment and accessibility. Weight-only quantization reduces model size but suffers from performance degradation at lower bit widths. Moreover, standard finetuning is incompatible with quantized models, and alternative methods often fall short of full finetuning. In this paper, we propose ClusComp, a simple yet effective compression paradigm that clusters weight matrices into codebooks and finetunes them block-by-block. ClusComp (1) achieves superior performance in 2-4 bit quantization, (2) pushes compression to 1-bit while outperforming ultra-low-bit methods with minimal finetuning, and (3) enables efficient finetuning, even surpassing existing quantization-based approaches and rivaling full FP16 finetuning. Notably, ClusComp supports compression and finetuning of 70B LLMs on a single A6000-48GB GPU.

Submitted 1 June, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

Comments: ACL camera-ready version

arXiv:2503.02967 [pdf]

Revolutionizing Traffic Management with AI-Powered Machine Vision: A Step Toward Smart Cities

Authors: Seyed Hossein Hosseini DolatAbadi, Sayyed Mohammad Hossein Hashemi, Mohammad Hosseini, Moein-Aldin AliHosseini

Abstract: The rapid urbanization of cities and increasing vehicular congestion have posed significant challenges to traffic management and safety. This study explores the transformative potential of artificial intelligence (AI) and machine vision technologies in revolutionizing traffic systems. By leveraging advanced surveillance cameras and deep learning algorithms, this research proposes a system for real-time detection of vehicles, traffic anomalies, and driver behaviors. The system integrates geospatial and weather data to adapt dynamically to environmental conditions, ensuring robust performance in diverse scenarios. Using YOLOv8 and YOLOv11 models, the study achieves high accuracy in vehicle detection and anomaly recognition, optimizing traffic flow and enhancing road safety. These findings contribute to the development of intelligent traffic management solutions and align with the vision of creating smart cities with sustainable and efficient urban infrastructure.

Submitted 4 March, 2025; originally announced March 2025.

Comments: 6 pages, 1 figure, 2 tables, accepted to the 1st AITC conference at the University of Isfahan

arXiv:2501.09706 [pdf, other]

Domain Adaptation of Foundation LLMs for e-Commerce

Authors: Christian Herold, Michael Kozielski, Tala Bazazo, Pavel Petrushkov, Patrycja Cieplicka, Dominika Basaj, Yannick Versley, Seyyed Hadi Hashemi, Shahram Khadivi

Abstract: We present the e-Llama models: 8 billion and 70 billion parameter large language models that are adapted towards the e-commerce domain. These models are meant as foundation models with deep knowledge about e-commerce, that form a base for instruction- and fine-tuning. The e-Llama models are obtained by continuously pretraining the Llama 3.1 base models on 1 trillion tokens of domain-specific data. We discuss our approach and motivate our choice of hyperparameters with a series of ablation studies. To quantify how well the models have been adapted to the e-commerce domain, we define and implement a set of multilingual, e-commerce specific evaluation tasks. We show that, when carefully choosing the training setup, the Llama 3.1 models can be adapted towards the new domain without sacrificing significant performance on general domain tasks. We also explore the possibility of merging the adapted model and the base model for a better control of the performance trade-off between domains.

Submitted 25 May, 2025; v1 submitted 16 January, 2025; originally announced January 2025.

Comments: Accepted at ACL25 (Industry)

arXiv:2501.00274 [pdf, other]

doi 10.18653/v1/2024.acl-long.745

LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts

Authors: Helia Hashemi, Jason Eisner, Corby Rosset, Benjamin Van Durme, Chris Kedzie

Abstract: This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompted with each rubric question and produces a distribution over potential responses. The LLM predictions often fail to agree well with human judges -- indeed, the humans do not fully agree with one another. However, the multiple LLM distributions can be $\textit{combined}$ to $\textit{predict}$ each human judge's annotations on all questions, including a summary question that assesses overall quality or relevance. LLM-Rubric accomplishes this by training a small feed-forward neural network that includes both judge-specific and judge-independent parameters. When evaluating dialogue systems in a human-AI information-seeking task, we find that LLM-Rubric with 9 questions (assessing dimensions such as naturalness, conciseness, and citation quality) predicts human judges' assessment of overall user satisfaction, on a scale of 1--4, with RMS error $< 0.5$, a $2\times$ improvement over the uncalibrated baseline.

Submitted 30 December, 2024; originally announced January 2025.

Comments: Updated version of 17 June 2024

ACM Class: I.2.1; I.2.6; I.2.7

Journal ref: Proceedings of ACL 2024 (Volume 1: Long Papers), pp. 13806-13834

arXiv:2410.12380 [pdf, other]

Evaluation of Attribution Bias in Retrieval-Augmented Large Language Models

Authors: Amin Abolghasemi, Leif Azzopardi, Seyyed Hadi Hashemi, Maarten de Rijke, Suzan Verberne

Abstract: Attributing answers to source documents is an approach used to enhance the verifiability of a model's output in retrieval augmented generation (RAG). Prior work has mainly focused on improving and evaluating the attribution quality of large language models (LLMs) in RAG, but this may come at the expense of inducing biases in the attribution of answers. We define and examine two aspects in the evaluation of LLMs in RAG pipelines, namely attribution sensitivity and bias with respect to authorship information. We explicitly inform an LLM about the authors of source documents, instruct it to attribute its answers, and analyze (i) how sensitive the LLM's output is to the author of source documents, and (ii) whether the LLM exhibits a bias towards human-written or AI-generated source documents. We design an experimental setup in which we use counterfactual evaluation to study three LLMs in terms of their attribution sensitivity and bias in RAG pipelines. Our results show that adding authorship information to source documents can significantly change the attribution quality of LLMs by 3% to 18%. Moreover, we show that LLMs can have an attribution bias towards explicit human authorship, which can serve as a competing hypothesis for findings of prior work that shows that LLM-generated content may be preferred over human-written contents. Our findings indicate that metadata of source documents can influence LLMs' trust, and how they attribute their answers.
Furthermore, our research highlights attribution bias and sensitivity as a novel aspect of brittleness in LLMs.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2408.00118 [pdf, other]

Gemma 2: Improving Open Language Models at a Practical Size

Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (173 additional authors not shown)

Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.

Submitted 2 October, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1112 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k).
Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 16 December, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2401.03302 [pdf]

Realism in Action: Anomaly-Aware Diagnosis of Brain Tumors from Medical Images Using YOLOv8 and DeiT

Authors: Seyed Mohammad Hossein Hashemi, Leila Safari, Amirhossein Dadashzadeh Taromi

Abstract: In the field of medical sciences, reliable detection and classification of brain tumors from images remains a formidable challenge due to the rarity of tumors within the population of patients. Therefore, the ability to detect tumors in anomaly scenarios is paramount for ensuring timely interventions and improved patient outcomes. This study addresses the issue by leveraging deep learning (DL) techniques to detect and classify brain tumors in challenging situations. The curated data set from the National Brain Mapping Lab (NBML) comprises 81 patients, including 30 Tumor cases and 51 Normal cases. The detection and classification pipelines are separated into two consecutive tasks. The detection phase involved comprehensive data analysis and pre-processing to modify the number of image samples and the number of patients of each class to anomaly distribution (9 Normal per 1 Tumor) to comply with real world scenarios. Next, in addition to common evaluation metrics for the testing, we employed a novel performance evaluation method called Patient to Patient (PTP), focusing on the realistic evaluation of the model. In the detection phase, we fine-tuned a YOLOv8n detection model to detect the tumor region. Subsequent testing and evaluation yielded competitive performance both in Common Evaluation Metrics and PTP metrics. Furthermore, using the Data Efficient Image Transformer (DeiT) module, we distilled a Vision Transformer (ViT) model from a fine-tuned ResNet152 as a teacher in the classification phase.
This approach demonstrates promising strides in reliable tumor detection and classification, offering potential advancements in tumor diagnosis for real-world medical imaging scenarios.

Submitted 25 September, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

Comments: This work has been submitted to Elsevier for possible publication

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1326 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.

Submitted 9 May, 2025; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.11816 [pdf, other]

Hybrid Controller for Robot Manipulators in Task-Space with Visual-Inertial Feedback

Authors: Seyed Hamed Hashemi, Jouni Mattila

Abstract: This paper presents a visual-inertial-based control strategy to address the task space control problem of robot manipulators. To this end, an observer-based hybrid controller is employed to control end-effector motion. In addition, a hybrid observer is introduced for a visual-inertial navigation system to close the control loop directly at the Cartesian space by estimating the end-effector pose. Accordingly, the robot tip is equipped with an inertial measurement unit (IMU) and a stereo camera to provide task-space feedback information for the proposed observer. It is demonstrated through the Lyapunov stability theorem that the resulting closed-loop system under the proposed observer-based controller is globally asymptotically stable. Besides this notable merit (global asymptotic stability), the proposed control method eliminates the need to compute inverse kinematics and increases trajectory tracking accuracy in task-space. The effectiveness and accuracy of the proposed control scheme are evaluated through computer simulations, where the proposed control structure is applied to a 6 degrees-of-freedom long-reach hydraulic robot manipulator.

Submitted 20 November, 2023; originally announced November 2023.

arXiv:2309.00002 [pdf, other]

3D Ultrafast Shear Wave Absolute Vibro-Elastography using a Matrix Array Transducer

Authors: Hoda S. Hashemi, Shahed K. Mohammed, Qi Zeng, Reza Zahiri Azar, Robert N. Rohling, Septimiu E. Salcudean

Abstract: 3D ultrasound imaging provides more spatial information compared to conventional 2D frames by considering the volumes of data. One of the main bottlenecks of 3D imaging is the long data acquisition time which reduces practicality and can introduce artifacts from unwanted patient or sonographer motion. This paper introduces the first shear wave absolute vibro-elastography (S-WAVE) method with real-time volumetric acquisition using a matrix array transducer. In S-WAVE, an external vibration source generates mechanical vibrations inside the tissue. The tissue motion is then estimated and used in solving a wave equation inverse problem to provide the tissue elasticity. A matrix array transducer is used with a Verasonics ultrasound machine and frame rate of 2000 volumes/s to acquire 100 radio frequency (RF) volumes in 0.05 s. Using plane wave (PW) and compounded diverging wave (CDW) imaging methods, we estimate axial, lateral and elevational displacements over 3D volumes. The curl of the displacements is used with local frequency estimation to estimate elasticity in the acquired volumes. Ultrafast acquisition substantially extends the possible S-WAVE excitation frequency range, now up to 800 Hz, enabling new tissue modeling and characterization. The method was validated on three homogeneous liver fibrosis phantoms and on four different inclusions within a heterogeneous phantom.
The homogeneous phantom results show less than 8% (PW) and 5% (CDW) difference between the manufacturer values and the corresponding estimated values over a frequency range of 80 Hz to 800 Hz. The estimated elasticity values for the heterogeneous phantom at 400 Hz excitation frequency show average errors of 9% (PW) and 6% (CDW) compared to the provided average values by MRE. Furthermore, both imaging methods were able to detect the inclusions within the elasticity volumes.

Submitted 22 May, 2023; originally announced September 2023.

arXiv:2307.11749 [pdf, other]

Differentially Private Heavy Hitter Detection using Federated Analytics

Authors: Karan Chadha, Junye Chen, John Duchi, Vitaly Feldman, Hanieh Hashemi, Omid Javidbakht, Audra McMillan, Kunal Talwar

Abstract: In this work, we study practical heuristics to improve the performance of prefix-tree based algorithms for differentially private heavy hitter detection. Our model assumes each user has multiple data points and the goal is to learn as many of the most frequent data points as possible across all users' data with aggregate and local differential privacy. We propose an adaptive hyperparameter tuning algorithm that improves the performance of the algorithm while satisfying computational, communication and privacy constraints. We explore the impact of different data-selection schemes as well as the impact of introducing deny lists during multiple runs of the algorithm. We test these improvements using extensive experimentation on the Reddit dataset~\cite{caldas2018leaf} on the task of learning the most frequent words.

Submitted 21 July, 2023; originally announced July 2023.

arXiv:2307.05925 [pdf, other]

A Tractable Statistical Representation of IFTR Fading with Applications

Authors: Maryam Olyaee, Hadi Hashemi, Juan M. Romero-Jerez

Abstract: The recently introduced independent fluctuating two-ray (IFTR) fading model, consisting of two specular components fluctuating independently plus a diffuse component, has proven to provide an excellent fit to different wireless environments, including the millimeter-wave band. However, the original formulations of the probability density function (PDF) and cumulative distribution function (CDF) of this model are not applicable to all possible values of its defining parameters, and are given in terms of multifold generalized hypergeometric functions, which prevents their widespread use for the derivation of performance metric expressions. In this paper we present a new formulation of the IFTR model as a countable mixture of Gamma distributions, which greatly facilitates the performance evaluation for this model in terms of the metrics already known for the much simpler and widely used Nakagami-m fading. Additionally, a closed-form expression is presented for the generalized moment generating function (GMGF), which makes it possible to readily obtain all the moments of the distribution of the model, as well as several relevant performance metrics. Based on these new derivations, the IFTR model is evaluated for the average channel capacity, the outage probability with and without co-channel interference, and the bit error rate (BER), which are verified by Monte Carlo simulations.

Submitted 12 July, 2023; originally announced July 2023.

Comments: This work was submitted to the IEEE for publication

arXiv:2307.02740 [pdf, other]

Dense Retrieval Adaptation using Target Domain Description

Authors: Helia Hashemi, Yong Zhuang, Sachith Sri Ram Kothur, Srivas Prasad, Edgar Meij, W. Bruce Croft

Abstract: In information retrieval (IR), domain adaptation is the process of adapting a retrieval model to a new domain whose data distribution is different from the source domain. Existing methods in this area focus on unsupervised domain adaptation where they have access to the target document collection or supervised (often few-shot) domain adaptation where they additionally have access to (limited) labeled data in the target domain. There also exists research on improving zero-shot performance of retrieval models with no adaptation. This paper introduces a new category of domain adaptation in IR that is as-yet unexplored. Here, similar to the zero-shot setting, we assume the retrieval model does not have access to the target document collection. In contrast, it does have access to a brief textual description that explains the target domain. We define a taxonomy of domain attributes in retrieval tasks to understand different properties of a source domain that can be adapted to a target domain. We introduce a novel automatic data construction pipeline that produces a synthetic document collection, query set, and pseudo relevance labels, given a textual domain description. Extensive experiments on five diverse target domains show that adapting dense retrieval models using the constructed synthetic data leads to effective retrieval performance on the target domain.

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2305.10403 [pdf, other]

PaLM 2 Technical Report

Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego , et al. (103 additional authors not shown)

Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.

Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2302.04163 [pdf, ps, other]

Task Space Control of Robot Manipulators based on Visual SLAM

Authors: Seyed Hamed Hashemi, Jouni Mattila

Abstract: This paper aims to address the open problem of designing a globally stable vision-based controller for robot manipulators. Accordingly, based on a hybrid mechanism, this paper proposes a novel task-space control law attained by taking the gradient of a potential function in SE(3). The key idea is to employ the Visual Simultaneous Localization and Mapping (VSLAM) algorithm to estimate a robot pose. The estimated robot pose is then used in the proposed hybrid controller as feedback information. Invoking Barbalat's lemma and Lyapunov's stability theorem, it is guaranteed that the resulting closed-loop system is globally asymptotically stable, which is the main accomplishment of the proposed structure. Simulation studies are conducted on a six degrees of freedom (6-DOF) robot manipulator to demonstrate the effectiveness and validate the performance of the proposed VSLAM-based control scheme.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2212.06712 [pdf, other]

Analysis of the Outage Probability of Ground-Based Relaying for Satellite Systems

Authors: Hadi Hashemi, Beatriz Soret, M. Carmen Aguayo-Torres

Abstract: This paper investigates the theoretical basis for using ground relaying in multi-antenna satellites exposed to blocking situations. Inactive and unobstructed User Equipments (UEs) located on the ground are the relaying nodes of UEs that are not in the field of view of the satellite. Exact closed-form relationships of the Signal-to-Noise Ratio (SNR) and the outage probability are obtained for the case where each user is connected to two transmitting antennas at the satellite. The channel between the satellite and ground is modeled as a shadowed-Rician (SR), whereas a Fluctuating Two-Ray (FTR) fading model is used for the mmWaves ground link between relay and UE, as well as cases with perfect and imperfect Channel State Information (CSI). The simulation results showed that if perfect CSI is not available in the pre-coding and only the signal phase is estimated, the performance loss is minimal and the system can reach its ideal performance by spending limited power. In any case, the closed-form expressions for the ideal state are a good proxy to predict the performance under non-ideal conditions.

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2212.06264 [pdf, other]

Data Leakage via Access Patterns of Sparse Features in Deep Learning-based Recommendation Systems

Authors: Hanieh Hashemi, Wenjie Xiong, Liu Ke, Kiwan Maeng, Murali Annavaram, G. Edward Suh, Hsien-Hsin S. Lee

Abstract: Online personalized recommendation services are generally hosted in the cloud where users query the cloud-based model to receive recommended input such as merchandise of interest or news feed. State-of-the-art recommendation models rely on sparse and dense features to represent users' profile information and the items they interact with. Although sparse features account for 99% of the total model size, there was not enough attention paid to the potential information leakage through sparse features. These sparse features are employed to track users' behavior, e.g., their click history, object interactions, etc., potentially carrying each user's private information. Sparse features are represented as learned embedding vectors that are stored in large tables, and personalized recommendation is performed by using a specific user's sparse feature to index through the tables. Even with recently proposed methods that hide the computation happening in the cloud, an attacker in the cloud may still be able to track the access patterns to the embedding tables. This paper explores the private information that may be learned by tracking a recommendation model's sparse feature access patterns. We first characterize the types of attacks that can be carried out on sparse features in recommendation models in an untrusted cloud, followed by a demonstration of how each of these attacks leads to extracting users' private information or tracking users by their behavior over time.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2211.09409 [pdf]

doi 10.1007/s11042-024-19675-x

Color Image steganography using Deep convolutional Autoencoders based on ResNet architecture

Authors: Seyed Hesam Odin Hashemi, Mohammad-Hassan Majidi, Saeed Khorashadizadeh

Abstract: In this paper, a deep learning color image steganography scheme combining convolutional autoencoders and ResNet architecture is proposed. Traditional steganography methods suffer from some critical defects such as low capacity, security, and robustness. In recent decades, image hiding and image extraction were realized by autoencoder convolutional neural networks to solve the aforementioned challenges. The contribution of this paper is introducing a new scheme for color image steganography inspired by ResNet architecture. The reverse ResNet architecture is utilized to extract the secret image from the stego image. In the proposed method, all images are passed through the preprocessing model, which is a convolutional deep neural network with the aim of feature extraction. Then, the operational model generates stego and extracted images. In fact, the operational model is an autoencoder based on ResNet structure that produces an image from feature maps. The advantage of the proposed structure is that the models are identical in the embedding and extraction phases. The performance of the proposed method is studied using COCO and CelebA datasets. For quantitative comparisons with previous related works, peak signal-to-noise ratio (PSNR), the structural similarity index (SSIM) and hiding capacity are evaluated. The experimental results verify that the proposed scheme performs better than traditional and previous deep steganography methods. The PSNR and SSIM are more than 40 dB and 0.98, respectively, which implies high imperceptibility of the proposed method. Also, this method can hide a color image of the same size in another color image, from which it can be inferred that the relative capacity of the proposed method is 8 bits per pixel.

Submitted 17 November, 2022; originally announced November 2022.

arXiv:2207.00083 [pdf, other]

DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware

Authors: Hanieh Hashemi, Yongqin Wang, Murali Annavaram

Abstract: Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train or infer with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. Cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. Tackling such a challenge requires unifying theoretical privacy algorithms with hardware security capabilities. This paper presents DarKnight, a framework for large DNN training while protecting input privacy and computation integrity. DarKnight relies on cooperative execution between trusted execution environments (TEE) and accelerators, where the TEE provides privacy and integrity verification, while accelerators perform the bulk of the linear algebraic computation to optimize the performance. In particular, DarKnight uses a customized data encoding strategy based on matrix masking to create input obfuscation within a TEE. The obfuscated data is then offloaded to GPUs for fast linear algebraic computation. DarKnight's data obfuscation strategy provides provable data privacy and computation integrity in the cloud servers. While prior works tackle inference privacy and cannot be utilized for training, DarKnight's encoding scheme is designed to support both training and inference.

Submitted 30 June, 2022; originally announced July 2022.

Comments: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture. arXiv admin note: text overlap with arXiv:2105.00334

arXiv:2205.01953 [pdf, ps, other]

doi 10.1109/ACCESS.2022.3221524

A Global Asymptotic Convergent Observer for SLAM

Authors: Seyed Hamed Hashemi, Jouni Mattila

Abstract: This paper examines the global convergence problem of SLAM algorithms, an issue that faces topological obstructions. This is because the state-space of attitude dynamics is defined on a non-contractible manifold: the special orthogonal group of order three SO(3). Therefore, this paper presents a novel, gradient-based hybrid observer to overcome these topological obstacles. The Lyapunov stability theorem is used to prove the globally asymptotic convergence of the proposed algorithm. Finally, comparative analyses of two simulations were conducted to evaluate the performance of the proposed scheme and to demonstrate the superiority of the proposed hybrid observer to a smooth observer.

Submitted 4 May, 2022; originally announced May 2022.

Comments: 7 pages, 8 figures, conference

arXiv:2203.13949 [pdf, other]

Ultrafast Ultrasound Imaging for 3D Shear Wave Absolute Vibro-Elastography

Authors: Hoda S. Hashemi, Reza Zahiri Azar, Septimiu E. Salcudean, Robert N. Rohling

Abstract: Shear wave absolute vibro-elastography (S-WAVE) is an imaging technique that generates steady-state shear waves inside the tissue using multi-frequency excitation from an external vibration source. In this work, plane wave imaging is introduced to reduce total acquisition time while retaining the benefit of 3D formulation. Plane wave imaging with a frame rate of 3000 frames/s is followed by 3D absolute elasticity estimation. We design two imaging sequences of ultrafast S-WAVE for two sets of excitation frequencies using a Verasonics system and a motorized swept ultrasound transducer to synchronize ultrasound acquisition with the external mechanical excitation. The overall data collection time is improved by 83-88% compared to the original 3D S-WAVE because of the per-channel acquisition offered by the Verasonics system. Tests are performed on liver fibrosis tissue-mimicking phantoms and on ex vivo bovine liver. The curl operator was previously used in magnetic resonance elastography (MRE) to cancel out the effect of the compressional waves. In this work, we apply the curl operator to the full 3D displacement field followed by 3D elasticity reconstruction. The results of the phantom experiment show more accurate elasticity estimation as well as 18% less standard deviation (STD) compared to reconstruction using the curl of a 2D displacement field and 45% less STD than without the curl. We also compare our experimental results with a previous method based on acoustic radiation force impulse (ARFI) and achieve closer results to phantom manufacturer provided values with ultrafast S-WAVE. Furthermore, the dependency of the bovine liver elasticity on the frequency of excitation was also shown with our system.

Submitted 25 July, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

arXiv:2112.13416 [pdf, other]

Attribute Inference Attack of Speech Emotion Recognition in Federated Learning Settings

Authors: Tiantian Feng, Hanieh Hashemi, Rajat Hebbar, Murali Annavaram, Shrikanth S. Narayanan

Abstract: Speech emotion recognition (SER) processes speech signals to detect and characterize expressed perceived emotions. Many SER application systems often acquire and transmit speech data collected at the client-side to remote cloud platforms for inference and decision making. However, speech data carry rich information not only about emotions conveyed in vocal expressions, but also other sensitive demographic traits such as gender, age and language background. Consequently, it is desirable for SER systems to have the ability to classify emotion constructs while preventing unintended/improper inferences of sensitive and demographic information. Federated learning (FL) is a distributed machine learning paradigm that coordinates clients to train a model collaboratively without sharing their local data. This training approach appears secure and can improve privacy for SER. However, recent works have demonstrated that FL approaches are still vulnerable to various privacy attacks like reconstruction attacks and membership inference attacks. Although most of these have focused on computer vision applications, such information leakages exist in the SER systems trained using the FL technique. To assess the information leakage of SER systems trained using FL, we propose an attribute inference attack framework that infers sensitive attribute information of the clients from shared gradients or model parameters, corresponding to the FedSGD and the FedAvg training algorithms, respectively. As a use case, we empirically evaluate our approach for predicting the client's gender information using three SER benchmark datasets: IEMOCAP, CREMA-D, and MSP-Improv. We show that the attribute inference attack is achievable for SER systems trained using FL. We further identify that most information leakage possibly comes from the first layer in the SER model.

Submitted 22 December, 2022; v1 submitted 26 December, 2021; originally announced December 2021.

arXiv:2111.12179 [pdf, other]

Multifrequency 3D Elasticity Reconstruction with Structured Sparsity and ADMM

Authors: Shahed Mohammed, Mohammad Honarvar, Qi Zeng, Hoda Hashemi, Robert Rohling, Piotr Kozlowski, Septimiu Salcudean

Abstract: We introduce a model-based iterative method to obtain shear modulus images of tissue using magnetic resonance elastography. The method jointly finds the displacement field that best fits multifrequency tissue displacement data and the corresponding shear modulus. The displacement satisfies a viscoelastic wave equation constraint, discretized using the finite element method. Sparsifying regularization terms in both shear modulus and the displacement are used in the cost function minimized for the best fit. The formulated problem is bi-convex. Its solution can be obtained iteratively by using the alternating direction method of multipliers. Sparsifying regularizations and the wave equation constraint filter out sensor noise and compressional waves. Our method does not require bandpass filtering as a preprocessing step and converges fast irrespective of the initialization. We evaluate our new method in multiple in silico and phantom experiments, with comparisons with existing methods, and we show improvements in contrast to noise and signal to noise ratios. Results from an in vivo liver imaging study show elastograms with mean elasticity comparable to other values reported in the literature.

Submitted 23 November, 2021; originally announced November 2021.

arXiv:2107.12958 [pdf, other]

doi 10.1109/IPDPS53621.2022.00067

Adaptive Verifiable Coded Computing: Towards Fast, Secure and Private Distributed Machine Learning

Authors: Tingting Tang, Ramy E. Ali, Hanieh Hashemi, Tynan Gangwani, Salman Avestimehr, Murali Annavaram

Abstract: Stragglers, Byzantine workers, and data privacy are the main bottlenecks in distributed cloud computing. Some prior works proposed coded computing strategies to jointly address all three challenges. They require either a large number of workers, a significant communication cost or a significant computational complexity to tolerate Byzantine workers. Much of the overhead in prior schemes comes from the fact that they tightly couple coding for all three problems into a single framework. In this paper, we propose the Adaptive Verifiable Coded Computing (AVCC) framework, which decouples the Byzantine node detection challenge from the straggler tolerance. AVCC leverages coded computing just for handling stragglers and privacy, and then uses an orthogonal approach that leverages verifiable computing to mitigate Byzantine workers. Furthermore, AVCC dynamically adapts its coding scheme to trade off straggler tolerance with Byzantine protection. We evaluate AVCC on a compute-intensive distributed logistic regression application. Our experiments show that AVCC achieves up to $4.2\times$ speedup and up to $5.1\%$ accuracy improvement over the state-of-the-art Lagrange coded computing approach (LCC). AVCC also speeds up the conventional uncoded implementation of distributed logistic regression by up to $7.6\times$, and improves the test accuracy by up to $12.1\%$.

Submitted 22 March, 2022; v1 submitted 27 July, 2021; originally announced July 2021.

arXiv:2106.15085 [pdf, other]

Automatic Construction of Enterprise Knowledge Base

Authors: Junyi Chai, Yujie He, Homa Hashemi, Bing Li, Daraksha Parveen, Ranganath Kondapally, Wenjin Xu

Abstract: In this paper, we present a system for automatic knowledge base construction from large-scale enterprise documents with minimal human intervention. In the design and deployment of such a knowledge mining system for enterprise, we faced several challenges including data distributional shift, performance evaluation, compliance requirements and other practical issues. We leveraged state-of-the-art deep learning models to extract information (named entities and definitions) at the per-document level, then further applied classical machine learning techniques to process global statistical information to improve the knowledge base. Experimental results are reported on actual enterprise documents. This system is currently serving as part of a Microsoft 365 service.

Submitted 29 June, 2021; originally announced June 2021.

arXiv:2106.09227 [pdf, other]

Current Challenges and Future Directions in Podcast Information Access

Authors: Rosie Jones, Hamed Zamani, Markus Schedl, Ching-Wei Chen, Sravana Reddy, Ann Clifton, Jussi Karlgren, Helia Hashemi, Aasish Pappu, Zahra Nazari, Longqi Yang, Oguz Semerci, Hugues Bouchard, Ben Carterette

Abstract: Podcasts are spoken documents across a wide-range of genres and styles, with growing listenership across the world, and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we highlight the many differences between podcasts and other media, and discuss our perspective on challenges and future research directions in the domain of podcast information access.

Submitted 16 June, 2021; originally announced June 2021.

Comments: SIGIR 2021

arXiv:2105.02295 [pdf, other]

Byzantine-Robust and Privacy-Preserving Framework for FedML

Authors: Hanieh Hashemi, Yongqin Wang, Chuan Guo, Murali Annavaram

Abstract: Federated learning has emerged as a popular paradigm for collaboratively training a model from data distributed among a set of clients. This learning setting presents, among others, two unique challenges: how to protect privacy of the clients' data during training, and how to ensure integrity of the trained model. We propose a two-pronged solution that aims to address both challenges under a single framework. First, we propose to create secure enclaves using a trusted execution environment (TEE) within the server. Each client can then encrypt their gradients and send them to verifiable enclaves. The gradients are decrypted within the enclave without the fear of privacy breaches. However, robustness check computations in a TEE are computationally prohibitive. Hence, in the second step, we perform a novel gradient encoding that enables TEEs to encode the gradients and then offload Byzantine check computations to accelerators such as GPUs. Our proposed approach provides theoretical bounds on information leakage and offers a significant speed-up over the baseline in empirical evaluation.

Submitted 5 May, 2021; originally announced May 2021.

Journal ref: Security and Safety in Machine Learning Systems Workshop in ICLR 2021

arXiv:2105.00334 [pdf, other]

Privacy and Integrity Preserving Training Using Trusted Hardware

Authors: Hanieh Hashemi, Yongqin Wang, Murali Annavaram

Abstract: Privacy and security-related concerns are growing as machine learning reaches diverse application domains. The data holders want to train with private data while exploiting accelerators, such as GPUs, that are hosted in the cloud. However, cloud systems are vulnerable to attackers that compromise the privacy of data and integrity of computations. This work presents DarKnight, a framework for large DNN training while protecting input privacy and computation integrity. DarKnight relies on cooperative execution between trusted execution environments (TEE) and accelerators, where the TEE provides privacy and integrity verification, while accelerators perform the computation-heavy linear algebraic operations.

Submitted 1 May, 2021; originally announced May 2021.

Journal ref: Distributed and Private Machine Learning ICLR 2021 Workshop

arXiv:2103.03221 [pdf, ps, other]

GenoML: Automated Machine Learning for Genomics

Authors: Mary B. Makarious, Hampton L. Leonard, Dan Vitale, Hirotaka Iwaki, David Saffo, Lana Sargent, Anant Dadu, Eduardo Salmerón Castaño, John F. Carter, Melina Maleknia, Juan A. Botia, Cornelis Blauwendraat, Roy H. Campbell, Sayed Hadi Hashemi, Andrew B. Singleton, Mike A. Nalls, Faraz Faghri

Abstract: GenoML is a Python package automating machine learning workflows for genomics (genetics and multi-omics) with an open science philosophy. Genomics data require significant domain expertise to clean, pre-process, harmonize and perform quality control of the data. Furthermore, tuning, validation, and interpretation involve taking into account the biology and possibly the limitations of the underlying data collection, protocols, and technology. GenoML's mission is to bring machine learning for genomics and clinical data to non-experts by developing an easy-to-use tool that automates the full development, evaluation, and deployment process. Emphasis is put on open science to make workflows easily accessible, replicable, and transferable within the scientific community. Source code and documentation are available at https://genoml.com.

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2010.07541 [pdf, other]

Secure and Fault Tolerant Decentralized Learning

Authors: Saurav Prakash, Hanieh Hashemi, Yongqin Wang, Murali Annavaram, Salman Avestimehr

Abstract: Federated learning (FL) is a promising paradigm for training a global model over data distributed across multiple data owners without centralizing clients' raw data. However, sharing of local model updates can also reveal information of clients' local datasets. Trusted execution environments (TEEs) within the FL server have been recently deployed by companies like Meta for secure aggregation. However, secure aggregation can suffer from error-prone local updates sent by clients that become faulty during training due to underlying device malfunctions. Also, data heterogeneity across clients makes fault mitigation challenging, as even updates from normal clients are dissimilar. Thus, most of the prior fault tolerant methods, which treat any local update differing from the majority of other updates as faulty, perform poorly. We propose DiverseFL to make model aggregation secure as well as robust to faults. In DiverseFL, any client whose local model update diverges from its associated guiding update is tagged as being faulty. To implement our novel per-client criteria for fault mitigation, DiverseFL creates a TEE-based secure enclave within the FL server, which in addition to performing secure aggregation for carrying out the global model update step, securely receives a small representative sample of local data from each client only once before training, and computes guiding updates for each participating client during training. Thus, DiverseFL provides security against privacy leakage as well as robustness against faulty clients.
In experiments, DiverseFL consistently achieves significant improvements in absolute test accuracy over prior fault mitigation benchmarks. DiverseFL also performs closely to OracleSGD, where the server combines updates only from the normal clients. We also analyze the convergence rate of DiverseFL under non-IID data and standard convexity assumptions.

Submitted 13 September, 2022; v1 submitted 15 October, 2020; originally announced October 2020.

arXiv:2006.07548 [pdf, other]

Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search

Authors: Helia Hashemi, Hamed Zamani, W. Bruce Croft

Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search: document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.

Submitted 12 June, 2020; originally announced June 2020.

Comments: To appear in the Proceedings of ACM SIGIR 2020. 10 pages

arXiv:2006.01300 [pdf, other]

DarKnight: A Data Privacy Scheme for Training and Inference of Deep Neural Networks

Authors: Hanieh Hashemi, Yongqin Wang, Murali Annavaram

Abstract: Protecting the privacy of input data is of growing importance as machine learning methods reach new application domains. In this paper, we provide a unified training and inference framework for large DNNs while protecting input privacy and computation integrity. Our approach, called DarKnight, uses a novel data blinding strategy using matrix masking to create input obfuscation within a trusted execution environment (TEE). Our rigorous mathematical proof demonstrates that our blinding process provides an information-theoretic privacy guarantee by bounding information leakage. The obfuscated data can then be offloaded to any GPU for accelerating linear operations on blinded data. The results from linear operations on blinded data are decoded before performing non-linear operations within the TEE. This cooperative execution allows DarKnight to exploit the computational power of GPUs to perform linear operations while exploiting TEEs to protect input privacy. We implement DarKnight on an Intel SGX TEE augmented with a GPU to evaluate its performance.

Submitted 15 October, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

arXiv:2004.14020 [pdf, other]

Caramel: Accelerating Decentralized Distributed Deep Learning with Computation Scheduling

Authors: Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, Brighten Godfrey, Roy Campbell

Abstract: The method of choice for parameter aggregation in Deep Neural Network (DNN) training, a network-intensive task, is shifting from the Parameter Server model to decentralized aggregation schemes (AllReduce) inspired by theoretical guarantees of better performance. However, current implementations of AllReduce overlook the interdependence of communication and computation, resulting in significant performance degradation. In this paper, we develop Caramel, a system that accelerates decentralized distributed deep learning through model-aware computation scheduling and communication optimizations for AllReduce. Caramel achieves this goal through (a) computation DAG scheduling that expands the feasible window of transfer for each parameter (transfer boundaries), and (b) network optimizations for smoothening of the load including adaptive batching and pipelining of parameter transfers. Caramel maintains the correctness of the dataflow model, is hardware-independent, and does not require any user-level or framework-level changes. We implement Caramel over TensorFlow and show that the iteration time of DNN training can be improved by up to 3.62x in a cloud environment.

Submitted 29 April, 2020; originally announced April 2020.

arXiv:1905.08957 [pdf, other]

ANTIQUE: A Non-Factoid Question Answering Benchmark

Authors: Helia Hashemi, Mohammad Aliannejadi, Hamed Zamani, W. Bruce Croft

Abstract: Considering the widespread use of mobile and voice search, answer passage retrieval for non-factoid questions plays a critical role in modern information retrieval systems. Despite the importance of the task, the community still feels the significant lack of large-scale non-factoid question answering collections with real questions and comprehensive relevance judgments. In this paper, we develop and release a collection of 2,626 open-domain non-factoid questions from a diverse set of categories. The dataset, called ANTIQUE, contains 34,011 manual relevance annotations. The questions were asked by real users in a community question answering service, i.e., Yahoo! Answers. Relevance judgments for all the answers to each question were collected through crowdsourcing. To facilitate further research, we also include a brief analysis of the data as well as baseline results on both classical and recently developed neural IR models.

Submitted 19 August, 2019; v1 submitted 22 May, 2019; originally announced May 2019.

arXiv:1905.04264 [pdf, other]

PartitionedVC: Partitioned External Memory Graph Analytics Framework for SSDs

Authors: Kiran Kumar Matam, Hanieh Hashemi, Murali Annavaram

Abstract: Graph analytics are at the heart of a broad range of applications such as drug discovery, page ranking, and recommendation systems. When graph size exceeds memory size, out-of-core graph processing is needed. For the widely used external memory graph processing systems, accessing storage becomes the bottleneck. We make the observation that nearly all graph algorithms have a dynamically varying number of active vertices that must be processed in each iteration. However, existing graph processing frameworks, such as GraphChi, load the entire graph in each iteration even if a small fraction of the graph is active. This limitation is due to the structure of the data storage used by these systems. In this work, we propose to use a compressed sparse row (CSR) based graph storage that is more amenable for selectively loading only a few active vertices in each iteration. But CSR based systems suffer from random update propagation to many target vertices. To solve this challenge, we propose to use a multi-log update mechanism that logs updates separately, rather than directly updating the active edges in a graph. Our proposed multi-log system maintains a separate log per each vertex interval. This separation enables us to efficiently process each vertex interval by just loading the corresponding log. Further, while accessing SSD pages with fewer active vertex data, we reduce the read amplification due to the page granular accesses in SSD by logging the active vertex data in the current iteration and efficiently reading the log in the next iteration.
Over the current state of the art out-of-core graph processing framework, our PartitionedVC improves performance by up to $17.84\times$, $1.19\times$, $1.65\times$, $1.38\times$, $3.15\times$, and $6.00\times$ for the widely used bfs, pagerank, community detection, graph coloring, maximal independent set, and random-walk applications, respectively.

Submitted 11 February, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

Comments: 13 pages

arXiv:1904.06578 [pdf]

Deep-learning PDEs with unlabeled data and hardwiring physics laws

Authors: S. Mohammad H. Hashemi, Demetri Psaltis

Abstract: Providing fast and accurate solutions to partial differential equations is a problem of continuous interest to the fields of applied mathematics and physics. With the recent advances in machine learning, the adoption of learning techniques in this domain is being eagerly pursued. We build upon earlier works on linear and homogeneous PDEs, and develop convolutional deep neural networks that can accurately solve nonlinear and non-homogeneous equations without the need for labeled data. The architecture of these networks is readily accessible for scientific disciplines that deal with PDEs and know the basics of deep learning.

Submitted 13 April, 2019; originally announced April 2019.

arXiv:1812.04401 [pdf, other]

Output-Oblivious Stochastic Chemical Reaction Networks

Authors: Ben Chugg, Anne Condon, Hooman Hashemi

Abstract: We classify the functions $f:\mathbb{N}^2 \rightarrow \mathbb{N}$ which are stably computable by output-oblivious Stochastic Chemical Reaction Networks (CRNs), i.e., systems of reactions in which output species are never reactants. While it is known that precisely the semilinear functions are stably computable by CRNs, such CRNs sometimes rely on initially producing too many output species and then consuming the excess in order to reach a correct stable state. These CRNs may be difficult to integrate into larger systems: if the output of a CRN $\mathcal{C}$ becomes the input to a downstream CRN $\mathcal{C}'$, then $\mathcal{C}'$ could inadvertently consume too many outputs before $\mathcal{C}$ stabilizes. If, on the other hand, $\mathcal{C}$ is output-oblivious then $\mathcal{C}'$ may consume $\mathcal{C}$'s output as soon as it is available. In this work we prove that a semilinear function $f:\mathbb{N}^2 \rightarrow \mathbb{N}$ is stably computable by an output-oblivious CRN with a leader if and only if it is both increasing and either grid-affine (intuitively, its domains are congruence classes), or the minimum of a finite set of fissure functions (intuitively, functions behaving like the min function).

Submitted 30 August, 2022; v1 submitted 7 December, 2018; originally announced December 2018.

Comments: Published in OPODIS 2018. Latest version adds appendix containing all proofs

arXiv:1810.09819 [pdf, other]

doi 10.1051/0004-6361/201834150

3D mapping of young stars in the solar neighbourhood with Gaia DR2

Authors: E. Zari, H. Hashemi, A. G. A. Brown, K. Jardine, P. T. de Zeeuw

Abstract: We study the three dimensional arrangement of young stars in the solar neighbourhood using the second release of the Gaia mission (Gaia DR2) and we provide a new, original view of the spatial configuration of the star forming regions within 500 pc from the Sun. By smoothing the star distribution through a gaussian filter, we construct three dimensional density maps for early-type stars (upper-main sequence, UMS) and pre-main sequence (PMS) sources. The PMS and the UMS samples are selected through a combination of photometric and astrometric criteria. A side product of the analysis is a three dimensional, G-band extinction map, which we use to correct our colour-magnitude diagram for extinction and reddening. Both density maps show three prominent structures, Scorpius-Centaurus, Orion, and Vela. The PMS map shows a plethora of lower mass star forming regions, such as Taurus, Perseus, Cepheus, Cassiopeia, and Lacerta, which are less visible in the UMS map, due to the lack of large numbers of bright, early-type stars. We report the finding of a candidate new open cluster towards $l, b \sim 218.5^{\circ}, -2^{\circ}$, which could be related to the Orion star forming complex. We estimate ages for the PMS sample and we study the distribution of PMS stars as a function of their age. We find that younger stars cluster in dense, compact clumps, and are surrounded by older sources, whose distribution is instead more diffuse. The youngest groups that we find are mainly located in Scorpius-Centaurus, Orion, Vela, and Taurus.
Cepheus, Cassiopeia, and Lacerta are instead more evolved and less numerous. Finally, we find that the three dimensional density maps show no evidence for the existence of the ring-like structure which is usually referred to as the Gould Belt.

Submitted 6 November, 2018; v1 submitted 23 October, 2018; originally announced October 2018.

Comments: 17 pages, 17 figures, 6 appendixes; accepted for publication in A&A; image quality decreased to comply with the arXiv.org rules on file size

Journal ref: A&A 620, A172 (2018)

arXiv:1810.01953 [pdf]

The Effects Of Longitudinal And Circumferential Cracks On The Torsional Dynamic Response Of Shafts

Authors: Mohsen Nabian, Hamid Nayeb Hashemi

Abstract: Turbo-generator shafts are manufactured through the extrusion process. This results in the formation of weak planes along the extrusion process. It has been observed that large longitudinal cracks often form in these shafts before any circumferential cracks when these shafts are subjected to cyclic torsion due to electrical line faults. The presence of these cracks could severely compromise the shaft resonance frequencies. The dynamic response of shafts with longitudinal and circumferential cracks is investigated. The longitudinally cracked section of the shaft is modeled as a shaft with reduced torsional rigidity. The torsional rigidity is obtained as a function of the crack depth. It was found that, for various shaft diameters, torsional rigidity could be represented as a function of crack depth/shaft radius only. The circumferentially cracked section is modeled as a torsional spring. The torsional spring constant has been obtained using fracture mechanics. It was found that the resonance frequency of the shaft may be little affected by the presence of a longitudinal crack. The resonance frequencies of the shaft with the circumferential crack depend on the crack length and location. The effects of crack surface interactions for both longitudinal and circumferential cracks were also investigated.

Submitted 11 August, 2018; originally announced October 2018.

arXiv:1803.03288 [pdf, other]

TicTac: Accelerating Distributed Deep Learning with Communication Scheduling

Authors: Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, Roy H. Campbell

Abstract: State-of-the-art deep learning systems rely on iterative distributed training to tackle the increasing complexity of models and input data. The iteration time in these communication-heavy systems depends on the computation time, communication time and the extent of overlap of computation and communication. In this work, we identify a shortcoming in systems with graph representation for computation, such as TensorFlow and PyTorch, that results in high variance in iteration time: random order of received parameters across workers. We develop a system, TicTac, to improve the iteration time by fixing this issue in distributed deep learning with Parameter Servers while guaranteeing near-optimal overlap of communication and computation. TicTac identifies and enforces an order of network transfers which improves the iteration time using prioritization. Our system is implemented over TensorFlow and requires no changes to the model or developer inputs. TicTac improves the throughput by up to $37.7\%$ in inference and $19.2\%$ in training, while also reducing straggler effect by up to $2.3\times$. Our code is publicly available.

Submitted 3 October, 2018; v1 submitted 8 March, 2018; originally announced March 2018.

arXiv:1710.00112 [pdf]

Toward Scalable Machine Learning and Data Mining: the Bioinformatics Case

Authors: Faraz Faghri, Sayed Hadi Hashemi, Mohammad Babaeizadeh, Mike A. Nalls, Saurabh Sinha, Roy H. Campbell

Abstract: In an effort to overcome the data deluge in computational biology and bioinformatics and to facilitate bioinformatics research in the era of big data, we identify some of the most influential algorithms that have been widely used in the bioinformatics community. These top data mining and machine learning algorithms cover classification, clustering, regression, graphical model-based learning, and dimensionality reduction. The goal of this study is to guide the focus of scalable computing experts in the endeavor of applying new storage and scalable computation designs to bioinformatics algorithms that merit their attention most, following the engineering maxim of "optimize the common case".

Submitted 29 September, 2017; originally announced October 2017.

arXiv:1710.00110 [pdf, other]

Decentralized User-Centric Access Control using PubSub over Blockchain

Authors: Sayed Hadi Hashemi, Faraz Faghri, Roy H Campbell

Abstract: We present a mechanism that puts users in the center of control and empowers them to dictate the access to their collections of data. Revisiting the fundamental mechanisms in security for providing protection, our solution uses capabilities, access lists, and access rights following well-understood formal notions for reasoning about access. This contribution presents a practical, correct, auditable, transparent, distributed, and decentralized mechanism that is well-matched to the current emerging environments including Internet of Things, smart city, precision medicine, and autonomous cars. It is based on well-tested principles and practices used in distributed authorization, cryptocurrencies, and scalable computing.

Submitted 29 September, 2017; originally announced October 2017.

arXiv:1612.00521 [pdf, other]

Performance Modeling of Distributed Deep Neural Networks

Authors: Sayed Hadi Hashemi, Shadi A. Noghabi, William Gropp, Roy H Campbell

Abstract: During the past decade, machine learning has become extremely popular and can be found in many aspects of our everyday life. Nowadays, with the explosion of data and the rapid growth of computation capacity, Distributed Deep Neural Networks (DDNNs), which can improve their performance linearly with more computation resources, have become hot and trending. However, there has not been an in-depth study of the performance of these systems, and how well they scale. In this paper we analyze CNTK, one of the most commonly used DDNNs, by first building a performance model and then evaluating the system in two settings: a small cluster with all nodes in a single rack connected to a top of rack switch, and in large scale using Blue Waters with arbitrary placement of nodes. Our main focus was the scalability of the system with respect to adding more nodes. Based on our results, this system has an excessive initialization overhead because of poor I/O utilization which dominates the whole execution time. Because of this, the system does not scale beyond a few nodes (4 in Blue Waters). Additionally, due to a single-server, multiple-worker design, the server becomes a bottleneck after 16 nodes, limiting the scalability of CNTK.

Submitted 14 December, 2016; v1 submitted 1 December, 2016; originally announced December 2016.

arXiv:1607.04768 [pdf, other]

Hoffmann-Ostenhof's conjecture for traceable cubic graphs

Authors: F. Abdolhosseini, S. Akbari, H. Hashemi, M. S. Moradian

Abstract: It was conjectured by Hoffmann-Ostenhof that the edge set of every connected cubic graph can be decomposed into a spanning tree, a matching and a family of cycles. In this paper, we show that this conjecture holds for traceable cubic graphs.

Submitted 16 July, 2016; originally announced July 2016.

MSC Class: 05C45; 05C70 (Primary)

arXiv:1409.7637 [pdf]

Experimental Demonstration of Nanosecond Accuracy Wireless Network Synchronization

Authors: Marcelo Segura, S. Niranjayan, Hossein Hashemi, Andreas F. Molisch

Abstract: Accurate wireless timing synchronization has been an extremely important topic in wireless sensor networks, required in applications ranging from distributed beam forming to precision localization and navigation. However, it is very challenging to realize, in particular when the required accuracy should be better than the runtime between the nodes. This work presents, to our knowledge for the first time, an experimental timing synchronization scheme that achieves a timing accuracy better than 5ns rms in a network with 4 nodes. The experimental hardware is built from commercially available components and based on software defined ultra wideband transceivers. The protocol for establishing the synchronization is based on our recently developed blink protocol that can scale from the small network demonstrated here to larger networks of hundreds or thousands of nodes.

Submitted 26 September, 2014; originally announced September 2014.

Comments: Submitted to ICC 2015

Showing 1–50 of 56 results for author: Hashemi, H