Preserving Privacy and Utility in LLM-Based Product Recommendations

Authors: Tina Khezresmaeilzadeh, Jiang Zhang, Dimitrios Andreadis, Konstantinos Psounis

Abstract: Large Language Model (LLM)-based recommendation systems leverage powerful language models to generate personalized suggestions by processing user interactions and preferences. Unlike traditional recommendation systems that rely on structured data and collaborative filtering, LLM-based models process textual and contextual information, often using cloud-based infrastructure. This raises privacy concerns, as user data is transmitted to remote servers, increasing the risk of exposure and reducing control over personal information. To address this, we propose a hybrid privacy-preserving recommendation framework which separates sensitive from nonsensitive data and only shares the latter with the cloud to harness LLM-powered recommendations. To restore lost recommendations related to obfuscated sensitive data, we design a de-obfuscation module that reconstructs sensitive recommendations locally. Experiments on real-world e-commerce datasets show that our framework achieves almost the same recommendation utility as a system which shares all data with an LLM, while preserving privacy to a large extent. Compared to obfuscation-only techniques, our approach improves HR@10 scores and category distribution alignment, offering a better balance between privacy and recommendation quality. Furthermore, our method runs efficiently on consumer-grade hardware, making privacy-aware LLM-based recommendation systems practical for real-world use.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2503.03160 [pdf, other]

SpinML: Customized Synthetic Data Generation for Private Training of Specialized ML Models

Authors: Jiang Zhang, Rohan Xavier Sequeira, Konstantinos Psounis

Abstract: Specialized machine learning (ML) models tailored to users' needs and requests are increasingly being deployed on smart devices with cameras, to provide personalized intelligent services taking advantage of camera data. However, two primary challenges hinder the training of such models: the lack of publicly available labeled data suitable for specialized tasks and the inaccessibility of labeled private data due to concerns about user privacy. To address these challenges, we propose a novel system SpinML, where the server generates customized Synthetic image data to Privately traIN a specialized ML model tailored to the user request, using only a few sanitized reference images from the user. SpinML offers users fine-grained, object-level control over the reference images, which allows users to trade off the privacy and utility of the generated synthetic data according to their privacy preferences. Through experiments on three specialized model training tasks, we demonstrate that our proposed system can enhance the performance of specialized models without compromising users' privacy preferences.

Submitted 7 April, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

Comments: 17 pages (with appendix), 6 figures, Accepted at The 25th Privacy Enhancing Technologies Symposium (PETS2025)

arXiv:2503.00659 [pdf, other]

CATS: A framework for Cooperative Autonomy Trust & Security

Authors: Namo Asavisanu, Tina Khezresmaeilzadeh, Rohan Sequeira, Hang Qiu, Fawad Ahmad, Konstantinos Psounis, Ramesh Govindan

Abstract: With cooperative perception, autonomous vehicles can wirelessly share sensor data and representations to overcome sensor occlusions, improving situational awareness. Securing such data exchanges is crucial for connected autonomous vehicles. Existing automated reputation-based approaches often suffer from a delay between detection and exclusion of misbehaving vehicles, while majority-based approaches have communication overheads that limit scalability. In this paper, we introduce CATS, a novel automated system that blends together the best traits of reputation-based and majority-based detection mechanisms to secure vehicle-to-everything (V2X) communications for cooperative perception, while preserving the privacy of cooperating vehicles. Our evaluation with city-scale simulations on realistic traffic data shows CATS's effectiveness in rapidly identifying and isolating misbehaving vehicles, with a low false negative rate and low overheads, proving its suitability for real-world deployments.

Submitted 1 March, 2025; originally announced March 2025.

arXiv:2411.01492 [pdf, other]

EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark

Authors: Ming Li, Jike Zhong, Tianle Chen, Yuxiang Lai, Konstantinos Psounis

Abstract: Recent studies on large language models (LLMs) and large multimodal models (LMMs) have demonstrated promising skills in various domains including science and mathematics. However, their capability in more challenging and real-world related scenarios like engineering has not been systematically studied. To bridge this gap, we propose EEE-Bench, a multimodal benchmark aimed at assessing LMMs' capabilities in solving practical engineering tasks, using electrical and electronics engineering (EEE) as the testbed. Our benchmark consists of 2860 carefully curated problems spanning 10 essential subdomains such as analog circuits, control systems, etc. Compared to benchmarks in other domains, engineering problems are intrinsically 1) more visually complex and versatile and 2) less deterministic in solutions. Successful solutions to these problems often demand more-than-usual rigorous integration of visual and textual information as models need to understand intricate images like abstract circuits and system diagrams while taking professional instructions, making them excellent candidates for LMM evaluations. Alongside EEE-Bench, we provide extensive quantitative evaluations and fine-grained analysis of 17 widely-used open- and closed-source LLMs and LMMs. Our results demonstrate notable deficiencies of current foundation models in EEE, with an average performance ranging from 19.48% to 46.78%. Finally, we reveal and explore a critical shortcoming in LMMs which we term laziness: the tendency to take shortcuts by relying on the text while overlooking the visual context when reasoning for technical image problems. In summary, we believe EEE-Bench not only reveals some noteworthy limitations of LMMs but also provides a valuable resource for advancing research on their application in practical engineering tasks, driving future improvements in their capability to handle complex, real-world scenarios.

Submitted 27 February, 2025; v1 submitted 3 November, 2024; originally announced November 2024.

Comments: Accepted to CVPR 2025

arXiv:2409.07444 [pdf, other]

Echoes of Privacy: Uncovering the Profiling Practices of Voice Assistants

Authors: Tina Khezresmaeilzadeh, Elaine Zhu, Kiersten Grieco, Daniel J. Dubois, Konstantinos Psounis, David Choffnes

Abstract: Many companies, including Google, Amazon, and Apple, offer voice assistants as a convenient solution for answering general voice queries and accessing their services. These voice assistants have gained popularity and can be easily accessed through various smart devices such as smartphones, smart speakers, smartwatches, and an increasing array of other devices. However, this convenience comes with potential privacy risks. For instance, while companies vaguely mention in their privacy policies that they may use voice interactions for user profiling, it remains unclear to what extent this profiling occurs and whether voice interactions pose greater privacy risks compared to other interaction modalities. In this paper, we conduct 1171 experiments involving a total of 24530 queries with different personas and interaction modalities over the course of 20 months to characterize how the three most popular voice assistants profile their users. We analyze factors such as the labels assigned to users, their accuracy, the time taken to assign these labels, differences between voice and web interactions, and the effectiveness of profiling remediation tools offered by each voice assistant. Our findings reveal that profiling can happen without interaction, can be incorrect and inconsistent at times, may take several days to weeks for changes to occur, and can be influenced by the interaction modality.

Submitted 13 September, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

arXiv:2405.04551 [pdf, other]

Differentially Private Federated Learning without Noise Addition: When is it Possible?

Authors: Jiang Zhang, Konstantinos Psounis, Salman Avestimehr

Abstract: Federated Learning (FL) with Secure Aggregation (SA) has gained significant attention as a privacy-preserving framework for training machine learning models while preventing the server from learning information about users' data from their individual encrypted model updates. Recent research has extended privacy guarantees of FL with SA by bounding the information leakage through the aggregate model over multiple training rounds thanks to leveraging the "noise" from other users' updates. However, the privacy metric used in that work (mutual information) measures the on-average privacy leakage, without providing any privacy guarantees for worst-case scenarios. To address this, in this work we study the conditions under which FL with SA can provide worst-case differential privacy guarantees. Specifically, we formally identify the necessary condition under which SA can provide DP without additional noise. We then prove that when the randomness inside the aggregated model update is Gaussian with a non-singular covariance matrix, SA can provide differential privacy guarantees with the level of privacy $ε$ bounded by the reciprocal of the minimum eigenvalue of the covariance matrix. However, we further demonstrate that in practice, these conditions are unlikely to hold and hence additional noise added in model updates is still required in order for SA in FL to achieve DP. Lastly, we discuss the potential solution of leveraging inherent randomness inside the aggregated model update to reduce the amount of additional noise required for a DP guarantee.

Submitted 23 October, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

arXiv:2312.08303 [pdf, other]

Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models

Authors: Jiang Zhang, Qiong Wu, Yiming Xu, Cheng Cao, Zheng Du, Konstantinos Psounis

Abstract: Toxic content detection is crucial for online services to remove inappropriate content that violates community standards. To automate the detection process, prior works have proposed a variety of machine learning (ML) approaches to train Language Models (LMs) for toxic content detection. However, both their accuracy and transferability across datasets are limited. Recently, Large Language Models (LLMs) have shown promise in toxic content detection due to their superior zero-shot and few-shot in-context learning ability as well as broad transferability on ML tasks. However, efficiently designing prompts for LLMs remains challenging. Moreover, the high run-time cost of LLMs may hinder their deployment in production. To address these challenges, in this work, we propose BD-LLM, a novel and efficient approach to Bootstrapping and Distilling LLMs for toxic content detection. Specifically, we design a novel prompting method named Decision-Tree-of-Thought (DToT) to bootstrap LLMs' detection performance and extract high-quality rationales. DToT can automatically select more fine-grained context to re-prompt LLMs when their responses lack confidence. Additionally, we use the rationales extracted via DToT to fine-tune student LMs. Our experimental results on various datasets demonstrate that DToT can improve the accuracy of LLMs by up to 4.6%. Furthermore, student LMs fine-tuned with rationales extracted via DToT outperform baselines on all datasets with up to 16.9% accuracy improvement, while being more than 60x smaller than conventional LLMs. Finally, we observe that student LMs fine-tuned with rationales exhibit better cross-dataset transferability.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2308.03164 [pdf, other]

FireFly A Synthetic Dataset for Ember Detection in Wildfire

Authors: Yue Hu, Xinan Ye, Yifei Liu, Souvik Kundu, Gourav Datta, Srikar Mutnuri, Namo Asavisanu, Nora Ayanian, Konstantinos Psounis, Peter Beerel

Abstract: This paper presents "FireFly", a synthetic dataset for ember detection created using Unreal Engine 4 (UE4), designed to overcome the current lack of ember-specific training resources. To create the dataset, we present a tool that allows the automated generation of the synthetic labeled dataset with adjustable parameters, enabling data diversity from various environmental conditions, making the dataset both diverse and customizable based on user requirements. We generated a total of 19,273 frames that have been used to evaluate FireFly on four popular object detection models. Further, to minimize human intervention, we leveraged a trained model to create a semi-automatic labeling process for real-life ember frames. Moreover, we demonstrated an up to 8.57% improvement in mean Average Precision (mAP) in real-world wildfire scenarios compared to models trained exclusively on a small real dataset.

Submitted 6 August, 2023; originally announced August 2023.

Comments: Artificial Intelligence (AI) and Humanitarian Assistance and Disaster Recovery (HADR) workshop, ICCV 2023 in Paris, France

ACM Class: I.4

arXiv:2210.08136 [pdf, other]

A Utility-Preserving Obfuscation Approach for YouTube Recommendations

Authors: Jiang Zhang, Hadi Askari, Konstantinos Psounis, Zubair Shafiq

Abstract: Online content platforms optimize engagement by providing personalized recommendations to their users. These recommendation systems track and profile users to predict relevant content a user is likely interested in. While the personalized recommendations provide utility to users, the tracking and profiling that enable them pose a privacy issue because the platform might infer potentially sensitive user interests. There is increasing interest in building privacy-enhancing obfuscation approaches that do not rely on cooperation from online content platforms. However, existing obfuscation approaches primarily focus on enhancing privacy but at the same time they degrade the utility because obfuscation introduces unrelated recommendations. We design and implement De-Harpo, an obfuscation approach for YouTube's recommendation system that not only obfuscates a user's video watch history to protect privacy but then also denoises the video recommendations by YouTube to preserve their utility. In contrast to prior obfuscation approaches, De-Harpo adds a denoiser that makes use of a "secret" input (i.e., a user's actual watch history) as well as information that is also available to the adversarial recommendation system (i.e., obfuscated watch history and corresponding "noisy" recommendations). Our large-scale evaluation of De-Harpo shows that it outperforms the state-of-the-art by a factor of 2x in terms of preserving utility for the same level of privacy, while maintaining stealthiness and robustness to de-obfuscation.

Submitted 16 June, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

arXiv:2208.02304 [pdf, other]

How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?

Authors: Ahmed Roushdy Elkordy, Jiang Zhang, Yahya H. Ezzeldin, Konstantinos Psounis, Salman Avestimehr

Abstract: Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users while avoiding moving the data off-device. However, while data never leaves users' devices, privacy still cannot be guaranteed since significant computations on users' training data are shared in the form of trained local models. These local models have recently been shown to pose a substantial privacy threat through different privacy attacks such as model inversion attacks. As a remedy, Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL, by guaranteeing the server can only learn the global aggregated model update but not the individual model updates. While SA ensures no additional information is leaked about the individual model update beyond the aggregated model update, there are no formal guarantees on how much privacy FL with SA can actually offer, as information about the individual dataset can still potentially leak through the aggregated model computed at the server. In this work, we perform a first analysis of the formal privacy guarantees for FL with SA. Specifically, we use Mutual Information (MI) as a quantification metric and derive upper bounds on how much information about each user's dataset can leak through the aggregated model update. When using the FedSGD aggregation algorithm, our theoretical bounds show that the amount of privacy leakage reduces linearly with the number of users participating in FL with SA. To validate our theoretical bounds, we use an MI Neural Estimator to empirically evaluate the privacy leakage under different FL setups on both the MNIST and CIFAR10 datasets. Our experiments verify our theoretical bounds for FedSGD, which show a reduction in privacy leakage as the number of users and local batch size grow, and an increase in privacy leakage with the number of training rounds.

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted to appear in Proceedings on Privacy Enhancing Technologies (PoPETs) 2023

arXiv:2202.03679 [pdf, other]

A Unified Prediction Framework for Signal Maps

Authors: Emmanouil Alimpertis, Athina Markopoulou, Carter T. Butts, Evita Bakopoulou, Konstantinos Psounis

Abstract: Signal maps are essential for the planning and operation of cellular networks. However, the measurements needed to create such maps are expensive, are often biased, do not always reflect the metrics of interest, and pose privacy risks. In this paper, we develop a unified framework for predicting cellular signal maps from limited measurements. Our framework builds on a state-of-the-art random-forest predictor, or any other base predictor. We propose and combine three mechanisms that deal with the fact that not all measurements are equally important for a particular prediction task. First, we design quality-of-service functions ($Q$), including signal strength (RSRP) but also other metrics of interest to operators, i.e., coverage and call drop probability. By implicitly altering the loss function employed in learning, quality functions can also improve prediction for RSRP itself where it matters (e.g., MSE reduction up to 27% in the low signal strength regime, where errors are critical). Second, we introduce weight functions ($W$) to specify the relative importance of prediction at different locations and other parts of the feature space. We propose re-weighting based on importance sampling to obtain unbiased estimators when the sampling and target distributions are different. This yields improvements up to 20% for targets based on spatially uniform loss or losses based on user population density. Third, we apply the Data Shapley framework for the first time in this context to assign values ($φ$) to individual measurement points, which capture the importance of their contribution to the prediction task. This improves prediction (e.g., from 64% to 94% in recall for coverage loss) by removing points with negative values, and can also enable data minimization. We evaluate our methods and demonstrate significant improvement in prediction performance, using several real-world datasets.

Submitted 12 February, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: Coverage Maps; Signal Strength Maps; LTE; RSRP; CQI; RSRQ; RSS; Importance Sampling; Random Forests; Carrier's Objectives; Call Drops; Key Performance Indicators

arXiv:2201.04782 [pdf, other]

Privacy-Utility Trades in Crowdsourced Signal Map Obfuscation

Authors: Jiang Zhang, Lillian Clark, Matthew Clark, Konstantinos Psounis, Peter Kairouz

Abstract: Cellular providers and data aggregating companies crowdsource cellular signal strength measurements from user devices to generate signal maps, which can be used to improve network performance. Recognizing that this data collection may be at odds with growing awareness of privacy concerns, we consider obfuscating such data before the data leaves the mobile device. The goal is to increase privacy such that it is difficult to recover sensitive features from the obfuscated data (e.g. user ids and user whereabouts), while still allowing network providers to use the data for improving network services (i.e. create accurate signal maps). To examine this privacy-utility tradeoff, we identify privacy and utility metrics and threat models suited to signal strength measurements. We then obfuscate the measurements using several preeminent techniques, spanning differential privacy, generative adversarial privacy, and information-theoretic privacy techniques, in order to benchmark a variety of promising obfuscation approaches and provide guidance to real-world engineers who are tasked with building signal maps that protect privacy without hurting utility. Our evaluation results, based on multiple, diverse, real-world signal map datasets, demonstrate the feasibility of concurrently achieving adequate privacy and utility, with obfuscation strategies which use the structure and intended use of datasets in their design, and target average-case, rather than worst-case, guarantees.

Submitted 12 January, 2022; originally announced January 2022.

arXiv:2112.14947 [pdf, other]

doi 10.1145/3498361.3538925

AutoCast: Scalable Infrastructure-less Cooperative Perception for Distributed Collaborative Driving

Authors: Hang Qiu, Pohan Huang, Namo Asavisanu, Xiaochen Liu, Konstantinos Psounis, Ramesh Govindan

Abstract: Autonomous vehicles use 3D sensors for perception. Cooperative perception enables vehicles to share sensor readings with each other to improve safety. Prior work in cooperative perception scales poorly even with infrastructure support. AutoCast enables scalable infrastructure-less cooperative perception using direct vehicle-to-vehicle communication. It carefully determines which objects to share based on positional relationships between traffic participants, and the time evolution of their trajectories. It coordinates vehicles and optimally schedules transmissions in a distributed fashion. Extensive evaluation results under different scenarios show that, unlike competing approaches, AutoCast can avoid crashes and near-misses which occur frequently without cooperative perception; its performance scales gracefully in dense traffic scenarios, providing 2-4x visibility into safety-critical objects compared to existing cooperative perception schemes; its transmission schedules can be completed on the real radio testbed; and its scheduling algorithm is near-optimal with negligible computation overhead.

Submitted 30 December, 2021; originally announced December 2021.

Journal ref: ACM Mobisys 2022

arXiv:2112.03452 [pdf, other]

doi 10.1109/TMC.2023.3332034

Location Leakage in Federated Signal Maps

Authors: Evita Bakopoulou, Mengwei Yang, Jiang Zhang, Konstantinos Psounis, Athina Markopoulou

Abstract: We consider the problem of predicting cellular network performance (signal maps) from measurements collected by several mobile devices. We formulate the problem within the online federated learning framework: (i) federated learning (FL) enables users to collaboratively train a model, while keeping their training data on their devices; (ii) measurements are collected as users move around over time and are used for local training in an online fashion. We consider an honest-but-curious server, who observes the updates from target users participating in FL and infers their location using a deep leakage from gradients (DLG) type of attack, originally developed to reconstruct training data of DNN image classifiers. We make the key observation that a DLG attack, applied to our setting, infers the average location of a batch of local data, and can thus be used to reconstruct the target users' trajectory at a coarse granularity. We build on this observation to protect location privacy, in our setting, by revisiting and designing mechanisms within the federated learning framework including: tuning the FL parameters for averaging, curating local batches so as to mislead the DLG attacker, and aggregating across multiple users with different trajectories. We evaluate the performance of our algorithms through both analysis and simulation based on real-world mobile datasets, and we show that they achieve a good privacy-utility tradeoff.

Submitted 5 January, 2024; v1 submitted 6 December, 2021; originally announced December 2021.

arXiv:2111.05792 [pdf, other]

doi 10.14722/ndss.2022.23062

HARPO: Learning to Subvert Online Behavioral Advertising

Authors: Jiang Zhang, Konstantinos Psounis, Muhammad Haroon, Zubair Shafiq

Abstract: Online behavioral advertising, and the associated tracking paraphernalia, poses a real privacy threat. Unfortunately, existing privacy-enhancing tools are not always effective against online advertising and tracking. We propose Harpo, a principled learning-based approach to subvert online behavioral advertising through obfuscation. Harpo uses reinforcement learning to adaptively interleave real page visits with fake pages to distort a tracker's view of a user's browsing profile. We evaluate Harpo against real-world user profiling and ad targeting models used for online behavioral advertising. The results show that Harpo improves privacy by triggering more than 40% incorrect interest segments and 6x higher bid values. Harpo outperforms existing obfuscation tools by as much as 16x for the same overhead. Harpo is also able to achieve better stealthiness to adversarial detection than existing obfuscation tools. Harpo meaningfully advances the state-of-the-art in leveraging obfuscation to subvert online behavioral advertising.

Submitted 23 November, 2021; v1 submitted 8 November, 2021; originally announced November 2021.

Comments: Accepted at NDSS'22

arXiv:2007.04376 [pdf, other]

TEAM: Trilateration for Exploration and Mapping with Robotic Networks

Authors: Lillian Clark, Charles Andre, Joseph Galante, Bhaskar Krishnamachari, Konstantinos Psounis

Abstract: Motivated by lunar exploration, we consider deploying a network of mobile robots to explore an unknown environment while acting as a cooperative positioning system. Robots measure and communicate position-related data in order to perform localization in the absence of infrastructure-based solutions (e.g. stationary beacons or GPS). We present Trilateration for Exploration and Mapping (TEAM), a novel algorithm for low-complexity localization and mapping with robotic networks. TEAM is designed to leverage the capability of commercially-available ultra-wideband (UWB) radios on board the robots to provide range estimates with centimeter accuracy and perform anchorless localization in a shared, stationary frame. It is well-suited for feature-deprived environments, where feature-based localization approaches suffer. We provide experimental results in varied Gazebo simulation environments as well as on a testbed of Turtlebot3 Burgers with Pozyx UWB radios. We compare TEAM to the popular Rao-Blackwellized Particle Filter for Simultaneous Localization and Mapping (SLAM). We demonstrate that TEAM requires an order of magnitude less computational complexity and reduces the necessary sample rate of LiDAR measurements by an order of magnitude. These advantages do not require sacrificing performance, as TEAM reduces the maximum localization error by 50% and achieves up to a 28% increase in map accuracy in feature-deprived environments and comparable map accuracy in other settings.

Submitted 15 April, 2021; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: 8 pages, 15 figures, 2021

arXiv:2004.04824 [pdf, other]

Optimal User-Cell Association for 360 Video Streaming over Dense Wireless Networks

Authors: Po-Han Huang, Konstantinos Psounis

Abstract: Delivering 360 degree video streaming for virtual and augmented reality presents many technical challenges especially in bandwidth starved wireless environments. Recently, a so-called two-tier approach has been proposed which delivers a basic-tier chunk and select enhancement-tier chunks to improve user experience while reducing network resources consumption. The video chunks are to be transmitted via unicast or multicast over an ultra-dense small cell infrastructure with enough bandwidth where small cells store video chunks in local caches. In this setup, user-cell association algorithms play a central role to efficiently deliver video since users may only download video chunks from the cell they are associated with. Motivated by this, we jointly formulate the problem of user-cell association and video chunk multicasting/unicasting as a mixed integer linear programming, prove its NP-hardness, and study the optimal solution via the Branch-and-Bound method. We then propose two polynomial-time, approximation algorithms and show via extensive simulations that they are near-optimal in practice and improve user experience by 30% compared to baseline user-cell association schemes.

Submitted 18 April, 2020; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: 12 pages

arXiv:1607.02598 [pdf, other]

Security Pricing as an Enabler of Cyber-Insurance: A First Look at Differentiated Pricing Markets

Authors: Ranjan Pal, Leana Golubchik, Konstantinos Psounis, Pan Hui

Abstract: Despite the promising potential of network risk management services (e.g., cyber-insurance) to improve information security, their deployment is relatively scarce, primarily due to such service companies being unable to guarantee profitability. As a novel approach to making cyber-insurance services more viable, we explore a symbiotic relationship between security vendors (e.g., Symantec) capable of price differentiating their clients, and cyber-insurance agencies having possession of information related to the security investments of their clients. The goal of this relationship is to (i) allow security vendors to price differentiate their clients based on security investment information from insurance agencies, (ii) allow the vendors to make more profit than in homogeneous pricing settings, and (iii) subsequently transfer some of the extra profit to cyber-insurance agencies to make insurance services more viable. In this paper, we perform a theoretical study of a market for differentiated security product pricing, primarily with a view to ensuring that security vendors (SVs) make more profit in the differentiated pricing case as compared to the case of non-differentiated pricing.
In order to practically realize such pricing markets, we propose novel and computationally efficient consumer differentiated pricing mechanisms for SVs based on (i) the market structure, (ii) the communication network structure of SV consumers captured via a consumer's Bonacich centrality in the network, and (iii) security investment amounts made by SV consumers.

Submitted 9 July, 2016; originally announced July 2016.

Comments: arXiv admin note: text overlap with arXiv:1101.5617 by other authors without attribution

arXiv:1405.0089 [pdf, other]

Performance Modeling of Next-Generation Wireless Networks

Authors: Antonios Michaloliakos, Ryan Rogalin, Yonglong Zhang, Konstantinos Psounis, Giuseppe Caire

Abstract: The industry is satisfying the increasing demand for wireless bandwidth by densely deploying a large number of access points which are centrally managed, e.g. enterprise WiFi networks deployed in university campuses, companies, airports etc. This small cell architecture is gaining traction in the cellular world as well, as witnessed by the direction in which 4G+ and 5G standardization is moving. Prior academic work in analyzing such large-scale wireless networks either uses oversimplified models for the physical layer, or ignores other important, real-world aspects of the problem, like MAC layer considerations, topology characteristics, and protocol overhead. On the other hand, the industry is using for deployment purposes on-site surveys and simulation tools which do not scale, cannot efficiently optimize the design of such a network, and do not explain why one design choice is better than another. In this paper we introduce a simple yet accurate analytical model which combines the realism and practicality of industrial simulation tools with the ability to scale, analyze the effect of various design parameters, and optimize the performance of real-world deployments. The model takes into account all central system parameters, including channelization, power allocation, user scheduling, load balancing, MAC, advanced PHY techniques (single and multi user MIMO as well as cooperative transmission from multiple access points), topological characteristics and protocol overhead.
The accuracy of the model is verified via extensive simulations and the model is used to study a wide range of real world scenarios, providing design guidelines on the effect of various design parameters on performance.

Submitted 1 May, 2014; originally announced May 2014.

arXiv:1310.7001 [pdf, other]

Scalable Synchronization and Reciprocity Calibration for Distributed Multiuser MIMO

Authors: Ryan Rogalin, Ozgun Bursalioglu, Haralabos Papadopoulos, Giuseppe Caire, Andreas Molisch, Antonios Michaloliakos, Vlad Balan, Konstantinos Psounis

Abstract: Large-scale distributed Multiuser MIMO (MU-MIMO) is a promising wireless network architecture that combines the advantages of "massive MIMO" and "small cells." It consists of several Access Points (APs) connected to a central server via a wired backhaul network and acting as a large distributed antenna system. We focus on the downlink, which is both more demanding in terms of traffic and more challenging in terms of implementation than the uplink. In order to enable multiuser joint precoding of the downlink signals, channel state information at the transmitter side is required. We consider Time Division Duplex (TDD), where the downlink channels can be learned from the user uplink pilot signals, thanks to channel reciprocity. Furthermore, coherent multiuser joint precoding is possible only if the APs maintain a sufficiently accurate relative timing and phase synchronization. AP synchronization and TDD reciprocity calibration are two key problems to be solved in order to enable distributed MU-MIMO downlink. In this paper, we propose novel over-the-air synchronization and calibration protocols that scale well with the network size. The proposed schemes can be applied to networks formed by a large number of APs, each of which is driven by an inexpensive 802.11-grade clock and has a standard RF front-end, not explicitly designed to be reciprocal. Our protocols can incorporate, as a building block, any suitable timing and frequency estimator.
Here we revisit the problem of joint ML timing and frequency estimation and use the corresponding Cramer-Rao bound to evaluate the performance of the synchronization protocol. Overall, the proposed synchronization and calibration schemes are shown to achieve sufficient accuracy for satisfactory distributed MU-MIMO performance.

Submitted 31 March, 2015; v1 submitted 25 October, 2013; originally announced October 2013.

Comments: Replaced Figure 5 with correct version

arXiv:1208.2002 [pdf, ps, other]

Tag Spotting at the Interference Range

Authors: Horia Vlad Balan, Konstantinos Psounis

Abstract: In wireless networks, the presence of interference among wireless links introduces dependencies among flows that do not share a single link or node. As a result, when designing a resource allocation scheme, be it a medium access scheduler or a flow rate controller, one needs to consider the interdependence among nodes within interference range of each other. Specifically, control plane information needs to reach nearby nodes which often lie outside the communication range, but within the interference range of a node of interest. But how can one communicate control plane information well beyond the existing communication range? To address this fundamental need we introduce tag spotting. Tag spotting refers to a communication system which allows reliable control data transmission at SNR values as low as 0 dB. It does this by employing a number of signal encoding techniques including adding redundancy to multitone modulation, shaping the spectrum to reduce inter-carrier interference, and the use of algebraic coding. Making use of a detection theory-based model we analyze the performance achievable by our modulation as well as the trade-off between the rate of the information transmitted and the likelihood of error.
Using real-world experiments on an OFDM system built with software radios, we show that we can transmit data at the target SNR value of 0 dB with a 6% overhead; that is, 6% of our packet is used for our low-SNR decodable tags (which carry up to a couple of bytes of data in our testbed), while the remaining 94% is used for traditional header and payload data. We also demonstrate via simulations how tag spotting can be used in implementing fair and efficient rate control and scheduling schemes.

Submitted 9 August, 2012; originally announced August 2012.

Comments: 30 pages

arXiv:1205.6862 [pdf, other]

AirSync: Enabling Distributed Multiuser MIMO with Full Spatial Multiplexing

Authors: Horia Vlad Balan, Ryan Rogalin, Antonios Michaloliakos, Konstantinos Psounis, Giuseppe Caire

Abstract: The enormous success of advanced wireless devices is pushing the demand for higher wireless data rates. Denser spectrum reuse through the deployment of more access points per square mile has the potential to successfully meet the increasing demand for more bandwidth. In theory, the best approach to density increase is via distributed multiuser MIMO, where several access points are connected to a central server and operate as a large distributed multi-antenna access point, ensuring that all transmitted signal power serves the purpose of data transmission, rather than creating "interference." In practice, while enterprise networks offer a natural setup in which distributed MIMO might be possible, there are serious implementation difficulties, the primary one being the need to eliminate phase and timing offsets between the jointly coordinated access points. In this paper we propose AirSync, a novel scheme which provides not only time but also phase synchronization, thus enabling distributed MIMO with full spatial multiplexing gains. AirSync locks the phase of all access points using a common reference broadcasted over the air in conjunction with a Kalman filter which closely tracks the phase drift. We have implemented AirSync as a digital circuit in the FPGA of the WARP radio platform. Our experimental testbed, comprised of two access points and two clients, shows that AirSync is able to achieve phase synchronization within a few degrees, and allows the system to nearly achieve the theoretical optimal multiplexing gain.
We also discuss MAC and higher layer aspects of a practical deployment. To the best of our knowledge, AirSync offers the first ever realization of the full multiuser MIMO gain, namely the ability to increase the number of wireless clients linearly with the number of jointly coordinated access points, without reducing the per client rate.

Submitted 14 August, 2012; v1 submitted 30 May, 2012; originally announced May 2012.

Comments: Submitted to Transactions on Networking

Report number: CENG-TR-2012-1

arXiv:1107.4785 [pdf, ps, other]

A Novel Cyber-Insurance for Internet Security

Authors: Ranjan Pal, Leana Golubchik, Konstantinos Psounis

Abstract: Internet users such as individuals and organizations are subject to different types of epidemic risks such as worms, viruses, and botnets. To reduce the probability of risk, an Internet user generally invests in self-defense mechanisms like antivirus and antispam software. However, such software does not completely eliminate risk. Recent works have considered the problem of residual risk elimination by proposing the idea of cyber-insurance. In reality, an Internet user faces risks due to security attacks as well as risks due to non-security related failures (e.g., reliability faults in the form of hardware crash, buffer overflow, etc.). These risk types are often indistinguishable by a naive user. However, a cyber-insurance agency would most likely insure risks only due to security attacks. In this case, it becomes a challenge for an Internet user to choose the right type of cyber-insurance contract as standard optimal contracts, i.e., contracts under security attacks only, might prove to be sub-optimal for himself. In this paper, we address the problem of analyzing cyber-insurance solutions when a user faces risks due to both, security as well as non-security related failures. We propose Aegis, a novel cyber-insurance model in which the user accepts a fraction (strictly positive) of loss recovery on himself and transfers rest of the loss recovery on the cyber-insurance agency. We mathematically show that given an option, Internet users would prefer Aegis contracts to traditional cyber-insurance contracts, under all premium types.
This result firmly establishes the non-existence of traditional cyber-insurance markets when Aegis contracts are offered to users.

Submitted 24 July, 2011; originally announced July 2011.

arXiv:1007.4724 [pdf, ps, other]

CapEst: A Measurement-based Approach to Estimating Link Capacity in Wireless Networks

Authors: Apoorva Jindal, Konstantinos Psounis, Mingyan Liu

Abstract: Estimating link capacity in a wireless network is a complex task because the available capacity at a link is a function of not only the current arrival rate at that link, but also of the arrival rate at links which interfere with that link as well as of the nature of interference between these links. Models which accurately characterize this dependence are either too computationally complex to be useful or lack accuracy. Further, they have a high implementation overhead and make restrictive assumptions, which makes them inapplicable to real networks. In this paper, we propose CapEst, a general, simple yet accurate, measurement-based approach to estimating link capacity in a wireless network. To be computationally light, CapEst allows inaccuracy in estimation; however, using measurements, it can correct this inaccuracy in an iterative fashion and converge to the correct estimate. Our evaluation shows that CapEst always converged to within 5% of the correct value in less than 18 iterations. CapEst is model-independent, hence, is applicable to any MAC/PHY layer and works with auto-rate adaptation. Moreover, it has a low implementation overhead, can be used with any application which requires an estimate of residual capacity on a wireless link and can be implemented completely at the network layer without any support from the underlying chipset.

Submitted 27 July, 2010; originally announced July 2010.

arXiv:0810.3935 [pdf, ps, other]

Modeling Spatial and Temporal Dependencies of User Mobility in Wireless Mobile Networks

Authors: Wei-jen Hsu, Thrasyvoulos Spyropoulos, Konstantinos Psounis, Ahmed Helmy

Abstract: Realistic mobility models are fundamental to evaluate the performance of protocols in mobile ad hoc networks. Unfortunately, there are no mobility models that capture the non-homogeneous behaviors in both space and time commonly found in reality, while at the same time being easy to use and analyze. Motivated by this, we propose a time-variant community mobility model, referred to as the TVC model, which realistically captures spatial and temporal correlations. We devise the communities that lead to skewed location visiting preferences, and time periods that allow us to model time dependent behaviors and periodic re-appearances of nodes at specific locations. To demonstrate the power and flexibility of the TVC model, we use it to generate synthetic traces that match the characteristics of a number of qualitatively different mobility traces, including wireless LAN traces, vehicular mobility traces, and human encounter traces. More importantly, we show that, despite the high level of realism achieved, our TVC model is still theoretically tractable. To establish this, we derive a number of important quantities related to protocol performance, such as the average node degree, the hitting time, and the meeting time, and provide examples of how to utilize this theory to guide design decisions in routing protocols.

Submitted 21 October, 2008; originally announced October 2008.

Comments: 14 pages, 9 figures

Showing 1–25 of 25 results for author: Psounis, K