-
The Query/Hit Model for Sequential Hypothesis Testing
Authors:
Mahshad Shariatnasab,
Stefano Rini,
Farhad Shirani,
S. Sitharama Iyengar
Abstract:
This work introduces the Query/Hit (Q/H) learning model. The setup consists of two agents. One agent, Alice, has access to a streaming source, while the other, Bob, does not have direct access to the source. Communication occurs through sequential Q/H pairs: Bob sends a sequence of source symbols (queries), and Alice responds with the waiting time until each query appears in the source stream (hits). This model is motivated by scenarios with communication, computation, and privacy constraints that limit real-time access to the source. The error exponent for sequential hypothesis testing under the Q/H model is characterized, and a querying strategy, the Dynamic Scout-Sentinel Algorithm (DSSA), is proposed. The strategy employs a mutual information neural estimator to compute the error exponent associated with each query and to select the query with the highest efficiency. Extensive empirical evaluations on both synthetic and real-world datasets -- including mouse movement trajectories, typesetting patterns, and touch-based user interactions -- are provided to evaluate the performance of the proposed strategy in comparison with baselines, in terms of probability of error, query choice, and time-to-detection.
Submitted 1 February, 2025;
originally announced February 2025.
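As an illustration of the Q/H exchange described in the abstract above, the following is a minimal sketch of how Alice might answer Bob's queries with waiting times. The function name `answer_queries`, the convention that the waiting time counts the matching symbol itself, and the choice to resume scanning right after each hit are assumptions made for this example; they are not taken from the paper.

```python
from typing import Iterable, List, Optional


def answer_queries(stream: Iterable[str], queries: Iterable[str]) -> List[Optional[int]]:
    """Return one hit (waiting time) per query, scanning the stream once.

    Assumptions (hypothetical, for illustration only):
    - the waiting time is the number of symbols Alice reads up to and
      including the first occurrence of the queried symbol;
    - after a hit, the next query is served from the following symbol;
    - None is reported if the stream ends before the query appears.
    """
    source = iter(stream)  # Alice's view of the streaming source
    hits: List[Optional[int]] = []
    for query in queries:
        wait = 0
        hit: Optional[int] = None
        for symbol in source:  # continues from where the last hit stopped
            wait += 1
            if symbol == query:
                hit = wait
                break
        hits.append(hit)
    return hits
```

For example, with the stream `"abcabca"` and the queries `c`, `a`, `a`, Bob would receive the hits 3, 1, and 3 without ever observing the stream itself.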
-
In-Application Defense Against Evasive Web Scans through Behavioral Analysis
Authors:
Behzad Ousat,
Mahshad Shariatnasab,
Esteban Schafir,
Farhad Shirani Chaharsooghi,
Amin Kharraz
Abstract:
Web traffic has evolved to include both human users and automated agents, ranging from benign web crawlers to adversarial scanners such as those capable of credential stuffing, command injection, and account hijacking at web scale. The financial costs of these adversarial activities are estimated to exceed tens of billions of dollars in 2023. In this work, we introduce WebGuard, a low-overhead in-application forensics engine, to enable robust identification and monitoring of automated web scanners and to help mitigate the associated security risks. WebGuard focuses on the following design criteria: (i) integration into web applications without any changes to the underlying software components or infrastructure, (ii) minimal communication overhead, (iii) capability for real-time detection, e.g., within hundreds of milliseconds, and (iv) attribution capability to identify new behavioral patterns and detect emerging agent categories. To this end, we have equipped WebGuard with multi-modal behavioral monitoring mechanisms, such as monitoring spatio-temporal data and browser events. We also design supervised and unsupervised learning architectures for real-time detection and offline attribution of human and automated agents, respectively. Information-theoretic analysis and empirical evaluations are provided to show that multi-modal data analysis, as opposed to uni-modal analysis which relies solely on mouse movement dynamics, significantly improves time-to-detection and attribution accuracy. Numerical evaluations using real-world data collected via WebGuard show high accuracy within hundreds of milliseconds, with a communication overhead below 10 KB per second.
Submitted 9 December, 2024;
originally announced December 2024.
-
The Privacy-Utility Tradeoff in Rank-Preserving Dataset Obfuscation
Authors:
Mahshad Shariatnasab,
Farhad Shirani,
S. Sitharama Iyengar
Abstract:
Dataset obfuscation refers to techniques in which random noise is added to the entries of a given dataset, prior to its public release, to protect against leakage of private information. In this work, dataset obfuscation under two objectives is considered: i) rank-preservation: to preserve the row ordering in the obfuscated dataset induced by a given rank function, and ii) anonymity: to protect user anonymity under fingerprinting attacks. The first objective, rank-preservation, is of interest in applications such as the design of search engines and recommendation systems, feature matching, and social network analysis. Fingerprinting attacks, considered in evaluating the anonymity objective, are privacy attacks where an attacker constructs a fingerprint of a victim based on its observed activities, such as online web activities, and compares this fingerprint with information extracted from a publicly released obfuscated dataset to identify the victim. By evaluating the performance limits of a class of obfuscation mechanisms over asymptotically large datasets, a fundamental trade-off is quantified between rank-preservation and user anonymity. Single-letter obfuscation mechanisms are considered, where each entry in the dataset is perturbed by independent noise, and their fundamental performance limits are characterized by leveraging large deviation techniques. The optimal obfuscating test-channel, optimizing the privacy-utility tradeoff, is characterized in the form of a convex optimization problem which can be solved efficiently. Numerical simulations of various scenarios are provided to verify the theoretical derivations.
Submitted 11 May, 2023;
originally announced May 2023.
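To make the rank-preservation objective in the abstract above concrete, here is a minimal sketch that perturbs each dataset entry with independent noise and measures how much of the row ordering survives. The Gaussian noise, the use of the row sum as the rank function, and the pairwise-agreement score are all assumptions chosen for illustration; the paper's optimal mechanism and its metrics may differ.

```python
import random
from itertools import combinations
from typing import Callable, List, Sequence

Row = Sequence[float]


def obfuscate(dataset: Sequence[Row], sigma: float, rng: random.Random) -> List[List[float]]:
    """Add independent zero-mean Gaussian noise to every entry (assumed noise model)."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in dataset]


def pairwise_rank_agreement(
    original: Sequence[Row],
    obfuscated: Sequence[Row],
    rank: Callable[[Row], float] = sum,  # hypothetical rank function: row sum
) -> float:
    """Fraction of row pairs whose relative order under `rank` is unchanged."""
    pairs = list(combinations(range(len(original)), 2))
    if not pairs:  # fewer than two rows: the ordering is trivially preserved
        return 1.0
    agree = 0
    for i, j in pairs:
        before = rank(original[i]) - rank(original[j])
        after = rank(obfuscated[i]) - rank(obfuscated[j])
        # Same sign means the pair kept its order; ties must remain ties.
        if before * after > 0 or (before == 0 and after == 0):
            agree += 1
    return agree / len(pairs)
```

Sweeping `sigma` upward and plotting the agreement score gives a rough empirical view of the utility side of the tradeoff: more noise generally helps anonymity but lowers the fraction of preserved pairwise orderings.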
-
Privacy Limits in Power-Law Bipartite Networks under Active Fingerprinting Attacks
Authors:
M. Shariatnasab,
F. Shirani,
Z. Anwar
Abstract:
This work considers the fundamental privacy limits under active fingerprinting attacks in power-law bipartite networks. The scenario arises naturally in social network analysis, tracking user mobility in wireless networks, and forensics applications, among others. A stochastic growing network generation model -- called the popularity-based model -- is investigated, where the bipartite network is generated iteratively, and in each iteration vertices attract new edges based on their assigned popularity values. It is shown that, with an appropriate choice of initial popularity values, the node degree distribution follows a power law with arbitrary parameter $\alpha > 2$, i.e., the fraction of nodes with degree $d$ is proportional to $d^{-\alpha}$. An active fingerprinting deanonymization attack strategy, called the augmented information threshold attack strategy (A-ITS), is proposed, which uses the attacker's knowledge of the node degree distribution along with the concept of information values for deanonymization. Sufficient conditions for the success of the A-ITS, based on network parameters, are derived. It is shown through simulations that the proposed attack significantly outperforms state-of-the-art attack strategies.
Submitted 11 February, 2022;
originally announced February 2022.
-
Fundamental Privacy Limits in Bipartite Networks under Active Attacks
Authors:
Mahshad Shariatnasab,
Farhad Shirani,
Elza Erkip
Abstract:
This work considers active deanonymization of bipartite networks. The scenario arises naturally in evaluating privacy in various applications such as social networks, mobility networks, and medical databases. For instance, in active deanonymization of social networks, an anonymous victim is targeted by an attacker (e.g., the victim visits the attacker's website), and the attacker queries her group memberships (e.g., by querying the browser history) to deanonymize her. In this work, the fundamental limits of privacy, in terms of the minimum number of queries necessary for deanonymization, are investigated. A stochastic model is considered, where i) the bipartite network of group memberships is generated randomly, ii) the attacker has partial prior knowledge of the group memberships, and iii) the attacker receives noisy responses to its real-time queries. The bipartite network is generated based on linear and sublinear preferential attachment, and the stochastic block model. The victim's identity is chosen randomly based on a distribution modeling the users' risk of being the victim (e.g., the probability of visiting the website). An attack algorithm is proposed which builds upon techniques from communication with feedback, and its performance, in terms of the expected number of queries, is analyzed. Simulation results are provided to verify the theoretical derivations.
Submitted 8 June, 2021;
originally announced June 2021.
-
On Graph Matching Using Generalized Seed Side-Information
Authors:
Mahshad Shariatnasab,
Farhad Shirani,
Siddharth Garg,
Elza Erkip
Abstract:
In this paper, matching pairs of stochastically generated graphs in the presence of generalized seed side-information is considered. The graph matching problem emerges naturally in various applications such as social network de-anonymization, image processing, DNA sequencing, and natural language processing. A pair of randomly generated labeled Erdős-Rényi graphs with pairwise correlated edges is considered. It is assumed that the matching strategy has access to the labeling of the vertices in the first graph, as well as a collection of shortlists -- called ambiguity sets -- of possible labels for the vertices of the second graph. The objective is to leverage the correlation among the edges of the graphs, along with the side-information provided in the form of ambiguity sets, to recover the labels of the vertices in the second graph. This scenario can be viewed as a generalization of the seeded graph matching problem, where the ambiguity sets take a specific form such that the exact labels for a subset of vertices in the second graph are known prior to matching. A matching strategy is proposed which operates by evaluating the joint typicality of the adjacency matrices of the graphs. Sufficient conditions on the edge statistics as well as ambiguity set statistics are derived under which the proposed matching strategy successfully recovers the labels of the vertices in the second graph. Additionally, Fano-type arguments are used to derive general necessary conditions for successful matching.
Submitted 11 February, 2021;
originally announced February 2021.