Search | arXiv e-print repository

arXiv:2009.12344 [pdf, other]

A little goes a long way: Improving toxic language classification despite data scarcity

Authors: Mika Juuti, Tommi Gröndahl, Adrian Flanagan, N. Asokan

Abstract: Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. Data augmentation - generating new synthetic data from a labeled seed dataset - can help. The efficacy of data augmentation on toxic language classification has not been fully explored. We present the first systematic study on how data augmentation techniques impact performance across toxic language… ▽ More Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. Data augmentation - generating new synthetic data from a labeled seed dataset - can help. The efficacy of data augmentation on toxic language classification has not been fully explored. We present the first systematic study on how data augmentation techniques impact performance across toxic language classifiers, ranging from shallow logistic regression architectures to BERT - a state-of-the-art pre-trained Transformer network. We compare the performance of eight techniques on very scarce seed datasets. We show that while BERT performed the best, shallow classifiers performed comparably when trained on data augmented with a combination of three techniques, including GPT-2-generated sentences. We discuss the interplay of performance and computational overhead, which can inform the choice of techniques under different constraints. △ Less

Submitted 24 October, 2020; v1 submitted 25 September, 2020; originally announced September 2020.

Comments: To appear in Findings of ACL: EMNLP 2020

arXiv:1910.05429 [pdf, other]

Extraction of Complex DNN Models: Real Threat or Boogeyman?

Authors: Buse Gul Atli, Sebastian Szyller, Mika Juuti, Samuel Marchal, N. Asokan

Abstract: Recently, machine learning (ML) has introduced advanced solutions to many domains. Since ML models provide business advantage to model owners, protecting intellectual property of ML models has emerged as an important consideration. Confidentiality of ML models can be protected by exposing them to clients only via prediction APIs. However, model extraction attacks can steal the functionality of ML… ▽ More Recently, machine learning (ML) has introduced advanced solutions to many domains. Since ML models provide business advantage to model owners, protecting intellectual property of ML models has emerged as an important consideration. Confidentiality of ML models can be protected by exposing them to clients only via prediction APIs. However, model extraction attacks can steal the functionality of ML models using the information leaked to clients through the results returned via the API. In this work, we question whether model extraction is a serious threat to complex, real-life ML models. We evaluate the current state-of-the-art model extraction attack (Knockoff nets) against complex models. We reproduce and confirm the results in the original paper. But we also show that the performance of this attack can be limited by several factors, including ML model architecture and the granularity of API response. Furthermore, we introduce a defense based on distinguishing queries used for Knockoff nets from benign queries. Despite the limitations of the Knockoff nets, we show that a more realistic adversary can effectively steal complex ML models and evade known defenses. △ Less

Submitted 27 May, 2020; v1 submitted 11 October, 2019; originally announced October 2019.

Comments: 16 pages, 1 figure, Accepted for publication in AAAI-20 Workshop on Engineering Dependable and Secure Machine Learning Systems (AAAI-EDSMLS 2020)

arXiv:1906.03397 [pdf, other]

doi 10.1145/3338501.3357366

Making targeted black-box evasion attacks effective and efficient

Authors: Mika Juuti, Buse Gul Atli, N. Asokan

Abstract: We investigate how an adversary can optimally use its query budget for targeted evasion attacks against deep neural networks in a black-box setting. We formalize the problem setting and systematically evaluate what benefits the adversary can gain by using substitute models. We show that there is an exploration-exploitation tradeoff in that query efficiency comes at the cost of effectiveness. We pr… ▽ More We investigate how an adversary can optimally use its query budget for targeted evasion attacks against deep neural networks in a black-box setting. We formalize the problem setting and systematically evaluate what benefits the adversary can gain by using substitute models. We show that there is an exploration-exploitation tradeoff in that query efficiency comes at the cost of effectiveness. We present two new attack strategies for using substitute models and show that they are as effective as previous query-only techniques but require significantly fewer queries, by up to three orders of magnitude. We also show that an agile adversary capable of switching through different attack techniques can achieve pareto-optimal efficiency. We demonstrate our attack against Google Cloud Vision showing that the difficulty of black-box attacks against real-world prediction APIs is significantly easier than previously thought (requiring approximately 500 queries instead of approximately 20,000 as in previous works). △ Less

Submitted 8 June, 2019; originally announced June 2019.

Comments: 12 pages, 10 figures

Journal ref: AISec 2019: Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security

arXiv:1808.09115 [pdf, ps, other]

All You Need is "Love": Evading Hate-speech Detection

Authors: Tommi Gröndahl, Luca Pajola, Mika Juuti, Mauro Conti, N. Asokan

Abstract: With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech… ▽ More With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective -- a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features. △ Less

Submitted 5 November, 2018; v1 submitted 28 August, 2018; originally announced August 2018.

Comments: 11 pages, Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security (AISec) 2018

arXiv:1805.02628 [pdf, other]

PRADA: Protecting against DNN Model Stealing Attacks

Authors: Mika Juuti, Sebastian Szyller, Samuel Marchal, N. Asokan

Abstract: Machine learning (ML) applications are increasingly prevalent. Protecting the confidentiality of ML models becomes paramount for two reasons: (a) a model can be a business advantage to its owner, and (b) an adversary may use a stolen model to find transferable adversarial examples that can evade classification by the original model. Access to the model can be restricted to be only via well-defined… ▽ More Machine learning (ML) applications are increasingly prevalent. Protecting the confidentiality of ML models becomes paramount for two reasons: (a) a model can be a business advantage to its owner, and (b) an adversary may use a stolen model to find transferable adversarial examples that can evade classification by the original model. Access to the model can be restricted to be only via well-defined prediction APIs. Nevertheless, prediction APIs still provide enough information to allow an adversary to mount model extraction attacks by sending repeated queries via the prediction API. In this paper, we describe new model extraction attacks using novel approaches for generating synthetic queries, and optimizing training hyperparameters. Our attacks outperform state-of-the-art model extraction in terms of transferability of both targeted and non-targeted adversarial examples (up to +29-44 percentage points, pp), and prediction accuracy (up to +46 pp) on two datasets. We provide take-aways on how to perform effective model extraction attacks. We then propose PRADA, the first step towards generic and effective detection of DNN model extraction attacks. It analyzes the distribution of consecutive API queries and raises an alarm when this distribution deviates from benign behavior. We show that PRADA can detect all prior model extraction attacks with no false positives. △ Less

Submitted 31 March, 2019; v1 submitted 7 May, 2018; originally announced May 2018.

Comments: 17 pages, 7 figures, 9 tables. Accepted for publication in the 4th IEEE European Symposium on Security and Privacy (EuroS&P 2019)

arXiv:1805.02400 [pdf, other]

Stay On-Topic: Generating Context-specific Fake Restaurant Reviews

Authors: Mika Juuti, Bo Sun, Tatsuya Mori, N. Asokan

Abstract: Automatically generated fake restaurant reviews are a threat to online review systems. Recent research has shown that users have difficulties in detecting machine-generated fake reviews hiding among real restaurant reviews. The method used in this work (char-LSTM ) has one drawback: it has difficulties staying in context, i.e. when it generates a review for specific target entity, the resulting re… ▽ More Automatically generated fake restaurant reviews are a threat to online review systems. Recent research has shown that users have difficulties in detecting machine-generated fake reviews hiding among real restaurant reviews. The method used in this work (char-LSTM ) has one drawback: it has difficulties staying in context, i.e. when it generates a review for specific target entity, the resulting review may contain phrases that are unrelated to the target, thus increasing its detectability. In this work, we present and evaluate a more sophisticated technique based on neural machine translation (NMT) with which we can generate reviews that stay on-topic. We test multiple variants of our technique using native English speakers on Amazon Mechanical Turk. We demonstrate that reviews generated by the best variant have almost optimal undetectability (class-averaged F-score 47%). We conduct a user study with skeptical users and show that our method evades detection more frequently compared to the state-of-the-art (average evasion 3.2/4 vs 1.5/4) with statistical significance, at level α = 1% (Section 4.3). We develop very effective detection tools and reach average F-score of 97% in classifying these. Although fake reviews are very effective in fooling people, effective automatic detection is still feasible. △ Less

Submitted 28 June, 2018; v1 submitted 7 May, 2018; originally announced May 2018.

Comments: 21 pages, 5 figures, 6 tables. Accepted for publication in the European Symposium on Research in Computer Security (ESORICS) 2018

arXiv:1610.02801 [pdf, other]

STASH: Securing transparent authentication schemes using prover-side proximity verification

Authors: Mika Juuti, Christian Vaas, Ivo Sluganovic, Hans Liljestrand, N. Asokan, Ivan Martinovic

Abstract: Transparent authentication (TA) schemes are those in which a user is authenticated by a verifier without requiring explicit user interaction. By doing so, those schemes promise high usability and security simultaneously. The majority of TA implementations rely on the received signal strength as an indicator for the proximity of a user device (prover). However, such implicit proximity verification… ▽ More Transparent authentication (TA) schemes are those in which a user is authenticated by a verifier without requiring explicit user interaction. By doing so, those schemes promise high usability and security simultaneously. The majority of TA implementations rely on the received signal strength as an indicator for the proximity of a user device (prover). However, such implicit proximity verification is not secure against an adversary who can relay messages over a larger distance. In this paper, we propose a novel approach for thwarting relay attacks in TA schemes: the prover permits access to authentication credentials only if it can confirm that it is near the verifier. We present STASH, a system for relay-resilient transparent authentication in which the prover does proximity verification by comparing its approach trajectory towards the intended verifier with known authorized reference trajectories. Trajectories are measured using low-cost sensors commonly available on personal devices. We demonstrate the security of STASH against a class of adversaries and its ease-of-use by analyzing empirical data, collected using a STASH prototype. STASH is efficient and can be easily integrated to complement existing TA schemes. △ Less

Submitted 29 March, 2017; v1 submitted 10 October, 2016; originally announced October 2016.

Comments: Updated name of paper. Paper accepted to IEEE SECON'17

arXiv:1505.05779 [pdf, ps, other]

Pitfalls in Designing Zero-Effort Deauthentication: Opportunistic Human Observation Attacks

Authors: O. Huhta, P. Shrestha, S. Udar, M. Juuti, N. Saxena, N. Asokan

Abstract: Deauthentication is an important component of any authentication system. The widespread use of computing devices in daily life has underscored the need for zero-effort deauthentication schemes. However, the quest for eliminating user effort may lead to hidden security flaws in the authentication schemes. As a case in point, we investigate a prominent zero-effort deauthentication scheme, called ZEB… ▽ More Deauthentication is an important component of any authentication system. The widespread use of computing devices in daily life has underscored the need for zero-effort deauthentication schemes. However, the quest for eliminating user effort may lead to hidden security flaws in the authentication schemes. As a case in point, we investigate a prominent zero-effort deauthentication scheme, called ZEBRA, which provides an interesting and a useful solution to a difficult problem as demonstrated in the original paper. We identify a subtle incorrect assumption in its adversary model that leads to a fundamental design flaw. We exploit this to break the scheme with a class of attacks that are much easier for a human to perform in a realistic adversary model, compared to the naïve attacks studied in the ZEBRA paper. For example, one of our main attacks, where the human attacker has to opportunistically mimic only the victim's keyboard typing activity at a nearby terminal, is significantly more successful compared to the naïve attack that requires mimicking keyboard and mouse activities as well as keyboard-mouse movements. Further, by understanding the design flaws in ZEBRA as cases of tainted input, we show that we can draw on well-understood design principles to improve ZEBRA's security. △ Less

Submitted 14 February, 2016; v1 submitted 21 May, 2015; originally announced May 2015.

ACM Class: K.6.5

Showing 1–8 of 8 results for author: Juuti, M