-
Forging and Removing Latent-Noise Diffusion Watermarks Using a Single Image
Authors:
Anubhav Jain,
Yuya Kobayashi,
Naoki Murata,
Yuhta Takida,
Takashi Shibuya,
Yuki Mitsufuji,
Niv Cohen,
Nasir Memon,
Julian Togelius
Abstract:
Watermarking techniques are vital for protecting intellectual property and preventing fraudulent use of media. Most previous watermarking schemes designed for diffusion models embed a secret key in the initial noise. The resulting pattern is often considered hard to remove and forge into unrelated images. In this paper, we propose a black-box adversarial attack without presuming access to the diffusion model weights. Our attack uses only a single watermarked example and is based on a simple observation: there is a many-to-one mapping between images and initial noises. There are regions in the clean image latent space pertaining to each watermark that get mapped to the same initial noise when inverted. Based on this intuition, we propose an adversarial attack to forge the watermark by introducing perturbations to the images such that we can enter the region of watermarked images. We show that we can also apply a similar approach for watermark removal by learning perturbations to exit this region. We report results on multiple watermarking schemes (Tree-Ring, RingID, WIND, and Gaussian Shading) across two diffusion models (SDv1.4 and SDv2.0). Our results demonstrate the effectiveness of the attack and expose vulnerabilities in the watermarking methods, motivating future research on improving them.
Submitted 27 April, 2025;
originally announced April 2025.
-
FaceCloak: Learning to Protect Face Templates
Authors:
Sudipta Banerjee,
Anubhav Jain,
Chinmay Hegde,
Nasir Memon
Abstract:
Generative models can reconstruct, from encoded representations (templates), face images that bear a remarkable likeness to the original face, raising security and privacy concerns. We present FaceCloak, a neural network framework that protects face templates by generating smart, renewable binary cloaks. Our method proactively thwarts inversion attacks by cloaking face templates with unique disruptors synthesized from a single face template on the fly while provably retaining biometric utility and unlinkability. Our cloaked templates can suppress sensitive attributes while generalizing to novel feature extraction schemes, and they outperform leading baselines in terms of biometric matching and resiliency to reconstruction attacks. FaceCloak-based matching is extremely fast (inference time cost = 0.28 ms) and lightweight (0.57 MB).
Submitted 8 April, 2025;
originally announced April 2025.
-
WavePulse: Real-time Content Analytics of Radio Livestreams
Authors:
Govind Mittal,
Sarthak Gupta,
Shruti Wagle,
Chirag Chopra,
Anthony J DeMattee,
Nasir Memon,
Mustaque Ahamad,
Chinmay Hegde
Abstract:
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
Submitted 29 January, 2025; v1 submitted 23 December, 2024;
originally announced December 2024.
-
TraSCE: Trajectory Steering for Concept Erasure
Authors:
Anubhav Jain,
Yuya Kobayashi,
Takashi Shibuya,
Yuhta Takida,
Nasir Memon,
Julian Togelius,
Yuki Mitsufuji
Abstract:
Recent advancements in text-to-image diffusion models have brought them to the public spotlight, becoming widely accessible and embraced by everyday users. However, these models have been shown to generate harmful content such as not-safe-for-work (NSFW) images. While approaches have been proposed to erase such abstract concepts from the models, jail-breaking techniques have succeeded in bypassing such safety measures. In this paper, we propose TraSCE, an approach to guide the diffusion trajectory away from generating harmful content. Our approach is based on negative prompting, but as we show in this paper, a widely used negative prompting strategy is not a complete solution and can easily be bypassed in some corner cases. To address this issue, we first propose using a specific formulation of negative prompting instead of the widely used one. Furthermore, we introduce a localized loss-based guidance that enhances the modified negative prompting technique by steering the diffusion trajectory. We demonstrate that our proposed method achieves state-of-the-art results on various benchmarks in removing harmful content, including ones proposed by red teams, and erasing artistic styles and objects. Our proposed approach does not require any training, weight modifications, or training data (either image or prompt), making it easier for model owners to erase new concepts.
Submitted 17 March, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
Classifier-Free Guidance inside the Attraction Basin May Cause Memorization
Authors:
Anubhav Jain,
Yuya Kobayashi,
Takashi Shibuya,
Yuhta Takida,
Nasir Memon,
Julian Togelius,
Yuki Mitsufuji
Abstract:
Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel perspective on the memorization phenomenon and propose a simple yet effective approach to mitigate it. We argue that memorization occurs because of an attraction basin in the denoising process which steers the diffusion trajectory towards a memorized image. However, this can be mitigated by guiding the diffusion trajectory away from the attraction basin by not applying classifier-free guidance until an ideal transition point occurs from which classifier-free guidance is applied. This leads to the generation of non-memorized images that are high in image quality and well-aligned with the conditioning mechanism. To further improve on this, we present a new guidance technique, opposite guidance, that escapes the attraction basin sooner in the denoising process. We demonstrate the existence of attraction basins in various scenarios in which memorization occurs, and we show that our proposed approach successfully mitigates memorization.
Submitted 17 March, 2025; v1 submitted 23 November, 2024;
originally announced November 2024.
-
Alpha-wolves and Alpha-mammals: Exploring Dictionary Attacks on Iris Recognition Systems
Authors:
Sudipta Banerjee,
Anubhav Jain,
Zehua Jiang,
Nasir Memon,
Julian Togelius,
Arun Ross
Abstract:
A dictionary attack in a biometric system entails the use of a small number of strategically generated images or templates to successfully match with a large number of identities, thereby compromising security. We focus on dictionary attacks at the template level, specifically the IrisCodes used in iris recognition systems. We present a hitherto unknown vulnerability wherein we mix IrisCodes using simple bitwise operators to generate alpha-mixtures: alpha-wolves (combining a set of "wolf" samples) and alpha-mammals (combining a set of users selected via search optimization) that increase false matches. We evaluate this vulnerability using the IITD, CASIA-IrisV4-Thousand and Synthetic datasets, and observe that an alpha-wolf (from two wolves) can match up to 71 identities at FMR = 0.001%, while an alpha-mammal (from two identities) can match up to 133 other identities at FMR = 0.01% on the IITD dataset.
Submitted 20 November, 2023;
originally announced March 2024.
-
Mitigating the Impact of Attribute Editing on Face Recognition
Authors:
Sudipta Banerjee,
Sai Pranaswi Mullangi,
Shruti Wagle,
Chinmay Hegde,
Nasir Memon
Abstract:
Through a large-scale study over diverse face images, we show that facial attribute editing using modern generative AI models can severely degrade automated face recognition systems. This degradation persists even with identity-preserving generative models. To mitigate this issue, we propose two novel techniques for local and global attribute editing. We empirically ablate twenty-six facial semantic, demographic and expression-based attributes that have been edited using state-of-the-art generative models, and evaluate them using ArcFace and AdaFace matchers on CelebA, CelebAMaskHQ and LFW datasets. Finally, we use LLaVA, an emerging visual question-answering framework for attribute prediction to validate our editing techniques. Our methods outperform the current state-of-the-art at facial editing (BLIP, InstantID) while improving identity retention by a significant extent.
Submitted 9 April, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Authors:
Govind Mittal,
Arthur Jakobsson,
Kelly O. Marshall,
Chinmay Hegde,
Nasir Memon
Abstract:
The rise of AI voice-cloning technology, particularly audio Real-time Deepfakes (RTDFs), has intensified social engineering attacks by enabling real-time voice impersonation that bypasses conventional enrollment-based authentication. This technology represents an existential threat to phone-based authentication systems, while total identity fraud losses reached $43 billion. Unlike traditional robocalls, these personalized AI-generated voice attacks target high-value accounts and circumvent existing defensive measures, creating an urgent cybersecurity challenge. To address this, we propose PITCH, a robust challenge-response method to detect and tag interactive deepfake audio calls. We developed a comprehensive taxonomy of audio challenges based on the human auditory system, linguistics, and environmental factors, yielding 20 prospective challenges. Testing against leading voice-cloning systems using a novel dataset (18,600 original and 1.6 million deepfake samples from 100 users), PITCH's challenges enhanced machine detection capabilities to 88.7% AUROC score, enabling us to identify 10 highly-effective challenges.
For human evaluation, we filtered a challenging, balanced subset on which human evaluators independently achieved 72.6% accuracy, while machines scored 87.7%. Recognizing that call environments require human control, we developed a novel human-AI collaborative system that tags suspicious calls as "Deepfake-likely." Contrary to prior findings, we discovered that integrating human intuition with machine precision offers complementary advantages, giving users maximum control while boosting detection accuracy to 84.5%. This significant improvement situates PITCH's potential as an AI-assisted pre-screener for verifying calls, offering an adaptable approach to combat real-time voice-cloning attacks while maintaining human decision authority.
Submitted 26 May, 2025; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Information Forensics and Security: A quarter-century-long journey
Authors:
Mauro Barni,
Patrizio Campisi,
Edward J. Delp,
Gwenael Doërr,
Jessica Fridrich,
Nasir Memon,
Fernando Pérez-González,
Anderson Rocha,
Luisa Verdoliva,
Min Wu
Abstract:
Information Forensics and Security (IFS) is an active R&D area whose goal is to ensure that people use devices, data, and intellectual properties for authorized purposes and to facilitate the gathering of solid evidence to hold perpetrators accountable. For over a quarter century since the 1990s, the IFS research area has grown tremendously to address the societal needs of the digital information era. The IEEE Signal Processing Society (SPS) has emerged as an important hub and leader in this area, and the article below celebrates some landmark technical contributions. In particular, we highlight the major technological advances on some selected focus areas in the field developed in the last 25 years from the research community and present future trends.
Submitted 21 September, 2023;
originally announced September 2023.
-
Fair GANs through model rebalancing for extremely imbalanced class distributions
Authors:
Anubhav Jain,
Nasir Memon,
Julian Togelius
Abstract:
Deep generative models require large amounts of training data. This often poses a problem as the collection of datasets can be expensive and difficult, in particular datasets that are representative of the appropriate underlying distribution (e.g. demographic). This introduces biases in datasets which are further propagated in the models. We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN by rebalancing the model distribution. We do so by generating balanced data from an existing imbalanced deep generative model using an evolutionary algorithm and then using this data to train a balanced generative model. Additionally, we propose a bias mitigation loss function that minimizes the deviation of the learned class distribution from being equiprobable. We show results for the StyleGAN2 models while training on the Flickr Faces High Quality (FFHQ) dataset for racial fairness and see that the proposed approach improves on the fairness metric by almost 5 times, whilst maintaining image quality. We further validate our approach by applying it to an imbalanced CIFAR10 dataset where we show that we can obtain comparable fairness and image quality as when training on a balanced CIFAR10 dataset which is also twice as large. Lastly, we argue that the traditionally used image quality metrics such as Frechet inception distance (FID) are unsuitable for scenarios where the class distributions are imbalanced and a balanced reference set is not available.
Submitted 21 December, 2023; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Identity-Preserving Aging of Face Images via Latent Diffusion Models
Authors:
Sudipta Banerjee,
Govind Mittal,
Ameya Joshi,
Chinmay Hegde,
Nasir Memon
Abstract:
The performance of automated face recognition systems is inevitably impacted by the facial aging process. However, high-quality datasets of individuals collected over several years are typically small in scale. In this work, we propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images. Our models succeed with few-shot training, and have the added benefit of being controllable via intuitive textual prompting. We observe high degrees of visual realism in the generated images while maintaining biometric fidelity measured by commonly used metrics. We evaluate our method on two benchmark datasets (CelebA and AgeDB) and observe a significant reduction (~44%) in the False Non-Match Rate compared to existing state-of-the-art baselines.
Submitted 17 July, 2023;
originally announced July 2023.
-
Zero-shot racially balanced dataset generation using an existing biased StyleGAN2
Authors:
Anubhav Jain,
Nasir Memon,
Julian Togelius
Abstract:
Facial recognition systems have made significant strides thanks to data-heavy deep learning models, but these models rely on large privacy-sensitive datasets. Further, many of these datasets lack diversity in terms of ethnicity and demographics, which can lead to biased models that can have serious societal and security implications. To address these issues, we propose a methodology that leverages the biased generative model StyleGAN2 to create demographically diverse images of synthetic individuals. The synthetic dataset is created using a novel evolutionary search algorithm that targets specific demographic groups. By training face recognition models with the resulting balanced dataset containing 50,000 identities per race (13.5 million images in total), we can improve their performance and minimize biases that might have been present in a model trained on a real dataset.
Submitted 18 September, 2023; v1 submitted 12 May, 2023;
originally announced May 2023.
-
A Dataless FaceSwap Detection Approach Using Synthetic Images
Authors:
Anubhav Jain,
Nasir Memon,
Julian Togelius
Abstract:
Face swapping technology used to create "Deepfakes" has advanced significantly over the past few years and now enables us to create realistic facial manipulations. Current deep learning algorithms to detect deepfakes have shown promising results, however, they require large amounts of training data, and as we show they are biased towards a particular ethnicity. We propose a deepfake detection methodology that eliminates the need for any real data by making use of synthetically generated data using StyleGAN3. This not only performs at par with the traditional training methodology of using real data but it shows better generalization capabilities when finetuned with a small amount of real data. Furthermore, this also reduces biases created by facial image datasets that might have sparse data from particular ethnicities.
Submitted 5 December, 2022;
originally announced December 2022.
-
GOTCHA: Real-Time Video Deepfake Detection via Challenge-Response
Authors:
Govind Mittal,
Chinmay Hegde,
Nasir Memon
Abstract:
With the rise of AI-enabled Real-Time Deepfakes (RTDFs), the integrity of online video interactions has become a growing concern. RTDFs have now made it feasible to replace an imposter's face with their victim's in live video interactions. Such advancement in deepfakes also coaxes detection to rise to the same standard. However, existing deepfake detection techniques are asynchronous and hence ill-suited for RTDFs. To bridge this gap, we propose a challenge-response approach that establishes authenticity in live settings. We focus on talking-head style video interaction and present a taxonomy of challenges that specifically target inherent limitations of RTDF generation pipelines. We evaluate representative examples from the taxonomy by collecting a unique dataset comprising eight challenges, which consistently and visibly degrade the quality of state-of-the-art deepfake generators. These results are corroborated both by humans and a new automated scoring function, leading to 88.6% and 80.1% AUC, respectively. The findings underscore the promising potential of challenge-response systems for explainable and scalable real-time deepfake detection in practical scenarios. We provide access to data and code at https://github.com/mittalgovind/GOTCHA-Deepfakes.
Submitted 23 May, 2024; v1 submitted 12 October, 2022;
originally announced October 2022.
-
Diversity and Novelty MasterPrints: Generating Multiple DeepMasterPrints for Increased User Coverage
Authors:
M Charity,
Nasir Memon,
Zehua Jiang,
Abhi Sen,
Julian Togelius
Abstract:
This work expands on previous advancements in genetic fingerprint spoofing via the DeepMasterPrints and introduces Diversity and Novelty MasterPrints. This system uses quality diversity evolutionary algorithms to generate dictionaries of artificial prints with a focus on increasing coverage of users from the dataset. The Diversity MasterPrints focus on generating solution prints that match with users not covered by previously found prints, and the Novelty MasterPrints explicitly search for prints that are farther in user space than previous prints. Our multi-print search methodologies outperform the singular DeepMasterPrints in both coverage and generalization while maintaining quality of the fingerprint image output.
Submitted 11 September, 2022;
originally announced September 2022.
-
Dictionary Attacks on Speaker Verification
Authors:
Mirko Marras,
Pawel Korus,
Anubhav Jain,
Nasir Memon
Abstract:
In this paper, we propose dictionary attacks against speaker verification, a novel attack vector that aims to match a large fraction of the speaker population by chance. We introduce a generic formulation of the attack that can be used with various speech representations and threat models. The attacker uses adversarial optimization to maximize raw similarity of speaker embeddings between a seed speech sample and a proxy population. The resulting master voice successfully matches a non-trivial fraction of people in an unknown population. Adversarial waveforms obtained with our approach can match on average 69% of females and 38% of males enrolled in the target system at a strict decision threshold calibrated to yield a false alarm rate of 1%. By using the attack with a black-box voice cloning system, we obtain master voices that are effective in the most challenging conditions and transferable between speaker encoders. We also show that, combined with multiple attempts, this attack raises even more serious concerns about the security of these systems.
Submitted 12 December, 2022; v1 submitted 24 April, 2022;
originally announced April 2022.
-
Hard-Attention for Scalable Image Classification
Authors:
Athanasios Papadopoulos,
Paweł Korus,
Nasir Memon
Abstract:
Can we leverage high-resolution information without the unsustainable quadratic complexity to input scale? We propose Traversal Network (TNet), a novel multi-scale hard-attention architecture, which traverses image scale-space in a top-down fashion, visiting only the most informative image regions along the way. TNet offers an adjustable trade-off between accuracy and complexity, by changing the number of attended image locations. We compare our model against hard-attention baselines on ImageNet, achieving higher accuracy with fewer resources (FLOPs, processing time and memory). We further test our model on the fMoW dataset, where we process satellite images of size up to 896 × 896 px, getting up to 2.5x faster processing compared to baselines operating on the same resolution, while achieving higher accuracy as well. TNet is modular, meaning that most classification models could be adopted as its backbone for feature extraction, making the reported performance gains orthogonal to benefits offered by existing optimized deep models. Finally, hard-attention guarantees a degree of interpretability to our model's predictions, without any extra cost beyond inference. Code is available at https://github.com/Tpap/TNet.
Submitted 28 October, 2021; v1 submitted 19 February, 2021;
originally announced February 2021.
-
The Role of the Crowd in Countering Misinformation: A Case Study of the COVID-19 Infodemic
Authors:
Nicholas Micallef,
Bing He,
Srijan Kumar,
Mustaque Ahamad,
Nasir Memon
Abstract:
Fact checking by professionals is viewed as a vital defense in the fight against misinformation. While fact checking is important and its impact has been significant, fact checks could have limited visibility and may not reach the intended audience, such as those deeply embedded in polarized communities. Concerned citizens (i.e., the crowd), who are users of the platforms where misinformation appears, can play a crucial role in disseminating fact-checking information and in countering the spread of misinformation. To explore if this is the case, we conduct a data-driven study of misinformation on the Twitter platform, focusing on tweets related to the COVID-19 pandemic, analyzing the spread of misinformation, professional fact checks, and the crowd response to popular misleading claims about COVID-19. In this work, we curate a dataset of false claims and statements that seek to challenge or refute them. We train a classifier to create a novel dataset of 155,468 COVID-19-related tweets, containing 33,237 false claims and 33,413 refuting arguments. Our findings show that professional fact-checking tweets have limited volume and reach. In contrast, we observe that the surge in misinformation tweets results in a quick response and a corresponding increase in tweets that refute such misinformation. More importantly, we find contrasting differences in the way the crowd refutes tweets: some tweets appear to be opinions, while others contain concrete evidence, such as a link to a reputed source. Our work provides insights into how misinformation is organically countered in social platforms by some of their users and the role they play in amplifying professional fact checks. These insights could lead to development of tools and mechanisms that can empower concerned citizens in combating misinformation. The code and data can be found at http://claws.cc.gatech.edu/covid_counter_misinformation.html.
Submitted 11 November, 2020; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Empirical Evaluation of PRNU Fingerprint Variation for Mismatched Imaging Pipelines
Authors:
Sharad Joshi,
Pawel Korus,
Nitin Khanna,
Nasir Memon
Abstract:
We assess the variability of PRNU-based camera fingerprints with mismatched imaging pipelines (e.g., different camera ISP or digital darkroom software). We show that camera fingerprints exhibit non-negligible variations in this setup, which may lead to unexpected degradation of detection statistics in real-world use cases. We tested 13 different pipelines, including standard digital darkroom software and recent neural networks. We observed that the correlation between fingerprints from mismatched pipelines drops on average to 0.38 and the PCE detection statistic drops by over 40%. The degradation in error rates is strongest for small patches commonly used in photo manipulation detection, and when neural networks are used for photo development. At a fixed 0.5% FPR setting, the TPR drops by 17 ppt (percentage points) for 128 px and 256 px patches.
Submitted 9 October, 2020; v1 submitted 4 April, 2020;
originally announced April 2020.
-
Fusion of Camera Model and Source Device Specific Forensic Methods for Improved Tamper Detection
Authors:
Ahmet Gökhan Poyraz,
Ahmet Emir Dirik,
Ahmet Karaküçük,
Nasir Memon
Abstract:
The PRNU-based camera recognition method is widely studied in the image forensics literature. In recent years, CNN-based camera model recognition methods have been developed. Both kinds of methods also provide solutions to the tamper localization problem. In this paper, we propose combining them via a neural network to achieve better small-scale tamper detection performance. According to the results, the fusion method performs better than the underlying methods even under high JPEG compression. For forgeries as small as 100$\times$100 pixels, the proposed method outperforms the state of the art, which validates the usefulness of fusion for localization of small-size image forgeries. We believe the proposed approach is feasible for any tamper-detection pipeline using the PRNU-based methodology.
Submitted 5 May, 2020; v1 submitted 24 February, 2020;
originally announced February 2020.
-
Camera Fingerprint Extraction via Spatial Domain Averaged Frames
Authors:
Samet Taspinar,
Manoranjan Mohanty,
Nasir Memon
Abstract:
Photo Response Non-Uniformity (PRNU) based camera attribution is an effective method to determine the source camera of visual media (an image or a video). To apply this method, images or videos need to be obtained from a camera to create a "camera fingerprint" which then can be compared against the PRNU of the query media whose origin is under question. The fingerprint extraction process can be time-consuming when a large number of video frames or images have to be denoised. This may need to be done when the individual images have been subjected to high compression or other geometric processing such as video stabilization. This paper investigates a simple, yet effective and efficient technique to create a camera fingerprint when so many still images need to be denoised. The technique utilizes Spatial Domain Averaged (SDA) frames. An SDA-frame is the arithmetic mean of multiple still images. When it is used for fingerprint extraction, the number of denoising operations can be significantly decreased with little or no performance loss. Experimental results show that the proposed method can work more than 50 times faster than conventional methods while providing similar matching results.
Submitted 10 September, 2019;
originally announced September 2019.
-
FiFTy: Large-scale File Fragment Type Identification using Neural Networks
Authors:
Govind Mittal,
Pawel Korus,
Nasir Memon
Abstract:
We present FiFTy, a modern file type identification tool for memory forensics and data carving. In contrast to previous approaches based on hand-crafted features, we design a compact neural network architecture, which uses a trainable embedding space, akin to successful natural language processing models. Our approach dispenses with explicit feature extraction which is a bottleneck in legacy systems. We evaluate the proposed method on a novel dataset with 75 file types - the most diverse and balanced dataset reported to date. FiFTy consistently outperforms all baselines in terms of speed, accuracy and individual misclassification rates. We achieved an average accuracy of 77.5% with processing speed of approx 38 sec/GB, which is better and more than an order of magnitude faster than the previous state-of-the-art tool - Sceadan (69% at 9 min/GB). Our tool and the corresponding dataset are available publicly online.
Submitted 7 June, 2020; v1 submitted 16 August, 2019;
originally announced August 2019.
-
Source Camera Attribution of Multi-Format Devices
Authors:
Samet Taspinar,
Manoranjan Mohanty,
Nasir Memon
Abstract:
Photo Response Non-Uniformity (PRNU) based source camera attribution is an effective method to determine the origin camera of visual media (an image or a video). However, given that modern devices, especially smartphones, capture images and videos at different resolutions using the same sensor array, PRNU attribution can become ineffective as the camera fingerprint and query visual media can be misaligned. We examine different resizing techniques such as binning, line-skipping, cropping and scaling that cameras use to downsize the raw sensor image to different media. Taking such techniques into account, this paper studies the problem of source camera attribution. We define the notion of Ratio of Alignment (RoA), which is a measure of shared sensor elements among spatially corresponding pixels within two media objects resized with different techniques. We then compute the Ratio of Alignment between the different combinations of three common resizing methods under simplified conditions and experimentally validate our analysis. Based on the insights drawn from the different techniques used by cameras and the RoA analysis, the paper proposes an algorithm for matching the source of a video with an image and vice versa. We also present an efficient search method resulting in significantly improved performance in matching as well as computation time.
Submitted 18 April, 2020; v1 submitted 2 April, 2019;
originally announced April 2019.
-
Analysis of Rolling Shutter Effect on ENF based Video Forensics
Authors:
Saffet Vatansever,
Ahmet Emir Dirik,
Nasir Memon
Abstract:
ENF is a time-varying signal of the frequency of mains electricity in a power grid. It continuously fluctuates around a nominal value (50/60 Hz) due to changes in supply and demand of power over time. Depending on these ENF variations, the luminous intensity of a mains-powered light source also fluctuates. These fluctuations in luminance can be captured by video recordings. Accordingly, ENF can be estimated from such videos by analysis of steady content in the video scene. When videos are captured by using a rolling shutter sampling mechanism, as is done mostly with CMOS cameras, there is an idle period between successive frames. Consequently, a number of illumination samples of the scene are effectively lost due to the idle period. These missing samples affect ENF estimation, in the sense of the frequency shift caused and the power attenuation that results. This work develops an analytical model for videos captured using a rolling shutter mechanism. The model illustrates how the frequency of the main ENF harmonic varies depending on the idle period length, and how the power of the captured ENF attenuates as idle period increases. Based on this, a novel idle period estimation method for potential use in camera forensics that is able to operate independently of video frame rate is proposed. Finally, a novel time-of-recording verification approach based on use of multiple ENF components, idle period assumptions and interpolation of missing ENF samples is also proposed.
Submitted 23 March, 2019;
originally announced March 2019.
-
Detecting the Presence of ENF Signal in Digital Videos: a Superpixel based Approach
Authors:
Saffet Vatansever,
Ahmet Emir Dirik,
Nasir Memon
Abstract:
ENF (Electrical Network Frequency) instantaneously fluctuates around its nominal value (50/60 Hz) due to a continuous disparity between generated power and consumed power. Consequently, the luminous intensity of a mains-powered light source varies depending on ENF fluctuations in the grid network. Variations in the luminance over time can be captured from video recordings and ENF can be estimated through content analysis of these recordings. In ENF based video forensics, it is critical to check whether a given video file is appropriate for this type of analysis. That is, if an ENF signal is not present in a given video, it would be useless to apply ENF based forensic analysis. In this work, an ENF signal presence detection method is introduced for videos. The proposed method is based on multiple ENF signal estimations from steady superpixels, i.e. pixels that are most likely uniform in color, brightness, and texture, and intraclass similarity of the estimated signals. Consistency among these estimates is then used to determine the presence or absence of an ENF signal in a given video. The proposed technique can operate on video clips as short as 2 minutes and is independent of the camera sensor type, i.e. CCD or CMOS.
Submitted 23 March, 2019;
originally announced March 2019.
-
Neural Imaging Pipelines - the Scourge or Hope of Forensics?
Authors:
Pawel Korus,
Nasir Memon
Abstract:
Forensic analysis of digital photographs relies on intrinsic statistical traces introduced at the time of their acquisition or subsequent editing. Such traces are often removed by post-processing (e.g., down-sampling and re-compression applied upon distribution in the Web) which inhibits reliable provenance analysis. Increasing adoption of computational methods within digital cameras further complicates the process and renders explicit mathematical modeling infeasible. While this trend challenges forensic analysis even in near-acquisition conditions, it also creates new opportunities. This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel, where state-of-the-art forensic techniques fail. We demonstrate that a neural network can be trained to replace the entire photo development pipeline, and jointly optimized for high-fidelity photo rendering and reliable provenance analysis. Such optimized neural imaging pipeline allowed us to increase image manipulation detection accuracy from approx. 45% to over 90%. The network learns to introduce carefully crafted artifacts, akin to digital watermarks, which facilitate subsequent manipulation detection. Analysis of performance trade-offs indicates that most of the gains can be obtained with only minor distortion. The findings encourage further research towards building more reliable imaging pipelines with explicit provenance-guaranteeing properties.
Submitted 27 February, 2019;
originally announced February 2019.
-
Crime Analysis using Open Source Information
Authors:
Sarwat Nizamani,
Nasrullah Memon,
Azhar Ali Shah,
Sehrish Nizamani,
Saad Nizamani,
Imdad Ali Ismaili
Abstract:
In this paper, we present a method of crime analysis from open source information. We employed unsupervised data mining methods to explore the facts regarding the crimes of an area of interest. The analysis is based on well-known clustering and association techniques. The results show that the proposed method of crime analysis is efficient and gives the analyst a broad picture of the crimes of an area without much effort. The analysis is evaluated using a manual approach, which reveals that the results produced by the proposed approach are comparable to the manual analysis, while a great amount of time is saved.
Submitted 15 February, 2019;
originally announced February 2019.
-
Content Authentication for Neural Imaging Pipelines: End-to-end Optimization of Photo Provenance in Complex Distribution Channels
Authors:
Pawel Korus,
Nasir Memon
Abstract:
Forensic analysis of digital photo provenance relies on intrinsic traces left in the photograph at the time of its acquisition. Such analysis becomes unreliable after heavy post-processing, such as down-sampling and re-compression applied upon distribution in the Web. This paper explores end-to-end optimization of the entire image acquisition and distribution workflow to facilitate reliable forensic analysis at the end of the distribution channel. We demonstrate that neural imaging pipelines can be trained to replace the internals of digital cameras, and jointly optimized for high-fidelity photo development and reliable provenance analysis. In our experiments, the proposed approach increased image manipulation detection accuracy from 45% to over 90%. The findings encourage further research towards building more reliable imaging pipelines with explicit provenance-guaranteeing properties.
Submitted 25 February, 2019; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Kid on The Phone! Toward Automatic Detection of Children on Mobile Devices
Authors:
Toan Nguyen,
Aditi Roy,
Nasir Memon
Abstract:
Studies have shown that children can be exposed to smart devices at a very early age. This has important implications on research in children-computer interaction, children online safety and early education. Many systems have been built based on such research. In this work, we present multiple techniques to automatically detect the presence of a child on a smart device, which could be used as the first step on such systems. Our methods distinguish children from adults based on behavioral differences while operating a touch-enabled modern computing device. Behavioral differences are extracted from data recorded by the touchscreen and built-in sensors. To evaluate the effectiveness of the proposed methods, a new data set has been created from 50 children and adults who interacted with off-the-shelf applications on smart phones. Results show that it is possible to achieve 99% accuracy and less than 0.5% error rate after 8 consecutive touch gestures using only touch information or 5 seconds of sensor reading. If information is used from multiple sensors, then only after 3 gestures, similar performance could be achieved.
Submitted 5 August, 2018;
originally announced August 2018.
-
Tap-based User Authentication for Smartwatches
Authors:
Toan Nguyen,
Nasir Memon
Abstract:
This paper presents TapMeIn, an eyes-free, two-factor authentication method for smartwatches. It allows users to tap a memorable melody (tap-password) of their choice anywhere on the touchscreen to unlock their watch. A user is verified based on the tap-password as well as her physiological and behavioral characteristics when tapping. Results from preliminary experiments with 41 participants show that TapMeIn could achieve an accuracy of 98.7% with a False Positive Rate of only 0.98%. In addition, TapMeIn retains its performance in different conditions such as sitting and walking. In terms of speed, TapMeIn has an average authentication time of 2 seconds. A user study with the System Usability Scale (SUS) tool suggests that TapMeIn has a high usability score.
Submitted 5 August, 2018; v1 submitted 2 July, 2018;
originally announced July 2018.
-
An HMM-based behavior modeling approach for continuous mobile authentication
Authors:
Aditi Roy,
Tzipora Halevi,
Nasir Memon
Abstract:
This paper studies continuous authentication for touch interface based mobile devices. A Hidden Markov Model (HMM) based behavioral template training approach is presented, which does not require training data from subjects other than the owner of the mobile device. The stroke patterns of a user are modeled using a continuous left-right HMM. The approach models the horizontal and vertical scrolling patterns of a user since these are the basic and most frequently used interactions on a mobile device. The effectiveness of the proposed method is evaluated through extensive experiments using the Touchalytics database, which comprises touch data over time. The results show that the performance of the proposed approach is better than the state-of-the-art method.
Submitted 22 December, 2017;
originally announced December 2017.
-
An HMM-based Multi-sensor Approach for Continuous Mobile Authentication
Authors:
Aditi Roy,
Tzipora Halevi,
Nasir Memon
Abstract:
With the increased popularity of smart phones, there is a greater need to have a robust authentication mechanism that handles various security threats and privacy leakages effectively. This paper studies continuous authentication for touch interface based mobile devices. A Hidden Markov Model (HMM) based behavioral template training approach is presented, which does not require training data from subjects other than the owner of the mobile device and can be updated with new data over time. The gesture patterns of the user are modeled from multiple sensors (touch, accelerometer and gyroscope data) using a continuous left-right HMM. The approach models the tap and stroke patterns of a user since these are the basic and most frequently used interactions on a mobile device. To evaluate the effectiveness of the proposed method, a new data set has been created from 42 users who interacted with off-the-shelf applications on their smart phones. Results show that the performance of the proposed approach is promising and potentially better than other state-of-the-art approaches.
Submitted 22 December, 2017;
originally announced December 2017.
-
IllusionPIN: Shoulder-Surfing Resistant Authentication Using Hybrid Images
Authors:
Athanasios Papadopoulos,
Toan Nguyen,
Emre Durmus,
Nasir Memon
Abstract:
We address the problem of shoulder-surfing attacks on authentication schemes by proposing IllusionPIN (IPIN), a PIN-based authentication method that operates on touchscreen devices. IPIN uses the technique of hybrid images to blend two keypads with different digit orderings in such a way, that the user who is close to the device is seeing one keypad to enter her PIN, while the attacker who is looking at the device from a bigger distance is seeing only the other keypad. The user's keypad is shuffled in every authentication attempt since the attacker may memorize the spatial arrangement of the pressed digits.
To reason about the security of IllusionPIN, we developed an algorithm which is based on human visual perception and estimates the minimum distance from which an observer is unable to interpret the keypad of the user. We tested our estimations with 84 simulated shoulder-surfing attacks from 21 different people. None of the attacks was successful against our estimations. In addition, we estimated the minimum distance from which a camera is unable to capture the visual information from the keypad of the user. Based on our analysis, it seems practically almost impossible for a surveillance camera to capture the PIN of a smartphone user when IPIN is in use.
Submitted 22 August, 2017;
originally announced August 2017.
-
DeepMasterPrints: Generating MasterPrints for Dictionary Attacks via Latent Variable Evolution
Authors:
Philip Bontrager,
Aditi Roy,
Julian Togelius,
Nasir Memon,
Arun Ross
Abstract:
Recent research has demonstrated the vulnerability of fingerprint recognition systems to dictionary attacks based on MasterPrints. MasterPrints are real or synthetic fingerprints that can fortuitously match with a large number of fingerprints, thereby undermining the security afforded by fingerprint systems. Previous work by Roy et al. generated synthetic MasterPrints at the feature level. In this work we generate complete image-level MasterPrints known as DeepMasterPrints, whose attack accuracy is found to be far superior to that of previous methods. The proposed method, referred to as Latent Variable Evolution, is based on training a Generative Adversarial Network on a set of real fingerprint images. Stochastic search in the form of the Covariance Matrix Adaptation Evolution Strategy is then used to search for latent input variables to the generator network that can maximize the number of impostor matches as assessed by a fingerprint recognizer. Experiments convey the efficacy of the proposed method in generating DeepMasterPrints. The underlying method is likely to have broad applications in fingerprint security as well as fingerprint synthesis.
Submitted 18 October, 2018; v1 submitted 20 May, 2017;
originally announced May 2017.
-
pH Prediction by Artificial Neural Networks for the Drinking Water of the Distribution System of Hyderabad City
Authors:
Niaz Ahmed Memon,
Mukhtiar Ali Unar,
Abdul Khalique Ansari
Abstract:
In this research, a feedforward ANN (Artificial Neural Network) model is developed and validated for predicting the pH at 10 different locations of the drinking water distribution system of Hyderabad city. The developed model is an MLP (Multilayer Perceptron) with the back propagation algorithm. The data for the training and testing of the model are collected through an experimental analysis on a weekly basis in a routine examination for maintaining the quality of drinking water in the city. 17 parameters are taken into consideration, including pH. All of these parameters are taken as input variables for the model, and pH is then predicted for three phases: raw water of the river Indus, treated water in the treatment plants, and treated water in the drinking water distribution system. The training and testing results of this model reveal that MLP neural networks are exceedingly extrapolative for predicting the pH of river water, untreated and treated water at all locations of the drinking water distribution system of Hyderabad city. The optimum input and output weights are generated with minimum MSE (Mean Square Error) < 5%. Experimental, predicted and tested values of pH are plotted and the effectiveness of the model is determined by calculating the coefficient of correlation (R2 = 0.999) of trained and tested results.
Submitted 2 April, 2016;
originally announced April 2016.
-
Leveraging Personalization To Facilitate Privacy
Authors:
Tehila Minkus,
Nasir Memon
Abstract:
Online social networks have enabled new methods and modalities of collaboration and sharing. These advances bring privacy concerns: online social data is more accessible and persistent and simultaneously less contextualized than traditional social interactions. To allay these concerns, many web services allow users to configure their privacy settings based on a set of multiple-choice questions.
We suggest a new paradigm for privacy options. Instead of suggesting the same defaults to each user, services can leverage knowledge of users' traits to recommend a machine-learned prediction of their privacy preferences for Facebook. As a case study, we build and evaluate MyPrivacy, a publicly available web application that suggests personalized privacy settings. An evaluation with 199 users shows that users find the suggestions to be appropriate and private; furthermore, they express intent to implement the recommendations made by MyPrivacy. This supports the proposal to put personalization to work in online communities to promote privacy and security.
Submitted 9 June, 2014;
originally announced June 2014.
-
CEAI: CCM based Email Authorship Identification Model
Authors:
Sarwat Nizamani,
Nasrullah Memon
Abstract:
In this paper we present a model for email authorship identification (EAI) by employing a Cluster-based Classification (CCM) technique. Traditionally, stylometric features have been successfully employed in various authorship analysis tasks; we extend the traditional feature-set to include some more interesting and effective features for email authorship identification (e.g. the last punctuation mark used in an email, the tendency of an author to use capitalization at the start of an email, or the punctuation after a greeting or farewell). We also included Info Gain feature selection based content features. It is observed that the use of such features in the authorship identification process has a positive impact on the accuracy of the authorship identification task. We performed experiments to justify our arguments and compared the results with other base line models. Experimental results reveal that the proposed CCM-based email authorship identification model, along with the proposed feature set, outperforms the state-of-the-art support vector machine (SVM)-based models, as well as the models proposed by Iqbal et al. [1, 2]. The proposed model attains an accuracy rate of 94% for 10 authors, 89% for 25 authors, and 81% for 50 authors, respectively on Enron dataset, while 89.5% accuracy has been achieved on authors' constructed real email dataset. The results on Enron dataset have been achieved on quite a large number of authors as compared to the models proposed by Iqbal et al. [1, 2].
Submitted 6 December, 2013;
originally announced December 2013.
-
Modeling Suspicious Email Detection using Enhanced Feature Selection
Authors:
Sarwat Nizamani,
Nasrullah Memon,
Uffe Kock Wiil,
Panagiotis Karampelas
Abstract:
The paper presents a suspicious email detection model that incorporates enhanced feature selection. We propose the use of feature selection strategies along with classification techniques for terrorist email detection. The presented model focuses on the evaluation of machine learning algorithms such as decision tree (ID3), logistic regression, Naïve Bayes (NB), and Support Vector Machine (SVM) for detecting emails containing suspicious content. In the literature, various algorithms have achieved good accuracy on this task. However, the results achieved by those algorithms can be further improved by using appropriate feature selection mechanisms. We identify a specific feature selection scheme that improves the performance of the existing algorithms.
Submitted 6 December, 2013;
originally announced December 2013.
-
PSN: Portfolio Social Network
Authors:
Jordi M. Cortes,
Sarwat Nizamani,
Nasrullah Memon
Abstract:
In this paper we present a web-based information system, a portfolio social network (PSN), that provides solutions to recruiters and job seekers. The proposed system enables users to create online portfolios in which they can list their specializations, accompanied by pieces of code where applicable (specifically for software engineers). The unique feature of the system is that it enables recruiters to quickly view the prominent skills of the users. A comparative analysis of the proposed system with state-of-the-art systems is presented. The comparative study reveals that the proposed system has advanced functionalities.
Submitted 6 December, 2013;
originally announced December 2013.
-
From Public Outrage to the Burst of Public Violence: An Epidemic-Like Model
Authors:
Sarwat Nizamani,
Nasrullah Memon,
Serge Galam
Abstract:
This study extends classical models of spreading epidemics to describe the phenomenon of contagious public outrage, which eventually leads to the spread of violence following the disclosure of some unpopular political decisions and/or activity. Accordingly, a mathematical model is proposed to simulate, from the start, the internal dynamics by which an external event is turned into internal violence within a population. Five kinds of agents are considered: "Upset" (U), "Violent" (V), "Sensitive" (S), "Immune" (I), and "Relaxed" (R), leading to a set of ordinary differential equations, which in turn yield the dynamics of spreading of each type of agent among the population. The process is stopped with the deactivation of the associated issue. Conditions coinciding with a twofold spreading of public violence are singled out. The results shed new light on terror activity and provide some hints on how to curb the spreading of violence within populations globally sensitive to specific world issues. Recent violent world events are discussed.
Submitted 2 October, 2013;
originally announced October 2013.
-
Phishing, Personality Traits and Facebook
Authors:
Tzipora Halevi,
Jim Lewis,
Nasir Memon
Abstract:
Phishing attacks have become an increasing threat to online users. Recent research has begun to focus on the factors that cause people to respond to them. Our study examines the correlation between the Big Five personality traits and email phishing response. We also examine how these factors affect users' behavior on Facebook, including posting personal information and choosing Facebook privacy settings.
Our research shows that, when a prize phishing email is used, there is a strong correlation between gender and the response to the phishing email. In addition, we find that neuroticism is the factor most correlated with responding to this email. Our study also found that people who score high on the openness factor tend both to post more information on Facebook and to have less strict privacy settings, which may make them susceptible to privacy attacks. In addition, our work detected no correlation between the participants' estimate of being vulnerable to phishing attacks and actually being phished, which suggests that susceptibility to phishing is not due to a lack of awareness of the phishing risks and that real-time response to phishing is hard for online users to predict in advance.
We believe that a better understanding of the traits that contribute to online vulnerability can help develop methods for increasing users' privacy and security in the future.
Submitted 7 February, 2013; v1 submitted 31 January, 2013;
originally announced January 2013.