-
Adaptive Control Attention Network for Underwater Acoustic Localization and Domain Adaptation
Authors:
Quoc Thinh Vo,
Joe Woods,
Priontu Chowdhury,
David K. Han
Abstract:
Localizing acoustic sound sources in the ocean is a challenging task due to the complex and dynamic nature of the environment. Factors such as high background noise, irregular underwater geometries, and varying acoustic properties make accurate localization difficult. To address these obstacles, we propose a multi-branch network architecture designed to accurately predict the distance between a mo…
▽ More
Localizing acoustic sound sources in the ocean is a challenging task due to the complex and dynamic nature of the environment. Factors such as high background noise, irregular underwater geometries, and varying acoustic properties make accurate localization difficult. To address these obstacles, we propose a multi-branch network architecture designed to accurately predict the distance between a moving acoustic source and a receiver, tested on real-world underwater signal arrays. The network leverages Convolutional Neural Networks (CNNs) for robust spatial feature extraction and integrates Conformers with self-attention mechanism to effectively capture temporal dependencies. Log-mel spectrogram and generalized cross-correlation with phase transform (GCC-PHAT) features are employed as input representations. To further enhance the model performance, we introduce an Adaptive Gain Control (AGC) layer, that adaptively adjusts the amplitude of input features, ensuring consistent energy levels across varying ranges, signal strengths, and noise conditions. We assess the model's generalization capability by training it in one domain and testing it in a different domain, using only a limited amount of data from the test domain for fine-tuning. Our proposed method outperforms state-of-the-art (SOTA) approaches in similar settings, establishing new benchmarks for underwater sound localization.
△ Less
Submitted 20 June, 2025;
originally announced June 2025.
-
Orbital Collision: An Indigenously Developed Web-based Space Situational Awareness Platform
Authors:
Partha Chowdhury,
Harsha M,
Ayush Gupta,
Sanat K Biswas
Abstract:
This work presents an indigenous web based platform Orbital Collision (OrCo), created by the Space Systems Laboratory at IIIT Delhi, to enhance Space Situational Awareness (SSA) by predicting collision probabilities of space objects using Two Line Elements (TLE) data. The work highlights the growing challenges of congestion in the Earth's orbital environment, mainly due to space debris and defunct…
▽ More
This work presents an indigenous web based platform Orbital Collision (OrCo), created by the Space Systems Laboratory at IIIT Delhi, to enhance Space Situational Awareness (SSA) by predicting collision probabilities of space objects using Two Line Elements (TLE) data. The work highlights the growing challenges of congestion in the Earth's orbital environment, mainly due to space debris and defunct satellites, which increase collision risks. It employs several methods for propagating orbital uncertainty and calculating the collision probability. The performance of the platform is evaluated through accuracy assessments and efficiency metrics, in order to improve the tracking of space objects and ensure the safety of the satellite in congested space.
△ Less
Submitted 20 June, 2025;
originally announced June 2025.
-
Towards Bridging Formal Methods and Human Interpretability
Authors:
Abhijit Paul,
Proma Chowdhury,
Kazi Sakib
Abstract:
Labeled Transition Systems (LTS) are integral to model checking and design repair tools. System engineers frequently examine LTS designs during model checking or design repair to debug, identify inconsistencies, and validate system behavior. Despite LTS's significance, no prior research has examined human comprehension of these designs. To address this, we draw on traditional software engineering…
▽ More
Labeled Transition Systems (LTS) are integral to model checking and design repair tools. System engineers frequently examine LTS designs during model checking or design repair to debug, identify inconsistencies, and validate system behavior. Despite LTS's significance, no prior research has examined human comprehension of these designs. To address this, we draw on traditional software engineering and graph theory, identifying 7 key metrics: cyclomatic complexity, state space size, average branching factor, maximum depth, Albin complexity, modularity, and redundancy. We created a dataset of 148 LTS designs, sampling 48 for 324 paired comparisons, and ranked them using the Bradley-Terry model. Through Kendall's Tau correlation analysis, we found that Albin complexity ($τ= 0.444$), state space size ($τ= 0.420$), cyclomatic complexity ($τ= 0.366$), and redundancy ($τ= 0.315$) most accurately reflect human comprehension of LTS designs. To showcase the metrics' utility, we applied the Albin complexity metric within the Fortis design repair tool, ranking system redesigns. This ranking reduced annotators' comprehension time by 39\%, suggesting that metrics emphasizing human factors can enhance formal design interpretability.
△ Less
Submitted 11 June, 2025;
originally announced June 2025.
-
Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
Authors:
Sankalan Pal Chowdhury,
Terry Jingchen Zhang,
Donya Rooein,
Dirk Hovy,
Tanja Käser,
Mrinmaya Sachan
Abstract:
The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with various degrees of engineering. In this study, we seek to compare human tutors with LLM tutors in terms of engagement, empathy, scaffolding, and conciseness. W…
▽ More
The rapid development of Large Language Models (LLMs) opens up the possibility of using them as personal tutors. This has led to the development of several intelligent tutoring systems and learning assistants that use LLMs as back-ends with various degrees of engineering. In this study, we seek to compare human tutors with LLM tutors in terms of engagement, empathy, scaffolding, and conciseness. We ask human tutors to annotate and compare the performance of an LLM tutor with that of a human tutor in teaching grade-school math word problems on these qualities. We find that annotators with teaching experience perceive LLMs as showing higher performance than human tutors in all 4 metrics. The biggest advantage is in empathy, where 80% of our annotators prefer the LLM tutor more often than the human tutors. Our study paints a positive picture of LLMs as tutors and indicates that these models can be used to reduce the load on human teachers in the future.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Authors:
Aneeshan Sain,
Subhajit Maity,
Pinaki Nath Chowdhury,
Subhadeep Koley,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
As sketch research has collectively matured over time, its adaptation for at-mass commercialisation emerges on the immediate horizon. Despite an already mature research endeavour for photos, there is no research on the efficient inference specifically designed for sketch data. In this paper, we first demonstrate existing state-of-the-art efficient light-weight models designed for photos do not wor…
▽ More
As sketch research has collectively matured over time, its adaptation for at-mass commercialisation emerges on the immediate horizon. Despite an already mature research endeavour for photos, there is no research on the efficient inference specifically designed for sketch data. In this paper, we first demonstrate existing state-of-the-art efficient light-weight models designed for photos do not work on sketches. We then propose two sketch-specific components which work in a plug-n-play manner on any photo efficient network to adapt them to work on sketch data. We specifically chose fine-grained sketch-based image retrieval (FG-SBIR) as a demonstrator as the most recognised sketch problem with immediate commercial value. Technically speaking, we first propose a cross-modal knowledge distillation network to transfer existing photo efficient networks to be compatible with sketch, which brings down number of FLOPs and model parameters by 97.96% percent and 84.89% respectively. We then exploit the abstract trait of sketch to introduce a RL-based canvas selector that dynamically adjusts to the abstraction level which further cuts down number of FLOPs by two thirds. The end result is an overall reduction of 99.37% of FLOPs (from 40.18G to 0.254G) when compared with a full network, while retaining the accuracy (33.03% vs 32.77%) -- finally making an efficient network for the sparse sketch data that exhibit even fewer FLOPs than the best photo counterpart.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior
Authors:
Jorge Quesada,
Chen Zhou,
Prithwijit Chowdhury,
Mohammad Alotaibi,
Ahmad Mustafa,
Yusufjon Kumamnov,
Mohit Prabhushankar,
Ghassan AlRegib
Abstract:
Machine learning has taken a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synthetic datasets, the field still lacks a systematic understanding of the generalizability limits of these models across seismic data representing a variety of geologic, acquisition and processing settings. Distr…
▽ More
Machine learning has taken a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synthetic datasets, the field still lacks a systematic understanding of the generalizability limits of these models across seismic data representing a variety of geologic, acquisition and processing settings. Distributional shifts between different data sources, limitations in fine-tuning strategies and labeled data accessibility, and inconsistent evaluation protocols all represent major roadblocks in the deployment of reliable and robust models in real-world exploration settings. In this paper, we present the first large-scale benchmarking study explicitly designed to provide answers and guidelines for domain shift strategies in seismic interpretation. Our benchmark encompasses over $200$ models trained and evaluated on three heterogeneous datasets (synthetic and real data) including FaultSeg3D, CRACKS, and Thebe. We systematically assess pretraining, fine-tuning, and joint training strategies under varying degrees of domain shift. Our analysis highlights the fragility of current fine-tuning practices, the emergence of catastrophic forgetting, and the challenges of interpreting performance in a systematic manner. We establish a robust experimental baseline to provide insights into the tradeoffs inherent to current fault delineation workflows, and shed light on directions for developing more generalizable, interpretable and effective machine learning models for seismic interpretation. The insights and analyses reported provide a set of guidelines on the deployment of fault delineation models within seismic interpretation workflows.
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
Multilingual Performance Biases of Large Language Models in Education
Authors:
Vansh Gupta,
Sankalan Pal Chowdhury,
Vilém Zouhar,
Donya Rooein,
Mrinmaya Sachan
Abstract:
Large language models (LLMs) are increasingly being adopted in educational settings. These applications expand beyond English, though current LLMs remain primarily English-centric. In this work, we ascertain if their use in education settings in non-English languages is warranted. We evaluated the performance of popular LLMs on four educational tasks: identifying student misconceptions, providing…
▽ More
Large language models (LLMs) are increasingly being adopted in educational settings. These applications expand beyond English, though current LLMs remain primarily English-centric. In this work, we ascertain if their use in education settings in non-English languages is warranted. We evaluated the performance of popular LLMs on four educational tasks: identifying student misconceptions, providing targeted feedback, interactive tutoring, and grading translations in six languages (Hindi, Arabic, Farsi, Telugu, Ukrainian, Czech) in addition to English. We find that the performance on these tasks somewhat corresponds to the amount of language represented in training data, with lower-resource languages having poorer task performance. Although the models perform reasonably well in most languages, the frequent performance drop from English is significant. Thus, we recommend that practitioners first verify that the LLM works well in the target language for their educational task before deployment.
△ Less
Submitted 24 April, 2025;
originally announced April 2025.
-
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Authors:
Subhadeep Koley,
Tapas Kumar Dutta,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
While foundation models have revolutionised computer vision, their effectiveness for sketch understanding remains limited by the unique challenges of abstract, sparse visual inputs. Through systematic analysis, we uncover two fundamental limitations: Stable Diffusion (SD) struggles to extract meaningful features from abstract sketches (unlike its success with photos), and exhibits a pronounced fre…
▽ More
While foundation models have revolutionised computer vision, their effectiveness for sketch understanding remains limited by the unique challenges of abstract, sparse visual inputs. Through systematic analysis, we uncover two fundamental limitations: Stable Diffusion (SD) struggles to extract meaningful features from abstract sketches (unlike its success with photos), and exhibits a pronounced frequency-domain bias that suppresses essential low-frequency components needed for sketch understanding. Rather than costly retraining, we address these limitations by strategically combining SD with CLIP, whose strong semantic understanding naturally compensates for SD's spatial-frequency biases. By dynamically injecting CLIP features into SD's denoising process and adaptively aggregating features across semantic levels, our method achieves state-of-the-art performance in sketch retrieval (+3.35%), recognition (+1.06%), segmentation (+29.42%), and correspondence learning (+21.22%), demonstrating the first truly universal sketch feature representation in the era of foundation models.
△ Less
Submitted 18 March, 2025;
originally announced March 2025.
-
SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches
Authors:
Subhadeep Koley,
Viswanatha Reddy Gajjala,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
We introduce SketchYourSeg, a novel framework that establishes freehand sketches as a powerful query modality for subjective image segmentation across entire galleries through a single exemplar sketch. Unlike text prompts that struggle with spatial specificity or interactive methods confined to single-image operations, sketches naturally combine semantic intent with structural precision. This uniq…
▽ More
We introduce SketchYourSeg, a novel framework that establishes freehand sketches as a powerful query modality for subjective image segmentation across entire galleries through a single exemplar sketch. Unlike text prompts that struggle with spatial specificity or interactive methods confined to single-image operations, sketches naturally combine semantic intent with structural precision. This unique dual encoding enables precise visual disambiguation for segmentation tasks where text descriptions would be cumbersome or ambiguous -- such as distinguishing between visually similar instances, specifying exact part boundaries, or indicating spatial relationships in composed concepts. Our approach addresses three fundamental challenges: (i) eliminating the need for pixel-perfect annotation masks during training with a mask-free framework; (ii) creating a synergistic relationship between sketch-based image retrieval (SBIR) models and foundation models (CLIP/DINOv2) where the former provides training signals while the latter generates masks; and (iii) enabling multi-granular segmentation capabilities through purpose-made sketch augmentation strategies. Our extensive evaluations demonstrate superior performance over existing approaches across diverse benchmarks, establishing a new paradigm for user-guided image segmentation that balances precision with efficiency.
△ Less
Submitted 17 March, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
SPACE-SUIT: An Artificial Intelligence based chromospheric feature extractor and classifier for SUIT
Authors:
Pranava Seth,
Vishal Upendran,
Megha Anand,
Janmejoy Sarkar,
Soumya Roy,
Priyadarshan Chaki,
Pratyay Chowdhury,
Borishan Ghosh,
Durgesh Tripathi
Abstract:
The Solar Ultraviolet Imaging Telescope(SUIT) onboard Aditya-L1 is an imager that observes the solar photosphere and chromosphere through observations in the wavelength range of 200-400 nm. A comprehensive understanding of the plasma and thermodynamic properties of chromospheric and photospheric morphological structures requires a large sample statistical study, necessitating the development of au…
▽ More
The Solar Ultraviolet Imaging Telescope(SUIT) onboard Aditya-L1 is an imager that observes the solar photosphere and chromosphere through observations in the wavelength range of 200-400 nm. A comprehensive understanding of the plasma and thermodynamic properties of chromospheric and photospheric morphological structures requires a large sample statistical study, necessitating the development of automatic feature detection methods. To this end, we develop the feature detection algorithm SPACE-SUIT: Solar Phenomena Analysis and Classification using Enhanced vision techniques for SUIT, to detect and classify the solar chromospheric features to be observed from SUIT's Mg II k filter. Specifically, we target plage regions, sunspots, filaments, and off-limb structures. SPACE uses You Only Look Once(YOLO), a neural network-based model to identify regions of interest. We train and validate SPACE using mock-SUIT images developed from Interface Region Imaging Spectrometer(IRIS) full-disk mosaic images in Mg II k line, while we also perform detection on Level-1 SUIT data. SPACE achieves an approximate precision of 0.788, recall 0.863 and MAP of 0.874 on the validation mock SUIT FITS dataset. Given the manual labeling of our dataset, we perform "self-validation" by applying statistical measures and Tamura features on the ground truth and predicted bounding boxes. We find the distributions of entropy, contrast, dissimilarity, and energy to show differences in the features. These differences are qualitatively captured by the detected regions predicted by SPACE and validated with the observed SUIT images, even in the absence of labeled ground truth. This work not only develops a chromospheric feature extractor but also demonstrates the effectiveness of statistical metrics and Tamura features for distinguishing chromospheric features, offering independent validation for future detection schemes.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
DreamColour: Controllable Video Colour Editing without Training
Authors:
Chaitat Utintu,
Pinaki Nath Chowdhury,
Aneeshan Sain,
Subhadeep Koley,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
Video colour editing is a crucial task for content creation, yet existing solutions either require painstaking frame-by-frame manipulation or produce unrealistic results with temporal artefacts. We present a practical, training-free framework that makes precise video colour editing accessible through an intuitive interface while maintaining professional-quality output. Our key insight is that by d…
▽ More
Video colour editing is a crucial task for content creation, yet existing solutions either require painstaking frame-by-frame manipulation or produce unrealistic results with temporal artefacts. We present a practical, training-free framework that makes precise video colour editing accessible through an intuitive interface while maintaining professional-quality output. Our key insight is that by decoupling spatial and temporal aspects of colour editing, we can better align with users' natural workflow -- allowing them to focus on precise colour selection in key frames before automatically propagating changes across time. We achieve this through a novel technical framework that combines: (i) a simple point-and-click interface merging grid-based colour selection with automatic instance segmentation for precise spatial control, (ii) bidirectional colour propagation that leverages inherent video motion patterns, and (iii) motion-aware blending that ensures smooth transitions even with complex object movements. Through extensive evaluation on diverse scenarios, we demonstrate that our approach matches or exceeds state-of-the-art methods while eliminating the need for training or specialized hardware, making professional-quality video colour editing accessible to everyone.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
GPT-4 as a Homework Tutor can Improve Student Engagement and Learning Outcomes
Authors:
Alessandro Vanzo,
Sankalan Pal Chowdhury,
Mrinmaya Sachan
Abstract:
This work contributes to the scarce empirical literature on LLM-based interactive homework in real-world educational settings and offers a practical, scalable solution for improving homework in schools. Homework is an important part of education in schools across the world, but in order to maximize benefit, it needs to be accompanied with feedback and followup questions. We developed a prompting s…
▽ More
This work contributes to the scarce empirical literature on LLM-based interactive homework in real-world educational settings and offers a practical, scalable solution for improving homework in schools. Homework is an important part of education in schools across the world, but in order to maximize benefit, it needs to be accompanied with feedback and followup questions. We developed a prompting strategy that enables GPT-4 to conduct interactive homework sessions for high-school students learning English as a second language. Our strategy requires minimal efforts in content preparation, one of the key challenges of alternatives like home tutors or ITSs. We carried out a Randomized Controlled Trial (RCT) in four high-school classes, replacing traditional homework with GPT-4 homework sessions for the treatment group. We observed significant improvements in learning outcomes, specifically a greater gain in grammar, and student engagement. In addition, students reported high levels of satisfaction with the system and wanted to continue using it after the end of the RCT.
△ Less
Submitted 24 September, 2024;
originally announced September 2024.
-
Graph Neural Ordinary Differential Equations for Coarse-Grained Socioeconomic Dynamics
Authors:
James Koch,
Pranab Roy Chowdhury,
Heng Wan,
Parin Bhaduri,
Jim Yoon,
Vivek Srikrishnan,
W. Brent Daniel
Abstract:
We present a data-driven machine-learning approach for modeling space-time socioeconomic dynamics. Through coarse-graining fine-scale observations, our modeling framework simplifies these complex systems to a set of tractable mechanistic relationships -- in the form of ordinary differential equations -- while preserving critical system behaviors. This approach allows for expedited 'what if' studie…
▽ More
We present a data-driven machine-learning approach for modeling space-time socioeconomic dynamics. Through coarse-graining fine-scale observations, our modeling framework simplifies these complex systems to a set of tractable mechanistic relationships -- in the form of ordinary differential equations -- while preserving critical system behaviors. This approach allows for expedited 'what if' studies and sensitivity analyses, essential for informed policy-making. Our findings, from a case study of Baltimore, MD, indicate that this machine learning-augmented coarse-grained model serves as a powerful instrument for deciphering the complex interactions between social factors, geography, and exogenous stressors, offering a valuable asset for system forecasting and resilience planning.
△ Less
Submitted 25 July, 2024;
originally announced July 2024.
-
Do Generalised Classifiers really work on Human Drawn Sketches?
Authors:
Hmrishav Bandyopadhyay,
Pinaki Nath Chowdhury,
Aneeshan Sain,
Subhadeep Koley,
Tao Xiang,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
This paper, for the first time, marries large foundation models with human sketch understanding. We demonstrate what this brings -- a paradigm shift in terms of generalised sketch representation learning (e.g., classification). This generalisation happens on two fronts: (i) generalisation across unknown categories (i.e., open-set), and (ii) generalisation traversing abstraction levels (i.e., good…
▽ More
This paper, for the first time, marries large foundation models with human sketch understanding. We demonstrate what this brings -- a paradigm shift in terms of generalised sketch representation learning (e.g., classification). This generalisation happens on two fronts: (i) generalisation across unknown categories (i.e., open-set), and (ii) generalisation traversing abstraction levels (i.e., good and bad sketches), both being timely challenges that remain unsolved in the sketch literature. Our design is intuitive and centred around transferring the already stellar generalisation ability of CLIP to benefit generalised learning for sketches. We first "condition" the vanilla CLIP model by learning sketch-specific prompts using a novel auxiliary head of raster to vector sketch conversion. This importantly makes CLIP "sketch-aware". We then make CLIP acute to the inherently different sketch abstraction levels. This is achieved by learning a codebook of abstraction-specific prompt biases, a weighted combination of which facilitates the representation of sketches across abstraction levels -- low abstract edge-maps, medium abstract sketches in TU-Berlin, and highly abstract doodles in QuickDraw. Our framework surpasses popular sketch representation learning algorithms in both zero-shot and few-shot setups and in novel settings across different abstraction boundaries.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval
Authors:
Aneeshan Sain,
Pinaki Nath Chowdhury,
Subhadeep Koley,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
In this paper, we delve into the intricate dynamics of Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) by addressing a critical yet overlooked aspect -- the choice of viewpoint during sketch creation. Unlike photo systems that seamlessly handle diverse views through extensive datasets, sketch systems, with limited data collected from fixed perspectives, face challenges. Our pilot study, employ…
▽ More
In this paper, we delve into the intricate dynamics of Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) by addressing a critical yet overlooked aspect -- the choice of viewpoint during sketch creation. Unlike photo systems that seamlessly handle diverse views through extensive datasets, sketch systems, with limited data collected from fixed perspectives, face challenges. Our pilot study, employing a pre-trained FG-SBIR model, highlights the system's struggle when query-sketches differ in viewpoint from target instances. Interestingly, a questionnaire however shows users desire autonomy, with a significant percentage favouring view-specific retrieval. To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks. Overcoming dataset limitations, our first contribution leverages multi-view 2D projections of 3D objects, instilling cross-modal view awareness. The second contribution introduces a customisable cross-modal feature through disentanglement, allowing effortless mode switching. Extensive experiments on standard datasets validate the effectiveness of our method.
△ Less
Submitted 1 July, 2024;
originally announced July 2024.
-
Assessing Effectiveness of Cyber Essentials Technical Controls
Authors:
Priyanka Badva,
Partha Das Chowdhury,
Kopo M. Ramokapane,
Barnaby Craggs,
Awais Rashid
Abstract:
Cyber Essentials (CE) comprise a set of controls designed to protect organisations, irrespective of their size, against cyber attacks. The controls are firewalls, secure configuration, user access control, malware protection & security update management. In this work, we explore the extent to which CE remains robust against an ever-evolving threat landscape. To that end, we reconstruct 45 breaches…
▽ More
Cyber Essentials (CE) comprise a set of controls designed to protect organisations, irrespective of their size, against cyber attacks. The controls are firewalls, secure configuration, user access control, malware protection & security update management. In this work, we explore the extent to which CE remains robust against an ever-evolving threat landscape. To that end, we reconstruct 45 breaches mapped to MiTRE ATT&CK using an Incident Fault Tree ( IFT ) approach. Our method reveals the intersections where the placement of controls could have protected organisations. Then we identify appropriate Cyber Essential controls and/or Additional Controls for these vulnerable intersections. Our results show that CE controls can effectively protect against most attacks during the initial attack phase. However, they may need to be complemented with additional Controls if the attack proceeds further into organisational systems & networks. The Additional Controls (AC) we identify include back-ups, security awareness, logging and monitoring. Our analysis brings to the fore a foundational issue as to whether controls should exclude recovery and focus only on pre-emption. The latter makes the strong assumption that a prior identification of all controls in a dynamic threat landscape is indeed possible. Furthermore, any potential broadening of technical controls entails re-scoping the skills that are required for a Cyber Essentials (CE) assessor. To that end, we suggest human factors and security operations and incident management as two potential knowledge areas from Cyber Security Body of Knowledge (CyBOK) if there is any broadening of CE based on these findings.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Are Objective Explanatory Evaluation metrics Trustworthy? An Adversarial Analysis
Authors:
Prithwijit Chowdhury,
Mohit Prabhushankar,
Ghassan AlRegib,
Mohamed Deriche
Abstract:
Explainable AI (XAI) has revolutionized the field of deep learning by empowering users to have more trust in neural network models. The field of XAI allows users to probe the inner workings of these algorithms to elucidate their decision-making processes. The rise in popularity of XAI has led to the advent of different strategies to produce explanations, all of which only occasionally agree. Thus…
▽ More
Explainable AI (XAI) has revolutionized the field of deep learning by empowering users to have more trust in neural network models. The field of XAI allows users to probe the inner workings of these algorithms to elucidate their decision-making processes. The rise in popularity of XAI has led to the advent of different strategies to produce explanations, all of which only occasionally agree. Thus several objective evaluation metrics have been devised to decide which of these modules give the best explanation for specific scenarios. The goal of the paper is twofold: (i) we employ the notions of necessity and sufficiency from causal literature to come up with a novel explanatory technique called SHifted Adversaries using Pixel Elimination(SHAPE) which satisfies all the theoretical and mathematical criteria of being a valid explanation, (ii) we show that SHAPE is, infact, an adversarial explanation that fools causal metrics that are employed to measure the robustness and reliability of popular importance based visual XAI methods. Our analysis shows that SHAPE outperforms popular explanatory techniques like GradCAM and GradCAM++ in these tests and is comparable to RISE, raising questions about the sanity of these metrics and the need for human involvement for an overall better evaluation.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
SketchDeco: Decorating B&W Sketches with Colour
Authors:
Chaitat Utintu,
Pinaki Nath Chowdhury,
Aneeshan Sain,
Subhadeep Koley,
Ayan Kumar Bhunia,
Yi-Zhe Song
Abstract:
This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and story-boarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the l…
▽ More
This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and story-boarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the limitations of textual prompts. By strategically combining ControlNet and staged generation, incorporating Stable Diffusion v1.5, and leveraging BLIP-2 text prompts, our methodology facilitates faithful image generation and user-directed colourisation. Addressing challenges of local and global consistency, we employ inventive solutions such as an inversion scheme, guided sampling, and a self-attention mechanism with a scaling factor. The resulting tool is not only fast and training-free but also compatible with consumer-grade Nvidia RTX 4090 Super GPUs, making it a valuable asset for both creative professionals and enthusiasts in various fields. Project Page: \url{https://chaitron.github.io/SketchDeco/}
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
What Sketch Explainability Really Means for Downstream Tasks
Authors:
Hmrishav Bandyopadhyay,
Pinaki Nath Chowdhury,
Ayan Kumar Bhunia,
Aneeshan Sain,
Tao Xiang,
Yi-Zhe Song
Abstract:
In this paper, we explore the unique modality of sketch for explainability, emphasising the profound impact of human strokes compared to conventional pixel-oriented studies. Beyond explanations of network behavior, we discern the genuine implications of explainability across diverse downstream sketch-related tasks. We propose a lightweight and portable explainability solution -- a seamless plugin…
▽ More
In this paper, we explore the unique modality of sketch for explainability, emphasising the profound impact of human strokes compared to conventional pixel-oriented studies. Beyond explanations of network behavior, we discern the genuine implications of explainability across diverse downstream sketch-related tasks. We propose a lightweight and portable explainability solution -- a seamless plugin that integrates effortlessly with any pre-trained model, eliminating the need for re-training. Demonstrating its adaptability, we present four applications: highly studied retrieval and generation, and completely novel assisted drawing and sketch adversarial attacks. The centrepiece to our solution is a stroke-level attribution map that takes different forms when linked with downstream tasks. By addressing the inherent non-differentiability of rasterisation, we enable explanations at both coarse stroke level (SLA) and partial stroke level (P-SLA), each with its advantages for specific downstream tasks.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
SketchINR: A First Look into Sketches as Implicit Neural Representations
Authors:
Hmrishav Bandyopadhyay,
Ayan Kumar Bhunia,
Pinaki Nath Chowdhury,
Aneeshan Sain,
Tao Xiang,
Timothy Hospedales,
Yi-Zhe Song
Abstract:
We propose SketchINR, to advance the representation of vector sketches with implicit neural models. A variable length vector sketch is compressed into a latent space of fixed dimension that implicitly encodes the underlying shape as a function of time and strokes. The learned function predicts the $xy$ point coordinates in a sketch at each time and stroke. Despite its simplicity, SketchINR outperf…
▽ More
We propose SketchINR, to advance the representation of vector sketches with implicit neural models. A variable length vector sketch is compressed into a latent space of fixed dimension that implicitly encodes the underlying shape as a function of time and strokes. The learned function predicts the $xy$ point coordinates in a sketch at each time and stroke. Despite its simplicity, SketchINR outperforms existing representations at multiple tasks: (i) Encoding an entire sketch dataset into a fixed size latent vector, SketchINR gives $60\times$ and $10\times$ data compression over raster and vector sketches, respectively. (ii) SketchINR's auto-decoder provides a much higher-fidelity representation than other learned vector sketch representations, and is uniquely able to scale to complex vector sketches such as FS-COCO. (iii) SketchINR supports parallelisation that can decode/render $\sim$$100\times$ faster than other learned vector representations such as SketchRNN. (iv) SketchINR, for the first time, emulates the human ability to reproduce a sketch with varying abstraction in terms of number and complexity of strokes. As a first look at implicit sketches, SketchINR's compact high-fidelity representation will support future work in modelling long and complex sketches.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Authors:
Subhadeep Koley,
Ayan Kumar Bhunia,
Deeptanshu Sekhri,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from…
▽ More
This paper unravels the potential of sketches for diffusion models, addressing the deceptive promise of direct sketch control in generative AI. We importantly democratise the process, enabling amateur sketches to generate precise images, living up to the commitment of "what you sketch is what you get". A pilot study underscores the necessity, revealing that deformities in existing models stem from spatial-conditioning. To rectify this, we propose an abstraction-aware framework, utilising a sketch adapter, adaptive time-step sampling, and discriminative guidance from a pre-trained fine-grained sketch-based image retrieval model, working synergistically to reinforce fine-grained sketch-photo association. Our approach operates seamlessly during inference without the need for textual prompts; a simple, rough sketch akin to what you and I can create suffices! We welcome everyone to examine results presented in the paper and its supplementary. Contributions include democratising sketch control, introducing an abstraction-aware framework, and leveraging discriminative guidance, validated through extensive experiments.
△ Less
Submitted 20 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Authors:
Subhadeep Koley,
Ayan Kumar Bhunia,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously ex…
▽ More
Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously exploring the fine-grained representation capabilities of both sketch and text, orchestrating a duet between the two. The end result enables precise retrievals previously unattainable, allowing users to pose ever-finer queries and incorporate attributes like colour and contextual cues from text. For this purpose, we introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models, while eliminating the need for extensive fine-grained textual descriptions. Last but not least, our system extends to novel applications in composed image retrieval, domain attribute transfer, and fine-grained generation, providing solutions for various real-world scenarios.
△ Less
Submitted 20 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
Authors:
Subhadeep Koley,
Ayan Kumar Bhunia,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos. This proficiency is underpinned by their robust cross-modal capabilities and shape bias, findings that are substantiated through our pi…
▽ More
This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos. This proficiency is underpinned by their robust cross-modal capabilities and shape bias, findings that are substantiated through our pilot studies. In order to harness pre-trained diffusion models effectively, we introduce a straightforward yet powerful strategy focused on two key aspects: selecting optimal feature layers and utilising visual and textual prompts. For the former, we identify which layers are most enriched with information and are best suited for the specific retrieval requirements (category-level or fine-grained). Then we employ visual and textual prompts to guide the model's feature extraction process, enabling it to generate more discriminative and contextually relevant cross-modal representations. Extensive experiments on several benchmark datasets validate significant performance improvements.
△ Less
Submitted 20 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
Authors:
Subhadeep Koley,
Ayan Kumar Bhunia,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
In this paper, we propose a novel abstraction-aware sketch-based image retrieval framework capable of handling sketch abstraction at varied levels. Prior works had mainly focused on tackling sub-factors such as drawing style and order, we instead attempt to model abstraction as a whole, and propose feature-level and retrieval granularity-level designs so that the system builds into its DNA the nec…
▽ More
In this paper, we propose a novel abstraction-aware sketch-based image retrieval framework capable of handling sketch abstraction at varied levels. Prior works had mainly focused on tackling sub-factors such as drawing style and order, we instead attempt to model abstraction as a whole, and propose feature-level and retrieval granularity-level designs so that the system builds into its DNA the necessary means to interpret abstraction. On learning abstraction-aware features, we for the first-time harness the rich semantic embedding of pre-trained StyleGAN model, together with a novel abstraction-level mapper that deciphers the level of abstraction and dynamically selects appropriate dimensions in the feature matrix correspondingly, to construct a feature matrix embedding that can be freely traversed to accommodate different levels of abstraction. For granularity-level abstraction understanding, we dictate that the retrieval model should not treat all abstraction-levels equally and introduce a differentiable surrogate Acc.@q loss to inject that understanding into the system. Different to the gold-standard triplet loss, our Acc.@q loss uniquely allows a sketch to narrow/broaden its focus in terms of how stringent the evaluation should be - the more abstract a sketch, the less stringent (higher q). Extensive experiments depict our method to outperform existing state-of-the-arts in standard SBIR tasks along with challenging scenarios like early retrieval, forensic sketch-photo matching, and style-invariant retrieval.
△ Less
Submitted 20 March, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
-
Book2Dial: Generating Teacher-Student Interactions from Textbooks for Cost-Effective Development of Educational Chatbots
Authors:
Junling Wang,
Jakub Macina,
Nico Daheim,
Sankalan Pal Chowdhury,
Mrinmaya Sachan
Abstract:
Educational chatbots are a promising tool for assisting student learning. However, the development of effective chatbots in education has been challenging, as high-quality data is seldom available in this domain. In this paper, we propose a framework for generating synthetic teacher-student interactions grounded in a set of textbooks. Our approaches capture one aspect of learning interactions wher…
▽ More
Educational chatbots are a promising tool for assisting student learning. However, the development of effective chatbots in education has been challenging, as high-quality data is seldom available in this domain. In this paper, we propose a framework for generating synthetic teacher-student interactions grounded in a set of textbooks. Our approaches capture one aspect of learning interactions where curious students with partial knowledge interactively ask a teacher questions about the material in the textbook. We highlight various quality criteria that such dialogues should fulfill and compare several approaches relying on either prompting or fine-tuning large language models. We use synthetic dialogues to train educational chatbots and show benefits of further fine-tuning in different educational domains. However, human evaluation shows that our best data synthesis method still suffers from hallucinations and tends to reiterate information from previous conversations. Our findings offer insights for future efforts in synthesizing conversational data that strikes a balance between size and quality. We will open-source our data and code.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
AutoTutor meets Large Language Models: A Language Model Tutor with Rich Pedagogy and Guardrails
Authors:
Sankalan Pal Chowdhury,
Vilém Zouhar,
Mrinmaya Sachan
Abstract:
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providin…
▽ More
Large Language Models (LLMs) have found several use cases in education, ranging from automatic question generation to essay evaluation. In this paper, we explore the potential of using Large Language Models (LLMs) to author Intelligent Tutoring Systems. A common pitfall of LLMs is their straying from desired pedagogical strategies such as leaking the answer to the student, and in general, providing no guarantees. We posit that while LLMs with certain guardrails can take the place of subject experts, the overall pedagogical design still needs to be handcrafted for the best learning results. Based on this principle, we create a sample end-to-end tutoring system named MWPTutor, which uses LLMs to fill in the state space of a pre-defined finite state transducer. This approach retains the structure and the pedagogy of traditional tutoring systems that has been developed over the years by learning scientists but brings in additional flexibility of LLM-based approaches. Through a human evaluation study on two datasets based on math word problems, we show that our hybrid approach achieves a better overall tutoring score than an instructed, but otherwise free-form, GPT-4. MWPTutor is completely modular and opens up the scope for the community to improve its performance by improving individual modules or using different teaching strategies that it can follow.
△ Less
Submitted 25 April, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Authors:
Dar-Yen Chen,
Ayan Kumar Bhunia,
Subhadeep Koley,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Yi-Zhe Song
Abstract:
In this paper, we democratise caricature generation, empowering individuals to effortlessly craft personalised caricatures with just a photo and a conceptual sketch. Our objective is to strike a delicate balance between abstraction and identity, while preserving the creativity and subjectivity inherent in a sketch. To achieve this, we present Explicit Rank-1 Model Editing alongside single-image pe…
▽ More
In this paper, we democratise caricature generation, empowering individuals to effortlessly craft personalised caricatures with just a photo and a conceptual sketch. Our objective is to strike a delicate balance between abstraction and identity, while preserving the creativity and subjectivity inherent in a sketch. To achieve this, we present Explicit Rank-1 Model Editing alongside single-image personalisation, selectively applying nuanced edits to cross-attention layers for a seamless merge of identity and style. Additionally, we propose Random Mask Reconstruction to enhance robustness, directing the model to focus on distinctive identity and style features. Crucially, our aim is not to replace artists but to eliminate accessibility barriers, allowing enthusiasts to engage in the artistry.
△ Less
Submitted 24 March, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Doodle Your 3D: From Abstract Freehand Sketches to Precise 3D Shapes
Authors:
Hmrishav Bandyopadhyay,
Subhadeep Koley,
Ayan Das,
Ayan Kumar Bhunia,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
In this paper, we democratise 3D content creation, enabling precise generation of 3D shapes from abstract sketches while overcoming limitations tied to drawing skills. We introduce a novel part-level modelling and alignment framework that facilitates abstraction modelling and cross-modal correspondence. Leveraging the same part-level decoder, our approach seamlessly extends to sketch modelling by…
▽ More
In this paper, we democratise 3D content creation, enabling precise generation of 3D shapes from abstract sketches while overcoming limitations tied to drawing skills. We introduce a novel part-level modelling and alignment framework that facilitates abstraction modelling and cross-modal correspondence. Leveraging the same part-level decoder, our approach seamlessly extends to sketch modelling by establishing correspondence between CLIPasso edgemaps and projected 3D part regions, eliminating the need for a dataset pairing human sketches and 3D shapes. Additionally, our method introduces a seamless in-position editing process as a byproduct of cross-modal part-aligned modelling. Operating in a low-dimensional implicit space, our approach significantly reduces computational demands and processing time.
△ Less
Submitted 7 June, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Co-creating a Transdisciplinary Map of Technology-mediated Harms, Risks and Vulnerabilities: Challenges, Ambivalences and Opportunities
Authors:
Andrés Domínguez Hernández,
Kopo M. Ramokapane,
Partha Das Chowdhury,
Ola Michalec,
Emily Johnstone,
Emily Godwin,
Alicia G Cork,
Awais Rashid
Abstract:
The phrase "online harms" has emerged in recent years out of a growing political willingness to address the ethical and social issues associated with the use of the Internet and digital technology at large. The broad landscape that surrounds online harms gathers a multitude of disciplinary, sectoral and organizational efforts while raising myriad challenges and opportunities for the crossing entre…
▽ More
The phrase "online harms" has emerged in recent years out of a growing political willingness to address the ethical and social issues associated with the use of the Internet and digital technology at large. The broad landscape that surrounds online harms gathers a multitude of disciplinary, sectoral and organizational efforts while raising myriad challenges and opportunities for the crossing entrenched boundaries. In this paper we draw lessons from a journey of co-creating a transdisciplinary knowledge infrastructure within a large research initiative animated by the online harms agenda. We begin with a reflection of the implications of mapping, taxonomizing and constructing knowledge infrastructures and a brief review of how online harm and adjacent themes have been theorized and classified in the literature to date. Grounded on our own experience of co-creating a map of online harms, we then argue that the map -- and the process of mapping -- perform three mutually constitutive functions, acting simultaneously as method, medium and provocation. We draw lessons from how an open-ended approach to mapping, despite not guaranteeing consensus, can foster productive debate and collaboration in ethically and politically fraught areas of research. We end with a call for CSCW research to surface and engage with the multiple temporalities, social lives and political sensibilities of knowledge infrastructures.
△ Less
Submitted 19 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
3D VR Sketch Guided 3D Shape Prototyping and Exploration
Authors:
Ling Luo,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song,
Yulia Gryaditskaya
Abstract:
3D shape modeling is labor-intensive, time-consuming, and requires years of expertise. To facilitate 3D shape modeling, we propose a 3D shape generation network that takes a 3D VR sketch as a condition. We assume that sketches are created by novices without art training and aim to reconstruct geometrically realistic 3D shapes of a given category. To handle potential sketch ambiguity, our method cr…
▽ More
3D shape modeling is labor-intensive, time-consuming, and requires years of expertise. To facilitate 3D shape modeling, we propose a 3D shape generation network that takes a 3D VR sketch as a condition. We assume that sketches are created by novices without art training and aim to reconstruct geometrically realistic 3D shapes of a given category. To handle potential sketch ambiguity, our method creates multiple 3D shapes that align with the original sketch's structure. We carefully design our method, training the model step-by-step and leveraging multi-modal 3D shape representation to support training with limited training data. To guarantee the realism of generated 3D shapes we leverage the normalizing flow that models the distribution of the latent space of 3D shapes. To encourage the fidelity of the generated 3D shapes to an input sketch, we propose a dedicated loss that we deploy at different stages of the training process. The code is available at https://github.com/Rowl1ng/3Dsketch2shape.
△ Less
Submitted 10 January, 2024; v1 submitted 19 June, 2023;
originally announced June 2023.
-
MathDial: A Dialogue Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems
Authors:
Jakub Macina,
Nico Daheim,
Sankalan Pal Chowdhury,
Tanmay Sinha,
Manu Kapur,
Iryna Gurevych,
Mrinmaya Sachan
Abstract:
While automatic dialogue tutors hold great potential in making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large and high-quality datasets. Collecting such datasets remains challenging, as recording tutoring sessions raises privacy concerns and crowdsourcing leads to insufficient data quality. To address this, we propose a framew…
▽ More
While automatic dialogue tutors hold great potential in making education personalized and more accessible, research on such systems has been hampered by a lack of sufficiently large and high-quality datasets. Collecting such datasets remains challenging, as recording tutoring sessions raises privacy concerns and crowdsourcing leads to insufficient data quality. To address this, we propose a framework to generate such dialogues by pairing human teachers with a Large Language Model (LLM) prompted to represent common student errors. We describe how we use this framework to collect MathDial, a dataset of 3k one-to-one teacher-student tutoring dialogues grounded in multi-step math reasoning problems. While models like GPT-3 are good problem solvers, they fail at tutoring because they generate factually incorrect feedback or are prone to revealing solutions to students too early. To overcome this, we let teachers provide learning opportunities to students by guiding them using various scaffolding questions according to a taxonomy of teacher moves. We demonstrate MathDial and its extensive annotations can be used to finetune models to be more effective tutors (and not just solvers). We confirm this by automatic and human evaluation, notably in an interactive setting that measures the trade-off between student solving success and telling solutions. The dataset is released publicly.
△ Less
Submitted 23 October, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
What Can Human Sketches Do for Object Detection?
Authors:
Pinaki Nath Chowdhury,
Ayan Kumar Bhunia,
Aneeshan Sain,
Subhadeep Koley,
Tao Xiang,
Yi-Zhe Song
Abstract:
Sketches are highly expressive, inherently capturing subjective and fine-grained visual cues. The exploration of such innate properties of human sketches has, however, been limited to that of image retrieval. In this paper, for the first time, we cultivate the expressiveness of sketches but for the fundamental vision task of object detection. The end result is a sketch-enabled object detection fra…
▽ More
Sketches are highly expressive, inherently capturing subjective and fine-grained visual cues. The exploration of such innate properties of human sketches has, however, been limited to that of image retrieval. In this paper, for the first time, we cultivate the expressiveness of sketches but for the fundamental vision task of object detection. The end result is a sketch-enabled object detection framework that detects based on what \textit{you} sketch -- \textit{that} ``zebra'' (e.g., one that is eating the grass) in a herd of zebras (instance-aware detection), and only the \textit{part} (e.g., ``head" of a ``zebra") that you desire (part-aware detection). We further dictate that our model works without (i) knowing which category to expect at testing (zero-shot) and (ii) not requiring additional bounding boxes (as per fully supervised) and class labels (as per weakly supervised). Instead of devising a model from the ground up, we show an intuitive synergy between foundation models (e.g., CLIP) and existing sketch models build for sketch-based image retrieval (SBIR), which can already elegantly solve the task -- CLIP to provide model generalisation, and SBIR to bridge the (sketch$\rightarrow$photo) gap. In particular, we first perform independent prompting on both sketch and photo branches of an SBIR model to build highly generalisable sketch and photo encoders on the back of the generalisation ability of CLIP. We then devise a training paradigm to adapt the learned encoders for object detection, such that the region embeddings of detected boxes are aligned with the sketch and photo embeddings from SBIR. Evaluating our framework on standard object detection datasets like PASCAL-VOC and MS-COCO outperforms both supervised (SOD) and weakly-supervised object detectors (WSOD) on zero-shot setups. Project Page: \url{https://pinakinathc.github.io/sketch-detect}
△ Less
Submitted 28 October, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR
Authors:
Aneeshan Sain,
Ayan Kumar Bhunia,
Subhadeep Koley,
Pinaki Nath Chowdhury,
Soumitri Chattopadhyay,
Tao Xiang,
Yi-Zhe Song
Abstract:
This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that overshoots prior state-of-the-arts by ~11%. This is not via complicated design though, but by addressing two critical issues facing the community (i) the gold standard triplet loss does not enforce holistic latent space geometry, and (ii) there are never enough sketches…
▽ More
This paper advances the fine-grained sketch-based image retrieval (FG-SBIR) literature by putting forward a strong baseline that overshoots prior state-of-the-arts by ~11%. This is not via complicated design though, but by addressing two critical issues facing the community (i) the gold standard triplet loss does not enforce holistic latent space geometry, and (ii) there are never enough sketches to train a high accuracy model. For the former, we propose a simple modification to the standard triplet loss, that explicitly enforces separation amongst photos/sketch instances. For the latter, we put forward a novel knowledge distillation module can leverage photo data for model training. Both modules are then plugged into a novel plug-n-playable training paradigm that allows for more stable training. More specifically, for (i) we employ an intra-modal triplet loss amongst sketches to bring sketches of the same instance closer from others, and one more amongst photos to push away different photo instances while bringing closer a structurally augmented version of the same photo (offering a gain of ~4-6%). To tackle (ii), we first pre-train a teacher on the large set of unlabelled photos over the aforementioned intra-modal photo triplet loss. Then we distill the contextual similarity present amongst the instances in the teacher's embedding space to that in the student's embedding space, by matching the distribution over inter-feature distances of respective samples in both embedding spaces (delivering a further gain of ~4-5%). Apart from outperforming prior arts significantly, our model also yields satisfactory results on generalising to new classes. Project page: https://aneeshan95.github.io/Sketch_PVT/
△ Less
Submitted 23 March, 2023;
originally announced March 2023.
-
CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not
Authors:
Aneeshan Sain,
Ayan Kumar Bhunia,
Pinaki Nath Chowdhury,
Subhadeep Koley,
Tao Xiang,
Yi-Zhe Song
Abstract:
In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained set…
▽ More
In this paper, we leverage CLIP for zero-shot sketch based image retrieval (ZS-SBIR). We are largely inspired by recent advances on foundation models and the unparalleled generalisation ability they seem to offer, but for the first time tailor it to benefit the sketch community. We put forward novel designs on how best to achieve this synergy, for both the category setting and the fine-grained setting ("all"). At the very core of our solution is a prompt learning setup. First we show just via factoring in sketch-specific prompts, we already have a category-level ZS-SBIR system that overshoots all prior arts, by a large margin (24.8%) - a great testimony on studying the CLIP and ZS-SBIR synergy. Moving onto the fine-grained setup is however trickier, and requires a deeper dive into this synergy. For that, we come up with two specific designs to tackle the fine-grained matching nature of the problem: (i) an additional regularisation loss to ensure the relative separation between sketches and photos is uniform across categories, which is not the case for the gold standard standalone triplet loss, and (ii) a clever patch shuffling technique to help establishing instance-level structural correspondences between sketch-photo pairs. With these designs, we again observe significant performance gains in the region of 26.9% over previous state-of-the-art. The take-home message, if any, is the proposed CLIP and prompt learning paradigm carries great promise in tackling other sketch-related tasks (not limited to ZS-SBIR) where data scarcity remains a great challenge. Project page: https://aneeshan95.github.io/Sketch_LVM/
△ Less
Submitted 27 March, 2023; v1 submitted 23 March, 2023;
originally announced March 2023.
-
Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
Authors:
Ayan Kumar Bhunia,
Subhadeep Koley,
Amandeep Kumar,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
Human sketch has already proved its worth in various visual understanding tasks (e.g., retrieval, segmentation, image-captioning, etc). In this paper, we reveal a new trait of sketches - that they are also salient. This is intuitive as sketching is a natural attentive process at its core. More specifically, we aim to study how sketches can be used as a weak label to detect salient objects present…
▽ More
Human sketch has already proved its worth in various visual understanding tasks (e.g., retrieval, segmentation, image-captioning, etc). In this paper, we reveal a new trait of sketches - that they are also salient. This is intuitive as sketching is a natural attentive process at its core. More specifically, we aim to study how sketches can be used as a weak label to detect salient objects present in an image. To this end, we propose a novel method that emphasises on how "salient object" could be explained by hand-drawn sketches. To accomplish this, we introduce a photo-to-sketch generation model that aims to generate sequential sketch coordinates corresponding to a given visual photo through a 2D attention mechanism. Attention maps accumulated across the time steps give rise to salient regions in the process. Extensive quantitative and qualitative experiments prove our hypothesis and delineate how our sketch-based saliency detection model gives a competitive performance compared to the state-of-the-art.
△ Less
Submitted 30 March, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Picture that Sketch: Photorealistic Image Generation from Abstract Sketches
Authors:
Subhadeep Koley,
Ayan Kumar Bhunia,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
Given an abstract, deformed, ordinary sketch from untrained amateurs like you and me, this paper turns it into a photorealistic image - just like those shown in Fig. 1(a), all non-cherry-picked. We differ significantly from prior art in that we do not dictate an edgemap-like sketch to start with, but aim to work with abstract free-hand human sketches. In doing so, we essentially democratise the sk…
▽ More
Given an abstract, deformed, ordinary sketch from untrained amateurs like you and me, this paper turns it into a photorealistic image - just like those shown in Fig. 1(a), all non-cherry-picked. We differ significantly from prior art in that we do not dictate an edgemap-like sketch to start with, but aim to work with abstract free-hand human sketches. In doing so, we essentially democratise the sketch-to-photo pipeline, "picturing" a sketch regardless of how good you sketch. Our contribution at the outset is a decoupled encoder-decoder training paradigm, where the decoder is a StyleGAN trained on photos only. This importantly ensures that generated results are always photorealistic. The rest is then all centred around how best to deal with the abstraction gap between sketch and photo. For that, we propose an autoregressive sketch mapper trained on sketch-photo pairs that maps a sketch to the StyleGAN latent space. We further introduce specific designs to tackle the abstract nature of human sketches, including a fine-grained discriminative loss on the back of a trained sketch-photo retrieval model, and a partial-aware sketch augmentation strategy. Finally, we showcase a few downstream tasks our generation model enables, amongst them is showing how fine-grained sketch-based image retrieval, a well-studied problem in the sketch community, can be reduced to an image (generated) to image retrieval task, surpassing state-of-the-arts. We put forward generated results in the supplementary for everyone to scrutinise.
△ Less
Submitted 30 March, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Threat Models over Space and Time: A Case Study of E2EE Messaging Applications
Authors:
Partha Das Chowdhury,
Maria Sameen,
Jenny Blessing,
Nicholas Boucher,
Joseph Gardiner,
Tom Burrows,
Ross Anderson,
Awais Rashid
Abstract:
Threat modelling is foundational to secure systems engineering and should be done in consideration of the context within which systems operate. On the other hand, the continuous evolution of both the technical sophistication of threats and the system attack surface is an inescapable reality. In this work, we explore the extent to which real-world systems engineering reflects the changing threat co…
▽ More
Threat modelling is foundational to secure systems engineering and should be done in consideration of the context within which systems operate. On the other hand, the continuous evolution of both the technical sophistication of threats and the system attack surface is an inescapable reality. In this work, we explore the extent to which real-world systems engineering reflects the changing threat context. To this end we examine the desktop clients of six widely used end-to-end-encrypted mobile messaging applications to understand the extent to which they adjusted their threat model over space (when enabling clients on new platforms, such as desktop clients) and time (as new threats emerged). We experimented with short-lived adversarial access against these desktop clients and analyzed the results with respect to two popular threat elicitation frameworks, STRIDE and LINDDUN. The results demonstrate that system designers need to both recognise the threats in the evolving context within which systems operate and, more importantly, to mitigate them by rescoping trust boundaries in a manner that those within the administrative boundary cannot violate security and privacy properties. Such a nuanced understanding of trust boundary scopes and their relationship with administrative boundaries allows for better administration of shared components, including securing them with safe defaults.
△ Less
Submitted 28 May, 2023; v1 submitted 13 January, 2023;
originally announced January 2023.
-
Better Call Saltzer \& Schroeder: A Retrospective Security Analysis of SolarWinds \& Log4j
Authors:
Partha Das Chowdhury,
Mohammad Tahaei,
Awais Rashid
Abstract:
Saltzer \& Schroeder's principles aim to bring security to the design of computer systems. We investigate SolarWinds Orion update and Log4j to unpack the intersections where observance of these principles could have mitigated the embedded vulnerabilities. The common principles that were not observed include \emph{fail safe defaults}, \emph{economy of mechanism}, \emph{complete mediation} and \emph…
▽ More
Saltzer \& Schroeder's principles aim to bring security to the design of computer systems. We investigate SolarWinds Orion update and Log4j to unpack the intersections where observance of these principles could have mitigated the embedded vulnerabilities. The common principles that were not observed include \emph{fail safe defaults}, \emph{economy of mechanism}, \emph{complete mediation} and \emph{least privilege}. Then we explore the literature on secure software development interventions for developers to identify usable analysis tools and frameworks that can contribute towards improved observance of these principles. We focus on a system wide view of access of codes, checking access paths and aiding application developers with safe libraries along with an appropriate security task list for functionalities.
△ Less
Submitted 4 November, 2022;
originally announced November 2022.
-
Adaptive Fine-Grained Sketch-Based Image Retrieval
Authors:
Ayan Kumar Bhunia,
Aneeshan Sain,
Parth Shah,
Animesh Gupta,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
The recent focus on Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) has shifted towards generalising a model to new categories without any training data from them. In real-world applications, however, a trained FG-SBIR model is often applied to both new categories and different human sketchers, i.e., different drawing styles. Although this complicates the generalisation problem, fortunately, a…
▽ More
The recent focus on Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) has shifted towards generalising a model to new categories without any training data from them. In real-world applications, however, a trained FG-SBIR model is often applied to both new categories and different human sketchers, i.e., different drawing styles. Although this complicates the generalisation problem, fortunately, a handful of examples are typically available, enabling the model to adapt to the new category/style. In this paper, we offer a novel perspective -- instead of asking for a model that generalises, we advocate for one that quickly adapts, with just very few samples during testing (in a few-shot manner). To solve this new problem, we introduce a novel model-agnostic meta-learning (MAML) based framework with several key modifications: (1) As a retrieval task with a margin-based contrastive loss, we simplify the MAML training in the inner loop to make it more stable and tractable. (2) The margin in our contrastive loss is also meta-learned with the rest of the model. (3) Three additional regularisation losses are introduced in the outer loop, to make the meta-learned FG-SBIR model more effective for category/style adaptation. Extensive experiments on public datasets suggest a large gain over generalisation and zero-shot based approaches, and a few strong few-shot baselines.
△ Less
Submitted 19 August, 2022; v1 submitted 4 July, 2022;
originally announced July 2022.
-
SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
Authors:
Pinaki Nath Chowdhury,
Ayan Kumar Bhunia,
Aneeshan Sain,
Subhadeep Koley,
Tao Xiang,
Yi-Zhe Song
Abstract:
In this paper, we extend scene understanding to include that of human sketch. The result is a complete trilogy of scene representation from three diverse and complementary modalities -- sketch, photo, and text. Instead of learning a rigid three-way embedding and be done with it, we focus on learning a flexible joint embedding that fully supports the ``optionality" that this complementarity brings.…
▽ More
In this paper, we extend scene understanding to include that of human sketch. The result is a complete trilogy of scene representation from three diverse and complementary modalities -- sketch, photo, and text. Instead of learning a rigid three-way embedding and be done with it, we focus on learning a flexible joint embedding that fully supports the ``optionality" that this complementarity brings. Our embedding supports optionality on two axes: (i) optionality across modalities -- use any combination of modalities as query for downstream tasks like retrieval, (ii) optionality across tasks -- simultaneously utilising the embedding for either discriminative (e.g., retrieval) or generative tasks (e.g., captioning). This provides flexibility to end-users by exploiting the best of each modality, therefore serving the very purpose behind our proposal of a trilogy in the first place. First, a combination of information-bottleneck and conditional invertible neural networks disentangle the modality-specific component from modality-agnostic in sketch, photo, and text. Second, the modality-agnostic instances from sketch, photo, and text are synergised using a modified cross-attention. Once learned, we show our embedding can accommodate a multi-facet of scene-related tasks, including those enabled for the first time by the inclusion of sketch, all without any task-specific modifications. Project Page: \url{http://www.pinakinathc.me/scenetrilogy}
△ Less
Submitted 26 March, 2023; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Deep Learning based Automatic Detection of Dicentric Chromosome
Authors:
Angad Singh Wadhwa,
Nikhil Tyagi,
Pinaki Roy Chowdhury
Abstract:
Automatic detection of dicentric chromosomes is an essential step to estimate radiation exposure and development of end to end emergency bio dosimetry systems. During accidents, a large amount of data is required to be processed for extensive testing to formulate a medical treatment plan for the masses, which requires this process to be automated. Current approaches require human adjustments accor…
▽ More
Automatic detection of dicentric chromosomes is an essential step to estimate radiation exposure and development of end to end emergency bio dosimetry systems. During accidents, a large amount of data is required to be processed for extensive testing to formulate a medical treatment plan for the masses, which requires this process to be automated. Current approaches require human adjustments according to the data and therefore need a human expert to calibrate the system. This paper proposes a completely data driven framework which requires minimum intervention of field experts and can be deployed in emergency cases with relative ease. Our approach involves YOLOv4 to detect the chromosomes and remove the debris in each image, followed by a classifier that differentiates between an analysable chromosome and a non-analysable one. Images are extracted from YOLOv4 based on the protocols described by WHO-BIODOSNET. The analysable chromosome is classified as Monocentric or Dicentric and an image is accepted for consideration of dose estimation based on the analysable chromosome count. We report an accuracy in dicentric identification of 94.33% on a 1:1 split of Dicentric and Monocentric Chromosomes.
△ Less
Submitted 17 April, 2022;
originally announced April 2022.
-
Sketching without Worrying: Noise-Tolerant Sketch-Based Image Retrieval
Authors:
Ayan Kumar Bhunia,
Subhadeep Koley,
Abdullah Faiz Ur Rahman Khilji,
Aneeshan Sain,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
Sketching enables many exciting applications, notably, image retrieval. The fear-to-sketch problem (i.e., "I can't sketch") has however proven to be fatal for its widespread adoption. This paper tackles this "fear" head on, and for the first time, proposes an auxiliary module for existing retrieval models that predominantly lets the users sketch without having to worry. We first conducted a pilot…
▽ More
Sketching enables many exciting applications, notably, image retrieval. The fear-to-sketch problem (i.e., "I can't sketch") has however proven to be fatal for its widespread adoption. This paper tackles this "fear" head on, and for the first time, proposes an auxiliary module for existing retrieval models that predominantly lets the users sketch without having to worry. We first conducted a pilot study that revealed the secret lies in the existence of noisy strokes, but not so much of the "I can't sketch". We consequently design a stroke subset selector that {detects noisy strokes, leaving only those} which make a positive contribution towards successful retrieval. Our Reinforcement Learning based formulation quantifies the importance of each stroke present in a given subset, based on the extent to which that stroke contributes to retrieval. When combined with pre-trained retrieval models as a pre-processing module, we achieve a significant gain of 8%-10% over standard baselines and in turn report new state-of-the-art performance. Last but not least, we demonstrate the selector once trained, can also be used in a plug-and-play manner to empower various sketch applications in ways that were not previously possible.
△ Less
Submitted 28 March, 2022;
originally announced March 2022.
-
Partially Does It: Towards Scene-Level FG-SBIR with Partial Input
Authors:
Pinaki Nath Chowdhury,
Ayan Kumar Bhunia,
Viswanatha Reddy Gajjala,
Aneeshan Sain,
Tao Xiang,
Yi-Zhe Song
Abstract:
We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exists significant empty (white) regions as a result of object-level abstrac…
▽ More
We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exists significant empty (white) regions as a result of object-level abstraction, and as a result, (iii) existing scene-level fine-grained sketch-based image retrieval methods collapse as scene sketches become more partial. To solve this "partial" problem, we advocate for a simple set-based approach using optimal transport (OT) to model cross-modal region associativity in a partially-aware fashion. Importantly, we improve upon OT to further account for holistic partialness by comparing intra-modal adjacency matrices. Our proposed method is not only robust to partial scene-sketches but also yields state-of-the-art performance on existing datasets.
△ Less
Submitted 28 March, 2022;
originally announced March 2022.
-
Sketch3T: Test-Time Training for Zero-Shot SBIR
Authors:
Aneeshan Sain,
Ayan Kumar Bhunia,
Vaishnav Potlapalli,
Pinaki Nath Chowdhury,
Tao Xiang,
Yi-Zhe Song
Abstract:
Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories. In this paper, we question to argue that this setup by definition is not compatible with the inherent abstract and subjective nature of sketches, i.e., the model might transfer well to new categories, but will not understand sketches existing in different test-time distribution as a…
▽ More
Zero-shot sketch-based image retrieval typically asks for a trained model to be applied as is to unseen categories. In this paper, we question to argue that this setup by definition is not compatible with the inherent abstract and subjective nature of sketches, i.e., the model might transfer well to new categories, but will not understand sketches existing in different test-time distribution as a result. We thus extend ZS-SBIR asking it to transfer to both categories and sketch distributions. Our key contribution is a test-time training paradigm that can adapt using just one sketch. Since there is no paired photo, we make use of a sketch raster-vector reconstruction module as a self-supervised auxiliary task. To maintain the fidelity of the trained cross-modal joint embedding during test-time update, we design a novel meta-learning based training paradigm to learn a separation between model updates incurred by this auxiliary task from those off the primary objective of discriminative learning. Extensive experiments show our model to outperform state of-the-arts, thanks to the proposed test-time adaption that not only transfers to new categories but also accommodates to new sketching styles.
△ Less
Submitted 28 March, 2022;
originally announced March 2022.
-
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context
Authors:
Pinaki Nath Chowdhury,
Aneeshan Sain,
Ayan Kumar Bhunia,
Tao Xiang,
Yulia Gryaditskaya,
Yi-Zhe Song
Abstract:
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any sketching skills. Our dataset comprises 10,000 freehand scene vector sketches with per point space-time information by 100 non-expert individuals, offeri…
▽ More
We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any sketching skills. Our dataset comprises 10,000 freehand scene vector sketches with per point space-time information by 100 non-expert individuals, offering both object- and scene-level abstraction. Each sketch is augmented with its text description. Using our dataset, we study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions. We draw insights on: (i) Scene salience encoded in sketches using the strokes temporal order; (ii) Performance comparison of image retrieval from a scene sketch and an image caption; (iii) Complementarity of information in sketches and image captions, as well as the potential benefit of combining the two modalities. In addition, we extend a popular vector sketch LSTM-based encoder to handle sketches with larger complexity than was supported by previous work. Namely, we propose a hierarchical sketch decoder, which we leverage at a sketch-specific "pre-text" task. Our dataset enables for the first time research on freehand scene sketch understanding and its practical applications.
△ Less
Submitted 20 July, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
From Utility to Capability: A New Paradigm to Conceptualize and Develop Inclusive PETs
Authors:
Partha Das Chowdhury,
Andres Dominguez,
Kopo M. Ramokapane,
Awais Rashid
Abstract:
The wider adoption of PETs has relied on usability studies, which focus mainly on an assessment of how a specified group of users interface, in particular contexts, with the technical properties of a system. While human centred efforts in usability aim to achieve important technical improvements and drive technology adoption, a focus on the usability of PETs alone is not enough. PETs development a…
▽ More
The wider adoption of PETs has relied on usability studies, which focus mainly on an assessment of how a specified group of users interface, in particular contexts, with the technical properties of a system. While human centred efforts in usability aim to achieve important technical improvements and drive technology adoption, a focus on the usability of PETs alone is not enough. PETs development and adoption requires a broadening of focus to adequately capture the specific needs of individuals, particularly of vulnerable individuals and or individuals in marginalized populations. We argue for a departure, from the utilitarian evaluation of surface features aimed at maximizing adoption, towards a bottom up evaluation of what real opportunities humans have to use a particular system. We delineate a new paradigm for the way PETs are conceived and developed. To that end, we propose that Amartya Sen s capability approach offers a foundation for the comprehensive evaluation of the opportunities individuals have based on their personal and environmental circumstances which can, in turn, inform the evolution of PETs. This includes considerations of vulnerability, age, education, physical and mental ability, language barriers, gender, access to technology, freedom from oppression among many important contextual factors.
△ Less
Submitted 29 September, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
On Learning the Transformer Kernel
Authors:
Sankalan Pal Chowdhury,
Adamos Solomou,
Avinava Dubey,
Mrinmaya Sachan
Abstract:
In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space co…
▽ More
In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures, both in terms of accuracy as well as computational efficiency. Our study also demonstrates that the choice of the kernel has a substantial impact on performance, and kernel learning variants are competitive alternatives to fixed kernel Transformers, both in long as well as short sequence tasks.
△ Less
Submitted 21 July, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Nonparametric Functional Analysis of Generalized Linear Models Under Nonlinear Constraints
Authors:
K. P. Chowdhury
Abstract:
This article introduces a novel nonparametric methodology for Generalized Linear Models which combines the strengths of the binary regression and latent variable formulations for categorical data, while overcoming their disadvantages. Requiring minimal assumptions, it extends recently published parametric versions of the methodology and generalizes it. If the underlying data generating process is…
▽ More
This article introduces a novel nonparametric methodology for Generalized Linear Models which combines the strengths of the binary regression and latent variable formulations for categorical data, while overcoming their disadvantages. Requiring minimal assumptions, it extends recently published parametric versions of the methodology and generalizes it. If the underlying data generating process is asymmetric, it gives uniformly better prediction and inference performance over the parametric formulation. Furthermore, it introduces a new classification statistic utilizing which I show that overall, it has better model fit, inference and classification performance than the parametric version, and the difference in performance is statistically significant especially if the data generating process is asymmetric. In addition, the methodology can be used to perform model diagnostics for any model specification. This is a highly useful result, and it extends existing work for categorical model diagnostics broadly across the sciences. The mathematical results also highlight important new findings regarding the interplay of statistical significance and scientific significance. Finally, the methodology is applied to various real-world datasets to show that it may outperform widely used existing models, including Random Forests and Deep Neural Networks with very few iterations.
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
SketchLattice: Latticed Representation for Sketch Manipulation
Authors:
Yonggang Qi,
Guoyao Su,
Pinaki Nath Chowdhury,
Mingkang Li,
Yi-Zhe Song
Abstract:
The key challenge in designing a sketch representation lies with handling the abstract and iconic nature of sketches. Existing work predominantly utilizes either, (i) a pixelative format that treats sketches as natural images employing off-the-shelf CNN-based networks, or (ii) an elaborately designed vector format that leverages the structural information of drawing orders using sequential RNN-bas…
▽ More
The key challenge in designing a sketch representation lies with handling the abstract and iconic nature of sketches. Existing work predominantly utilizes either, (i) a pixelative format that treats sketches as natural images employing off-the-shelf CNN-based networks, or (ii) an elaborately designed vector format that leverages the structural information of drawing orders using sequential RNN-based methods. While the pixelative format lacks intuitive exploitation of structural cues, sketches in vector format are absent in most cases limiting their practical usage. Hence, in this paper, we propose a lattice structured sketch representation that not only removes the bottleneck of requiring vector data but also preserves the structural cues that vector data provides. Essentially, sketch lattice is a set of points sampled from the pixelative format of the sketch using a lattice graph. We show that our lattice structure is particularly amenable to structural changes that largely benefits sketch abstraction modeling for generation tasks. Our lattice representation could be effectively encoded using a graph model, that uses significantly fewer model parameters (13.5 times lesser) than existing state-of-the-art. Extensive experiments demonstrate the effectiveness of sketch lattice for sketch manipulation, including sketch healing and image-to-sketch synthesis.
△ Less
Submitted 26 August, 2021;
originally announced August 2021.
-
XCI-Sketch: Extraction of Color Information from Images for Generation of Colored Outlines and Sketches
Authors:
V Manushree,
Sameer Saxena,
Parna Chowdhury,
Manisimha Varma,
Harsh Rathod,
Ankita Ghosh,
Sahil Khose
Abstract:
Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color…
▽ More
Sketches are a medium to convey a visual scene from an individual's creative perspective. The addition of color substantially enhances the overall expressivity of a sketch. This paper proposes two methods to mimic human-drawn colored sketches by utilizing the Contour Drawing Dataset. Our first approach renders colored outline sketches by applying image processing techniques aided by k-means color clustering. The second method uses a generative adversarial network to develop a model that can generate colored sketches from previously unobserved images. We assess the results obtained through quantitative and qualitative evaluations.
△ Less
Submitted 7 January, 2022; v1 submitted 25 August, 2021;
originally announced August 2021.