-
Temporal Alignment of Time Sensitive Facts with Activation Engineering
Authors:
Sanjay Govindan,
Maurice Pagnucco,
Yang Song
Abstract:
Large Language Models (LLMs) are trained on diverse and often conflicting knowledge spanning multiple domains and time periods. Some of this knowledge is only valid within specific temporal contexts, such as answering the question, "Who is the President of the United States in 2022?" Ensuring LLMs generate time appropriate responses is crucial for maintaining relevance and accuracy. In this work w…
▽ More
Large Language Models (LLMs) are trained on diverse and often conflicting knowledge spanning multiple domains and time periods. Some of this knowledge is only valid within specific temporal contexts, such as answering the question, "Who is the President of the United States in 2022?" Ensuring LLMs generate time appropriate responses is crucial for maintaining relevance and accuracy. In this work we explore activation engineering as a method for temporally aligning LLMs to improve factual recall without any training or dataset creation. In this research we explore an activation engineering technique to ground three versions of LLaMA 2 to specific points in time and examine the effects of varying injection layers and prompting strategies. Our experiments demonstrate up to a 44% and 16% improvement in relative and explicit prompting respectively, achieving comparable performance to the fine-tuning method proposed by Zhao et al. (2024) . Notably, our approach achieves similar results to the fine-tuning baseline while being significantly more computationally efficient and requiring no pre-aligned datasets.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
Authors:
Qingchen Tang,
Lei Fan,
Maurice Pagnucco,
Yang Song
Abstract:
Weakly supervised image segmentation with image-level labels has drawn attention due to the high cost of pixel-level annotations. Traditional methods using Class Activation Maps (CAMs) often highlight only the most discriminative regions, leading to incomplete masks. Recent approaches that introduce textual information struggle with histopathological images due to inter-class homogeneity and intra…
▽ More
Weakly supervised image segmentation with image-level labels has drawn attention due to the high cost of pixel-level annotations. Traditional methods using Class Activation Maps (CAMs) often highlight only the most discriminative regions, leading to incomplete masks. Recent approaches that introduce textual information struggle with histopathological images due to inter-class homogeneity and intra-class heterogeneity. In this paper, we propose a prototype-based image prompting framework for histopathological image segmentation. It constructs an image bank from the training set using clustering, extracting multiple prototype features per class to capture intra-class heterogeneity. By designing a matching loss between input features and class-specific prototypes using contrastive learning, our method addresses inter-class homogeneity and guides the model to generate more accurate CAMs. Experiments on four datasets (LUAD-HistoSeg, BCSS-WSSS, GCSS, and BCSS) show that our method outperforms existing weakly supervised segmentation approaches, setting new benchmarks in histopathological image segmentation.
△ Less
Submitted 15 March, 2025;
originally announced March 2025.
-
Interpretable Image Classification via Non-parametric Part Prototype Learning
Authors:
Zhijie Zhu,
Lei Fan,
Maurice Pagnucco,
Yang Song
Abstract:
Classifying images with an interpretable decision-making process is a long-standing problem in computer vision. In recent years, Prototypical Part Networks has gained traction as an approach for self-explainable neural networks, due to their ability to mimic human visual reasoning by providing explanations based on prototypical object parts. However, the quality of the explanations generated by th…
▽ More
Classifying images with an interpretable decision-making process is a long-standing problem in computer vision. In recent years, Prototypical Part Networks has gained traction as an approach for self-explainable neural networks, due to their ability to mimic human visual reasoning by providing explanations based on prototypical object parts. However, the quality of the explanations generated by these methods leaves room for improvement, as the prototypes usually focus on repetitive and redundant concepts. Leveraging recent advances in prototype learning, we present a framework for part-based interpretable image classification that learns a set of semantically distinctive object parts for each class, and provides diverse and comprehensive explanations. The core of our method is to learn the part-prototypes in a non-parametric fashion, through clustering deep features extracted from foundation vision models that encode robust semantic information. To quantitatively evaluate the quality of explanations provided by ProtoPNets, we introduce Distinctiveness Score and Comprehensiveness Score. Through evaluation on CUB-200-2011, Stanford Cars and Stanford Dogs datasets, we show that our framework compares favourably against existing ProtoPNets while achieving better interpretability. Code is available at: https://github.com/zijizhu/proto-non-param.
△ Less
Submitted 13 March, 2025;
originally announced March 2025.
-
Structure based SAT dataset for analysing GNN generalisation
Authors:
Yi Fu,
Anthony Tompkins,
Yang Song,
Maurice Pagnucco
Abstract:
Satisfiability (SAT) solvers based on techniques such as conflict driven clause learning (CDCL) have produced excellent performance on both synthetic and real world industrial problems. While these CDCL solvers only operate on a per-problem basis, graph neural network (GNN) based solvers bring new benefits to the field by allowing practitioners to exploit knowledge gained from solved problems to e…
▽ More
Satisfiability (SAT) solvers based on techniques such as conflict driven clause learning (CDCL) have produced excellent performance on both synthetic and real world industrial problems. While these CDCL solvers only operate on a per-problem basis, graph neural network (GNN) based solvers bring new benefits to the field by allowing practitioners to exploit knowledge gained from solved problems to expedite solving of new SAT problems. However, one specific area that is often studied in the context of CDCL solvers, but largely overlooked in GNN solvers, is the relationship between graph theoretic measure of structure in SAT problems and the generalisation ability of GNN solvers. To bridge the gap between structural graph properties (e.g., modularity, self-similarity) and the generalisability (or lack thereof) of GNN based SAT solvers, we present StructureSAT: a curated dataset, along with code to further generate novel examples, containing a diverse set of SAT problems from well known problem domains. Furthermore, we utilise a novel splitting method that focuses on deconstructing the families into more detailed hierarchies based on their structural properties. With the new dataset, we aim to help explain problematic generalisation in existing GNN SAT solvers by exploiting knowledge of structural graph properties. We conclude with multiple future directions that can help researchers in GNN based SAT solving develop more effective and generalisable SAT solvers.
△ Less
Submitted 16 February, 2025;
originally announced February 2025.
-
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
Authors:
Lei Fan,
Dongdong Fan,
Zhiguang Hu,
Yiwen Ding,
Donglin Di,
Kai Yi,
Maurice Pagnucco,
Yang Song
Abstract:
We present MANTA, a visual-text anomaly detection dataset for tiny objects. The visual component comprises over 137.3K images across 38 object categories spanning five typical domains, of which 8.6K images are labeled as anomalous with pixel-level annotations. Each image is captured from five distinct viewpoints to ensure comprehensive object coverage. The text component consists of two subsets: D…
▽ More
We present MANTA, a visual-text anomaly detection dataset for tiny objects. The visual component comprises over 137.3K images across 38 object categories spanning five typical domains, of which 8.6K images are labeled as anomalous with pixel-level annotations. Each image is captured from five distinct viewpoints to ensure comprehensive object coverage. The text component consists of two subsets: Declarative Knowledge, including 875 words that describe common anomalies across various domains and specific categories, with detailed explanations for < what, why, how>, including causes and visual characteristics; and Constructivist Learning, providing 2K multiple-choice questions with varying levels of difficulty, each paired with images and corresponded answer explanations. We also propose a baseline for visual-text tasks and conduct extensive benchmarking experiments to evaluate advanced methods across different settings, highlighting the challenges and efficacy of our dataset.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
Revitalizing Reconstruction Models for Multi-class Anomaly Detection via Class-Aware Contrastive Learning
Authors:
Lei Fan,
Junjie Huang,
Donglin Di,
Anyang Su,
Maurice Pagnucco,
Yang Song
Abstract:
For anomaly detection (AD), early approaches often train separate models for individual classes, yielding high performance but posing challenges in scalability and resource management. Recent efforts have shifted toward training a single model capable of handling multiple classes. However, directly extending early AD methods to multi-class settings often results in degraded performance. In this pa…
▽ More
For anomaly detection (AD), early approaches often train separate models for individual classes, yielding high performance but posing challenges in scalability and resource management. Recent efforts have shifted toward training a single model capable of handling multiple classes. However, directly extending early AD methods to multi-class settings often results in degraded performance. In this paper, we analyze this degradation observed in reconstruction-based methods, identifying two key issues: catastrophic forgetting and inter-class confusion. To this end, we propose a plug-and-play modification by incorporating class-aware contrastive learning (CL). By explicitly leveraging raw object category information (e.g., carpet or wood) as supervised signals, we apply local CL to fine-tune multiscale features and global CL to learn more compact feature representations of normal patterns, thereby effectively adapting the models to multi-class settings. Experiments across four datasets (over 60 categories) verify the effectiveness of our approach, yielding significant improvements and superior performance compared to advanced methods. Notably, ablation studies show that even using pseudo-class labels can achieve comparable performance.
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
Fully Distributed, Flexible Compositional Visual Representations via Soft Tensor Products
Authors:
Bethia Sun,
Maurice Pagnucco,
Yang Song
Abstract:
Since the inception of the classicalist vs. connectionist debate, it has been argued that the ability to systematically combine symbol-like entities into compositional representations is crucial for human intelligence. In connectionist systems, the field of disentanglement has gained prominence for its ability to produce explicitly compositional representations; however, it relies on a fundamental…
▽ More
Since the inception of the classicalist vs. connectionist debate, it has been argued that the ability to systematically combine symbol-like entities into compositional representations is crucial for human intelligence. In connectionist systems, the field of disentanglement has gained prominence for its ability to produce explicitly compositional representations; however, it relies on a fundamentally symbolic, concatenative representation of compositional structure that clashes with the continuous, distributed foundations of deep learning. To resolve this tension, we extend Smolensky's Tensor Product Representation (TPR) and introduce Soft TPR, a representational form that encodes compositional structure in an inherently distributed, flexible manner, along with Soft TPR Autoencoder, a theoretically-principled architecture designed specifically to learn Soft TPRs. Comprehensive evaluations in the visual representation learning domain demonstrate that the Soft TPR framework consistently outperforms conventional disentanglement alternatives -- achieving state-of-the-art disentanglement, boosting representation learner convergence, and delivering superior sample efficiency and low-sample regime performance in downstream tasks. These findings highlight the promise of a distributed and flexible approach to representing compositional structure by potentially enhancing alignment with the core principles of deep learning over the conventional symbolic approach.
△ Less
Submitted 22 January, 2025; v1 submitted 5 December, 2024;
originally announced December 2024.
-
Dataset Distillation for Histopathology Image Classification
Authors:
Cong Cong,
Shiyu Xuan,
Sidong Liu,
Maurice Pagnucco,
Shiliang Zhang,
Yang Song
Abstract:
Deep neural networks (DNNs) have exhibited remarkable success in the field of histopathology image analysis. On the other hand, the contemporary trend of employing large models and extensive datasets has underscored the significance of dataset distillation, which involves compressing large-scale datasets into a condensed set of synthetic samples, offering distinct advantages in improving training…
▽ More
Deep neural networks (DNNs) have exhibited remarkable success in the field of histopathology image analysis. On the other hand, the contemporary trend of employing large models and extensive datasets has underscored the significance of dataset distillation, which involves compressing large-scale datasets into a condensed set of synthetic samples, offering distinct advantages in improving training efficiency and streamlining downstream applications. In this work, we introduce a novel dataset distillation algorithm tailored for histopathology image datasets (Histo-DD), which integrates stain normalisation and model augmentation into the distillation progress. Such integration can substantially enhance the compatibility with histopathology images that are often characterised by high colour heterogeneity. We conduct a comprehensive evaluation of the effectiveness of the proposed algorithm and the generated histopathology samples in both patch-level and slide-level classification tasks. The experimental results, carried out on three publicly available WSI datasets, including Camelyon16, TCGA-IDH, and UniToPath, demonstrate that the proposed Histo-DD can generate more informative synthetic patches than previous coreset selection and patch sampling methods. Moreover, the synthetic samples can preserve discriminative information, substantially reduce training efforts, and exhibit architecture-agnostic properties. These advantages indicate that synthetic samples can serve as an alternative to large-scale datasets.
△ Less
Submitted 19 August, 2024;
originally announced August 2024.
-
An annotated grain kernel image database for visual quality inspection
Authors:
Lei Fan,
Yiwen Ding,
Dongdong Fan,
Yong Wu,
Hongxia Chu,
Maurice Pagnucco,
Yang Song
Abstract:
We present a machine vision-based database named GrainSet for the purpose of visual quality inspection of grain kernels. The database contains more than 350K single-kernel images with experts' annotations. The grain kernels used in the study consist of four types of cereal grains including wheat, maize, sorghum and rice, and were collected from over 20 regions in 5 countries. The surface informati…
▽ More
We present a machine vision-based database named GrainSet for the purpose of visual quality inspection of grain kernels. The database contains more than 350K single-kernel images with experts' annotations. The grain kernels used in the study consist of four types of cereal grains including wheat, maize, sorghum and rice, and were collected from over 20 regions in 5 countries. The surface information of each kernel is captured by our custom-built device equipped with high-resolution optic sensor units, and corresponding sampling information and annotations include collection location and time, morphology, physical size, weight, and Damage & Unsound grain categories provided by senior inspectors. In addition, we employed a commonly used deep learning model to provide classification results as a benchmark. We believe that our GrainSet will facilitate future research in fields such as assisting inspectors in grain quality inspections, providing guidance for grain storage and trade, and contributing to applications of smart agriculture.
△ Less
Submitted 20 November, 2023;
originally announced January 2024.
-
DARNet: Bridging Domain Gaps in Cross-Domain Few-Shot Segmentation with Dynamic Adaptation
Authors:
Haoran Fan,
Qi Fan,
Maurice Pagnucco,
Yang Song
Abstract:
Few-shot segmentation (FSS) aims to segment novel classes in a query image by using only a small number of supporting images from base classes. However, in cross-domain few-shot segmentation (CD-FSS), leveraging features from label-rich domains for resource-constrained domains poses challenges due to domain discrepancies. This work presents a Dynamically Adaptive Refine (DARNet) method that aims t…
▽ More
Few-shot segmentation (FSS) aims to segment novel classes in a query image by using only a small number of supporting images from base classes. However, in cross-domain few-shot segmentation (CD-FSS), leveraging features from label-rich domains for resource-constrained domains poses challenges due to domain discrepancies. This work presents a Dynamically Adaptive Refine (DARNet) method that aims to balance generalization and specificity for CD-FSS. Our method includes the Channel Statistics Disruption (CSD) strategy, which perturbs feature channel statistics in the source domain, bolstering generalization to unknown target domains. Moreover, recognizing the variability across target domains, an Adaptive Refine Self-Matching (ARSM) method is also proposed to adjust the matching threshold and dynamically refine the prediction result with the self-matching method, enhancing accuracy. We also present a Test-Time Adaptation (TTA) method to refine the model's adaptability to diverse feature distributions. Our approach demonstrates superior performance against state-of-the-art methods in CD-FSS tasks.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
Identifying the Defective: Detecting Damaged Grains for Cereal Appearance Inspection
Authors:
Lei Fan,
Yiwen Ding,
Dongdong Fan,
Yong Wu,
Maurice Pagnucco,
Yang Song
Abstract:
Cereal grain plays a crucial role in the human diet as a major source of essential nutrients. Grain Appearance Inspection (GAI) serves as an essential process to determine grain quality and facilitate grain circulation and processing. However, GAI is routinely performed manually by inspectors with cumbersome procedures, which poses a significant bottleneck in smart agriculture.
In this paper, we…
▽ More
Cereal grain plays a crucial role in the human diet as a major source of essential nutrients. Grain Appearance Inspection (GAI) serves as an essential process to determine grain quality and facilitate grain circulation and processing. However, GAI is routinely performed manually by inspectors with cumbersome procedures, which poses a significant bottleneck in smart agriculture.
In this paper, we endeavor to develop an automated GAI system:AI4GrainInsp. By analyzing the distinctive characteristics of grain kernels, we formulate GAI as a ubiquitous problem: Anomaly Detection (AD), in which healthy and edible kernels are considered normal samples while damaged grains or unknown objects are regarded as anomalies. We further propose an AD model, called AD-GAI, which is trained using only normal samples yet can identify anomalies during inference. Moreover, we customize a prototype device for data acquisition and create a large-scale dataset including 220K high-quality images of wheat and maize kernels. Through extensive experiments, AD-GAI achieves considerable performance in comparison with advanced AD methods, and AI4GrainInsp has highly consistent performance compared to human experts and excels at inspection efficiency over 20x speedup. The dataset, code and models will be released at https://github.com/hellodfan/AI4GrainInsp.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Online Learning and Planning in Cognitive Hierarchies
Authors:
Bernhard Hengst,
Maurice Pagnucco,
David Rajaratnam,
Claude Sammut,
Michael Thielscher
Abstract:
Complex robot behaviour typically requires the integration of multiple robotic and Artificial Intelligence (AI) techniques and components. Integrating such disparate components into a coherent system, while also ensuring global properties and behaviours, is a significant challenge for cognitive robotics. Using a formal framework to model the interactions between components can be an important step…
▽ More
Complex robot behaviour typically requires the integration of multiple robotic and Artificial Intelligence (AI) techniques and components. Integrating such disparate components into a coherent system, while also ensuring global properties and behaviours, is a significant challenge for cognitive robotics. Using a formal framework to model the interactions between components can be an important step in dealing with this challenge. In this paper we extend an existing formal framework [Clark et al., 2016] to model complex integrated reasoning behaviours of robotic systems; from symbolic planning through to online learning of policies and transition systems. Furthermore the new framework allows for a more flexible modelling of the interactions between different reasoning components.
△ Less
Submitted 18 October, 2023;
originally announced October 2023.
-
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
Authors:
Junyan Wang,
Zhenhong Sun,
Yichen Qian,
Dong Gong,
Xiuyu Sun,
Ming Lin,
Maurice Pagnucco,
Yang Song
Abstract:
3D convolution neural networks (CNNs) have been the prevailing option for video recognition. To capture the temporal information, 3D convolutions are computed along the sequences, leading to cubically growing and expensive computations. To reduce the computational cost, previous methods resort to manually designed 3D/2D CNN structures with approximations or automatic search, which sacrifice the mo…
▽ More
3D convolution neural networks (CNNs) have been the prevailing option for video recognition. To capture the temporal information, 3D convolutions are computed along the sequences, leading to cubically growing and expensive computations. To reduce the computational cost, previous methods resort to manually designed 3D/2D CNN structures with approximations or automatic search, which sacrifice the modeling ability or make training time-consuming. In this work, we propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach tailored for 3D CNNs considering the model complexity. To measure the expressiveness of 3D CNNs efficiently, we formulate a 3D CNN as an information system and derive an analytic entropy score, based on the Maximum Entropy Principle. Specifically, we propose a spatio-temporal entropy score (STEntr-Score) with a refinement factor to handle the discrepancy of visual information in spatial and temporal dimensions, through dynamically leveraging the correlation between the feature map size and kernel size depth-wisely. Highly efficient and expressive 3D CNN architectures, \ie entropy-based 3D CNNs (E3D family), can then be efficiently searched by maximizing the STEntr-Score under a given computational budget, via an evolutionary algorithm without training the network parameters. Extensive experiments on Something-Something V1\&V2 and Kinetics400 demonstrate that the E3D family achieves state-of-the-art performance with higher computational efficiency. Code is available at https://github.com/alibaba/lightweight-neural-architecture-search.
△ Less
Submitted 5 March, 2023;
originally announced March 2023.
-
Vision-based Multi-future Trajectory Prediction: A Survey
Authors:
Renhao Huang,
Hao Xue,
Maurice Pagnucco,
Flora Salim,
Yang Song
Abstract:
Vision-based trajectory prediction is an important task that supports safe and intelligent behaviours in autonomous systems. Many advanced approaches have been proposed over the years with improved spatial and temporal feature extraction. However, human behaviour is naturally diverse and uncertain. Given the past trajectory and surrounding environment information, an agent can have multiple plausi…
▽ More
Vision-based trajectory prediction is an important task that supports safe and intelligent behaviours in autonomous systems. Many advanced approaches have been proposed over the years with improved spatial and temporal feature extraction. However, human behaviour is naturally diverse and uncertain. Given the past trajectory and surrounding environment information, an agent can have multiple plausible trajectories in the future. To tackle this problem, an essential task named multi-future trajectory prediction (MTP) has recently been studied. This task aims to generate a diverse, acceptable and explainable distribution of future predictions for each agent. In this paper, we present the first survey for MTP with our unique taxonomies and a comprehensive analysis of frameworks, datasets and evaluation metrics. We also compare models on existing MTP datasets and conduct experiments on the ForkingPath dataset. Finally, we discuss multiple future directions that can help researchers develop novel multi-future trajectory prediction systems and other diverse learning tasks similar to MTP.
△ Less
Submitted 26 March, 2025; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Graph-based Spatial Transformer with Memory Replay for Multi-future Pedestrian Trajectory Prediction
Authors:
Lihuan Li,
Maurice Pagnucco,
Yang Song
Abstract:
Pedestrian trajectory prediction is an essential and challenging task for a variety of real-life applications such as autonomous driving and robotic motion planning. Besides generating a single future path, predicting multiple plausible future paths is becoming popular in some recent work on trajectory prediction. However, existing methods typically emphasize spatial interactions between pedestria…
▽ More
Pedestrian trajectory prediction is an essential and challenging task for a variety of real-life applications such as autonomous driving and robotic motion planning. Besides generating a single future path, predicting multiple plausible future paths is becoming popular in some recent work on trajectory prediction. However, existing methods typically emphasize spatial interactions between pedestrians and surrounding areas but ignore the smoothness and temporal consistency of predictions. Our model aims to forecast multiple paths based on a historical trajectory by modeling multi-scale graph-based spatial transformers combined with a trajectory smoothing algorithm named ``Memory Replay'' utilizing a memory graph. Our method can comprehensively exploit the spatial information as well as correct the temporally inconsistent trajectories (e.g., sharp turns). We also propose a new evaluation metric named ``Percentage of Trajectory Usage'' to evaluate the comprehensiveness of diverse multi-future predictions. Our extensive experiments show that the proposed model achieves state-of-the-art performance on multi-future prediction and competitive results for single-future prediction. Code released at https://github.com/Jacobieee/ST-MR.
△ Less
Submitted 12 June, 2022;
originally announced June 2022.
-
Explainability in Graph Neural Networks: An Experimental Survey
Authors:
Peibo Li,
Yixing Yang,
Maurice Pagnucco,
Yang Song
Abstract:
Graph neural networks (GNNs) have been extensively developed for graph representation learning in various application domains. However, similar to all other neural networks models, GNNs suffer from the black-box problem as people cannot understand the mechanism underlying them. To solve this problem, several GNN explainability methods have been proposed to explain the decisions made by GNNs. In th…
▽ More
Graph neural networks (GNNs) have been extensively developed for graph representation learning in various application domains. However, similar to all other neural networks models, GNNs suffer from the black-box problem as people cannot understand the mechanism underlying them. To solve this problem, several GNN explainability methods have been proposed to explain the decisions made by GNNs. In this survey, we give an overview of the state-of-the-art GNN explainability methods and how they are evaluated. Furthermore, we propose a new evaluation metric and conduct thorough experiments to compare GNN explainability methods on real world datasets. We also suggest future directions for GNN explainability.
△ Less
Submitted 17 March, 2022;
originally announced March 2022.
-
GrainSpace: A Large-scale Dataset for Fine-grained and Domain-adaptive Recognition of Cereal Grains
Authors:
Lei Fan,
Yiwen Ding,
Dongdong Fan,
Donglin Di,
Maurice Pagnucco,
Yang Song
Abstract:
Cereal grains are a vital part of human diets and are important commodities for people's livelihood and international trade. Grain Appearance Inspection (GAI) serves as one of the crucial steps for the determination of grain quality and grain stratification for proper circulation, storage and food processing, etc. GAI is routinely performed manually by qualified inspectors with the aid of some han…
▽ More
Cereal grains are a vital part of human diets and are important commodities for people's livelihood and international trade. Grain Appearance Inspection (GAI) serves as one of the crucial steps for the determination of grain quality and grain stratification for proper circulation, storage and food processing, etc. GAI is routinely performed manually by qualified inspectors with the aid of some hand tools. Automated GAI has the benefit of greatly assisting inspectors with their jobs but has been limited due to the lack of datasets and clear definitions of the tasks.
In this paper we formulate GAI as three ubiquitous computer vision tasks: fine-grained recognition, domain adaptation and out-of-distribution recognition. We present a large-scale and publicly available cereal grains dataset called GrainSpace. Specifically, we construct three types of device prototypes for data acquisition, and a total of 5.25 million images determined by professional inspectors. The grain samples including wheat, maize and rice are collected from five countries and more than 30 regions. We also develop a comprehensive benchmark based on semi-supervised learning and self-supervised learning techniques. To the best of our knowledge, GrainSpace is the first publicly released dataset for cereal grain inspection.
△ Less
Submitted 10 March, 2022;
originally announced March 2022.
-
Discriminative Latent Semantic Graph for Video Captioning
Authors:
Yang Bai,
Junyan Wang,
Yang Long,
Bingzhang Hu,
Yang Song,
Maurice Pagnucco,
Yu Guan
Abstract:
Video captioning aims to automatically generate natural language sentences that can describe the visual contents of a given video. Existing generative models like encoder-decoder frameworks cannot explicitly explore the object-level interactions and frame-level information from complex spatio-temporal data to generate semantic-rich captions. Our main contribution is to identify three key problems…
▽ More
Video captioning aims to automatically generate natural language sentences that can describe the visual contents of a given video. Existing generative models like encoder-decoder frameworks cannot explicitly explore the object-level interactions and frame-level information from complex spatio-temporal data to generate semantic-rich captions. Our main contribution is to identify three key problems in a joint framework for future video summarization tasks. 1) Enhanced Object Proposal: we propose a novel Conditional Graph that can fuse spatio-temporal information into latent object proposal. 2) Visual Knowledge: Latent Proposal Aggregation is proposed to dynamically extract visual words with higher semantic levels. 3) Sentence Validation: A novel Discriminative Language Validator is proposed to verify generated captions so that key semantic concepts can be effectively preserved. Our experiments on two public datasets (MVSD and MSR-VTT) manifest significant improvements over state-of-the-art approaches on all metrics, especially for BLEU-4 and CIDEr. Our code is available at https://github.com/baiyang4/D-LSG-Video-Caption.
△ Less
Submitted 10 August, 2021; v1 submitted 8 August, 2021;
originally announced August 2021.
-
Perceptual Context in Cognitive Hierarchies
Authors:
Bernhard Hengst,
Maurice Pagnucco,
David Rajaratnam,
Claude Sammut,
Michael Thielscher
Abstract:
Cognition does not only depend on bottom-up sensor feature abstraction, but also relies on contextual information being passed top-down. Context is higher level information that helps to predict belief states at lower levels. The main contribution of this paper is to provide a formalisation of perceptual context and its integration into a new process model for cognitive hierarchies. Several simple…
▽ More
Cognition does not only depend on bottom-up sensor feature abstraction, but also relies on contextual information being passed top-down. Context is higher level information that helps to predict belief states at lower levels. The main contribution of this paper is to provide a formalisation of perceptual context and its integration into a new process model for cognitive hierarchies. Several simple instantiations of a cognitive hierarchy are used to illustrate the role of context. Notably, we demonstrate the use context in a novel approach to visually track the pose of rigid objects with just a 2D camera.
△ Less
Submitted 7 January, 2018;
originally announced January 2018.
-
Using Modern Technologies to Capture and Share Indigenous Astronomical Knowledge
Authors:
N. M. Nakata,
D. W. Hamacher,
J. Warren,
A. Byrne,
M. Pagnucco,
R. Harley,
S. Venugopal,
K. Thorpe,
R. Neville,
R. Bolt
Abstract:
Indigenous Knowledge is important for Indigenous communities across the globe and for the advancement of our general scientific knowledge. In particular, Indigenous astronomical knowledge integrates many aspects of Indigenous Knowledge, including seasonal calendars, navigation, food economics, law, ceremony, and social structure. We aim to develop innovative ways of capturing, managing, and dissem…
▽ More
Indigenous Knowledge is important for Indigenous communities across the globe and for the advancement of our general scientific knowledge. In particular, Indigenous astronomical knowledge integrates many aspects of Indigenous Knowledge, including seasonal calendars, navigation, food economics, law, ceremony, and social structure. We aim to develop innovative ways of capturing, managing, and disseminating Indigenous astronomical knowledge for Indigenous communities and the general public for the future. Capturing, managing, and disseminating this knowledge in the digital environment poses a number of challenges, which we aim to address using a collaborative project involving experts in the higher education, library, and industry sectors. Using Microsoft's WorldWide Telescope and Rich Interactive Narratives technologies, we propose to develop software, media design, and archival management solutions to allow Indigenous communities to share their astronomical knowledge with the world on their terms and in a culturally sensitive manner.
△ Less
Submitted 3 September, 2014;
originally announced September 2014.