-
A Full-Stack Platform Architecture for Self-Organised Social Coordination
Authors:
Matthew Scott,
Jeremy Pitt
Abstract:
To mitigate the restrictive centralising and monopolistic tendencies of platformisation, we aim to empower local communities by democratising platforms for self-organised social coordination. Our approach is to develop an open-source, full-stack architecture for platform development that supports ease of distribution and cloning, generativity, and a variety of hosting options. The architecture con…
▽ More
To mitigate the restrictive centralising and monopolistic tendencies of platformisation, we aim to empower local communities by democratising platforms for self-organised social coordination. Our approach is to develop an open-source, full-stack architecture for platform development that supports ease of distribution and cloning, generativity, and a variety of hosting options. The architecture consists of a meta-platform that is used to instantiate a base platform with supporting libraries for generic functions, and plugins (intended to be supplied by third parties) for customisation of application-specification functionality for self-organised social coordination. Associated developer- and user-oriented toolchains support the instantiation and customisation of a platform in a two-stage process. This is demonstrated through the proof-of-concept implementation of two case studies: a platform for regular sporting association, and a platform for collective group study. We conclude by arguing that self-organisation at the application layer can be achieved by the specific supporting functionality of a full-stack architecture with complimentary developer and user toolchains.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue
, et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…
▽ More
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
△ Less
Submitted 17 March, 2025;
originally announced June 2025.
-
Denoising guarantees for optimized sampling schemes in compressed sensing
Authors:
Yaniv Plan,
Matthew S. Scott,
Xia Sheng,
Ozgur Yilmaz
Abstract:
Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused by the measurement noise vanishes with an increasing number of measurements for opt…
▽ More
Compressed sensing with subsampled unitary matrices benefits from \emph{optimized} sampling schemes, which feature improved theoretical guarantees and empirical performance relative to uniform subsampling. We provide, in a first of its kind in compressed sensing, theoretical guarantees showing that the error caused by the measurement noise vanishes with an increasing number of measurements for optimized sampling schemes, assuming that the noise is Gaussian. We moreover provide similar guarantees for measurements sampled with-replacement with arbitrary probability weights. All our results hold on prior sets contained in a union of low-dimensional subspaces. Finally, we demonstrate that this denoising behavior appears in empirical experiments with a rate that closely matches our theoretical guarantees when the prior set is the range of a generative ReLU neural network and when it is the set of sparse vectors.
△ Less
Submitted 31 March, 2025;
originally announced April 2025.
-
Distributed, communication-efficient, and differentially private estimation of KL divergence
Authors:
Mary Scott,
Sayan Biswas,
Graham Cormode,
Carsten Maple
Abstract:
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical settings sharing such information can be undesirable (e.g., for privacy concerns) or infeasible (e.g., for high communication costs). In this work, we describe no…
▽ More
A key task in managing distributed, sensitive data is to measure the extent to which a distribution changes. Understanding this drift can effectively support a variety of federated learning and analytics tasks. However, in many practical settings sharing such information can be undesirable (e.g., for privacy concerns) or infeasible (e.g., for high communication costs). In this work, we describe novel algorithmic approaches for estimating the KL divergence of data across federated models of computation, under differential privacy. We analyze their theoretical properties and present an empirical study of their performance. We explore parameter settings that optimize the accuracy of the algorithm catering to each of the settings; these provide sub-variations that are applicable to real-world tasks, addressing different context- and application-specific trust level requirements. Our experimental results confirm that our private estimators achieve accuracy comparable to a baseline algorithm without differential privacy guarantees.
△ Less
Submitted 28 November, 2024; v1 submitted 25 November, 2024;
originally announced November 2024.
-
Towards Robust Federated Analytics via Differentially Private Measurements of Statistical Heterogeneity
Authors:
Mary Scott,
Graham Cormode,
Carsten Maple
Abstract:
Statistical heterogeneity is a measure of how skewed the samples of a dataset are. It is a common problem in the study of differential privacy that the usage of a statistically heterogeneous dataset results in a significant loss of accuracy. In federated scenarios, statistical heterogeneity is more likely to happen, and so the above problem is even more pressing. We explore the three most promisin…
▽ More
Statistical heterogeneity is a measure of how skewed the samples of a dataset are. It is a common problem in the study of differential privacy that the usage of a statistically heterogeneous dataset results in a significant loss of accuracy. In federated scenarios, statistical heterogeneity is more likely to happen, and so the above problem is even more pressing. We explore the three most promising ways to measure statistical heterogeneity and give formulae for their accuracy, while simultaneously incorporating differential privacy. We find the optimum privacy parameters via an analytic mechanism, which incorporates root finding methods. We validate the main theorems and related hypotheses experimentally, and test the robustness of the analytic mechanism to different heterogeneity levels. The analytic mechanism in a distributed setting delivers superior accuracy to all combinations involving the classic mechanism and/or the centralized setting. All measures of statistical heterogeneity do not lose significant accuracy when a heterogeneous sample is used.
△ Less
Submitted 28 November, 2024; v1 submitted 7 November, 2024;
originally announced November 2024.
-
Bridging Research and Practice Through Conversation: Reflecting on Our Experience
Authors:
Mayra Russo,
Mackenzie Jorgensen,
Kristen M. Scott,
Wendy Xu,
Di H. Nguyen,
Jessie Finocchiaro,
Matthew Olckers
Abstract:
While some research fields have a long history of collaborating with domain experts outside academia, many quantitative researchers do not have natural avenues to meet experts in areas where the research is later deployed. We explain how conversations -- interviews without a specific research objective -- can bridge research and practice. Using collaborative autoethnography, we reflect on our expe…
▽ More
While some research fields have a long history of collaborating with domain experts outside academia, many quantitative researchers do not have natural avenues to meet experts in areas where the research is later deployed. We explain how conversations -- interviews without a specific research objective -- can bridge research and practice. Using collaborative autoethnography, we reflect on our experience of conducting conversations with practitioners from a range of different backgrounds, including refugee rights, conservation, addiction counseling, and municipal data science. Despite these varied backgrounds, common lessons emerged, including the importance of valuing the knowledge of experts, recognizing that academic research and practice have differing objectives and timelines, understanding the limits of quantification, and avoiding data extractivism. We consider the impact of these conversations on our work, the potential roles we can serve as researchers, and the challenges we anticipate as we move forward in these collaborations.
△ Less
Submitted 17 September, 2024; v1 submitted 25 August, 2024;
originally announced September 2024.
-
Articulation Work and Tinkering for Fairness in Machine Learning
Authors:
Miriam Fahimi,
Mayra Russo,
Kristen M. Scott,
Maria-Esther Vidal,
Bettina Berendt,
Katharina Kinder-Kurlanda
Abstract:
The field of fair AI aims to counter biased algorithms through computational modelling. However, it faces increasing criticism for perpetuating the use of overly technical and reductionist methods. As a result, novel approaches appear in the field to address more socially-oriented and interdisciplinary (SOI) perspectives on fair AI. In this paper, we take this dynamic as the starting point to stud…
▽ More
The field of fair AI aims to counter biased algorithms through computational modelling. However, it faces increasing criticism for perpetuating the use of overly technical and reductionist methods. As a result, novel approaches appear in the field to address more socially-oriented and interdisciplinary (SOI) perspectives on fair AI. In this paper, we take this dynamic as the starting point to study the tension between computer science (CS) and SOI research. By drawing on STS and CSCW theory, we position fair AI research as a matter of 'organizational alignment': what makes research 'doable' is the successful alignment of three levels of work organization (the social world, the laboratory, and the experiment). Based on qualitative interviews with CS researchers, we analyze the tasks, resources, and actors required for doable research in the case of fair AI. We find that CS researchers engage with SOI research to some extent, but organizational conditions, articulation work, and ambiguities of the social world constrain the doability of SOI research for them. Based on our findings, we identify and discuss problems for aligning CS and SOI as fair AI continues to evolve.
△ Less
Submitted 28 August, 2024; v1 submitted 23 July, 2024;
originally announced July 2024.
-
Social Deliberation vs. Social Contracts in Self-Governing Voluntary Organisations
Authors:
Matthew Scott,
Asimina Mertzani,
Ciske Smit,
Stefan Sarkadi,
Jeremy Pitt
Abstract:
Self-organising multi-agent systems regulate their components' behaviour voluntarily, according to a set of socially-constructed, mutually-agreed, and mutable social arrangements. In some systems, these arrangements may be applied with a frequency, at a scale and within implicit cost constraints such that performance becomes a pressing issue. This paper introduces the \textit{Megabike Scenario}, w…
▽ More
Self-organising multi-agent systems regulate their components' behaviour voluntarily, according to a set of socially-constructed, mutually-agreed, and mutable social arrangements. In some systems, these arrangements may be applied with a frequency, at a scale and within implicit cost constraints such that performance becomes a pressing issue. This paper introduces the \textit{Megabike Scenario}, which consists of a negotiated agreement on a relatively 'large' set of conventional rules, 'frequent' 'democratic' decision-making according to those rules, and a resource-bounded imperative to reach 'correct' decisions. A formalism is defined for effective rule representation and processing in the scenario, and is evaluated against five interleaved socio-functional requirements. System performance is also evaluated empirically through simulation. We conclude that to self-organise their social arrangements, agents need some awareness of their own limitations and the value of compromise.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing
Authors:
Asmita,
Yaroslav Oliinyk,
Michael Scott,
Ryan Tsang,
Chongzhou Fang,
Houman Homayoun
Abstract:
BusyBox, an open-source software bundling over 300 essential Linux commands into a single executable, is ubiquitous in Linux-based embedded devices. Vulnerabilities in BusyBox can have far-reaching consequences, affecting a wide array of devices. This research, driven by the extensive use of BusyBox, delved into its analysis. The study revealed the prevalence of older BusyBox versions in real-worl…
▽ More
BusyBox, an open-source software bundling over 300 essential Linux commands into a single executable, is ubiquitous in Linux-based embedded devices. Vulnerabilities in BusyBox can have far-reaching consequences, affecting a wide array of devices. This research, driven by the extensive use of BusyBox, delved into its analysis. The study revealed the prevalence of older BusyBox versions in real-world embedded products, prompting us to conduct fuzz testing on BusyBox. Fuzzing, a pivotal software testing method, aims to induce crashes that are subsequently scrutinized to uncover vulnerabilities. Within this study, we introduce two techniques to fortify software testing. The first technique enhances fuzzing by leveraging Large Language Models (LLM) to generate target-specific initial seeds. Our study showed a substantial increase in crashes when using LLM-generated initial seeds, highlighting the potential of LLM to efficiently tackle the typically labor-intensive task of generating target-specific initial seeds. The second technique involves repurposing previously acquired crash data from similar fuzzed targets before initiating fuzzing on a new target. This approach streamlines the time-consuming fuzz testing process by providing crash data directly to the new target before commencing fuzzing. We successfully identified crashes in the latest BusyBox target without conducting traditional fuzzing, emphasizing the effectiveness of LLM and crash reuse techniques in enhancing software testing and improving vulnerability detection in embedded systems. Additionally, manual triaging was performed to identify the nature of crashes in the latest BusyBox.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
Model-adapted Fourier sampling for generative compressed sensing
Authors:
Aaron Berk,
Simone Brugiapaglia,
Yaniv Plan,
Matthew Scott,
Xia Sheng,
Ozgur Yilmaz
Abstract:
We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\| \boldsymbolα\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each comp…
▽ More
We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\| \boldsymbolα\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each component of the so-called local coherence vector $\boldsymbolα$ quantifies the alignment of a corresponding Fourier vector with the range of $G$. We construct a model-adapted sampling strategy with an improved sample complexity of $\textit{O}(kd\| \boldsymbolα\|_{2}^{2})$ measurements. This is enabled by: (1) new theoretical recovery guarantees that we develop for nonuniformly random sampling distributions and then (2) optimizing the sampling distribution to minimize the number of measurements needed for these guarantees. This development offers a sample complexity applicable to natural signal classes, which are often almost maximally coherent with low Fourier frequencies. Finally, we consider a surrogate sampling scheme, and validate its performance in recovery experiments using the CelebA dataset.
△ Less
Submitted 17 November, 2023; v1 submitted 7 October, 2023;
originally announced October 2023.
-
A robust synthetic data generation framework for machine learning in High-Resolution Transmission Electron Microscopy (HRTEM)
Authors:
Luis Rangel DaCosta,
Katherine Sytwu,
Catherine Groschner,
Mary Scott
Abstract:
Machine learning techniques are attractive options for developing highly-accurate automated analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, successfully implementing such machine learning tools can be difficult due to the challenges in procuring sufficiently large, high-quality training datasets from experiments. In th…
▽ More
Machine learning techniques are attractive options for developing highly-accurate automated analysis tools for nanomaterials characterization, including high-resolution transmission electron microscopy (HRTEM). However, successfully implementing such machine learning tools can be difficult due to the challenges in procuring sufficiently large, high-quality training datasets from experiments. In this work, we introduce Construction Zone, a Python package for rapidly generating complex nanoscale atomic structures, and develop an end-to-end workflow for creating large simulated databases for training neural networks. Construction Zone enables fast, systematic sampling of realistic nanomaterial structures, and can be used as a random structure generator for simulated databases, which is important for generating large, diverse synthetic datasets. Using HRTEM imaging as an example, we train a series of neural networks on various subsets of our simulated databases to segment nanoparticles and holistically study the data curation process to understand how various aspects of the curated simulated data -- including simulation fidelity, the distribution of atomic structures, and the distribution of imaging conditions -- affect model performance across several experimental benchmarks. Using our results, we are able to achieve state-of-the-art segmentation performance on experimental HRTEM images of nanoparticles from several experimental benchmarks and, further, we discuss robust strategies for consistently achieving high performance with machine learning in experimental settings using purely synthetic data.
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
Generalization Across Experimental Parameters in Machine Learning Analysis of High Resolution Transmission Electron Microscopy Datasets
Authors:
Katherine Sytwu,
Luis Rangel DaCosta,
Mary C. Scott
Abstract:
Neural networks are promising tools for high-throughput and accurate transmission electron microscopy (TEM) analysis of nanomaterials, but are known to generalize poorly on data that is "out-of-distribution" from their training data. Given the limited set of image features typically seen in high-resolution TEM imaging, it is unclear which images are considered out-of-distribution from others. Here…
▽ More
Neural networks are promising tools for high-throughput and accurate transmission electron microscopy (TEM) analysis of nanomaterials, but are known to generalize poorly on data that is "out-of-distribution" from their training data. Given the limited set of image features typically seen in high-resolution TEM imaging, it is unclear which images are considered out-of-distribution from others. Here, we investigate how the choice of metadata features in the training dataset influences neural network performance, focusing on the example task of nanoparticle segmentation. We train and validate neural networks across curated, experimentally-collected high-resolution TEM image datasets of nanoparticles under controlled imaging and material parameters, including magnification, dosage, nanoparticle diameter, and nanoparticle material. Overall, we find that our neural networks are not robust across microscope parameters, but do generalize across certain sample parameters. Additionally, data preprocessing heavily influences the generalizability of neural networks trained on nominally similar datasets. Our results highlight the need to understand how dataset features affect deployment of data-driven algorithms.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Domain Adaptive Decision Trees: Implications for Accuracy and Fairness
Authors:
Jose M. Alvarez,
Kristen M. Scott,
Salvatore Ruggieri,
Bettina Berendt
Abstract:
In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population changes, certain demographic groups will be under-s…
▽ More
In uses of pre-trained machine learning models, it is a known issue that the target population in which the model is being deployed may not have been reflected in the source population with which the model was trained. This can result in a biased model when deployed, leading to a reduction in model performance. One risk is that, as the population changes, certain demographic groups will be under-served or otherwise disadvantaged by the model, even as they become more represented in the target population. The field of domain adaptation proposes techniques for a situation where label data for the target population does not exist, but some information about the target distribution does exist. In this paper we contribute to the domain adaptation literature by introducing domain-adaptive decision trees (DADT). We focus on decision trees given their growing popularity due to their interpretability and performance relative to other more complex models. With DADT we aim to improve the accuracy of models trained in a source domain (or training data) that differs from the target domain (or test data). We propose an in-processing step that adjusts the information gain split criterion with outside information corresponding to the distribution of the target population. We demonstrate DADT on real data and find that it improves accuracy over a standard decision tree when testing in a shifted target population. We also study the change in fairness under demographic parity and equal opportunity. Results show an improvement in fairness with the use of DADT.
△ Less
Submitted 31 May, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Transactional Composition of Nonblocking Data Structures
Authors:
Wentao Cai,
Haosen Wen,
Michael L. Scott
Abstract:
This paper introduces nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per op…
▽ More
This paper introduces nonblocking transaction composition (NBTC), a new methodology for atomic composition of nonblocking operations on concurrent data structures. Unlike previous software transactional memory (STM) approaches, NBTC leverages the linearizability of existing nonblocking structures, reducing the number of memory accesses that must be executed together, atomically, to only one per operation in most cases (these are typically the linearizing instructions of the constituent operations).
Our obstruction-free implementation of NBTC, which we call Medley, makes it easy to transform most nonblocking data structures into transactional counterparts while preserving their liveness and high concurrency. In our experiments, Medley outperforms Lock-Free Transactional Transform (LFTT), the fastest prior competing methodology, by 40--170%. The marginal overhead of Medley's transactional composition, relative to separate operations performed in succession, is roughly 2.2$\times$.
For persistent data structures, we observe that failure atomicity for transactions can be achieved "almost for free" with epoch-based periodic persistence. Toward that end, we integrate Medley with nbMontage, a general system for periodically persistent data structures. The resulting txMontage provides ACID transactions and achieves throughput up to two orders of magnitude higher than that of the OneFile persistent STM system.
△ Less
Submitted 7 January, 2023; v1 submitted 3 January, 2023;
originally announced January 2023.
-
A coherence parameter characterizing generative compressed sensing with Fourier measurements
Authors:
Aaron Berk,
Simone Brugiapaglia,
Babhru Joshi,
Yaniv Plan,
Matthew Scott,
Özgür Yilmaz
Abstract:
In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We mov…
▽ More
In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries and provide recovery bounds, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.
△ Less
Submitted 9 November, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
-
Fairness in Agreement With European Values: An Interdisciplinary Perspective on AI Regulation
Authors:
Alejandra Bringas Colmenarejo,
Luca Nannini,
Alisa Rieger,
Kristen M. Scott,
Xuan Zhao,
Gourab K. Patro,
Gjergji Kasneci,
Katharina Kinder-Kurlanda
Abstract:
With increasing digitalization, Artificial Intelligence (AI) is becoming ubiquitous. AI-based systems to identify, optimize, automate, and scale solutions to complex economic and societal problems are being proposed and implemented. This has motivated regulation efforts, including the Proposal of an EU AI Act. This interdisciplinary position paper considers various concerns surrounding fairness an…
▽ More
With increasing digitalization, Artificial Intelligence (AI) is becoming ubiquitous. AI-based systems to identify, optimize, automate, and scale solutions to complex economic and societal problems are being proposed and implemented. This has motivated regulation efforts, including the Proposal of an EU AI Act. This interdisciplinary position paper considers various concerns surrounding fairness and discrimination in AI, and discusses how AI regulations address them, focusing on (but not limited to) the Proposal. We first look at AI and fairness through the lenses of law, (AI) industry, sociotechnology, and (moral) philosophy, and present various perspectives. Then, we map these perspectives along three axes of interests: (i) Standardization vs. Localization, (ii) Utilitarianism vs. Egalitarianism, and (iii) Consequential vs. Deontological ethics which leads us to identify a pattern of common arguments and tensions between these axes. Positioning the discussion within the axes of interest and with a focus on reconciling the key tensions, we identify and propose the roles AI Regulation should take to make the endeavor of the AI Act a success in terms of AI fairness concerns.
△ Less
Submitted 8 June, 2022;
originally announced July 2022.
-
Understanding the Influence of Receptive Field and Network Complexity in Neural-Network-Guided TEM Image Analysis
Authors:
Katherine Sytwu,
Catherine Groschner,
Mary C. Scott
Abstract:
Trained neural networks are promising tools to analyze the ever-increasing amount of scientific image data, but it is unclear how to best customize these networks for the unique features in transmission electron micrographs. Here, we systematically examine how neural network architecture choices affect how neural networks segment, or pixel-wise separate, crystalline nanoparticles from amorphous ba…
▽ More
Trained neural networks are promising tools to analyze the ever-increasing amount of scientific image data, but it is unclear how to best customize these networks for the unique features in transmission electron micrographs. Here, we systematically examine how neural network architecture choices affect how neural networks segment, or pixel-wise separate, crystalline nanoparticles from amorphous background in transmission electron microscopy (TEM) images. We focus on decoupling the influence of receptive field, or the area of the input image that contributes to the output decision, from network complexity, which dictates the number of trainable parameters. We find that for low-resolution TEM images which rely on amplitude contrast to distinguish nanoparticles from background, the receptive field does not significantly influence segmentation performance. On the other hand, for high-resolution TEM images which rely on a combination of amplitude and phase contrast changes to identify nanoparticles, receptive field is a key parameter for increased performance, especially in images with minimal amplitude contrast. Our results provide insight and guidance as to how to adapt neural networks for applications with TEM datasets.
△ Less
Submitted 8 April, 2022;
originally announced April 2022.
-
Aggregation and Transformation of Vector-Valued Messages in the Shuffle Model of Differential Privacy
Authors:
Mary Scott,
Graham Cormode,
Carsten Maple
Abstract:
Advances in communications, storage and computational technology allow significant quantities of data to be collected and processed by distributed devices. Combining the information from these endpoints can realize significant societal benefit but presents challenges in protecting the privacy of individuals, especially important in an increasingly regulated world. Differential privacy (DP) is a te…
▽ More
Advances in communications, storage and computational technology allow significant quantities of data to be collected and processed by distributed devices. Combining the information from these endpoints can realize significant societal benefit but presents challenges in protecting the privacy of individuals, especially important in an increasingly regulated world. Differential privacy (DP) is a technique that provides a rigorous and provable privacy guarantee for aggregation and release. The Shuffle Model for DP has been introduced to overcome challenges regarding the accuracy of local-DP algorithms and the privacy risks of central-DP. In this work we introduce a new protocol for vector aggregation in the context of the Shuffle Model. The aim of this paper is twofold; first, we provide a single message protocol for the summation of real vectors in the Shuffle Model, using advanced composition results. Secondly, we provide an improvement on the bound on the error achieved through using this protocol through the implementation of a Discrete Fourier Transform, thereby minimizing the initial error at the expense of the loss in accuracy through the transformation itself. This work will further the exploration of more sophisticated structures such as matrices and higher-dimensional tensors in this context, both of which are reliant on the functionality of the vector case.
△ Less
Submitted 31 January, 2022;
originally announced January 2022.
-
Applying the Shuffle Model of Differential Privacy to Vector Aggregation
Authors:
Mary Scott,
Graham Cormode,
Carsten Maple
Abstract:
In this work we introduce a new protocol for vector aggregation in the context of the Shuffle Model, a recent model within Differential Privacy (DP). It sits between the Centralized Model, which prioritizes the level of accuracy over the secrecy of the data, and the Local Model, for which an improvement in trust is counteracted by a much higher noise requirement. The Shuffle Model was developed to…
▽ More
In this work we introduce a new protocol for vector aggregation in the context of the Shuffle Model, a recent model within Differential Privacy (DP). It sits between the Centralized Model, which prioritizes the level of accuracy over the secrecy of the data, and the Local Model, for which an improvement in trust is counteracted by a much higher noise requirement. The Shuffle Model was developed to provide a good balance between these two models through the addition of a shuffling step, which unbinds the users from their data whilst maintaining a moderate noise requirement. We provide a single message protocol for the summation of real vectors in the Shuffle Model, using advanced composition results. Our contribution provides a mechanism to enable private aggregation and analysis across more sophisticated structures such as matrices and higher-dimensional tensors, both of which are reliant on the functionality of the vector case.
△ Less
Submitted 31 January, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
TOOD: Task-aligned One-stage Object Detection
Authors:
Chengjian Feng,
Yujie Zhong,
Yu Gao,
Matthew R. Scott,
Weilin Huang
Abstract:
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. Fir…
▽ More
One-stage object detection is commonly implemented by optimizing two sub-tasks: object classification and localization, using heads with two parallel branches, which might lead to a certain level of spatial misalignment in predictions between the two tasks. In this work, we propose a Task-aligned One-stage Object Detection (TOOD) that explicitly aligns the two tasks in a learning-based manner. First, we design a novel Task-aligned Head (T-Head) which offers a better balance between learning task-interactive and task-specific features, as well as a greater flexibility to learn the alignment via a task-aligned predictor. Second, we propose Task Alignment Learning (TAL) to explicitly pull closer (or even unify) the optimal anchors for the two tasks during training via a designed sample assignment scheme and a task-aligned loss. Extensive experiments are conducted on MS-COCO, where TOOD achieves a 51.1 AP at single-model single-scale testing. This surpasses the recent one-stage detectors by a large margin, such as ATSS (47.7 AP), GFL (48.2 AP), and PAA (49.0 AP), with fewer parameters and FLOPs. Qualitative results also demonstrate the effectiveness of TOOD for better aligning the tasks of object classification and localization. Code is available at https://github.com/fcjian/TOOD.
△ Less
Submitted 28 August, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
-
Mutually-aware Sub-Graphs Differentiable Architecture Search
Authors:
Haoxian Tan,
Sheng Guo,
Yujie Zhong,
Matthew R. Scott,
Weilin Huang
Abstract:
Differentiable architecture search is prevalent in the field of NAS because of its simplicity and efficiency, where two paradigms, multi-path algorithms and single-path methods, are dominated. Multi-path framework (e.g. DARTS) is intuitive but suffers from memory usage and training collapse. Single-path methods (e.g.GDAS and ProxylessNAS) mitigate the memory issue and shrink the gap between search…
▽ More
Differentiable architecture search is prevalent in the field of NAS because of its simplicity and efficiency, where two paradigms, multi-path algorithms and single-path methods, are dominated. Multi-path framework (e.g. DARTS) is intuitive but suffers from memory usage and training collapse. Single-path methods (e.g.GDAS and ProxylessNAS) mitigate the memory issue and shrink the gap between searching and evaluation but sacrifice the performance. In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS). The core of our framework is a differentiable Gumbel-TopK sampler that produces multiple mutually exclusive single-path sub-graphs. To alleviate the severer skip-connect issue brought by multiple sub-graphs setting, we propose a Dropblock-Identity module to stabilize the optimization. To make best use of the available models (super-net and sub-graphs), we introduce a memory-efficient super-net guidance distillation to improve training. The proposed framework strikes a balance between flexible memory usage and searching quality. We demonstrate the effectiveness of our methods on ImageNet and CIFAR10, where the searched models show a comparable performance as the most recent approaches.
△ Less
Submitted 5 November, 2021; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Fast Nonblocking Persistence for Concurrent Data Structures
Authors:
Wentao Cai,
Haosen Wen,
Vladimir Maksimovski,
Mingzhe Du,
Rafaello Sanna,
Shreif Abdallah,
Michael L. Scott
Abstract:
We present a fully lock-free variant of the recent Montage system for persistent data structures. Our variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a con…
▽ More
We present a fully lock-free variant of the recent Montage system for persistent data structures. Our variant, nbMontage, adds persistence to almost any nonblocking concurrent structure without introducing significant overhead or blocking of any kind. Like its predecessor, nbMontage is buffered durably linearizable: it guarantees that the state recovered in the wake of a crash will represent a consistent prefix of pre-crash execution. Unlike its predecessor, nbMontage ensures wait-free progress of the persistence frontier, thereby bounding the number of recent updates that may be lost on a crash, and allowing a thread to force an update of the frontier (i.e., to perform a sync operation) without the risk of blocking. As an extra benefit, the helping mechanism employed by our wait-free sync significantly reduces its latency.
Performance results for nonblocking queues, skip lists, trees, and hash tables rival custom data structures in the literature -- dramatically faster than achieved with prior general-purpose systems, and generally within 50% of equivalent non-persistent structures placed in DRAM.
△ Less
Submitted 8 August, 2021; v1 submitted 20 May, 2021;
originally announced May 2021.
-
Rethinking Deep Contrastive Learning with Embedding Memory
Authors:
Haozhi Zhang,
Xun Wang,
Weilin Huang,
Matthew R. Scott
Abstract:
Pair-wise loss functions have been extensively studied and shown to continuously improve the performance of deep metric learning (DML). However, they are primarily designed with intuition based on simple toy examples, and experimentally identifying the truly effective design is difficult in complicated, real-world cases. In this paper, we provide a new methodology for systematically studying weigh…
▽ More
Pair-wise loss functions have been extensively studied and shown to continuously improve the performance of deep metric learning (DML). However, they are primarily designed with intuition based on simple toy examples, and experimentally identifying the truly effective design is difficult in complicated, real-world cases. In this paper, we provide a new methodology for systematically studying weighting strategies of various pair-wise loss functions, and rethink pair weighting with an embedding memory. We delve into the weighting mechanisms by decomposing the pair-wise functions, and study positive and negative weights separately using direct weight assignment. This allows us to study various weighting functions deeply and systematically via weight curves, and identify a number of meaningful, comprehensive and insightful facts, which come up with our key observation on memory-based DML: it is critical to mine hard negatives and discard easy negatives which are less informative and redundant, but weighting on positive pairs is not helpful. This results in an efficient but surprisingly simple rule to design the weighting scheme, making it significantly different from existing mini-batch based methods which design various sophisticated loss functions to weight pairs carefully. Finally, we conduct extensive experiments on three large-scale visual retrieval benchmarks, and demonstrate the superiority of memory-based DML over recent mini-batch based approaches, by using a simple contrastive loss with momentum-updated memory.
△ Less
Submitted 25 March, 2021;
originally announced March 2021.
-
Brain Image Synthesis with Unsupervised Multivariate Canonical CSC$\ell_4$Net
Authors:
Yawen Huang,
Feng Zheng,
Danyang Wang,
Weilin Huang,
Matthew R. Scott,
Ling Shao
Abstract:
Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity…
▽ More
Recent advances in neuroscience have highlighted the effectiveness of multi-modal medical data for investigating certain pathologies and understanding human cognition. However, obtaining full sets of different modalities is limited by various factors, such as long acquisition times, high examination costs and artifact suppression. In addition, the complexity, high dimensionality and heterogeneity of neuroimaging data remains another key challenge in leveraging existing randomized scans effectively, as data of the same modality is often measured differently by different machines. There is a clear need to go beyond the traditional imaging-dependent process and synthesize anatomically specific target-modality data from a source input. In this paper, we propose to learn dedicated features that cross both intre- and intra-modal variations using a novel CSC$\ell_4$Net. Through an initial unification of intra-modal data in the feature maps and multivariate canonical adaptation, CSC$\ell_4$Net facilitates feature-level mutual transformation. The positive definite Riemannian manifold-penalized data fidelity term further enables CSC$\ell_4$Net to reconstruct missing measurements according to transformed features. Finally, the maximization $\ell_4$-norm boils down to a computationally efficient optimization problem. Extensive experiments validate the ability and robustness of our CSC$\ell_4$Net compared to the state-of-the-art methods on multiple datasets.
△ Less
Submitted 22 March, 2021;
originally announced March 2021.
-
Unchain the Search Space with Hierarchical Differentiable Architecture Search
Authors:
Guanting Liu,
Yujie Zhong,
Sheng Guo,
Matthew R. Scott,
Weilin Huang
Abstract:
Differentiable architecture search (DAS) has made great progress in searching for high-performance architectures with reduced computational cost. However, DAS-based methods mainly focus on searching for a repeatable cell structure, which is then stacked sequentially in multiple stages to form the networks. This configuration significantly reduces the search space, and ignores the importance of con…
▽ More
Differentiable architecture search (DAS) has made great progress in searching for high-performance architectures with reduced computational cost. However, DAS-based methods mainly focus on searching for a repeatable cell structure, which is then stacked sequentially in multiple stages to form the networks. This configuration significantly reduces the search space, and ignores the importance of connections between the cells. To overcome this limitation, in this paper, we propose a Hierarchical Differentiable Architecture Search (H-DAS) that performs architecture search both at the cell level and at the stage level. Specifically, the cell-level search space is relaxed so that the networks can learn stage-specific cell structures. For the stage-level search, we systematically study the architectures of stages, including the number of cells in each stage and the connections between the cells. Based on insightful observations, we design several search rules and losses, and mange to search for better stage-level architectures. Such hierarchical search space greatly improves the performance of the networks without introducing expensive search cost. Extensive experiments on CIFAR10 and ImageNet demonstrate the effectiveness of the proposed H-DAS. Moreover, the searched stage-level architectures can be combined with the cell structures searched by existing DAS methods to further boost the performance. Code is available at: https://github.com/MalongTech/research-HDAS
△ Less
Submitted 11 January, 2021; v1 submitted 11 January, 2021;
originally announced January 2021.
-
Montage: A General System for Buffered Durably Linearizable Data Structures
Authors:
Haosen Wen,
Wentao Cai,
Mingzhe Du,
Louis Jenkins,
Benjamin Valpey,
Michael L. Scott
Abstract:
The recent emergence of fast, dense, nonvolatile main memory suggests that certain long-lived data might remain in its natural pointer-rich format across program runs and hardware reboots. Operations on such data must be instrumented with explicit write-back and fence instructions to ensure consistency in the wake of a crash. Techniques to minimize the cost of this instrumentation are an active to…
▽ More
The recent emergence of fast, dense, nonvolatile main memory suggests that certain long-lived data might remain in its natural pointer-rich format across program runs and hardware reboots. Operations on such data must be instrumented with explicit write-back and fence instructions to ensure consistency in the wake of a crash. Techniques to minimize the cost of this instrumentation are an active topic of research.
We present what we believe to be the first general-purpose approach to building buffered durably linearizable persistent data structures, and a system, Montage, to support that approach. Montage is built on top of the Ralloc nonblocking persistent allocator. It employs a slow-ticking epoch clock, and ensures that no operation appears to span an epoch boundary. It also arranges to persist only that data minimally required to reconstruct the structure after a crash. If a crash occurs in epoch $e$, all work performed in epochs $e$ and $e-1$ is lost, but work from prior epochs is preserved.
We describe the implementation of Montage, argue its correctness, and report unprecedented throughput for persistent queues, sets/mappings, and general graphs.
△ Less
Submitted 28 September, 2020;
originally announced September 2020.
-
Representation Sharing for Fast Object Detector Search and Beyond
Authors:
Yujie Zhong,
Zelu Deng,
Sheng Guo,
Matthew R. Scott,
Weilin Huang
Abstract:
Region Proposal Network (RPN) provides strong support for handling the scale variation of objects in two-stage object detection. For one-stage detectors which do not have RPN, it is more demanding to have powerful sub-networks capable of directly capturing objects of unknown sizes. To enhance such capability, we propose an extremely efficient neural architecture search method, named Fast And Diver…
▽ More
Region Proposal Network (RPN) provides strong support for handling the scale variation of objects in two-stage object detection. For one-stage detectors which do not have RPN, it is more demanding to have powerful sub-networks capable of directly capturing objects of unknown sizes. To enhance such capability, we propose an extremely efficient neural architecture search method, named Fast And Diverse (FAD), to better explore the optimal configuration of receptive fields and convolution types in the sub-networks for one-stage detectors. FAD consists of a designed search space and an efficient architecture search algorithm. The search space contains a rich set of diverse transformations designed specifically for object detection. To cope with the designed search space, a novel search algorithm termed Representation Sharing (RepShare) is proposed to effectively identify the best combinations of the defined transformations. In our experiments, FAD obtains prominent improvements on two types of one-stage detectors with various backbones. In particular, our FAD detector achieves 46.4 AP on MS-COCO (under single-scale testing), outperforming the state-of-the-art detectors, including the most recent NAS-based detectors, Auto-FPN (searched for 16 GPU-days) and NAS-FCOS (28 GPU-days), while significantly reduces the search cost to 0.6 GPU-days. Beyond object detection, we further demonstrate the generality of FAD on the more challenging instance segmentation, and expect it to benefit more tasks.
△ Less
Submitted 23 October, 2020; v1 submitted 23 July, 2020;
originally announced July 2020.
-
Deformable Siamese Attention Networks for Visual Object Tracking
Authors:
Yuechen Yu,
Yilei Xiong,
Weilin Huang,
Matthew R. Scott
Abstract:
Siamese-based trackers have achieved excellent performance on visual object tracking. However, the target template is not updated online, and the features of the target template and search image are computed independently in a Siamese architecture. In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that compute…
▽ More
Siamese-based trackers have achieved excellent performance on visual object tracking. However, the target template is not updated online, and the features of the target template and search image are computed independently in a Siamese architecture. In this paper, we propose Deformable Siamese Attention Networks, referred to as SiamAttn, by introducing a new Siamese attention mechanism that computes deformable self-attention and cross-attention. The self attention learns strong context information via spatial attention, and selectively emphasizes interdependent channel-wise features with channel attention. The cross-attention is capable of aggregating rich contextual inter-dependencies between the target template and the search image, providing an implicit manner to adaptively update the target template. In addition, we design a region refinement module that computes depth-wise cross correlations between the attentional features for more accurate tracking. We conduct experiments on six benchmarks, where our method achieves new state of-the-art results, outperforming the strong baseline, SiamRPN++ [24], by 0.464->0.537 and 0.415->0.470 EAO on VOT 2016 and 2018. Our code is available at: https://github.com/msight-tech/research-siamattn.
△ Less
Submitted 24 March, 2021; v1 submitted 14 April, 2020;
originally announced April 2020.
-
Understanding and Optimizing Persistent Memory Allocation
Authors:
Wentao Cai,
Haosen Wen,
H. Alan Beadle,
Chris Kjellqvist,
Mohammad Hedayati,
Michael L. Scott
Abstract:
The proliferation of fast, dense, byte-addressable nonvolatile memory suggests that data might be kept in pointer-rich "in-memory" format across program runs and even process and system crashes. For full generality, such data requires dynamic memory allocation, and while the allocator could in principle "rolled into" each data structure, it is desirable to make it a separate abstraction.
Toward…
▽ More
The proliferation of fast, dense, byte-addressable nonvolatile memory suggests that data might be kept in pointer-rich "in-memory" format across program runs and even process and system crashes. For full generality, such data requires dynamic memory allocation, and while the allocator could in principle "rolled into" each data structure, it is desirable to make it a separate abstraction.
Toward this end, we introduce recoverability, a correctness criterion for persistent allocators, together with a nonblocking allocator, Ralloc, that satisfies this criterion. Ralloc is based on the LRMalloc of Leite and Rocha, with three key innovations. First, we persist just enough information during normal operation to permit correct reconstruction of the heap after a full-system crash. Our reconstruction mechanism performs garbage collection (GC) to identify and remedy any failure-induced memory leaks. Second, we introduce the notion of filter functions, which identify the locations of pointers within persistent blocks to mitigate the limitations of conservative GC. Third, to allow persistent regions to be mapped at an arbitrary address, we employ position-independent (offset-based) pointers for both data and metadata.
Experiments show Ralloc to be performance-competitive with both Makalu, the state-of-the-art lock-based persistent allocator, and such transient allocators as LRMalloc and JEMalloc. In particular, reliance on GC and offline metadata reconstruction allows Ralloc to pay almost nothing for persistence during normal operation.
△ Less
Submitted 14 March, 2020;
originally announced March 2020.
-
Channel Interaction Networks for Fine-Grained Image Categorization
Authors:
Yu Gao,
Xintong Han,
Xun Wang,
Weilin Huang,
Matthew R. Scott
Abstract:
Fine-grained image categorization is challenging due to the subtle inter-class differences.We posit that exploiting the rich relationships between channels can help capture such differences since different channels correspond to different semantics. In this paper, we propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images. For a s…
▽ More
Fine-grained image categorization is challenging due to the subtle inter-class differences.We posit that exploiting the rich relationships between channels can help capture such differences since different channels correspond to different semantics. In this paper, we propose a channel interaction network (CIN), which models the channel-wise interplay both within an image and across images. For a single image, a self-channel interaction (SCI) module is proposed to explore channel-wise correlation within the image. This allows the model to learn the complementary features from the correlated channels, yielding stronger fine-grained features. Furthermore, given an image pair, we introduce a contrastive channel interaction (CCI) module to model the cross-sample channel interaction with a metric learning framework, allowing the CIN to distinguish the subtle visual differences between images. Our model can be trained efficiently in an end-to-end fashion without the need of multi-stage training and testing. Finally, comprehensive experiments are conducted on three publicly available benchmarks, where the proposed method consistently outperforms the state-of-theart approaches, such as DFL-CNN (Wang, Morariu, and Davis 2018) and NTS (Yang et al. 2018).
△ Less
Submitted 11 March, 2020;
originally announced March 2020.
-
iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection
Authors:
Chenfan Zhuang,
Xintong Han,
Weilin Huang,
Matthew R. Scott
Abstract:
Training an object detector on a data-rich domain and applying it to a data-poor one with limited performance drop is highly attractive in industry, because it saves huge annotation cost. Recent research on unsupervised domain adaptive object detection has verified that aligning data distributions between source and target images through adversarial learning is very useful. The key is when, where…
▽ More
Training an object detector on a data-rich domain and applying it to a data-poor one with limited performance drop is highly attractive in industry, because it saves huge annotation cost. Recent research on unsupervised domain adaptive object detection has verified that aligning data distributions between source and target images through adversarial learning is very useful. The key is when, where and how to use it to achieve best practice. We propose Image-Instance Full Alignment Networks (iFAN) to tackle this problem by precisely aligning feature distributions on both image and instance levels: 1) Image-level alignment: multi-scale features are roughly aligned by training adversarial domain classifiers in a hierarchically-nested fashion. 2) Full instance-level alignment: deep semantic information and elaborate instance representations are fully exploited to establish a strong relationship among categories and domains. Establishing these correlations is formulated as a metric learning problem by carefully constructing instance pairs. Above-mentioned adaptations can be integrated into an object detector (e.g. Faster RCNN), resulting in an end-to-end trainable framework where multiple alignments can work collaboratively in a coarse-tofine manner. In two domain adaptation tasks: synthetic-to-real (SIM10K->Cityscapes) and normal-to-foggy weather (Cityscapes->Foggy Cityscapes), iFAN outperforms the state-of-the-art methods with a boost of 10%+ AP over the source-only baseline.
△ Less
Submitted 9 March, 2020;
originally announced March 2020.
-
Knowledge Integration Networks for Action Recognition
Authors:
Shiwen Zhang,
Sheng Guo,
Limin Wang,
Weilin Huang,
Matthew R. Scott
Abstract:
In this work, we propose Knowledge Integration Networks (referred as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance to identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsin…
▽ More
In this work, we propose Knowledge Integration Networks (referred as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance to identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition, and two auxiliary branches for human parsing and scene recognition which allow the model to encode the knowledge of human and scene for action recognition. We explore two pre-trained models as teacher networks to distill the knowledge of human and scene for training the auxiliary tasks of KINet. Furthermore, we propose a two-level knowledge encoding mechanism which contains a Cross Branch Integration (CBI) module for encoding the auxiliary knowledge into medium-level convolutional features, and an Action Knowledge Graph (AKG) for effectively fusing high-level context information. This results in an end-to-end trainable framework where the three tasks can be trained collaboratively, allowing the model to compute strong context knowledge efficiently. The proposed KINet achieves the state-of-the-art performance on a large-scale action recognition benchmark Kinetics-400, with a top-1 accuracy of 77.8%. We further demonstrate that our KINet has strong capability by transferring the Kinetics-trained model to UCF-101, where it obtains 97.8% top-1 accuracy.
△ Less
Submitted 18 February, 2020;
originally announced February 2020.
-
V4D:4D Convolutional Neural Networks for Video-level Representation Learning
Authors:
Shiwen Zhang,
Sheng Guo,
Weilin Huang,
Matthew R. Scott,
Limin Wang
Abstract:
Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, and at the same time, to preserve strong 3D spatio-tempo…
▽ More
Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions, and at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily integrated into the existing 3D CNNs to perform long-range modeling hierarchically. We further introduce the training and inference methods for the proposed V4D. Extensive experiments are conducted on three video recognition benchmarks, where V4D achieves excellent results, surpassing recent 3D CNNs by a large margin.
△ Less
Submitted 18 February, 2020;
originally announced February 2020.
-
Machine Learning Pipeline for Segmentation and Defect Identification from High Resolution Transmission Electron Microscopy Data
Authors:
C. K. Groschner,
Christina Choi,
M. C. Scott
Abstract:
In the field of transmission electron microscopy, data interpretation often lags behind acquisition methods, as image processing methods often have to be manually tailored to individual datasets. Machine learning offers a promising approach for fast, accurate analysis of electron microscopy data. Here, we demonstrate a flexible two step pipeline for analysis of high resolution transmission electro…
▽ More
In the field of transmission electron microscopy, data interpretation often lags behind acquisition methods, as image processing methods often have to be manually tailored to individual datasets. Machine learning offers a promising approach for fast, accurate analysis of electron microscopy data. Here, we demonstrate a flexible two step pipeline for analysis of high resolution transmission electron microscopy data, which uses a U-Net for segmentation followed by a random forest for detection of stacking faults. Our trained U-Net is able to segment nanoparticle regions from amorphous background with a Dice coefficient of 0.8 and significantly outperforms traditional image segmentation methods. Using these segmented regions, we are then able to classify whether nanoparticles contain a visible stacking fault with 86% accuracy. We provide this adaptable pipeline as an open source tool for the community. The combined output of the segmentation network and classifier offer a way to determine statistical distributions of features of interest, such as size, shape and defect presence, enabling detection of correlations between these features.
△ Less
Submitted 23 February, 2021; v1 submitted 14 January, 2020;
originally announced January 2020.
-
Cross-Batch Memory for Embedding Learning
Authors:
Xun Wang,
Haozhi Zhang,
Weilin Huang,
Matthew R. Scott
Abstract:
Mining informative negative instances are of central importance to deep metric learning (DML), however this task is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. In this paper, we identify a "slow drift" phenomena by observing that the embedding features drift exceptionally slow even as the model parameters are updating througho…
▽ More
Mining informative negative instances are of central importance to deep metric learning (DML), however this task is intrinsically limited by mini-batch training, where only a mini-batch of instances is accessible at each iteration. In this paper, we identify a "slow drift" phenomena by observing that the embedding features drift exceptionally slow even as the model parameters are updating throughout the training process. This suggests that the features of instances computed at preceding iterations can be used to considerably approximate their features extracted by the current model. We propose a cross-batch memory (XBM) mechanism that memorizes the embeddings of past iterations, allowing the model to collect sufficient hard negative pairs across multiple mini-batches - even over the whole dataset. Our XBM can be directly integrated into a general pair-based DML framework, where the XBM augmented DML can boost performance considerably. In particular, without bells and whistles, a simple contrastive loss with our XBM can have large R@1 improvements of 12%-22.5% on three large-scale image retrieval datasets, surpassing the most sophisticated state-of-the-art methods, by a large margin. Our XBM is conceptually simple, easy to implement - using several lines of codes, and is memory efficient - with a negligible 0.2 GB extra GPU memory. Code is available at: https://github.com/MalongTech/research-xbm.
△ Less
Submitted 20 April, 2020; v1 submitted 14 December, 2019;
originally announced December 2019.
-
Convolutional Character Networks
Authors:
Linjie Xing,
Zhi Tian,
Weilin Huang,
Matthew R. Scott
Abstract:
Recent progress has been made on developing a unified framework for joint text detection and recognition in natural images, but existing joint models were mostly built on two-stage framework by involving ROI pooling, which can degrade the performance on recognition task. In this work, we propose convolutional character networks, referred as CharNet, which is an one-stage model that can process two…
▽ More
Recent progress has been made on developing a unified framework for joint text detection and recognition in natural images, but existing joint models were mostly built on two-stage framework by involving ROI pooling, which can degrade the performance on recognition task. In this work, we propose convolutional character networks, referred as CharNet, which is an one-stage model that can process two tasks simultaneously in one pass. CharNet directly outputs bounding boxes of words and characters, with corresponding character labels. We utilize character as basic element, allowing us to overcome the main difficulty of existing approaches that attempted to optimize text detection jointly with a RNN-based recognition branch. In addition, we develop an iterative character detection approach able to transform the ability of character detection learned from synthetic data to real-world images. These technical improvements result in a simple, compact, yet powerful one-stage model that works reliably on multi-orientation and curved text. We evaluate CharNet on three standard benchmarks, where it consistently outperforms the state-of-the-art approaches [25, 24] by a large margin, e.g., with improvements of 65.33%->71.08% (with generic lexicon) on ICDAR 2015, and 54.0%->69.23% on Total-Text, on end-to-end text recognition. Code is available at: https://github.com/MalongTech/research-charnet.
△ Less
Submitted 17 October, 2019;
originally announced October 2019.
-
Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation
Authors:
Weifeng Ge,
Sheng Guo,
Weilin Huang,
Matthew R. Scott
Abstract:
Weakly-supervised instance segmentation aims to detect and segment object instances precisely, given imagelevel labels only. Unlike previous methods which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred as Label-PEnet) that progressively transform image-level labels to pixel-wise labels in a coarse-to-fine manner. We design four c…
▽ More
Weakly-supervised instance segmentation aims to detect and segment object instances precisely, given imagelevel labels only. Unlike previous methods which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred as Label-PEnet) that progressively transform image-level labels to pixel-wise labels in a coarse-to-fine manner. We design four cascaded modules including multi-label classification, object detection, instance refinement and instance segmentation, which are implemented sequentially by sharing the same backbone. The cascaded pipeline is trained alternatively with a curriculum learning strategy that generalizes labels from high-level images to low-level pixels gradually with increasing accuracy. In addition, we design a proposal calibration module to explore the ability of classification networks to find key pixels that identify object parts, which serves as a post validation strategy running in the inverse order. We evaluate the efficiency of our Label-PEnet in mining instance masks on standard benchmarks: PASCAL VOC 2007 and 2012. Experimental results show that Label-PEnet outperforms the state-of-the-art algorithms by a clear margin, and obtains comparable performance even with the fully-supervised approaches.
△ Less
Submitted 24 April, 2020; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Dual-Stream Pyramid Registration Network
Authors:
Miao Kang,
Xiaojun Hu,
Weilin Huang,
Matthew R. Scott,
Mauricio Reyes
Abstract:
We propose a Dual-Stream Pyramid Registration Network (referred as Dual-PRNet) for unsupervised 3D medical image registration. Unlike recent CNN-based registration approaches, such as VoxelMorph, which explores a single-stream encoder-decoder network to compute a registration fields from a pair of 3D volumes, we design a two-stream architecture able to compute multi-scale registration fields from…
▽ More
We propose a Dual-Stream Pyramid Registration Network (referred as Dual-PRNet) for unsupervised 3D medical image registration. Unlike recent CNN-based registration approaches, such as VoxelMorph, which explores a single-stream encoder-decoder network to compute a registration fields from a pair of 3D volumes, we design a two-stream architecture able to compute multi-scale registration fields from convolutional feature pyramids. Our contributions are two-fold: (i) we design a two-stream 3D encoder-decoder network which computes two convolutional feature pyramids separately for a pair of input volumes, resulting in strong deep representations that are meaningful for deformation estimation; (ii) we propose a pyramid registration module able to predict multi-scale registration fields directly from the decoding feature pyramids. This allows it to refine the registration fields gradually in a coarse-to-fine manner via sequential warping, and enable the model with the capability for handling significant deformations between two volumes, such as large displacements in spatial domain or slice space. The proposed Dual-PRNet is evaluated on two standard benchmarks for brain MRI registration, where it outperforms the state-of-the-art approaches by a large margin, e.g., having improvements over recent VoxelMorph [2] with 0.683->0.778 on the LPBA40, and 0.511->0.631 on the Mindboggle101, in term of average Dice score. Code is available at: https://github.com/kangmiao15/Dual-Stream-PRNet-Plus.
△ Less
Submitted 1 April, 2023; v1 submitted 26 September, 2019;
originally announced September 2019.
-
The iMaterialist Fashion Attribute Dataset
Authors:
Sheng Guo,
Weilin Huang,
Xiao Zhang,
Prasanna Srikhanta,
Yin Cui,
Yuan Li,
Matthew R. Scott,
Hartwig Adam,
Serge Belongie
Abstract:
Large-scale image databases such as ImageNet have significantly advanced image classification and other visual recognition tasks. However much of these datasets are constructed only for single-label and coarse object-level classification. For real-world applications, multiple labels and fine-grained categories are often needed, yet very few such datasets exist publicly, especially those of large-s…
▽ More
Large-scale image databases such as ImageNet have significantly advanced image classification and other visual recognition tasks. However much of these datasets are constructed only for single-label and coarse object-level classification. For real-world applications, multiple labels and fine-grained categories are often needed, yet very few such datasets exist publicly, especially those of large-scale and high quality. In this work, we contribute to the community a new dataset called iMaterialist Fashion Attribute (iFashion-Attribute) to address this problem in the fashion domain. The dataset is constructed from over one million fashion images with a label space that includes 8 groups of 228 fine-grained attributes in total. Each image is annotated by experts with multiple, high-quality fashion attributes. The result is the first known million-scale multi-label and fine-grained image dataset. We conduct extensive experiments and provide baseline results with modern deep Convolutional Neural Networks (CNNs). Additionally, we demonstrate models pre-trained on iFashion-Attribute achieve superior transfer learning performance on fashion related tasks compared with pre-training from ImageNet or other fashion datasets. Data is available at: https://github.com/visipedia/imat_fashion_comp
△ Less
Submitted 14 June, 2019; v1 submitted 13 June, 2019;
originally announced June 2019.
-
The Stanford Acuity Test: A Precise Vision Test Using Bayesian Techniques and a Discovery in Human Visual Response
Authors:
Chris Piech,
Ali Malik,
Laura M Scott,
Robert T Chang,
Charles Lin
Abstract:
Chart-based visual acuity measurements are used by billions of people to diagnose and guide treatment of vision impairment. However, the ubiquitous eye exam has no mechanism for reasoning about uncertainty and as such, suffers from a well-documented reproducibility problem. In this paper we make two core contributions. First, we uncover a new parametric probabilistic model of visual acuity respons…
▽ More
Chart-based visual acuity measurements are used by billions of people to diagnose and guide treatment of vision impairment. However, the ubiquitous eye exam has no mechanism for reasoning about uncertainty and as such, suffers from a well-documented reproducibility problem. In this paper we make two core contributions. First, we uncover a new parametric probabilistic model of visual acuity response based on detailed measurements of patients with eye disease. Then, we present an adaptive, digital eye exam using modern artificial intelligence techniques which substantially reduces acuity exam error over existing approaches, while also introducing the novel ability to model its own uncertainty and incorporate prior beliefs. Using standard evaluation metrics, we estimate a 74% reduction in prediction error compared to the ubiquitous chart-based eye exam and up to 67% reduction compared to the previous best digital exam. For patients with eye disease, the novel ability to finely measure acuity from home could be a crucial part in early diagnosis. We provide a web implementation of our algorithm for anyone in the world to use. The insights in this paper also provide interesting implications for the field of psychometric Item Response Theory.
△ Less
Submitted 21 November, 2019; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning
Authors:
Xun Wang,
Xintong Han,
Weilin Huang,
Dengke Dong,
Matthew R. Scott
Abstract:
A family of loss functions built on pair-based computation have been proposed in the literature which provide a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep…
▽ More
A family of loss functions built on pair-based computation have been proposed in the literature which provide a myriad of solutions for deep metric learning. In this paper, we provide a general weighting framework for understanding recent pair-based loss functions. Our contributions are three-fold: (1) we establish a General Pair Weighting (GPW) framework, which casts the sampling problem of deep metric learning into a unified view of pair weighting through gradient analysis, providing a powerful tool for understanding recent pair-based loss functions; (2) we show that with GPW, various existing pair-based methods can be compared and discussed comprehensively, with clear differences and key limitations identified; (3) we propose a new loss called multi-similarity loss (MS loss) under the GPW, which is implemented in two iterative steps (i.e., mining and weighting). This allows it to fully consider three similarities for pair weighting, providing a more principled approach for collecting and weighting informative pairs. Finally, the proposed MS loss obtains new state-of-the-art performance on four image retrieval benchmarks, where it outperforms the most recent approaches, such as ABE\cite{Kim_2018_ECCV} and HTL by a large margin: 60.6% to 65.7% on CUB200, and 80.9% to 88.0% on In-Shop Clothes Retrieval dataset at Recall@1. Code is available at https://github.com/MalongTech/research-ms-loss.
△ Less
Submitted 22 March, 2020; v1 submitted 14 April, 2019;
originally announced April 2019.
-
Fast Intra-kernel Isolation and Security with IskiOS
Authors:
Spyridoula Gravani,
Mohammad Hedayati,
John Criswell,
Michael L. Scott
Abstract:
The kernels of operating systems such as Windows, Linux, and MacOS are vulnerable to control-flow hijacking. Defenses exist, but many require efficient intra-address-space isolation. Execute-only memory, for example, requires read protection on code segments, and shadow stacks require protection from buffer overwrites. Intel's Protection Keys for Userspace (PKU) could, in principle, provide the in…
▽ More
The kernels of operating systems such as Windows, Linux, and MacOS are vulnerable to control-flow hijacking. Defenses exist, but many require efficient intra-address-space isolation. Execute-only memory, for example, requires read protection on code segments, and shadow stacks require protection from buffer overwrites. Intel's Protection Keys for Userspace (PKU) could, in principle, provide the intra-kernel isolation needed by such defenses, but, when used as designed, it applies only to user-mode application code. This paper presents an unconventional approach to memory protection, allowing PKU to be used within the operating system kernel on existing Intel hardware, replacing the traditional user/supervisor isolation mechanism and, simultaneously, enabling efficient intra-kernel isolation. We call the resulting mechanism Protection Keys for Kernelspace (PKK). To demonstrate its utility and efficiency, we present a system we call IskiOS: a Linux variant featuring execute-only memory (XOM) and the first-ever race-free shadow stacks for x86-64. Experiments with the LMBench kernel microbenchmarks display a geometric mean overhead of about 11% for PKK and no additional overhead for XOM. IskiOS's shadow stacks bring the total to 22%. For full applications, experiments with the system benchmarks of the Phoronix test suite display negligible overhead for PKK and XOM, and less than 5% geometric mean overhead for shadow stacks.
△ Less
Submitted 2 August, 2021; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Compatible and Diverse Fashion Image Inpainting
Authors:
Xintong Han,
Zuxuan Wu,
Weilin Huang,
Matthew R. Scott,
Larry S. Davis
Abstract:
Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. To this end, we present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling th…
▽ More
Visual compatibility is critical for fashion analysis, yet is missing in existing fashion image synthesis systems. In this paper, we propose to explicitly model visual compatibility through fashion image inpainting. To this end, we present Fashion Inpainting Networks (FiNet), a two-stage image-to-image generation framework that is able to perform compatible and diverse inpainting. Disentangling the generation of shape and appearance to ensure photorealistic results, our framework consists of a shape generation network and an appearance generation network. More importantly, for each generation network, we introduce two encoders interacting with one another to learn latent code in a shared compatibility space. The latent representations are jointly optimized with the corresponding generation network to condition the synthesis process, encouraging a diverse set of generated results that are visually compatible with existing fashion garments. In addition, our framework is readily extended to clothing reconstruction and fashion transfer, with impressive results. Extensive experiments with comparisons with state-of-the-art approaches on fashion synthesis task quantitatively and qualitatively demonstrate the effectiveness of our method.
△ Less
Submitted 24 April, 2019; v1 submitted 4 February, 2019;
originally announced February 2019.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles dissem…
▽ More
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
△ Less
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
Deep Metric Learning with Hierarchical Triplet Loss
Authors:
Weifeng Ge,
Weilin Huang,
Dengke Dong,
Matthew R. Scott
Abstract:
We present a novel hierarchical triplet loss (HTL) capable of automatically collecting informative training samples (triplets) via a defined hierarchical tree that encodes global context information. This allows us to cope with the main limitation of random sampling in training a conventional triplet loss, which is a central issue for deep metric learning. Our main contributions are two-fold. (i)…
▽ More
We present a novel hierarchical triplet loss (HTL) capable of automatically collecting informative training samples (triplets) via a defined hierarchical tree that encodes global context information. This allows us to cope with the main limitation of random sampling in training a conventional triplet loss, which is a central issue for deep metric learning. Our main contributions are two-fold. (i) we construct a hierarchical class-level tree where neighboring classes are merged recursively. The hierarchical structure naturally captures the intrinsic data distribution over the whole database. (ii) we formulate the problem of triplet collection by introducing a new violate margin, which is computed dynamically based on the designed hierarchical tree. This allows it to automatically select meaningful hard samples with the guide of global context. It encourages the model to learn more discriminative features from visual similar classes, leading to faster convergence and better performance. Our method is evaluated on the tasks of image retrieval and face recognition, where it outperforms the standard triplet loss substantially by 1%-18%. It achieves new state-of-the-art performance on a number of benchmarks, with much fewer learning iterations.
△ Less
Submitted 16 October, 2018;
originally announced October 2018.
-
CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images
Authors:
Sheng Guo,
Weilin Huang,
Haozhi Zhang,
Chenfan Zhuang,
Dengke Dong,
Matthew R. Scott,
Dinglong Huang
Abstract:
We present a simple yet efficient approach capable of training deep neural networks on large-scale weakly-supervised web images, which are crawled raw from the Internet by using text queries, without any human annotation. We develop a principled learning strategy by leveraging curriculum learning, with the goal of handling a massive amount of noisy labels and data imbalance effectively. We design…
▽ More
We present a simple yet efficient approach capable of training deep neural networks on large-scale weakly-supervised web images, which are crawled raw from the Internet by using text queries, without any human annotation. We develop a principled learning strategy by leveraging curriculum learning, with the goal of handling a massive amount of noisy labels and data imbalance effectively. We design a new learning curriculum by measuring the complexity of data using its distribution density in a feature space, and rank the complexity in an unsupervised manner. This allows for an efficient implementation of curriculum learning on large-scale web images, resulting in a high-performance CNN model, where the negative impact of noisy labels is reduced substantially. Importantly, we show by experiments that those images with highly noisy labels can surprisingly improve the generalization capability of the model, by serving as a manner of regularization. Our approaches obtain state-of-the-art performance on four benchmarks: WebVision, ImageNet, Clothing-1M and Food-101. With an ensemble of multiple models, we achieved a top-5 error rate of 5.2% on the WebVision challenge for 1000-category classification. This result was the top performance by a wide margin, outperforming second place by a nearly 50% relative error rate. Code and models are available at: https://github.com/MalongTech/CurriculumNet .
△ Less
Submitted 18 October, 2018; v1 submitted 3 August, 2018;
originally announced August 2018.
-
Aligning coding sequences with frameshift extension penalties
Authors:
Safa Jammali,
Esaie Kuitche,
Ayoub Rachati,
François Bélanger,
Michelle Scott,
Aïda Ouangraoua
Abstract:
Frameshift translation is an important phenomenon that contributes to the appearance of novel Coding DNA Sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of genes coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of alg…
▽ More
Frameshift translation is an important phenomenon that contributes to the appearance of novel Coding DNA Sequences (CDS) and functions in gene evolution, by allowing alternative amino acid translations of genes coding regions. Frameshift translations can be identified by aligning two CDS, from a same gene or from homologous genes, while accounting for their codon structure. Two main classes of algorithms have been proposed to solve the problem of aligning CDS, either by amino acid sequence alignment back-translation, or by simultaneously accounting for the nucleotide and amino acid levels. The former does not allow to account for frameshift translations and up to now, the latter exclusively accounts for frameshift translation initiation, not accounting for the length of the translation disruption caused by a frameshift.
Here, we introduce a new scoring scheme with an algorithm for the pairwise alignment of CDS accounting for frameshift translation initiation and length, while simultaneously accounting for nucleotide and amino acid sequences. We compare the method to other CDS alignment methods based on an application to the comparison of pairs of CDS from homologous \emph{human}, \emph{mouse} and \emph{cow} genes of ten mammalian gene families from the Ensembl-Compara database. The results show that our method is particularly robust to parameter changes as compared to existing methods. It also appears to be a good compromise, performing well both in the presence and absence of frameshift translations between the CDS. An implementation of the method is available at https://github.com/UdeS-CoBIUS/FsePSA.
△ Less
Submitted 13 April, 2017; v1 submitted 27 October, 2016;
originally announced October 2016.
-
Personality, Culture, and System Factors - Impact on Affective Response to Multimedia
Authors:
Sharath Chandra Guntuku,
Michael James Scott,
Gheorghita Ghinea,
Weisi Lin
Abstract:
Whilst affective responses to various forms and genres of multimedia content have been well researched, precious few studies have investigated the combined impact that multimedia system parameters and human factors have on affect. Consequently, in this paper we explore the role that two primordial dimensions of human factors - personality and culture - in conjunction with system factors - frame ra…
▽ More
Whilst affective responses to various forms and genres of multimedia content have been well researched, precious few studies have investigated the combined impact that multimedia system parameters and human factors have on affect. Consequently, in this paper we explore the role that two primordial dimensions of human factors - personality and culture - in conjunction with system factors - frame rate, resolution, and bit rate - have on user affect and enjoyment of multimedia presentations. To this end, a two-site, cross-cultural study was undertaken, the results of which produced three predictve models. Personality and Culture traits were shown statistically to represent 5.6% of the variance in positive affect, 13.6% in negative affect and 9.3% in enjoyment. The correlation between affect and enjoyment, was significant. Predictive modeling incorporating human factors showed about 8%, 7% and 9% improvement in predicting positive affect, negative affect and enjoyment respectively when compared to models trained only on system factors. Results and analysis indicate the significant role played by human factors in influencing affect that users experience while watching multimedia.
△ Less
Submitted 18 July, 2016; v1 submitted 22 June, 2016;
originally announced June 2016.
-
Cross-Cultural Differences in Students' Intention to Use RSS Feeds between Lebanon and the United Kingdom: A Multi-Group Invariance Analysis Based on the Technology Acceptance Model
Authors:
Ali Tarhini,
Michael James Scott
Abstract:
Really Simple Syndication (RSS) offers a means for university students to receive timely updates from virtual learning environments. However, despite its utility, only 21% of students surveyed at a Lebanese university claim to have ever used the technology. To investigate whether a cultural influence is affecting intention to use RSS, the survey was extended to the British context to conduct a cro…
▽ More
Really Simple Syndication (RSS) offers a means for university students to receive timely updates from virtual learning environments. However, despite its utility, only 21% of students surveyed at a Lebanese university claim to have ever used the technology. To investigate whether a cultural influence is affecting intention to use RSS, the survey was extended to the British context to conduct a cross-cultural comparison. Using the Technology Adoption Model (TAM) as a research framework, 437 students responded to a questionnaire containing four constructs: intention to use (INT); attitude towards benefit (ATT); perceived usefulness (PU); and perceived ease of use (PEOU). Principle components analysis (PCA) and structural equation modelling (SEM) were used to explore the psychometric qualities of the scale. The results show that adoption was significantly higher, but also modest, in the British context at 36%. Configural and metric invariance were fully supported, while scalar and factorial invariance were partially supported. Analysis reveals that, as a potential consequence of culture, there are significant differences between PU and PEOU across the two contexts studied, potentially as a consequence of culture. It is recommended that faculty demonstrate to students how RSS can be used effectively in order to increase awareness and emphasise usefulness.
△ Less
Submitted 30 December, 2015;
originally announced December 2015.
-
Promoting Inclusive Design Practice at the Global Game Jam: A Pilot Evaluation
Authors:
Michael James Scott,
Gheorghita Ghinea,
Ian Hamilton
Abstract:
Games are a popular form of entertainment. However, many computer games present unnecessary barriers to players with sensory, motor and cognitive impairments. In order to overcome such pitfalls, an awareness of their impact and a willingness to apply inclusive design practice is often necessary. The Global Game Jam offers a potential avenue to promote inclusive design practices to students of game…
▽ More
Games are a popular form of entertainment. However, many computer games present unnecessary barriers to players with sensory, motor and cognitive impairments. In order to overcome such pitfalls, an awareness of their impact and a willingness to apply inclusive design practice is often necessary. The Global Game Jam offers a potential avenue to promote inclusive design practices to students of game development. As such, this paper evaluates the impact of an initiative to promote inclusive design practices during the 2014 Global Game Jam. An attitude questionnaire was distributed to both participants and non-participants at one event venue. The results indicate that, having enrolled in the initiative, students' attitudes improved. Furthermore, all attendees reported they were likely to pursue further learning opportunities and consider accessibility issues in their future games. This suggests that the Global Game Jam, and other similar events, present an attractive avenue to promote inclusive design practice within the context of digital game development. However, further analysis of submitted games, additional qualitative inquiry and a large-scale trial are needed to determine impact on practice and to form recommendations for future events.
△ Less
Submitted 18 September, 2014;
originally announced September 2014.