-
The Multiscale Surface Vision Transformer
Authors:
Simon Dahan,
Logan Z. J. Williams,
Daniel Rueckert,
Emma C. Robinson
Abstract:
Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domain-agnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local-mesh-windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.
Submitted 11 June, 2024; v1 submitted 21 March, 2023;
originally announced March 2023.
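The local-window attention, shifted windows, and patch merging described above can be illustrated with a minimal NumPy sketch. The sketch makes several simplifying assumptions and is not the MS-SiT implementation:

- It operates on a flattened 1D sequence of patch tokens rather than on the icospheric mesh windows MS-SiT actually uses.
- It omits learned Q/K/V projections and the attention mask normally used with cyclic shifts.
- It uses averaging as a hypothetical merge rule.

All function names are our own.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(tokens, window, shift=0):
    """Single-head self-attention restricted to non-overlapping windows.

    tokens: (N, D) patch embeddings, with N divisible by `window`.
    shift:  cyclic shift applied before windowing, so the new windows straddle
            the borders of the previous ones (the shifted-window step).
    Queries, keys and values are the raw tokens (no learned projections).
    """
    n, d = tokens.shape
    assert n % window == 0, "N must be divisible by the window size"
    x = np.roll(tokens, -shift, axis=0)               # shift the sequence
    w = x.reshape(n // window, window, d)             # (num_windows, window, D)
    scores = w @ w.transpose(0, 2, 1) / np.sqrt(d)    # (num_windows, window, window)
    out = softmax(scores) @ w                         # attend only inside each window
    return np.roll(out.reshape(n, d), shift, axis=0)  # undo the shift


def merge_patches(tokens, group=4):
    """Hypothetical merge rule: average each run of `group` neighbouring tokens."""
    n, d = tokens.shape
    return tokens.reshape(n // group, group, d).mean(axis=1)


def attention_cost(n, window):
    """Number of query-key scores for global vs. windowed attention."""
    return n * n, n * window
```

For example, with 16 tokens and windows of 4, `attention_cost(16, 4)` returns `(256, 64)`. For a fixed window size, the windowed variant grows linearly in N rather than quadratically. This is the property that makes high-resolution sampling affordable.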
-
MWAX: A New Correlator for the Murchison Widefield Array
Authors:
I. S. Morrison,
B. Crosse,
G. Sleap,
R. B. Wayth,
A. Williams,
M. Johnston-Hollitt,
J. Jones,
S. J. Tingay,
M. Walker,
L. Williams
Abstract:
We describe the design, validation, and commissioning of a new correlator termed "MWAX" for the Murchison Widefield Array (MWA) low-frequency radio telescope. MWAX replaces an earlier generation MWA correlator, extending correlation capabilities and providing greater flexibility, scalability, and maintainability. MWAX is designed to exploit current and future Phase II/III upgrades to MWA infrastructure, most notably the simultaneous correlation of all 256 of the MWA's antenna tiles (and potentially more in future). MWAX is a fully software-programmable correlator based around an Ethernet multicast architecture. At its core is a cluster of 24 high-performance GPU-enabled commercial-off-the-shelf compute servers that together process in real time up to 24 coarse channels of 1.28 MHz bandwidth each. The system is highly flexible and scalable in terms of the number of antenna tiles and number of coarse channels to be correlated, and it offers a wide range of frequency / time resolution combinations to users. We conclude with a roadmap of future enhancements and extensions that we anticipate will be progressively rolled out over time.
Submitted 20 March, 2023;
originally announced March 2023.
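The figures quoted above can be checked with a back-of-envelope calculation:

- Correlating N tiles means forming N(N-1)/2 cross-correlation baselines, plus N autocorrelations if those are kept.
- 24 coarse channels of 1.28 MHz span 30.72 MHz in total.

The helper names below are our own, not part of the MWAX software.

```python
def baseline_count(n_tiles: int, include_autos: bool = False) -> int:
    """Distinct tile pairs: N(N-1)/2 cross-correlations, optionally plus N autos."""
    cross = n_tiles * (n_tiles - 1) // 2
    return cross + n_tiles if include_autos else cross


def total_bandwidth_mhz(n_channels: int, channel_width_mhz: float = 1.28) -> float:
    """Total bandwidth processed for `n_channels` coarse channels."""
    return n_channels * channel_width_mhz


if __name__ == "__main__":
    print(baseline_count(128))       # half the tiles, for comparison
    print(baseline_count(256))       # all 256 tiles
    print(total_bandwidth_mhz(24))   # full set of coarse channels
```

Going from 128 to 256 tiles raises the cross-correlation baseline count from 8,128 to 32,640, roughly a factor of four. This quadratic growth is what a correlator handling all 256 tiles has to absorb.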
-
SecretBench: A Dataset of Software Secrets
Authors:
Setu Kumar Basak,
Lorenzo Neil,
Bradley Reaves,
Laurie Williams
Abstract:
According to GitGuardian's monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) increased two-fold in 2021 compared to 2020, totaling more than six million secrets. However, no benchmark dataset is publicly available for researchers and tool developers to evaluate secret detection tools that produce many false positive warnings. The goal of our paper is to aid researchers and tool developers in evaluating and improving secret detection tools by curating a benchmark dataset of secrets through a systematic collection of secrets from open-source repositories. We present a labeled dataset of source code containing 97,479 secrets (of which 15,084 are true secrets) of various secret types extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.
Submitted 12 March, 2023;
originally announced March 2023.
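Because the dataset labels which candidates are true secrets, a detector's output can be scored with standard precision and recall. The plain-Python sketch below uses function and constant names of our own. It shows that a naive tool flagging all 97,479 candidates would reach full recall but only about 15.5% precision, which illustrates the false-positive problem the benchmark is meant to expose.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


TOTAL_CANDIDATES = 97_479  # all labeled candidate secrets
TRUE_SECRETS = 15_084      # candidates labeled as true secrets

# A "flag everything" baseline reports every candidate as a secret.
p, r = precision_recall(tp=TRUE_SECRETS, fp=TOTAL_CANDIDATES - TRUE_SECRETS, fn=0)
print(f"precision={p:.3f} recall={r:.3f}")
```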
-
A Safety Framework for Flow Decomposition Problems via Integer Linear Programming
Authors:
Fernando H. C. Dias,
Manuel Caceres,
Lucia Williams,
Brendan Mumey,
Alexandru I. Tomescu
Abstract:
Many important problems in Bioinformatics (e.g., assembly or multi-assembly) admit multiple solutions, while the final objective is to report only one. A common approach to deal with this uncertainty is finding safe partial solutions (e.g., contigs) which are common to all solutions. Previous research on safety has focused on polynomial-time solvable problems, whereas many successful and natural models are NP-hard to solve, leaving a lack of "safety tools" for such problems. We propose the first method for computing all safe solutions for an NP-hard problem, minimum flow decomposition. We obtain our results by developing a "safety test" for paths based on a general Integer Linear Programming (ILP) formulation. Moreover, we provide implementations with practical optimizations aimed to reduce the total ILP time, the most efficient of these being based on a recursive group-testing procedure.
Results: Experimental results on the transcriptome datasets of Shao and Kingsford (TCBB, 2017) show that all safe paths for minimum flow decompositions correctly recover up to 90% of the full RNA transcripts, which is at least 25% more than previously known safe paths, such as (Caceres et al. TCBB, 2021), (Zheng et al., RECOMB 2021), (Khan et al., RECOMB 2022, ESA 2022). Moreover, despite the NP-hardness of the problem, we can report all safe paths for 99.8% of the over 27,000 non-trivial graphs of this dataset in only 1.5 hours. Our results suggest that, on perfect data, there is less ambiguity than thought in the notoriously hard RNA assembly problem.
Availability: https://github.com/algbio/mfd-safety
Submitted 30 January, 2023;
originally announced January 2023.
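To make the notion of safety concrete, the sketch below does two things on a tiny DAG:

1. It enumerates every minimum flow decomposition by brute force.
2. It checks whether a given path is a subpath of some weighted path in each of those decompositions.

The exhaustive search is exponential and only illustrates the definition. It is not the ILP-based safety test or the group-testing procedure described above, and the example graph and flow values are made up.

```python
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[str, str]


def st_paths(graph: Dict[str, List[str]], s: str, t: str) -> List[List[str]]:
    """All s-t paths in a DAG given as {node: [successors]}."""
    if s == t:
        return [[t]]
    return [[s] + rest for nxt in graph.get(s, []) for rest in st_paths(graph, nxt, t)]


def edges_of(path: List[str]) -> List[Edge]:
    return list(zip(path, path[1:]))


def decompositions(flow: Dict[Edge, int], paths, k: int, start: int = 0) -> Iterator[list]:
    """Yield lists of exactly k (path, weight) pairs whose weighted sum equals `flow`.

    `flow` is modified in place and restored on backtracking; the same multiset
    may be yielded more than once if one path is used repeatedly.
    """
    if all(v == 0 for v in flow.values()):
        if k == 0:
            yield []
        return
    if k == 0:
        return
    for i in range(start, len(paths)):
        es = edges_of(paths[i])
        for w in range(1, min(flow[e] for e in es) + 1):
            for e in es:
                flow[e] -= w
            for rest in decompositions(flow, paths, k - 1, i):
                yield [(paths[i], w)] + rest
            for e in es:
                flow[e] += w


def min_decompositions(graph, flow, s, t):
    """Smallest k admitting a decomposition, with all decompositions of that size."""
    paths = st_paths(graph, s, t)
    for k in range(1, len(flow) + 1):
        found = list(decompositions(dict(flow), paths, k))  # copy keeps `flow` intact
        if found:
            return k, found
    return 0, []


def is_subpath(p: List[str], q: List[str]) -> bool:
    n = len(p)
    return any(q[i:i + n] == p for i in range(len(q) - n + 1))


def is_safe(path: List[str], decomps) -> bool:
    """Safe = contained in some path of every minimum decomposition."""
    return all(any(is_subpath(path, q) for q, _ in d) for d in decomps)


graph = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["d", "e"], "d": ["t"], "e": ["t"]}
flow = {("s", "a"): 4, ("s", "b"): 2, ("a", "c"): 4, ("b", "c"): 2,
        ("c", "d"): 3, ("c", "e"): 3, ("d", "t"): 3, ("e", "t"): 3}
k, decomps = min_decompositions(graph, flow, "s", "t")
```

In this toy graph every minimum decomposition uses three paths, and there are exactly two of them. Both contain `s,a,c,d,t`, so that whole path is safe. The path `s,b,c,d` is not safe, because one optimal decomposition routes the flow from `b` through `e`.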
-
The LOFAR LBA Sky Survey II. First data release
Authors:
F. de Gasperin,
H. W. Edler,
W. L. Williams,
J. R. Callingham,
B. Asabere,
M. Bruggen,
G. Brunetti,
T. J. Dijkema,
M. J. Hardcastle,
M. Iacobelli,
A. Offringa,
M. J. Norden,
H. J. A. Rottgering,
T. Shimwell,
R. J. van Weeren,
C. Tasse,
D. J. Bomans,
A. Bonafede,
A. Botteon,
R. Cassano,
K. T. Chyzy,
V. Cuciti,
K. L. Emig,
M. Kadler,
G. Miley
, et al. (5 additional authors not shown)
Abstract:
The Low Frequency Array (LOFAR) is the only existing radio interferometer able to observe at ultra-low frequencies (<100 MHz) with high resolution (<15") and high sensitivity (<1 mJy/beam). To exploit these capabilities, the LOFAR Surveys Key Science Project is using the LOFAR Low Band Antenna (LBA) to carry out a sensitive wide-area survey at 41-66 MHz named the LOFAR LBA Sky Survey (LoLSS). LoLSS is covering the whole northern sky above declination 24 deg with a resolution of 15" and a sensitivity of 1-2 mJy/beam (1 sigma) depending on declination, field properties, and observing conditions. Here we present the first data release. An automated pipeline was used to reduce the 95 fields included in this data release. The data reduction procedures developed for this project have general application and are currently being used to process LOFAR LBA interferometric observations. Compared to the preliminary release, direction-dependent errors have been corrected for during the calibration process. This results in a typical sensitivity of 1.55 mJy/beam at the target resolution of 15". The first data release of the LOFAR LBA Sky Survey covers 650 sqdeg in the HETDEX spring field. The resultant data products released to the community include mosaic images (I and V Stokes) of the region, and a catalogue of 42463 detected sources and related Gaussian components used to describe sources' morphologies. Separate catalogues for 6 in-band frequencies are also released. The first data release of LoLSS shows that, despite the influences of the ionosphere, LOFAR can conduct large-scale surveys in the frequency window 42-66 MHz with unprecedentedly high sensitivity and resolution. The data can be used to derive unique information on the low-frequency spectral properties of many thousands of sources with a wide range of applications in extragalactic and galactic astronomy.
Submitted 30 January, 2023;
originally announced January 2023.
-
What Challenges Do Developers Face About Checked-in Secrets in Software Artifacts?
Authors:
Setu Kumar Basak,
Lorenzo Neil,
Bradley Reaves,
Laurie Williams
Abstract:
Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a two-fold increase in the number of secrets (database credentials, API keys, and other credentials) exposed compared to 2020, accumulating more than six million secrets. To our knowledge, the challenges developers face to avoid checked-in secrets are not yet characterized. The goal of our paper is to aid researchers and tool developers in understanding and prioritizing opportunities for future research and tool automation for mitigating checked-in secrets through an empirical investigation of challenges and solutions related to checked-in secrets. We extract 779 questions related to checked-in secrets on Stack Exchange and apply qualitative analysis to determine the challenges and the solutions posed by others for each of the challenges. We identify 27 challenges and 13 solutions. The four most common challenges, in ranked order, are: (i) store/version of secrets during deployment; (ii) store/version of secrets in source code; (iii) ignore/hide of secrets in source code; and (iv) sanitize VCS history. The three most common solutions, in ranked order, are: (i) move secrets out of source code/version control and use template config file; (ii) secret management in deployment; and (iii) use local environment variables. Our findings indicate that the same solution has been mentioned to mitigate multiple challenges. However, our findings also identify an increasing trend in questions lacking accepted solutions, substantiating the need for future research and tool automation on managing secrets.
Submitted 29 January, 2023;
originally announced January 2023.
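Two of the most common solutions listed above can be combined in a few lines: moving secrets into a template config file, and reading them from local environment variables. The sketch below is a generic Python illustration, not taken from the paper. The environment-variable names are hypothetical placeholders.

```python
import os
from string import Template


def get_secret(name: str) -> str:
    """Read a secret from the local environment instead of from source code."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"{name} is not set; export it locally rather than committing it")
    return value


def render_config(template_text: str, env=None) -> str:
    """Fill a committed template config (placeholders only) from the environment.

    Raises KeyError if a placeholder has no matching variable.
    """
    return Template(template_text).substitute(env if env is not None else os.environ)
```

Only the template (for example `api_key=$DEMO_API_KEY`) is committed. The actual value lives in the developer's shell or the deployment environment, so it never enters version control history.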
-
A Framework for Active Haptic Guidance Using Robotic Haptic Proxies
Authors:
Niall L. Williams,
Nicholas Rewkowski,
Jiasheng Li,
Ming C. Lin
Abstract:
Haptic feedback is an important component of creating an immersive mixed reality experience. Traditionally, haptic forces are rendered in response to the user's interactions with the virtual environment. In this work, we explore the idea of rendering haptic forces in a proactive manner, with the explicit intention to influence the user's behavior through compelling haptic forces. To this end, we present a framework for active haptic guidance in mixed reality, using one or more robotic haptic proxies to influence user behavior and deliver a safer and more immersive virtual experience. We provide details on common challenges that need to be overcome when implementing active haptic guidance, and discuss example applications that show how active haptic guidance can be used to influence the user's behavior. Finally, we apply active haptic guidance to a virtual reality navigation problem, and conduct a user study that demonstrates how active haptic guidance creates a safer and more immersive experience for users.
Submitted 27 February, 2023; v1 submitted 12 January, 2023;
originally announced January 2023.
-
Efficient Graph Reconstruction and Representation Using Augmented Persistence Diagrams
Authors:
Brittany Terese Fasy,
Samuel Micka,
David L. Millman,
Anna Schenfisch,
Lucia Williams
Abstract:
Persistent homology is a tool that can be employed to summarize the shape of data by quantifying homological features. When the data is an object in $\mathbb{R}^d$, the (augmented) persistent homology transform ((A)PHT) is a family of persistence diagrams, parameterized by directions in the ambient space. A recent advance in understanding the PHT used the framework of reconstruction in order to find a finite set of directions to faithfully represent the shape, a result that is of both theoretical and practical interest. In this paper, we improve upon this result and present an improved algorithm for graph -- and, more generally, one-skeleton -- reconstruction. The improvement comes in reconstructing the edges, where we use a radial binary (multi-)search. The binary search employed takes advantage of the fact that the edges can be ordered radially with respect to a reference plane, a feature unique to graphs.
Submitted 26 December, 2022;
originally announced December 2022.
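The idea of a radial binary multi-search can be illustrated abstractly. In the sketch below, a hypothetical oracle `count(lo, hi)` reports how many edges around a vertex fall in an angular interval. It stands in for the information the paper extracts from augmented persistence diagrams; how those counts are actually obtained from diagrams is not modelled here. Bisecting only the non-empty intervals locates all edge angles at once.

```python
import math
from typing import Callable, List

CountFn = Callable[[float, float], int]


def radial_multisearch(count: CountFn, lo: float, hi: float, eps: float = 1e-6) -> List[float]:
    """Approximate every edge angle in [lo, hi) using only interval-count queries.

    `count(lo, hi)` returns how many edges leave the vertex at an angle in
    [lo, hi). Empty intervals are pruned; the rest are bisected until narrower
    than `eps`, so each reported angle is within eps/2 of a true one.
    """
    k = count(lo, hi)
    if k == 0:
        return []
    if hi - lo < eps:
        return [(lo + hi) / 2] * k  # k edges share this tiny sector
    mid = (lo + hi) / 2
    return radial_multisearch(count, lo, mid, eps) + radial_multisearch(count, mid, hi, eps)


if __name__ == "__main__":
    hidden = [0.3, 1.2, 2.5]  # made-up edge angles around one vertex

    def oracle(lo: float, hi: float) -> int:
        return sum(lo <= a < hi for a in hidden)

    print(radial_multisearch(oracle, 0.0, 2 * math.pi))
```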
-
The wide-field, multiplexed, spectroscopic facility WEAVE: Survey design, overview, and simulated implementation
Authors:
Shoko Jin,
Scott C. Trager,
Gavin B. Dalton,
J. Alfonso L. Aguerri,
J. E. Drew,
Jesús Falcón-Barroso,
Boris T. Gänsicke,
Vanessa Hill,
Angela Iovino,
Matthew M. Pieri,
Bianca M. Poggianti,
D. J. B. Smith,
Antonella Vallenari,
Don Carlos Abrams,
David S. Aguado,
Teresa Antoja,
Alfonso Aragón-Salamanca,
Yago Ascasibar,
Carine Babusiaux,
Marc Balcells,
R. Barrena,
Giuseppina Battaglia,
Vasily Belokurov,
Thomas Bensby,
Piercarlo Bonifacio
, et al. (190 additional authors not shown)
Abstract:
WEAVE, the new wide-field, massively multiplexed spectroscopic survey facility for the William Herschel Telescope, will see first light in late 2022. WEAVE comprises a new 2-degree field-of-view prime-focus corrector system, a nearly 1000-multiplex fibre positioner, 20 individually deployable 'mini' integral field units (IFUs), and a single large IFU. These fibre systems feed a dual-beam spectrograph covering the wavelength range 366$-$959\,nm at $R\sim5000$, or two shorter ranges at $R\sim20\,000$. After summarising the design and implementation of WEAVE and its data systems, we present the organisation, science drivers and design of a five- to seven-year programme of eight individual surveys to: (i) study our Galaxy's origins by completing Gaia's phase-space information, providing metallicities to its limiting magnitude for $\sim$3 million stars and detailed abundances for $\sim1.5$ million brighter field and open-cluster stars; (ii) survey $\sim0.4$ million Galactic-plane OBA stars, young stellar objects and nearby gas to understand the evolution of young stars and their environments; (iii) perform an extensive spectral survey of white dwarfs; (iv) survey $\sim400$ neutral-hydrogen-selected galaxies with the IFUs; (v) study properties and kinematics of stellar populations and ionised gas in $z<0.5$ cluster galaxies; (vi) survey stellar populations and kinematics in $\sim25\,000$ field galaxies at $0.3\lesssim z \lesssim 0.7$; (vii) study the cosmic evolution of accretion and star formation using $>1$ million spectra of LOFAR-selected radio sources; (viii) trace structures using intergalactic/circumgalactic gas at $z>2$. Finally, we describe the WEAVE Operational Rehearsals using the WEAVE Simulator.
Submitted 31 October, 2023; v1 submitted 7 December, 2022;
originally announced December 2022.
-
An Extended Model of Software Configuration
Authors:
Rezvan Mahdavi-Hezaveh,
Sameeha Fatima,
Laurie Williams
Abstract:
Feature toggles and configuration options are modern programmatic techniques to easily include or exclude functionality in a software product. Research contributions on these two techniques have most often focused on one of them in isolation. However, focusing on the similarities of these two techniques may enable a more fruitful combined family of research on software configuration, a term we use to encompass both techniques. Also, a common terminology may have enabled meta-analysis, a more practical application of the research on the two techniques, and prevented duplication of research effort. The goal of this research study is to aid researchers in conducting a family of research on software configuration by extending an existing model of software configuration that provides terminology for research studies. To achieve our goal, we started with Siegmund et al.'s Model of Software Configuration (MSC), which was developed based on interviews and publications on configuration options. We explicitly extend the MSC to include feature toggles and to add qualitative analysis of feature toggle-related resources. From our analysis, we proposed MSCv2 as an extended version of MSC and evaluated it through its application to five academic publications and the Chrome system. Our results indicate that multiple researchers studying the same system may provide different definitions of software configuration in their publications. Also, similar research questions may be answered repeatedly on feature toggles and configuration options because of the lack of a clear definition of software configuration. These observations indicate that having a model for defining software configuration may enable clearer and more generalizable research on the software configuration family of research. Practitioners can benefit from applying MSCv2 to their systems to improve knowledge transfer to other practitioners and researchers.
Submitted 1 December, 2022;
originally announced December 2022.
-
An investigation of security controls and MITRE ATT&CK techniques
Authors:
Md Rayhanur Rahman,
Laurie Williams
Abstract:
Attackers utilize a plethora of adversarial techniques in cyberattacks to compromise the confidentiality, integrity, and availability of the target organizations and systems. Information security standards such as NIST and ISO/IEC specify hundreds of security controls that organizations can enforce to protect and defend information systems from adversarial techniques. However, implementing all the available controls at the same time can be infeasible, and security controls also need to be investigated in terms of their ability to mitigate the adversarial techniques used in cyberattacks. The goal of this research is to aid organizations in making informed choices on security controls to defend against cyberthreats through an investigation of adversarial techniques used in current cyberattacks. In this study, we investigated the extent to which 298 NIST SP800-53 controls mitigate 188 adversarial techniques used by 669 cybercrime groups and malware cataloged in the MITRE ATT&CK framework, based upon an existing mapping between the controls and techniques. We identify that, based on the mapping, only 101 out of 298 controls are capable of mitigating adversarial techniques. However, we also identify that 53 adversarial techniques cannot be mitigated by any existing controls, and these techniques primarily aid adversaries in bypassing system defense and discovering targeted system information. We identify a set of 20 critical controls that can mitigate 134 adversarial techniques and, on average, can mitigate 72% of all techniques used by 98% of the cataloged adversaries in MITRE ATT&CK. We urge organizations that do not have any controls in place to implement the top controls identified in the study.
Submitted 11 November, 2022;
originally announced November 2022.
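Choosing a small set of high-impact controls from a control-to-technique mapping is a coverage problem. The sketch below uses greedy set cover as one plausible way to rank controls; the abstract does not say how the 20 critical controls were selected. It also lists techniques that no control mitigates. The control and technique IDs used with it are toy placeholders, not real NIST or ATT&CK entries.

```python
from typing import Dict, List, Set, Tuple


def greedy_controls(mapping: Dict[str, Set[str]], budget: int) -> Tuple[List[str], Set[str]]:
    """Pick up to `budget` controls, each time taking the one that mitigates the
    most not-yet-covered techniques (classic greedy set cover)."""
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(budget):
        best = max(mapping, key=lambda c: len(mapping[c] - covered), default=None)
        if best is None or not mapping[best] - covered:
            break  # no control adds any new technique
        chosen.append(best)
        covered |= mapping[best]
    return chosen, covered


def unmitigated(techniques: Set[str], mapping: Dict[str, Set[str]]) -> Set[str]:
    """Techniques that no control in the mapping mitigates."""
    return techniques - set().union(*mapping.values())
```

Greedy selection is simple and comes with the standard logarithmic approximation guarantee for set cover. Ties are broken by dictionary order here, which is an arbitrary choice.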
-
Investigating co-occurrences of MITRE ATT&CK Techniques
Authors:
Md Rayhanur Rahman,
Laurie Williams
Abstract:
Cyberattacks use adversarial techniques to bypass system defenses, persist, and eventually breach systems. The MITRE ATT&CK framework catalogs a set of adversarial techniques and maps between adversaries and their used techniques and tactics. Understanding how adversaries deploy techniques in conjunction is pivotal for learning adversary behavior, hunting potential threats, and formulating a proactive defense. The goal of this research is to aid cybersecurity practitioners and researchers in choosing detection and mitigation strategies through co-occurrence analysis of adversarial techniques reported in MITRE ATT&CK. We collect the adversarial techniques of 115 cybercrime groups and 484 malware from MITRE ATT&CK. We apply association rule mining and network analysis to investigate how adversarial techniques co-occur. We identify that adversaries pair the T1059: Command and Scripting Interpreter and T1105: Ingress Tool Transfer techniques with a relatively large number of ATT&CK techniques. We also identify adversaries using the T1082: System Information Discovery technique to determine their next course of action. We observe that adversaries deploy the highest number of techniques from the TA0005: Defense Evasion and TA0007: Discovery tactics. Based on our findings on co-occurrence, we identify six detection strategies, six mitigation strategies, and twelve adversary behaviors. We urge defenders to prioritize the detection of TA0007: Discovery techniques and the mitigation of TA0005: Defense Evasion techniques. Overall, this study approximates how adversaries leverage techniques based on publicly reported documents. We advocate that organizations investigate adversarial techniques in their environment and make the findings available for a more precise and actionable understanding.
Submitted 11 November, 2022;
originally announced November 2022.
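The association-rule step can be sketched with the usual support, confidence and lift measures over technique pairs. The adversary-to-technique sets used with it are invented for illustration: they reuse technique IDs named above, but they are not real ATT&CK data. The paper's actual mining thresholds are not reproduced here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pair_rules(adversaries: List[Set[str]]) -> Dict[Tuple[str, str], Tuple[float, float, float]]:
    """(support, confidence, lift) for every rule X -> Y between co-occurring techniques.

    Each element of `adversaries` is the set of techniques one group or malware uses.
    """
    n = len(adversaries)
    single = Counter(t for techs in adversaries for t in techs)
    pairs = Counter(p for techs in adversaries for p in combinations(sorted(techs), 2))
    rules = {}
    for (a, b), both in pairs.items():
        support = both / n
        for x, y in ((a, b), (b, a)):
            confidence = both / single[x]          # P(Y | X)
            lift = confidence / (single[y] / n)    # > 1 means positive association
            rules[(x, y)] = (support, confidence, lift)
    return rules
```

For example, with the toy input `[{"T1059", "T1105", "T1082"}, {"T1059", "T1105"}, {"T1059", "T1082"}, {"T1105"}]`, every adversary using T1082 also uses T1059, giving a confidence of 1.0. The lift of 4/3 indicates the pair co-occurs more often than independence would predict.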
-
Flashlights: More than A Dozen High-Significance Microlensing Events of Extremely Magnified Stars in Galaxies at Redshifts z=0.7-1.5
Authors:
Patrick L. Kelly,
Wenlei Chen,
Amruth Alfred,
Thomas J. Broadhurst,
Jose M. Diego,
Najmeh Emami,
Alexei V. Filippenko,
Allison Keen,
Sung Kei Li,
Jeremy Lim,
Ashish K. Meena,
Masamune Oguri,
Claudia Scarlata,
Tommaso Treu,
Hayley Williams,
Liliya L. R. Williams,
Rui Zhou,
Adi Zitrin,
Ryan J. Foley,
Saurabh W. Jha,
Nick Kaiser,
Vihang Mehta,
Steven Rieck,
Laura Salo,
Nathan Smith
, et al. (1 additional authors not shown)
Abstract:
Individual stars, once accessible only in nearby galaxies, can now be studied across much of the observable universe with the aid of galaxy-cluster gravitational lenses. When a star, compact object, or multiple such objects in the foreground galaxy-cluster lens become aligned, they can magnify a background individual star, and the timescale of a magnification peak can limit its size to tens of AU. The number and frequency of microlensing events therefore opens a window into the population of stars and compact objects, as well as high-redshift stars. To assemble the first statistical sample of stars in order to constrain the initial mass function (IMF) of massive stars at redshift z=0.7-1.5, the abundance of primordial black holes in galaxy-cluster dark matter, and the IMF of the stars making up the intracluster light, we are carrying out a 192-orbit program with the Hubble Space Telescope called "Flashlights," which, owing to scheduling challenges, is so far two-thirds complete. We use the ultrawide F200LP and F350LP long-pass WFC3 UVIS filters and conduct two 16-orbit visits separated by one year. Having an identical roll angle during both visits, while difficult to schedule, yields extremely clean subtraction. Here we report the discovery of more than a dozen bright microlensing events, including multiple examples in the famous "Dragon Arc" discovered in the 1980s, as well as the "Spocks" and "Warhol" arcs that have hosted already known supergiants. The ultradeep observer-frame ultraviolet-through-optical imaging is sensitive to hot stars, which will complement deep James Webb Space Telescope infrared imaging. We are also acquiring Large Binocular Telescope LUCI and Keck-I MOSFIRE near-infrared spectra of the highly magnified arcs to constrain their recent star-formation histories.
Submitted 4 November, 2022;
originally announced November 2022.
-
Flashlights: An Off-Caustic Lensed Star at Redshift $z$ = 1.26 in Abell 370
Authors:
Ashish Kumar Meena,
Wenlei Chen,
Adi Zitrin,
Patrick L. Kelly,
Miriam Golubchik,
Rui Zhou,
Amruth Alfred,
Tom Broadhurst,
Jose M. Diego,
Masamune Oguri,
Liliya L. R. Williams,
Alexei V. Filippenko,
Sung Kei Li
Abstract:
We report the discovery of a transient seen in a strongly lensed arc at redshift $z_{\rm s}=1.2567$ in \emph{Hubble Space Telescope} imaging of the Abell 370 galaxy cluster. The transient is detected at $29.51\pm0.14$ AB mag in a WFC3/UVIS F200LP difference image made using observations from two different epochs, obtained in the framework of the \emph{Flashlights} program, and is also visible in the F350LP band ($m_{\rm F350LP} \approx 30.53\pm0.76$ AB mag). The transient is observed on the negative-parity side of the critical curve at a distance of $\sim 0.6"$ from it, greater than previous examples of lensed stars. The large distance from the critical curve yields a significantly smaller macromagnification, but our simulations show that bright, O/B-type supergiants can reach sufficiently high magnifications to be seen at the observed position and magnitude. In addition, the observed transient image is a trailing image with an observer-frame time delay of $\sim+0.8$ days from its expected counterpart, so that any transient lasting for longer than that should have also been seen on the minima side and is thus excluded. This, together with the blue colour we measure for the transient ($m_{\rm F200LP} - m_{\rm F350LP} \approx [-0.3,-1.6]$ AB), rules out most other transient candidates such as (kilo)novae, for example, and makes a lensed star the prime candidate. Assuming the transient is indeed a lensed star as suggested, many more such events should be detected in the near future in cluster surveys with the \emph{Hubble Space Telescope} and \emph{James Webb Space Telescope}.
Submitted 5 April, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Extragalactic Peaked-Spectrum Radio Sources at Low-Frequencies are Young Radio Galaxies
Authors:
M. M. Slob,
J. R. Callingham,
H. J. A. Röttgering,
W. L. Williams,
K. J. Duncan,
F. de Gasperin,
M. J. Hardcastle,
G. K. Miley
Abstract:
We present a sample of 373 peaked-spectrum (PS) sources with spectral peaks around 150 MHz, selected using a subset of two LOFAR all-sky surveys, the LOFAR Two Meter Sky Survey and the LOFAR LBA Sky Survey. These surveys are the most sensitive low-frequency wide-field surveys to date, allowing us to select low-luminosity PS sources. Our sample increases the number of known PS sources in our survey area by a factor of 50. The 5 GHz luminosity distribution of our PS sample shows that we sample the lowest-luminosity PS sources to date by nearly an order of magnitude. Since high-frequency PS sources and compact steep-spectrum sources are hypothesised to be the precursors of large radio galaxies, we investigate whether this is also the case for our sample of low-frequency PS sources. Using optical line emission criteria, we find that our PS sources are predominantly high-excitation radio galaxies rather than low-excitation radio galaxies, corresponding to a quickly evolving population. We compute the radio source counts of our PS sample and find that they are scaled down by a factor of $\sim$40 compared to a general sample of radio-loud active galactic nuclei (AGN). This implies that the lifetimes of PS sources are 40 times shorter than those of large-scale radio galaxies, if their luminosity functions are identical. To investigate this, we compute the first radio luminosity function for a homogeneously selected PS sample. We find that for 144 MHz luminosities $\gtrsim 10^{25}$ W Hz$^{-1}$, the PS luminosity function has the same shape as that of an unresolved radio-loud AGN population but is shifted down by a factor of $\sim$10. We interpret this as strong evidence that these high-luminosity PS sources evolve into large-scale radio-loud AGN. Among local, low-luminosity sources there is a surplus of PS sources, which we hypothesise to be due to an additional population of frustrated PS sources that do not evolve into large-scale AGN.
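The step from source counts to lifetimes is a steady-state continuity argument. The sketch below makes it explicit under our own assumption (not spelled out in the abstract) that PS sources and large radio galaxies are drawn from the same population with a common birth rate $\dot{N}$, so the number seen in each phase scales with the time spent in it; the symbols are ours, not the paper's:

```latex
\frac{N_{\rm PS}}{N_{\rm AGN}}
  \simeq \frac{\dot{N}\, t_{\rm PS}}{\dot{N}\, t_{\rm AGN}}
  = \frac{t_{\rm PS}}{t_{\rm AGN}}
  \quad\Longrightarrow\quad
  \frac{t_{\rm PS}}{t_{\rm AGN}} \approx \frac{1}{40}.
```

If the two luminosity functions differ in shape, the birth rates no longer cancel, which is why the abstract goes on to compute the PS luminosity function directly.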
△ Less
Submitted 29 October, 2022;
originally announced October 2022.
-
Do Software Security Practices Yield Fewer Vulnerabilities?
Authors:
Nusrat Zahan,
Shohanuzzaman Shohan,
Dan Harris,
Laurie Williams
Abstract:
Due to ever-increasing security breaches, practitioners are motivated to produce more secure software. In the United States, the White House Office released a memorandum on Executive Order (EO) 14028 that requires organizations to self-attest to their use of secure software development practices. The OpenSSF Scorecard project allows practitioners to measure their use of software security practices automatically. However, little research has examined whether the use of security practices improves package security, and in particular which security practices have the biggest impact on security outcomes. The goal of this study is to assist practitioners and researchers in making informed decisions on which security practices to adopt by developing models that relate software security practice scores to security vulnerability counts.
To that end, we developed five supervised machine learning models for npm and PyPI packages, using OpenSSF Scorecard security practice scores and aggregate security scores as predictors and the number of externally reported vulnerabilities as the target variable. Our models identified four security practices (Maintained, Code Review, Branch Protection, and Security Policy) as the most important practices influencing vulnerability count. However, the models had a low $R^2$ (ranging from 9% to 12%) when we tested them on predicting vulnerability counts. Additionally, we observed that the number of reported vulnerabilities increased rather than decreased as the aggregate security score of the packages increased. Both findings indicate that additional factors may influence package vulnerability count. We suggest that vulnerability count and security score data be refined so that these measures can provide actionable guidance on security practices.
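An $R^2$ of 9% to 12% means the models explain only about a tenth of the variance in vulnerability counts. As a reminder of what that number measures, here is a minimal sketch of the standard coefficient of determination; the vulnerability counts and predictions are hypothetical, not data from the study:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total variation
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot

# Hypothetical vulnerability counts for five packages and a model's predictions.
observed = [0, 1, 4, 2, 8]
predicted = [1, 1, 2, 2, 3]
print(round(r_squared(observed, predicted), 3))  # prints 0.25
```

A model that always predicts the mean scores exactly 0, so values near 0.1 leave most of the variation unexplained.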
Submitted 15 June, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Position-controlled Telecom Single Photon Emitters Operating at Elevated Temperatures
Authors:
Patrick Laferrière,
Sofiane Haffouz,
David B. Northeast,
Philip J. Poole,
Robin L. Williams,
Dan Dalacu
Abstract:
Single photon emitters are a key component for enabling the practical use of quantum key distribution protocols for secure communications. For long-haul optical networks, it is imperative to use photons at wavelengths compatible with standard single-mode fibers: 1.31 μm and 1.55 μm. We demonstrate high-purity single photon emission at 1.31 μm using deterministically positioned InP photonic waveguide nanowires containing single InAsP quantum dot-in-a-rod structures. At 4 K, the detected count rate in fiber was 1.9 Mcps under above-band pulsed laser excitation at 80 MHz, corresponding to a single photon collection efficiency at the first lens of 25%. At this count rate, the multiphoton emission probability is $g^{(2)}(0) = 0.021$. We have also evaluated the performance of the source as a function of temperature. The multiphoton emission probability increases with temperature, with values of 0.11, 0.34 and 0.57 at 77 K, 220 K and 300 K, respectively, which is attributed to an overlap of temperature-broadened excitonic emission lines. These results are a promising step towards scalably fabricating telecom single photon emitters that operate under relaxed cooling requirements.
Submitted 28 October, 2022; v1 submitted 23 October, 2022;
originally announced October 2022.
-
Strong Lensing by Galaxies
Authors:
A. J. Shajib,
G. Vernardos,
T. E. Collett,
V. Motta,
D. Sluse,
L. L. R. Williams,
P. Saha,
S. Birrer,
C. Spiniello,
T. Treu
Abstract:
Strong gravitational lensing at the galaxy scale is a valuable tool for various applications in astrophysics and cosmology. The primary uses of galaxy-scale lensing are to study elliptical galaxies' mass structure and evolution, constrain the stellar initial mass function, and measure cosmological parameters. Since the discovery of the first galaxy-scale lens in the 1980s, this field has made significant advancements in data quality and modeling techniques. In this review, we describe the most common methods for modeling lensing observables, especially imaging data, as they are the most accessible and informative source of lensing observables. We then summarize the primary findings from the literature on the astrophysical and cosmological applications of galaxy-scale lenses. We also discuss the current limitations of the data and methodologies and provide an outlook on the expected improvements in both areas in the near future.
Submitted 6 April, 2025; v1 submitted 19 October, 2022;
originally announced October 2022.
-
From Threat Reports to Continuous Threat Intelligence: A Comparison of Attack Technique Extraction Methods from Textual Artifacts
Authors:
Md Rayhanur Rahman,
Laurie Williams
Abstract:
The cyberthreat landscape is continuously evolving. Hence, continuous monitoring and sharing of threat intelligence have become a priority for organizations. Threat reports, published by cybersecurity vendors, contain detailed descriptions of attack Tactics, Techniques, and Procedures (TTP) written as unstructured text. Extracting TTP from these reports helps cybersecurity practitioners and researchers learn about and adapt to evolving attacks and plan threat mitigation. Researchers have proposed TTP extraction methods in the literature; however, not all of these methods have been compared to one another or to a baseline. \textit{The goal of this study is to aid cybersecurity researchers and practitioners in choosing attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature.} In this work, we identify ten existing TTP extraction studies from the literature and implement five methods from these ten studies. We find that two methods, based on Term Frequency-Inverse Document Frequency (TFIDF) and Latent Semantic Indexing (LSI), outperform the other three methods, with F1 scores of 84\% and 83\%, respectively. We observe that the F1 score of all methods drops as the number of class labels increases exponentially. We also implement and evaluate an oversampling strategy to mitigate class imbalance, and find that oversampling improves the classification performance of TTP extraction. We provide recommendations from our findings for future cybersecurity researchers, such as constructing a benchmark dataset from a large corpus and selecting textual features of TTP. Our work, along with the dataset and implementation source code, can serve as a baseline for cybersecurity researchers to test and compare the performance of future TTP extraction methods.
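The abstract does not say how the TFIDF-based method turns text into technique labels. As an illustration only, the sketch below assumes a simple nearest-neighbour classifier: each report sentence is turned into a TF-IDF vector and assigned the label of the most cosine-similar training sentence. The tokenizer, smoothing formula, training sentences and function names are all our own hypothetical choices, not the study's implementation:

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector; terms unseen during fitting are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}

def cosine(u, v):
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict(report, train_vectors, train_labels, idf):
    """Assign the technique label of the most similar training sentence."""
    query = tfidf(report, idf)
    scores = [cosine(query, vec) for vec in train_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_labels[best]

# Hypothetical labelled sentences, using MITRE ATT&CK technique IDs as labels.
train_texts = [
    "attacker sends spearphishing email with malicious attachment",
    "malware encrypts files and demands ransom payment",
    "adversary dumps credentials from lsass memory",
]
train_labels = ["T1566 Phishing", "T1486 Data Encrypted for Impact",
                "T1003 OS Credential Dumping"]
idf = fit_idf(train_texts)
train_vectors = [tfidf(t, idf) for t in train_texts]
print(predict("the email contained a malicious attachment", train_vectors, train_labels, idf))
```

With only exact token overlap to go on, a report sharing no vocabulary with the training set falls back to the first label; this brittleness is one reason the choice of textual features matters.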
Submitted 5 October, 2022;
originally announced October 2022.
-
Radio source-component association for the LOFAR Two-metre Sky Survey with region-based convolutional neural networks
Authors:
Rafaël I. J. Mostert,
Kenneth J. Duncan,
Lara Alegre,
Huub J. A. Röttgering,
Wendy L. Williams,
Philip N. Best,
Martin J. Hardcastle,
Raffaella Morganti
Abstract:
Radio-loud active galactic nuclei (RLAGNs) are often morphologically complex objects that can consist of multiple, spatially separated components. Astronomers often rely on visual inspection to resolve radio component association. However, applying visual inspection to all of the hundreds of thousands of well-resolved RLAGNs that appear in the images from the Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) at $144$ MHz is a daunting, time-consuming process, even with extensive manpower.
Using a machine learning approach, we aim to automate the radio component association of large ($> 15$ arcsec) radio components.
We turned the association problem into a classification problem and trained an adapted Fast region-based convolutional neural network to mimic the expert annotations from the first LoTSS data release. We implemented rotation data augmentation to reduce overfitting. We also simplified the component association by using predictions from an existing gradient boosting classifier to remove unresolved radio sources that are likely unrelated to the large and bright radio components under consideration.
For large ($> 15$ arcsec) and bright ($> 10$ mJy) radio components in the LoTSS first data release, our model provides the same associations as those derived by astronomers performing the association manually in $(85.3\pm0.6)\%$ of cases. When the association is done through public crowd-sourced efforts, a result similar to that of our model is attained.
Our method can efficiently reproduce manual radio-component association for huge radio surveys and can serve as a basis for either automated radio morphology classification or automated optical host identification. This opens up an avenue to study the completeness and reliability of samples of radio sources with extended, complex morphologies.
Submitted 28 September, 2022;
originally announced September 2022.
-
Sensitivity projections for a dual-phase argon TPC optimized for light dark matter searches through the ionization channel
Authors:
P. Agnes,
I. Ahmad,
S. Albergo,
I. F. M. Albuquerque,
T. Alexander,
A. K. Alton,
P. Amaudruz,
M. Atzori Corona,
D. J. Auty,
M. Ave,
I. Ch. Avetisov,
R. I. Avetisov,
O. Azzolini,
H. O. Back,
Z. Balmforth,
V. Barbarian,
A. Barrado Olmedo,
P. Barrillon,
A. Basco,
G. Batignani,
E. Berzin,
A. Bondar,
W. M. Bonivento,
E. Borisova,
B. Bottino
, et al. (274 additional authors not shown)
Abstract:
Dark matter lighter than 10 GeV/c$^2$ encompasses a promising range of candidates. A conceptual design for a new detector, DarkSide-LowMass, is presented, based on the DarkSide-50 detector and progress toward DarkSide-20k, optimized for a low-threshold electron-counting measurement. Sensitivity to light dark matter is explored for various potential energy thresholds and background rates. These studies show that DarkSide-LowMass can achieve sensitivity to light dark matter down to the solar neutrino floor for GeV-scale masses and significant sensitivity down to 10 MeV/c$^2$ considering the Migdal effect or interactions with electrons. Requirements for optimizing the detector's sensitivity are explored, as are potential sensitivity gains from modeling and mitigating spurious electron backgrounds that may dominate the signal at the lowest energies.
Submitted 20 June, 2023; v1 submitted 2 September, 2022;
originally announced September 2022.
-
Minimum Flow Decomposition in Graphs with Cycles using Integer Linear Programming
Authors:
Fernando H. C. Dias,
Lucia Williams,
Brendan Mumey,
Alexandru I. Tomescu
Abstract:
Minimum flow decomposition (MFD) -- the problem of finding a minimum set of weighted source-to-sink paths that perfectly decomposes a flow -- is a classical problem in Computer Science, and variants of it are powerful models in different fields such as Bioinformatics and Transportation. Even on acyclic graphs, the problem is NP-hard, and most practical solutions have been via heuristics or approximations. While there is an extensive body of research on acyclic graphs, currently, there is no \emph{exact} solution on graphs with cycles. In this paper, we present the first ILP formulation for three natural variants of the MFD problem in graphs with cycles, asking for a decomposition consisting only of weighted source-to-sink paths or cycles, trails, and walks, respectively. On three datasets of increasing levels of complexity from both Bioinformatics and Transportation, our approaches solve any instance in under 10 minutes. Our implementations are freely available at github.com/algbio/MFD-ILP.
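The decomposition condition itself is easy to check; the hard part is finding a *minimum* set of weighted paths or walks, which is what the ILPs address. A minimal checker for the walk variant is sketched below. It is our own illustration with hypothetical input conventions (edge dictionaries and node lists), not code from the MFD-ILP repository:

```python
from collections import Counter

def is_flow_decomposition(flow, weighted_walks, source, sink):
    """Return True if the weighted source-to-sink walks reproduce `flow` exactly.

    flow: dict mapping directed edges (u, v) to flow values.
    weighted_walks: list of (weight, [source, ..., sink]) node sequences. A walk
    may revisit nodes and repeat edges; each traversal adds its weight again.
    """
    covered = Counter()
    for weight, nodes in weighted_walks:
        if len(nodes) < 2 or nodes[0] != source or nodes[-1] != sink:
            return False
        for u, v in zip(nodes, nodes[1:]):
            covered[(u, v)] += weight
    # Every edge must carry exactly its flow; walks may not use zero-flow edges.
    expected = {edge: f for edge, f in flow.items() if f != 0}
    actual = {edge: w for edge, w in covered.items() if w != 0}
    return actual == expected

# A small cyclic example: one unit of flow circulates around a -> b -> a.
flow = {("s", "a"): 2, ("a", "b"): 3, ("b", "a"): 1, ("b", "t"): 2}
walks = [(1, ["s", "a", "b", "a", "b", "t"]), (1, ["s", "a", "b", "t"])]
print(is_flow_decomposition(flow, walks, "s", "t"))  # prints True
```

This flow has no decomposition into simple source-to-sink paths alone: the only way out of `a` is back to `b`, so no simple path can use the edge `b -> a`. That is exactly the situation in which the cycle, trail, and walk variants are needed.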
Submitted 16 January, 2023; v1 submitted 31 August, 2022;
originally announced September 2022.
-
Statistical Mechanics of Thermostatically Controlled Multi-Zone Buildings
Authors:
Lucas Fuentes Valenzuela,
Lindell Williams,
Michael Chertkov
Abstract:
We study the collective phenomena and constraints associated with the aggregation of individual cooling units from a statistical mechanics perspective. These units are modelled as Thermostatically Controlled Loads (TCLs) and represent zones in a large commercial or residential building. Their energy input is centralized and controlled by a collective unit, the Air Handling Unit (AHU), which delivers cool air to all TCLs, thereby coupling them together. Aiming to identify representative qualitative features of the AHU-to-TCL coupling, we build a realistic yet sufficiently simple model and analyze it in two distinct regimes: the Constant Supply Temperature (CST) and the Constant Power Input (CPI) regimes. In both cases, we center our analysis on the relaxation dynamics of individual TCL temperatures to a statistically steady state. We observe that while the dynamics are relatively fast in the CST regime, resulting in all TCLs evolving around the control setpoint, the CPI regime reveals the emergence of a \emph{bi-modal probability distribution and two, possibly strongly separated, time scales}. We observe that the two modes in the CPI regime are associated with all TCLs being in the same low- and high-temperature states, respectively, with occasional (and therefore possibly rare) collective transitions between the modes, akin to the Kramers phenomenon of statistical physics. To the best of our knowledge, this phenomenon has been overlooked in the context of multi-zone building energy engineering, even though it has direct implications for the operation of centralized cooling systems in buildings. It shows that a balance needs to be struck between occupant comfort, related to variations in the individual temperatures, and power output predictability, the main focus of demand response (DR) schemes.
Submitted 27 August, 2022;
originally announced August 2022.
-
Statistical Mechanics of Collisionless Orbits. V. The approach to equilibrium for idealized self-gravitating systems
Authors:
Liliya L. R. Williams,
Jens Hjorth
Abstract:
Self-gravitating Newtonian systems consisting of a very large number of particles have generally defied attempts to describe them using statistical mechanics. This is paradoxical, since many astronomical systems, or simulations thereof, appear to have universal equilibrium structures for which no physical basis exists. A decade ago we showed that extremizing the number of microstates with a given energy per unit mass, under the constraints of conserved total energy and mass, leads to the maximum entropy state, $n(E) \propto \exp(-\beta(E-\Phi_0))-1$, known as DARKexp. This differential energy distribution, and the resulting density structures, closely approximate those of dark-matter halos, with central cusps, $\rho\sim r^{-1}$, and outer parts, $\rho\sim r^{-4}$. Here we define a non-equilibrium functional, $S_D$, which is maximized for DARKexp and increases monotonically during the evolution towards equilibrium of idealized collisionless systems of the Extended Spherical Infall Model. Systems that undergo more mixing approach DARKexp more closely.
Submitted 24 August, 2022;
originally announced August 2022.
-
What are the Practices for Secret Management in Software Artifacts?
Authors:
Setu Kumar Basak,
Lorenzo Neil,
Bradley Reaves,
Laurie Williams
Abstract:
Throughout 2021, GitGuardian's monitoring of public GitHub repositories revealed a two-fold increase over 2020 in the number of exposed secrets (database credentials, API keys, and other credentials), amounting to more than six million secrets. A systematic derivation of practices for managing secrets can help practitioners develop software securely. The goal of our paper is to aid practitioners in avoiding the exposure of secrets by identifying secret management practices in software artifacts through a systematic derivation of practices disseminated in Internet artifacts. We conduct a grey literature review of Internet artifacts, such as blog articles and question-and-answer posts. We identify 24 practices grouped into six categories comprising developer and organizational practices. Our findings indicate that using local environment variables and external secret management services are the most recommended practices to move secrets out of source code and to store secrets securely. We also observe that using version control system scanning tools and employing short-lived secrets are the most recommended practices to avoid accidentally committing secrets and to limit secret exposure, respectively.
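The most recommended practice, moving secrets out of source code into environment variables, can be sketched as follows. The variable name `EXAMPLE_API_KEY`, the helper `get_secret`, and the error class are hypothetical illustrations; in production the variable would typically be injected by a deployment tool or an external secret management service rather than set by hand:

```python
import os

class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""

def get_secret(name):
    """Read a secret from the process environment instead of hard-coding it."""
    value = os.environ.get(name)
    if not value:
        # Fail loudly rather than falling back to a default embedded in code.
        raise MissingSecretError(f"environment variable {name!r} is not set")
    return value

if __name__ == "__main__":
    # EXAMPLE_API_KEY is a hypothetical name; export it in the shell or let a
    # secret manager inject it at deploy time instead of committing it.
    try:
        api_key = get_secret("EXAMPLE_API_KEY")
        print("loaded a key of length", len(api_key))  # never print the key itself
    except MissingSecretError as err:
        print("configuration error:", err)
```

Because the repository never contains the value, a version-control scanning tool has nothing to flag, and rotating a short-lived secret requires no code change.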
Submitted 23 August, 2022;
originally announced August 2022.
-
Polyhedral and Tropical Geometry of Flag Positroids
Authors:
Jonathan Boretsky,
Christopher Eur,
Lauren Williams
Abstract:
A flag positroid of ranks $\boldsymbol{r}:=(r_1<\dots <r_k)$ on $[n]$ is a flag matroid that can be realized by a real $r_k \times n$ matrix $A$ such that the $r_i \times r_i$ minors of $A$ involving rows $1,2,\dots,r_i$ are nonnegative for all $1\leq i \leq k$. In this paper we explore the polyhedral and tropical geometry of flag positroids, particularly when $\boldsymbol{r}:=(a, a+1,\dots,b)$ is a sequence of consecutive numbers. In this case we show that the nonnegative tropical flag variety TrFl$_{\boldsymbol{r},n}^{\geq 0}$ equals the nonnegative flag Dressian FlDr$_{\boldsymbol{r},n}^{\geq 0}$, and that the points $\boldsymbol{\mu} = (\mu_a,\ldots, \mu_b)$ of TrFl$_{\boldsymbol{r},n}^{\geq 0} =$ FlDr$_{\boldsymbol{r},n}^{\geq 0}$ give rise to coherent subdivisions of the flag positroid polytope $P(\underline{\boldsymbol{\mu}})$ into flag positroid polytopes. Our results have applications to Bruhat interval polytopes: for example, we show that a complete flag matroid polytope is a Bruhat interval polytope if and only if its $(\leq 2)$-dimensional faces are Bruhat interval polytopes. Our results also have applications to realizability questions. We define a positively oriented flag matroid to be a sequence of positively oriented matroids $(\chi_1,\dots,\chi_k)$ which is also an oriented flag matroid. We then prove that every positively oriented flag matroid of ranks $\boldsymbol{r}=(a,a+1,\dots,b)$ is realizable.
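The sign condition in the definition is concrete enough to test on a given matrix. The sketch below checks only that condition (every $r_i \times r_i$ minor using the first $r_i$ rows and any $r_i$ increasingly ordered columns is nonnegative); it does not verify that $A$ realizes a flag matroid of the stated ranks. Exact rational arithmetic avoids spurious sign errors from floating point. The function names are our own:

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n = len(a)
    sign, result = 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap flips the sign of the determinant
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def satisfies_flag_positivity(A, ranks):
    """For each r in ranks, check that all r x r minors of the top r rows are >= 0."""
    n = len(A[0])
    for r in ranks:
        top = A[:r]
        for cols in combinations(range(n), r):
            if det([[row[c] for c in cols] for row in top]) < 0:
                return False
    return True

print(satisfies_flag_positivity([[1, 1, 0], [0, 1, 1]], (1, 2)))  # prints True
print(satisfies_flag_positivity([[1, 0, 1], [0, 1, 0]], (1, 2)))  # prints False
```

In the second matrix, the minor on columns 2 and 3 equals $0\cdot 0 - 1\cdot 1 = -1$, so the condition fails even though every rank-1 minor is nonnegative.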
Submitted 20 February, 2025; v1 submitted 18 August, 2022;
originally announced August 2022.
-
OpenSSF Scorecard: On the Path Toward Ecosystem-wide Automated Security Metrics
Authors:
Nusrat Zahan,
Parth Kanakiya,
Brian Hambleton,
Shohanuzzaman Shohan,
Laurie Williams
Abstract:
The OpenSSF Scorecard project is an automated tool to monitor the security health of open-source software. This study evaluates the applicability of the Scorecard tool and compares the security practices and gaps in the npm and PyPI ecosystems.
Submitted 15 June, 2023; v1 submitted 5 August, 2022;
originally announced August 2022.
-
Present and Future of SLAM in Extreme Underground Environments
Authors:
Kamak Ebadi,
Lukas Bernreiter,
Harel Biggie,
Gavin Catt,
Yun Chang,
Arghya Chatterjee,
Christopher E. Denniston,
Simon-Pierre Deschênes,
Kyle Harlow,
Shehryar Khattak,
Lucas Nogueira,
Matteo Palieri,
Pavel Petráček,
Matěj Petrlík,
Andrzej Reinke,
Vít Krátký,
Shibo Zhao,
Ali-akbar Agha-mohammadi,
Kostas Alexis,
Christoffer Heckman,
Kasra Khosoussi,
Navinda Kottege,
Benjamin Morrell,
Marco Hutter,
Fred Pauling
, et al. (6 additional authors not shown)
Abstract:
This paper reports on the state of the art in underground SLAM by discussing different SLAM strategies and results across six teams that participated in the three-year-long SubT competition. In particular, the paper has four main goals. First, we review the algorithms, architectures, and systems adopted by the teams; particular emphasis is put on lidar-centric SLAM solutions (the go-to approach for virtually all teams in the competition), heterogeneous multi-robot operation (including both aerial and ground robots), and real-world underground operation (from the presence of obscurants to the need to handle tight computational constraints). We do not shy away from discussing the dirty details behind the different SubT SLAM systems, which are often omitted from technical papers. Second, we discuss the maturity of the field by highlighting what is possible with current SLAM systems and what we believe is within reach with some good systems engineering. Third, we outline what we believe are fundamental open problems that are likely to require further research to break through. Finally, we provide a list of open-source SLAM implementations and datasets that were produced during the SubT challenge and related efforts and that constitute a useful resource for researchers and practitioners.
Submitted 2 August, 2022;
originally announced August 2022.
-
Do I really need all this work to find vulnerabilities? An empirical case study comparing vulnerability detection techniques on a Java application
Authors:
Sarah Elder,
Nusrat Zahan,
Rui Shu,
Monica Metro,
Valeri Kozarev,
Tim Menzies,
Laurie Williams
Abstract:
CONTEXT: Applying vulnerability detection techniques is one of many tasks using the limited resources of a software project.
OBJECTIVE: The goal of this research is to assist managers and other decision-makers in making informed choices about the use of software vulnerability detection techniques through an empirical study of the efficiency and effectiveness of four techniques on a Java-based web application.
METHOD: We apply four different categories of vulnerability detection techniques to an open-source medical records system: systematic manual penetration testing (SMPT), exploratory manual penetration testing (EMPT), dynamic application security testing (DAST), and static application security testing (SAST).
RESULTS: We found the most vulnerabilities using SAST. However, EMPT found more severe vulnerabilities. With each technique, we found unique vulnerabilities not found using the other techniques. The efficiency of manual techniques (EMPT, SMPT) was comparable to or better than the efficiency of automated techniques (DAST, SAST) in terms of Vulnerabilities per Hour (VpH).
CONCLUSIONS: Which vulnerability detection technique practitioners should select may vary with the goals and available resources of the project. If the goal of an organization is to find "all" vulnerabilities in a project, it needs to use as many techniques as its resources allow.
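Vulnerabilities per Hour (VpH) is a simple ratio, but it shows why the technique that finds the most vulnerabilities need not be the most efficient. The counts and hours below are invented for illustration and are not the study's measurements; whether "hours" should include tool setup and triage time is our assumption, not the paper's definition:

```python
def vulnerabilities_per_hour(vulnerabilities_found, hours_spent):
    """Efficiency metric: vulnerabilities found per hour of effort (VpH)."""
    if hours_spent <= 0:
        raise ValueError("hours_spent must be positive")
    return vulnerabilities_found / hours_spent

# Hypothetical effort logs: (vulnerabilities found, hours spent) per technique.
efforts = {
    "SMPT": (12, 40.0),
    "EMPT": (9, 18.0),
    "DAST": (30, 75.0),
    "SAST": (120, 300.0),
}
for technique, (found, hours) in efforts.items():
    print(f"{technique}: {vulnerabilities_per_hour(found, hours):.2f} VpH")
```

In this made-up example SAST finds by far the most vulnerabilities, yet EMPT has the highest VpH, mirroring the qualitative pattern the abstract reports.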
Submitted 2 August, 2022;
originally announced August 2022.
-
Approaching transform-limited photons from nanowire quantum dots excited above-band
Authors:
Patrick Laferrière,
Aria Yin,
Edith Yeung,
Leila Kusmic,
Marek Korkusinski,
Payman Rasekh,
David B. Northeast,
Sofiane Haffouz,
Jean Lapointe,
Philip J. Poole,
Robin L. Williams,
Dan Dalacu
Abstract:
We demonstrate that, even when employing above-band excitation, photons emitted from semiconductor quantum dots can have linewidths that approach their transform-limited values. This is accomplished by using quantum dots embedded in bottom-up photonic nanowires, an approach that mitigates several potential mechanisms that can result in linewidth broadening: (i) only a single quantum dot is present in each device, (ii) dot nucleation proceeds without the formation of a wetting layer, and (iii) the sidewalls of the photonic nanowire consist not of etched facets but of epitaxially grown crystal planes. Using these structures, we achieve linewidths of twice the transform limit, unprecedented for above-band excitation. We also demonstrate a highly nonlinear dependence of the linewidth on both excitation power and temperature, which can be described by an independent boson model that considers both deformation and piezoelectric exciton-phonon coupling. We find that for sufficiently low excitation powers and temperatures, the observed excess broadening is not dominated by phonon dephasing, a surprising result considering the high phonon occupation that occurs with above-band excitation.
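For a Lorentzian emission line, the transform (lifetime) limit on the full width at half maximum is $\Delta\nu = 1/(2\pi T_1)$ in frequency, or equivalently $\Delta E = \hbar/T_1$ in energy, where $T_1$ is the radiative lifetime. The abstract does not quote the measured lifetime, so the sketch below uses a hypothetical $T_1 = 1$ ns purely to show the scale of "twice the transform limit":

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s

def transform_limited_linewidth(t1_seconds):
    """Lifetime-limited FWHM of a Lorentzian line: (frequency in Hz, energy in eV)."""
    if t1_seconds <= 0:
        raise ValueError("lifetime must be positive")
    delta_nu = 1.0 / (2.0 * math.pi * t1_seconds)
    delta_e = HBAR_EV_S / t1_seconds  # equals h * delta_nu
    return delta_nu, delta_e

# Hypothetical radiative lifetime of 1 ns (not a value from the paper).
nu, delta_e_ev = transform_limited_linewidth(1e-9)
print(f"{nu / 1e6:.1f} MHz, {delta_e_ev * 1e6:.3f} ueV")  # prints 159.2 MHz, 0.658 ueV
print(f"2x limit: {2 * delta_e_ev * 1e6:.3f} ueV")        # prints 2x limit: 1.316 ueV
```

Longer lifetimes give proportionally narrower limits, so the same absolute spectral-diffusion broadening sits further above the limit for slower emitters.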
Submitted 29 July, 2022;
originally announced August 2022.
-
Identifying active galactic nuclei via brightness temperature with sub-arcsecond International LOFAR Telescope observations
Authors:
Leah K. Morabito,
F. Sweijen,
J. F. Radcliffe,
P. N. Best,
Rohit Kondapally,
Marco Bondi,
Matteo Bonato,
K. J. Duncan,
Isabella Prandoni,
T. W. Shimwell,
W. L. Williams,
R. J. van Weeren,
J. E. Conway,
G. Calistro Rivera
Abstract:
Identifying active galactic nuclei (AGN) and isolating their contribution to a galaxy's energy budget is crucial for studying the co-evolution of AGN and their host galaxies. Brightness temperature ($T_b$) measurements from high-resolution radio observations at GHz frequencies are widely used to identify AGN. Here we investigate using new sub-arcsecond imaging at 144 MHz with the International LOFAR Telescope to identify AGN using $T_b$ in the Lockman Hole field. We use ancillary data to validate the 940 AGN identifications, finding that 83 percent of sources have AGN classifications from SED fitting and/or photometric identifications, yielding 160 new AGN identifications. Considering the multi-wavelength classifications, brightness temperature criteria select over half of radio-excess sources, 32 percent of sources classified as radio-quiet AGN, and 20 percent of sources classified as star-forming galaxies. Infrared colour-colour plots, together with a comparison against what we would expect to detect based on peak brightness in 6 arcsec LOFAR maps, imply that the star-forming galaxies and sources at low flux densities have a mixture of star-formation and AGN activity. We separate the radio emission from star-formation and AGN in unresolved, $T_b$-identified AGN with no significant radio excess and find the AGN comprises $0.49\pm 0.16$ of the radio luminosity. Overall the non-radio excess AGN show evidence for having a variety of different radio emission mechanisms, which can provide different pathways for AGN and galaxy co-evolution. This validation of AGN identification using brightness temperature at low frequencies opens the possibility for securely selecting AGN samples where ancillary data is inadequate.
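To make the $T_b$ criterion concrete, the sketch below evaluates the standard Rayleigh-Jeans brightness temperature of a source fitted as an elliptical Gaussian, $T_b = 2\ln 2\, c^2 S / (\pi k \nu^2 \theta_{\rm maj}\theta_{\rm min})$, optionally scaled by $(1+z)$ to the rest frame. This is the textbook formula, not necessarily the exact procedure of this work; the abstract gives neither the $T_b$ threshold nor how sizes are measured, and the function name and example source are hypothetical.

```python
import math

# Physical constants (SI units).
C = 2.99792458e8                      # speed of light, m/s
K_B = 1.380649e-23                    # Boltzmann constant, J/K
JY = 1e-26                            # 1 jansky in W m^-2 Hz^-1
MAS = math.pi / (180 * 3600 * 1000)   # 1 milliarcsecond in radians


def brightness_temperature(flux_jy, freq_hz, maj_mas, min_mas, z=0.0):
    """Rayleigh-Jeans brightness temperature (K) of an elliptical Gaussian component.

    maj_mas and min_mas are FWHM sizes. The solid angle of a Gaussian is
    pi * maj * min / (4 ln 2), and T_b = c^2 S / (2 k nu^2 Omega), multiplied
    by (1 + z) to convert to the source rest frame.
    """
    solid_angle = math.pi * (maj_mas * MAS) * (min_mas * MAS) / (4 * math.log(2))
    t_obs = C**2 * flux_jy * JY / (2 * K_B * freq_hz**2 * solid_angle)
    return (1 + z) * t_obs


# Hypothetical source: 10 mJy at 144 MHz with a fitted size of 300 x 300 mas.
print(f"{brightness_temperature(0.010, 144e6, 300.0, 300.0):.3g} K")
```

At 1 GHz, 1 Jy and 1 × 1 mas this reproduces the familiar coefficient of about $1.22\times10^{12}$ K; the steep $\nu^{-2}$ dependence is why sub-arcsecond resolution is needed to reach AGN-like $T_b$ values at 144 MHz.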
Submitted 26 July, 2022;
originally announced July 2022.
-
Trajectory PMB Filters for Extended Object Tracking Using Belief Propagation
Authors:
Yuxuan Xia,
Ángel F. García-Fernández,
Florian Meyer,
Jason L. Williams,
Karl Granström,
Lennart Svensson
Abstract:
In this paper, we propose a Poisson multi-Bernoulli (PMB) filter for extended object tracking (EOT), which directly estimates the set of object trajectories, using belief propagation (BP). The proposed filter propagates a PMB density on the posterior of sets of trajectories through the filtering recursions over time, where the PMB mixture (PMBM) posterior after the update step is approximated as a PMB. The efficient PMB approximation relies on several important theoretical contributions. First, we present a PMBM conjugate prior on the posterior of sets of trajectories for a generalized measurement model, in which each object generates an independent set of measurements. The PMBM density is a conjugate prior in the sense that both the prediction and the update steps preserve the PMBM form of the density. Second, we present a factor graph representation of the joint posterior of the PMBM set of trajectories and association variables for the Poisson spatial measurement model. Importantly, leveraging the PMBM conjugacy and the factor graph formulation enables an elegant treatment of undetected objects via a Poisson point process and efficient inference on sets of trajectories using BP, where the approximate marginal densities in the PMB approximation can be obtained without enumeration of different data association hypotheses. To achieve this, we present a particle-based implementation of the proposed filter, where smoothed trajectory estimates, if desired, can be obtained via single-object particle smoothing methods, and its performance for EOT with ellipsoidal shapes is evaluated in a simulation study.
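A PMB density combines a Poisson point process (for undetected objects) with independent Bernoulli components, each present with existence probability $r_i$. As a small illustration of that structure only (not of the BP-based filter itself), the sketch computes the resulting cardinality distribution. Because the parts are independent, the number of objects or trajectories is their sum, so its pmf is the convolution of a Poisson pmf with one two-point pmf per Bernoulli. The rate and existence probabilities are hypothetical.

```python
import math


def poisson_pmf(rate: float, n_max: int) -> list[float]:
    """P(N = n) for n = 0..n_max under a Poisson distribution with the given mean."""
    return [math.exp(-rate) * rate**n / math.factorial(n) for n in range(n_max + 1)]


def pmb_cardinality(poisson_rate: float, existence_probs: list[float], n_max: int) -> list[float]:
    """Cardinality pmf of a Poisson multi-Bernoulli density, truncated at n_max.

    The Poisson part and each Bernoulli component are independent, so the total
    count is their sum and its pmf is the convolution of their pmfs.
    """
    pmf = poisson_pmf(poisson_rate, n_max)
    for r in existence_probs:
        # Convolve with the Bernoulli pmf [1 - r, r]: the object either exists or not.
        new = [0.0] * (n_max + 1)
        for n in range(n_max + 1):
            new[n] += (1 - r) * pmf[n]
            if n + 1 <= n_max:
                new[n + 1] += r * pmf[n]
        pmf = new
    return pmf


# Hypothetical PMB: undetected-object rate 0.5 and three Bernoulli components.
pmf = pmb_cardinality(0.5, [0.9, 0.6, 0.2], n_max=20)
mean = sum(n * p for n, p in enumerate(pmf))
# Up to truncation, the mean equals 0.5 + 0.9 + 0.6 + 0.2.
print([round(p, 3) for p in pmf[:6]], round(mean, 3))
```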
Submitted 19 September, 2023; v1 submitted 20 July, 2022;
originally announced July 2022.
-
Width Helps and Hinders Splitting Flows
Authors:
Manuel Cáceres,
Massimo Cairo,
Andreas Grigorjew,
Shahbaz Khan,
Brendan Mumey,
Romeo Rizzi,
Alexandru I. Tomescu,
Lucia Williams
Abstract:
Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulation $X$ on a directed graph $G$ into weighted source-to-sink paths whose superposition equals $X$. We show that, for acyclic graphs, considering the \emph{width} of the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class of \emph{width-stable} graphs, for which a popular heuristic is a $O(\log |X|)$-approximation ($|X|$ being the total flow of $X$), and strengthen its worst-case approximation ratio from $\Omega(\sqrt{m})$ to $\Omega(m / \log m)$ for sparse graphs, where $m$ is the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version allowing also negative weights, we give a $(\lceil \log \Vert X \Vert \rceil +1)$-approximation ($\Vert X \Vert$ being the maximum absolute value of $X$ on any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary circulations ($\Vert X \Vert \leq 1$), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [ALENEX 2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.
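The abstract does not name the "popular heuristic". One widely used heuristic for non-negative MFD on DAGs is greedy-width, which repeatedly removes a source-to-sink path of maximum bottleneck weight; the sketch below implements it under the assumption that this is the heuristic meant. The flow is given as a dict of edge weights plus a topological order; these input conventions are illustrative and the code says nothing about the paper's approximation analysis.

```python
from collections import defaultdict


def widest_path(flow, source, sink, order):
    """Return (bottleneck, path) for the source-sink path of maximum bottleneck weight.

    Only edges with positive remaining flow are used; `order` is a topological
    order of the DAG, so each vertex's best bottleneck is final when it is processed.
    """
    best = {v: 0 for v in order}
    best[source] = float("inf")
    parent = {}
    out = defaultdict(list)
    for (u, v), w in flow.items():
        if w > 0:
            out[u].append(v)
    for u in order:
        if best[u] == 0:
            continue
        for v in out[u]:
            cand = min(best[u], flow[(u, v)])
            if cand > best[v]:
                best[v] = cand
                parent[v] = u
    if best[sink] == 0:
        return 0, []
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return best[sink], path[::-1]


def greedy_width(flow, source, sink, order):
    """Decompose a non-negative s-t flow on a DAG into (weight, path) pairs greedily."""
    remaining = dict(flow)
    paths = []
    while True:
        width, path = widest_path(remaining, source, sink, order)
        if width == 0:
            return paths
        # Subtracting the bottleneck zeroes at least one edge, so the loop terminates.
        for u, v in zip(path, path[1:]):
            remaining[(u, v)] -= width
        paths.append((width, path))


# Hypothetical flow of value 8 satisfying conservation at a and b.
flow = {("s", "a"): 5, ("s", "b"): 3, ("a", "t"): 4, ("a", "b"): 1, ("b", "t"): 4}
paths = greedy_width(flow, "s", "t", ["s", "a", "b", "t"])
print(paths)
```

Subtracting a path from a valid flow leaves a valid flow, so whenever positive flow remains there is another source-to-sink path to extract.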
Submitted 9 May, 2023; v1 submitted 5 July, 2022;
originally announced July 2022.
-
A machine learning classifier for LOFAR radio galaxy cross-matching techniques
Authors:
Lara Alegre,
Jose Sabater,
Philip Best,
Rafaël I. J. Mostert,
Wendy L. Williams,
Gülay Gürkan,
Martin J. Hardcastle,
Rohit Kondapally,
Tim W. Shimwell,
Daniel J. B. Smith
Abstract:
New-generation radio telescopes like LOFAR are conducting extensive sky surveys, detecting millions of sources. To maximise the scientific value of these surveys, radio source components must be properly associated into physical sources before being cross-matched with their optical/infrared counterparts. In this paper, we use machine learning to identify those radio sources for which either source association is required or statistical cross-matching to optical/infrared catalogues is unreliable. We train a binary classifier using manual annotations from the LOFAR Two-metre Sky Survey (LoTSS). We find that, compared to a classification model based on just the radio source parameters, the addition of features of the nearest-neighbour radio sources, the potential optical host galaxy, and the radio source composition in terms of Gaussian components, all improve model performance. Our best model, a gradient boosting classifier, achieves an accuracy of 95 per cent on a balanced dataset and 96 per cent on the whole (unbalanced) sample after optimising the classification threshold. Unsurprisingly, the classifier performs best on small, unresolved radio sources, reaching almost 99 per cent accuracy for sources smaller than 15 arcsec, but still achieves 70 per cent accuracy on resolved sources. It flags 68 per cent more sources than required as needing visual inspection, but this is still fewer than the manually-developed decision tree used in LoTSS, while also having a lower rate of wrongly accepted sources for statistical analysis. The results have an immediate practical application for cross-matching the next LoTSS data releases and can be generalised to other radio surveys.
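The accuracy on the unbalanced sample is quoted "after optimising the classification threshold". A minimal sketch of that kind of step, assuming the classifier returns a score in [0, 1] per source and label 1 means "needs visual inspection": scan the observed scores as candidate thresholds and keep the most accurate one. The scores and labels are hypothetical, and the abstract does not describe the actual procedure (for example, which data split the threshold is tuned on).

```python
def accuracy_at(threshold, scores, labels):
    """Fraction of sources whose predicted class (score >= threshold) matches the label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximising accuracy; ties keep the lowest threshold."""
    # Candidates: every observed score, plus +inf (predict every source negative).
    candidates = sorted(set(scores)) + [float("inf")]
    return max(((t, accuracy_at(t, scores, labels)) for t in candidates), key=lambda ta: ta[1])


# Hypothetical classifier scores and manual labels for six sources.
scores = [0.1, 0.3, 0.35, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]
print(best_threshold(scores, labels))
```

Only observed scores need to be tried, because accuracy can change only when the threshold crosses a score.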
Submitted 4 July, 2022;
originally announced July 2022.
-
Are your dependencies code reviewed?: Measuring code review coverage in dependency updates
Authors:
Nasif Imtiaz,
Laurie Williams
Abstract:
As modern software extensively uses free open source packages as dependencies, developers have to regularly pull in new third-party code through frequent updates. However, without a proper review of every incoming change, vulnerable and malicious code can sneak into the codebase through these dependencies. The goal of this study is to aid developers in securely accepting dependency updates by measuring whether the code changes in an update have passed through a code review process. We implement Depdive, an update audit tool for packages in the Crates.io, npm, PyPI, and RubyGems registries. Depdive first (i) identifies the files and the code changes in an update that cannot be traced back to the package's source repository, i.e., \textit{phantom artifacts}; and then (ii) measures what portion of changes in the update, excluding the phantom artifacts, has passed through a code review process, i.e., \textit{code review coverage}.
Using Depdive, we present an empirical study across the ten latest updates of the 1000 most downloaded packages in each of the four registries. We further evaluate our results through a maintainer agreement survey. We find that updates are typically only partially code-reviewed (52.5\% of the time). Further, only 9.0\% of the packages had all their updates in our data set fully code-reviewed, indicating that even the most used packages can introduce non-reviewed code in the software supply chain. We also observe that updates tend to have either high or low code review coverage (\textit{CRC}), suggesting that packages at the opposite ends of the spectrum may require separate sets of treatments.
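The two measurements can be sketched in simplified form. The sketch assumes an update is summarised per file by its changed-line count, how many of those lines came from reviewed commits, and whether the file's changes can be traced to the source repository. The `FileChange` data model and the line-based weighting are hypothetical; Depdive's actual tracing of commits and review signals across registries is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    """Hypothetical per-file summary of one dependency update."""
    path: str
    changed_lines: int    # lines added or removed in this file by the update
    reviewed_lines: int   # of those, lines introduced by commits that passed code review
    in_repository: bool   # False if the change cannot be traced back to the source repo


def phantom_artifacts(changes):
    """Files whose changes in the published package have no counterpart in the repository."""
    return [c.path for c in changes if not c.in_repository]


def code_review_coverage(changes):
    """Share of traceable changed lines that passed review (phantom artifacts excluded)."""
    traceable = [c for c in changes if c.in_repository]
    total = sum(c.changed_lines for c in traceable)
    if total == 0:
        return None  # nothing traceable to measure
    return sum(c.reviewed_lines for c in traceable) / total


# Hypothetical update touching three files, one of which is a phantom artifact.
update = [
    FileChange("src/lib.rs", 120, 120, True),
    FileChange("src/parser.rs", 80, 20, True),
    FileChange("build/generated.js", 300, 0, False),
]
print(phantom_artifacts(update), code_review_coverage(update))
```

Excluding phantom artifacts from the denominator keeps the coverage figure about code that could have been reviewed in the repository; phantom files are reported separately because they bypass repository review entirely.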
Submitted 7 November, 2022; v1 submitted 19 June, 2022;
originally announced June 2022.
-
An excursion into the core of the cluster lens Abell 1689
Authors:
Agniva Ghosh,
Dominic Adams,
Liliya L. R. Williams,
Jori Liesenborgs,
Anahita Alavi,
Claudia Scarlata
Abstract:
Abell 1689 is a well-studied cluster of galaxies and one of the largest gravitational lens systems ever observed. We have obtained a reconstruction of the cluster Abell 1689 using Grale, a free-form lens inversion method that relies exclusively on the multiple image data. Non-inclusion of any data related to cluster member galaxies ensures an unbiased measure of the mass distribution, which is the most notable feature of free-form methods like Grale. We used two different sets of multiple image systems from the available strong lensing data: one containing only the secure systems (107 images), and the other containing all available systems, only excluding some very non-secure systems (151 images). For the very well-constrained central $\sim$100 kpc region of the cluster we made a detailed comparison of the Grale-reconstructed lensing mass and the stellar mass retrieved by the Spectral Energy Distribution (SED) fitting software FAST++. We found a light-unaccompanied mass peak in this region, whose existence, while tentative, is favored by the distribution of nearby images that are local maxima in the Fermat potential. However, further tests using different methodologies are needed to confirm the reality of this feature. If it is shown to be real, this light-unaccompanied mass peak is consistent with a dark matter self-interaction cross-section $σ\lesssim 1$cm$^2$/g, while being in tension with larger cross-sections.
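For readers outside lensing, the reason images at Fermat-potential maxima point to a mass peak follows from the standard thin-lens relations (textbook notation, not specific to Grale). With source position $\boldsymbol{\beta}$, image position $\boldsymbol{\theta}$, scaled lensing potential $\psi$ and convergence $\kappa = \tfrac{1}{2}\nabla^2\psi$:

```latex
\phi(\boldsymbol{\theta};\boldsymbol{\beta})
  = \tfrac{1}{2}\,\lvert \boldsymbol{\theta}-\boldsymbol{\beta} \rvert^{2} - \psi(\boldsymbol{\theta}),
\qquad
\nabla_{\boldsymbol{\theta}}\phi = 0
  \;\Longleftrightarrow\;
  \boldsymbol{\beta} = \boldsymbol{\theta} - \nabla\psi(\boldsymbol{\theta}),
\qquad
\operatorname{tr}\!\left(\frac{\partial^{2}\phi}{\partial\theta_i\,\partial\theta_j}\right)
  = 2 - \nabla^{2}\psi = 2\,(1-\kappa).
```

Images form at stationary points of $\phi$ (the lens equation). A maximum needs both Hessian eigenvalues negative, hence a negative trace, which requires $\kappa > 1$. Maximum-type images therefore lie where the projected surface mass density is supercritical, which is why their positions constrain a central mass peak.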
Submitted 7 August, 2023; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Multiple Object Trajectory Estimation Using Backward Simulation
Authors:
Yuxuan Xia,
Lennart Svensson,
Ángel F. García-Fernández,
Jason L. Williams,
Daniel Svensson,
Karl Granström
Abstract:
This paper presents a general solution for computing the multi-object posterior for sets of trajectories from a sequence of multi-object (unlabelled) filtering densities and a multi-object dynamic model. Importantly, the proposed solution opens an avenue of trajectory estimation possibilities for multi-object filters that do not explicitly estimate trajectories. In this paper, we first derive a general multi-trajectory backward smoothing equation based on random finite sets of trajectories. Then we show how to sample sets of trajectories using backward simulation for Poisson multi-Bernoulli filtering densities, and develop a tractable implementation based on ranked assignment. The performance of the resulting multi-trajectory particle smoothers is evaluated in a simulation study, and the results demonstrate that they have superior performance in comparison to several state-of-the-art multi-object filters and smoothers.
Submitted 16 June, 2022;
originally announced June 2022.
-
A Deep Generative Model of Neonatal Cortical Surface Development
Authors:
Abdulah Fawaz,
Logan Z. Williams,
A. David Edwards,
Emma Robinson
Abstract:
The neonatal cortical surface is known to be affected by preterm birth, and the subsequent changes to cortical organisation have been associated with poorer neurodevelopmental outcomes. Deep generative models have the potential to lead to clinically interpretable models of disease, but developing these on the cortical surface is challenging since established techniques for learning convolutional filters are inappropriate on non-flat topologies. To close this gap, we implement a surface-based CycleGAN using mixture model CNNs (MoNet) to translate sphericalised neonatal cortical surface features (curvature and T1w/T2w cortical myelin) between different stages of cortical maturity. Results show our method is able to reliably predict changes in individual patterns of cortical organisation at later stages of gestation, validated by comparison to longitudinal data; and to translate appearance between preterm and term gestation (> 37 weeks gestation), validated through comparison with a trained term/preterm classifier. Simulated differences in cortical maturation are consistent with observations in the literature.
Submitted 22 June, 2022; v1 submitted 15 June, 2022;
originally announced June 2022.
-
Surface Analysis with Vision Transformers
Authors:
Simon Dahan,
Logan Z. J. Williams,
Abdulah Fawaz,
Daniel Rueckert,
Emma C. Robinson
Abstract:
The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of those methods have shown design limitations resulting in poor modelling of long-range associations, as the generalisation of convolutions to irregular surfaces is non-trivial. Recent state-of-the-art performance of Vision Transformers (ViTs) demonstrates that a general-purpose architecture, which implements self-attention, could replace the local feature learning operations of CNNs. Motivated by the success of attention-modelling in computer vision, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence problem and propose a patching mechanism for surface meshes. We validate the performance of the proposed Surface Vision Transformer (SiT) on two brain age prediction tasks in the developing Human Connectome Project (dHCP) dataset and investigate the impact of pre-training on model performance. Experiments show that the SiT outperforms many surface CNNs, while indicating some evidence of general transformation invariance. Code available at https://github.com/metrics-lab/surface-vision-transformers
Submitted 31 May, 2022;
originally announced May 2022.
-
Reducing the Cost of Training Security Classifier (via Optimized Semi-Supervised Learning)
Authors:
Rui Shu,
Tianpei Xia,
Huy Tu,
Laurie Williams,
Tim Menzies
Abstract:
Background: Most of the existing machine learning models for security tasks, such as spam detection, malware detection, or network intrusion detection, are built on supervised machine learning algorithms. In such a paradigm, models need a large amount of labeled data to learn the useful relationships between selected features and the target class. However, such labeled data can be scarce and expensive to acquire. Goal: To help security practitioners train useful security classification models when few labeled training data and many unlabeled training data are available. Method: We propose an adaptive framework called Dapper, which optimizes 1) semi-supervised learning algorithms to assign pseudo-labels to unlabeled data in a propagation paradigm and 2) the machine learning classifier (i.e., random forest). When the dataset class is highly imbalanced, Dapper then adaptively integrates and optimizes a data oversampling method called SMOTE. We use Bayesian Optimization to search a large hyperparameter space of these tuning targets. Result: We evaluate Dapper with three security datasets, i.e., the Twitter spam dataset, the malware URLs dataset, and the CIC-IDS-2017 dataset. Experimental results indicate that we can use as little as 10% of the original labeled data and still achieve classification performance close to, or even better than, that obtained using 100% of the labeled data in a supervised way. Conclusion: Based on those results, we would recommend using hyperparameter optimization with semi-supervised learning when dealing with shortages of labeled security data.
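SMOTE, which Dapper tunes when classes are highly imbalanced, synthesises minority-class samples by interpolating between a minority sample and one of its $k$ nearest minority-class neighbours. The pure-Python sketch below shows that core step. The function signature is illustrative, and the abstract does not list which SMOTE hyperparameters Dapper actually searches over.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points from a list of minority-class feature vectors.

    Each synthetic point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Hypothetical minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=10, k=2)
print(new_points[:3])
```

Because every synthetic point is a convex combination of two real minority samples, oversampling stays inside the region the minority class already occupies rather than duplicating points exactly.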
Submitted 2 May, 2022;
originally announced May 2022.
-
Cosmic evolution of low-excitation radio galaxies in the LOFAR Two-meter Sky Survey Deep Fields
Authors:
R. Kondapally,
P. N. Best,
R. K. Cochrane,
J. Sabater,
K. J. Duncan,
M. J. Hardcastle,
P. Haskell,
B. Mingo,
H. J. A. Röttgering,
D. J. B. Smith,
W. L. Williams,
M. Bonato,
G. Calistro Rivera,
F. Gao,
C. L. Hale,
K. Małek,
G. K. Miley,
I. Prandoni,
L. Wang
Abstract:
Feedback from low-excitation radio galaxies (LERGs) plays a key role in the lifecycle of massive galaxies in the local Universe; their evolution, and the impact of these active galactic nuclei on early galaxy evolution, however, remain poorly understood. We use a sample of 10481 LERGs from the first data release of the LOFAR Two-meter Sky Survey Deep Fields, covering $\sim$ 25 deg$^2$, to present the first measurement of the evolution of the radio luminosity function (LF) of LERGs out to $z\sim2.5$; this shows relatively mild evolution. We split the LERGs into those hosted by quiescent and star-forming galaxies, finding a new dominant population of LERGs hosted by star-forming galaxies at high redshifts. The incidence of LERGs in quiescent galaxies shows a steep dependence on stellar-mass out to $z \sim1.5$, consistent with local Universe measurements of accretion occurring from cooling of hot gas haloes. The quiescent-LERGs dominate the LFs at $z<1$, showing a strong decline in space density with redshift, tracing that of the available host galaxies, while there is an increase in the characteristic luminosity. The star-forming LERG LF increases with redshift, such that this population dominates the space densities at most radio-luminosities by $z \sim 1$. The incidence of LERGs in star-forming galaxies shows a much weaker stellar-mass dependence, and increases with redshift, suggesting a different fuelling mechanism compared to their quiescent counterparts, potentially associated with the cold gas supply present in the star-forming galaxies.
Submitted 22 April, 2022; v1 submitted 15 April, 2022;
originally announced April 2022.
-
Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces
Authors:
Simon Dahan,
Hao Xu,
Logan Z. J. Williams,
Abdulah Fawaz,
Chunhui Yang,
Timothy S. Coalson,
Michelle C. Williams,
David E. Newby,
A. David Edwards,
Matthew F. Glasser,
Alistair A. Young,
Daniel Rueckert,
Emma C. Robinson
Abstract:
Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that a general-purpose architecture, which implements long-range self-attention, could replace the local feature learning operations of convolutional neural networks. In this paper, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem, by proposing patching mechanisms for general surface meshes. Sequences of patches are then processed by a transformer encoder and used for classification or regression. We validate our method on a range of different biomedical surface domains and tasks: brain age prediction in the developing Human Connectome Project (dHCP), fluid intelligence prediction in the Human Connectome Project (HCP), and coronary artery calcium score classification using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART) dataset, and investigate the impact of pretraining and data augmentation on model performance. Results suggest that Surface Vision Transformers (SiT) demonstrate consistent improvement over geometric deep learning methods for brain age and fluid intelligence prediction and achieve comparable performance on calcium score classification to standard metrics used in clinical practice. Furthermore, analysis of transformer attention maps offers clear and individualised predictions of the features driving each task. Code is available on Github: https://github.com/metrics-lab/surface-vision-transformers
Submitted 7 April, 2022;
originally announced April 2022.
-
Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis
Authors:
Simon Dahan,
Abdulah Fawaz,
Logan Z. J. Williams,
Chunhui Yang,
Timothy S. Coalson,
Matthew F. Glasser,
A. David Edwards,
Daniel Rueckert,
Emma C. Robinson
Abstract:
The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of those methods have shown design limitations resulting in poor modelling of long-range associations, as the generalisation of convolutions to irregular surfaces is non-trivial. Motivated by the success of attention-modelling in computer vision, we translate convolution-free vision transformer approaches to surface data, to introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold. Here, surface patching is achieved by representing spherical data as a sequence of triangular patches, extracted from a subdivided icosphere. A transformer model encodes the sequence of patches via successive multi-head self-attention layers while preserving the sequence resolution. We validate the performance of the proposed Surface Vision Transformer (SiT) on the task of phenotype regression from cortical surface metrics derived from the Developing Human Connectome Project (dHCP). Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data. Analysis of transformer attention maps offers strong potential to characterise subtle cognitive developmental patterns.
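The patching arithmetic behind "triangular patches extracted from a subdivided icosphere" can be made concrete. The sketch assumes, as one possible configuration not stated in this abstract, that surface data live on a sixth-order icosphere and patches are the faces of a second-order icosphere. It counts vertices and faces per subdivision level and the data vertices falling inside each triangular patch.

```python
def icosphere_counts(order: int) -> tuple[int, int]:
    """(vertices, faces) of an icosahedron subdivided `order` times (each face -> 4 faces)."""
    faces = 20 * 4**order
    vertices = 10 * 4**order + 2
    return vertices, faces


def vertices_per_patch(coarse_order: int, fine_order: int) -> int:
    """Fine-mesh vertices inside one coarse triangle, boundary included.

    Each subdivision halves every edge, so a coarse edge is split into
    m = 2 ** (fine_order - coarse_order) segments, and a triangular grid with
    m segments per side has (m + 1)(m + 2) / 2 vertices.
    """
    m = 2 ** (fine_order - coarse_order)
    return (m + 1) * (m + 2) // 2


# Assumed configuration: data on ico-6, patches defined by the faces of ico-2.
v6, f6 = icosphere_counts(6)
_, n_patches = icosphere_counts(2)
print(v6, f6, n_patches, vertices_per_patch(2, 6))
```

Under these assumptions the sequence has 320 tokens of 153 vertices each. Boundary vertices are shared between neighbouring patches, so 320 × 153 exceeds the 40,962 vertices of the underlying mesh.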
Submitted 30 March, 2022;
originally announced March 2022.
-
A Deep-Discrete Learning Framework for Spherical Surface Registration
Authors:
Mohamed A. Suliman,
Logan Z. J. Williams,
Abdulah Fawaz,
Emma C. Robinson
Abstract:
Cortical surface registration is a fundamental tool for neuroimaging analysis that has been shown to improve the alignment of functional regions relative to volumetric approaches. Classically, image registration is performed by optimizing a complex objective similarity function, leading to long run times. This contributes to a convention for aligning all data to a global average reference frame that poorly reflects the underlying cortical heterogeneity. In this paper, we propose a novel unsupervised learning-based framework that converts registration to a multi-label classification problem, where each point in a low-resolution control grid deforms to one of a fixed, finite number of endpoints. This is learned using a spherical geometric deep learning architecture, in an end-to-end unsupervised way, with regularization imposed using a deep Conditional Random Field (CRF). Experiments show that our proposed framework performs competitively, in terms of similarity and areal distortion, relative to the most popular classical surface registration algorithms and generates smoother deformations than other learning-based surface registration methods, even in subjects with atypical cortical morphology.
Submitted 24 March, 2022;
originally announced March 2022.
-
Dazzle: Using Optimized Generative Adversarial Networks to Address Security Data Class Imbalance Issue
Authors:
Rui Shu,
Tianpei Xia,
Laurie Williams,
Tim Menzies
Abstract:
Background: Machine learning techniques have been widely used and demonstrate promising performance in many software security tasks such as software vulnerability prediction. However, the class ratio within software vulnerability datasets is often highly imbalanced (since the percentage of observed vulnerabilities is usually very low). Goal: To help security practitioners address class imbalance in software security data and further help build better prediction models with resampled datasets. Method: We introduce an approach called Dazzle, which is an optimized version of conditional Wasserstein Generative Adversarial Networks with gradient penalty (cWGAN-GP). Dazzle explores the architecture hyperparameters of cWGAN-GP with an optimizer called Bayesian Optimization. We use Dazzle to generate minority class samples to resample the original imbalanced training dataset. Results: We evaluate Dazzle with three software security datasets, i.e., Moodle vulnerable files, Ambari bug reports, and JavaScript function code. We show that Dazzle is practical to use and demonstrates promising improvement over existing state-of-the-art oversampling techniques such as SMOTE (e.g., with an average of about 60% improvement rate over SMOTE in recall among all datasets). Conclusion: Based on this study, we would suggest the use of optimized GANs as an alternative method for addressing class imbalance in security vulnerability data.
Submitted 2 May, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
The Dwarf Galaxy Population at $z\sim 0.7$: A Catalog of Emission Lines and Redshifts from Deep Keck Observations
Authors:
John Pharo,
Yicheng Guo,
Guillermo Barro Calvo,
Timothy Carleton,
S. M. Faber,
Puragra Guhathakurta,
Susan A. Kassin,
David C. Koo,
Jack Lonergan,
Teja Teppala,
Weichen Wang,
Hassen M. Yesuf,
Fuyan Bian,
Romeel Dave,
John C. Forbes,
Dusan Keres,
Pablo Perez-Gonzalez,
Alec Martin,
A. J. Puleo,
Lauryn Williams,
Benjamin Winningham
Abstract:
We present a catalog of spectroscopically measured redshifts over $0 < z < 2$ and emission line fluxes for 1440 galaxies. The majority ($\sim$65\%) of the galaxies come from the HALO7D survey, with the remainder from the DEEPwinds program. This catalog includes redshifts for 646 dwarf galaxies with $\log(M_{\star}/M_{\odot}) < 9.5$. 810 catalog galaxies did not have previously published spectroscopic redshifts, including 454 dwarf galaxies. HALO7D used the DEIMOS spectrograph on the Keck II telescope to take very deep (up to 32 hours exposure, with a median of $\sim$7 hours) optical spectroscopy in the COSMOS, EGS, GOODS-North, and GOODS-South CANDELS fields, and in some areas outside CANDELS. We compare our redshift results to existing spectroscopic and photometric redshifts in these fields, finding only a 1\% rate of discrepancy with other spectroscopic redshifts. We measure a small increase in median photometric redshift error (from 1.0\% to 1.3\%) and catastrophic outlier rate (from 3.5\% to 8\%) with decreasing stellar mass. We obtained successful redshift fits for 75\% of massive galaxies, and demonstrate a similar 70-75\% successful redshift measurement rate in $8.5 < \log(M_{\star}/M_{\odot}) < 9.5$ galaxies, suggesting similar survey sensitivity in this low-mass range. We describe the redshift, mass, and color-magnitude distributions of the catalog galaxies, finding HALO7D galaxies representative of CANDELS galaxies up to \textit{i}-band magnitudes of 25. The catalogs presented will enable studies of star formation (SF), the mass-metallicity relation, SF-morphology relations, and other properties of the $z\sim0.7$ dwarf galaxy population.
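The comparison statistics quoted above (a median redshift error and a catastrophic outlier rate) are commonly computed from the normalized difference |z_test - z_ref| / (1 + z_ref). The sketch below assumes that definition and a 0.15 outlier cut, which is a widespread convention but is not stated in the abstract; the function name `redshift_agreement` is hypothetical.

```python
from statistics import median
from typing import Sequence, Tuple


def redshift_agreement(
    z_test: Sequence[float], z_ref: Sequence[float], outlier_threshold: float = 0.15
) -> Tuple[float, float]:
    """Median normalized error and catastrophic-outlier fraction.

    The normalized error is |z_test - z_ref| / (1 + z_ref). The 0.15 outlier
    cut is a common convention assumed here, not a value from the catalog paper.
    """
    if not z_ref or len(z_test) != len(z_ref):
        raise ValueError("need two non-empty redshift lists of equal length")
    errors = [abs(zt - zr) / (1.0 + zr) for zt, zr in zip(z_test, z_ref)]
    outlier_fraction = sum(e > outlier_threshold for e in errors) / len(errors)
    return median(errors), outlier_fraction
```

Passing photometric redshifts as `z_test` and spectroscopic redshifts as `z_ref`, binned by stellar mass, would give the kind of mass-dependent error and outlier rates described in the abstract.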
Submitted 25 July, 2022; v1 submitted 17 March, 2022;
originally announced March 2022.
-
Euclidean Invariant Recognition of 2D Shapes Using Histograms of Magnitudes of Local Fourier-Mellin Descriptors
Authors:
Xinhua Zhang,
Lance R. Williams
Abstract:
Because the magnitudes of inner products with its basis functions are invariant to rotation and scale change, the Fourier-Mellin transform has long been used as a component in Euclidean invariant 2D shape recognition systems. Yet Fourier-Mellin transform magnitudes are only invariant to rotation and scale changes about a known center point, and full Euclidean invariant shape recognition is not possible except when this center point can be consistently and accurately identified. In this paper, we describe a system where a Fourier-Mellin transform is computed at every point in the image. The spatial support of the Fourier-Mellin basis functions is made local by multiplying them with a polynomial envelope. Significantly, the magnitudes of convolutions with these complex filters at isolated points are not (by themselves) used as features for Euclidean invariant shape recognition because reliable discrimination would require filters with spatial support large enough to fully encompass the shapes. Instead, we rely on the fact that normalized histograms of magnitudes are fully Euclidean invariant. We demonstrate a system based on the VLAD machine learning method that performs Euclidean invariant recognition of 2D shapes and requires an order of magnitude less training data than comparable methods based on convolutional neural networks.
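Two facts carry the argument above: the magnitude of an inner product with an angular harmonic exp(-i n theta) does not change when the signal is rotated about the centre, and a normalized histogram does not depend on where its values came from. The sketch below illustrates both on a ring of samples around a known centre. It is an assumption-laden toy, not the authors' system: it keeps only the angular (Fourier) factor, omitting the radial Mellin factor, the polynomial envelope, and VLAD; all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def angular_harmonics(ring: Sequence[float], max_order: int) -> List[complex]:
    """Inner products of samples on a circle with exp(-i*n*theta), n = 0..max_order."""
    m = len(ring)
    return [
        sum(v * cmath.exp(-2j * math.pi * n * k / m) for k, v in enumerate(ring)) / m
        for n in range(max_order + 1)
    ]


def rotate_ring(ring: Sequence[float], steps: int) -> List[float]:
    """Rotate about the ring's centre by steps * 360/len(ring) degrees (a circular shift)."""
    s = steps % len(ring)
    return list(ring[-s:]) + list(ring[:-s])


def normalized_histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Histogram of values in [lo, hi], normalized to sum to 1 (values outside are ignored)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins
```

Rotating the ring multiplies each harmonic by a unit-modulus phase, so `abs()` of every coefficient is unchanged; and because a histogram discards positions, rearranging the magnitude values (as a rotation or translation of the image would) leaves it unchanged as well.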
Submitted 13 March, 2022;
originally announced March 2022.
-
Similarity Equivariant Linear Transformation of Joint Orientation-Scale Space Representations
Authors:
Xinhua Zhang,
Lance R. Williams
Abstract:
Convolution is conventionally defined as a linear operation on functions of one or more variables which commutes with shifts. Group convolution generalizes the concept to linear operations on functions of group elements representing more general geometric transformations and which commute with those transformations. Since similarity transformation is the most general geometric transformation on images that preserves shape, the group convolution that is equivariant to similarity transformation is the most general shape preserving linear operator. Because similarity transformations have four free parameters, group convolutions are defined on four-dimensional, joint orientation-scale spaces. Although prior work on equivariant linear operators has been limited to discrete groups, the similarity group is continuous. In this paper, we describe linear operators on discrete representations that are equivariant to continuous similarity transformation. This is achieved by using a basis of functions that is jointly shiftable-twistable-scalable. These pinwheel functions use Fourier series in the orientation dimension and Laplace transform in the log-scale dimension to form a basis of spatially localized functions that can be continuously interpolated in position, orientation and scale. Although this result is potentially significant with respect to visual computation generally, we present an initial demonstration of its utility by using it to compute a shape equivariant distribution of closed contours traced by particles undergoing Brownian motion in velocity. The contours are constrained by sets of points and line endings representing well-known bistable illusory contour inducing patterns.
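The reason a Fourier series in the orientation dimension allows continuous interpolation from discrete samples is shiftability: if f(theta) has coefficients c_k, then f(theta - phi) has coefficients c_k exp(-i k phi) for any real phi, not just multiples of the sampling step. The sketch below demonstrates this property for a band-limited function of orientation only. It is an illustration under assumptions, not the paper's pinwheel basis: the Laplace transform in log-scale and the spatial localization are not reproduced, and the function names are hypothetical.

```python
import cmath
import math
from typing import Dict, Sequence

Coeffs = Dict[int, complex]


def fourier_coefficients(samples: Sequence[float], max_order: int) -> Coeffs:
    """Coefficients c_k, k = -K..K, of a periodic function from uniform samples on [0, 2*pi)."""
    m = len(samples)
    return {
        k: sum(v * cmath.exp(-1j * k * 2 * math.pi * j / m) for j, v in enumerate(samples)) / m
        for k in range(-max_order, max_order + 1)
    }


def evaluate(coeffs: Coeffs, theta: float) -> float:
    """Evaluate the truncated Fourier series at any orientation theta."""
    return sum(c * cmath.exp(1j * k * theta) for k, c in coeffs.items()).real


def shift_orientation(coeffs: Coeffs, phi: float) -> Coeffs:
    """Coefficients of f(theta - phi): multiply c_k by exp(-i*k*phi), for any real phi."""
    return {k: c * cmath.exp(-1j * k * phi) for k, c in coeffs.items()}
```

For a function band-limited to order K and sampled at more than 2K points, the recovered coefficients are exact, so `evaluate(shift_orientation(c, phi), theta)` equals f(theta - phi) for an arbitrary, non-grid rotation phi. This is the sense in which a discrete representation can be equivariant to a continuous group.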
Submitted 15 March, 2022; v1 submitted 13 March, 2022;
originally announced March 2022.
-
The LOFAR Two-metre Sky Survey -- V. Second data release
Authors:
T. W. Shimwell,
M. J. Hardcastle,
C. Tasse,
P. N. Best,
H. J. A. Röttgering,
W. L. Williams,
A. Botteon,
A. Drabent,
A. Mechev,
A. Shulevski,
R. J. van Weeren,
L. Bester,
M. Brüggen,
G. Brunetti,
J. R. Callingham,
K. T. Chyży,
J. E. Conway,
T. J. Dijkema,
K. Duncan,
F. de Gasperin,
C. L. Hale,
M. Haverkorn,
B. Hugo,
N. Jackson,
M. Mevius
, et al. (81 additional authors not shown)
Abstract:
In this data release from the LOFAR Two-metre Sky Survey (LoTSS) we present 120-168MHz images covering 27% of the northern sky. Our coverage is split into two regions centred at approximately 12h45m +44$^\circ$30' and 1h00m +28$^\circ$00' and spanning 4178 and 1457 square degrees respectively. The images were derived from 3,451hrs (7.6PB) of LOFAR High Band Antenna data which were corrected for the direction-independent instrumental properties as well as direction-dependent ionospheric distortions during extensive, but fully automated, data processing. A catalogue of 4,396,228 radio sources is derived from our total intensity (Stokes I) maps, where the majority of these have never been detected at radio wavelengths before. At 6" resolution, our full bandwidth Stokes I continuum maps with a central frequency of 144MHz have: a median rms sensitivity of 83$\mu$Jy/beam; a flux density scale accuracy of approximately 10%; an astrometric accuracy of 0.2"; and we estimate the point-source completeness to be 90% at a peak brightness of 0.8mJy/beam. By creating three 16MHz bandwidth images across the band we are able to measure the in-band spectral index of many sources, albeit with an error on the derived spectral index of $\pm$0.2 which is a consequence of our flux-density scale accuracy and small fractional bandwidth. Our circular polarisation (Stokes V) 20" resolution 120-168MHz continuum images have a median rms sensitivity of 95$\mu$Jy/beam, and we estimate a Stokes I to Stokes V leakage of 0.056%. Our linear polarisation (Stokes Q and Stokes U) image cubes consist of 480 x 97.6 kHz wide planes and have a median rms sensitivity per plane of 10.8mJy/beam at 4' and 2.2mJy/beam at 20"; we estimate the Stokes I to Stokes Q/U leakage to be approximately 0.2%. Here we characterise and publicly release our Stokes I, Q, U and V images in addition to the calibrated uv-data.
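An in-band spectral index from the three 16 MHz images can be estimated as the least-squares slope of ln S against ln nu. The sketch below assumes the convention S proportional to nu**alpha (some authors use nu**-alpha) and sub-band centres of 128, 144 and 160 MHz, obtained by splitting 120-168 MHz into three equal slices; the actual LoTSS sub-band layout and fitting procedure may differ, and `spectral_index` is a hypothetical helper.

```python
import math
from typing import Sequence

# Centres of three equal 16 MHz slices of the 120-168 MHz band (derived from the
# band edges; the exact sub-band layout used by LoTSS is an assumption here).
SUBBAND_CENTRES_MHZ = (128.0, 144.0, 160.0)


def spectral_index(freqs_mhz: Sequence[float], fluxes: Sequence[float]) -> float:
    """Least-squares slope of ln S against ln nu, i.e. alpha in S = S0 * nu**alpha."""
    if len(freqs_mhz) != len(fluxes) or len(fluxes) < 2:
        raise ValueError("need at least two matching frequency/flux pairs")
    xs = [math.log(f) for f in freqs_mhz]
    ys = [math.log(s) for s in fluxes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

The denominator `sxx` makes the small-fractional-bandwidth problem concrete: the full lever arm ln(160/128) is only about 0.22, so a given fractional error in the flux densities is strongly amplified in alpha.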
Submitted 23 February, 2022;
originally announced February 2022.
-
The discovery of a radio galaxy of at least 5 Mpc
Authors:
Martijn S. S. L. Oei,
Reinout J. van Weeren,
Martin J. Hardcastle,
Andrea Botteon,
Tim W. Shimwell,
Pratik Dabhade,
Aivin R. D. J. G. I. B. Gast,
Huub J. A. Röttgering,
Marcus Brüggen,
Cyril Tasse,
Wendy L. Williams,
Aleksandar Shulevski
Abstract:
We discover what is in projection the largest known structure of galactic origin: a giant radio galaxy with a projected proper length of $4.99 \pm 0.04\ \mathrm{Mpc}$. The source, named Alcyoneus, was first identified in low-resolution LOFAR Two-metre Sky Survey images from which angularly compact sources had been removed. Being an extreme example in its class, Alcyoneus could shed light on the main mechanisms that drive radio galaxy growth. We find that, beyond geometry, Alcyoneus and its host galaxy appear suspiciously ordinary: the total low-frequency luminosity density, stellar mass and supermassive black hole mass are all lower than, though similar to, those of the median giant radio galaxy (percentiles $45 \pm 3\%$, $25 \pm 9 \%$ and $23 \pm 11 \%$, respectively). The source resides in a filament of the Cosmic Web, with which it might have significant thermodynamic interaction. At $5 \cdot 10^{-16}\ \mathrm{Pa}$, the pressures in the lobes are the lowest hitherto found, and Alcyoneus therefore represents one of the most promising radio galaxies yet to probe the warm-hot intergalactic medium.
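A projected proper length such as the one quoted above follows, in the small-angle limit, from l = theta * D_A(z), where D_A is the angular diameter distance. The sketch below computes D_A for a flat Lambda-CDM cosmology by Simpson integration. The parameter defaults (H0 = 70 km/s/Mpc, Omega_m = 0.3) are illustrative assumptions, not the cosmology adopted in the paper, and no Alcyoneus angular size or redshift is filled in; the function names are hypothetical.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def angular_diameter_distance_mpc(z: float, h0: float = 70.0, omega_m: float = 0.3,
                                  steps: int = 1000) -> float:
    """D_A in Mpc for a flat LCDM cosmology (radiation neglected), via Simpson's rule.

    h0 (km/s/Mpc) and omega_m defaults are illustrative assumptions.
    """
    if z < 0:
        raise ValueError("redshift must be non-negative")
    steps += steps % 2  # Simpson's rule needs an even number of intervals
    omega_l = 1.0 - omega_m

    def inv_e(zp: float) -> float:
        # 1 / E(z) with E(z) = H(z) / H0 for a flat universe
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * h)
    comoving = (C_KM_S / h0) * total * h / 3.0  # line-of-sight comoving distance
    return comoving / (1.0 + z)


def projected_proper_length_mpc(theta_arcmin: float, z: float, **cosmology: float) -> float:
    """Small-angle projected proper length l = theta * D_A(z)."""
    return math.radians(theta_arcmin / 60.0) * angular_diameter_distance_mpc(z, **cosmology)
```

Because D_A depends on the assumed H0 and Omega_m, a quoted length in Mpc is only comparable between studies that use the same cosmology; the uncertainty in the quoted length would also depend on how the angular extent and redshift are measured.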
Submitted 10 February, 2022;
originally announced February 2022.