-
(De)composing Craft: An Elementary Grammar for Sharing Expertise in Craft Workflows
Authors:
Ritik Batra,
Lydia Kim,
Ilan Mandel,
Amritansh Kwatra,
Jane L. E.,
Steven J. Jackson,
Thijs Roumen
Abstract:
Craft practices rely on evolving archives of skill and knowledge, developed through generations of craftspeople experimenting with designs, materials, and techniques. Better documentation of these practices enables the sharing of knowledge and expertise between sites and generations. However, most documentation focuses solely on the linear steps leading to final artifacts, neglecting the tacit knowledge necessary to improvise, or adapt workflows to meet the unique demands of each craft project. This omission limits knowledge sharing and reduces craft to a mechanical endeavor, rather than a sophisticated way of seeing, thinking, and doing. Drawing on expert interviews and literature from HCI, CSCW and the social sciences, we develop an elementary grammar to document improvisational actions of real-world craft practices. We demonstrate the utility of this grammar with an interface called CraftLink that can be used to analyze expert videos and semi-automatically generate documentation to convey material and contextual variations of craft practices. Our user study with expert crocheters (N=7) using this interface evaluates our grammar's effectiveness in capturing and sharing expert knowledge with other craftspeople, offering new pathways for computational systems to support collaborative archives of knowledge and practice within communities.
Submitted 12 June, 2025;
originally announced June 2025.
-
The Future of HCI-Policy Collaboration
Authors:
Qian Yang,
Richmond Y Wong,
Steven J Jackson,
Sabine Junginger,
Margaret D Hagan,
Thomas Gilbert,
John Zimmerman
Abstract:
Policies significantly shape computation's societal impact, a crucial HCI concern. However, challenges persist when HCI professionals attempt to integrate policy into their work or affect policy outcomes. Prior research considered these challenges at the "border" of HCI and policy. This paper asks: What if HCI considers policy integral to its intellectual concerns, placing system-people-policy interaction not at the border but nearer the center of HCI research, practice, and education? What if HCI fosters a mosaic of methods and knowledge contributions that blend system, human, and policy expertise in various ways, just like HCI has done with blending system and human expertise? We present this re-imagined HCI-policy relationship as a provocation and highlight its usefulness: It spotlights previously overlooked system-people-policy interaction work in HCI. It unveils new opportunities for HCI's futuring, empirical, and design projects. It allows HCI to coordinate its diverse policy engagements, enhancing its collective impact on policy outcomes.
Submitted 29 September, 2024;
originally announced September 2024.
-
Streaming Technologies and Serialization Protocols: Empirical Performance Analysis
Authors:
Samuel Jackson,
Nathan Cummings,
Saiful Khan
Abstract:
Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets. Various streaming technologies and serialization protocols have been developed to cater to different streaming requirements, each performing differently depending on the specific tasks and datasets involved. This variety poses challenges in selecting the most appropriate combination, as encountered during the implementation of a streaming system for the MAST fusion device data or SKA's radio astronomy data. To address this challenge, we conducted an empirical study on widely used data streaming technologies and serialization protocols. We also developed an extensible, open-source software framework to benchmark their efficiency across various performance metrics. Our study uncovers significant performance differences and trade-offs between these technologies, providing valuable insights that can guide the selection of optimal streaming and serialization solutions for modern data-intensive applications. Our goal is to equip the scientific community and industry professionals with the knowledge needed to enhance data streaming efficiency for improved data utilization and real-time analysis.
Submitted 4 November, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
MLCommons Cloud Masking Benchmark with Early Stopping
Authors:
Varshitha Chennamsetti,
Gregor von Laszewski,
Ruochen Gu,
Laiba Mehnaz,
Juri Papay,
Samuel Jackson,
Jeyan Thiyagalingam,
Sergey V. Samsonau,
Geoffrey C. Fox
Abstract:
In this paper, we report on work performed for the MLCommons Science Working Group on the cloud masking benchmark. MLCommons is a consortium that develops and maintains several scientific benchmarks that aim to benefit developments in AI. The benchmarks are conducted on the High Performance Computing (HPC) Clusters of New York University and University of Virginia, as well as a commodity desktop. We provide a description of the cloud masking benchmark, as well as a summary of our submission to MLCommons on the benchmark experiment we conducted. It includes a modification to the reference implementation of the cloud masking benchmark enabling early stopping. This benchmark is executed on the NYU HPC through a custom batch script that runs the various experiments through the batch queuing system while allowing for variation in the number of epochs trained. Our submission includes the modified code, a custom batch script to modify epochs, documentation, and the benchmark results. We report the highest accuracy (scientific metric) and the average time taken (performance metric) for training and inference that was achieved on NYU HPC Greene. We also provide a comparison of the compute capabilities between different systems by running the benchmark for one epoch. Our submission can be found in a Globus repository that is accessible to the MLCommons Science Working Group.
Submitted 30 May, 2024; v1 submitted 11 December, 2023;
originally announced January 2024.
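The abstract above mentions a modification that enables early stopping. As a rough illustration of the general technique only (this is not the MLCommons reference implementation or the authors' batch script), a patience-based rule can be sketched as follows. The function name `early_stopping_epoch` and the parameters `patience` and `min_delta` are hypothetical.

```python
from typing import Iterable, Tuple


def early_stopping_epoch(
    val_losses: Iterable[float],
    patience: int = 3,
    min_delta: float = 0.0,
) -> Tuple[int, float]:
    """Return (stop_epoch, best_loss) for a stream of per-epoch validation losses.

    Training halts once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. Epochs are 1-based;
    an empty stream yields (0, inf).
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop: no improvement for `patience` epochs
    return epoch, best
```

With `patience=3`, a run whose loss bottoms out at epoch 3 stops at epoch 6, so any later improvement is never seen; that trade-off between compute saved and accuracy missed is what the patience value controls.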
-
Synthesis parameter effect detection using quantitative representations and high dimensional distribution distances
Authors:
Alex Hagen,
Shane Jackson
Abstract:
Detection of effects of the parameters of the synthetic process on the microstructure of materials is an important, yet elusive goal of materials science. We develop a method for detecting effects based on copula theory, high dimensional distribution distances, and permutational statistics to analyze a designed experiment synthesizing plutonium oxide from Pu(III) Oxalate. We detect effects of strike order and oxalic acid feed on the microstructure of the resulting plutonium oxide, which match the literature well. We also detect excess bivariate effects between the pairs of acid concentration, strike order and precipitation temperature.
Submitted 3 April, 2023;
originally announced April 2023.
-
The FluidFlower International Benchmark Study: Process, Modeling Results, and Comparison to Experimental Data
Authors:
Bernd Flemisch,
Jan M. Nordbotten,
Martin Fernø,
Ruben Juanes,
Holger Class,
Mojdeh Delshad,
Florian Doster,
Jonathan Ennis-King,
Jacques Franc,
Sebastian Geiger,
Dennis Gläser,
Christopher Green,
James Gunning,
Hadi Hajibeygi,
Samuel J. Jackson,
Mohamad Jammoul,
Satish Karra,
Jiawei Li,
Stephan K. Matthäi,
Terry Miller,
Qi Shao,
Catherine Spurin,
Philip Stauffer,
Hamdi Tchelepi,
Xiaoming Tian
, et al. (8 additional authors not shown)
Abstract:
Successful deployment of geological carbon storage (GCS) requires an extensive use of reservoir simulators for screening, ranking and optimization of storage sites. However, the time scales of GCS are such that no sufficient long-term data is available yet to validate the simulators against. As a consequence, there is currently no solid basis for assessing the quality with which the dynamics of large-scale GCS operations can be forecasted.
To meet this knowledge gap, we have conducted a major GCS validation benchmark study. To achieve reasonable time scales, a laboratory-size geological storage formation was constructed (the "FluidFlower"), forming the basis for both the experimental and computational work. A validation experiment consisting of repeated GCS operations was conducted in the FluidFlower, providing what we define as the true physical dynamics for this system. Nine different research groups from around the world provided forecasts, both individually and collaboratively, based on a detailed physical and petrophysical characterization of the FluidFlower sands.
The major contribution of this paper is a report and discussion of the results of the validation benchmark study, complemented by a description of the benchmarking process and the participating computational models. The forecasts from the participating groups are compared to each other and to the experimental data by means of various indicative qualitative and quantitative measures. By this, we provide a detailed assessment of the capabilities of reservoir simulators and their users to capture both the injection and post-injection dynamics of the GCS operations.
Submitted 9 February, 2023;
originally announced February 2023.
-
Do Neural Networks Trained with Topological Features Learn Different Internal Representations?
Authors:
Sarah McGuire,
Shane Jackson,
Tegan Emerson,
Henry Kvinge
Abstract:
There is a growing body of work that leverages features extracted via topological data analysis to train machine learning models. While this field, sometimes known as topological machine learning (TML), has seen some notable successes, an understanding of how the process of learning from topological features differs from the process of learning from raw data is still limited. In this work, we begin to address one component of this larger issue by asking whether a model trained with topological features learns internal representations of data that are fundamentally different from those learned by a model trained with the original raw data. To quantify "different", we exploit two popular metrics that can be used to measure the similarity of the hidden representations of data within neural networks: neural stitching and centered kernel alignment. From these we draw a range of conclusions about how training with topological features does and does not change the representations that a model learns. Perhaps unsurprisingly, we find that structurally, the hidden representations of models trained and evaluated on topological features differ substantially compared to those trained and evaluated on the corresponding raw data. On the other hand, our experiments show that in some cases, these representations can be reconciled (at least to the degree required to solve the corresponding task) using a simple affine transformation. We conjecture that this means that neural networks trained on raw data may extract some limited topological features in the process of making predictions.
Submitted 14 November, 2022;
originally announced November 2022.
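Centered kernel alignment, one of the two similarity metrics named in the abstract above, has a compact standard form when a linear kernel is used. The sketch below shows that form in plain Python for illustration. It is not the authors' code, and the helper names `centered_gram`, `frobenius_inner`, and `linear_cka` are hypothetical. Each input is a list of per-example activation vectors, with examples in the same order in both inputs.

```python
from math import sqrt
from typing import List, Sequence

Matrix = List[List[float]]


def centered_gram(features: Sequence[Sequence[float]]) -> Matrix:
    """Linear-kernel Gram matrix of the rows, double-centered (H K H)."""
    n = len(features)
    gram = [
        [sum(p * q for p, q in zip(features[i], features[j])) for j in range(n)]
        for i in range(n)
    ]
    # The Gram matrix is symmetric, so column means equal row means.
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def frobenius_inner(a: Matrix, b: Matrix) -> float:
    """Sum of elementwise products of two equally sized matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def linear_cka(x: Sequence[Sequence[float]], y: Sequence[Sequence[float]]) -> float:
    """Linear CKA between two sets of representations of the same examples."""
    if len(x) != len(y):
        raise ValueError("x and y must describe the same number of examples")
    kx, ky = centered_gram(x), centered_gram(y)
    # Normalised alignment of the two centered Gram matrices.
    # Raises ZeroDivisionError if either representation is constant.
    return frobenius_inner(kx, ky) / sqrt(
        frobenius_inner(kx, kx) * frobenius_inner(ky, ky)
    )
```

The score is 1 for representations that differ only by a rotation and uniform scaling, and 0 when the centered representations are uncorrelated. For real layer activations a NumPy version would be the usual choice, since this one is quadratic in the number of examples with Python-level loops.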
-
Complexity and Ramsey Largeness of Sets of Oracles Separating Complexity Classes
Authors:
Alex Creiner,
Stephen Jackson
Abstract:
We prove two sets of results concerning computational complexity classes. The first concerns a variation of the random oracle hypothesis posed by Bennett and Gill after they showed that, relative to a randomly chosen oracle, P is not equal to NP with probability 1. This hypothesis was quickly disproven in several ways, most famously in 1992 with the result that IP equals PSPACE, in spite of the classes being shown unequal with probability 1. Here we propose a variation of what it means to be "large" using the Ellentuck topology. In this new context, we demonstrate that the set of oracles separating NP and co-NP is not small, and obtain similar results for the separation of PSPACE from PH along with the separation of NP from BQP. We demonstrate that this version of the hypothesis turns it into a sufficient condition for unrelativized relationships, at least in the three cases considered here. Second, we examine the descriptive complexity of the classes of oracles providing the separations for these various classes, and determine their exact placement in the Borel hierarchy.
Submitted 21 October, 2022;
originally announced October 2022.
-
A comparative study of paired versus unpaired deep learning methods for physically enhancing digital rock image resolution
Authors:
Yufu Niu,
Samuel J. Jackson,
Naif Alqahtani,
Peyman Mostaghimi,
Ryan T. Armstrong
Abstract:
X-ray micro-computed tomography (micro-CT) has been widely leveraged to characterise pore-scale geometry in subsurface porous rock. Recent developments in super resolution (SR) methods using deep learning allow the digital enhancement of low resolution (LR) images over large spatial scales, creating SR images comparable to the high resolution (HR) ground truth. This circumvents traditional resolution and field-of-view trade-offs. An outstanding issue is the use of paired (registered) LR and HR data, which is often required in the training step of such methods but is difficult to obtain. In this work, we rigorously compare two different state-of-the-art SR deep learning techniques, using both paired and unpaired data, with like-for-like ground truth data. The first approach requires paired images to train a convolutional neural network (CNN) while the second approach uses unpaired images to train a generative adversarial network (GAN). The two approaches are compared using a micro-CT carbonate rock sample with complicated micro-porous textures. We implemented various image-based and numerical verifications and experimental validation to quantitatively evaluate the physical accuracy and sensitivities of the two methods. Our quantitative results show that the unpaired GAN approach can reconstruct super-resolution images as precisely as the paired CNN method, with comparable training times and dataset requirements. This unlocks new applications for micro-CT image enhancement using unpaired deep learning methods; image registration is no longer needed during the data processing stage. Decoupled images from data storage platforms can be exploited more efficiently to train networks for SR digital rock applications. This opens up a new pathway for various applications of multi-scale flow simulation in heterogeneous porous media.
Submitted 16 December, 2021;
originally announced December 2021.
-
Deep learning of multi-resolution X-Ray micro-CT images for multi-scale modelling
Authors:
Samuel J. Jackson,
Yufu Niu,
Sojwal Manoorkar,
Peyman Mostaghimi,
Ryan T. Armstrong
Abstract:
Field-of-view and resolution trade-offs in X-Ray micro-computed tomography (micro-CT) imaging limit the characterization, analysis and model development of multi-scale porous systems. To this end, we developed an applied methodology utilising deep learning to enhance low resolution images over large sample sizes and create multi-scale models capable of accurately simulating experimental fluid dynamics from the pore (microns) to continuum (centimetres) scale. We develop a 3D Enhanced Deep Super Resolution (EDSR) convolutional neural network to create super resolution (SR) images from low resolution images, which alleviates common micro-CT hardware/reconstruction defects in high-resolution (HR) images. When paired with pore-network simulations and parallel computation, we can create large 3D continuum-scale models with spatially varying flow & material properties. We quantitatively validate the workflow at various scales using direct HR/SR image similarity, pore-scale material/flow simulations and continuum scale multiphase flow experiments (drainage immiscible flow pressures and 3D fluid volume fractions). The SR images and models are comparable to the HR ground truth, and generally accurate to within experimental uncertainty at the continuum scale across a range of flow rates. They are found to be significantly more accurate than their LR counterparts, especially in cases where a wide distribution of pore sizes is encountered. The applied methodology opens up the possibility to image, model and analyse truly multi-scale heterogeneous systems that are otherwise intractable.
Submitted 15 March, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Accelerated Computation of a High Dimensional Kolmogorov-Smirnov Distance
Authors:
Alex Hagen,
Shane Jackson,
James Kahn,
Jan Strube,
Isabel Haide,
Karl Pazdernik,
Connor Hainje
Abstract:
Statistical testing is widespread and critical for a variety of scientific disciplines. The advent of machine learning and the increase of computing power have increased interest in the analysis and statistical testing of multidimensional data. We extend the powerful Kolmogorov-Smirnov two sample test to a high dimensional form in a similar manner to Fasano (Fasano, 1987). We call our result the d-dimensional Kolmogorov-Smirnov test (ddKS) and provide three novel contributions therewith: we develop an analytical equation for the significance of a given ddKS score, we provide an algorithm for computation of ddKS on modern computing hardware that is of constant time complexity for small sample sizes and dimensions, and we provide two approximate calculations of ddKS: one that reduces the time complexity to linear at larger sample sizes, and another that reduces the time complexity to linear with increasing dimension. We perform power analysis of ddKS and its approximations on a corpus of datasets and compare to other common high dimensional two sample tests and distances: Hotelling's T^2 test and Kullback-Leibler divergence. Our ddKS test performs well for all datasets, dimensions, and sizes tested, whereas the other tests and distances fail to reject the null hypothesis on at least one dataset. We therefore conclude that ddKS is a powerful multidimensional two sample test for general use, and can be calculated in a fast and efficient manner using our parallel or approximate methods. Open source implementations of all methods described in this work are located at https://github.com/pnnl/ddks.
Submitted 25 June, 2021;
originally announced June 2021.
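For orientation, the classical one-dimensional two-sample Kolmogorov-Smirnov statistic that the abstract above generalises is the largest gap between two empirical cumulative distribution functions. The sketch below computes only that 1D baseline. It is not ddKS and does not use the pnnl/ddks package, and the function name `ks_two_sample_statistic` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def ks_two_sample_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Classical 1D two-sample KS statistic: sup |F_a(v) - F_b(v)|.

    Both empirical CDFs are right-continuous step functions that only
    change at sample points, so checking every sample point is enough.
    """
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both samples must be non-empty")
    sorted_a, sorted_b = sorted(a), sorted(b)
    d = 0.0
    for v in sorted_a + sorted_b:
        # Fraction of each sample that is <= v.
        cdf_a = bisect_right(sorted_a, v) / len(sorted_a)
        cdf_b = bisect_right(sorted_b, v) / len(sorted_b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Identical samples give 0 and fully separated samples give 1. Turning the statistic into a p-value requires the KS distribution, which this sketch deliberately leaves out.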
-
Physics-informed Neural-Network Software for Molecular Dynamics Applications
Authors:
Taufeq Mohammed Razakh,
Beibei Wang,
Shane Jackson,
Rajiv K. Kalia,
Aiichiro Nakano,
Ken-ichi Nomura,
Priya Vashishta
Abstract:
We have developed a novel differential equation solver software called PND based on the physics-informed neural network (PINN) for molecular dynamics simulators. Based on the automatic differentiation technique provided by PyTorch, our software allows users to flexibly implement equations of atomic motion, initial and boundary conditions, and conservation laws as loss functions to train the network. PND comes with a parallel molecular dynamics (MD) engine that lets users examine and optimize loss function design, different conservation laws and boundary conditions, and hyperparameters, thereby accelerating PINN-based development for molecular applications.
Submitted 21 November, 2020; v1 submitted 6 November, 2020;
originally announced November 2020.
-
Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects
Authors:
Samir Passi,
Steven J. Jackson
Abstract:
The trustworthiness of data science systems in applied and real-world settings emerges from the resolution of specific tensions through situated, pragmatic, and ongoing forms of work. Drawing on research in CSCW, critical data studies, and history and sociology of science, and six months of immersive ethnographic fieldwork with a corporate data science team, we describe four common tensions in applied data science work: (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models. We show how organizational actors establish and re-negotiate trust under messy and uncertain analytic conditions through practices of skepticism, assessment, and credibility. Highlighting the collaborative and heterogeneous nature of real-world data science, we show how the management of trust in applied corporate data science settings depends not only on pre-processing and quantification, but also on negotiation and translation. We conclude by discussing the implications of our findings for data science research and practice, both within and beyond CSCW.
Submitted 9 February, 2020;
originally announced February 2020.
-
Data Vision: Learning to See Through Algorithmic Abstraction
Authors:
Samir Passi,
Steven J. Jackson
Abstract:
Learning to see through data is central to contemporary forms of algorithmic knowledge production. While often represented as a mechanical application of rules, making algorithms work with data requires a great deal of situated work. This paper examines how the often-divergent demands of mechanization and discretion manifest in data analytic learning environments. Drawing on research in CSCW and the social sciences, and ethnographic fieldwork in two data learning environments, we show how an algorithm's application is seen sometimes as a mechanical sequence of rules and at other times as an array of situated decisions. Casting data analytics as a rule-based (rather than rule-bound) practice, we show that effective data vision requires would-be analysts to straddle the competing demands of formal abstraction and empirical contingency. We conclude by discussing how the notion of data vision can help better leverage the role of human work in data analytic learning, research, and practice.
Submitted 9 February, 2020;
originally announced February 2020.
-
Machine Learning and Big Scientific Data
Authors:
Tony Hey,
Keith Butler,
Sam Jackson,
Jeyarajan Thiyagalingam
Abstract:
This paper reviews some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory site at Harwell near Oxford. Such "Big Scientific Data" comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility, and the UK's Central Laser Facility. Increasingly, scientists are now needing to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and also to help find new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, deep learning has made dramatic breakthroughs. Google's DeepMind has now also used deep learning technology to develop their AlphaFold tool to make predictions for protein folding. Remarkably, they have been able to achieve some spectacular results for this specific scientific problem. Can deep learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the Rutherford Appleton Laboratory, we focus on challenges and opportunities for AI in advancing materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from a number of different scientific domains. We conclude with some initial examples of our "SciML" benchmark suite and of the research challenges these benchmarks will enable.
Submitted 12 October, 2019;
originally announced October 2019.
-
Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge
Authors:
Spyridon Bakas,
Mauricio Reyes,
Andras Jakab,
Stefan Bauer,
Markus Rempfler,
Alessandro Crimi,
Russell Takeshi Shinohara,
Christoph Berger,
Sung Min Ha,
Martin Rozycki,
Marcel Prastawa,
Esther Alberts,
Jana Lipkova,
John Freymann,
Justin Kirby,
Michel Bilello,
Hassan Fathallah-Shaykh,
Roland Wiest,
Jan Kirschke,
Benedikt Wiestler,
Rivka Colen,
Aikaterini Kotrotsou,
Pamela Lamontagne,
Daniel Marcus,
Mikhail Milchenko
, et al. (402 additional authors not shown)
Abstract:
Gliomas are the most common primary brain malignancies, with different degrees of aggressiveness, variable prognosis and various heterogeneous histologic sub-regions, i.e., peritumoral edematous/invaded tissue, necrotic core, active and non-enhancing core. This intrinsic heterogeneity is also portrayed in their radio-phenotype, as their sub-regions are depicted by varying intensity profiles disseminated across multi-parametric magnetic resonance imaging (mpMRI) scans, reflecting varying biological properties. Their heterogeneous shape, extent, and location are some of the factors that make these tumors difficult to resect, and in some cases inoperable. The amount of resected tumor is a factor also considered in longitudinal scans, when evaluating the apparent tumor for potential diagnosis of progression. Furthermore, there is mounting evidence that accurate segmentation of the various tumor sub-regions can offer the basis for quantitative image analysis towards prediction of patient overall survival. This study assesses the state-of-the-art machine learning (ML) methods used for brain tumor image analysis in mpMRI scans, during the last seven instances of the International Brain Tumor Segmentation (BraTS) challenge, i.e., 2012-2018. Specifically, we focus on i) evaluating segmentations of the various glioma sub-regions in pre-operative mpMRI scans, ii) assessing potential tumor progression by virtue of longitudinal growth of tumor sub-regions, beyond use of the RECIST/RANO criteria, and iii) predicting the overall survival from pre-operative mpMRI scans of patients that underwent gross total resection. Finally, we investigate the challenge of identifying the best ML algorithms for each of these tasks, considering that apart from being diverse on each instance of the challenge, the multi-institutional mpMRI BraTS dataset has also been a continuously evolving/growing dataset.
Submitted 23 April, 2019; v1 submitted 5 November, 2018;
originally announced November 2018.
-
3D Human Body Reconstruction from a Single Image via Volumetric Regression
Authors:
Aaron S. Jackson,
Chris Manafas,
Georgios Tzimiropoulos
Abstract:
This paper proposes the use of an end-to-end Convolutional Neural Network for direct reconstruction of the 3D geometry of humans via volumetric regression. The proposed method does not require the fitting of a shape model and can be trained to work from a variety of input types, whether it be landmarks, images or segmentation masks. Additionally, non-visible parts, either self-occluded or otherwise, are still reconstructed, which is not the case with depth map regression. We present results that show that our method can handle both pose variation and detailed reconstruction given appropriate datasets for training.
Submitted 11 September, 2018;
originally announced September 2018.
-
Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression
Authors:
Aaron S. Jackson,
Adrian Bulat,
Vasileios Argyriou,
Georgios Tzimiropoulos
Abstract:
3D face reconstruction is a fundamental Computer Vision problem of extraordinary difficulty. Current systems often assume the availability of multiple facial images (sometimes from the same subject) as input, and must address a number of methodological challenges such as establishing dense correspondences across large facial poses, expressions, and non-uniform illumination. In general, these methods require complex and inefficient pipelines for model building and fitting. In this work, we propose to address many of these limitations by training a Convolutional Neural Network (CNN) on an appropriate dataset consisting of 2D images and 3D facial models or scans. Our CNN works with just a single 2D facial image, requires neither accurate alignment nor dense correspondence between images, works for arbitrary facial poses and expressions, and can be used to reconstruct the whole 3D facial geometry (including the non-visible parts of the face), bypassing the construction (during training) and fitting (during testing) of a 3D Morphable Model. We achieve this via a simple CNN architecture that performs direct regression of a volumetric representation of the 3D facial geometry from a single 2D image. We also demonstrate how the related task of facial landmark localization can be incorporated into the proposed framework and help improve reconstruction quality, especially for the cases of large poses and facial expressions. Testing code will be made available online, along with pre-trained models, at http://aaronsplace.co.uk/papers/jackson2017recon
Submitted 8 September, 2017; v1 submitted 22 March, 2017;
originally announced March 2017.
-
pyFRET: A Python Library for Single Molecule Fluorescence Data Analysis
Authors:
Rebecca R. Murphy,
Sophie E. Jackson,
David Klenerman
Abstract:
Single molecule Förster resonance energy transfer (smFRET) is a powerful experimental technique for studying the properties of individual biological molecules in solution. However, as adoption of smFRET techniques becomes more widespread, the lack of available software, whether open source or commercial, for data analysis, is becoming a significant issue. Here, we present pyFRET, an open source Python package for the analysis of data from single-molecule fluorescence experiments from freely diffusing biomolecules. The package provides methods for the complete analysis of a smFRET dataset, from burst selection and denoising, through data visualisation and model fitting. We provide support for both continuous excitation and alternating laser excitation (ALEX) data analysis. pyFRET is available as a package downloadable from the Python Package Index (PyPI) under the open source three-clause BSD licence, together with links to extensive documentation and tutorials, including example usage and test data. Additional documentation including tutorials is hosted independently on ReadTheDocs. The code is available from the free hosting site Bitbucket. Through distribution of this software, we hope to lower the barrier for the adoption of smFRET experiments by other research groups and we encourage others to contribute modules for specific analysis needs.
Submitted 19 December, 2014;
originally announced December 2014.