-
Contributions of the Petabyte Scale Sequence Search Codeathon toward efforts to scale sequence-based searches on SRA
Authors:
Priyanka Ghosh,
Kjiersten Fagnan,
Ryan Connor,
Ravinder Pannu,
Travis J. Wheeler,
Mihai Pop,
C. Titus Brown,
Tessa Pierce-Ward,
Rob Patro,
Jacquelyn S. Michaelis,
Thomas L. Madden,
Christiam Camacho,
Olaitan I. Awe,
Arianna I. Krinos,
René KM Xavier,
Rodrigo Ortega Polo,
Jack W. Roddy,
Adelaide Rhodes,
Alexander Sweeten,
Adrian Viehweger,
Bariş Ekim,
Harihara Subrahmaniam Muralidharan,
Amatur Rahman,
Vinícius W. Salazar,
Andrew Tritt
, et al. (13 additional authors not shown)
Abstract:
The volume of biological data being generated by the scientific community is growing exponentially, reflecting technological advances and research activities. The National Institutes of Health's (NIH) Sequence Read Archive (SRA), which is maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), is a rapidly growing public database that researchers use to drive scientific discovery across all domains of life. This increase in available data has great promise for pushing scientific discovery but also introduces new challenges that scientific communities need to address. As genomic datasets have grown in scale and diversity, a parade of new methods and associated software have been developed to address the challenges posed by this growth. These methodological advances are vital for maximally leveraging the power of next-generation sequencing (NGS) technologies. With the goal of laying a foundation for evaluation of methods for petabyte-scale sequence search, the Department of Energy (DOE) Office of Biological and Environmental Research (BER), the NIH Office of Data Science Strategy (ODSS), and NCBI held a virtual codeathon, 'Petabyte Scale Sequence Search: Metagenomics Benchmarking Codeathon', on September 27 - October 1, 2021, to evaluate emerging solutions in petabyte-scale sequence search. The codeathon attracted experts from national laboratories, research institutions, and universities across the world to (a) develop benchmarking approaches to address challenges in conducting large-scale analyses of metagenomic data (which comprises approximately 20% of SRA), (b) identify potential applications that benefit from SRA-wide searches and the tools required to execute the search, and (c) produce community resources, i.e., a public-facing repository with information to rebuild and reproduce the problems addressed by each team challenge.
Submitted 9 May, 2025;
originally announced May 2025.
-
Pseudo-Haptics Survey: Human-Computer Interaction in Extended Reality & Teleoperation
Authors:
Rui Xavier,
José Luís Silva,
Rodrigo Ventura,
Joaquim Jorge
Abstract:
Pseudo-haptic techniques are becoming increasingly popular in human-computer interaction. They replicate haptic sensations by leveraging primarily visual feedback rather than mechanical actuators. These techniques bridge the gap between the real and virtual worlds by exploring the brain's ability to integrate visual and haptic information. One of the many advantages of pseudo-haptic techniques is that they are cost-effective, portable, and flexible. They eliminate the need for direct attachment of haptic devices to the body, which can be heavy and large and require a lot of power and maintenance. Recent research has focused on applying these techniques to extended reality and mid-air interactions. To better understand the potential of pseudo-haptic techniques, the authors developed a novel taxonomy encompassing tactile feedback, kinesthetic feedback, and combined categories in multimodal approaches, ground not covered by previous surveys. This survey highlights multimodal strategies and potential avenues for future studies, particularly regarding integrating these techniques into extended reality and collaborative virtual environments.
Submitted 3 June, 2024;
originally announced June 2024.
-
The Moon, a disk or a sphere?
Authors:
E. Seperuelo Duarte,
A. T. Mota,
J. R. de Carvalho,
R. C. Xavier,
P. V. S. Souza
Abstract:
In this paper, we present a physical modeling activity whose objective is to allow students to determine the differences between a disk and a sphere using pure scientific criteria. To that end, we reproduce the Sun-Earth-Moon system with low-cost materials and compare the illumination effects on the Moon considering two possible shapes for it (a sphere and a disk). The analysis is based on the shape of the terminator line produced in each case as a function of the illumination angle. The results obtained are first discussed and then applied so that one can interpret the observed patterns in the illumination effects of other celestial bodies, such as Venus or even the Earth. The activity can thus be very useful to unmask the unscientific idea of Flat Earth. The entire activity is easily replicable and it may be useful to promote a more realistic view of science and its methods.
Submitted 26 August, 2021; v1 submitted 11 June, 2021;
originally announced June 2021.
-
In what sense space dimensionality can be used to cast light into cultural anthropology?
Authors:
Francisco Caruso,
Roberto Moreira Xavier
Abstract:
Humans have always constructed spaces, through Mythos and Logos, as part of an aspiration to capture the essence of the changing world. This has been a permanent endeavour since the invention of language. By doing this, in fact, Humankind started constructing itself: we are beings in constant evolutionary process in real and imaginary spaces. Our concepts of Space and our anthropological ideas, especially the fundamental concepts of subject and subjectivity, are intertwined and intimately connected. We believe that the great narratives about Humanity, which ultimately define our view of ourselves, are entangled with those concepts that Cassirer identified as the cornerstones of culture: space, time, and number. To explore these ideas, the authors wrote an essay, in 2017, in a book format, in which the fundamental role of real and imaginary spaces (and especially of their dimensionalities) in the History of Culture was discussed. This book, titled "O Livro, o Espaço e a Natureza: Ensaio Sobre a Leitura do Mundo, as Mutações da Cultura e do Sujeito", has a preface written by Francisco Antonio Doria. As many of the issues treated there are among his multiple interests, it was decided to revisit here the problems of subjectivity and the subject's relationship with the dimensionality of space, including the question of the architecture of books and other writing supports.
Submitted 23 March, 2021;
originally announced March 2021.
-
Community Detection in Weighted Multilayer Networks with Ambient Noise
Authors:
Mark He,
Dylan Lu,
Jason Xu,
Rose Mary Xavier
Abstract:
We introduce a novel model for multilayer weighted networks that accounts for global noise in addition to local signals. The model is similar to a multilayer stochastic blockmodel (SBM), but the key difference is that between-block interactions independent across layers are common for the whole system, which we call ambient noise. A single block is also characterized by these fixed ambient parameters to represent members that do not belong anywhere else. This approach allows simultaneous clustering and typologizing of blocks into signal or noise in order to better understand their roles in the overall system, which is not accounted for by existing Blockmodels. We employ a novel application of hierarchical variational inference to jointly detect and differentiate types of blocks. We call this model for multilayer weighted networks the Stochastic Block (with) Ambient Noise Model (SBANM) and develop an associated community detection algorithm. We apply this method to subjects in the Philadelphia Neurodevelopmental Cohort to discover communities of subjects with co-occurrent psychopathologies in relation to psychosis.
Submitted 24 July, 2022; v1 submitted 24 February, 2021;
originally announced March 2021.
-
A step toward a reinforcement learning de novo genome assembler
Authors:
Kleber Padovani,
Roberto Xavier,
Rafael Cabral Borges,
Andre Carvalho,
Anna Reali,
Annie Chateau,
Ronnie Alves
Abstract:
De novo genome assembly is a relevant but computationally complex task in genomics. Although de novo assemblers have been used successfully in several genomics projects, there is still no 'best assembler', and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning may emerge as an alternative (or complementary) way for developing more accurate and automated assemblers. Reinforcement learning has proven promising for solving complex activities without supervision - such as games - and there is a pressing need to understand the limits of this approach to 'real' problems, such as the DFA problem. This study aimed to shed light on the application of machine learning, using reinforcement learning (RL), in genome assembly. We expanded upon the sole previous approach found in the literature to solve this problem by carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm, and we provided insights for the next steps of automated genome assembly development. We improved the reward system and optimized the exploration of the state space based on pruning and in collaboration with evolutionary computing. We tested the new approaches on 23 new larger environments, which are all available on the internet. Our results suggest consistent performance progress; however, we also found limitations, especially concerning the high dimensionality of state and action spaces. Finally, we discuss paths for achieving efficient and automated genome assembly in real scenarios considering successful RL applications - including deep reinforcement learning.
Submitted 7 March, 2024; v1 submitted 2 February, 2021;
originally announced February 2021.
-
The Human Cell Atlas White Paper
Authors:
Aviv Regev,
Sarah Teichmann,
Orit Rozenblatt-Rosen,
Michael Stubbington,
Kristin Ardlie,
Ido Amit,
Paola Arlotta,
Gary Bader,
Christophe Benoist,
Moshe Biton,
Bernd Bodenmiller,
Benoit Bruneau,
Peter Campbell,
Mary Carmichael,
Piero Carninci,
Leslie Castelo-Soccio,
Menna Clatworthy,
Hans Clevers,
Christian Conrad,
Roland Eils,
Jeremy Freeman,
Lars Fugger,
Berthold Goettgens,
Daniel Graham,
Anna Greka
, et al. (56 additional authors not shown)
Abstract:
The Human Cell Atlas (HCA) will be made up of comprehensive reference maps of all human cells - the fundamental units of life - as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease. It will help scientists understand how genetic variants impact disease risk, define drug toxicities, discover better therapies, and advance regenerative medicine. A resource of such ambition and scale should be built in stages, increasing in size, breadth, and resolution as technologies develop and understanding deepens. We will therefore pursue Phase 1 as a suite of flagship projects in key tissues, systems, and organs. We will bring together experts in biology, medicine, genomics, technology development and computation (including data analysis, software engineering, and visualization). We will also need standardized experimental and computational methods that will allow us to compare diverse cell and tissue types - and samples across human communities - in consistent ways, ensuring that the resulting resource is truly global.
This document, the first version of the HCA White Paper, was written by experts in the field with feedback and suggestions from the HCA community, gathered during recent international meetings. The White Paper, released at the close of this yearlong planning process, will be a living document that evolves as the HCA community provides additional feedback, as technological and computational advances are made, and as lessons are learned during the construction of the atlas.
Submitted 11 October, 2018;
originally announced October 2018.
-
Robust chemical solver for fully-implicit simulations
Authors:
McNeece Colin,
Raynaud Xavier,
Nilsen Halvor,
Hesse Marc
Abstract:
The study of geological systems requires the solution of complex geochemical relations. We present an implementation of a chemical solver which can handle various types of models, including surface chemistry. The implementation is done in view of easy coupling with flow simulations to obtain a fully-coupled, fully-implicit solver for chemical reaction transport equations applicable to realistic reservoir models.
Submitted 8 June, 2018;
originally announced June 2018.
-
Comparison between cell-centered and nodal based discretization schemes for linear elasticity
Authors:
Nilsen Halvor,
Nordbotten Jan,
Raynaud Xavier
Abstract:
In this paper we study newly developed methods for linear elasticity on polyhedral meshes. Our emphasis is on applications of the methods to geological models. Models of the subsurface, and in particular sedimentary rocks, naturally lead to general polyhedral meshes. Numerical methods which can directly handle such representations are highly desirable. Many of the numerical challenges in simulation of subsurface applications come from the lack of robustness and accuracy of numerical methods in the case of highly distorted grids. In this paper we investigate and compare the Multi-Point Stress Approximation (MPSA) and the Virtual Element Method (VEM) with regard to grid features that are frequently seen in geological models and likely to lead to a lack of accuracy of the methods. In particular we look at how the methods perform near the incompressible limit. This work shows that both methods are promising for flexible modeling of subsurface mechanics.
Submitted 28 April, 2016;
originally announced April 2016.
-
Causa Efficiens versus Causa Formalis: origens da discussão moderna sobre a dimensionalidade do espaço
Authors:
Francisco Caruso,
Roberto Moreira Xavier
Abstract:
Metascientific criteria used for explaining or constraining physical space dimensionality and their historical relationship to prevailing causal systems are discussed. The important contributions by Aristotle, Kant and Ehrenfest to the dimensionality of space problem are considered and shown to be grounded on different causal explanations: causa materialis for Aristotle, causa efficiens for young Kant and an ingenious combination of causa efficiens and causa formalis for Ehrenfest. The prominent and growing rôle played by causa formalis in modern physical approaches to this problem is emphasized.
Submitted 3 May, 2015;
originally announced May 2015.
-
Lipschitz metric for the two-component Camassa--Holm system
Authors:
Grunert Katrin,
Holden Helge,
Raynaud Xavier
Abstract:
We construct a Lipschitz metric for conservative solutions of the Cauchy problem on the line for the two-component Camassa--Holm system $u_t-u_{txx}+3uu_x-2u_xu_{xx}-uu_{xxx}+ρρ_x=0$, and $ρ_t+(uρ)_x=0$ with given initial data $(u_0, ρ_0)$. The Lipschitz metric $d_{\D^M}$ has the property that for two solutions $z(t)=(u(t),ρ(t),μ_t)$ and $\tilde z(t)=(\tilde u(t),\tilde ρ(t),\tilde μ_t)$ of the system we have $d_{\D^M}(z(t),\tilde z(t))\le C_{M,T} d_{\D^M}(z_0,\tilde z_0)$ for $t\in[0,T]$. Here the measure $μ_t$ is such that its absolutely continuous part equals the energy $(u^2+u_x^2+ρ^2)(t)dx$, and the solutions are restricted to a ball of radius $M$.
Submitted 28 June, 2013;
originally announced June 2013.
-
On the Physical Problem of Spatial Dimensions: An Alternative Procedure to Stability Arguments
Authors:
Francisco Caruso,
Roberto Moreira Xavier
Abstract:
Why is space 3-dimensional? The first answer to this question, entirely based on Physics, was given by Ehrenfest, in 1917, who showed that the stability requirement for an $n$-dimensional two-body planetary system very strongly constrains space dimensionality, favoring 3-d. This kind of approach will be generically called the "stability postulate" throughout this paper and was shown by Tangherlini, in 1963, to be still valid in the framework of general relativity as well as for the quantum mechanical hydrogen atom, giving the same constraint for space dimensionality. In the present work, before criticizing this methodology, a brief discussion is introduced, aimed at stressing and clarifying some general physical aspects of the problem of how to determine the number of space dimensions. Then, the epistemological consequences of Ehrenfest's methodology are critically reviewed. An alternative procedure to get at the proper number of dimensions, in which the stability postulate (and the implicit singularities in three-dimensional physics) are not an essential part of the argument, is proposed. In this way, the main epistemological problems contained in Ehrenfest's original idea are avoided. The alternative methodology proposed in this paper is realized by obtaining and discussing the $n$-dimensional quantum theory as expressed in Planck's law, the de Broglie relation and the Heisenberg uncertainty relation. As a consequence, it is possible to propose an experiment, based on thermal neutron diffraction by crystals, to directly measure space dimensionality. Finally, the distinguished role of Maxwell's electromagnetic theory in the determination of space dimensionality is stressed.
Submitted 22 May, 2012;
originally announced May 2012.
-
On Kant's first insight into the problem of space dimensionality and its physical foundations
Authors:
Francisco Caruso,
Roberto Moreira Xavier
Abstract:
In this article it is shown that a careful analysis of Kant's "Thoughts on the True Estimation of Living Forces" leads to a conclusion that does not match the usually accepted interpretation of Kant's reasoning in 1747, according to which the Young Kant supposedly establishes a relationship between the tridimensionality of space and Newton's law of universal gravitation. Indeed, it is argued that this text does not yield a satisfactory explanation of space dimensionality, actually restricting itself to justifying the tridimensionality of extension.
Submitted 25 April, 2015; v1 submitted 20 July, 2009;
originally announced July 2009.
-
Fine Structure Constants in n-dimensional Physical Spaces through Dimensional Analysis
Authors:
Fabricio Casarejos,
Jaime F. Villas da Rocha,
Roberto Moreira Xavier
Abstract:
We use the Vaschy-Buckingham Theorem as a systematic tool to build univocal n-dimensional extensions of the electric and gravitational fine structure constants and show that their ratio is dimensionally invariant. The results allow us to obtain the relative standard uncertainty on the three-dimensionality of space as $1.08 \times 10^{-13}$.
Submitted 7 September, 2003;
originally announced September 2003.