-
AutoWISP: Automated Processing of Wide-Field Color Images
Authors:
Angel E. Romero,
Kaloyan Penev,
S. Javad Jafarzadeh,
Zoltan Csubry,
Joel D. Hartman,
Gaspar A. Bakos
Abstract:
We have developed a software pipeline, AutoWISP, for extracting high-precision photometry from citizen scientists' observations made with consumer-grade color digital cameras (digital single-lens reflex, or DSLR, cameras), based on our previously developed tool, AstroWISP. The new pipeline is designed to convert these observations, including color images, into high-precision light curves of stars. We outline the individual steps of the pipeline and present a case study using a Sony-alpha 7R II DSLR camera, demonstrating sub-percent photometric precision and highlighting the benefits of three-color photometry of stars. Project PANOPTES will adopt this photometric pipeline, and we hope it will be used by citizen scientists worldwide. Our aim is for AutoWISP to pave the way for potentially transformative contributions from citizen scientists with access to observing equipment.
Submitted 21 July, 2025;
originally announced July 2025.
-
An expanded evaluation of protein function prediction methods shows an improvement in accuracy
Authors:
Yuxiang Jiang,
Tal Ronnen Oron,
Wyatt T Clark,
Asma R Bankapur,
Daniel D'Andrea,
Rosalba Lepore,
Christopher S Funk,
Indika Kahanda,
Karin M Verspoor,
Asa Ben-Hur,
Emily Koo,
Duncan Penfold-Brown,
Dennis Shasha,
Noah Youngs,
Richard Bonneau,
Alexandra Lin,
Sayed ME Sahraeian,
Pier Luigi Martelli,
Giuseppe Profiti,
Rita Casadio,
Renzhi Cao,
Zhaolong Zhong,
Jianlin Cheng,
Adrian Altenhoff,
Nives Skunca
, et al. (122 additional authors not shown)
Abstract:
Background: The increasing volume and variety of genotypic and phenotypic data is a major defining characteristic of modern biomedical sciences. At the same time, the limitations in technology for generating data and the inherently stochastic nature of biomolecular events have led to a discrepancy between the volume of data and the amount of knowledge gleaned from it. A major bottleneck in our ability to understand the molecular underpinnings of life is the assignment of function to biological macromolecules, especially proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, accurately assessing methods for protein function prediction and tracking progress in the field remain challenging. Methodology: We have conducted the second Critical Assessment of Functional Annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. One hundred twenty-six methods from 56 research groups were evaluated for their ability to predict biological functions using the Gene Ontology and gene-disease associations using the Human Phenotype Ontology on a set of 3,681 proteins from 18 species. CAFA2 featured significantly expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis also compared the best methods participating in CAFA1 to those of CAFA2. Conclusions: The top performing methods in CAFA2 outperformed the best methods from CAFA1, demonstrating that computational function prediction is improving. This increased accuracy can be attributed to the combined effect of the growing number of experimental annotations and improved methods for function prediction.
Submitted 2 January, 2016;
originally announced January 2016.
-
How distant is the ideal filter of being a causal one?
Authors:
J. M. Almira,
A. E. Romero
Abstract:
In this paper the characterization as convolution operators of filters sending finite energy signals to bounded signals is used to prove several theoretical results concerning the distance between the ideal filter and the spaces of physically realizable filters. Both the analog and the digital cases are studied and the formulas for the distance and the angle between the filters in each case are also given.
Submitted 26 February, 2015;
originally announced February 2015.
-
Scientific impact evaluation and the effect of self-citations: mitigating the bias by discounting h-index
Authors:
Emilio Ferrara,
Alfonso E. Romero
Abstract:
In this paper, we propose a measure to assess scientific impact that discounts self-citations and does not require any prior knowledge of their distribution among publications. This index can be applied to both researchers and journals. In particular, we show that it fills the gap left by the h-index and similar measures, which do not take into account the effect of self-citations when evaluating the impact of authors or journals. The paper provides two real-world examples: in the first, we evaluate the research impact of the most productive scholars in Computer Science (according to DBLP); in the second, we revisit the impact of the journals ranked in the 'Computer Science Applications' section of SCImago. We observe how self-citations, in many cases, affect the rankings obtained according to different measures (including h-index and ch-index), and show how the proposed measure mitigates this effect.
Submitted 16 October, 2013; v1 submitted 14 February, 2012;
originally announced February 2012.
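For readers unfamiliar with the h-index mentioned in the abstract above, the following is a minimal sketch of the standard h-index and of a naive variant that subtracts self-citations before ranking. It is not the measure proposed in the paper (which, per the abstract, needs no prior knowledge of how self-citations are distributed), nor the ch-index; the function names and the `(total, self)` input format are assumptions made for illustration only.

```python
def h_index(citation_counts):
    """Standard h-index: the largest h such that at least h papers
    have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_without_self_citations(papers):
    """Naive illustration (NOT the paper's index): subtract the known
    self-citations of each paper, then compute the standard h-index.

    `papers` is a hypothetical list of (total_citations, self_citations).
    """
    adjusted = [max(total - self_cites, 0) for total, self_cites in papers]
    return h_index(adjusted)


# Hypothetical citation record: (total citations, of which self-citations).
papers = [(10, 2), (8, 5), (5, 1), (4, 3), (3, 0)]
plain = h_index([total for total, _ in papers])
discounted = h_index_without_self_citations(papers)
print(plain, discounted)
```

The gap between the two values in this toy record shows how self-citations can inflate a ranking, which is the bias the paper sets out to mitigate.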
-
A probabilistic methodology for multilabel classification
Authors:
Alfonso E. Romero,
Luis M. de Campos
Abstract:
Multilabel classification is a relatively recent subfield of machine learning. Unlike the classical approach, where instances are labeled with only one category, in multilabel classification an arbitrary number of categories is chosen to label an instance. Because of the problem's complexity (the solution is one among an exponential number of alternatives), a common solution (the binary method) is frequently used: a binary classifier is learned for every category, and all of them are combined afterwards. The assumption made in this solution is not realistic, and in this work we give examples where the decisions for the labels are not taken independently; a supervised approach should therefore learn the existing relationships among categories to make a better classification. We present a generic methodology that can improve the results obtained by a set of independent probabilistic binary classifiers, by using a combination procedure with a classifier trained on the co-occurrences of the labels. We report exhaustive experiments on three standard corpora of labeled documents (Reuters-21578, Ohsumed-23 and RCV1), which show noticeable improvements on all of them when using our methodology with three probabilistic base classifiers.
Submitted 28 February, 2013; v1 submitted 23 January, 2012;
originally announced January 2012.