Showing 1–7 of 7 results for author: Smyth, K

  1. arXiv:2010.04665  [pdf, other]

    cs.CL cs.IR

    Scaling Systematic Literature Reviews with Machine Learning Pipelines

    Authors: Seraphina Goldfarb-Tarrant, Alexander Robertson, Jasmina Lazic, Theodora Tsouloufi, Louise Donnison, Karen Smyth

    Abstract: Systematic reviews, which entail the extraction of data from large numbers of scientific documents, are an ideal avenue for the application of machine learning. They are vital to many fields of science and philanthropy, but are very time-consuming and require experts. Yet the three main stages of a systematic review are easily done automatically: searching for documents can be done via APIs and sc…

    Submitted 9 October, 2020; originally announced October 2020.

    Comments: In EMNLP 2020 Scholarly Document Processing Workshop

  2. arXiv:1603.06687  [pdf, other]

    stat.CO

    statmod: Probability Calculations for the Inverse Gaussian Distribution

    Authors: Göknur Giner, Gordon K. Smyth

    Abstract: The inverse Gaussian distribution (IGD) is a well known and often used probability distribution for which fully reliable numerical algorithms have not been available. Our aim in this article is to develop software for this distribution for the R programming environment. We develop fast, reliable basic probability functions (dinvgauss, pinvgauss, qinvgauss and rinvgauss) that work for all possible…

    Submitted 27 July, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

    Comments: 18 pages, 2 figures. Accepted for publication in The R Journal, Volume 8 (2016)

    MSC Class: 60-04

  3. Permutation p-values should never be zero: calculating exact p-values when permutations are randomly drawn

    Authors: Belinda Phipson, Gordon K. Smyth

    Abstract: Permutation tests are amongst the most commonly used statistical tools in modern genomic research, a process by which p-values are attached to a test statistic by randomly permuting the sample or gene labels. Yet permutation p-values published in the genomic literature are often computed incorrectly, understated by about 1/m, where m is the number of permutations. The same is often true in the mor…

    Submitted 18 March, 2016; originally announced March 2016.

    Comments: 12 pages, 2 figures

    MSC Class: 62G09; 62G10

    Journal ref: Stat. Appl. Genet. Molec. Biol., Volume 9 (2010), Issue 1, Article 39

  4. arXiv:1602.08678  [pdf, other]

    stat.AP q-bio.GN

    Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression

    Authors: Belinda Phipson, Stanley Lee, Ian J. Majewski, Warren S. Alexander, Gordon K. Smyth

    Abstract: One of the most common analysis tasks in genomic research is to identify genes that are differentially expressed (DE) between experimental conditions. Empirical Bayes (EB) statistical tests using moderated genewise variances have been very effective for this purpose, especially when the number of biological replicate samples is small. The EB procedures can however be heavily influenced by a small…

    Submitted 27 July, 2016; v1 submitted 28 February, 2016; originally announced February 2016.

    Comments: 23 pages, 4 figures

    MSC Class: 62F35 (primary); 62P10 (secondary)

    Journal ref: Ann. Appl. Stat., Volume 10, Number 2 (2016), 946-963

  5. Assessing Technical Performance in Differential Gene Expression Experiments with External Spike-in RNA Control Ratio Mixtures

    Authors: Sarah A. Munro, Steve P. Lund, P. Scott Pine, Hans Binder, Djork-Arné Clevert, Ana Conesa, Joaquin Dopazo, Mario Fasold, Sepp Hochreiter, Huixiao Hong, Nederah Jafari, David P. Kreil, Paweł P. Łabaj, Sheng Li, Yang Liao, Simon Lin, Joseph Meehan, Christopher E. Mason, Javier Santoyo, Robert A. Setterquist, Leming Shi, Wei Shi, Gordon K. Smyth, Nancy Stralis-Pavese, Zhenqiang Su, et al. (8 additional authors not shown)

    Abstract: There is a critical need for standard approaches to assess, report, and compare the technical performance of genome-scale differential gene expression experiments. We assess technical performance with a proposed "standard" dashboard of metrics derived from analysis of external spike-in RNA control ratio mixtures. These control ratio mixtures with defined abundance ratios enable assessment of diagn…

    Submitted 18 June, 2014; originally announced June 2014.

    Comments: 65 pages, 6 Main Figures, 33 Supplementary Figures

    Journal ref: Nat. Commun. (2014) 5:5125

  6. featureCounts: An efficient general-purpose program for assigning sequence reads to genomic features

    Authors: Yang Liao, Gordon K Smyth, Wei Shi

    Abstract: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a g…

    Submitted 14 November, 2013; v1 submitted 14 May, 2013; originally announced May 2013.

    Comments: This manuscript has now been published in Bioinformatics: Yang Liao, Gordon K Smyth and Wei Shi. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics 2013

    Journal ref: Bioinformatics 30 (2014), 923-930

  7. Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

    Authors: Simon Anders, Davis J. McCarthy, Yunshun Chen, Michal Okoniewski, Gordon K. Smyth, Wolfgang Huber, Mark D. Robinson

    Abstract: RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations), while optionally adjusting for other systematic factors that affect the data collection p…

    Submitted 20 June, 2013; v1 submitted 15 February, 2013; originally announced February 2013.