Showing 1–2 of 2 results for author: Aristarán, M

Searching in archive cs.
  1. arXiv:2409.14524  [pdf, other]

    cs.IR cs.DL

    tabulapdf: An R Package to Extract Tables from PDF Documents

    Authors: Mauricio Vargas Sepúlveda, Thomas J. Leeper, Tom Paskhalis, Manuel Aristarán, Jeremy B. Merrill, Mike Tigas

    Abstract: tabulapdf is an R package that uses the Tabula Java library to import tables from PDF files directly into R. This tool can reduce time and effort in data extraction processes in fields like investigative journalism. It allows for automatic and manual table extraction, the latter facilitated through a Shiny interface that lets users select table areas with a computer mouse for data retrieval.

    Submitted 25 August, 2024; originally announced September 2024.

    Comments: 10 pages, 1 figure

    ACM Class: H.3.3

  2. arXiv:1602.08409  [pdf]

    cs.DL cs.SI physics.soc-ph

    The Research Space: using the career paths of scholars to predict the evolution of the research output of individuals, institutions, and nations

    Authors: Miguel R. Guevara, Dominik Hartmann, Manuel Aristarán, Marcelo Mendoza, César A. Hidalgo

    Abstract: In recent years scholars have built maps of science by connecting the academic fields that cite each other, are cited together, or that cite a similar literature. But since scholars cannot always publish in the fields they cite, or that cite them, these science maps are only rough proxies for the potential of a scholar, organization, or country, to enter a new academic field. Here we use a large d…

    Submitted 14 April, 2016; v1 submitted 26 February, 2016; originally announced February 2016.