-
Second Order Operators in the NASA Astrophysics Data System
Authors:
Michael J. Kurtz,
Roman Chyla
Abstract:
Second Order Operators (SOOs) are database functions which form secondary queries based on attributes of the objects returned in an initial query; they can provide powerful methods to investigate complex, multipartite information graphs. The NASA Astrophysics Data System (ADS) has implemented four SOOs, reviews, useful, trending, and similar which use the citations, references, downloads, and abst…
▽ More
Second Order Operators (SOOs) are database functions which form secondary queries based on attributes of the objects returned in an initial query; they can provide powerful methods to investigate complex, multipartite information graphs. The NASA Astrophysics Data System (ADS) has implemented four SOOs, reviews, useful, trending, and similar which use the citations, references, downloads, and abstract text.
This tutorial describes these operators in detail, both alone and in conjunction with other functions. It is intended for scientists and others who wish to make fuller use of the ADS database. Basic knowledge of the ADS is assumed.
△ Less
Submitted 3 October, 2020;
originally announced October 2020.
-
Merging the Astrophysics and Planetary Science Information Systems
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin A. Henneken
Abstract:
Conceptually exoplanet research has one foot in the discipline of Astrophysics and the other foot in Planetary Science. Research strategies for exoplanets will require efficient access to data and information from both realms. Astrophysics has a sophisticated, well integrated, distributed information system with archives and data centers which are interlinked with the technical literature via the…
▽ More
Conceptually exoplanet research has one foot in the discipline of Astrophysics and the other foot in Planetary Science. Research strategies for exoplanets will require efficient access to data and information from both realms. Astrophysics has a sophisticated, well integrated, distributed information system with archives and data centers which are interlinked with the technical literature via the Astrophysics Data System (ADS). The information system for Planetary Science does not have a central component linking the literature with the observational and theoretical data. Here we propose that the Committee on an Exoplanet Science Strategy recommend that this linkage be built, with the ADS playing the role in Planetary Science which it already plays in Astrophysics. This will require additional resources for the ADS, and the Planetary Data System (PDS), as well as other international collaborators
△ Less
Submitted 9 March, 2018;
originally announced March 2018.
-
Advice from the Oracle: Really Intelligent Information Retrieval
Authors:
Michael J. Kurtz
Abstract:
What is "intelligent" information retrieval? Essentially this is asking what is intelligence, in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as ment…
▽ More
What is "intelligent" information retrieval? Essentially this is asking what is intelligence, in this article I will attempt to show some of the aspects of human intelligence, as related to information retrieval. I will do this by the device of a semi-imaginary Oracle. Every Observatory has an oracle, someone who is a distinguished scientist, has great administrative responsibilities, acts as mentor to a number of less senior people, and as trusted advisor to even the most accomplished scientists, and knows essentially everyone in the field. In an appendix I will present a brief summary of the Statistical Factor Space method for text indexing and retrieval, and indicate how it will be used in the Astrophysics Data System Abstract Service. 2018 Keywords: Personal Digital Assistant; Supervised Topic Models
△ Less
Submitted 2 January, 2018;
originally announced January 2018.
-
Comparing People with Bibliometrics
Authors:
Michael J. Kurtz
Abstract:
Bibliometric indicators, citation counts and/or download counts are increasingly being used to inform personnel decisions such as hiring or promotions. These statistics are very often misused. Here we provide a guide to the factors which should be considered when using these so-called quantitative measures to evaluate people. Rules of thumb are given for when begin to use bibliometric measures whe…
▽ More
Bibliometric indicators, citation counts and/or download counts are increasingly being used to inform personnel decisions such as hiring or promotions. These statistics are very often misused. Here we provide a guide to the factors which should be considered when using these so-called quantitative measures to evaluate people. Rules of thumb are given for when begin to use bibliometric measures when comparing otherwise similar candidates.
△ Less
Submitted 31 July, 2017;
originally announced July 2017.
-
Usage Bibliometrics as a Tool to Measure Research Activity
Authors:
Edwin A. Henneken,
Michael J. Kurtz
Abstract:
Measures for research activity and impact have become an integral ingredient in the assessment of a wide range of entities (individual researchers, organizations, instruments, regions, disciplines). Traditional bibliometric indicators, like publication and citation based indicators, provide an essential part of this picture, but cannot describe the complete picture. Since reading scholarly publica…
▽ More
Measures for research activity and impact have become an integral ingredient in the assessment of a wide range of entities (individual researchers, organizations, instruments, regions, disciplines). Traditional bibliometric indicators, like publication and citation based indicators, provide an essential part of this picture, but cannot describe the complete picture. Since reading scholarly publications is an essential part of the research life cycle, it is only natural to introduce measures for this activity in attempts to quantify the efficiency, productivity and impact of an entity. Citations and reads are significantly different signals, so taken together, they provide a more complete picture of research activity. Most scholarly publications are now accessed online, making the study of reads and their patterns possible. Click-stream logs allow us to follow information access by the entire research community, real-time. Publication and citation datasets just reflect activity by authors. In addition, download statistics will help us identify publications with significant impact, but which do not attract many citations. Click-stream signals are arguably more complex than, say, citation signals. For one, they are a superposition of different classes of readers. Systematic downloads by crawlers also contaminate the signal, as does browsing behavior. We discuss the complexities associated with clickstream data and how, with proper filtering, statistically significant relations and conclusions can be inferred from download statistics. We describe how download statistics can be used to describe research activity at different levels of aggregation, ranging from organizations to countries. These statistics show a correlation with socio-economic indicators. A comparison will be made with traditional bibliometric indicators. We will argue that astronomy is representative of more general trends.
△ Less
Submitted 7 June, 2017;
originally announced June 2017.
-
Measuring Metrics - A forty year longitudinal cross-validation of citations, downloads, and peer review in Astrophysics
Authors:
Michael J. Kurtz,
Edwin A. Henneken
Abstract:
Citation measures, and newer altmetric measures such as downloads are now commonly used to inform personnel decisions. How well do or can these measures measure or predict the past, current of future scholarly performance of an individual? Using data from the Smithsonian/NASA Astrophysics Data System we analyze the publication, citation, download, and distinction histories of a cohort of 922 indiv…
▽ More
Citation measures, and newer altmetric measures such as downloads are now commonly used to inform personnel decisions. How well do or can these measures measure or predict the past, current of future scholarly performance of an individual? Using data from the Smithsonian/NASA Astrophysics Data System we analyze the publication, citation, download, and distinction histories of a cohort of 922 individuals who received a U.S. PhD in astronomy in the period 1972-1976. By examining the same and different measures at the same and different times for the same individuals we are able to show the capabilities and limitations of each measure. Because the distributions are lognormal measurement uncertainties are multiplicative; we show that in order to state with 95% confidence that one person's citations and/or downloads are significantly higher than another person's, the log difference in the ratio of counts must be at least 0.3 dex, which corresponds to a multiplicative factor of two.
△ Less
Submitted 30 October, 2015;
originally announced October 2015.
-
A measure of total research impact independent of time and discipline
Authors:
Alberto Pepe,
Michael J. Kurtz
Abstract:
Authorship and citation practices evolve with time and differ by academic discipline. As such, indicators of research productivity based on citation records are naturally subject to historical and disciplinary effects. We observe these effects on a corpus of astronomer career data constructed from a database of refereed publications. We employ a simple mechanism to measure research output using au…
▽ More
Authorship and citation practices evolve with time and differ by academic discipline. As such, indicators of research productivity based on citation records are naturally subject to historical and disciplinary effects. We observe these effects on a corpus of astronomer career data constructed from a database of refereed publications. We employ a simple mechanism to measure research output using author and reference counts available in bibliographic databases to develop a citation-based indicator of research productivity. The total research impact (tori) quantifies, for an individual, the total amount of scholarly work that others have devoted to his/her work, measured in the volume of research papers. A derived measure, the research impact quotient (riq), is an age independent measure of an individual's research ability. We demonstrate that these measures are substantially less vulnerable to temporal debasement and cross-disciplinary bias than the most popular current measures. The proposed measures of research impact, tori and riq, have been implemented in the Smithsonian/NASA Astrophysics Data System.
△ Less
Submitted 10 September, 2012;
originally announced September 2012.
-
Finding and Recommending Scholarly Articles
Authors:
Michael J. Kurtz,
Edwin A. Henneken
Abstract:
The rate at which scholarly literature is being produced has been increasing at approximately 3.5 percent per year for decades. This means that during a typical 40 year career the amount of new literature produced each year increases by a factor of four. The methods scholars use to discover relevant literature must change. Just like everybody else involved in information discovery, scholars are co…
▽ More
The rate at which scholarly literature is being produced has been increasing at approximately 3.5 percent per year for decades. This means that during a typical 40 year career the amount of new literature produced each year increases by a factor of four. The methods scholars use to discover relevant literature must change. Just like everybody else involved in information discovery, scholars are confronted with information overload. Two decades ago, this discovery process essentially consisted of paging through abstract books, talking to colleagues and librarians, and browsing journals. A time-consuming process, which could even be longer if material had to be shipped from elsewhere. Now much of this discovery process is mediated by online scholarly information systems. All these systems are relatively new, and all are still changing. They all share a common goal: to provide their users with access to the literature relevant to their specific needs. To achieve this each system responds to actions by the user by displaying articles which the system judges relevant to the user's current needs. Recently search systems which use particularly sophisticated methodologies to recommend a few specific papers to the user have been called "recommender systems". These methods are in line with the current use of the term "recommender system" in computer science. We do not adopt this definition, rather we view systems like these as components in a larger whole, which is presented by the scholarly information systems themselves. In what follows we view the recommender system as an aspect of the entire information system; one which combines the massive memory capacities of the machine with the cognitive abilities of the human user to achieve a human-machine synergy.
△ Less
Submitted 6 September, 2012;
originally announced September 2012.
-
Usage Bibliometrics
Authors:
Michael J. Kurtz,
Johan Bollen
Abstract:
Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing and analysis of usage data remains an area of active research. This article provides a review of the state-of-the-art in usage-based informetric, i.e. the use of usage data to study the scholarly process.
Scholarly usage data provides unique opportunities to address the known shortcomings of citation analysis. However, the collection, processing and analysis of usage data remains an area of active research. This article provides a review of the state-of-the-art in usage-based informetric, i.e. the use of usage data to study the scholarly process.
△ Less
Submitted 14 February, 2011;
originally announced February 2011.
-
The Emerging Scholarly Brain
Authors:
Michael J. Kurtz
Abstract:
It is now a commonplace observation that human society is becoming a coherent super-organism, and that the information infrastructure forms its emerging brain. Perhaps, as the underlying technologies are likely to become billions of times more powerful than those we have today, we could say that we are now building the lizard brain for the future organism.
It is now a commonplace observation that human society is becoming a coherent super-organism, and that the information infrastructure forms its emerging brain. Perhaps, as the underlying technologies are likely to become billions of times more powerful than those we have today, we could say that we are now building the lizard brain for the future organism.
△ Less
Submitted 4 August, 2010;
originally announced August 2010.
-
Using Multipartite Graphs for Recommendation and Discovery
Authors:
Michael J. Kurtz,
Alberto Accomazzi,
Edwin Henneken,
Giovanni Di Milia,
Carolyn S. Grant
Abstract:
The Smithsonian/NASA Astrophysics Data System exists at the nexus of a dense system of interacting and interlinked information networks. The syntactic and the semantic content of this multipartite graph structure can be combined to provide very specific research recommendations to the scientist/user.
The Smithsonian/NASA Astrophysics Data System exists at the nexus of a dense system of interacting and interlinked information networks. The syntactic and the semantic content of this multipartite graph structure can be combined to provide very specific research recommendations to the scientist/user.
△ Less
Submitted 30 December, 2009;
originally announced December 2009.
-
The Bibliometric Properties of Article Readership Information
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Markus Demleitner,
Stephen S. Murray,
Nathalie Martimbeau,
Barbara Elwell
Abstract:
The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed on-line digital library which has become the dominant means by which astronomers search, access and read their technical literature. Digital libraries such as the NASA Astrophysics Data System permit the easy accumulation of a new type of bibliome…
▽ More
The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed on-line digital library which has become the dominant means by which astronomers search, access and read their technical literature. Digital libraries such as the NASA Astrophysics Data System permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses (``reads'') of individual articles. We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads, and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences between the citation rate, the readership rate and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read-cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read-cite diagram is a function of age, innate productivity, and work history. We show the age biases of both reads and cites, and develop two new bibliometric measures which have substantially less age bias than citations
△ Less
Submitted 25 September, 2009;
originally announced September 2009.
-
Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Markus Demleitner,
Stephen S. Murray
Abstract:
By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries.
Using the ADS usage logs along with membership statistics from the International Astronomical Union and data o…
▽ More
By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries.
Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP) we develop an accurate model for world-wide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP.
We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 FTE researchers, or $250 Million, or the astronomical research done in France.
Subject headings: digital libraries; bibliometrics; sociology of science; information retrieval
△ Less
Submitted 25 September, 2009;
originally announced September 2009.