-
Are Widely Known Findings Easier to Retract?
Authors:
Shahan Ali Memon,
Jevin D. West,
Cailin O'Connor
Abstract:
Failures of retraction are common in science. Why do these failures occur? And, relatedly, what makes findings harder or easier to retract? We use data from Microsoft Academic Graph, Retraction Watch, and Altmetric -- including retracted papers, citation records, and Altmetric scores and mentions -- to test recently proposed answers to these questions. A recent study by LaCroix et al. employs simple network models to argue that the social spread of scientific information helps explain failures of retraction. One prediction of their models is that widely known or well-established results should, surprisingly, be easier to retract, since their retraction is relevant to more scientists. Our results support this conclusion. We find that highly cited papers show more significant reductions in citations after retraction and garner more attention to their retractions as they occur.
Submitted 21 April, 2025;
originally announced April 2025.
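The abstract compares how citations fall after retraction for highly cited versus rarely cited papers but does not describe the estimator. As a minimal sketch, assuming per-paper yearly citation counts and a known retraction year (the function name `relative_citation_change`, the 3-year window, and both citation histories are hypothetical), a simple before/after comparison looks like this:

```python
from statistics import mean


def relative_citation_change(yearly_citations, retraction_year, window=3):
    """Relative change in mean annual citations, comparing the `window` years
    after the retraction year with the `window` years before it.
    The retraction year itself is excluded; missing years count as 0."""
    before = [yearly_citations.get(y, 0)
              for y in range(retraction_year - window, retraction_year)]
    after = [yearly_citations.get(y, 0)
             for y in range(retraction_year + 1, retraction_year + 1 + window)]
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("no citations in the pre-retraction window")
    return (mean(after) - baseline) / baseline


# Hypothetical citation histories (year -> citations), both retracted in 2013.
highly_cited = {2010: 100, 2011: 120, 2012: 110, 2013: 90,
                2014: 40, 2015: 30, 2016: 20}
rarely_cited = {2010: 5, 2011: 4, 2012: 6, 2013: 5,
                2014: 4, 2015: 4, 2016: 4}

print(round(relative_citation_change(highly_cited, 2013), 3))  # -0.727
print(round(relative_citation_change(rarely_cited, 2013), 3))  # -0.2
```

A naive before/after ratio ignores the ordinary decline of citations with age, so a real analysis would need a comparison baseline such as matched non-retracted papers.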
-
From job titles to jawlines: Using context voids to study generative AI systems
Authors:
Shahan Ali Memon,
Soham De,
Sungha Kang,
Riyan Mujtaba,
Bedoor AlShebli,
Katie Davis,
Jaime Snyder,
Jevin D. West
Abstract:
In this paper, we introduce a speculative design methodology for studying the behavior of generative AI systems, framing design as a mode of inquiry. We propose bridging seemingly unrelated domains to generate intentional context voids, using these tasks as probes to elicit AI model behavior. We demonstrate this through a case study: probing the ChatGPT system (GPT-4 and DALL-E) to generate headshots from professional Curricula Vitae (CVs). In contrast to traditional approaches, our approach assesses system behavior under conditions of radical uncertainty -- when forced to invent entire swaths of missing context -- revealing subtle stereotypes and value-laden assumptions. We qualitatively analyze how the system interprets identity and competence markers from CVs, translating them into visual portraits despite the missing context (i.e., physical descriptors). We show that within this context void, the AI system generates biased representations, potentially relying on stereotypical associations or blatant hallucinations.
Submitted 16 April, 2025;
originally announced April 2025.
-
Insights from Network Science can advance Deep Graph Learning
Authors:
Christopher Blöcker,
Martin Rosvall,
Ingo Scholtes,
Jevin D. West
Abstract:
Deep graph learning and network science both analyze graphs but approach similar problems from different perspectives. Whereas network science focuses on models and measures that reveal the organizational principles of complex systems with explicit assumptions, deep graph learning focuses on flexible and generalizable models that learn patterns in graph data in an automated fashion. Despite these differences, both fields share the same goal: to better model and understand patterns in graph-structured data. Early efforts to integrate methods, models, and measures from network science and deep graph learning indicate significant untapped potential. In this position paper, we explore opportunities at their intersection. We discuss open challenges in deep graph learning, including data augmentation, improved evaluation practices, higher-order models, and pooling methods. Likewise, we highlight challenges in network science, including scaling to massive graphs, integrating continuous gradient-based optimization, and developing standardized benchmarks.
Submitted 3 February, 2025;
originally announced February 2025.
-
An Automated Explainable Educational Assessment System Built on LLMs
Authors:
Jiazheng Li,
Artem Bobrov,
David West,
Cesare Aloisi,
Yulan He
Abstract:
In this demo, we present AERA Chat, an automated and explainable educational assessment system designed for interactive and visual evaluations of student responses. This system leverages large language models (LLMs) to generate automated marking and rationale explanations, addressing the challenge of limited explainability in automated educational assessment and the high costs associated with annotation. Our system allows users to input questions and student answers, providing educators and researchers with insights into assessment accuracy and the quality of LLM-assessed rationales. Additionally, it offers advanced visualization and robust evaluation tools, enhancing its usability for educational assessment and facilitating efficient rationale verification. Our demo video can be found at https://youtu.be/qUSjz-sxlBc.
Submitted 17 December, 2024;
originally announced December 2024.
-
AERA Chat: An Interactive Platform for Automated Explainable Student Answer Assessment
Authors:
Jiazheng Li,
Artem Bobrov,
David West,
Cesare Aloisi,
Yulan He
Abstract:
Generating rationales that justify scoring decisions has emerged as a promising approach to enhance explainability in the development of automated scoring systems. However, the scarcity of publicly available rationale data and the high cost of annotation have resulted in existing methods typically relying on noisy rationales generated by large language models (LLMs). To address these challenges, we have developed AERA Chat, an interactive platform that provides visually explained assessments of student answers and streamlines the verification of rationales. Users can input questions and student answers to obtain automated, explainable assessment results from LLMs. The platform's innovative visualization features and robust evaluation tools make it useful for educators to assist their marking process, and for researchers to evaluate assessment performance and quality of rationales generated by different LLMs, or as a tool for efficient annotation. We evaluated three rationale generation approaches on our platform to demonstrate its capability.
Submitted 12 October, 2024;
originally announced October 2024.
-
Calibrating LLMs with Preference Optimization on Thought Trees for Generating Rationale in Science Question Scoring
Authors:
Jiazheng Li,
Hainiu Xu,
Zhaoyue Sun,
Yuxiang Zhou,
David West,
Cesare Aloisi,
Yulan He
Abstract:
Generating rationales that justify scoring decisions has been a promising way to facilitate explainability in automated scoring systems. However, existing methods do not match the accuracy of classifier-based methods. Moreover, the generated rationales often contain hallucinated information. To address these issues, we propose a novel framework capable of generating more faithful rationales and, more importantly, matching performance with classifier-based black-box scoring systems. We first mimic the human assessment process by querying Large Language Models (LLMs) to generate a thought tree. We then summarise intermediate assessment decisions from each thought tree path to create synthetic rationale data and rationale preference data. Finally, we utilise the generated synthetic data to calibrate LLMs through a two-step training process: supervised fine-tuning and preference optimization. Extensive experimental results demonstrate that our framework achieves a 38% assessment performance improvement in the QWK score compared to prior work while producing higher-quality rationales, as recognised by human evaluators and LLMs. Our work sheds light on the effectiveness of performing preference optimization using synthetic preference data obtained from thought tree paths. Data and code are available at https://github.com/lijiazheng99/thought_tree_assessment.
Submitted 12 October, 2024; v1 submitted 28 June, 2024;
originally announced June 2024.
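Results in this entry (and in the ChatGPT distillation entry below) are reported as QWK, the quadratic weighted kappa: Cohen's kappa with disagreement weights (i - j)^2 / (N - 1)^2 over N ordinal score levels. A minimal stdlib sketch of the standard definition follows; it is not the authors' evaluation code, and the score vectors are hypothetical.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Cohen's kappa with quadratic disagreement weights for integer scores
    in [min_rating, max_rating]. 1.0 = perfect agreement, 0.0 = chance level."""
    n = max_rating - min_rating + 1
    if n < 2:
        raise ValueError("need at least two rating levels")
    # Observed confusion matrix O[i][j]: true score i, predicted score j.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    total = sum(map(sum, observed))
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2            # quadratic penalty
            expected = hist_true[i] * hist_pred[j] / total  # chance agreement
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


print(quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 0, 3))  # 1.0
print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], 0, 1))  # 0.5
```

Because the penalty grows with the square of the score gap, predicting 0 for a 3-mark answer costs far more than predicting 2, which suits ordinal marking schemes.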
-
RIP Twitter API: A eulogy to its vast research contributions
Authors:
Ryan Murtfeldt,
Naomi Alterman,
Ihsan Kahveci,
Jevin D. West
Abstract:
Since 2006, Twitter's Application Programming Interface (API) has been a treasure trove of high-quality data for researchers studying everything from the spread of misinformation to social psychology and emergency management. However, in the spring of 2023, Twitter (now called X) began charging $42,000/month for its Enterprise access level, essentially a death knell for researcher use. Lacking sufficient funds to pay this monthly fee, academics are now scrambling to continue their research without this important data source. This study collects and tabulates the number of studies, number of citations, dates, major disciplines, and major topic areas of studies that used Twitter data between 2006 and 2023. While we cannot know for certain what will be lost now that Twitter data is cost-prohibitive, we can illustrate its research value during the time it was available. A search of 8 databases and 3 related APIs found that since 2006, a total of 27,453 studies have been published in 7,432 publication venues, with 1,303,142 citations, across 14 disciplines. Major disciplines include computational social science, engineering, data science, social media studies, public health, and medicine. Major topics include information dissemination, assessing the credibility of tweets, strategies for conducting data research, detecting and analyzing major events, and studying human behavior. The number of Twitter data studies increased every year from 2006 through 2022, but following Twitter's decision to begin charging for data in the spring of 2023, the number of studies published in 2023 decreased by 13% compared to 2022. We assume that much of the data used for studies published in 2023 was collected before Twitter shut down free API access, and thus the number of new studies is likely to decline further in subsequent years.
Submitted 10 April, 2024;
originally announced April 2024.
-
Echo Chambers in the Age of Algorithms: An Audit of Twitter's Friend Recommender System
Authors:
Kayla Duskin,
Joseph S. Schafer,
Jevin D. West,
Emma S. Spiro
Abstract:
The presence of political misinformation and ideological echo chambers on social media platforms is concerning given the important role that these sites play in the public's exposure to news and current events. Algorithmic systems employed on these platforms are presumed to play a role in these phenomena, but little is known about their mechanisms and effects. In this work, we conduct an algorithmic audit of Twitter's Who-To-Follow friend recommendation system, the first empirical audit that investigates the impact of this algorithm in situ. We create automated Twitter accounts that initially follow left- and right-affiliated U.S. politicians during the 2022 U.S. midterm elections and then grow their information networks using the platform's recommender system. We pair the experiment with an observational study of Twitter users who already follow the same politicians. Broadly, we find that while following the recommendation algorithm leads accounts into dense and reciprocal neighborhoods that structurally resemble echo chambers, the recommender also results in less political homogeneity of a user's network compared to accounts growing their networks through social endorsement. Furthermore, accounts that exclusively followed users recommended by the algorithm had fewer opportunities to encounter content centered on false or misleading election narratives compared to choosing friends based on social endorsement.
Submitted 9 April, 2024;
originally announced April 2024.
-
Unsupervised self-organising map of prostate cell Raman spectra shows disease-state subclustering
Authors:
Daniel West,
Susan Stepney,
Y. Hancock
Abstract:
Prostate cancer is a disease which poses an interesting clinical question: should it be treated? A small subset of prostate cancers are aggressive and require removal and treatment to prevent metastatic spread. However, conventional diagnostics struggle to risk-stratify such patients; hence, new approaches to biomolecularly subclassify the disease are needed. Here we use an unsupervised, self-organising map approach to analyse live-cell Raman spectroscopy data obtained from prostate cell lines; our aim is to test the feasibility of this method to differentiate cancer from normal cells at the single-cell level, using high-dimensional datasets with minimal preprocessing. The results demonstrate not only successful separation of normal prostate and cancer cells, but also a new subclustering of the prostate cancer cell line into two groups. Initial analysis of the spectra from each of the cancer subclusters demonstrates a differential expression of lipids, which, against the normal control, may be linked to disease-related changes in cellular signalling.
Submitted 12 March, 2024;
originally announced March 2024.
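To make the self-organising map idea concrete, here is a minimal pure-Python SOM: each node on a small grid holds a weight vector, the best-matching unit (BMU) for a sample is the node with the closest weight vector, and the BMU and its grid neighbours are pulled toward the sample with a shrinking Gaussian neighbourhood and decaying learning rate. The grid size, schedules, function names, and the two synthetic 4-dimensional "spectra" groups are all assumptions for illustration; this is not the authors' pipeline and uses no Raman data.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=30, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols self-organising map on a list of equal-length vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Initialise each node's weight vector with a copy of a random sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical toy "spectra": two well-separated groups of 4-dimensional vectors.
rng = random.Random(42)
cluster_a = [[rng.gauss(0.0, 0.3) for _ in range(4)] for _ in range(20)]
cluster_b = [[rng.gauss(5.0, 0.3) for _ in range(4)] for _ in range(20)]
weights = train_som(cluster_a + cluster_b, rows=3, cols=3)
bmus_a = {best_matching_unit(weights, x) for x in cluster_a}
bmus_b = {best_matching_unit(weights, x) for x in cluster_b}
print(sorted(bmus_a), sorted(bmus_b))
```

The map is trained without labels; the two groups end up on different grid nodes, and in real data a group that splits across distinct node regions is the kind of subclustering the abstract describes.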
-
Search Engines Post-ChatGPT: How Generative Artificial Intelligence Could Make Search Less Reliable
Authors:
Shahan Ali Memon,
Jevin D. West
Abstract:
In this commentary, we discuss the evolving nature of search engines, as they begin to generate, index, and distribute content created by generative artificial intelligence (GenAI). Our discussion highlights challenges in the early stages of GenAI integration, particularly around factual inconsistencies and biases. We discuss how output from GenAI carries an unwarranted sense of credibility, while decreasing transparency and sourcing ability. Furthermore, search engines are already answering queries with error-laden, generated content, further blurring the provenance of information and impacting the integrity of the information ecosystem. We argue that all these factors could reduce the reliability of search engines. Finally, we summarize some of the active research directions and open questions.
Submitted 18 February, 2024;
originally announced February 2024.
-
How should the advent of large language models affect the practice of science?
Authors:
Marcel Binz,
Stephan Alaniz,
Adina Roskies,
Balazs Aczel,
Carl T. Bergstrom,
Colin Allen,
Daniel Schad,
Dirk Wulff,
Jevin D. West,
Qiong Zhang,
Richard M. Shiffrin,
Samuel J. Gershman,
Ven Popov,
Emily M. Bender,
Marco Marelli,
Matthew M. Botvinick,
Zeynep Akata,
Eric Schulz
Abstract:
Large language models (LLMs) are being increasingly incorporated into scientific workflows. However, we have yet to fully grasp the implications of this integration. How should the advent of large language models affect the practice of science? For this opinion piece, we have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate. Schulz et al. make the argument that working with LLMs is not fundamentally different from working with human collaborators, while Bender et al. argue that LLMs are often misused and over-hyped, and that their limitations warrant a focus on more specialized, easily interpretable tools. Marelli et al. emphasize the importance of transparent attribution and responsible use of LLMs. Finally, Botvinick and Gershman advocate that humans should retain responsibility for determining the scientific roadmap. To facilitate the discussion, the four perspectives are complemented with a response from each group. By putting these different perspectives in conversation, we aim to bring attention to important considerations within the academic community regarding the adoption of LLMs and their impact on both current and future scientific practices.
Submitted 5 December, 2023;
originally announced December 2023.
-
Distilling ChatGPT for Explainable Automated Student Answer Assessment
Authors:
Jiazheng Li,
Lin Gui,
Yuxiang Zhou,
David West,
Cesare Aloisi,
Yulan He
Abstract:
Providing explainable and faithful feedback is crucial for automated student answer assessment. In this paper, we introduce a novel framework that explores using ChatGPT, a cutting-edge large language model, for the concurrent tasks of student answer scoring and rationale generation. We identify the appropriate instructions by prompting ChatGPT with different templates to collect the rationales, where inconsistent rationales are refined to align with marking standards. The refined ChatGPT outputs enable us to fine-tune a smaller language model that simultaneously assesses student answers and provides rationales. Extensive experiments on the benchmark dataset show that the proposed method improves the overall QWK score by 11% compared to ChatGPT. Furthermore, our thorough analysis and human evaluation demonstrate that the rationales generated by our proposed method are comparable to those of ChatGPT. Our approach provides a viable solution to achieve explainable automated assessment in education. Code available at https://github.com/lijiazheng99/aera.
Submitted 24 October, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Delineating Knowledge Domains in the Scientific Literature Using Visual Information
Authors:
Sean Yang,
Po-shen Lee,
Jevin D. West,
Bill Howe
Abstract:
Figures are an important channel for scientific communication, used to express complex ideas, models and data in ways that words cannot. However, this visual information is mostly ignored in analyses of the scientific literature. In this paper, we demonstrate the utility of using scientific figures as markers of knowledge domains in science, which can be used for classification, recommender systems, and studies of scientific information exchange. We encode sets of images into a visual signature, then use distances between these signatures to understand how patterns of visual communication compare with patterns of jargon and citation structures. We find that figures can be as effective for differentiating communities of practice as text or citation patterns. We then consider where these metrics disagree to understand how different disciplines use visualization to express ideas. Finally, we further consider how specific figure types propagate through the literature, suggesting a new mechanism for understanding the flow of ideas apart from conventional channels of text and citations. Our ultimate aim is to better leverage these information-dense objects to improve scientific communication across disciplinary boundaries.
Submitted 12 August, 2019;
originally announced August 2019.
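The abstract says sets of images are encoded into a visual signature and compared by distance, without giving the encoding. One plausible sketch, assuming each figure has already been assigned to one of k clusters (for example by clustering image features), takes a field's signature to be the normalised histogram of its figures' cluster labels and compares signatures with the Jensen-Shannon divergence. Both the histogram encoding and the choice of Jensen-Shannon are assumptions, as are the helper names and labels.

```python
import math
from collections import Counter


def visual_signature(figure_labels, n_clusters):
    """Normalised histogram of per-figure cluster labels (0..n_clusters-1)."""
    counts = Counter(figure_labels)
    total = len(figure_labels)
    return [counts.get(k, 0) / total for k in range(n_clusters)]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical signatures,
    1 for signatures with disjoint support."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i = 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


# Hypothetical cluster labels for the figures of two fields (3 clusters).
field_a = visual_signature([0, 0, 1, 2], 3)
field_b = visual_signature([2, 2, 2, 1], 3)
print(field_a)  # [0.5, 0.25, 0.25]
print(round(jensen_shannon(field_a, field_b), 3))
```

Distances computed this way can then be set against text-based or citation-based distances between the same fields, which is the comparison the abstract describes.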
-
AI-based evaluation of the SDGs: The case of crop detection with earth observation data
Authors:
Natalia Efremova,
Dennis West,
Dmitry Zausaev
Abstract:
The framework of the seventeen sustainable development goals is a challenge for developers and researchers applying artificial intelligence (AI). AI and earth observations (EO) can provide reliable and disaggregated data for better monitoring of the sustainable development goals (SDGs). In this paper, we present an overview of SDG targets that can be effectively measured with AI tools. We identify indicators with the most significant contribution from AI and EO and describe an application of state-of-the-art machine learning models to one of the indicators. We describe an application of U-Net with squeeze-and-excitation (SE) blocks for efficient segmentation of satellite imagery for crop detection. Finally, we demonstrate how AI can be more effectively applied in solutions directly contributing towards specific SDGs and propose further research on an AI-based evaluative infrastructure for SDGs.
Submitted 5 July, 2019;
originally announced July 2019.
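An SE block recalibrates channels: it "squeezes" each channel to one number by global average pooling, "excites" through two small fully connected layers (ReLU, then sigmoid) to get a gate per channel, and rescales each channel by its gate. The dependency-free sketch below shows only that mechanism on nested lists; the weights, reduction ratio, and toy input are hypothetical, and where the SE blocks sit inside the authors' U-Net is not specified in the abstract.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-excitation on a C x H x W feature map given as nested lists.
    w1 has shape (C/r) x C and w2 has shape C x (C/r) for a reduction ratio r."""
    # Squeeze: global average pooling turns each channel into one number.
    z = [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
         for channel in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
              for row, b in zip(w1, b1)]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
             for row, b in zip(w2, b2)]
    # Scale: reweight every activation of channel c by its gate.
    scaled = [[[v * g for v in row] for row in channel]
              for channel, g in zip(feature_map, gates)]
    return scaled, gates


# Toy input: 2 channels of 2x2 activations, reduction ratio r = 2 (hypothetical weights).
fmap = [[[1.0, 1.0], [1.0, 1.0]],
        [[3.0, 3.0], [3.0, 3.0]]]
out, gates = se_block(fmap, w1=[[0.5, 0.5]], b1=[0.0],
                      w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
print([round(g, 4) for g in gates])  # [0.8808, 0.1192]
```

In a real network the two weight matrices are learned, so the block learns which spectral or feature channels to emphasise for each image at negligible extra cost.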
-
Why scatter plots suggest causality, and what we can do about it
Authors:
Carl T. Bergstrom,
Jevin D. West
Abstract:
Scatter plots carry an implicit if subtle message about causality. Whether we look at functions of one variable in pure mathematics, plots of experimental measurements as a function of the experimental conditions, or scatter plots of predictor and response variables, the value plotted on the vertical axis is by convention assumed to be determined or influenced by the value on the horizontal axis. This is a problem for the public understanding of scientific results and perhaps also for professional scientists' interpretations of scatter plots. To avoid suggesting a causal relationship between the x and y values in a scatter plot, we propose a new type of data visualization, the diamond plot. Diamond plots are essentially 45-degree rotations of ordinary scatter plots; by visually jarring the viewer, they clearly indicate that she should not draw the usual distinction between independent/predictor variable and dependent/response variable. Instead, she should see the relationship as purely correlative.
Submitted 25 September, 2018;
originally announced September 2018.
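The geometric core of a diamond plot is a 45-degree rotation of the data coordinates. A minimal sketch, assuming a counterclockwise rotation about the origin (the abstract specifies neither the direction nor how the rotated axes and labels are drawn, and the helper name is hypothetical):

```python
import math


def to_diamond(points):
    """Rotate (x, y) points 45 degrees counterclockwise about the origin.
    The original x-axis ends up along the diagonal u = v and the original
    y-axis along u = -v, so neither variable sits on the horizontal axis."""
    c = s = math.sqrt(0.5)  # cos(45 deg) == sin(45 deg)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


rotated = to_diamond([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
print([(round(u, 3), round(v, 3)) for u, v in rotated])
# [(0.707, 0.707), (-0.707, 0.707), (0.0, 1.414)]
```

Because a rotation preserves distances and angles, the point cloud and any correlation pattern are unchanged; only the conventional "x drives y" reading of the axes is disrupted.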
-
Leveraging Citation Networks to Visualize Scholarly Influence Over Time
Authors:
Jason Portenoy,
Jessica Hullman,
Jevin D. West
Abstract:
Assessing the influence of a scholar's work is an important task for funding organizations, academic departments, and researchers. Common methods, such as measures of citation counts, can ignore much of the nuance and multidimensionality of scholarly influence. We present an approach for generating dynamic visualizations of scholars' careers. This approach uses an animated node-link diagram showing the citation network accumulated around the researcher over the course of the career in concert with key indicators, highlighting influence both within and across fields. We developed our design in collaboration with one funding organization (the Pew Biomedical Scholars program), but the methods generalize to visualizing scholarly influence in other settings. We applied the design method to the Microsoft Academic Graph, which includes more than 120 million publications. We validate our abstractions throughout the process through collaboration with the Pew Biomedical Scholars program officers and summative evaluations with their scholars.
Submitted 5 December, 2016; v1 submitted 21 November, 2016;
originally announced November 2016.
-
Spanning Trees in 2-trees
Authors:
P. Renjith,
N. Sadagopan,
Douglas B. West
Abstract:
A spanning tree of a graph $G$ is a connected acyclic spanning subgraph of $G$. We consider enumeration of spanning trees when $G$ is a $2$-tree, meaning that $G$ is obtained from one edge by iteratively adding a vertex whose neighborhood consists of two adjacent vertices. We use this construction order both to inductively list the spanning trees without repetition and to give bounds on the number of them. We determine the $n$-vertex $2$-trees having the most and the fewest spanning trees. The $2$-tree with the fewest is unique; it has $n-2$ vertices of degree $2$ and has $n \cdot 2^{n-3}$ spanning trees. The $2$-trees with the most are precisely those having exactly two vertices of degree $2$, and each has $F_{2n-2}$ spanning trees, where $F_k$ is the $k$th Fibonacci number.
Submitted 20 July, 2016;
originally announced July 2016.
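Both closed forms in this abstract can be checked numerically with Kirchhoff's matrix-tree theorem (the number of spanning trees equals any cofactor of the graph Laplacian). The sketch below builds two particular $2$-trees for illustration: a "book" (one edge plus $n-2$ vertices joined to both of its ends, so $n-2$ vertices of degree 2) and a "fan" (a hub joined to every vertex of a path, which has exactly two vertices of degree 2 for $n \ge 4$). The helper names are hypothetical, and this brute-force check is not the authors' inductive listing method.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def spanning_tree_count(n, edges):
    """Matrix-tree theorem: delete row and column 0 of the Laplacian, take det."""
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    det = determinant([row[1:] for row in lap[1:]])
    assert det.denominator == 1
    return det.numerator


def book_2tree(n):
    """Edge 0-1 plus n-2 vertices each joined to both 0 and 1."""
    return [(0, 1)] + [e for v in range(2, n) for e in ((0, v), (1, v))]


def fan_2tree(n):
    """Hub 0 joined to every vertex of the path 1-2-...-(n-1)."""
    return [(0, v) for v in range(1, n)] + [(v, v + 1) for v in range(1, n - 1)]


def fibonacci(k):
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a  # F_1 = F_2 = 1


for n in range(4, 9):
    print(n, spanning_tree_count(n, book_2tree(n)), n * 2 ** (n - 3),
          spanning_tree_count(n, fan_2tree(n)), fibonacci(2 * n - 2))
```

Each printed row should show matching pairs, for example 8 and 8 twice for $n = 4$, where both constructions give $K_4$ minus an edge.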
-
Men Set Their Own Cites High: Gender and Self-citation across Fields and over Time
Authors:
Molly M. King,
Carl T. Bergstrom,
Shelley J. Correll,
Jennifer Jacquet,
Jevin D. West
Abstract:
How common is self-citation in scholarly publication, and does the practice vary by gender? Using novel methods and a data set of 1.5 million research papers in the scholarly database JSTOR published between 1779 and 2011, the authors find that nearly 10 percent of references are self-citations by a paper's authors. The findings also show that between 1779 and 2011, men cited their own papers 56 percent more than did women. In the last two decades of data, men self-cited 70 percent more than women. Women are also more than 10 percentage points more likely than men to not cite their own previous work at all. While these patterns could result from differences in the number of papers that men and women authors have published rather than gender-specific patterns of self-citation behavior, this gender gap in self-citation rates has remained stable over the last 50 years, despite increased representation of women in academia. The authors break down self-citation patterns by academic field and number of authors and comment on potential mechanisms behind these observations. These findings have important implications for scholarly visibility and cumulative advantage in academic careers.
Submitted 12 December, 2017; v1 submitted 30 June, 2016;
originally announced July 2016.
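The abstract mixes two kinds of comparison: "56 percent more" is a relative difference, while "10 percentage points" is an absolute gap between two proportions. The sketch below shows both, plus one simple way to flag self-citations: a reference counts as a self-citation if the cited paper shares at least one author with the citing paper. That rule, the toy corpus, and the helper names are assumptions; the paper's own "novel methods" (including how author names are matched in JSTOR) may differ.

```python
def self_citation_share(papers):
    """Share of within-dataset references whose cited paper shares at least one
    author with the citing paper. `papers` maps id -> {"authors": set, "refs": list}."""
    total = self_cites = 0
    for paper in papers.values():
        for ref in paper["refs"]:
            if ref not in papers:
                continue  # cited work outside the dataset: authors unknown
            total += 1
            if paper["authors"] & papers[ref]["authors"]:
                self_cites += 1
    return self_cites / total if total else 0.0


def percent_more(a, b):
    """How many percent larger a is than b (relative difference)."""
    return (a - b) / b * 100.0


def percentage_point_gap(p, q):
    """Absolute difference between two proportions, in percentage points."""
    return (p - q) * 100.0


# Hypothetical toy corpus with already-disambiguated author names.
papers = {
    "p1": {"authors": {"A"}, "refs": []},
    "p2": {"authors": {"B"}, "refs": []},
    "p3": {"authors": {"A", "C"}, "refs": ["p1", "p2"]},
    "p4": {"authors": {"B"}, "refs": ["p2", "p3", "p1"]},
}
print(self_citation_share(papers))  # 0.4 (2 of 5 references)
print(round(percent_more(0.12, 0.10), 6), round(percentage_point_gap(0.12, 0.10), 6))  # 20.0 2.0
```

The same pair of rates (12% versus 10%) is "20 percent more" but only "2 percentage points" higher, which is why the two framings in the abstract should not be compared directly.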
-
Static Ranking of Scholarly Papers using Article-Level Eigenfactor (ALEF)
Authors:
Ian Wesley-Smith,
Carl T. Bergstrom,
Jevin D. West
Abstract:
Microsoft Research hosted the 2016 WSDM Cup Challenge based on the Microsoft Academic Graph. The goal was to provide static rankings for the articles that make up the graph, with the rankings to be evaluated against those of human judges. While the Microsoft Academic Graph provided metadata about many aspects of each scholarly document, we focused more narrowly on citation data and used this contest as an opportunity to test the Article-Level Eigenfactor (ALEF), a novel citation-based ranking algorithm, and evaluate its performance against competing algorithms that drew upon multiple facets of the data from a large, real-world dataset (122M papers and 757M citations). Our final submission to this contest received a score of 0.676, earning second place.
Submitted 27 June, 2016;
originally announced June 2016.
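Eigenfactor-style scores rest on the stationary distribution of a random walk on a citation graph. The sketch below is a generic PageRank-style power iteration over paper-level citations, included only to illustrate that idea. It is not ALEF itself: ALEF's specific adjustments are described in the paper and are not reproduced here. The function name `citation_rank`, the damping factor 0.85, and the uniform handling of papers without references are illustrative assumptions.

```python
from typing import Dict, List

def citation_rank(citations: Dict[str, List[str]], alpha: float = 0.85,
                  iters: int = 100, tol: float = 1e-12) -> Dict[str, float]:
    """PageRank-style scores on a citation graph (generic sketch, not ALEF).

    citations maps each paper to the papers it cites; rank flows from the
    citing paper to the cited ones. Papers with no references spread their
    rank uniformly, so the scores always sum to 1.
    """
    node_set = set(citations)
    for refs in citations.values():
        node_set.update(refs)
    nodes = sorted(node_set)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        # Teleportation plus uniformly redistributed dangling mass.
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for src, refs in citations.items():
            if refs:
                share = alpha * rank[src] / len(refs)
                for dst in refs:
                    new[dst] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

For example, if papers A and B both cite C, then C receives the highest score.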
-
Viziometrics: Analyzing Visual Information in the Scientific Literature
Authors:
Po-shen Lee,
Jevin D. West,
Bill Howe
Abstract:
Scientific results are communicated visually in the literature through diagrams, visualizations, and photographs. These information-dense objects have been largely ignored in bibliometrics and scientometrics studies when compared to citations and text. In this paper, we use techniques from computer vision and machine learning to classify more than 8 million figures from PubMed into 5 figure types and study the resulting patterns of visual information as they relate to impact. We find that the distribution of figures and figure types in the literature has remained relatively constant over time, but can vary widely across fields and topics. Remarkably, we find a significant correlation between scientific impact and the use of visual information, where higher impact papers tend to include more diagrams, and to a lesser extent more plots and photographs. To explore these results and other ways of extracting this visual information, we have built a visual browser to illustrate the concept and explore design alternatives for supporting viziometric analysis and organizing visual information. We use these results to articulate a new research agenda -- viziometrics -- to study the organization and presentation of visual information in the scientific literature.
Submitted 27 May, 2016; v1 submitted 16 May, 2016;
originally announced May 2016.
-
The vulnerability of the diameter of enhanced hypercubes
Authors:
Meijie Ma,
Douglas B. West,
Jun-Ming Xu
Abstract:
For an interconnection network $G$, the {\it $ω$-wide diameter} $d_ω(G)$ is the least $\ell$ such that any two vertices are joined by $ω$ internally-disjoint paths of length at most $\ell$, and the {\it $(ω-1)$-fault diameter} $D_ω(G)$ is the maximum diameter of a subgraph obtained by deleting fewer than $ω$ vertices of $G$.
The enhanced hypercube $Q_{n,k}$ is a variant of the well-known hypercube. Yang, Chang, Pai, and Chan gave an upper bound for $d_{n+1}(Q_{n,k})$ and $D_{n+1}(Q_{n,k})$ and posed the problem of finding the wide diameters and fault diameters of $Q_{n,k}$. By constructing internally disjoint paths between any two vertices in the enhanced hypercube, for $n\ge3$ and $2\le k\le n$ we prove $$ D_ω(Q_{n,k})=d_ω(Q_{n,k})=\begin{cases} d(Q_{n,k}) & \textrm{for $1 \leq ω< n-\lfloor\frac{k}{2}\rfloor$;}\\ d(Q_{n,k})+1 & \textrm{for $n-\lfloor\frac{k}{2}\rfloor \leq ω\leq n+1$.} \end{cases} $$ where $d(Q_{n,k})$ is the diameter of $Q_{n,k}$. These results mean that interconnection networks modelled by enhanced hypercubes are extremely robust.
Submitted 18 June, 2016; v1 submitted 11 April, 2016;
originally announced April 2016.
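The theorem for enhanced hypercubes expresses both the wide diameter and the fault diameter in terms of the ordinary diameter $d(Q_{n,k})$. The sketch below computes that ordinary diameter by breadth-first search. It does not compute wide or fault diameters. It assumes $Q_{n,k}$ is $Q_n$ plus one complementary edge at each vertex, and that this edge flips a fixed block of $t$ low-order bits. How $t$ corresponds to the paper's $k$ depends on the labeling convention, which is not assumed here. With $t=n$ the graph is the folded hypercube. The function name is hypothetical.

```python
from collections import deque

def enhanced_hypercube_diameter(n: int, t: int) -> int:
    """Diameter of Q_n plus one complementary edge per vertex (assumed model).

    Vertices are n-bit integers. Each vertex x is adjacent to x ^ (1 << i) for
    every bit i (hypercube edges) and to x ^ mask, where mask sets the lowest t bits.
    """
    if not 1 <= t <= n:
        raise ValueError("need 1 <= t <= n")
    mask = (1 << t) - 1
    size = 1 << n

    def neighbors(x: int):
        for i in range(n):
            yield x ^ (1 << i)
        yield x ^ mask

    # The graph is a Cayley graph on n-bit vectors under XOR, hence vertex-transitive,
    # so the eccentricity of vertex 0 equals the diameter.
    dist = [-1] * size
    dist[0] = 0
    queue = deque([0])
    while queue:
        x = queue.popleft()
        for y in neighbors(x):
            if dist[y] < 0:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist)
```

With $t=1$ the extra edge duplicates a hypercube edge, so the function returns $n$. With $t=n$ it returns $\lceil n/2\rceil$, the diameter of the folded hypercube.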
-
On r-dynamic Coloring of Grids
Authors:
Ross Kang,
Tobias Muller,
Douglas B. West
Abstract:
An \textit{$r$-dynamic $k$-coloring} of a graph $G$ is a proper $k$-coloring of $G$ such that every vertex $v\in V(G)$ has neighbors in at least $\min\{d(v),r\}$ different color classes. The \textit{$r$-dynamic chromatic number} of a graph $G$, written $χ_r(G)$, is the least $k$ such that $G$ has such a coloring. Proving a conjecture of Jahanbekam, Kim, O, and West, we show that the $m$-by-$n$ grid has no $3$-dynamic $4$-coloring when $mn\equiv 2 \pmod 4$. This completes the determination of the $r$-dynamic chromatic number of the $m$-by-$n$ grid for all $r,m,n$.
Submitted 13 July, 2014;
originally announced July 2014.
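The $r$-dynamic condition on a grid can be checked mechanically. The sketch below is a hypothetical helper for a given coloring of the $m$-by-$n$ grid. It verifies that the coloring is proper and that every vertex $v$ sees at least $\min\{d(v),r\}$ distinct colors among its neighbors. It does not check the number of colors $k$; compare `len(set(coloring.values()))` with $k$ separately.

```python
from typing import Dict, Iterator, Tuple

Cell = Tuple[int, int]

def grid_neighbors(v: Cell, m: int, n: int) -> Iterator[Cell]:
    """Neighbors of cell v in the m-by-n grid graph."""
    i, j = v
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < m and 0 <= b < n:
            yield (a, b)

def is_r_dynamic(coloring: Dict[Cell, int], m: int, n: int, r: int) -> bool:
    """True if coloring is proper and each vertex v sees >= min(d(v), r) neighbor colors."""
    for i in range(m):
        for j in range(n):
            v = (i, j)
            nbrs = list(grid_neighbors(v, m, n))
            if any(coloring[u] == coloring[v] for u in nbrs):
                return False  # not a proper coloring
            if len({coloring[u] for u in nbrs}) < min(len(nbrs), r):
                return False  # too few distinct colors around v
    return True
```

For example, the checkerboard 2-coloring of the 3-by-3 grid is proper, so it is 1-dynamic. It is not 3-dynamic, because every neighbor of a vertex gets the same color.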
-
Beyond Ohba's Conjecture: A bound on the choice number of $k$-chromatic graphs with $n$ vertices
Authors:
Jonathan A. Noel,
Douglas B. West,
Hehui Wu,
Xuding Zhu
Abstract:
Let $\text{ch}(G)$ denote the choice number of a graph $G$ (also called "list chromatic number" or "choosability" of $G$). Noel, Reed, and Wu proved the conjecture of Ohba that $\text{ch}(G)=χ(G)$ when $|V(G)|\le 2χ(G)+1$. We extend this to a general upper bound: $\text{ch}(G)\le \max\{χ(G),\lceil({|V(G)|+χ(G)-1})/{3}\rceil\}$. Our result is sharp for $|V(G)|\le 3χ(G)$ using Ohba's examples, and it improves the best-known upper bound for $\text{ch}(K_{4,\dots,4})$.
Submitted 27 August, 2014; v1 submitted 30 August, 2013;
originally announced August 2013.
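The bound $\text{ch}(G)\le \max\{χ(G),\lceil(|V(G)|+χ(G)-1)/3\rceil\}$ can be evaluated directly from $|V(G)|$ and $χ(G)$. The helper below (hypothetical name) does so. When $|V(G)|\le 2χ(G)+1$, the second term is at most $\lceil 3χ(G)/3\rceil = χ(G)$. The bound therefore reduces to $χ(G)$, consistent with Ohba's conjecture. For $K_{4,\dots,4}$ with $k$ parts, $|V|=4k$ and $χ=k$, so the bound is $\lceil(5k-1)/3\rceil$.

```python
def choice_number_upper_bound(num_vertices: int, chromatic_number: int) -> int:
    """Evaluate max{chi(G), ceil((|V(G)| + chi(G) - 1) / 3)}."""
    n, chi = num_vertices, chromatic_number
    return max(chi, -(-(n + chi - 1) // 3))  # -(-a // b) is integer ceiling division
```

For example, `choice_number_upper_bound(12, 3)` evaluates the bound for $K_{4,4,4}$ and returns 5.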
-
Memory in network flows and its effects on spreading dynamics and community detection
Authors:
Martin Rosvall,
Alcides V. Esquivel,
Andrea Lancichinetti,
Jevin D. West,
Renaud Lambiotte
Abstract:
Random walks on networks are the standard tool for modelling spreading processes in social and biological systems. This first-order Markov approach is used in conventional community detection, ranking, and spreading analysis, although it ignores a potentially important feature of the dynamics: where flow moves to may depend on where it comes from. Here we analyse pathways from different systems, and while we only observe marginal consequences for disease spreading, we show that ignoring the effects of second-order Markov dynamics has important consequences for community detection, ranking, and information spreading. For example, capturing dynamics with a second-order Markov model allows us to reveal actual travel patterns in air traffic and to uncover multidisciplinary journals in scientific communication. These findings were achieved only by using more of the available data and making no additional assumptions, and therefore suggest that accounting for higher-order memory in network flows can help us better understand how real systems are organized and function.
Submitted 12 August, 2014; v1 submitted 21 May, 2013;
originally announced May 2013.
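The distinction between first-order and second-order Markov dynamics in the memory-network abstract can be made concrete by estimating both kinds of transition probabilities from observed pathways. In the sketch below, the function name and the toy "airport" pathways are hypothetical. The paper's community-detection and ranking methods are not reproduced.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence

def transition_probs(paths: Sequence[Sequence[Hashable]]):
    """Estimate first-order P(next | current) and second-order P(next | previous, current)."""
    first: Dict = defaultdict(Counter)
    second: Dict = defaultdict(Counter)
    for path in paths:
        for a, b in zip(path, path[1:]):
            first[a][b] += 1
        for a, b, c in zip(path, path[1:], path[2:]):
            second[(a, b)][c] += 1

    def normalize(table: Dict) -> Dict:
        return {key: {nxt: cnt / sum(counts.values()) for nxt, cnt in counts.items()}
                for key, counts in table.items()}

    return normalize(first), normalize(second)
```

Take two round trips through a hub, `[["A", "H", "A"], ["B", "H", "B"]]`. The first-order model says a traveller at `H` goes to `A` or `B` with probability 0.5 each. The second-order model, which conditions on the previous stop, says a traveller arriving at `H` from `A` returns to `A` with probability 1.0.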
-
The role of gender in scholarly authorship
Authors:
Jevin D. West,
Jennifer Jacquet,
Molly M. King,
Shelley J. Correll,
Carl T. Bergstrom
Abstract:
Gender disparities appear to be decreasing in academia according to a number of metrics, such as grant funding, hiring, acceptance at scholarly journals, and productivity, and it might be tempting to think that gender inequity will soon be a problem of the past. However, a large-scale analysis based on over eight million papers across the natural sciences, social sciences, and humanities reveals a number of understated and persistent ways in which gender inequities remain. For instance, even where raw publication counts seem to be equal between genders, close inspection reveals that, in certain fields, men predominate in the prestigious first and last author positions. Moreover, women are significantly underrepresented as authors of single-authored papers. Academics should be aware of the subtle ways that gender disparities can appear in scholarly authorship.
Submitted 7 November, 2012;
originally announced November 2012.
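The authorship abstract compares who holds first, last, and single-author positions. Below is a minimal sketch of that tabulation. It assumes each paper is an ordered list of author gender labels that have already been assigned. How the paper inferred gender is not described here. The function name, the labels, and the toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

def position_shares(papers: Sequence[Sequence[str]], label: str = "F") -> Dict[str, float]:
    """Share of first, last, and single-author slots held by authors with the given label."""
    held: Counter = Counter()
    slots: Counter = Counter()
    for authors in papers:
        if not authors:
            continue
        if len(authors) == 1:
            slots["single"] += 1
            held["single"] += (authors[0] == label)
        else:
            slots["first"] += 1
            slots["last"] += 1
            held["first"] += (authors[0] == label)
            held["last"] += (authors[-1] == label)
    return {pos: held[pos] / slots[pos] for pos in slots}
```

On the toy input `[["F", "M", "M"], ["M", "M", "F"], ["M"], ["M"]]`, the label `"F"` holds half of the first and last slots and none of the single-author slots.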
-
Revolutionaries and spies: Spy-good and spy-bad graphs
Authors:
Jane V. Butterfield,
Daniel W. Cranston,
Gregory J. Puleo,
Douglas B. West,
Reza Zamani
Abstract:
We study a game on a graph $G$ played by $r$ {\it revolutionaries} and $s$ {\it spies}. Initially, revolutionaries and then spies occupy vertices. In each subsequent round, each revolutionary may move to a neighboring vertex or not move, and then each spy has the same option. The revolutionaries win if $m$ of them meet at some vertex having no spy (at the end of a round); the spies win if they can avoid this forever.
Let $\sigma(G,m,r)$ denote the minimum number of spies needed to win. To avoid degenerate cases, assume $|V(G)|\ge r-m+1\ge\lfloor r/m\rfloor\ge 1$. The easy bounds are then $\lfloor r/m\rfloor\le \sigma(G,m,r)\le r-m+1$. We prove that the lower bound is sharp when $G$ has a rooted spanning tree $T$ such that every edge of $G$ not in $T$ joins two vertices having the same parent in $T$. As a consequence, $\sigma(G,m,r)\le\gamma(G)\lfloor r/m\rfloor$, where $\gamma(G)$ is the domination number; this bound is nearly sharp when $\gamma(G)\le m$.
For the random graph with constant edge-probability $p$, we obtain constants $c$ and $c'$ (depending on $m$ and $p$) such that $\sigma(G,m,r)$ is near the trivial upper bound when $r<c\ln n$ and at most $c'$ times the trivial lower bound when $r>c'\ln n$. For the hypercube $Q_d$ with $d\ge r$, we have $\sigma(G,m,r)=r-m+1$ when $m=2$, and for $m\ge 3$ at least $r-39m$ spies are needed.
For complete $k$-partite graphs with partite sets of size at least $2r$, the leading term in $\sigma(G,m,r)$ is approximately $\frac{k}{k-1}\frac{r}{m}$ when $k\ge m$. For $k=2$, we have $\sigma(G,2,r)=\bigl\lceil\frac{\lfloor 7r/2\rfloor-3}{5}\bigr\rceil$ and $\sigma(G,3,r)=\lfloor r/2\rfloor$, and in general $\frac{3r}{2m}-3\le \sigma(G,m,r)\le\frac{(1+1/\sqrt3)r}{m}$.
Submitted 26 May, 2012; v1 submitted 13 February, 2012;
originally announced February 2012.
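The spy-good and spy-bad abstract states the easy bounds $\lfloor r/m\rfloor\le\sigma(G,m,r)\le r-m+1$. It also states the exact value $\lceil(\lfloor 7r/2\rfloor-3)/5\rceil$ for $m=2$ on complete bipartite graphs with parts of size at least $2r$. The hypothetical helpers below evaluate these formulas as a quick sanity check. They do not simulate the game.

```python
from typing import Tuple

def trivial_spy_bounds(m: int, r: int) -> Tuple[int, int]:
    """Easy bounds (floor(r/m), r - m + 1) on sigma(G, m, r) in the non-degenerate case."""
    return r // m, r - m + 1

def spies_complete_bipartite_m2(r: int) -> int:
    """sigma(G, 2, r) = ceil((floor(7r/2) - 3) / 5) for complete bipartite G, parts >= 2r."""
    numerator = (7 * r) // 2 - 3
    return -(-numerator // 5)  # integer ceiling division
```

For example, $r=10$ gives easy bounds 5 and 9, and the exact formula gives 7.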
-
Revolutionaries and spies on trees and unicyclic graphs
Authors:
Daniel W. Cranston,
Clifford D. Smyth,
Douglas B. West
Abstract:
A team of $r$ {\it revolutionaries} and a team of $s$ {\it spies} play a game on a graph $G$. Initially, revolutionaries and then spies take positions at vertices. In each subsequent round, each revolutionary may move to an adjacent vertex or not move, and then each spy has the same option. The revolutionaries want to hold an {\it unguarded meeting}, meaning $m$ revolutionaries at some vertex having no spy at the end of a round. To prevent this forever, trivially at least $\min\{|V(G)|,\lfloor r/m\rfloor\}$ spies are needed. When $G$ is a tree, this many spies suffices. When $G$ is a unicyclic graph, $\min\{|V(G)|,\lceil r/m\rceil\}$ spies suffice, and we characterize those unicyclic graphs where $\lfloor r/m\rfloor+1$ spies are needed.
Submitted 11 October, 2011;
originally announced October 2011.
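The revolutionaries' win condition, an unguarded meeting, is easy to state in code: some vertex holds at least $m$ revolutionaries and no spy at the end of a round. The sketch below checks that condition for given positions. It also evaluates $\min\{|V(G)|,\lfloor r/m\rfloor\}$, which the abstract says is both necessary and sufficient on trees. Both function names are hypothetical, and no strategy for either side is implemented.

```python
from collections import Counter
from typing import Hashable, Iterable

def has_unguarded_meeting(revolutionaries: Iterable[Hashable],
                          spies: Iterable[Hashable], m: int) -> bool:
    """True if some vertex holds at least m revolutionaries and no spy."""
    rev_counts = Counter(revolutionaries)  # vertex -> number of revolutionaries there
    guarded = set(spies)
    return any(count >= m and v not in guarded for v, count in rev_counts.items())

def spies_needed_on_tree(num_vertices: int, r: int, m: int) -> int:
    """min{|V(G)|, floor(r/m)}: the number of spies needed (and sufficient) on a tree."""
    return min(num_vertices, r // m)
```

For example, two revolutionaries at vertex `"u"` with $m=2$ form an unguarded meeting unless a spy also occupies `"u"`.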