Showing 1–22 of 22 results for author: Piccardi, T

Searching in archive cs.
  1. arXiv:2505.15837 [pdf, other]

    cs.SI cs.CY cs.DL

    Web2Wiki: Characterizing Wikipedia Linking Across the Web

    Authors: Veniamin Veselovsky, Tiziano Piccardi, Ashton Anderson, Robert West, Akhil Arora

    Abstract: Wikipedia is one of the most visited websites globally, yet its role beyond its own platform remains largely unexplored. In this paper, we present the first large-scale analysis of how Wikipedia is referenced across the Web. Using a dataset from Common Crawl, we identify over 90 million Wikipedia links spanning 1.68% of Web domains and examine their distribution, context, and function. Our analysi…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 13 pages, 3 figures, 5 tables

  2. arXiv:2505.10839 [pdf, other]

    cs.HC cs.CY cs.SI

    Alexandria: A Library of Pluralistic Values for Realtime Re-Ranking of Social Media Feeds

    Authors: Akaash Kolluri, Renn Su, Farnaz Jahanbakhsh, Dora Zhao, Tiziano Piccardi, Michael S. Bernstein

    Abstract: Social media feed ranking algorithms fail when they too narrowly focus on engagement as their objective. The literature has asserted a wide variety of values that these algorithms should account for as well -- ranging from well-being to productive discourse -- far more than can be encapsulated by a single topic or theory. In response, we present a library of values for social media algo…

    Submitted 16 May, 2025; originally announced May 2025.

  3. arXiv:2501.00939 [pdf]

    cs.CY cs.DL cs.HC

    Navigating Knowledge: Patterns and Insights from Wikipedia Consumption

    Authors: Tiziano Piccardi, Robert West

    Abstract: The Web has drastically simplified our access to knowledge and learning, and fact-checking online resources has become a part of our daily routine. Studying online knowledge consumption is thus critical for understanding human behavior and informing the design of future platforms. In this Chapter, we approach this subject by describing the navigation patterns of the readers of Wikipedia, the world…

    Submitted 1 January, 2025; originally announced January 2025.

    Comments: This is a draft. The final version will be available in the Handbook of Computational Social Science edited by Taha Yasseri, forthcoming 2025, Edward Elgar Publishing Ltd

  4. arXiv:2411.14652 [pdf, other]

    cs.CY cs.AI cs.HC cs.SI

    Social Media Algorithms Can Shape Affective Polarization via Exposure to Antidemocratic Attitudes and Partisan Animosity

    Authors: Tiziano Piccardi, Martin Saveski, Chenyan Jia, Jeffrey T. Hancock, Jeanne L. Tsai, Michael Bernstein

    Abstract: There is widespread concern about the negative impacts of social media feed ranking algorithms on political polarization. Leveraging advancements in large language models (LLMs), we develop an approach to re-rank feeds in real-time to test the effects of content that is likely to polarize: expressions of antidemocratic attitudes and partisan animosity (AAPA). In a preregistered 10-day field experi…

    Submitted 21 November, 2024; originally announced November 2024.

  5. arXiv:2406.19571 [pdf, other]

    cs.SI cs.CY

    Reranking Social Media Feeds: A Practical Guide for Field Experiments

    Authors: Tiziano Piccardi, Martin Saveski, Chenyan Jia, Jeffrey Hancock, Jeanne L. Tsai, Michael S. Bernstein

    Abstract: Social media plays a central role in shaping public opinion and behavior, yet performing experiments on these platforms and, in particular, on feed algorithms is becoming increasingly challenging. This article offers practical recommendations to researchers developing and deploying field experiments focused on real-time re-ranking of social media feeds. This article is organized around two contrib…

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2402.02056 [pdf, other]

    cs.CL cs.AI cs.CY

    AnthroScore: A Computational Linguistic Measure of Anthropomorphism

    Authors: Myra Cheng, Kristina Gligoric, Tiziano Piccardi, Dan Jurafsky

    Abstract: Anthropomorphism, or the attribution of human-like characteristics to non-human entities, has shaped conversations about the impacts and possibilities of technology. We present AnthroScore, an automatic metric of implicit anthropomorphism in language. We use a masked language model to quantify how non-human entities are implicitly framed as human by the surrounding context. We show that AnthroScor…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: EACL 2024 Main Conference

  7. arXiv:2310.11501 [pdf, other]

    cs.CL cs.AI cs.CY

    CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations

    Authors: Myra Cheng, Tiziano Piccardi, Diyi Yang

    Abstract: Recent work has aimed to capture nuances of human behavior by using LLMs to simulate responses from particular demographics in settings like social science experiments and public opinion surveys. However, there are currently no established ways to discuss or evaluate the quality of such LLM simulations. Moreover, there is growing concern that these LLM simulations are flattened caricatures of the…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: To appear at EMNLP 2023 (Main)

  8. arXiv:2308.16491 [pdf, other]

    cs.CY cs.AI cs.LG cs.SI

    In-class Data Analysis Replications: Teaching Students while Testing Science

    Authors: Kristina Gligoric, Tiziano Piccardi, Jake Hofman, Robert West

    Abstract: Science is facing a reproducibility crisis. Previous work has proposed incorporating data analysis replications into classrooms as a potential solution. However, despite the potential benefits, it is unclear whether this approach is feasible, and if so, what the involved stakeholders (students, educators, and scientists) should expect from it. Can students perform a data analysis replication over th…

    Submitted 30 July, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

  9. arXiv:2305.09497 [pdf, other]

    cs.CY cs.DL

    Curious Rhythms: Temporal Regularities of Wikipedia Consumption

    Authors: Tiziano Piccardi, Martin Gerlach, Robert West

    Abstract: Wikipedia, in its role as the world's largest encyclopedia, serves a broad range of information needs. Although previous studies have noted that Wikipedia users' information needs vary throughout the day, there is to date no large-scale, quantitative study of the underlying dynamics. The present paper fills this gap by investigating temporal regularities in daily consumption patterns in a large-sc…

    Submitted 23 November, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: ICWSM 2024

  10. arXiv:2305.09038 [pdf]

    cs.HC

    Characterizing Image Accessibility on Wikipedia across Languages

    Authors: Elisa Kreiss, Krishna Srinivasan, Tiziano Piccardi, Jesus Adolfo Hermosillo, Cynthia Bennett, Michael S. Bernstein, Meredith Ringel Morris, Christopher Potts

    Abstract: We make a first attempt to characterize image accessibility on Wikipedia across languages, present new experimental results that can inform efforts to assess description quality, and offer some strategies to improve Wikipedia's image accessibility.

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: Presented at Wiki Workshop 2023

  11. arXiv:2203.06932 [pdf, other]

    cs.CY

    Going Down the Rabbit Hole: Characterizing the Long Tail of Wikipedia Reading Sessions

    Authors: Tiziano Piccardi, Martin Gerlach, Robert West

    Abstract: "Wiki rabbit holes" are informally defined as navigation paths followed by Wikipedia readers that lead them to long explorations, sometimes involving unexpected articles. Although wiki rabbit holes are a popular concept in Internet culture, our current understanding of their dynamics is based on anecdotal reports only. To bridge this gap, this paper provides a large-scale quantitative characteriza…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: WikiWorkshop '22 Proc. of World Wide Web Conference (Companion)

  12. arXiv:2201.03677 [pdf, other]

    cs.CL cs.AI

    Homepage2Vec: Language-Agnostic Website Embedding and Classification

    Authors: Sylvain Lugeon, Tiziano Piccardi, Robert West

    Abstract: Currently, publicly available models for website classification do not offer an embedding method and have limited support for languages beyond English. We release a dataset of more than two million category-labeled websites in 92 languages collected from Curlie, the largest multilingual human-edited Web directory. The dataset contains 14 website categories aligned across languages. Alongside it, w…

    Submitted 8 April, 2022; v1 submitted 10 January, 2022; originally announced January 2022.

    Comments: Published in Proc. of ICWSM 2022

  13. arXiv:2201.00812 [pdf, other]

    cs.CY cs.DL cs.SI

    Wikipedia Reader Navigation: When Synthetic Data Is Enough

    Authors: Akhil Arora, Martin Gerlach, Tiziano Piccardi, Alberto García-Durán, Robert West

    Abstract: Every day millions of people read Wikipedia. When navigating the vast space of available topics using hyperlinks, readers describe trajectories on the article network. Understanding these navigation patterns is crucial to better serve readers' needs and address structural biases and knowledge gaps. However, systematic studies of navigation on Wikipedia are hindered by a lack of publicly available…

    Submitted 5 January, 2022; v1 submitted 3 January, 2022; originally announced January 2022.

    Comments: WSDM 2022, 11 pages, 16 figures

  14. A Large-Scale Characterization of How Readers Browse Wikipedia

    Authors: Tiziano Piccardi, Martin Gerlach, Akhil Arora, Robert West

    Abstract: Despite the importance and pervasiveness of Wikipedia as one of the largest platforms for open knowledge, surprisingly little is known about how people navigate its content when seeking information. To bridge this gap, we present the first systematic large-scale analysis of how readers browse Wikipedia. Using billions of page requests from Wikipedia's server logs, we measure how readers reach arti…

    Submitted 17 January, 2023; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: ACM Trans. Web, Vol. 1, No. 1, Article 1. Publication date: January 2023

  15. arXiv:2112.01868 [pdf, other]

    cs.CY

    A Large Scale Study of Reader Interactions with Images on Wikipedia

    Authors: Daniele Rama, Tiziano Piccardi, Miriam Redi, Rossano Schifanella

    Abstract: Wikipedia is the largest source of free encyclopedic knowledge and one of the most visited sites on the Web. To increase reader understanding of the article, Wikipedia editors add images within the text of the article's body. However, despite their widespread usage on web platforms and the huge volume of visual content on Wikipedia, little is known about the importance of images in the context of…

    Submitted 3 December, 2021; originally announced December 2021.

    Comments: 29 pages, 12 figures, final version to be published in EPJ Data Science

  16. On the Value of Wikipedia as a Gateway to the Web

    Authors: Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West

    Abstract: By linking to external websites, Wikipedia can act as a gateway to the Web. To date, however, little is known about the amount of traffic generated by Wikipedia's external links. We fill this gap in a detailed analysis of usage logs gathered from Wikipedia users' client devices. Our analysis proceeds in three steps: First, we quantify the level of engagement with external links, finding that, in o…

    Submitted 15 February, 2021; originally announced February 2021.

    Comments: The Web Conference WWW 2021, 12 pages

  17. Crosslingual Topic Modeling with WikiPDA

    Authors: Tiziano Piccardi, Robert West

    Abstract: We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a crosslingual topic model that learns to represent Wikipedia articles written in any language as distributions over a common set of language-independent topics. It leverages the fact that Wikipedia articles link to each other and are mapped to concepts in the Wikidata knowledge base, such that, when represented as bags of links,…

    Submitted 14 February, 2021; v1 submitted 23 September, 2020; originally announced September 2020.

    Comments: 10 pages, WWW - The Web Conference, 2021

  18. arXiv:2001.10256 [pdf, other]

    cs.CY

    WikiHist.html: English Wikipedia's Full Revision History in HTML Format

    Authors: Blagoj Mitrevski, Tiziano Piccardi, Robert West

    Abstract: Wikipedia is written in the wikitext markup language. When serving content, the MediaWiki software that powers Wikipedia parses wikitext to HTML, thereby inserting additional content by expanding macros (templates and modules). Hence, researchers who intend to analyze Wikipedia as seen by its readers should work with HTML, rather than wikitext. Since Wikipedia's revision history is publicly availa…

    Submitted 21 April, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

    Comments: Dataset paper, 7 pages

  19. arXiv:2001.08614 [pdf, other]

    cs.CY

    Quantifying Engagement with Citations on Wikipedia

    Authors: Tiziano Piccardi, Miriam Redi, Giovanni Colavizza, Robert West

    Abstract: Wikipedia, the free online encyclopedia that anyone can edit, is one of the most visited sites on the Web and a common source of information for many users. As an encyclopedia, Wikipedia is not a source of original information, but was conceived as a gateway to secondary sources: according to Wikipedia's guidelines, facts must be backed up by reliable sources that reflect the full spectrum of view…

    Submitted 26 January, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: The Web Conference WWW 2020, 10 pages

  20. Structuring Wikipedia Articles with Section Recommendations

    Authors: Tiziano Piccardi, Michele Catasta, Leila Zia, Robert West

    Abstract: Sections are the building blocks of Wikipedia articles. They enhance readability and can be used as a structured entry point for creating and expanding articles. Structuring a new or already existing Wikipedia article with sections is a hard task for humans, especially for newcomers or less experienced editors, as it requires significant knowledge about how a well-written article looks for each po…

    Submitted 3 May, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

    Comments: SIGIR '18 camera-ready

  21. arXiv:1804.02525 [pdf, other]

    cs.SI cs.CL cs.IR

    Quootstrap: Scalable Unsupervised Extraction of Quotation-Speaker Pairs from Large News Corpora via Bootstrapping

    Authors: Dario Pavllo, Tiziano Piccardi, Robert West

    Abstract: We propose Quootstrap, a method for extracting quotations, as well as the names of the speakers who uttered them, from large news corpora. Whereas prior work has addressed this problem primarily with supervised machine learning, our approach follows a fully unsupervised bootstrapping paradigm. It leverages the redundancy present in large news corpora, more precisely, the fact that the same quotati…

    Submitted 7 April, 2018; originally announced April 2018.

    Comments: Accepted at the 12th International Conference on Web and Social Media (ICWSM), 2018

  22. arXiv:1407.5903 [pdf, other]

    cs.HC cs.CY

    Assessing the Performance of Question-and-Answer Communities Using Survival Analysis

    Authors: Felipe Ortega, Gregorio Convertino, Massimo Zancanaro, Tiziano Piccardi

    Abstract: Question-&-Answer (QA) websites have emerged as efficient platforms for knowledge sharing and problem solving. In particular, the Stack Exchange platform includes some of the most popular QA communities to date, such as Stack Overflow. Initial metrics used to assess the performance of these communities include summative statistics like the percentage of resolved questions or the average time to re…

    Submitted 29 July, 2014; v1 submitted 22 July, 2014; originally announced July 2014.

    Comments: 10 pages, 3 figures, example code

    ACM Class: H.3.4