Showing 1–50 of 66 results for author: Weninger, T

  1. arXiv:2506.14370  [pdf, ps, other]

    cs.CL

    Digital Gatekeepers: Google's Role in Curating Hashtags and Subreddits

    Authors: Amrit Poudel, Yifan Ding, Jurgen Pfeffer, Tim Weninger

    Abstract: Search engines play a crucial role as digital gatekeepers, shaping the visibility of Web and social media content through algorithmic curation. This study investigates how search engines like Google selectively promote or suppress certain hashtags and subreddits, impacting the information users encounter. By comparing search engine results with nonsampled data from Reddit and Twitter/X, we reve…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 Main

    Journal ref: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics 2025

  2. arXiv:2501.01303  [pdf, other]

    cs.CL cs.AI

    Citations and Trust in LLM Generated Responses

    Authors: Yifan Ding, Matthew Facciani, Amrit Poudel, Ellen Joyce, Salvador Aguinaga, Balaji Veeramani, Sanmitra Bhattacharya, Tim Weninger

    Abstract: Question answering systems are rapidly advancing, but their opaque nature may impact user trust. We explored trust through an anti-monitoring framework, where trust is predicted to be correlated with the presence of citations and inversely related to checking citations. We tested this hypothesis with a live question-answering experiment that presented text responses generated using a commercial Chatbo…

    Submitted 2 January, 2025; originally announced January 2025.

    Comments: Accepted to AAAI 2025

  3. arXiv:2409.15368  [pdf, other]

    cs.CL cs.AI cs.ET cs.IR cs.LG

    MedCodER: A Generative AI Assistant for Medical Coding

    Authors: Krishanu Das Baksi, Elijah Soba, John J. Higgins, Ravi Saini, Jaden Wood, Jane Cook, Jack Scott, Nirmala Pudota, Tim Weninger, Edward Bowen, Sanmitra Bhattacharya

    Abstract: Medical coding is essential for standardizing clinical data and communication but is often time-consuming and prone to errors. Traditional Natural Language Processing (NLP) methods struggle with automating coding due to the large label space, lengthy text inputs, and the absence of supporting evidence annotations that justify code selection. Recent advancements in Generative Artificial Intelligenc…

    Submitted 18 September, 2024; originally announced September 2024.

  4. arXiv:2409.13064  [pdf, other]

    cs.SI cs.AI

    Fear and Loathing on the Frontline: Decoding the Language of Othering by Russia-Ukraine War Bloggers

    Authors: Patrick Gerard, William Theisen, Tim Weninger, Kristina Lerman

    Abstract: Othering, the act of portraying outgroups as fundamentally different from the ingroup, often escalates into framing them as existential threats--fueling intergroup conflict and justifying exclusion and violence. These dynamics are alarmingly pervasive, spanning from the extreme historical examples of genocides against minorities in Germany and Rwanda to the ongoing violence and rhetoric targeting…

    Submitted 19 September, 2024; originally announced September 2024.

    Comments: 15 pages

  5. arXiv:2409.07684  [pdf, other]

    cs.SI cs.AI

    Modeling Information Narrative Detection and Evolution on Telegram during the Russia-Ukraine War

    Authors: Patrick Gerard, Svitlana Volkova, Louis Penafiel, Kristina Lerman, Tim Weninger

    Abstract: Following the Russian Federation's full-scale invasion of Ukraine in February 2022, a multitude of information narratives emerged within both pro-Russian and pro-Ukrainian communities online. As the conflict progresses, so too do the information narratives, constantly adapting and influencing local and global community perceptions and attitudes. This dynamic nature of the evolving information envi…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 12 pages, International AAAI Conference on Web and Social Media 2025

  6. arXiv:2405.19164  [pdf, ps, other]

    cs.AI cs.IR

    Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

    Authors: Sounak Lahiri, Sumit Pai, Tim Weninger, Sanmitra Bhattacharya

    Abstract: Electronic Discovery (eDiscovery) requires identifying relevant documents from vast collections for legal production requests. While artificial intelligence (AI) and natural language processing (NLP) have improved document review efficiency, current methods still struggle with legal entities, citations, and complex legal artifacts. To address these challenges, we introduce DISCOvery Graph (DISCOG)…

    Submitted 13 June, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Updated with Camera Ready Copy for ACL 2025

  7. arXiv:2405.12040  [pdf, other]

    cs.SI cs.HC

    Reputation Transfer in the Twitter Diaspora

    Authors: Kristina Radivojevic, DJ Adams, Griffin Laszlo, Felixander Kery, Tim Weninger

    Abstract: Social media platforms have witnessed a dynamic landscape of user migration in recent years, fueled by changes in ownership, policy, and user preferences. This paper explores the phenomenon of user migration from established platforms like X/Twitter to emerging alternatives such as Threads, Mastodon, and Truth Social. Leveraging a large dataset from X/Twitter, we investigate the extent of user dep…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

  8. arXiv:2404.13489  [pdf, other]

    cs.DB

    SCHENO: Measuring Schema vs. Noise in Graphs

    Authors: Justus Isaiah Hibshman, Adnan Hoq, Tim Weninger

    Abstract: Real-world data is typically a noisy manifestation of a core pattern (schema), and the purpose of data mining algorithms is to uncover that pattern, thereby splitting (i.e. decomposing) the data into schema and noise. We introduce SCHENO, a principled evaluation metric for the goodness of a schema-noise decomposition of a graph. SCHENO captures how schematic the schema is, how noisy the noise is,…

    Submitted 4 February, 2025; v1 submitted 20 April, 2024; originally announced April 2024.

    MSC Class: 68R10; 68T10; 08A35

  9. arXiv:2403.15453  [pdf, other]

    cs.CL cs.AI cs.IR

    Span-Oriented Information Extraction -- A Unifying Perspective on Information Extraction

    Authors: Yifan Ding, Michael Yankoski, Tim Weninger

    Abstract: Information Extraction refers to a collection of tasks within Natural Language Processing (NLP) that identifies sub-sequences within text and their labels. These tasks have been used for many years to extract relevant information and to link free text to structured data. However, the heterogeneity among information extraction tasks impedes progress in this area. We therefore offer a unifying…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 35 Pages, 1 Figure

  10. arXiv:2402.14947  [pdf, other]

    cs.HC cs.MM cs.SI

    An Avalanche of Images on Telegram Preceded Russia's Full-Scale Invasion of Ukraine

    Authors: William Theisen, Michael Yankoski, Kristina Hook, Ernesto Verdeja, Walter Scheirer, Tim Weninger

    Abstract: Governments use propaganda, including through visual content -- or Politically Salient Image Patterns (PSIP) -- on social media, to influence and manipulate public opinion. In the present work, we collected the Telegram post history of 989 Russian milbloggers to better understand the social and political narratives that circulated online in the months surrounding Russia's 2022 full-scale invasion…

    Submitted 15 July, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 20 pages, 7 figures

  11. arXiv:2402.14858  [pdf, other]

    cs.CL cs.AI

    ChatEL: Entity Linking with Chatbots

    Authors: Yifan Ding, Qingkai Zeng, Tim Weninger

    Abstract: Entity Linking (EL) is an essential and challenging task in natural language processing that seeks to link some text representing an entity within a document or sentence with its corresponding entry in a dictionary or knowledge base. Most existing approaches focus on creating elaborate contextual models that look for clues in the words surrounding the entity-text to help solve the linking problem. Al…

    Submitted 20 February, 2024; originally announced February 2024.

  12. arXiv:2402.06738  [pdf, other]

    cs.CL

    EntGPT: Entity Linking with Generative Large Language Models

    Authors: Yifan Ding, Amrit Poudel, Qingkai Zeng, Tim Weninger, Balaji Veeramani, Sanmitra Bhattacharya

    Abstract: Entity Linking in natural language processing seeks to match text entities to their corresponding entries in a dictionary or knowledge base. Traditional approaches rely on contextual models, which can be complex, hard to train, and have limited transferability across different domains. Generative large language models like GPT offer a promising alternative but often underperform with naive prompts…

    Submitted 22 May, 2025; v1 submitted 9 February, 2024; originally announced February 2024.

    ACM Class: H.3.3

  13. arXiv:2401.15479  [pdf, other]

    cs.IR cs.CL cs.SI

    Navigating the Post-API Dilemma | Search Engine Results Pages Present a Biased View of Social Media Data

    Authors: Amrit Poudel, Tim Weninger

    Abstract: Recent decisions to discontinue access to social media APIs are having detrimental effects on Internet research and the field of computational social science as a whole. This lack of access to data has been dubbed the Post-API era of Internet research. Fortunately, popular search engines have the means to crawl, capture, and surface social media data on their Search Engine Results Pages (SERP) if…

    Submitted 27 November, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: Proceedings of the ACM Web Conference 2024 (WWW '24)

  14. arXiv:2310.11607  [pdf, other]

    cs.LG

    TK-KNN: A Balanced Distance-Based Pseudo Labeling Approach for Semi-Supervised Intent Classification

    Authors: Nicholas Botzer, David Vasquez, Tim Weninger, Issam Laradji

    Abstract: The ability to detect intent in dialogue systems has become increasingly important in modern technology. These systems often generate a large amount of unlabeled data, and manually labeling this data requires substantial human effort. Semi-supervised methods attempt to remedy this cost by using a model trained on a few labeled examples and then by assigning pseudo-labels to a further subset of unl…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: 9 pages, 6 figures, 4 tables

  15. arXiv:2304.03351  [pdf, other]

    cs.SI

    Entity Graphs for Exploring Online Discourse

    Authors: Nicholas Botzer, Tim Weninger

    Abstract: Vast amounts of human communication occur online. These digital traces of natural human communication along with recent advances in natural language processing technology provide for computational analysis of these discussions. In the study of social networks the typical perspective is to view users as nodes and concepts as flowing through and among the user-nodes within the social network. In th…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 27 pages, 10 figures, Published in Knowledge and Information Systems

  16. arXiv:2303.11553  [pdf, other]

    cs.LG cs.FL cs.SI

    Dynamic Vertex Replacement Grammars

    Authors: Daniel Gonzalez Cedre, Justus Isaiah Hibshman, Timothy La Fond, Grant Boquet, Tim Weninger

    Abstract: Context-free graph grammars have shown a remarkable ability to model structures in real-world relational data. However, graph grammars lack the ability to capture time-changing phenomena since the left-to-right transitions of a production rule do not represent temporal change. In the present work, we describe dynamic vertex-replacement grammars (DyVeRG), which generalize vertex replacement grammar…

    Submitted 21 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  17. arXiv:2303.11240  [pdf, other]

    cs.SI

    Truth Social Dataset

    Authors: Patrick Gerard, Nicholas Botzer, Tim Weninger

    Abstract: Formally announced to the public following former President Donald Trump's bans and suspensions from mainstream social networks in early 2022 after his role in the January 6 Capitol Riots, Truth Social was launched as an "alternative" social media platform that claims to be a refuge for free speech, offering a platform for those disaffected by the content moderation policies of the existing, mains…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: 7 pages, 5 figures, ICWSM 2023

  18. arXiv:2301.08792  [pdf, other]

    cs.SI

    Inherent Limits on Topology-Based Link Prediction

    Authors: Justus I. Hibshman, Tim Weninger

    Abstract: Link prediction systems (e.g. recommender systems) typically use graph topology as one of their main sources of information. However, automorphisms and related properties of graphs beget inherent limits in predictability. We calculate hard upper bounds on how well graph topology alone enables link prediction for a wide variety of real-world graphs. We find that in the sparsest of these graphs the…

    Submitted 26 June, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

  19. arXiv:2206.01309  [pdf, other]

    cs.CV

    H-EMD: A Hierarchical Earth Mover's Distance Method for Instance Segmentation

    Authors: Peixian Liang, Yizhe Zhang, Yifan Ding, Jianxu Chen, Chinedu S. Madukoma, Tim Weninger, Joshua D. Shrout, Danny Z. Chen

    Abstract: Deep learning (DL) based semantic segmentation methods have achieved excellent performance in biomedical image segmentation, producing high quality probability maps to allow extraction of rich instance information to facilitate good instance segmentation. While numerous efforts were put into developing new DL semantic segmentation models, less attention was paid to a key issue of how to effectivel…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: Accepted at IEEE Transactions On Medical Imaging (TMI)

  20. arXiv:2205.05783  [pdf, other]

    cs.CV cs.CY

    MEWS: Real-time Social Media Manipulation Detection and Analysis

    Authors: Trenton W. Ford, William Theisen, Michael Yankoski, Tom Henry, Farah Khashman, Katherine R. Dearstyne, Tim Weninger

    Abstract: This article presents a beta-version of MEWS (Misinformation Early Warning System). It describes the various aspects of the ingestion, manipulation detection, and graphing algorithms employed to determine--in near real-time--the relationships between social media images as they emerge and spread on social media platforms. By combining these various technologies into a single processing pipeline, M…

    Submitted 12 May, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

  21. arXiv:2203.10155  [pdf, other]

    cs.SI cs.CY

    Subreddit Links Drive Community Creation and User Engagement on Reddit

    Authors: Rachel Krohn, Tim Weninger

    Abstract: On Reddit, individual subreddits are used to organize content and connect users. One mode of interaction is the subreddit link, which occurs when a user makes a direct reference to a subreddit in another community. Based on the ubiquity of these references, we have undertaken a study on subreddit links on Reddit, with the goal of understanding their impact on both the referenced subreddit, and on…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: Accepted at ICWSM 2022

  22. arXiv:2203.08327  [pdf, other]

    cs.CV cs.SI

    Motif Mining: Finding and Summarizing Remixed Image Content

    Authors: William Theisen, Daniel Gonzalez Cedre, Zachariah Carmichael, Daniel Moreira, Tim Weninger, Walter Scheirer

    Abstract: On the internet, images are no longer static; they have become dynamic content. Thanks to the availability of smartphones with cameras and easy-to-use editing software, images can be remixed (i.e., redacted, edited, and recombined with other content) on-the-fly and with a world-wide audience that can repeat the process. From digital art to memes, the evolution of images through time is now an impo…

    Submitted 17 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: 41 pages, 21 figures

  23. Attributed Graph Modeling with Vertex Replacement Grammars

    Authors: Satyaki Sikdar, Neil Shah, Tim Weninger

    Abstract: Recent work at the intersection of formal language theory and graph theory has explored graph grammars for graph modeling. However, existing models and formalisms can only operate on homogeneous (i.e., untyped or unattributed) graphs. We relax this restriction and introduce the Attributed Vertex Replacement Grammar (AVRG), which can be efficiently extracted from heterogeneous (i.e., typed, colored…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 9 pages, 2 tables, 10 figures. Accepted as a regular paper at WSDM 2021

  24. arXiv:2107.08034  [pdf, other]

    cs.CY cs.HC cs.SI

    Pilot Study Suggests Online Media Literacy Programming Reduces Belief in False News in Indonesia

    Authors: Pamela Bilo Thomas, Clark Hogan-Taylor, Michael Yankoski, Tim Weninger

    Abstract: Amidst the threat of digital misinformation, we offer a pilot study regarding the efficacy of an online social media literacy campaign aimed at empowering individuals in Indonesia with skills to help them identify misinformation. We found that users who engaged with our online training materials and educational videos were more likely to identify misinformation than those in our control group (tot…

    Submitted 16 July, 2021; originally announced July 2021.

    Comments: 13 pages

  25. arXiv:2106.07353  [pdf, other]

    cs.CL cs.AI cs.LG

    Posthoc Verification and the Fallibility of the Ground Truth

    Authors: Yifan Ding, Nicholas Botzer, Tim Weninger

    Abstract: Classifiers commonly make use of pre-annotated datasets, wherein a model is evaluated by pre-defined metrics on a held-out test set typically made of human-annotated labels. Metrics used in these evaluations are tied to the availability of well-defined ground truth labels, and these metrics typically do not allow for inexact matches. These noisy ground truth labels and strict evaluation metrics ma…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: 12 pages, 6 figures, 1 table

  26. arXiv:2106.01254  [pdf, other]

    cs.LG cs.HC cs.MA

    Survey Equivalence: A Procedure for Measuring Classifier Accuracy Against Human Labels

    Authors: Paul Resnick, Yuqing Kong, Grant Schoenebeck, Tim Weninger

    Abstract: In many classification tasks, the ground truth is either noisy or subjective. Examples include: which of two alternative paper titles is better? is this comment toxic? what is the political leaning of this news article? We refer to such tasks as survey settings because the ground truth is defined through a survey of one or more human raters. In survey settings, conventional measurements of classif…

    Submitted 2 June, 2021; originally announced June 2021.

  27. arXiv:2102.03952  [pdf, other]

    cs.SI

    Competition Dynamics in the Meme Ecosystem

    Authors: Trenton Ford, Rachel Krohn, Tim Weninger

    Abstract: The creation and sharing of memes is a common modality of online social interactions. The goal of the present work is to better understand the collective dynamics of memes in this accelerating and competitive environment. By taking an ecological perspective and tracking the meme-text from 352 popular memes over the entirety of Reddit, we are able to show that the frequency of memes has scaled almo…

    Submitted 7 February, 2021; originally announced February 2021.

    ACM Class: H.0; J.4

  28. arXiv:2101.07664  [pdf, other]

    cs.SI

    Analysis of Moral Judgement on Reddit

    Authors: Nicholas Botzer, Shawn Gu, Tim Weninger

    Abstract: Moral outrage has become synonymous with social media in recent years. However, the preponderance of academic analysis on social media websites has focused on hate speech and misinformation. This paper focuses on analyzing moral judgements rendered on social media by capturing the moral judgements that are passed in the subreddit /r/AmITheAsshole on Reddit. Using the labels associated with each ju…

    Submitted 19 January, 2021; originally announced January 2021.

    Comments: Submitted to ICWSM 2021, 9 pages and 6 figures

  29. arXiv:2101.01793  [pdf, ps, other]

    cs.SI

    Behavior Change in Response to Subreddit Bans and External Events

    Authors: Pamela Bilo Thomas, Daniel Riehm, Maria Glenski, Tim Weninger

    Abstract: As more people flock to social media to connect with others and form virtual communities, it is important to research how members of these groups interact to understand human behavior on the Web. In response to an increase in hate speech, harassment and other antisocial behaviors, many social media companies have implemented different content and user moderation policies. On Reddit, for example, c…

    Submitted 5 January, 2021; originally announced January 2021.

  30. Reddit Entity Linking Dataset

    Authors: Nicholas Botzer, Yifan Ding, Tim Weninger

    Abstract: We introduce and make publicly available an entity linking dataset from Reddit that contains 17,316 linked entities, each annotated by three human annotators and then grouped into Gold, Silver, and Bronze to indicate inter-annotator agreement. We analyze the different errors and disagreements made by annotators and suggest three types of corrections to the raw data. Finally, we tested existing ent…

    Submitted 25 February, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: 20 pages and 4 figures

    Journal ref: Information Processing and Management Volume 58, Issue 3 (May 2021) 1-20

  31. arXiv:2009.14783  [pdf, other]

    cs.DC cs.LG

    HetSeq: Distributed GPU Training on Heterogeneous Infrastructure

    Authors: Yifan Ding, Nicholas Botzer, Tim Weninger

    Abstract: Modern deep learning systems like PyTorch and TensorFlow are able to train enormous models with billions (or trillions) of parameters on a distributed infrastructure. These systems require that the internal nodes have the same memory capacity and compute performance. Unfortunately, most organizations, especially universities, have a piecemeal approach to purchasing computer systems resulting in a…

    Submitted 25 September, 2020; originally announced September 2020.

    Comments: 7 pages, 3 tables, 2 figures

  32. arXiv:2009.08925  [pdf, other]

    cs.SI physics.soc-ph

    The Infinity Mirror Test for Graph Models

    Authors: Satyaki Sikdar, Daniel Gonzalez Cedre, Trenton W. Ford, Tim Weninger

    Abstract: Graph models, like other machine learning models, have implicit and explicit biases built-in, which often impact performance in nontrivial ways. The model's faithfulness is often measured by comparing the newly generated graph against the source graph using any number or combination of graph properties. Differences in the size or topology of the generated graph, therefore, indicate a loss in the m…

    Submitted 3 January, 2022; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: Accepted in IEEE TKDE 2022, 12 pages and 8 figures

  33. Joint Subgraph-to-Subgraph Transitions -- Generalizing Triadic Closure for Powerful and Interpretable Graph Modeling

    Authors: Justus Hibshman, Daniel Gonzalez Cedre, Satyaki Sikdar, Tim Weninger

    Abstract: We generalize triadic closure, along with previous generalizations of triadic closure, under an intuitive umbrella generalization: the Subgraph-to-Subgraph Transition (SST). We present algorithms and code to model graph evolution in terms of collections of these SSTs. We then use the SST framework to create link prediction models for both static and temporal, directed and undirected graphs which p…

    Submitted 17 February, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

    Comments: Published in WSDM 2021

    ACM Class: I.2.6

  34. arXiv:2003.00045  [pdf, ps, other]

    cs.SI

    Library Adoption Dynamics in Software Teams

    Authors: Pamela Bilo Thomas, Rachel Krohn, Tim Weninger

    Abstract: When a group of people strives to understand new information, struggle ensues as various ideas compete for attention. Steep learning curves are surmounted as teams learn together. To understand how these team dynamics play out in software development, we explore Git logs, which provide a complete change history of software repositories. In these repositories, we observe code additions, which repre…

    Submitted 28 February, 2020; originally announced March 2020.

    Comments: 12 pages. Short version published at ASONAM 2019

  35. arXiv:2001.06122  [pdf, other]

    cs.CV cs.SI

    Automatic Discovery of Political Meme Genres with Diverse Appearances

    Authors: William Theisen, Joel Brogan, Pamela Bilo Thomas, Daniel Moreira, Pascal Phoa, Tim Weninger, Walter Scheirer

    Abstract: Forms of human communication are not static -- we expect some evolution in the way information is conveyed over time because of advances in technology. One example of this phenomenon is the image-based meme, which has emerged as a dominant form of political messaging in the past decade. While originally used to spread jokes on social media, memes are now having an outsized impact on public percept…

    Submitted 10 September, 2020; v1 submitted 16 January, 2020; originally announced January 2020.

    Comments: 13 pages, 14 figures

  36. arXiv:1910.10763  [pdf, ps, other]

    cs.SI

    Representation Learning in Heterogeneous Professional Social Networks with Ambiguous Social Connections

    Authors: Baoxu Shi, Jaewon Yang, Tim Weninger, Jing How, Qi He

    Abstract: Network representations have been shown to improve performance within a variety of tasks, including classification, clustering, and link prediction. However, most models either focus on moderate-sized, homogeneous networks or require a significant amount of auxiliary input to be provided by the user. Moreover, few works have studied network representations in real-world heterogeneous social networ…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 10 pages, accepted at IEEE BigData 2019

  37. Towards Interpretable Graph Modeling with Vertex Replacement Grammars

    Authors: Justus Hibshman, Satyaki Sikdar, Tim Weninger

    Abstract: An enormous amount of real-world data exists in the form of graphs. Oftentimes, interesting patterns that describe the complex dynamics of these graphs are captured in the form of frequently reoccurring substructures. Recent work at the intersection of formal language theory and graph theory has explored the use of graph grammars for graph modeling and pattern mining. However, existing formulation…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: 10 pages, 9 figures, accepted at IEEE BigData 2019

  38. arXiv:1910.08575  [pdf, ps, other]

    cs.SI

    Modelling Online Comment Threads from their Start

    Authors: Rachel Krohn, Tim Weninger

    Abstract: The social Web is a widely used platform for online discussion. Across social media, users can start discussions by posting a topical image, URL, or message. Upon seeing this initial post, other users may add their own comments to the post, or to another user's comment. The resulting online discourse produces a comment thread, which constitutes an enormous portion of modern online communication. C…

    Submitted 18 October, 2019; originally announced October 2019.

    Comments: 10 pages, 10 figures, accepted at IEEE Big Data 2019

  39. Massive Multi-Agent Data-Driven Simulations of the GitHub Ecosystem

    Authors: Jim Blythe, John Bollenbacher, Di Huang, Pik-Mai Hui, Rachel Krohn, Diogo Pacheco, Goran Muric, Anna Sapienza, Alexey Tregubov, Yong-Yeol Ahn, Alessandro Flammini, Kristina Lerman, Filippo Menczer, Tim Weninger, Emilio Ferrara

    Abstract: Simulating and predicting planetary-scale techno-social systems poses heavy computational and modeling challenges. The DARPA SocialSim program set the challenge to model the evolution of GitHub, a large collaborative software-development ecosystem, using massive multi-agent simulations. We describe our best performing models and our agent-based simulation framework, which we are currently extendin…

    Submitted 15 August, 2019; originally announced August 2019.

    Journal ref: International Conference on Practical Applications of Agents and Multi-Agent Systems, pp. 3-15. Springer, Cham, 2019

  40. Modeling Graphs with Vertex Replacement Grammars

    Authors: Satyaki Sikdar, Justus Hibshman, Tim Weninger

    Abstract: One of the principal goals of graph modeling is to capture the building blocks of network data in order to study various physical and natural phenomena. Recent work at the intersection of formal language theory and graph theory has explored the use of graph grammars for graph modeling. However, existing graph grammar formalisms, like Hyperedge Replacement Grammars, can only operate on small tree-l…

    Submitted 11 September, 2019; v1 submitted 10 August, 2019; originally announced August 2019.

    Comments: Accepted as a regular paper at IEEE ICDM 2019. 15 pages, 9 figures

  41. arXiv:1907.04527  [pdf, other]

    cs.SI cs.SE

    Dynamics of Team Library Adoptions: An Exploration of GitHub Commit Logs

    Authors: Pamela Bilo Thomas, Rachel Krohn, Tim Weninger

    Abstract: When a group of people strives to understand new information, struggle ensues as various ideas compete for attention. Steep learning curves are surmounted as teams learn together. To understand how these team dynamics play out in software development, we explore Git logs, which provide a complete change history of software repositories. In these repositories, we observe code additions, which repre…

    Submitted 10 July, 2019; originally announced July 2019.

  42. arXiv:1907.00558  [pdf, ps, other]

    q-fin.ST cs.LG cs.SI stat.ML

    Improved Forecasting of Cryptocurrency Price using Social Signals

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: Social media signals have been successfully used to develop large-scale predictive and anticipatory analytics, for example, forecasting stock market prices and influenza outbreaks. Recently, social data has been explored to forecast price fluctuations of cryptocurrencies, which are a novel disruptive technology with significant political and economic implications. In this paper we leverage and con…

    Submitted 1 July, 2019; originally announced July 2019.

  43. Propagation from Deceptive News Sources: Who Shares, How Much, How Evenly, and How Quickly?

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: As people rely on social media as their primary sources of news, the spread of misinformation has become a significant concern. In this large-scale study of news in social media we analyze eleven million posts and investigate propagation behavior of users that directly interact with news accounts identified as spreading trusted versus malicious content. Unlike previous work, which looks at specifi…

    Submitted 9 December, 2018; originally announced December 2018.

    Comments: 12 pages, 6 figures, 7 tables, published in IEEE TCSS December 2018

    Journal ref: IEEE Transactions on Computational Social Systems (Volume: 5, Issue: 4, Dec. 2018)

  44. arXiv:1809.00740  [pdf, other

    cs.HC

    GuessTheKarma: A Game to Assess Social Rating Systems

    Authors: Maria Glenski, Greg Stoddard, Paul Resnick, Tim Weninger

    Abstract: Popularity systems, like Twitter retweets, Reddit upvotes, and Pinterest pins, have the potential to guide people toward posts that others liked. That, however, creates a feedback loop that reduces their informativeness: items marked as more popular get more attention, so that additional upvotes and retweets may simply reflect the increased attention and not independent information about the fracti…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: 15 pages, 7 figures, accepted to CSCW 2018

  45. arXiv:1807.05327  [pdf, ps, other

    cs.SI

    How Humans versus Bots React to Deceptive and Trusted News Sources: A Case Study of Active Users

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: Society's reliance on social media as a primary source of news has spawned a renewed focus on the spread of misinformation. In this work, we identify the differences in how social media accounts identified as bots react to news sources of varying credibility, regardless of the veracity of the content those sources have shared. We analyze bot and human responses annotated using a fine-grained model…

    Submitted 13 July, 2018; originally announced July 2018.

  46. arXiv:1806.07955  [pdf, other

    cs.SI cs.AI

    Growing Better Graphs With Latent-Variable Probabilistic Graph Grammars

    Authors: Xinyi Wang, Salvador Aguinaga, Tim Weninger, David Chiang

    Abstract: Recent work in graph models has found that probabilistic hyperedge replacement grammars (HRGs) can be extracted from graphs and used to generate new random graphs with graph properties and substructures close to the original. In this paper, we show how to add latent variables to the model, trained using Expectation-Maximization, to generate still better graphs, that is, ones that generalize better…

    Submitted 11 June, 2018; originally announced June 2018.

  47. arXiv:1805.12032  [pdf, ps, other

    cs.CL

    Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources

    Authors: Maria Glenski, Tim Weninger, Svitlana Volkova

    Abstract: In the age of social news, it is important to understand the types of reactions that are evoked from news sources with various levels of credibility. In the present work we seek to better understand how users react to trusted and deceptive news sources across two popular, and very different, social media platforms. To that end, (1) we develop a model to classify user reactions into one of nine typ…

    Submitted 30 May, 2018; originally announced May 2018.

  48. arXiv:1802.08614  [pdf, other

    cs.CL cs.AI cs.IR

    Visualizing the Flow of Discourse with a Concept Ontology

    Authors: Baoxu Shi, Tim Weninger

    Abstract: Understanding and visualizing human discourse has long been a challenging task. Although recent work on argument mining has shown success in classifying the role of various sentences, the task of recognizing concepts and understanding the ways in which they are discussed remains challenging. Given an email thread or a transcript of a group discussion, our task is to extract the relevant concepts…

    Submitted 23 February, 2018; originally announced February 2018.

    Comments: 2 pages, accepted to WWW2018

  49. arXiv:1802.08068  [pdf, other

    cs.SI cs.FL

    Learning Hyperedge Replacement Grammars for Graph Generation

    Authors: Salvador Aguinaga, David Chiang, Tim Weninger

    Abstract: The discovery and analysis of network patterns are central to the scientific enterprise. In the present work, we developed and evaluated a new approach that learns the building blocks of graphs that can be used to understand and generate new realistic graphs. Our key insight is that a graph's clique tree encodes robust and precise information. We show that a Hyperedge Replacement Grammar (HRG) can…

    Submitted 23 February, 2018; v1 submitted 20 February, 2018; originally announced February 2018.

    Comments: 27 pages, accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). arXiv admin note: substantial text overlap with arXiv:1608.03192

  50. arXiv:1711.03438  [pdf, ps, other

    cs.AI cs.CL

    Open-World Knowledge Graph Completion

    Authors: Baoxu Shi, Tim Weninger

    Abstract: Knowledge Graphs (KGs) have been applied to many tasks including Web search, link prediction, recommendation, natural language processing, and entity linking. However, most KGs are far from complete and are growing at a rapid pace. To address these problems, Knowledge Graph Completion (KGC) has been proposed to improve KGs by filling in their missing connections. Unlike existing methods which hold a…

    Submitted 9 November, 2017; originally announced November 2017.

    Comments: 8 pages, accepted to AAAI 2018