Showing 1–35 of 35 results for author: Larsen, B

  1. arXiv:2508.10975  [pdf, ps, other]

    cs.LG cs.CL

    BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining

    Authors: DatologyAI: Pratyush Maini, Vineeth Dorna, Parth Doshi, Aldo Carranza, Fan Pan, Jack Urbanek, Paul Burstein, Alex Fang, Alvin Deng, Amro Abbas, Brett Larsen, Cody Blakeney, Charvi Bannur, Christina Baek, Darren Teh, David Schwab, Haakon Mongstad, Haoli Yin, Josh Wills, Kaleigh Mentzer, Luke Merrick, Ricardo Monti, Rishabh Adiga, et al. (6 additional authors not shown)

    Abstract: Recent advances in large language model (LLM) pretraining have shown that simply scaling data quantity eventually leads to diminishing returns, hitting a data wall. In response, the use of synthetic data for pretraining has emerged as a promising paradigm for pushing the frontier of performance. Despite this, the factors affecting synthetic data quality remain poorly understood. In this work, we i…

    Submitted 19 August, 2025; v1 submitted 14 August, 2025; originally announced August 2025.

    Comments: Blog version can be viewed at: http://blog.datologyai.com/beyondweb

  2. arXiv:2507.13575  [pdf, ps, other]

    cs.LG cs.AI

    Apple Intelligence Foundation Language Models: Tech Report 2025

    Authors: Ethan Li, Anders Boesen Lindbo Larsen, Chen Zhang, Xiyou Zhou, Jun Qin, Dian Ang Yap, Narendran Raghavan, Xuankai Chang, Margit Bowler, Eray Yildiz, John Peebles, Hannah Gillis Coleman, Matteo Ronchi, Peter Gray, Keen You, Anthony Spalvieri-Kruse, Ruoming Pang, Reed Li, Yuli Yang, Emad Soroush, Zhiyun Lu, Crystal Xiao, Rong Situ, Jordan Huffaker, David Griffiths, et al. (373 additional authors not shown)

    Abstract: We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and services: (i) a 3B-parameter on-device model optimized for Apple silicon through architectural innovations such as KV-cache sharing and 2-bit quantization-aware training; and (ii) a scalable server model built on a novel Parallel-Track Mixture-of-Experts (PT-MoE) transform…

    Submitted 27 August, 2025; v1 submitted 17 July, 2025; originally announced July 2025.

  3. arXiv:2507.12466  [pdf, ps, other]

    cs.CL cs.LG

    Language Models Improve When Pretraining Data Matches Target Tasks

    Authors: David Mizrahi, Anders Boesen Lindbo Larsen, Jesse Allardice, Suzie Petryk, Yuri Gorokhov, Jeffrey Li, Alex Fang, Josh Gardner, Tom Gunter, Afshin Dehghan

    Abstract: Every data selection method inherently has a target. In practice, these targets often emerge implicitly through benchmark-driven iteration: researchers develop selection strategies, train models, measure benchmark performance, then refine accordingly. This raises a natural question: what happens when we make this optimization explicit? To explore this, we propose benchmark-targeted ranking (BETR),…

    Submitted 16 July, 2025; originally announced July 2025.

    Comments: 44 pages, 25 figures, 13 tables

  4. arXiv:2408.05677  [pdf, other]

    math.NA cs.LG

    Tensor Decomposition Meets RKHS: Efficient Algorithms for Smooth and Misaligned Data

    Authors: Brett W. Larsen, Tamara G. Kolda, Anru R. Zhang, Alex H. Williams

    Abstract: The canonical polyadic (CP) tensor decomposition decomposes a multidimensional data array into a sum of outer products of finite-dimensional vectors. Instead, we can replace some or all of the vectors with continuous functions (infinite-dimensional vectors) from a reproducing kernel Hilbert space (RKHS). We refer to tensors with some infinite-dimensional modes as quasitensors, and the approach of…

    Submitted 10 August, 2024; originally announced August 2024.

  5. arXiv:2406.03476  [pdf, other]

    cs.LG cs.CL

    Does your data spark joy? Performance gains from domain upsampling at the end of training

    Authors: Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle

    Abstract: Pretraining datasets for large language models (LLMs) have grown to trillions of tokens composed of large amounts of CommonCrawl (CC) web scrape along with smaller, domain-specific datasets. It is expensive to understand the impact of these domain-specific datasets on model capabilities as training at large FLOP scales is required to reveal significant changes to difficult and emergent benchmarks.…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: The first three authors contributed equally

  6. arXiv:2311.11436  [pdf, other]

    stat.ML cs.LG

    Duality of Bures and Shape Distances with Implications for Comparing Neural Representations

    Authors: Sarah E. Harvey, Brett W. Larsen, Alex H. Williams

    Abstract: A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlations analysis (CCA), and shape distances, all learn explicit mappings between neural units to quantify similarity while accounting for e…

    Submitted 19 November, 2023; originally announced November 2023.

  7. arXiv:2310.05742  [pdf, other]

    stat.ML cs.LG q-bio.NC

    Estimating Shape Distances on Neural Representations with Limited Samples

    Authors: Dean A. Pospisil, Brett W. Larsen, Sarah E. Harvey, Alex H. Williams

    Abstract: Measuring geometric similarity between high-dimensional network representations is a topic of longstanding interest to neuroscience and deep learning. Although many methods have been proposed, only a few works have rigorously analyzed their statistical efficiency or quantified estimator uncertainty in data-limited regimes. Here, we derive upper and lower bounds on the worst-case convergence of sta…

    Submitted 9 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

  8. arXiv:2210.03044  [pdf, other]

    cs.LG cs.AI stat.ML

    Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

    Authors: Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

    Abstract: Modern deep learning involves training costly, highly overparameterized networks, thus motivating the search for sparser networks that can still be trained to the same accuracy as the full network (i.e. matching). Iterative magnitude pruning (IMP) is a state of the art algorithm that can find such highly sparse matching subnetworks, known as winning tickets. IMP operates by iterative cycles of tra…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: The first three authors contributed equally

  9. arXiv:2206.01278  [pdf, other]

    cs.LG cs.AI stat.ML

    Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

    Authors: Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

    Abstract: A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network. However, the same does not hold at step 0, i.e. random initialization. In this work, we seek to understand how this ear…

    Submitted 2 June, 2022; originally announced June 2022.

    Comments: The first two authors contributed equally

  10. arXiv:2201.10638  [pdf, ps, other]

    math.NA cs.DS

    Sketching Matrix Least Squares via Leverage Scores Estimates

    Authors: Brett W. Larsen, Tamara G. Kolda

    Abstract: We consider the matrix least squares problem of the form $\| \mathbf{A} \mathbf{X}-\mathbf{B} \|_F^2$ where the design matrix $\mathbf{A} \in \mathbb{R}^{N \times r}$ is tall and skinny with $N \gg r$. We propose to create a sketched version $\| \tilde{\mathbf{A}}\mathbf{X}-\tilde{\mathbf{B}} \|_F^2$ where the sketched matrices $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{B}}$ contain weighted subsets…

    Submitted 25 January, 2022; originally announced January 2022.

    Comments: This is a detailed and standalone derivation of a result that already appears in (arXiv:2006.16438, Appendix A). arXiv admin note: substantial text overlap with arXiv:2006.16438

  11. A Framework for Understanding AI-Induced Field Change: How AI Technologies are Legitimized and Institutionalized

    Authors: Benjamin Cedric Larsen

    Abstract: Artificial intelligence (AI) systems operate in increasingly diverse areas, from healthcare to facial recognition, the stock market, autonomous vehicles, and so on. While the underlying digital infrastructure of AI systems is developing rapidly, each area of implementation is subject to different degrees and processes of legitimization. By combining elements from institutional theory and informati…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: 10 pages, 2 figures

    Journal ref: In Proceedings of the 2021 AAAI ACM Conference on AI Ethics and Society

  12. arXiv:2107.05802  [pdf, other]

    cs.LG stat.ML

    How many degrees of freedom do we need to train deep networks: a loss landscape perspective

    Authors: Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli

    Abstract: A variety of recent works, spanning pruning, lottery tickets, and training within random subspaces, have shown that deep neural networks can be trained using far fewer degrees of freedom than the total number of parameters. We analyze this phenomenon for random subspaces by first examining the success probability of hitting a training loss sub-level set when training within a random subspace of a…

    Submitted 3 February, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: ICLR 2022

  13. arXiv:2011.12684  [pdf, other]

    cs.IR cs.LG

    Denmark's Participation in the Search Engine TREC COVID-19 Challenge: Lessons Learned about Searching for Precise Biomedical Scientific Information on COVID-19

    Authors: Lucas Chaves Lima, Casper Hansen, Christian Hansen, Dongsheng Wang, Maria Maistro, Birger Larsen, Jakob Grue Simonsen, Christina Lioma

    Abstract: This report describes the participation of two Danish universities, University of Copenhagen and Aalborg University, in the international search engine competition on COVID-19 (the 2020 TREC-COVID Challenge) organised by the U.S. National Institute of Standards and Technology (NIST) and its Text Retrieval Conference (TREC) division. The aim of the competition was to find the best search engine str…

    Submitted 26 November, 2020; v1 submitted 25 November, 2020; originally announced November 2020.

  14. arXiv:2010.15045  [pdf, other]

    cs.NE cs.LG cs.MA

    A multi-agent model for growing spiking neural networks

    Authors: Javier Lopez Randulfe, Leon Bonde Larsen

    Abstract: Artificial Intelligence has looked into biological systems as a source of inspiration. Although there are many aspects of the brain yet to be discovered, neuroscience has found evidence that the connections between neurons continuously grow and reshape as a part of the learning process. This differs from the design of Artificial Neural Networks, that achieve learning by evolving the weights in the…

    Submitted 21 September, 2020; originally announced October 2020.

    Comments: 79 pages. Master's thesis

  15. Topology Optimization and 3D printing of Large Deformation Compliant Mechanisms for Straining Biological Tissues

    Authors: P. Kumar, C. Schmidleithner, N. B. Larsen, O. Sigmund

    Abstract: This paper presents a synthesis approach in a density-based topology optimization setting to design large deformation compliant mechanisms for inducing desired strains in biological tissues. The modelling is based on geometrical nonlinearity together with a suitably chosen hyperelastic material model, wherein the mechanical equilibrium equations are solved using the total Lagrangian finite elemen…

    Submitted 25 March, 2021; v1 submitted 20 August, 2020; originally announced August 2020.

    Comments: 23 pages, 14 figures

    Journal ref: Structural and Multidisciplinary Optimization, volume 63, 2021

  16. Factuality Checking in News Headlines with Eye Tracking

    Authors: Christian Hansen, Casper Hansen, Jakob Grue Simonsen, Birger Larsen, Stephen Alstrup, Christina Lioma

    Abstract: We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that p…

    Submitted 17 June, 2020; originally announced June 2020.

    Comments: Accepted to SIGIR 2020

  17. arXiv:2001.00098  [pdf, other]

    cs.LG math.OC stat.ML

    Avoiding Spurious Local Minima in Deep Quadratic Networks

    Authors: Abbas Kazemipour, Brett W. Larsen, Shaul Druckmann

    Abstract: Despite their practical success, a theoretical understanding of the loss landscape of neural networks has proven challenging due to the high-dimensional, non-convex, and highly nonlinear structure of such models. In this paper, we characterize the training landscape of the mean squared error loss for neural networks with quadratic activation functions. We prove existence of spurious local minima a…

    Submitted 19 July, 2020; v1 submitted 31 December, 2019; originally announced January 2020.

    Comments: 36 pages; added deep network experiments, results for population loss

  18. arXiv:1802.02603  [pdf]

    cs.IR

    To Phrase or Not to Phrase - Impact of User versus System Term Dependence Upon Retrieval

    Authors: Christina Lioma, Birger Larsen, Peter Ingwersen

    Abstract: When submitting queries to information retrieval (IR) systems, users often have the option of specifying which, if any, of the query terms are heavily dependent on each other and should be treated as a fixed phrase, for instance by placing them between quotes. In addition to such cases where users specify term dependence, automatic ways also exist for IR systems to detect dependent terms in querie…

    Submitted 5 March, 2018; v1 submitted 7 February, 2018; originally announced February 2018.

  19. arXiv:1708.07157  [pdf, ps, other]

    cs.IR

    Evaluation Measures for Relevance and Credibility in Ranked Lists

    Authors: Christina Lioma, Jakob Grue Simonsen, Birger Larsen

    Abstract: Recent discussions on alternative facts, fake news, and post truth politics have motivated research on creating technologies that allow people not only to access information, but also to assess the credibility of the information presented to them by information retrieval systems. Whereas technology is in place for filtering information according to relevance and/or credibility, no single measure c…

    Submitted 23 August, 2017; originally announced August 2017.

  20. arXiv:1704.01845  [pdf, ps, other]

    cs.IR

    Report on TBAS 2012: Workshop on Task-Based and Aggregated Search

    Authors: Birger Larsen, Christina Lioma, Arjen de Vries

    Abstract: The ECIR half-day workshop on Task-Based and Aggregated Search (TBAS) was held in Barcelona, Spain on 1 April 2012. The program included a keynote talk by Professor Jarvelin, six full paper presentations, two poster presentations, and an interactive discussion among the approximately 25 participants. This report overviews the aims and contents of the workshop and outlines the major outcomes.

    Submitted 6 April, 2017; originally announced April 2017.

  21. arXiv:1704.01610  [pdf, ps, other]

    cs.IR

    A Subjective Logic Formalisation of the Principle of Polyrepresentation for Information Needs

    Authors: Christina Lioma, Birger Larsen, Hinrich Schütze, Peter Ingwersen

    Abstract: Interactive Information Retrieval refers to the branch of Information Retrieval that considers the retrieval process with respect to a wide range of contexts, which may affect the user's information seeking experience. The identification and representation of such contexts has been the object of the principle of Polyrepresentation, a theoretical framework for reasoning about different representati…

    Submitted 5 April, 2017; originally announced April 2017.

  22. arXiv:1704.01603  [pdf, ps, other]

    cs.IR

    Preliminary Experiments using Subjective Logic for the Polyrepresentation of Information Needs

    Authors: Christina Lioma, Birger Larsen, Peter Ingwersen

    Abstract: According to the principle of polyrepresentation, retrieval accuracy may improve through the combination of multiple and diverse information object representations about e.g. the context of the user, the information sought, or the retrieval system. Recently, the principle of polyrepresentation was mathematically expressed using subjective logic, where the potential suitability of each representati…

    Submitted 5 April, 2017; originally announced April 2017.

  23. arXiv:1704.01599  [pdf, ps, other]

    cs.IR cs.CL

    Rhetorical relations for information retrieval

    Authors: Christina Lioma, Birger Larsen, Wei Lu

    Abstract: Typically, every part in most coherent text has some plausible reason for its presence, some function that it performs to the overall semantics of the text. Rhetorical relations, e.g. contrast, cause, explanation, describe how the parts of a text are linked to each other. Knowledge about this so-called discourse structure has been applied successfully to several natural language processing tasks. T…

    Submitted 5 April, 2017; originally announced April 2017.

  24. arXiv:1610.01327  [pdf, ps, other]

    cs.IR

    A Study of Factuality, Objectivity and Relevance: Three Desiderata in Large-Scale Information Retrieval?

    Authors: Christina Lioma, Birger Larsen, Wei Lu, Yong Huang

    Abstract: Much of the information processed by Information Retrieval (IR) systems is unreliable, biased, and generally untrustworthy [1], [2], [3]. Yet, factuality & objectivity detection is not a standard component of IR systems, even though it has been possible in Natural Language Processing (NLP) in the last decade. Motivated by this, we ask if and how factuality & objectivity detection may benefit IR. W…

    Submitted 8 October, 2016; v1 submitted 5 October, 2016; originally announced October 2016.

  25. arXiv:1608.00758  [pdf, other]

    cs.IR

    Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval

    Authors: Christina Lioma, Fabien Tarissan, Jakob Grue Simonsen, Casper Petersen, Birger Larsen

    Abstract: Document coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where…

    Submitted 2 August, 2016; originally announced August 2016.

  26. arXiv:1606.07660  [pdf, other]

    cs.IR

    Deep Learning Relevance: Creating Relevant Information (as Opposed to Retrieving it)

    Authors: Christina Lioma, Birger Larsen, Casper Petersen, Jakob Grue Simonsen

    Abstract: What if Information Retrieval (IR) systems did not just retrieve relevant information that is stored in their indices, but could also "understand" it and synthesise it into a single document? We present a preliminary study that makes a first step towards answering this question. Given a query, we train a Recurrent Neural Network (RNN) on existing relevant information to that query. We then use the…

    Submitted 27 June, 2016; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, July 21, 2016, Pisa, Italy

  27. arXiv:1512.09300  [pdf, other]

    cs.LG cs.CV stat.ML

    Autoencoding beyond pixels using a learned similarity metric

    Authors: Anders Boesen Lindbo Larsen, Søren Kaae Sønderby, Hugo Larochelle, Ole Winther

    Abstract: We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder with a generative adversarial network we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distr…

    Submitted 10 February, 2016; v1 submitted 31 December, 2015; originally announced December 2015.

  28. arXiv:1510.02795  [pdf, other]

    cs.CV

    Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation

    Authors: Søren Hauberg, Oren Freifeld, Anders Boesen Lindbo Larsen, John W. Fisher III, Lars Kai Hansen

    Abstract: Data augmentation is a key element in training high-dimensional models. In this approach, one synthesizes new observations by applying pre-specified transformations to the original training data; e.g. new images are formed by rotating old ones. Current augmentation schemes, however, rely on manual specification of the applied transformations, making data augmentation an implicit form of feature en…

    Submitted 30 June, 2016; v1 submitted 9 October, 2015; originally announced October 2015.

    Journal ref: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 342-350, 2016

  29. arXiv:1507.08234  [pdf, other]

    cs.IR

    Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

    Authors: Casper Petersen, Christina Lioma, Jakob Grue Simonsen, Birger Larsen

    Abstract: We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which ne…

    Submitted 29 July, 2015; originally announced July 2015.

  30. arXiv:1507.08198  [pdf, ps, other]

    cs.IR

    Non-Compositional Term Dependence for Information Retrieval

    Authors: Christina Lioma, Jakob Grue Simonsen, Birger Larsen, Niels Dalum Hansen

    Abstract: Modelling term dependence in IR aims to identify co-occurring terms that are too heavily dependent on each other to be treated as a bag of words, and to adapt the indexing and ranking accordingly. Dependent terms are predominantly identified using lexical frequency statistics, assuming that (a) if terms co-occur often enough in some corpus, they are semantically dependent; (b) the more often they…

    Submitted 29 July, 2015; originally announced July 2015.

  31. Assessment of algorithms for mitosis detection in breast cancer histopathology images

    Authors: Mitko Veta, Paul J. van Diest, Stefan M. Willems, Haibo Wang, Anant Madabhushi, Angel Cruz-Roa, Fabio Gonzalez, Anders B. L. Larsen, Jacob S. Vestergaard, Anders B. Dahl, Dan C. Cireşan, Jürgen Schmidhuber, Alessandro Giusti, Luca M. Gambardella, F. Boray Tek, Thomas Walter, Ching-Wei Wang, Satoshi Kondo, Bogdan J. Matuszewski, Frederic Precioso, Violet Snell, Josef Kittler, Teofilo E. de Campos, Adnan M. Khan, Nasir M. Rajpoot, et al. (4 additional authors not shown)

    Abstract: The proliferative activity of breast tumors, which is routinely estimated by counting of mitotic figures in hematoxylin and eosin stained histology sections, is considered to be one of the most important prognostic markers. However, mitosis counting is laborious, subjective and may suffer from low inter-observer agreement. With the wider acceptance of whole slide images in pathology labs, automati…

    Submitted 21 November, 2014; originally announced November 2014.

    Comments: 23 pages, 5 figures, accepted for publication in the journal Medical Image Analysis

  32. A review of the characteristics of 108 author-level bibliometric indicators

    Authors: Lorna Wildgaard, Jesper W. Schneider, Birger Larsen

    Abstract: An increasing demand for bibliometric assessment of individuals has led to a growth of new bibliometric indicators as well as new variants or combinations of established ones. The aim of this review is to contribute with objective facts about the usefulness of bibliometric indicators of the effects of publication activity at the individual level. This paper reviews 108 indicators that can potentia…

    Submitted 25 August, 2014; originally announced August 2014.

    Comments: to be published in Scientometrics, 2014

  33. arXiv:1404.3084  [pdf, other]

    cs.DL astro-ph.IM physics.soc-ph

    Bibliometric Indicators of Young Authors in Astrophysics: Can Later Stars be Predicted?

    Authors: Frank Havemann, Birger Larsen

    Abstract: We test 16 bibliometric indicators with respect to their validity at the level of the individual researcher by estimating their power to predict later successful researchers. We compare the indicators of a sample of astrophysics researchers who later co-authored highly cited papers before their first landmark paper with the distributions of these indicators over a random control group of young aut…

    Submitted 11 April, 2014; originally announced April 2014.

    Comments: 14 pages, 10 figures

    Journal ref: Scientometrics, 30. November 2014, 1-25 p

  34. arXiv:1310.8226  [pdf]

    cs.IR cs.DL physics.soc-ph

    Bibliometric-enhanced Information Retrieval

    Authors: Philipp Mayr, Andrea Scharnhorst, Birger Larsen, Philipp Schaer, Peter Mutschke

    Abstract: Bibliometric techniques are not yet widely used to enhance retrieval processes in digital libraries, although they offer value-added effects for users. In this workshop we will explore how statistical modelling of scholarship, such as Bradfordizing or network analysis of coauthorship network, can improve retrieval services for specific communities, as well as for large, cross-domain collections. T…

    Submitted 30 October, 2013; originally announced October 2013.

    Comments: 6 pages, accepted workshop proposal for ECIR 2014

  35. FindZebra: A search engine for rare diseases

    Authors: Radu Dragusin, Paula Petcu, Christina Lioma, Birger Larsen, Henrik L. Jørgensen, Ingemar J. Cox, Lars Kai Hansen, Peter Ingwersen, Ole Winther

    Abstract: Background: The web has become a primary information resource about illnesses and treatments for both medical and non-medical users. Standard web search is by far the most common interface for such information. It is therefore of interest to find out how well web search engines work for diagnostic queries and what factors contribute to successes and failures. Among diseases, rare (or orphan) disea…

    Submitted 13 March, 2013; originally announced March 2013.

    Journal ref: International Journal of Medical Informatics, Available online 23 February 2013, ISSN 1386-5056