Showing 1–27 of 27 results for author: Wilhelm, C

  1. arXiv:2502.18443  [pdf, other]

    cs.CL

    olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models

    Authors: Jake Poznanski, Jon Borchardt, Jason Dunkelberger, Regan Huff, Daniel Lin, Aman Rangapur, Christopher Wilhelm, Kyle Lo, Luca Soldaini

    Abstract: PDF documents have the potential to provide trillions of novel, high-quality tokens for training language models. However, these documents come in a diversity of types with differing formats and visual layouts that pose a challenge when attempting to extract and faithfully represent the underlying content for language model use. We present olmOCR, an open-source Python toolkit for processing PDFs…

    Submitted 25 February, 2025; originally announced February 2025.

  2. arXiv:2501.00656  [pdf, other]

    cs.CL cs.LG

    2 OLMo 2 Furious

    Authors: Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William Merrill, et al. (15 additional authors not shown)

    Abstract: We present OLMo 2, the next generation of our fully open language models. OLMo 2 includes dense autoregressive models with improved architecture and training recipe, pretraining data mixtures, and instruction tuning recipes. Our modified model architecture and training recipe achieve both better training stability and improved per-token efficiency. Our updated pretraining data mixture introduces a…

    Submitted 14 January, 2025; v1 submitted 31 December, 2024; originally announced January 2025.

    Comments: Model demo available at playground.allenai.org

  3. arXiv:2411.15124  [pdf, other]

    cs.CL

    Tulu 3: Pushing Frontiers in Open Language Model Post-Training

    Authors: Nathan Lambert, Jacob Morrison, Valentina Pyatkin, Shengyi Huang, Hamish Ivison, Faeze Brahman, Lester James V. Miranda, Alisa Liu, Nouha Dziri, Shane Lyu, Yuling Gu, Saumya Malik, Victoria Graf, Jena D. Hwang, Jiangjiang Yang, Ronan Le Bras, Oyvind Tafjord, Chris Wilhelm, Luca Soldaini, Noah A. Smith, Yizhong Wang, Pradeep Dasigi, Hannaneh Hajishirzi

    Abstract: Language model post-training is applied to refine behaviors and unlock new skills across a wide range of recent language models, but open recipes for applying these techniques lag behind proprietary ones. The underlying training data and recipes for post-training are simultaneously the most important pieces of the puzzle and the portion with the least transparency. To bridge this gap, we introduce…

    Submitted 14 April, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

    Comments: Added Tulu 3 405B results and additional analyses

  4. arXiv:2407.19018  [pdf, other]

    astro-ph.EP astro-ph.GA astro-ph.SR

    On the suppression of giant planet formation around low-mass stars in clustered environments

    Authors: Shuo Huang, Simon Portegies Zwart, Maite J. C. Wilhelm

    Abstract: Context: Current exoplanet formation studies tend to overlook the birth environment of stars in clustered environments. The effect of this environment on the planet-formation process, however, is important, especially in the earliest stage. Aims: We investigate the differences in planet populations forming in star-cluster environments through pebble accretion and compare these results with the pla…

    Submitted 2 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: 13 pages 10 figures. Accepted for publication in A&A

    Journal ref: A&A 689, A338 (2024)

  5. Massive star cluster formation I. High star formation efficiency while resolving feedback of individual stars

    Authors: Brooke Polak, Mordecai-Mark Mac Low, Ralf S. Klessen, Jia Wei Teh, Claude Cournoyer-Cloutier, Eric P. Andersson, Sabrina M. Appel, Aaron Tran, Sean C. Lewis, Maite J. C. Wilhelm, Simon Portegies Zwart, Simon C. O. Glover, Long Wang, Stephen L. W. McMillan

    Abstract: The mode of star formation that results in the formation of globular clusters and young massive clusters is difficult to constrain through observations. We present models of massive star cluster formation using the Torch framework, which uses AMUSE to couple distinct multi-physics codes that handle star formation, stellar evolution and dynamics, radiative transfer, and magnetohydrodynamics. We upg…

    Submitted 7 March, 2025; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Published in A&A

    Journal ref: A&A 690, A94 (2024)

  6. arXiv:2303.14334  [pdf, other]

    cs.HC cs.AI cs.CL

    The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces

    Authors: Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond Fok, Fangzhou Hu, Regan Huff, Dongyeop Kang, Tae Soo Kim, Rodney Kinney, et al. (30 additional authors not shown)

    Abstract: Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has chan…

    Submitted 23 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

  7. Early Evolution and 3D Structure of Embedded Star Clusters

    Authors: Claude Cournoyer-Cloutier, Alison Sills, William E. Harris, Sabrina M. Appel, Sean C. Lewis, Brooke Polak, Aaron Tran, Martijn J. C. Wilhelm, Mordecai-Mark Mac Low, Stephen L. W. McMillan, Simon Portegies Zwart

    Abstract: We perform simulations of star cluster formation to investigate the morphological evolution of embedded star clusters in the earliest stages of their evolution. We conduct our simulations with Torch, which uses the AMUSE framework to couple state-of-the-art stellar dynamics to star formation, radiation, stellar winds, and hydrodynamics in FLASH. We simulate a suite of $10^4$ M$_{\odot}$ clouds at…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: 15 pages, 10 figures, to be published in MNRAS

  8. arXiv:2302.03721  [pdf, other]

    astro-ph.EP astro-ph.SR

    Radiation shielding of protoplanetary discs in young star-forming regions

    Authors: Martijn J. C. Wilhelm, Simon Portegies Zwart, Claude Cournoyer-Cloutier, Sean C. Lewis, Brooke Polak, Aaron Tran, Mordecai-Mark Mac Low

    Abstract: Protoplanetary discs spend their lives in the dense environment of a star forming region. While there, they can be affected by nearby stars through external photoevaporation and dynamic truncations. We present simulations that use the AMUSE framework to couple the Torch model for star cluster formation from a molecular cloud with a model for the evolution of protoplanetary discs under these two en…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 23 pages, 22 figures, 1 table, accepted for publication in MNRAS

  9. arXiv:2301.10140  [pdf, other]

    cs.DL cs.CL

    The Semantic Scholar Open Data Platform

    Authors: Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney, David Graham, Fangzhou Hu, Regan Huff, Daniel King, Sebastian Kohlmeier, Bailey Kuehl, Michael Langan, Daniel Lin, et al. (23 additional authors not shown)

    Abstract: The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF conte…

    Submitted 25 April, 2025; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: 8 pages, 6 figures

  10. arXiv:2212.01465  [pdf, other]

    astro-ph.GA astro-ph.SR

    Early-Forming Massive Stars Suppress Star Formation and Hierarchical Cluster Assembly

    Authors: Sean C. Lewis, Stephen L. W. McMillan, Mordecai-Mark Mac Low, Claude Cournoyer-Cloutier, Brooke Polak, Martijn J. C. Wilhelm, Aaron Tran, Alison Sills, Simon Portegies Zwart, Ralf S. Klessen, Joshua E. Wall

    Abstract: Feedback from massive stars plays an important role in the formation of star clusters. Whether a very massive star is born early or late in the cluster formation timeline has profound implications for the star cluster formation and assembly processes. We carry out a controlled experiment to characterize the effects of early-forming massive stars on star cluster formation. We use the star formation…

    Submitted 28 February, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: 17 pages, 7 figures. Published in ApJ

    Journal ref: ApJ 944 211 (2023)

  11. arXiv:2205.05372  [pdf, other]

    astro-ph.GA astro-ph.SR

    Expanding shells around young clusters -- S 171/Be 59

    Authors: G. F. Gahm, M. J. C. Wilhelm, C. M. Persson, A. A. Djupvik, S. F. Portegies Zwart

    Abstract: Some HII regions that surround young stellar clusters are bordered by molecular shells that appear to expand at a rate inconsistent with our current model simulations. In this study we focus on the dynamics of Sharpless 171 (including NGC 7822), which surrounds the cluster Berkeley 59. We aim to compare the velocity pattern over the molecular shell with the mean radial velocity of the cluster for…

    Submitted 11 May, 2022; originally announced May 2022.

    Comments: 27 pages, 14 figures, 8 tables, 3 appendices. Accepted for publication in Astronomy & Astrophysics

    Journal ref: A&A 663, A111 (2022)

  12. arXiv:2112.03372  [pdf, other]

    astro-ph.EP

    The Ice Coverage of Earth-like Planets Orbiting FGK Stars

    Authors: Caitlyn Wilhelm, Rory Barnes, Russell Deitrick, Rachel Mellman

    Abstract: The photometric and spectroscopic signatures of habitable planets orbiting FGK stars may be modulated by surface ice coverage. To estimate its frequency and locations, we simulated the climates of hypothetical planets with a 1D energy balance model and assumed that the planets possess properties similar to modern Earth (mass, geography, atmosphere). We first simulated planets with fixed rotational…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 23 pages, 13 figures, 4 tables, accepted to Planetary Science Journal. Source code available at https://github.com/VirtualPlanetaryLaboratory/vplanet, and scripts to generate data and figures available at https://github.com/caitlyn-wilhelm/IceCoverage

  13. arXiv:2109.01456  [pdf, other]

    astro-ph.SR astro-ph.EP

    Exploring the possibility of Peter Pan discs across stellar mass

    Authors: Martijn J. C. Wilhelm, Simon Portegies Zwart

    Abstract: Recently, several accreting M dwarf stars have been discovered with ages far exceeding the typical protoplanetary disc lifetime. These 'Peter Pan discs' can be explained as primordial discs that evolve in a low-radiation environment. The persistently low masses of the host stars raise the question whether primordial discs can survive up to these ages around stars of higher mass. In this work we ex…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: Accepted by MNRAS, 8 pages, 5 figures

  14. Evolution of circumstellar discs in young star-forming regions

    Authors: Francisca Concha-Ramírez, Martijn J. C. Wilhelm, Simon Portegies Zwart

    Abstract: The evolution of circumstellar discs is influenced by their surroundings. The relevant processes include external photoevaporation due to nearby stars, and dynamical truncations. The impact of these processes on disc populations depends on the star-formation history and on the dynamical evolution of the region. Since star formation history and the phase-space characteristics of the stars are impor…

    Submitted 26 May, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: Accepted for publication in MNRAS. 15 pages, 12 figures

  15. arXiv:2006.07378  [pdf, other]

    astro-ph.EP astro-ph.GA astro-ph.SR

    Effects of stellar density on the photoevaporation of circumstellar discs

    Authors: Francisca Concha-Ramírez, Martijn J. C. Wilhelm, Simon Portegies Zwart, Sierk E. van Terwisga, Alvaro Hacar

    Abstract: Circumstellar discs are the precursors of planetary systems and develop shortly after their host star has formed. In their early stages these discs are immersed in an environment rich in gas and neighbouring stars, which can be hostile for their survival. There are several environmental processes that affect the evolution of circumstellar discs, and external photoevaporation is arguably one of the…

    Submitted 20 November, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Accepted for publication in MNRAS. 9 pages, 4 figures

  16. arXiv:2006.01320  [pdf, other]

    cs.CV

    Two-hand Global 3D Pose Estimation Using Monocular RGB

    Authors: Fanqing Lin, Connor Wilhelm, Tony Martinez

    Abstract: We tackle the challenging task of estimating global 3D joint locations for both hands via only monocular RGB input images. We propose a novel multi-stage convolutional neural network based pipeline that accurately segments and locates the hands despite occlusion between two hands and complex background noise and estimates the 2D and 3D canonical joint locations without any depth information. Globa…

    Submitted 25 August, 2020; v1 submitted 1 June, 2020; originally announced June 2020.

  17. arXiv:2004.10706  [pdf, other]

    cs.DL cs.CL

    CORD-19: The COVID-19 Open Research Dataset

    Authors: Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Doug Burdick, Darrin Eide, Kathryn Funk, Yannis Katsis, Rodney Kinney, Yunyao Li, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex Wade, Kuansan Wang, Nancy Xin Ru Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, et al. (3 additional authors not shown)

    Abstract: The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the b…

    Submitted 10 July, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

    Comments: ACL NLP-COVID Workshop 2020

  18. The Milky Way's bar structural properties from gravitational waves

    Authors: Martijn J. C. Wilhelm, Valeriya Korol, Elena M. Rossi, Elena D'Onghia

    Abstract: The Laser Interferometer Space Antenna (LISA) will enable Galactic gravitational wave (GW) astronomy by individually resolving $> 10^4$ signals from double white dwarf (DWD) binaries throughout the Milky Way. In this work we assess for the first time the potential of LISA data to map the Galactic stellar bar and spiral arms, since GWs are unaffected by stellar crowding and dust extinction unlike…

    Submitted 3 November, 2020; v1 submitted 24 March, 2020; originally announced March 2020.

    Comments: Accepted by MNRAS, 14 pages, 11 figures, 1 table

  19. arXiv:1907.03760  [pdf, other]

    astro-ph.EP astro-ph.GA

    External photoevaporation of circumstellar disks constrains the timescale for planet formation

    Authors: Francisca Concha-Ramírez, Martijn J. C. Wilhelm, Simon Portegies Zwart, Thomas J. Haworth

    Abstract: Planet-forming circumstellar disks are a fundamental part of the star formation process. Since stars form in a hierarchical fashion in groups of up to hundreds or thousands, the UV radiation environment that these disks are exposed to can vary in strength by at least six orders of magnitude. This radiation can limit the masses and sizes of the disks. Diversity in star forming environments can have…

    Submitted 18 October, 2019; v1 submitted 8 July, 2019; originally announced July 2019.

    Comments: 13 pages, 14 figures. Accepted for publication in MNRAS

  20. arXiv:1905.06367  [pdf, other]

    astro-ph.EP astro-ph.SR

    VPLanet: The Virtual Planet Simulator

    Authors: Rory Barnes, Rodrigo Luger, Russell Deitrick, Peter Driscoll, Thomas R. Quinn, David P. Fleming, Hayden Smotherman, Diego V. McDonald, Caitlyn Wilhelm, Rodolfo Garcia, Patrick Barth, Benjamin Guyer, Victoria S. Meadows, Cecilia M. Bitz, Pramod Gupta, Shawn D. Domagal-Goldman, John Armstrong

    Abstract: We describe a software package called VPLanet that simulates fundamental aspects of planetary system evolution over Gyr timescales, with a focus on investigating habitable worlds. In this initial release, eleven physics modules are included that model internal, atmospheric, rotational, orbital, stellar, and galactic processes. Many of these modules can be coupled simultaneously to simulate the evo…

    Submitted 27 August, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: 75 pages, 34 figures, 10 tables, accepted to the Proceedings of the Astronomical Society of the Pacific. Source code, documentation, and examples available at https://github.com/VirtualPlanetaryLaboratory/vplanet

  21. arXiv:1809.01328  [pdf]

    cond-mat.mes-hall physics.optics

    Nanowire lasers

    Authors: C. Couteau, A. Larrue, C. Wilhelm, C. Soci

    Abstract: We review principles and trends in the use of semiconductor nanowires (NWs) as gain media for stimulated emission and lasing. Semiconductor nanowires have recently been widely studied for use in integrated optoelectronic devices, such as LEDs, solar cells, and transistors. Intensive research has also been conducted on the use of nanowires for sub-wavelength laser systems that take advantage of the…

    Submitted 5 September, 2018; originally announced September 2018.

    Comments: Review article. 18 pages

    Journal ref: Nanophotonics 4, 90 (2015)

  22. arXiv:1807.08994  [pdf]

    physics.bio-ph cond-mat.soft

    Forced- and Self-Rotation of Magnetic Nanorods Assembly at the Cell Membrane: A Biomagnetic Torsion Pendulum

    Authors: François Mazuel, Samuel Mathieu, Riccardo Di Corato, Jean-Claude Bacri, Thierry Meylheuc, Teresa Pellegrino, Myriam Reffay, Claire Wilhelm

    Abstract: In order to give insights into how anisotropic nano-objects interact with living cell membranes, and possibly self-assemble, we designed magnetic nanorods with average size around 100 nm x 1 μm by assembling iron oxide nanocubes within a polymeric matrix under a magnetic field. We then explored the nano-bio interface at the cell membrane under the influence of a rotating magnetic field. We observ…

    Submitted 24 July, 2018; originally announced July 2018.

    Journal ref: Small, Wiley-VCH Verlag, 2017, 13 (31)

  23. arXiv:1806.07976  [pdf, other]

    cs.CL

    Ontology Alignment in the Biomedical Domain Using Entity Definitions and Context

    Authors: Lucy Lu Wang, Chandra Bhagavatula, Mark Neumann, Kyle Lo, Chris Wilhelm, Waleed Ammar

    Abstract: Ontology alignment is the task of identifying semantically equivalent entities from two given ontologies. Different ontologies have different representations of the same entity, resulting in a need to de-duplicate entities when merging ontologies. We propose a method for enriching entities in an ontology with external definition and context information, and use this additional information for onto…

    Submitted 20 June, 2018; originally announced June 2018.

    Comments: ACL 2018 BioNLP workshop

  24. arXiv:1805.02262  [pdf, other]

    cs.CL

    Construction of the Literature Graph in Semantic Scholar

    Authors: Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg, Lucy Lu Wang, Chris Wilhelm, Zheng Yuan, Madeleine van Zuylen, Oren Etzioni

    Abstract: We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction in…

    Submitted 6 May, 2018; originally announced May 2018.

    Comments: To appear in NAACL 2018 industry track

  25. Exo-Milankovitch Cycles II: Climates of G-dwarf Planets in Dynamically Hot Systems

    Authors: Russell Deitrick, Rory Barnes, Cecilia Bitz, David Fleming, Benjamin Charnay, Victoria Meadows, Caitlyn Wilhelm, John Armstrong, Thomas R. Quinn

    Abstract: Using an energy balance model with ice sheets, we examine the climate response of an Earth-like planet orbiting a G dwarf star and experiencing large orbital and obliquity variations. We find that ice caps couple strongly to the orbital forcing, leading to extreme ice ages. In contrast with previous studies, we find that such exo-Milankovitch cycles tend to impair habitability by inducing snowball…

    Submitted 1 May, 2018; originally announced May 2018.

    Comments: 37 pages, 26 figures, accepted at the Astronomical Journal

  26. Exo-Milankovitch Cycles I: Orbits and Rotation States

    Authors: Russell Deitrick, Rory Barnes, Thomas R. Quinn, John Armstrong, Benjamin Charnay, Caitlyn Wilhelm

    Abstract: The obliquity of the Earth, which controls our seasons, varies by only ~2.5 degrees over ~40,000 years, and its eccentricity varies by only ~0.05 over 100,000 years. Nonetheless, these small variations influence Earth's ice ages. For exoplanets, however, variations can be significantly larger. Previous studies of the habitability of moonless Earth-like exoplanets have found that high obliquities,…

    Submitted 28 December, 2017; originally announced December 2017.

    Comments: 29 pages, 19 figures, accepted for publication in the Astronomical Journal

  27. arXiv:1509.08314  [pdf, ps, other]

    physics.optics cond-mat.mtrl-sci

    Broadband tunable hybrid photonic crystal-nanowire light emitter

    Authors: Christophe E. Wilhelm, M. Iqbal Bakti Utama, Qihua Xiong, Cesare Soci, Gaëlle Lehoucq, Daniel Dolfi, Alfredo De Rossi, Sylvain Combrié

    Abstract: We integrate about 100 single Cadmium Selenide semiconductor nanowires in self-standing Silicon Nitride photonic crystal cavities in a single processing run. Room temperature measurements reveal a single narrow emission linewidth, corresponding to a Q-factor as large as 5000. By varying the structural parameters of the photonic crystal, the peak wavelength is tuned, thereby covering the entire emi…

    Submitted 25 September, 2015; originally announced September 2015.