-
Gemma 2: Improving Open Language Models at a Practical Size
Authors:
Gemma Team,
Morgane Riviere,
Shreya Pathak,
Pier Giuseppe Sessa,
Cassidy Hardin,
Surya Bhupatiraju,
Léonard Hussenot,
Thomas Mesnard,
Bobak Shahriari,
Alexandre Ramé,
Johan Ferret,
Peter Liu,
Pouya Tafti,
Abe Friesen,
Michelle Casbon,
Sabela Ramos,
Ravin Kumar,
Charline Le Lan,
Sammy Jerome,
Anton Tsitsulin,
Nino Vieillard,
Piotr Stanczyk,
Sertan Girgin,
Nikola Momchev,
Matt Hoffman
, et al. (173 additional authors not shown)
Abstract:
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…
▽ More
In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al., 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2-3 times bigger. We release all our models to the community.
△ Less
Submitted 2 October, 2024; v1 submitted 31 July, 2024;
originally announced August 2024.
-
Navigating the Web of Misinformation: A Framework for Misinformation Domain Detection Using Browser Traffic
Authors:
Mayana Pereira,
Kevin Greene,
Nilima Pisharody,
Rahul Dodhia,
Jacob N. Shapiro,
Juan Lavista
Abstract:
The proliferation of misinformation and propaganda is a global challenge, with profound effects during major crises such as the COVID-19 pandemic and the Russian invasion of Ukraine. Understanding the spread of misinformation and its social impacts requires identifying the news sources spreading false information. While machine learning (ML) techniques have been proposed to address this issue, ML…
▽ More
The proliferation of misinformation and propaganda is a global challenge, with profound effects during major crises such as the COVID-19 pandemic and the Russian invasion of Ukraine. Understanding the spread of misinformation and its social impacts requires identifying the news sources spreading false information. While machine learning (ML) techniques have been proposed to address this issue, ML models have failed to provide an efficient implementation scenario that yields useful results. In prior research, the precision of deployment in real traffic deteriorates significantly, experiencing a decrement up to ten times compared to the results derived from benchmark data sets. Our research addresses this gap by proposing a graph-based approach to capture navigational patterns and generate traffic-based features which are used to train a classification model. These navigational and traffic-based features result in classifiers that present outstanding performance when evaluated against real traffic. Moreover, we also propose graph-based filtering techniques to filter out models to be classified by our framework. These filtering techniques increase the signal-to-noise ratio of the models to be classified, greatly reducing false positives and the computational cost of deploying the model. Our proposed framework for the detection of misinformation domains achieves a precision of 0.78 when evaluated in real traffic. This outcome represents an improvement factor of over ten times over those achieved in previous studies.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
An Infinite Parade of Giraffes: Expressive Augmentation and Complexity Layers for Cartoon Drawing
Authors:
K. G. Greene
Abstract:
In this paper, we explore creative image generation constrained by small data. To partially automate the creation of cartoon sketches consistent with a specific designer's style, where acquiring a very large original image set is impossible or cost prohibitive, we exploit domain specific knowledge for a huge reduction in original image requirements, creating an effectively infinite number of carto…
▽ More
In this paper, we explore creative image generation constrained by small data. To partially automate the creation of cartoon sketches consistent with a specific designer's style, where acquiring a very large original image set is impossible or cost prohibitive, we exploit domain specific knowledge for a huge reduction in original image requirements, creating an effectively infinite number of cartoon giraffes from just nine original drawings. We introduce "expressive augmentations" for cartoon sketches, mathematical transformations that create broad domain appropriate variation, far beyond the usual affine transformations, and we show that chained GANs models trained on the temporal stages of drawing or "complexity layers" can effectively add character appropriate details and finish new drawings in the designer's style.
We discuss the application of these tools in design processes for textiles, graphics, architectural elements and interior design.
△ Less
Submitted 7 November, 2018;
originally announced November 2018.
-
DragonPaint: Rule based bootstrapping for small data with an application to cartoon coloring
Authors:
K. Gretchen Greene
Abstract:
In this paper, we confront the problem of deep learning's big labeled data requirements, offer a rule based strategy for extreme augmentation of small data sets and apply that strategy with the image to image translation model by Isola et al. (2016) to automate cel style cartoon coloring with very limited training data. While our experimental results using geometric rules and transformations demon…
▽ More
In this paper, we confront the problem of deep learning's big labeled data requirements, offer a rule based strategy for extreme augmentation of small data sets and apply that strategy with the image to image translation model by Isola et al. (2016) to automate cel style cartoon coloring with very limited training data. While our experimental results using geometric rules and transformations demonstrate the performance of our methods on an image translation task with industry applications in art, design and animation, we also propose the use of rules on partial data sets as a generalizable small data strategy, potentially applicable across data types and domains.
△ Less
Submitted 7 November, 2018;
originally announced November 2018.