-
Experimentation in Content Moderation using RWKV
Authors:
Umut Yildirim,
Rohan Dutta,
Burak Yildirim,
Atharva Vaidya
Abstract:
This paper investigates the RWKV model's efficacy in content moderation through targeted experimentation. We introduce a novel dataset specifically designed for distillation into smaller models, enhancing content moderation practices. This comprehensive dataset encompasses images, videos, sounds, and text data that present societal challenges. Leveraging advanced Large Language Models (LLMs), we generated an extensive set of responses -- 558,958 for text and 83,625 for images -- to train and refine content moderation systems. Our core experimentation involved fine-tuning the RWKV model, capitalizing on its CPU-efficient architecture to address large-scale content moderation tasks. By highlighting the dataset's potential for knowledge distillation, this study not only demonstrates RWKV's capability in improving the accuracy and efficiency of content moderation systems but also paves the way for developing more compact, resource-efficient models in this domain. Datasets and models can be found on HuggingFace: https://huggingface.co/modrwkv
Submitted 5 September, 2024;
originally announced September 2024.
-
Fractal dimension, approximation and data sets
Authors:
L. Betti,
I. Chio,
J. Fleischman,
A. Iosevich,
F. Iulianelli,
S. Kirila,
M. Martino,
A. Mayeli,
S. Pack,
Z. Sheng,
C. Taliancic,
A. Thomas,
N. Whybra,
E. Wyman,
U. Yildirim,
K. Zhao
Abstract:
The purpose of this paper is to study the fractal phenomena in large data sets and the associated questions of dimension reduction. We examine situations where the classical Principal Component Analysis is not effective in identifying the salient underlying fractal features of the data set. Instead, we employ the discrete energy, a technique borrowed from geometric measure theory, to limit the number of points of a given data set that lie near a $k$-dimensional hyperplane, or, more generally, near a set of a given upper Minkowski dimension. Concrete motivations stemming from naturally arising data sets are described and future directions outlined.
Submitted 24 September, 2022;
originally announced September 2022.
-
Deep-Learning Driven Noise Reduction for Reduced Flux Computed Tomography
Authors:
Khalid L. Alsamadony,
Ertugrul U. Yildirim,
Guenther Glatz,
Umair bin Waheed,
Sherif M. Hanafy
Abstract:
Deep neural networks have received considerable attention in clinical imaging, particularly with respect to the reduction of radiation risk. Lowering the radiation dose by reducing the photon flux inevitably results in the degradation of the scanned image quality. Thus, researchers have sought to exploit deep convolutional neural networks (DCNNs) to map low-quality, low-dose images to higher-dose, higher-quality images, thereby minimizing the associated radiation hazard. Conversely, computed tomography (CT) measurements of geomaterials are not limited by the radiation dose. In contrast to the human body, however, geomaterials may be composed of high-density constituents causing increased attenuation of the X-rays. Consequently, higher dosage images are required to obtain an acceptable scan quality. The problem of prolonged acquisition times is particularly severe for micro-CT based scanning technologies. Depending on the sample size and exposure time settings, a single scan may require several hours to complete. This is of particular concern if phenomena with an exponential temperature dependency are to be elucidated. A process may happen too fast to be adequately captured by CT scanning. To address the aforementioned issues, we apply DCNNs to improve the quality of rock CT images and simultaneously reduce exposure times by more than 60%. We highlight current results based on micro-CT derived datasets and apply transfer learning to improve DCNN results without increasing training time. The approach is applicable to any computed tomography technology. Furthermore, we contrast the performance of the DCNN trained by minimizing different loss functions such as mean squared error and structural similarity index.
Submitted 18 January, 2021;
originally announced January 2021.
-
Complex $G_2$-manifolds and Seiberg-Witten Equations
Authors:
Selman Akbulut,
Ustun Yildirim
Abstract:
We introduce the notion of a complex $G_2$ manifold $M_{\mathbb C}$, and the complexification of a $G_2$ manifold $M\subset M_{\mathbb C}$. As an application we show the following: If $(Y,s)$ is a closed oriented $3$-manifold with a $Spin^{c}$ structure, and $(Y,s)\subset (M, \varphi)$ is an imbedding as an associative submanifold of some $G_2$ manifold (such an imbedding always exists), then the isotropic associative deformations of $Y$ in the complexified $G_2$ manifold $M_{\mathbb C}$ are given by the Seiberg-Witten equations.
Submitted 15 October, 2018; v1 submitted 26 April, 2018;
originally announced April 2018.
-
On the Complex Cayley Grassmannian
Authors:
Üstün Yıldırım
Abstract:
We define a torus action on the (complex) Cayley Grassmannian $X$. Using this action, we prove that $X$ is a singular variety. We also show that the singular locus is smooth and has the same cohomology ring as that of $\mathbb{CP}^5$. Furthermore, we identify the singular locus with a quotient of $G_2^\mathbb{C}$ by a parabolic subgroup.
Submitted 27 February, 2019; v1 submitted 14 November, 2017;
originally announced November 2017.