-
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Authors:
Emilio Villa-Cueva,
Sholpan Bolatzhanova,
Diana Turmakhan,
Kareem Elzeky,
Henok Biadglign Ademtew,
Alham Fikri Aji,
Israel Abebe Azime,
Jinheon Baek,
Frederico Belcavello,
Fermin Cristobal,
Jan Christian Blaise Cruz,
Mary Dabre,
Raj Dabre,
Toqeer Ehsan,
Naome A Etori,
Fauzan Farooqui,
Jiahui Geng,
Guido Ivetta,
Thanmay Jayakumar,
Soyeong Jeong,
Zheng Wei Lim,
Aishik Mandal,
Sofia Martinelli,
Mihail Minkov Mihaylov,
Daniil Orel,
et al. (9 additional authors not shown)
Abstract:
Cultural content poses challenges for machine translation systems due to the differences in conceptualizations between cultures, where language alone may fail to convey sufficient context to capture region-specific meanings. In this work, we investigate whether images can act as cultural context in multimodal translation. We introduce CaMMT, a human-curated benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. Using this dataset, we evaluate five Vision Language Models (VLMs) in text-only and text+image settings. Through automatic and human evaluations, we find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs), disambiguation, and correct gender usage. By releasing CaMMT, we aim to support broader efforts in building and evaluating multimodal translation systems that are better aligned with cultural nuance and regional variation.
Submitted 30 May, 2025;
originally announced May 2025.
-
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
Authors:
Israfel Salazar,
Manuel Fernández Burda,
Shayekh Bin Islam,
Arshia Soltani Moakhar,
Shivalika Singh,
Fabian Farestam,
Angelika Romanou,
Danylo Boiko,
Dipika Khullar,
Mike Zhang,
Dominik Krzemiński,
Jekaterina Novikova,
Luísa Shimabucoro,
Joseph Marvin Imperial,
Rishabh Maheshwary,
Sharad Duwal,
Alfonso Amayuelas,
Swati Rajwal,
Jebish Purbey,
Ahmed Ruby,
Nicholas Popovič,
Marek Suppa,
Azmine Toushik Wasi,
Ram Mohan Rao Kadiyala,
Olga Tsymboi,
et al. (20 additional authors not shown)
Abstract:
The evaluation of vision-language models (VLMs) has mainly relied on English-language benchmarks, leaving significant gaps in both multilingual and multicultural coverage. While multilingual benchmarks have expanded in both size and number of languages, many rely on translations of English datasets and so fail to capture cultural nuances. In this work, we propose Kaleidoscope as the most comprehensive exam benchmark to date for the multilingual evaluation of vision-language models. Kaleidoscope is a large-scale, in-language multimodal benchmark designed to evaluate VLMs across diverse languages and visual inputs. Kaleidoscope covers 18 languages and 14 different subjects, amounting to a total of 20,911 multiple-choice questions. Built through an open science collaboration with a diverse group of researchers worldwide, Kaleidoscope ensures linguistic and cultural authenticity. We evaluate top-performing multilingual vision-language models and find that they perform poorly on low-resource languages and in complex multimodal scenarios. Our results highlight the need for progress on culturally inclusive multimodal evaluation frameworks.
Submitted 29 April, 2025; v1 submitted 9 April, 2025;
originally announced April 2025.
-
Towards a framework for understanding societal and ethical implications of Artificial Intelligence
Authors:
Richard Benjamins,
Idoia Salazar
Abstract:
Artificial Intelligence (AI) is one of the most discussed technologies today. There are many innovative applications, such as the diagnosis and treatment of cancer, customer experience, new business, education, the propagation of contagious diseases, and the optimization of the management of humanitarian catastrophes. However, with all those opportunities also comes great responsibility to ensure good and fair practice of AI. The objective of this paper is to identify the main societal and ethical challenges implied by a massive uptake of AI. We have surveyed the literature for the most common challenges and classified them into seven groups: 1) Non-desired effects, 2) Liability, 3) Unknown consequences, 4) Relation people-robots, 5) Concentration of power and wealth, 6) Intentional bad uses, and 7) AI for weapons and warfare. The challenges should be dealt with in different ways depending on their origin; some have technological solutions, while others require ethical, societal, or political answers. Depending on the origin, different stakeholders might need to act. Whatever the identified stakeholder, not addressing those issues will lead to uncertainty and unforeseen consequences with potentially large negative societal impact, especially hurting the most vulnerable groups in society. Technology is helping us make better decisions, and AI is promoting data-driven decisions in addition to experience- and intuition-based discussion, with many improvements happening. However, the negative side effects of this technology need to be well understood and acted upon before it is launched massively into the world.
Submitted 3 January, 2020;
originally announced January 2020.
-
Galaxy Merger Fractions in Two Clusters at $z\sim2$ Using the Hubble Space Telescope
Authors:
Courtney Watson,
Kim-Vy Tran,
Adam Tomczak,
Leo Alcorn,
Irene V. Salazar,
Anshu Gupta,
Ivelina Momcheva,
Casey Papovich,
Pieter van Dokkum,
Gabriel Brammer,
Jennifer Lotz
Abstract:
We measure the fraction of galaxy-galaxy mergers in two clusters at $z\sim2$ using imaging and grism observations from the {\it Hubble Space Telescope}. The two galaxy cluster candidates were originally identified as overdensities of objects using deep mid-infrared imaging and observations from the {\it Spitzer Space Telescope}, and were subsequently followed up with HST/WFC3 imaging and grism observations. We identify galaxy-galaxy merger candidates using high resolution imaging with the WFC3 in the F105W, F125W, and F160W bands. Coarse redshifts for the same objects are obtained with grism observations in G102 for the $z\sim1.6$ cluster (IRC0222A) and G141 for the $z\sim2$ cluster (IRC0222B). Using visual classifications as well as a variety of selection techniques, we measure merger fractions of $11_{-3.2}^{+8.2}\%$ in IRC0222A and $18_{-4.5}^{+7.8}\%$ in IRC0222B. In comparison, we measure a merger fraction of $5.0_{-0.8}^{+1.1}\%$ for field galaxies at $z\sim2$. Our study indicates that the galaxy-galaxy merger fraction in clusters at $z\sim2$ is enhanced compared to the field population, but we note that more cluster measurements at this epoch are needed to confirm our findings.
Submitted 19 February, 2019;
originally announced February 2019.
-
Renyi relative entropies and renormalization group flows
Authors:
Horacio Casini,
Raimel Medina,
Ignacio Salazar,
Gonzalo Torroba
Abstract:
Quantum Renyi relative entropies provide a one-parameter family of distances between density matrices, which generalizes the relative entropy and the fidelity. We study these measures for renormalization group flows in quantum field theory. We derive explicit expressions in free field theory based on the real time approach. Using monotonicity properties, we obtain new inequalities that need to be satisfied by consistent renormalization group trajectories in field theory. These inequalities play the role of a second law of thermodynamics, in the context of renormalization group flows. Finally, we apply these results to a tractable Kondo model, where we evaluate the Renyi relative entropies explicitly. An outcome of this is that Anderson's orthogonality catastrophe can be avoided by working on a Cauchy surface that approaches the light-cone.
Submitted 8 August, 2018; v1 submitted 9 July, 2018;
originally announced July 2018.
-
Accurate Pre-Eruption and Post-Eruption Orbital Periods for the Dwarf/Classical Nova V1017 Sgr
Authors:
Irene V. Salazar,
Amy LeBleu,
Bradley E. Schaefer,
Arlo U. Landolt,
Shawn Dvorak
Abstract:
V1017 Sgr is a classical nova (in 1919) that displayed an earlier dwarf nova eruption (in 1901), and two more dwarf nova events (in 1973 and 1991). Previous work on this bright system in quiescence (V=13.5) has been limited to a few isolated magnitudes, a few spectra, and an ambiguous claim for an orbital period of 5.714 days based on nine radial velocities. To test this period, we have collected 2896 magnitudes (plus 53 in the literature) in the UBVRIJHKL bands from 1897 to 2016, making an essentially complete photometric history of this unique cataclysmic variable. We find that the light curve in all bands is dominated by the ellipsoidal modulations of a G giant companion star, with a post-eruption (after the 1919 nova event) orbital period of 5.786290 ± 0.000032 days. This is the longest period for any classical nova; the accretion must be powered by the nuclear evolution of the companion star, and the dwarf nova events occur only because the outer parts of the large disk are cool enough to be unstable. Furthermore, we measure the pre-eruption orbital period (from 1907 to 1916), and there is a small steady period change in quiescence. The orbital period has decreased by 273 ± 61 parts-per-million across the 1919 eruption, with the significance of the period change being at the 5.7-sigma confidence level. This is startling and mystifying for nova theory, because the three known period change effects cannot account for a period decrease in V1017 Sgr, much less one of such a large size.
Submitted 1 December, 2016;
originally announced December 2016.