Showing 1–34 of 34 results for author: Garcia-Gasulla, D

Searching in archive cs.
  1. arXiv:2505.04388  [pdf, ps, other]

    cs.CL cs.AI

    The Aloe Family Recipe for Open and Specialized Healthcare LLMs

    Authors: Dario Garcia-Gasulla, Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés

    Abstract: Purpose: With advancements in Large Language Models (LLMs) for healthcare, the need arises for competitive open-source models to protect the public interest. This work contributes to the field of open medical LLMs by optimizing key stages of data preprocessing and training, while showing how to improve model safety (through DPO) and efficacy (through RAG). The evaluation methodology used, which in…

    Submitted 28 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: Follow-up work from arXiv:2405.01886

  2. arXiv:2504.01986  [pdf, ps, other]

    cs.AR cs.AI

    TuRTLe: A Unified Evaluation of LLMs for RTL Generation

    Authors: Dario Garcia-Gasulla, Gokcen Kestor, Emanuele Parisi, Miquel Albertí-Binimelis, Cristian Gutierrez, Razine Moundir Ghorab, Orlando Montenegro, Bernat Homs, Miquel Moreto

    Abstract: The rapid advancements in LLMs have driven the adoption of generative AI in various domains, including Electronic Design Automation (EDA). Unlike traditional software development, EDA presents unique challenges, as generated RTL code must not only be syntactically correct and functionally accurate but also synthesizable by hardware generators while meeting performance, power, and area constraints.…

    Submitted 30 May, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

    ACM Class: I.2.5; J.6

  3. arXiv:2502.13603  [pdf, other]

    cs.CL cs.AI cs.LG

    Efficient Safety Retrofitting Against Jailbreaking for LLMs

    Authors: Dario Garcia-Gasulla, Adrian Tormos, Anna Arias-Duart, Daniel Hinjos, Oscar Molina-Sedano, Ashwin Kumar Gururajan, Maria Eugenia Cardello

    Abstract: Direct Preference Optimization (DPO) is an efficient alignment technique that steers LLMs towards preferable outputs by training on preference data, bypassing the need for explicit reward models. Its simplicity enables easy adaptation to various domains and safety requirements. This paper examines DPO's effectiveness in model safety against jailbreaking attacks while minimizing data requirements a…

    Submitted 25 February, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

  4. arXiv:2502.06666  [pdf, other]

    cs.CL cs.AI

    Automatic Evaluation of Healthcare LLMs Beyond Question-Answering

    Authors: Anna Arias-Duart, Pablo Agustin Martin-Torres, Daniel Hinjos, Pablo Bernabeu-Perez, Lucia Urcelay Ganzabal, Marta Gonzalez Mallo, Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Sergio Alvarez-Napagao, Dario Garcia-Gasulla

    Abstract: Current Large Language Model (LLM) benchmarks are often based on open-ended or close-ended QA evaluations, avoiding the requirement of human labor. Close-ended measurements evaluate the factuality of responses but lack expressiveness. Open-ended measurements capture the model's capacity to produce discourse responses but are harder to assess for correctness. These two approaches are commonly used, either ind…

    Submitted 10 February, 2025; originally announced February 2025.

  5. arXiv:2411.18926  [pdf, other]

    cs.CV

    Data Augmentation with Diffusion Models for Colon Polyp Localization on the Low Data Regime: How much real data is enough?

    Authors: Adrian Tormos, Blanca Llauradó, Fernando Núñez, Axel Romero, Dario Garcia-Gasulla, Javier Béjar

    Abstract: The scarcity of data in medical domains hinders the performance of Deep Learning models. Data augmentation techniques can alleviate that problem, but they usually rely on functional transformations of the data that do not guarantee to preserve the original tasks. To approximate the distribution of the data using generative models is a way of reducing that problem and also to obtain new samples tha…

    Submitted 28 November, 2024; originally announced November 2024.

    ACM Class: I.2.1; I.4.8; I.5.1; I.4.9

  6. arXiv:2409.15127  [pdf, other]

    cs.AI

    Pareto-Optimized Open-Source LLMs for Healthcare via Context Retrieval

    Authors: Jordi Bayarri-Planas, Ashwin Kumar Gururajan, Dario Garcia-Gasulla

    Abstract: This study leverages optimized context retrieval to enhance open-source Large Language Models (LLMs) for cost-effective, high performance healthcare AI. We demonstrate that this approach achieves state-of-the-art accuracy on medical question answering at a fraction of the cost of proprietary models, significantly improving the cost-accuracy Pareto frontier on the MedQA benchmark. Key contributions…

    Submitted 3 April, 2025; v1 submitted 23 September, 2024; originally announced September 2024.

    Comments: 14 pages, 3 figures, 5 tables, Accepted for publication at the 21st International Conference on Artificial Intelligence Applications and Innovations (AIAI 2025)

    ACM Class: I.2.0; I.2.7

  7. arXiv:2409.14128  [pdf, other]

    cs.CV cs.AI cs.LG

    Present and Future Generalization of Synthetic Image Detectors

    Authors: Pablo Bernabeu-Perez, Enrique Lopez-Cuena, Dario Garcia-Gasulla

    Abstract: The continued release of increasingly realistic image generation models creates a demand for synthetic image detectors. To build effective detectors we must first understand how factors like data source diversity, training methodologies and image alterations affect their generalization capabilities. This work conducts a systematic analysis and uses its insights to develop practical guidelines for…

    Submitted 26 November, 2024; v1 submitted 21 September, 2024; originally announced September 2024.

    Comments: 21 pages, 12 figures

  8. arXiv:2405.01886  [pdf, other]

    cs.CL cs.AI

    Aloe: A Family of Fine-tuned Open Healthcare LLMs

    Authors: Ashwin Kumar Gururajan, Enrique Lopez-Cuena, Jordi Bayarri-Planas, Adrian Tormos, Daniel Hinjos, Pablo Bernabeu-Perez, Anna Arias-Duart, Pablo Agustin Martin-Torres, Lucia Urcelay-Ganzabal, Marta Gonzalez-Mallo, Sergio Alvarez-Napagao, Eduard Ayguadé-Parra, Ulises Cortés, Dario Garcia-Gasulla

    Abstract: As the capabilities of Large Language Models (LLMs) in healthcare and medicine continue to advance, there is a growing need for competitive open-source models that can safeguard public interest. With the increasing availability of highly competitive open base models, the impact of continued pre-training is increasingly uncertain. In this work, we explore the role of instruct tuning, model merging,…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Five appendices

  9. arXiv:2309.08048  [pdf, other]

    cs.CV cs.AI

    Padding Aware Neurons

    Authors: Dario Garcia-Gasulla, Victor Gimenez-Abalos, Pablo Martin-Torres

    Abstract: Convolutional layers are a fundamental component of most image-related models. These layers often implement by default a static padding policy (e.g., zero padding), to control the scale of the internal representations, and to allow kernel activations centered on the border regions. In this work we identify Padding Aware Neurons (PANs), a type of filter that is found in most (if not all) convolutiona…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: In 4th Visual Inductive Priors for Data-Efficient Deep Learning Workshop, ICCV 2023

  10. arXiv:2308.02534  [pdf, other]

    cs.CV cs.AI

    Exploring the Role of Explainability in AI-Assisted Embryo Selection

    Authors: Lucia Urcelay, Daniel Hinjos, Pablo A. Martin-Torres, Marta Gonzalez, Marta Mendez, Salva Cívico, Sergio Álvarez-Napagao, Dario Garcia-Gasulla

    Abstract: In Vitro Fertilization is among the most widespread treatments for infertility. One of its main challenges is the evaluation and selection of embryos for implantation, a process with large inter- and intra-clinician variability. Deep learning based methods are gaining attention, but their opaque nature compromises their acceptance in the clinical context, where transparency in the decision making i…

    Submitted 1 August, 2023; originally announced August 2023.

  11. arXiv:2211.04347  [pdf, ps, other]

    cs.CV cs.AI

    When & How to Transfer with Transfer Learning

    Authors: Adrian Tormos, Dario Garcia-Gasulla, Victor Gimenez-Abalos, Sergio Alvarez-Napagao

    Abstract: In deep learning, transfer learning (TL) has become the de facto approach when dealing with image related tasks. Visual features learnt for one task have been shown to be reusable for other tasks, improving performance significantly. By reusing deep representations, TL enables the use of deep models in domains with limited data availability, limited computational resources and/or limited access to…

    Submitted 8 November, 2022; originally announced November 2022.

  12. arXiv:2203.11261  [pdf, other]

    cs.SI cs.LG

    Healthy Twitter discussions? Time will tell

    Authors: Dmitry Gnatyshak, Dario Garcia-Gasulla, Sergio Alvarez-Napagao, Jamie Arjona, Tommaso Venturini

    Abstract: Studying misinformation and how to deal with unhealthy behaviours within online discussions has recently become an important field of research within social studies. With the rapid development of social media, and the increasing amount of available information and sources, rigorous manual analysis of such discourses has become unfeasible. Many approaches tackle the issue by studying the semantic a…

    Submitted 12 May, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: 15 pages. Related to the SoBigData++ project: https://plusplus.sobigdata.eu/

  13. arXiv:2109.15035  [pdf, other]

    cs.LG cs.AI

    Focus! Rating XAI Methods and Finding Biases

    Authors: Anna Arias-Duart, Ferran Parés, Dario Garcia-Gasulla, Victor Gimenez-Abalos

    Abstract: AI explainability improves the transparency of models, making them more trustworthy. Such goals are motivated by the emergence of deep learning models, which are obscure by nature; even in the domain of images, where deep learning has succeeded the most, explainability is still poorly assessed. In the field of image recognition many feature attribution methods have been proposed with the purpose o…

    Submitted 28 February, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

  14. arXiv:2009.13871  [pdf, other]

    cs.CY cs.HC

    Signs for Ethical AI: A Route Towards Transparency

    Authors: Dario Garcia-Gasulla, Atia Cortés, Sergio Alvarez-Napagao, Ulises Cortés

    Abstract: Today, Artificial Intelligence (AI) has a direct impact on the daily life of billions of people. Being applied to sectors like finance, health, security and advertisement, AI fuels some of the biggest companies and research institutions in the world. Its impact in the near future seems difficult to predict or bound. In contrast to all this power, society remains mostly ignorant of the capabilities…

    Submitted 9 May, 2022; v1 submitted 29 September, 2020; originally announced September 2020.

    Comments: 30 pages, 7 figures, 2 tables

  15. arXiv:2007.13693  [pdf, other]

    cs.CV cs.LG

    The MAMe Dataset: On the relevance of High Resolution and Variable Shape image properties

    Authors: Ferran Parés, Anna Arias-Duart, Dario Garcia-Gasulla, Gema Campo-Francés, Nina Viladrich, Eduard Ayguadé, Jesús Labarta

    Abstract: In the image classification task, the most common approach is to resize all images in a dataset to a unique shape, while reducing their precision to a size which facilitates experimentation at scale. This practice has benefits from a computational perspective, but it entails negative side-effects on performance due to loss of information and image deformation. In this work we introduce the MAMe da…

    Submitted 20 May, 2021; v1 submitted 27 July, 2020; originally announced July 2020.

  16. arXiv:2006.16189  [pdf]

    q-bio.OT cs.LG

    DOME: Recommendations for supervised machine learning validation in biology

    Authors: Ian Walsh, Dmytro Fishman, Dario Garcia-Gasulla, Tiina Titma, Gianluca Pollastri, The ELIXIR Machine Learning focus group, Jen Harrow, Fotis E. Psomopoulos, Silvio C. E. Tosatto

    Abstract: Modern biology frequently relies on machine learning to provide predictions and improve decision processes. There have been recent calls for more scrutiny on machine learning performance and possible limitations. Here we present a set of community-wide recommendations aiming to help establish standards of supervised machine learning validation in biology. Adopting a structured methods description…

    Submitted 7 January, 2021; v1 submitted 25 June, 2020; originally announced June 2020.

  17. arXiv:2006.05573  [pdf, other]

    cs.SI cs.LG physics.soc-ph

    Global Data Science Project for COVID-19

    Authors: Toyotaro Suzumura, Dario Garcia-Gasulla, Sergio Alvarez Napagao, Irene Li, Hiroshi Maruyama, Hiroki Kanezashi, Raquel Pérez-Arnal, Kunihiko Miyoshi, Euma Ishii, Keita Suzuki, Sayaka Shiba, Mariko Kurokawa, Yuta Kanzawa, Naomi Nakagawa, Masatoshi Hanai, Yixin Li, Tianxiao Li

    Abstract: This paper aims at providing the summary of the Global Data Science Project (GDSC) for COVID-19 as of May 31, 2020. COVID-19 has largely impacted our societies through both direct and indirect effects transmitted by the policy measures to counter the spread of viruses. We quantitatively analysed the multifaceted impacts of the COVID-19 pandemic on our societies including people's mobility, heal…

    Submitted 3 August, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: 42 pages, 49 figures

  18. arXiv:2006.02950  [pdf, other]

    cs.SI cs.LG physics.soc-ph

    The Impact of COVID-19 on Flight Networks

    Authors: Toyotaro Suzumura, Hiroki Kanezashi, Mishal Dholakia, Euma Ishii, Sergio Alvarez Napagao, Raquel Pérez-Arnal, Dario Garcia-Gasulla, Toshiaki Murofushi

    Abstract: As COVID-19 transmissions spread worldwide, governments have announced and enforced travel restrictions to prevent further infections. Such restrictions have a direct effect on the volume of international flights among these countries, resulting in extensive social and economic costs. To better understand the situation in a quantitative manner, we used the Opensky network data to clarify flight pa…

    Submitted 14 February, 2021; v1 submitted 4 June, 2020; originally announced June 2020.

    Comments: 12 pages, 42 figures. Toyotaro Suzumura and Hiroki Kanezashi contributed equally to this work

  19. arXiv:2004.10899  [pdf, other]

    cs.CL cs.CY cs.LG

    What are We Depressed about When We Talk about COVID19: Mental Health Analysis on Tweets Using Natural Language Processing

    Authors: Irene Li, Yixin Li, Tianxiao Li, Sergio Alvarez-Napagao, Dario Garcia-Gasulla, Toyotaro Suzumura

    Abstract: The outbreak of coronavirus disease 2019 (COVID-19) recently has affected human life to a great extent. Besides direct physical and economic threats, the pandemic also indirectly impacts people's mental health conditions, which can be overwhelming but difficult to measure. The problem may come from various reasons such as unemployment status, stay-at-home policy, fear of the virus, and so forth. I…

    Submitted 8 June, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

    Comments: 7 pages, 7 figures

  20. arXiv:2002.01284  [pdf, other]

    cs.CV eess.IV

    Obstruction level detection of sewer videos using convolutional neural networks

    Authors: Mario A. Gutierrez-Mondragon, Dario Garcia-Gasulla, Sergio Alvarez-Napagao, Jaume Brossa-Ordoñez, Rafael Gimenez-Esteban

    Abstract: Worldwide, sewer networks are designed to transport wastewater to a centralized treatment plant to be treated and returned to the environment. This process is critical for the current society, preventing waterborne illnesses, providing safe drinking water and enhancing general sanitation. To keep a sewer network perfectly operational, sampling inspections are performed constantly to identify obstr…

    Submitted 4 February, 2020; originally announced February 2020.

  21. arXiv:1911.11471  [pdf, other]

    q-bio.GN cs.LG

    Random Forest as a Tumour Genetic Marker Extractor

    Authors: Raquel Pérez-Arnal, Dario Garcia-Gasulla, David Torrents, Ferran Parés, Ulises Cortés, Jesús Labarta, Eduard Ayguadé

    Abstract: Finding tumour genetic markers is essential to biomedicine due to their relevance for cancer detection and therapy development. In this paper, we explore a recently released dataset of chromosome rearrangements in 2,586 cancer patients, where different sorts of alterations have been detected. Using a Random Forest classifier, we evaluate the relevance of several features (some directly available i…

    Submitted 26 November, 2019; originally announced November 2019.

  22. Towards a Goal-oriented Agent-based Simulation framework for High-Performance Computing

    Authors: Dmitry Gnatyshak, Luis Oliva-Felipe, Sergio Álvarez-Napagao, Julian Padget, Javier Vázquez-Salceda, Dario Garcia-Gasulla, Ulises Cortés

    Abstract: Currently, agent-based simulation frameworks force the user to choose between simulations involving a large number of agents (at the expense of limited agent reasoning capability) or simulations including agents with increased reasoning capabilities (at the expense of a limited number of agents per simulation). This paper describes a first attempt at putting goal-oriented agents into large agent-b…

    Submitted 22 November, 2019; originally announced November 2019.

    Journal ref: Frontiers in Artificial Intelligence and Applications 319 (2019) 329-338

  23. arXiv:1911.08953   

    cs.CV

    MetH: A family of high-resolution and variable-shape image challenges

    Authors: Ferran Parés, Dario Garcia-Gasulla, Harald Servat, Jesús Labarta, Eduard Ayguadé

    Abstract: High-resolution and variable-shape images have not yet been properly addressed by the AI community. The approach of down-sampling data often used with convolutional neural networks is sub-optimal for many tasks, and has too many drawbacks to be considered a sustainable alternative. In light of the increasing importance of problems that can benefit from exploiting high-resolution (HR) and variable-…

    Submitted 29 September, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: An improved and extended version of this paper has been published in arXiv:2007.13693. This version is now obsolete.

  24. Feature discriminativity estimation in CNNs for transfer learning

    Authors: Victor Gimenez-Abalos, Armand Vilalta, Dario Garcia-Gasulla, Jesus Labarta, Eduard Ayguadé

    Abstract: The purpose of feature extraction on convolutional neural networks is to reuse deep representations learnt for a pre-trained model to solve a new, potentially unrelated problem. However, raw feature extraction from all layers is unfeasible given the massive size of these networks. Recently, a supervised method using complexity reduction was proposed, resulting in significant improvements in perfor…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: Presented in the 22nd International Conference of the Catalan Association for Artificial Intelligence (CCIA 19)

    Journal ref: Volume 319: Artificial Intelligence Research and Development 2019

  25. Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

    Authors: Hiroki Kanezashi, Toyotaro Suzumura, Dario Garcia-Gasulla, Min-hwan Oh, Satoshi Matsuoka

    Abstract: Graph pattern matching algorithms to handle million-scale dynamic graphs are widely used in many applications such as social network analytics and suspicious transaction detections from financial networks. On the other hand, the computation complexity of many graph pattern matching algorithms is expensive, and it is not affordable to extract patterns from million-scale graphs. Moreover, most real-…

    Submitted 21 December, 2018; originally announced December 2018.

    Comments: 10 pages and 11 figures

  26. arXiv:1804.09558  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE stat.ML

    A Visual Distance for WordNet

    Authors: Raquel Pérez-Arnal, Armand Vilalta, Dario Garcia-Gasulla, Ulises Cortés, Eduard Ayguadé, Jesus Labarta

    Abstract: Measuring the distance between concepts is an important field of study of Natural Language Processing, as it can be used to improve tasks related to the interpretation of those same concepts. WordNet, which includes a wide variety of concepts associated with words (i.e., synsets), is often used as a source for computing those distances. In this paper, we explore a distance for WordNet synsets base…

    Submitted 27 April, 2018; v1 submitted 24 April, 2018; originally announced April 2018.

  27. arXiv:1707.09872  [pdf, other]

    cs.CV cs.CL cs.NE

    Full-Network Embedding in a Multimodal Embedding Pipeline

    Authors: Armand Vilalta, Dario Garcia-Gasulla, Ferran Parés, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura

    Abstract: The current state-of-the-art for image annotation and image retrieval tasks is obtained through deep neural networks, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation sche…

    Submitted 9 August, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

    Comments: In 2nd Workshop on Semantic Deep Learning (SemDeep-2) at the 12th International Conference on Computational Semantics (IWCS) 2017

  28. arXiv:1707.07465  [pdf, other]

    cs.NE

    Building Graph Representations of Deep Vector Embeddings

    Authors: Dario Garcia-Gasulla, Armand Vilalta, Ferran Parés, Jonatan Moreno, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura

    Abstract: Patterns stored within pre-trained deep neural networks compose large and powerful descriptive languages that can be used for many different purposes. Typically, deep network representations are implemented within vector embedding spaces, which enables the use of traditional machine learning algorithms on top of them. In this short paper we propose the construction of a graph embedding space inste…

    Submitted 9 August, 2017; v1 submitted 24 July, 2017; originally announced July 2017.

    Comments: Accepted at the 2nd Workshop on Semantic Deep Learning (SemDeep-2)

  29. arXiv:1705.07706  [pdf, other]

    cs.LG cs.NE

    An Out-of-the-box Full-network Embedding for Convolutional Neural Networks

    Authors: Dario Garcia-Gasulla, Armand Vilalta, Ferran Parés, Jonatan Moreno, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura

    Abstract: Transfer learning for feature extraction can be used to exploit deep representations in contexts where there is very little training data, where there are limited computational resources, or when tuning the hyper-parameters needed for training is not an option. While previous contributions to feature extraction propose embeddings based on a single layer of the network, in this paper we propose a full…

    Submitted 22 May, 2017; originally announced May 2017.

  30. arXiv:1703.09307  [pdf, other]

    cs.DS cs.SI physics.soc-ph

    Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm

    Authors: Ferran Parés, Dario Garcia-Gasulla, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura

    Abstract: We introduce a community detection algorithm (Fluid Communities) based on the idea of fluids interacting in an environment, expanding and contracting as a result of that interaction. Fluid Communities is based on the propagation methodology, which represents the state-of-the-art in terms of computational cost and scalability. While being highly efficient, Fluid Communities is able to find communit…

    Submitted 9 October, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

    Comments: Accepted at the 6th International Conference on Complex Networks and Their Applications

  31. arXiv:1703.01127  [pdf, other]

    cs.NE cs.AI cs.LG stat.ML

    On the Behavior of Convolutional Nets for Feature Extraction

    Authors: Dario Garcia-Gasulla, Ferran Parés, Armand Vilalta, Jonatan Moreno, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura

    Abstract: Deep neural networks are representation learning techniques. During training, a deep net is capable of generating a descriptive language of unprecedented size and detail in machine learning. Extracting the descriptive language coded within a trained CNN model (in the case of image data), and reusing it for other purposes is a field of interest, as it provides access to the visual descriptors previ…

    Submitted 29 January, 2018; v1 submitted 3 March, 2017; originally announced March 2017.

    Comments: Published in the Journal of Artificial Intelligence Research (JAIR), Special Track on Deep Learning, Knowledge Representation, and Reasoning

  32. arXiv:1611.09084  [pdf, other]

    cs.DS cs.IR cs.SI

    Hierarchical Hyperlink Prediction for the WWW

    Authors: Dario Garcia-Gasulla, Eduard Ayguadé, Jesús Labarta, Ulises Cortés, Toyotaro Suzumura

    Abstract: The hyperlink prediction task, that of proposing new links between webpages, can be used to improve search engines, expand the visibility of web pages, and increase the connectivity and navigability of the web. Hyperlink prediction is typically performed on webgraphs composed of thousands or millions of vertices, where on average each webpage contains fewer than fifty links. Algorithms processing g…

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: Submitted to Transactions on Internet Technology journal

  33. arXiv:1611.00547  [pdf, other]

    cs.SI cs.AI cs.DB

    Limitations and Alternatives for the Evaluation of Large-scale Link Prediction

    Authors: Dario Garcia-Gasulla, Eduard Ayguadé, Jesús Labarta, Ulises Cortés

    Abstract: Link prediction, the problem of identifying missing links among a set of inter-related data entities, is a popular field of research due to its application to graph-like domains. Producing consistent evaluations of the performance of the many link prediction algorithms being proposed can be challenging due to variable graph properties, such as size and density. In this paper we first discuss tradi…

    Submitted 25 November, 2016; v1 submitted 2 November, 2016; originally announced November 2016.

    Comments: Submitted to New Generation Computing, 15 pages, 4 tables, 4 figures

  34. arXiv:1507.08818  [pdf, other]

    cs.CV cs.LG cs.NE

    A Visual Embedding for the Unsupervised Extraction of Abstract Semantics

    Authors: D. Garcia-Gasulla, J. Béjar, U. Cortés, E. Ayguadé, J. Labarta, T. Suzumura, R. Chen

    Abstract: Vector-space word representations obtained from neural network models have been shown to enable semantic operations based on vector arithmetic. In this paper, we explore the existence of similar information on vector representations of images. For that purpose we define a methodology to obtain large, sparse vector representations of image classes, and generate vectors through the state-of-the-art…

    Submitted 16 December, 2016; v1 submitted 31 July, 2015; originally announced July 2015.

    Comments: 14 pages, 5 figures, accepted at Cognitive Systems Research