-
A Systematic Analysis of Hybrid Linear Attention
Authors:
Dustin Wang,
Rui-Jie Zhu,
Steven Abreu,
Yong Shan,
Taylor Kergan,
Yuqi Pan,
Yuhong Chou,
Zheng Li,
Ge Zhang,
Wenhao Huang,
Jason Eshraghian
Abstract:
Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall performance, leading to hybrid architectures that combine linear and full attention layers. Despite extensive hybrid architecture research, the choice of linear attention component…
▽ More
Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall performance, leading to hybrid architectures that combine linear and full attention layers. Despite extensive hybrid architecture research, the choice of linear attention component has not been deeply explored. We systematically evaluate various linear attention models across generations - vector recurrences to advanced gating mechanisms - both standalone and hybridized. To enable this comprehensive analysis, we trained and open-sourced 72 models: 36 at 340M parameters (20B tokens) and 36 at 1.3B parameters (100B tokens), covering six linear attention variants across five hybridization ratios. Benchmarking on standard language modeling and recall tasks reveals that superior standalone linear models do not necessarily excel in hybrids. While language modeling remains stable across linear-to-full attention ratios, recall significantly improves with increased full attention layers, particularly below a 3:1 ratio. Our study highlights selective gating, hierarchical recurrence, and controlled forgetting as critical for effective hybrid models. We recommend architectures such as HGRN-2 or GatedDeltaNet with a linear-to-full ratio between 3:1 and 6:1 to achieve Transformer-level recall efficiently. Our models are open-sourced at https://huggingface.co/collections/m-a-p/hybrid-linear-attention-research-686c488a63d609d2f20e2b1e.
△ Less
Submitted 8 July, 2025;
originally announced July 2025.
-
An Algebraic Approach to Weighted Answer-set Programming
Authors:
Francisco Coelho,
Bruno Dinis,
Dietmar Seipel,
Salvador Abreu
Abstract:
Logic programs, more specifically, Answer-set programs, can be annotated with probabilities on facts to express uncertainty. We address the problem of propagating weight annotations on facts (eg probabilities) of an ASP to its standard models, and from there to events (defined as sets of atoms) in a dataset over the program's domain. We propose a novel approach which is algebraic in the sense that…
▽ More
Logic programs, more specifically, Answer-set programs, can be annotated with probabilities on facts to express uncertainty. We address the problem of propagating weight annotations on facts (eg probabilities) of an ASP to its standard models, and from there to events (defined as sets of atoms) in a dataset over the program's domain. We propose a novel approach which is algebraic in the sense that it relies on an equivalence relation over the set of events. Uncertainty is then described as polynomial expressions over variables. We propagate the weight function in the space of models and events, rather than doing so within the syntax of the program. As evidence that our approach is sound, we show that certain facts behave as expected. Our approach allows us to investigate weight annotated programs and to determine how suitable a given one is for modeling a given dataset containing events.
△ Less
Submitted 28 March, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2
Authors:
Steven Abreu,
Sumit Bam Shrestha,
Rui-Jie Zhu,
Jason Eshraghian
Abstract:
Large language models (LLMs) deliver impressive performance but require large amounts of energy. In this work, we present a MatMul-free LLM architecture adapted for Intel's neuromorphic processor, Loihi 2. Our approach leverages Loihi 2's support for low-precision, event-driven computation and stateful processing. Our hardware-aware quantized model on GPU demonstrates that a 370M parameter MatMul-…
▽ More
Large language models (LLMs) deliver impressive performance but require large amounts of energy. In this work, we present a MatMul-free LLM architecture adapted for Intel's neuromorphic processor, Loihi 2. Our approach leverages Loihi 2's support for low-precision, event-driven computation and stateful processing. Our hardware-aware quantized model on GPU demonstrates that a 370M parameter MatMul-free model can be quantized with no accuracy loss. Based on preliminary results, we report up to 3x higher throughput with 2x less energy, compared to transformer-based LLMs on an edge GPU, with significantly better scaling. Further hardware optimizations will increase throughput and decrease energy consumption. These results show the potential of neuromorphic hardware for efficient inference and pave the way for efficient reasoning models capable of generating complex, long-form text rapidly and cost-effectively.
△ Less
Submitted 25 March, 2025; v1 submitted 11 February, 2025;
originally announced March 2025.
-
Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
Authors:
Alessandro Pierro,
Steven Abreu,
Jonathan Timcheck,
Philipp Stratmann,
Andreas Wild,
Sumit Bam Shrestha
Abstract:
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. These architectures hold promise for streaming applications at the edge, but deployment in resource-constrained environments requires hardware-aware optimizations to minimize latency and energy consumption. Unstructured sparsity offers a compelling solution,…
▽ More
Linear recurrent neural networks enable powerful long-range sequence modeling with constant memory usage and time-per-token during inference. These architectures hold promise for streaming applications at the edge, but deployment in resource-constrained environments requires hardware-aware optimizations to minimize latency and energy consumption. Unstructured sparsity offers a compelling solution, enabling substantial reductions in compute and memory requirements--when accelerated by compatible hardware platforms. In this paper, we conduct a scaling study to investigate the Pareto front of performance and efficiency across inference compute budgets. We find that highly sparse linear RNNs consistently achieve better efficiency-performance trade-offs than dense baselines, with 2x less compute and 36% less memory at iso-accuracy. Our models achieve state-of-the-art results on a real-time streaming task for audio denoising. By quantizing our sparse models to fixed-point arithmetic and deploying them on the Intel Loihi 2 neuromorphic chip for real-time processing, we translate model compression into tangible gains of 42x lower latency and 149x lower energy consumption compared to a dense model on an edge GPU. Our findings showcase the transformative potential of unstructured sparsity, paving the way for highly efficient recurrent neural networks in real-world, resource-constrained environments.
△ Less
Submitted 3 February, 2025;
originally announced February 2025.
-
Neuromorphic Programming: Emerging Directions for Brain-Inspired Hardware
Authors:
Steven Abreu,
Jens E. Pedersen
Abstract:
The value of brain-inspired neuromorphic computers critically depends on our ability to program them for relevant tasks. Currently, neuromorphic hardware often relies on machine learning methods adapted from deep learning. However, neuromorphic computers have potential far beyond deep learning if we can only harness their energy efficiency and full computational power. Neuromorphic programming wil…
▽ More
The value of brain-inspired neuromorphic computers critically depends on our ability to program them for relevant tasks. Currently, neuromorphic hardware often relies on machine learning methods adapted from deep learning. However, neuromorphic computers have potential far beyond deep learning if we can only harness their energy efficiency and full computational power. Neuromorphic programming will necessarily be different from conventional programming, requiring a paradigm shift in how we think about programming. This paper presents a conceptual analysis of programming within the context of neuromorphic computing, challenging conventional paradigms and proposing a framework that aligns more closely with the physical intricacies of these systems. Our analysis revolves around five characteristics that are fundamental to neuromorphic programming and provides a basis for comparison to contemporary programming methods and languages. By studying past approaches, we contribute a framework that advocates for underutilized techniques and calls for richer abstractions to effectively instrument the new hardware class.
△ Less
Submitted 15 October, 2024;
originally announced October 2024.
-
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Authors:
Joris Postmus,
Steven Abreu
Abstract:
Large language models have transformed AI, yet reliably controlling their outputs remains a challenge. This paper explores activation engineering, where outputs of pre-trained LLMs are controlled by manipulating their activations at inference time. Unlike traditional methods using a single steering vector, we introduce conceptors - mathematical constructs that represent sets of activation vectors…
▽ More
Large language models have transformed AI, yet reliably controlling their outputs remains a challenge. This paper explores activation engineering, where outputs of pre-trained LLMs are controlled by manipulating their activations at inference time. Unlike traditional methods using a single steering vector, we introduce conceptors - mathematical constructs that represent sets of activation vectors as ellipsoidal regions. Conceptors act as soft projection matrices and offer more precise control over complex activation patterns. Our experiments demonstrate that conceptors outperform traditional methods across multiple steering tasks. We further use Boolean operations on conceptors for combined steering goals that empirically outperform additively combining steering vectors on a set of tasks. These results highlight conceptors as a promising tool for more effective steering of LLMs. Our code is available on github.com/jorispos/conceptorsteering.
△ Less
Submitted 12 May, 2025; v1 submitted 9 October, 2024;
originally announced October 2024.
-
EmBARDiment: an Embodied AI Agent for Productivity in XR
Authors:
Riccardo Bovo,
Steven Abreu,
Karan Ahuja,
Eric J Gonzalez,
Li-Te Cheng,
Mar Gonzalez-Franco
Abstract:
XR devices running chat-bots powered by Large Language Models (LLMs) have the to become always-on agents that enable much better productivity scenarios. Current screen based chat-bots do not take advantage of the the full-suite of natural inputs available in XR, including inward facing sensor data, instead they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data dro…
▽ More
XR devices running chat-bots powered by Large Language Models (LLMs) have the to become always-on agents that enable much better productivity scenarios. Current screen based chat-bots do not take advantage of the the full-suite of natural inputs available in XR, including inward facing sensor data, instead they over-rely on explicit voice or text prompts, sometimes paired with multi-modal data dropped as part of the query. We propose a solution that leverages an attention framework that derives context implicitly from user actions, eye-gaze, and contextual memory within the XR environment. Our work minimizes the need for engineered explicit prompts, fostering grounded and intuitive interactions that glean user insights for the chat-bot.
△ Less
Submitted 11 March, 2025; v1 submitted 15 August, 2024;
originally announced August 2024.
-
Mamba-PTQ: Outlier Channels in Recurrent Large Language Models
Authors:
Alessandro Pierro,
Steven Abreu
Abstract:
Modern recurrent layers are emerging as a promising path toward edge deployment of foundation models, especially in the context of large language models (LLMs). Compressing the whole input sequence in a finite-dimensional representation enables recurrent layers to model long-range dependencies while maintaining a constant inference cost for each token and a fixed memory requirement. However, the p…
▽ More
Modern recurrent layers are emerging as a promising path toward edge deployment of foundation models, especially in the context of large language models (LLMs). Compressing the whole input sequence in a finite-dimensional representation enables recurrent layers to model long-range dependencies while maintaining a constant inference cost for each token and a fixed memory requirement. However, the practical deployment of LLMs in resource-limited environments often requires further model compression, such as quantization and pruning. While these techniques are well-established for attention-based models, their effects on recurrent layers remain underexplored.
In this preliminary work, we focus on post-training quantization for recurrent LLMs and show that Mamba models exhibit the same pattern of outlier channels observed in attention-based LLMs. We show that the reason for the difficulty of quantizing SSMs is caused by activation outliers, similar to those observed in transformer-based LLMs. We report baseline results for post-training quantization of Mamba that do not take into account the activation outliers and suggest first steps for outlier-aware quantization.
△ Less
Submitted 17 July, 2024;
originally announced July 2024.
-
PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos
Authors:
Steven Abreu,
Tiffany D. Do,
Karan Ahuja,
Eric J. Gonzalez,
Lee Payne,
Daniel McDuff,
Mar Gonzalez-Franco
Abstract:
Intelligent assistance involves not only understanding but also action. Existing ego-centric video datasets contain rich annotations of the videos, but not of actions that an intelligent assistant could perform in the moment. To address this gap, we release PARSE-Ego4D, a new set of personal action recommendation annotations for the Ego4D dataset. We take a multi-stage approach to generating and e…
▽ More
Intelligent assistance involves not only understanding but also action. Existing ego-centric video datasets contain rich annotations of the videos, but not of actions that an intelligent assistant could perform in the moment. To address this gap, we release PARSE-Ego4D, a new set of personal action recommendation annotations for the Ego4D dataset. We take a multi-stage approach to generating and evaluating these annotations. First, we used a prompt-engineered large language model (LLM) to generate context-aware action suggestions and identified over 18,000 action suggestions. While these synthetic action suggestions are valuable, the inherent limitations of LLMs necessitate human evaluation. To ensure high-quality and user-centered recommendations, we conducted a large-scale human annotation study that provides grounding in human preferences for all of PARSE-Ego4D. We analyze the inter-rater agreement and evaluate subjective preferences of participants. Based on our synthetic dataset and complete human annotations, we propose several new tasks for action suggestions based on ego-centric videos. We encourage novel solutions that improve latency and energy requirements. The annotations in PARSE-Ego4D will support researchers and developers who are working on building action recommendation systems for augmented and virtual reality systems.
△ Less
Submitted 25 July, 2024; v1 submitted 14 June, 2024;
originally announced July 2024.
-
Q-S5: Towards Quantized State Space Models
Authors:
Steven Abreu,
Jens E. Pedersen,
Kade M. Heckel,
Alessandro Pierro
Abstract:
In the quest for next-generation sequence modeling architectures, State Space Models (SSMs) have emerged as a potent alternative to transformers, particularly for their computational efficiency and suitability for dynamical systems. This paper investigates the effect of quantization on the S5 model to understand its impact on model performance and to facilitate its deployment to edge and resource-…
▽ More
In the quest for next-generation sequence modeling architectures, State Space Models (SSMs) have emerged as a potent alternative to transformers, particularly for their computational efficiency and suitability for dynamical systems. This paper investigates the effect of quantization on the S5 model to understand its impact on model performance and to facilitate its deployment to edge and resource-constrained platforms. Using quantization-aware training (QAT) and post-training quantization (PTQ), we systematically evaluate the quantization sensitivity of SSMs across different tasks like dynamical systems modeling, Sequential MNIST (sMNIST) and most of the Long Range Arena (LRA). We present fully quantized S5 models whose test accuracy drops less than 1% on sMNIST and most of the LRA. We find that performance on most tasks degrades significantly for recurrent weights below 8-bit precision, but that other components can be compressed further without significant loss of performance. Our results further show that PTQ only performs well on language-based LRA tasks whereas all others require QAT. Our investigation provides necessary insights for the continued development of efficient and hardware-optimized SSMs.
△ Less
Submitted 13 June, 2024;
originally announced June 2024.
-
Neuromorphic Intermediate Representation: A Unified Instruction Set for Interoperable Brain-Inspired Computing
Authors:
Jens E. Pedersen,
Steven Abreu,
Matthias Jobst,
Gregor Lenz,
Vittorio Fra,
Felix C. Bauer,
Dylan R. Muir,
Peng Zhou,
Bernhard Vogginger,
Kade Heckel,
Gianvito Urgese,
Sadasivan Shankar,
Terrence C. Stewart,
Sadique Sheik,
Jason K. Eshraghian
Abstract:
Spiking neural networks and neuromorphic hardware platforms that simulate neuronal dynamics are getting wide attention and are being applied to many relevant problems using Machine Learning. Despite a well-established mathematical foundation for neural dynamics, there exists numerous software and hardware solutions and stacks whose variability makes it difficult to reproduce findings. Here, we est…
▽ More
Spiking neural networks and neuromorphic hardware platforms that simulate neuronal dynamics are getting wide attention and are being applied to many relevant problems using Machine Learning. Despite a well-established mathematical foundation for neural dynamics, there exists numerous software and hardware solutions and stacks whose variability makes it difficult to reproduce findings. Here, we establish a common reference frame for computations in digital neuromorphic systems, titled Neuromorphic Intermediate Representation (NIR). NIR defines a set of computational and composable model primitives as hybrid systems combining continuous-time dynamics and discrete events. By abstracting away assumptions around discretization and hardware constraints, NIR faithfully captures the computational model, while bridging differences between the evaluated implementation and the underlying mathematical formalism. NIR supports an unprecedented number of neuromorphic systems, which we demonstrate by reproducing three spiking neural network models of different complexity across 7 neuromorphic simulators and 4 digital hardware platforms. NIR decouples the development of neuromorphic hardware and software, enabling interoperability between platforms and improving accessibility to multiple neuromorphic technologies. We believe that NIR is a key next step in brain-inspired hardware-software co-evolution, enabling research towards the implementation of energy efficient computational principles of nervous systems. NIR is available at neuroir.org
△ Less
Submitted 30 September, 2024; v1 submitted 24 November, 2023;
originally announced November 2023.
-
Concepts and Paradigms for Neuromorphic Programming
Authors:
Steven Abreu
Abstract:
The value of neuromorphic computers depends crucially on our ability to program them for relevant tasks. Currently, neuromorphic computers are mostly limited to machine learning methods adapted from deep learning. However, neuromorphic computers have potential far beyond deep learning if we can only make use of their computational properties to harness their full power. Neuromorphic programming wi…
▽ More
The value of neuromorphic computers depends crucially on our ability to program them for relevant tasks. Currently, neuromorphic computers are mostly limited to machine learning methods adapted from deep learning. However, neuromorphic computers have potential far beyond deep learning if we can only make use of their computational properties to harness their full power. Neuromorphic programming will necessarily be different from conventional programming, requiring a paradigm shift in how we think about programming in general. The contributions of this paper are 1) a conceptual analysis of what "programming" means in the context of neuromorphic computers and 2) an exploration of existing programming paradigms that are promising yet overlooked in neuromorphic computing. The goal is to expand the horizon of neuromorphic programming methods, thereby allowing researchers to move beyond the shackles of current methods and explore novel directions.
△ Less
Submitted 27 October, 2023;
originally announced October 2023.
-
Demonstrating (Hybrid) Active Logic Documents and the Ciao Prolog Playground, and an Application to Verification Tutorials
Authors:
Daniela Ferreiro,
José F. Morales,
Salvador Abreu,
Manuel V. Hermenegildo
Abstract:
Active Logic Documents (ALD) are web pages which incorporate embedded Prolog engines that run locally within the browser. ALD offers both a very easy way to add click-to-run capabilities to any kind of teaching materials, independently of the tool used to generate them, as well as a tool-set for generating web-based materials with embedded examples and exercises. Both leverage on (components of)…
▽ More
Active Logic Documents (ALD) are web pages which incorporate embedded Prolog engines that run locally within the browser. ALD offers both a very easy way to add click-to-run capabilities to any kind of teaching materials, independently of the tool used to generate them, as well as a tool-set for generating web-based materials with embedded examples and exercises. Both leverage on (components of) the Ciao Prolog Playground. We present a demonstration of the ALD approach and the Ciao Prolog Playground, as well as a recent extension to ALDs to facilitate the integration of other tools into the system for creating Hybrid Active Logic Documents (HALD). We also present a concrete application of these technologies to the creation of tutorials for a program verification tool.
△ Less
Submitted 30 August, 2023;
originally announced August 2023.
-
Training a spiking neural network on an event-based label-free flow cytometry dataset
Authors:
Muhammed Gouda,
Steven Abreu,
Alessio Lugnan,
Peter Bienstman
Abstract:
Imaging flow cytometry systems aim to analyze a huge number of cells or micro-particles based on their physical characteristics. The vast majority of current systems acquire a large amount of images which are used to train deep artificial neural networks. However, this approach increases both the latency and power consumption of the final apparatus. In this work-in-progress, we combine an event-ba…
▽ More
Imaging flow cytometry systems aim to analyze a huge number of cells or micro-particles based on their physical characteristics. The vast majority of current systems acquire a large amount of images which are used to train deep artificial neural networks. However, this approach increases both the latency and power consumption of the final apparatus. In this work-in-progress, we combine an event-based camera with a free-space optical setup to obtain spikes for each particle passing in a microfluidic channel. A spiking neural network is trained on the collected dataset, resulting in 97.7% mean training accuracy and 93.5% mean testing accuracy for the fully event-based classification pipeline.
△ Less
Submitted 19 March, 2023;
originally announced March 2023.
-
Fifty Years of Prolog and Beyond
Authors:
Philipp Körner,
Michael Leuschel,
João Barbosa,
Vítor Santos Costa,
Verónica Dahl,
Manuel V. Hermenegildo,
Jose F. Morales,
Jan Wielemaker,
Daniel Diaz,
Salvador Abreu,
Giovanni Ciatto
Abstract:
Both logic programming in general, and Prolog in particular, have a long and fascinating history, intermingled with that of many disciplines they inherited from or catalyzed. A large body of research has been gathered over the last 50 years, supported by many Prolog implementations. Many implementations are still actively developed, while new ones keep appearing. Often, the features added by diffe…
▽ More
Both logic programming in general, and Prolog in particular, have a long and fascinating history, intermingled with that of many disciplines they inherited from or catalyzed. A large body of research has been gathered over the last 50 years, supported by many Prolog implementations. Many implementations are still actively developed, while new ones keep appearing. Often, the features added by different systems were motivated by the interdisciplinary needs of programmers and implementors, yielding systems that, while sharing the "classic" core language, and, in particular, the main aspects of the ISO-Prolog standard, also depart from each other in other aspects. This obviously poses challenges for code portability. The field has also inspired many related, but quite different languages that have created their own communities.
This article aims at integrating and applying the main lessons learned in the process of evolution of Prolog. It is structured into three major parts. Firstly, we overview the evolution of Prolog systems and the community approximately up to the ISO standard, considering both the main historic developments and the motivations behind several Prolog implementations, as well as other logic programming languages influenced by Prolog. Then, we discuss the Prolog implementations that are most active after the appearance of the standard: their visions, goals, commonalities, and incompatibilities. Finally, we perform a SWOT analysis in order to better identify the potential of Prolog, and propose future directions along which Prolog might continue to add useful features, interfaces, libraries, and tools, while at the same time improving compatibility between implementations.
△ Less
Submitted 14 March, 2022; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Generating Local Search Neighborhood with Synthesized Logic Programs
Authors:
Mateusz Ślażyński,
Salvador Abreu,
Grzegorz J. Nalepa
Abstract:
Local Search meta-heuristics have been proven a viable approach to solve difficult optimization problems. Their performance depends strongly on the search space landscape, as defined by a cost function and the selected neighborhood operators. In this paper we present a logic programming based framework, named Noodle, designed to generate bespoke Local Search neighborhoods tailored to specific dis…
▽ More
Local Search meta-heuristics have been proven a viable approach to solve difficult optimization problems. Their performance depends strongly on the search space landscape, as defined by a cost function and the selected neighborhood operators. In this paper we present a logic programming based framework, named Noodle, designed to generate bespoke Local Search neighborhoods tailored to specific discrete optimization problems. The proposed system consists of a domain specific language, which is inspired by logic programming, as well as a genetic programming solver, based on the grammar evolution algorithm. We complement the description with a preliminary experimental evaluation, where we synthesize efficient neighborhood operators for the traveling salesman problem, some of which reproduce well-known results.
△ Less
Submitted 18 September, 2019;
originally announced September 2019.
-
Pre-proceedings of the DECLARE 2019 Conference
Authors:
Salvador Abreu,
Petra Hofstedt,
Ulrich John,
Herbert Kuchen,
Dietmar Seipel
Abstract:
This volume constitutes the pre-proceedings of the DECLARE 2019 conference, held on September 9 to 13, 2019 at the University of Technology Cottbus - Senftenberg (Germany).
Declarative programming is an advanced paradigm for the modeling and solving of complex problems. This method has attracted increased attention over the last decades, e.g., in the domains of data and knowledge engineering, da…
▽ More
This volume constitutes the pre-proceedings of the DECLARE 2019 conference, held on September 9 to 13, 2019 at the University of Technology Cottbus - Senftenberg (Germany).
Declarative programming is an advanced paradigm for the modeling and solving of complex problems. This method has attracted increased attention over the last decades, e.g., in the domains of data and knowledge engineering, databases, artificial intelligence, natural language processing, modeling and processing combinatorial problems, and for establishing systems for the web.
The conference DECLARE 2019 aims at cross-fertilizing exchange of ideas and experiences among researches and students from the different communities interested in the foundations, implementation techniques, novel applications, and combinations of high-level, declarative programming and related areas. The technical program of the event included invited talks, presentations of refereed papers, and system demonstrations. DECLARE 2019 consisted of the sub-events INAP, WFLP, and WLP:
INAP - 22nd International Conference on Applications of Declarative Programming and Knowledge Management WFLP - 27th International Workshop on Functional and (Constraint) Logic Programming WLP - 33rd Workshop on (Constraint) Logic Programming
△ Less
Submitted 21 November, 2019; v1 submitted 11 September, 2019;
originally announced September 2019.
-
Automated Architecture Design for Deep Neural Networks
Authors:
Steven Abreu
Abstract:
Machine learning has made tremendous progress in recent years and received large amounts of public attention. Though we are still far from designing a full artificially intelligent agent, machine learning has brought us many applications in which computers solve human learning tasks remarkably well. Much of this progress comes from a recent trend within machine learning, called deep learning. Deep…
▽ More
Machine learning has made tremendous progress in recent years and received large amounts of public attention. Though we are still far from designing a full artificially intelligent agent, machine learning has brought us many applications in which computers solve human learning tasks remarkably well. Much of this progress comes from a recent trend within machine learning, called deep learning. Deep learning models are responsible for many state-of-the-art applications of machine learning. Despite their success, deep learning models are hard to train, very difficult to understand, and often times so complex that training is only possible on very large GPU clusters. Lots of work has been done on enabling neural networks to learn efficiently. However, the design and architecture of such neural networks is often done manually through trial and error and expert knowledge. This thesis inspects different approaches, existing and novel, to automate the design of deep feedforward neural networks in an attempt to create less complex models with good performance that take away the burden of deciding on an architecture and make it more efficient to design and train such deep networks.
△ Less
Submitted 21 August, 2019;
originally announced August 2019.
-
Experimenting with X10 for Parallel Constraint-Based Local Search
Authors:
Danny Munera,
Daniel Diaz,
Salvador Abreu
Abstract:
In this study, we have investigated the adequacy of the PGAS parallel language X10 to implement a Constraint-Based Local Search solver. We decided to code in this language to benefit from the ease of use and architectural independence from parallel resources which it offers. We present the implementation strategy, in search of different sources of parallelism in the context of an implementation of…
▽ More
In this study, we have investigated the adequacy of the PGAS parallel language X10 to implement a Constraint-Based Local Search solver. We decided to code in this language to benefit from the ease of use and architectural independence from parallel resources which it offers. We present the implementation strategy, in search of different sources of parallelism in the context of an implementation of the Adaptive Search algorithm. We extensively discuss the algorithm and its implementation. The performance evaluation on a representative set of benchmarks shows close to linear speed-ups, in all the problems treated.
△ Less
Submitted 17 July, 2013;
originally announced July 2013.
-
Parallel Local Search: Experiments with a PGAS-based programming model
Authors:
Rui Machado,
Salvador Abreu,
Daniel Diaz
Abstract:
Local search is a successful approach for solving combinatorial optimization and constraint satisfaction problems. With the progressing move toward multi and many-core systems, GPUs and the quest for Exascale systems, parallelism has become mainstream as the number of cores continues to increase. New programming models are required and need to be better understood as well as data structures and al…
▽ More
Local search is a successful approach for solving combinatorial optimization and constraint satisfaction problems. With the progressing move toward multi and many-core systems, GPUs and the quest for Exascale systems, parallelism has become mainstream as the number of cores continues to increase. New programming models are required and need to be better understood as well as data structures and algorithms. Such is the case for local search algorithms when run on hundreds or thousands of processing units. In this paper, we discuss some experiments we have been doing with Adaptive Search and present a new parallel version of it based on GPI, a recent API and programming model for the development of scalable parallel applications. Our experiments on different problems show interesting speedups and, more importantly, a deeper interpretation of the parallelization of Local Search methods.
△ Less
Submitted 10 May, 2013; v1 submitted 31 January, 2013;
originally announced January 2013.
-
Online Proceedings of the 11th International Colloquium on Implementation of Constraint LOgic Programming Systems (CICLOPS 2011), Lexington, KY, U.S.A., July 10, 2011
Authors:
Salvador Abreu,
Vitor Santos Costa
Abstract:
These are the revised versions of the papers presented at CICLOPS 2011, a workshop colocated with ICLP 2011.
These are the revised versions of the papers presented at CICLOPS 2011, a workshop colocated with ICLP 2011.
△ Less
Submitted 21 December, 2011;
originally announced December 2011.
-
On the Implementation of GNU Prolog
Authors:
Daniel Diaz,
Salvador Abreu,
Philippe Codognet
Abstract:
GNU Prolog is a general-purpose implementation of the Prolog language, which distinguishes itself from most other systems by being, above all else, a native-code compiler which produces standalone executables which don't rely on any byte-code emulator or meta-interpreter. Other aspects which stand out include the explicit organization of the Prolog system as a multipass compiler, where intermediat…
▽ More
GNU Prolog is a general-purpose implementation of the Prolog language, which distinguishes itself from most other systems by being, above all else, a native-code compiler which produces standalone executables which don't rely on any byte-code emulator or meta-interpreter. Other aspects which stand out include the explicit organization of the Prolog system as a multipass compiler, where intermediate representations are materialized, in Unix compiler tradition. GNU Prolog also includes an extensible and high-performance finite domain constraint solver, integrated with the Prolog language but implemented using independent lower-level mechanisms. This article discusses the main issues involved in designing and implementing GNU Prolog: requirements, system organization, performance and portability issues as well as its position with respect to other Prolog system implementations and the ISO standardization initiative.
△ Less
Submitted 15 December, 2010; v1 submitted 11 December, 2010;
originally announced December 2010.
-
Casting of the WAM as an EAM
Authors:
Paulo André,
Salvador Abreu
Abstract:
Logic programming provides a very high-level view of programming, which comes at the cost of some execution efficiency. Improving performance of logic programs is thus one of the holy grails of Prolog system implementations and a wide range of approaches have historically been taken towards this goal. Designing computational models that both exploit the available parallelism in a given application…
▽ More
Logic programming provides a very high-level view of programming, which comes at the cost of some execution efficiency. Improving performance of logic programs is thus one of the holy grails of Prolog system implementations and a wide range of approaches have historically been taken towards this goal. Designing computational models that both exploit the available parallelism in a given application and that try hard to reduce the explored search space has been an ongoing line of research for many years. These goals in particular have motivated the design of several computational models, one of which is the Extended Andorra Model (EAM). In this paper, we present a preliminary specification and implementation of the EAM with Implicit Control, the WAM2EAM, which supplies regular WAM instructions with an EAM-centered interpretation.
△ Less
Submitted 20 September, 2010;
originally announced September 2010.
-
Distributed Work Stealing for Constraint Solving
Authors:
Vasco Pedro,
Salvador Abreu
Abstract:
With the dissemination of affordable parallel and distributed hardware, parallel and distributed constraint solving has lately been the focus of some attention. To effectually apply the power of distributed computational systems, there must be an effective sharing of the work involved in the search for a solution to a Constraint Satisfaction Problem (CSP) between all the participating agents, and…
▽ More
With the dissemination of affordable parallel and distributed hardware, parallel and distributed constraint solving has lately been the focus of some attention. To effectually apply the power of distributed computational systems, there must be an effective sharing of the work involved in the search for a solution to a Constraint Satisfaction Problem (CSP) between all the participating agents, and it must happen dynamically, since it is hard to predict the effort associated with the exploration of some part of the search space. We describe and provide an initial experimental assessment of an implementation of a work stealing-based approach to distributed CSP solving.
△ Less
Submitted 20 September, 2010;
originally announced September 2010.
-
Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)
Authors:
Salvator Abreu,
Daniel Diaz,
Philippe Codognet
Abstract:
We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16 SPUs per blade). This algorithm was chosen because it fits very well the Cell/BE architecture and requ…
▽ More
We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16 SPUs per blade). This algorithm was chosen because it fits very well the Cell/BE architecture and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that this achieves mostly linear time speedups, even sometimes super-linear. This is possible because the parallel implementation might explore simultaneously different parts of the search space and therefore converge faster towards the best sub-space and thus towards a solution. Besides getting speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical.
△ Less
Submitted 7 October, 2009;
originally announced October 2009.