-
CAMIL: Context-Aware Multiple Instance Learning for Cancer Detection and Subtyping in Whole Slide Images
Authors:
Olga Fourkioti,
Matt De Vries,
Chen Jin,
Daniel C. Alexander,
Chris Bakal
Abstract:
The visual examination of tissue biopsy sections is fundamental for cancer diagnosis, with pathologists analyzing sections at multiple magnifications to discern tumor cells and their subtypes. However, existing attention-based multiple instance learning (MIL) models used for analyzing Whole Slide Images (WSIs) in cancer diagnostics often overlook the contextual information of tumor and neighboring tiles, leading to misclassifications. To address this, we propose the Context-Aware Multiple Instance Learning (CAMIL) architecture. CAMIL incorporates neighbor-constrained attention to consider dependencies among tiles within a WSI and integrates contextual constraints as prior knowledge into the MIL model. We evaluated CAMIL on subtyping non-small cell lung cancer (TCGA-NSCLC) and detecting lymph node (CAMELYON16 and CAMELYON17) metastasis, achieving test AUCs of 97.5%, 95.9%, and 88.1%, respectively, outperforming other state-of-the-art methods. Additionally, CAMIL enhances model interpretability by identifying regions of high diagnostic value.
Submitted 10 October, 2024; v1 submitted 9 May, 2023;
originally announced May 2023.
-
Having Fun in Learning Formal Specifications
Authors:
I. S. W. B. Prasetya,
Craig Q. H. D. Leek,
Orestis Melkonian,
Joris ten Tusscher,
Jan van Bergen,
J. M. Everink,
Thomas van der Klis,
Petar Kostic,
Rick Meijerink,
Roan Oosenbrug,
Jelle J. Oostveen,
Tijmen van den Pol,
Mike de Vries,
Wink M. van Zon
Abstract:
There are many benefits in providing formal specifications for our software. However, teaching students to do this is not always easy as courses on formal methods are often experienced as dry by students. This paper presents a game called FormalZ that teachers can use to introduce some variation in their class. Students can have some fun in playing the game and, while doing so, also learn the basics of writing formal specifications in the form of pre- and post-conditions. Unlike existing software engineering themed education games such as Pex and Code Defenders, FormalZ takes the deep gamification approach where playing gets a more central role in order to generate more engagement. This short paper presents our work in progress: the first implementation of FormalZ along with the result of a preliminary users' evaluation. This implementation is functionally complete and tested, but the polishing of its user interface is still future work.
Submitted 1 March, 2019;
originally announced March 2019.
-
Exceptions in Business Processes in Relation to Operational Performance
Authors:
Remco Dijkman,
Geoffrey van IJzendoorn,
Oktay Turetken,
Meint de Vries
Abstract:
Business process models describe the way of working in an organization. Typically, business process models distinguish between the normal flow of work and exceptions to that normal flow. However, they often present an idealized view. This means that unexpected exceptions - exceptions that are not modelled in the business process model - can also occur in practice. This has an effect on the efficiency of the organization, because information systems are not developed to handle unexpected exceptions. This paper studies the relation between the occurrence of exceptions and operational performance. It does this by analyzing the execution logs of business processes from five organizations, classifying execution paths as normal or exceptional. Subsequently, it analyzes the differences between normal and exceptional paths. The results show that exceptions are related to worse operational performance in terms of a longer throughput time and that unexpected exceptions relate to a stronger increase in throughput time than expected exceptions.
Submitted 26 June, 2017;
originally announced June 2017.
-
Team Delft's Robot Winner of the Amazon Picking Challenge 2016
Authors:
Carlos Hernandez,
Mukunda Bharatheesha,
Wilson Ko,
Hans Gaiser,
Jethro Tan,
Kanter van Deurzen,
Maarten de Vries,
Bas Van Mil,
Jeff van Egmond,
Ruben Burger,
Mihai Morariu,
Jihong Ju,
Xander Gerrmann,
Ronald Ensing,
Jan Van Frankenhuyzen,
Martijn Wisse
Abstract:
This paper describes Team Delft's robot, which won the Amazon Picking Challenge 2016, including both the Picking and the Stowing competitions. The goal of the challenge is to automate pick and place operations in unstructured environments, specifically the shelves in an Amazon warehouse. Team Delft's robot is based on an industrial robot arm, 3D cameras and a customized gripper. The robot's software uses ROS to integrate off-the-shelf components and modules developed specifically for the competition, implementing Deep Learning and other AI techniques for object recognition and pose estimation, grasp planning and motion planning. This paper describes the main components in the system, and discusses its performance and results at the Amazon Picking Challenge 2016 finals.
Submitted 18 October, 2016;
originally announced October 2016.
-
Evaluating e-voting: theory and practice
Authors:
Wouter Bokslag,
Manon de Vries
Abstract:
In the Netherlands as well as many other countries, the use of electronic voting solutions is a recurrent topic of discussion. While electronic voting certainly has advantages over paper voting, there are also important risks involved. This paper presents an analysis of benefits and risks of electronic voting, and shows the relevance of these issues by means of three case studies of real-world implementations. Additionally, techniques that may be employed to improve upon many of the current systems are presented. We conclude that the advantages of E-voting do not outweigh the disadvantages, as the resulting reduced verifiability and transparency seem hard to overcome.
Submitted 8 February, 2016;
originally announced February 2016.
-
Parallel Streaming Signature EM-tree: A Clustering Algorithm for Web Scale Applications
Authors:
Christopher M. de Vries,
Lance De Vine,
Shlomo Geva,
Richi Nayak
Abstract:
The proliferation of the web presents an unsolved problem of automatically analyzing billions of pages of natural language. We introduce a scalable algorithm that clusters hundreds of millions of web pages into hundreds of thousands of clusters. It does this on a single mid-range machine using efficient algorithms and compressed document representations. It is applied to two web-scale crawls covering tens of terabytes. ClueWeb09 and ClueWeb12 contain 500 and 733 million web pages and were clustered into 500,000 to 700,000 clusters. To the best of our knowledge, such fine grained clustering has not been previously demonstrated. Previous approaches clustered a sample that limits the maximum number of discoverable clusters. The proposed EM-tree algorithm uses the entire collection in clustering and produces several orders of magnitude more clusters than the existing algorithms. Fine grained clustering is necessary for meaningful clustering in massive collections where the number of distinct topics grows linearly with collection size. These fine-grained clusters show an improved cluster quality when assessed with two novel evaluations using ad hoc search relevance judgments and spam classifications for external validation. These evaluations solve the problem of assessing the quality of clusters where categorical labeling is unavailable and unfeasible.
Submitted 21 May, 2015;
originally announced May 2015.
-
Document Clustering Evaluation: Divergence from a Random Baseline
Authors:
Christopher M. De Vries,
Shlomo Geva,
Andrew Trotman
Abstract:
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures cluster quality measures are performing work that prevents ineffective clusterings from giving high scores to clusterings that provide no useful result. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters to categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
Submitted 29 August, 2012; v1 submitted 28 August, 2012;
originally announced August 2012.
-
TopSig: Topology Preserving Document Signatures
Authors:
Shlomo Geva,
Christopher M. De Vries
Abstract:
Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general purpose, signature file based, search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature file based indexing and retrieval, performance that is comparable to that of state of the art inverted file based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models.
Submitted 24 April, 2012;
originally announced April 2012.
-
Random Indexing K-tree
Authors:
Christopher M. De Vries,
Lance De Vine,
Shlomo Geva
Abstract:
Random Indexing (RI) K-tree is the combination of two algorithms for clustering. Many large scale problems exist in document clustering. RI K-tree scales well with large inputs due to its low complexity. It also exhibits features that are useful for managing a changing collection. Furthermore, it solves previous issues with sparse document vectors when using K-tree. The algorithms and data structures are defined, explained and motivated. Specific modifications to K-tree are made for use with RI. Experiments have been executed to measure quality. The results indicate that RI K-tree improves document cluster quality over the original K-tree algorithm.
Submitted 1 February, 2010; v1 submitted 6 January, 2010;
originally announced January 2010.
-
K-tree: Large Scale Document Clustering
Authors:
Christopher M. De Vries,
Shlomo Geva
Abstract:
We introduce K-tree in an information retrieval context. It is an efficient approximation of the k-means clustering algorithm. Unlike k-means it forms a hierarchy of clusters. It has been extended to address issues with sparse representations. We compare performance and quality to CLUTO using document collections. The K-tree has a low time complexity that is suitable for large document collections. This tree structure allows for efficient disk based implementations where space requirements exceed that of main memory.
Submitted 6 January, 2010;
originally announced January 2010.
-
Document Clustering with K-tree
Authors:
Christopher M. De Vries,
Shlomo Geva
Abstract:
This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document clustering. Many large scale problems exist in document clustering. K-tree scales well with large inputs due to its low complexity. It offers promising results both in terms of efficiency and quality. Document classification was completed using Support Vector Machines.
Submitted 6 January, 2010;
originally announced January 2010.