The informal research seminar of the ALGO group. Talks last roughly 25 minutes, with five extra minutes allocated for discussion. Many presentations cover recent publications or serve as practice talks for upcoming conferences. New members are often asked to give an overview of their field of research. Talks given by invited speakers may take up to 45–60 minutes including questions.
To be kept up-to-date about noon seminar presentations, please subscribe to the algoseminar-l mailing list.
This is the schedule for 2021. Schedules are available for the years 2005 through 2024.
Given a stream of edges in a dynamic graph, how can we detect anomalous edges in near real-time with constant memory? In this thesis, we propose algorithms that use link-prediction measures to detect anomalies, and leverage these measures to improve on the performance of the baseline algorithm. Reservoir sampling is used to bound the time and memory usage of our approach. Experimental results on five real-world datasets show that our algorithms outperform the baseline approach.
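Reservoir sampling is the standard way to keep memory constant over an unbounded edge stream. As a generic illustration (a minimal sketch, not the thesis's exact procedure), a uniform reservoir sampler might look like:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Each item in the stream ends up in the reservoir with equal probability k/n, regardless of the stream length n.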
In this report, aggregation methods for nominal geospatial data are analyzed. Aggregation in geospatial visualization is used to combine and display data that has too many instances to visualize clearly on a map. When there is too much visual clutter or the area is too large, aggregation is necessary to display the data in a way that still lets the user obtain meaningful information. During aggregation some information is lost, and we want to minimize this loss.
In particular, dot maps that lie on a grid overlaid on a map are used to visualize these aggregations. In this report, three algorithms are explained that generate aggregation solutions from an input grid of nominal data, which can be overlaid on a map. Several variants of these algorithms are then used to optimize these aggregations for class balance. We analyze the performance on several quality metrics. As input, data representing cities in the Netherlands is used. This data is then aggregated using our algorithms and the performance is analyzed.
One of the algorithms optimizes a solution for class balance, one of our quality metrics, while trying to keep the other quality metrics high. Its performance is analyzed and we determine the best input to use in order to generate the best possible solutions.
In the end, the method improves on the results of the baseline algorithms. However, some unwanted properties arise when the algorithm has to adjust an abnormally bad class balance. The algorithm is also too slow in its current form for practical use, and we suggest improvements to fix this.
Ideas for future work are also proposed based on our findings. Several ways are suggested to improve the algorithm created in this paper. Ideas for other algorithms which better optimize the other quality metrics are also suggested. Ideally we would like to have an algorithm which can optimize for each of these metrics and let expert users tweak the aggregation based on necessary demands for their purpose.
In this thesis, we study two types of constrained paths in different polygonal settings, namely self-approaching paths and paths with increasing chords. A directed path is self-approaching if the distance between a point traveling along the path and any point not yet traversed does not increase; more formally, for any points a, b, and c that lie in that order on the path, |ac| ≥ |bc|. Furthermore, a path has increasing chords if it is self-approaching in both directions, that is, for any points a, b, c, and d that lie in that order on the path, |ad| ≥ |bc|.
Existing literature on this subject describes an algorithm for finding a shortest self-approaching path inside a simple polygon. In this thesis, we show that finding the shortest path with increasing chords is similar to finding the shortest self-approaching path. Furthermore, we propose an algorithm that can find the shortest self-approaching path in a polygon with holes, albeit inefficiently. Lastly, we show that if any two points in the polygon can be connected by a self-approaching path, then each hole must have all centers of curvature of its boundary inside the hole. If, however, any two points can be connected by a path with increasing chords, the holes must be of constant width.
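The defining inequality can be checked directly on the vertices of a polygonal path. A small sketch (this is only a necessary condition, since the true property quantifies over all points of the path, not just its vertices):

```python
from itertools import combinations
from math import dist

def is_self_approaching_on_vertices(path):
    """Check |ac| >= |bc| for all vertex triples a, b, c in path order.
    Passing this vertex test is necessary but not sufficient for the
    continuous self-approaching property."""
    return all(dist(a, c) >= dist(b, c)
               for a, b, c in combinations(path, 3))
```

For example, a monotone straight polyline passes the test, while a path that first overshoots a later point and then backtracks fails it.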
Hotspots are parts of data that are denser (or hotter) than their surrounding sparse (or cool) neighborhoods. In real-world scenarios where data changes constantly, hotspots form, fade and change rapidly. Retail companies, ride-hailing services, financial services and airlines are often interested in identifying and tracking hotspots in their data. In particular, they study hotspots to classify and forecast significant changes in their customers' shopping patterns. This in turn helps them to adjust their production and marketing plans accordingly.
We model hotspots as a set of points generated from a mixture of uniformly distributed (axis-aligned) rectangles and background noise (which may constitute a significant portion of the data); our goal is to efficiently and approximately find these rectangular hotspots. We propose two efficient algorithms for detecting hotspots: the first is based on randomly sampling pairs of points, and the second is an extension of the classical DBScan algorithm [Gunawan, 2013]. We test these algorithms on synthetic as well as real datasets and show that their precision, recall and F-score are very high (often above 75%). The former algorithm is very fast, while the latter achieves better results.
We also apply our algorithms in the streaming setting, where points arrive in a streaming fashion and we are interested in detecting hotspots that form, fade and change dynamically.
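The reported precision, recall and F-score combine in the usual way; as a reminder, a minimal helper (standard definitions, not tied to the thesis's evaluation code):

```python
def f_score(tp, fp, fn):
    """F1-score from true positives, false positives and false negatives.
    Assumes tp + fp > 0 and tp + fn > 0."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With, say, 8 correctly detected hotspots, 2 spurious detections and 2 missed hotspots, precision and recall are both 0.8 and so is the F-score.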
Sets of moving entities can form groups that travel closely together for longer periods of time. Analyzing the shape and the collective movement properties of such groups of people, vehicles, or wildlife animals allows us to make inferences about the underlying movement behavior. An example of such a property is the density of the group, which has been shown to be influenced by external stimuli in applications from, for example, wildlife ecology. Motivated by such applications, we investigate how to find and maintain the location of dense and sparse areas of a given group of one-dimensional moving entities. We achieve this by maintaining the maxima and minima of a probability density estimate of a moving input point set, which represent the dense and sparse areas of the data, respectively. We use the kinetic data structure framework to build a data structure that maintains the extrema of the probability density estimate by utilizing the slope of the function. The kinetic data structure is compact, local, responsive, and weakly efficient. We also investigate how we can use an ε-approximation of the kernel density estimate to make the kinetic data structure more efficient. In addition to the work on density, we also demonstrate how to classify movement phases of a group of moving entities. We do so by combining an existing grouping approach with an existing model-based classification approach for single entities. In a proof-of-concept implementation, the resulting method is able to classify movement phases of a group relatively well.
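For intuition, the maxima and minima of a one-dimensional kernel density estimate can be located at a single moment in time by scanning for slope sign changes on a grid. This static sketch (with an assumed Gaussian kernel and grid resolution, unlike the kinetic data structure maintained in the thesis) shows the extrema being found:

```python
from math import exp, sqrt, pi

def kde(points, x, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    return sum(exp(-((x - p) / h) ** 2 / 2) for p in points) / (len(points) * h * sqrt(2 * pi))

def extrema(points, h, lo, hi, step):
    """Locate local maxima and minima of the KDE on a grid by comparing
    each grid value with its two neighbours (i.e. by slope sign changes)."""
    xs = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    ys = [kde(points, x, h) for x in xs]
    result = []
    for i in range(1, len(xs) - 1):
        if ys[i] >= ys[i - 1] and ys[i] >= ys[i + 1]:
            result.append((xs[i], 'max'))
        elif ys[i] <= ys[i - 1] and ys[i] <= ys[i + 1]:
            result.append((xs[i], 'min'))
    return result
```

Two clusters of points produce two maxima (the dense areas) with a minimum between them (the sparse area).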
To apply machine learning, it is essential to first determine an appropriate algorithm for the dataset. Different algorithms may produce a large number of different models with different hyperparameter configurations, and it usually takes a lot of time to run a relatively complex model on a large dataset. Therefore, how to predict the performance of a model on a dataset is a fundamental problem to be solved. This thesis designs a prediction system based on matrix factorization to predict the classification accuracy of a specific model on a particular dataset.
In this thesis, we conduct a comprehensive empirical study on more than fifty datasets that we collected from the OpenML website. We study the performance prediction of three fundamental machine learning algorithms, namely random forest, XGBoost, and the multilayer perceptron (MLP). In particular, we obtain the following results.
* Predictability of fine-tuned models using coarse-tuned variants:
Usually, training and testing complex machine learning models is time-consuming. Thus, we aim to predict the performance of complex models from their simpler variants. Three machine learning algorithms are compared in experiments. We find that random forest and XGBoost have good predictability on most datasets; that is, as the model becomes more complex, its performance improves, and thus the accuracy of a complex model can be foreseen directly from its simple variant. Hence, we can decide efficiently which algorithm to utilize by comparing simple models.
* Predictability of MLP using feature extraction techniques:
Often, real datasets have a large number of features, from a few hundred to a few thousand. Training models on fully-featured datasets is a very time-consuming task. We explore the idea of training a model $D$ on datasets that are projected onto a few features and using this as a hint to predict the performance of the model $D$ when we consider all features of the dataset. We try different feature extraction techniques, including techniques based on permutation importance, gain-based feature importance, hierarchical clustering based on Spearman correlation, and principal component analysis. We study the performance of these techniques on the multilayer perceptron (MLP) model and observe that feature extraction with permutation importance and hierarchical clustering based on Spearman correlation performs best. That is, on most datasets, the accuracy of the MLP improves as the number of features extracted by these techniques increases.
* Predicting model performance using implicit feedback:
After researching the predictability of the three algorithms, our goal is to discover a method to predict the performance of a specific model on a particular dataset. In order to predict the classification accuracy of different algorithms on different datasets with different hyperparameters, we build a prediction system based on matrix factorization. In this system, the input accuracies can be seen as implicit feedback, because there is no further information about the cause of this performance. In our experiments, the system achieves a mean absolute error as low as 6.7%.
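A minimal sketch of such a matrix-factorization predictor, assuming a plain SGD scheme (the thesis's actual system is not specified here): rows are models, columns are datasets, observed cells hold accuracies, and a missing cell is predicted from the learned factors:

```python
import random

def factorize(R, k=1, steps=3000, lr=0.05, seed=0):
    """Factor an accuracy matrix R (rows: models, cols: datasets, None =
    unknown) into low-rank factors P, Q via SGD on the observed entries.
    A missing accuracy R[i][j] is then predicted as dot(P[i], Q[j])."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                if R[i][j] is None:
                    continue
                err = R[i][j] - sum(P[i][f] * Q[j][f] for f in range(k))
                for f in range(k):
                    pif, qjf = P[i][f], Q[j][f]
                    P[i][f] += lr * err * qjf
                    Q[j][f] += lr * err * pif
    return P, Q
```

On a rank-1 example, the missing entry is recovered up to a small error from the observed entries alone.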
This thesis looks at the problem of planning fiber-optics networks. The research was done in collaboration with ThePeopleGroup, a company searching for alternative network-calculation methods in order to deliver better results to its customers.
We consider a graph $G$ of potential edges, and a set of users/clients $U \subseteq V(G)$. We wish to place facilities on the vertex set $V(G)$, each of which can serve clients up to a given maximum capacity, and to determine which edges of $E(G)$ to include such that there exists a path between the appropriate facilities and users. The cost of such a solution consists of three factors: the cost of constructing facilities, the cost of digging trenches along included edges, and the cost of each cable between a facility and a user. A cost-efficient solution balances the cost of digging and equipment cost, so that the total cost is minimized.
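The three cost factors combine additively; a tiny evaluator, with assumed uniform unit costs for illustration (the actual cost parameters used by ThePeopleGroup are not specified here):

```python
def network_cost(open_facilities, trench_lengths, cable_lengths,
                 facility_cost, trench_cost_per_m, cable_cost_per_m):
    """Total cost of a candidate plan: facilities built, plus trenches dug
    along the chosen edges, plus one cable per user to its facility."""
    return (facility_cost * len(open_facilities)
            + trench_cost_per_m * sum(trench_lengths)
            + cable_cost_per_m * sum(cable_lengths))
```

For instance, one facility, 150 m of trenches and 70 m of cable at costs 1000, 10/m and 2/m total 2640.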
The planning of fiber-optics networks is closely related to the Capacitated Facility Location Problem (CFLP), which we know to be NP-hard on general and planar graphs. There exist algorithms to help with automating the planning of these networks. One such algorithm, which has been developed by A. van den Boogaart in collaboration with ThePeopleGroup, involves reducing the solution space to a Steiner tree on the general graph, and then solving CFLP on the resulting Steiner tree in polynomial time. This method may be fast, but the step of selecting which edges to include in the Steiner tree is completely blind to the subsequent cost of constructing facilities and placing cables. This means that we may end up with sub-optimal results in terms of total cost.
The goal of this thesis is to seek an alternative method that finds solutions that are more cost-efficient than the algorithm based on Steiner trees. We approach this goal by widening our solution space: instead of merely considering a tree on the general graph, we consider a tree-like graph that includes appropriate additional “edges of interest”, which can significantly increase cost efficiency.
To this end we propose a fixed-parameter tractable algorithm for CFLP on graphs of bounded treewidth $k$. We have implemented a variant of this algorithm as a proof of concept that we can build a solution on the tree decomposition of a graph. The results of the algorithm are very promising. However, more work is still needed to improve the implementation of the algorithm, so that it is fast enough to run on larger datasets.
We perform an experimental analysis of $k$-means clustering in the sliding window model in two-dimensional space. The experiments cover algorithms presented by Ackermann et al. and Youn et al. for $k$-means clustering on streaming data and sliding window streaming. We experiment with the different parameters of the algorithms and identify values that are ideal for a well-distributed Gaussian dataset, giving a good trade-off between clustering quality and space usage. We then also devise an algorithm that combines concepts from the aforementioned algorithms, and analyze its parameters to find optimal values for clustering on a well-distributed Gaussian dataset.
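For reference, the underlying (non-streaming) $k$-means objective is usually optimized with Lloyd iterations; a plain 2D sketch of that baseline, not of the streaming or sliding-window algorithms analyzed here:

```python
from math import dist

def lloyd_kmeans(points, centers, iters=20):
    """Plain Lloyd iterations in 2D: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            clusters[nearest].append(p)
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers
```

The streaming and sliding-window algorithms approximate this objective while only keeping a small summary of the recent points.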
UPD: the talks have been moved online.
The talks are 15 minutes long. We schedule 45 minutes per talk to accommodate questions and feedback.
1. Bram Custers — Coordinated Schematization for Visualizing Mobility Patterns on Networks
2. Willem Sonke — Volume from Outlines on Terrains
Title: Coordinated Schematization for Visualizing Mobility Patterns on Networks
Abstract: GPS trajectories of vehicles moving on a road network are a valuable source of traffic information. However, the sheer volume of available data makes it challenging to identify and visualize salient patterns. Meaningful visual summaries of trajectory collections require that both the trajectories and the underlying network are aggregated and simplified in a coherent manner. In this paper we propose a coordinated fully-automated pipeline for computing a schematic overview of mobility patterns from a collection of trajectories on a street network. Our pipeline utilizes well-known building blocks from GIS, automated cartography, and trajectory analysis: map matching, road selection, schematization, movement patterns, and metro-map style rendering. We showcase the results of our pipeline on two real-world trajectory collections around The Hague and Beijing.
Title: Volume from Outlines on Terrains
Abstract: Outlines (closed loops) delineate areas of interest on terrains, such as regions with a heightened risk of landslides. For various analysis tasks it is necessary to define and compute a volume of earth (soil) based on such an outline, capturing, for example, the possible volume of a landslide in a high-risk region. In this paper we discuss several options to define meaningful 2D surfaces induced by a 1D outline, which allow us to compute such volumes. We experimentally compare the proposed surface options for two applications: similarity of paths on terrains and landslide susceptibility analysis.
Origami is the art of folding paper (from ‘oru’, to fold, and ‘kami’, paper). In origami, one tries to fold some shape from a single piece of paper without cutting the paper, letting it pass through itself, or stretching or shrinking it. In the map folding problem, we are given an $m \times n$ regular grid paper, where the boundary of each cell is a crease with either a mountain or valley assignment. The problem is to decide whether all the creases can be folded flat. In this thesis we introduce and explore a new map folding variation: equilateral triangle map folding. This variation changes the type of grid, and asks whether a hexagonal paper on a triangular grid with assigned creases can be folded flat.
The constraint satisfaction problem (CSP) asks whether it is possible to find an assignment to a number of variables such that a set of constraints is satisfied. While the complexity of a CSP depends on the types of allowed constraints, many variants are NP-hard; well-known instances include $q$-Graph Coloring and $d$-CNF-SAT for $d > 2$. One way to solve NP-hard CSPs more efficiently is through kernelization: a formalization of preprocessing the original problem into a simpler one without changing the solutions. We explore how the size of the set of suitably preprocessed constraints for a CSP depends on the number of variables and the types of allowed constraints.

The work by Chen, Jansen and Pieterse showed how to construct a kernel with a constraint set of linear size in terms of the number of variables for CSPs over Boolean domains where the allowed constraints are so-called “balanced”. Their approach relied on representing the constraints by low-degree polynomials; by computing a basis for these polynomials it was then possible to sparsify the set of constraints. However, earlier work only considered applying this method to Boolean-domain CSPs.

In this thesis we expand the method of Chen, Jansen and Pieterse to non-Boolean finite domains, and show how to identify which CSPs have a kernel of size $O(n^t)$ for $t > 1$, thus going beyond linear kernelization. For Boolean CSPs we show that when the allowed types of constraints are so-called $t$-balanced, it is possible to construct polynomials of degree $t$ which represent the constraints of the CSP. These polynomials can then be used to sparsify the CSP as mentioned before. A similar approach can be applied to non-Boolean CSPs by first rewriting the constraints to equivalent binary constraints and then applying a framework similar to that for CSPs over a Boolean domain.
For a specific type of CSP, known as 3-Uniform Hypergraph 3-Coloring, we show that a kernel with $O(n^{3-\varepsilon})$ constraints for $\varepsilon > 0$ is unlikely to exist. This type of CSP is similar to 3-Graph Coloring except that the graph is structured differently: in a 3-uniform hypergraph, each edge consists of 3 vertices instead of 2, and a coloring is proper if each edge contains at least 2 distinctly colored vertices. This can be thought of as a CSP where each constraint consists of 3 variables and each variable can be assigned a value from 1 to 3.
The talks are 20–25 minutes long. We schedule 50 minutes per talk to accommodate questions and feedback.
1. Pantea Haghighatkhah — Obstructing Classification via Projection
2. Aleksandr Popov — Uncertain Curve Simplification
3. Mart Hagedoorn and Max van Mulken — Dots & Boxes Is PSPACE-Complete
In this thesis, we present a reconfiguration algorithm for self-reconfigurable modular robots. The robot consists of cubic modules that are able to move by pivoting around neighbouring modules. We show that if a configuration of a modular robot is admissible, which means that it does not contain any of three forbidden patterns, then our algorithm can transform it into a straight line of modules. Using this intermediate state, our algorithm can transform a robot between any two admissible configurations in $O(n^3)$ moves. Previous work by Sung et al. [ICRA 2015] showed that it is possible to transform a robot between any two admissible configurations, but additionally required that the configurations contain only convex holes.
Pick-and-place machines put electronic components onto printed circuit boards. Scheduling component placements performed by the robots of such a machine can be seen as a Vehicle Routing Problem. Kulicke & Soffa (K&S) is currently developing a pick-and-place machine where each robot has two placement heads that can place components simultaneously, giving rise to the Paired-Vehicle Routing Problem. We present a novel algorithm for this problem and experimentally compare it to K&S's own prototype. By combining the best parts of both algorithms we can reduce the makespan of K&S's schedules by 9% on average.
The talk is ~15 minutes; we schedule 15 extra minutes for questions and feedback.
We study metrics that assess how close a triangulation is to being a Delaunay triangulation, for use in contexts where a good triangulation is desired but constraints (e.g., maximum degree) prevent the use of the Delaunay triangulation itself. Our near-Delaunay metrics derive from common Delaunay properties and satisfy a basic set of design criteria, such as being invariant under similarity transformations. We compare the metrics, showing that each can make different judgments as to which triangulation is closer to Delaunay.
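A triangulation is Delaunay exactly when no vertex lies strictly inside the circumcircle of any triangle, and metrics of near-Delaunayness start from such properties. As background (a sketch of the classic predicate, not one of the metrics proposed here), the standard incircle test:

```python
def in_circumcircle(a, b, c, d):
    """True if point d lies strictly inside the circumcircle of triangle
    abc, with a, b, c in counter-clockwise order: the classic incircle
    determinant predicate."""
    m = [[p[0] - d[0], p[1] - d[1],
          (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2] for p in (a, b, c)]
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return det > 0
```

For the right triangle (0,0), (1,0), (0,1), the circumcenter is (0.5, 0.5), so that point lies inside the circumcircle while (2,2) lies outside.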
In this thesis, the problem of scale-dependent road generalization for use in automatic cartography software based on OpenStreetMap data is discussed. The goal is to decide which roads should be shown such that we get an accurate and informative map while keeping it readable. Previous work does not provide a complete, computationally fast procedure that incorporates the simplification of complex highway interchanges and double roads, and works in different areas of the world. Current efficient alternatives make a naive selection based on road classes. This thesis first investigates problems with such an approach and then describes a new computationally fast strategy that, using a stroke detection algorithm, combines a new thinning algorithm with a road selection algorithm. Maps are generated using this approach and evaluated both qualitatively and quantitatively. From this evaluation, we conclude that this strategy creates high-quality maps.
Research in various humanities fields, such as the history of ideas, involves historical literature research. This starts with defining a corpus (a set of texts, specified by bibliographical data, that is relevant to the research project) and acquiring copies of these texts in order to analyze them.
Researchers get online access to increasingly large federated catalogs containing bibliographical data (such as WorldCat) and repositories of digitized texts (such as Google Books). A new approach to history of ideas, computational history of ideas, aims to employ these developments to enlarge the evidence basis for a wide-scope historical investigation.
A challenge this new approach faces is that the bibliographic data available online varies in quality and structure. This is problematic since humanities researchers, for example, must know precisely which edition of a book they are analyzing, and must be certain that they have identified all the books relevant to their research project.
This thesis presents a technical and user-interface design of a framework, created in collaboration with the Concepts in Motion team from the University of Amsterdam, who represent the users of this framework. The framework supports researchers in the process of compiling text corpora using online catalogs and repositories. This way, bigger corpora with more accurate bibliographic data can be modeled and explored in a user-defined model. In addition, the framework keeps a detailed history of activity within the framework, which allows researchers to investigate and ensure the legitimacy of conclusions drawn from its results. Other features include discussions and access control. The framework also prepares for the future integration of features such as automated analysis of texts and applications in other fields.
Next to presenting a design, this thesis also involves the implementation of a subset of the design in a production-ready way. This allows future projects to use the results of this thesis to deploy the framework for use in actual research projects.
The quality of geometric networks like road networks is often measured based on distances: a good network should provide relatively short routes between the nodes of the network. In this thesis we study the robustness of an important class of geometric networks, namely geometric spanners. The vertex set of a geometric graph is a set of points in the plane (or some higher-dimensional Euclidean space) and the edges are straight line segments, weighted with the Euclidean distance between their endpoints. For some $t > 1$, such a network is a $t$-spanner if the length of the shortest path between any two vertices is at most $t$ times their Euclidean distance. The factor $t$ is called the dilation (or stretch factor, spanning ratio) of the network. Since the spanning ratio of the complete graph is always one, the challenge is to construct spanners with constant dilation that have as few edges as possible. There are several constructions of linear-size spanners, which are optimal. Naturally, more edges are needed for spanners that can resist failures. We give several constructions of spanners that can survive massive failures while having only a few edges.
Thesis: https://pure.tue.nl/ws/portalfiles/portal/173826728/thesis
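The dilation of a small geometric graph can be computed directly from the definition; a brute-force sketch using Floyd–Warshall (fine for small instances, not for large spanner constructions):

```python
from itertools import combinations
from math import dist, inf

def dilation(points, edges):
    """Spanning ratio t of a geometric graph: the maximum over vertex
    pairs of (graph shortest-path length) / (Euclidean distance).
    edges is a list of index pairs, weighted by Euclidean length."""
    n = len(points)
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for i, j in edges:
        w = dist(points[i], points[j])
        d[i][j] = d[j][i] = min(d[i][j], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return max(d[i][j] / dist(points[i], points[j])
               for i, j in combinations(range(n), 2))
```

For example, a unit square connected only along its four sides has dilation √2: the diagonal pairs require a path of length 2 over Euclidean distance √2.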
Grid maps are used to display data coupled with spatial information. Depending on the amount of data, the spatial information has to be distorted for the sake of the data's visibility. In a grid map, each spatial element is condensed into a simple tile (such as a square or hexagon) and then arranged so that the global shape of all tiles matches the global shape of the input with the least spatial distortion, while each tile remains visually identifiable by its spatial information. We analyze the state-of-the-art solution for generating such grid maps and identify avenues of improvement, which we pursue in this paper. We improve two steps of this solution: the partitioning of an input map based on salient features, and its conversion into a square tile mosaic cartogram. The main improvements over the state of the art consist of guaranteeing a valid mosaic cartogram conversion and a significant runtime improvement. We discuss our methodology, provide an implementation which we apply to one of the datasets used by the state-of-the-art solution (a map of mainland Netherlands and its municipalities), compare it to the state-of-the-art result, and discuss further avenues of improvement for our implementation.
Anomaly detection is a classical topic in machine learning and data mining, with a wide range of applications. As the amount of information shared online grows rapidly, the need for parallelizable anomaly detection algorithms increases. In this thesis, we propose the Random Shift Forest (RSF) algorithm to detect anomalies. RSF is an unsupervised anomaly detection algorithm with a low time complexity. It is an ensemble method that uses partitioning to isolate anomalies. To evaluate RSF, we use it on synthetic and real datasets. Compared to a state-of-the-art algorithm, RSF performs favourably in terms of ROC–AUC. The straightforward construction of RSF makes it amenable to big data models. In particular, we show that RSF can be implemented in the distributed and the streaming model using sparse recovery techniques.
Trajectory segmentation is the problem of partitioning a trajectory into different movement phases. There is a large body of work on segmenting individual trajectories, such as Behavioral Change Point Analysis (BCPA), which fits a model to statistical features of trajectories to compute likely points of change in behavior. However, there is very little work on algorithms for segmenting a set of trajectories. The aim of this work is to generalize segmentation to multiple trajectories in a model-based context.
We define a graph-like representation of model-based segmentations and an information criterion that evaluates the quality of these representations, based on their complexity and the models fit to the movement phases and breakpoints of these phases. Our representation is a directed acyclic graph, wherein each edge represents a movement phase of a group of subtrajectories, and vertices represent changes in movement or grouping behavior. We generalize movement models, including BCPA, to this group setting, and formalize our optimization objective with these generalized models.
Furthermore, we present three heuristic algorithms to compute such segmentations. A clustering approach is presented which builds clusters of close subtrajectories with similar movement from a trajectory set and greedily selects a subset of the clusters to form a valid segmentation. We relate the cluster selection problem to Weighted Set Cover to provide context on the problem and our greedy solution. We also present an incremental method, where we build a complete segmentation for a trajectory by adding trajectories to the segmentation one by one. We define the subproblem of adding a trajectory to a segmentation and describe how our solution to this problem yields a complete segmentation. The last method we present solves the problem in two steps. This method first discovers which sets of subtrajectories are considered to be grouped, by computing the trajectory grouping structure, a graph that represents subgroups in a set of trajectories. These subgroups are then segmented separately using a dynamic programming algorithm that generalizes the existing algorithm for model-based segmentation of individual trajectories.
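The greedy selection related to Weighted Set Cover follows the classic best-ratio rule; a generic sketch with hypothetical set names, not the thesis's actual clusters:

```python
def greedy_set_cover(universe, sets):
    """Greedy Weighted Set Cover: repeatedly pick the set with the lowest
    weight per newly covered element. sets maps name -> (weight, elements)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        name, (weight, elems) = min(
            ((n, sw) for n, sw in sets.items() if uncovered & set(sw[1])),
            key=lambda item: item[1][0] / len(uncovered & set(item[1][1])))
        chosen.append(name)
        uncovered -= set(elems)
    return chosen
```

This greedy rule gives the well-known logarithmic approximation guarantee for set cover, which motivates its use for cluster selection.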
Finally, we present results obtained from experiments on synthetically generated trajectories and real data sets of moving animal and human entities. We used a simple implementation of the heuristic segmentation methods to run these experiments with different methods and models. We conclude our work with a discussion of the results and future work on model-based segmentation of trajectory groups.
Programmable matter is a concept in the field of distributed computing; it refers to a collection of synthetic particles which can be programmed to change the physical properties of the material that they form. Leader election is a common and important problem in distributed computing, in which a single node must be chosen to lead the execution of a larger algorithm. In the context of programmable matter, leader election is a significantly complex problem, both due to the large numbers of particles in a system and the storage and computational limitations of the individual particles. In this thesis, several leader election algorithms for particle systems of programmable matter are evaluated both experimentally and analytically. We consider the Amoebot model for programmable matter, which describes how particles can move, compute, and interact with one another. Four different leader election algorithms are implemented in AmoebotSim, a simulator for the Amoebot model. Using experiments, we determine which of these algorithms is fastest and what factors influence the running time of each algorithm. Furthermore, we investigate the theoretical robustness and probability of failure of the algorithms where applicable. This information should be helpful for selecting a leader election algorithm for given types of input particle systems, and may also help in the development of new algorithms with different performance characteristics or assumptions.
EuroCG has 12-minute talks with 3-minute breaks for questions and switching. We schedule 15 minutes extra per person for feedback and inevitable technical issues.
1. Aleksandr Popov — Uncertain Curve Simplification
2. Bram Custers — Route Reconstruction from Traffic Flow via Representative Trajectories
3. Nathan van Beusekom — Crossing Numbers of Beyond-Planar Graphs Revisited
EuroCG has 12-minute talks with 3-minute breaks for questions and switching. We schedule 15 minutes extra for feedback and inevitable technical issues.
UPD: Pantea’s talk has been rescheduled to 6th April.
The aim of this thesis is to determine to what extent the exact route driven can be reconstructed from a GPS trajectory using a continuous hidden Markov model (CHMM). We present a short overview of various map matching methods and a more extensive overview of the hidden Markov model (HMM). Current HMMs are discrete, as they compute with the closest matched position, which is not necessarily the optimal point. The CHMM instead defines the measurement and transition probabilities in a continuous way, via a probability distribution over all points. Using data from HERE Technologies, we test the performance of the CHMM for various noise levels and sampling intervals. The CHMM outperforms some models for short sampling intervals and has a similar overall performance to others. We observe two main limitations. First, the CHMM fails to take the correct route into account when finding the shortest path between candidates if the sampling interval is too long. Second, if the distance traversed on the road is considerably larger than the distance between consecutive measurements, the transition probability becomes too small and the model is unable to find the correct route. The latter limitation may be overcome by a different projection technique for the transition probability. One such technique, presented in this thesis, uses time instead of distance; it requires a good estimate of the average velocity, as it is very sensitive to traffic conditions and to the accuracy of the recorded velocity of the vehicle. The results show that, despite these limitations, the CHMM is a simple but effective continuous map matching method.
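For background, discrete HMM map matching typically selects the candidate sequence with the Viterbi algorithm over log-probabilities; a generic sketch, where the emission and transition functions are assumed placeholders rather than the CHMM's actual continuous definitions:

```python
def viterbi(candidates, emission, transition):
    """Viterbi over per-measurement road candidates: returns the candidate
    sequence maximizing the sum of emission and transition log-scores.
    candidates[i] lists the road candidates for measurement i."""
    # scores[i][c]: best log-score of any path ending in candidate c at step i.
    scores = [{c: emission(0, c) for c in candidates[0]}]
    back = []
    for i in range(1, len(candidates)):
        scores.append({})
        back.append({})
        for c in candidates[i]:
            prev, s = max(((p, scores[i - 1][p] + transition(i, p, c))
                           for p in candidates[i - 1]), key=lambda x: x[1])
            scores[i][c] = s + emission(i, c)
            back[i - 1][c] = prev
    # Backtrack from the best final candidate.
    best = max(scores[-1], key=scores[-1].get)
    path = [best]
    for i in range(len(back) - 1, -1, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

With emissions favoring one road and transitions penalizing road switches, the algorithm returns the consistent matched route rather than the per-measurement nearest road.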