Topological data analysis

We study the shape of data using methods from machine learning and algebraic topology.

Topological data analysis pipeline
Photo: Nello Blaser

Persistent homology is one of the most important methods in topological data analysis. The method consists of several steps. First, the data is translated into a filtered simplicial complex, e.g. a Čech complex. Starting from low filtration values, this filtered simplicial complex is then assembled step by step while keeping track of the connected components, holes and voids in the structure. This gives an overview of the features and of the range of filtration values over which each one exists, allowing for multi-scale analysis. Features that persist over large ranges of filtration values are called persistent features and are thought to be important. The filtration values at which features appear and disappear are summarized in a persistence diagram. For unsupervised machine learning, we then visualize the persistence diagrams. For supervised machine learning, the persistence diagrams must first be transformed into a vector representation before standard supervised learning algorithms can be applied.

Our research includes efficient encoding of data into simplicial complexes (topological representation and sparsification), validation measures for unsupervised persistent homology, data benchmarks, efficient vector representations of persistent homology, cycle representations, and applications in biomedicine and geophysics.
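To make the steps of persistent homology concrete, the following minimal Python sketch tracks only connected components (dimension 0) of a small point cloud. It is an illustration under stated assumptions, not the implementation used in our research: edges enter the filtration at the Euclidean distance between their endpoints (the Vietoris-Rips convention; for a Čech complex of balls the same edge appears at half that value), and the function names `h0_persistence` and `lifetime_vector` are hypothetical. Holes and voids require a full boundary-matrix reduction, which is omitted here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return (birth, death) pairs for connected components (dimension 0).

    Assumption: every point is born at filtration value 0 and an edge
    appears at the Euclidean distance between its two endpoints.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the component representative, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sort all edges by the filtration value at which they appear.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    pairs = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge, so one of them dies at this value.
            parent[ri] = rj
            pairs.append((0.0, dist))
    # One component survives for all filtration values.
    pairs.append((0.0, math.inf))
    return pairs


def lifetime_vector(pairs, k):
    """Illustrative vectorization: the k longest finite lifetimes, zero-padded."""
    lifetimes = sorted(
        (death - birth for birth, death in pairs if math.isfinite(death)),
        reverse=True,
    )
    return (lifetimes + [0.0] * k)[:k]


# Two well-separated clusters of two points each.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
diagram = h0_persistence(points)
vector = lifetime_vector(diagram, k=3)
print(diagram)
print(vector)
```

In this example the component that dies at filtration value 9 is the persistent feature: it reflects the gap between the two clusters, while the two components dying at value 1 reflect small-scale structure within each cluster. The fixed-length vector of lifetimes is one simple choice of representation that a standard supervised learning algorithm could consume; other vectorizations keep more information from the diagram.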