III: Small: Geospatial Data Representation and Analysis through the Stellar Decomposition

Award: NSF-IIS - 1910766

October 1, 2019 – September 30, 2022

Project description

Thanks to recent developments in remote sensing technologies, the amount of spatial and spatio-temporal data currently available has been dramatically increasing. Specifically, LiDAR (Light Detection and Ranging) technologies generate precise three-dimensional information about the shape of the Earth, and its characteristics, in the form of massive point clouds. LiDAR data are used in a variety of different fields, such as urban modeling, climate study, earthquake analysis, disaster management, flood risk mapping, forestry analysis. Capitalizing on the opportunities presented by such massive, high-resolution, data and representing, analyzing, and transforming them into useful information poses several challenges.

The project focuses on developing scalable data representations, and algorithms for processing and analyzing scattered big geospatial data. The emphasis is on dealing with point clouds of very large size, arising from LIDAR acquisitions, and on applications to terrain modeling and to tree reconstruction from forestry data, with benefits to the research in environmental and Earth science. Being the approach entirely data agnostic, the project has a potential impact on a broader range of applications, including neuroscience, social science, and virtual reality. Software tools for modeling and analysis of very large terrains, and for forestry segmentation will be developed and distributed in the public domain.

Currently, raw LiDAR point clouds are processed by first converting them into raster models, with high computational costs, potential loss of information, and creation of artifacts due to missing data and to the presence of noise. The innovative aspect of the project is in performing data analysis directly on the LiDAR point clouds. This requires encoding the neighboring relations among the points as a simplicial mesh so as to provide an approximation of the underlying "shape" of the point cloud. In order to be able to deal with massive data sets, the research will develop efficient and effective mesh data structures, based on a new data clustered spatio-topological model, the Stellar decomposition, supporting scalability, and efficient processing of fundamental spatial and connectivity queries. New algorithms, rooted in computational topology, will be developed for terrain simplification and analysis, and for tree reconstruction from forestry data. The new scalable framework will make the analysis of terrain and forest data possible on commodity hardware even for datasets composed of billions of points. Moreover, thanks to the use of the Stellar decomposition, it will be well suited for implementations in a distributed environment for the analysis of such data at the global Earth scale.

The major challenges of the project are in (i) defining and implementing compact and efficient data structures for simplicial complexes in order to make the direct processing and analysis of big scattered geospatial data effective and efficient on commodity hardware; in (ii) developing scalable representations which are well suited for future implementations in distributed environments; in (iii) designing and developing new algorithms, rooted in computational topology, for terrain analysis and simplification, and for forest segmentation, applicable also in other contexts; in (iv) developing a software platform for modeling and analysis of terrains and for tree segmentation from forestry data, which will consist of several integrated libraries and software tools based on highly innovative data representation and analysis techniques, capable of dealing with large-size point clouds.

The fundamental research tasks identified in this project are:

  • Task 1. Develop a new decomposition-based model for data structures for simplicial complexes (meshes in two, three and higher dimension complexes), as well as a dimension-independent implementation.

  • Task 2. Develop a new framework for processing and analysis of terrains reconstructed from large LIDAR point clouds, based on the new data model, and on topological data analysis algorithms.

  • Task 3. Develop a new framework for the representation and analysis of 3D shapes for applications to tree identification and reconstruction from LiDAR point clouds based on topological algorithms and on a data structure rooted in the new decomposition-based model.


Participants

Leila De Floriani

Principal Investigator, Professor, University of Maryland at College Park

Federico Iuricich

Collaborator, Assistant professor, Clemson University, South Carolina

Riccardo Fellegara

Collaborator, Senior Scientist, German Aerospace Center (DLR), Germany

Kenneth Weiss

Collaborator, Senior Scientist, Lawrence Livermore National Laboratory, USA

Xin Xu

PhD student, University of Maryland at College Park

Yunting Song

PhD student, University of Maryland at College Park

Noel Dyer

PhD student, University of Maryland at College Park


Publications

  • Efficient Homology-Preserving Simplification of High-Dimensional Simplicial Shapes
    Riccardo Fellegara, Federico Iuricich, Leila De Floriani, Ulderico Fugacci
    Journal: Computer Graphics Forum, February 2020

    Simplicial complexes are widely used to discretize shapes. In low dimensions, a 3D shape is represented by discretizing its boundary surface, encoded as a triangle mesh, or by discretizing the enclosed volume, encoded as a tetrahedral mesh. High-dimensional simplicial complexes have recently found their application in topological data analysis. Topological data analysis aims at studying a point cloud P, possibly embedded in a high dimensional metric space, by studying the topological characteristics of the simplicial complexes computed on P. Analyzing such complexes is not feasible due to their size and dimensions and, to this aim, the idea of simplifying a complex while preserving its topological features has been proposed in the literature. Here, we consider the problem of efficiently simplifying simplicial complexes in arbitrary dimensions. We provide a new definition for the edge contraction operator based on a top-based data structure With the objective of preserving structural aspects of a simplicial shape (i.e., its homology), we provide a new algorithm for verifying the link condition on a top-based representation. We implement the simplification algorithm obtained by coupling the new edge contraction and the link condition, on a specific top-based data structure, the Stellar tree, that we use to demonstrate the scalability of our approach.

    @inproceedings{fellegara2020efficient,
    title={Efficient Homology-Preserving Simplification of High-Dimensional Simplicial Shapes},
    author={Fellegara, Riccardo and Iuricich, Federico and De Floriani, Leila and Fugacci, Ulderico},
    booktitle={Computer Graphics Forum},
    volume={39},
    number={1},
    pages={244--259},
    year={2020},
    organization={Wiley Online Library}
    }
  • Computing Multiparameter Persistent Homology through a Discrete Morse-Based Approach
    Sara Scaramuccia, Federico Iuricich, Leila De Floriani, Claudia Landi
    Journal: Computational Geometry, August 2020

    Persistent homology allows for tracking topological features, like loops, holes and their higher-dimensional analogues, along a single-parameter family of nested shapes. Computing descriptors for complex data characterized by multiple parameters is becoming a major challenging task in several applications, including physics, chemistry, medicine, and geography. Multiparameter persistent homology generalizes persistent homology to allow for the exploration and analysis of shapes endowed with multiple filtering functions. Still, computational constraints prevent multiparameter persistent homology to be a feasible tool for analyzing large size data sets. We consider discrete Morse theory as a strategy to reduce the computation of multiparameter persistent homology by working on a reduced dataset. We propose a new preprocessing algorithm, well suited for parallel and distributed implementations, and we provide the first evaluation of the impact of multiparameter persistent homology on computations.

    @article{scaramuccia2020computing,
      title={Computing multiparameter persistent homology through a discrete Morse-based approach},
      author={Scaramuccia, Sara and Iuricich, Federico and De Floriani, Leila and Landi, Claudia},
      journal={Computational Geometry},
      volume={89},
      pages={101623},
      year={2020},
      publisher={Elsevier}
    }
  • A Persistence-Based Approach for Individual Tree Mapping
    Xin Xu, Federico Iuricich, Leila De Floriani
    28th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 2020

    Light Detection and Ranging (LiDAR) sensors generate dense point clouds that can be used to map forest structures at a high spatial resolution level. In this work, we consider the problem of identifying individual trees in a LiDAR point cloud. Existing techniques generally require intense parameter tuning and user interactions. Our goal is defining an automatic approach capable of providing robust results with minimal user interactions. To this end, we define a segmentation algorithm based on the watershed transform and persistence-based simplification. The proposed algorithm uses a divide-and-conquer technique to split a LiDAR point cloud into regions with uniform density. Within each region, single trees are identified by applying a segmentation approach based on watershed by simulated immersion. Experiments show that our approach performs better than state-of-the-art algorithms on most of the study areas in the benchmark provided by the NEW technologies for a better mountain FORest timber mobilization (NEWFOR) project. Moreover, our approach requires a single (Boolean) parameter. This makes our approach well suited for a wide range of forest analysis applications, including biomass estimation, or field inventory surveys.

    @inproceedings{xu2020persistence,
      title={A Persistence-Based Approach for Individual Tree Mapping},
      author={Xu, Xin and Iuricich, Federico and De Floriani, Leila},
      booktitle={Proceedings of the 28th International Conference on Advances in Geographic Information Systems},
      pages={191--194},
      year={2020}
    }
  • Tetrahedral trees: A family of hierarchical spatial indexes for tetrahedral meshes
    Riccardo Fellegara, Leila De Floriani, Paola Magillo, Kenneth Weiss
    Journal: ACM Transactions on Spatial Algorithms and Systems, June 2020

    We address the problem of performing efficient spatial and topological queries on large tetrahedral meshes with arbitrary topology and complex boundaries. Such meshes arise in application domains such as 3D Geographic Information Systems (GISs), scientific visualization and finite element analysis. To this aim, we propose Tetrahedral trees, a family of spatial indexes based on a nested space subdivision (an octree or a kD-tree) and defined by several different subdivision criteria. We provide efficient algorithms for spatial and topological queries on Tetrahedral trees and compare to state-of-the-art approaches. Our results indicate that Tetrahedral trees are an improvement over R*-trees for querying tetrahedral meshes; they are more compact, they perform faster in many queries and are stable at variations of construction thresholds. They also support spatial queries on more general problem domains than topological data structures which explicitly encode adjacency information for efficient navigation, but have difficulties with non-trivial geometric or topological shapes.

    @article{Fellegara2020_tsas,
    author    = {Fellegara, Riccardo and De Floriani, Leila and Magillo, Paola and Weiss, Kenneth},
    title     = {Tetrahedral Trees: A Family of Hierarchical Spatial Indexes for Tetrahedral Meshes},
    journal   = {ACM Transactions on Spatial Algorithms and Systems},
    publisher = {Association for Computing Machinery},
    address   = {New York, NY, USA},
    year      = {2020},
    volume    = {6},
    number    = {4},
    pages     = {23:1--23:34},
    issn      = {2374-0353},
    doi       = {10.1145/3385851},
    }
  • The Stellar Tree: A compact representation for simplicial complexes and beyond
    Riccardo Fellegara, Kenneth Weiss, Leila De Floriani
    Submitted for publication

    We introduce the Stellar decomposition, a model for efficient topological data structures over a broad range of simplicial and cell complexes. A Stellar decomposition of a complex is a collection of regions indexing the complex's vertices and cells such that each region has sufficient information to locally reconstruct the star of its vertices, i.e., the cells incident in the region's vertices. Stellar decompositions are general in that they can compactly represent and efficiently traverse arbitrary complexes with a manifold or non-manifold domain They are scalable to complexes in high dimension and of large size, and they enable users to easily construct tailored application-dependent data structures using a fraction of the memory required by the corresponding topological data structure on the global complex.As a concrete realization of this model for spatially embedded complexes, we introduce the Stellar tree, which combines a nested spatial tree with a simple tuning parameter to control the number of vertices in a region. Stellar trees exploit the complex's spatial locality by reordering vertex and cell indices according to the spatial decomposition and by compressing sequential ranges of indices. Stellar trees are competitive with state-of-the-art topological data structures for manifold simplicial complexes and offer significant improvements for cell complexes and non-manifold simplicial complexes. As a proxy for larger applications, we describe how Stellar trees can be used to generate existing state-of-the-art topological data structures. In addition to faster generation times, the reduced memory requirements of a Stellar tree enable generating these data structures over large and high-dimensional complexes even on machines with limited resources.

    @article{Fellegara17,
    author    = {Riccardo Fellegara and Kenneth Weiss and Leila De Floriani},
    title     = {The Stellar Tree: A compact representation for simplicial complexes and beyond},
    journal   = {Computing Research Repository (CoRR)},
    volume    = {abs/1707.02211},
    year      = {2017},
    url       = {http://arxiv.org/abs/1707.02211},
    }
  • Terrain trees: a framework for representing, analyzing and visualizing triangulated terrains
    Riccardo Fellegara, Federico Iuricich, Yunting Song, Leila De Floriani
    Submitted for publication

    We propose a family of spatial data structures for the representation and processing of Triangulated Irregular Networks (TINs). We call such data structures Terrain trees. A Terrain tree combines a minimal encoding of the connectivity of the TIN with a hierarchical spatial index. Connectivity relations are extracted locally at run-time, within each leaf block of the hierarchy, based on specific application needs. Spatial queries are performed by exploring the hierarchical data structure. We present a new framework for terrain analysis based on Terrain trees. The framework, implemented in the Terrain trees library (TTL), contains approaches for morphological features extraction, such as roughness and curvature, and new approaches for topology- based analysis of terrains. Moreover, it includes a technique for multivariate visualization, which enables the analysis of multiple scalar fields defined on the same terrain. To prove the effectiveness and scalability of such framework, we have compared the different Terrain trees against each other and also against the most compact state-of-the-art data structure for TINs. Comparisons are performed on storage and generation costs and on the efficiency in performing terrain analysis operations.


Presentations

  • Leila De Floriani @ IEEE Computer Society Japan Chapter, December 2020
    Mesh-based approached to modeling point clouds for geospatial applications
  • Xin Xu @ ACM SIGSPATIAL, November 2020
    A Persistence-Based Approach for Individual Tree Mapping | video
  • Leila De Floriani @ IEEE YESIST12, August 2020
    Topology-based approaches to data analysis
  • Leila De Floriani @ IEEE Vardhaman College of Engineering and IEEE Hyderabad Section, August 2020
    Topology-based clustering methods for geospatial data analysis
  • Leila De Floriani and Yunting Song @ NOAA, August 2020
    Representation and analysis of big terrain data
  • Federico Iuricich @ Eurographics, May 2020
    Efficient Homology-Preserving Simplification of High-Dimensional Simplicial Shapes | video

Software tools

A C++ library for computing a discrete gradient on multivariate data

This library computes a discrete vector field preserving the multiparameter persistent homology of a multiparameter filtration. This library has been used to generate the results of the paper "Computing multiparameter persistent homology through a discrete Morse-based approach." [doi] by Sara Scaramuccia, Federico Iuricich, Leila De Floriani, Claudia Landi in Journal Geometry 89, 101623.
The expected input is an .off file representing a simplicial complex (up to dimension three) with multiple scalar values defined at its vertices. If no scalar value is defined of the vertices, the coordinates of the vertices will be used as scalar fields (either the z coordinates or the x and y coordinates combined). Since the program can be used with an arbitrary number of scalar functions (even 1) it can be used for computing the Forman gradient compatible with a simple filtration. The output generated are two .txt files encoding the boundary matrices of the original simplicial complex and the boundary matrix of the computed gradient.

Tetrahedral Trees

In this library we address the problem of performing efficient spatial and topological queries on large tetrahedral meshes with arbitrary topology and complex boundaries. Such meshes arise in several application domains, such as 3D Geographic Information Systems (GISs), scientific visualization, and finite element analysis. To this aim, we have defined the Tetrahedral trees, a family of spatial indexes based on a nested space subdivision (an octree or a kD-tree) and defined by several different subdivision criteria. In this library, we provide efficient algorithms for spatial and topological queries on Tetrahedral trees and it has been used for generating the results of the paper "Tetrahedral Trees: A Family of Hierarchical Spatial Indexes for Tetrahedral Meshes" by Riccardo Fellegara, Leila De Floriani, Paola Magillo, and Kenneth Weiss in ACM Transaction on Spatial Algorithms and Systems 6, 4, Article 23, 34 pages, 2020 [doi]. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the github page of the repository listed above.

Stellar Trees

In this library we address the problem of the efficient representation and management of simplicial and cell complexes in arbitrary dimension and with arbitrary domain. Managing complexes in three dimensions and higher is not a simple task, since the topological data structures proposed in the literature do not scale when the size and the dimension of a complex become high. We propose the Stellar library as a topological C++ framework for performing efficient topological queries on simplicial and non-simplicial meshes. The Stellar tree library provides a scalable and compact representation that encodes the minimal information to locally reconstruct the topological connectivity of its indexed elements. This provides the flexibility to efficiently construct the optimal data structures to solve the task at hand using a fraction of the memory required for a corresponding topological data structure on the global mesh. The efficiency of the Stellar library increases with the execution of successive queries, as the construction costs of these runtime data structures are amortized over multiple accesses while processing each node. This library has been used for generating the results of the paper "The Stellar decomposition: A compact representation for simplicial complexes and beyond" by Riccardo Fellegara, Kenneth Weiss, and Leila De Floriani currently submitted for review at Elsevier Computer & Graphics journal. A preprint of this article can be found here. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the readme file available in the repository archive.

Terrain Modeling and Analysis based on the IA data structure

This library includes the implementation of the Indexed data structure with Adjacencies (IA data structure) for the representation of triangle meshes. The IA data structure is a triangle-based data structure that encodes the vertices and the triangles of the mesh. The vertices of the meshes with their attributes are encoding in a vertex table. Each triangle is encoded in a triangle table as a triple of indexes to its vertices in the vertex table. The IA data structure also explicitly encodes, for each triangle, the indexes in the triangle table of its three bounding vertices, and for each vertex, the index of one triangle incident in the vertex. This enables the efficient retrieval of all connectivity relations. This library is developed based on the LibTri for terrain analysis, which includes state-of-the-art estimators for slope and curvature for triangulated surfaces (Triangulated Irregular Networks - TINs), and for the extraction of critical points from a TIN. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the github page of the repository listed above.


Data sets

  • Princeton Shape Benchmark
    The Princeton Shape Benchmark provides a repository of 3D models and software tools for evaluating shape-based retrieval and analysis algorithms. The benchmark contains a database of 3D polygonal models collected from the World Wide Web. For each 3D model, there is an Object File Format (.off) file with the polygonal geometry of the model, a model information file (e.g., the URL from where it came), and a JPEG image file with a thumbnail view of the model.
  • NEWFOR (NEW technologies for a better mountain FORest timber mobilization) single tree detection benchmark
    The NEWFOR single tree detection benchmark is used to evaluate LIDAR based detection methods. The benchmark includes the data of 15 different study areas in 5 countries of the Alpine Space. In each study area, the following data are provided: (1) Airborne Laser Scanning data in *.las format; (2) Digital Terrain Model (DTM) in *.tif format (1×1m or 0.5×0.5m spatial resolution); (3) Forest Inventory data in *.shp format; (4) Area of Interest in *.shp format; and (5) Metainformation of the data. Statistical Parameters about the Field Inventory (FI) data can be found in the publication paper:
    Eysn, L.; Hollaus, M.; Lindberg, E.; Berger, F.; Monnet, J.-M.; Dalponte, M.; Kobal, M.; Pellegrini, M.; Lingua, E.; Mongus, D.; Pfeifer, N. A Benchmark of Lidar-Based Single Tree Detection Methods Using Heterogeneous Forest Data from the Alpine Space. Forests 2015, 6, 1721-1747. http://www.mdpi.com/1999-4907/6/5/1721
  • Volume Library
    The Volume Library provides volume datasets for scientists involved with volume visualization and rendering. The datasets contain regular volume data mainly coming from CT or MRI scanners. The data is stored in the PVM format which contains information about the grid size, bit depth, and the cell spacing of a dataset. Optionally it may also contain a dataset description, courtesy information, the type of the scanner and a comment. This information and the raw data can be extracted easily using the PVM tools distributed with the V^3 volume rendering package available here.
  • CMU Unstructured Mesh Suite
    The CMU Unstructured Mesh Suite is a collection of partitioned unstructured 3D meshes of the San Fernando Valley in Southern California. The meshes were generated as part of the Quake project using the Archimedes tool chain. The meshes are described and characterized in detail in the following report: David R. O'Hallaron and Jonathan Richard Shewchuk, Properties of a Family of Parallel Finite Element Simulations, Technical Report CMU-CS-96-141, December, 1996. Abstract and BibTex extry, PostScript.
  • Virtual Terrain Project
    The Virtual Terrain Project provides both tools for the creation of terrain datasets as well as has a collection of several 2D datasets including DEMs, Bathymetry, and Contour Data.