III: Small: Geospatial Data Representation and Analysis through the Stellar Decomposition

Award: NSF-IIS - 1910766

October 1, 2019 – September 30, 2022

Project description

Thanks to recent developments in remote sensing technologies, the amount of spatial and spatio-temporal data currently available has been dramatically increasing. Specifically, LiDAR (Light Detection and Ranging) technologies generate precise three-dimensional information about the shape of the Earth, and its characteristics, in the form of massive point clouds. LiDAR data are used in a variety of different fields, such as urban modeling, climate study, earthquake analysis, disaster management, flood risk mapping, forestry analysis. Capitalizing on the opportunities presented by such massive, high-resolution, data and representing, analyzing, and transforming them into useful information poses several challenges.

The project focuses on developing scalable data representations, and algorithms for processing and analyzing scattered big geospatial data. The emphasis is on dealing with point clouds of very large size, arising from LIDAR acquisitions, and on applications to terrain modeling and to tree reconstruction from forestry data, with benefits to the research in environmental and Earth science. Being the approach entirely data agnostic, the project has a potential impact on a broader range of applications, including neuroscience, social science, and virtual reality. Software tools for modeling and analysis of very large terrains, and for forestry segmentation will be developed and distributed in the public domain.

Currently, raw LiDAR point clouds are processed by first converting them into raster models, with high computational costs, potential loss of information, and creation of artifacts due to missing data and to the presence of noise. The innovative aspect of the project is in performing data analysis directly on the LiDAR point clouds. This requires encoding the neighboring relations among the points as a simplicial mesh so as to provide an approximation of the underlying “shape” of the point cloud. In order to be able to deal with massive data sets, the research will develop efficient and effective mesh data structures, based on a new data clustered spatio-topological model, the Stellar decomposition, supporting scalability, and efficient processing of fundamental spatial and connectivity queries. New algorithms, rooted in computational topology, will be developed for terrain simplification and analysis, and for tree reconstruction from forestry data. The new scalable framework will make the analysis of terrain and forest data possible on commodity hardware even for datasets composed of billions of points. Moreover, thanks to the use of the Stellar decomposition, it will be well suited for implementations in a distributed environment for the analysis of such data at the global Earth scale.

The major challenges of the project are in (i) defining and implementing compact and efficient data structures for simplicial complexes in order to make the direct processing and analysis of big scattered geospatial data effective and efficient on commodity hardware; in (ii) developing scalable representations which are well suited for future implementations in distributed environments; in (iii) designing and developing new algorithms, rooted in computational topology, for terrain analysis and simplification, and for forest segmentation, applicable also in other contexts; in (iv) developing a software platform for modeling and analysis of terrains and for tree segmentation from forestry data, which will consist of several integrated libraries and software tools based on highly innovative data representation and analysis techniques, capable of dealing with large-size point clouds.

The fundamental research tasks identified in this project are:

  • Task 1. Develop a new decomposition-based model for data structures for simplicial complexes (meshes in two, three and higher dimension complexes), as well as a dimension-independent implementation.

  • Task 2. Develop a new framework for processing and analysis of terrains reconstructed from large LIDAR point clouds, based on the new data model, and on topological data analysis algorithms.

  • Task 3. Develop a new framework for the representation and analysis of 3D shapes for applications to tree identification and reconstruction from LiDAR point clouds based on topological algorithms and on a data structure rooted in the new decomposition-based model.


Participants

Leila De Floriani

Principal Investigator, Professor, University of Maryland at College Park

Federico Iuricich

Collaborator, Assistant professor, Clemson University, South Carolina

Riccardo Fellegara

Collaborator, Senior Scientist, German Aerospace Center (DLR), Germany

Kenneth Weiss

Collaborator, Senior Scientist, Lawrence Livermore National Laboratory, USA

Xin Xu

PhD student, University of Maryland at College Park

Yunting Song

PhD student, University of Maryland at College Park

Noel Dyer

PhD student, University of Maryland at College Park

Yuehui Qian

PhD student, University of Maryland at College Park


Publications

  • Efficient Homology-Preserving Simplification of High-Dimensional Simplicial Shapes

  • Computing Multiparameter Persistent Homology through a Discrete Morse-Based Approach

  • A Persistence-Based Approach for Individual Tree Mapping

  • Tetrahedral trees: A family of hierarchical spatial indexes for tetrahedral meshes

  • The Stellar Tree: A compact representation for simplicial complexes and beyond by Riccardo Fellegara, Kenneth Weiss, Leila De Floriani
    Submitted for publication

    Abstract

    We introduce the Stellar decomposition, a model for efficient topological data structures over a broad range of simplicial and cell complexes. A Stellar decomposition of a complex is a collection of regions indexing the complex's vertices and cells such that each region has sufficient information to locally reconstruct the star of its vertices, i.e., the cells incident in the region's vertices. Stellar decompositions are general in that they can compactly represent and efficiently traverse arbitrary complexes with a manifold or non-manifold domain They are scalable to complexes in high dimension and of large size, and they enable users to easily construct tailored application-dependent data structures using a fraction of the memory required by the corresponding topological data structure on the global complex.As a concrete realization of this model for spatially embedded complexes, we introduce the Stellar tree, which combines a nested spatial tree with a simple tuning parameter to control the number of vertices in a region. Stellar trees exploit the complex's spatial locality by reordering vertex and cell indices according to the spatial decomposition and by compressing sequential ranges of indices. Stellar trees are competitive with state-of-the-art topological data structures for manifold simplicial complexes and offer significant improvements for cell complexes and non-manifold simplicial complexes. As a proxy for larger applications, we describe how Stellar trees can be used to generate existing state-of-the-art topological data structures. In addition to faster generation times, the reduced memory requirements of a Stellar tree enable generating these data structures over large and high-dimensional complexes even on machines with limited resources.

  • Terrain trees: a framework for representing, analyzing and visualizing triangulated terrains by Riccardo Fellegara, Federico Iuricich, Yunting Song, Leila De Floriani
    Submitted for publication

    Abstract

    We propose a family of spatial data structures for the representation and processing of Triangulated Irregular Networks (TINs). We call such data structures Terrain trees. A Terrain tree combines a minimal encoding of the connectivity of the TIN with a hierarchical spatial index. Connectivity relations are extracted locally at run-time, within each leaf block of the hierarchy, based on specific application needs. Spatial queries are performed by exploring the hierarchical data structure. We present a new framework for terrain analysis based on Terrain trees. The framework, implemented in the Terrain trees library (TTL), contains approaches for morphological features extraction, such as roughness and curvature, and new approaches for topology- based analysis of terrains. Moreover, it includes a technique for multivariate visualization, which enables the analysis of multiple scalar fields defined on the same terrain. To prove the effectiveness and scalability of such framework, we have compared the different Terrain trees against each other and also against the most compact state-of-the-art data structure for TINs. Comparisons are performed on storage and generation costs and on the efficiency in performing terrain analysis operations.


Presentations

  • Leila De Floriani @ IEEE Computer Society Japan Chapter, December 2020
    Mesh-based approached to modeling point clouds for geospatial applications
  • Xin Xu @ ACM SIGSPATIAL, November 2020
    A Persistence-Based Approach for Individual Tree Mapping | video
  • Leila De Floriani @ IEEE YESIST12, August 2020
    Topology-based approaches to data analysis
  • Leila De Floriani @ IEEE Vardhaman College of Engineering and IEEE Hyderabad Section, August 2020
    Topology-based clustering methods for geospatial data analysis
  • Leila De Floriani and Yunting Song @ NOAA, August 2020
    Representation and analysis of big terrain data
  • Federico Iuricich @ Eurographics, May 2020
    Efficient Homology-Preserving Simplification of High-Dimensional Simplicial Shapes | video

Software

fg_multi A C++ library for computing a discrete gradient on multivariate data | code
Description

This library includes the implementation of the Indexed data structure with Adjacencies (IA data structure) for the representation of triangle meshes. The IA data structure is a triangle-based data structure that encodes the vertices and the triangles of the mesh. The vertices of the meshes with their attributes are encoding in a vertex table. Each triangle is encoded in a triangle table as a triple of indexes to its vertices in the vertex table. The IA data structure also explicitly encodes, for each triangle, the indexes in the triangle table of its three bounding vertices, and for each vertex, the index of one triangle incident in the vertex. This enables the efficient retrieval of all connectivity relations. This library is developed based on the LibTri for terrain analysis, which includes state-of-the-art estimators for slope and curvature for triangulated surfaces (Triangulated Irregular Networks - TINs), and for the extraction of critical points from a TIN. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the github page of the repository listed above.

tetra_trees Tetrahedral Trees | code
Description

In this library we address the problem of performing efficient spatial and topological queries on large tetrahedral meshes with arbitrary topology and complex boundaries. Such meshes arise in several application domains, such as 3D Geographic Information Systems (GISs), scientific visualization, and finite element analysis. To this aim, we have defined the Tetrahedral trees, a family of spatial indexes based on a nested space subdivision (an octree or a kD-tree) and defined by several different subdivision criteria. In this library, we provide efficient algorithms for spatial and topological queries on Tetrahedral trees and it has been used for generating the results of the paper "Tetrahedral Trees: A Family of Hierarchical Spatial Indexes for Tetrahedral Meshes" by Riccardo Fellegara, Leila De Floriani, Paola Magillo, and Kenneth Weiss in ACM Transaction on Spatial Algorithms and Systems 6, 4, Article 23, 34 pages, 2020 [doi]. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the github page of the repository listed above.

stellar_trees Stellar Trees | code
Description

In this library we address the problem of the efficient representation and management of simplicial and cell complexes in arbitrary dimension and with arbitrary domain. Managing complexes in three dimensions and higher is not a simple task, since the topological data structures proposed in the literature do not scale when the size and the dimension of a complex become high. We propose the Stellar library as a topological C++ framework for performing efficient topological queries on simplicial and non-simplicial meshes. The Stellar tree library provides a scalable and compact representation that encodes the minimal information to locally reconstruct the topological connectivity of its indexed elements. This provides the flexibility to efficiently construct the optimal data structures to solve the task at hand using a fraction of the memory required for a corresponding topological data structure on the global mesh. The efficiency of the Stellar library increases with the execution of successive queries, as the construction costs of these runtime data structures are amortized over multiple accesses while processing each node. This library has been used for generating the results of the paper "The Stellar decomposition: A compact representation for simplicial complexes and beyond" by Riccardo Fellegara, Kenneth Weiss, and Leila De Floriani in Elsevier Computer & Graphics journal, 98: 322-343, 2021 [doi]. A preprint of this article can be also fund on arXiv here. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the readme file available in the repository archive.

terrain_ia Terrain Modeling and Analysis based on the IA data structure | code
Description

This library includes the implementation of the Indexed data structure with Adjacencies (IA data structure) for the representation of triangle meshes. The IA data structure is a triangle-based data structure that encodes the vertices and the triangles of the mesh. The vertices of the meshes with their attributes are encoding in a vertex table. Each triangle is encoded in a triangle table as a triple of indexes to its vertices in the vertex table. The IA data structure also explicitly encodes, for each triangle, the indexes in the triangle table of its three bounding vertices, and for each vertex, the index of one triangle incident in the vertex. This enables the efficient retrieval of all connectivity relations. This library is developed based on the LibTri for terrain analysis, which includes state-of-the-art estimators for slope and curvature for triangulated surfaces (Triangulated Irregular Networks - TINs), and for the extraction of critical points from a TIN. For a description of the required libraries for building the library, the compilation process, the run-time options and of the supported input format, please refer to the github page of the repository listed above.

terrain_trees Terrain Trees | code
Description

Terrain trees are a new in-core family of spatial indexes for the representation and analysis of Triangulated Irregular Networks (TINs). Terrain trees combine a minimal encoding of the connectivity of the underlying triangle mesh with a hierarchical spatial index, implicitly representing the other topological relations among vertices, edges and vertices. Topological relations are extracted locally within each leaf block of the hierarchal index at runtime, based on specific application needs. Based on Terrain trees, we have developed a set of tools for terrain analysis and named this toolset as Terrain Trees library (TTL). TTL contains a kernel for connectivity and spatial queries, and modules for extracting morphological features, including edge and triangle slopes, roughness, curvature. It also contains modules for extracting topological structures, like critical point, critical net, watershed segmentation, based on the discrete Morse gradient, and a technique for multivariate data visualization, which enables the analysis of multiple scalar fields defined on the same terrain. By working on TINs generated from big LiDAR (Light, Detection and Ranging) data sets, we demonstrate the effectiveness and scalability of the Terrain trees against state-of-the-art compact data structures.


Datasets

  • Princeton Shape Benchmark
    The Princeton Shape Benchmark provides a repository of 3D models and software tools for evaluating shape-based retrieval and analysis algorithms. The benchmark contains a database of 3D polygonal models collected from the World Wide Web. For each 3D model, there is an Object File Format (.off) file with the polygonal geometry of the model, a model information file (e.g., the URL from where it came), and a JPEG image file with a thumbnail view of the model.
  • NEWFOR (NEW technologies for a better mountain FORest timber mobilization) single tree detection benchmark
    The NEWFOR single tree detection benchmark is used to evaluate LIDAR based detection methods. The benchmark includes the data of 15 different study areas in 5 countries of the Alpine Space. In each study area, the following data are provided: (1) Airborne Laser Scanning data in *.las format; (2) Digital Terrain Model (DTM) in *.tif format (1×1m or 0.5×0.5m spatial resolution); (3) Forest Inventory data in *.shp format; (4) Area of Interest in *.shp format; and (5) Metainformation of the data. Statistical Parameters about the Field Inventory (FI) data can be found in the publication paper:
    Eysn, L.; Hollaus, M.; Lindberg, E.; Berger, F.; Monnet, J.-M.; Dalponte, M.; Kobal, M.; Pellegrini, M.; Lingua, E.; Mongus, D.; Pfeifer, N. A Benchmark of Lidar-Based Single Tree Detection Methods Using Heterogeneous Forest Data from the Alpine Space. Forests 2015, 6, 1721-1747. http://www.mdpi.com/1999-4907/6/5/1721
  • Volume Library
    The Volume Library provides volume datasets for scientists involved with volume visualization and rendering. The datasets contain regular volume data mainly coming from CT or MRI scanners. The data is stored in the PVM format which contains information about the grid size, bit depth, and the cell spacing of a dataset. Optionally it may also contain a dataset description, courtesy information, the type of the scanner and a comment. This information and the raw data can be extracted easily using the PVM tools distributed with the V^3 volume rendering package available here.
  • CMU Unstructured Mesh Suite
    The CMU Unstructured Mesh Suite is a collection of partitioned unstructured 3D meshes of the San Fernando Valley in Southern California. The meshes were generated as part of the Quake project using the Archimedes tool chain. The meshes are described and characterized in detail in the following report: David R. O'Hallaron and Jonathan Richard Shewchuk, Properties of a Family of Parallel Finite Element Simulations, Technical Report CMU-CS-96-141, December, 1996. Abstract and BibTex extry, PostScript.
  • Virtual Terrain Project
    The Virtual Terrain Project provides both tools for the creation of terrain datasets as well as has a collection of several 2D datasets including DEMs, Bathymetry, and Contour Data.