Multimodal hyperspectral and lidar data sets provide complementary spectral and structural information. Jointly processing and exploiting them to produce semantically labeled pixel maps through semantic segmentation has proven useful for a variety of decision tasks. In this work, we identify two areas of improvement over previous approaches and present a proof-of-concept network implementing these improvements. First, rather than using a late-fusion-style architecture as in prior work, our approach implements a composite-style fusion architecture that allows multimodal features to be generated and fused features to be learned simultaneously during encoding. Second, our approach processes the lidar 3D point cloud, which has higher information content, with point-based CNN layers instead of the lower-information-content lidar 2D DSM used in prior work. Unlike previous approaches, the proof-of-concept network combines point-based and pixel-based CNN layers with concatenation-based fusion, which necessitates a novel point-to-pixel feature discretization method. We characterize our models against a modified GRSS18 data set. Our fusion model achieved 6.6% higher pixel accuracy than the highest-performing unimodal model. Furthermore, it achieved 13.5% higher mean accuracy on the hardest-to-classify samples (14% of the total) and equivalent accuracy on the other test set samples.
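To make the idea of point-to-pixel feature discretization followed by concatenation-based fusion concrete, here is a minimal NumPy sketch. It is not the authors' method: the abstract does not specify how their discretization works, so this sketch assumes a simple per-cell max-pooling of point features onto the image grid, with empty cells set to zero. The function names (`points_to_pixels`, `fuse_by_concatenation`) and parameters (`origin`, `cell_size`) are hypothetical.

```python
import numpy as np

def points_to_pixels(xy, feats, origin, cell_size, shape):
    """Max-pool per-point features into an (H, W, C) grid (assumed scheme).

    xy:     (N, 2) point x/y coordinates
    feats:  (N, C) per-point feature vectors
    origin: (x0, y0) coordinate of the grid corner (hypothetical parameter)
    shape:  (H, W) of the target pixel grid
    """
    h, w = shape
    cols = np.floor((xy[:, 0] - origin[0]) / cell_size).astype(int)
    rows = np.floor((xy[:, 1] - origin[1]) / cell_size).astype(int)

    # Discard points that fall outside the pixel grid.
    inside = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
    rows, cols, feats = rows[inside], cols[inside], feats[inside]

    # Unbuffered element-wise maximum, so several points can share one cell.
    grid = np.full((h, w, feats.shape[1]), -np.inf)
    np.maximum.at(grid, (rows, cols), feats)

    # Cells that received no point get a zero feature vector.
    occupied = np.zeros((h, w), dtype=bool)
    occupied[rows, cols] = True
    grid[~occupied] = 0.0
    return grid

def fuse_by_concatenation(pixel_feats, point_grid):
    """Concatenate pixel-based and discretized point-based features by channel."""
    return np.concatenate([pixel_feats, point_grid], axis=-1)
```

Once both modalities share the same pixel grid, fusion reduces to stacking channels, which is why a discretization step is needed before point features can be concatenated with pixel features.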


Copyright Statement: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.



Source Publication

Remote Sensing