TITLE: Processing Terrestrial LiDAR with PDAL
DATE: 2020-12-15
AUTHOR: John L. Godlee
====================================================================


I used a Leica HDS6100 terrestrial laser scanner for my PhD work.
The scans are processed initially in a program produced by Leica
called Cyclone. I used Cyclone to georeference and stitch together
scans from a single plot to create a single shadowless scene. From
there they can be exported to .ptx files, each of which can contain
multiple scans, each with their own local coordinate reference
system. Below is an annotated .ptx file, showing the role of each
line:

 [Cyclone]:
https://leica-geosystems.com/en-gb/products/laser-scanners/software/
leica-cyclone

   20224  # Number of columns
   8615  # Number of rows
   482595.121831 8330769.987967 1254.138086  # Scanner registered
position in real space (xyz)
   -0.990870 -0.134818 -0.000312  # Scanner registered axis 'X'
   0.134818 -0.990870 -0.000175  # Scanner registered axis 'Y'
   -0.000285 -0.000215 1.000000  # Scanner registered axis 'Z'
   -0.990870 -0.134818 -0.000312 0  # 4x4 tranformation matrix
   0.134818 -0.990870 -0.000175 0
   -0.000285 -0.000215 1.000000 0
   482595.121831 8330769.987967 1254.138086 1
   0 0 0 0.500000  # Start of point coordinates
   0 0 0 0.500000  # Unreturned pulses
   0 0 0 0.500000
   0 0 0 0.500000
   -0.000046 0.909775 -1.885635 0.010376  # First returned pulse
   -0.000046 0.903366 -1.870834 0.015015
   -0.000046 0.895859 -1.853836 0.019165
   -0.000046 0.894424 -1.849380 0.020874
   -0.000046 0.898849 -1.857010 0.024781

When a new scan starts the same header material will be repeated,
but with different values depending on the scanner position.

My end goal was to have a voxelised point cloud with noise removed
so that only foliage material remains.

First I needed to split the .ptx file into separate scans, based on
the header material in each scan:

   #!/usr/bin/env sh

   if [ $# -ne 1 ]; then
       printf "Must supply one argument:\n  [1] input.ptx\n"
       exit 1
   fi

   # Get lines at which to split
   lines=$(rg -n --no-encoding -M 10 "^[0-9]+\s+?$" $1 |
       sed 's/:.*//g' |
       awk 'NR%2!=0' |
       tr '\n' ' ' |
       sed 's/^[0-9]\s//g')

   # Get name of file without extension
   noext="${1%.ptx}"

   # Split file by scans using array dimension rows in header as
line ref
   csplit -f $noext -b "_%d.ptx" $1 $lines

Then I converted each scan into a .laz file, with the coordinates
transformed according to the transformation matrix in the header
material:

   #!/usr/bin/env sh

   if [ $# -lt 2 ]; then
       printf "Must supply at least two arguments:\n  [1]
input.ptx\n  [2] output.laz\n"
       exit 1
   fi

   # For each argument
   matrix=$(head -n 10 $1 | tail -4 | sed -r 's/0\s+?$/0.0/g' |
awk -f transpose.awk)

   pdal pipeline pipelines/ptx_laz.json --readers.text.filename=$1
\
       --filters.transformation.matrix="${matrix}" \
       --writers.las.filename=$2

Here is the PDAL pipeline:

   [
       {
           "type" : "readers.text",
           "filename" : "input.txt",
           "header" : "X Y Z I",
           "skip" : 10
       },
       {
           "type" : "filters.transformation",
           "matrix" : "0 -1  0  1  1  0  0  2  0  0  1  3  0  0  0
1"
       },
       {
           "type" : "writers.las",
           "compression" : "true",
           "minor_version" : "2",
           "dataformat_id" : "0",
           "forward" : "all",
           "filename" : "output.laz"
       }
   ]

Then I merged the .laz files back together using pdal merge
file1.ptx file2.ptx ....

Then I voxelised the .laz file, again using a PDAL pipeline. This
method creates voxels of 0.01 m^3 (1 cm^3), with a point at the
center of each occupied voxel:

   [
       {
           "type" : "readers.las",
           "filename" : "input.laz"
       },
       {
           "type":"filters.voxeldownsize",
           "cell" : 0.01,
           "mode" : "center"
       },
       {
           "type" : "writers.las",
           "compression" : "true",
           "minor_version" : "2",
           "dataformat_id" : "0",
           "forward" : "all",
           "filename" : "output.laz"
       }
   ]

Finally I excluded noise with a PDAL pipeline. This method measures
the mean distance of each point to its eight nearest neighbours,
then excludes points with mean distances greater than the 95%
confidence interval of the distribution:

   [
       {
           "type" : "readers.las",
           "filename" : "input.laz"
       },
       {
           "type" : "filters.outlier",
           "method" : "statistical",
           "mean_k" : 8,
           "multiplier" : 1.96
       },
       {
         "type" : "filters.range",
         "limits" : "Classification![7:7]"
       },
       {
           "type" : "writers.las",
           "compression" : "true",
           "minor_version" : "2",
           "dataformat_id" : "0",
           "forward" : "all",
           "filename" : "output.laz"
       }
   ]