Extending GeoTrellis Types

Custom Keys

Want to jump straight to a code example? See VoxelKey.scala

Keys are used to index (or “give a position to”) tiles in a tile layer. Typically these tiles are arranged in some conceptual grid, for instance in a two-dimensional matrix via a SpatialKey. There is also a SpaceTimeKey, which arranges tiles in a cube of two spatial dimensions and one time dimension.

In this way, keys define how a tile layer is shaped. Here, we provide an example of how to define a new key type, should you want a custom one for your application.

The VoxelKey type

A voxel is the 3D analogue to a 2D pixel. By defining a new VoxelKey type, we can create grids of tiles that have a 3D spatial relationship. The class definition itself is simple:

case class VoxelKey(x: Int, y: Int, z: Int)

Many GeoTrellis operations use keys generically through a K type parameter, for instance in the S3LayerReader class:

/* Read a tile layer from S3 via a given `LayerId`. Function signature slightly simplified. */
S3LayerReader.read[K: Boundable: JsonFormat, V, M]: LayerId => RDD[(K, V)] with Metadata[M]

Here, the pattern [A: Trait1: Trait2: ...] means that whichever A you end up using must have implicit instances of Trait1[A] and Trait2[A] (and any others) in scope. It is syntactic sugar for [A](implicit ev0: Trait1[A], ev1: Trait2[A], ...). In practice, the read method above would be used like this:

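val reader: S3LayerReader = ...

// The type on `rdd` is often left off for brevity.
// Layers are addressed by a LayerId (name and zoom level); zoom 10 is only an example.
val rdd: RDD[(SpatialKey, MultibandTile)] with Metadata[TileLayerMetadata[SpatialKey]] =
    reader.read[SpatialKey, MultibandTile, TileLayerMetadata[SpatialKey]](LayerId("someLayer", 10))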
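The context-bound desugaring described above can be checked with a small, self-contained sketch. `Describe` is a hypothetical typeclass invented for this illustration and is not part of GeoTrellis; the point is only that the two method forms impose the same requirement on `A`.

```scala
// Hypothetical typeclass, used only to illustrate context bounds.
trait Describe[A] {
  def describe(a: A): String
}

object Describe {
  // An instance for Int, found through Describe's companion object.
  implicit val intDescribe: Describe[Int] = new Describe[Int] {
    def describe(a: Int): String = s"Int($a)"
  }
}

// Context-bound form: A must have an implicit Describe[A] in scope.
def withBound[A: Describe](a: A): String =
  implicitly[Describe[A]].describe(a)

// Desugared form: the same requirement as an explicit implicit parameter.
def withEvidence[A](a: A)(implicit ev: Describe[A]): String =
  ev.describe(a)
```

Both `withBound(3)` and `withEvidence(3)` resolve the same `Describe[Int]` instance; calling either with a type that has no instance fails at compile time.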

Boundable, SpatialComponent, and JsonFormat are frequent constraints on keys. Let’s give those typeclasses some implementations:

import geotrellis.spark._
import geotrellis.util._
import spray.json._

// A companion object is a good place for typeclass instances.
object VoxelKey {

  // Combine two keys into their component-wise minimum and maximum.
  // These are used when computing the KeyBounds of a set of keys.
  implicit object Boundable extends Boundable[VoxelKey] {
    def minBound(a: VoxelKey, b: VoxelKey) = {
      VoxelKey(math.min(a.x, b.x), math.min(a.y, b.y), math.min(a.z, b.z))
    }

    def maxBound(a: VoxelKey, b: VoxelKey) = {
      VoxelKey(math.max(a.x, b.x), math.max(a.y, b.y), math.max(a.z, b.z))
    }
  }

  /** JSON Conversion: one numeric field per coordinate. */
  implicit object VoxelKeyFormat extends RootJsonFormat[VoxelKey] {
    def write(k: VoxelKey) =
      JsObject("x" -> JsNumber(k.x), "y" -> JsNumber(k.y), "z" -> JsNumber(k.z))

    def read(value: JsValue) =
      value.asJsObject.getFields("x", "y", "z") match {
        case Seq(JsNumber(x), JsNumber(y), JsNumber(z)) => VoxelKey(x.toInt, y.toInt, z.toInt)
        case _ => throw new DeserializationException("Couldn't parse VoxelKey")
      }
  }

  /** Since [[VoxelKey]] has x and y coordinates, it can take advantage of
    * the [[SpatialComponent]] lens. Lenses are essentially "getters and setters"
    * that can be used in highly generic code.
    */
  implicit val spatialComponent: Component[VoxelKey, SpatialKey] = {
    Component[VoxelKey, SpatialKey](
      /* "get" a SpatialKey from VoxelKey */
      k => SpatialKey(k.x, k.y),
      /* "set" (x,y) spatial elements of a VoxelKey */
      (k, sk) => VoxelKey(sk.col, sk.row, k.z)
    )
  }
}

With these, VoxelKey is now (almost) usable as a key type in GeoTrellis.
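To see what these instances buy you, here is a sketch that runs without GeoTrellis. It uses simplified local stand-ins: `Boundable` mirrors the two methods above, and `Lens` is a hypothetical two-function case class playing the role of `Component`. Folding keys with `minBound`/`maxBound` yields the bounding keys of a collection, and generic code can edit the spatial part of any key that has a lens.

```scala
// Simplified stand-ins for GeoTrellis types; not the library's definitions.
final case class SpatialKey(col: Int, row: Int)
final case class VoxelKey(x: Int, y: Int, z: Int)

trait Boundable[K] {
  def minBound(a: K, b: K): K
  def maxBound(a: K, b: K): K
}

// Hypothetical minimal lens with the same get/set shape as Component.
final case class Lens[A, B](get: A => B, set: (A, B) => A)

object VoxelKey {
  implicit val boundable: Boundable[VoxelKey] = new Boundable[VoxelKey] {
    def minBound(a: VoxelKey, b: VoxelKey): VoxelKey =
      VoxelKey(math.min(a.x, b.x), math.min(a.y, b.y), math.min(a.z, b.z))
    def maxBound(a: VoxelKey, b: VoxelKey): VoxelKey =
      VoxelKey(math.max(a.x, b.x), math.max(a.y, b.y), math.max(a.z, b.z))
  }

  implicit val spatial: Lens[VoxelKey, SpatialKey] =
    Lens[VoxelKey, SpatialKey](k => SpatialKey(k.x, k.y), (k, sk) => VoxelKey(sk.col, sk.row, k.z))
}

// Fold a non-empty collection of keys down to its (min, max) bounding keys.
def bounds[K](keys: Seq[K])(implicit b: Boundable[K]): (K, K) =
  keys.tail.foldLeft((keys.head, keys.head)) { case ((lo, hi), k) =>
    (b.minBound(lo, k), b.maxBound(hi, k))
  }

// Generic code that only knows K has a spatial part: shift it one column right.
def shiftCol[K](key: K)(implicit lens: Lens[K, SpatialKey]): K = {
  val sk = lens.get(key)
  lens.set(key, sk.copy(col = sk.col + 1))
}
```

For example, the bounds of `VoxelKey(3, 1, 4)`, `VoxelKey(1, 5, 9)` and `VoxelKey(2, 6, 5)` are `VoxelKey(1, 1, 4)` and `VoxelKey(3, 6, 9)`, and `shiftCol` leaves the z coordinate untouched.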

A Z-Curve SFC for VoxelKey

Many operations also require a KeyIndex, and KeyIndex implementations are usually written for one specific key type. VoxelKey therefore needs its own, which we will back by a Z-curve for this example:

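import geotrellis.spark._
import geotrellis.spark.io.index._
import geotrellis.spark.io.index.zcurve._

/** A [[KeyIndex]] based on [[VoxelKey]]. */
class ZVoxelKeyIndex(val keyBounds: KeyBounds[VoxelKey]) extends KeyIndex[VoxelKey] {
  /* ''Z3'' is a point on a 3-dimensional Z-order (Morton) curve. */
  private def toZ(k: VoxelKey): Z3 = Z3(k.x, k.y, k.z)

  /* The key's position along the curve, as a single Long, is its index. */
  def toIndex(k: VoxelKey): Long = toZ(k).z

  /* Z3 computes the index ranges covering the box between the two keys. */
  def indexRanges(keyRange: (VoxelKey, VoxelKey)): Seq[(Long, Long)] =
    Z3.zranges(toZ(keyRange._1), toZ(keyRange._2))
}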

Like any KeyIndex, ZVoxelKeyIndex also needs its own JsonFormat, plus a registrator so that GeoTrellis can deserialize it generically. That glue is described under Custom KeyIndexes below; for a complete example, see ShardingKeyIndex.scala.
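The Z3 value behind ZVoxelKeyIndex is a Morton code: the bits of x, y and z are interleaved into one Long. The plain-Scala sketch below approximates that encoding, assuming 21 bits per dimension with x in the lowest bit; check GeoTrellis's `Z3` for its exact bit layout. `naiveRanges` is a brute-force stand-in for `Z3.zranges`, not the algorithm GeoTrellis uses: it enumerates every key in the query box and merges consecutive Z-values into inclusive runs, which is only practical for small boxes.

```scala
// Spread the low 21 bits of v so that bit i lands at bit 3*i.
// 21 bits * 3 dimensions = 63 bits, which fits in a Long.
def split(v: Int): Long = {
  var out = 0L
  var i = 0
  while (i < 21) {
    out |= ((v.toLong >> i) & 1L) << (3 * i)
    i += 1
  }
  out
}

// Interleave: x occupies bits 0, 3, 6, ...; y bits 1, 4, ...; z bits 2, 5, ...
def z3(x: Int, y: Int, z: Int): Long =
  split(x) | (split(y) << 1) | (split(z) << 2)

// Brute-force index ranges: sort the Z-values of every key in the box,
// then merge consecutive values into inclusive (start, end) runs.
def naiveRanges(min: (Int, Int, Int), max: (Int, Int, Int)): Seq[(Long, Long)] = {
  val zs = (for {
    x <- min._1 to max._1
    y <- min._2 to max._2
    z <- min._3 to max._3
  } yield z3(x, y, z)).sorted

  zs.foldLeft(List.empty[(Long, Long)]) {
    case ((start, end) :: rest, v) if v == end + 1 => (start, v) :: rest
    case (acc, v) => (v, v) :: acc
  }.reverse
}
```

A 2x2x2 cube at the origin maps to the single run `(0, 7)`, while the two keys `(0, 0, 0)` and `(0, 1, 0)` map to Z-values 0 and 2, which are not adjacent and so produce two runs. Fewer, longer runs mean fewer separate reads from the backend.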

We now have a new fully functional key type which defines a tile cube of three spatial dimensions. Of course, there is nothing stopping you from defining a key in any way you like: it could have three spatial and one time dimension (EinsteinKey?) or even ten spatial dimensions (StringTheoryKey?). Happy tiling.

Custom KeyIndexes

Want to dive right into code? See: ShardingKeyIndex.scala

The KeyIndex trait

The KeyIndex trait is a high-level representation of Space Filling Curves (SFCs), and it is critical to tile layer input/output. As of GeoTrellis 1.0.0, its subclasses are:

  • ZSpatialKeyIndex
  • ZSpaceTimeKeyIndex
  • HilbertSpatialKeyIndex
  • HilbertSpaceTimeKeyIndex
  • RowMajorSpatialKeyIndex
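The practical difference between these curves is locality. The sketch below uses plain functions rather than the GeoTrellis classes listed above, and compares a row-major ordering with a 2D Z-order on a grid 4 keys wide. The bit layout (column in the lowest bit) is an assumption for illustration.

```scala
// Row-major: walk each row left to right, then move down one row.
def rowMajor(col: Int, row: Int, width: Int): Long =
  row.toLong * width + col

// 2D Z-order (Morton): interleave the bits of col and row, col in the lowest bit.
def zOrder(col: Int, row: Int): Long = {
  var out = 0L
  var i = 0
  while (i < 31) {
    out |= ((col.toLong >> i) & 1L) << (2 * i)
    out |= ((row.toLong >> i) & 1L) << (2 * i + 1)
    i += 1
  }
  out
}

// The 2x2 block of keys in the top-left corner of the grid.
val block = Seq((0, 0), (1, 0), (0, 1), (1, 1))
val rowMajorIdx = block.map { case (c, r) => rowMajor(c, r, width = 4) }
val zIdx = block.map { case (c, r) => zOrder(c, r) }
```

Here `rowMajorIdx` is `Seq(0, 1, 4, 5)`, two separate runs, while `zIdx` is `Seq(0, 1, 2, 3)`, one contiguous run. Spatially close tiles landing close together in the index is what makes Z-order and Hilbert curves attractive for range reads.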

While the subclass constructors can be used directly when creating an index, we always reference them generically elsewhere as KeyIndex. For instance, when we write an RDD, we need to supply a generic KeyIndex:

S3LayerWriter.write[K, V, M]: (LayerId, RDD[(K, V)] with Metadata[M], KeyIndex[K]) => Unit

but when we read or update, we don’t:

S3LayerReader.read[K, V, M]: LayerId => RDD[(K, V)] with Metadata[M]

S3LayerWriter.update[K, V, M]: (LayerId, RDD[(K, V)] with Metadata[M]) => Unit

Luckily for the end user of GeoTrellis, this means they don’t need to keep track of which KeyIndex subclass they used when they initially wrote the layer. The KeyIndex itself is stored as JSON and, critically, is (de)serialized generically:

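import geotrellis.spark._
import geotrellis.spark.io.index._
import geotrellis.spark.io.index.zcurve._
import geotrellis.spark.io.json._
import spray.json._

/* Instantiate as the parent trait */
val index0: KeyIndex[SpatialKey] = new ZSpatialKeyIndex(KeyBounds(
    SpatialKey(0, 0),
    SpatialKey(9, 9)
))

/* Serializes at the trait level, not the subclass */
val json: JsValue = index0.toJson

/* Deserialize generically */
val index1: KeyIndex[SpatialKey] = json.convertTo[KeyIndex[SpatialKey]]

/* The result has the same bounds and the same concrete subclass */
index0.keyBounds == index1.keyBounds  // true
index1.isInstanceOf[ZSpatialKeyIndex]  // true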

Extending KeyIndex

To achieve the above, GeoTrellis has a central JsonFormat registry for the KeyIndex subclasses. When creating a new KeyIndex type, we need to:

  1. Write the index type itself, extending KeyIndex
  2. Write a standard spray.json.JsonFormat for it
  3. Write a Registrator class that registers our new Format with GeoTrellis
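The registry idea can be modeled in a few lines. This is a simplified model, not GeoTrellis code: `Serialized` stands in for the `{"type": ..., "properties": ...}` JSON object, `ToyIndex` for `KeyIndex`, and `Registry` for the central registry. Reading only needs the stored type name to find the right reader, which is why callers never have to name the subclass.

```scala
import scala.collection.mutable

// Toy stand-in for the JSON envelope: {"type": ..., "properties": {...}}.
final case class Serialized(typeName: String, properties: Map[String, String])

// Toy stand-ins for KeyIndex and two of its subclasses.
trait ToyIndex
final case class RowMajorIndex(width: Int) extends ToyIndex
final case class ZIndex() extends ToyIndex

// The registry maps each type name to a reader for that subclass.
object Registry {
  private val readers = mutable.Map.empty[String, Map[String, String] => ToyIndex]

  def register(typeName: String)(read: Map[String, String] => ToyIndex): Unit =
    readers(typeName) = read

  // Generic deserialization: dispatch on the stored type name.
  def read(s: Serialized): ToyIndex =
    readers.get(s.typeName) match {
      case Some(reader) => reader(s.properties)
      case None => throw new IllegalArgumentException(s"Unknown index type: ${s.typeName}")
    }
}

// Each "format" registers itself under its own type name.
Registry.register("rowmajor")(props => RowMajorIndex(props("width").toInt))
Registry.register("zorder")(_ => ZIndex())
```

Reading `Serialized("rowmajor", Map("width" -> "10"))` yields `RowMajorIndex(10)`, and an unregistered type name fails loudly, which is the symptom you would see if a custom registrator were never plugged in.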

To extend KeyIndex, we need to supply implementations for three methods:

/* Most often passed in as an argument ''val'' */
def keyBounds: KeyBounds[K] = ???

/* The 1-dimensional index in the SFC of a given key */
def toIndex(key: K): Long = ???

/* The ranges of `toIndex` values that together cover every key between the two given keys */
def indexRanges(keyRange: (K, K)): Seq[(Long, Long)] = ???

where K will typically be hard-coded as either SpatialKey or SpaceTimeKey, unless you’ve defined some custom key type for your application. K is generic in our example ShardingKeyIndex, since it holds an inner KeyIndex:

class ShardingKeyIndex[K](val inner: KeyIndex[K], val shardCount: Int) extends KeyIndex[K] { ... }
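Putting the three methods together, here is a minimal row-major index over local stand-ins for `SpatialKey`, `KeyBounds` and `KeyIndex` (simplified, not the GeoTrellis types). `toIndex` numbers keys row by row within the bounds, and `indexRanges` returns one inclusive range per row of the query rectangle. This is a sketch and is not the implementation of GeoTrellis's RowMajorSpatialKeyIndex.

```scala
// Simplified stand-ins for the GeoTrellis types.
final case class SpatialKey(col: Int, row: Int)
final case class KeyBounds[K](minKey: K, maxKey: K)

// The three-method contract described above.
trait KeyIndex[K] {
  def keyBounds: KeyBounds[K]
  def toIndex(key: K): Long
  def indexRanges(keyRange: (K, K)): Seq[(Long, Long)]
}

class RowMajorIndex(val keyBounds: KeyBounds[SpatialKey]) extends KeyIndex[SpatialKey] {
  private val minCol = keyBounds.minKey.col
  private val minRow = keyBounds.minKey.row
  private val width = keyBounds.maxKey.col - minCol + 1

  // Offset of the key from the top-left corner of the bounds, row by row.
  def toIndex(key: SpatialKey): Long =
    (key.row - minRow).toLong * width + (key.col - minCol)

  // Each row of the query rectangle is one contiguous, inclusive range.
  def indexRanges(keyRange: (SpatialKey, SpatialKey)): Seq[(Long, Long)] = {
    val (lo, hi) = keyRange
    (lo.row to hi.row).map { r =>
      (toIndex(SpatialKey(lo.col, r)), toIndex(SpatialKey(hi.col, r)))
    }
  }
}
```

With bounds from `SpatialKey(0, 0)` to `SpatialKey(3, 3)`, `toIndex(SpatialKey(1, 2))` is 9, and the query from `SpatialKey(1, 1)` to `SpatialKey(2, 2)` produces the ranges `(5, 6)` and `(9, 10)`.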

Writing and Registering a JsonFormat

Supplying a JsonFormat for our new type is fairly ordinary, with a few caveats:

import geotrellis.spark.io.index._
import geotrellis.spark.io.json._
import scala.reflect.ClassTag
import spray.json._

class ShardingKeyIndexFormat[K: JsonFormat: ClassTag] extends RootJsonFormat[ShardingKeyIndex[K]] {
  /* This is the foundation of the reflection-based deserialization process */
  val TYPE_NAME = "sharding"

  /* Your `write` function must follow this format, with two fields
   * `type` and `properties`. The `properties` JsObject can contain anything.
   */
  def write(index: ShardingKeyIndex[K]): JsValue = {
    JsObject(
      "type" -> JsString(TYPE_NAME),
      "properties" -> JsObject(
        "inner" -> index.inner.toJson,
        "shardCount" -> JsNumber(index.shardCount)
      )
    )
  }

  /* Check that the deserialized `typeName` matches this format's TYPE_NAME */
  def read(value: JsValue): ShardingKeyIndex[K] = {
    value.asJsObject.getFields("type", "properties") match {
      case Seq(JsString(typeName), properties) if typeName == TYPE_NAME => {
        properties.asJsObject.getFields("inner", "shardCount") match {
          case Seq(inner, JsNumber(shardCount)) =>
            new ShardingKeyIndex(inner.convertTo[KeyIndex[K]], shardCount.toInt)
          case _ => throw new DeserializationException("Couldn't deserialize ShardingKeyIndex.")
        }
      }
      case _ => throw new DeserializationException("Wrong KeyIndex type: ShardingKeyIndex expected.")
    }
  }
}

Note

Our Format here only has a K constraint because of our inner KeyIndex. Yours likely won’t.

Now for the final piece of the puzzle, the format Registrator. With the above in place, it’s quite simple:

import geotrellis.spark._
import geotrellis.spark.io.json._

/* This class must have a no-argument constructor! GeoTrellis creates it from the class name in your config. */
class ShardingKeyIndexRegistrator extends KeyIndexRegistrator {
  def register(keyIndexRegistry: KeyIndexRegistry): Unit = {
    implicit val spaceFormat = new ShardingKeyIndexFormat[SpatialKey]()
    implicit val timeFormat = new ShardingKeyIndexFormat[SpaceTimeKey]()

    keyIndexRegistry.register(
      KeyIndexFormatEntry[SpatialKey, ShardingKeyIndex[SpatialKey]](spaceFormat.TYPE_NAME)
    )
    keyIndexRegistry.register(
      KeyIndexFormatEntry[SpaceTimeKey, ShardingKeyIndex[SpaceTimeKey]](timeFormat.TYPE_NAME)
    )
  }
}

For an index with a single hard-coded key type, the registrator is even simpler:

class MyKeyIndexRegistrator extends KeyIndexRegistrator {
  def register(keyIndexRegistry: KeyIndexRegistry): Unit = {
    implicit val format = new MyKeyIndexFormat()

    keyIndexRegistry.register(
      KeyIndexFormatEntry[SpatialKey, MyKeyIndex](format.TYPE_NAME)
    )
  }
}

Plugging a Registrator in

GeoTrellis needs to know about your new Registrator. This is done through an application.conf in your-project/src/main/resources/:

// in `application.conf`
geotrellis.spark.io.index.registrator="geotrellis.doc.examples.spark.ShardingKeyIndexRegistrator"

GeoTrellis automatically detects this file and uses your Registrator.

Testing

Writing unit tests for your new Format is the best way to ensure you’ve set everything up correctly. Tests for ShardingKeyIndex can be found in doc-examples/src/test/scala/geotrellis/doc/examples/spark/ShardingKeyIndexSpec.scala, and can be run in sbt with:

geotrellis > project doc-examples
doc-examples > testOnly geotrellis.doc.examples.spark.ShardingKeyIndexSpec