drone photogrammetry to explorable 3d terrain

Problem: Festival venues are physical places with terrain, trees, ditches, and infrastructure that affect event design. A flat grid in a 3D engine is not a venue. To design festivals in VR, the virtual environment needs to match the real location with enough precision to make design decisions about structure placement, sightlines, and crowd flow.

Solution: Fly a drone over the venue with automated flight plans, process hundreds of photos into three outputs (3D mesh, orthographic map, elevation data), and stream the result into Unity as an explorable VR environment.

Capture

We used a DJI Phantom 4 Pro with automated flight planning software. The drone flies a grid pattern at a fixed altitude, capturing overlapping nadir (straight-down) photos. A typical venue scan produces 250 to 1,500 automated photos depending on the area. After the grid flight, we manually fly lower passes for vertical surfaces: building facades, tree lines, fences, and any structure that the overhead pass would miss.

Overlap is everything in photogrammetry. Each point on the ground needs to appear in at least three photos from different angles for the reconstruction software to triangulate its 3D position. We flew at 70% frontal overlap and 60% side overlap. Over-capture is cheap (storage costs nothing relative to the flight). Under-capture produces holes in the mesh that require a re-fly.
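The overlap targets translate directly into flight-plan spacing. Here is a sketch of that geometry (the camera figures approximate the Phantom 4 Pro's 1-inch sensor and are assumptions here, as is the 60 m altitude):

```typescript
// Grid-flight spacing from overlap targets, using the pinhole camera model.
// Camera numbers are assumptions (approx. DJI Phantom 4 Pro: 13.2 x 8.8 mm
// sensor, 8.8 mm focal length).
interface CameraSpec {
  sensorWidthMm: number;
  sensorHeightMm: number;
  focalLengthMm: number;
}

// Ground footprint of one photo scales linearly with altitude.
function footprint(cam: CameraSpec, altitudeM: number) {
  return {
    widthM: (altitudeM * cam.sensorWidthMm) / cam.focalLengthMm,
    heightM: (altitudeM * cam.sensorHeightMm) / cam.focalLengthMm,
  };
}

function gridSpacing(cam: CameraSpec, altitudeM: number, frontal: number, side: number) {
  const { widthM, heightM } = footprint(cam, altitudeM);
  return {
    // Distance between shutter triggers along a flight line.
    alongTrackM: heightM * (1 - frontal),
    // Distance between adjacent flight lines.
    betweenLinesM: widthM * (1 - side),
  };
}

const p4p: CameraSpec = { sensorWidthMm: 13.2, sensorHeightMm: 8.8, focalLengthMm: 8.8 };
const spacing = gridSpacing(p4p, 60, 0.7, 0.6); // ~18 m triggers, ~36 m lines
```

At 60 m altitude with 70/60 overlap this works out to roughly 18 m between shutter triggers along a line and 36 m between adjacent lines.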

Processing

The photos go through cloud processing. The software aligns all images by matching feature points across overlapping photos, builds a dense point cloud, then generates three outputs.

The first output is a 3D mesh as an OBJ file with 10cm geometric precision. This becomes the walkable terrain in VR. The second is a geo-referenced orthographic map at 2cm precision in Amersfoort/RD coordinates (the Dutch national grid). This serves as the ground truth reference for the 2D CAD drawings. The third is an elevation map as a GeoTIFF, encoding terrain height data per pixel.

The Netherlands is famously flat, but "flat" is relative. A 30cm slope across a 100-meter field is invisible on a 2D map. For event production, that slope determines whether a stage is level, whether rainwater pools in a vendor area, and whether wheelchair-accessible paths actually work. The elevation data makes these gradients visible in VR.
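The arithmetic behind that example is simple enough to sketch (the 20 m stage length is an illustrative assumption, not a figure from the project):

```typescript
// Illustrative slope math for the 30 cm-over-100 m example in the text.
function gradePercent(riseM: number, runM: number): number {
  return (riseM / runM) * 100;
}

// Height difference under a structure of a given length on a uniform slope.
function riseUnderStructure(grade: number, structureLengthM: number): number {
  return (grade / 100) * structureLengthM;
}

// 30 cm over 100 m is a 0.3% grade.
const fieldGrade = gradePercent(0.3, 100);
// A hypothetical 20 m stage deck on that slope sits 6 cm higher at one end.
const stageRise = riseUnderStructure(fieldGrade, 20);
```

Six centimeters across a stage deck is invisible on a plan drawing but very visible to a rigger with a spirit level, which is exactly the kind of decision the elevation data supports.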

Loading Environments

Each processed venue becomes a Unity scene. The SceneController manages async additive loading, so environments load on top of the persistent Management scene that holds all managers and session state.

public void Open(Scene newScene)
{
    // Refuse to start a second load while one is already in flight.
    if (IsLoading)
    {
        Debug.LogError("Opening " + newScene.ToString()
            + " failed because another scene was already loading");
        return;
    }

    // Unload the current environment first; the persistent Management
    // scene that holds the managers stays resident throughout.
    if (ActiveScene != Scene.None)
    {
        StartCoroutine(UnloadScene(ActiveScene));
    }
    loading = StartCoroutine(Load(newScene));
}

Three environments shipped with the MVP. Each presented different terrain challenges: an airport apron in Germany (hard surface, large open area), an industrial wharf in Amsterdam (mixed terrain, waterfront, existing structures), and a polder field in Flevoland (open grassland, subtle elevation changes, adjacent waterways).

The web viewer mirrors these environments with simplified representations. Each environment defines ground color, sky gradient, fog density, and grid styling to match the character of the real location.

export interface EnvironmentDef {
  id: string;
  name: string;
  groundColor: string;
  gridColor: string;
  skyTop: string;
  fogColor: string;
  fogDensity: number;
}

export const ENVIRONMENTS: EnvironmentDef[] = [
  {
    // Airport apron: grey hard surface, denser haze.
    id: "weeze",
    name: "Weeze Airport",
    groundColor: "#707070",
    gridColor: "#555555",
    skyTop: "#2a3a5a",
    fogColor: "#a0b0c0",
    fogDensity: 0.002,
  },
  {
    // Polder field: green grassland, clearer air.
    id: "biddinghuizen",
    name: "Biddinghuizen",
    groundColor: "#3d6b3d",
    gridColor: "#2d5a2d",
    skyTop: "#1a3050",
    fogColor: "#a8c8e0",
    fogDensity: 0.0012,
  },
]

Lower fog density for the open polder, higher for the urban wharf. These details make the web viewer feel like the actual location even without the full photogrammetry mesh.
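How a density value reads at distance depends on the fog model. As a sketch, assuming the exponential-squared falloff that three.js uses for FogExp2 (the viewer's actual fog model is an assumption here):

```typescript
// Visibility at a given distance under exponential-squared fog,
// the model used by e.g. three.js FogExp2.
function fogVisibility(density: number, distanceM: number): number {
  // 1.0 = fully clear, 0.0 = fully fogged.
  return Math.exp(-((density * distanceM) ** 2));
}

// At 500 m, the polder setting stays far clearer than the airport setting.
const polder = fogVisibility(0.0012, 500);  // ~0.70
const airport = fogVisibility(0.002, 500);  // ~0.37
```

At 500 m the 0.0012 polder density leaves roughly 70% visibility against roughly 37% for the 0.002 airport density, which is the difference between open grassland haze and a socked-in apron.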

Texture Streaming

Photogrammetry meshes produce massive textures. A single venue scan generates gigabytes of texture data at full resolution. Loading all of it into GPU memory at once is not feasible for VR, where you need consistent frame rates.

We integrated the Granite SDK (by Graphine) for virtual texture streaming. Instead of loading full-resolution textures, Granite streams only the tiles visible at the current camera position and detail level. The SDK manages a tile cache, loads high-resolution tiles for nearby surfaces, and falls back to lower resolution for distant terrain.

public void CreateGraniteLibrary() {
    // Granite tile-set files live in a dedicated folder inside the
    // environment's asset bundle.
    string folder = "03-GraniteLib";
    string lowerFolder = folder.ToLower();
    string[] names = BundleManager.GetBundle(bundleID).GetAllAssetNames();
    List<string> graniteFiles = new List<string>();
    for (int i = 0; i < names.Length; i++) {
        // Bundle asset names are lowercased, so match case-insensitively.
        if (names[i].ToLower().Contains(lowerFolder)) {
            graniteFiles.Add(names[i]);
            // Extract the library file to the streaming assets path.
            StartCoroutine(LoadTextAsset(names[i], false));
        }
    }
}

The Granite library files ship as part of the environment's asset bundle. On scene load, the setup script extracts them to the streaming assets path, creates the tile set, and initializes the streaming system before the environment becomes visible. The result is full-resolution terrain textures where the designer is standing, with graceful degradation in the distance, all within VR frame budgets.
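The principle can be sketched without the Granite API: pick a texture mip level from camera distance and keep only a bounded, recently used set of tiles resident. Everything below (the distance threshold, mip policy, cache budget, and tile keys) is an illustrative assumption, far simpler than Granite's real heuristics:

```typescript
// Toy version of virtual texture streaming: distance-based mip selection
// plus a bounded LRU set of resident tiles.
function mipForDistance(distanceM: number, fullResWithinM: number, maxMip: number): number {
  if (distanceM <= fullResWithinM) return 0; // full resolution up close
  // Each doubling of distance drops one mip level, clamped to the coarsest.
  const mip = Math.floor(Math.log2(distanceM / fullResWithinM));
  return Math.min(mip, maxMip);
}

class TileCache {
  // Map preserves insertion order, so the first key is the LRU tile.
  private tiles = new Map<string, true>();
  constructor(private budget: number) {}

  request(tileKey: string): void {
    // Touching a resident tile re-inserts it as most-recently-used.
    if (this.tiles.has(tileKey)) this.tiles.delete(tileKey);
    this.tiles.set(tileKey, true);
    // Over budget: evict the least-recently-used tile.
    if (this.tiles.size > this.budget) {
      const lru = this.tiles.keys().next().value as string;
      this.tiles.delete(lru);
    }
  }

  has(tileKey: string): boolean {
    return this.tiles.has(tileKey);
  }
}
```

The real system does this per tile on the GPU with feedback from rendering, but the shape is the same: resolution follows the camera, and the budget, not the dataset size, bounds memory.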

Why Photogrammetry Over LIDAR

We started with LIDAR in 2015. Terrestrial laser scanners produced geometrically precise point clouds, but the workflow was wrong for our use case. LIDAR equipment was expensive to rent, slow to set up, and produced poor results in natural environments. Vegetation scattered the laser returns, creating noisy data that required extensive manual cleanup.

Drone photogrammetry solved all three problems. The DJI Phantom 4 Pro cost a fraction of a LIDAR rental. A full venue scan took hours instead of days. Natural terrain (grass, trees, water) reconstructed cleanly because the algorithm works from visual features rather than laser returns. The trade-off was lower geometric precision (10cm versus sub-centimeter for LIDAR), but for event production, 10cm is more than sufficient.

Result: A repeatable pipeline from physical location to explorable VR environment, using consumer drone hardware and cloud processing.

I was Technical Producer at Chasing the Hihat and built the festVR prototype from 2015 to 2018.