Depth-Aware AR Edge Blending
date: January 12, 2021
Problem: Raw AR overlays look fake. Virtual objects sit on top of the camera feed with hard edges. No integration with the environment. The eye immediately recognizes something is wrong.
Standard AR rendering composites virtual content over video. This works for floating UI elements. It fails for physical objects that should appear to exist in real space. A pipe behind a wall renders on top of it. Virtual edges stay sharp even when they should fade. The disconnect is jarring.
Solution: A multi-pass shader pipeline that analyzes both the virtual geometry and the real camera feed, then blends them based on depth relationships and environmental cues. Virtual BIM elements integrate with physical space rather than floating on top of it.
Why Standard Compositing Fails
Simple alpha blending treats AR as a 2D overlay: the final color is the camera color times (1 - alpha) plus the virtual color times alpha, where alpha is the virtual layer's opacity. This ignores depth relationships entirely. The result looks like a video game cutout pasted onto real footage.
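In shader terms, that naive composite is a single alpha lerp. A minimal sketch, with illustrative texture names rather than the project's actual ones:

// Naive AR overlay: blend by the virtual layer's alpha, ignoring depth entirely.
half4 cameraColor  = tex2D(_CameraTex, i.uv);
half4 virtualColor = tex2D(_VirtualTex, i.uv);
half4 finalColor   = lerp(cameraColor, virtualColor, virtualColor.a);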
The eye uses multiple cues to judge spatial relationships. Nearer objects should occlude farther ones. Real boundaries have specific edge characteristics. Objects in the same space share lighting. Distance affects appearance through atmospheric effects. Standard compositing addresses none of these. Our shader pipeline handles all of them.
Multi-Pass Depth Strategy
The key insight is that we need to know where virtual geometry sits relative to physical space. To answer that question, we render depth information separately for different object categories.
Shell objects are environmental surfaces: walls, floors, ceilings. Inside objects are BIM elements: pipes, ducts, equipment. Rendering each category into its own depth buffer enables comparison: checking the two buffers pixel by pixel tells the shader whether a virtual pipe sits in front of or behind the shell geometry that represents the wall.
RenderTexture depthShellRT = RenderTexture.GetTemporary(
    source.width >> 1, source.height >> 1, 16, RenderTextureFormat.Depth);
RenderTexture depthInsideRT = RenderTexture.GetTemporary(
    source.width >> 1, source.height >> 1, 16, RenderTextureFormat.Depth);

// Render shell objects (walls, floors, ceilings) into their own depth buffer
manual.cullingMask = settings.shellObjects;
manual.targetTexture = depthShellRT;
manual.RenderWithShader(depthShader, string.Empty);

// Render inside objects (pipes, ducts, equipment) into a second depth buffer
manual.cullingMask = settings.insideObjects;
manual.targetTexture = depthInsideRT;
manual.RenderWithShader(depthShader, string.Empty);

// Both temporaries go back to the pool via RenderTexture.ReleaseTemporary
// once the composition pass has consumed them.
The depth buffers render at half resolution. Depth variation is smooth across pixels, so full resolution is unnecessary. This saves 75% of memory bandwidth with imperceptible quality loss. The depth-only shader skips all material evaluation: no lighting calculations, no textures, just vertex transformation and depth write. This pass runs fast even with complex geometry.
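A minimal sketch of what such a depth-only replacement shader can look like in Unity's built-in pipeline; the shader name and exact structure are illustrative assumptions, not the project's actual shader:

Shader "Hidden/DepthOnly"
{
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            struct appdata { float4 vertex : POSITION; };

            // Vertex stage: clip-space transform only. No normals, no UVs, no lighting.
            float4 vert (appdata v) : SV_POSITION
            {
                return UnityObjectToClipPos(v.vertex);
            }

            // Fragment stage: the color output is irrelevant; only the depth write matters.
            fixed4 frag () : SV_Target
            {
                return 0;
            }
            ENDCG
        }
    }
}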
Environment Edge Detection
Real-world edges provide cues for blending. Where the camera feed has edges, the pipeline brightens the feed around them before blending it with the virtual geometry. This creates the impression of environmental interaction: light reflecting off real surfaces onto virtual objects.
The pipeline first extracts luminance from the camera feed using standard perceptual weights (green contributes most to perceived brightness, blue least). Then Sobel-like filtering runs in two passes: one for horizontal edges, one for vertical. Additive blending combines both into a single edge texture.
The trick for performance is pre-calculating texture coordinates in the vertex shader. The fragment shader samples nine neighboring pixels per fragment. Moving the coordinate math to the vertex shader, which runs only a handful of times for a full-screen pass, rather than the fragment shader, which runs once per pixel, saves significant GPU cycles.
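A condensed sketch of that edge pass, with the luminance step folded into the horizontal Sobel filter for brevity (the real pipeline runs them as separate passes); the Rec. 601 weights, the kernel, and Unity-style names such as _MainTex_TexelSize and appdata_img are assumptions rather than the project's exact code, and UnityCG.cginc helpers are assumed to be included:

sampler2D _MainTex;          // camera feed
float4 _MainTex_TexelSize;   // x = 1/width, y = 1/height

struct v2fEdge
{
    float4 pos   : SV_POSITION;
    half2  uv[9] : TEXCOORD0; // nine neighbor coordinates, precomputed per vertex
};

v2fEdge vertEdge (appdata_img v)
{
    v2fEdge o;
    o.pos = UnityObjectToClipPos(v.vertex);
    // The 3x3 neighborhood offsets are computed here, a few times per full-screen
    // quad, instead of once per pixel in the fragment shader.
    int idx = 0;
    for (int y = -1; y <= 1; y++)
        for (int x = -1; x <= 1; x++)
            o.uv[idx++] = v.texcoord + half2(x, y) * _MainTex_TexelSize.xy;
    return o;
}

// Perceptual luminance: green contributes most, blue least.
half luma (half3 rgb) { return dot(rgb, half3(0.299, 0.587, 0.114)); }

// Horizontal Sobel pass; the vertical pass uses the transposed kernel and is
// combined with this one by additive blending into a single edge texture.
half4 fragEdgeH (v2fEdge i) : SV_Target
{
    half l00 = luma(tex2D(_MainTex, i.uv[0]).rgb), l02 = luma(tex2D(_MainTex, i.uv[2]).rgb);
    half l10 = luma(tex2D(_MainTex, i.uv[3]).rgb), l12 = luma(tex2D(_MainTex, i.uv[5]).rgb);
    half l20 = luma(tex2D(_MainTex, i.uv[6]).rgb), l22 = luma(tex2D(_MainTex, i.uv[8]).rgb);
    // Kernel: [-1 0 1; -2 0 2; -1 0 1]
    half gx = (l02 + 2.0 * l12 + l22) - (l00 + 2.0 * l10 + l20);
    half e  = abs(gx);
    return half4(e, e, e, 1);
}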
Depth-Aware Composition
The composition shader combines all inputs into the final frame. This is where the actual blending decisions happen.
half4 frag (v2f i) : SV_Target
{
    half insideDepth = LinearEyeDepth(tex2D(_DepthInsideTex, i.uv).r);
    half shellDepth  = LinearEyeDepth(tex2D(_DepthShellTex, i.uv).r);
    half4 cameraColor  = tex2D(_MainTex, i.uv);
    half4 virtualColor = tex2D(texPipe, i.uv);
    half edge = min(1.0, tex2D(texEdge, i.uv).r * 20.0);

    half diff = insideDepth - shellDepth;
    if (diff > 0 && virtualColor.a > 0.5)
    {
        // Virtual object is behind the shell: apply depth-aware blending
        if (edge > 0)
        {
            half3 hsv = rgb2hsv(cameraColor.rgb);
            hsv.b *= (1 + edge * 0.25); // .b is the third component (value); brighten by up to 25% at strong edges
            cameraColor.rgb = hsv2rgb(hsv);
        }
        return lerp(virtualColor, cameraColor, clamp(diff * scale + edge, lower, upper));
    }
    else
    {
        // Virtual object is in front: simple alpha blend
        return lerp(cameraColor, virtualColor, virtualColor.a);
    }
}
The logic: compare the two depths to determine whether the inside object sits behind the shell surface. A positive difference means the virtual pipe is farther from the camera than the wall, so it is hidden behind it. For hidden objects, apply edge brightening (increasing brightness in HSV space preserves hue) and blend based on the depth distance: the deeper behind, the more it fades toward the camera feed. For objects in front, use simple alpha blending, since no occlusion relationship exists.
HSV Color Space Manipulation
RGB space doesn't separate brightness from color. Naively increasing brightness in RGB, by adding to or clamping channels, shifts the apparent hue, making red drift toward orange or yellow. HSV separates hue (color identity), saturation (color intensity), and value (brightness).
Adjusting brightness without changing hue preserves perceived color. When we brighten the camera feed near edges, we want the colors to stay true. A red wall should look like a brighter red wall, not an orange one. Converting to HSV, adjusting the V channel, and converting back achieves this. The effect feels like light interaction rather than artificial overlay.
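The rgb2hsv and hsv2rgb helpers referenced in the composition shader aren't shown in the listing above. A widely used branch-free HLSL implementation (Sam Hocevar's version, assumed here as a stand-in rather than taken from the project) looks like this:

float3 rgb2hsv (float3 c)
{
    float4 K = float4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);
    float4 p = lerp(float4(c.bg, K.wz), float4(c.gb, K.xy), step(c.b, c.g));
    float4 q = lerp(float4(p.xyw, c.r), float4(c.r, p.yzx), step(p.x, c.r));
    float d = q.x - min(q.w, q.y);
    float e = 1.0e-10;
    // x = hue, y = saturation, z = value (brightness)
    return float3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), d / (q.x + e), q.x);
}

float3 hsv2rgb (float3 c)
{
    float4 K = float4(1.0, 2.0 / 3.0, 1.0 / 3.0, 3.0);
    float3 p = abs(frac(c.xxx + K.xyz) * 6.0 - K.www);
    return c.z * lerp(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}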
Occlusion Visualization
For elements behind walls, blending toward transparency isn't enough. Users need to know where hidden BIM elements exist. A specialized shader provides x-ray visualization.
Fresnel rim lighting creates an edge glow that simulates light wrapping around occluded objects. A pulsing gradient animates attention toward hidden elements. Distance-based fade gradually reduces visibility for distant occluded objects to prevent clutter.
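A sketch of how those three pieces can combine in a fragment shader; the property names (_RimColor, _PulseSpeed, _FadeDistance) and the exponents are illustrative, not the project's actual parameters, and the vertex stage is assumed to supply the world-space normal, view direction, and eye depth:

half4 _RimColor;
half  _PulseSpeed;
half  _FadeDistance;

struct v2fXray
{
    float4 pos         : SV_POSITION;
    float3 worldNormal : TEXCOORD0; // filled by the vertex stage
    float3 viewDir     : TEXCOORD1; // world-space direction from surface toward the camera
    float  eyeDepth    : TEXCOORD2; // linear eye-space depth of the fragment
};

half4 fragXray (v2fXray i) : SV_Target
{
    // Fresnel rim: strongest where the surface silhouettes away from the viewer,
    // which reads as light wrapping around the occluded object.
    half rim = 1.0 - saturate(dot(normalize(i.viewDir), normalize(i.worldNormal)));

    // Pulsing gradient that animates attention toward the hidden element.
    half pulse = 0.5 + 0.5 * sin(_Time.y * _PulseSpeed);

    // Distance-based fade so far-away occluded objects don't clutter the view.
    half fade = saturate(1.0 - i.eyeDepth / _FadeDistance);

    half alpha = pow(rim, 3.0) * lerp(0.6, 1.0, pulse) * fade;
    return half4(_RimColor.rgb, alpha);
}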
The result shows users where BIM elements exist behind physical surfaces. On a construction site, this means verifying that pipes run where they should before the drywall goes up. Critical for construction verification.
Performance Optimization
The pipeline runs multiple full-screen passes per frame. Without careful optimization, this would destroy frame rate on mobile devices.
Half-resolution depth buffers reduce memory bandwidth by 75% with imperceptible quality loss. Depth-only shader replacement skips expensive material evaluation for depth passes. Edge detection combines horizontal and vertical passes through additive blending rather than separate full-screen draws.
Temporary render textures are allocated from Unity's pool and released as soon as the passes that need them finish. This means zero allocation during steady state. The pool reuses allocations across frames, preventing memory churn and maintaining consistent frame times. On mobile, where thermal throttling kills performance after sustained load, consistent frame times matter more than peak performance.
Configuration and Tuning
The system exposes parameters for artistic control. Scale determines how quickly the blend transitions as the depth difference grows. Offset sets a baseline blend amount. Clamps prevent completely invisible or completely opaque overlays: you always see some hint of the virtual geometry, and it never completely blocks the camera feed.
These parameters adjust per-scene based on environment characteristics. Bright outdoor sites need different settings than dim indoor spaces. A construction site at noon has different lighting than a basement mechanical room.
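In shader terms, these controls come down to a few material properties feeding the clamp in the composition pass and tuned per scene. The names below are illustrative assumptions, and a baseline offset term like the one described above would be added inside the clamp alongside the scaled depth difference:

half _BlendScale;   // how quickly the fade ramps up as the depth difference grows
half _BlendOffset;  // baseline blend amount applied regardless of depth
half _BlendLower;   // floor of the clamp: hidden geometry never disappears entirely
half _BlendUpper;   // ceiling of the clamp: the camera feed always shows through

// blend = clamp(diff * _BlendScale + _BlendOffset + edge, _BlendLower, _BlendUpper);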
Why This Approach Works
The multi-pass strategy provides information standard compositing lacks. We know where virtual sits relative to real through depth comparison. We understand environment structure through edge detection. We can compare multiple depth layers simultaneously.
This enables intelligent blending decisions. The shader knows when to fade, when to brighten, when to show occlusion effects. The result looks natural because it respects the spatial relationships the eye expects.
Most AR applications use simple alpha blending. This approach analyzes the environment and adapts rendering in real time. That difference is visible. The overlay is convincing. Construction professionals can trust what they see.
The technical complexity serves one goal: make virtual BIM elements feel like they belong in physical space.