Seamless AR Integration: Depth-Aware Edge Blending and Environment Analysis

January 12, 2021


The problem: Raw AR overlays look fake. Virtual objects sit on top of camera feed with hard edges. No integration with the environment. The eye immediately recognizes something is wrong.

Standard AR rendering simply composites virtual content over video. This works for floating UI elements. It fails for physical objects that should appear to exist in real space. The disconnect is jarring.

We needed natural blending. Virtual BIM elements should feel integrated with physical space. Edges should soften where virtual meets real. Lighting should feel consistent. The overlay should be convincing.

Why Standard Compositing Fails

Simple alpha blending treats AR as a 2D overlay:

FinalColor = CameraColor * (1 - VirtualAlpha) + VirtualColor * VirtualAlpha

This ignores depth relationships. A virtual pipe behind a real wall renders on top of it. Virtual edges stay sharp even when they should fade. The result looks like a video game cutout, not integrated objects.

The eye uses several cues to judge spatial relationships: occlusion (nearer surfaces hide farther ones), edge softness where virtual meets real, and lighting consistent with the environment.

Standard compositing addresses none of these. Our shader pipeline handles all of them.

Multi-Pass Depth Strategy

The solution renders depth information separately for different object categories:

Shell objects — Environmental surfaces: walls, floors, ceilings
Inside objects — BIM elements: pipes, ducts, equipment

Two depth buffers enable comparison. Where is virtual geometry relative to physical space?

RenderTexture depthShellRT = RenderTexture.GetTemporary(
    source.width>>1, source.height>>1, 16, RenderTextureFormat.Depth);
RenderTexture depthInsideRT = RenderTexture.GetTemporary(
    source.width>>1, source.height>>1, 16, RenderTextureFormat.Depth);

// Render shell objects depth
manual.CopyFrom(main);
manual.cullingMask = settings.shellObjects;
manual.clearFlags = CameraClearFlags.Depth;
manual.targetTexture = depthShellRT;
manual.RenderWithShader(depthShader, string.Empty);

// Render inside objects depth
manual.CopyFrom(main);
manual.cullingMask = settings.insideObjects;
manual.clearFlags = CameraClearFlags.Depth;
manual.targetTexture = depthInsideRT;
manual.RenderWithShader(depthShader, string.Empty);

Half-resolution rendering — Depth buffers at half the screen width and height save bandwidth. Depth varies smoothly, so full resolution is unnecessary.

Layer mask separation — Unity layers categorize objects. Shell vs inside determined by layer assignment during import.

Shader replacement — Simple depth-only shader maximizes performance. No lighting calculations, no material evaluation, just depth values.
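
The depth passes above drive a helper camera manually. A minimal sketch of how such a camera might be created, assuming it stays disabled and is only used through RenderWithShader; the field names manual and main follow the snippet, and the original project's wiring may differ:

using UnityEngine;

// Sketch of the helper-camera setup assumed by the depth passes above.
// Field names follow the snippet ('main', 'manual'); the actual wiring in
// the project may differ.
[RequireComponent(typeof(Camera))]
public class DepthPassCameras : MonoBehaviour
{
    private Camera main;    // the AR camera this image effect runs on
    private Camera manual;  // disabled helper, rendered manually per depth pass

    private void Awake()
    {
        main = GetComponent<Camera>();

        var go = new GameObject("DepthReplacementCamera");
        go.transform.SetParent(transform, false);

        manual = go.AddComponent<Camera>();
        manual.enabled = false; // never renders on its own; only via RenderWithShader
    }
}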

The depth shader is minimal:

Shader "Twnkls/DepthOnly"
{
    SubShader
    {
        Pass
        {
            ZTest GEqual
            Cull Off
            
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
            };

            struct v2f
            {
                float4 vertex : SV_POSITION;
            };

            v2f vert (appdata v)
            {
                v2f o;
                o.vertex = UnityObjectToClipPos(v.vertex);
                return o;
            }

            half4 frag (v2f i) : SV_Target
            {
                return half4(0,0,0,0);
            }
            ENDCG
        }
    }
}

Only vertex transformation. No fragment work beyond depth write. This pass runs fast even with complex geometry.

Environment Edge Detection

Real-world edges provide cues for blending. Detect edges in camera feed, brighten virtual geometry near them. This creates the impression of environmental interaction.

The pipeline extracts luminance first:

Shader "Twnkls/Luminance"
{
    Pass
    {
        half4 frag (v2f varIn) : COLOR
        {
            return half4(dot(tex2D(_MainTex, varIn.tex).rgb, 
                half3(0.299, 0.587, 0.114)).xxx, 1);
        }
    }
}

Standard RGB to luminance conversion. Weights account for human perception — green contributes most, blue least.

Edge detection applies Sobel-like filtering in two passes:

Shader "Twnkls/EdgeDetect"
{
    Pass
    {
        Blend One [blendMode]

        uniform float2 direction;
        static float2 directionUp = direction / _ScreenParams.xy;
        static float2 directionLeft = float2(direction.y, -direction.x) / 
            _ScreenParams.xy;
        uniform half3 filterRow;
        static half filterSum = 1.f / (2.f * (filterRow.x + filterRow.y + 
            filterRow.z));

        half4 mainFS(v2f varIn) : COLOR
        {
            half3 up;
            up.x = tex2D(_MainTex, varIn.tex00).r;
            up.y = tex2D(_MainTex, varIn.tex01).r;
            up.z = tex2D(_MainTex, varIn.tex02).r;
            half val = dot(up, filterRow);
            
            half3 down;
            down.x = tex2D(_MainTex, varIn.tex20).r;
            down.y = tex2D(_MainTex, varIn.tex21).r;
            down.z = tex2D(_MainTex, varIn.tex22).r;
            val += dot(down, -filterRow);
            
            return half4(abs(val * filterSum).xxx, 1);
        }
    }
}

First pass — direction (1, 0) filters along the horizontal axis
Second pass — direction (0, 1) filters along the vertical axis; additive blending combines both results

The vertex shader pre-calculates texture coordinates for neighboring pixels:

v2f mainVS(appdata_img varIn)
{
    v2f varOut;
    varOut.pos = UnityObjectToClipPos(varIn.vertex);
    varOut.tex00 = varIn.texcoord.xy + directionUp + directionLeft;
    varOut.tex01 = varIn.texcoord.xy + directionUp;
    varOut.tex02 = varIn.texcoord.xy + directionUp - directionLeft;
    varOut.tex20 = varIn.texcoord.xy - directionUp + directionLeft;
    varOut.tex21 = varIn.texcoord.xy - directionUp;
    varOut.tex22 = varIn.texcoord.xy - directionUp - directionLeft;
    return varOut;
}

This moves coordinate calculation to vertex shader. Fragment shader just samples the pre-calculated coordinates. Performance optimization.
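
Driving these passes from C# amounts to one luminance blit followed by two edge blits with different direction and filterRow values. A minimal sketch, assuming the shaders above are wrapped in materials called luminanceMaterial and edgeMaterial and that filterRow uses Sobel-style weights; those names and values are assumptions, not the project's exact API:

using UnityEngine;
using UnityEngine.Rendering;

// Hypothetical driver for the luminance and two-direction edge passes.
// Property names mirror the shader snippets above; the wrapper itself
// (method, material names, blend values) is an assumption.
public static class EdgePasses
{
    public static void Render(RenderTexture source, RenderTexture luminance,
        RenderTexture edgeTex, Material luminanceMaterial, Material edgeMaterial)
    {
        // Camera feed -> single-channel luminance
        Graphics.Blit(source, luminance, luminanceMaterial);

        // Assumed Sobel-style row weights (1, 2, 1)
        edgeMaterial.SetVector("filterRow", new Vector4(1f, 2f, 1f, 0f));

        // Pass 1: filter along x, overwrite the target (dest factor Zero)
        edgeMaterial.SetVector("direction", new Vector4(1f, 0f, 0f, 0f));
        edgeMaterial.SetFloat("blendMode", (float)BlendMode.Zero);
        Graphics.Blit(luminance, edgeTex, edgeMaterial);

        // Pass 2: filter along y, added on top of pass 1 (dest factor One)
        edgeMaterial.SetVector("direction", new Vector4(0f, 1f, 0f, 0f));
        edgeMaterial.SetFloat("blendMode", (float)BlendMode.One);
        Graphics.Blit(luminance, edgeTex, edgeMaterial);
    }
}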

Depth-Aware Composition

The composition shader combines all inputs — camera feed, BIM render, depth buffers, edge detection — into final output:

half4 frag (v2f i) : SV_Target
{
    half shellDepthRaw = tex2D(_DepthShellTex, i.uv).r;
    half insideDepthRaw = tex2D(_DepthInsideTex, i.uv).r;
    half4 originalcol = tex2D(_MainTex, i.uv);
    half4 colPipe = tex2D(texPipe, i.uv);
    half edge = tex2D(texEdge, i.uv).r;
    
    half scale = _Scale_Offset_Lower_Upper.x;
    half offset = _Scale_Offset_Lower_Upper.y;
    half lower = _Scale_Offset_Lower_Upper.z;
    half upper = _Scale_Offset_Lower_Upper.w;
    
    half insideDepth = LinearEyeDepth(insideDepthRaw);
    half shellDepth = LinearEyeDepth(shellDepthRaw);

    edge = min(1.0, (edge * 20.0));
    
    half diff = insideDepth - shellDepth;
    
    if (diff > 0 && colPipe.a > 0.5)
    {
        diff = clamp(diff * scale + offset, lower, upper);
        
        if (edge > 0)
        {
            half3 originalhsv = rgb2hsv(originalcol.rgb);
            originalhsv.b *= (1 + edge * 0.25);
            originalcol.rgb = hsv2rgb(originalhsv);
        }
        
        colPipe = lerp(colPipe, originalcol, clamp(diff + edge, 0, 1));
        return colPipe;
    }
    else
    {
        return lerp(originalcol, colPipe, colPipe.a);
    }
}

The logic flow:

Compare depths — diff = insideDepth - shellDepth. A positive diff means the virtual object sits behind the physical surface.

Apply edge brightening — Where the camera feed has edges, increase brightness by up to 25% in HSV space.

Blend based on depth — Deeper behind = more fade toward camera feed. Near surface = stay opaque.

Combine edge and depth — Both contribute to final blend factor.
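
On the C# side, the composition material needs all of these inputs bound before the final blit. A minimal sketch, using the texture and vector names from the fragment shader above; the compositionMaterial reference and the bimRender texture holding the rendered BIM elements are assumptions:

using UnityEngine;

// Hypothetical binding of the composition inputs before the final blit.
// Texture and vector names come from the fragment shader above; the material
// and the bimRender argument are assumptions.
public static class CompositionPass
{
    public static void Render(RenderTexture source, RenderTexture destination,
        RenderTexture bimRender, RenderTexture depthShellRT, RenderTexture depthInsideRT,
        RenderTexture edgeTex, Vector4 scaleOffsetLowerUpper, Material compositionMaterial)
    {
        compositionMaterial.SetTexture("texPipe", bimRender);             // rendered BIM elements
        compositionMaterial.SetTexture("_DepthShellTex", depthShellRT);   // environment depth
        compositionMaterial.SetTexture("_DepthInsideTex", depthInsideRT); // BIM element depth
        compositionMaterial.SetTexture("texEdge", edgeTex);               // detected edges
        compositionMaterial.SetVector("_Scale_Offset_Lower_Upper", scaleOffsetLowerUpper);

        // Graphics.Blit binds _MainTex to 'source' (the camera feed) automatically.
        Graphics.Blit(source, destination, compositionMaterial);
    }
}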

HSV Color Space Manipulation

RGB doesn't cleanly separate brightness from color: naive brightness adjustments in RGB can shift hue and saturation. HSV separates these components.

Adjusting V (brightness) without changing H (hue) or S (saturation) preserves the perceived color:

half3 rgb2hsv(float3 c)
{
    float4 K = float4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);
    float4 p = lerp(float4(c.bg, K.wz), float4(c.gb, K.xy), step(c.b, c.g));
    float4 q = lerp(float4(p.xyw, c.r), float4(c.r, p.yzx), step(p.x, c.r));

    float d = q.x - min(q.w, q.y);
    float e = 1.0e-10;
    return float3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), 
        d / (q.x + e), q.x);
}

half3 hsv2rgb(float3 c)
{
    c = float3(c.x, clamp(c.yz, 0.0, 1.0));
    float4 K = float4(1.0, 2.0 / 3.0, 1.0 / 3.0, 3.0);
    float3 p = abs(frac(c.xxx + K.xyz) * 6.0 - K.www);
    return c.z * lerp(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}

This enables natural-looking brightening at edges. The effect feels like light interaction rather than artificial overlay.
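
The same adjustment can be reproduced on the CPU with Unity's built-in conversions, which is useful for previewing the brightening factor outside the shader. An illustrative helper, not part of the pipeline itself:

using UnityEngine;

// CPU-side equivalent of the shader's edge brightening, using Unity's built-in
// HSV conversions. Illustrative only; the pipeline does this on the GPU.
public static class EdgeBrighten
{
    // Mirrors the shader: V *= (1 + edge * 0.25), with edge clamped to [0, 1]
    public static Color Apply(Color original, float edge)
    {
        Color.RGBToHSV(original, out float h, out float s, out float v);
        v *= 1f + Mathf.Clamp01(edge) * 0.25f;          // up to 25% brighter at strong edges
        return Color.HSVToRGB(h, s, Mathf.Min(v, 1f));  // hsv2rgb clamps V the same way
    }
}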

Occlusion Visualization

For elements behind walls, a specialized shader provides x-ray visualization:

// Fresnel rim lighting
float fresnelNode = FresnelBias + FresnelScale * 
    pow(1.0 - fresnelNdotV, FresnelPower);

float4 RimLighting = RimLightColor * 
    ((1.0 + (_SinTime.z - -1.0) * (2.0 - 1.0) / (1.0 - -1.0)) * 
    fresnelNode);

// Distance-based pulsing
float dist = distance(appendResult23, appendResult24 + float2(0,0)) + 
    mulTime27;
float2 temp_cast = (0.0 + (dist - 0.0) * (1.0 - 0.0) / 
    (PulsatingSize - 0.0)).xx;

o.Emission = RimLighting + 
    (DistancePulseColor * tex2D(GradientTexture, temp_cast).r);

// Distance-based fade
float temp_output = distance(worldPos, _WorldSpaceCameraPos);
o.Alpha = saturate((1.0 + (temp_output - FadeStart) * (0.0 - 1.0) / 
    (FadeEnd - FadeStart)));

This shader:

Fresnel rim light — Edge glow simulates light wrapping around occluded objects
Pulsing gradient — Animated effect draws attention to hidden elements
Distance fade — Gradually fade out distant occluded objects to reduce clutter

The result shows users where BIM elements exist behind physical surfaces — critical for construction verification.
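
Tuning such a shader from C# is a matter of exposing its parameters on a material. A hypothetical sketch; the property names follow the snippet above and may not match the project's actual shader interface:

using UnityEngine;

// Hypothetical tuning component for the x-ray material. Property names are
// assumptions based on the variable names in the snippet above.
public class XRayTuning : MonoBehaviour
{
    [SerializeField] private Material xrayMaterial;
    [SerializeField] private float fresnelPower = 4f;
    [SerializeField] private float pulsatingSize = 2f; // world units per pulse ring
    [SerializeField] private float fadeStart = 5f;     // metres from the camera
    [SerializeField] private float fadeEnd = 15f;

    private void OnValidate()
    {
        if (xrayMaterial == null) return;
        xrayMaterial.SetFloat("FresnelPower", fresnelPower);
        xrayMaterial.SetFloat("PulsatingSize", pulsatingSize);
        xrayMaterial.SetFloat("FadeStart", fadeStart);
        xrayMaterial.SetFloat("FadeEnd", fadeEnd);
    }
}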

Performance Optimization

The pipeline maintains real-time performance through several optimizations:

Half-resolution depth buffers — Reduces memory bandwidth by 75% with imperceptible quality impact.

Depth-only shader replacement — Skip expensive material evaluation for depth passes.

Additive edge combination — The horizontal and vertical filter passes blend additively into the same render target, avoiding a separate combine pass.

Manual camera control — Reuse configuration, only change culling masks between passes. Avoids camera system overhead.

Temporary render textures — Allocate from pool, release immediately. Zero allocation during steady state.

private void OnRenderImage(RenderTexture source, RenderTexture destination)
{
    // Allocate temporary render targets
    RenderTexture depthShellRT = RenderTexture.GetTemporary(
        source.width>>1, source.height>>1, 16, RenderTextureFormat.Depth);
    RenderTexture depthInsideRT = RenderTexture.GetTemporary(
        source.width>>1, source.height>>1, 16, RenderTextureFormat.Depth);
    RenderTexture luminance = RenderTexture.GetTemporary(
        source.width, source.height, 0, RenderTextureFormat.R8);
    RenderTexture edgeTex = RenderTexture.GetTemporary(
        source.width, source.height, 0, RenderTextureFormat.R8);

    // ... rendering operations ...

    // Release all temporary targets
    RenderTexture.ReleaseTemporary(edgeTex);
    RenderTexture.ReleaseTemporary(luminance);
    RenderTexture.ReleaseTemporary(depthShellRT);
    RenderTexture.ReleaseTemporary(depthInsideRT);
}

Unity's temporary RT pool reuses allocations across frames. This prevents memory churn and maintains consistent frame times.

Configuration and Tuning

The system exposes parameters for artistic control:

[System.Serializable]
private class ShaderSettings
{
    public float scale = 0.3f;
    public float offset = 0.0f;
    public float lowerClamp = 0.1f;
    public float upperClamp = 0.9f;

    [Space]
    public LayerMask insideObjects;
    public LayerMask shellObjects;
}

Scale — How quickly blend transitions based on depth difference
Offset — Baseline blend amount
Clamps — Minimum and maximum blend values prevent completely invisible or completely opaque overlays

These parameters adjust per-scene based on environment characteristics. Bright outdoor sites need different settings than dim indoor spaces.
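
These four values reach the composition shader packed into a single vector. A minimal sketch of that packing, assuming the same settings field used by the depth passes and a compositionMaterial reference on the image-effect component:

// Hypothetical helper on the same component, packing ShaderSettings into the
// vector the composition shader reads as _Scale_Offset_Lower_Upper.
private void ApplySettings(Material compositionMaterial)
{
    compositionMaterial.SetVector("_Scale_Offset_Lower_Upper",
        new Vector4(settings.scale, settings.offset,
                    settings.lowerClamp, settings.upperClamp));
}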

Why This Approach Works

The multi-pass strategy provides information standard compositing lacks:

Depth relationships — Know where virtual sits relative to real
Environment structure — Detect edges and features in camera feed
Spatial context — Compare multiple depth layers simultaneously

This enables intelligent blending decisions. The shader knows when to fade, when to brighten, when to show occlusion effects.

The result looks natural because it respects spatial relationships. Virtual geometry doesn't just sit on top of camera feed — it integrates with the environment using depth cues and edge characteristics the eye expects.

Most AR applications use simple alpha blending. This approach analyzes the environment and adapts rendering in real-time. That difference is visible. The overlay is convincing. Construction professionals can trust what they see.

The technical complexity — multi-pass rendering, depth comparison, edge detection, color space conversion — serves one goal: make virtual BIM elements feel like they belong in physical space.