
QR-Based AR Anchoring in Construction

Problem: GPS doesn't work indoors. Construction sites need AR overlays positioned accurately, within centimeters, not meters. Walk into a building under construction, point your device at a wall, and the virtual BIM model needs to align perfectly with physical elements.

Traditional AR solutions use feature tracking or SLAM. These drift over time and struggle in sparse environments. We needed something deterministic. Something that works on day one of a project when there's nothing but bare concrete.

Solution: QR markers placed with measured positions enable instant, deterministic BIM anchoring. Scanning a code sets the BIM model in the correct location immediately.

The Challenge

Indoor AR positioning presents three core problems:

No GPS signal. Satellite positioning fails when close to or inside buildings. Alternative methods are needed.

Precision requirements. Construction tolerances demand accuracy. A pipe that is 40cm off is a problem, and AR overlays must come as close to that precision as the hardware allows.

Dynamic environments. Construction sites change daily. Today's landmark is tomorrow's demolished wall. The positioning system can't rely on persistent features of the physical environment.

Architecture Overview

The solution addresses these problems with three coordinated systems:

Multi-threaded QR detection. OpenCV runs on a background thread, processing camera frames continuously without blocking the main thread.

Marker management system. Handles marker lifecycle, spatial relationships, and intelligent neighbor activation to maintain tracking coverage.

Transformation math. Converts marker-relative positions to world-space coordinates, applying inverse transforms to anchor the BIM model correctly.

Threading Strategy

QR detection is expensive. Processing frames on the main thread would drop the framerate to unacceptable levels and ruin the user experience. The solution: a dedicated detection thread fed through a concurrent queue.

// Fields shared between the Unity main thread and the detector thread.
// bufferAvailable and isRunning are volatile so writes are visible across threads.
private Thread processThread;
private readonly ConcurrentQueue<string> detectedCodes = new ConcurrentQueue<string>();
private byte[] pixels;
private volatile bool bufferAvailable;
private volatile bool isRunning;
private Vuforia.Image cameraImage;   // latest grayscale camera frame from Vuforia
private float detectionTimeInterval; // seconds between detection passes

// Main thread: consume detected codes and hand the latest camera frame to the detector.
private void Update()
{
    if (detectedCodes.TryDequeue(out string newCode))
        SetLastDetectedCode(newCode);

    // The detector thread is still working on the previous frame.
    if (bufferAvailable) return;

    if (cameraImage == null || cameraImage.PixelBufferPtr == IntPtr.Zero)
    {
        cameraImage = VuforiaBehaviour.Instance.CameraDevice
            .GetCameraImage(PixelFormat.GRAYSCALE);
        return;
    }

    if (pixels == null)
        InitializeBuffer(cameraImage.BufferWidth, cameraImage.BufferHeight);

    // Copy the frame into the shared buffer and signal the detector thread.
    Marshal.Copy(cameraImage.PixelBufferPtr, pixels, 0, 
        cameraImage.BufferWidth * cameraImage.BufferHeight);

    bufferAvailable = true;
}

// Background thread: run OpenCV QR detection over the shared buffer.
private void DetectorThread()
{
    while (isRunning)
    {
        if (!bufferAvailable) continue;

        string detectResult = DetectFunction();

        if (!string.IsNullOrWhiteSpace(detectResult))
            detectedCodes.Enqueue(detectResult);

        bufferAvailable = false;
        GC.Collect();
        Thread.Sleep((int)(detectionTimeInterval * 1000));
    }
}

The main thread grabs the camera frame, copies the pixels into the shared buffer, and sets a flag.

The detection thread waits for that flag, processes the buffer with OpenCV, queues any result, and sleeps.

The concurrent queue passes detected codes safely back to the main thread for processing.

This pattern maintains 60fps AR rendering while continuously scanning for markers. Good user experience, no explicit locks, no blocking, clean separation of concerns.
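The excerpt above omits how the detection thread is started and torn down. A minimal sketch of that lifecycle, reusing isRunning, processThread, and DetectorThread from the snippet (StartDetection itself is an assumed name, not taken from the original code), might look like this:

// Sketch: start the detector when scanning begins, stop it with the component.
private void StartDetection()
{
    isRunning = true;
    processThread = new Thread(DetectorThread)
    {
        IsBackground = true // don't keep the process alive if the app quits
    };
    processThread.Start();
}

private void OnDestroy()
{
    // Signal the loop to exit and give it a moment to finish the current pass.
    isRunning = false;
    processThread?.Join(500);
}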

Marker Management

Each physical marker has stored metadata: position, rotation, and neighboring markers. We don't want to track every marker simultaneously; instead, markers are activated based on detection and proximity.

private void OnQrCodeDetection(string obj)
{
    // Pull the numeric marker ID out of the QR payload.
    Match match = idRegex.Match(obj);

    if (!match.Success || !int.TryParse(match.Value, out int id)) return;

    // Already tracking this marker; nothing to change.
    if (activeMarkers.ContainsKey(id) && activeMarkers[id].enabled) return;

    // Switch reference: drop the previous set, then activate the scanned marker and its neighbours.
    DisableAllActiveMarkers();

    ActivateMarker(id, null, true);
}

private void ActivateMarker(int markerID, StoreyData storeyData = null, 
    bool activateNeighbours = false)
{
    // Re-enable an instance that was built earlier.
    if (activeMarkers.ContainsKey(markerID))
    {
        activeMarkers[markerID].gameObject.SetActive(true);
        activeMarkers[markerID].enabled = true;
        return;
    }

    storeyData ??= storeyDataMap[activeStorey];

    // Lazily build the Vuforia image target for this marker's texture.
    Texture2D texture = textureDBMap[storeyData.markerDatabaseID][markerID];

    if (!markerInstanceMap.ContainsKey(texture))
        markerInstanceMap[texture] = BuildMarker(texture, 
            MARKER_WIDTH_IN_METERS, storeyData.markers[markerID]);

    MagicMarker magicMarker = markerInstanceMap[texture];

    activeMarkers.Add(markerID, magicMarker);
    activeMarkers[markerID].enabled = true;
    activeMarkers[markerID].gameObject.SetActive(true);

    if (!activateNeighbours) return;

    // Activate the closest markers so tracking survives when this one leaves view.
    // The detected marker itself sorts first (distance zero) and simply hits the
    // early return above.
    foreach (MarkerData neighbourData in storeyData.markers.Values
        .OrderBy(x => Vector3.Distance(x.position, magicMarker.Data.position))
        .Take(closestNeighboursCount))
    {
        ActivateMarker(neighbourData.id, storeyData);
    }
}

When a QR code is detected a few things happen:

  1. Extract ID using regex pattern matching
  2. Disable all currently active markers
  3. Activate the detected marker
  4. Activate the N closest neighboring markers

More happens behind the scenes, but the key effect is tracking redundancy. If the user moves and loses sight of the first marker, neighbors are already active and tracking continues seamlessly.
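The activation code relies on per-marker metadata loaded from the site survey. The exact schema isn't shown above, but judging from the fields the code touches (markers, markerDatabaseID, position, id, Matrix), a minimal sketch might look like this; everything beyond those field names is an assumption:

using System.Collections.Generic;
using UnityEngine;

// Sketch of the marker metadata assumed by ActivateMarker and UpdateModelPosition.
public class MarkerData
{
    public int id;              // three-digit ID encoded in the QR code
    public Vector3 position;    // position relative to the BIM model origin
    public Quaternion rotation; // orientation relative to the BIM model origin

    // Marker pose in model space; its inverse gives the model origin in marker space.
    public Matrix4x4 Matrix => Matrix4x4.TRS(position, rotation, Vector3.one);
}

public class StoreyData
{
    public string markerDatabaseID;             // which texture database holds this storey's markers
    public Dictionary<int, MarkerData> markers; // surveyed markers on this storey, keyed by ID
}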

Position Calculation

The core task: translating a marker’s stored position into world coordinates that place the BIM model accurately in the physical environment.

Each marker stores its position relative to the BIM model's origin. When a marker is tracked, the system computes the inverse transform: where should the model sit, relative to this marker's position in the real world?

public void UpdateModelPosition()
{
    if (!ModelController.Instance) return;

    // Make this marker the reference frame: the model moves, the marker stays near the origin.
    ModelController.Instance.transform.SetParent(transform);

    // The inverse of the marker's stored pose gives the model origin in marker space;
    // the (-1, 1, 1) scale flips X to convert right-handed BIM data to Unity's left-handed space.
    ModelController.Instance.transform.localPosition = 
        Vector3.Scale(Data.Matrix.inverse.ExtractPosition(), 
        new Vector3(-1, 1, 1));
    
    ModelController.Instance.transform.localRotation = 
        Data.Matrix.ExtractRotation();
}

The code parents the BIM model to the detected marker, making the marker the reference frame for all positioning. This is intentional: the marker and AR camera stay at the world origin (0,0,0), and the model moves relative to them. The alternative, moving the camera and marker to the marker's position in the BIM model, would place them at coordinates like (900, 300, 100). Unity's 32-bit floats lose precision at those distances, causing visible jitter and rendering artifacts.

From there, the code applies the inverse of the marker's stored position. If a marker sits at coordinates (5, 0, 3) relative to the model origin, the model needs to be placed at (-5, 0, -3) relative to the marker's real-world position.

One detail worth noting: Unity uses left-handed coordinates, while BIM authoring tools typically export right-handed data. The Vector3.Scale with (-1, 1, 1) handles that conversion by flipping the X axis.
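ExtractPosition and ExtractRotation are not part of Unity's Matrix4x4 API; they're extension methods whose implementation isn't shown here. A common implementation (a sketch, not necessarily the project's exact code) looks like this:

using UnityEngine;

// Typical Matrix4x4 extensions for pulling translation and rotation out of a TRS matrix.
public static class MatrixExtensions
{
    public static Vector3 ExtractPosition(this Matrix4x4 m)
    {
        // Translation lives in the last column of a TRS matrix.
        return new Vector3(m.m03, m.m13, m.m23);
    }

    public static Quaternion ExtractRotation(this Matrix4x4 m)
    {
        // Rebuild the rotation from the matrix's forward (Z) and up (Y) columns.
        return Quaternion.LookRotation(m.GetColumn(2), m.GetColumn(1));
    }
}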

Vuforia takes care of the actual marker tracking. Everything else is our layer: managing which markers are active, storing their spatial relationships, and calculating these transforms.

Regex-Based ID Extraction

QR codes contain structured data. We only need the numeric ID. Regex extracts it cleanly:

private static readonly Regex idRegex = new Regex("[0-9]{3}");

Match match = idRegex.Match(qrCodeContent);
if (match.Success && int.TryParse(match.Value, out int id))
{
    // Process marker ID
}

The pattern [0-9]{3} matches the first run of three consecutive digits. The QR code can contain other data (project identifiers, metadata, URLs); we ignore it. The three-digit ID is all we need.

This loose coupling means the QR codes can evolve. We can add project codes, embed URLs, or include validation data. The detection system doesn't care; it extracts the ID and continues.
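For illustration, here is how the extraction behaves on a richer payload. The payload format below is hypothetical, not the project's actual scheme:

// Hypothetical QR payloads; only the first three-digit run matters to the detector.
// "042"                                    -> id 42
// "SITE-B|https://example.com/markers/042" -> id 42
Match match = idRegex.Match("SITE-B|https://example.com/markers/042");
if (match.Success && int.TryParse(match.Value, out int id))
    Debug.Log($"Marker ID: {id}"); // prints "Marker ID: 42"

One caveat worth keeping in mind: if the surrounding data could itself contain a three-digit run (a port number, a date), the payload format or the pattern would need to disambiguate which digits are the marker ID.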

Performance Characteristics

The threading strategy pays off in real-world use. AR rendering stays at 60fps even while the detection thread processes camera frames continuously. From the moment a QR code enters the frame, it takes roughly one second for the system to detect, parse, and anchor the model. Marker switching happens faster, under 100ms, so walking between markers feels seamless. Memory sits around 50MB for the marker textures and instances.

Construction sites are harsh environments for computer vision. Poor lighting, dust on markers, codes viewed at steep angles, and the environment changes every hour. The system handles these conditions because OpenCV's QR detection is robust, and we're not relying on a single frame. Continuous scanning means if one frame fails to detect, the next one often succeeds.

When multiple markers appear in frame simultaneously, the concurrent queue handles the burst. Detections queue up and process in order without blocking the render loop.

Tracking State Management

Not all tracking is created equal. A marker might be fully visible and locked in, or it might be partially obscured and Vuforia is guessing. The user needs to know when to trust what they see.

Vuforia reports tracking quality per marker: NO_POSE means it lost the marker entirely, LIMITED means it's struggling, DETECTED means it found something, TRACKED means solid lock, and EXTENDED_TRACKED means it's inferring position from motion after losing visual contact.

Extended tracking happened constantly. Construction floors can span 30 by 50 meters. Walk away from a marker and the device is inferring position from motion alone. Drift accumulates. That's why we placed markers at regular intervals across the floor. The user walks until drift becomes noticeable, scans the next marker, and the model snaps back into alignment.

With multiple markers active, the system needs a single confidence level to show the user. The solution: take the best state across all active markers.

public int GetTrackingState()
{
    // -1 means there are no active markers at all.
    int output = -1;

    // Map Vuforia statuses onto a 0-3 confidence scale and keep the best value
    // across all active markers. EXTENDED_TRACKED ranks below TRACKED because
    // the pose is inferred from motion rather than observed.
    foreach (MagicMarker magicMarker in activeMarkers.Values)
    {
        output = magicMarker.CurrentStatus.Status switch
        {
            Status.NO_POSE => Mathf.Max(output, 0),
            Status.LIMITED => Mathf.Max(output, 1),
            Status.DETECTED => Mathf.Max(output, 2),
            Status.TRACKED => Mathf.Max(output, 3),
            Status.EXTENDED_TRACKED => Mathf.Max(output, 2),
            _ => output
        };
    }

    return output;
}

If any marker reports TRACKED, the UI shows green. If everything is at NO_POSE, the user sees a warning to find a marker. This simple aggregation gives users confidence in what they're seeing without overwhelming them with per-marker details.
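How that aggregated value drives the UI isn't shown here. A minimal sketch, assuming a simple traffic-light indicator (markerManager and ShowIndicator are hypothetical names), could look like this:

// Sketch: mapping the aggregated tracking state to user-facing feedback.
// Thresholds and messages are assumptions, not the project's actual UI code.
private void UpdateTrackingIndicator()
{
    switch (markerManager.GetTrackingState())
    {
        case 3:  // at least one marker is solidly tracked
            ShowIndicator(Color.green, "Tracking");
            break;
        case 1:
        case 2:  // limited, detected, or extended tracking only
            ShowIndicator(Color.yellow, "Tracking degraded - scan a nearby marker");
            break;
        default: // -1 or 0: nothing usable
            ShowIndicator(Color.red, "Find and scan a QR marker");
            break;
    }
}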

World Center Switching

This is where the marker handoff happens. You've walked 20 meters from the last marker, extended tracking has been inferring your position from motion, and now you scan a new QR code. The system needs to switch which marker serves as the reference point for positioning.

Vuforia calls this the "world center." When a marker achieves solid tracking, it can claim that role. The model position is then recalculated relative to the new marker's known coordinates.

private void OnTrackingChanged(ObserverBehaviour observer, TargetStatus newStatus)
{
    if (observer != observerBehaviour) return;
    if (VuforiaBehaviour.Instance.WorldCenter == observerBehaviour) return;
    if (newStatus.Status != Status.TRACKED) return;

    CurrentTrackedMarker = this;

    if (VuforiaBehaviour.Instance.WorldCenterMode == WorldCenterMode.SPECIFIC_TARGET)
        VuforiaBehaviour.Instance.SetWorldCenter(WorldCenterMode.SPECIFIC_TARGET, 
            observerBehaviour);

    UpdateModelPosition();
}

The guards at the top matter. We only switch if this marker isn't already the world center, and only if the tracking status is solid. No point switching to a marker that's barely visible.

When the switch happens, UpdateModelPosition() runs the same inverse transform logic from before. The model snaps to its correct position relative to the new marker. Any drift that accumulated during extended tracking disappears. From the user's perspective, the BIM model just locks back into place.

Why Not Visual Positioning?

We tested alternatives. Visual Positioning Systems like Immersal scan the environment and match camera frames against a pre-built 3D map. No markers needed. The results were impressive: accurate positioning across varied environments, smooth user experience, no physical infrastructure to maintain.

But construction sites change constantly. A wall goes up. Scaffolding moves. Equipment relocates. The environment from last week no longer matches this week's 3D map. VPS struggled with this. Positioning became unreliable, sometimes placing the model meters off, sometimes failing to localize entirely. Rebuilding the 3D map weekly wasn't practical.

Markers have their own problems. They can be covered, damaged, or removed. But they're predictable problems. A covered marker means you walk to the next one. A removed marker means you print and place a new one. The failure modes are obvious and recoverable. VPS failure modes are subtle: you don't always know when positioning has drifted until the model is visibly wrong.

For stable environments, VPS is the better solution. For construction, markers won.

Why This Works

The key insight is that QR codes are deterministic. Unlike SLAM or feature tracking, they don't drift over time. They don't need a richly textured environment to work. Bare concrete, empty floors, outdoor sites: the system doesn't care. As long as the marker is visible, positioning is accurate.

The workflow starts during site survey. Someone walks the building, places markers at known locations, measures their positions relative to the BIM origin, and uploads that data. From that point forward, any device with the app can scan any marker and know exactly where it is. No training period. No environment mapping. Day one of the project, the system works.

This also means positioning is persistent. Turn off the device, come back the next day, scan a marker, and the accuracy is identical. Different users on the same project see the same aligned model because they're all referencing the same marker database. If someone questions the accuracy, they can verify it directly: measure the distance between a physical column and its virtual counterpart.

The architecture handles the rest. Threading keeps rendering smooth while detection runs continuously. Marker management activates neighbors for seamless coverage. Transform math places the model correctly regardless of which marker you scan.