Context-Aware Annotations: Capturing and Attaching Field Notes to BIM Elements
March 25, 2021
The problem: Construction issues get reported verbally, in scattered notes, through disconnected photos. "The pipe on the second floor near the south wall has a problem." Which pipe? Which wall? What problem?
Traditional issue tracking loses spatial context. Photos exist separately from BIM models. Notes reference locations ambiguously. Weeks later, finding the exact element someone reported requires detective work.
AR-BIM enables precise spatial anchoring. Point at an element, add an annotation, capture the exact view. The issue stays attached to that specific BIM object forever. No ambiguity. No lost context.
Annotation Data Structure
Each annotation packages multiple data types:
[Serializable]
public class Annotation
{
    public string subject;
    public string message;
    public string deadline;
    public long recipientEmployeeId = 2307; // hardcoded fallback recipient
    public long[] ccEmployeeIds = { };
    public CameraData camera;

    public override string ToString()
    {
        return JsonUtility.ToJson(this);
    }
}
[Serializable]
public class CameraData
{
    public Photo snapShot;
    public float[] viewPoint;
    public float[] direction;
    public float[] upVector;
    public int fieldOfView;
}
[Serializable]
public class Photo
{
    public string format;
    public string data; // base64 encoded

    public Photo(string format, string base64Data)
    {
        this.format = format;
        data = base64Data;
    }
}
Text content — Subject and message describe the issue
Assignment data — Recipient and CC list for workflow
Deadline — Expected resolution date
Camera data — Exact view parameters for recreation
Photo — Visual documentation
This structure captures both human-readable description and machine-readable spatial data. Text for understanding, camera data for precise positioning.
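Serializing an annotation for transport is then a one-liner thanks to the `ToString` override; a minimal usage sketch (the field values here are hypothetical):

```csharp
// Build an annotation and serialize it for transport (hypothetical values)
Annotation a = new Annotation
{
    subject = "Leaking valve",
    message = "Steady drip at the flange joint"
};
string json = a.ToString(); // JsonUtility.ToJson output, ready for the request body
```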
Camera Data Capture
The camera data enables view recreation. Store view parameters, later position a virtual camera identically. This recreates the exact perspective the annotator saw.
public CameraData GetCameraData(Camera cam)
{
    return new CameraData
    {
        viewPoint = new float[] {
            cam.transform.position.x,
            cam.transform.position.y,
            cam.transform.position.z
        },
        direction = new float[] {
            cam.transform.forward.x,
            cam.transform.forward.y,
            cam.transform.forward.z
        },
        upVector = new float[] {
            cam.transform.up.x,
            cam.transform.up.y,
            cam.transform.up.z
        },
        fieldOfView = (int)cam.fieldOfView
    };
}
viewPoint — Camera position in world space
direction — Camera forward vector
upVector — Camera up vector
fieldOfView — Viewing angle
Together these parameters fully specify camera state. Recreating the view is straightforward:
void SetCameraFromData(Camera cam, CameraData data)
{
    cam.transform.position = new Vector3(
        data.viewPoint[0],
        data.viewPoint[1],
        data.viewPoint[2]
    );
    cam.transform.rotation = Quaternion.LookRotation(
        new Vector3(data.direction[0], data.direction[1], data.direction[2]),
        new Vector3(data.upVector[0], data.upVector[1], data.upVector[2])
    );
    cam.fieldOfView = data.fieldOfView;
}
This enables annotation review. Show the annotated element from the exact angle and distance the annotator used. The reviewer sees precisely what the annotator saw.
Photo Capture and Encoding
Visual documentation accompanies text descriptions. The system captures camera feed at annotation time:
public Photo CapturePhoto()
{
    // Must run after WaitForEndOfFrame so the framebuffer is complete
    Texture2D photo = new Texture2D(Screen.width, Screen.height,
        TextureFormat.RGB24, false);
    photo.ReadPixels(new Rect(0, 0, Screen.width, Screen.height), 0, 0);
    photo.Apply();

    byte[] bytes = photo.EncodeToJPG(75);
    string base64 = System.Convert.ToBase64String(bytes);
    Photo photoData = new Photo("image/jpeg", base64);

    Destroy(photo); // clean up the temporary texture
    return photoData;
}
ReadPixels — Copy framebuffer to texture
EncodeToJPG — Compress to JPEG at 75% quality
Base64 encode — Convert binary to string for JSON transport
Destroy texture — Clean up temporary allocation
The result is embeddable in JSON. The annotation object serializes to a string containing both text and image data. One POST request uploads everything.
Quality 75% balances file size and visual fidelity. Construction photos don't need highest quality — issues must be visible, pixel-perfect accuracy unnecessary. Lower quality reduces bandwidth, especially on construction site networks.
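Base64 is not free, though: it emits four output characters for every three input bytes, so the JSON payload runs roughly a third larger than the raw JPEG. A plain .NET sketch of the arithmetic (the 300 KB figure is an assumption for illustration, not a measurement):

```csharp
using System;

class PayloadEstimate
{
    static void Main()
    {
        // Assume a full-screen JPEG at quality 75 lands around 300 KB
        int jpegBytes = 300 * 1024;

        // Base64 emits 4 characters per 3 input bytes, rounded up to a full group
        int base64Chars = 4 * ((jpegBytes + 2) / 3);

        Console.WriteLine($"JPEG:   {jpegBytes} bytes");
        Console.WriteLine($"Base64: {base64Chars} chars " +
            $"(~{base64Chars * 100L / jpegBytes}% of original)");
    }
}
```

This overhead is another reason to keep the JPEG quality modest on constrained site networks.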
Element Association
Annotations attach to specific BIM elements by ID. Raycasting determines which element the user is looking at:
public void SelectElement()
{
    Ray ray = mainCamera.ScreenPointToRay(new Vector3(
        Screen.width / 2, Screen.height / 2, 0));
    RaycastHit hit;
    if (Physics.Raycast(ray, out hit, Mathf.Infinity, selectableLayerMask))
    {
        BimMetaData metadata = hit.collider.GetComponent<BimMetaData>();
        if (metadata != null)
        {
            annotationUI.SetCurrentElementID(metadata.ElementID);
        }
    }
}
Center screen raycast — Shoots ray from viewport center into scene
Layer mask filtering — Only hit objects on selectable layers
Metadata extraction — Get BIM ID from component
This pattern is common in FPS games. Replace gun with selection, same concept. The center of screen becomes the selection point.
UI crosshair shows users where they're aiming. Tap confirm button, element gets selected, annotation dialog opens with element ID pre-populated.
UI Flow
The annotation workflow is multi-stage, and each stage has specific responsibilities:
Select element — User points at BIM object, raycast determines which
Take photo — Capture camera feed showing the element
Preview photo — User confirms photo shows the issue
Fill form — Enter subject, message, deadline, recipient
Validate — Ensure required fields present, photo attached
Upload — POST to API endpoint
Confirmation — Show success or failure
Reset — Clear form for next annotation
public class AnnotationUI : MonoBehaviour
{
    [SerializeField] private Image preview;
    [SerializeField] private TMP_InputField subject;
    [SerializeField] private TMP_InputField message;
    [SerializeField] private DateTimeSelector deadline;
    [SerializeField] private IDDropdown employeesDropdown;
    [SerializeField] private Button saveBtn;
    [SerializeField] private Button clearBtn;
    [SerializeField] private Button takePictureButton;

    private long elementID = -1;
    private CameraData camData;

    private void Update()
    {
        saveBtn.interactable = camData?.snapShot != null;
    }

    public void PreviewPicture(Texture2D picture, CameraData data)
    {
        preview.sprite = Sprite.Create(picture,
            new Rect(0, 0, picture.width, picture.height),
            new Vector2(0.5f, 0.5f));
        preview.transform.GetChild(0).gameObject.SetActive(false);
        camData = data;
    }

    public void SetCurrentElementID(long newElementID)
    {
        elementID = newElementID;
    }

    private void OnSave()
    {
        if (elementID < 0)
        {
            Debug.LogError("Missing id error, could not upload annotation");
            return;
        }
        Annotation a = new Annotation
        {
            subject = string.IsNullOrWhiteSpace(subject.text) ?
                "Subject" : subject.text,
            message = string.IsNullOrWhiteSpace(message.text) ?
                "Message" : message.text,
            // Convert to UTC first; the trailing Z in the format string is a literal
            deadline = deadline.Value.ToUniversalTime()
                .ToString("yyyy-MM-ddTHH:mm:00Z"),
            recipientEmployeeId = employeesDropdown.SelectedID,
            camera = camData
        };
        StartCoroutine(DoUploadAnnotation(a, elementID));
        Reset();
    }
}
The save button stays disabled until a photo is captured. This prevents invalid submissions. Required fields enforce data quality.
API Integration
Annotations POST to REST endpoint as JSON:
private IEnumerator DoUploadAnnotation(Annotation annotation, long elementId)
{
    JSONPostRequest req = AnnotateRequest.PostElementQuestion(
        elementId, annotation);
    yield return req.Send();

    WebResponse res = new WebResponse(req);
    if (res.Success && res.ResultCode >= 200 && res.ResultCode < 300)
    {
        Debug.Log("Successfully uploaded annotation");
    }
    else
    {
        Debug.LogError($"Upload of annotation failed ({res.ResultCode}: " +
            $"{res.Error})! \n{res.DataString}");
        Debug.LogError($"Original Request: " +
            $"{JsonConvert.SerializeObject(req.request)}");
    }
}
Coroutine execution — Network operation doesn't block UI
Response validation — Check HTTP status code
Error logging — Capture full request/response for debugging
User feedback — Show success or failure message
The endpoint receives element ID in URL path, annotation data in request body. Backend associates annotation with BIM element in database.
DateTime Handling
Deadlines use ISO 8601 format for consistency across systems:
deadline.Value.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:00Z")
Format: 2024-12-17T14:30:00Z
Date — YYYY-MM-DD
Time — HH:mm:00
Timezone — Z indicates UTC
Seconds always zero — minute-level precision is sufficient for construction deadlines. Note that in a .NET custom format string the trailing Z is a literal character, not a conversion, so the DateTime must be converted to UTC before formatting. A consistent format prevents parsing errors across the client/server boundary.
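The round trip is worth sanity-checking, since a naive `ToString` on a local DateTime would stamp a wrong-by-hours "UTC" value. A standalone .NET sketch (not project code):

```csharp
using System;
using System.Globalization;

class DeadlineFormat
{
    static void Main()
    {
        DateTime local = new DateTime(2021, 3, 25, 14, 30, 45, DateTimeKind.Local);

        // Convert to UTC before formatting; the trailing Z is a literal character
        string wire = local.ToUniversalTime().ToString("yyyy-MM-ddTHH:mm:00Z");
        Console.WriteLine(wire);

        // The receiver parses the Z suffix back as UTC
        DateTime parsed = DateTime.Parse(wire, CultureInfo.InvariantCulture,
            DateTimeStyles.AdjustToUniversal);
        Console.WriteLine(parsed.Kind); // Utc
    }
}
```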
Employee Selection
Annotations route to responsible parties via employee dropdown:
public class IDDropdown : MonoBehaviour
{
    // Maps dropdown index to employee ID
    private Dictionary<int, long> employeeMap;
    public long SelectedID { get; private set; }

    public void PopulateFromAPI()
    {
        // Fetch employee list from API
        StartCoroutine(DownloadEmployees());
    }

    private IEnumerator DownloadEmployees()
    {
        // Download employee data
        // Populate dropdown
        // Map dropdown indices to employee IDs
        yield break;
    }

    public void OnSelectionChanged(int index)
    {
        SelectedID = employeeMap[index];
    }
}
Dropdown shows employee names. Selection stores ID. Annotation references ID in recipientEmployeeId field. Backend routes notification to appropriate person.
CC list enables additional notifications. Array of employee IDs. Each receives notification without being primary recipient.
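Wiring that up is just a second array on the same object; a brief sketch (the employee IDs are made up):

```csharp
// Route to one assignee, notify two others (hypothetical IDs)
Annotation a = new Annotation
{
    subject = "Corrosion detected",
    recipientEmployeeId = 4101,               // primary assignee
    ccEmployeeIds = new long[] { 4102, 4105 } // notified, not responsible
};
```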
Validation and Error Handling
Multiple validation points prevent invalid submissions:
UI validation — Save button disabled without photo
Field validation — Required fields checked before submission
Element validation — Must have valid element ID
Network validation — Check response codes
private void OnSave()
{
    if (elementID < 0)
    {
        Debug.LogError("Missing id error, could not upload annotation");
        return;
    }

    // Ensure subject and message have values
    string subjectText = string.IsNullOrWhiteSpace(subject.text) ?
        "Subject" : subject.text;
    string messageText = string.IsNullOrWhiteSpace(message.text) ?
        "Message" : message.text;

    // Create annotation with validated data
    Annotation a = new Annotation
    {
        subject = subjectText,
        message = messageText,
        // Convert to UTC first; the trailing Z in the format string is a literal
        deadline = deadline.Value.ToUniversalTime()
            .ToString("yyyy-MM-ddTHH:mm:00Z"),
        recipientEmployeeId = employeesDropdown.SelectedID,
        camera = camData
    };
    StartCoroutine(DoUploadAnnotation(a, elementID));
    Reset();
}
Defensive programming prevents crashes. If subject empty, use default. If message empty, use default. ID missing? Log error and abort.
Form Reset
After submission, clear state for next annotation:
public void Reset()
{
    preview.sprite = null;
    preview.transform.GetChild(0).gameObject.SetActive(true);
    ScreenManager.Instance.ActivateScreen(viewer);
    deadline.Value = DateTime.Now;
    subject.text = string.Empty;
    message.text = string.Empty;
    // Clear captured state so the next annotation starts fresh
    camData = null;
    elementID = -1;
}
Clear preview — Remove photo, show placeholder
Return to viewer — Exit annotation mode
Reset fields — Clear text, reset deadline to now
This prevents accidentally resubmitting previous annotation data. Each annotation starts fresh.
Why Spatial Anchoring Matters
Traditional issue reports have this problem:
"Second floor mechanical room, pipe near north wall has corrosion."
Which mechanical room? Which pipe? Which wall is north? The description depends on shared mental models. Different people interpret differently.
Spatial annotation eliminates ambiguity:
{
    "elementId": 4523,
    "subject": "Corrosion detected",
    "message": "Red-brown discoloration on pipe surface",
    "camera": {
        "viewPoint": [45.2, 12.8, -23.4],
        "direction": [0.707, -0.1, 0.707],
        "upVector": [0, 1, 0],
        "fieldOfView": 60
    },
    "photo": "data:image/jpeg;base64,..."
}
elementId 4523 — Specific pipe in BIM model, no ambiguity
Camera data — Exact view angle for recreation
Photo — Visual evidence
Later, another person views the annotation:
- System loads BIM model
- Highlights element 4523
- Positions camera from annotation data
- Shows photo alongside current view
The reviewer sees exactly what the annotator saw. Same element, same angle. If conditions changed, comparison is immediate and obvious.
This transforms issue tracking from text-based to spatially-anchored. Issues don't float in ambiguous space — they attach to specific BIM elements with precise viewing context.
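Putting those steps together, the review side might look like this. A sketch only: ReviewAnnotation, FindElement, HighlightElement, and reviewPhotoUI are hypothetical helpers, while SetCameraFromData is the method shown earlier.

```csharp
public void ReviewAnnotation(Annotation annotation, long elementId)
{
    // Locate and highlight the annotated BIM element (hypothetical helpers)
    GameObject element = FindElement(elementId);
    HighlightElement(element);

    // Recreate the annotator's exact perspective from the stored camera data
    SetCameraFromData(mainCamera, annotation.camera);

    // Show the stored photo next to the live view for before/after comparison
    reviewPhotoUI.Show(annotation.camera.snapShot);
}
```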
Use Cases
Quality inspections — Document defects with photo and exact location
Change requests — Show what needs modification, attach to element
Progress tracking — Record completion state per element
Safety observations — Flag hazards attached to specific locations
Client walkthroughs — Capture feedback spatially referenced
All leverage the same mechanism: element ID + camera data + photo + text. The spatial anchor makes everything precise.
Performance Considerations
Photo capture is expensive. ReadPixels forces GPU-CPU sync. Minimize captures:
private void Update()
{
    // Only capture when the user explicitly requests it
    if (Input.GetButtonDown("Capture"))
    {
        StartCoroutine(CapturePhotoCoroutine());
    }
}

private IEnumerator CapturePhotoCoroutine()
{
    // Wait for frame render to complete before touching the framebuffer
    yield return new WaitForEndOfFrame();

    // ReadPixels and EncodeToJPG must run on the main thread
    Texture2D photo = new Texture2D(Screen.width, Screen.height,
        TextureFormat.RGB24, false);
    photo.ReadPixels(new Rect(0, 0, Screen.width, Screen.height), 0, 0);
    photo.Apply();
    byte[] bytes = photo.EncodeToJPG(75);
    Destroy(photo);

    // Base64 conversion is plain managed code and can run on a worker thread
    ThreadPool.QueueUserWorkItem(_ => EncodePhoto(bytes));
}
WaitForEndOfFrame — Ensures rendering is complete before the pixel read
Explicit trigger — Only capture when the user presses the button
Async encoding — Unity's texture APIs are main-thread only, so the JPEG encode stays on the main thread; the base64 conversion can move to a worker thread
This keeps the cost of capture isolated to an explicit user action and prevents frame drops during normal AR usage.
Why This Enables Collaboration
AR-BIM without annotations is single-player. You see BIM overlay, walk around, inspect. Useful, but isolated.
Annotations make it multiplayer. One person identifies issue. Another reviews from same perspective. Third person verifies fix. All see identical spatial context.
The camera data + photo combination means remote review works. Reviewer doesn't need to be on site. They see the annotator's view remotely, understand context, make decisions.
This turns AR from visualization tool into collaborative platform. Field observations stay spatially anchored to exact BIM elements. No ambiguity. No lost context. Just precise, spatial issue tracking.