one platform, three runtimes

Most VR collaboration platforms pick one device and build for it. We needed to support three: Meta Quest headsets, desktop clients, and web browsers. A participant on a VR headset in Amsterdam and a colleague on a laptop in London had to share the same virtual space in real time, with no perceptible difference in capability. This multi-device requirement was the product's core differentiator and its hardest engineering constraint.

The problem with one Unity project and three build targets

Unity can target Android (Quest), Windows standalone, and WebGL from the same project. In theory. In practice, the three runtimes have fundamentally different performance envelopes. A Quest 2 runs on a mobile Snapdragon chipset. A desktop client has a dedicated GPU. A WebGL build runs inside a browser sandbox, competing for memory with every open tab. The same scene that rendered at 72fps on Quest would choke a Chrome tab.

We maintained a single Unity project with preprocessor directives and build profiles to handle divergence. But the real complexity was not in the build system. It was in the interaction model, the avatar system, and how the backend served all three clients without branching into platform-specific logic.

Input abstraction

VR controllers provide six degrees of freedom per hand. A mouse provides a 2D cursor and clicks. These are not variations of the same input. They are categorically different interaction paradigms.

We built an input abstraction layer in the Unity client that mapped device capabilities to a shared set of actions: point, select, grab, navigate. On Quest, pointing was a ray cast from the controller. On desktop, it was a ray cast from the camera through the mouse position. On WebGL, the ray cast worked the same way as on desktop, but ran at a reduced frequency to stay within the browser's performance budget.

The abstraction was not about making all devices feel the same. It was about making all devices produce the same networked events. When a VR user pointed at an object and grabbed it, the network message was identical to when a desktop user clicked and dragged it. Other participants saw the same outcome regardless of how it was triggered.
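The real abstraction lived in the Unity client in C#; the sketch below uses TypeScript to show the core idea only — two device-specific mappers that emit the same networked event shape. The type names and fields are assumptions, not the actual protocol.

```typescript
// A shared vocabulary of actions every device can produce.
type SharedAction = "point" | "select" | "grab" | "navigate";

// The networked event. Peers only ever see this shape, never
// controller poses, mouse coordinates, or any device detail.
interface ActionEvent {
  action: SharedAction;
  targetId: string;                      // networked object the ray hit
  origin: [number, number, number];      // ray origin in world space
}

// VR: the ray originates at the tracked controller.
function fromVrGrab(
  targetId: string,
  controllerPos: [number, number, number],
): ActionEvent {
  return { action: "grab", targetId, origin: controllerPos };
}

// Desktop/WebGL: the ray originates at the camera, cast through
// the mouse position. Different input, identical output event.
function fromMouseDrag(
  targetId: string,
  cameraPos: [number, number, number],
): ActionEvent {
  return { action: "grab", targetId, origin: cameraPos };
}
```

A VR grab and a desktop drag at the same world position serialize to byte-identical messages, which is what lets every other participant render the interaction without knowing what device triggered it.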

Two avatar formats, one identity

Users configured their avatars through Ready Player Me, an embedded iframe on the profile page of the web application. The platform stored two avatar URLs per account: a half-body GLB and a full-body GLB. The half-body avatar showed head, torso, and arms. The full-body avatar included legs and full skeletal animation.

VR users loaded the full-body avatar. Their hand and head tracking drove the skeleton directly. Desktop and WebGL users loaded the half-body variant. Without tracked controllers, a full-body avatar would just stand there with limp arms. The half-body format masked the absence of tracking data rather than exposing it.

The backend treated both URLs as simple string fields on the user entity, validated to end in .glb. No platform logic. The Unity client decided which URL to fetch based on its own build target at runtime. This kept the API surface clean. The server did not need to know what device a user was on. It stored both avatar options and let the client choose.
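The avatar fields can be sketched as follows — shown in TypeScript for brevity, though the real services were elsewhere in the stack. Field names are illustrative; the source only specifies two string URLs validated for a .glb suffix.

```typescript
// Two plain string fields on the user entity. No platform logic:
// the server stores both, the client picks one at runtime.
interface UserAvatars {
  halfBodyUrl: string;   // fetched by desktop and WebGL builds
  fullBodyUrl: string;   // fetched by the Quest (VR) build
}

// The only validation the backend performs: the URL must point
// at a GLB model.
function isValidAvatarUrl(url: string): boolean {
  return url.endsWith(".glb");
}
```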

Environment delivery

Virtual spaces (we called them environments) were Unity asset bundles hosted on S3 and served through CloudFront. Each environment had a build configuration that included the base URL for the WebGL build assets: the .wasm binary, the .framework.js, the .data file, and the .loader.js. The web client fetched this configuration from a dedicated environment microservice, then initialized the Unity WebGL context with those URLs.

The Quest and desktop clients loaded native asset bundles directly. The environment service did not differentiate between clients. It served build metadata. The client interpreted what it needed.
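A minimal sketch of the build metadata and how the web client might derive the four Unity loader URLs from it. The config shape, the helper, and the example domain are assumptions; the article only states that the service returns a base URL plus those four WebGL artifacts.

```typescript
// Build metadata served by the environment microservice.
interface EnvironmentBuildConfig {
  baseUrl: string;   // CloudFront path for this environment's assets
}

// The WebGL client expands the base URL into the four artifacts a
// Unity WebGL build ships: loader, data, framework, and wasm binary.
function webglAssetUrls(cfg: EnvironmentBuildConfig, name: string) {
  return {
    loaderUrl: `${cfg.baseUrl}/${name}.loader.js`,
    dataUrl: `${cfg.baseUrl}/${name}.data`,
    frameworkUrl: `${cfg.baseUrl}/${name}.framework.js`,
    codeUrl: `${cfg.baseUrl}/${name}.wasm`,
  };
}
```

Because the service only serves metadata, the Quest and desktop clients can read the same record and ignore the WebGL-specific fields, fetching their native asset bundles instead.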

This architecture meant we could deploy environment updates independently of client updates. A new version of a virtual space went to S3 and CloudFront. All three clients picked it up on next load. No app store review cycle for Quest. No desktop installer update. No web deployment. Just a CDN cache invalidation.

Networking without platform branches

Photon handled the real-time networking. All three clients connected to the same Photon rooms. The server relayed position updates, voice data, and interaction events between participants regardless of their device.

The Unity client set an app mode flag during initialization and sent it to the server on connect. But this flag controlled quality-of-service parameters, not branching logic. The networking protocol was identical across all three runtimes. A VR user's hand position serialized the same way as a desktop user's pointer position. The receiving client rendered it according to its own avatar and input model.
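To make the distinction concrete, here is an illustrative TypeScript sketch of the two message kinds. The field names are assumptions; the point is structural — the app mode travels once at connect for QoS tuning, while the per-frame update has one shape for every runtime.

```typescript
type AppMode = "vr" | "desktop" | "webgl";

// Sent once, on connect. Used server-side to tune quality of
// service, never to branch the protocol.
interface ConnectPayload {
  appMode: AppMode;
}

// Sent every frame. Identical fields whether the source is a
// tracked VR hand or a desktop pointer.
interface PoseUpdate {
  position: [number, number, number];
  rotation: [number, number, number, number];   // quaternion
}

function serializePose(p: PoseUpdate): string {
  return JSON.stringify(p);
}
```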

Voice chat ran through Agora, layered on top of Photon. The web client mounted the voice component after the Unity WebGL context confirmed it had loaded. This sequencing mattered. WebGL builds competed for audio resources with the browser. Initializing voice too early would sometimes cause the Unity audio context to fail silently.
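The sequencing rule can be sketched as a small state gate. In the real web client this hinged on the loaded signal from the Unity WebGL context (react-unity-webgl exposes an `isLoaded` flag for this); the version below is plain logic so the ordering is explicit, and the names are assumptions.

```typescript
interface MountState {
  unityLoaded: boolean;    // has the Unity WebGL context confirmed load?
  voiceMounted: boolean;   // has the Agora voice component mounted?
}

// Voice mounts only after Unity has loaded, so the two never
// compete for the browser's audio resources during startup.
function nextMountState(state: MountState): MountState {
  if (state.unityLoaded && !state.voiceMounted) {
    return { ...state, voiceMounted: true };
  }
  return state;
}
```

Inverting this order is exactly the failure mode described above: voice grabs the audio context first and the Unity audio context fails silently.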

What made this work

Three decisions kept the architecture from collapsing under the weight of three runtimes.

First, the backend stayed device-agnostic. The API served data. Clients interpreted it. No if (platform == "quest") in the Java services. This meant the backend team and the Unity team could work independently. It also meant adding a fourth client type (say, a mobile AR viewer) would require zero backend changes.

Second, the avatar split was a product decision, not a technical hack. We did not try to animate a full-body avatar with mouse input. We designed the half-body format specifically for non-tracked contexts. The product adapted to the device rather than pretending all devices were equivalent.

Third, the WebGL client was a React application that embedded Unity through react-unity-webgl. The React layer handled authentication, space selection, and session management. Unity handled the 3D experience. This separation meant the web client could share UI components with the admin panel and the creator tools. The expensive 3D runtime only loaded when a user actually entered a space.

The cost

The Unity project accumulated platform-specific code paths despite our best efforts. WebGL builds required aggressive optimization that sometimes degraded visual quality. Testing across three platforms tripled QA effort. But building three separate clients would have tripled the development team, and dropping a platform would have cut the addressable market. The multi-device approach was a trade-off, not a solution, but it was the right one for a collaboration platform where the value was getting people into the same room.


I was Technical Director and co-founder at Ravel from 2021 to October 2022.