$log writings thoughts tools

Play YT Videos as a Native SwiftUI Experience

Nov 23, 2025

·

5 min read

tl;dr: Better YouTube playback in SwiftUI without brittle iframes.

SwiftUI-in-YouTube-Color-Scheme

iFrames Fall Short

Most RSS readers embed YouTube with iframes, but you get almost no control over playback. On spotty networks, they often fail outright. For a smooth, native-feeling experience, you need something better.

Architecture

+---------------------------+
| State Manager             |
| (side-loads config/data)  |
+-----------^-------^-------+
            |       |
            |       | sync state
            |       |
+-----------+-------+-----------+
| Controls Layer                |
+-------------------------------+
| Thumbnail Layer (SwiftUI Image)|
+-------------------------------+
| WebKit WebView Layer          |
|   [JS actions & event hooks]  |
|         ^          |          |
|   perform actions   | observe |
+-----------+---------+---------+
            | events/state
            v
        State Manager

Think of it like a burger:

  • Bun: SwiftUI Image for the thumbnail.
  • Veg & Patty: WKWebView with the video and user scripts.
  • Cheese: State manager syncing updates.
  • Condiments: SwiftUI controls.

Bun

YTPlayer is initialized with plain URLs. Extract the video ID and pull your own thumbnail for consistent quality and cacheability: https://i.ytimg.com/vi/<videoID>/maxresdefault.jpg. Prefer maxresdefault and gracefully fall back to hqdefault if it 404s. Keep the thumbnail visible through load so list scrolling stays smooth and there’s no flash when the WebView mounts.

YouTube API Reference

Veg & Patty

Treat WKWebView as a touchless render surface and let SwiftUI own interaction.

WebView Setup

Prewarm a hidden WKWebView sharing a WKProcessPool so the first play is instant, and keep it muted until the user taps play to satisfy iOS autoplay rules. Enable inline playback and autoplay (allowsInlineMediaPlayback = true, mediaTypesRequiringUserActionForPlayback = []). Inject user scripts at document start to strip YouTube chrome, block pointer events, prune the DOM to just the player/video, restyle it full-bleed, and wire message handlers for state callbacks.

let config = WKWebViewConfiguration()
config.processPool = sharedPool
config.allowsInlineMediaPlayback = true
config.mediaTypesRequiringUserActionForPlayback = []
config.userContentController.addUserScript(
  WKUserScript(source: removalJS, injectionTime: .atDocumentStart, forMainFrameOnly: true)
)
config.userContentController.add(self, name: "videoState")

User scripts do the heavy lifting. A document-start guard wraps HTMLMediaElement.play so audio cannot start until SwiftUI opts in, blocks runtime autoplay toggles, and hides native controls until the overlay allows them. Another layer watches the YouTube watch page, forces native controls, pauses on metadata load, dedupes text tracks with a locale preference, and collapses the DOM to just the player/video, restyled as a full-bleed canvas. A bridge forwards <video> lifecycle events to SwiftUI through window.webkit.messageHandlers.playbackState—metadata, duration/time updates, play/pause, buffering, seeking, ended, PiP, fullscreen—with a MutationObserver to keep late-added videos wired.

DOM hooks and why: we grab the video element to intercept playback (play/pause), suppress autoplay (autoplay attribute/property), hide native chrome (controls attribute/property), observe media state (time/duration events), and detect mode changes (webkitPresentationMode) for PiP/fullscreen tracking. We target YouTube’s player containers (#player.style-scope.ytd-watch-flexy, #player) so we can prune everything else and restyle the player/video to a full-bleed canvas. We bridge back into SwiftUI via window.webkit.messageHandlers.playbackState because that’s the sanctioned way for WKWebView to post messages to native. The autoplay guard overrides HTMLMediaElement.play to no-op until SwiftUI opts in, and wraps Element.setAttribute plus the autoplay/controls property descriptors so YouTube can’t flip those flags behind our back.

Observer

Observers keep SwiftUI in sync. JavaScript posts state changes through the message handler; the state manager coalesces noisy signals (like rapid timeupdate bursts) and updates overlay progress, play/pause, buffering, and mode (inline/fullscreen/PiP). If the WebView disappears during cell reuse, the observer unsubscribes and state falls back to the thumbnail.

PiP

YouTube blocks PiP on iOS. Intercept presentation mode changes and stop propagation so WebKit can enter PiP anyway:

var video = document.querySelector('video');
if (video) {
  video.addEventListener(
    'webkitpresentationmodechanged',
    function (event) {
      event.stopPropagation();
    },
    true
  );
}

Cheese

WKWebView stays display-only: no gesture handling. It’s created lazily—only when the user hits play—to keep list scrolling fast. The state manager holds the single source of truth: it listens to WebView messages (time, duration, buffering, mode changes), smooths them, and drives SwiftUI bindings. Commands flow the other way: play/pause/seek/PiP/fullscreen dispatch as JavaScript evaluations, not WebView gestures. When a cell disappears, the manager tears down the WebView to reclaim memory and resets progress to the thumbnail state.

Condiments

The UI lives in two simple states: ready and playing/paused. Minimal branching keeps the surface snappy and predictable.

Ready

Show a single Start button and the thumbnail; the WKWebView does not exist yet, so lists stay lightweight. On tap, spin up the WebView, swap the button for a progress indicator, and keep the thumbnail pinned until loading finishes, scripts inject, and the video reports ready.

Playing/Paused

Surface only what matters: PiP, Fullscreen, Play/Pause, and a seek bar. Every control calls into the state manager, which forwards a JavaScript action to the WebView and waits for the playback observer to confirm before updating UI.

Next Steps

I don’t copy&paste the code directly from my repo here, instead I extract important points and explain. Have a plan to wrap the implementation into a Swift package, which will be published soon.