Video pipeline, Modal integration, and insights

AI Video Analysis

Surflink's AI video analysis is powered by a Modal serverless backend running the SurfVision model. The system processes surf session footage to detect surfers, track movement, classify maneuvers, and generate structured analytics.

Video Processing Pipeline

1. Upload

The coach uploads video files through the /upload page. Files can come from:

  • Direct file upload (drag-and-drop or file picker)
  • Frame.io import (browsing connected accounts)

Before upload, files are optionally compressed client-side using FFmpeg WASM to reduce file size and upload time.

2. Submit to Modal

Each clip is sent to the Modal API endpoint:

POST $NEXT_PUBLIC_MODAL_API_URL/analyze

The API returns a job_id for tracking the processing job.
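The submit step can be sketched as a small client helper. Only the POST /analyze path and the job_id response come from the pipeline above; the injectable fetch function and response parsing are illustrative assumptions:

```typescript
// Sketch: submit a clip to the Modal analyze endpoint and return the job_id.
// The fetch abstraction and response shape here are assumptions; only the
// POST /analyze path and the returned job_id are documented above.
type FetchLike = (
  url: string,
  init?: { method?: string; body?: unknown }
) => Promise<{ ok: boolean; json(): Promise<any> }>;

export async function submitClip(
  fetchFn: FetchLike,
  baseUrl: string,
  body: unknown
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/analyze`, { method: "POST", body });
  if (!res.ok) throw new Error("analyze request failed");
  const { job_id } = await res.json();
  return job_id;
}
```

Injecting the fetch function keeps the helper testable without a live Modal endpoint.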

3. Processing

The Modal backend runs the SurfVision model which:

  • Detects and tracks individual surfers across frames
  • Classifies actions/maneuvers per surfer per frame
  • Groups detected actions into wave rides
  • Generates an annotated output video with overlays
  • Re-encodes the output to H.264 via FFmpeg for browser playback
  • Produces a structured JSON stats file

Progress is written every 30 frames during analysis. After frame analysis completes, the backend writes a "Re-encoding video..." stage at 98% during the FFmpeg re-encode, then "Complete" at 100% before saving the final output. This ensures the UI always reflects the current processing stage rather than appearing stuck.
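The staged progress reporting above can be sketched as one helper that maps pipeline state to a progress record. The backend itself runs on Modal in Python; this TypeScript sketch only illustrates the logic, and the field names (pct, stage) are assumptions -- the 30-frame interval, the 98% re-encode stage, and the 100% completion come from the docs:

```typescript
// Sketch of the staged progress writes described above. Field names are
// assumptions; the 30-frame cadence and the 98%/100% stages are documented.
interface Progress { pct: number; stage: string }

export function progressFor(
  frame: number,
  totalFrames: number,
  phase: "analyzing" | "reencoding" | "complete"
): Progress | null {
  if (phase === "reencoding") return { pct: 98, stage: "Re-encoding video..." };
  if (phase === "complete") return { pct: 100, stage: "Complete" };
  // During analysis, only emit an update every 30 frames (and on the final
  // frame), capped below the re-encode stage so progress never runs backwards.
  if (frame % 30 !== 0 && frame !== totalFrames) return null;
  const pct = Math.min(97, Math.floor((frame / totalFrames) * 97));
  return { pct, stage: "Analyzing frames" };
}
```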

4. Polling

The frontend polls for progress:

GET $NEXT_PUBLIC_MODAL_API_URL/status/{job_id}

For multi-clip sessions, all clips are polled in parallel every 3 seconds using Promise.allSettled. The session detail page shows per-clip processing indicators with progress bars, stage labels, and ETA until each job completes.
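A single poll tick over a multi-clip session can be sketched as below. Promise.allSettled is the documented mechanism; the status-fetcher signature and ClipStatus fields are assumptions for illustration:

```typescript
// Sketch: one polling tick for a multi-clip session. Each clip's status is
// fetched independently, and Promise.allSettled keeps one failed request from
// aborting the others. The status shape and field names are assumptions.
interface ClipStatus { jobId: string; pct: number; stage: string }

export async function pollTick(
  jobIds: string[],
  fetchStatus: (jobId: string) => Promise<ClipStatus>
): Promise<Map<string, ClipStatus | Error>> {
  const results = await Promise.allSettled(jobIds.map(fetchStatus));
  const out = new Map<string, ClipStatus | Error>();
  results.forEach((r, i) => {
    out.set(jobIds[i], r.status === "fulfilled" ? r.value : new Error(String(r.reason)));
  });
  return out;
}
```

In the app, a tick like this would run on a 3-second interval until every job reports complete.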

5. CDN Transfer (Storage Caching)

Once analysis completes, a centralized CDN queue manages downloading processed videos and uploading them to Supabase Storage:

  • All callers (poll loop, page-load recovery, manual retry) push to a single FIFO queue backed by refs
  • Concurrency is limited to MAX_CDN_CONCURRENCY = 2 workers globally -- no competing queues
  • Local GPU clips: triggerClipStoreLocal downloads the video from the local server into the browser and uploads to Supabase Storage
  • Cloud (Modal) clips: triggerClipStoreCloud calls /api/clips/store which downloads from Modal server-side and uploads to Supabase Storage
  • Automatic retry: transient failures (timeouts, network errors) are retried up to 3 times with exponential backoff (2s, 4s, 8s) before marking cdn_error
  • Stats are fetched alongside the video and saved to the stats_json column on the clip/session record
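The queue behavior above can be sketched as a single FIFO with a global concurrency cap and exponential-backoff retries. MAX_CDN_CONCURRENCY and the 2s/4s/8s schedule come from the docs; the class shape and injectable delay (used to make retries testable) are assumptions:

```typescript
// Sketch of the centralized CDN transfer queue: one FIFO, a global concurrency
// cap of 2, and up to 3 retries with exponential backoff before giving up
// (where the real code would mark cdn_error). Class structure is illustrative.
const MAX_CDN_CONCURRENCY = 2;
const BACKOFF_MS = [2000, 4000, 8000]; // documented retry schedule

type Task = () => Promise<void>;

export class CdnQueue {
  private queue: Task[] = [];
  private active = 0;

  constructor(
    private delay: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
  ) {}

  push(task: Task): void {
    this.queue.push(task);
    this.pump();
  }

  private pump(): void {
    // Start tasks until the global concurrency cap is reached.
    while (this.active < MAX_CDN_CONCURRENCY && this.queue.length > 0) {
      const task = this.queue.shift()!;
      this.active++;
      this.run(task).finally(() => {
        this.active--;
        this.pump();
      });
    }
  }

  private async run(task: Task): Promise<void> {
    for (let attempt = 0; ; attempt++) {
      try {
        await task();
        return;
      } catch {
        if (attempt >= BACKOFF_MS.length) return; // retries exhausted: mark cdn_error here
        await this.delay(BACKOFF_MS[attempt]);
      }
    }
  }
}
```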

6. Streaming

Processed videos are served through the /api/video/stream proxy endpoint, which supports byte-range requests for efficient scrubbing. The proxy allowlists Modal, Supabase, and Frame.io hosts.
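Byte-range support means the proxy must parse the Range request header. A minimal single-range parser looks like the sketch below; this is illustrative, not the proxy's actual code, and it ignores multi-range requests:

```typescript
// Sketch: parse a single "Range: bytes=start-end" header, as a stream proxy
// must to serve byte-range requests for scrubbing. Multi-range requests are
// not handled; this is illustrative only.
export function parseRange(
  header: string,
  size: number
): { start: number; end: number } | null {
  const m = /^bytes=(\d*)-(\d*)$/.exec(header.trim());
  if (!m || (m[1] === "" && m[2] === "")) return null;
  let start: number, end: number;
  if (m[1] === "") {
    // Suffix range: last N bytes, e.g. "bytes=-500"
    start = Math.max(0, size - Number(m[2]));
    end = size - 1;
  } else {
    start = Number(m[1]);
    end = m[2] === "" ? size - 1 : Math.min(Number(m[2]), size - 1);
  }
  return start <= end && start < size ? { start, end } : null;
}
```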

Local GPU clips bypass the stream proxy entirely -- the browser plays directly from localhost:{port}/result/{job_id}/video for immediate playback while CDN transfer runs in the background. Once the CDN transfer completes, playback switches to the Supabase Storage URL.

Access to the stream proxy is protected by HMAC-signed tokens -- clients call /api/video/sign (which requires authentication) to obtain a time-limited signed URL, then pass that to the stream proxy. The middleware bypass for /api/video/stream (required to avoid 431 errors caused by cookies on range requests) is compensated for by this token verification.
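HMAC-signed, time-limited URLs can be sketched with Node's crypto module. The token format (expires and sig query parameters) is an assumption -- the docs only state that /api/video/sign issues signed, expiring URLs:

```typescript
// Sketch of HMAC-signed, expiring stream URLs. The query-parameter token
// format is an assumption; only "HMAC-signed, time-limited" is documented.
import { createHmac, timingSafeEqual } from "node:crypto";

export function signUrl(
  url: string,
  secret: string,
  ttlSeconds: number,
  now = Date.now()
): string {
  const expires = Math.floor(now / 1000) + ttlSeconds;
  const sig = createHmac("sha256", secret).update(`${url}|${expires}`).digest("hex");
  return `${url}${url.includes("?") ? "&" : "?"}expires=${expires}&sig=${sig}`;
}

export function verifyUrl(
  url: string,
  expires: number,
  sig: string,
  secret: string,
  now = Date.now()
): boolean {
  if (Math.floor(now / 1000) > expires) return false; // token expired
  const expected = createHmac("sha256", secret).update(`${url}|${expires}`).digest("hex");
  const a = Buffer.from(sig, "hex");
  const b = Buffer.from(expected, "hex");
  // Constant-time comparison prevents timing attacks on the signature.
  return a.length === b.length && timingSafeEqual(a, b);
}
```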

AI Insights Panel

The AIInsightsPanel component renders the analysis results in five tabs:

Overview

  • Summary cards: waves detected, surfers tracked, actions classified, session duration
  • AI-generated text summary of the session
  • Top actions bar chart showing the most frequent maneuvers

Actions Detected

  • Expandable list of all classified actions
  • Categories: paddling, riding, turning, cutback, aerial, wipeout, duck dive, popup, floater, bottom turn, top turn, tube ride
  • Each action entry includes timestamp -- click to seek the video to that moment

Wave Analysis

  • Per-wave breakdown with surfer count, duration, and classified actions
  • Click-to-seek timestamps for each wave

Surfer Tracking

  • Per-surfer statistics: waves ridden, time in water, total actions
  • Links surfer track IDs to student profiles via the session_surfers table
  • Coaches can assign tracked surfers to specific students

Timeline

  • Chronological event timeline spanning the full session
  • Color-coded action markers
  • Click to seek to any event in the video

Multi-Clip Sessions

Sessions can contain multiple clips (e.g., different camera angles or time segments). The parseStats function in AIInsightsPanel aggregates data across all clips, merging action counts, wave data, surfer tracking, and timeline events into a unified view.
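The aggregation can be sketched in the spirit of parseStats. The real function also merges wave data, surfer tracks, and timeline events; this sketch shows only action counts and totals, and the merge rules (summing waves, taking the max surfer count across clips) are assumptions:

```typescript
// Sketch of cross-clip aggregation in the spirit of parseStats. Merge rules
// here are assumptions: wave counts sum, surfer counts take the max (the same
// surfers may appear in several clips), and action counts accumulate by type.
interface ClipStats {
  total_waves: number;
  total_surfers: number;
  actions: { type: string }[];
}

export function aggregateClips(clips: ClipStats[]) {
  const actionCounts: Record<string, number> = {};
  let totalWaves = 0;
  let maxSurfers = 0;
  for (const clip of clips) {
    totalWaves += clip.total_waves;
    maxSurfers = Math.max(maxSurfers, clip.total_surfers);
    for (const a of clip.actions) {
      actionCounts[a.type] = (actionCounts[a.type] ?? 0) + 1;
    }
  }
  return { totalWaves, maxSurfers, actionCounts };
}
```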

Stats JSON Structure

The AI backend returns a JSON payload with this general structure:

{
  "summary": "AI-generated text summary...",
  "duration_seconds": 180,
  "total_waves": 12,
  "total_surfers": 3,
  "actions": [
    {
      "type": "cutback",
      "surfer_id": 1,
      "start_ms": 45200,
      "end_ms": 47800,
      "confidence": 0.92
    }
  ],
  "waves": [
    {
      "wave_number": 1,
      "start_ms": 10000,
      "end_ms": 25000,
      "surfers": [1, 2],
      "actions": [...]
    }
  ],
  "surfer_tracks": [
    {
      "track_id": 1,
      "waves_ridden": 5,
      "total_actions": 18,
      "time_in_water_seconds": 120
    }
  ],
  "timeline": [
    {
      "timestamp_ms": 45200,
      "type": "cutback",
      "surfer_id": 1
    }
  ]
}
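The payload maps naturally to a set of TypeScript types. These are inferred from the example above, not from a published schema, so real payloads may carry additional or optional fields:

```typescript
// Types mirroring the example stats payload. Inferred from the sample above,
// not an official schema; field optionality may differ in practice.
interface SurfAction {
  type: string;
  surfer_id: number;
  start_ms: number;
  end_ms: number;
  confidence: number;
}

interface Wave {
  wave_number: number;
  start_ms: number;
  end_ms: number;
  surfers: number[];
  actions: SurfAction[];
}

interface SurferTrack {
  track_id: number;
  waves_ridden: number;
  total_actions: number;
  time_in_water_seconds: number;
}

interface TimelineEvent {
  timestamp_ms: number;
  type: string;
  surfer_id: number;
}

export interface SessionStats {
  summary: string;
  duration_seconds: number;
  total_waves: number;
  total_surfers: number;
  actions: SurfAction[];
  waves: Wave[];
  surfer_tracks: SurferTrack[];
  timeline: TimelineEvent[];
}
```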