# Interactive Remote Control — Scoping Plan

*Status:* Planned, not implemented. ~3-6 weeks of focused work.
*Owner:* TBD
*Last updated:* 2026-05-06

## Why this exists

The TATER agent currently supports two adjacent capabilities that together fall short of a true remote-assistance experience:

1. **Remote PowerShell/bash command execution** (Phase 2 — shipped) — admins queue scripts that run as SYSTEM or as the logged-in user. Output captured. Good for "run this fix and report back."
2. **One-way screen capture** (Phase 3 — shipped 2026-05-06) — the agent uploads a screenshot of the primary display every 60 seconds. Admins watch in TATER Manage's Multi-Screen Viewer. View-only.

What's missing for parity with ScreenConnect / TeamViewer / RDP / NinjaOne Remote / Splashtop:

- **Real-time framebuffer streaming** at interactive frame rates (15-30 fps, not one frame per minute)
- **Bidirectional input injection** — admin's mouse and keyboard drive the user's session
- **Multi-monitor support** with per-monitor selection
- **Clipboard sharing** between admin browser and remote endpoint
- **File transfer** through the channel
- **Audio passthrough** (optional but expected by some users)

## What we'd build

### Architecture: WebRTC data channel + framebuffer codec

The cleanest implementation for a Go agent + browser admin UI is WebRTC, because:
- Browsers have native `RTCPeerConnection` — no plugin or extension needed in TATER Manage
- WebRTC handles NAT traversal (ICE, with STUN/TURN) and encrypts all traffic (DTLS for data channels, DTLS-SRTP for media)
- Go has a mature WebRTC library: [`pion/webrtc`](https://github.com/pion/webrtc)
- A single peer connection can carry video (framebuffer), data channels (input events, clipboard), and optionally audio

### Components

```
Admin browser (TATER Manage)               TATER Agent (endpoint)
┌─────────────────────────┐                ┌────────────────────────┐
│ <video>                 │                │ pion/webrtc.PeerConn   │
│   ←  framebuffer        │ ←── WebRTC ──→ │ ↳ video track          │
│ canvas captures input   │   (UDP/STUN)   │   (kbinani capture     │
│ ↓                       │                │    + h.264/vp8 encode) │
│ data channel: input     │                │ ↳ data channel: input  │
│   ↳ pointer + key events│                │   ↳ inject via Win32   │
│ data channel: clipboard │                │     SendInput / X11    │
│ data channel: file xfer │                │     XTestFakeInput     │
└─────────────────────────┘                └────────────────────────┘
        ▲                                            ▲
        │   signaling (offer/answer/ICE)             │
        └────────────────── API ─────────────────────┘
                  POST /api/agents/{deviceId}/rtc/signal
```

### Required new APIs

| Endpoint | Purpose |
|---|---|
| `POST /api/agents/{deviceId}/rtc/sessions` | Admin opens a new session — returns sessionId + ICE servers config |
| `POST /api/agents/{deviceId}/rtc/signal` | Bidirectional signaling — admin and agent exchange SDP offer/answer + ICE candidates via the API as relay |
| `DELETE /api/agents/{deviceId}/rtc/sessions/{id}` | Tear down the session |
| `GET /api/agents/{deviceId}/rtc/sessions` | SuperAdmin only: list active sessions for audit |
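
To make the relay role of the `rtc/signal` endpoint concrete, here is a minimal Go sketch of an envelope that both the admin and the agent might POST. The `SignalMessage` type, its field names, and the `kind` values are hypothetical and not an existing TATER schema; only the payload kinds (SDP offer/answer and ICE candidates) come from the table above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// SignalMessage is a hypothetical envelope for
// POST /api/agents/{deviceId}/rtc/signal. Field names are assumptions.
type SignalMessage struct {
	SessionID string `json:"sessionId"`
	From      string `json:"from"`                // "admin" or "agent"
	Kind      string `json:"kind"`                // "offer", "answer", or "candidate"
	SDP       string `json:"sdp,omitempty"`       // set for offer/answer
	Candidate string `json:"candidate,omitempty"` // set for ICE candidates
}

// Validate rejects envelopes the relay should not forward.
func (m SignalMessage) Validate() error {
	if m.SessionID == "" {
		return errors.New("missing sessionId")
	}
	if m.From != "admin" && m.From != "agent" {
		return errors.New("from must be admin or agent")
	}
	switch m.Kind {
	case "offer", "answer":
		if m.SDP == "" {
			return errors.New("offer/answer requires sdp")
		}
	case "candidate":
		if m.Candidate == "" {
			return errors.New("candidate message requires candidate")
		}
	default:
		return fmt.Errorf("unknown kind %q", m.Kind)
	}
	return nil
}

// DecodeSignal parses a JSON request body and validates it.
func DecodeSignal(body []byte) (SignalMessage, error) {
	var m SignalMessage
	if err := json.Unmarshal(body, &m); err != nil {
		return SignalMessage{}, err
	}
	return m, m.Validate()
}

func main() {
	body := []byte(`{"sessionId":"s-1","from":"admin","kind":"offer","sdp":"v=0"}`)
	m, err := DecodeSignal(body)
	fmt.Println(m.Kind, err)
}
```

Validating at the relay keeps malformed or unexpected messages from ever reaching the agent, which fits the requirement that the API is the only path into a session.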

### Required new agent capabilities

- **Framebuffer encoder** — capture at 15-30 fps and encode as H.264 (x264 software encode, sent through Pion's H264 packetizer) or as VP8 for portability without licensing complications
- **Input injection** —
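  - Windows: `user32.dll!SendInput` for mouse + keyboard. A service running as SYSTEM lives in session 0, and `SendInput` only reaches the caller's own desktop, so injection needs session-context handling (for example, a helper process in the user's interactive session).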
  - Linux X11: `XTestFakeKeyEvent` / `XTestFakeButtonEvent` (libxtst). Wayland would need a different approach (xdg-desktop-portal RemoteDesktop interface).
  - macOS: CoreGraphics `CGEventCreateMouseEvent` / `CGEventCreateKeyboardEvent` (CGO required).
- **Clipboard bridge** — read/write via clipboard libs per platform.
- **Session manager** — coordinate the WebRTC peer connection, multiple data channels, and graceful teardown.
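
One way to keep the per-platform injection code behind a single seam is an interface that the session manager calls, with `SendInput`, XTest, and CoreGraphics backends implementing it. The sketch below is an assumption about how that could look; `InputInjector`, `InputEvent`, `Dispatch`, and `recordingInjector` are hypothetical names, and the recording implementation is a test double that stands in for a real backend.

```go
package main

import (
	"errors"
	"fmt"
)

// InputInjector is a hypothetical seam between the session manager and the
// platform backends (SendInput on Windows, XTest on X11, CGEvent on macOS).
type InputInjector interface {
	MovePointer(x, y int) error
	Button(button int, down bool) error
	Key(keyCode int, down bool) error
}

// InputEvent is an illustrative, already-decoded event from the input channel.
type InputEvent struct {
	Type string // "move", "button", or "key"
	X, Y int    // pointer position in remote pixels (for "move")
	Code int    // button number or key code
	Down bool   // press (true) or release (false)
}

// Dispatch routes one event to the injector. Unknown types are rejected
// rather than ignored, so protocol drift surfaces as an error.
func Dispatch(inj InputInjector, ev InputEvent) error {
	switch ev.Type {
	case "move":
		return inj.MovePointer(ev.X, ev.Y)
	case "button":
		return inj.Button(ev.Code, ev.Down)
	case "key":
		return inj.Key(ev.Code, ev.Down)
	default:
		return errors.New("unknown input event type: " + ev.Type)
	}
}

// recordingInjector logs calls instead of injecting; useful in unit tests.
type recordingInjector struct{ calls []string }

func (r *recordingInjector) MovePointer(x, y int) error {
	r.calls = append(r.calls, fmt.Sprintf("move %d,%d", x, y))
	return nil
}

func (r *recordingInjector) Button(b int, down bool) error {
	r.calls = append(r.calls, fmt.Sprintf("button %d %t", b, down))
	return nil
}

func (r *recordingInjector) Key(k int, down bool) error {
	r.calls = append(r.calls, fmt.Sprintf("key %d %t", k, down))
	return nil
}

func main() {
	rec := &recordingInjector{}
	_ = Dispatch(rec, InputEvent{Type: "move", X: 640, Y: 360})
	_ = Dispatch(rec, InputEvent{Type: "key", Code: 65, Down: true})
	fmt.Println(rec.calls)
}
```

With this shape, the consent check can sit in front of `Dispatch`, so no backend is reachable until the end user has accepted.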

### Security model

- Agent must reject unsolicited sessions — the sessionId is issued only by the API, and the agent verifies its signature before accepting a connection
- All sessions audit-logged (`AgentRtcSession` container) with start/end timestamps, admin user oid, agent device id, session duration
- Optional **end-user consent prompt** on first connection — a UAC-style toast on the endpoint that the user must accept before input injection is allowed. Some enterprise security policies require it.
- Session timeout (default 60 min idle, 4h hard cap)
- Encryption in transit via WebRTC's mandatory DTLS (DTLS-SRTP for media) — no plain frames over the wire
- Recording option: record the session for compliance audit (encrypted at rest, retention configurable)
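
The agent-side checks in this list (signature verification and the two timeouts) can be sketched in Go as follows. The plan does not specify a signature scheme, so Ed25519 is an assumption chosen here because the agent then holds only a public key and cannot mint its own sessions. The token layout and the names `IssueToken`, `VerifyToken`, `SessionExpired`, and `demoKeys` are hypothetical.

```go
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

const (
	idleLimitSec = 60 * 60     // idle timeout from the plan: 60 min
	hardCapSec   = 4 * 60 * 60 // hard cap from the plan: 4 h
)

// IssueToken is what the API side might do: sign "sessionId|deviceId".
// Assumed layout: base64url(payload) + "." + base64url(signature).
// IDs are assumed not to contain '|'.
func IssueToken(priv ed25519.PrivateKey, sessionID, deviceID string) string {
	payload := []byte(sessionID + "|" + deviceID)
	sig := ed25519.Sign(priv, payload)
	enc := base64.RawURLEncoding
	return enc.EncodeToString(payload) + "." + enc.EncodeToString(sig)
}

// VerifyToken is the agent-side check: the signature must be valid and the
// token must name this device. It returns the sessionId on success.
func VerifyToken(pub ed25519.PublicKey, token, myDeviceID string) (string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 2 {
		return "", errors.New("malformed token")
	}
	enc := base64.RawURLEncoding
	payload, err1 := enc.DecodeString(parts[0])
	sig, err2 := enc.DecodeString(parts[1])
	if err1 != nil || err2 != nil {
		return "", errors.New("malformed token encoding")
	}
	if !ed25519.Verify(pub, payload, sig) {
		return "", errors.New("bad signature")
	}
	fields := strings.SplitN(string(payload), "|", 2)
	if len(fields) != 2 || fields[1] != myDeviceID {
		return "", errors.New("token not issued for this device")
	}
	return fields[0], nil
}

// SessionExpired applies both limits; all arguments are Unix seconds.
func SessionExpired(start, lastInput, now int64) bool {
	return now-lastInput >= idleLimitSec || now-start >= hardCapSec
}

// demoKeys generates a throwaway key pair for the example.
func demoKeys() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(nil) // nil selects crypto/rand
	if err != nil {
		panic(err)
	}
	return pub, priv
}

func main() {
	pub, priv := demoKeys()
	tok := IssueToken(priv, "sess-42", "device-7")
	id, err := VerifyToken(pub, tok, "device-7")
	fmt.Println(id, err)
	fmt.Println(SessionExpired(0, 3000, 3600))
}
```

A production token would also need an issue time or expiry inside the signed payload so that a captured token cannot be replayed later; that is left out here to keep the sketch short.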

### Browser-side admin UI

- New tab in TATER Manage device detail: **"Remote Session"**
- Click "Start session" → modal with framebuffer `<video>`, mouse/keyboard capture overlay, multi-monitor selector, clipboard send/receive buttons, file upload/download
- All input events are sent over the data channel as JSON with millisecond timestamps, so a recorded session can be replayed
- Show end-user consent status, session duration timer, and "End session" button
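
The JSON input events and the multi-monitor selector interact: the `<video>` element is rarely the same size as the remote display, so pointer positions have to be translated. A minimal sketch of the agent-side decode, assuming the browser sends pointer positions as 0..1 fractions of the video area, is shown below. The `WireEvent` field names, the `Monitor` type, and the normalized-coordinate convention are all assumptions, not a defined TATER protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// WireEvent is a hypothetical JSON shape for one input event.
type WireEvent struct {
	Type string  `json:"type"` // "move", "button", or "key"
	NX   float64 `json:"nx"`   // pointer X as a 0..1 fraction of the video width
	NY   float64 `json:"ny"`   // pointer Y as a 0..1 fraction of the video height
	Code int     `json:"code"` // button number or key code
	Down bool    `json:"down"` // press (true) or release (false)
	TsMs int64   `json:"tsMs"` // admin-side timestamp in milliseconds
}

// Monitor describes the remote display being streamed. X and Y are its
// origin within the virtual desktop, which matters for multi-monitor setups.
type Monitor struct{ X, Y, Width, Height int }

// DecodeEvent parses one data-channel message.
func DecodeEvent(raw []byte) (WireEvent, error) {
	var ev WireEvent
	err := json.Unmarshal(raw, &ev)
	return ev, err
}

// ToPixels maps a normalized pointer position onto the selected monitor,
// clamping so out-of-range input cannot land outside it.
func (m Monitor) ToPixels(nx, ny float64) (int, int) {
	clamp := func(v float64) float64 { return math.Max(0, math.Min(1, v)) }
	px := m.X + int(math.Round(clamp(nx)*float64(m.Width-1)))
	py := m.Y + int(math.Round(clamp(ny)*float64(m.Height-1)))
	return px, py
}

func main() {
	raw := []byte(`{"type":"move","nx":0.5,"ny":1.0,"tsMs":1778060000000}`)
	ev, err := DecodeEvent(raw)
	if err != nil {
		panic(err)
	}
	second := Monitor{X: 1920, Y: 0, Width: 1920, Height: 1080}
	x, y := second.ToPixels(ev.NX, ev.NY)
	fmt.Println(x, y, ev.TsMs)
}
```

Sending fractions instead of pixels means the browser never needs to know the remote resolution, and switching monitors only changes which `Monitor` the agent applies.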

## Effort estimate

| Phase | Work | Estimate |
|---|---|---|
| 1 | Pion/webrtc agent integration + signaling API + session manager | 1 week |
| 2 | Framebuffer capture+encode loop (Windows GDI, Linux X11; macOS deferred) | 1 week |
| 3 | Input injection (Windows SendInput, Linux XTest) | 4-5 days |
| 4 | Browser UI for admin (video element + canvas overlay + data channels) | 1 week |
| 5 | Clipboard + file transfer over data channel | 3-4 days |
| 6 | End-user consent prompt + session recording + audit hardening | 4-5 days |
| 7 | Multi-monitor + Wayland support | 1 week (deferred) |
| 8 | macOS support (CGO + CoreGraphics + accessibility prompts) | 1 week (deferred) |

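**Total: ~3-6 weeks** for a production-grade Win/Linux interactive remote-control feature, depending on whether multi-monitor/Wayland/macOS are in scope for v1. Summed serially, phases 1-6 come to roughly 26-29 working days (a little under 6 weeks), and the deferred phases 7-8 add about 2 more weeks, so the low end of the range only holds if phases overlap.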

## Why we haven't done it yet

1. **The market is crowded with mature alternatives.** ScreenConnect, TeamViewer, RDP, AnyDesk, Splashtop, Apple Remote Desktop, NinjaOne Remote, Atera — all are in regular daily use. The feature is *expected*, not *novel*.
2. **TATER's wedge isn't remote control.** TATER's value is compliance scanning, automated remediation, GRC, and federal pipeline. Adding a mediocre remote-control feature wouldn't meaningfully shift positioning vs. the dedicated tools.
3. **The two capabilities we do ship cover most cases.** Remote scripted execution plus one-way screen view handle most "I just need to see what's on the screen and run a fix" situations without the operational and legal overhead of real input injection (consent prompts, session recording for audit, NIST 800-53 AC-17/AT-3 documentation, etc.).
4. **The right time to build it** is when a customer is willing to pay for it specifically, OR when the federal/DoD pipeline customers ask for it as part of an ATO compliance package and we can scope the consent + session-recording requirements correctly.

## Recommended interim positioning

Per `Docs/TATER-Complementary-Stack.md`, TATER positions ScreenConnect / TeamViewer / NinjaOne Remote as tools to **complement, not replace**. Customers typically have a remote-support tool already; TATER doesn't need to displace it. The two capabilities we ship (scripted execution + screen view) handle the diagnostic and remediation work TATER agents do daily; the customer's existing remote-support tool handles the genuinely interactive sessions.

## Decision log

- **2026-05-06** — Phase 3 one-way screen capture shipped. Interactive remote control deferred. This document captures the scoping plan; revisit when a customer specifically requests it or when the broader market position shifts.