TATER Ops — Script Library

Author once, run anywhere. The Script Library lets technicians publish reusable PowerShell or Bash scripts and execute them on up to 500 target devices via the TATER Agent, with full per-target stdout/stderr capture and aggregated job status. It is MCP-addressable, so AI agents can drive the full lifecycle without leaving the chat.

What it is

The Script Library lives in TATER Ops at Operations → Script Library. It is built on top of the same TATER Agent command channel that powers Phase 2 Interactive Remote Control, so no agent-side changes are required — the agent's existing 30-second poll loop picks up new commands automatically.

Each script entry stores:

  • Name & description — what the script does, what it expects
  • Language — PowerShell (Windows) or Bash (Linux / macOS)
  • Risk level — Low / Medium / High. High scripts surface a confirmation banner in the run dialog.
  • Default run-as — system (LOCAL SYSTEM context) or user (logged-in interactive user)
  • Default timeout — 5 to 3600 seconds
  • Version — auto-increments on every save. Old versions are retained on the job records that ran them.
  • Tags / category — for browsing
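
The fields above can be pictured as a small record with validation. This is only a sketch: the class name, field names, lowercase enum values, and the 300-second default are assumptions for illustration, not the actual TATER schema. Only the 5 to 3600 second range, the language and risk options, and the bump-on-save behaviour come from the list above.

```python
from dataclasses import dataclass, replace

LANGUAGES = {"powershell", "bash"}
RISK_LEVELS = {"low", "medium", "high"}
RUN_AS = {"system", "user"}


@dataclass(frozen=True)
class ScriptEntry:
    """Hypothetical in-memory shape of a library script."""
    name: str
    description: str
    language: str
    body: str
    risk_level: str = "low"
    run_as: str = "system"
    timeout_seconds: int = 300  # assumed default; only the 5..3600 range is documented
    version: int = 1
    tags: tuple = ()

    def __post_init__(self):
        if self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk_level}")
        if self.run_as not in RUN_AS:
            raise ValueError(f"unknown run-as context: {self.run_as}")
        if not 5 <= self.timeout_seconds <= 3600:
            raise ValueError("timeout must be between 5 and 3600 seconds")


def save(entry, **changes):
    """Return an edited copy with the version incremented, as on every save."""
    return replace(entry, **changes, version=entry.version + 1)
```

Because `replace` re-runs the validation, an edit that sets an out-of-range timeout fails the same way a new entry would.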

Running a script

  1. Open Operations → Script Library and click ▶ Run on a script row.
  2. Paste a list of target hostnames (one per line, lowercase, max 500).
  3. Optionally override the run-as context or timeout for this execution.
  4. Confirm. TATER creates one OpsScriptJob document plus one AgentCommand per target. Each agent picks up its command on the next poll.
  5. The job-detail modal opens automatically and auto-polls every 5 seconds while any target is still Pending or Running. Per-target status, exit code, stdout, and stderr stream in as commands complete.
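
Steps 2 to 4 amount to normalizing the pasted list and fanning out one command per target. The sketch below illustrates that shape under stated assumptions: the function names and document fields are hypothetical, and lowercasing and de-duplicating the input is a guess at how the form treats it. Only the 500-target cap, the OpsScriptJob and AgentCommand document types, and the Pending status come from the text.

```python
import uuid

MAX_TARGETS = 500


def parse_targets(pasted):
    """Turn pasted text into a clean hostname list (one per line, max 500)."""
    hosts = [line.strip().lower() for line in pasted.splitlines()]
    # dict.fromkeys removes duplicates while preserving the pasted order
    hosts = list(dict.fromkeys(h for h in hosts if h))
    if not hosts:
        raise ValueError("at least one target hostname is required")
    if len(hosts) > MAX_TARGETS:
        raise ValueError(f"too many targets: {len(hosts)} > {MAX_TARGETS}")
    return hosts


def create_job(script_id, script_version, targets, run_as, timeout_seconds):
    """Build one job document plus one command document per target."""
    job_id = str(uuid.uuid4())
    job = {
        "id": job_id,
        "type": "OpsScriptJob",
        "scriptId": script_id,
        "scriptVersion": script_version,
        "targetCount": len(targets),
    }
    commands = [
        {
            "id": str(uuid.uuid4()),
            "type": "AgentCommand",
            "jobId": job_id,
            "hostname": host,
            "runAs": run_as,
            "timeoutSeconds": timeout_seconds,
            "status": "Pending",  # picked up on the agent's next poll
        }
        for host in targets
    ]
    return job, commands
```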

HIGH-risk script warning: Scripts marked riskLevel = high show an amber warning banner in the Run modal that calls out the risk level explicitly. Review the script body and target list before clicking Execute.

Downloading scripts

Every script row in the Library tab has a ⬇ download button between Run and Edit. It saves the script body as a .ps1 (PowerShell) or .sh (Bash) file with a filename derived from the script's name.

For bulk downloads, use the multi-selector:

  1. Each row has a leading checkbox. The header also has a Select all checkbox.
  2. Once one or more scripts are selected, a selection bar appears above the table showing N selected alongside two buttons: ⬇ Download as Zip and Clear.
  3. Clicking Download as Zip fetches each selected script in parallel and packages them into tater-scripts-YYYYMMDD.zip. Filename collisions are deduplicated automatically (script.ps1, script-2.ps1, etc.).

The zip path uses JSZip loaded lazily from cdn.jsdelivr.net on first use (already permitted by the app CSP). No new API endpoints are involved — the bulk operation reuses GET /api/ops/scripts/:id per selection.
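
The naming rules can be illustrated outside the browser. The sketch below is not the JSZip code path; it only mirrors the documented behaviour (extension by language, -2, -3 suffixes on collision, dated archive name). The slug rule in `script_filename` is an assumption, since the exact name-to-filename mapping is not documented here.

```python
import re
from datetime import date

EXTENSIONS = {"powershell": ".ps1", "bash": ".sh"}


def script_filename(name, language):
    """Derive a filename from the script name (assumed slug rule)."""
    stem = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-") or "script"
    return stem + EXTENSIONS[language]


def dedupe(filenames):
    """Resolve collisions as script.ps1, script-2.ps1, script-3.ps1, ..."""
    used, result = set(), []
    for filename in filenames:
        stem, _, ext = filename.rpartition(".")
        candidate, n = filename, 2
        while candidate in used:
            candidate = f"{stem}-{n}.{ext}"
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


def zip_name(today):
    """Archive name in the tater-scripts-YYYYMMDD.zip form."""
    return f"tater-scripts-{today:%Y%m%d}.zip"
```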

Reviewing results

The Recent Jobs tab is split into two sortable, searchable sections:

  • Active Jobs — jobs in Queued, Running, or Paused state. These can still be paused, resumed, cancelled, or killed.
  • Inactive Jobs — terminal-state jobs (Completed, Cancelled, Failed). Read-only.

Each section has independent sort state: click any column header (Script, Targets, Status, By, When) to sort. The default is When descending (newest first). A search box at the top filters across script name, status, person, and job id with a 200 ms debounce. The page fetches up to 200 jobs.
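
As a rough model of the split, search, and sort behaviour, assuming each job is a dict with hypothetical keys `id`, `script`, `targets`, `status`, `by`, and `when` (an ISO 8601 timestamp string):

```python
ACTIVE_STATES = {"Queued", "Running", "Paused"}
SEARCH_FIELDS = ("script", "status", "by", "id")


def split_jobs(jobs):
    """Partition jobs into (active, inactive) by status."""
    active = [j for j in jobs if j["status"] in ACTIVE_STATES]
    inactive = [j for j in jobs if j["status"] not in ACTIVE_STATES]
    return active, inactive


def search_jobs(jobs, query):
    """Case-insensitive substring match over script name, status, person, job id."""
    q = query.strip().lower()
    if not q:
        return list(jobs)
    return [j for j in jobs if any(q in str(j[f]).lower() for f in SEARCH_FIELDS)]


def sort_jobs(jobs, column="when", descending=True):
    """Sort one section; the default mirrors When descending (newest first)."""
    return sorted(jobs, key=lambda j: j[column], reverse=descending)
```

Each section would call `sort_jobs` with its own column and direction, which is what keeps the two sort states independent.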

Click any row to open the detail modal:

  • KPI strip — live counts that update every 5 seconds while in-flight
  • Per-target table — device, status pill, exit code, expandable stdout and stderr (clipped to ~64 KB per stream)
  • Auto-poll — stops automatically once all commands terminate, or if you navigate away from the modal
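
The auto-poll and clipping behaviour can be sketched as follows. The callables `fetch_commands`, `on_update`, and `modal_open` are hypothetical stand-ins for the UI plumbing, and treating "~64 KB" as exactly 65,536 UTF-8 bytes is an assumption.

```python
import time

IN_FLIGHT = {"Pending", "Running"}
MAX_STREAM_BYTES = 64 * 1024  # assumed exact value for "~64 KB"


def clip(text, limit=MAX_STREAM_BYTES):
    """Clip a stream to at most `limit` UTF-8 bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence
    return data[:limit].decode("utf-8", errors="ignore")


def poll_job(fetch_commands, on_update, modal_open=lambda: True,
             interval=5.0, sleep=time.sleep):
    """Refresh every `interval` seconds until no command is Pending or Running.

    Returns the final command list, or None if the modal was closed first.
    """
    while modal_open():
        commands = fetch_commands()
        on_update(commands)
        if not any(c["status"] in IN_FLIGHT for c in commands):
            return commands
        sleep(interval)
    return None
```

Injecting `sleep` and `modal_open` keeps the loop testable without real delays.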

Job records have a 180-day TTL in Cosmos — long enough for compliance and audit lookback, capped to keep storage costs predictable.
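
Cosmos DB expresses TTL in seconds, counted from an item's last-modified timestamp, so the 180-day window works out as below. Whether TATER sets this per container or per item is not stated here; the constant and function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

JOB_TTL_DAYS = 180
JOB_TTL_SECONDS = JOB_TTL_DAYS * 24 * 60 * 60  # 15,552,000 seconds


def expires_at(last_modified):
    """Approximate expiry time for a job record last written at `last_modified`."""
    return last_modified + timedelta(seconds=JOB_TTL_SECONDS)
```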

Job lifecycle controls — Cancel vs. Kill vs. Pause/Resume

Active jobs surface three distinct lifecycle actions, each with different semantics. The action bar above the Active section explains them in one line; pick carefully:

Action | Pending children (not yet claimed by agent) | Running children (subprocess in flight) | When to use
⏸ Pause / ▶ Resume | Held. The agent's poll endpoint filters out paused commands, so dispatch halts immediately. Resume re-allows them. | Continue naturally to completion. Pause is dispatch-only. | Halt a rollout mid-wave so you can spot-check the first few completions before continuing the rest.
✕ Cancel | Dropped from dispatch queue. Marked Cancelled immediately. | Continue to completion. Agent subprocess is not signaled. | You've fanned out to 200 devices, 5 are already running, and you want to keep those 5 going but never start the remaining 195.
⛔ Kill | Dropped from dispatch queue. | Agent receives a kill signal within ~5 seconds (the runner polls GET /agents/commands/:id/check every 5s during execution; on killRequested=true, the subprocess context cancels and the OS process terminates: SIGKILL on Linux/macOS, TerminateProcess on Windows). | Script is misbehaving and must stop immediately on every endpoint. Partial state changes are not rolled back.

Kill requires agent v2.1.8+

The kill capability needs the Go agent (v2.1.8 or newer) on each endpoint to poll the new self-check endpoint. Older agents will respect Cancel (Pending dropped) and Pause (dispatch halts) immediately, but Kill won't terminate Running scripts until the agent is upgraded. Check agent versions in Manage → Endpoint Fleet → Agent Versions.
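
The three actions and the version gate can be summarized as a small state transition. This is a sketch of the semantics described above, not the server implementation: the child fields (`paused`, `killEffective`, `agentVersion`) are hypothetical, and marking Pending children Cancelled on Kill is an assumption (the text only says they are dropped from the dispatch queue).

```python
KILL_MIN_VERSION = (2, 1, 8)


def version_tuple(version):
    """Parse '2.1.8' or 'v2.1.8' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def apply_action(action, children):
    """Return updated copies of a job's child commands after a lifecycle action."""
    updated = []
    for child in children:
        child = dict(child)
        status = child["status"]
        if action == "pause":
            if status == "Pending":
                child["paused"] = True  # held back from dispatch; Running untouched
        elif action == "resume":
            if status == "Pending":
                child["paused"] = False
        elif action in ("cancel", "kill"):
            if status == "Pending":
                child["status"] = "Cancelled"  # assumption for Kill, documented for Cancel
            elif status == "Running" and action == "kill":
                child["killRequested"] = True
                # Older agents never poll the check endpoint, so the flag has no effect
                child["killEffective"] = version_tuple(child["agentVersion"]) >= KILL_MIN_VERSION
        else:
            raise ValueError(f"unknown action: {action}")
        updated.append(child)
    return updated


def dispatchable(children):
    """Commands an agent poll may still claim."""
    return [c for c in children if c["status"] == "Pending" and not c.get("paused")]
```

Note that Cancel leaves Running children untouched, which is the only difference from Kill in this model.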

Audit trail

Every script CRUD action and every job creation fires an entry in the TATER audit log:

  • OpsScript:create / update / delete — with name, language, risk level, version
  • OpsScriptJob:create — with target count, script id, script name, risk level

Audit entries record the actor (UPN, OID), the channel (via: web, mcp, copilot, claude, api), and a millisecond-resolution timestamp. SIEM forwarding picks these up via the standard CEF / webhook channel if you have it configured.
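
For orientation, an audit entry with the fields described above might be assembled like this. The dictionary layout and key names are assumptions; only the action names, the actor identifiers, the channel values, and the millisecond timestamp resolution come from the text.

```python
from datetime import datetime, timezone

CHANNELS = {"web", "mcp", "copilot", "claude", "api"}


def audit_entry(action, upn, oid, via, details, now=None):
    """Build a hypothetical audit record, e.g. for 'OpsScriptJob:create'."""
    if via not in CHANNELS:
        raise ValueError(f"unknown channel: {via}")
    now = now or datetime.now(timezone.utc)
    return {
        "action": action,
        "actor": {"upn": upn, "oid": oid},
        "via": via,
        # timespec="milliseconds" truncates to millisecond resolution
        "timestamp": now.isoformat(timespec="milliseconds"),
        "details": details,
    }
```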

Permissions

Action | Required role
Browse the library / read script bodies | Auditor (not Viewer, because script bodies may contain hardcoded creds or internal hostnames)
List job history / read job results | Auditor
Create / update / delete a script | Admin
Execute a script against target devices | Admin
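
The permission table reduces to a minimum-role lookup. The sketch below assumes roles are strictly ordered Viewer < Auditor < Admin, which the table implies but does not state, and the action keys are made-up labels for the four rows.

```python
ROLE_RANK = {"Viewer": 0, "Auditor": 1, "Admin": 2}  # assumed ordering

REQUIRED_ROLE = {
    "scripts:read": "Auditor",   # browse the library / read script bodies
    "jobs:read": "Auditor",      # list job history / read job results
    "scripts:write": "Admin",    # create / update / delete a script
    "scripts:execute": "Admin",  # execute against target devices
}


def is_allowed(role, action):
    """True if `role` meets the minimum role required for `action`."""
    return ROLE_RANK[role] >= ROLE_RANK[REQUIRED_ROLE[action]]
```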

MCP integration

Seven MCP tools expose the full lifecycle so an AI agent can configure, target, execute, and collect — closing the loop on agent-driven remediation:

Tool | Purpose
list_scripts | Browse the library; returns name, description, language, risk, version, tags
get_script | Read full content + parameters + version history
create_script | Author a new library script (Admin)
update_script | Edit an existing script; bumps version (Admin)
execute_script | Run with target spec + parameter values, returns a job ID (Admin)
get_script_job_status | Poll job progress: per-target state, exit codes, output
list_script_jobs | Browse recent executions in the active org

End-to-end flow an agent can drive without human intervention: find a failing control via get_failing_controls → identify a remediation script via list_scripts → preview affected devices → execute_script → poll get_script_job_status → add_evidence_comment linking the run to the control. create_change_request is a recommended companion call for fleet-impacting executions so the run ties into change control.
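
On the wire, MCP tool invocations are JSON-RPC 2.0 requests with the method tools/call. The sketch below builds the execute and poll requests from the flow above. The tool names come from the table, but the argument names (`scriptId`, `targets`, `jobId`) are assumptions, since the tools' input schemas are not shown here.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name, arguments):
    """Build an MCP tools/call request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def execute_request(script_id, targets):
    """Request for execute_script; argument names are hypothetical."""
    return tool_call("execute_script", {"scriptId": script_id, "targets": targets})


def status_request(job_id):
    """Request for get_script_job_status; argument name is hypothetical."""
    return tool_call("get_script_job_status", {"jobId": job_id})


if __name__ == "__main__":
    print(json.dumps(execute_request("script-123", ["pc-01", "pc-02"]), indent=2))
```

An MCP client library would normally construct these envelopes itself; the sketch only shows what an agent is ultimately sending.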

Phase 2 deferred

The current Phase 1 MVP focuses on the core CRUD + execute + status loop. The following are explicitly deferred:

  • Approval gating — second-person sign-off required before HIGH-risk scripts run
  • Scheduling / recurrence — run now / run at time / run on cron
  • Staged rollout — run on N% of targets first, pause, review, continue
  • Dynamic-query targeting — target by OS / OU / compliance state instead of explicit hostname list
  • Authoring sandbox — test on a single target before promoting to the library
  • Syntax highlighting in the editor