docs/architecture/realtime-errors-observability.md
Realtime, Errors, And Observability
This slice covers WebSocket updates, support-reference errors, CloudWatch/Grafana observability, and cost visibility.
Realtime Updates
flowchart LR
subgraph Browser["Browser"]
NAV["DashboardSideNavigation"]
WSMGR["WebSocketManager"]
DETAIL["Current detail page"]
end
subgraph WSAPI["WebSocket API Gateway"]
CONNECT["$connect"]
DISCONNECT["$disconnect"]
ROUTES["Notifier posts"]
end
subgraph Lambdas["WebSocket Lambdas"]
WSC["ws-test-history-connect"]
WSD["ws-test-history-disconnect"]
WST["ws-test-history-notify"]
WSDOSE["ws-dose-history-notify"]
WSPROG["ws-program-history-notify"]
end
subgraph Streams["DynamoDB Streams"]
TESTH["env-test-history stream"]
DOSEH["env-dose-history stream"]
PROGH["env-program-history stream"]
end
subgraph Data["DynamoDB"]
CONN["env-websocket-connections"]
end
WSMGR --> CONNECT --> WSC --> CONN
WSMGR --> DISCONNECT --> WSD --> CONN
TESTH --> WST --> CONN
DOSEH --> WSDOSE --> CONN
PROGH --> WSPROG --> CONN
WST --> ROUTES --> WSMGR
WSDOSE --> ROUTES --> WSMGR
WSPROG --> ROUTES --> WSMGR
WSMGR --> NAV
WSMGR --> DETAIL

Support Reference Error Flow
sequenceDiagram
participant UI as Browser UI
participant API as REST API Gateway
participant Lambda as REST Lambda
participant Logs as CloudWatch Logs
participant ErrorTable as env-app-error
participant Admin as Admin errors widget
UI->>API: Request
API->>Lambda: Invoke
alt unexpected error
Lambda->>Lambda: Generate support reference id
Lambda->>Logs: Log reference id, request id, user id, tank id, exception
Lambda->>ErrorTable: Persist error details
Lambda-->>UI: Friendly error with reference id
UI-->>UI: Show sticky dismissible error banner
else success
Lambda-->>UI: Response
end
Admin->>ErrorTable: List recent errors
Admin->>ErrorTable: Acknowledge/delete selected error

Operations And Cost Observability
flowchart TB
subgraph Runtime["Runtime Resources"]
API["REST API Gateway"]
WS["WebSocket API Gateway"]
LAMBDA["Lambda functions"]
SQS["SQS queues"]
DDB["DynamoDB tables"]
S3["S3 buckets"]
CF["CloudFront"]
end
subgraph Telemetry["AWS Telemetry"]
LOGS["CloudWatch Logs"]
METRICS["CloudWatch Metrics"]
ALARMS["CloudWatch Alarms"]
SNS["SNS ops alerts"]
DASH["CloudWatch dashboard: env-reef-a-matic-operations"]
GRAFANA["Amazon Managed Grafana: env-reefamatic-ops"]
end
subgraph Cost["Cost Sources"]
CE["AWS Cost Explorer"]
TAGS["Environment + Module cost tags"]
USAGE["env-usage-event"]
ADMINCOST["Admin Usage & Costs tab"]
end
API --> LOGS
WS --> LOGS
LAMBDA --> LOGS
LAMBDA --> METRICS
SQS --> METRICS
DDB --> METRICS
S3 --> METRICS
CF --> METRICS
METRICS --> ALARMS --> SNS
LOGS --> DASH
METRICS --> DASH
LOGS --> GRAFANA
METRICS --> GRAFANA
TAGS --> CE
CE --> ADMINCOST
USAGE --> ADMINCOST

Notes
- Test, dose, and program streams use the new-and-old-images stream view (`NEW_AND_OLD_IMAGES`), so a delete record still carries the removed item and the client can drop the matching left-nav entry.
- The app shell applies WebSocket payloads directly and keeps a short fallback refresh to cover eventual consistency during async fan-out.
- Admin live-login status reads active WebSocket connection rows.
- Cost Explorer is filtered by `Environment` and grouped by AWS service and `Module` cost tag.
- Per-PDF cost attribution is persisted as usage events with a correlation id.
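
The first note, on why the streams carry old images, may be easier to follow with a small sketch. The snippet below is not the project's notifier code. It only illustrates how a handler could turn one DynamoDB stream record into a WebSocket payload, reading `OldImage` for `REMOVE` events and `NewImage` otherwise. The message shape (`type`, `item`), the `kind` argument, and the attribute names `id` and `tankId` are assumptions made for illustration. The record layout (`eventName`, `dynamodb.NewImage`, `dynamodb.OldImage`) follows the standard DynamoDB Streams event format.

```python
from typing import Any, Dict, Optional


def plain(image: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten DynamoDB attribute values such as {"S": "abc"}.

    Only S, N, and BOOL are handled in this sketch; other types are skipped.
    """
    out: Dict[str, Any] = {}
    for name, typed in image.items():
        if "S" in typed:
            out[name] = typed["S"]
        elif "N" in typed:
            out[name] = float(typed["N"])  # numbers arrive as strings
        elif "BOOL" in typed:
            out[name] = typed["BOOL"]
    return out


def to_message(record: Dict[str, Any], kind: str) -> Optional[Dict[str, Any]]:
    """Map one stream record to a hypothetical WebSocket payload.

    `kind` is an assumed label such as "test", "dose", or "program".
    Returns None when the record has no usable image.
    """
    event = record.get("eventName")
    data = record.get("dynamodb", {})

    if event == "REMOVE":
        # Deletes have no NewImage; the old image identifies what to remove.
        old = data.get("OldImage")
        if old is None:
            return None
        return {"type": f"{kind}-deleted", "item": plain(old)}

    if event in ("INSERT", "MODIFY"):
        new = data.get("NewImage")
        if new is None:
            return None
        return {"type": f"{kind}-upserted", "item": plain(new)}

    return None


# Example: a delete record from a stream with old images enabled.
removed = {
    "eventName": "REMOVE",
    "dynamodb": {"OldImage": {"id": {"S": "t-1"}, "tankId": {"S": "tank-9"}}},
}
message = to_message(removed, "test")
```

With a keys-only or new-image-only stream view, the `REMOVE` branch would have no item details to send, which is the situation the note describes avoiding. In the diagram above, the notifier Lambdas would then post a payload like this to the connections stored in `env-websocket-connections`.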