SIEM Operations SOP

Day-to-day operation of Wealthy’s SIEM (Wazuh) — dashboard reviews, alert response, rule tuning, playbooks, and Annexure-N metrics.

Field	Value
Document ID	SOP-006
Classification	Internal
Owner	SRE / SecOps
Approved By	CTO (interim CISO)
Effective Date	April 2026
Review Cycle	Quarterly
Parent Standard	Information Logging (STD-011) · Log Management (STD-012)
Related SOPs	Incident Response (SOP-004) · CERT-In Compliance (SOP-002) · Vulnerability Management (SOP-009)

Role note: The CISO role is currently pending formal appointment. Until then the CTO acts as interim CISO for sign-offs referenced in this SOP.

1. Purpose

Procedures for operating Wealthy’s Security Information and Event Management (SIEM) platform — detection, triage, response, rule tuning, and evidence generation for regulatory audits.

2. Scope

Applies to all security-monitoring activities across:

GKE cluster (security namespace)
GCP audit logs (all projects)
AWS CloudWatch (integrated accounts)
Employee endpoints (Mac, Windows, GCP Linux VMs) via Wazuh agents
GitHub org audit events
External threat intelligence feeds (OTX + AbuseIPDB)

3. Stack

Wealthy’s SIEM is Wazuh 4.14.4, self-hosted on GKE in the security namespace.

Component	Purpose
Wazuh Manager	Rule processing, integrations, agent server
Wazuh Indexer	OpenSearch (3-node cluster) — alert + log storage
Wazuh Dashboard	Web UI at `https://wazuh.wealthy.systems`
Wazuh Agents	Endpoint HIDS on ~20 laptops + GCP VMs
`custom-ai` binary	Triages level ≥10 alerts via Gemini Flash → opens GitHub Issue + Telegram + Slack
`threatintel-sync` binary	Pulls OTX + AbuseIPDB IOCs every 4 hours into Wazuh CDB lists
GCP Pub/Sub integration	Cloud audit logs → Wazuh

4. Daily Operations

Owner: SRE on-call
Time: ~15 min each morning (start of IST business hours)

4.1 Dashboard walk-through

Open https://wazuh.wealthy.systems and review:

Dashboard	What to check
Security Overview	24h totals: total alerts, high-severity count, Fury auth failures, threat-intel matches, MITRE tactics
Threat Map	Geographic origin of attacks, top attacker IPs, AI Triage table, Threat Intel Matches table
CERT-In Compliance	6-hour reporting queue (set time range to last 6h). Any entry in the list may need regulator notification
GCP Security	IAM changes by principal — watch for spikes from a single service account

4.2 Agent health

kubectl exec -n security wazuh-manager-master-0 -- /var/ossec/bin/agent_control -ls

All agents are expected to be Active
Any agent Disconnected for more than 24h → ping the employee on Slack to reconnect
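
The check above can be scripted. The sketch below filters the listing for agents that are not Active. It assumes the CSV rows are shaped like ID,Name,IP,Status, and the function name list_inactive_agents is hypothetical. Verify the column order against real output before relying on it.

```shell
# Print "<id> <name> <status>" for every agent whose status is not Active.
# Assumes CSV rows shaped like "ID,Name,IP,Status," on stdin (an assumption;
# confirm against real `agent_control -ls` output).
list_inactive_agents() {
  awk -F',' 'NF >= 4 && $4 !~ /^Active/ { print $1 " " $2 " " $4 }'
}
```

Possible usage: kubectl exec -n security wazuh-manager-master-0 -- /var/ossec/bin/agent_control -ls | list_inactive_agents. The CSV listing carries no keep-alive timestamp, so the 24h threshold still has to be checked separately.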

4.3 Threat intel freshness

kubectl exec -n security wazuh-manager-master-0 -c wazuh-manager -- \
  sh -c 'wc -l /var/ossec/etc/lists/malicious-ioc/*'

Expected: ~10k IPs, ~1.2k domains, ~1.2k hashes
If any count is 0 → check the threatintel-sync s6 service; the OTX or AbuseIPDB API key may have expired
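
To turn the zero-count check into something that can fail a cron job or CI step, a filter along these lines may help. It is a minimal sketch: the function name check_ioc_counts is hypothetical, and it only detects empty lists, not counts that are merely lower than expected.

```shell
# Read `wc -l` output on stdin and print any list file that has zero lines.
# Exits non-zero when at least one list is empty. The "total" summary row
# that wc prints for multiple files is ignored.
check_ioc_counts() {
  awk '$2 != "total" && $1 == 0 { print "EMPTY: " $2; bad = 1 } END { exit bad + 0 }'
}
```

Pipe the output of the wc -l command above into check_ioc_counts; a non-zero exit status means at least one list needs attention.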

4.4 GitHub Issues triage queue

gh issue list --repo wealthy/security --label threat-alert --state open --limit 20

Walk through open threat-alert issues:

True positive → relabel as incident, follow Incident Response SOP (SOP-004)
False positive → close with false-positive label + comment explaining; queue rule tuning (§6)
Duplicate → close with reference to the open parent issue
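
The three outcomes map onto a small set of gh commands. The helper below only prints them, so they can be reviewed before being run. It is a sketch, not an established team script: the function name triage_cmds and the comment wording are hypothetical, and whether the threat-alert label is also removed on relabel is not specified here, so the sketch leaves it in place.

```shell
# Print (do not run) the gh commands for one triage outcome.
# Usage: triage_cmds <true-positive|false-positive|duplicate> <issue> [parent]
triage_cmds() {
  repo="wealthy/security"
  case "$1" in
    true-positive)
      echo "gh issue edit $2 --repo $repo --add-label incident" ;;
    false-positive)
      echo "gh issue edit $2 --repo $repo --add-label false-positive"
      echo "gh issue close $2 --repo $repo --comment \"False positive: <explain why>\"" ;;
    duplicate)
      echo "gh issue close $2 --repo $repo --comment \"Duplicate of #$3\"" ;;
    *)
      echo "unknown outcome: $1" >&2; return 1 ;;
  esac
}
```

For example, triage_cmds duplicate 43 40 prints a close command that references parent issue #40.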

5. Alert Response

Trigger: Wazuh emits alert at level ≥ 10 → custom-ai is invoked automatically.

Wazuh alert
  └─▶ custom-ai binary
        ├─ Dedup check (rule+IP or rule+agent+desc, 6h window) — skip Gemini if duplicate
        ├─ Gemini Flash — returns priority (noise/low/medium/high/critical) + summary + recommended actions
        ├─ Write to OpenSearch (wazuh-ai-analysis-*)
        ├─ Create/comment GitHub Issue (medium+) — wealthy/security repo
        └─ Notify Telegram + Slack (non-noise)
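
The dedup step in the flow above can be illustrated as follows. This is not the custom-ai implementation, only a sketch of the stated rule (rule+IP, or rule+agent+description, within 6 hours). The state-file format and the function names dedup_key and seen_recently are assumptions. In this sketch the window is measured from the last sighting that was recorded, and duplicates are not re-recorded.

```shell
# Build the dedup key: rule+IP when a source IP exists, else rule+agent+desc.
dedup_key() {  # args: rule_id src_ip agent description
  if [ -n "$2" ]; then
    printf '%s|%s\n' "$1" "$2"
  else
    printf '%s|%s|%s\n' "$1" "$3" "$4"
  fi
}

# Return 0 if the key was seen within the last 6 hours; otherwise record it
# and return 1. State file lines are "<epoch-seconds> <key>" (assumed format).
seen_recently() {  # args: state_file key now_epoch
  if KEY="$2" awk -v now="$3" -v w=21600 '
       { ts = $1; line = $0; sub(/^[0-9]+ /, "", line)
         if (line == ENVIRON["KEY"] && now - ts < w) found = 1 }
       END { exit !found }' "$1" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$3" "$2" >> "$1"
  return 1
}
```

A caller would skip the Gemini request whenever seen_recently returns 0.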

5.1 Response SLA

Priority (Gemini)	Response time	Channel
`critical`	Immediate — page on-call	Telegram + Slack + phone
`high`	Same business day	Telegram + Slack (`#security-alerts`)
`medium`	Within 3 days	Slack
`low`	Next sprint	GitHub Issue only
`noise`	— (no notification)	—
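
For scripts that route notifications, the table above can be encoded as a lookup. The sketch below uses a hypothetical function name (sla_for) and an arbitrary "response|channels" output format; the values themselves come from the table.

```shell
# Map a triage priority to "<response time>|<channels>" per the SLA table.
sla_for() {
  case "$1" in
    critical) echo "immediate, page on-call|telegram slack phone" ;;
    high)     echo "same business day|telegram slack" ;;
    medium)   echo "within 3 days|slack" ;;
    low)      echo "next sprint|github-issue" ;;
    noise)    echo "none|none" ;;
    *)        echo "unknown priority: $1" >&2; return 1 ;;
  esac
}
```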

5.2 CERT-In reportable check

For every critical / high alert, also check: does this fall under one of CERT-In’s 20 reportable incident categories? If yes — the 6-hour reporting clock starts at detection. See CERT-In Compliance (SOP-002).
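
Because the clock starts at detection, it helps to compute the time remaining rather than estimate it. A minimal sketch, assuming both timestamps are available as Unix epoch seconds (the function name cert_in_seconds_left is hypothetical):

```shell
# Seconds left before the 6-hour CERT-In reporting deadline.
# Args: detection time and current time, both as Unix epoch seconds.
# A negative result means the deadline has already passed.
cert_in_seconds_left() {
  echo $(( $1 + 6 * 3600 - $2 ))
}
```

Typical call: cert_in_seconds_left "$detected_epoch" "$(date +%s)", where detected_epoch is taken from the alert timestamp.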

6. Rule Tuning

6.1 When to tune

Trigger	Action
False positive ratio on a specific rule > 10% over a week	Adjust threshold or add suppression in configmap
New detection gap found during VAPT	Add Wazuh custom rule, PR to `wazuh/manager/*-rules-configmap.yaml`
Post-incident RCA identifies missed signal	Add rule + retest, log in ISRMC minutes
New integration / service deployed	Add rules for its logs
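
The 10% false-positive trigger in the table above is a simple ratio test. The sketch below applies it with integer arithmetic; the function name needs_tuning is hypothetical, and the two counts are assumed to come from the weekly GitHub Issue labels for a single rule.

```shell
# Return 0 (tune) when false positives exceed 10% of a rule's weekly alerts.
# Integer math avoids floating point: fp/total > 10%  <=>  fp * 10 > total.
needs_tuning() {  # args: false_positive_count total_alert_count
  [ "$2" -gt 0 ] && [ $(( $1 * 10 )) -gt "$2" ]
}
```

Exactly 10% does not trigger tuning, matching the "> 10%" wording of the table.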

6.2 Where rules live

Layer	Location
Wazuh built-in ruleset	Shipped with the image — do not modify
Fury auth rules (100100-100105)	`wazuh/manager/fury-rules-configmap.yaml` in `wealthy/security`
Threat intel match rules (100200-100205)	`wazuh/manager/threatintel-rules-configmap.yaml`
Ad-hoc suppressions	Add to `configmap.yaml` in `<rule_exclude>`
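
When an alert cites a custom rule ID, the ranges above identify which file to open. A small lookup sketch follows; the function name rule_source is hypothetical, and IDs outside the two documented ranges are reported as such instead of guessed.

```shell
# Map a custom rule ID to the configmap that defines it, per the table above.
rule_source() {
  if [ "$1" -ge 100100 ] && [ "$1" -le 100105 ]; then
    echo "wazuh/manager/fury-rules-configmap.yaml"
  elif [ "$1" -ge 100200 ] && [ "$1" -le 100205 ]; then
    echo "wazuh/manager/threatintel-rules-configmap.yaml"
  else
    echo "not in a documented custom range"
  fi
}
```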

6.3 Deploying rule changes

# From wealthy/security repo
kubectl apply -f wazuh/manager/<changed-configmap>.yaml
kubectl rollout restart statefulset/wazuh-manager-master -n security

Validate with /var/ossec/bin/wazuh-logtest inside the manager pod before committing to master.

7. Playbooks

7.1 SSH brute force

Rule 5763 / fury 100105 fires → level 10 alert → AI triage → GitHub Issue + Telegram + Slack
On-call:
- Confirm source IP on Threat Map dashboard
- Block source IP at Kong gateway (or Cloudflare WAF if web-facing)
- Check whether any login succeeded from that IP (search rule.groups: authentication_success)
- If success → escalate to incident, begin Incident Response SOP (SOP-004)

7.2 Fury auth brute force + known-bad IP

Rule 100202 fires (level 13, auto-elevated because IP matches threat intel)
Treat as critical by default — this is almost certainly an attack, not user error
Block at Kong + Cloudflare immediately
Investigate whether the targeted account username has been leaked elsewhere (pwned passwords check)

7.3 GCP IAM anomaly

GCP Security dashboard shows spike from a single principal
Check gcloud logging read for the raw events
If principal is a service account → possible key compromise
- Rotate the SA key immediately
- Check for unauthorized resource creation/modification
- Begin Incident Response SOP (SOP-004) if evidence of compromise
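
For the gcloud logging read step, one possible filter restricts results to Admin Activity audit entries from the suspect principal. This is a sketch under assumptions: the function name iam_filter is hypothetical, and the filter may need widening (for example to Data Access logs) depending on what the dashboard spike shows.

```shell
# Build a Cloud Logging filter for Admin Activity audit entries by one principal.
iam_filter() {  # arg: principal email
  printf 'logName:"cloudaudit.googleapis.com%%2Factivity" AND protoPayload.authenticationInfo.principalEmail="%s"\n' "$1"
}
```

Possible usage: gcloud logging read "$(iam_filter "$PRINCIPAL")" --project "$PROJECT" --freshness=24h --limit=200 --format=json, where PRINCIPAL and PROJECT are placeholders set by the operator.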

7.4 Threat intel IOC match

Rules 100200-100205 or native 99901-99908 fire → AI triage categorizes
If IOC is on AbuseIPDB (confidence ≥ 90) → block the IP
If IOC is on OTX (named campaign) → investigate whether related indicators (domains, hashes) are also present
Add context to the GitHub Issue: which pulse/campaign matched
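
The decision in this playbook can be written down as a small function. The sketch below is illustrative only: the function name ioc_action is hypothetical, and the "review" result for AbuseIPDB scores below 90 or for unknown sources is an assumption, since the playbook does not say what to do in those cases.

```shell
# Decide the first action for an IOC match.
# Args: source ("abuseipdb" or "otx") and, for AbuseIPDB, the confidence (0-100).
ioc_action() {
  case "$1" in
    abuseipdb)
      if [ "${2:-0}" -ge 90 ]; then echo "block"; else echo "review"; fi ;;
    otx)
      echo "investigate-related-indicators" ;;
    *)
      echo "review" ;;  # assumed fallback, not stated in the playbook
  esac
}
```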

7.5 Vulnerability detector CVE

Wazuh vuln detector creates alert; custom-ai opens GitHub Issue
Verify the affected package + version is genuinely installed (Wazuh has version-parsing quirks — e.g. Hoppscotch 26.3.1 vs 2026.3.1)
If real: check upstream vendor advisory, identify affected agents, schedule patch per Patch Management SOP (SOP-005)
If false positive: document in the GitHub Issue, add rule suppression if recurring

7.6 Repeated noise / tuning needed

Same rule firing frequently without actionable signal → tune
Label GitHub Issue false-positive and link to rule change PR
Measure FP rate for Annexure-N metrics (§8)

8. Metrics for Annexure-N (SEBI half-yearly)

Tracked metrics feeding the SOC Functional Efficacy Report:

Metric	Target	Source
Log ingestion latency	< 5 minutes	Wazuh queue + agent `Last keep alive`
Threat intel processing time	< 60 minutes	`threatintel-sync` 4-hour refresh → avg 2h lag
Rule firing count (24h)	Baseline trend	Wazuh Dashboard metric
Dead rules (zero fires / period)	< 20% of total	Wazuh API `rule.firedtimes`
False positive rate	Track monthly	GitHub Issues with `false-positive` label / total threat-alert
False negative rate	Per-incident	Post-incident reviews flagging missed alerts
Critical system agent coverage	100%	`agent_control -ls`

Half-yearly: export the above → fill in Annexure-N template → ISRMC sign-off → SEBI submission.
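
As an example of deriving one of these figures, the dead-rule share can be computed from a list of rule IDs and fire counts. The sketch assumes the counts have already been exported as "<rule_id> <fired_count>" lines; that input format and the function name dead_rule_percent are assumptions, not the Wazuh API's output format.

```shell
# Read "<rule_id> <fired_count>" lines on stdin and print the percentage of
# rules with zero fires, rounded down. Prints 0 for empty input.
dead_rule_percent() {
  awk '{ total++; if ($2 == 0) dead++ }
       END { if (total) print int(dead * 100 / total); else print 0 }'
}
```

A result of 20 or higher misses the "< 20% of total" target in the table.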

9. Access Control

Role	Wazuh Dashboard access
`admin`	Full admin (SRE / SecOps / CTO)
`analyst`	Read-only — can view dashboards, search events, no config changes
`kibanaserver`	Internal service account (do not touch)

Accounts are created or changed through the OpenSearch Security API (see wazuh/README.md in the security repo). Access is reviewed quarterly under the Quarterly Access Review process.

10. Log Retention

Data	Retention	Hot / queryable	Location
Application logs (all services)	2 years	All hot — fully queryable throughout	GCP Cloud Logging
GCP audit logs (Admin Activity, IAM, etc.)	2 years	All hot	GCP Cloud Logging `_Default` bucket
AWS CloudWatch logs	2 years	All hot	Per-log-group retention
Wazuh alert index (`wazuh-alerts-*`)	30 days rolling	Hot	OpenSearch ISM policy `wazuh-rollover-delete`
Wazuh AI analysis index (`wazuh-ai-analysis-*`)	30 days rolling	Hot	Same ISM policy

No cold / archive tier. All logs are directly queryable for the full retention window — no restore-from-glacier hop, no tiered lookup. GCP Cloud Logging’s Logs Explorer returns any event within the 2-year window in seconds.

Canonical long-term audit trail lives in GCP (satisfies IRDAI/SEBI/CERT-In retention). Wazuh indexer holds a 30-day rolling window for fast security-specific queries; the full 2-year archive is in GCP.

11. Incident Reporting Evidence

Every incident produces a GitHub Issue in wealthy/security:

Label	Meaning
`threat-alert`	Auto-created by custom-ai (all level 10+ alerts)
`incident`	Relabeled when confirmed a real incident
`false-positive`	Closed as not-an-incident (tuning queue)
`contained`	Closed with resolution — audit evidence
`reported-to-cert-in` / `reported-to-sebi` / `reported-to-irdai`	Regulator submissions filed

Auditor query: gh issue list --repo wealthy/security --label incident --state closed --limit 100

12. Escalation Matrix

Signal	First responder	Escalates to	Timeline
Level 10-12 alert, non-critical rule	SRE on-call	SRE Lead	30 min
Level 13+ alert	SRE on-call	CTO (interim CISO)	15 min
Suspected breach (data exfil, unauthorized access to PII)	CTO	CEO	30 min
CERT-In reportable event	CTO	CERT-In PoC	Within 6h (regulatory)
SEBI reportable	CTO	SEBI PoC	Within 6h
DPDP data breach	CTO	DPO	Within 72h

13. Related Documents

Incident Response SOP (SOP-004) — once an alert becomes an incident
CERT-In Compliance (SOP-002) — regulatory timelines
Patch Management SOP (SOP-005) — fixing vulnerabilities detected by Wazuh
Vulnerability Management SOP (SOP-009) — broader vuln handling
Information Logging Standard (STD-011) — logging requirements
Log Management Standard (STD-012) — retention and access
Security repo: wazuh/README.md, docs/wazuh/ — detailed operator reference

14. Review

Reviewed quarterly by the CTO (interim CISO) + SRE Lead. Changes recorded in ISRMC minutes. Next scheduled review: Q3 2026.

Contact: security@wealthy.in
