SIEM Operations SOP
| Field | Value |
|---|---|
| Document ID | SOP-006 |
| Classification | Internal |
| Owner | SRE / SecOps |
| Approved By | CTO (interim CISO) |
| Effective Date | April 2026 |
| Review Cycle | Quarterly |
| Parent Standard | Information Logging (STD-011) · Log Management (STD-012) |
| Related SOPs | Incident Response (SOP-004) · CERT-In Compliance (SOP-002) · Vulnerability Management (SOP-009) |
Role note: The CISO role is currently pending formal appointment. Until then the CTO acts as interim CISO for sign-offs referenced in this SOP.
1. Purpose
Procedures for operating Wealthy’s Security Information and Event Management (SIEM) platform: detection, triage, response, rule tuning, and evidence generation for regulatory audits.
2. Scope
Applies to all security-monitoring activities across:
- GKE cluster (security namespace)
- GCP audit logs (all projects)
- AWS CloudWatch (integrated accounts)
- Employee endpoints (Mac, Windows, GCP Linux VMs) via Wazuh agents
- GitHub org audit events
- External threat intelligence feeds (OTX + AbuseIPDB)
3. Stack
Wealthy’s SIEM is Wazuh 4.14.4, self-hosted on GKE in the security namespace.
| Component | Purpose |
|---|---|
| Wazuh Manager | Rule processing, integrations, agent server |
| Wazuh Indexer | OpenSearch (3-node cluster): alert + log storage |
| Wazuh Dashboard | Web UI at https://wazuh.wealthy.systems |
| Wazuh Agents | Endpoint HIDS on ~20 laptops + GCP VMs |
| custom-ai binary | Triages level ≥10 alerts via Gemini Flash → opens GitHub Issue + Telegram + Slack |
| threatintel-sync binary | Pulls OTX + AbuseIPDB IOCs every 4 hours into Wazuh CDB lists |
| GCP Pub/Sub integration | Cloud audit logs → Wazuh |
4. Daily Operations
Owner: SRE on-call
Time: ~15 min each morning (start of IST business hours)
4.1 Dashboard walk-through
Open https://wazuh.wealthy.systems and review:
| Dashboard | What to check |
|---|---|
| Security Overview | 24h totals: total alerts, high-severity count, Fury failures, threat-intel matches, MITRE tactics |
| Threat Map | Geographic origin of attacks, top attacker IPs, AI Triage table, Threat Intel Matches table |
| CERT-In Compliance | 6-hour reporting queue (set time range to last 6h). Any entry in the list may need regulator notification |
| GCP Security | IAM changes by principal; watch for spikes from a single service account |
4.2 Agent health
kubectl exec -n security wazuh-manager-master-0 -- /var/ossec/bin/agent_control -ls
- All agents are expected to be Active
- Any agent Disconnected for more than 24h → ping the employee on Slack to reconnect
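The Disconnected check can be scripted. The following is a minimal sketch, assuming `agent_control -ls` prints comma-separated rows shaped like `id,name,ip,status,` (verify this against the deployed Wazuh version). `disconnected_agents` is a hypothetical helper name, not part of Wazuh. It does not evaluate the 24h age, which still has to be read from the agent's last keep-alive.

```shell
# Hypothetical helper: print the ID and name of every agent whose status
# column reads "Disconnected".
# Assumes CSV rows shaped like: 002,laptop-b,any,Disconnected,
disconnected_agents() {
  awk -F',' '$4 == "Disconnected" { printf "%s\t%s\n", $1, $2 }'
}
```

Usage, piping the command from this section into the helper: `kubectl exec -n security wazuh-manager-master-0 -- /var/ossec/bin/agent_control -ls | disconnected_agents`.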
4.3 Threat intel freshness
kubectl exec -n security wazuh-manager-master-0 -c wazuh-manager -- \
  sh -c 'wc -l /var/ossec/etc/lists/malicious-ioc/*'
The glob is wrapped in sh -c so that it expands inside the pod rather than in the local shell.
- Expected: ~10k IPs, ~1.2k domains, ~1.2k hashes
- If counts are 0 → check the threatintel-sync s6 service; the OTX/AbuseIPDB API key may have expired
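The zero-count check above can be automated by filtering the `wc -l` output. This is a sketch only: `empty_ioc_lists` is a hypothetical helper name, and it assumes list file paths contain no spaces.

```shell
# Hypothetical helper: read `wc -l` output on stdin and print the path of
# every list that has zero lines. The trailing "total" row that wc prints
# for multiple files is skipped.
empty_ioc_lists() {
  awk '$2 != "total" && $1 == 0 { print $2 }'
}
```

Usage: `kubectl exec -n security wazuh-manager-master-0 -c wazuh-manager -- sh -c 'wc -l /var/ossec/etc/lists/malicious-ioc/*' | empty_ioc_lists`. Empty output means every list has at least one entry.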
4.4 GitHub Issues triage queue
gh issue list --repo wealthy/security --label threat-alert --state open --limit 20
Walk through open threat-alert issues:
- True positive → relabel incident, follow Incident Response SOP (SOP-004)
- False positive → close with false-positive label + comment explaining; queue rule tuning (§6)
- Duplicate → close with reference to the open parent issue
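The three triage outcomes above map onto `gh issue edit` and `gh issue close`. The wrappers below are a hedged sketch: the function names and the `GH` override variable are hypothetical, and whether the threat-alert label should also be removed when relabeling is a policy choice this sketch leaves alone (it only adds labels).

```shell
GH="${GH:-gh}"            # override (e.g. GH=echo) to print commands instead of running them
REPO="wealthy/security"

# Confirmed true positive: add the incident label; SOP-004 takes over from here.
mark_incident() {
  "$GH" issue edit "$1" --repo "$REPO" --add-label incident
}

# False positive: label the issue, then close it with an explanatory comment.
mark_false_positive() {
  "$GH" issue edit "$1" --repo "$REPO" --add-label false-positive
  "$GH" issue close "$1" --repo "$REPO" --comment "$2"
}

# Duplicate: close with a pointer to the open parent issue.
mark_duplicate() {
  "$GH" issue close "$1" --repo "$REPO" --comment "Duplicate of #$2"
}
```

For example, `mark_false_positive 42 "Internal scanner, expected traffic"` labels and closes issue 42; the issue number and comment here are illustrative.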
5. Alert Response
Trigger: Wazuh emits an alert at level ≥ 10 → custom-ai runs automatically.
Wazuh alert
  └──▶ custom-ai binary
        ├─ Dedup check (rule+IP or rule+agent+desc, 6h window) → skip Gemini if duplicate
        ├─ Gemini Flash → returns priority (noise/low/medium/high/critical) + summary + recommended actions
        ├─ Write to OpenSearch (wazuh-ai-analysis-*)
        ├─ Create/comment GitHub Issue (medium+) → wealthy/security repo
        └─ Notify Telegram + Slack (non-noise)
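To make the dedup step in the flow above concrete, here is an illustrative sketch. It is not the custom-ai implementation: the function names are hypothetical, and the choice to key on rule+IP whenever a source IP is present (falling back to rule+agent+description otherwise) is an assumption about how the two key forms are selected.

```shell
DEDUP_WINDOW_SECONDS=$((6 * 60 * 60))   # the 6h window from the flow above

# Build the dedup key: rule+IP when a source IP exists (assumed precedence),
# otherwise rule+agent+description.
dedup_key() {
  rule_id="$1"; src_ip="$2"; agent="$3"; desc="$4"
  if [ -n "$src_ip" ]; then
    printf '%s|%s\n' "$rule_id" "$src_ip"
  else
    printf '%s|%s|%s\n' "$rule_id" "$agent" "$desc"
  fi
}

# Succeeds (exit 0) when the previous sighting of a key falls inside the window.
# Args: <last seen, epoch seconds> <now, epoch seconds>
is_duplicate() {
  last_seen_epoch="$1"; now_epoch="$2"
  [ $((now_epoch - last_seen_epoch)) -lt "$DEDUP_WINDOW_SECONDS" ]
}
```

An alert whose key was last seen one hour ago would be treated as a duplicate and skip the Gemini call; one last seen eight hours ago would be triaged again.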
5.1 Response SLA
| Priority (Gemini) | Response time | Channel |
|---|---|---|
| critical | Immediate: page on-call | Telegram + Slack + phone |
| high | Same business day | Telegram + Slack (#security-alerts) |
| medium | Within 3 days | Slack |
| low | Next sprint | GitHub Issue only |
| noise | None (no notification) | None |
5.2 CERT-In reportable check
For every critical / high alert, also check: does this fall under one of CERT-In’s 20 reportable incident categories? If yes, the 6-hour reporting clock starts at detection. See CERT-In Compliance (SOP-002).
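Because the clock starts at detection, it helps to write the deadline into the GitHub Issue as soon as an alert is judged reportable. A minimal sketch, assuming GNU date (as found on Linux hosts) and an ISO 8601 detection timestamp; `certin_deadline` is a hypothetical helper name.

```shell
# Hypothetical helper: print the CERT-In reporting deadline (detection + 6h) in UTC.
# Requires GNU date. Input example: 2026-04-01T10:00:00Z or 2026-04-01T15:30:00+05:30
certin_deadline() {
  detected_epoch="$(date -u -d "$1" +%s)" || return 1
  date -u -d "@$((detected_epoch + 6 * 3600))" +%Y-%m-%dT%H:%M:%SZ
}
```

The timestamp is converted to epoch seconds before adding the six hours, which avoids GNU date misreading a trailing "+ 6 hours" as a timezone offset.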
6. Rule Tuning
6.1 When to tune
| Trigger | Action |
|---|---|
| False positive ratio on a specific rule > 10% over a week | Adjust threshold or add suppression in configmap |
| New detection gap found during VAPT | Add Wazuh custom rule, PR to wazuh/manager/*-rules-configmap.yaml |
| Post-incident RCA identifies missed signal | Add rule + retest, log in ISRMC minutes |
| New integration / service deployed | Add rules for its logs |
6.2 Where rules live
| Layer | Location |
|---|---|
| Wazuh built-in ruleset | Shipped with the image; do not modify |
| Fury auth rules (100100-100105) | wazuh/manager/fury-rules-configmap.yaml in wealthy/security |
| Threat intel match rules (100200-100205) | wazuh/manager/threatintel-rules-configmap.yaml |
| Ad-hoc suppressions | Add to configmap.yaml in <rule_exclude> |
6.3 Deploying rule changes
# From wealthy/security repo
kubectl apply -f wazuh/manager/<changed-configmap>.yaml
kubectl rollout restart statefulset/wazuh-manager-master -n security
Validate with /var/ossec/bin/wazuh-logtest inside the manager pod before committing to master.
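The deployment steps above can be wrapped so that a failure stops the sequence. This is a sketch under stated assumptions: `deploy_rules` and the `KUBECTL` override are hypothetical names, the client-side dry run and the `rollout status` wait are additions not described in this SOP, and the 300s timeout is an arbitrary starting value.

```shell
KUBECTL="${KUBECTL:-kubectl}"   # override (e.g. KUBECTL=echo) to print commands only

# Hypothetical wrapper: validate the manifest client-side, apply it, restart
# the manager, then wait for the rollout. Stops at the first failing step.
# Arg: path to the changed configmap YAML.
deploy_rules() {
  configmap="$1"
  "$KUBECTL" apply --dry-run=client -f "$configmap" &&
  "$KUBECTL" apply -f "$configmap" &&
  "$KUBECTL" rollout restart statefulset/wazuh-manager-master -n security &&
  "$KUBECTL" rollout status statefulset/wazuh-manager-master -n security --timeout=300s
}
```

The dry run only catches malformed YAML; rule logic still needs the wazuh-logtest validation described above.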
7. Playbooks
7.1 SSH brute force
- Rule 5763 / Fury 100105 fires → level 10 alert → AI triage → GitHub Issue + Telegram + Slack
- On-call:
  - Confirm source IP on Threat Map dashboard
  - Block source IP at Kong gateway (or Cloudflare WAF if web-facing)
  - Check whether any login succeeded from that IP (search rule.groups: authentication_success)
  - If success → escalate to incident, begin Incident Response SOP (SOP-004)
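The successful-login check in this playbook can also be run as an OpenSearch query rather than a dashboard search. A sketch, assuming the alerts carry the usual Wazuh fields `rule.groups` and `data.srcip` and that both are indexed as keywords; `auth_success_query` is a hypothetical helper name.

```shell
# Hypothetical helper: print an OpenSearch query body that matches successful
# authentications from a single source IP.
# Arg: source IP address.
auth_success_query() {
  printf '{"query":{"bool":{"filter":[{"term":{"rule.groups":"authentication_success"}},{"term":{"data.srcip":"%s"}}]}}}\n' "$1"
}
```

The output can be pasted into the Dashboard's Dev Tools console as the body of a search against wazuh-alerts-*. Zero hits means no recorded successful login from that IP within the 30-day Wazuh window.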
7.2 Fury auth brute force + known-bad IP
- Rule 100202 fires (level 13, auto-elevated because IP matches threat intel)
- Treat as critical by default; this is almost certainly an attack, not user error
- Block at Kong + Cloudflare immediately
- Investigate whether the targeted account username has been leaked elsewhere (pwned passwords check)
7.3 GCP IAM anomaly
- GCP Security dashboard shows spike from a single principal
- Check gcloud logging read for the raw events
- If principal is a service account → possible key compromise
- Rotate the SA key immediately
- Check for unauthorized resource creation/modification
- Begin Incident Response SOP (SOP-004) if evidence of compromise
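For the raw-event step in this playbook, the filter passed to `gcloud logging read` can be built per principal. A minimal sketch: `principal_filter` is a hypothetical helper name, and restricting the search to Admin Activity audit logs is an assumption about what the investigation needs.

```shell
# Hypothetical helper: build a Cloud Logging filter for Admin Activity
# audit-log entries written by one principal.
# Arg: principal email (user or service account).
principal_filter() {
  printf 'logName:"cloudaudit.googleapis.com%%2Factivity" AND protoPayload.authenticationInfo.principalEmail="%s"\n' "$1"
}
```

Usage: `gcloud logging read "$(principal_filter SA_EMAIL)" --freshness=1d --limit=100 --format=json`, where SA_EMAIL is the principal under investigation; widen `--freshness` if the spike started earlier.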
7.4 Threat intel IOC match
- Rules 100200-100205 or native 99901-99908 fire → AI triage categorizes
- If IOC is on AbuseIPDB (confidence ≥ 90) → block the IP
- If IOC is on OTX (named campaign) → investigate whether related indicators (domains, hashes) are also present
- Add context to the GitHub Issue: which pulse/campaign matched
7.5 Vulnerability detector CVE
- Wazuh vuln detector creates alert; custom-ai opens GitHub Issue
- Verify the affected package + version is genuinely installed (Wazuh has version-parsing quirks, e.g. Hoppscotch 26.3.1 vs 2026.3.1)
- If real: check upstream vendor advisory, identify affected agents, schedule patch per Patch Management SOP (SOP-005)
- If false positive: document in the GitHub Issue, add rule suppression if recurring
7.6 Repeated noise / tuning needed
- Same rule firing frequently without actionable signal → tune
- Label GitHub Issue false-positive and link to rule change PR
- Measure FP rate for Annexure-N metrics (§8)
8. Metrics for Annexure-N (SEBI half-yearly)
Tracked metrics feeding the SOC Functional Efficacy Report:
| Metric | Target | Source |
|---|---|---|
| Log ingestion latency | < 5 minutes | Wazuh queue + agent Last keep alive |
| Threat intel processing time | < 60 minutes | threatintel-sync 4-hour refresh → avg 2h lag |
| Rule firing count (24h) | Baseline trend | Wazuh Dashboard metric |
| Dead rules (zero fires / period) | < 20% of total | Wazuh API rule.firedtimes |
| False positive rate | Track monthly | GitHub Issues with false-positive label / total threat-alert |
| False negative rate | Per-incident | Post-incident reviews flagging missed alerts |
| Critical system agent coverage | 100% | agent_control -ls |
Half-yearly: export the above → fill in Annexure-N template → ISRMC sign-off → SEBI submission.
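The false positive rate in the table above is a ratio of two issue counts, so it can be computed directly once both counts are known. A minimal sketch; `fp_rate_percent` is a hypothetical helper name.

```shell
# Hypothetical helper: false-positive rate as a percentage with one decimal.
# Args: <issues labeled false-positive> <total threat-alert issues>
# Prints "n/a" when there are no threat-alert issues in the period.
fp_rate_percent() {
  awk -v fp="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * fp / total
  }'
}
```

One way to obtain each count is `gh issue list --repo wealthy/security --label false-positive --state all --limit 1000 --json number --jq length` (and the same with `--label threat-alert`); the limit of 1000 is an assumption and must exceed the number of issues in the period. For example, 3 false positives out of 40 alerts gives `fp_rate_percent 3 40`, which prints 7.5.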
9. Access Control
| Role | Wazuh Dashboard access |
|---|---|
| admin | Full admin (SRE / SecOps / CTO) |
| analyst | Read-only: can view dashboards, search events, no config changes |
| kibanaserver | Internal service account (do not touch) |
Account creation / change via OpenSearch Security API; see wazuh/README.md in the security repo. Access reviewed quarterly per Quarterly Access Review.
10. Log Retention
| Data | Retention | Hot / queryable | Location |
|---|---|---|---|
| Application logs (all services) | 2 years | All hot, fully queryable throughout | GCP Cloud Logging |
| GCP audit logs (Admin Activity, IAM, etc.) | 2 years | All hot | GCP Cloud Logging _Default bucket |
| AWS CloudWatch logs | 2 years | All hot | Per-log-group retention |
| Wazuh alert index (wazuh-alerts-*) | 30 days rolling | Hot | OpenSearch ISM policy wazuh-rollover-delete |
| Wazuh AI analysis index (wazuh-ai-analysis-*) | 30 days rolling | Hot | Same ISM policy |
No cold / archive tier. All logs are directly queryable for the full retention window: no restore-from-glacier hop, no tiered lookup. GCP Cloud Logging’s Logs Explorer returns any event within the 2-year window in seconds.
Canonical long-term audit trail lives in GCP (satisfies IRDAI/SEBI/CERT-In retention). Wazuh indexer holds a 30-day rolling window for fast security-specific queries; the full 2-year archive is in GCP.
11. Incident Reporting Evidence
Every incident produces a GitHub Issue in wealthy/security:
| Label | Meaning |
|---|---|
| threat-alert | Auto-created by custom-ai (all level 10+ alerts) |
| incident | Relabeled when confirmed a real incident |
| false-positive | Closed as not-an-incident (tuning queue) |
| contained | Closed with resolution; audit evidence |
| reported-to-cert-in / reported-to-sebi / reported-to-irdai | Regulator submissions filed |
Auditor query: gh issue list --repo wealthy/security --label incident --state closed --limit 100
12. Escalation Matrix
| Signal | First responder | Escalates to | Timeline |
|---|---|---|---|
| Level 10-12 alert, non-critical rule | SRE on-call | SRE Lead | 30 min |
| Level 13+ alert | SRE on-call | CTO (interim CISO) | 15 min |
| Suspected breach (data exfil, unauthorized access to PII) | CTO | CEO | 30 min |
| CERT-In reportable event | CTO | CERT-In PoC | Within 6h (regulatory) |
| SEBI reportable | CTO | SEBI PoC | Within 6h |
| DPDP data breach | CTO | DPO | Within 72h |
13. Related Documents
- Incident Response SOP (SOP-004): once an alert becomes an incident
- CERT-In Compliance (SOP-002): regulatory timelines
- Patch Management SOP (SOP-005): fixing vulnerabilities detected by Wazuh
- Vulnerability Management SOP (SOP-009): broader vuln handling
- Information Logging Standard (STD-011): logging requirements
- Log Management Standard (STD-012): retention and access
- Security repo (wazuh/README.md, docs/wazuh/): detailed operator reference
14. Review
Reviewed quarterly by the CTO (interim CISO) + SRE Lead. Changes recorded in ISRMC minutes. Next scheduled review: Q3 2026.
Contact: security@wealthy.in