2025-09-25: Pritunl VPN IP Change Incident

Root cause analysis for the unexpected Pritunl VPN IP change on September 25, 2025.

Incident Summary

Date & Time

  • Date: September 25, 2025
  • Time: Around 2:30 AM IST
  • Duration: Approximately 30 minutes
  • Severity: Medium

What Happened

The Pritunl VPN server became unreachable after its VM restarted and was assigned a new ephemeral public IP. Because the old IP was hardcoded in cluster allowlists and client profiles, users could not connect until the allowlists and profiles were updated.


Impact

Services Affected

  • Completely Down:
    • VPN access for all developers, SREs, and operators.

Business Impact

  • Teams temporarily lost access to AKS and GKE clusters over VPN.
  • No production customer-facing services were directly impacted.

Root Cause

  • The VM hosting Pritunl was configured with an ephemeral public IP.
  • When the VM restarted, Google Cloud released the old IP and allocated a new one.
  • The old IP was hardcoded in allowlists (AKS, GKE) and client profiles, breaking connectivity.
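
The condition described above can be audited ahead of time. The following is a minimal sketch, assuming the instance description has been exported as JSON (for example from the Compute Engine API, whose instance resource lists external IPs under networkInterfaces[].accessConfigs[].natIP) and that the project's reserved static addresses are available as a list. The function name, VM name, and IPs are hypothetical; the addresses come from documentation ranges, not the real Pritunl setup.

```python
from typing import Iterable, List


def find_ephemeral_external_ips(instance: dict, reserved_addresses: Iterable[str]) -> List[str]:
    """Return external IPs on the instance that are not in the reserved (static) set."""
    reserved = set(reserved_addresses)
    ephemeral = []
    for nic in instance.get("networkInterfaces", []):
        for access_config in nic.get("accessConfigs", []):
            nat_ip = access_config.get("natIP")
            # An external IP that is not reserved in the project is ephemeral
            # and can change when the VM is stopped and started.
            if nat_ip and nat_ip not in reserved:
                ephemeral.append(nat_ip)
    return ephemeral


# Hypothetical instance data (assumption: documentation-range IP, made-up VM name).
pritunl_vm = {
    "name": "pritunl-vpn",
    "networkInterfaces": [
        {"accessConfigs": [{"name": "External NAT", "natIP": "203.0.113.10"}]}
    ],
}

# With no reserved addresses, the VM's external IP is reported as ephemeral.
print(find_ephemeral_external_ips(pritunl_vm, reserved_addresses=[]))
# Once the same IP is reserved as static, nothing is reported.
print(find_ephemeral_external_ips(pritunl_vm, reserved_addresses=["203.0.113.10"]))
```

Running a check like this across all production-critical VMs would flag any host that is still on an ephemeral address before a restart exposes it.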

Resolution

Immediate Fix

  1. Promoted the Pritunl VM's public IP from ephemeral to static so that it no longer changes on restart.
  2. Updated authorized IPs in:
    • Azure Kubernetes Service (AKS)
    • Google Kubernetes Engine (GKE)
  3. Guided users to re-import their VPN profiles from https://vpn53.wealthy.systems.
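
The allowlist update in step 2 amounts to replacing one authorized entry with another and confirming the result. The sketch below illustrates that logic on a plain list of CIDR strings, assuming the allowlist has already been read from the cluster configuration. It does not call the AKS or GKE APIs, and the function names and IPs are hypothetical (documentation-range addresses).

```python
import ipaddress
from typing import List


def swap_authorized_ip(cidrs: List[str], old_ip: str, new_ip: str) -> List[str]:
    """Return a copy of cidrs with the old single-host entry replaced by the new one."""
    old_net = ipaddress.ip_network(old_ip)  # a bare IPv4 address becomes a /32 network
    new_net = ipaddress.ip_network(new_ip)
    # Drop the stale entry, keeping every other authorized range untouched.
    updated = [c for c in cidrs if ipaddress.ip_network(c, strict=False) != old_net]
    # Add the new entry only if it is not already present.
    if not any(ipaddress.ip_network(c, strict=False) == new_net for c in updated):
        updated.append(str(new_net))
    return updated


def is_authorized(ip: str, cidrs: List[str]) -> bool:
    """Check whether ip falls inside any authorized range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c, strict=False) for c in cidrs)


# Hypothetical allowlist: an office range plus the old VPN IP.
allowlist = ["198.51.100.0/24", "203.0.113.10/32"]
updated = swap_authorized_ip(allowlist, old_ip="203.0.113.10", new_ip="203.0.113.20")

print(updated)
print(is_authorized("203.0.113.20", updated))  # new static IP
print(is_authorized("203.0.113.10", updated))  # old ephemeral IP
```

Removing the stale entry in the same step matters: once released, the old ephemeral IP can be assigned to another tenant and should not remain authorized.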

Validation

  • Confirmed VPN connectivity restored for all users.
  • Verified AKS and GKE allowlists accepted traffic from the new static IP.

Contributing Factors

  • Reliance on an ephemeral IP for critical infrastructure, with that IP hardcoded in allowlists and client profiles.

Future Mitigation Plan

  • Always allocate static IPs for production-critical infrastructure (VPNs, gateways, bastions).
  • Implement monitoring and alerts on VM restarts and public IP changes.
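
The public IP alert could be as simple as comparing the observed address against the expected static one on a schedule. Here is a minimal sketch of that comparison, assuming a resolver callable supplies the currently observed IP. The function name, the resolver, and the IPs are hypothetical, and the alert is returned as a message rather than sent to any particular paging system.

```python
from typing import Callable, Optional


def check_public_ip(expected_ip: str, resolve: Callable[[], str]) -> Optional[str]:
    """Return an alert message if the observed IP differs from the expected one, else None."""
    observed = resolve()
    if observed != expected_ip:
        return f"VPN public IP changed: expected {expected_ip}, observed {observed}"
    return None


# Hypothetical resolvers standing in for a real lookup (documentation-range IPs).
unchanged = check_public_ip("203.0.113.20", resolve=lambda: "203.0.113.20")
changed = check_public_ip("203.0.113.20", resolve=lambda: "203.0.113.99")

print(unchanged)
print(changed)
```

In practice the resolver might wrap a DNS lookup of the VPN hostname or a query of the VM's external IP from the cloud API; which source is authoritative depends on how the VPN endpoint is published, which this report does not specify.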

Lessons Learned

  • Critical access services (VPNs, jump hosts) must never rely on ephemeral IPs.
  • Even routine infrastructure events (like VM restarts) can cause widespread developer downtime.

Last modified November 11, 2025: RCA added for SIP failure (16439aa)