Kubernetes Operator

As of v4.1, CipherVault ships a native Kubernetes Operator written in Go (controller-runtime), with 3 CRDs and dedicated reconcilers. It is distinct from, and complementary to, the Mutating Admission Webhook, which remains available for pod-level use cases driven by annotations.

Operator vs. Mutating Webhook: when to use each

Scenario                                          Use
Existing app, no manifest changes                 Mutating Webhook + annotations
You control the manifests and want pure GitOps    Operator + CRDs
Multi-cluster with a single control plane         Operator + Federation
Dynamic Secrets (lease lifecycle)                 Operator (CipherVaultLease CRD)
Provisioning a DynamicRole via IaC                Operator (CipherVaultDynamicRole CRD)

Tech stack

  • Go 1.26 + controller-runtime v0.20
  • Distroless static base image, nonroot (uid 65534)
  • Multi-arch: linux/amd64 + linux/arm64
  • Available at ghcr.io/martinez1991/ciphervault-k8s-operator:v4.4.0

CRDs

CipherVaultSecret

Materializes a CV secret as a Kubernetes Secret. The reconciler:

  • Fetches the value and detects drift by comparing SHA256 hashes
  • Sets an ownerRef on the K8s Secret pointing back to the CRD (automatic cleanup on delete)
  • Requeues according to refreshInterval

apiVersion: ciphervault.io/v1alpha1
kind: CipherVaultSecret
metadata:
  name: stripe-key
  namespace: billing
spec:
  cvUrl: https://cv.acme.com.br
  clientIdRef:
    name: cv-app-credentials
    key: client_id
  clientSecretRef:
    name: cv-app-credentials
    key: client_secret
  vault: producao
  path: api/stripe/secret_key
  refreshInterval: 5m
  target:
    type: Opaque
    name: stripe-credentials
    keys:
      - name: STRIPE_KEY
        sourcePath: value

Result: a K8s Secret named stripe-credentials whose STRIPE_KEY key is populated from CV. The operator re-checks every 5 minutes and updates the Secret if the value has been rotated.
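The hash-based drift detection mentioned above can be sketched as follows. This is a minimal illustration using only the Go standard library, not the operator's actual code: the function names hashSecretData and hasDrifted are hypothetical, and the way fields are serialized before hashing is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashSecretData returns a deterministic SHA-256 digest of a Secret's data.
// Keys are sorted first so that map iteration order cannot change the result.
// (Hypothetical helper; the serialization format is an assumption.)
func hashSecretData(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// Length-prefix key and value so that different key/value splits
		// of the same bytes cannot produce the same input stream.
		fmt.Fprintf(h, "%d:%s=%d:", len(k), k, len(data[k]))
		h.Write(data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// hasDrifted reports whether the data fetched from CV differs from the data
// currently stored in the Kubernetes Secret.
func hasDrifted(fetched, current map[string][]byte) bool {
	return hashSecretData(fetched) != hashSecretData(current)
}

func main() {
	current := map[string][]byte{"STRIPE_KEY": []byte("old-value")}
	fetched := map[string][]byte{"STRIPE_KEY": []byte("new-value")}
	fmt.Println("drift detected:", hasDrifted(fetched, current))
}
```

Comparing digests rather than raw values means a controller only needs to keep the last hash (for example in status or an annotation) to decide whether the Secret must be rewritten.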

CipherVaultLease

Requests a dynamic lease from Dynamic Secrets and keeps it active for as long as the CRD exists. The reconciler:

  • Requests the lease from CV on the first reconcile
  • Auto-renews when the remaining TTL falls below the threshold percentage (default 30%)
  • Revokes the lease via a finalizer on delete

apiVersion: ciphervault.io/v1alpha1
kind: CipherVaultLease
metadata:
  name: billing-db-lease
  namespace: billing
spec:
  cvUrl: https://cv.acme.com.br
  authSecretRef:
    name: cv-app-credentials
  roleId: 42
  ttlSeconds: 600
  renewThresholdPercent: 30
  target:
    type: Opaque
    name: billing-db-creds
    keys:
      - name: DB_USER
        sourcePath: username
      - name: DB_PASS
        sourcePath: password
# status is reported by the operator, not set by the user
status:
  leaseId: lease_01HXY...
  expiresAt: "2026-05-07T15:00:00Z"
  remainingSeconds: 547

Pods consume the billing-db-creds Secret. The operator renews the lease transparently, so the pod never sees a credential swap (it may still need to reconnect; see Secretless Proxy to avoid that).
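The renewal rule can be sketched as a single comparison. The example below assumes that the threshold is a percentage of ttlSeconds, which is one plausible reading of renewThresholdPercent; the function name shouldRenew is hypothetical and is not taken from the operator's source.

```go
package main

import "fmt"

// shouldRenew reports whether a lease is due for renewal, i.e. whether the
// remaining time has dropped below thresholdPercent of the full TTL.
// An expired lease or a non-positive TTL is treated as due.
// (Hypothetical helper; assumes the threshold is relative to ttlSeconds.)
func shouldRenew(remainingSeconds, ttlSeconds, thresholdPercent int) bool {
	if ttlSeconds <= 0 || remainingSeconds <= 0 {
		return true
	}
	// Cross-multiply instead of dividing to avoid integer rounding.
	return remainingSeconds*100 < ttlSeconds*thresholdPercent
}

func main() {
	// Values from the manifest above: ttlSeconds 600, threshold 30%.
	fmt.Println(shouldRenew(547, 600, 30)) // 547s left, above the 180s cutoff
	fmt.Println(shouldRenew(120, 600, 30)) // 120s left, below the 180s cutoff
}
```

With the manifest's values the cutoff is 180 seconds, so the lease shown in status (remainingSeconds: 547) would not yet be renewed under this reading.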

CipherVaultDynamicRole

Declaratively manages a role in CV's dynamic_secrets. The reconciler:

  • POST /dynamic-secrets/roles on the first reconcile (idempotent)
  • PUT on spec changes
  • Drift check every hour: if someone changed the role in the UI, the operator restores it to the spec

apiVersion: ciphervault.io/v1alpha1
kind: CipherVaultDynamicRole
metadata:
  name: billing-readonly
  namespace: ciphervault-system
spec:
  cvUrl: https://cv.acme.com.br
  authSecretRef:
    name: cv-admin-credentials
  backendId: 12
  name: billing-readonly
  ttlSeconds: 600
  maxTtlSeconds: 3600
  creationStatements:
    - "CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'"
    - "GRANT pg_read_all_data TO \"{{name}}\""
  revocationStatements:
    - "REVOKE ALL ON DATABASE billing FROM \"{{name}}\""
    - "DROP ROLE IF EXISTS \"{{name}}\""

Pure GitOps: the role in CV mirrors the applied CRD.
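The create / update / restore decision described for this reconciler can be sketched as a comparison between the desired spec and the role as it currently exists in CV. The RoleSpec struct and the planAction function are hypothetical names used only for illustration; the real operator's types and API client are not shown in this page.

```go
package main

import (
	"fmt"
	"slices"
)

// RoleSpec holds the fields of the CipherVaultDynamicRole spec that describe
// the role itself. (Hypothetical type, for illustration only.)
type RoleSpec struct {
	Name                 string
	BackendID            int
	TTLSeconds           int
	MaxTTLSeconds        int
	CreationStatements   []string
	RevocationStatements []string
}

// sameRole compares two role definitions field by field.
func sameRole(a, b RoleSpec) bool {
	return a.Name == b.Name &&
		a.BackendID == b.BackendID &&
		a.TTLSeconds == b.TTLSeconds &&
		a.MaxTTLSeconds == b.MaxTTLSeconds &&
		slices.Equal(a.CreationStatements, b.CreationStatements) &&
		slices.Equal(a.RevocationStatements, b.RevocationStatements)
}

// planAction returns the HTTP method a reconcile pass would need:
// "POST" when the role does not exist in CV yet, "PUT" when the remote role
// differs from the spec (spec change or manual edit), "" when nothing to do.
func planAction(desired RoleSpec, remote *RoleSpec) string {
	switch {
	case remote == nil:
		return "POST"
	case !sameRole(desired, *remote):
		return "PUT"
	default:
		return ""
	}
}

func main() {
	desired := RoleSpec{Name: "billing-readonly", BackendID: 12, TTLSeconds: 600, MaxTTLSeconds: 3600}
	edited := desired
	edited.TTLSeconds = 7200 // simulates a manual change made in the UI

	fmt.Printf("%q %q %q\n",
		planAction(desired, nil),
		planAction(desired, &edited),
		planAction(desired, &desired))
}
```

Treating a spec change and a manual edit the same way (remote differs from desired) is what makes the CRD the single source of truth in this sketch.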

Installation

Via YAML manifests

# CRDs
kubectl apply -f https://raw.githubusercontent.com/Martinez1991/ciphervault/main/kubernetes/operator/config/crds/

# RBAC + Deployment
kubectl apply -f https://raw.githubusercontent.com/Martinez1991/ciphervault/main/kubernetes/operator/config/manager/

# Verify
kubectl -n ciphervault-system get pods
kubectl -n ciphervault-system logs deployment/ciphervault-operator

Via Helm (coming soon)

helm install ciphervault-operator ciphervault/operator \
  --namespace ciphervault-system \
  --create-namespace \
  --version 4.4.0

Leader election

The operator runs with leader election enabled by default. For HA, deploy with replicas: 3. Only one instance reconciles at a time; the others stay on standby for failover.

Healthz / readyz

GET /healthz    liveness probe
GET /readyz     readiness probe
GET /metrics    Prometheus

Recommended probes for the Deployment:

livenessProbe:
  httpGet: { path: /healthz, port: 8081 }
  initialDelaySeconds: 15
  periodSeconds: 20
readinessProbe:
  httpGet: { path: /readyz, port: 8081 }
  initialDelaySeconds: 5
  periodSeconds: 10

Metrics

cv_operator_reconciles_total{kind, result}
cv_operator_reconcile_duration_seconds{kind}
cv_operator_drift_detected_total{kind}
cv_operator_lease_renewals_total{result}
cv_operator_secret_age_seconds{kind, namespace, name}

ServiceMonitor:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ciphervault-operator
  namespace: ciphervault-system
spec:
  selector:
    matchLabels: { app: ciphervault-operator }
  endpoints:
    - port: metrics
      interval: 30s

Best practices

  • CRDs in GitOps: version your manifests; changes go through PRs
  • Realistic refreshInterval: 5 minutes covers nearly all cases; critical secrets can go down to 1 minute
  • replicas: 3 with leader election in production
  • One AppConnection per cluster: do not share it across clusters
  • NetworkPolicy restricting the operator's egress to the cvUrl endpoint, DNS, and the K8s API only
  • For multi-cluster, combine with K8s Federation: a single control plane delivers policies to all operators