How to Rescue Terminating PersistentVolumes in a Kubernetes/OpenShift Cluster (and Why You Should Take Backups)
We recently faced a near-miss situation in our OpenShift bare metal cluster — a mistake during routine cleanup led to multiple PersistentVolumes (PVs) being deleted accidentally. But thanks to Kubernetes’ finalizer mechanism, we narrowly avoided data loss. Here’s how we got everything back up and running — and what we learned along the way.
What Went Wrong?
During a cleanup operation, several PersistentVolumes were deleted unintentionally using oc delete pv --all -n namespace. At first we panicked, but upon checking oc get pv, we noticed they had entered a Terminating state instead of being immediately destroyed.
This behaviour happened because of the finalizer:
finalizers:
- kubernetes.io/pv-protection
This pv-protection finalizer is designed to prevent accidental deletion if the volume is still in use. While it saved our data this time, the PVs were now stuck — unable to be removed or used cleanly.
Surprisingly, the applications using those PVs kept running, because the data was still mounted and the PV deletion hadn’t been completed. In a way, the “stuck” state actually reduced the blast radius.
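For a quick overview of how many volumes were affected, a check like the one below (a minimal sketch using custom-columns, not part of our original cleanup) lists only the PVs that already carry a deletionTimestamp:

# list every PV with its deletionTimestamp; rows showing "<none>" are filtered out
oc get pv -o custom-columns=NAME:.metadata.name,DELETION:.metadata.deletionTimestamp | grep -v '<none>'

Here is what one of the stuck PVs looked like: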
kind: PersistentVolume
apiVersion: v1
metadata:
  deletionTimestamp: '2020-08-23T09:38:42Z'
  resourceVersion: '112535964'
  name: pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89
  deletionGracePeriodSeconds: 0
  creationTimestamp: '2020-08-22T03:39:41Z'
  finalizers:
    - kubernetes.io/pv-protection
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  claimRef:
    kind: PersistentVolumeClaim
    namespace: image
    name: repo
    uid: eef4ec4b-326d-47e6-b11c-6474a5fd4d89
    apiVersion: v1
    resourceVersion: '111102610'
  persistentVolumeReclaimPolicy: Delete
  volumeMode: Filesystem
status:
  phase: Bound
Step-by-Step Recovery:
Identify All Terminating PVs
We filtered out all the PVs in the Terminating state:
oc get pv --no-headers | awk '$5=="Terminating" {print $1}'
Patch All PVs to Use Retain Policy
We ensured the reclaim policy was set to Retain, so that Kubernetes wouldn’t delete the underlying data, even if it succeeded in removing the PV metadata:
for pv in $(oc get pv --no-headers | awk '$5=="Terminating" {print $1}'); do
echo "Patching reclaimPolicy=Retain for PV: $pv"
oc patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' --type=merge
done
Take a Backup of All PV Manifests
We saved YAML backups of all the Terminating PVs — critical in case anything went further south:
for pv in $(oc get pv --no-headers | awk '$5=="Terminating" {print $1}'); do
oc get pv "$pv" -o yaml > "${pv}-backup.yaml"
done
How to Reset Persistent Volume Status from Terminating Back to Bound
Since the PV status cannot be reset back to Bound through the oc/kubectl client or the Kubernetes API, we decided to update the PV’s value in etcd directly.
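Before changing anything, it helps to confirm where the PV object lives in etcd. The read-only check below is a sketch of our own (not part of the tool): it assumes etcdctl v3 is available, uses the certificate file names introduced later in this post, and uses the kubernetes.io key prefix that this cluster stores resources under.

# read-only check: confirm the PV key exists under the kubernetes.io prefix
# (--keys-only avoids dumping the value, which is stored as binary protobuf)
ETCDCTL_API=3 etcdctl \
  --cacert ca.crt --cert etcd-master.crt --key etcd-master.key \
  --endpoints https://API-SERVER-IP:2379 \
  get /kubernetes.io/persistentvolumes/pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89 --keys-only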
First, download the latest compiled binary from the tool’s GitHub releases page.
If you prefer to compile by yourself:
# forked from jianz's official repo on GitHub
git clone https://github.com/surajsolanki724/k8s-reset-terminating-pv.git
cd k8s-reset-terminating-pv
go build -o resetpv
Requirements
You need:
- etcd-ca.crt – CA certificate
- etcd.crt – client certificate
- etcd.key – client key
- etcd-host – IP or hostname of etcd
- etcd-port – default: 2379
Usage
Usage:
  resetpv [flags] <persistent volume name>

Flags:
      --etcd-ca string          CA certificate used by etcd (default "ca.crt")
      --etcd-cert string        Public key used by etcd (default "etcd.crt")
      --etcd-key string         Private key used by etcd (default "etcd.key")
      --etcd-host string        The etcd domain name or IP (default "localhost")
      --etcd-port int           The etcd port number (default 2379)
      --k8s-key-prefix string   The etcd key prefix for Kubernetes resources (default "registry")
  -h, --help                    help for resetpv
For simplicity, you can name the etcd certificates ca.crt, etcd.crt, and etcd.key, and put them in the same directory as the tool (resetpv).
The tool, by default, connects to etcd using localhost:2379.
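With the defaults in place, an invocation can be as short as the sketch below. It assumes the certificates have already been renamed to the default file names, the tool is run from a host that can reach etcd on localhost:2379, and the PV name is the example from earlier:

# copy the client certificate and key to the default names the tool expects
cp etcd-master.crt etcd.crt
cp etcd-master.key etcd.key

# reset a single PV, relying on the default certificate, host, and port flags
./resetpv --k8s-key-prefix kubernetes.io pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89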
Getting etcd Certificates from a Master Node:
To connect securely to the etcd data store, we need three files:
- CA certificate → ca.crt
- Client certificate → etcd-master.crt
- Client key → etcd-master.key
These are available on any OpenShift control-plane (master) node under the static pod resource directories.
# CA bundle for etcd
/etc/kubernetes/static-pod-resources/etcd-certs/configmaps/etcd-all-bundles/ca.crt

# client certificates
cd /etc/kubernetes/static-pod-resources/etcd-certs/secrets/etcd-all-certs/
├── etcd-peer-<hostname>.crt
├── etcd-peer-<hostname>.key
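To get these files onto the machine where resetpv will run, one option is to read them through a debug pod and save them locally. This is a sketch assuming cluster-admin access; master-0 is a hypothetical node name, and the peer certificate file names include the node's hostname, so adjust them to match your cluster:

NODE=master-0   # hypothetical node name, adjust for your cluster
CERTDIR=/etc/kubernetes/static-pod-resources/etcd-certs

# informational oc debug output goes to stderr, so stdout redirection stays clean
oc debug node/$NODE -- chroot /host cat $CERTDIR/configmaps/etcd-all-bundles/ca.crt > ca.crt
oc debug node/$NODE -- chroot /host cat $CERTDIR/secrets/etcd-all-certs/etcd-peer-$NODE.crt > etcd-master.crt
oc debug node/$NODE -- chroot /host cat $CERTDIR/secrets/etcd-all-certs/etcd-peer-$NODE.key > etcd-master.key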
To confirm your certificate and key match, verify their modulus hash using openssl:
openssl x509 -noout -modulus -in etcd-master.crt | openssl md5
openssl rsa -noout -modulus -in etcd-master.key | openssl md5
Both commands should output the same MD5 hash, confirming that the key pair is valid.
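If you prefer a single check, the same comparison can be done in one line (bash process substitution; no output means the moduli match):

# prints nothing when the certificate and key moduli match
diff <(openssl x509 -noout -modulus -in etcd-master.crt | openssl md5) \
     <(openssl rsa -noout -modulus -in etcd-master.key | openssl md5)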
Once verified, copy ca.crt, etcd-master.crt, and etcd-master.key into the same directory as your resetpv binary.
Execute the command below:
./resetpv --k8s-key-prefix kubernetes.io <pv-name> \
  --etcd-ca ca.crt --etcd-cert etcd-master.crt --etcd-key etcd-master.key \
  --etcd-host API-SERVER-IP
# a single loop resets all the PVs back to Bound
for PV in $(kubectl get pv -o jsonpath='{.items[*].metadata.name}'); do
  ./resetpv --k8s-key-prefix kubernetes.io "$PV" \
    --etcd-ca ca.crt --etcd-cert etcd-master.crt --etcd-key etcd-master.key \
    --etcd-host API-SERVER-IP
done

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM        STORAGECLASS          REASON   AGE
pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89   1Gi        RWO            Retain           Bound    image/repo   managed-nfs-storage            11h
The terminating persistent volume is back to Bound status. Finally.
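As a final sanity check (our own addition; the image namespace and repo claim come from the manifest shown earlier), confirm that the claim and the workloads consuming it are still healthy:

# the claim should still be Bound and its pods should be untouched
oc get pvc repo -n image
oc get pods -n image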
How It Works
This tool directly connects to the etcd key-value store (which backs Kubernetes) and resets the metadata of a Terminating PV. It works by setting:
(*pv).ObjectMeta.DeletionTimestamp = nil
(*pv).ObjectMeta.DeletionGracePeriodSeconds = nil
That means Kubernetes forgets it ever started deleting the PV and treats it as Bound again.
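A quick way to confirm the reset took effect (a sketch using the example PV; an empty result means the deletion timestamp is gone):

# empty output confirms the deletionTimestamp has been cleared
oc get pv pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89 -o jsonpath='{.metadata.deletionTimestamp}'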
Recommendations:
We were lucky this time. Our steps saved all application data, and our apps continued running without interruption.
Set Reclaim Policy to Retain for critical volumes.
This avoids automatic deletion of underlying storage.
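One way to make Retain the default for newly provisioned volumes is to set it on the StorageClass itself. The example below is a sketch: the class name and provisioner are hypothetical, and because reclaimPolicy is immutable on an existing StorageClass, you would create a new class rather than patch the old one.

oc apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage-retain   # hypothetical class name
provisioner: example.com/nfs         # hypothetical provisioner; use your cluster's
reclaimPolicy: Retain
volumeBindingMode: Immediate
EOF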
Recommended Backup Tools
Conclusion
While this incident began with a critical mistake — accidentally deleting PersistentVolumes — it became a valuable learning experience. Thanks to Kubernetes’ built-in safeguards like finalizers, and tools like k8s-reset-terminating-pv, we were able to fully recover without any data loss or downtime.
This event reinforced a few core DevOps principles:
- Always validate before deleting any critical Kubernetes resources.
- Set Retain as your default reclaim policy for important PVs.
- Back up your PVs regularly — and not just data, but also the object metadata (a minimal backup sketch follows after this list).
- Know your recovery path ahead of time, including how to work with etcd in emergencies.
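To illustrate the backup point above, a small script along these lines (our own sketch, not something we had in place at the time; the backup directory is arbitrary) can run from cron to snapshot PV and PVC manifests daily:

#!/usr/bin/env bash
# snapshot PV and PVC manifests into a dated backup directory
set -euo pipefail

BACKUP_DIR="/var/backups/k8s-objects/$(date +%F)"   # arbitrary location, adjust as needed
mkdir -p "$BACKUP_DIR"

# cluster-scoped PersistentVolume objects
oc get pv -o yaml > "$BACKUP_DIR/persistentvolumes.yaml"

# namespaced PersistentVolumeClaim objects across all namespaces
oc get pvc --all-namespaces -o yaml > "$BACKUP_DIR/persistentvolumeclaims.yaml"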
If you have any questions or feedback, feel free to comment.
About The Author
Suraj Solanki
Senior DevOps Engineer
LinkedIn: https://www.linkedin.com/in/suraj-solanki
Topmate: https://topmate.io/suraj_solanki
