How to Rescue Terminating PersistentVolumes in Kubernetes/OpenShift Cluster (Why You Should Take Backups)
We recently faced a near-miss situation in our OpenShift bare metal cluster — a mistake during routine cleanup led to multiple PersistentVolumes (PVs) being deleted accidentally. But thanks to Kubernetes’ finalizer mechanism, we narrowly avoided data loss. Here’s how we got everything back up and running — and what we learned along the way.
What Went Wrong?
During a cleanup operation, several PersistentVolumes were deleted unintentionally using oc delete pv --all -n namespace. At first we panicked, but upon checking oc get pv, we noticed they had entered a Terminating state instead of being immediately destroyed.
This behaviour happened because of the finalizer:
finalizers:
- kubernetes.io/pv-protection
This pv-protection finalizer is designed to prevent accidental deletion if the volume is still in use. While it saved our data this time, the PVs were now stuck — unable to be removed or used cleanly.
Surprisingly, the applications using those PVs kept running, because the data was still mounted and the PV deletion hadn’t been completed. In a way, the “stuck” state actually reduced the blast radius.
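The manifest of one of the stuck PVs (retrieved with the standard oc command below) shows the deletionTimestamp already set while the pv-protection finalizer holds the object in place:
oc get pv pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89 -o yaml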
kind: PersistentVolume
apiVersion: v1
metadata:
  name: 'pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89'
  creationTimestamp: '2020-08-22T03:39:41Z'
  deletionTimestamp: '2020-08-23T09:38:42Z'
  deletionGracePeriodSeconds: 0
  resourceVersion: '112535964'
  finalizers:
    - kubernetes.io/pv-protection
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  claimRef:
    kind: PersistentVolumeClaim
    namespace: image
    name: repo
    uid: 'eef4ec4b-326d-47e6-b11c-6474a5fd4d89'
    apiVersion: v1
    resourceVersion: '111102610'
  persistentVolumeReclaimPolicy: Delete
  volumeMode: Filesystem
status:
  phase: Bound
Step-by-Step Recovery:
Identify All Terminating PVs
We listed all the PVs stuck in the Terminating state:
oc get pv --no-headers | awk '$5=="Terminating" {print $1}'
Patch All PVs to Use Retain Policy
We ensured the reclaim policy was set to Retain, so that Kubernetes wouldn’t delete the underlying data, even if it succeeded in removing the PV metadata:
for pv in $(oc get pv --no-headers | awk '$5=="Terminating" {print $1}'); do
echo "Patching reclaimPolicy=Retain for PV: $pv"
oc patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' --type=merge
done
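To confirm the patch took effect, a quick check like the one below (a sketch using custom columns) lists each PV alongside its current reclaim policy:
# Sanity check: print every PV together with its reclaim policy
oc get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy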
Take a Backup of All PV Manifests
We saved YAML backups of all the Terminating PVs — critical in case anything went further south:
for pv in $(oc get pv --no-headers | awk '$5=="Terminating" {print $1}'); do
oc get pv "$pv" -o yaml > "${pv}-backup.yaml"
done
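As an extra precaution (our suggestion, not something this incident strictly required), it is worth capturing the bound PVCs as well, since each PV’s claimRef points back to one of them:
# Optional: back up all PVC objects across namespaces alongside the PV manifests
oc get pvc --all-namespaces -o yaml > all-pvcs-backup.yaml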
How to Reset the Persistent Volume Status from Terminating Back to Bound:
Since the PV status cannot be reset back to Bound through the oc/kubectl client or the Kubernetes API, we decided to update the PV’s value in etcd directly.
You can either download a precompiled binary of the tool or build it yourself:
# fork of the official repo from jianz's GitHub account
git clone https://github.com/surajsolanki724/k8s-reset-terminating-pv.git
cd k8s-reset-terminating-pv
go build -o resetpv
Requirements
You need:
- etcd-ca.crt – CA certificate
- etcd.crt – Client certificate
- etcd.key – Client key
- etcd-host – IP or hostname of etcd
- etcd-port – Default: 2379
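Where these files live depends on your distribution, so treat the paths below as assumptions: on OpenShift 4 the etcd certificates are managed on the master nodes (exact locations vary by version), while kubeadm-based clusters typically keep them under /etc/kubernetes/pki. Copying them from a kubeadm-style control-plane node would look roughly like this:
# Example only: adjust paths and the node name for your environment
scp master-node:/etc/kubernetes/pki/etcd/ca.crt ./ca.crt
scp master-node:/etc/kubernetes/pki/apiserver-etcd-client.crt ./etcd.crt
scp master-node:/etc/kubernetes/pki/apiserver-etcd-client.key ./etcd.key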
Usage
Usage:
resetpv [flags] <persistent volume name>
Flags:
--etcd-ca string CA Certificate used by etcd (default "ca.crt")
--etcd-cert string Public key used by etcd (default "etcd.crt")
--etcd-key string Private key used by etcd (default "etcd.key")
--etcd-host string The etcd domain name or IP (default "localhost")
--etcd-port int The etcd port number (default 2379)
--k8s-key-prefix string The etcd key prefix for kubernetes resources. (default "registry")
-h, --help help for resetpv
For simplicity, you can name the etcd certificate and key files ca.crt, etcd.crt, and etcd.key, and put them in the same directory as the tool (resetpv).
By default, the tool connects to etcd at localhost:2379.
Execute the command below:
./resetpv --k8s-key-prefix kubernetes.io <pv-name> \
--etcd-ca ca.crt --etcd-cert etcd-master.crt --etcd-key etcd-master.key \
--etcd-host API-SERVER-IP
kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-eef4ec4b-326d-47e6-b11c-6474a5fd4d89 1Gi RWO Retain Bound image/repo managed-nfs-storage 11h
The terminating persistent volume is back to bound status. Finally.
How It Works
This tool directly connects to the etcd key-value store (which backs Kubernetes) and resets the metadata of a Terminating PV. It works by setting:
(*pv).ObjectMeta.DeletionTimestamp = nil
(*pv).ObjectMeta.DeletionGracePeriodSeconds = nil
That means Kubernetes forgets it ever started deleting the PV and treats it as Bound again.
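If you want to double-check which key the tool will touch before running it, you can query etcd directly. The sketch below assumes the same certificates and the kubernetes.io key prefix used earlier (kubeadm clusters usually use the default registry prefix instead); the value is stored as binary protobuf, so listing only the key is enough to confirm it exists:
# Confirm the PV's key exists in etcd without dumping the binary value
ETCDCTL_API=3 etcdctl \
  --cacert ca.crt --cert etcd.crt --key etcd.key \
  --endpoints https://<etcd-host>:2379 \
  get /kubernetes.io/persistentvolumes/<pv-name> --keys-only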
Recommendations:
We were lucky this time. Our steps saved all application data, and our apps continued running without interruption.
Set the reclaim policy to Retain for critical volumes. This avoids automatic deletion of the underlying storage.
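To see which volumes would still be at risk, a quick audit like the one below (a sketch using a jsonpath filter) lists every PV whose reclaim policy is still Delete:
# List PVs that still use the Delete reclaim policy
oc get pv -o jsonpath='{range .items[?(@.spec.persistentVolumeReclaimPolicy=="Delete")]}{.metadata.name}{"\n"}{end}'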
Use a dedicated backup tool so that both the PV/PVC objects and the underlying data are backed up regularly.
Conclusion
While this incident began with a critical mistake (accidentally deleting PersistentVolumes), it became a valuable learning experience. Thanks to Kubernetes’ built-in safeguards like finalizers and tools like k8s-reset-terminating-pv, we were able to fully recover without any data loss or downtime.
This event reinforced a few core DevOps principles:
- Always validate before deleting any critical Kubernetes resources.
- Set Retain as your default reclaim policy for important PVs.
- Back up your PVs regularly, and not just the data but also the object metadata.
- Know your recovery path ahead of time, including how to work with etcd in emergencies.
If you have any questions or feedback, feel free to comment.
About The Author
Suraj Solanki
Senior DevOps Engineer
LinkedIn: https://www.linkedin.com/in/suraj-solanki
Topmate: https://topmate.io/suraj_solanki