Why the GKE metadata server failed to work, and how we fixed it

Overview

One day, after our Google Kubernetes Engine (GKE) cluster was automatically updated, the GKE metadata server went down for nearly a day. As a result, the kube-dns (Kubernetes DNS service) Pods kept restarting, and services in the cluster were unavailable. This post details the outage, explains what caused it, and…