HOW TO SET UP PERSISTENT LOGGING FOR APACHE AIRFLOW ON GKE USING THE GCS FUSE CSI DRIVER

This article explains how to configure persistent logging for Apache Airflow on Google Kubernetes Engine (GKE) when using the Kubernetes Executor. With this executor, each task runs in its own worker pod and writes its logs to that pod's local filesystem, so by default the logs are lost when the pod terminates. The solution is to use the Cloud Storage FUSE CSI driver to create a ReadWriteMany PersistentVolumeClaim (PVC) backed by a Google Cloud Storage (GCS) bucket, which every Airflow component can mount at the same time. The setup has four steps: enable the GCS FUSE CSI driver on your GKE cluster; create a Google Cloud service account with permission to read and write objects in the bucket; annotate the Airflow Kubernetes service accounts so that they act as that account through Workload Identity; and finally create the PersistentVolume and PersistentVolumeClaim and update your Airflow Helm chart values to use the claim for log storage.
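As a rough illustration of the last two steps, the Python sketch below builds the PersistentVolume, the PersistentVolumeClaim, and a Helm values fragment as plain dictionaries and prints them as JSON. It is a sketch under stated assumptions, not a definitive configuration. The bucket, namespace, object names, service account email, and storage class name are hypothetical placeholders. The Helm keys (logs.persistence, airflowPodAnnotations, and the per-component serviceAccount.annotations) assume the official Apache Airflow chart, and the set of components to annotate may differ in your chart version. The gke-gcsfuse/volumes pod annotation is not mentioned above; it is included on the assumption that the pods need it for the driver's sidecar to be injected.

```python
import json

# All names below are hypothetical placeholders, not values from the article.
BUCKET = "example-airflow-logs"
NAMESPACE = "airflow"
PV_NAME = "airflow-logs-pv"
PVC_NAME = "airflow-logs-pvc"
GSA_EMAIL = "airflow-logs@example-project.iam.gserviceaccount.com"
# Placeholder class name: it only has to match between the PV and the PVC
# so that the claim binds statically to this volume.
STORAGE_CLASS = "gcs-fuse-static"


def build_pv(bucket, pv_name, pvc_name, namespace, size="10Gi"):
    """PersistentVolume that exposes a GCS bucket through the GCS FUSE CSI driver."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            # Kubernetes requires a capacity value; a bucket has no fixed size.
            "capacity": {"storage": size},
            "storageClassName": STORAGE_CLASS,
            "mountOptions": ["implicit-dirs"],
            "csi": {
                "driver": "gcsfuse.csi.storage.gke.io",
                "volumeHandle": bucket,  # the bucket name
            },
            # Reserve this volume for the one claim defined below.
            "claimRef": {"name": pvc_name, "namespace": namespace},
        },
    }


def build_pvc(pvc_name, pv_name, namespace, size="10Gi"):
    """PersistentVolumeClaim bound statically to the volume above."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": STORAGE_CLASS,
            "volumeName": pv_name,
        },
    }


def build_helm_values(pvc_name, gsa_email):
    """Helm values fragment (assumes the official Apache Airflow chart)."""
    wi_annotation = {"iam.gke.io/gcp-service-account": gsa_email}
    values = {
        "logs": {"persistence": {"enabled": True, "existingClaim": pvc_name}},
        # Assumption: lets GKE inject the GCS FUSE sidecar into Airflow pods.
        "airflowPodAnnotations": {"gke-gcsfuse/volumes": "true"},
    }
    # Assumption: these are the components that read or write task logs.
    for component in ("workers", "scheduler", "webserver", "triggerer"):
        values[component] = {"serviceAccount": {"annotations": dict(wi_annotation)}}
    return values


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [
            build_pv(BUCKET, PV_NAME, PVC_NAME, NAMESPACE),
            build_pvc(PVC_NAME, PV_NAME, NAMESPACE),
        ],
    }
    print(json.dumps(manifests, indent=2))
    print(json.dumps(build_helm_values(PVC_NAME, GSA_EMAIL), indent=2))
```

JSON is used here only to avoid a third-party YAML dependency. kubectl accepts JSON manifests, and the values fragment can be rewritten as YAML before passing it to Helm. The capacity figure exists only because Kubernetes requires one; it does not limit how much the bucket can hold.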