Working with Persistent Volumes by Using Azure Files in Azure Kubernetes Service

Cloud-native applications sometimes need to handle data-related operations such as accessing, storing, or sharing data between applications.

As we know, the containerized applications that we host on Kubernetes are ephemeral and stateless by nature. In other words, if we don’t use a persistent storage solution, any data that we store in a container survives only until the container is deleted.

In this article, we will see how to create a persistent volume that can be accessed by many containers and how to mount it to our containers. We will do this by using the fully managed Azure Files service. We will also briefly look at how to secure our persistent volume against potential security threats.

Prerequisites

We need the following environment and tools.

Azure Kubernetes Service
Azure CLI
kubectl
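
Before going further, a quick check like the following can confirm that the command-line tools are available. It is only a sketch: it assumes a POSIX shell and verifies nothing more than that "az" and "kubectl" are on the PATH.

```shell
# Collect the names of any required tools that are not on the PATH.
missing=""
for tool in az kubectl; do
  if ! command -v "$tool" >/dev/null 2>&1; then
    missing="$missing $tool"
  fi
done

if [ -n "$missing" ]; then
  echo "Missing tools:$missing"
else
  echo "All required tools are installed."
fi
```

If both tools are present, "az aks get-credentials -g RESOURCE_GROUP -n CLUSTER_NAME" points kubectl at the cluster. CLUSTER_NAME is a placeholder for your own AKS cluster name.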

Kubernetes Volumes

Kubernetes has a volume abstraction model that solves many problems such as data loss and has a structure that allows us to use different volume types. With this flexible structure, we can use different volume types within the pods simultaneously.

In Kubernetes, we can categorize volumes as ephemeral and persistent. The lifecycle of an ephemeral volume is bound to a pod, while the lifecycle of a persistent volume is not bound to any pod. In addition, a Kubernetes volume can be created either statically by a cluster administrator or dynamically through the Kubernetes API.
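
To make the ephemeral case concrete, here is a minimal sketch of a pod that uses an "emptyDir" volume. The pod name, image, and file names are hypothetical and only serve as an illustration. Anything written to "/cache" disappears when the pod is deleted.

```shell
# Write a pod manifest that uses an ephemeral emptyDir volume.
# The pod name, image, and paths are hypothetical examples.
cat > ephemeral-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "echo hello > /cache/note.txt && sleep 3600"]
    volumeMounts:
    - name: cache
      mountPath: /cache
  volumes:
  - name: cache
    emptyDir: {}
EOF

# Apply it with: kubectl apply -f ephemeral-pod.yaml
# Deleting the pod also deletes the contents of /cache.
```

A persistent volume, in contrast, keeps its data after the pod that mounted it is gone.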

In this article, we will look at how to mount a statically created Azure file share to a pod by using the PersistentVolume and PersistentVolumeClaim resources.

Let’s Create an Azure File Share

Azure Files offers a fully managed file share service that can be accessed over the SMB or NFS protocols. It is especially helpful if we are in the middle of a cloud migration, because it works in both on-premises and cloud environments and can easily be mounted by applications over the SMB protocol.

First we need to create a storage account. After that, we will be able to create a file share. Let’s perform these operations via the Azure CLI as follows.

az storage account create -n STORAGE_ACCOUNT_NAME -g RESOURCE_GROUP -l LOCATION --sku Standard_LRS

In this example, I used the “Standard_LRS” SKU. If you are using AKS with Availability Zones for high availability, I recommend choosing a “Zone Redundant” SKU instead of a “Locally Redundant” one when creating the storage account. Otherwise, you may encounter problems during volume mount operations, because the resources may be located in different zones at that time. In addition, for production workloads, you may prefer the SSD-based “Premium” options.
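
As a sketch of the zone-redundant variant, the same command can be wrapped in a small helper that uses the “Standard_ZRS” SKU. The function name is hypothetical, and whether ZRS is available depends on the region you choose.

```shell
# Create a zone-redundant storage account for an AKS cluster that spans availability zones.
# Arguments: storage account name, resource group, location.
# The function name is a hypothetical helper, not part of the Azure CLI.
create_zrs_storage_account() {
  az storage account create \
    --name "$1" \
    --resource-group "$2" \
    --location "$3" \
    --sku Standard_ZRS \
    --kind StorageV2
}

# Example call, with the same placeholders as above:
# create_zrs_storage_account STORAGE_ACCOUNT_NAME RESOURCE_GROUP LOCATION
```

For the SSD-based premium file shares, the SKU would instead be “Premium_LRS” or “Premium_ZRS” together with “--kind FileStorage”.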

After creating the storage account with the name we want, let’s export the connection string of the account and then use it to create a file share. Note that in this example the file share is given the same name as the storage account.

export AZURE_STORAGE_CONNECTION_STRING=$(az storage account show-connection-string -n STORAGE_ACCOUNT_NAME -g RESOURCE_GROUP -o tsv)

az storage share create -n STORAGE_ACCOUNT_NAME --connection-string "$AZURE_STORAGE_CONNECTION_STRING"

After completing the operations, we can check the storage account and the File shares we just created through the Azure Portal.
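
Besides the portal, the share can also be checked from the terminal. The helper below is a minimal sketch with a hypothetical function name. It assumes that AZURE_STORAGE_CONNECTION_STRING is still exported in the current shell.

```shell
# Ask Azure whether a file share exists, using the exported connection string.
# Argument: the file share name.
# The function name is a hypothetical helper, not part of the Azure CLI.
share_exists() {
  az storage share exists \
    --name "$1" \
    --connection-string "$AZURE_STORAGE_CONNECTION_STRING" \
    --query exists \
    --output tsv
}

# Example call: share_exists SHARE_NAME
# SHARE_NAME is the name that was passed to "az storage share create".
```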

Microsoft Defender for Storage

Now, let’s take a quick look at how we can use Microsoft Defender to secure the file share we just created against potential external threats and unusual access.

Microsoft Defender is an advanced threat protection tool that we can use against potential security threats in both our Azure-native and hybrid environments. To protect our file share, we will use Microsoft Defender’s native intelligent security layer for storage.

In short, this smart layer protects us against potential external threats without requiring us to have the skills of a security expert. It does this by continuously analyzing the telemetry data produced by the Azure Blob Storage and Azure Files services. When it detects a potential security threat, it creates a security alert on the portal.

It is possible to activate Microsoft Defender for Storage either at the subscription level or at the level of the individual resources that we choose. Now, using the command below, let’s activate advanced threat protection for the storage account we created.

az security atp storage update --resource-group RESOURCE_GROUP --storage-account STORAGE_ACCOUNT_NAME --is-enabled true

With this command, advanced threat protection has been enabled for the storage account that I created, named “mydocumentstrg”.
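
To confirm the setting from the terminal, the current state can be read back with the matching “show” command. This is a small sketch with a hypothetical function name. The exact shape of the output may differ between Azure CLI versions.

```shell
# Show the advanced threat protection setting of a storage account.
# Arguments: resource group, storage account name.
# The function name is a hypothetical helper, not part of the Azure CLI.
show_storage_atp() {
  az security atp storage show \
    --resource-group "$1" \
    --storage-account "$2"
}

# Example call: show_storage_atp RESOURCE_GROUP mydocumentstrg
```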

What I like about Microsoft Defender for Storage is that the “Security” tab of the storage account in the portal quickly shows the general security status of the account, along with recommendations on how to make it more secure. It is also possible to take actions when alerts occur by creating integrations.

One thing that I don’t like in terms of protection against malicious content is that it does not scan each uploaded file one by one. Instead, it analyzes the telemetry data produced by the Azure Blob Storage and Azure Files services and performs hash scanning for known virus or trojan types. In addition, it currently does not support upload operations over the SMB protocol.

Working with PersistentVolume and PersistentVolumeClaim

PersistentVolume is a Kubernetes storage resource that, unlike a normal volume, has its own lifecycle independent of any pod. PersistentVolumeClaim is a request for that storage: the claim is bound to a PersistentVolume, and a pod mounts the volume by referencing the claim.

In short, when an application needs persistent storage, the relevant domain user can request storage for their own application by using a PersistentVolumeClaim, without needing detailed information about the underlying storage.

First, Kubernetes needs the credentials of the file share in order to access it. For this, let’s create a secret object named “storage-account” in the Kubernetes cluster that contains the storage account name and key, as follows.

STORAGE_KEY=$(az storage account keys list --resource-group RESOURCE_GROUP --account-name mydocumentstrg --query "[0].value" -o tsv)

kubectl create secret generic storage-account --from-literal=azurestorageaccountname=mydocumentstrg --from-literal=azurestorageaccountkey="$STORAGE_KEY"
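
If the volume fails to mount later, a wrong value in this secret is a common cause, so it can be worth reading it back. The following is a minimal sketch with a hypothetical function name. It decodes a single key of the “storage-account” secret. Avoid running it for the account key on a shared screen.

```shell
# Print one key of the "storage-account" secret, decoded from base64.
# Argument: the key name inside the secret.
# The function name is a hypothetical helper.
show_secret_value() {
  kubectl get secret storage-account \
    -o jsonpath="{.data.$1}" | base64 --decode
}

# Example call: show_secret_value azurestorageaccountname
```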

Now we need to define a YAML file called “pv-pvc.yaml” as follows.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: azurefile
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  azureFile:
    secretName: storage-account
    shareName: mydocumentstrg
    readOnly: false
  persistentVolumeReclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi

Let’s take a look at what we have done here.

We have defined that the PersistentVolume has a capacity of 5Gi.
We set the “accessModes” option to “ReadWriteMany“. Thus, this PersistentVolume can be mounted by more than one node as read-write.
Since we will use the Azure Files service as the storage plugin, we have referenced the file share we created through the “shareName” field and its credentials through the “secretName” field.
We also set the “persistentVolumeReclaimPolicy” option to “Retain“. Thus, when the PersistentVolumeClaim gets deleted, the PersistentVolume will not be deleted along with the related data.

In short, we are going to create a storage resource with a capacity of 5Gi in the cluster with PersistentVolume.

As developers, we will use the PersistentVolumeClaim to request persistent storage with 5Gi capacity by referencing it in the pod spec that we want. In addition, by setting the “storageClassName” attribute to an empty string, we have ensured that the storage is not created dynamically for this claim.

If we wanted the storage to be created dynamically by Kubernetes rather than statically, we would create a StorageClass resource with the criteria we want instead of the PersistentVolume resource, and then create a PersistentVolumeClaim that uses that StorageClass.
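
A dynamic setup could look roughly like the sketch below. It assumes that the Azure Files CSI driver (“file.csi.azure.com”) is enabled in the cluster. The resource name “azurefile-dynamic” and the chosen parameters are hypothetical examples and not part of this article’s setup.

```shell
# Write a StorageClass and a PersistentVolumeClaim for dynamic provisioning.
# Assumption: the cluster runs the Azure Files CSI driver (file.csi.azure.com).
# The name "azurefile-dynamic" is a hypothetical example.
cat > azurefile-dynamic.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-dynamic
provisioner: file.csi.azure.com
reclaimPolicy: Retain
allowVolumeExpansion: true
parameters:
  skuName: Standard_LRS
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-dynamic
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-dynamic
  resources:
    requests:
      storage: 5Gi
EOF

# Apply it with: kubectl apply -f azurefile-dynamic.yaml
```

With this approach, no PersistentVolume is written by hand. The provisioner creates the file share and the matching PersistentVolume when the claim is created.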

NOTE: If we want the PersistentVolumeClaim to be bound to a specific PersistentVolume that we have created before, we can do this by setting the “volumeName” attribute. This is useful, for example, when we have storage services with different tiers in terms of scalability and performance and want a claim to use a particular one.
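
As a sketch, pinning our claim to the “azurefile” PersistentVolume only requires one extra field. The file name “pvc-pinned.yaml” is a hypothetical example.

```shell
# Write a PersistentVolumeClaim that is pinned to one specific PersistentVolume.
# "volumeName" must match the metadata.name of the target PersistentVolume.
cat > pvc-pinned.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  volumeName: azurefile
  resources:
    requests:
      storage: 5Gi
EOF

# Apply it with: kubectl apply -f pvc-pinned.yaml
```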

Now let’s execute the following commands via terminal and take a look at the created resources.

kubectl apply -f pv-pvc.yaml

kubectl get pv azurefile
kubectl get pvc azurefile
kubectl describe pvc azurefile

As we can see, the PersistentVolumeClaim named “azurefile” has been successfully “bound” to the PersistentVolume named “azurefile”. If we take a look at the “Used By” attribute, we can see that it is “none”, as the claim has not yet been referenced by any pod.
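
In scripts, the same check can be reduced to a single value. The helper below is a small sketch with a hypothetical function name. It prints only the phase of a claim.

```shell
# Print the phase of a PersistentVolumeClaim (for example "Bound" or "Pending").
# Argument: the claim name.
# The function name is a hypothetical helper.
pvc_phase() {
  kubectl get pvc "$1" -o jsonpath='{.status.phase}'
}

# Example call: pvc_phase azurefile
```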

Now we are ready to deploy an example application.

As an example, I will deploy the following simple .NET 6 console application on Kubernetes.

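// Create (or overwrite) a text file on the mounted share and write one line to it.
using (StreamWriter writer = File.CreateText("/mnt/azure/mytext.txt"))
{
    await writer.WriteLineAsync("hello");
}
// Keep the process alive so that the pod stays running and can be inspected.
await Task.Delay(TimeSpan.FromHours(1));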

Isn’t it nice to be able to quickly develop small and simplified applications with top-level statements? With this simple application, we will create a simple txt file under the “/mnt/azure/” path. We will also mount the File shares service to that path.

This is the Dockerfile.

FROM mcr.microsoft.com/dotnet/runtime:6.0 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["MyDocumentConsoleApp/MyDocumentConsoleApp.csproj", "MyDocumentConsoleApp/"]
RUN dotnet restore "MyDocumentConsoleApp/MyDocumentConsoleApp.csproj"
COPY . .
WORKDIR "/src/MyDocumentConsoleApp"
RUN dotnet build "MyDocumentConsoleApp.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "MyDocumentConsoleApp.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "MyDocumentConsoleApp.dll"]
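
The image then has to be built and pushed to a registry that the cluster can pull from. One possible way is to let Azure Container Registry build it, as in the sketch below. The function name is hypothetical, and the Dockerfile location “MyDocumentConsoleApp/Dockerfile” is an assumption about the project layout. The command is meant to be run from the solution root.

```shell
# Build the image inside Azure Container Registry and tag it as mydocumentapp:v1.
# Argument: the registry name (without the ".azurecr.io" suffix).
# Assumption: the Dockerfile is located at MyDocumentConsoleApp/Dockerfile.
build_image_in_acr() {
  az acr build \
    --registry "$1" \
    --image mydocumentapp:v1 \
    --file MyDocumentConsoleApp/Dockerfile \
    .
}

# Example call: build_image_in_acr YOUR_ACR
```

The AKS cluster also needs permission to pull from this registry, for example by attaching the registry to the cluster with “az aks update --attach-acr”.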

Next, let’s define a simple deployment spec as follows.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mydocumentapp-deployment
  labels:
    app: mydocumentapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mydocumentapp
  template:
    metadata:
      labels:
        app: mydocumentapp
    spec:
      containers:
      - name: mydocumentapp
        image: YOUR_ACR.azurecr.io/mydocumentapp:v1
        volumeMounts:
        - name: azurefileshare
          mountPath: /mnt/azure
      volumes:
      - name: azurefileshare
        persistentVolumeClaim:
          claimName: azurefile

In this simple deployment spec, we request a volume under the “volumes” section by using our PersistentVolumeClaim named “azurefile”. Under “volumeMounts”, we define the path in the container where we want the volume to be mounted.

The Kubernetes control plane looks for a PersistentVolume that satisfies the criteria we defined in the PersistentVolumeClaim and, when it finds a suitable one, binds the claim to that volume. In our case this binding has already happened. If a claim cannot be bound, any pod that references it stays in the “Pending” state and is not scheduled until the claim is bound.

Now let’s deploy the application as follows and check the content of the “/mnt/azure/mytext.txt” file in the container.

kubectl apply -f ./mydocumentapp-deployment.yaml

kubectl exec -it $(kubectl get pods -l app=mydocumentapp --output=jsonpath='{.items[0].metadata.name}') -- cat /mnt/azure/mytext.txt

As we can see, our application has successfully created the “mytext.txt” file on the file share we have mounted. This file is now stored persistently on the Azure file share. Even if the relevant applications are rescheduled or scaled out or in, the same file share will stay accessible.
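
One way to see this for ourselves is to write a marker file, delete the pod, and read the marker back from the replacement pod. The following is a minimal sketch. The function name and the “marker.txt” file are hypothetical. It assumes that the container image provides “sh” and “cat”.

```shell
# Write a marker file, replace the pod, and read the marker from the new pod.
# The function name and marker.txt are hypothetical examples.
verify_share_persists() {
  selector="app=mydocumentapp"
  old_pod=$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}')

  # Write a file that the application itself never recreates.
  kubectl exec "$old_pod" -- sh -c 'echo persisted > /mnt/azure/marker.txt'

  # Delete the pod and wait until the Deployment has a ready replacement.
  kubectl delete pod "$old_pod"
  kubectl rollout status deployment/mydocumentapp-deployment

  new_pod=$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}')
  kubectl exec "$new_pod" -- cat /mnt/azure/marker.txt
}

# Example call: verify_share_persists
```

A separate marker file is used because the application recreates “mytext.txt” on every start, so that file alone would not show that the data survived.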

Now, let’s describe the PersistentVolumeClaim again.

kubectl describe pvc azurefile

If we take a look at the “Used By” attribute now, we can see that this claim is being used by the pod we have created.

Let’s Wrap Up

We said that pods in Kubernetes are ephemeral by nature. A pod can be deleted and rescheduled at any time, and its local data is lost along with it. For this reason, we need to use persistent volumes for data that has to be shared between pods or stored persistently.

Within the scope of this article, we took a brief look at how we can mount the managed Azure File shares service to our applications, which need a persistent storage solution, with the PersistentVolume and PersistentVolumeClaim approach.

While PersistentVolume is used by a cluster administrator to manage storage resources within the cluster, PersistentVolumeClaim is used by a developer to request persistent storage for their application in an abstract way, without knowing anything about what is behind it.

References

https://docs.microsoft.com/en-us/azure/storage/files/storage-files-introduction
https://docs.microsoft.com/en-us/azure/defender-for-cloud/defender-for-storage-introduction
https://kubernetes.io/docs/concepts/storage/persistent-volumes/