
Upgrades and maintenance
After a VMSS and its applications are deployed, they need to be actively maintained. Planned maintenance should be conducted periodically to ensure that both the environment and the applications stay up to date with the latest features and with security and resilience fixes.
Upgrades can apply to applications, the guest VM instance, or the image itself. They can be quite complex, because they should happen without affecting the availability, scalability, or performance of environments and applications. To ensure that updates can take place one instance at a time using rolling upgrade methods, it is important that a VMSS supports and provides capabilities for these advanced scenarios.
There is a utility provided by the Azure team to manage updates for VMSSes. It's a Python-based utility that can be downloaded from https://github.com/gbowerman/vmssdashboard. It makes REST API calls to Azure to manage scale sets. This utility can be used to start, stop, upgrade, and reimage VMs in a fault domain (FD) or a group of VMs, as shown in Figure 2.15:

Figure 2.15: Utility for managing VMSS updates
Now that you have a basic understanding of upgrades and maintenance, let's see how application updates are done in VMSSes.
Application updates
Application updates in VMSSes should not be executed manually. They must be run as part of the release management and pipelines that use automation. Moreover, an update should happen one application instance at a time and not affect the overall availability and scalability of an application. Configuration management tools, such as Desired State Configuration (DSC), should be deployed to manage application updates. The DSC pull server can be configured with the latest version of the application configuration and it should be applied on a rolling basis to each instance.
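As an illustration of the pattern (not the DSC mechanism itself), a rolling, one-instance-at-a-time update can be sketched in Python; the `update_instance` and `is_healthy` helpers here are hypothetical stand-ins for whatever deployment tooling is in use:

```python
def rolling_update(instances, update_instance, is_healthy):
    """Update one instance at a time, verifying health before moving on."""
    updated = []
    for instance in instances:
        update_instance(instance)      # update exactly one instance
        if not is_healthy(instance):   # stop the rollout on failure
            raise RuntimeError(f"{instance} failed its health check")
        updated.append(instance)       # the remaining instances keep serving traffic
    return updated

# Usage with stubbed helpers: every instance is updated in order.
done = rolling_update(
    ["vm-0", "vm-1", "vm-2"],
    update_instance=lambda vm: None,
    is_healthy=lambda vm: True,
)
```

Because only one instance is out of rotation at any moment, overall availability is preserved throughout the rollout.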
In the next section, we will focus on how the updates are done on the guest OS.
Guest updates
Updates to guest VMs are the responsibility of the administrator; Azure does not patch guest operating systems automatically. Users can control patching manually or use custom automation methods, such as runbooks and scripts. Automatic OS rolling upgrades are in preview mode and can be configured in the Azure Resource Manager template using an upgrade policy, as follows:
"upgradePolicy": {"mode": "Rolling","automaticOSUpgrade": "true" or "false", "rollingUpgradePolicy": { "batchInstancePercent": 20, "maxUnhealthyUpgradedInstanceCount": 0, "pauseTimeBetweenBatches": "PT0S" }}
Now that we know how guest updates are managed in Azure, let's see how image updates are accomplished.
Image updates
A VMSS can update the OS version without any downtime. OS updates involve changing the version or SKU of the OS or changing the URI of a custom image. Updating without downtime means updating VMs one at a time or in groups (such as one FD at a time) rather than all at once. By doing so, any VMs that are not being upgraded can keep running.
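For example, in an Azure Resource Manager template, an image update amounts to changing the `imageReference` of the scale set model; the publisher, offer, SKU, and version values below are illustrative:

```json
"virtualMachineProfile": {
    "storageProfile": {
        "imageReference": {
            "publisher": "MicrosoftWindowsServer",
            "offer": "WindowsServer",
            "sku": "2019-Datacenter",
            "version": "latest"
        }
    }
}
```

Instances then pick up the new image as they are upgraded, one batch at a time, under the configured upgrade policy.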
So far, we have discussed updates and maintenance. Let's now examine what the best practices of scaling for VMSSes are.
Best practices of scaling for VMSSes
In this section, we will go through some of the best practices that applications should implement to take advantage of the scaling capability provided by VMSSes.
The preference for scaling out
Scaling out is generally a better solution than scaling up. Scaling up or down means resizing VM instances. When a VM is resized, it generally needs to be restarted, which has its own disadvantages. First, there is downtime for the machine. Second, if there are active users connected to the application on that instance, they might find the application unavailable, or they might even lose transactions. Scaling out does not impact existing VMs; rather, it provisions new machines and adds them to the group.
New instances versus dormant instances
Scaling out can take two broad approaches: creating new instances from scratch, which requires installing, configuring, and testing the applications; or resuming dormant, stopped instances when scalability pressure on other servers demands it. Dormant instances can begin serving requests sooner, because the setup work has already been done.
Configuring the maximum and minimum number of instances appropriately
Setting both the minimum and maximum instance counts to two, with the current instance count also being two, means no scaling action can occur. There should be an adequate difference between the maximum and minimum instance counts, which are inclusive; autoscaling always operates within these limits.
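In an autoscale setting resource (`Microsoft.Insights/autoscaleSettings`), these limits live in the `capacity` block of a profile; the illustrative values below leave room to scale between two and ten instances:

```json
"profiles": [
    {
        "name": "defaultProfile",
        "capacity": {
            "minimum": "2",
            "maximum": "10",
            "default": "2"
        }
    }
]
```

With `minimum` and `maximum` set apart like this, the scale rules attached to the profile have headroom to add and remove instances.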
Concurrency
Applications designed for scalability should focus on concurrency. Applications should use asynchronous patterns to ensure that client requests do not wait indefinitely to acquire resources that are busy serving other requests. Implementing asynchronous patterns in code ensures that threads do not block on resources and that the system does not exhaust all available threads. Applications should also implement timeouts where intermittent failures are expected.
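A minimal Python sketch of this idea uses `asyncio.wait_for` to bound how long a request waits on a busy resource; the resource call here is simulated with a sleep:

```python
import asyncio

async def fetch_resource(work_seconds: float) -> str:
    # Simulates a call to a resource that may be busy serving other requests.
    await asyncio.sleep(work_seconds)
    return "data"

async def handle_request(work_seconds: float, timeout: float) -> str:
    # Bound the wait with a timeout so a busy resource cannot
    # hold the caller indefinitely.
    try:
        return await asyncio.wait_for(fetch_resource(work_seconds), timeout)
    except asyncio.TimeoutError:
        return "timed out"

fast = asyncio.run(handle_request(0.01, timeout=1.0))  # responds in time
slow = asyncio.run(handle_request(1.0, timeout=0.05))  # resource too busy
```

The timeout turns an indefinite wait into a quick, handleable failure, which is exactly what keeps threads free under load.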
Designing stateless applications
Applications and services should be designed to be stateless. Scalability can become a challenge to achieve with stateful services, while it is quite easy to scale stateless services. With state comes the requirement for additional components and implementations, such as replication, centralized or decentralized repositories, maintenance, and sticky sessions. All of these are impediments on the path to scalability. Imagine a service maintaining an active state on a local server. Irrespective of the load on the overall application or on individual servers, every subsequent request for that session must be served by the same server; it cannot be processed by other servers. This makes scalability implementation a challenge.
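The following hypothetical Python sketch shows the idea: with session state kept in an external store (standing in for a cache service or database), any instance can continue any session, so no sticky sessions are needed:

```python
# Shared store standing in for Redis, Azure Cache, a database, etc.
external_store = {}

class StatelessServer:
    """A server instance that keeps no session state in local memory."""

    def __init__(self, name: str):
        self.name = name

    def handle(self, session_id: str) -> int:
        # Read and update state in the shared store, never locally.
        count = external_store.get(session_id, 0) + 1
        external_store[session_id] = count
        return count

a, b = StatelessServer("vm-a"), StatelessServer("vm-b")
first = a.handle("s1")   # one instance starts the session
second = b.handle("s1")  # a different instance continues it seamlessly
```

Because neither instance owns the session, a load balancer can route each request to any available server, which is what makes scaling out straightforward.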
Caching and the Content Delivery Network (CDN)
Applications and services should take advantage of caching. Caching helps eliminate multiple subsequent calls to databases or filesystems, which keeps those resources available and free for more requests. A CDN is another caching mechanism, used to cache static files, such as images and JavaScript libraries, on servers across the globe, close to users. CDNs likewise free up origin resources for additional client requests, which makes applications highly scalable.
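As a minimal illustration of in-process caching (one of several caching options), Python's `functools.lru_cache` can avoid repeating an expensive lookup; the counter below stands in for a database query:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=128)
def load_profile(user_id: int) -> tuple:
    global calls
    calls += 1  # stands in for an expensive database query
    return ("user", user_id)

load_profile(1)
load_profile(1)  # served from the cache, no second query
load_profile(2)
```

After these three lookups, only two real queries have run; the repeated request was answered from the cache, leaving the database free for other work.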
N+1 design
N+1 design refers to building redundancy into the overall deployment for each component. It means planning for some redundancy even when it is not strictly required. This could mean additional VMs, storage, and network interfaces.
Considering the preceding best practices while designing workloads using VMSSes will improve the scalability of your applications. In the next section, we will explore monitoring.