Architectural considerations for storage accounts
Storage accounts should be provisioned in the same region as the other application components. Co-locating them allows the application to use the same data center network backbone without incurring any network charges.
Each Azure storage service has scalability targets for capacity (GB), transaction rate, and bandwidth. A general-purpose storage account can store up to 500 TB of data. If more than 500 TB must be stored, either multiple storage accounts should be created or premium storage should be used. A general-purpose account also tops out at 20,000 IOPS or 60 MB of data per second; any requests beyond these limits are throttled. If this is not enough from a performance perspective, either premium storage or multiple storage accounts should be used.
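When splitting data across multiple accounts, the application needs a deterministic way to decide which account holds a given blob so that reads and writes for the same key always land in the same place while aggregate IOPS spread across the pool. The following minimal Python sketch illustrates one such partitioning scheme; the account names and the helper function are hypothetical, not part of any Azure SDK:

```python
import hashlib

# Hypothetical pool of storage accounts provisioned to spread load;
# these names are placeholders, not real accounts.
STORAGE_ACCOUNTS = ["appdata001", "appdata002", "appdata003"]

def account_for_key(blob_key: str) -> str:
    """Pick a storage account deterministically from the blob key.

    The same key always maps to the same account, so lookups stay
    consistent, while different keys distribute across the pool and
    keep each account under its IOPS and bandwidth targets.
    """
    digest = hashlib.sha256(blob_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(STORAGE_ACCOUNTS)
    return STORAGE_ACCOUNTS[index]

# Example: every request for this blob key is routed to one account.
print(account_for_key("invoices/2016/04/inv-1001.pdf"))
```

Note that adding or removing accounts changes the mapping for existing keys; a consistent-hashing scheme would reduce that churn, at the cost of more code.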
The size of a virtual machine determines the number and capacity of the data disks available to it. Although larger virtual machines offer data disks with higher IOPS capacity, the maximum is still limited to 20,000 IOPS and 60 MB per second per storage account. Note that these are maximum numbers, so lower sustained levels should generally be assumed when finalizing the storage architecture.
Azure storage accounts should require authentication using SAS tokens and should never allow anonymous access. Moreover, for blob storage, separate containers should be created, with distinct SAS tokens generated for the different types and categories of clients accessing those containers. These SAS tokens should be regenerated periodically so that a guessed, cracked, or leaked token has only a limited window of validity.
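As an illustration, a short-lived, read-only container SAS can be generated with the azure-storage-blob Python SDK (v12). This is a minimal sketch: the account name, key, and container name are placeholders, and in practice the account key would come from a secret store rather than source code:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import ContainerSasPermissions, generate_container_sas

# Placeholder values; never hard-code real account keys.
ACCOUNT_NAME = "myappstorage"    # hypothetical storage account
ACCOUNT_KEY = "<account-key>"    # load from a secret store in practice
CONTAINER = "customer-reports"   # one container per client category

# Short-lived, read-only SAS token for this category of clients.
sas_token = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    account_key=ACCOUNT_KEY,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# Clients append the token to the container URL to gain access.
url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}?{sas_token}"
```

Because the token is signed with the account key, rotating that key invalidates all outstanding tokens at once, which is what makes periodic regeneration an effective containment measure.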
Blobs fetched from blob storage accounts should generally be cached. Whether cached data is stale can be determined by comparing the cached copy's timestamp with the blob's current last-modified property, re-fetching the blob only when it has actually changed.
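A minimal sketch of this staleness check with the azure-storage-blob Python SDK, assuming the caller keeps the bytes and the last-modified timestamp from the previous fetch:

```python
from datetime import datetime

from azure.storage.blob import BlobClient

def fetch_if_stale(
    blob_client: BlobClient,
    cached_bytes: bytes,
    cached_last_modified: datetime,  # timezone-aware, from the prior fetch
) -> tuple[bytes, datetime]:
    """Return (data, last_modified), downloading only if the blob changed."""
    props = blob_client.get_blob_properties()
    if props.last_modified <= cached_last_modified:
        # Cache is still fresh; skip the download entirely.
        return cached_bytes, cached_last_modified
    # Blob was modified after our copy was taken; re-fetch it.
    data = blob_client.download_blob().readall()
    return data, props.last_modified
```

The properties call is a lightweight HEAD-style request, so the full download cost is paid only when the blob has genuinely changed.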
Azure storage accounts provide concurrency features to ensure that the same file or data is not corrupted when modified simultaneously by multiple users. The following models are available (a Python sketch of the first two follows this list):
- Optimistic concurrency: Multiple users are allowed to modify data simultaneously, but at write time a check is made (typically against an ETag) to see whether the file or data has changed since it was read. If it has, the write is rejected, and the user must re-fetch the data and perform the update again. This is the default concurrency model for the table service.
- Pessimistic concurrency: When an application wants to update a file, it acquires a lock that explicitly denies any updates to it by other users until the lock is released. This is the default concurrency model for the file service when accessed using the SMB protocol.
- Last writer wins: Updates are unconstrained, and the last write overwrites the file irrespective of what was read initially. This is the default concurrency model for the queue, blob, and file (when accessed using REST) services.
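The first two models can be sketched against blob storage with the azure-storage-blob Python SDK: optimistic concurrency via an ETag precondition (the same pattern the table service applies by default), and pessimistic concurrency via a lease. The `transform` callable is a hypothetical stand-in for the caller's update logic:

```python
from azure.core import MatchConditions
from azure.core.exceptions import ResourceModifiedError
from azure.storage.blob import BlobClient

def optimistic_update(blob_client: BlobClient, transform) -> None:
    """Optimistic concurrency: write only if the ETag is unchanged
    since the read; on conflict, re-fetch and retry."""
    while True:
        downloader = blob_client.download_blob()
        etag = downloader.properties.etag
        new_data = transform(downloader.readall())
        try:
            blob_client.upload_blob(
                new_data,
                overwrite=True,
                etag=etag,
                match_condition=MatchConditions.IfNotModified,
            )
            return
        except ResourceModifiedError:
            # Another writer got in first; re-read and try again.
            continue

def pessimistic_update(blob_client: BlobClient, new_data: bytes) -> None:
    """Pessimistic concurrency: hold a lease so no other client
    can write the blob while the update is in progress."""
    lease = blob_client.acquire_lease(lease_duration=15)  # seconds
    try:
        blob_client.upload_blob(new_data, overwrite=True, lease=lease)
    finally:
        lease.release()
```

Optimistic concurrency suits workloads where conflicts are rare, since nothing is ever blocked; leases are the better fit when a conflicting write would be expensive to reconcile.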