A common trend in system deployment is the migration of systems from fixed servers in an on-premise (on prem) facility to a commercial cloud environment such as Amazon Web Services, Microsoft Azure, or Google Cloud Platform. Major benefits of moving to the cloud include improved operational resilience and increased productivity for IT staff, who then have fewer physical assets to manage. Moving to the cloud also creates opportunities for business agility: agencies can develop and launch new services without having to procure new hardware, and can quickly and easily scale their services up and down to meet evolving needs. Finally, there may be opportunities for cost savings.
However, comparing the costs of cloud deployments with the costs of on prem deployments is not an apples-to-apples comparison. Large capital expenditures for new equipment every few years are replaced by recurring operational fees paid to the cloud vendor. Determining those fees is complicated because they depend on a number of factors, such as the type and number of computations performed, the volume of data stored in the cloud, and the volume of data flowing into or out of the cloud. In addition, cloud platform vendors usually offer a variety of discounts, for example for public sector clients, for long-term commitments, and more. So how does an organization accurately estimate the cost of running its systems in the cloud?
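To make these cost drivers concrete, here is a minimal sketch that models one month of fees as compute, storage, and outbound data transfer charges with a single flat discount. The `UnitRates` class, the `monthly_cost` function, and every price in it are hypothetical placeholders, not any vendor's published rates or API; real pricing adds tiers, per-service discounts, and many other line items.

```python
from dataclasses import dataclass


@dataclass
class UnitRates:
    """Hypothetical unit prices; real rates vary by vendor, region, service, and tier."""
    compute_per_hour: float      # price per instance-hour
    storage_per_gb_month: float  # price per GB stored for one month
    egress_per_gb: float         # price per GB transferred out of the cloud
    discount: float = 0.0        # flat fractional discount, e.g. 0.2 for 20% (a simplification)


def monthly_cost(instance_hours: float, stored_gb: float, egress_gb: float,
                 rates: UnitRates) -> float:
    """Sum compute, storage, and egress charges for one month, then apply the discount."""
    if not 0.0 <= rates.discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    gross = (instance_hours * rates.compute_per_hour
             + stored_gb * rates.storage_per_gb_month
             + egress_gb * rates.egress_per_gb)
    return gross * (1.0 - rates.discount)


# Example with made-up rates: one instance running all month (about 730 hours),
# 500 GB stored, 100 GB sent out of the cloud, and a 20% discount.
rates = UnitRates(compute_per_hour=0.10, storage_per_gb_month=0.02,
                  egress_per_gb=0.09, discount=0.2)
print(round(monthly_cost(730, 500, 100, rates), 2))
```

Even this toy model shows why the estimate is hard to make up front: every term depends on a usage quantity (hours, gigabytes stored, gigabytes transferred) that is difficult to know before the system has actually run in the cloud.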
We suggest that when agencies are considering deploying a new or existing system in the cloud, they first create a validation environment in which they can test the system in a controlled manner, measure the utilization of cloud resources and the associated costs, and, if necessary, perform some optimizations. The technical team should run load tests against the various system components that simulate both typical usage and burst usage of the system. While the tests run, the team should measure and record resource utilization and the cloud service fees that result from these activities. The agency should then use the resource utilization measured during its load tests to project the total resource utilization anticipated for production operations, and enter that data into the cost estimate calculator that each cloud vendor makes available to its current and prospective customers.
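The projection step can be sketched as simple arithmetic. The version below assumes, for illustration only, that production alternates between the two load levels exercised in testing (typical and burst) and that usage scales linearly with time spent at each level; the `LoadTest` fields, the `project_month` function, and the 730-hour month are hypothetical choices, not part of any vendor's tooling.

```python
from dataclasses import dataclass


@dataclass
class LoadTest:
    """Resource usage recorded during one load test run (hypothetical fields)."""
    duration_hours: float  # wall-clock length of the test
    instance_hours: float  # compute consumed during the test
    egress_gb: float       # data sent out of the cloud during the test


def hourly_usage(test: LoadTest) -> tuple[float, float]:
    """Return (instance-hours per hour, GB of egress per hour) observed in a test."""
    if test.duration_hours <= 0:
        raise ValueError("duration_hours must be positive")
    return (test.instance_hours / test.duration_hours,
            test.egress_gb / test.duration_hours)


def project_month(typical: LoadTest, burst: LoadTest, burst_hours: float,
                  hours_in_month: float = 730.0) -> dict[str, float]:
    """Project monthly usage, assuming production alternates between the two tested load levels."""
    if not 0 <= burst_hours <= hours_in_month:
        raise ValueError("burst_hours must be between 0 and hours_in_month")
    typical_compute, typical_egress = hourly_usage(typical)
    burst_compute, burst_egress = hourly_usage(burst)
    typical_hours = hours_in_month - burst_hours
    return {
        "instance_hours": typical_compute * typical_hours + burst_compute * burst_hours,
        "egress_gb": typical_egress * typical_hours + burst_egress * burst_hours,
    }


# Made-up measurements: a 2-hour typical-load test and a 1-hour burst test,
# with production expected to spend about 30 hours per month at burst load.
usage = project_month(LoadTest(2.0, 8.0, 4.0), LoadTest(1.0, 12.0, 6.0), burst_hours=30)
print(usage)
```

With these invented numbers the projection comes to 3160 instance-hours and 1580 GB of egress for the month, the kind of totals one would then enter into a vendor's pricing calculator. A real system may not scale linearly (autoscaling thresholds, caching, and storage growth all break that assumption), which is one reason to test at more than one load level.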
At this point, the team can pinpoint the significant contributors to resource utilization, identify opportunities for efficiency improvements, and potentially modify the systems themselves and/or the cloud infrastructure configuration to make the system and its cloud implementation as efficient as possible. If modifications are made, the load tests should be run again and the cost estimates recalculated using the new resource utilization measurements.
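One simple way to pinpoint the significant contributors is a Pareto-style ranking: sort components by the fees they generated during a load test and keep the fewest that together account for most of the total. The sketch below assumes per-component costs are already available (for example, from a billing export grouped by resource tags); the `top_contributors` function, the 80% default threshold, and the component names are hypothetical.

```python
def top_contributors(costs: dict[str, float], share: float = 0.8) -> list[tuple[str, float]]:
    """Return the fewest components whose combined cost reaches `share` of the total.

    `costs` maps a component name to the fees it generated during a load test.
    Each result entry is (component, fraction of total cost), largest first.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    total = sum(costs.values())
    if total <= 0:
        return []
    target = share * total
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    result: list[tuple[str, float]] = []
    running = 0.0
    for name, cost in ranked:
        if running >= target:
            break  # the components already listed account for the requested share
        result.append((name, cost / total))
        running += cost
    return result


# Hypothetical per-component fees from one load-test run.
costs = {"database": 500.0, "search": 300.0, "web": 150.0, "batch": 50.0}
for name, fraction in top_contributors(costs):
    print(f"{name}: {fraction:.0%}")
```

Here the database and search components together account for 80% of the fees, so they would be the first candidates for optimization; after any changes, rerunning the load test and this ranking shows whether the cost profile actually shifted.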
While there is a cost to performing this evaluation before moving production operations to the cloud, it offers some distinct advantages that benefit the agency in the long run. This approach:
- Yields a well-founded estimate of the cost of operating the system in the cloud, with a much narrower range of uncertainty than would otherwise be possible.
- Avoids the financial risk of migrating production operations to the cloud and then being surprised by costs that cannot be sustained.
- Enables the system to go through multiple rounds of measurement and improvement in a limited, controlled way. The agency can run a load test, stop it, study the results, and implement potential remedies without continuously consuming resources and incurring the related fees. This is more efficient than performing measurement and improvement only after production operations have already migrated to the cloud, which incurs the full cost of any inefficiencies continuously until they are identified and resolved.
- Helps the agency as a whole gain experience in the following areas, even if it ultimately decides not to move forward with deploying the specific system being evaluated:
  - Deploying a system in the cloud.
  - Developing techniques for identifying the major contributors to resource utilization and cloud fees that can be applied to other projects.
  - Determining and implementing strategies for optimizing systems in a cloud environment.
  - Producing a reliable estimate of what it costs to run a major system in the cloud.
Believe it or not, some organizations are moving their systems from the cloud back to on-premise deployment (“cloud repatriation”), primarily because of cost, though other factors may also play a role. We think the strategies discussed in this post can help agencies gather the information they need to make better-informed decisions about cloud hosting and save money in the long run. Agencies may also be able to share the insights gained from these efforts with other organizations, allowing for richer cross-agency learning.