Remove “Forgotten Compute Instances” automatically.

Vaibhav Sonavane
4 min read · Mar 7, 2021
Do you know what you have lost?

Let us first acknowledge that working in a cloud environment is not as easy as it is made out to be. A network or infrastructure engineer who has spent a career on on-premises workloads is bound to make mistakes when moving to the cloud. Today, a developer is expected not only to write code but also to create the cloud infra that hosts the application, because cloud providers have made deployment so easy.

You cannot expect a developer to be aware of every technical nuance of the networking world, and even seasoned network engineers can have a hard time with the cloud way of deploying and managing workloads.

I would like to highlight a few common mistakes that occur in most cloud deployments:

  1. Creating public instances when not required.
  2. Creating public instances with weak passwords like admin/admin.
  3. Allowing the entire internet to access the instance (0.0.0.0/0).
  4. Not adequately restricting access to resources using IAM policies.
  5. Creating cloud resources and forgetting about them.

I would like to concentrate on the fifth mistake, which matters not only from a security perspective but also from a commercial one. Many times we log in to the cloud console to create the infra a project needs and end up over-provisioning resources. Or, after creating the environment, we find that we created it in the wrong region. And most of us would agree that once a project is complete, we rarely bother to de-provision the infra we created. These habits cause serious problems that may never have crossed our minds.

Issues:-

  1. Security- Public instances are exposed to brute-force attacks, and a compromised instance could become part of a botnet.
  2. Security- Confidential information could be leaked if Object Store/S3 buckets are left public.
  3. Security- An attacker who successfully compromises the network can use the forgotten instances to deploy malware/rootkits and launch a second stage of attack.
  4. Commercial- Cloud resources are billed every month for as long as they exist. Forgotten infra piles up over time and keeps the bill ever-increasing.

Solution:-

The best solution is to educate cloud users such as developers and admins so that they take corrective steps to avoid the above issues, but we all know that education alone is not foolproof. There has to be an automated solution that ensures no “Forgotten Instances” are left lying in the cloud.

There are many ways to automate the clean-up, such as writing a script that searches the entire cloud infra and lists the “Forgotten Resources”, but that has its own challenges. I would like to propose a simpler approach, which I call the “Kill Me” solution.

The “Kill Me” Solution:-

Pre-Requisites-

  1. As a best practice, we always recommend using hardened OS images to spin up instances. The “Kill Me” process/daemon must be part of the hardened image for the solution to work, and all instances created on the cloud must use it as the base image.
  2. A Serverless/Lambda Function must be created for the “Kill Me” process to call. The Function is what actually deletes the instance and alerts the required admins.

Solution in Action-

  1. The “Kill Me” process runs continuously on each instance.
  2. Whenever the instance has been idle for longer than the tolerable period (say, 60 days), the “Kill Me” process invokes the Serverless/Lambda Function and passes its instance details.
  3. The Function deletes the instance that called it and alerts the desired admins or a SIEM solution.
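The agent side of these steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the payload field names, the `invoke_function` callable (standing in for whatever call reaches the Function, for example an HTTPS request) and the way the last-activity time is obtained are all hypothetical and not taken from the scripts referenced in this article.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 60 days is the example tolerable period mentioned in the steps.
TOLERABLE_IDLE = timedelta(days=60)


def should_request_deletion(last_activity, now, tolerable=TOLERABLE_IDLE):
    """Return True once the instance has been idle longer than the tolerable period."""
    return now - last_activity > tolerable


def build_kill_request(instance_id, region, last_activity, now):
    """Collect the instance details passed to the Function (hypothetical field names)."""
    return {
        "instance_id": instance_id,
        "region": region,
        "idle_days": (now - last_activity).days,
        "requested_at": now.isoformat(),
    }


def run_check(instance_id, region, last_activity, now, invoke_function):
    """One pass of the agent: call the Function only when the idle limit is exceeded."""
    if not should_request_deletion(last_activity, now):
        return None
    request = build_kill_request(instance_id, region, last_activity, now)
    invoke_function(request)  # hypothetical hook, e.g. an HTTPS call to the Function
    return request


if __name__ == "__main__":
    now = datetime(2021, 3, 7, tzinfo=timezone.utc)
    sent = []  # stands in for the real Function call
    run_check("instance-a", "region-1", now - timedelta(days=61), now, sent.append)
    print(sent)
```

Keeping the decision (`should_request_deletion`) separate from the call to the Function makes the threshold easy to test without touching any cloud API.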

Code-

Source: https://stackoverflow.com/questions/25662926/how-to-know-if-system-is-completely-idle

The Shell Script (see the source above) runs on each compute instance at all times and invokes the Python Script (see the source below) once the idle time exceeds the tolerable period.

Disclaimer: “xprintidle” might not be the ideal signal for claiming that a machine is idle. I leave it to the Linux/Unix experts to find additional parameters that justify the “idle” claim.
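As one possible direction for those additional parameters, the sketch below combines the `xprintidle` reading with the system load average and the number of logged-in sessions. The thresholds and the choice of signals are assumptions made for illustration, not tested recommendations. `xprintidle` prints the X session idle time in milliseconds and is usually absent on headless servers, so a missing reading is ignored rather than treated as activity.

```python
import os
import subprocess


def read_x_idle_ms():
    """Return the X session idle time in ms from xprintidle, or None if unavailable."""
    try:
        out = subprocess.run(
            ["xprintidle"], capture_output=True, text=True, check=True, timeout=5
        )
        return int(out.stdout.strip())
    except (OSError, subprocess.SubprocessError, ValueError):
        return None


def count_login_sessions():
    """Return the number of sessions listed by `who`, or None if it cannot be run."""
    try:
        out = subprocess.run(
            ["who"], capture_output=True, text=True, check=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return len([line for line in out.stdout.splitlines() if line.strip()])


def looks_idle(x_idle_ms, load_avg_15m, session_count,
               min_x_idle_ms=3_600_000, max_load=0.05):
    """Claim "idle" only when every available signal agrees (thresholds are assumptions)."""
    if session_count is None or session_count > 0:
        return False  # unknown or active sessions: do not claim idle
    if load_avg_15m > max_load:
        return False  # the machine is still doing work
    if x_idle_ms is not None and x_idle_ms < min_x_idle_ms:
        return False  # recent keyboard/mouse activity in the X session
    return True


if __name__ == "__main__":
    load_15m = os.getloadavg()[2]  # 15-minute load average (Unix only)
    print(looks_idle(read_x_idle_ms(), load_15m, count_login_sessions()))
```

This only answers "is the machine idle right now?". To measure an idle period of days, the agent would also need to persist the time of the last non-idle observation.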

Source: https://github.com/vaibhavsonavane/oci

I have not provided the code for the Serverless/Lambda Function that performs the clean-up, as it is outside the scope of this article.

If anyone has a better solution, please let me know in the comments. I would also like to know how this clean-up process could be extended to other cloud resources such as volumes and buckets.
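For orientation only, the general shape of such a Function might look like the sketch below. It is not the function used with this solution: `delete_instance` and `notify_admins` are hypothetical callables standing in for the cloud provider's delete call and the alerting channel (email, SIEM and so on), and the request fields are assumed.

```python
def handle_kill_request(request, delete_instance, notify_admins):
    """Delete the instance named in the request, then alert the admins.

    `request` is the dict sent by the "Kill Me" process (assumed fields);
    `delete_instance` and `notify_admins` are hypothetical injected callables.
    """
    instance_id = request.get("instance_id")
    if not instance_id:
        raise ValueError("request must include instance_id")

    delete_instance(instance_id)

    message = f"Deleted idle instance {instance_id}"
    if "idle_days" in request:
        message += f" after {request['idle_days']} idle days"
    notify_admins(message)

    return {"status": "deleted", "instance_id": instance_id}


if __name__ == "__main__":
    deleted, alerts = [], []  # stand-ins for the real cloud and alerting calls
    result = handle_kill_request(
        {"instance_id": "instance-a", "idle_days": 61},
        deleted.append,
        alerts.append,
    )
    print(result, deleted, alerts)
```

In a real deployment the Function would presumably also need to verify that the request really comes from the instance it names before deleting anything.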

Vaibhav Sonavane

A cloud security enthusiast with an urge to learn and unlearn. A coder at heart with a logical mind.