A stopped EC2 instance that would not start again
by Sebastien Mirolo on Fri, 16 Jun 2023
Many times we have stopped an EC2 instance, changed the instance type, and started it again. This time, the instance just refused to start. Let's dive into how we debugged and fixed this issue.
A cryptic error message
The instance would start and stop again almost immediately. Using the Command Line Interface (CLI) to describe the instance, we found a cryptic error message:
$ aws ec2 describe-instances --instance-ids *instance-id*
{
  ...
  "StateTransitionReason": "Server.InternalError",
  ...
  "StateReason": {
    "Code": "Client.InternalError",
    "Message": "Client.InternalError: Client error on launch"
  },
  ...
}
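When several instances are involved, reading these fields by eye gets tedious. The following is a minimal Python sketch, assuming the JSON printed by `aws ec2 describe-instances` has been loaded into a dictionary; the function name `state_reasons` is our own, not part of any AWS tooling.

```python
def state_reasons(response):
    """Return (instance_id, transition_reason, code, message) tuples.

    `response` is the parsed JSON output of `aws ec2 describe-instances`.
    Missing fields come back as None instead of raising.
    """
    rows = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            reason = instance.get("StateReason", {})
            rows.append((
                instance.get("InstanceId"),
                instance.get("StateTransitionReason"),
                reason.get("Code"),
                reason.get("Message"),
            ))
    return rows
```

One way to use it would be to redirect the CLI output to a file, load it with `json.load`, and print each tuple returned by `state_reasons`.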
Few and far between clues
A few Google searches for AWS Server.InternalError or AWS Client error on launch didn't return much. The oddest thing was that an instance created in 2022 would not start again, yet an instance created in 2021 had no problem. Furthermore, the exact same script we used to bring up the 2022 instance initially could no longer bring up a new instance: run in 2023, the new instance suffered the same start-and-immediately-stop issue.
Pointers were few, but the issue seemed to be with the EBS volume, most likely the KMS encryption keys, as hinted here: How do I troubleshoot an Amazon EC2 instance that stops or terminates when I try to start it?
Doing a side-by-side comparison of the 2021 and 2022 instances, we found that the KMS encryption keys used were different. Furthermore, the 2021 instance was using an AWS managed key while the 2022 instance was using a customer managed key.
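The comparison can also be scripted. The sketch below is an illustration rather than what we ran at the time. It assumes you saved the output of `aws ec2 describe-volumes --filters Name=attachment.instance-id,Values=<instance-id>` for each instance and the output of `aws kms describe-key --key-id <key>` for each key. The helper names `volume_keys` and `key_manager` are hypothetical.

```python
def volume_keys(volumes_response):
    """Map each encrypted volume ID to the KMS key it is encrypted with.

    `volumes_response` is the parsed JSON of `aws ec2 describe-volumes`.
    Unencrypted volumes are skipped since they have no key.
    """
    return {
        volume["VolumeId"]: volume.get("KmsKeyId")
        for volume in volumes_response.get("Volumes", [])
        if volume.get("Encrypted")
    }


def key_manager(describe_key_response):
    """Return who manages the key: "AWS" or "CUSTOMER".

    `describe_key_response` is the parsed JSON of `aws kms describe-key`.
    """
    return describe_key_response["KeyMetadata"]["KeyManager"]
```

Running `volume_keys` on both instances shows whether they share a key, and `key_manager` tells an AWS managed key apart from a customer managed one.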
Poking around, we saw that the key policies for the two were quite different. The AWS managed key had lines like the following, which led us to believe the EC2 service had a problem accessing the customer managed key.
...
"Condition": {
  "StringEquals": {
    "kms:ViaService": "ec2.eu-central-1.amazonaws.com",
    "kms:CallerAccount": "577747654972"
  }
}
...
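To check quickly whether a key policy carries such a condition, something like the following Python sketch could help. It assumes the policy document, for example as returned by `aws kms get-key-policy --key-id <key> --policy-name default --output text`, has been parsed into a dictionary. The function names are our own. The check is deliberately narrow: it only looks at `StringEquals` on `kms:ViaService` with that exact spelling, and a policy can grant EC2 access in other ways, so a negative result is a hint, not proof.

```python
def via_service_values(policy):
    """Collect the kms:ViaService values found in Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # A policy may hold a single statement object instead of a list.
        statements = [statements]
    found = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        equals = statement.get("Condition", {}).get("StringEquals", {})
        value = equals.get("kms:ViaService")
        if value is None:
            continue
        # The condition value can be one string or a list of strings.
        found.extend([value] if isinstance(value, str) else value)
    return found


def mentions_ec2(policy, region):
    """True if an Allow statement is conditioned on EC2 in `region`."""
    return f"ec2.{region}.amazonaws.com" in via_service_values(policy)
```

Comparing the result for the AWS managed key against the customer managed key is one way to spot the kind of difference described above.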
So we copied and pasted the AWS managed key policy into the customer managed key policy, adding "kms:Put*" in the second statement in order to be able to edit that policy again. Et voilà! The EC2 instances started again!
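We made that edit by hand. If you prefer to script it, the step of adding "kms:Put*" could look roughly like the sketch below. It assumes the copied policy has been parsed into a dictionary and that the statement to extend sits at index 1 (the second statement). The helper name `add_action` is hypothetical.

```python
import copy


def add_action(policy, statement_index, action):
    """Return a copy of `policy` with `action` added to one statement.

    The original policy is left untouched so it can be kept as a backup.
    """
    updated = copy.deepcopy(policy)
    statement = updated["Statement"][statement_index]
    actions = statement.get("Action", [])
    if isinstance(actions, str):
        # "Action" may be a single string instead of a list.
        actions = [actions]
    if action not in actions:
        actions = actions + [action]
    statement["Action"] = actions
    return updated
```

Calling `add_action(policy, 1, "kms:Put*")` yields the edited document, which can be written out with `json.dump` and applied with `aws kms put-key-policy --key-id <key> --policy-name default --policy file://policy.json`. Keep a copy of the original policy before replacing it.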
The EC2 instance booted properly with the original customer managed key policy in 2022. It no longer does in 2023. It is unclear whether something changed on the AWS side. It will remain a mystery.
More to read
If you are looking for related posts, Resizing an EBS disk and PostgreSQL, encrypted EBS volume and Key Management Service are good reads.
More technical posts are also available on the DjaoDjin blog. For fellow entrepreneurs, business lessons learned running a subscription hosting platform are also available.