CIT - Build CDK Infrastructure Testing - Part 1 - Terratest and the Integrated Integration



TL;DR You don't need a DSL to do easy integration testing. With CDK available in GO, infrastructure tests can be programmed easily with GO packages.

Motivation for CIT-CDK Infrastructure Testing

Basic Test Pyramid

At the top of the test pyramid are the End-To-End tests. For a website that means: will the (end) user of the site see a proper HTML page or not?

With DevOps and IaC - Infrastructure as Code - the application is not separated from the infrastructure anymore. The application itself relies on the infrastructure. So to have a running application and a successful End-To-End test, application and infrastructure must be tested and work.

So we have an application side and an infrastructure side of the pyramid.

Divided Test Pyramid

To test the infrastructure End-To-End we have to decouple the application from the infrastructure. One way to do this is to deploy a minimal web application - a web server with a testable response - to fully test the infrastructure.

Integration on the app side

Application Development Testing

During development the unit tests should be run many times, at least before each commit to the code repository. So they should be as atomic as possible and run fast.

If multiple services or frontend and backend work together, the cooperation is to be tested in the integration tests. Usually they take more time and will be performed in a CI/CD pipeline.

The End-to-End test just checks if everything works together. It does not check whether the application is maintainable, flexible or understandable. For development, extensibility and refactoring, good unit test coverage is helpful.

Integration on the CDK side

Infrastructure Development Testing

You start a CDK development with (*):

cdk init app --language=yourlanguage

The unit test generated by this scaffolding just checks whether the right CloudFormation (Cfn) templates are generated.

This is useful if you are not sure about the generated Cfn template, for instance if you create new constructs. It is not so useful with standard constructs.

For instance, this is a part of the generated CDK go code:

awssns.NewTopic(stack, jsii.String("MyTopic"), &awssns.TopicProps{
		DisplayName: jsii.String("MyCoolTopic"),
	})

An SNS topic is created.

With the generation of the SNS Topic, the generated test code checks the proper creation of an SNS CloudFormation Topic:

	template := gjson.ParseBytes(bytes)
	displayName := template.Get("Resources.MyTopic86869434.Properties.DisplayName").String()
	assert.Equal(t, "MyCoolTopic", displayName)

This is only useful if you are unsure whether the awssns construct works. But this is already tested in the awssns.NewTopic construct itself.

Have a look at Github CDK SNS:

Code Snippet from aws-cdk/packages/@aws-cdk/aws-sns/test/test.sns.ts

expect(stack).toMatch({
        'Resources': {
          'MyTopic86869434': {
            'Type': 'AWS::SNS::Topic',

So we are testing twice.

Again, if you are developing a new construct or if you create resources dynamically, it can be helpful to add tests. But not if you just test “an SNS construct will generate an SNS CloudFormation resource”.

Do not test AWS basic service functionality, but the integrations

You also should not test that an AWS service base functionality works, but you should test if your configuration works.

For example, you do not have to test that a Security Group which is open on port 80 really opens port 80. But with a more complex scenario with several groups and several servers, it could make sense to test that server A really can connect to server B and that only on this very port.
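Such a connectivity check can be sketched with the GO standard library alone. This is a minimal sketch, not part of the example repository: the function names canConnect and listenLocal are hypothetical, and the local listener only stands in for "server B" so the code runs anywhere. In a real test the check has to run on server A itself, against the address of server B.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// canConnect reports whether a TCP connection to host:port can be
// opened within the given timeout. (Hypothetical helper name.)
func canConnect(host string, port int, timeout time.Duration) bool {
	addr := net.JoinHostPort(host, strconv.Itoa(port))
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// listenLocal opens a listener on a free localhost port. It is only a
// local stand-in for "server B"; stop closes the listener again.
func listenLocal() (port int, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, nil, err
	}
	return ln.Addr().(*net.TCPAddr).Port, func() { ln.Close() }, nil
}

func main() {
	port, stop, err := listenLocal()
	if err != nil {
		panic(err)
	}
	// While the listener is up, the port is reachable.
	fmt.Println("open port reachable:", canConnect("127.0.0.1", port, time.Second))
	stop()
	// After closing it, the same port refuses connections.
	fmt.Println("closed port reachable:", canConnect("127.0.0.1", port, time.Second))
}
```

To cover "only on this very port", the same function can be called for a port that must stay closed, asserting that it returns false.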

So I think that testing the integration of the created infrastructure should be part of the automated testing. I will call this “CIT” - CDK Infrastructure Testing.

Challenges for CIT

To achieve this, we have to face some challenges:

  • Mapping of logical and physical IDs
  • Finding or creating Test Libraries
  • Testing with context

Logical - Physical Mapping

Mapping

When you create infrastructure with CDK or CloudFormation you define names for your Constructs. CDK generates Resource names from the construct names. These Resource names are called logical IDs or Logical Names. In the CDK code, an instance can be titled “Web-Server”.

Construct name MyTopic

awssns.NewTopic(stack, jsii.String("MyTopic"), &awssns.TopicProps{
	DisplayName: jsii.String("MyCoolTopic"),
})

Resource name MyTopic86869434 in Cfn

Resources:
  MyTopic86869434:
    Type: AWS::SNS::Topic
    Properties:
      DisplayName: MyCoolTopic
    Metadata:
      aws:cdk:path: GocdkStack/MyTopic/Resource

When CloudFormation creates the resource, it will give it an ID like GocdkStack-MyTopic86869434-1EQYGNEXFF12T. This is called the Physical ID. Each call to an AWS API concerning this Topic (like aws sns list-subscriptions-by-topic) needs the Physical ID.

Call to get information about the Topic resource

aws sns list-subscriptions-by-topic --topic-arn "arn:aws:sns:eu-central-1:669453403305:GocdkStack-MyTopic86869434-1EQYGNEXFF12T"
{
    "Subscriptions": []
}

Physical ID GocdkStack-MyTopic86869434-1EQYGNEXFF12T in deployed Stack

{
	"LogicalResourceId": "MyTopic86869434",
	"PhysicalResourceId": "arn:aws:sns:eu-central-1:669453403305:GocdkStack-MyTopic86869434-1EQYGNEXFF12T",
	"ResourceType": "AWS::SNS::Topic"
}

So tests running on the physical side need to have the physical ID.
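The mapping itself is a simple lookup once the stack resources are known. The following is a minimal sketch using only the GO standard library. It assumes the JSON comes from aws cloudformation describe-stack-resources, which wraps entries like the one above in a "StackResources" list. The function name physicalIDs and the sample constant are hypothetical and not part of the example repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stackResource holds the fields of one stack resource entry that
// matter for the mapping.
type stackResource struct {
	LogicalResourceId  string
	PhysicalResourceId string
	ResourceType       string
}

// physicalIDs builds a lookup from logical ID to physical ID out of a
// describe-stack-resources style JSON response. (Hypothetical helper.)
func physicalIDs(response []byte) (map[string]string, error) {
	var parsed struct {
		StackResources []stackResource
	}
	if err := json.Unmarshal(response, &parsed); err != nil {
		return nil, err
	}
	ids := make(map[string]string, len(parsed.StackResources))
	for _, r := range parsed.StackResources {
		ids[r.LogicalResourceId] = r.PhysicalResourceId
	}
	return ids, nil
}

// sample reuses the resource entry shown above.
const sample = `{
  "StackResources": [
    {
      "LogicalResourceId": "MyTopic86869434",
      "PhysicalResourceId": "arn:aws:sns:eu-central-1:669453403305:GocdkStack-MyTopic86869434-1EQYGNEXFF12T",
      "ResourceType": "AWS::SNS::Topic"
    }
  ]
}`

func main() {
	ids, err := physicalIDs([]byte(sample))
	if err != nil {
		panic(err)
	}
	// Look up the physical ID by the logical ID from the template.
	fmt.Println(ids["MyTopic86869434"])
}
```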

My goal for the CIT is to provide helper functions for an easy mapping of Construct names and physical IDs. For the quick start we will create a SystemsManager Parameter Store parameter for the physical id.

Service Test Libraries

A general distinction for testing libraries is whether you want to use a specialized DSL (domain-specific language) or low-level calls with the AWS SDK.

An example of a testing DSL is Chef InSpec.

E.g. an Application Load balancer can be tested with:

describe aws_alb('arn:aws:elasticloadbalancing') do
  it { should exist }
end

describe aws_alb(load_balancer_arn: 'arn:aws:elasticloadbalancing') do
  it { should exist }
end

Or the generated SNS Topic could be tested in InSpec with:

describe aws_sns_topic('arn:aws:sns:*::my-topic-name') do
  it { should exist }
end

The plus of such a generic approach is an easy start. The downside is that the DSL is limited and extending the DSL is more complicated. And as new AWS service types keep coming, it is very hard work to keep creating new DSL items for each new service type.

Between a DSL and relying entirely on coding with the pure SDK sits terratest. This is a GO library aimed at helping to test terraform-generated infrastructure. As the infrastructure does not care about how it is generated, it is quite easy to use it for Cfn/CDK-generated resources as well. You will find some AWS helper functions for testing in the module terratest/aws.

I have now used terratest in a handful of projects and found it quite useful. As it uses the AWS GO SDK, you can easily add AWS GO SDK API calls and have access to the whole AWS API. terratest also has code for checking SNS Topic existence.

Creating a test for SNS with the GO SDK would be as easy as adding a few lines to this GO V2 SNS Example: GO SNS.

Context aware Testing

Another challenge is that some resources under the test microscope only work when they are linked to another resource, which I will call an agent. This is the case with Security Groups and IAM policies.

VPC/Network Context

Without attaching a Security Group to an ENI (Elastic Network Interface) you cannot check whether it really works.

In 2018 I used Chef kitchen/inspec in this (German) blog post to create a test and a testee instance to test routing with transit gateway: https://aws-blog.de/2018/12/mit-allen-verbunden-teil-1.html

Another example of inspec with Systems Manager is described in Thomas' post: https://aws-blog.de/2020/10/air-gapped-compliance-scans-with-inspec.html

IAM Context

If you want to test the results of IAM policies, you need to have an entity which has these policies attached. With the complex and feature-rich IAM policy json/yaml data, testing can often help to get clarity here.

We will look at the agent problem later.

Now we start with a working first example.

CIT Use Case: End-to-End Testing a CDK generated Load Balancer Web Server

I want to code an End-to-End test for a CDK generated webserver with an Application Load Balancer.

The Application

Mapping

To start right away without additional libraries, I just write the physical ID, or the DNS name in this example, to the Systems Manager (SSM) Parameter Store. With that the test code has a mapping between the logical and the physical ID.

The Load Balancer GO CDK code:

	lb := elasticloadbalancingv2.NewApplicationLoadBalancer(stack, aws.String("LB"),
	&elasticloadbalancingv2.ApplicationLoadBalancerProps{
		Vpc:                myVpc,
		InternetFacing:     aws.Bool(true),
		LoadBalancerName:   aws.String("ALBGODEMO"),
	},
	)

Output the Url to Parameter Store:

ssm.NewStringParameter(stack, aws.String("govpc"),
		&ssm.StringParameterProps{
			Description:    aws.String("alb"),
			ParameterName:  aws.String("/cdk-templates/go/alb_ec2"),
			StringValue:    lb.LoadBalancerDnsName(),
		},
	)

Test Code

terratest provides an easy method to test http calls with the http_helper:

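import (
	"fmt"
	"testing"
	"time"

	"github.com/gruntwork-io/terratest/modules/aws"
	http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
)

// Assumption: region is not shown in the excerpt; the stack in this post runs in eu-central-1.
const region = "eu-central-1"

func TestALBRequest(t *testing.T) {
	// Read the ALB DNS name that the CDK stack wrote to the Parameter Store.
	storedUrl := aws.GetParameter(t, region, "/cdk-templates/go/alb_ec2")
	url := fmt.Sprintf("http://%s", storedUrl)

	// Try up to 20 times and wait 10 seconds between the attempts.
	sleepBetweenRetries := 10 * time.Second
	http_helper.HttpGetWithRetry(t, url, nil, 200, "<h1>hello world</h1>", 20, sleepBetweenRetries)
}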
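To make the retry idea behind this test more tangible, here is a minimal sketch of such a retry loop with the GO standard library only. It is not terratest's implementation. The names checkOnce, getWithRetry and newFlakyServer are hypothetical, and the flaky local server only simulates a web server that answers 502 while it is still starting. The body is trimmed before the comparison because echo in the userdata appends a newline to the page.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// checkOnce performs one GET and compares status and trimmed body.
func checkOnce(url string, wantStatus int, wantBody string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return err
	}
	got := strings.TrimSpace(string(body))
	if resp.StatusCode != wantStatus || got != wantBody {
		return fmt.Errorf("status %d, body %q", resp.StatusCode, got)
	}
	return nil
}

// getWithRetry repeats checkOnce until it succeeds or the attempts
// are used up, sleeping between the attempts.
func getWithRetry(url string, wantStatus int, wantBody string, retries int, sleep time.Duration) error {
	var lastErr error
	for attempt := 1; attempt <= retries; attempt++ {
		if lastErr = checkOnce(url, wantStatus, wantBody); lastErr == nil {
			return nil
		}
		if attempt < retries {
			time.Sleep(sleep)
		}
	}
	return fmt.Errorf("no valid response after %d attempts: %v", retries, lastErr)
}

// newFlakyServer simulates a booting web server: the first `failures`
// requests get a 502, every later request gets the test page.
func newFlakyServer(failures int32) *httptest.Server {
	var calls int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= failures {
			http.Error(w, "502 Bad Gateway", http.StatusBadGateway)
			return
		}
		fmt.Fprintln(w, "<h1>hello world</h1>")
	}))
}

func main() {
	srv := newFlakyServer(2)
	defer srv.Close()
	err := getWithRetry(srv.URL, 200, "<h1>hello world</h1>", 5, 10*time.Millisecond)
	fmt.Println("result:", err)
}
```

The number of retries and the sleep time together define how long the infrastructure may take to come up before the test is counted as failed.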

Run, Test and Destroy script

In the github repo, you see all scripts defined in the Taskfile.yml:

Taskfile settings

  1. At the moment CDK versions change quickly, so I use a fixed CDK version number:
vars:
  version: v2.0.0-rc.3
  constructs: v10.0.5
  npxoptions: -y
...
npx {{.npxoptions}} cdk@{{.version}}
  2. Deploy
      - npx {{.npxoptions}} cdk@{{.version}}  deploy --require-approval never --profile $AWSUME_PROFILE
  3. Test
    - go test

or use go test -v for more verbose output.

  4. Destroy
      -  npx {{.npxoptions}} cdk@{{.version}}  destroy --force --profile $AWSUME_PROFILE

Calling the deploy-test-destroy cycle on different settings:

FAIL 1

If the stack is not created yet, the test fails:

 go test
--- FAIL: TestALBRequest (0.41s)
    ssm.go:18:
        	Error Trace:	ssm.go:18
        	            				alb_ec2_test.go:16
        	Error:      	Received unexpected error:
        	            	ParameterNotFound:
        	            		status code: 400, request id: 6576de83-ff43-4817-b7c6-d337637d0542
        	Test:       	TestALBRequest
FAIL
exit status 1
FAIL	alb_ec2	0.875s

This is because the SSM parameter does not exist - ParameterNotFound.

FAIL 2

If I forget to start httpd in the userdata script, the test will also fail:

yum update
yum install -y httpd
systemctl enable httpd
echo "<h1>hello world</h1>" > /var/www/html/index.html

test:

Stack ARN:
arn:aws:cloudformation:eu-central-1:669453403305:stack/AlbInstStack/ae48b400-b94e-11eb-9fc8-0635bbfae99c
task: go test
TestALBRequest 2021-05-20T11:39:44+02:00 retry.go:91: HTTP GET to URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-20T11:39:44+02:00 http_helper.go:32: Making an HTTP GET call to URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-20T11:39:44+02:00 retry.go:103: HTTP GET to URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com returned an error: Validation failed for URL http://ALBGODEMO-1441991621.eu-central-1.elb.amazonaws.com. Response status: 502. Response body:
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
...
</html>. Sleeping for 10s and will try again.

The SSM parameter exists, points to the right ALB, but the webserver is not running.

With the retry cycle I make sure that CloudFormation has enough time to create the resources.

OK

If I fix the userdata, create the right ALB etc., the test will pass:

task test
task: go test
TestALBRequest 2021-05-16T16:56:25+02:00 retry.go:91: HTTP GET to URL http://ALBGODEMO-465149672.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-16T16:56:25+02:00 http_helper.go:32: Making an HTTP GET call to URL http://ALBGODEMO-465149672.eu-central-1.elb.amazonaws.com
PASS
ok  	alb_ec2	0.771s

This is a simple example. I did the same thing with IIS on Windows and PowerShell scripting - that's more than four lines in the UserData, and this test really helped me.

You can run the test cycle with task cit from the github code.

Whole Deploy-Test-Destroy cycle

task cit
...
AlbInstStack: deploying...
[0%] start: Publishing 4ad61576fc9e67b1526ec28726d554e5177af3c40a96dce030832ddf21f2eda2:669453403305-eu-central-1
[100%] success: Published 4ad61576fc9e67b1526ec28726d554e5177af3c40a96dce030832ddf21f2eda2:669453403305-eu-central-1
AlbInstStack: creating CloudFormation changeset...
[██████████████████████████████████████████████████████████] (48/48)

 ✅  AlbInstStack
Stack ARN:
arn:aws:cloudformation:eu-central-1:669453403305:stack/AlbInstStack/a2bc6830-bba0-11eb-b5db-0ad1bb830c16
task: go test
TestALBRequest 2021-05-23T10:31:01+02:00 retry.go:91: HTTP GET to URL http://ALBGODEMO-1159953432.eu-central-1.elb.amazonaws.com
TestALBRequest 2021-05-23T10:31:01+02:00 http_helper.go:32: Making an HTTP GET call to URL http://ALBGODEMO-1159953432.eu-central-1.elb.amazonaws.com
PASS
ok  	alb_ec2	1.622s
Profile ggtrcadmin
task: npx -y cdk@v2.0.0-rc.3  destroy --force --profile $AWSUME_PROFILE
AlbInstStack: destroying...
10:31:19 | DELETE_IN_PROGRESS   | AWS::CloudFormation::Stack                | AlbInstStack
...
 ✅  AlbInstStack: destroyed

Using stages for Infrastructure testing vs Integrated End-to-End Test

With the userdata in the example the webserver just responded with a testable response. In order to distinguish between infrastructure test and app+infrastructure end-to-end test, context variables can be used.

The CDK documentation shows you how to do it:

With a flag “stage”, which holds values like “dev” or “prod”, you switch between test userdata and production userdata:

	stage := scope.Node().TryGetContext(aws.String("stage"))

And check the flag in your code like:

	userdataFileName := "userdata/webserver-infratest.sh"
	if stage == "prod"{
		userdataFileName="userdata/webserver-production.sh"
	}

For the prod stage, with cdk synth -c stage=prod, you now get:

      UserData:
        Fn::Base64: |-
          #!/bin/bash
          yum update
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          echo "<h1>Styled Production Page</h1> <p> lorem ipso</p>" > /var/www/html/index.html          

Whereas with cdk synth -c stage=dev you get:

     UserData:
        Fn::Base64: |-
          #!/bin/bash
          yum update
          yum install -y httpd
          systemctl start httpd
          systemctl enable httpd
          echo "<h1>hello world</h1>" > /var/www/html/index.html          

Which would deploy a testable instance and webserver.
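The stage switch can also be pulled into a small function, which makes the case of a missing context value explicit and easy to check. This is a minimal sketch. The function name userdataFile is hypothetical, and the parameter is an interface{} because TryGetContext returns an untyped value, which is nil when no -c stage=... was passed.

```go
package main

import "fmt"

// userdataFile picks the userdata script for a stage context value.
// Anything other than the string "prod", including a missing (nil)
// value, falls back to the infrastructure test script.
func userdataFile(stage interface{}) string {
	if s, ok := stage.(string); ok && s == "prod" {
		return "userdata/webserver-production.sh"
	}
	return "userdata/webserver-infratest.sh"
}

func main() {
	fmt.Println(userdataFile("prod")) // production userdata
	fmt.Println(userdataFile("dev"))  // infrastructure test userdata
	fmt.Println(userdataFile(nil))    // no stage given: infrastructure test userdata
}
```

Defaulting to the test userdata follows the code shown above, where the production script is only chosen when the stage is exactly “prod”.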

Other CDK languages

This approach involves programming only the “cit” infrastructure test in GO. Here is the same example with the TypeScript CDK LoadBalancer and cit in GO:

CDK example template LoadBalancer EC2.

By using the SystemsManager parameter store as a language-independent parameter transfer, we are polyglot.

Conclusion

It was really easy (30 lines of go code) to test a CDK generated webserver. This approach can be used for all CDK languages. In the next part I will show a GO module for directly accessing physical IDs via the CDK name aka logical ID.

Check it out

This code is available at the tecRacer Github “cdk-templates” repository:

https://github.com/tecracer/cdk-templates/tree/master/go/alb_ec2

For remarks, discussion, chatting please contact me on twitter @megaproaktiv.

I hope these thoughts can be useful for your next project!

Notes

(*) Create a GO CDK V2 Application

To build a GO CDK application, you currently have to create the template in cdk v1 and then manually upgrade the modules to CDK V2.

  1. Create go app
alias cdk1='npx cdk@v1.105.0'
cdk1 init app --language=go
  2. Migrate the modules. I have described how to do this here:

Migrating to AWS CDK v2 - the missing GO manual

  3. Switch to CDK V2
alias cdk='npx cdk@v2.0.0-rc.4'
cdk synth
