Implementing and deploying Custom Resources using CDK



I am a big fan of AWS’s Cloud Development Kit (CDK) as it allows me to use TypeScript (or another preferred programming language) to set up any infrastructure in AWS as code. However, not all types of resources can be created with CDK by default. I learned this recently in a small project I was working on. Instead of being frustrated about it, I took the chance to learn about an awesome feature of CDK (or, more precisely, of CloudFormation, on which CDK is built): Custom Resources.

Task

The initial task was to set up AWS Workspaces for a training lab environment. The students should be able to log in to their virtual desktop environment via the web browser and connect to other AWS resources from there. This was necessary because one customer’s security policy didn’t allow direct RDP or SSH traffic from the users’ workstations. Setting this up via the AWS Console didn’t take long and worked perfectly fine. However, I needed a solution that would run automatically and provide a clean setup for each new training. The requirement was that the exact number of created workspaces would be configurable and that no additional manual effort would be needed after triggering the setup.

Problem

On my quest to find a proper solution for this task, I quickly realized a few problems that were not solvable with CDK by default:

  1. Workspaces require a connection to an Active Directory. Luckily I was working in eu-west-1, where Simple AD is available. This removed some of the management overhead related to Managed AD. However, CDK does not yet have the required API to “register” the directory for use with AWS Workspaces.
  2. There is no CloudFormation/CDK Construct for the individual Workspaces. One reason for this is that every workspace is bound to a user in the provided directory.
  3. AD users cannot be created via CloudFormation/CDK. AWS doesn’t manage those resources. AWS manages the AD, but not what is inside.

Solution

Initially, I built a simple Node.js script that I ran on my local machine. It required that a Simple AD was already provisioned and that its API endpoint (using the LDAP protocol) was reachable from my machine. For the latter I used a Network Load Balancer.

The script did the following:

  • Call the RegisterWorkspaceDirectory API via the workspaces SDK. This prepared the directory for use with Workspaces. Additionally, I used the ModifyWorkspaceAccessProperties API to enable web access to workspaces.
  • Create a list of users and passwords based on the required training participants named training01, training02, etc.
  • Loop the list and do the following for every user in the list:
    • Create the user in the Simple AD using the LDAP protocol. I found a pretty handy npm package called ldapts to achieve this. There is no AWS API for user creation in Simple AD.
    • Set the user’s password using the ResetUserPassword API of the Directory Service SDK. (I do not understand why this one exists while there is no user creation API, but it is easier to use than plain LDAP.)
    • Create a workspace for the user using the CreateWorkspaces API.
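The user list from the second step can be sketched as follows. This is a minimal illustration, not the original script: the function name `buildTrainingUsers` and the injected `makePassword` callback are assumptions made for this example.

```typescript
// Hypothetical helper: the names and the password scheme are illustrative only.
interface TrainingUser {
    username: string;
    password: string;
}

// Builds users named training01, training02, ... up to `count`.
// `makePassword` is injected so the caller decides how passwords are generated.
function buildTrainingUsers(
    count: number,
    makePassword: (username: string) => string
): TrainingUser[] {
    const result: TrainingUser[] = [];
    for (let i = 1; i <= count; i++) {
        // Pad the index to two digits: 1 -> "01", 12 -> "12"
        const username = "training" + (i < 10 ? "0" : "") + i;
        result.push({ username: username, password: makePassword(username) });
    }
    return result;
}

const trainingUsers = buildTrainingUsers(3, (username) => username + "!");
console.log(trainingUsers.map((u) => u.username).join(", "));
// prints "training01, training02, training03"
```

The resulting array is what the loop in the third step would iterate over.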

So far so good. This works fine. However, it is not integrated into CDK and requires network connectivity to the LDAP API endpoint.

Therefore I started learning about CDK Custom Resources. Custom Resources provide a flexible interface to define and manage entities that are not part of CDK’s or even AWS’s default set of resources. With custom resources you can define any type of object you like. Creation, update and deletion are handled within a Lambda function that is called during deployment of your stack. Of course you have to follow some guidelines on how to handle those “resource lifecycle events”, but they are fairly well described in the official documentation and are easy to learn if you are familiar with CDK and Lambda. If you use Python, AWS even provides some helper libraries. (Didn’t help me… I used TypeScript.)

Custom Resource Lambda Integration

For the sake of simplicity, I will demonstrate only a small part of my final setup - the full solution can be found here.

The following steps are required to deploy a Custom Resource:

  • Build Lambda Function to handle lifecycle events
  • Build Provider Resource that references the Lambda Function
  • Create Custom Resource with required properties, referencing the Provider’s Service Token

Technical Deep Dive

At its core, a Custom Resource is itself a CDK construct that takes two kinds of parameters. First, the properties of the resource (e.g. username, password, ID of the directory, etc.). Second, a reference to a Provider (or, more precisely, its serviceToken).

  new CustomResource(this, "ResourceName", {
            properties: {
                directoryId: simpleAD.ref,
                userName: "user",
                ...
            },
            serviceToken: provider.serviceToken
  });

All logic for creating and maintaining the resource is defined in the provider function. As I am a big fan of useful encapsulation, I placed all required objects into a custom class that I initialize once within my code and then refer to later (e.g. to get the serviceToken).

Let’s analyse it.

interface LDAPUserProviderProps {
    vpc: Vpc;
    simpleAD: CfnSimpleAD;
    adminUser: string;
    adminPasswordParameter: StringParameter;
    baseDN: string;
    domain: string;
}

export class LDAPUserProvider extends cdk.Construct {
    provider: Provider;
    serviceToken: string;
    simpleAD: CfnSimpleAD;
    baseDN: string;
    domain: string;
    adminPasswordParameter: StringParameter;
    adminUser: string;



    constructor(scope: cdk.Construct, id: string, props: LDAPUserProviderProps) {
        super(scope, id);

Above are simply the parameter definitions and the properties of the class. The properties store some general data later used for creation of the resource as well as a reference to the actual Provider object.

        //create Handler Lambda Function
        const customResourceHandler = new Function(this, "workspaceCreateLdapUser", {
            runtime: Runtime.NODEJS_14_X,
            handler: 'index.handler',
            code: Code.fromAsset('lambda/workspace-create-ldap-user'),
            logRetention: 3,
            timeout: Duration.minutes(2),
            vpc: props.vpc,
            vpcSubnets:
            {
                subnets: props.vpc.privateSubnets
            }

        });

This Lambda function “does all the magic”. We will dive into it later. You can see that it is deployed within a VPC. This is required as the API endpoints of the directory are only accessible within this VPC. As it additionally needs to reach the public endpoints of the AWS APIs, it has to be placed within private subnets with a route to a NAT gateway. (Public subnets aren’t an alternative, as Lambda functions can’t be assigned a public IP.)

The rest of the class is again straightforward. The Lambda function requires some additional authorizations for the AWS APIs. It is also important to be aware of which other resources need to exist for proper deployment of the Provider and the CustomResource. For Provider deployment, the VPC needs to exist. For the deployment of the new AD user, the directory itself needs to exist. The dependency on the directory is attached to the CustomResource directly, as we will see later.

        //provide proper authorizations to Lambda Function
        props.adminPasswordParameter.grantRead(customResourceHandler);
        customResourceHandler.role?.addToPrincipalPolicy(new PolicyStatement({
            actions: [
                "ds:*",
            ],
            effect: Effect.ALLOW,
            resources: ['*'],
        }));

        //add dependencies on other resources that have to exist for the Lambda function to work properly
        //deployment of the custom resource will wait for those to be created.
        customResourceHandler.node.addDependency(props.vpc); // dependency to VPC required to make sure NAT gateways do not get deleted before Lambda does.

        // create actual provider using the Lambda function and the vpc
        this.provider = new Provider(this, "provider", {
            onEventHandler: customResourceHandler,
            vpc: props.vpc,
        });

        //store properties
        this.simpleAD = props.simpleAD;
        this.baseDN = props.baseDN;
        this.domain = props.domain;
        this.serviceToken = this.provider.serviceToken;
        this.adminPasswordParameter = props.adminPasswordParameter
        this.adminUser = props.adminUser
    }
}

Next, let’s look at the Lambda function. Its basic structure is determined by the RequestType of the current request. Based on this property, the respective activity is performed.

exports.handler = async (event: CloudFormationCustomResourceEvent, context: Context, callback: Callback): Promise<CloudFormationCustomResourceResponse> => {

    //do some pre execution preparation, e.g.:
    //get LDAP endpoint URL via DescribeDirectories API
    //get Admin Password from SSM Parameter Store
    [...]

    //then execute task based on request type
     switch (event.RequestType) {
        case "Create":

            try {
                    [...] //magic to create the resource
                    return {
                        Status: "SUCCESS",
                        Reason: "",
                        LogicalResourceId: event.LogicalResourceId,
                        PhysicalResourceId: directoryId + "+user-" + username,
                        RequestId: event.RequestId,
                        StackId: event.StackId
                    };
                    //PhysicalResourceID needs to be able to uniquely identify the object later on
            }
            catch(error) {
                return {
                        Status: "FAILED",
                        Reason: JSON.stringify(error),
                        LogicalResourceId: event.LogicalResourceId,
                        PhysicalResourceId: directoryId + "+user-" + username,
                        RequestId: event.RequestId,
                        StackId: event.StackId
                    };
            }
                    
            break;

        case "Update":

            try {
                    [...] //magic to update the resource based on the properties in the event and the PhysicalResourceID
                    return {
                        Status: "SUCCESS",
                        Reason: "",
                        LogicalResourceId: event.LogicalResourceId,
                        PhysicalResourceId: directoryId + "+user-" + username,
                        RequestId: event.RequestId,
                        StackId: event.StackId
                    };
            }
            catch(error) {
                return {
                        Status: "FAILED",
                        Reason: JSON.stringify(error),
                        LogicalResourceId: event.LogicalResourceId,
                        PhysicalResourceId: directoryId + "+user-" + username,
                        RequestId: event.RequestId,
                        StackId: event.StackId
                    };
            }

            break;

        case "Delete":

            try {
                    [...] //magic to delete the resource based on the PhysicalResourceID
                    return {
                        Status: "SUCCESS",
                        Reason: "",
                        LogicalResourceId: event.LogicalResourceId,
                        PhysicalResourceId: directoryId + "+user-" + username,
                        RequestId: event.RequestId,
                        StackId: event.StackId
                    };
            }
            catch(error) {
                return {
                        Status: "FAILED",
                        Reason: JSON.stringify(error),
                        LogicalResourceId: event.LogicalResourceId,
                        PhysicalResourceId: directoryId + "+user-" + username,
                        RequestId: event.RequestId,
                        StackId: event.StackId
                    };
            }

            break;
    }


}

There are a few important things to note about those request types:

  • The PhysicalResourceId that is returned after creation is supposed to be a unique identifier for the object. It is created by the Lambda function when initially creating the object. It is later provided to the function during update or deletion.
  • Implement proper error handling to avoid endless loops.
  • During deployment the function will be called multiple times if it fails. Keep that in mind.
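To illustrate the first point: because CloudFormation hands the PhysicalResourceId back on Update and Delete, it helps if the ID can be taken apart again. The following is a minimal sketch that reuses the `directoryId + "+user-" + username` format from the handler above. The helper names `buildPhysicalId` and `parsePhysicalId` and the example directory ID are assumptions made for this illustration.

```typescript
// Separator taken from the handler above; the helper names are hypothetical.
const USER_MARKER = "+user-";

function buildPhysicalId(directoryId: string, username: string): string {
    return directoryId + USER_MARKER + username;
}

// Splits an ID produced by buildPhysicalId back into its parts.
// Throws if the ID does not have the expected shape.
function parsePhysicalId(physicalId: string): { directoryId: string; username: string } {
    const idx = physicalId.indexOf(USER_MARKER);
    if (idx <= 0 || idx + USER_MARKER.length >= physicalId.length) {
        throw new Error("Unexpected PhysicalResourceId: " + physicalId);
    }
    return {
        directoryId: physicalId.slice(0, idx),
        username: physicalId.slice(idx + USER_MARKER.length)
    };
}

// "d-1234567890" is a made-up directory ID used only for this example.
const examplePhysicalId = buildPhysicalId("d-1234567890", "training01");
console.log(examplePhysicalId); // prints "d-1234567890+user-training01"
console.log(parsePhysicalId(examplePhysicalId).username); // prints "training01"
```

In a Delete handler, parsing `event.PhysicalResourceId` this way would identify the user to remove even if the resource properties have changed in the meantime. Keep in mind that returning a different PhysicalResourceId from an Update is interpreted by CloudFormation as a replacement, which triggers a Delete for the old ID.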

To come back to the original example of user creation in LDAP, this is the content of my Create event handler. The trc_ws_ops.create_user function is defined in a separate module and calls the LDAP API with the provided parameters using the connection details specified in workspaceProps.


    var workspaceProps: trc_ws_ops.workspace_props = {
        adminUser: "Administrator",
        adminPassword: adminPasswordParameter.Value,
        baseDN: baseDN,
        defaultEmail: email,
        bundle: "", //not required
        directory: directoryId,
        domain: domain,
        endpointUrl: endpointUrl
    };
  
    return await trc_ws_ops.create_user(workspaceProps,
        {
            username: username,
            password: password,
            email: email
        })
        .then(() => {
            var returnData: CloudFormationCustomResourceSuccessResponse =
            {
                Status: "SUCCESS",
                Reason: "",
                LogicalResourceId: event.LogicalResourceId,
                PhysicalResourceId: directoryId + "+user-" + username,
                RequestId: event.RequestId,
                StackId: event.StackId
            };
            console.log(JSON.stringify(returnData));
            return returnData;
        })
        .catch(error => {
            var returnData: CloudFormationCustomResourceFailedResponse =
            {
                Status: "FAILED",
                Reason: JSON.stringify(error),
                LogicalResourceId: event.LogicalResourceId,
                PhysicalResourceId: directoryId + "+user-" + username,
                RequestId: event.RequestId,
                StackId: event.StackId
            };

            console.log(JSON.stringify(returnData));
            return returnData;
        });
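One detail worth checking in the handlers above is `Reason: JSON.stringify(error)`. For standard `Error` objects, `JSON.stringify` returns `"{}"`, because `message` and `stack` are not enumerable, so the reason shown in CloudFormation can end up empty. A small helper along the following lines might make failures easier to debug. The name `toReason` is an assumption for this sketch and is not part of the original code.

```typescript
// Hypothetical helper: turns anything caught in a `catch` block into a readable string.
function toReason(error: unknown): string {
    if (error instanceof Error) {
        // `message` is non-enumerable, so JSON.stringify would silently drop it
        return error.name + ": " + error.message;
    }
    if (typeof error === "string") {
        return error;
    }
    try {
        // JSON.stringify returns undefined for values such as `undefined` itself
        const json = JSON.stringify(error);
        return json === undefined ? String(error) : json;
    } catch {
        // e.g. circular structures cannot be serialized
        return String(error);
    }
}

console.log(JSON.stringify(new Error("LDAP bind failed"))); // prints "{}"
console.log(toReason(new Error("LDAP bind failed")));       // prints "Error: LDAP bind failed"
```

With such a helper, the failure response could use `Reason: toReason(error)` in place of the `JSON.stringify` call.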

Let us look into creating users within a provided Active Directory (here using Simple AD). My goal was to have a CDK construct like the one below that I could simply instantiate in a loop for all the new users I required.

    const ldapUser = new LDAPUser(this, "User_" + user, {
        username: user,
        password: user + "!",
        email: "noreply@tecracer.de",
        provider: lDAPUserProvider
      });

The Construct LDAPUser is merely an abstraction for the following:


export class LDAPUser extends cdk.Construct {
    Resource: CustomResource;
    username: string;
    email: string;
    password: string;

    constructor(scope: cdk.Construct, id: string, props: LDAPUserProps) {
        super(scope, id);

        this.Resource = new CustomResource(this, "LDAPUser-" + props.username, {
            serviceToken: props.provider.serviceToken,
            properties: {
                directoryId: props.provider.simpleAD.ref,
                "adminUser": props.provider.adminUser,
                "adminPasswordParameter": props.provider.adminPasswordParameter.parameterName,
                "baseDN": props.provider.baseDN,
                "email": props.email,
                "domain": props.provider.domain,
                "username": props.username,
                "password": props.password

            }
        });

        //ensures that the AD exists before the user creation is started
        this.Resource.node.addDependency(props.provider.simpleAD);
        this.Resource.node.addDependency(props.provider.adminPasswordParameter);


        this.username = props.username;
        this.password = props.password;
        this.email = props.email;

    }

}
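On the Lambda side, the `properties` passed to the CustomResource arrive in `event.ResourceProperties`, alongside the `ServiceToken`. Since nothing guarantees that every property is present, the handler could validate them before touching LDAP. The following is a minimal sketch: the property names mirror the construct above, while `parseLdapUserProperties` and the sample values are assumptions made for this illustration.

```typescript
// Property names mirror the `properties` passed to CustomResource above.
interface LdapUserProperties {
    directoryId: string;
    adminUser: string;
    adminPasswordParameter: string;
    baseDN: string;
    email: string;
    domain: string;
    username: string;
    password: string;
}

const REQUIRED_KEYS: (keyof LdapUserProperties)[] = [
    "directoryId", "adminUser", "adminPasswordParameter", "baseDN",
    "email", "domain", "username", "password"
];

// Hypothetical validator: checks that every expected property is a non-empty string
// and returns only the known keys (dropping e.g. ServiceToken).
function parseLdapUserProperties(raw: { [key: string]: unknown }): LdapUserProperties {
    const missing = REQUIRED_KEYS.filter((key) => {
        const value = raw[key];
        return typeof value !== "string" || value.length === 0;
    });
    if (missing.length > 0) {
        throw new Error("Missing or invalid properties: " + missing.join(", "));
    }
    const picked: { [key: string]: string } = {};
    REQUIRED_KEYS.forEach((key) => {
        picked[key] = raw[key] as string;
    });
    return picked as unknown as LdapUserProperties;
}

// All values below are placeholders for illustration.
const parsedUserProps = parseLdapUserProperties({
    ServiceToken: "placeholder-token",
    directoryId: "d-1234567890",
    adminUser: "Administrator",
    adminPasswordParameter: "placeholder-parameter-name",
    baseDN: "DC=example,DC=com",
    email: "noreply@tecracer.de",
    domain: "example.com",
    username: "training01",
    password: "training01!"
});
console.log(parsedUserProps.username); // prints "training01"
```

Failing early with a clear message keeps a typo in a property name from surfacing later as a harder-to-read LDAP error.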

Once the provider function is defined, implemented and tested, using it is fairly simple, as already demonstrated above.

First, the Provider is initialized. For this we create an instance of our previously defined LDAPUserProvider class. As a reminder: it contains the actual provider and its serviceToken as a property.


    const lDAPUserProvider = new LDAPUserProvider(this, "LDAPUserProvider", {
      adminUser: props.adminUser,
      adminPasswordParameter: adminPasswordParameter,
      simpleAD: simpleAD,
      vpc: props.vpc,
      baseDN: props.baseDN,
      domain: props.domain
    });

The creation of the CustomResource can further be encapsulated in an LDAPUser class to simplify usage, which concludes this task. Note that the provider property refers to the LDAPUserProvider.

    const ldapUser = new LDAPUser(this, "User_" + user, {
        username: user,
        password: user + "!",
        email: "noreply@tecracer.de",
        provider: lDAPUserProvider
      });

Pain Points and Learnings

Having completed this task, I feel like I will be able to build new Custom Resources fairly quickly. However, getting to this point was accompanied by a steep learning curve and a lot of trial and error. Especially debugging the provider function took a lot of time and effort.

My recommendation is to follow a layered approach when developing:

  1. Implement the create, update and delete logic in a separate module that you can test locally.
  2. Create the provider function, but test it first outside of CDK
  3. Only then integrate it into CDK.
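For the second step, one possible way to exercise the handler outside of CDK is to feed it hand-built events. The sketch below builds such events with a local stand-in type; the handler shown earlier uses `CloudFormationCustomResourceEvent` from the aws-lambda type definitions instead. The factory name `makeFakeEvent` and all placeholder values are assumptions made for this example.

```typescript
type RequestType = "Create" | "Update" | "Delete";

// Local stand-in for the custom resource event, reduced to the common fields.
interface FakeCustomResourceEvent {
    RequestType: RequestType;
    ServiceToken: string;
    ResponseURL: string;
    StackId: string;
    RequestId: string;
    LogicalResourceId: string;
    ResourceType: string;
    ResourceProperties: { [key: string]: string };
    PhysicalResourceId?: string;                       // present on Update and Delete
    OldResourceProperties?: { [key: string]: string }; // present on Update
}

// Hypothetical factory for local tests; every ID below is a placeholder.
function makeFakeEvent(
    requestType: RequestType,
    properties: { [key: string]: string },
    physicalResourceId?: string,
    oldProperties?: { [key: string]: string }
): FakeCustomResourceEvent {
    const fakeEvent: FakeCustomResourceEvent = {
        RequestType: requestType,
        ServiceToken: "local-test-token",
        ResponseURL: "http://localhost/unused",
        StackId: "local-test-stack",
        RequestId: "local-test-request",
        LogicalResourceId: "LocalTestUser",
        ResourceType: "AWS::CloudFormation::CustomResource",
        ResourceProperties: properties
    };
    if (requestType !== "Create") {
        // Update and Delete always refer to an already existing resource
        if (physicalResourceId === undefined) {
            throw new Error(requestType + " events need a PhysicalResourceId");
        }
        fakeEvent.PhysicalResourceId = physicalResourceId;
    }
    if (requestType === "Update") {
        fakeEvent.OldResourceProperties = oldProperties || properties;
    }
    return fakeEvent;
}

const fakeDelete = makeFakeEvent(
    "Delete",
    { username: "training01" },
    "d-1234567890+user-training01"
);
console.log(fakeDelete.RequestType + " " + fakeDelete.PhysicalResourceId);
// prints "Delete d-1234567890+user-training01"
```

Passing such events to the handler from a local script or a unit test covers the Create, Update and Delete paths without waiting for a stack deployment.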

Another thing that took some time to realize and debug was dependency issues with other resources. In one instance, the NAT gateways of the VPC were deleted before the LDAP users could be deleted. However, the NAT gateways are required for some API calls made by the Lambda function. This led to timeouts of those API calls, resulting in endless retries of the delete event until the deployment timed out (leaving the stack in an inconsistent state).
