Cross Account Resource Access - Invalid Principal in Policy



Separating projects into different accounts is considered a best practice in a big organization working with AWS, and AWS supports this with its Organizations service. However, it leads to cross-account scenarios with higher complexity. My colleagues and I already explained one of those scenarios in this blog post, which deals with S3 object ownership (AWS has since provided a solution for that problem). Today, I will talk about another cross-account scenario that came up in our project, explain why it caused problems, and show how we solved them.

The Scenario

It is a rather simple architecture: a Lambda function in account A, called Invoker Function, needs to trigger a function in account B, called Invoked Function. Obviously, we need to grant Invoker Function permission to do that. We have several options to implement this.

[Architecture diagram: Invoker Function in account A calls Invoked Function in account B]

The Simple Solution (that caused the Problem)

The simplest way to achieve this is to grant the Invoker Function in account A permission to invoke the Invoked Function in account B by attaching the following policy to the role of Invoker Function:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction",
            ],
            "Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
        }
    ]
}

While this would be a complete solution in a non-cross-account scenario, we need an additional step here: granting the invoke permission in the resource policy of Invoked Function in account B as well. This is not possible via the console, so you will need to use the CLI or, even better, build everything via Infrastructure as Code (IaC). Using the CLI, the necessary command looks like this:

aws lambda add-permission --function-name invoked-function \
  --statement-id any-id \
  --action lambda:InvokeFunction \
  --principal arn:aws:iam::<account-id-a>:role/service-role/invoker-function-role-3z82i06i
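If you script this step instead, the same call is available in the AWS SDKs. Here is a minimal boto3 sketch of the command above (run with credentials for account B; the function name, statement id and role ARN are the placeholders from this example):

import boto3

# Client in account B, where invoked-function lives
lambda_client = boto3.client("lambda", region_name="eu-central-1")

# Grant the invoker role from account A permission to call the function
lambda_client.add_permission(
    FunctionName="invoked-function",
    StatementId="any-id",
    Action="lambda:InvokeFunction",
    Principal="arn:aws:iam::<account-id-a>:role/service-role/invoker-function-role-3z82i06i",
)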

The Invoker role ARN has a random suffix because the role was created automatically by AWS. In the AWS console of account B, the Lambda resource-based policy will look like this:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "any-id",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id-a>:role/service-role/invoker-function-role-3z82i06i"
      },
      "Action": "lambda:invokeFunction",
      "Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
    }
  ]
}

Now this works fine and you can go for it. Have fun :)

Unless you are in a real-world scenario, maybe even in production, and you need a reliable architecture. Then read on. In the real world, things happen. In this case, the role in account A gets recreated. As the role was created automatically and has a random suffix, its ARN is now different. Consequently, Invoker Function no longer has permission to trigger Invoked Function. You might think you can simply solve this by creating the role yourself and giving it a name without a random suffix, but you will be surprised: you still get permission denied in Invoker Function after recreating the role.

What happened is that, on the side of Invoked Function in account B, the resource policy changed to something like this as soon as the role was deleted:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "any-id",
      "Effect": "Allow",
      "Principal": {
        "AWS": "AROA4KVSNIJZBLR5NCUAW"
      },
      "Action": "lambda:invokeFunction",
      "Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
    }
  ]
}

The principal changed from the ARN of the role in account A to a cryptic value. This is because every IAM entity at AWS has a unique id that AWS works with in the backend; we normally only see the more readable ARN. As the role in account A was deleted, AWS can no longer resolve its old unique id, so instead of the ARN we see the unique id of the deleted role. Even if we recreate a role with the exact same ARN, it gets a new underlying unique id. That is why we now see a permission denied error on the Invoker Function. AWS does this on purpose, for security reasons: otherwise, recreating a role with the same name could silently regain the permissions granted to the deleted role.
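You can look up this unique id yourself. A small boto3 sketch (role name as in the example above); the RoleId field is exactly the kind of AROA… value that shows up in the orphaned policy:

import boto3

iam = boto3.client("iam")

# The backend unique id of a role starts with "AROA"
role = iam.get_role(RoleName="invoker-function-role-3z82i06i")["Role"]
print(role["Arn"])     # the readable ARN
print(role["RoleId"])  # the unique id, e.g. AROA4KVSNIJZBLR5NCUAW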

A consequence of this behavior is that each time the principal is recreated in account A, account B needs a redeployment. But a redeployment alone is not even enough: a simple redeployment will fail with an error stating Invalid Principal in Policy. Here you have some documentation about the same behavior in S3 bucket policies. To solve this, you first need to manually delete the existing statement from the resource policy; only then can you redeploy your infrastructure. You don't want that in a prod environment. Instead, we want to decouple the accounts so that changes in one account don't affect the other.
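Until you have that decoupling, the manual cleanup can at least be scripted. A minimal boto3 sketch that removes the orphaned statement (run with credentials for account B):

import boto3

lambda_client = boto3.client("lambda", region_name="eu-central-1")

# Remove the orphaned statement so the next deployment can recreate it
lambda_client.remove_permission(
    FunctionName="invoked-function",
    StatementId="any-id",
)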

The Account Id Solution

The easiest solution is to set the principal to a more static value, for example the account id of account A:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "any-id",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id-a>:root"
      },
      "Action": "lambda:invokeFunction",
      "Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
    }
  ]
}
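To grant access this way, the add-permission call accepts the plain account id as principal; as far as I have seen, AWS then renders it as the :root ARN shown above. A boto3 sketch:

import boto3

lambda_client = boto3.client("lambda", region_name="eu-central-1")

# A plain account id as principal shows up as the :root ARN in the policy
lambda_client.add_permission(
    FunctionName="invoked-function",
    StatementId="any-id",
    Action="lambda:InvokeFunction",
    Principal="<account-id-a>",
)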

However, this does not follow the least-privilege principle: now every IAM entity in account A can trigger the Invoked Function in account B. You could argue that account A is a trusted account from your organization and that triggering Invoked Function neither exposes sensitive information nor causes harm. Be aware, though, that if account A gets compromised, this policy enables the attacker to cause harm in a second account as well.

The Policy-Condition Solution

A nice solution would be to combine both approaches: set the account id as principal and add a condition that limits access to a specific source ARN. This could look like the following:

{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "any-id",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id-a>:root"
      },
      "Action": "lambda:invokeFunction",
      "Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function",
      "Condition": {
        "ArnLike": {
          "AWS:SourceArn": "arn:aws:iam::<account-id-a>:role/service-role/invoker-role"
        }
      }
    }
  ]
}

Sadly, this does not work. The Invoker Function gets a permission denied error because the condition evaluates to false: it seems aws:SourceArn is not included in the invoke request (the key is generally only set when an AWS service calls a function on behalf of one of your resources, not when an IAM principal invokes it directly). We also can't create such a resource policy in the console, and the CLI and IaC frameworks are limited to the --source-arn parameter for setting a condition. I tried a lot of combinations and never got it working.

The Assume-Role Solution

The last approach is to create an IAM role in account B that the Invoker Function assumes before invoking Invoked Function. This IAM role needs permission to invoke Invoked Function, and in that case we don't need any resource policy on Invoked Function. However, we have a similar issue in the trust policy of the IAM role, even though we have far more control over the condition statement here:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<account-id-a>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringLike": {
          "aws:PrincipalArn": "arn:aws:iam::<account-id-a>:role/service-role/invoker-role"
        }
      }
    }
  ]
}
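Besides this trust policy, the role in account B also needs an identity policy that actually grants the invoke permission, for example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "lambda:InvokeFunction",
      "Resource": "arn:aws:lambda:eu-central-1:<account-id-b>:function:invoked-function"
    }
  ]
}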

Using these policies and adding some code to the Invoker Function, so that it assumes this role in account B before invoking the Invoked Function, works. This is some overhead in code and resources compared to the simple solution via resource policy, but it solves our problem and provides some advantages. First, the value of aws:PrincipalArn is just a simple string; AWS does not resolve it to an internal unique id, so it does not get replaced if the role in account A gets deleted and recreated. Second, you can use wildcards (* or ?) for potentially changing characters, e.g. a random suffix, or if you want to grant the AssumeRole permission to a whole set of resources.
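A minimal sketch of that glue code in the Invoker Function; the role name in account B (invoked-function-invoke-role) is a made-up placeholder, and the Invoker Function's own execution role in account A additionally needs sts:AssumeRole permission on that role:

import json
import boto3

def lambda_handler(event, context):
    # Assume the cross-account role in account B
    sts = boto3.client("sts")
    credentials = sts.assume_role(
        RoleArn="arn:aws:iam::<account-id-b>:role/invoked-function-invoke-role",
        RoleSessionName="invoker-function",
    )["Credentials"]

    # Build a Lambda client with the temporary credentials from account B
    lambda_client = boto3.client(
        "lambda",
        region_name="eu-central-1",
        aws_access_key_id=credentials["AccessKeyId"],
        aws_secret_access_key=credentials["SecretAccessKey"],
        aws_session_token=credentials["SessionToken"],
    )

    # Invoke the target function in account B
    response = lambda_client.invoke(
        FunctionName="invoked-function",
        Payload=json.dumps({"source": "invoker-function"}),
    )
    return json.loads(response["Payload"].read())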

We decoupled the accounts as we wanted. As long as account A keeps the role name in a pattern that matches the value of aws:PrincipalArn, account B is now independent of redeployments in account A.

Conclusion

The simple solution is obviously the easiest to build and has the least overhead. If the resources in account A never get recreated, it is totally fine. Using the account's root as principal without a condition is a simple and working solution, but it does not follow the least-privilege principle, so I would not recommend it. In this scenario, using a condition in the Lambda's resource policy did not work due to the limited configuration possibilities in the CLI. Lastly, creating a role and using a condition in its trust policy is the solution that solves the described problems.

In this blog post I explained a cross-account complexity using Lambda functions as an example. However, the Invalid Principal error can appear wherever resource policies are used. I have experienced it with bucket policies, and it stands to reason that SNS topics or trust policies in IAM roles behave similarly. The difference with Lambda is that in most other cases you have more options to set conditions in the resource policy, so you don't need an extra role.
