Working with lists in DynamoDB



DynamoDB has support for storing complex data types like lists, sets or maps (aka dictionaries/hash tables). This capability allows for flexible usage patterns. In this article we’ll take a closer look at lists. We’ll explore what is possible with them, what isn’t and how we can manipulate them through Python.

This article doesn’t follow a clear storyline; it’s more like a list of recipes you can use in your own projects.

The Basics

DynamoDB supports lists as attributes for items. However, they’re not supported as part of a key: key attributes have to be strings, numbers or binary values. That means the attributes that make up your partition and sort key can’t be lists (or maps or sets, for that matter). If you try to create an item in which an attribute that’s part of a global secondary index’s key schema has an incompatible data type, you’ll get the error below. In the example, GSI1PK and GSI1SK are the partition and sort keys of the global secondary index GSI1.

[Screenshot: GSI1PK as list]
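To catch this kind of problem before the request is sent, a small client-side check can help. The following is a minimal sketch, not part of boto3: the function name find_invalid_key_attributes and the key_attributes parameter are made up for illustration, and the check only covers the Python types that boto3 maps to strings, numbers and binary values.

```python
from decimal import Decimal


def find_invalid_key_attributes(item: dict, key_attributes: list) -> list:
    """Return the key attribute names in item whose values can't be used as keys.

    Hypothetical helper: key attributes must be strings, numbers or binary
    values, so lists, maps, sets and booleans are reported as invalid.
    """
    invalid = []
    for name in key_attributes:
        if name not in item:
            # A missing index key isn't an error, the item just isn't indexed.
            continue
        value = item[name]
        # bool is a subclass of int in Python, so it has to be excluded explicitly.
        is_scalar = isinstance(value, (str, int, Decimal, bytes)) and not isinstance(value, bool)
        if not is_scalar:
            invalid.append(name)
    return invalid


item = {"PK": "pk", "GSI1PK": ["a", "b"], "GSI1SK": "sort"}
print(find_invalid_key_attributes(item, ["GSI1PK", "GSI1SK"]))
```

Running a check like this before put_item makes the failure visible in your own code instead of in the service response.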

One of the things that makes GSIs useful is that you can create them at any point in time, even after you’ve added data to the table. Now the question is: what happens if I create a GSI based on an attribute that may be a complex data type in some of the existing items? The answer is a little anticlimactic: index creation works, but those items won’t show up in the GSI.

Lists can store items of different types, which means you’re free to mix numbers, strings, sets, lists and other types in a single list. An item like this, which mixes different data types, is perfectly valid:

{
  "listAttribute": {
    "L": [
      {
        "N": "1"
      },
      {
        "M": {
          "a": { "S": "b"}
        }
      },
      {
        "L": [
          { "N": "1"}
        ]
      },
      {
        "NS": ["1", "2"]
      },
      { "S": "text"}
    ]
  },
  "PK": {
    "S": "pk"
  }
}
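To make the mapping between Python values and this typed format more tangible, here is a simplified sketch of such a conversion. boto3 ships its own serializer for this purpose; the hand-written function to_attribute_value below is only an illustration, it is not boto3 code and it handles just the types used in the item above.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON (simplified illustration)."""
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):
        # Numbers are transmitted as strings.
        return {"N": str(value)}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        # Each list element carries its own type, which is why mixing is possible.
        return {"L": [to_attribute_value(element) for element in value]}
    if isinstance(value, dict):
        return {"M": {key: to_attribute_value(val) for key, val in value.items()}}
    if isinstance(value, (set, frozenset)) and value and all(
        isinstance(element, int) and not isinstance(element, bool) for element in value
    ):
        # Sets are unordered, sorting only makes the output reproducible.
        return {"NS": [str(element) for element in sorted(value)]}
    raise TypeError(f"Unsupported type in this sketch: {type(value).__name__}")


mixed = [1, {"a": "b"}, [1], {1, 2}, "text"]
print(to_attribute_value(mixed))
```

The result for the mixed list matches the value of listAttribute in the item above.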

Creating a demo table

Let’s move on to manipulating lists. We’ll use Python and the AWS SDK for this. First we’ll create a table for us to work with - it’s a simple table with On-Demand capacity and a primary key that consists of just a partition key.

"""Quick primer for working with lists in DynamoDB attributes"""

import typing

import boto3

from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"

def create_table_if_not_exists():
    """Create the demo table unless a table with that name already exists."""

    try:
        boto3.client("dynamodb").create_table(
            AttributeDefinitions=[{"AttributeName": "PK", "AttributeType": "S"}],
            TableName=TABLE_NAME,
            KeySchema=[{"AttributeName": "PK", "KeyType": "HASH"}],
            BillingMode="PAY_PER_REQUEST"
        )
    except ClientError as err:
        # ResourceInUseException means the table already exists, which is fine.
        if err.response["Error"]["Code"] != "ResourceInUseException":
            raise

Creating items with lists

Now that we have a table, we can think about the kind of data we want to store in it. I decided on a simple pattern where there is a sensor and each sensor can have a list of measurements. To create a sensor with the list of measurements, I’m using the table-resource from boto3, which automatically translates the Python data types to the underlying DynamoDB format. Creating an item is now a simple put_item operation on the table resource. Note that I’ve also included a condition that raises an exception if the item already exists. This way we’ll only create new items and not overwrite existing ones.

import typing

import boto3
import boto3.dynamodb.conditions as conditions

from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"

def create_sensor_if_not_exists(sensor_id: str, measurements: typing.Optional[typing.List[int]] = None):
    """Create a new sensor with optional measurements if it doesn't exist."""

    measurements = measurements or []

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    try:
        table.put_item(
            Item={
                "PK": f"S#{sensor_id}",
                "sensorId": sensor_id,
                "type": "SENSOR",
                "measurements": measurements
            },
            ConditionExpression=conditions.Attr("PK").not_exists()
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor already exists") from err
        else:
            raise err

This function optionally accepts a list of initial measurements. If they’re not supplied, it will just store an empty list on the item. This wasn’t possible in the old days, but DynamoDB now supports empty lists.

Appending to lists

A second use case would be to append a new measurement to the list. To achieve this we could read the item, append the new measurement to the list locally and subsequently overwrite the old item, but that would be inefficient, and two writers doing this at the same time could overwrite each other’s changes. DynamoDB has a list_append function that is supported in the UpdateItem API call. Because an update to a single item is applied atomically on the server side, concurrent appends don’t get lost. Here’s an example for that:

import typing

import boto3
import boto3.dynamodb.conditions as conditions

from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"

def append_measurement_to_sensor(sensor_id: str, measurement: int):
    """Add a measurement to a sensor if said sensor exists"""

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    try:
        table.update_item(
            Key={
                "PK": f"S#{sensor_id}",
            },
            UpdateExpression="SET #m = list_append(#m, :measurement)",
            ExpressionAttributeNames={
                "#m": "measurements",
            },
            ExpressionAttributeValues={
                ":measurement": [measurement]
            },
            ConditionExpression=conditions.Attr("PK").exists()

        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor doesn't exist") from err
        else:
            raise err

I want to point out how the UpdateExpression works. The expression SET #m = list_append(#m, :measurement) essentially says: for the item that matches the Key, set the attribute that’s referenced as #m to the value of list_append(#m, :measurement). The list_append function concatenates two lists, so this only works if #m is of type list and the value behind the :measurement placeholder is a list as well. That’s why the code wraps the single measurement in a list before passing it in. The elements of that list are then added at the end of #m.

The ExpressionAttributeNames argument is responsible for replacing any #-variables in the update expression. ExpressionAttributeValues, on the other hand, replaces all :-variables in the update expression. Using placeholders is good practice, and it lets you avoid problems if your attribute names collide with reserved words in DynamoDB.
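Since list_append concatenates two lists, the same expression can append several measurements in one call. The following sketch only builds the arguments for update_item, so it can be inspected without talking to AWS. The helper name build_append_kwargs is made up for this example, and the condition is written as a plain expression string instead of using the conditions module.

```python
def build_append_kwargs(sensor_id: str, new_measurements: list) -> dict:
    """Build update_item arguments that append several measurements at once.

    Hypothetical helper: it only assembles the arguments, the caller passes
    them to table.update_item(**kwargs).
    """
    if not new_measurements:
        raise ValueError("Need at least one measurement to append")

    return {
        "Key": {"PK": f"S#{sensor_id}"},
        # list_append(a, b) returns the elements of a followed by those of b.
        "UpdateExpression": "SET #m = list_append(#m, :new)",
        "ExpressionAttributeNames": {"#m": "measurements"},
        "ExpressionAttributeValues": {":new": list(new_measurements)},
        # Same intent as Attr("PK").exists(), written as an expression string.
        "ConditionExpression": "attribute_exists(PK)",
    }


kwargs = build_append_kwargs("sensor-1", [21, 22, 23])
print(kwargs["UpdateExpression"])
print(kwargs["ExpressionAttributeValues"])
```

With a table resource at hand, the call would be table.update_item(**kwargs). Swapping the two operands, list_append(:new, #m), adds the new values at the beginning of the list instead.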

Deleting from lists

Now that we’ve added a few measurements, we notice that some of them are incorrect. Let’s remove those. Removing list items can be done through an UpdateItem call with a specific update expression.

import typing

import boto3
import boto3.dynamodb.conditions as conditions

from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"

def delete_measurement_from_sensor(sensor_id: str, measurement_idx: int):
    """Remove the measurement at a specific index from a sensor if the sensor exists"""

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    try:
        table.update_item(
            Key={
                "PK": f"S#{sensor_id}",
            },
            UpdateExpression=f"REMOVE #m[{measurement_idx}]",
            ExpressionAttributeNames={
                "#m": "measurements",
            },
            ConditionExpression=conditions.Attr("PK").exists()

        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor doesn't exist") from err
        else:
            raise err

Note that this removes items from the measurement list based on their index in the list (0-based). I’ve also added a condition that verifies the item exists before we remove a value. The condition is optional, as the call wouldn’t fail without it. In my case I want it to fail if it can’t find the item, because something has clearly gone wrong and I want to be notified of that fact.
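A REMOVE expression accepts several comma-separated paths, so more than one index can be removed in a single call. The sketch below builds such an expression for the #m placeholder used above. The function name build_remove_expression is hypothetical. The assumption that all indexes refer to positions in the list as it was before the update is worth verifying against a test table before relying on it.

```python
def build_remove_expression(indexes) -> str:
    """Build a REMOVE expression for several list positions (0-based).

    Hypothetical helper: negative indexes are rejected because DynamoDB
    can't address elements relative to the end of a list.
    """
    for index in indexes:
        if isinstance(index, bool) or not isinstance(index, int) or index < 0:
            raise ValueError(f"Not a valid list index: {index!r}")

    # Drop duplicates and sort so the resulting expression is deterministic.
    unique_indexes = sorted(set(indexes))
    if not unique_indexes:
        raise ValueError("Need at least one index to remove")

    return "REMOVE " + ", ".join(f"#m[{index}]" for index in unique_indexes)


print(build_remove_expression([3, 1, 3]))
```

The resulting string can be passed as UpdateExpression in the same update_item call as above.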

Appending to a list and updating a specific value at the same time

The last use case is an edge case. Suppose we want to change the value of an existing measurement at any point of the list and append a new measurement at the end. Easy, you might think - just combine list_append and the regular set-a-value syntax. Unfortunately that doesn’t work (see this Stack Overflow question for an example) and you’ll get an error like this:

Two document paths overlap with each other; must remove or rewrite one of these paths

Fortunately there is a neat workaround for this. When you set a high index on your update call that is outside of the range of the list, the value will be appended to the end.

import typing

import boto3
import boto3.dynamodb.conditions as conditions

from botocore.exceptions import ClientError

TABLE_NAME = "list-demo"

def change_first_and_append(sensor_id: str, new_first: int, to_append: int):
    """Replace the first measurement and append a new one in a single update"""

    table = boto3.resource("dynamodb").Table(TABLE_NAME)

    try:
        table.update_item(
            Key={
                "PK": f"S#{sensor_id}",
            },
            UpdateExpression="SET #m[0] = :new_first, #m[1000000] = :new_last",
            ExpressionAttributeNames={
                "#m": "measurements",
            },
            ExpressionAttributeValues={
                ":new_first": new_first,
                ":new_last": to_append
            },
            ConditionExpression=conditions.Attr("PK").exists()

        )
    except ClientError as err:
        if err.response["Error"]["Code"] == 'ConditionalCheckFailedException':
            raise ValueError("Sensor doesn't exist") from err
        else:
            raise err

I’m using this property to avoid the aforementioned error. In this case I know that my list will have fewer than 1,000,000 entries, so I’m using 1000000 as the index in the update expression (SET #m[0] = :new_first, #m[1000000] = :new_last) to essentially append the value to the list. I was surprised when I learned about this behavior in the Stack Overflow question I linked to, but it’s well documented:

When you use SET to update a list element, the contents of that element are replaced with the new data that you specify. If the element doesn’t already exist, SET appends the new element at the end of the list.

If you add multiple elements in a single SET operation, the elements are sorted in order by element number.
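The documented behavior can be modelled locally with a few lines of Python. This is only a model of what the quote above describes, not DynamoDB code, and the function name apply_indexed_set is made up for the illustration.

```python
def apply_indexed_set(current: list, assignments: dict) -> list:
    """Model SET #m[i] = value for several indexes on a copy of the list.

    Existing positions are replaced. Positions beyond the end of the list are
    appended, ordered by their element number.
    """
    result = list(current)
    for index in sorted(assignments):
        if index < len(current):
            # The element exists, so its content is replaced.
            result[index] = assignments[index]
        else:
            # The element doesn't exist, so the value is appended at the end.
            result.append(assignments[index])
    return result


print(apply_indexed_set([10, 20, 30], {0: 11, 1000000: 40}))
```

In this model the list [10, 20, 30] turns into [11, 20, 30, 40]: the first element is replaced and the value at the out-of-range index ends up at the end.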

Limitations

Working with and updating lists has a few limitations at the moment:

  • You can’t remove items based on their position from the end of a list (something like list[-1] to address the last item isn’t possible as it would be in pure Python)
  • You can’t have a condition that checks if an item exists in a list
  • There is no way to enforce a data type for a list. You’d have to use a set for that, which has the drawback of not being ordered
  • It’s unfortunately impossible to have list-based sort keys and filter based on that (although this would be really cool)
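The first limitation can be worked around if you’re willing to read the item first. The sketch below assumes the caller already knows the current length of the list, for example from a previous get_item call, and guards the removal with the size function in a condition expression. The helper name build_remove_last_kwargs is hypothetical, and like the earlier builder sketches it only assembles the arguments for update_item.

```python
def build_remove_last_kwargs(sensor_id: str, known_length: int) -> dict:
    """Build update_item arguments that remove the last element of the list.

    Hypothetical helper: known_length is the list length the caller observed
    when reading the item. The condition makes the update fail if the list
    has a different length by the time the update is applied.
    """
    if known_length < 1:
        raise ValueError("The list is empty, there is nothing to remove")

    return {
        "Key": {"PK": f"S#{sensor_id}"},
        # The last element sits at position length - 1 (0-based).
        "UpdateExpression": f"REMOVE #m[{known_length - 1}]",
        "ExpressionAttributeNames": {"#m": "measurements"},
        "ExpressionAttributeValues": {":len": known_length},
        "ConditionExpression": "size(#m) = :len",
    }


kwargs = build_remove_last_kwargs("sensor-1", known_length=4)
print(kwargs["UpdateExpression"])
```

If another writer changed the length of the list between the read and the update, the call fails with a ConditionalCheckFailedException and the read can be repeated.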

Conclusions

Working with lists is fairly easy in DynamoDB, although there are some quirks to it. If you have more of these to share, feel free to reach out to me on the social media channels in my bio. I’m happy to add them here.

— Maurice
