Hands on: Migrating from PostgreSQL to MySQL with the AWS Database Migration Service (DMS)



Hello, today we’ll take a look at how to migrate from PostgreSQL to MySQL using AWS DMS, the database migration service of AWS.

This is a hands on article, so we’ll be light on theory and start by preparing databases for our source and targets with some sample data. Afterwards we’ll set up the DMS instance and then configure our migration and transformation job. I’ll also point out some of the things to keep in mind for your own migrations along the way. Let’s get started.

There can’t be a database migration without databases

We need a PostgreSQL database as source and a MySQL database as target. In case you already have both databases available, you can skip to the Preparing databases section.

First we’re going to prepare the source database on an EC2 instance and then we’ll set up the target database.

Building the source database

  1. Create an EC2 instance using Amazon Linux 2
  2. Add a Security group to your instance that allows to ports 22. To reduce security risks, you may want to only allow port 22 communication from your IP. We’ll come back later to the security group once we have an IP address for our replication instance.
  3. Connect to your instance using ssh

Postgres are you there?

  1. Install PostgreSQL (or let it run inside docker ^_^ ). The following script will install Docker on your instance and start a PostgreSQL server inside a container with the username and password set to postgres.

    # Install and enable docker as a service
    sudo yum install docker -y
    sudo systemctl enable docker
    sudo systemctl start docker
    
    # Create a directory for the data volume
    mkdir ~/postgres_data -p
    
    # Run the container and expose the
    # database port on our instance
    sudo docker run -d  \
        --name postgres_container \
        -p 5432:5432 \
        -e POSTGRES_PASSWORD=postgres \
        -e PGDATA=/var/lib/postgresql/data/pgdata \
        -v ~/postgres_data/:/var/lib/postgresql/data \
        --restart unless-stopped \
        postgres:12
    
  2. Start the postgres client with

    sudo docker exec -it postgres_container psql -U postgres
    
  3. Populate your PostgreSQL database with the following sample table. Inspired from TutorialsPoint.

    CREATE TABLE COMPANY(
    ID INT PRIMARY KEY     NOT NULL,
    NAME           TEXT    NOT NULL,
    AGE            INT     NOT NULL,
    ADDRESS        CHAR(50),
    SALARY         REAL,
    JOIN_DATE      DATE
    );
    INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE)
         VALUES (1, 'Paul', 32, 'California', 20000.00,'2001-07-13');
    INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,JOIN_DATE)
         VALUES (2, 'Allen', 25, 'Texas', '2007-12-13');
    INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE)
         VALUES (3, 'Teddy', 23, 'Norway', 20000.00, DEFAULT );
    INSERT INTO COMPANY (ID,NAME,AGE,ADDRESS,SALARY,JOIN_DATE)
         VALUES (4, 'Mark', 25, 'Rich-Mond ', 65000.00, '2007-12-13')
              , (5, 'David', 27, 'Texas', 85000.00, '2007-12-13');
    

Now that we’ve successfuly taken care of the source database, let’s work on getting the target database set up.

Building the target database

Note: You can also install the mysql database on the same instance containing the postgres database. In that case skip the steps 1 and 2.

  1. Create another EC2 instance using Amazon Linux 2

  2. Add a Security group to your instance that allows connections to ports 22. To reduce security risks, you may want to only allow port 22 communication from your IP. We’ll come back later to the security group once we have an IP address for our replication instance.

  3. Connect to your instance using ssh

  4. Install MySQL (or let in run inside docker ^_^ ). The following script will start a MySQL instance inside docker with root password mysql.

    # Install and enable docker as a service
    sudo yum install docker -y
    sudo systemctl enable docker
    sudo systemctl start docker
    
    # Create a directory for the data volume
    mkdir ~/mysql_data -p
    
    # Run the container and expose the
    # database port on our instance
    sudo docker run -d \
        --name mysql_container \
        -p 3306:3306 \
        -e MYSQL_ROOT_PASSWORD=mysql \
        -v ~/mysql_data:/var/lib/mysql \
        --restart unless-stopped \
        mysql:8.0
    

That’s it for the first step. We now have both our databases up and running!

Preparing databases

Assuming your databases are up and running, let’s see what is required to have them ready for DMS

Source (PostgreSQL) database prerequisites

At the time of writing this, AWS DMS only support PostgreSQL 9.x (starting at 9.4) through 12.x as the source database. In case you are migrating from PostgreSQL 10.x please have a look at the extra steps required for this version in the AWS DMS recommendations for PostgreSQL v10.
For other supported version, the preparation steps are as follows:

  1. Install a text editor if not available on your server/container:

    sudo docker exec -it postgres_container sh -c "apt update && apt install nano"
    
  2. Open your pg_hba.conf configuration file:

    sudo docker exec -it postgres_container nano "/var/lib/postgresql/data/pgdata/pg_hba.conf"
    
  3. Add the following lines at the end of the previously opened file (pg_hba.conf) (You can save/exit in nano with Ctrl + X, Y, Enter ):

    # Allow users to establish connections from any host
    host all all 0.0.0.0/0 md5
    # Allow replication connections from any host, for user aws_dms
    host replication aws_dms 0.0.0.0/0 md5
    

    This allows connections from anywhere and replication connections from anywhere for the user aws_dms. The ip-address (to be known later) of the replication instance could be used instead of of 0.0.0.0/0 however it is fine as the ip-based restriction will be set in the security-group later once the ip address is known.

  4. Open your postgresql.conf configuration file

    sudo docker exec -it postgres_container nano "/var/lib/postgresql/data/pgdata/postgresql.conf"
    
  5. Apply the following modifications to the file postgresql.conf (You can search in nano with Ctrl + W):

    • Set wal_level to logical (without quotes).
    • Set max_replication_slots to a value greater than 1.
    • Set max_wal_senders to a value greater than 1.
    • Set wal_sender_timeout to 0.

The wal_sender_timeout parameter terminates replication connections that are inactive longer than the specified number of milliseconds. Although the default is 60 seconds, AWS recommends to set this parameter to zero, which disables the timeout mechanism.

  1. Restart the postgres database (or the container inside which it is running)

    sudo docker restart postgres_container
    
  2. Start the postgres client with

    sudo docker exec -it postgres_container psql -U postgres
    
  3. Create a user with read access to the source schemas/tables. Grant Superuser+Replication privileges if you want Change Data Capture too (our case):

    CREATE ROLE aws_dms LOGIN REPLICATION SUPERUSER PASSWORD 'aws_dms_password';
    GRANT USAGE ON SCHEMA PUBLIC TO aws_dms;
    GRANT SELECT ON ALL TABLES IN SCHEMA PUBLIC TO aws_dms;
    

Target (MySQL) database prerequisites

We need to create credentials (user: aws_dms, password: aws_dms_password) and a schema (aws_dms_schema) that we will pass to aws dms. It is possible to use existing credentials and schemas. We start mysql in our container (the root password for our mysql in container is mysql):

sudo docker exec -it mysql_container mysql -p

And enter the password: mysql. Once we are connect we can run the following command to create desired schema and a user with enough privileges to use it.

CREATE USER 'aws_dms'@'%' IDENTIFIED BY 'aws_dms_password';
CREATE SCHEMA aws_dms_schema;
GRANT ALTER, CREATE, DROP, INDEX, INSERT, UPDATE, DELETE, SELECT ON aws_dms_schema.* TO 'aws_dms'@'%';
GRANT ALL PRIVILEGES ON awsdms_control.* TO 'aws_dms'@'%';

Prepare the migration in AWS/DMS

Get the roles and permissions ready

AWS DMS requires some specific roles: dms-vpc-role, dms-cloudwatch-logs-role. In case you use Amazon Redshift as a target database dms-access-for-endpoint is required as well. These roles are normally created when using DMS with the AWS Console. If you happen to use the cli or if for whatever reason they aren’t created, please refer to the documentation for instructions on how to create them.

You need source and target endpoints

Get into AWS DMS and create a source endpoint for your PostgreSQL database and a target endpoint for your MySQL database. The source endpoint should look as follows:

Source endpoint configuration

The target endpoint should look as follows:

Target endpoint configuration

Make sure to set the following settings for your target (MySQL) endpoint:

parallelLoadThreads=1 initstmt=SET FOREIGN_KEY_CHECKS=0 local_infile=1

parallelLoadThreads defines how many threads should be used to load the data into the target MySQL database. A low number of threads is recommended as each thread requires its own database connection.
initsmt=SET FOREIGN_KEY_CHECKS disables foreign key checks during the full load.
local_infile=1 lets AWS DMS know it should load data into the target database. More information about the configurations of a MySQL compatible database are available here.

Target endpoint specific configuration

Create a replication instance

Make sure to select a VPC and an instance with enough memory. The selected DMS version should be adequate for your database version (e.g. DMS>=3.3.1 for MySQL 8.0 and DMS==3.3.3 for PostgreSQL 12.x) we chose DMS version 3.3.3. Check or uncheck the checkbox Publicly available depending on whether or not your databases are able to communicate with the replication over a private network. Replication instance

Since we now have an IP address for our replication instance, we can now go back to the security groups of our PostgreSQL and MySQL servers (EC2 instances). First, we allow incoming traffic from the replication instance at port 5432 to the security group of our PostgreSQL database. Then, we allow incoming traffic from the replication instance at port 3306 to the security group of our MySQL database.

Now we’ve finished all the prerequisite steps and can finally create our replication task.

Create a migration task

Create a migration task and select the previously created replication instance, source and target endpoints. We are going to migrate our existing data and replicate ongoing changes.

Task configuration

Make sure to enable CloudWatch logs, as the system may otherwise silently fail e.g. when unable to write into the target database. The remaining settings can be left untouched.

Task configuration Settings

At least one INCLUDE rule is required. In our case, we want to include the table company from our schema public. This is also where you’d create your own mapping and migration settings.

Table mapping selection

The source schema public will then be replicated as the target schema aws_dms_schema in the MySQL database.

Table mapping transformation

Leaving the following option checked will start the migration as soon as you create it

Table mapping

Et voilà

The migration task should have started directly upon creation.

Note: In case the task doesn’t start and notifies an error with the endpoints, check the endpoints configuration once again and run a connection test on both of them.

Migration task starting

After a while the task status should switch to Load complete, replication ongoing.

Migration task running

Conclusion

AWS database migration service (DMS) can help you achieve your database migration with virtually no downtime. This post presented the case of an heterogeneous migration from PostgreSQL to MySQL but AWS DMS supports more sources for data migration and even more targets for data migration.

I recommend you check out the documentation to learn more about supported sources and targets for data migration.

For simplicity’s sake we used simple passwords and even reused some. In practice you should make sure to always use strong and unique passwords.

Similar Posts You Might Enjoy

Working with lists in DynamoDB

DynamoDB supports complex data types like lists. In this post we take a look at different ways to interact with lists. We will use Python to write code that may be used in a data access layer to manipulate items with list attributes. - by Maurice Borgmeier

CDK Lambda Deployment takes about a minute - how about sub second Function Code Deployment?

CDK Lambda Deployment takes about a minute - how about sub second Function Code Deployment? Creation of Lambda infrastructure with the CDK is really powerful. Updating the Function code is really slow. Here is a fix for that to get to a sub-second Lambda function deployment time. - by Gernot Glawe

Modelling a product catalog in DynamoDB

Data modelling in NoSQL databases is different from what we’re used to in the relational world. In this article we’ll talk about the process of data modelling in DynamoDB, single-table design and how to build a basic data access layer using python. We’ll explore these concepts by building a product catalog for a simple webshop that supports different query patterns and basic inventory management. - by Maurice Borgmeier