

How not to change PostgreSQL column type

One of the surprises that comes with developing applications and operating a database cluster behind them is the discrepancy between theory and practice, and between the development environment and production. A perfect example of such a mismatch is changing a column type.

The conventional wisdom on how to change a column type in PostgreSQL (and other systems compliant with the SQL standard) is to run:

ALTER TABLE table_name
ALTER COLUMN column_name
[SET DATA] TYPE new_data_type

which is obviously the semantically correct way, but given the right circumstances, you might be in for a rather unpleasant surprise.

The Problem#

Let's create a sample table and demonstrate the isolated behaviour that you might observe, starting with 7 million rows (which really is just a drop in the whole world of data).

-- create a very simple table
CREATE TABLE sample_table (
    id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    label TEXT,
    created_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW()
);

-- populate with 7M records
INSERT INTO sample_table (label)
SELECT 'hash: ' || md5(random()::text)
FROM generate_series(1, 7000000);

and let's change the id type from INT to BIGINT.

alter_type_demo=# ALTER TABLE sample_table ALTER COLUMN id TYPE bigint;
ALTER TABLE
Time: 21592.190 ms (00:21.592)

And … 21 seconds later, you have your change. Mind you, this is a small table with roughly 600 MB of data in it. What if you are going to face 100x that amount? Let's have a look at what went on behind the scenes.

The things PostgreSQL must do#

Changing a data type (and many other operations you might run into) is no simple task, and the PostgreSQL engine has to perform several tasks:

- acquire an ACCESS EXCLUSIVE lock on the table, blocking all concurrent reads and writes for the duration;
- rewrite every row into new data files, converting the column values to the new type;
- rebuild all indexes on the table against the new data files;
- update the system catalogs to reflect the new column definition;
- write all of the above into the WAL, so the change is crash-safe and can be replicated.

As you can see, there's quite a bit involved in performing something that might be understood as routine table maintenance. The size of the data being modified, disk I/O and capacity, and general system congestion come into play.
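
If you want to see the rewrite for yourself (on a fresh copy of the demo table), one simple check is to compare the relation's filenode before and after the change - a new filenode means PostgreSQL created brand-new data files for the table:

SELECT pg_relation_filenode('sample_table');            -- note the value
ALTER TABLE sample_table ALTER COLUMN id TYPE bigint;   -- takes the ACCESS EXCLUSIVE lock
SELECT pg_relation_filenode('sample_table');            -- a different value = new data files
SELECT pg_size_pretty(pg_total_relation_size('sample_table'));  -- table + indexes after the rewrite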

But the real problem does not end here. If we are talking about any sort of serious production deployment, you must consider a few more things:

- the ACCESS EXCLUSIVE lock does not just slow things down - every read and write against the table queues behind it until the rewrite finishes, and your application effectively stalls;
- a long-running transaction that already holds a lock on the table will make the ALTER itself wait, and everything else then queues behind the waiting ALTER;
- the WAL generated by the rewrite has to reach the replicas, the archive and the backups, so expect replication lag and extra pressure on both;
- the rewrite temporarily needs roughly the table's size again in free disk space, because the old and new data files coexist until the operation completes.

As you can see, altering the column data type is not as straightforward as it might seem. Current CI/CD practices often make it very easy for software developers to commit and roll out database migrations to a production environment, only to find themselves in the middle of a production incident minutes later. While a staging deployment might help, it's not guaranteed to share the same characteristics as production (either due to the level of load or monetary constraints).
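
One small safety net worth considering before any blocking DDL - the exact values here are only illustrative - is a lock timeout, so a migration that cannot acquire the lock quickly fails fast instead of silently queueing up in front of production traffic:

SET lock_timeout = '5s';      -- give up if the ACCESS EXCLUSIVE lock isn't acquired in time
SET statement_timeout = 0;    -- but don't cancel the rewrite itself once it is running
ALTER TABLE sample_table ALTER COLUMN id TYPE bigint;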

The problem is, therefore (and I will repeat myself), the scale of the data being modified, the overall congestion of the system, the I/O capacity, and the target table's importance in the application design.

At the end of the day, it translates to the total time needed for the migration to finish, and to the constraints your business can or cannot afford. The simplest solution to the problem is to schedule planned maintenance during a low-traffic period and get it done.

How to Safely Change a PostgreSQL Column Type#

What if you need to rewrite hundreds of gigabytes or even terabytes of data and can't afford anything more than minimal downtime? Let's explore how to change a column type properly.

Let's start with The Bad News - you cannot avoid rewriting the entire table, which will generate a significant amount of WAL files in the process. This is a given, and you must plan how to manage it.
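
To get a ballpark figure before committing to anything, you can measure the WAL generated by a dry run on a test instance. A rough sketch, assuming it is run in psql (\gset stores the captured LSN in a variable):

SELECT pg_current_wal_lsn() AS before_lsn \gset
ALTER TABLE sample_table ALTER COLUMN id TYPE bigint;
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;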

The Good News: you can spread the work - and with it the potential downtime - over a much longer period than a single all-at-once rewrite would take. The specific requirements and constraints will vary based on individual business needs, so careful planning is essential.

The full migration can be summarised as a series of the following steps (a sketch of them follows the list):

1. Add a new column of the target type (for example id_new BIGINT); keep it nullable and without a default, so adding it does not rewrite the table.
2. Install a trigger that keeps the new column populated for every INSERT and UPDATE from now on.
3. Backfill the existing rows in small batches, pacing them to keep locking, I/O and WAL generation under control.
4. Build the indexes (and any constraints) the new column will need, using CREATE INDEX CONCURRENTLY to avoid blocking writes.
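
A minimal sketch of those steps, using the sample_table from earlier - the id_new column, the sample_table_sync_id function and trigger names and the batch size are all made up for illustration, and handling of the identity sequence on the original id is deliberately left out:

-- 1. add the new column (a metadata-only change, no rewrite)
ALTER TABLE sample_table ADD COLUMN id_new BIGINT;

-- 2. keep it in sync for all rows inserted or updated from now on
CREATE OR REPLACE FUNCTION sample_table_sync_id()
RETURNS trigger AS $$
BEGIN
    NEW.id_new := NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sample_table_sync_id_trg
    BEFORE INSERT OR UPDATE ON sample_table
    FOR EACH ROW EXECUTE FUNCTION sample_table_sync_id();

-- 3. backfill existing rows in batches (repeat for the next ranges, with pauses in between)
UPDATE sample_table
SET id_new = id
WHERE id BETWEEN 1 AND 100000
  AND id_new IS NULL;

-- 4. pre-build the unique index the future primary key will need, without blocking writes
CREATE UNIQUE INDEX CONCURRENTLY sample_table_id_new_idx ON sample_table (id_new);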

At this point you are ready to perform the switch itself. Once you can verify that every row has the new column correctly populated, it's time to embrace the most difficult part - ideally in a single transaction, or within a small scheduled downtime window.
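
What the switch might look like for our example - again only a sketch, assuming the names from the previous snippet and the default sample_table_pkey constraint name, with the re-attachment of an identity or sequence to the new column only hinted at in a comment:

BEGIN;

-- stop the dual writes
DROP TRIGGER sample_table_sync_id_trg ON sample_table;
DROP FUNCTION sample_table_sync_id();

-- swap the columns
ALTER TABLE sample_table DROP CONSTRAINT sample_table_pkey;
ALTER TABLE sample_table RENAME COLUMN id TO id_old;
ALTER TABLE sample_table RENAME COLUMN id_new TO id;

-- promote the pre-built unique index to the new primary key
-- (this also marks the column NOT NULL, which requires a scan)
ALTER TABLE sample_table
    ADD CONSTRAINT sample_table_pkey PRIMARY KEY USING INDEX sample_table_id_new_idx;

-- give the new id a value generator again before traffic resumes, e.g.
-- ALTER TABLE sample_table ALTER COLUMN id ADD GENERATED ALWAYS AS IDENTITY (START WITH <next free value>);

COMMIT;

-- later, once the application is confirmed happy, drop the old column
ALTER TABLE sample_table DROP COLUMN id_old;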

It's good practice to consider restarting all the applications relying on the changed table, as some tools (ORMs… I'm looking at you) might cache the OIDs and not handle the change gracefully.

And that would be it - except not really. Dropping the column only removes the reference; the data itself will remain physically on the disk. This is the scenario where you might need to perform VACUUM FULL - which locks the table and rewrites it completely, potentially defeating the purpose of a concurrent migration. This brings us back to the original article which motivated me to write this guide - [[The Bloat Busters: pg_repack vs pg_squeeze]] is the way to go. Preparation and familiarity with these tools in advance are highly recommended.
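
For the record, a hedged example of what reclaiming the space online might look like with pg_repack, reusing the database and table names from our example (the extension and a matching client binary have to be installed first):

-- once, in the target database
CREATE EXTENSION pg_repack;

-- then from the shell:
--   pg_repack --table=sample_table alter_type_demo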

Conclusion#

While changing the column type in PostgreSQL can be as simple as issuing an ALTER TABLE command, it is important for everyone involved to understand the complexities attached to it. Whether you are the software developer requesting the change, someone reviewing it, or the individual tasked with resolving incidents when such changes are deployed to the production environment without careful planning, a deep understanding of this process is crucial. Moreover, grasping this particular change enables you to easily project insights onto other potentially costly operations.

Correction: adding the new column with a default is actually done without a full table rewrite since PostgreSQL 11 (Fast Column Creation with Defaults)
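
In other words, on PostgreSQL 11 and later something like the following is a metadata-only change - the column name is purely illustrative - and existing rows are not rewritten, the default being filled in when they are read:

ALTER TABLE sample_table ADD COLUMN status TEXT DEFAULT 'new';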


source: https://notso.boringsql.com/posts/how-not-to-change-postgresql-column-type/