Features for PostgreSQL

Modified date: Monday, June 30, 2025

Introduction
General
Data Types
SQL
Storage and System
Benchmarking
Tools
Export Regulations
Internal
Internal - Optimizer
Internal - Runtime
Internal - Storage Engine

Introduction

In its own words, "PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance". It is currently ranked #4 in DB-Engine rankings.

▶️ Online Playground

Specific Concepts

We list some concpets might be used in the sections to explain things.

Concept	Explaination
MVCC, VACUUM	PostgreSQL's MVCC (Multi-Version Concurrency Control) allows transactions to run concurrently without interfering with each other by maintaining multiple versions of each row. When a transaction modifies data, PostgreSQL doesn't overwrite the existing row but instead creates a new version, while marking the old version with transaction IDs that indicate when it was created (xmin) and when it was deleted or superseded (xmax). Each transaction sees a consistent snapshot of the database based on its start time, so readers can continue to access the original version of data even while writers are updating it, avoiding read-write conflicts. To manage the accumulation of old row versions, PostgreSQL relies on the VACUUM process, which periodically cleans up obsolete rows that are no longer visible to any active transaction. PostgreSQL has spent much efforts to tune and improve VACCUM.
WAL, PITR	PostgreSQL’s Write-Ahead Logging (WAL) ensures data durability and crash recovery by recording changes to the database in a separate log before applying them to the data files. Every modification — whether an insert, update, delete, or DDL change — is first written sequentially to the WAL files on disk. This guarantees that, in the event of a crash, PostgreSQL can replay the WAL to restore the database to a consistent state without losing committed transactions. The WAL design also improves performance by allowing changes to be flushed to the WAL quickly (sequential I/O), while actual data files can be updated later in the background. Additionally, WAL enables features like point-in-time recovery (PITR), replication, and streaming replication, where standby servers replay WAL records to stay synchronized with the primary server.
OID	In PostgreSQL, an OID (Object Identifier) is a unique, system-assigned 32-bit integer used to identify various internal objects such as tables, indexes, types, and functions. While originally used as the default primary key for user tables, OIDs are now primarily reserved for system catalogs and internal use to efficiently reference database objects. Each OID is globally unique within a database, enabling fast lookups and joins in system tables, and is essential for PostgreSQL's internal metadata management.
LISTEN/NOTIFY	PostgreSQL’s LISTEN/NOTIFY mechanism provides a lightweight publish-subscribe system within the database. A client can execute LISTEN channel to wait for notifications on a specified channel, while another client can use NOTIFY channel, 'payload' to send a message. When a NOTIFY is issued, all clients listening on that channel receive the notification asynchronously. This feature is commonly used for inter-process communication, event-driven systems, and triggering external actions without constant polling.
More …	Free Space Map (FSM) used for free space management, The Oversized-Attribute Storage Technique (Toast) used for big row storage, Foreign Data Wrapper (FDW) allows PostgreSQL to access external data sources as if they were local tables,

Internals - Documents

Here are some readings explain PostgreSQL internals.

Item	Introduction
The Internals of PostgreSQL	This free, in-depth resource is authored by Hironobu Suzuki—a software engineer, and author of multiple PostgreSQL/MySQL books. Beginning in September 2015, his comprehensive guide delves into every core subsystem of PostgreSQL (up to version 17), covering topics such as memory architecture, query planning, write-ahead logging, vacuuming, backup/recovery, concurrency, and more.
PostgreSQL Internals (official document)	The "Internals" section of the current PostgreSQL documentation provides an in-depth guide aimed at PostgreSQL hackers It covers all core subsystems. These include system catalogs and views, the front-end/backend protocol, coding conventions, mechanisms for extensions, foreign data wrappers, custom scan methods, the genetic query optimizer, access methods, storage layout, transaction processing, WAL and more.
Learning PostgreSQL Internals	This is a good summary with many links to PostgreSQL internals related materials. It actually already covered the materials I listed above.
More …	🔍

General

Feature	Value	Definition
intro	PostgreSQL is a powerful, open source object-relational database system with over 35 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.	in their own words - but I reserved the rights to remove some bold claims like "the best", unless it is widely recognized.
vendor	PostgreSQL Committees
initial release	1998
latested version	17	We don't put a release date here as the software is patching frequently. So tracking it is not much useful.
supported platforms	x86, PowerPC, S/390, SPARC, ARM, MIPS, RISC-V, and PA-RISC	supported OS/CPU platforms
db-engines ranking	4	ranks from https://db-engines.com/en/ranking (06/25)
relational?	yes	Is it a relational database? (1) Most database are actually with some extensions, for example, nested data types, graph support, etc, which we usually called "multi-model". (2) Some of them are product family, meaning they have more than one database. Here we focus on the main one but explain others when needed.
open source?	yes	mainly the engine code
license	PostgreSQL PostgreSQL is released under the PostgreSQL License, a liberal Open Source license, similar to the BSD or MIT licenses -- allows for commercial use and modification of the software.
price: box software	0 free download
on-premise offering	yes	if no means you can't buy "box" software from them
cloud offering	cloud vendors PostgreSQL’s cloud offerings are diverse, with many managed services available across major providers. For example, Amazon RDS for PostgreSQL. Technically, there are many efforts on improving PostgreSQL's native cloud implementation. The most famous one is Neon, which is acquired by DataBricks on 2025.
technical doc	www.postgresql.org/docs

Data Types

Feature	Value	Definition
int: signess	signed only	if differentiate signed and unsigned int
int: 1-bytes int name	n.a.
int: 2-bytes int name	smallint
int: 3-bytes int name	n.a.
int: 4-bytes int name	int
int: 8-bytes int name	bigint
decimal: storage size	variable
decimal: range	up to 131072 digits before the decimal point; up to 16383 digits after the decimal point	also called number, numeric in different systems
char(n): max bytes	10,485,760
text: max bytes	1G

SQL

Feature	Value	Definition
SQL: standard complaince	high
max SQL length	undefined The actual limit may depends on limits of the SUT, for example, its memory/system swap settings: PostgreSQL internally use a struct named "StringInfo" to hold the SQL string, which is with MAX_INT length.	maximal SQL statement length
PL: main	SQL + PL/PgSQL	main programming lanage: most database suports SQL because SQL is a well established standard. However, each database would like to extend SQL more or less.
PL: other language support	yes	PL lanaguage other than PL/SQL, like PL/Java, PL/Rust etc
SP: max parameters	100
UDF: max parameters	100
SQL: max parameters	65535	number of parameters in a PREPARED query
SQL: query hints	GUC only	if it allows use query hints to guide the optimizer
SQL: explicit locking	yes: row, page, table level	Locking is usually an internal matter - so does it allow explicit locking? What levels do they support?
Triggers?	yes	If support triggers
Triggers: scope	tables, views, foreign tables	What objects can have triggers
Triggers: type	BEFORE, AFTER, INSTEAD OF	Types of triggers supported
Object-Relational?	yes
Extension Mechanism	C programming, link with engine
vector search	no native Extensions like pgvector supports it.	does it support vector search

Storage and System

Feature	Value	Definition
arch: server	C/S	Embedded or traditional C/S?
arch: run in browser?	no/yes Officially not supported, but there are some good OSS projects like Pglite.	It also known as a client-side database, is a database that is stored and managed within a user's web browser, rather than on a remote server.
arch: in-memory support	no
arch: Multi-master support?	no	if multi-master support?
arch: primary/read replica?	yes	if primary + mulitiple read replica supported
replication: sync/async	both	Can commits wait or w/o wait for replicas to acknowledge
replication: WAL shipping	yes	Uses write-ahead log (WAL) shipping for replication
replication: quorum-based commit	no	Multiple synchronous replicas with quorum for commit
database: max number	4,294,950,911 Note that 4G-4294950911=16385. The reason is that a database needs assign an object ID (you see the db folder is a number), which is starting from 16385 because smaller are reserved by boostrap.
tables: max number per database	1,431,650,303
partitions: methods	Range, List, Hash; limited composite (multi-level partitioning more limited than Oracle/SQL Server).	Supported partitioning strategies (range, list, hash, etc.).
partitions: global index	no	index across partitions
rows: max rows per table	32T/row_size	The actual number depends on storage etc
index: max allowed index	rows per table is restricted by the number of rows that can fit onto MAX_UINT*8K pages.	Max number of indices allowed per table
ACID	yes/yes	for DML and DDL
ACID: max isolation level	Serializable PostgreSQL implements Serializable Snapshot Isolation (SSI) in version 9.1.
ACID: max ANSI isolation level	Serializable
ACID: durability	yes
Materialized View: support?	yes

Benchmarking

Feature	Value	Definition
any official TPC benchmarks?	no	The TPC benchmark includes a set of tests simulating real-world scenarios to evaluate database performance.

Tools

Feature	Value	Definition
command line client	psql	it means "sql client" for database supporting SQL. For embedded atabase, the client includes the server together.
admin(GUI)	pgAdmin

Export Regulations

Feature	Value	Definition
Jurisdiction	US	Which country controls export
ECCN	None	An Export Control Classification Number (ECCN) is a five-character alphanumeric code used to categorize items on the Commerce Control List (CCL) for export control purposes. Most database may fall into 5D992.c category, "mass market encryption", which means it has some ordinary encryption related code, for example, the SSL connection code.
Eligible License Exception / CCATS	Not required There is no ECCN for open source software	A License Exception is an authorization that allows you to export or reexport items subject to the EAR without needing to obtain a specific export license, provided certain conditions are met. CCATS stands for Commodity Classification Automated Tracking System. The BIS assigns a CCATS number to products that it has classified under the Commerce Control List (CCL).
Encryption Components	SSL	Crypto functionality that triggers control

Internal

Feature	Value	Definition
concurrency control	MVCC
MVCC: implemented?	yes	if implement MVCC for concurrency control
MVCC: rollback segment	no PostgreSQL uses VACUUM to clean up old versions: it is more difficult than in-memory database because the old versions are mixed with new versions on database pages.	if uses rollback segment (RS) to store old versions. Without RS, old versions and new versions are mixed stored, then the database engine has to find a way to efficiently drop the old versions at certain point.
Implementation language	C	A DBMS may use mulitple programming languages, for example, supports its stored procedure. The major programming language used to implement the engine.

Internal - Optimizer

Feature	Value	Definition
CBO?	yes	if it employees a cost based optimizer
framework	System-R	System-R is more like a dynamic programming, bottom up optimizer, while Cascades/Volcano gebaseerd optimizer is more like top-down optimizer.
join order search	dynamic programming + Generic Algorithm (GEQO) It also allows to extend the join order algorithm.	How join order permutations are explored during plan generation.
stats: multi column	yes	Multi-column stats may cause storage space bloat: for example, if one dimension has 100 buckets, then three dimensions will require 1M buckets - but reducing the total number of buckets will result in reduced accuracy.
query hints?	GUC only	if it allows use query hints to guide the optimizer
plan guide?	no	Can we use plan guide to correct the plan? This is a more systematic and accurate way to repair the plan than plan hints. Especially during system upgrades, if the plan becomes worse, we can use plan guide to force a query to use a previous plan.

Internal - Runtime

Feature	Value	Definition
resource management	simple: work_mem controls per-operator memory use	How execution memory is allocated and limited.
spilling support	yes	"spilling" refers to the process of writing temporary data or intermediate results of a query to disk when the available memory is exhausted. This is crucial for handling large datasets or complex queries that require more memory than available.
model	Volcano + push for parallel run	The operator scheduling model: pull-gebaseerd (Volcano), push etc.
support intra-parallel query?	yes	meaning a single query can utilize multi hardware threads to run it
adaptive execution (AQP)?	no	Traditionally, after the optimizer determines the plan, the runtime must execute it completely without any room for adjustment, such as which of the two tables should be built. The adaptive method allows the runtime to make some adjustments based on the actual situation, and the optimizer must also prepare for this uncertainty, such as preparing an alternative plan.
vectorization	no native Several extensions allowed it. For example, pg_duckdb and Hydra.	Speed up OLAP queries using vectorized execution. A vectorized runtime exchange a bunch of rows between iterators, and these rows are physically sotre with column-oriented order.
iterator: join methods	all 3	Hash Join (HJ), Sort-Merge Join (MJ) and Nested loop Join (NLJ) are 3 major ones
Error: out-of-range and overflow	abort the transaction	To maintain atomic requirement of ACID, database engine usually fail the statement and abort the transaction.

Internal - Storage Engine

Feature	Value	Definition
Page Size	8K default configurable with the --with-blocksize=BLOCKSIZE option	The physical disk block size for storage and I/O.
Storage Compression	TOAST compression (within a row)
Row Overflow / Large Row Handling	Row within a page and large columns pointing to TOAST storage	How rows larger than page size are handled. Usually a row must be within one page, but big columns can only save a pointer with the row and pointing to the actual storage. This is also why there is a limit of maximal columns per table.
Index: primary cluster	Heap storage by default; CLUSTER command for static clustering - if modification happens after, order is not maintained.	Whether physical rows are ordered by primary index.
Index: secondary	B+Tree, GIN, GiST, SP-GiST, BRIN, Hash, Fulltext (via extensions).	Supported index types beyond the primary key.
index: column store as primary	No native support; available via extensions (e.g. cstore_fdw, citus).	if column-store as primary storage supported
index: column store as secondary	No native support; available via extensions (e.g. cstore_fdw, citus).	if column-store as secondary index supported
partition table	declarative partition table is implemented via inheritance table, since V11	Partitioning capabilities for large tables.
partition table: prunning	mostly at compile time, some runtime	Ability to eliminate partitions at query compile time and runtime
ACID: durability mechanism	WAL	How is it implements durability. Database's classic way is write ahead logging (WAL).