Maintaining Index Health in PostgreSQL: When to Choose REINDEX or VACUUM FULL
Emily Parker
Product Engineer · Leapcell

Maintaining optimal performance in a PostgreSQL database often hinges on the efficiency of its indexes. Over time, indexes can become bloated or fragmented due to frequent updates and deletions, leading to slower query execution and increased storage consumption. This degradation isn't just a minor annoyance; it can significantly impact application responsiveness and overall system health. Understanding the tools available to combat this – specifically REINDEX and VACUUM FULL – and knowing when and how to apply them is crucial for any database administrator or developer. This article delves into these two powerful commands, exploring their underlying mechanisms, use cases, and practical implications for keeping your PostgreSQL indexes in top shape.
Understanding the Essentials: Bloat, Indexes, and MVCC
Before we dive into REINDEX and VACUUM FULL, let's briefly touch upon some core concepts.
Index Bloat: In PostgreSQL, when rows are updated or deleted, the old versions of these rows (and their corresponding index entries) are not immediately removed. Instead, they are marked as "dead tuples." While autovacuum eventually cleans these up, a high rate of changes can lead to a significant accumulation of dead tuples, causing indexes to grow larger than necessary. This unnecessary space consumption and the need to scan more data pages constitute "index bloat." Bloat directly impacts performance because more data needs to be read from disk, and cache utilization becomes less effective.
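As a rough sketch of how bloat can be measured (this assumes the `pgstattuple` contrib extension is available on your server, and uses the `idx_product_name` index created in the example later in this article):

```sql
-- Requires the pgstattuple contrib extension (assumed to be installed)
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- pgstatindex reports B-tree internals; a low avg_leaf_density
-- (well below the index's fillfactor) suggests bloat.
SELECT index_size, avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_product_name');
```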
PostgreSQL Indexes: An index is a special lookup table that the database search engine can use to speed up data retrieval. It's essentially a sorted list of values from the indexed columns, along with pointers to the actual rows in the table. When a query uses an indexed column, PostgreSQL can quickly locate the relevant rows without scanning the entire table.
MVCC (Multi-Version Concurrency Control): PostgreSQL implements MVCC to handle concurrent transactions without locking entire tables. When a row is updated, a new version of the row is created, and the old version remains visible to transactions that started before the update. This mechanism, while excellent for concurrency, is the root cause of dead tuples and, consequently, bloat.
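The dead tuples that MVCC leaves behind are visible in the statistics collector. A quick way to see which tables are accumulating them:

```sql
-- pg_stat_user_tables tracks dead tuples produced by updates/deletes.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

A table whose `n_dead_tup` stays persistently high relative to `n_live_tup`, with an old or null `last_autovacuum`, is a candidate for bloat investigation.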
REINDEX: Rebuilding Indexes from Scratch
REINDEX is a command used to rebuild one or more existing indexes. When you execute REINDEX, PostgreSQL essentially drops the old index (or marks its entries as invalid) and then recreates it from scratch using the current, live data in the table. This process eliminates any bloat present in the old index structure, as all dead tuples are discarded, and the new index is built only with active pointers.
How REINDEX Works
From a high level, REINDEX involves:
- Scanning the table: It reads the entire table to identify all live rows.
- Building a new index: Based on the live rows, it constructs a completely new index structure.
- Replacing the old index: Once the new index is built, it replaces the old one, atomically.
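The effect of the rebuild is easy to observe by measuring the index before and after (a sketch using the `idx_product_name` index from the example below):

```sql
-- Check the index's on-disk size before and after rebuilding
SELECT pg_size_pretty(pg_relation_size('idx_product_name')) AS size_before;
REINDEX INDEX idx_product_name;
SELECT pg_size_pretty(pg_relation_size('idx_product_name')) AS size_after;
```

On a bloated index, the post-rebuild size is typically noticeably smaller.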
When to Use REINDEX
REINDEX is the primary tool for addressing index bloat. You should consider using it in the following scenarios:
- Significant Index Bloat: When pg_stats reports or pg_bloat_check tools indicate a high percentage of bloat in an index.
- Performance Degradation: If queries that utilize a specific index are becoming noticeably slower, and you suspect index bloat is the cause.
- Corruption: In rare cases, if an index becomes corrupted, REINDEX can effectively repair it by rebuilding it from the table data.
- Changing Index Parameters: To apply new storage parameters or fillfactor settings to an existing index.
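The last scenario deserves a concrete illustration: changing a storage parameter such as fillfactor only affects pages written after the change, so a rebuild is needed for it to apply to the whole index (values here are illustrative):

```sql
-- Lower the fillfactor to leave more free space per leaf page,
-- then rebuild so the new setting applies to the entire index.
ALTER INDEX idx_product_name SET (fillfactor = 70);
REINDEX INDEX idx_product_name;
```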
Practical Examples with REINDEX
Let's illustrate with an example. Suppose we have a table products and an index idx_product_name.
```sql
-- Create a sample table and index
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10, 2)
);
CREATE INDEX idx_product_name ON products (name);

-- Insert some data
INSERT INTO products (name, price)
SELECT 'Product ' || generate_series(1, 10000), random() * 100;

-- Simulate bloat by updating many rows
UPDATE products SET name = 'Updated Product ' || id WHERE id > 5000;
DELETE FROM products WHERE id BETWEEN 100 AND 200;

-- You can check for bloat (requires specific extensions or queries)
-- For demonstration, let's assume bloat has occurred.

-- Reindex a specific index
REINDEX INDEX idx_product_name;

-- Reindex all indexes on a table
REINDEX TABLE products;

-- Reindex all indexes in a database (can be very resource intensive)
REINDEX DATABASE my_database;
```
Important Considerations for REINDEX:
- Locking: Plain REINDEX takes an ACCESS EXCLUSIVE lock on the index being rebuilt and blocks writes to the underlying table; queries that would use the index must wait until the rebuild finishes. This outage can be unacceptable for high-availability systems.
- CONCURRENTLY Option: For indexes on user tables, PostgreSQL (12 and later) offers REINDEX INDEX CONCURRENTLY and REINDEX TABLE CONCURRENTLY. This option allows the rebuild to proceed without holding an ACCESS EXCLUSIVE lock for its duration, making it suitable for production environments. It performs two table scans and requires more time and temporary disk space.

```sql
REINDEX INDEX CONCURRENTLY idx_product_name;
```

 The concurrently built index replaces the original without blocking concurrent DML operations. Note that CONCURRENTLY is not supported for REINDEX SYSTEM, and system catalog indexes are always skipped when reindexing concurrently.
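One operational caveat worth knowing: if a concurrent rebuild is cancelled or fails partway through, it can leave behind an invalid transient index (per the PostgreSQL documentation, with a suffix such as `_ccnew`). A quick check for such leftovers:

```sql
-- Find invalid indexes left behind by an interrupted concurrent rebuild
SELECT indexrelid::regclass AS index_name
FROM pg_index
WHERE NOT indisvalid;

-- Such leftovers can then be dropped and the reindex retried, e.g.:
-- DROP INDEX CONCURRENTLY idx_product_name_ccnew;
```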
VACUUM FULL: Reclaiming Space in Tables and Indexes
VACUUM FULL is a much more aggressive command than VACUUM. While regular VACUUM simply marks dead tuples for reuse, VACUUM FULL rebuilds the entire table and its associated indexes from scratch by writing a completely new version of the table. This effectively reclaims all dead tuple space, both within the table data and its indexes, and compacts the table on disk.
How VACUUM FULL Works
VACUUM FULL operates by:
- Creating a new table file: It writes a new version of the table, containing only live rows, to a new location on disk.
- Rebuilding indexes: As part of this process, all associated indexes are also rebuilt to point to the new table structure.
- Replacing the old table: Once the new table and its indexes are complete, the old table and its associated index files are removed.
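Because VACUUM FULL rewrites the table and all of its indexes, the clearest way to see its effect is to compare the relation's total footprint (heap, indexes, and TOAST together) before and after, as a sketch using the `products` table from this article's examples:

```sql
-- Total on-disk footprint of the table, its indexes, and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('products')) AS size_before;
VACUUM FULL products;
SELECT pg_size_pretty(pg_total_relation_size('products')) AS size_after;
```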
When to Use VACUUM FULL
VACUUM FULL is a blunt instrument and should be used sparingly due to its severe locking implications. Its primary use case is:
- Severe Table Bloat: When a table itself has accumulated a massive amount of dead space that regular VACUUM (even autovacuum) cannot efficiently reclaim. This is less about index bloat specifically and more about overall table storage.
- Specific Storage Scenarios: In very rare cases where you need to recover disk space immediately and you are certain that a long outage is acceptable.
Practical Examples with VACUUM FULL
Using our products table example:
```sql
-- Simulate significant table and index bloat
-- (e.g., delete a large percentage of rows without autovacuum running)
DELETE FROM products WHERE id > 7000;

-- Now run VACUUM FULL
VACUUM FULL products;
```
Important Considerations for VACUUM FULL:
- Locking: VACUUM FULL acquires an ACCESS EXCLUSIVE lock on the table it is processing. This means no reads or writes can occur on that table (and implicitly, its indexes) until VACUUM FULL completes. For large tables, this outage can last for hours, making it generally unsuitable for production systems.
- Disk Space: It requires temporary disk space roughly equal to the size of the table being processed, as it creates a new copy of the table.
- Index Impact: While VACUUM FULL does eliminate index bloat as a side effect of rebuilding the table, its primary focus is on table bloat. If only index bloat is the concern, REINDEX (especially CONCURRENTLY) is almost always the preferred solution.
- Alternatives: For table bloat, consider tools like pg_repack (a third-party utility), which can rebuild tables and indexes online without holding ACCESS EXCLUSIVE locks for extended periods.
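As a sketch of what using pg_repack looks like (this assumes the pg_repack extension is installed in the target database and its command-line client is on your PATH; consult its documentation for the full option set):

```shell
# Rebuild one table (and its indexes) online with pg_repack
pg_repack --dbname=my_database --table=products

# Rebuild only the table's indexes, leaving the heap untouched
pg_repack --dbname=my_database --table=products --only-indexes
```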
Choosing Between REINDEX and VACUUM FULL
The choice boils down to the specific problem you're trying to solve and your tolerance for downtime:
| Feature | REINDEX (without CONCURRENTLY) | REINDEX CONCURRENTLY | VACUUM FULL | 
|---|---|---|---|
| Primary Goal | Eliminate index bloat, repair index | Eliminate index bloat, repair index | Eliminate table and index bloat | 
| Locks | ACCESS EXCLUSIVE on index | SHARE UPDATE EXCLUSIVE (does not block DML) | ACCESS EXCLUSIVE on table | 
| Downtime | Brief outage for index | Minimal to no outage | Full outage for table and its indexes | 
| Disk Space | Temporary space for new index | More temporary space, longer run time | Temporary space for entire table | 
| Resource Use | Moderate CPU/IO | High CPU/IO | Very High CPU/IO | 
| Applicability | Single index, fast fix | Production systems, individual index | Severe table bloat, maintenance windows | 
When to prefer REINDEX:
- You primarily observe bloat in your indexes, not necessarily the table itself.
- You need to repair a corrupted index.
- You want to modify index storage parameters.
- You require minimal downtime (use CONCURRENTLY).
When to consider VACUUM FULL (with extreme caution):
- Your table itself is severely bloated, and VACUUM (even autovacuum) is insufficient.
- You have a scheduled maintenance window where a prolonged ACCESS EXCLUSIVE lock is acceptable.
- You understand and accept the complete downtime for the table.
In most modern PostgreSQL deployments, REINDEX INDEX CONCURRENTLY is the go-to solution for index maintenance. If table bloat is a recurring issue, consider tuning your autovacuum settings or exploring external tools like pg_repack before resorting to VACUUM FULL.
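For recurring table bloat, per-table autovacuum tuning is often enough to keep VACUUM FULL off the table entirely. A sketch, with illustrative values, for a hot table like the `products` example:

```sql
-- Make autovacuum trigger much earlier on this table
-- (defaults: scale_factor = 0.2, threshold = 50; values here are examples)
ALTER TABLE products SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 500
);
```

With a lower scale factor, autovacuum runs after a small fraction of the table has changed, keeping dead tuples (and hence bloat) from accumulating between runs.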
Conclusion
Maintaining index health is paramount to achieving optimal PostgreSQL performance. While both REINDEX and VACUUM FULL can eliminate bloat and reclaim space, they differ significantly in their scope, locking behavior, and impact on system availability. REINDEX is specifically designed for indexes, and its CONCURRENTLY option makes it the preferred method for
online index optimization, whereas VACUUM FULL acts on entire tables and should only be used in rare, high-downtime tolerance situations. Choosing the right tool for the job ensures your database remains fast and efficient without unnecessary service disruptions.

