Compacting
PowerSync Cloud
The cloud-hosted version of PowerSync automatically compacts all buckets once per day. Compacting can also be triggered manually from the Dashboard: right-click on an instance, or search for the action using the Command Palette. Support for triggering compacting from the CLI will be added soon. Defragmenting may still be required.
Self-hosted PowerSync
For self-hosted setups (PowerSync Open Edition & PowerSync Enterprise Self-Hosted Edition), the compact command in the Docker image can be used to compact all buckets. This can be run manually, or on a regular schedule using a Kubernetes CronJob or similar scheduling functionality.
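As a rough sketch (the image name, tag, and configuration mechanism below are assumptions — match them to how you already run the service), a one-off compact might look like this:
```bash
# Sketch only: run the compact command using the same image and
# configuration as your running PowerSync service instance.
# The image name, config mount, and env var here are assumptions.
docker run --rm \
  -v "$(pwd)/config:/config" \
  -e POWERSYNC_CONFIG_PATH=/config/powersync.yaml \
  journeyapps/powersync-service:latest \
  compact
```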
Defragmenting may still be required.
Background
Bucket operations
Each bucket is an ordered list of PUT, REMOVE, MOVE and CLEAR operations. In normal operation, only PUT and REMOVE operations are created.
A simplified view of a bucket may look like this (an illustrative sketch; row names and data are made up):
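```
-- Illustrative sketch only: row names and data are made up, and real
-- operations also include checksums.
(op_id=1, PUT,    row=a, data=<a_v1>)
(op_id=2, PUT,    row=b, data=<b_v1>)
(op_id=3, PUT,    row=a, data=<a_v2>)
(op_id=4, REMOVE, row=b)
(op_id=5, PUT,    row=c, data=<c_v1>)
```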
Compacting step 1 - MOVE operations
The first step of compacting involves MOVE operations: a MOVE just indicates that an operation is not needed anymore, since a later PUT or REMOVE operation replaces the row.
After this compacting step, the bucket may look like this:
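```
-- Continuing the sketch: ops 1 and 2 are superseded by ops 3 and 4,
-- so they are replaced with MOVE operations.
(op_id=1, MOVE)
(op_id=2, MOVE)
(op_id=3, PUT,    row=a, data=<a_v2>)
(op_id=4, REMOVE, row=b)
(op_id=5, PUT,    row=c, data=<c_v1>)
```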
Compacting step 2 - CLEAR operations
The second step of compacting takes a sequence of CLEAR, MOVE and/or REMOVE operations at the start of the bucket, and replaces them all with a single CLEAR operation. The CLEAR operation indicates to the client that “this is the start of the bucket, delete any prior operations that you may have”.
After this compacting step, the bucket may look like this:
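```
-- Continuing the sketch: the two leading MOVE operations are replaced
-- with a single CLEAR operation.
(op_id=2, CLEAR)
(op_id=3, PUT,    row=a, data=<a_v2>)
(op_id=4, REMOVE, row=b)
(op_id=5, PUT,    row=c, data=<c_v1>)
```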
A CLEAR operation can only remove operations at the start of the bucket, not in the middle of the bucket, which leads us to the next step.
Defragmenting
There are cases where the above compacting steps cannot optimize efficiently. The key factor is that the oldest PUT operation in a bucket determines how much of the history can be compacted. This means:
- If a row has never been updated since its initial creation, its original PUT operation remains at the start of the bucket
- All operations that come after this oldest PUT cannot be fully compacted
- This is particularly problematic when a small number of rarely-changed rows share a bucket with frequently-updated rows: the rarely-changed rows’ original PUT operations “block” compacting of the entire bucket, while the frequently-updated rows continue to accumulate operations that can’t be fully compacted

For example, consider a bucket containing a row ‘a’ that was inserted once and never updated, alongside rows that together accumulate over 100,000 updates. In this case:
- The original PUT operation for row ‘a’ remains at the start
- All subsequent operations can’t be fully compacted
- We end up with over 100k operations for what should be a simple bucket, as sketched below
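A sketch of such a fragmented bucket (row names and counts are made up):
```
-- Row 'a' was inserted once and never updated, so its PUT stays at the
-- start of the bucket and the CLEAR step can never move past it.
(op_id=1,      PUT, row=a, data=<a_v1>)
(op_id=2,      MOVE)
(op_id=3,      MOVE)
--             ... ~100k more operations for the frequently-updated rows ...
(op_id=100001, PUT, row=b, data=<b_latest>)
```
Because op_id=1 is a PUT, the CLEAR step cannot move past it, and the operations after it can’t be fully compacted.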
Defragmenting works around this by updating the rows in a bucket: each update moves the row’s PUT operation to the end of the bucket, after which the compacting steps above can remove the older operations.
Note: All rows in the bucket must be updated for this to be effective. If some rows are never updated, they will continue to block compacting of the entire bucket.
Bucket Design Tip: If you have a mix of frequently-updated and rarely-changed rows, consider splitting them into separate buckets. This prevents the rarely-changed rows from blocking compacting of the frequently-updated ones.
When to Defragment
You should consider defragmenting your buckets when:
- High Operations-to-Rows Ratio: The number of operations significantly exceeds the number of rows in a bucket. You can inspect this using the Diagnostics app.
- Frequent Updates: Tables that are frequently updated (e.g., status fields, counters, or audit logs)
- Large Data Churn: Tables where you frequently insert and delete many rows
Defragmenting Strategies
There are manual and automated approaches to defragmenting:

Manual Defragmentation
- Use the PowerSync Dashboard to manually trigger defragmentation: right-click on an instance and select “Compact Buckets” with the “Defragment” checkbox selected
- Best for one-time cleanup or after major data changes

Scheduled Defragmentation
- Set up a cron job to regularly update rows
- Recommended for frequently updated tables or tables with large churn
- Example using pg_cron (a sketch, assuming a hypothetical lists table with an updated_at column that is part of the synced data):
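```sql
-- A sketch, assuming a hypothetical "lists" table with an "updated_at"
-- column that is synced into the bucket. Touching every row moves each
-- row's PUT operation to the end of the bucket, so the next compact can
-- remove the older operations.
SELECT cron.schedule(
  'defragment-lists',   -- job name
  '0 3 * * *',          -- every day at 03:00
  $$ UPDATE lists SET updated_at = now() $$
);
```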
- This will cause clients to re-sync each updated row, while preventing the number of operations from growing indefinitely. Depending on how often rows in the bucket are modified, the interval can be increased or decreased.
Defragmenting Trade-offs
Defragmenting + compacting as described above can significantly reduce the number of operations in a bucket, at the cost of existing clients needing to re-sync that data. When and how to do this depends on the specific use-case and data update patterns. Key considerations:
- Frequency: More frequent defragmentation means fewer operations per sync but more frequent re-syncs
- Scope: Defragmenting all rows at once is more efficient but causes a larger sync cycle
- Monitoring: Use the Diagnostics app to track the operations-to-rows ratio
Sync Rule deployments
Whenever modifications to Sync Rules are deployed, all buckets are re-created from scratch. This has a similar effect to fully defragmenting and compacting all buckets. This was the recommended workaround before explicit compacting became available (released July 26, 2024). In the future, we may use incremental sync rule reprocessing to process changed bucket definitions only.
Technical details
See the documentation in the powersync-service repo for more technical details on compacting.