This contains a broad overview of the sync protocol used between PowerSync clients and a PowerSync Service instance. For details, see the implementation in the various client SDKs.
The PowerSync protocol is designed to efficiently sync changes to clients, while maintaining consistency and integrity of data.
The same process is used to download the initial set of data, bulk download changes after being offline for a while, and incrementally stream changes while connected.
All synced data is grouped into buckets. A bucket represents a collection of synced rows, synced to any number of users.
Buckets are the core concept that allows PowerSync to scale efficiently to thousands of concurrent users, each incrementally syncing changes to hundreds of thousands of rows.
Each bucket keeps an ordered list of changes to rows within the bucket — generally as “PUT” or “REMOVE” operations.
A checkpoint is a sequential id that represents a single point-in-time for consistency purposes. This is further explained in Consistency.
For any checkpoint, the client and server can compute a per-bucket checksum. This is essentially the sum of the checksums of the individual operations within the bucket, with each individual checksum being a hash of the operation data.
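The additive structure of the per-bucket checksum can be sketched as below. This is an illustration only: the actual hash function and operation encoding are implementation details of PowerSync, so FNV-1a is used here purely as a stand-in.

```typescript
// Illustrative sketch: a per-bucket checksum as a wrapping 32-bit sum of
// per-operation hashes. The real hash and encoding differ; FNV-1a is a
// stand-in, and the types below are assumptions, not the protocol schema.

interface BucketOperation {
  op_id: string;
  op: 'PUT' | 'REMOVE';
  data: string; // JSON-encoded row data
}

// FNV-1a: a simple 32-bit hash, standing in for the real operation hash.
function operationChecksum(op: BucketOperation): number {
  const text = `${op.op_id}:${op.op}:${op.data}`;
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// The bucket checksum is the sum of operation checksums, modulo 2^32.
// Because addition is commutative and associative, the server can maintain
// it incrementally as operations are appended, without rescanning the bucket.
function bucketChecksum(ops: BucketOperation[]): number {
  let sum = 0;
  for (const op of ops) {
    sum = (sum + operationChecksum(op)) >>> 0;
  }
  return sum;
}
```

The sum-of-hashes design is what makes compaction (described below) possible: entries can be merged as long as their combined checksum is preserved.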
The checksum helps to ensure that the client has all the correct data. If the bucket data changes on the server, for example because of a manual edit to the underlying bucket storage, the checksums will stop matching, and the client will re-download the entire bucket.
Note: Checksums are not a cryptographically secure method of verifying data integrity. Rather, they are designed to detect simple data mismatches, whether due to bugs, manual data modification, or other corruption.
To avoid indefinite growth in bucket size, the history of a bucket can be compacted. Stale updates are replaced with marker entries, which can be merged together, while keeping the same checksums.
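A minimal sketch of this idea, assuming a simplified operation type: superseded entries are folded into a single marker carrying the wrapping 32-bit sum of the checksums it absorbed, so the bucket's total checksum is unchanged. The `MARKER` kind, the types, and the ordering details are illustrative, not PowerSync's actual compaction scheme.

```typescript
// Sketch: bucket compaction that preserves the total checksum. Stale
// operations (superseded PUT/REMOVE entries for the same row) are merged
// into one marker entry whose checksum is the wrapping sum of what it
// replaced. All names here are assumptions for illustration.

type OpKind = 'PUT' | 'REMOVE' | 'MARKER';

interface Op {
  op_id: number;
  kind: OpKind;
  row_id?: string;  // undefined for markers
  checksum: number; // unsigned 32-bit
}

const u32 = (n: number) => n >>> 0;

// Total bucket checksum: wrapping 32-bit sum over all operations.
function totalChecksum(ops: Op[]): number {
  return ops.reduce((sum, op) => u32(sum + op.checksum), 0);
}

function compact(ops: Op[]): Op[] {
  // The last operation per row is current; earlier ones are stale.
  const lastIndex = new Map<string, number>();
  ops.forEach((op, i) => {
    if (op.row_id !== undefined) lastIndex.set(op.row_id, i);
  });

  let absorbed = false;
  let markerChecksum = 0;
  let markerOpId = Number.MAX_SAFE_INTEGER;
  const kept: Op[] = [];

  ops.forEach((op, i) => {
    const stale = op.row_id !== undefined && lastIndex.get(op.row_id) !== i;
    if (stale || op.kind === 'MARKER') {
      // Fold stale entries (and existing markers) into one merged marker,
      // preserving the checksum sum.
      absorbed = true;
      markerChecksum = u32(markerChecksum + op.checksum);
      markerOpId = Math.min(markerOpId, op.op_id);
    } else {
      kept.push(op);
    }
  });

  return absorbed
    ? [{ op_id: markerOpId, kind: 'MARKER', checksum: markerChecksum }, ...kept]
    : kept;
}
```

Because the merged marker carries the exact checksum of everything it replaced, clients that already validated a checkpoint see no checksum change after compaction.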
A client initiates a sync session using:
The server then responds with a stream of:
The server then waits until a new checkpoint is available and repeats the above sequence.
The stream can be interrupted at any time, at which point the client will initiate a new session, resuming from the last point.
If a checksum validation fails on the client, the client will delete the bucket and start a new sync session.
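The validation step can be sketched as follows: the client compares its locally computed per-bucket checksums against those the server reported for the checkpoint, and any mismatched bucket is deleted and re-synced in a new session. The types and function name are assumptions, not the SDK API.

```typescript
// Sketch: client-side checksum validation at a checkpoint. Returns the
// buckets that failed validation, which the client then deletes before
// starting a new sync session. Names are illustrative.

interface BucketChecksum {
  bucket: string;   // bucket name
  checksum: number; // unsigned 32-bit checksum at this checkpoint
}

function checksumMismatches(
  local: BucketChecksum[],
  remote: BucketChecksum[]
): string[] {
  const localChecksums = new Map(local.map((b) => [b.bucket, b.checksum]));
  const failed: string[] = [];
  for (const { bucket, checksum } of remote) {
    // A bucket missing locally also counts as a mismatch.
    if (localChecksums.get(bucket) !== checksum) {
      failed.push(bucket);
    }
  }
  return failed;
}
```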
Data for individual rows is represented using JSON. The protocol itself is schemaless: the client is expected to use its own copy of the schema, and to gracefully handle schema differences.
Write checkpoints are used to ensure clients have synced their own changes back before applying downloaded data locally.
Creating a write checkpoint is a separate operation, which is performed by the client after all data has been uploaded. It is important that this happens after the data has been written to the backend source database.
The server then keeps track of the current CDC stream position on the database (LSN in Postgres, resume token in MongoDB, or binlog position in MySQL), and notifies the client when the data has been replicated, as part of checkpoint data in the normal data stream.
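The ordering rule above can be expressed as a simple predicate: downloaded data is applied locally only once the checkpoint confirms that the client's own write checkpoint has been replicated. The field names below are assumptions for illustration, not the exact protocol shape.

```typescript
// Sketch: hold back local application of downloaded data until the server's
// CDC replication position (LSN / resume token / binlog position) has passed
// the client's write checkpoint. Field names are illustrative.

interface Checkpoint {
  last_op_id: number;        // end of this checkpoint's operation range
  write_checkpoint?: number; // highest write checkpoint replicated so far
}

// `pendingWriteCheckpoint` is the write checkpoint the client created after
// uploading its local changes, or null if nothing is pending.
function canApplyCheckpoint(
  checkpoint: Checkpoint,
  pendingWriteCheckpoint: number | null
): boolean {
  if (pendingWriteCheckpoint === null) {
    return true; // no local writes in flight; safe to apply immediately
  }
  // Apply only once our own writes have round-tripped through the
  // backend source database and back into the sync stream.
  return (checkpoint.write_checkpoint ?? 0) >= pendingWriteCheckpoint;
}
```

This is what prevents a client from overwriting its own not-yet-replicated changes with older downloaded state.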