Synchronizing data

I revamped the data storage on my Taigen project. Previously it used PouchDB to store data on the client and CouchDB on the server, and PouchDB synced its data with the server automatically.

I ditched the Pouch/Couch key:value [1] data storage because the client database gets too large over time. CouchDB is append-only, so old data is never really deleted: the database gets larger and larger both on the server (not a big deal, server storage space is cheap) and on the client (a very big deal, web browsers complain if a website tries to store a large dataset).

Now Taigen stores data using plain old IndexedDB on the client and PostgreSQL on the server. The app is now much faster and more responsive, but I had to write my own sync module, a task full of hidden gotchas that could lose data.

Overview

Taigen is an online notebook that works in your desktop, tablet, and mobile browsers. The database has a Note table, which stores the user’s notes, and a DeletedNote table, which stores only the ID and delete timestamp of each deleted note. Each note has a UUID (a universally unique ID) generated by the client.
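As a rough illustration of that data model, here is a minimal Python sketch. The field names (body, deleted_at) are assumptions made for the example, not Taigen's actual schema; the only details taken from the description above are the two tables and the client-generated UUID.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Note:
    # "body" is a hypothetical field standing in for the note's content.
    body: str
    # The client generates the ID itself, so no server round trip is needed.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class DeletedNote:
    # Only the ID and the delete timestamp are kept for a deleted note.
    id: str
    deleted_at: datetime


note = Note("Buy milk")
tombstone = DeletedNote(note.id, datetime.now(timezone.utc))
```

Because the ID comes from the client, a note can be created offline and pushed later without the server having to hand out a key first.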

Client -> Server (Push)

Syncing notes from client to server was much easier than anticipated.

New and updated notes

  1. Add a modified boolean field to the Note table.
  2. When a note is modified, set modified to True.
  3. Every few seconds, push the changes to the server: search for all notes with the modified bit set. Send them one at a time.
  4. When the server receives the note, it stores the new data and updates the note’s date-time stamp. It sends a success response to the client.
  5. Clear the modified bit for each note only if the server sends a success response.
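The five steps above can be sketched in Python roughly like this. It is a simulation, not Taigen's code: FakeServer, save_note, edit_note, and push_modified are hypothetical names, and plain dicts stand in for IndexedDB and PostgreSQL.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Note:
    id: str
    body: str
    modified: bool = False  # step 1: the "needs pushing" flag


class FakeServer:
    """Hypothetical stand-in for the real server; keeps notes in a dict."""

    def __init__(self):
        self.notes = {}
        self.online = True

    def save_note(self, note_id, body):
        if not self.online:
            return False  # simulate a failed request
        # Step 4: store the data and stamp it with the server's clock.
        self.notes[note_id] = {
            "body": body,
            "updated_at": datetime.now(timezone.utc),
        }
        return True


def edit_note(note, body):
    note.body = body
    note.modified = True  # step 2


def push_modified(local_notes, server):
    """Steps 3 and 5: send modified notes one at a time and clear the
    flag only when the server reports success."""
    pushed = []
    for note in local_notes.values():
        if not note.modified:
            continue
        if server.save_note(note.id, note.body):
            note.modified = False
            pushed.append(note.id)
    return pushed
```

If a push fails, the flag stays set, so the note is simply picked up again on the next pass a few seconds later.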

Deleted notes

  1. The client stores the note’s ID in a DeletedNote table and deletes the note from its Note table.
  2. When the client pushes its changes, it sends the deleted IDs as well.
  3. The server adds the note’s ID to its own DeletedNote table and deletes the note from its Note table.
  4. The client deletes the ID from its own DeletedNote table on a successful send.

After everything is done, the note is gone and only the server has a record of the deleted note’s ID. This allows other clients to receive the deletion when they pull.
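Here is a small Python sketch of the deletion flow, under the same kind of assumptions: Store, delete_locally, server_receive_deletion, and push_deletions are made-up names, and each table is modeled as a dict.

```python
from datetime import datetime, timezone


class Store:
    """A Note table and a DeletedNote table, modeled as two dicts."""

    def __init__(self):
        self.notes = {}    # note ID -> body
        self.deleted = {}  # note ID -> delete timestamp


def delete_locally(client, note_id):
    # Step 1: remember the ID, then drop the note itself.
    client.deleted[note_id] = datetime.now(timezone.utc)
    client.notes.pop(note_id, None)


def server_receive_deletion(server, note_id):
    # Step 3: the server records the ID and deletes its copy of the note.
    server.deleted[note_id] = datetime.now(timezone.utc)
    server.notes.pop(note_id, None)
    return True  # success response


def push_deletions(client, server, send=server_receive_deletion):
    # Steps 2 and 4: send each deleted ID and forget it only on success.
    # `send` is injectable so a failed request can be simulated.
    for note_id in list(client.deleted):
        if send(server, note_id):
            del client.deleted[note_id]
```

After a successful push, the client holds no trace of the note, while the server keeps the ID and timestamp for other clients to pick up on their next pull.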

Server -> Client (Pull)

A Taigen client pulls updates from the server only once per session, when the app is opened.

  1. The client stores its last pull request date-time.
  2. During the pull, the client sends the date-time to the server.
  3. The server reads the date-time and queries its database for all notes modified or deleted after the date-time.
  4. The server sends a list of note IDs to the client.
  5. The client iterates through the list of IDs, requesting the full data for each note in the list.
  6. If all pulls are successful, the client sets the last pull request date-time to now and stores it in its database. If any pull fails, the client leaves the date-time unchanged so that it can try again later.
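The pull steps might look something like the following Python sketch. Several details are assumptions for the sake of the example: the server returns updated and deleted IDs as two separate lists, a failed request surfaces as ConnectionError, and the names changed_since, fetch, and pull are hypothetical.

```python
from datetime import datetime, timezone


class Server:
    def __init__(self):
        self.notes = {}    # note ID -> (body, updated_at)
        self.deleted = {}  # note ID -> deleted_at

    def changed_since(self, since):
        # Steps 3 and 4: IDs of notes modified or deleted after `since`.
        updated = [i for i, (_, ts) in self.notes.items() if ts > since]
        removed = [i for i, ts in self.deleted.items() if ts > since]
        return updated, removed

    def fetch(self, note_id):
        return self.notes[note_id][0]


class Client:
    def __init__(self):
        self.notes = {}  # note ID -> body
        # Step 1: the stored date-time of the last successful pull.
        self.last_pull = datetime.min.replace(tzinfo=timezone.utc)

    def pull(self, server, now=None):
        # Step 2: send the last pull date-time to the server.
        updated, removed = server.changed_since(self.last_pull)
        try:
            # Step 5: request the full data for each listed note.
            for note_id in updated:
                self.notes[note_id] = server.fetch(note_id)
        except ConnectionError:
            return False  # step 6: keep last_pull so a later pull retries
        for note_id in removed:
            self.notes.pop(note_id, None)
        # Step 6: everything succeeded, so record the new pull date-time.
        self.last_pull = now or datetime.now(timezone.utc)
        return True
```

Leaving last_pull untouched on failure means the next pull asks for the same window again; re-fetching a note that already arrived is harmless because it just overwrites the local copy.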

Sharing data with others

Taigen has fairly simple sync requirements: I’m not going to simultaneously edit documents on my phone and laptop, so Taigen doesn’t have to sort out which version is newer or try to merge changes. If you write a web-based tool that allows collaboration between people, then you’ll need some kind of versioning system.

Do everything above, but also add a version integer field to the Note table.

  1. Client A edits note version 1. He pushes the note to the server at 10 AM.
  2. The server receives the note from A and increments the note’s version to 2. The server sends the version number in the response to A. The server and client A are now working with version 2 of the document.
  3. Client B edits the same note Client A edited, but he’s still working with note version 1. He doesn’t push to the server until 11 AM because of internet problems (maybe he was working while riding on a train). The server sees version 1 in the request and rejects it. B has to pull the version 2 changes from the server or try to merge his changes with the current version.
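The server-side check in that scenario could be sketched in Python as follows. This is one possible shape for it, not Taigen's implementation (Taigen does not version notes); VersionConflict, push, and base_version are hypothetical names.

```python
class VersionConflict(Exception):
    """Raised when a client pushes an edit based on a stale version."""


class Server:
    def __init__(self):
        self.notes = {}  # note ID -> {"body": str, "version": int}

    def push(self, note_id, body, base_version):
        # base_version is the version the client was editing.
        current = self.notes.get(note_id)
        if current is not None and current["version"] != base_version:
            # Someone else pushed first: reject instead of overwriting.
            raise VersionConflict(
                f"note {note_id} is at version {current['version']}, "
                f"but the edit was based on version {base_version}"
            )
        new_version = base_version + 1
        self.notes[note_id] = {"body": body, "version": new_version}
        return new_version  # sent back to the client in the response


server = Server()
server.notes["n"] = {"body": "draft", "version": 1}

# Client A pushes at 10 AM; the note moves to version 2.
version_a = server.push("n", "A's edit", base_version=1)

# Client B pushes at 11 AM, still on version 1, and is rejected.
try:
    server.push("n", "B's edit", base_version=1)
    b_rejected = False
except VersionConflict:
    b_rejected = True
```

After the rejection, B would pull version 2, reapply or merge his changes, and push again with base_version=2.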

  1. A key:value database is a type of NoSQL database that stores data as a collection of key:value pairs. It is very simple and reliable, and it scales horizontally (it is easy to add extra database servers). An example of a key:value pair: name:Craig