Nusantara Cloud

Cloud and Software Development Services


Simple Online-Offline & Online-Offline Architecture Using MySQL

10.31.2018

As a context, we're building a web application for blended-learning, which is a learning paradigm where technology is used to aid offline learning. In our case, it's for K-12 students learning math. We're building a math tutoring center where each of the students studies by watching lecture videos and then exercising in their tablet. There is still a teacher, but his job is to help students who are confused. In other words, it's like a traditional math classroom on steroid: teachers no longer need to do repetitive task of teaching the same materials and grading exercises, but they can focus on giving personal attention to the students.

We need offline-to-online architecture because:
1. Internet access in the tutoring center is not reliable.
2. We want students to be able to continue studying anytime 24/7 (via Internet) outside of the tutoring center.
3. We want to analyze student data captured during tutoring session so that our remote team can improve the curriculum and prepare to better help the students on the next session.

So we basically have 2 kind of servers: cloud-server and class-server. A cloud-server is used to serve all the Internet access, and a class-server is used to serve a tutoring center exclusively.

First off, this is the specification:
1. Cloud-server and class-server both have single code-base. The only difference is the flag CLOUD set to true/false accordingly.
2. There's only 1 cloud-server. (we aren't worried about scalability yet :))
3. There are many class-servers, each for different tutoring center.
4. Database is MySQL

Our trick to keep it simple is by having only a single curriculum data source: the cloud-server. We update our curriculum (videos, exercises, etc) only on the cloud. The class-server is only allowed to mutate student-related tables (new students, new analytics, student's exercise submission, etc). Because of this design decision, we need to only implement class-to-cloud synchronization logic. We can get the reverse by forking off the cloud database. For example, to update a class server to have the latest curriculum, we only need to sync class-to-cloud, and then drop the class database and replace it with the cloud's using mysqldump.

The first thing needed is Synchronizations table, which looks like:
+------------------+--------------+------+-----+---------+----------------+
| Field            | Type         | Null | Key | Default | Extra          |
+------------------+--------------+------+-----+---------+----------------+
| id               | int(11)      | NO   | PRI | NULL    | auto_increment |
| schoolIdentifier | varchar(255) | YES  |     | NULL    |                |
| serverHash       | varchar(255) | YES  |     | NULL    |                |
| localId          | int(11)      | YES  |     | NULL    |                |
| cloudId          | int(11)      | YES  |     | NULL    |                |
| tableName        | varchar(255) | YES  |     | NULL    |                |
| createdAt        | datetime     | NO   |     | NULL    |                |
| updatedAt        | datetime     | NO   |     | NULL    |                |
+------------------+--------------+------+-----+---------+----------------+
This table basically maps the a row from class server into cloud server. tableName indicates where the data comes from, localId indicates the id of the row on the class-server, and cloudId indicates the id of the row on the cloud-server. This table is used to distinguish between update vs. insert. Basically it is a way to figure out that user with name Alyssa updated its password vs. there's a new user with name Alyssa but have different password. serverHash is unique between class-servers, so that different class-server has different mapping table. The entry is generated only once and is re-generated if cloud database is forked into class database, because the previous mapping on Synchronizations table is no longer valid after a fork.

The second thing is the column onCloud 

+------------------+--------------+------+-----+---------+----------------+
| onCloud      | tinyint(1)   | YES  |     | 1       |                |
+------------------+--------------+------+-----+---------+----------------+

This column indicates that the data is already on the cloud. Without this, when rows are sent into the cloud, the cloud can't differentiate between new rows and rows that is already there, because remember that the class database is a fork of the cloud's. Whenever cloud receives a data from class-to-cloud sync, onCloud column is set to true, so that subsequent forks wouldn't send it again.


The third thing is syncHistories
+------------------+------------------------------------+------+-----+---------+----------------+
| Field            | Type                               | Null | Key | Default | Extra          |
+------------------+------------------------------------+------+-----+---------+----------------+
| id               | int(11)                            | NO   | PRI | NULL    | auto_increment |
| schoolIdentifier | varchar(255)                       | YES  |     | NULL    |                |
| status           | enum('Syncing','Success','Failed') | YES  |     | NULL    |                |
| date             | varchar(255)                       | YES  |     | NULL    |                |
| createdAt        | datetime                           | NO   |     | NULL    |                |
| updatedAt        | datetime                           | NO   |     | NULL    |                |
+------------------+------------------------------------+------+-----+---------+----------------+
While Synchronizations and onCloud are already enough to implement basic class-to-cloud sync logic, relying only on them have an overhead of re-sending data that's already synced to the cloud. We were tempted to use onCloud on the class-database by setting it to true after a class-to-cloud sync, but we realized the hard way that in our case cloud-to-class synchronization takes quite a long time and we can't block that long due to HTTP server limitation. In an extreme it can take minutes for the cloud server to process each of the rows received from the class server. So to avoid the overhead, cloud server keeps track of when was the last time a sync happens from a specific class server.