Wednesday, September 2, 2009

The definition of eventually secure systems

I've been using Web services lately under a new assumption about integrity: the data may remain inconsistent for a span of a few minutes. The designers of those systems accept such a relaxed consistency condition in order to give higher priority to availability and to tolerance of split database subsystems within a cluster that represents a single integrated database.

Then a question comes to my mind: what does it mean for a database to be secure while it allows an unstable condition for a few minutes? Of course, guaranteeing unconditional access restriction is one way to call a database secure, provided that every party allowed to access the database never harms its integrity. Such strict access limitation, however, is impractical for a public system. So a new notion of security, perhaps called eventually secure systems, should be introduced. But how? I still have no idea.

Traditionally, databases are designed under the constraints of Atomicity, Consistency, Isolation, and Durability (ACID) for every query and update operation. The ACID policy demands locking of critical sections between conflicting database requests, which causes performance degradation.
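To make the trade-off concrete, here is a minimal sketch (my own illustration, not from any particular database) of why locking preserves integrity at the cost of throughput: conflicting withdrawals must serialize on a lock, so the balance never goes negative, but every conflicting request has to wait its turn.

```python
# Illustrative sketch: ACID-style serialization of conflicting updates.
# The names (balance, withdraw) are hypothetical, chosen for this example.
import threading

balance = 100
lock = threading.Lock()  # guards the critical section

def withdraw(amount):
    global balance
    with lock:  # conflicting requests queue up here
        if balance >= amount:
            balance -= amount
            return True
        return False

threads = [threading.Thread(target=withdraw, args=(30,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance)  # 10: exactly three withdrawals of 30 succeed, never negative
```

The invariant (no negative balance) survives only because every update waits for the lock; that waiting is precisely the performance cost the paragraph above describes.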

On the other hand, Gilbert and Lynch [1] show in their proof of the CAP Theorem that a distributed database cannot realize all three of the following properties at the same time: data consistency, availability, and tolerance to network partition. BASE [2], which stands for basically available, soft state, eventually consistent, is an example of an anti-ACID design policy based on the CAP Theorem, giving higher priority to availability and partition tolerance than to data consistency.

Vogels [3] also explains the idea of eventual consistency, or an eventually consistent change of states, by analogy to the Domain Name System (DNS): clients querying the distributed database may observe inconsistency while database update events propagate, but the inconsistency is resolved within a finite period determined by the configuration of the replication network between the database caches.
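The propagation-then-convergence behavior can be sketched in a few lines. This is a toy model of my own (not DNS itself): each replica keeps a timestamped value, a stale read is visible during propagation, and an anti-entropy exchange with a last-write-wins rule eventually brings all replicas into agreement.

```python
# Toy model of eventual consistency (hypothetical names throughout).
class Replica:
    def __init__(self):
        self.ts, self.value = 0, None  # (timestamp, value) pair

    def write(self, ts, value):
        if ts > self.ts:  # last-write-wins: keep only the newest write
            self.ts, self.value = ts, value

replicas = [Replica() for _ in range(3)]

# A client writes to one replica; until propagation, other reads are stale.
replicas[0].write(1, "A")
assert replicas[1].value is None  # inconsistency visible during propagation

# Anti-entropy pass: replicas exchange state until all agree.
def anti_entropy(replicas):
    for src in replicas:
        for dst in replicas:
            dst.write(src.ts, src.value)

anti_entropy(replicas)
assert all(r.value == "A" for r in replicas)  # eventually consistent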

While the CAP Theorem, BASE, and the notion of eventually consistent systems are effective in relaxing the boundary condition of data inconsistency for building very large-scale systems, those ideas do not solve the core issue: how to keep a database cluster consistent within a finite, predictable time range. I understand that many applications do not require atomic consistency of data, especially those for casual conversation, such as Twitter or Facebook. I don't think, however, that a banking system can be built under the BASE principle unless the maximum allowance of temporary data inconsistency, or the maximum time to eventual convergence, is given and proven.
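One way to picture what "given and proven" could mean is a bounded-staleness read: the application refuses any replica state older than an explicit bound. The sketch below is hypothetical (the bound and function names are my own, assumed for illustration), but it shows how a proven convergence time would become a checkable guarantee rather than a vague "eventually".

```python
# Hypothetical bounded-staleness read check; MAX_STALENESS_S is an
# application-chosen bound that would have to be proven to hold.
import time

MAX_STALENESS_S = 5.0

def read_balance(replica_value, replica_synced_at):
    """Return the value only if the replica's last sync is fresh enough."""
    age = time.time() - replica_synced_at
    if age > MAX_STALENESS_S:
        raise RuntimeError("replica too stale for a financial read")
    return replica_value

# A freshly synced replica passes the check...
print(read_balance(100, time.time()))  # 100
# ...while a replica last synced a minute ago is rejected.
try:
    read_balance(100, time.time() - 60)
except RuntimeError as e:
    print(e)
```

Without a proven upper bound on convergence time, no such check can be trusted, which is exactly the gap the paragraph above points at.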

And I think that in running large-scale systems, things often become eventually inconsistent and disintegrated, rather than eventually consistent. I still wonder how we can solve this problem consistently.