Software updates are disruptive. They typically need program reboots, and the updates can take a long time. Such downtime windows, even when planned in advance, can hurt Internet services and drive their users away. As a result, critical updates are often postponed, even when these delays hurt security.
Researchers have proposed several techniques for upgrading software without downtime. For example, dynamic software updates can change a running program on-the-fly, and competitors in DARPA’s Cyber Grand Challenge recently demonstrated tools for finding and patching security vulnerabilities automatically, within seconds. These techniques do not work when the update involves changes to the format (or “schema”) of the program’s persistent data, which is typically stored in a database. Such schema changes are perhaps the biggest cause of downtime for Internet services. Updating the schema of a large database takes time; for example, Wikipedia was locked for 22 hours during the upgrade to MediaWiki 1.5, which involved a major schema reorganization. Large systems, such as Google’s AdWords, may require multiple schema changes every week.
We built a system, called KVolve, for updating data in Redis key-value stores on-the-fly, without disconnecting the applications that read and write the data [ICSME 2016]. Using KVolve requires almost no changes to the application code. We can also use KVolve with Kitsune, a whole-program updating framework for C, to update both the code and data of a complex application, with zero downtime. For example, we upgraded redisfs, a FUSE filesystem backed by Redis, and we seamlessly maintained the mount point during the upgrade.
It may come as a surprise that schema changes are a problem for NoSQL databases such as Redis. After all, these databases lack a formal schema specification, like the Data Definition Language (DDL) from SQL databases. Instead, NoSQL databases store key-value pairs and provide simple key-based read and write commands. However, applications attach meaning to the format of the keys and values stored in the database. Typically, keys are structured strings and values store objects with multiple fields and data structures, serialized as Protocol Buffers, Thrift, Avro or JSON. These formats evolve with the application code, and software updates may change objects to add or delete fields, split objects into multiple key-value pairs, and rename keys or value fields.
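To make the idea of an implicit schema concrete, here is a minimal Python sketch of one such format change. The key prefixes, field names, and the `upgrade_user` helper are all made up for illustration; they are not taken from any application we studied.

```python
import json

# Hypothetical version-1 record: a structured key and a JSON value.
old_key = "user:42"
old_value = json.dumps({"name": "Ada Lovelace", "email": "ada@example.org"})


def upgrade_user(key, value):
    """Convert a hypothetical v1 (key, value) pair to a hypothetical v2 format."""
    record = json.loads(value)
    # Split one field into two.
    first, _, last = record.pop("name").partition(" ")
    record["first_name"] = first
    record["last_name"] = last
    # Add a new field with a default value.
    record["locale"] = "en"
    # Rename the key prefix, keeping the identifier.
    new_key = "account:" + key.split(":", 1)[1]
    return new_key, json.dumps(record, sort_keys=True)


new_key, new_value = upgrade_user(old_key, old_value)
```

Nothing in the database itself records that `user:42` and `account:42` describe the same object in two formats; that knowledge lives only in the application code, which is why such changes need explicit update support.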
With KVolve, each key-value pair in Redis has a version number, and applications must indicate the data version they expect when they connect to the database. To an application, KVolve always presents the logical view that data is at the newest version of the format. When upgrading to a new version of the application, the developer must create a KVolve update specification that defines how to transform existing data to the newest version. We apply these transformations lazily: when the updated application accesses an object in the old format, KVolve converts the object to the new format on-the-fly. This allows us to amortize the downtime due to data transformations over the updated application’s execution, causing slower queries immediately after the update but no outage. In our experiments, the slowdown was up to 3% for read operations and up to 6% for write operations. Usually, this meant under 1 second.
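The lazy approach described above can be illustrated with a toy in-memory store. This is only a sketch in Python under simplifying assumptions: the class, its method names, and the `split_name` transformation are hypothetical, and the real system works inside Redis with transformation functions written in C.

```python
class LazyVersionedStore:
    """Toy stand-in for a versioned key-value store (not KVolve's actual API)."""

    def __init__(self):
        self._data = {}       # key -> (version, value)
        self._upgrades = {}   # from_version -> function(value) -> value
        self.latest = 1

    def register_upgrade(self, from_version, fn):
        """Install an update; no stored data is touched at this point."""
        self._upgrades[from_version] = fn
        self.latest = max(self.latest, from_version + 1)

    def set(self, key, value):
        # Writers always produce data in the newest format.
        self._data[key] = (self.latest, value)

    def get(self, key):
        if key not in self._data:
            return None
        version, value = self._data[key]
        # Convert on first access, one version step at a time.
        while version < self.latest:
            value = self._upgrades[version](value)
            version += 1
        self._data[key] = (version, value)
        return value

    def stored_version(self, key):
        return self._data[key][0]


def split_name(value):
    """Hypothetical v1 -> v2 transformation: split 'name' into two fields."""
    first, _, last = value["name"].partition(" ")
    return {"first": first, "last": last}


store = LazyVersionedStore()
store.set("user:1", {"name": "Ada Lovelace"})
store.set("user:2", {"name": "Alan Turing"})
store.register_upgrade(1, split_name)
result = store.get("user:1")
```

After this runs, `user:1` has been converted because it was read, while `user:2` is still stored in the old format and will be converted only when something accesses it. The cost of the transformation is paid per access rather than in one long outage.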
After launching a schema upgrade, administrators must also update all the applications using the data store. They can do this with dynamic software updates or with a rolling upgrade (for instance, by splitting traffic in Google’s AppEngine). Even when stopping and restarting the application, the downtime is minimal, as KVolve performs the time-consuming schema update transparently. In other words, the new version may start using the data store immediately, even if the transformation is still in progress. Another benefit is that the schema update specification is a separate module from the application code. This is better than rewriting applications to expect data in both old and new formats and mixing application and format-maintenance logic. For example, to upgrade redisfs.5 to redisfs.7 (which added data compression and changed the inode data structure), we added only 6 lines of code in both versions. These changes consisted of an additional call to Redis on start-up to declare the data version expected and a few additional lines of error handling.
KVolve installs updates atomically in a way that supports fault tolerance. We ensure that data transformations take place atomically with the triggering database action. As such, KVolve avoids races that could clobber concurrent accesses. KVolve also supports changes to Redis data structures, such as sets, hashes, lists, and sorted sets, by storing the version information in the container and by updating all the contained values at once.
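As a rough illustration of these two points, the Python sketch below keeps one version number per set and upgrades every member of a stale set in the same critical section as the command that touched it. The class and method names are hypothetical, and the lock is only a stand-in for whatever mechanism makes a database command atomic; it does not describe KVolve's internals.

```python
import threading


class ContainerStore:
    """Toy store of versioned sets (illustrative only, not KVolve's code)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sets = {}       # key -> [version, set_of_members]
        self._upgrades = {}   # from_version -> function(member) -> member
        self.latest = 1

    def install_update(self, fn):
        with self._lock:
            self._upgrades[self.latest] = fn
            self.latest += 1

    def _bring_up_to_date(self, entry):
        # The version lives on the container, so all members move together.
        while entry[0] < self.latest:
            fn = self._upgrades[entry[0]]
            entry[1] = {fn(member) for member in entry[1]}
            entry[0] += 1

    def sadd(self, key, member):
        with self._lock:  # transformation and write happen as one step
            entry = self._sets.setdefault(key, [self.latest, set()])
            self._bring_up_to_date(entry)
            entry[1].add(member)

    def smembers(self, key):
        with self._lock:  # transformation and read happen as one step
            entry = self._sets.get(key)
            if entry is None:
                return set()
            self._bring_up_to_date(entry)
            return set(entry[1])


store = ContainerStore()
store.sadd("tags", "Redis")
store.sadd("tags", "NoSQL")
store.install_update(str.lower)   # hypothetical update: lowercase every tag
store.sadd("tags", "kvolve")      # first access after the update
members = store.smembers("tags")
```

Because the upgrade and the triggering command share one critical section, no other client can observe a set in which only some members have been converted.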
Along with our paper [ICSME 2016], we are releasing the KVolve code. KVolve patches Redis to keep track of data versions and to invoke the data transformation functions when necessary. Transformation functions are written in C, and each one may access only a single key-value pair in the old format (otherwise its output could depend on the order of invocation). To support laziness, transformations to keys must also be reversible and unambiguous. We examined the historic schema changes for multiple Redis applications on GitHub and they were all compatible with these restrictions. This gives us hope that many developers will find KVolve useful for their applications. We also hope that our paper and code will open the door to zero-downtime schema updates beyond Redis, as the principles behind KVolve are applicable to other key-value stores.
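One way to see why lazy updates need reversible key transformations: when the updated application asks for a key in the new format that has not been migrated yet, the store has to work out which old key to look under. The Python sketch below shows this for a simple prefix rename. The prefixes and function names are invented for illustration, and real transformation functions would be written in C.

```python
OLD_PREFIX = "user:"      # hypothetical old key format
NEW_PREFIX = "account:"   # hypothetical new key format


def key_forward(old_key):
    """Map an old-format key to its new-format name."""
    if not old_key.startswith(OLD_PREFIX):
        raise ValueError("not an old-format key: " + old_key)
    return NEW_PREFIX + old_key[len(OLD_PREFIX):]


def key_backward(new_key):
    """Map a new-format key back to the unique old key, or None."""
    if not new_key.startswith(NEW_PREFIX):
        return None
    return OLD_PREFIX + new_key[len(NEW_PREFIX):]


def lazy_get(data, new_key):
    """Look up new_key in a dict, migrating the old entry on first access."""
    if new_key in data:
        return data[new_key]
    old_key = key_backward(new_key)
    if old_key is not None and old_key in data:
        data[new_key] = data.pop(old_key)   # rename lazily
        return data[new_key]
    return None


data = {"user:7": "profile-bytes"}
value = lazy_get(data, "account:7")
```

If two different old keys could map to the same new key, `key_backward` would have no single answer, which is the ambiguity the restriction rules out.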
Paper: [ICSME 2016]
Karla Saur presents "Evolving NoSQL Databases Without Downtime" at the #icsme16 Software Evolution track. Posted by ICSME on Thursday, October 6, 2016.
[ICSME 2016] K. Saur, T. Dumitraș, and M. Hicks, “Evolving NoSQL Databases Without Downtime,” in IEEE International Conference on Software Maintenance and Evolution (ICSME), Raleigh, NC, 2016.