Retroactive Data Structures (extended abstract)

Erik D. Demaine*    John Iacono†    Stefan Langerman‡

* MIT Laboratory for Computer Science, 200 Technology Square, Cambridge, MA 02139, USA. edemaine@mit.edu.
† Polytechnic University, 5 MetroTech Center, Brooklyn, NY 11201, USA. jiacono@poly.edu.
‡ Chargé de recherches du FNRS, Université Libre de Bruxelles, Département d'Informatique, ULB CP212, Belgium. Stefan.Langerman@ulb.ac.be.

Abstract

We introduce a new data structuring paradigm in which operations can be performed on a data structure not only in the present but also in the past. In this new paradigm, called retroactive data structures, the historical sequence of operations performed on the data structure is not fixed. The data structure allows arbitrary insertion and deletion of operations at arbitrary times, subject only to consistency requirements. We initiate the study of retroactive data structures by formally defining the model and its variants. We prove that, unlike persistence, efficient retroactivity is not always achievable, so we go on to present several specific retroactive data structures.

1 Introduction

Suppose that we just discovered that an operation previously performed in a database was erroneous (e.g., from a human mistake), and we need to change the operation. In most existing systems, the only way to support such a change is to roll back the state of the system to before the time in question, and then re-execute all of the operations from the modification to the present. Such processing is wasteful, inefficient, and often unnecessary. In this paper we introduce and develop the notion of retroactive data structures: data structures that efficiently support modifications to the historical sequence of operations performed on the structure. Such modifications could take the form of retroactively inserting, deleting, or changing one of the operations performed at a given time in the past on the data structure in question.

After defining the model, we show that there is no general efficient transformation from non-retroactive structures into retroactive structures. We then turn to the development of specific retroactive structures. For some classes of data structures (commutative and invertible data structures, and data structures for decomposable search problems), we present general transformations to make data structures efficiently retroactive. For other data structures, where the dependency between operations is stronger, efficient retroactivity requires the development of new techniques. In particular, we present a retroactive heap that achieves optimal bounds.

1.1 Comparison to persistence. The idea of retroactive data structures is related at a high level to the classic notion of persistent data structures, because both consider the notion of time, but otherwise they differ almost entirely. A persistent data structure maintains several versions of a data structure, and operations can be performed on one version to produce a new version. In its simplest form, modifications can only be made to the last version, thus creating a linear relationship among the versions. In full persistence [4], an operation can be performed on any past version to create a new version, thus creating a tree structure of versions. Confluently persistent structures [5] allow a new version to be created by merge-like operations on multiple existing structures; the versions then form a directed acyclic graph.
The data structuring techniques for persistence represent a substantial cost savings over the naïve method of maintaining separate copies of all versions. The key difference between persistent and retroactive data structures is that, in persistent data structures, each version is treated as an unchangeable archive. Each new version depends on the state of existing versions of the structure, but because existing versions are never changed, the dependence relationship between two versions never changes. The user can view a past state of the structure, but can change that past state only by forking off a new version from it. Thus, the persistence paradigm is useful for maintaining archival versions of a structure, but is inappropriate when changes must be made directly to the past state of the structure.

In contrast, the retroactive model we define allows changes to be made directly to previous versions. Because of the interdependence of the versions, such a change can radically affect the contents of all later versions. In effect, we sever the relationship between time as perceived by the data structure and time as perceived by the user of the data structure. Operations such as "insert(42)" now become "Insert at time 10 the operation insert(42)".

1.2 Motivation. In a real-world environment, large systems processing many transactions are commonplace. In such systems there are many situations where the need arises to alter the historical sequence of operations that were previously performed on the system. We now suggest several applications where a retroactive approach to data structures would help:

Simple Error. Data was entered wrong. The data should be corrected and all secondary effects of the data removed.

Security Breaches. Suppose some operations were discovered to have been maliciously performed by an unauthorized user. It is particularly important in the context of computer security not only to remove the malicious transactions, but also to act as if the malicious operations never occurred. For example, if the intruder modified the password file, not only should we restore that file, but we should also undo logins enabled by this modification.

Tainted Sources. In a situation where data enters a system from various automated devices, if one device is found to have malfunctioned, all of its transactions are invalid over a period of time and must be retroactively removed. For example, in a situation where many temperature readings from various weather stations are reported to a central computer, if one weather station's sensors are discovered to be intermittently malfunctioning, we wish to remove all of the sensor readings from the malfunctioning station because they are no longer reliable. Secondary effects of the readings, such as averages, historical highs and lows, and weather predictions, must be retroactively changed.

Disconnection. Continuing with the weather-station analogy of the previous paragraph, suppose the transmission system for one weather station is damaged, but later the data is recovered. We should then be able to retroactively enter the reports from the weather station, and see the effects of the new data on, for example, the current and past forecasts.
Online Protocols. In a standard client-server model, the server can be seen as holding a data structure, and clients send update or query operations. When the order of the requests is important (e.g., Internet auctions), the users can send a timestamp along with their requests. In all fairness, the server should execute the operations in the order they were sent. If a request is delayed by the network, it should be retroactively executed at the appropriate time in the past.

Settlements. In some cases it is mutually agreed upon by the parties involved to change some past transaction and all of its effects. We cite one example of such a settlement and describe how the traditional method of handling such settlements often fails in today's systems. Suppose you have two charge cards from one company. When a bill comes from one card, you pay the wrong account. Upon realizing the mistake, you call customer service and reach a settlement in which the payment is transferred into the correct account. Unfortunately, the next month, you are charged a late fee for the late payment of the bill. You call customer service again, and the late fee is removed as per the previous agreement. The next month, interest from the (now removed) late fee appears on the bill. Once again you must call customer service to fix the problem. This sequence of events is typical, and results from the system's inability to retroactively change the destination of the payment.

Intentional Manipulation of the Past. In Orwell's 1984 [9], the protagonist's job was to change past editions of a newspaper to reflect the desires of the government to control the past. Retroactively changing several documents while maintaining consistency has many applications, some more dangerous than others.

Dynamization. Some static algorithms or data structures are constructed by performing on some dynamic data structure a sequence of operations determined by the input. For example, the point-location data structure of Sarnak and Tarjan [12] consists of performing a sequence of insertions and deletions on a persistent binary search tree. Making such data structures retroactive would allow us to dynamically modify the input by retroactively modifying the operation sequence, thus making static algorithms or data structures dynamic.

1.3 Time is not an ordinary dimension. One may think that the problem of retroactive data structures can be solved by adding one more dimension to the structure under consideration. For example, in the case of a min-heap, it would seem at first glance that one could create a two-dimensional variant of a heap, and the problem would be solved. The idea is to assign the key values of the items in the heap to the y axis and use the x axis to represent time. In this representation, each item in the heap is represented by a horizontal line segment. The left endpoint of this segment represents when the item is inserted into the heap, and the right endpoint represents when the item is removed. If the only operations supported by the heap are insert() and delete-min(), then we have the additional property that there are no points below the right endpoint of any segment, because only minimal items are removed from the heap.

While this seems to be a clean two-dimensional representation of a heap throughout time, retroactively adding or removing an operation in the heap cannot simply be implemented by adding or removing a line segment. In fact, the endpoints of all the segments could be changed by inserting a single operation, as illustrated in Figure 1. Thus, while time can be drawn as a spatial dimension, that dimension is special in that complicated dependencies may exist, so that when small changes are made to some part of the diagram, changes may have to be made to the rest of the diagram. Thus, traditional geometric and high-dimensional data structures cannot be used directly to solve most retroactive data-structure problems. New techniques must be introduced to create retroactive data structures without explicitly storing every state of the structure, much like Chan's breakthrough dynamic convex hull algorithm [3], which does not explicitly store the hull.

[Figure 1: A single insertion of an insertion operation, Insert(t=0, "insert(0)"), into the history insert(1), insert(2), insert(3), insert(4) interleaved with four delete-min operations, in a retroactive heap data structure, can change the outcome of every delete-min operation and the lifespan of every element.]
1.4 Outline. The rest of the paper proceeds as follows. In Section 2, we further develop the model of retroactive data structures and explore the possible variants on the model. Next, Section 3 considers some basic problems about retroactivity, proving separations among the model variations and proving that automatic efficient retroactivity is impossible in general. In Section 4, we present two general transformations to construct an efficient retroactive version of a data structure, one for commutative invertible operations and one for any decomposable search problem. Finally, in Section 5, we discuss specific data structures for which we propose to create efficient retroactive structures. Table 1 in Section 5 gives a partial summary of the results obtained.

2 Definitions

In this section, we define the precise operations that we generally desire from a retroactive data structure.

2.1 Partial Retroactivity. Any data structural problem can be reformulated in the retroactive setting. In general, the data structure involves a sequence of updates and queries made over time. We define U = [u_{t_1}, u_{t_2}, ..., u_{t_m}], the list of updates performed on the data structure, where u_{t_i} is the operation performed at time t_i, and t_1 < t_2 < ... < t_m.

The data structure is partially retroactive if, in addition to supporting updates and queries on the "current state" of the data structure (present time), it supports insertion and deletion of updates at past times as well. In other words, there are two operations of interest:

1. Insert(t, u): insert into U a new update operation u at time t (or just before the operation u_t if it already exists).
2. Delete(t): delete the past update operation u_t from the sequence U of updates.

Thus, the retroactive versions of the standard insert(x) and delete(x) operations are Insert(t, "insert(x)"), Insert(t, "delete(x)"), and Delete(t), where t represents a moment in time. For example, if t_i < t < t_{i+1}, then Insert(t, "insert(x)") creates a new operation u_t = insert(x), which inserts a specified element x, and modifies history to suppose that operation u_t occurred between operations u_{t_i} and u_{t_{i+1}} in the past.

Informally, we are traveling back in time to a prior state of the data structure, introducing or preventing an update at that time, and then returning to the present time. All such retroactive changes on the operational history of the data structure potentially affect all existing operations between the time of the modification and the present time. Particularly interesting is the (common) case in which local changes propagate in effect to produce radically different perceived states of the data structure. The challenge is to realize these perceived differences extremely efficiently by implicit representations.
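To make the interface concrete, here is a minimal Python sketch of a generic (but deliberately naive) partially retroactive wrapper; the class and method names are ours, purely for illustration, and Python 3.10+ is assumed for the key= arguments of bisect.

    import bisect

    class NaiveRetroactive:
        """Naive partially retroactive wrapper: stores the update list U
        sorted by time and recomputes the present state by full replay
        after every change.  Efficient retroactivity is precisely about
        avoiding this O(m) replay."""

        def __init__(self, make_initial_state):
            self.make_initial_state = make_initial_state
            self.updates = []              # (time, op) pairs sorted by time
            self.state = make_initial_state()

        def insert_op(self, t, op):        # Insert(t, u)
            bisect.insort(self.updates, (t, op), key=lambda p: p[0])
            self._replay()

        def delete_op(self, t):            # Delete(t); assumes u_t exists
            i = bisect.bisect_left(self.updates, t, key=lambda p: p[0])
            del self.updates[i]
            self._replay()

        def query(self, q):                # query on the present state
            return q(self.state)

        def _replay(self):                 # O(m) re-execution of history
            self.state = self.make_initial_state()
            for _, op in self.updates:
                op(self.state)

    # Example: a retroactive set.
    s = NaiveRetroactive(set)
    s.insert_op(10, lambda st: st.add(42))   # Insert(10, "insert(42)")
    s.insert_op(5, lambda st: st.add(7))     # inserted further in the past
    assert s.query(sorted) == [7, 42]

An efficient retroactive structure must achieve the effect of _replay without actually re-executing the history; Sections 3 through 5 study when and how this is possible.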
2.2 Full Retroactivity. The definitions presented above capture only a partial notion of retroactivity: the ability to insert or delete update operations in the past, and to view the effects at the present time. Informally, we can travel back in time to modify the past, but we cannot directly observe the past. A data structure is fully retroactive if, in addition to allowing updates in the past, it can answer queries about the past. In some sense, this can be seen as making a partially retroactive version of a persistent version of the original structure. Thus, the standard search(x) operation, which finds an element x in the data structure, becomes Query(t, "search(x)"), which finds the element x in the state of the data structure at time t.

2.3 Running Times. When expressing the running times of retroactive data structures, we will use m for the total number of updates performed on the structure (retroactive or not), r for the number of updates before which the retroactive operation is to be performed (i.e., r = |{u_{t_i} in U : t_i >= t}|), and n for the maximum number of elements present in the structure at any single time. Most running times in this paper are expressed in terms of m, but in many cases it is possible to improve the data structures so that the running times of the operations are expressed in terms of n and r, so that retroactive operations performed at a time closer to the present are executed faster. These improvements are not detailed here due to space constraints.

2.4 Consistency. We assume that only valid retroactive operations are performed. For example, in a retroactive dictionary, a delete(x) operation must always appear after a corresponding insert(x) in the list U; and in a retroactive stack, the number of push operations must always be at least the number of pop operations in any prefix of U. The retroactive data structures we describe in this paper will not check the validity of retroactive updates, but it is often easy to create a data structure to verify the validity of a retroactive operation.
3 General Theory

The goal of this research is to design retroactive structures for abstract data types with performance similar to their non-retroactive counterparts. This section considers some of the most general problems about when this is possible. Unless stated otherwise, the results apply to the RAM model of computation (or the Real-RAM when real values are used), and sometimes to the pointer model [17] as well. Lower bounds will be given in the straight-line-program model [15] or in the cell-probe model [19].

3.1 Automatic Retroactivity. A natural question in this area is the following: is there a general technique for converting any data structure in (e.g.) the pointer-machine model into an efficient partially retroactive data structure? Such a general technique would nicely complement existing methods for making data structures persistent [4, 5]. As described in the introduction, retroactivity is fundamentally different from persistence, and known techniques do not apply here.

One simple approach to this general problem is the rollback method. Here we store as auxiliary information all changes to the data structure made by each operation, in such a way that every change can be reversed. For operations on the present, the data structure proceeds as normal, modulo some extra logging. When the user requests a retroactive change at a past time t, the data structure rolls back all changes made by operations after time t, then applies the retroactive change as if it were the present, and finally re-performs all operations after time t. Notice that these re-performances may act differently from how the operations were performed before, depending on the retroactive change. Because the changes made to the data structure are bounded by the time taken by the operations, a straightforward analysis proves the following theorem:

THEOREM 3.1. Given any data structure that performs a collection of operations each in T(n) worst-case time, there is a corresponding retroactive data structure that supports the same operations in O(T(n)) time, and supports retroactive versions of those operations in O(r T(n)) time.

The rollback method is widely used in database management systems (see e.g. [10]) and robust file systems for concurrency control and crash recovery. It has also been studied in the data structures literature under the name of unlimited undo or backtracking [8, 18].
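The rollback method can be sketched as follows; this is our own illustration, not code from the paper. Each operation, when applied, returns an undo closure, and a retroactive change at time t undoes and re-performs the r later operations. (The linear scans to locate time t could be O(log m) with a balanced tree over the timeline.)

    class RollbackRetroactive:
        """Sketch of the rollback method behind Theorem 3.1.  Applying an
        operation mutates the state and returns an 'undo' closure; a
        retroactive change then costs one rollback plus one replay of the
        r later operations."""

        def __init__(self, state):
            self.state = state
            self.log = []                    # (time, op, undo), time-sorted

        def _rollback_from(self, i):         # undo operations i..end
            for _, _, undo in reversed(self.log[i:]):
                undo(self.state)

        def _replay_from(self, i):           # redo them, logging new undos
            for j in range(i, len(self.log)):
                t, op, _ = self.log[j]
                self.log[j] = (t, op, op(self.state))

        def insert_op(self, t, op):          # Insert(t, u)
            i = next((j for j, e in enumerate(self.log) if e[0] > t),
                     len(self.log))
            self._rollback_from(i)
            self.log.insert(i, (t, op, None))
            self._replay_from(i)

        def delete_op(self, t):              # Delete(t)
            i = next(j for j, e in enumerate(self.log) if e[0] == t)
            self._rollback_from(i)
            del self.log[i]
            self._replay_from(i)

    # An operation mutates the state and returns its inverse.
    def add(c):
        def op(state):
            state["x"] += c
            return lambda st: st.__setitem__("x", st["x"] - c)
        return op

    d = RollbackRetroactive({"x": 0})
    d.insert_op(1, add(5))
    d.insert_op(2, add(3))
    d.insert_op(0, add(10))      # retroactive insertion at the beginning
    assert d.state["x"] == 18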
Of course, this result, and its extension to operations with nonuniform costs, is far too inefficient for applications where r can be as large as m, the total number of operations performed on the data structure. The goal would be to reduce the dependence on r in the running time of retroactive operations. We show that this is not possible in the straight-line-program model of computation:

THEOREM 3.2. There exists a data structure in the straight-line-program model, supporting O(1)-time update operations, for which the (partially) retroactive insertion of those operations requires Ω(r) time, worst case or amortized.

Proof: The data structure maintains two values X and Y, initially 0, and supports the operations addX(c) and addY(c), which add the value c to X or to Y respectively, and mulXY(), which multiplies Y by X and stores the resulting value in Y. Queries return the value of Y. Consider the sequence of operations: addY(a_n), mulXY(), addY(a_{n-1}), mulXY(), ..., mulXY(), addY(a_0). We then retroactively insert the operation "addX(x)" at the very beginning of the sequence. The value of Y is now a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0, which is a polynomial of degree n in x with arbitrary coefficients. By Motzkin's theorem [15], the computation of that polynomial for a given value of x requires Ω(n) multiplications, regardless of how much preprocessing time or space is used. Thus the retroactive insertion of the addX(x) operation requires that many multiplications. Because the retroactive modification can be repeated an arbitrary number of times, and each modification has the same worst-case lower bound, the lower bound also applies to amortized data structures.

The straight-line-program model counts only the number of arithmetic operations performed by a program. Our lower bound thus holds in more general models of computation, such as the Real-RAM model or the algebraic computation-tree model.

3.2 From Partial to Full Retroactivity. A natural general question about the two versions of retroactivity is whether partial retroactivity is indeed easier to support than full retroactivity. In other words, is it easier to answer queries only about the present? We first give a partial answer:

THEOREM 3.3. In the cell-probe model, there exists a data structure supporting partially retroactive updates in O(1) time, but for which fully retroactive queries of the past require Ω(log m / log log m) time.

Proof: The data structure is for the following problem: maintain a set of numbers subject to the update insert(x), which adds the number x to the set, and the query sum(), which reports the sum of all of the numbers. For this problem, the only retroactive update operations are Insert(t, "insert(x)") and Delete(t), whose effects on queries about the present are simply to add or subtract a number from the current aggregate. Thus, a simple data structure supports partially retroactive updates in O(1) time per operation. In contrast, to support queries at arbitrary times, we need to remember the order of the update operations and support arbitrary prefix sums. Thus, we obtain a lower bound of Ω(log m / log log m) in the cell-probe model by a reduction from the dynamic prefix-sum problem [7].
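The O(1) structure from this proof is simple enough to write out; here is one possible rendering in Python (names ours).

    class RetroSum:
        """The O(1) partially retroactive structure from the proof of
        Theorem 3.3: since addition is commutative and invertible, a
        retroactive insert or delete just adjusts the present aggregate."""

        def __init__(self):
            self.total = 0
            self.ops = {}              # time -> inserted value

        def insert_op(self, t, x):     # Insert(t, "insert(x)") in O(1)
            self.ops[t] = x
            self.total += x

        def delete_op(self, t):        # Delete(t) in O(1)
            self.total -= self.ops.pop(t)

        def sum_now(self):             # query about the present
            return self.total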
On the other hand, we can show that it is always possible, at some cost, to convert a partially retroactive data structure into a fully retroactive one:

THEOREM 3.4. Any partially retroactive data structure in the pointer-machine model with constant indegree, supporting T(m)-time retroactive updates and Q(m)-time queries about the present, can be transformed into a fully retroactive data structure with amortized O(√m T(m))-time retroactive updates and O(√m T(m) + Q(m))-time fully retroactive queries, using O(m T(m)) space.

Proof: We define checkpoints such that at most O(√m) updates occur between consecutive checkpoints, and maintain √m versions S_i of the partially retroactive data structure, where the structure S_i only contains the updates that occurred before the time of the i-th checkpoint. When a retroactive update is performed at time t, we perform the update on all structures S_i whose checkpoint time is after t. When a retroactive query is made at time t, we find the largest i whose checkpoint time is at most t, perform on S_i all updates that occurred between that checkpoint and time t, and finally perform the query on the resulting structure.

In order to reduce space, we use persistent data structures [4]. Given the sequence of operations, we perform the sequence on a fully persistent version of the partially retroactive data structure, and keep a pointer to the version obtained after the first i√m operations, for each i. The retroactive updates branch off a new version of the data structure for each of the modified S_i. After √m retroactive updates have been performed, we rebuild the entire structure in O(m T(m)) time. This ensures that the number of updates between any two consecutive checkpoints is always O(√m). The resulting data structure has the claimed running times. The fully persistent version of the partially retroactive data structure after a rebuild uses at most O(m T(m)) space, because it can use at most one unit of space for each computational step. The data structure performs at most √m retroactive updates between two rebuilds, each using at most O(√m T(m)) time and extra space, and so the space used by the fully retroactive data structure never exceeds O(m T(m)).
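The checkpointing construction can be sketched as follows, with deep copies standing in for the persistent versions of [4] (so only the time bounds, not the space bound, are mirrored). The interface insert_op/delete_op matches the RetroSum sketch above and is our own convention; only retroactive insertions and deletions of "insert(x)" updates are modeled.

    import copy, math

    class FullFromPartial:
        """Sketch of Theorem 3.4: full retroactivity on top of a
        partially retroactive structure, via ~sqrt(m) checkpointed
        versions and periodic rebuilds."""

        def __init__(self, make_partial):
            self.make_partial = make_partial
            self.updates = []                    # (t, x), sorted by time
            self._rebuild()

        def _rebuild(self):
            self.block = max(1, math.isqrt(len(self.updates)))
            s = self.make_partial()
            self.checkpoints = [(-math.inf, copy.deepcopy(s))]
            for i, (t, x) in enumerate(self.updates, 1):
                s.insert_op(t, x)
                if i % self.block == 0:
                    self.checkpoints.append((t, copy.deepcopy(s)))
            self.pending = 0

        def _touch(self):                        # amortized rebuild
            self.pending += 1
            if self.pending >= self.block:
                self._rebuild()

        def retro_insert(self, t, x):            # Insert(t, "insert(x)")
            self.updates.append((t, x))
            self.updates.sort()
            for ct, s in self.checkpoints:
                if ct >= t:                      # O(sqrt m) versions touched
                    s.insert_op(t, x)
            self._touch()

        def retro_delete(self, t):               # Delete(t)
            self.updates = [u for u in self.updates if u[0] != t]
            for ct, s in self.checkpoints:
                if ct >= t:
                    s.delete_op(t)
            self._touch()

        def retro_query(self, t, q):             # Query(t, q)
            ct, base = max((c for c in self.checkpoints if c[0] <= t),
                           key=lambda c: c[0])
            s = copy.deepcopy(base)              # replay O(sqrt m) updates
            for ut, x in self.updates:
                if ct < ut <= t:
                    s.insert_op(ut, x)
            return q(s)

    f = FullFromPartial(RetroSum)
    f.retro_insert(1, 5)
    f.retro_insert(3, 7)
    assert f.retro_query(2, lambda s: s.sum_now()) == 5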
4 Transformable Structures

In this section, we present some general transformations to make data structures partially or fully retroactive, for several easy classes of problems.

4.1 Commutative Operations. To highlight the difficult case of nonlocal effects, we define the notion of commutative operations. A set of operation types is commutative if the state of the data structure resulting from a sequence of operations is independent of the order of those operations. If a data structure has a commutative set of operations, performing an operation at any point in the past has the same effect as performing it in the present, so we have:

LEMMA 4.1. Any data structure supporting a commutative set of operations allows the retroactive insertion of operations in the past (and queries in the present) at no additional asymptotic cost.

We say that a set of operations is invertible if, for every operation u, there is another operation u' that negates the effects of u, that is, the sequence of operations u u' does not change the state of the data structure.

LEMMA 4.2. Any data structure supporting a commutative and invertible set of operations can be made partially retroactive at no additional asymptotic cost.

For example, a data structure for searchable dynamic partial sums [11] maintains an array A[1..n] of values, where sum(i) returns the sum of the first i elements of the array, search(j) returns the smallest i such that sum(i) >= j, and update(i, c) adds the value c to A[i]. The state of the data structure at the present time is clearly independent of the order of the update operations, so the set of operations is commutative. Any operation update(i, c) is negated by the operation update(i, -c), so the updates are also invertible, and so any data structure for searchable dynamic partial sums is automatically partially retroactive.

An important class of commutative data structures are those for searching problems. The goal is to maintain a set S of objects under insertion and deletion operations, so that we can efficiently answer queries that ask some relation of a new object x with the set S. Because a set is by definition unordered, the set of operations for a searching problem is commutative, given that the deletion of an object is always performed after its insertion. As long as the retroactive updates do not violate this consistency condition, we have:

LEMMA 4.3. Any data structure for a searching problem can be made partially retroactive at no additional asymptotic cost.

For example, dictionary structures, but also dynamic convex hull or planar width data structures, can be stated as searching problems and are thus automatically partially retroactive. Note that these results can also be combined with Theorem 3.4 to obtain fully retroactive data structures.
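As an illustration of Lemma 4.2, here is a partially retroactive searchable-partial-sums structure; a plain array replaces the efficient structures of [11], making each query O(n) in this sketch, but the point is that the retroactive operations need no machinery beyond applying the update or its inverse.

    class RetroPartialSums:
        """Partially retroactive searchable partial sums (Lemma 4.2):
        update(i, c) is commutative and is inverted by update(i, -c), so
        a retroactive Insert/Delete of an update is an ordinary update."""

        def __init__(self, n):
            self.A = [0] * (n + 1)        # 1-indexed array A[1..n]
            self.history = {}             # time -> (i, c)

        def retro_insert(self, t, i, c):  # Insert(t, "update(i, c)")
            self.history[t] = (i, c)
            self.A[i] += c

        def retro_delete(self, t):        # Delete(t): apply the inverse
            i, c = self.history.pop(t)
            self.A[i] -= c

        def sum(self, i):                 # sum of A[1..i]
            return sum(self.A[1:i + 1])

        def search(self, j):              # smallest i with sum(i) >= j
            return next((i for i in range(1, len(self.A))
                         if self.sum(i) >= j), None)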
4.2 Decomposable Searching Problems. A searching problem maintains a set S of objects subject to queries Q(x, S) that ask some relation of a new object x with the set S. We already saw in Lemma 4.3 that data structures for searching problems are automatically partially retroactive. A searching problem is decomposable if there is a binary operator □ (computable in constant time) such that Q(x, A ∪ B) = Q(x, A) □ Q(x, B). Decomposable searching problems have been studied extensively by Bentley and Saxe [2]. In particular, they show how to transform a static data structure for such a problem into an efficient dynamic one. In this section, we show that data structures for decomposable searching problems can also be made fully retroactive.

THEOREM 4.1. Any data structure for a decomposable searching problem supporting insertions, deletions, and queries in time T(n) and space S(n) can be transformed into a fully retroactive data structure with all operations taking O(T(m)) time if T(m) = Ω(m^ε) for some ε > 0, or O(T(m) log m) time otherwise. The space used is O(S(m) log m).

Proof: Every element that was ever inserted in the data structure can be represented by a segment on the time line, between its insertion time and its deletion time (or the present time if it has not been deleted). We maintain a segment tree [1], which is a balanced binary tree where the leaves correspond to the elementary intervals between consecutive endpoints of the segments, and internal nodes correspond to the union of the intervals of their children. Each segment is thus represented as the union of O(log m) intervals, each corresponding to one node of the tree, and each node of the tree contains the set of segments it represents. For each node, we maintain that set in a data structure supporting the desired queries. Each retroactive update affects at most O(log m) of those data structures. Given a point t on the timeline, the set of segments containing that point can be expressed as the union of the sets of O(log m) nodes. For a retroactive query Query(t, x), we query x in each of these O(log m) sets and compose the global result using the □ operator. If T(m) = Ω(m^ε), then the query and update times for a retroactive operation form a geometric progression and the total time is O(T(m)); otherwise, the total time is O(T(m) log m).

For example, dictionaries, dynamic point location, and nearest-neighbor query data structures solve decomposable searching problems and thus can be made fully retroactive.

Of course, in many cases it is possible to improve the fully retroactive data structures obtained through the application of Theorem 4.1. For example, any comparison-based dictionary where only exact search queries are performed can be made fully retroactive by storing with each key the times at which it was present in the structure. The resulting data structure uses O(m) space and all operations can be performed in O(log m) time, a Θ(log m)-factor improvement in both time and space over the straightforward application of Theorem 4.1.

In other cases, though, improving upon the structures obtained from Theorem 4.1 seems rather difficult, as is the case for example with the dictionary problem allowing predecessor and successor queries. Indeed, we can view it as a geometric problem, in which we maintain a set of horizontal line segments: the y coordinate of each line segment is the element's key, and the x extent of the line segment is the element's lifetime. A faster retroactive data structure would immediately result in a faster data structure for dynamic planar point location for orthogonal regions, which may also play a role in general dynamic planar point location. In fact, this retroactive approach is hinted at as a research direction for dynamic planar point location by Snoeyink [14, p. 566].
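A compact sketch of the improved fully retroactive exact-match dictionary described above: each key stores the sorted list of its presence intervals. For brevity, each element's lifetime interval is passed in directly rather than derived from separate insert/delete update operations, and Python's bisect on sorted lists stands in for the balanced trees that would give the O(log m) bounds.

    import bisect
    from collections import defaultdict

    class RetroDict:
        """Fully retroactive dictionary with exact search only: for each
        key we keep its disjoint presence intervals [t_ins, t_del),
        sorted by start; Query(t, "search(k)") is a binary search."""

        def __init__(self):
            self.intervals = defaultdict(list)   # key -> sorted intervals

        def insert_op(self, t_ins, key, t_del=float("inf")):
            # The element 'key' is present during [t_ins, t_del).
            bisect.insort(self.intervals[key], (t_ins, t_del))

        def delete_op(self, key, t_ins):
            # Remove the insertion at time t_ins (and its interval).
            i = bisect.bisect_left(self.intervals[key], (t_ins,))
            del self.intervals[key][i]

        def search(self, t, key):                # Query(t, "search(key)")
            ivs = self.intervals[key]
            i = bisect.bisect_right(ivs, (t, float("inf"))) - 1
            return i >= 0 and ivs[i][0] <= t < ivs[i][1]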
5 Maintaining the Timeline

We showed in Theorem 3.2 in Section 3.1 that no general technique can turn every data structure into an efficient retroactive counterpart. This suggests that, in order to obtain efficient data structures, we need to study special cases separately. In this section, we show how to construct retroactive data structures by maintaining a structure on top of the sequence U of update operations (the timeline). Table 1 gives a partial summary of our results.

In the following, we assume that the sequence U is maintained in a doubly linked list, and that when a retroactive operation is performed at time t, a pointer to the operation following time t in U is provided (such a pointer could, for example, have been stored during a previous operation). In the case where the pointer is not provided, it can easily be found in O(log m) time by maintaining a binary search tree, indexed by time, on top of U.

Table 1: Running times for retroactive versions of a few common data structures; m is the number of operations.

    Data Structure           Partially Retroactive   Fully Retroactive
    Dictionary (exact)       O(1)                    O(log m)
    Dictionary (successor)   O(log m)                O(log^2 m)
    Queue                    O(1)                    O(log m)
    Stack                    O(log m)                O(log m)
    DEQUE                    O(log m)                O(log m)
    Union/Find               O(log m)                O(log m)
    Priority Queue           O(log m)                O(√m log m)

5.1 Queues. A queue supports two update operations, enqueue(x) and dequeue(), and two query operations: front(), which returns the next element to be dequeued, and back(), which returns the last element enqueued. Here we describe two data structures, one partially retroactive and one fully retroactive, that thus support the update operations Insert(t, "enqueue(x)"), Insert(t, "dequeue()"), and Delete(t), and the queries Query(t, "front()") and Query(t, "back()"). The partially retroactive data structure only allows queries at the present time.

LEMMA 5.1. There exists a partially retroactive queue data structure with all retroactive updates and present-time queries taking O(1) time.

Proof: The data structure maintains the enqueue operations in a doubly linked list, and two pointers: B points to the last enqueued element in the sequence, and F to the next element to be dequeued. When an enqueue operation is retroactively inserted, if it occurs before the operation pointed to by F, we move that pointer to its predecessor. When an enqueue operation is removed, if it occurs before the operation pointed to by F, we move that pointer to its successor. When a dequeue, retroactive or not, is performed, we move the front pointer F to its successor, and when a dequeue is removed, we move the front pointer to its predecessor. The pointer B is only updated when we add an enqueue operation at the end of the list. The front() and back() operations return the items pointed to by F and B, respectively.
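A runnable rendering of this proof (names ours). The splice position is found by an O(m) scan for simplicity, where the paper assumes a pointer into U is given; the pointer updates themselves are O(1), and histories are assumed valid (e.g., no dequeue on an empty queue).

    class _Node:
        __slots__ = ("val", "t", "prev", "nxt")
        def __init__(self, val, t):
            self.val, self.t = val, t
            self.prev = self.nxt = None

    class RetroQueue:
        """Partially retroactive queue (Lemma 5.1): a doubly linked list
        of enqueues in time order, with F = next element to dequeue and
        B = last enqueue.  F is None when the queue is currently empty."""

        def __init__(self):
            self.head = self.B = self.F = None

        def insert_enqueue(self, t, x):       # Insert(t, "enqueue(x)")
            n, cur = _Node(x, t), self.head
            while cur is not None and cur.t < t:
                cur = cur.nxt                 # stand-in for given pointer
            n.nxt = cur
            n.prev = cur.prev if cur else self.B
            if n.prev: n.prev.nxt = n
            else: self.head = n
            if cur: cur.prev = n
            else: self.B = n                  # appended at the back
            if self.F is None: self.F = self.B
            elif t < self.F.t: self.F = self.F.prev

        def delete_enqueue(self, t):          # Delete(t) of an enqueue
            n = self.head
            while n.t != t:
                n = n.nxt
            if self.F is not None and n.t <= self.F.t:
                self.F = self.F.nxt
            if n.prev: n.prev.nxt = n.nxt
            else: self.head = n.nxt
            if n.nxt: n.nxt.prev = n.prev
            else: self.B = n.prev

        def insert_dequeue(self, t=None):     # Insert(t, "dequeue()"):
            self.F = self.F.nxt               # only the count of dequeues
                                              # matters, not their times
        def delete_dequeue(self, t=None):     # Delete(t) of a dequeue
            self.F = self.F.prev if self.F is not None else self.B

        def front(self):
            return self.F.val if self.F else None

        def back(self):                       # queue empty iff F is None
            return self.B.val if self.F else None

    q = RetroQueue()
    q.insert_enqueue(1, "a"); q.insert_enqueue(3, "b")
    q.insert_dequeue(4)                 # dequeues "a"
    q.insert_enqueue(2, "c")            # retroactive: "a" still dequeued
    assert q.front() == "c" and q.back() == "b"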
LEMMA 5.2. There exists a fully retroactive queue data structure with all retroactive operations taking O(log m) time and present-time operations taking O(1) time.

Proof: We maintain two order-statistic trees T_e and T_d. The tree T_e stores the enqueue operations, sorted by time, and T_d stores the dequeue operations, sorted by time. The update operations can then be implemented directly in O(log m) time, where m is the size of the operation sequence currently stored.

The Query(t, "front()") operation is implemented by querying T_d to determine the number d of dequeue operations performed at or before time t. The operation then returns the item in T_e with time rank d + 1. The Query(t, "back()") operation uses T_e to determine the number e of enqueue operations that were performed at or before time t, and simply returns the item in T_e with time rank e. Thus, both queries can be executed in O(log m) time.

Using balanced search trees supporting updates in worst-case constant time [6], and by maintaining pointers into the trees to the current front and back of the queue, updates and queries at the current time can be supported in O(1) time. Thus queues may be made efficiently fully retroactive without changing their asymptotic run-times.
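The same construction in executable form; time-sorted Python lists with bisect play the role of the two order-statistic trees (making updates O(m) in this sketch rather than O(log m)), but the rank arithmetic of the queries is exactly that of the proof.

    import bisect

    class FullRetroQueue:
        """Fully retroactive queue (Lemma 5.2): enqueues (with values)
        and dequeues in two time-sorted sequences; queries reduce to rank
        computations, assuming a valid history."""

        def __init__(self):
            self.enq = []                  # (time, value), sorted by time
            self.deq = []                  # times of dequeues, sorted

        def insert_enqueue(self, t, x):
            bisect.insort(self.enq, (t, x))

        def insert_dequeue(self, t):
            bisect.insort(self.deq, t)

        def delete_enqueue(self, t):
            del self.enq[bisect.bisect_left(self.enq, (t,))]

        def delete_dequeue(self, t):
            del self.deq[bisect.bisect_left(self.deq, t)]

        def front(self, t):                # Query(t, "front()")
            d = bisect.bisect_right(self.deq, t)   # dequeues at/before t
            return self.enq[d][1]          # item with time rank d + 1

        def back(self, t):                 # Query(t, "back()")
            e = bisect.bisect_right(self.enq, (t, float("inf")))
            return self.enq[e - 1][1]      # item with time rank e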
5.2 Doubly Ended Queues. A doubly ended queue (deque) maintains a list of elements and supports four update operations: pushL(x) and popL(), which insert or delete an element at the left endpoint of the list, and pushR(x) and popR(), which insert or delete an element at the right endpoint of the list, as well as two query operations, left() and right(), which return the leftmost or rightmost element in the list. The deque generalizes both the queue and the stack.

THEOREM 5.1. There exists a fully retroactive deque data structure with all retroactive operations taking O(log m) time and present-time operations taking O(1) time.

Proof: In a standard implementation of a deque in an array A, we initialize variables L = 1 and R = 0. Then a pushR(x) operation increments R and places x in A[R], popR() decrements R, pushL(x) decrements L and places x in A[L], and popL() increments L. The operation left() returns A[L] and right() returns A[R].

In our retroactive implementation of a deque, we also maintain L and R: if we maintain all pushR and popR operations in a linked list U_R, and associate a weight of +1 with each pushR operation and a weight of -1 with each popR operation, then the value of R at time t can be calculated as the weighted sum of the prefix of the list up to time t. The same can be done for L, maintaining the list U_L and reversing the weights. The values of the sums for all prefixes of U_R can be maintained in the modified (a,b)-tree of [6], with the elements of the list as leaves. Every node of the tree contains a value v, and the sum of the values along the path from the root to a leaf computes the sum of the prefix of U_R up to that element. After inserting an element with weight w into the list and into the tree, we set the value in its leaf to w, walk along the path to the root, and add w to the value v of all right siblings along the path. Deletions are processed symmetrically.

Finally, we have to describe how to extract A[R] from the data structure, where R is the value of R at time t. For this, we augment each node of the tree with two values containing the minimum and maximum prefix-sum values over all the leaves in its subtree. Note that these values can also be maintained after insertions and deletions, by adding w to them whenever w is added to the value of the same node, and by updating them when an insertion occurs in their subtree. To find the contents of A[R] at time t, we find the last time at which R had the value R(t). This can be done by finding the last operation in U_R before time t, walking up the tree, and walking back down into the rightmost subtree for which R(t) lies between the minimum and maximum values. The same is done for A[L].
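The prefix-sum view of this proof can be checked with a short function: R(t) is the weighted prefix sum over U_R, and right() at time t is the element written by the last pushR at or before t whose resulting prefix sum equals R(t). Only the pushR/popR list is modeled here (a deque whose right half is empty would fall back to the pushL list), and everything is computed by an O(m) scan rather than the O(log m) tree search.

    def deque_right(ops, t):
        """Naive evaluation of right() at time t.  ops is a time-sorted
        list of (time, "pushR", x) or (time, "popR", None) events."""
        hist, R = [], 0
        for (ot, op, x) in ops:
            if ot > t:
                break
            R += 1 if op == "pushR" else -1
            hist.append((R, op, x))
        for (r, op, x) in reversed(hist):  # last write to array cell R(t)
            if op == "pushR" and r == R:
                return x
        return None                        # right half empty at time t

    U = [(1, "pushR", "a"), (2, "pushR", "b"), (3, "popR", None),
         (4, "pushR", "c")]
    assert deque_right(U, 3) == "a" and deque_right(U, 4) == "c"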
We next study the effect of adding an operation "delete-min " at time in the past. In that case, one element will have to be removed from . Again, this operation will have a cascading effect: if it is not in , the key that will be deleted by operation was supposed to be deleted by the operation at time , but as is being deleted at time by , the operation will delete the next key up, and so on. See figure 4. L E M M A 5 . 5 . After an operation Insert the element to be removed from is "delete-min " , element to be inserted in is Proof: As discussed above, the retroactive insertion will cause several elements to extend the time they are present in the structure. Consider the chain of keys whose life in the structure is extended. After the retroactive update, the extended pieces of horizontal segments are from to , from to for , and finally from to . They form a nondecreasing step function which, by construction, is not properly intersected by any of the (updated) vertical rays. The key that will be added to at the end of the retroactive update is . Suppose there is a key larger than in . This implies that is above every segment of the step function. But then, the vertical ray from that point intersects the step function, which is a contradiction. In the particular case where is never deleted, the step function is just one horizontal segment and the same argument holds. Note that removing a delete-min operation has the same effect as re-inserting the element that was being deleted at the time of the deletion. So we have: where is the first bridge after time . Proof: Consider the chain of keys whose life in the structure is shortened, with and . After the retroactive update, the shortened pieces of horizontal segments are from to , from to for , and finally from to . First, it must be clear that there is a bridge at because there is no key smaller than in , and all keys larger than in are also in because . So we just have to show that there is no bridge between times 289 ÂÍ y i By 42 y i ͳ ~y i P P D6iqG 3 qi  ª  ª G "P6i ª G $Á !P# "!6iI"H2qG Ð Ps6iI1 ª HG PwiY'W1 ª G 5 5 © 3 y ³i i ¬ }7 i i }7 ¿ Ä l!³ "3 i7 @ t7 2 5 i i A2 i º ¾ vÏn»· ª ÎÉ P PG IH2G hy i 0 2 ª i hy hy 2 PG © t Ãi Ãi m à i i B º ¬« i y Æ y ªÌÍ y ³ Di 27 Ū iÅ $2 2 ¬ Ëà i 7 u ½2 ¼ i wA¾ )w» º ª ¸ É ¾È ¸ i Ê ³ Q º w½ w» º ª ¹· © à i ¹z· Ãi ¾ ½ ¹¸w ¼ ÉÈ i Ê %Q ¹º wz½¸¾ ·w» º ª T i w)z» ·º ª © º ¬ ¬ © º ¬ « ³ qi y «Æ y º By y ³ i 2 42 i i ¬ ¬ ¾w ¼ ÊÉ È i u%Q ¹º w¸¾z½ ·w» º ª © i w½)¸¹z» ·º ª $2 L E M M A 5 . 4 . Let be the last bridge before , then Let , and suppose ytÆÇhy 2 2 $PH2G 2 pPG ¾ ½ ¹¸w ¼ i w)H»·Iºª 2 ¬ y P P 3 ¿ 6i6iDYH vÀG 1YG P ¿ iP n3 6ÂiIª H1G HG ª ª 7 5i 7 r i y m Ū à i hy ÃP i RHG PG P PG ¼ w¸ P i ¾w)¸¹½ H»·ºIª 0iG ¹z· By P DiG P P ª q © qi6#HG j6ih2I HAÁG # !"!w¤z P ¿l!"r~wi i 7 7@ I'2G I'2G ¿i i ¬ Ä ¿ i y and . For this we observe that the shortened segments at times form a step function, and that none of the keys corresponding to the steps are in , but they are in . Because removing an "insert " operation from time has the same effect as adding a "delete-min " operation directly after it, we also have: C O RO L L A RY 5 . 2 . After an operation Delete where the operation at time is "insert ", the element to be removed from is if ; otherwise, it is the where is the first bridge after time . Again, because we don't maintain explicitly for all , we will ease the computation by using the fact that if is a bridge, then T H E O R E M 5 . 3 . 
5.4 Priority Queues. More sophisticated than queues, stacks, and deques is the priority queue, which supports the operations insert(k), which inserts an element with key value k; delete-min(), which deletes the element with smallest key; and the query find-min(), which reports the current minimum key element. The delete-min operation is particularly interesting here because of its dependence on all operations in the past: which element gets deleted depends on the set of elements present when the operation is executed. More precisely, it is delete-min that makes the set of operations non-commutative. Priority queues seem substantially more challenging than queues and stacks. Figure 1 shows an example of the major nonlocal effects caused by a minor modification of the past in a priority queue: the lifetime of every element changes because of a single Insert(t, "insert(k)") operation. Such cascading effects need to be succinctly represented in order to avoid the cost inherent in any explicit maintenance of element lifetimes.

Without loss of generality, we assume that all key values inserted in the structure are distinct. For a key k, d(k) will be used to denote its deletion time. We write Q_t for the set of elements contained in the priority queue at time t, and so Q_now is the set of elements in the queue at the present time. Let I_{>=t} be the set of keys inserted at or after time t, I_{<=t} the set of keys inserted at or before time t, and D_{>=t} the set of keys deleted at or after time t.

In order to construct a retroactive priority queue, we need to learn more about the structure of the problem. For this, we represent a sequence of updates by a planar figure in which the x axis represents time and the y axis represents key values. In this representation, each item in the heap is represented by a horizontal line segment: the left endpoint of the segment represents when the item is inserted into the heap, and the right endpoint represents when the item is removed. Similarly, a delete-min operation is represented by a vertical ray shooting up from below and stopping at the intersection with the horizontal segment representing the element it deletes. Thus, an insert operation paired with its corresponding delete-min is represented by an upside-down "L" shape, and no two "L"s intersect, while elements still in the structure at the present time (i.e., in Q_now) are represented by horizontal rays. See Figure 2.

[Figure 2: The "L" representation of a sequence of operations.]

One obvious invariant of a priority queue is that the number of elements present in the queue is always equal to the number of insertions minus the number of delete-min operations. Thus, when we add an operation "insert(k)" at time t in the past, one element will have to be added to Q_now. There are two possibilities: if the element k is not deleted between time t and the present time, k can just be added to Q_now. Otherwise, the element is deleted by some operation d = "delete-min()", but then the element that was supposed to be deleted by d stays in the structure a little longer, until it is deleted by some other delete-min operation, and so on. So the insertion of the operation causes a cascade of changes, which is depicted in Figure 3.

[Figure 3: The Insert(t, "insert(k)") operation causes a cascade of changes of deletion times, and one insertion into Q_now.]

LEMMA 5.3. After an operation Insert(t, "insert(k)"), the element to be inserted into Q_now is max({k} ∪ D_{>=t}).

Proof: As discussed above, the retroactive insertion will cause several elements to extend the time they are present in the structure. Consider the chain of keys k = k_0 < k_1 < ... < k_j whose lives in the structure are extended. After the retroactive update, the extended pieces of horizontal segments are from t to d(k_0) at height k_0, from d(k_{i-1}) to d(k_i) at height k_i for 0 < i < j, and finally from d(k_{j-1}) to the present time at height k_j. They form a nondecreasing step function which, by construction, is not properly intersected by any of the (updated) vertical rays. The key that will be added to Q_now at the end of the retroactive update is k_j. Suppose there is a key k' ∈ D_{>=t} larger than k_j. This implies that the point (d(k'), k') is above every segment of the step function. But then, the vertical ray from that point intersects the step function, which is a contradiction. In the particular case where k is never deleted, the step function is just one horizontal segment and the same argument holds.

Note that removing a delete-min operation has the same effect as re-inserting, at the time of the deletion, the element that was being deleted. So we have:

COROLLARY 5.1. After an operation Delete(t), where the operation at time t is "delete-min()", the element to be inserted into Q_now is max(D_{>=t}).

Because Q_t can change for many values of t each time an operation is performed, it would be quite difficult to maintain Q_t explicitly. The next lemma will allow us to avoid this task. We say that there is a bridge at time t if Q_t ⊆ Q_now. Bridges are displayed as dotted vertical lines in Figure 2.

LEMMA 5.4. Let t' be the last bridge before t. Then max(D_{>=t}) = max{k' ∈ I_{>=t'} : k' ∉ Q_now}.

Proof: By definition, any key in D_{>=t} is not in Q_now. If such a key had been inserted before time t', then it would be in Q_{t'}, but this would contradict the fact that t' is a bridge (Q_{t'} ⊆ Q_now), and so it was inserted after t'. This shows that D_{>=t} ⊆ {k' ∈ I_{>=t'} : k' ∉ Q_now}, and so max(D_{>=t}) <= max{k' ∈ I_{>=t'} : k' ∉ Q_now}. Conversely, let k* = max{k' ∈ I_{>=t'} : k' ∉ Q_now}, and suppose k* is deleted at a time t_d with t' < t_d < t. Because t' was the last bridge before time t, t_d cannot be a bridge, and so there is another key k'' ∈ Q_{t_d} with k'' ∉ Q_now; we must have k'' > k*, for otherwise k'' would be deleted instead of k* at time t_d. But k'' was inserted after t' (otherwise it would be in Q_{t'} ⊆ Q_now), which contradicts the fact that k* was maximum. Hence k* is deleted at or after time t, that is, k* ∈ D_{>=t}.

We next study the effect of adding an operation "delete-min()" at time t in the past. In that case, one element will have to be removed from Q_now. Again, this operation has a cascading effect: the key k_1 deleted by the new operation was supposed to be deleted by the operation d_1 at time d(k_1); but as k_1 is already gone by then, the operation d_1 will delete the next key up, and so on. See Figure 4.

[Figure 4: The Insert(t, "delete-min()") operation causes a cascade of changes of deletion times, and one deletion from Q_now.]

LEMMA 5.5. After an operation Insert(t, "delete-min()"), the element to be removed from Q_now is min(Q_{t'}), where t' is the first bridge after time t.

Proof: Consider the chain of keys k_1 < k_2 < ... < k_j whose lives in the structure are shortened: the new delete-min at time t deletes k_1 = min(Q_t); the operation that previously deleted k_1 at time d(k_1) now deletes k_2; and so on, until k_j, which is removed from Q_now. The shortened pieces of the horizontal segments run from t to d(k_1) at height k_1, from d(k_{i-1}) to d(k_i) at height k_i for 1 < i < j, and finally from d(k_{j-1}) to the present time at height k_j; they form a nondecreasing step function. First, there is a bridge just after time d(k_{j-1}): there is no key smaller than k_j in the queue at that time, since such a key would have been deleted by one of the delete-min operations of the chain, and all keys larger than k_j present at that time are also in Q_now, because any later delete-min deletes a key smaller than k_j. So we just have to show that there is no bridge between times t and d(k_{j-1}). For this, we observe that at any time in that range, the key corresponding to the current step of the step function is present in the queue but is not in Q_now, so that time is not a bridge. Hence t' occurs just after d(k_{j-1}), and min(Q_{t'}) = k_j.

Because removing an "insert(k)" operation from time t has the same effect as adding a "delete-min()" operation directly after it, we also have:

COROLLARY 5.2. After an operation Delete(t), where the operation at time t is "insert(k)", the element to be removed from Q_now is k if k ∈ Q_now; otherwise, it is min(Q_{t'}), where t' is the first bridge after time t.

Again, because we do not maintain Q_t explicitly for all t, we ease the computation by using the fact that if t' is a bridge, then min(Q_{t'}) = min{k' ∈ Q_now : k' ∈ I_{<=t'}}.

THEOREM 5.3. There exists a partially retroactive priority queue data structure with all retroactive updates taking O(log m) time and present-time queries taking O(1) time.

Proof: The history of all update operations is maintained in a doubly linked list, and the data structure also explicitly maintains the set Q_now in a binary search tree, associating with each key a pointer to its insert operation in the list. After each retroactive update, an element is inserted into or deleted from Q_now according to the rules described in the lemmas above. In order to decide which element to insert or delete, we need to be able to perform two types of operations:

A. find the last bridge before time t, or the first bridge after time t;
B. find the maximum key in {k' ∈ I_{>=t'} : k' ∉ Q_now}, or the minimum key in {k' ∈ I_{<=t'} : k' ∈ Q_now}.

If we maintain the list of updates, assigning a weight of 0 to insert(k) operations with k ∈ Q_now, a weight of +1 to insert(k) operations with k ∉ Q_now, and a weight of -1 to delete-min operations, every bridge corresponds to a prefix with sum 0. So, using the data structure used in Theorem 5.1, we can answer the queries of type A in O(log m) time. Because every retroactive update adds or deletes at most one element of Q_now, only one weight change has to be performed in the structure, which also takes O(log m) time.

If we maintain the list of insertions augmented by the modified (a,b)-tree of [6], and store in each internal node the maximum of all keys in its subtree that are absent from Q_now, we can easily find the maximum key in {k' ∈ I_{>=t'} : k' ∉ Q_now} in O(log m) time by walking down the tree. The minimum key in {k' ∈ I_{<=t'} : k' ∈ Q_now} can also be maintained if we store in every internal node of the tree the minimum of all keys in its subtree that are in Q_now. Those values can be maintained in O(log m) time per retroactive update, because each update changes at most one element of Q_now.
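Lemmas 5.3 and 5.4 can be cross-checked by brute force; the function below (ours, for illustration) replays a valid, time-sorted operation history once, computes the element that Insert(t, "insert(k)") would add to Q_now via Lemma 5.3, and asserts that the bridge-based characterization of Lemma 5.4 agrees.

    def element_added_to_qnow(ops, t, k):
        """ops: list of (time, "ins", key) or (time, "delmin", None),
        sorted by time; returns the element Insert(t, "insert(k)") adds
        to Q_now, computed via Lemma 5.3 and checked via Lemma 5.4."""
        q, deleted_after, states = set(), [], []
        for (ot, kind, key) in ops:            # single replay of history
            if kind == "ins":
                q.add(key)
            else:
                victim = min(q)                # assumes a valid history
                q.discard(victim)
                if ot >= t:
                    deleted_after.append(victim)
            states.append((ot, set(q)))
        qnow = q
        # Lemma 5.3: max of k and all keys deleted at or after time t.
        ans = max([k] + deleted_after)
        # Lemma 5.4: last bridge t' before t (a time with Q_{t'} a subset
        # of Q_now), then the max key inserted after t' absent from Q_now.
        bridge = max([tt for tt, qq in states if tt < t and qq <= qnow],
                     default=-float("inf"))
        cands = [key for (ot, kind, key) in ops
                 if kind == "ins" and ot > bridge and key not in qnow]
        assert ans == max(cands + [k])
        return ans

The real structure of Theorem 5.3 obtains the same answers in O(log m) time by maintaining the bridge prefix sums and the augmented key maxima/minima incrementally, rather than by replaying the history.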
Acknowledgments. The authors thank Michael Bender, Prosenjit Bose, Jean Cardinal, Alejandro López-Ortiz, Ian Munro, and the anonymous referees for helpful discussions and comments.

References

[1] J. Bentley. Algorithms for Klee's rectangle problems. Unpublished manuscript, Dept. of Computer Science, Carnegie-Mellon University, 1977.
[2] J. L. Bentley and J. B. Saxe. Decomposable searching problems I: Static-to-dynamic transformations. J. Algorithms, 1:301–358, 1980.
[3] T. M. Chan. Dynamic planar convex hull operations in near-logarithmic amortized time. J. ACM, 48:1–12, 2001.
[4] J. R. Driscoll, N. Sarnak, D. D. Sleator, and R. E. Tarjan. Making data structures persistent. Journal of Computer and System Sciences, 38(1):86–124, 1989.
[5] A. Fiat and H. Kaplan. Making data structures confluently persistent. In Proc. 12th Ann. Symp. Discrete Algorithms, pages 537–546, Washington, DC, January 2001.
[6] R. Fleischer. A simple balanced search tree with O(1) worst-case update time. International Journal of Foundations of Computer Science, 7(2):137–149, 1996.
[7] M. L. Fredman and M. E. Saks. The cell probe complexity of dynamic data structures. In Proc. 21st ACM Symposium on Theory of Computing, pages 345–354, May 1989.
[8] H. Mannila and E. Ukkonen. The set union problem with backtracking. In Proc. 13th Int. Colloq. Automata, Languages and Programming, LNCS 226, pages 236–243, 1986.
[9] G. Orwell. 1984. Signet Classic, 1949.
[10] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw-Hill, 2002.
[11] R. Raman, V. Raman, and S. S. Rao. Succinct dynamic data structures. In Proc. 7th Workshop on Algorithms and Data Structures, volume 2125 of Lecture Notes in Computer Science, pages 426–437, 2001.
[12] N. Sarnak and R. E. Tarjan. Planar point location using persistent search trees. Commun. ACM, 29(7):669–679, 1986.
[13] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. J. Comput. Syst. Sci., 26(3):362–381, 1983.
[14] J. Snoeyink. Point location. In J. E. Goodman and J. O'Rourke, editors, Handbook of Discrete and Computational Geometry, chapter 30, pages 559–574. CRC Press LLC, Boca Raton, FL, 1997.
[15] V. Strassen. Algebraic complexity theory. In J. van Leeuwen, editor, Algorithms and Complexity, volume A of Handbook of Theoretical Computer Science, chapter 11, pages 633–672. MIT Press, 1990.
[16] R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22:215–225, 1975.
[17] R. E. Tarjan. A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. Syst. Sci., 18:110–127, 1979.
[18] J. Westbrook and R. E. Tarjan. Amortized analysis of algorithms for set union with backtracking. SIAM J. Comput., 18:1–11, 1989.
[19] A. C. Yao. Should tables be sorted? Journal of the Association for Computing Machinery, 28(3):615–628, 1981.