A Statistical Approach to Risk Mitigation in Computational Markets

Thomas Sandholm
KTH Royal Institute of Technology, Center for Parallel Computers, SE-100 44 Stockholm, Sweden
sandholm@pdc.kth.se

Kevin Lai
Hewlett-Packard Laboratories, Information Dynamics Laboratory, Palo Alto, California 94304
kevin.lai@hp.com

ABSTRACT

We study stochastic models to mitigate the risk of poor Quality-of-Service (QoS) in computational markets. Consumers who purchase services expect both price and performance guarantees. They need to predict future demand to budget for sustained performance despite price fluctuations. Conversely, providers need to estimate demand to price future usage. The skewed and bursty nature of demand in large-scale computer networks challenges the common statistical assumptions of symmetry, independence, and stationarity. This discrepancy leads to underestimation of investment risk. We confirm this non-normal distribution behavior in our study of demand in computational markets. The high agility of a dynamic resource market requires flexible, efficient, and adaptable predictions. Computational needs are typically expressed as performance levels, hence we estimate worst-case bounds of price distributions to mitigate the risk of missing execution deadlines. To meet these needs, we use moving time windows of statistics to approximate price percentile functions. We use snapshots of summary statistics to calculate prediction intervals and estimate model uncertainty. Our approach is model-agnostic, distribution-free both in prices and in prediction errors, and requires neither extensive sampling nor manual parameter tuning. Our simulations and experiments show that a Chebyshev-inequality model generates accurate predictions with minimal sample data requirements. We also show that this approach mitigates the risk of dropped service-level performance when selecting hosts to run a bag-of-tasks Grid application simulation in a live computational market cluster.

Categories and Subject Descriptors: D.4.8 [Operating Systems]: Performance--Modeling and prediction

General Terms: Management, Performance

Keywords: QoS, Service Level Management, Resource Allocation

1. INTRODUCTION

Large-scale shared computational Grids allow more efficient usage of resources through statistical multiplexing. Economic allocation of resources in such systems provides a variety of benefits, including allocating resources to the users who benefit from them the most, encouraging organizations to share resources, and providing accountability [29, 15, 5, 31]. One critical issue for economic allocation systems is predictability. Users require the ability to predict future prices for resources so that they can plan their budgets. Without predictability, users will either over-spend, sacrificing future performance by prematurely spending the risk-hedging buffer, or over-save, sacrificing current performance. Both lead to dissatisfaction and instability. Moreover, the lack of accurate information precludes rational behavior, which would disrupt the operation of the many allocation mechanisms that depend on rational behavior. There are three parts to predictability: the predictability provided by the allocation mechanism, the predictability of the users' behavior, and the predictability provided by the statistical algorithms used to model the behavior. In this work we focus on the last part. The first part was investigated in [11], and examples of mechanisms addressing the second part include [7, 30].
The goal of this paper is to examine the degree to which future QoS levels, guarantees, and resource prices can be predicted from previous demand on a shared computing platform. Ideally, we would use pricing data from a heavily used economic Grid system, but such systems have not yet been widely deployed. Instead, we examine PlanetLab [20], a widely distributed, shared computing platform with a highly flexible and fluid allocation mechanism. The PlanetLab data set has the advantage of encompassing many users and hosts and of having very little friction for allocation. However, PlanetLab does not use economic allocation, so we substitute usage as a proxy for pricing. Since many economic allocation mechanisms (e.g., Spawn [25], Popcorn [21], ICE [9], and Tycoon [16]) adjust prices in response to demand, we believe that this approximation is appropriate.

We also analyze job traces from the Royal Institute of Technology in Stockholm (KTH), the Ohio Supercomputing Center (OSC), and the San Diego Supercomputer Center (SDSC) to contrast the PlanetLab load with deployments exhibiting more traditional high-performance computing user behavior. In these traces, we examine three key statistical metrics computed on the time-varying demand: distribution skewness, correlation over time, and volatility over time (heteroskedasticity). Skewness captures whether most of the distribution's values are closer to the lower bound (right-skewed), closer to the upper bound (left-skewed), or neither (symmetric). Correlation over time measures the relative importance of old and recent samples in predicting future samples. Finally, heteroskedasticity measures how variance changes over time. We find that the traced demand distributions have significant skewness, long-term correlation, and changes in volatility over time. Common modelling techniques fail under these conditions. We show using simulation that a predictor based on a Normal (Norm) Gaussian distribution model and a benchmark (Bench) predictor using the entire sample history both perform poorly with highly skewed data. In addition, high heteroskedasticity reduces the accuracy of any prediction of the expected value of demand. We develop a new predictor based on the Chebyshev (Cheb) model. It is distribution-free and provides worst-case scenario bounds independent of distribution symmetry. To handle time correlations, we sample statistics as running moments with exponential smoothing over different time windows (e.g., hour, day, and week). This allows us to maintain less data as well as to adapt quickly to changing demand behavior. To handle heteroskedasticity, the Chebyshev-based predictor uses the running moments to predict demand bounds. We observe that users of a market-based system can substitute predictions of demand bounds for predictions of expected value.
Market-based systems have mechanisms for insuring against future changes in prices (e.g., bidding more for resources). Demand bound predictions are sufficient for users to decide how much money to spend to mitigate risk. For example, a predicted price bound of [$9, $50] per GHz with an expected price of $10 conveys that, despite low expected demand, there is significant risk of a high price, and the user should spend more to mitigate that risk. We use a live computational market based on the Tycoon [16] resource allocator to compare a strategy using the Chebyshev-based predictor to the default strategy of using the spot market price when scheduling jobs. Using a varying load over time, we show that the predictor outperforms the spot market strategy while imposing a negligible (0.001%) computational and storage overhead.

The remainder of this paper is structured as follows. In Section 2 we outline the requirements of the prediction problem we are addressing. We analyze the demand traces by comparing the dynamics of the different systems under investigation in Section 3. In Section 4, we describe in more detail the design and rationale of our prediction approach and models. In Section 5, we present and discuss the setup and results of the simulations and experiments. We review related work in Section 6 and conclude in Section 7.

2. REQUIREMENTS

2.1 User Requirements

Consumers in a computational market need guidance regarding how much to spend to meet their performance requirements and task deadlines. The quality of this guidance is critical to both individual and overall system performance. The best prediction system would recommend that users invest as little as possible while meeting their QoS requirements with as high a probability as possible. Robust statistical performance guarantees are an alternative to reservations with absolute guarantees, which are vulnerable to speculation and a variety of other gaming strategies [6]. In the discussion below, the bid is the amount the consumer pays a provider for a resource share over a given period of time. A higher bid obtains a greater share, or QoS level. The guarantee is the likelihood that the delivered QoS is greater than the consumer's defined value. The guarantee is not a contract, but rather the estimated probability that a QoS level can be delivered. The prediction system answers the following questions for the user:

Question 1. How much should I bid to obtain an expected QoS level with a given guarantee?

Question 2. What QoS level can I expect with a given guarantee if I spend a given amount?

Question 3. What guarantee can I expect for a given QoS level if I spend a given amount?

Exactly which question(s) a user asks varies from job to job and from user to user. We assume that users can ask any question at any time. We further assume that providers allocate shares among consumers using the proportional share model as follows:

Definition 1. Q = bid / (bid + Y)

where Y denotes the demand for the resource, modeled as a random variable with an arbitrary unknown distribution, and Q is the random variable representing the obtained performance when requesting performance qos. We define the guarantee as the probability that the obtained performance is greater than the requested performance:

Definition 2. g = P(Q > qos)

Proposition 1. To get performance guarantee g, we need to spend CDF^{-1}(g) qos / (1 - qos), where CDF^{-1} is the percent point function, i.e., the inverse of the cumulative distribution function of the demand.
Proof. Substituting the share Q in Definition 2 with the right-hand side of the equation in Definition 1 gives

  g = P( bid / (bid + Y) > qos ) = P( Y < bid/qos - bid ) = CDF( bid/qos - bid ).

The inverse CDF of the demand distribution can then be expressed as

  CDF^{-1}(g) = bid/qos - bid,

which after rearranging gives

  bid = CDF^{-1}(g) qos / (1 - qos).

This provides everything needed to answer Questions 1, 2, and 3. To obtain an expected QoS level q with a given guarantee, a user should bid the following (Question 1):

  bid = CDF^{-1}(g) q / (1 - q),  where g = P(Q > q).  (1)

A user who bids a given amount and expects a given guarantee should expect the following QoS level (Question 2):

  q = bid / (bid + CDF^{-1}(g)),  where g = P(Q > q).  (2)

A user who bids a given amount and expects a given QoS level should expect the following guarantee (Question 3):

  g = CDF( (1 - q) bid / q ),  where g = P(Q > q).  (3)

The main goal of our prediction method is to construct estimates of the CDF and CDF^{-1} (a.k.a. the percent point function, PPF) accurately and efficiently.

2.2 System Requirements

Two different approaches to estimating probability densities stand out: parameter-based and parameter-free estimation. In the parameter-free approach, one takes a random sample of data points and smooths the points to calculate the most likely underlying distribution, e.g., using a least-squares algorithm. In the parameter-based approach, one assumes certain structural and behavioral characteristics of the real distribution and finds the parameters that best fit the measured data, e.g., using a maximum likelihood algorithm. In either case, sample measurements or data points are needed to calculate the density functions. Recording the history of demand in time-series streams for a large number of resources across a large number of computational nodes in real time does not scale, in terms of state maintenance and distribution, and prediction model construction and execution. The scalability limitations force restrictions on the length of past and future prediction horizons and on the number of resources that can be tracked. As a result, our goal is to use as few distribution summary data points as possible to make predictions that are as flexible as possible. Studies of large-scale networked systems [19, 8] show that the underlying distribution of the demand is neither normal nor symmetric. Assuming that it is would result in underestimated risks, so accommodating bursty, skewed behavior is a necessity. Furthermore, we want to assume neither stationarity nor independence of the underlying distribution, since consumers are interested in getting the most accurate estimate for performing a task in the near future, and in evaluating that option against waiting for better conditions. There is a trade-off between performance and accuracy of the predictions, but there is also a similar trade-off between flexibility and evaluation capability. We would like to empower users to make rich, customized predictions based on minimal summary statistics. They should be able to execute what-if scenario probes based on all three questions mentioned in Section 2.1. Different questions incur different prediction errors, which complicates the generic evaluation of model uncertainty and the construction of prediction intervals.
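To make the three what-if probes concrete, the sketch below evaluates Equations 1-3 once estimates of the demand CDF and PPF are available. The demand_cdf and demand_ppf callables, and the exponential example distribution, are assumptions made purely for illustration; they stand in for whatever estimator the prediction system supplies.

```python
# Sketch of the three user probes (Equations 1-3), assuming a predictor has
# supplied estimates demand_cdf(y) and demand_ppf(g) of the demand CDF and its
# inverse (percent point function).
import math

def bid_for_qos(qos, g, demand_ppf):
    """Question 1: budget needed to obtain share qos with guarantee g (Eq. 1)."""
    return demand_ppf(g) * qos / (1.0 - qos)

def qos_for_bid(bid, g, demand_ppf):
    """Question 2: expected share when spending bid with guarantee g (Eq. 2)."""
    return bid / (bid + demand_ppf(g))

def guarantee_for_bid(bid, qos, demand_cdf):
    """Question 3: guarantee of obtaining share qos when spending bid (Eq. 3)."""
    return demand_cdf((1.0 - qos) * bid / qos)

if __name__ == "__main__":
    # Hypothetical demand distribution: exponential with mean 10.
    demand_cdf = lambda y: 1.0 - math.exp(-y / 10.0)
    demand_ppf = lambda g: -10.0 * math.log(1.0 - g)
    bid = bid_for_qos(qos=0.25, g=0.9, demand_ppf=demand_ppf)
    print(bid, qos_for_bid(bid, 0.9, demand_ppf), guarantee_for_bid(bid, 0.25, demand_cdf))
```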
3. DEMAND ANALYSIS

In this section we describe the data traces used in the simulations in Section 5. The load traces come from four shared compute clusters. The PlanetLab trace is available on-line at http://comon.cs.princeton.edu/. The San Diego Supercomputer Center, Ohio Supercomputing Center, and Stockholm Royal Institute of Technology traces are all available at http://www.cs.huji.ac.il/labs/parallel/workload/.

· PlanetLab. PlanetLab (PL) is a planetary-scale, distributed computing platform comprising approximately 726 machines at 354 sites in 25 countries, all running the same Linux-based operating system and PlanetLab software stack, including a virtual machine monitor. The user community is predominantly computer science researchers performing large-scale networking algorithm and system experiments. Resources are allocated in a proportional-share fashion, in virtual machine slices. The trace is from December 2005 to December 2006 and was obtained from PlanetLab CoMon node-level statistics. We calculate demand by aggregating the load value across all hosts for each available 5-minute snapshot interval. This load measures the number of processes that are ready to run across all virtual machines. The load data was filtered to remove a periodic peak, caused by synchronized rpm database updates across all slices and appearing over a 30-minute period every night, to get a more accurate representation of demand.

· San Diego Supercomputer Center. The San Diego Supercomputer Center (SDSC) trace was obtained from Dror Feitelson's parallel workloads archive [2]. It is the SDSC Blue Horizon workload from June 2000 to June 2001 (SDSC-BLUE-2000-3.1-cln.swf). The load is from the 144-node, 8-way SMP crossbar Blue Horizon cluster using the LoadLeveler scheduler. The demand is calculated in three steps (see the sketch after this list). First, the CPU usage or load for a job is obtained as rt (te - ts), where rt is the CPU time consumed by the job as a percentage of the total run time, and te and ts are the end time and start time in seconds since epoch, respectively; all three values are available directly from the trace. Second, each job's CPU usage is correlated to the time when the job was submitted, effectively simulating that no queuing was done and all jobs could run with their actual load instantly. Finally, we aggregate the obtained CPU load value across all jobs running in every 5-minute time interval. We did not analyze the utilization data directly because it could mask high demand under heavy load. The transformation also makes the trace comparable to proportional-share systems such as PlanetLab and Tycoon, which are the primary focus of our work. Although this masks the needs of users who do not submit jobs when the queue wait times are too long, we assume that these users would not spend money in an expensive computational market either. Consequently, we assume that these users do not contribute to demand.

· Ohio Supercomputing Center. The Ohio Supercomputing Center (OSC) trace was also obtained from the parallel workloads archive. It is the OSC Linux cluster workload from January 2001 to December 2001 (OSC-Clust-2000-2.swf). The site is a Linux cluster with 32 quad nodes and 25 dual nodes, a total of 178 processors, using the Maui scheduler. We perform the identical transformation from job workload to demand as with the SDSC data described above.

· Royal Institute of Technology. The Center for Parallel Computers at the Swedish Royal Institute of Technology (KTH) in Stockholm provided us with the final workload trace. The trace is from a 100-node IBM SP2 cluster from October 1996 to September 1997 (KTH-SP2-1996-2.swf). Because CPU time was not recorded in the logs, the CPU load is set to the run time of the jobs. Apart from this, the demand transformation is identical to that of the SDSC and OSC traces described above.
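The sketch below illustrates the trace-to-demand transformation just described, under one reasonable reading of the three steps: each job contributes its CPU load over a window equal to its original run length, shifted to its submit time, and contributions are summed per 5-minute bucket. The tuple layout and function name are illustrative, not the actual SWF field names.

```python
# Illustrative sketch of the job-trace-to-demand transformation: jobs are given
# as (submit, start, end, cpu_load) tuples in epoch seconds, where cpu_load is
# the per-job CPU load (rt in the text). Queuing delay is ignored by anchoring
# each job at its submit time, and load is aggregated per 5-minute bucket.
from collections import defaultdict

def demand_series(jobs, bucket=300):
    demand = defaultdict(float)
    for submit, start, end, cpu_load in jobs:
        run_time = max(0, end - start)
        first = int(submit // bucket)
        last = int((submit + run_time) // bucket)
        for b in range(first, last + 1):
            demand[b] += cpu_load
    # map of bucket index -> aggregate demand, ordered by time
    return dict(sorted(demand.items()))
```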
Next, we characterize the dynamics of the computational demand traces by examining the typical properties of time-series distributions: symmetry, independence, and stationarity.

3.1 Distribution Symmetry

The first step towards characterizing the load and detecting anomalies is to look at the raw demand traces. Figure 1 shows that PlanetLab exhibits much thinner peaks that both appear and disappear more quickly than the peaks in the other traces. We attribute this behavior to the fact that PlanetLab jobs tend to be distributed-systems or networking experiments that have lower resource requirements than scientific computing workloads.

Figure 1: Demand History (Hourly)

Next, we measure the distribution symmetry of the demand for the different clusters in Figure 2. A distribution is right-skewed if its right tail is longer than its left and its mass leans to the left. We see that the PlanetLab load stands out as being more right-skewed than the others. All traces, however, show asymmetric, right-skewed behavior, indicating that low demand is the most common state of the systems. Distribution models such as the Gaussian or Normal distribution assume symmetry and will thus be inaccurate for all of the traces studied. The central moments are summarized in Table 1.

Figure 2: Demand Density

Table 1: Central Moments of Traces (a skewness γ1 greater than 2 indicates a heavy tail)

                   PL      KTH     OSC     SDSC
  μ (mean)         3433    382     66      3249
  cv (σ/μ)         0.37    0.71    0.72    0.51
  γ1 (skewness)    4.06    1.11    1.53    1.02
  γ2 (kurtosis)    29.45   0.90    4.35    2.07

3.2 Dependence and Long Memory

One of the most common assumptions when studying time series, and when sampling data to approximate distributions and densities, is that the observations are IID, i.e., that the sampled data points are independent and identically distributed. This allows models to be trained once and then reused indefinitely once they have converged. It also simplifies the construction of confidence and prediction intervals, because there is no bias in the samples and they can be taken as a good representation of the whole data set. The simplest way of testing dependence, seasonality, and randomness of samples is to draw an auto-correlation function (ACF) plot with a series of increasing time lags. We study the correlations for lags up to 7 days. The plots in Figure 3 clearly show that the observations are not independent in time but rather show a strong and slowly decaying correlation to measurements in the previous time intervals. If the decay is slower than exponential, the data is said to exhibit long memory. This is clearly the case for at least the KTH trace (within the weekly time lag). Long memory is a feature that causes problems for standard auto-regressive models such as GARCH [18].

Figure 3: Auto-Correlation Functions

Another popular approach used to detect long-term correlations is rescaled-range (R/S) analysis [19], which was influenced by Hurst's study of the Nile floods [14]. In general it provides a way to investigate how the values in a time series scale with time. The scaling index, known as the Hurst exponent, is 0.5 if all the increments of the time series are independent. A series where a trend in the past makes the same trend in the future more likely is called persistent and has a Hurst exponent greater than 0.5. Conversely, systems where past trends are likely to be reversed are called anti-persistent and scale with a Hurst exponent less than 0.5. If the R/S values (the ratio of increment range to standard deviation) for different time intervals are plotted against time on a log-log scale, the Hurst exponent appears as the slope of the fitted line. Figure 4 shows the R/S plot for the demand traces. A Hurst exponent around 0.92 fits all the traces under investigation, which indicates a very high correlation between past and future demand trends.

Figure 4: Rescaled-Range Dynamics and Hurst Exponent
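For readers who want to reproduce this kind of check on their own traces, the sketch below computes the rescaled-range statistic and fits the Hurst exponent as the slope of log(R/S) against log(window size). It follows the textbook definition and is only an illustration, not the exact procedure behind Figure 4; the window sizes are arbitrary and must be shorter than the series.

```python
# Rescaled-range (R/S) analysis sketch: R is the range of the mean-adjusted
# cumulative deviations within a window, S the window's standard deviation, and
# the Hurst exponent is the least-squares slope of log(R/S) against log(window).
import numpy as np

def rescaled_range(window):
    x = np.asarray(window, dtype=float)
    dev = np.cumsum(x - x.mean())
    r = dev.max() - dev.min()
    s = x.std()
    return r / s if s > 0 else np.nan

def hurst_exponent(series, window_sizes=(16, 32, 64, 128, 256, 512)):
    series = np.asarray(series, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        chunks = [series[i:i + n] for i in range(0, len(series) - n + 1, n)]
        log_n.append(np.log(n))
        log_rs.append(np.log(np.nanmean([rescaled_range(c) for c in chunks])))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return slope  # ~0.5 for independent increments, >0.5 for persistent series
```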
3.3 Heteroskedasticity and Fat Tails

The last property that we investigate is the general volatility of the data, which is crucial for making good risk assessments. If the data is extremely skewed, as in power-law distributed data, both the mean and the variance can be infinite, and thus some other measure of volatility needs to be used. One popular approach is to look at the (absolute) log difference of the increments of the data. It turns out that even for very risky and volatile time series with power-law behavior, like the stock market, the absolute increments are predictable, since high-volatility data points tend to cluster [19]. A volatility graph showing the log-transformed increment differences over time is also another measurement of how Gaussian-like the data is. A Gaussian distribution produced by a Brownian-motion process has a near-uniform distribution of increment sizes, whereas more unpredictable, and thereby riskier, processes have clusters of high volatility intermingled with lower-volatility periods. In Figure 5, all of the traces show signs of changing volatility over time (heteroskedasticity). High-volatility instances also seem to be clustered. An analysis of how they are clustered is beyond the scope of this paper.

Figure 5: Hourly Demand Volatility

The stock market has been shown to exhibit power-law scaling in the distribution of the volatility over time [19]. We therefore also look at the distribution tail behavior of our traces. A heavy-tailed or fat-tailed distribution will exhibit a longer tail of the complement of the cumulative distribution function (1-CDF) than the exponential distribution. According to this definition, Figure 6 and Table 2 show that all traces are heavy-tailed in hourly volatility. PL and SDSC are also heavy-tailed in daily volatility.

Figure 6: Heavy Tails for Hourly Demand Volatility

Table 2: Skewness of Volatility

          hour    day     week
  PL      7.45    4.55    1.45
  KTH     4.40    1.87    1.13
  OSC     6.41    1.82    0.53
  SDSC    4.44    2.05    1.63

This investigation of the traces indicates that a multi-fractal time-scaling (trading time deformation) [18, 19] model may be appropriate. We note, for example, that the Hurst exponent obtained can be used to determine the fractal dimension [3], which is a typical measure of the roughness of the system in multi-fractal time series. Figure 6 also indicates that a stretched exponential distribution [8] could be a good fit. However, an analysis of these more complicated distributional models is outside the scope of this paper.
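A short sketch of the volatility measure used above (absolute log differences of the demand) and a crude tail check against an exponential distribution with the same mean follows; the 95th-percentile comparison point is an arbitrary choice for illustration, not the test used to produce Figure 6.

```python
# Volatility as absolute log differences of successive (positive) demand values,
# plus a rough heavy-tail indicator: does the empirical tail reach further than
# an exponential with the same mean would at a given probability level?
import numpy as np

def log_volatility(demand):
    d = np.asarray(demand, dtype=float)
    d = d[d > 0]
    return np.abs(np.diff(np.log(d)))

def heavier_than_exponential(volatility, prob=0.95):
    v = np.asarray(volatility, dtype=float)
    empirical_q = np.quantile(v, prob)
    exponential_q = -v.mean() * np.log(1.0 - prob)  # exponential quantile, same mean
    return empirical_q > exponential_q
```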
4. PREDICTION APPROACH

In this section, we present our approach to providing accurate distribution predictions with an upper prediction bound. The method is agnostic both to the model used to fit the time-series data of demand and to the prediction error distribution.

There are two architectural components providing prediction capabilities: the statistics collector (SC) and the prediction generator (PG). The SC consumes a time series of prices and produces a statistical summary, which in turn is consumed by the PG. The statistical summary comprises instantaneous running (non-central) moments over configurable time horizons. In our case we provide hourly, daily, and weekly summaries, as they correspond best to the length of the computational jobs run as well as to the length of the horizon that is typically predictable. In addition to the moments, the summary also contains the current price, a short history of moments for the most recent time periods, and the minimum and maximum measured price values. The running non-central moments are calculated as

  m_{t,p} = α m_{t-1,p} + (1 - α) x_t^p

where m_{t,p} is the pth moment at time t, x_t is the price at time t, and α = 1 - 1/n, where n is the number of data points in the time horizon covered by the moments, and m_{0,p} = x_0^p. We refer to [22] for further details on how the central moments are calculated from these. The PG component is instantiated with a predictor that uses the moments and the extremes to construct approximations of the cumulative distribution function (CDF), the percent point function (PPF), and a function generating confidence intervals. The history of moments is used to construct prediction intervals. Here we describe a Gaussian (Norm) predictor, a Chebyshev (Cheb) predictor, and a sample-based predictor (Bench) which we use for benchmarking.
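A minimal sketch of the statistics collector update for one time horizon is shown below; the class and attribute names are illustrative rather than taken from the actual implementation. Hourly, daily, and weekly summaries would simply be three instances with different values of n.

```python
# Statistics collector (SC) sketch for one time horizon: exponentially smoothed
# non-central moments plus the current price and the observed extremes.
class RunningMoments:
    def __init__(self, n, order=2):
        self.alpha = 1.0 - 1.0 / n        # n = data points in the time horizon
        self.order = order
        self.moments = None               # m_p for p = 1..order
        self.minimum = float("inf")
        self.maximum = float("-inf")
        self.current = None

    def update(self, price):
        self.current = price
        self.minimum = min(self.minimum, price)
        self.maximum = max(self.maximum, price)
        if self.moments is None:
            self.moments = [price ** p for p in range(1, self.order + 1)]
        else:
            self.moments = [self.alpha * m + (1.0 - self.alpha) * price ** p
                            for p, m in enumerate(self.moments, start=1)]

    def mean_and_std(self):
        m1, m2 = self.moments[0], self.moments[1]
        return m1, max(0.0, m2 - m1 * m1) ** 0.5  # central moments from non-central
```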
4.1 Gaussian Predictor

The Gaussian CDF (Φ) is readily available and can be calculated directly from the first two central moments. Since the inverse of the Gaussian CDF, or PPF (Φ^{-1}), does not have a closed form, we use an approximation based on a rational minimax algorithm developed by Acklam [1]. The 100p% confidence interval is then calculated as [Φ^{-1}(0.5 - p/2), Φ^{-1}(0.5 + p/2)]. An identical interval calculation can be done for all other predictors using their respective percent point functions. The prediction interval is calculated by applying Φ and Φ^{-1} on the history of moments.
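The sketch below shows a Gaussian predictor with this interface; it substitutes the Python standard library's NormalDist for Acklam's rational approximation of Φ^{-1}, which plays the same role here. The class name is illustrative.

```python
# Gaussian (Norm) predictor sketch built from the first two central moments.
# NormalDist stands in for the Acklam inverse-CDF approximation used in the paper.
from statistics import NormalDist

class NormPredictor:
    def __init__(self, mean, std):
        self.dist = NormalDist(mean, max(std, 1e-12))  # sigma must be positive

    def cdf(self, y):
        return self.dist.cdf(y)

    def ppf(self, p):
        return self.dist.inv_cdf(p)

    def confidence_interval(self, p):
        # 100p% interval: [PPF(0.5 - p/2), PPF(0.5 + p/2)]
        return self.ppf(0.5 - p / 2.0), self.ppf(0.5 + p / 2.0)
```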
4.2 Chebyshev Predictor

When predicting performance guarantees, we are typically more interested in worst-case scenario bounds than in fitting the data density perfectly across all percentiles. One of the most prominent techniques for estimating bounds in probability theory is the Chebyshev inequality [12], which states that

  P(|Y - μ| ≥ kσ) ≤ 1/k²

where μ is the mean (first moment) and σ the standard deviation (square root of the second central moment) of the underlying distribution. Rewriting the inequality as a CDF we get

  CDF(y) = 1 - 1/(1 + k²)

where k = (y - μ)/σ. For unimodal distributions we can tighten the bound further by applying the Vysochanskij-Petunin inequality [24], which for k ≥ √(8/3) gives

  CDF(y) = 1 - 4/(9k²).

Taking the inverse of the CDF we get

  PPF(p) = μ ± σ √(1/(1 - p) - 1),   for k < √(8/3)
  PPF(p) = μ ± σ √(4/(9(1 - p))),    for k ≥ √(8/3).

Since Chebyshev only gives us upper and lower bounds, we cannot calculate the percentiles around the mean accurately, but this is not a great limitation in our case, where we are primarily interested in worst-case scenario (upper) bounds. The confidence and prediction intervals are calculated in the same way as in the Gaussian case.
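A sketch of the bound CDF and upper-percentile function described above follows. The function names are illustrative, and the branch selection applies the Vysochanskij-Petunin form only where its k ≥ √(8/3) condition holds (p ≥ 5/6), which is one way to resolve the case split; only the upper (+) branch is shown since that is the worst-case bound of interest.

```python
# Chebyshev (Cheb) predictor sketch: bound-based CDF and upper percentile
# function, tightened with the Vysochanskij-Petunin inequality where it applies.
import math

K_VP = math.sqrt(8.0 / 3.0)

def cheb_cdf(y, mean, std):
    k = (y - mean) / std
    if k <= 0:
        return 0.0                          # bound is uninformative below the mean
    if k >= K_VP:
        return 1.0 - 4.0 / (9.0 * k * k)    # Vysochanskij-Petunin bound
    return 1.0 - 1.0 / (1.0 + k * k)        # Chebyshev-style bound

def cheb_upper_ppf(p, mean, std):
    """Upper bound on the p-percentile (0 < p < 1)."""
    k_vp = math.sqrt(4.0 / (9.0 * (1.0 - p)))   # inverse of 1 - 4/(9 k^2)
    if k_vp >= K_VP:                            # VP valid here, i.e. p >= 5/6
        k = k_vp
    else:
        k = math.sqrt(1.0 / (1.0 - p) - 1.0)    # inverse of 1 - 1/(1 + k^2)
    return mean + k * std

# Example: a 95th-percentile price bound from running mean and std estimates:
# cheb_upper_ppf(0.95, mean, std)
```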
4.3 Sample Bench Predictor

We use a benchmark predictor to compare our summary-statistics predictors with a sample-based predictor. This predictor has access to the entire past history of data points and calculates the percentile points, cumulative distributions, and prediction bounds from the raw sampled data. The benchmark predictor could not be used in practice because of its prohibitive computational and storage requirements.

4.4 Multi-Host Predictions

We combine the results from Equations 1, 2, and 3 with our predictors to assess risk when making scheduling decisions across a set of hosts. For this purpose we extend an economic scheduling algorithm previously presented in [11, 22]. The purpose of this Best Response algorithm is to distribute a total budget across a set of hosts to yield the maximum utility for the user. The optimization problem, as seen by a user, is defined as:

  maximize  U = Σ_{j=1}^{n} w_j b_j / (b_j + y_j)
  subject to  Σ_{j=1}^{n} b_j = bid,  and  b_j ≥ 0,

where w_j is the preference or weight the user specifies for host j, b_j is the bid that the user puts on host j, and bid is the total budget of the user. We replace y_j, the spot price of host j, with the stochastic variable Y as modeled above. In Equation 1, which gives an expected performance value given a percentile and a (prediction) confidence level, we calculate Y with the PPF of the predictor. To calculate the bid given a performance level, a guarantee, and a confidence level, or to calculate the guarantee given a bid and a performance level, we numerically invert the Best Response algorithm. This allows us to probe different bid or guarantee levels until we get an acceptable performance match or we encounter budget constraints. Users can use this model in a variety of ways. They can use the spot prices to determine whether they can meet their deadline, guarantee, and budget constraints, or whether they should submit their job at a later time. Instead of the spot price, users can also use the statistical guarantees and prediction bounds to pick the hosts with the best sustained low-price behavior for long-running tasks. We examine the effectiveness of the model for these use cases in the next section.
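For illustration, the sketch below solves the budget-splitting problem above with a standard Lagrangian water-filling and bisection; it is not the Best Response implementation of [11, 22], only a stand-in with the same objective. The prices passed in can be spot prices or predicted upper price bounds, and positive weights and prices are assumed.

```python
# Hedged sketch of one way to maximize sum_j w_j * b_j / (b_j + y_j) subject to
# sum_j b_j = budget and b_j >= 0: the stationarity condition gives
# b_j = max(0, t * sqrt(w_j * y_j) - y_j) for a multiplier t found by bisection.
import math

def split_budget(budget, weights, prices, iters=100):
    roots = [math.sqrt(w * y) for w, y in zip(weights, prices)]

    def bids(t):
        return [max(0.0, t * r - y) for r, y in zip(roots, prices)]

    lo, hi = 0.0, (budget + sum(prices)) / min(roots)
    for _ in range(iters):              # bisection on the multiplier t
        mid = (lo + hi) / 2.0
        if sum(bids(mid)) > budget:
            hi = mid
        else:
            lo = mid
    return bids(lo)

# Example: equal preferences and predicted 95th-percentile price bounds.
# print(split_budget(10.0, [1.0, 1.0, 1.0], [5.0, 20.0, 8.0]))
```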
5. SIMULATION AND EXPERIMENT RESULTS

This section contains four different validations of the approach presented in the previous section. First, we run a simulation with generated random distributions against our predictors. Second, we run a prediction simulation using the compute cluster demand traces described in Section 3. Third, we run an experiment in our own live computational market cluster, Tycoon, comparing spot market scheduling with our extended risk-mitigating scheduler described in the previous section. Finally, we run an efficiency experiment to measure the overhead incurred by the predictions.

5.1 Asymmetry Modeling Simulation

To validate the ability to approximate arbitrary skewed distributions, we developed a generator capable of producing random data with distributions in a continuum from a right-skewed Pareto through a Gaussian to a mirrored, left-skewed Pareto, with the first two moments kept stable and only the skewness varying. An example of the generated distributions can be seen in the lower graph in Figure 7. The upper graph shows the results for the Cheb and Norm predictors. As expected, Cheb gives better approximations for left- and right-skewed distributions, whereas the Norm predictor performs best for near-symmetric distributions (skewness near 0).

Figure 7: Fitting Skewed Distribution

5.2 Demand Prediction Simulation

We use the compute cluster demand traces from PlanetLab, KTH, SDSC, and OSC to study the ability of our prediction approach to give accurate risk assessments with the Cheb, Norm, and Bench predictors. Recall that the Bench predictor simply uses all previous historical data to make PPF and prediction (confidence) interval estimates, whereas the other predictors base their estimates on the summary statistics generated by the statistics collector (SC) component (including the running moments and the moment history).

The setup of the experiment is as follows. For each trace, we feed the time-series data into the SC component configured for hourly, daily, and weekly prediction horizons, and then try to predict the 95th percentile price with a 90% confidence for the subsequent interval (for which no data is revealed) using the prediction generator (PG) instantiated with the Cheb, Norm, and Bench predictors. We then measure the delivered guarantee as the likelihood that at least 95% of the demand values are less than the predicted upper prediction confidence bound, which we denote the success rate (S). We also track the width of the prediction bound (B) as the (relative) difference between the actual 95th percentile demand and the predicted demand. The PG component is configured to track three historical running-moment snapshots into the past of the first two non-central moments. The results are shown in Table 3. S and B are defined as

  S = P( f̂(0.95) > f(0.95) )   and   B = E[ |f̂(0.95) - f(0.95)| / f(0.95) ]

where f is the actual percentile function of the price, and f̂ is the percentile function estimated by the predictor.

Table 3: Prediction Result of 95th Percentile with 90% Upper Prediction Bound (S is the success rate and B the prediction bound)

            Hour           Day            Week
            S      B       S      B       S      B
  PL
   Cheb     0.93   0.16    0.95   0.57    0.93   1.20
   Norm     0.62   0.08    0.76   0.29    0.80   0.58
   Bench    0.79   0.28    0.72   0.24    0.55   0.17
  KTH
   Cheb     0.96   0.38    0.95   0.92    0.97   0.91
   Norm     0.87   0.22    0.87   0.47    0.76   0.44
   Bench    0.98   3.25    0.98   1.97    0.97   1.40
  OSC
   Cheb     0.94   0.40    0.94   1.30    0.94   1.11
   Norm     0.88   0.22    0.81   0.70    0.74   0.51
   Bench    0.81   1.63    0.73   0.80    0.63   0.33
  SDSC
   Cheb     0.95   0.26    0.96   0.88    0.95   0.92
   Norm     0.85   0.15    0.78   0.47    0.72   0.43
   Bench    0.79   0.96    0.64   0.55    0.41   0.32

The Cheb predictor generates consistent and accurate success rates (S) for all traces across all prediction horizons. For the daily and weekly horizons the prediction bound size (B) is in most cases very wide, which is likely to cause risk-averse users, or users with a very limited budget, to delay their job submissions. Both the Norm and the Bench predictors grossly underestimate the risk, as seen by their very low success rates, although their bounds are tight. The Norm predictor would yield better success rates if we provided additional moment-history snapshots. However, one of our requirements is to maintain system scalability and flexibility, so we must minimize the number of statistical summary points to reduce the size of the snapshots. We note that using a horizon size of one and an infinite snapshot size in the SC component would make the summary-statistics results converge to the results obtained by the Bench predictor.
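The success rate and bound metrics above can be computed directly from paired per-interval values of the predicted upper bound and the realized 95th percentile, as in the short sketch below (function names are illustrative).

```python
# Success rate S and relative bound width B from Section 5.2, given sequences of
# per-interval predicted upper bounds and realized 95th-percentile demand values.
def success_rate(predicted_bounds, actual_percentiles):
    pairs = list(zip(predicted_bounds, actual_percentiles))
    return sum(1 for pred, act in pairs if pred > act) / len(pairs)

def relative_bound_width(predicted_bounds, actual_percentiles):
    pairs = list(zip(predicted_bounds, actual_percentiles))
    return sum(abs(pred - act) / act for pred, act in pairs) / len(pairs)
```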
5.3 Risk Mitigation Experiment

To experimentally validate our approach, we run a scheduling benchmark in a live Tycoon computational market. We submit jobs using the NorduGrid/ARC Grid meta-scheduler [23] and schedule locally using the extended Best Response algorithm described in Section 4. Tycoon uses the Xen virtual machine monitor [10] to host running jobs. Each job is run in a separate, dedicated, isolated virtual machine. The design rationale behind this experiment is to create a changing usage pattern that could potentially be predicted, and to study how well our approach adapts to this pattern under heavy load. We do not claim that the traffic pattern is representative of a real system; see the analysis in Section 5.2 for how the Tycoon predictors handle real-world usage patterns.

The experiment consists of two independent runs, with 720 virtual machines created on 60 physical machines during each 6-hour run. All jobs run on dedicated virtual machines that are configured based on the current demand and the job resource requirements. All jobs request 800 MB of disk, 512 MB of memory, and 1-100% CPU, depending on demand. The users are configured as shown in Figure 8. More specifically:

· User 1 (Continuous) continuously runs 60 parallel jobs with low funding on 30 physical hosts throughout the experiment run (6 hours). The set of hosts is static and separate from the bursty user's hosts.

· User 2 (Bursty) sporadically runs 60 highly funded 30-minute jobs every hour on the 30 physical hosts not used by User 1.

· User 3 (Spot Market) schedules and runs 30 jobs of 40 minutes each every hour based on spot market prices. The spot market user selects from any of the 60 hosts in the system.

· User 4 (Predicting) schedules and runs 30 jobs of 40 minutes each every hour based on the predicted 80th percentile prices using the PG/Cheb predictor, consuming continuous statistical feeds from the SC component deployed on each compute node. The predicting user selects from any of the 60 hosts in the system.

Figure 8: Risk mitigation experimental setup

All jobs run a CPU benchmark incrementing a counter at a frequency based on the allocated resource share. The value of the counter is our metric for work completed. Both the spot market and predicting users have the same budget for purchasing resources. The dynamic behavior of the system is as follows. The continuous user's jobs run on the hosts in the left of Figure 8. The bursty user's jobs run on the right. The spot market user selects the host with the lowest spot price, so its jobs will almost always run on one of the right hosts. On a few rare occasions, all of the right hosts will have a higher spot price than the left hosts. The predicting user selects based on historical prices. These tend to be lower for the left hosts, since the bursty user avoids those hosts, so the predicting user's jobs will tend to use the left hosts.

The distribution of work completed by each job submitted by the spot market and predicting users is graphed in the top graph of Figure 9. The distribution for the predicting user is shifted to the right compared to the spot market user, showing that the predicting user finishes more work. The bottom graph shows the cumulative distribution function (CDF) of the work completed by the two users. The area between the two plots shows that fewer predicting jobs were impacted by the bursty user. On average, the jobs of the predicting user performed 22.7% more work. The spot market user's jobs on the far right of the CDF were able (by chance) to run on hosts at times when none of the bursty user's jobs were running. This is why they were able to complete so much work. In contrast, the predicting user almost never selects such hosts, because it predicts (accurately in most cases) that the bursty user will soon return. Thus, risk mitigation both increases average performance and reduces variability in performance, even under heavy and spiky load, given that the spiky behavior is predictable over some time horizon (1 hour in this case).

Figure 9: Risk Mitigation using Percentile Predictions
5.4 Prediction Efficiency Experiment

As a final experiment, we evaluate the overhead imposed by the prediction implementation presented in this paper compared to the standard spot market budgeting algorithm used in the previous experiment. The standard algorithm involves the following two steps: (1) get live host spot-market price information, and (2) evaluate the bid distribution across the hosts, given a total budget, to maximize the aggregate performance. The prediction implementation extends step 1 by retrieving the summary statistics required to calculate the prediction bounds, and extends step 2 by calculating the optimal bid distribution given a desired guarantee level and a given prediction confidence bound level. In the experiment the algorithms were run against a cluster of 70 hosts, using a guarantee level of 95% and a confidence bound of 90%. We interleave 400 runs of the two algorithms and measure the round-trip time of each operation.

Figure 10: Round-Trip Times of Spot-Market vs Prediction-Based Host Selection and Budgeting

Figure 10 shows the results. The mean round-trip time for spot-market budgeting is 0.46 seconds and the mean for prediction is 0.5 seconds. The impact of this overhead on actual running time depends on how frequently the budgeting process is run. In the experiment in Section 5.3, we budget once an hour, so the overhead of the prediction algorithm over the spot market one is (0.5 - 0.46)/3600 ≈ 0.001%. In other words, the overhead for 70 hosts is negligible, indicating that the algorithm will scale to tens of thousands of hosts.

6. RELATED WORK

MacKie-Mason et al. [17] investigate how price predictors can improve users' bidding strategies in a market-based resource scheduling scenario. They conclude that simple predictors, such as taking the average of the previous round of auctions, improve expected bidder performance. Although the goal of this work is similar to ours, they investigate a different, combinatorial allocation scenario where first-price winner-takes-all auctions are employed, as opposed to the proportional share allocation in our work. Nevertheless, their results are encouraging.

Another use of economic predictions is described by Wellman et al. [26], where bidding agents use the expected market clearing price in a competitive or Walrasian equilibrium. They employ tatonnement, which involves determining users' inclination to bid a certain value given a price level. Wellman et al. compare their competitive analysis predictor to simple historical averaging and machine learning models. They conclude that strategies that consider both historical data and instance-specific data provide a competitive advantage. The conditional probability of price dynamics given a price level would be additional useful information in our model. However, this is probably impractical in large-scale systems with users entering and leaving the market at will, and with large real-valued price ranges, so we assume this behavior is incorporated in the price history itself.

Wolski et al. [28] describe the Network Weather Service (NWS), which has been used in large-scale Grid deployments to monitor compute jobs. Our work differs from NWS both in how statistics are collected and stored and in how predictions are computed. NWS uses a multi-service infrastructure to track, store, and distribute entire time-series feeds from providers to consumers via sensors and memory components (feed history databases). Our solution only maintains summary statistics and is therefore more light-weight; no persistent storage or searching infrastructure is required. For prediction, NWS uses simple moving averages with static parameters. We use more general predictors that can handle any dynamics and adapt their parameters automatically. In addition, the focus in [28] is on predicting queue wait times, whereas we focus on predicting actual demand.

Brevik et al. [4] present a Binomial Method Batch Predictor (BMBP) complementing NWS [28]. The approach is to assume that each observed value in the time series can be seen as an independent draw from a Bernoulli trial. The problem is that this does not account for time correlations, which we have found to be substantial in our analysis. Brevik addresses the correlation by first detecting structural changes in the feed, when BMBP generates a sequence of bad predictions, and thereafter truncating the history against which the predictor model is fit. Our approach is to leverage the correlation by using biased samples of the most recent time intervals, which results in dynamic adaptation to 'structural' changes in the feed. The problem with monitoring and fixing prediction problems a posteriori, as in BMBP, is that the detection mechanism is somewhat arbitrary, and a structural failure of the model could result in great losses, which could defeat the purpose of providing risk-mitigating predictions [18].

Our prediction interval calculation was inspired by Hahn and Meeker [13], but they also assume that random independent samples are drawn and that a large number of sample data points are used to yield tight prediction bounds. Neither of these two assumptions is true in our scenario. Our calculation of the prediction interval can be seen as more in the spirit of the simple empirical intervals proposed by Williams and Goodman [27]. Their empirical source is the previous sample point, whereas we use summary statistics as input to the empirical predictions. This allows us to cover larger prediction horizons with greater confidence using fewer data points.

The data analysis of the distribution characteristics in Section 3 was inspired by the work of Mandelbrot on modeling risk in volatile markets [19]. The fat-tail behavior of the hourly volatility (though not of the daily or weekly volatility across all the traces) fits well with the volatility Mandelbrot has observed in cotton prices, the Deutschmark-Dollar exchange rate, and stock price dynamics, which he calls 'wild' randomness or chance.
7. CONCLUSIONS

All of the demand traces studied show non-Gaussian properties, which calls for more generic, distribution-free predictors. The clear correlation between subsequent hourly, daily, and weekly time intervals of the traces suggests that the typical IID assumption is not valid and would lead to risk underestimation. This leads us to a model based on dynamically tracking running moments and the most recent snapshots of these moments, instantiated with a worst-case-bound, Chebyshev-inequality-influenced distribution estimator. Although this predictor does not generate tight prediction bounds for daily and weekly predictions, it is consistent in the confidence levels promised across all traces investigated, which makes it a good general indicator of the model uncertainty and of the reliability of the predictions. In highly volatile environments, making point predictions into the future is not possible with any level of accuracy, but high-volatility periods are typically clustered. Thus, accurately estimating model uncertainty helps users decide 1) whether to delay running their jobs until after the 'stormy' period has passed, or, if they do decide to run, 2) how much insurance funding to spend. The Bench predictor shows how poor predictions can be if one relies only on the available past history and ignores time correlations. Similarly, the Norm predictor exemplifies how inaccurate predictions can be if symmetric Gaussian distributions are wrongly assumed. The distribution-agnostic Cheb predictor, on the other hand, provides superior predictions in cases where demand is heavily skewed and the volatility spiky, e.g., in the PlanetLab trace. We have seen that our prediction approach can easily be deployed in scheduling scenarios where the most reliable hosts, delivering good sustained performance over time, need to be picked for long-running jobs. This work has focused primarily on helping consumers spend the right amount of money when purchasing resource shares, but the prediction approach with SC/PG/Cheb is general enough to be used by brokers pricing options or reservations as well, which is the focus of future work. We would also like to study the effects that different mixes of prediction, spot-market, and reservation usage patterns have on overall system efficiency and fairness.

8. ACKNOWLEDGMENTS

We thank our colleagues Scott Clearwater, Bernardo Huberman, Li Zhang, Fang Wu, and Ali Ghodsi for enlightening discussions. This work would not have been possible without the funding from the HP/Intel Joint Innovation Program (JIP), our JIP liaison, Rick McGeer, and our collaborators at Intel, Rob Knauerhase and Jeff Sedayao. We are grateful to Vivek Pai at Princeton University for making the PL trace available and helping us interpret it; to Travis Earheart and Nancy Wilkins-Diehr at SDSC for making the SDSC trace available; and to Lars Malinowsky at PDC for providing the KTH trace.
9. REFERENCES

[1] An Algorithm for Computing the Inverse Normal Cumulative Distribution Function. http://home.online.no/~pjacklam/notes/invnorm/, 2007.
[2] Parallel Workloads Archive. http://www.cs.huji.ac.il/labs/parallel/workload/, 2007.
[3] M. Bodruzzaman, J. Cadzow, R. Shiavi, A. Kilroy, B. Dawant, and M. Wilkes. Hurst's rescaled-range (R/S) analysis and fractal dimension of electromyographic (EMG) signal. In Proceedings of IEEE Southeastcon '91, pages 1121-1123, Williamsburg, VA, USA, 1991. IEEE.
[4] J. Brevik, D. Nurmi, and R. Wolski. Predicting bounds on queuing delay for batch-scheduled parallel machines. In PPoPP '06: Proceedings of the 2006 ACM Principles and Practices of Parallel Programming, New York, NY, USA, 2006. ACM.
[5] R. Buyya, D. Abramson, and S. Venugopal. The Grid Economy. Proceedings of the IEEE, Special Issue on Grid Computing, 93(3):479-484, March 2005.
[6] C. Ng, P. Buonadonna, B. N. Chun, A. C. Snoeren, and A. Vahdat. Addressing Strategic Behavior in a Deployed Microeconomic Resource Allocator. In Proceedings of the 3rd Workshop on Economics of Peer-to-Peer Systems, 2005.
[7] S. Clearwater and B. A. Huberman. Swing Options. In Proceedings of the 11th International Conference on Computing in Economics, June 2005.
[8] S. Clearwater and S. D. Kleban. Heavy-tailed distributions in supercomputer jobs. Technical Report SAND2002-2378C, Sandia National Labs, 2002.
[9] D. C. Parkes, R. Cavallo, N. Elprin, A. Juda, S. Lahaie, B. Lubin, L. Michael, J. Shneidman, and H. Sultan. ICE: An Iterative Combinatorial Exchange. In Proceedings of the ACM Conference on Electronic Commerce, 2005.
[10] B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, I. Pratt, A. Warfield, P. Barham, and R. Neugebauer. Xen and the Art of Virtualization. In Proceedings of the ACM Symposium on Operating Systems Principles, 2003.
[11] M. Feldman, K. Lai, and L. Zhang. A Price-Anticipating Resource Allocation Mechanism for Distributed Shared Clusters. In Proceedings of the ACM Conference on Electronic Commerce, 2005.
[12] W. Feller. An Introduction to Probability Theory and its Applications, volume II. Wiley Eastern Limited, 1988.
[13] G. J. Hahn and W. Q. Meeker. Statistical Intervals: A Guide for Practitioners. John Wiley & Sons, Inc., New York, NY, USA, 1991.
[14] H. Hurst. Long term storage capacity of reservoirs. Proceedings of the American Society of Civil Engineers, 76(11), 1950.
[15] L. V. Kale, S. Kumar, M. Potnuru, J. DeSouza, and S. Bandhakavi. Faucets: Efficient resource allocation on the computational grid. In ICPP '04: Proceedings of the 2004 International Conference on Parallel Processing, pages 396-405, Washington, DC, USA, 2004. IEEE Computer Society.
[16] K. Lai. Markets are Dead, Long Live Markets. SIGecom Exchanges, 5(4):1-10, July 2005.
[17] J. K. MacKie-Mason, A. Osepayshvili, D. M. Reeves, and M. P. Wellman. Price prediction strategies for market-based scheduling. In ICAPS, pages 244-252, 2004.
[18] B. Mandelbrot, A. Fisher, and L. Calvet. The multifractal model of asset returns. In Cowles Foundation Discussion Papers: 1164. Yale University, 1997.
[19] B. Mandelbrot and R. L. Hudson. The (Mis)behavior of Markets: A Fractal View of Risk, Ruin, and Reward. Basic Books, New York, NY, USA, 2004.
[20] L. Peterson, T. Anderson, D. Culler, and T. Roscoe. Blueprint for Introducing Disruptive Technology into the Internet. In First Workshop on Hot Topics in Networking, 2002.
[21] O. Regev and N. Nisan. The Popcorn Market: Online Markets for Computational Resources. In Proceedings of the 1st International Conference on Information and Computation Economies, pages 148-157, 1998.
[22] T. Sandholm, K. Lai, J. Andrade, and J. Odeberg. Market-based resource allocation using price prediction in a high performance computing grid for scientific applications. In HPDC '06: Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, pages 132-143, June 2006. http://hpdc.lri.fr/index.php.
[23] O. Smirnova, P. Erola, T. Ekelöf, M. Ellert, J. Hansen, A. Konstantinov, B. Konya, J. Nielsen, F. Ould-Saada, and A. Wäänänen. The NorduGrid Architecture and Middleware for Scientific Applications. Lecture Notes in Computer Science, 2657:264-273, 2003.
[24] D. F. Vysochanskij and Y. I. Petunin. Justification of the 3-sigma rule for unimodal distributions. Theory of Probability and Mathematical Statistics, 21:25-36, 1980.
[25] C. A. Waldspurger, T. Hogg, B. A. Huberman, J. O. Kephart, and W. S. Stornetta. Spawn: A Distributed Computational Economy. IEEE Transactions on Software Engineering, 18(2):103-117, 1992.
[26] M. P. Wellman, D. M. Reeves, K. M. Lochner, and Y. Vorobeychik. Price prediction in a trading agent competition. J. Artif. Intell. Res. (JAIR), 21:19-36, 2004.
[27] W. Williams and M. Goodman. A simple method for the construction of empirical confidence limits for economic forecasts. Journal of the American Statistical Association, 66(336):752-754, 1971.
[28] R. Wolski, G. Obertelli, M. Allen, D. Nurmi, and J. Brevik. Predicting Grid Resource Performance On-Line. In Handbook of Innovative Computing: Models, Enabling Technologies, and Applications. Springer Verlag, 2005.
[29] R. Wolski, J. S. Plank, T. Bryan, and J. Brevik. G-commerce: Market formulations controlling resource allocation on the computational grid. In IPDPS '01: Proceedings of the 15th International Parallel and Distributed Processing Symposium, page 10046.2, Washington, DC, USA, 2001. IEEE Computer Society.
[30] F. Wu, L. Zhang, and B. A. Huberman. Truth-telling Reservations. http://arxiv.org/abs/cs/0508028, 2005.
[31] L. Xiao, Y. Zhu, L. M. Ni, and Z. Xu. GridIS: An incentive-based grid scheduling. In IPDPS '05: Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, page 65.2, Washington, DC, USA, 2005. IEEE Computer Society.