PROBABILISTIC MODELING OF SYSTEMATIC ERRORS IN TWO-HYBRID EXPERIMENTS

DAVID SONTAG

ROHIT SINGH

BONNIE BERGER

Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139. E-mail: {dsontag, rsingh, bab}@mit.edu

We describe a novel probabilistic approach to estimating errors in two-hybrid (2H) experiments. Such experiments are frequently used to elucidate protein-protein interaction networks in a high-throughput fashion; however, a significant challenge is their relatively high error rate, specifically a high false-positive rate. We describe a comprehensive error model for 2H data, accounting for both random and systematic errors. The latter arise from limitations of the 2H experimental protocol: in theory, the reporting mechanism of a 2H experiment should be activated if and only if the two proteins being tested truly interact; in practice, even in the absence of a true interaction, it may be activated by some proteins, either by themselves or through promiscuous interaction with other proteins. We describe a probabilistic relational model that explicitly models this phenomenon, and we use Markov Chain Monte Carlo (MCMC) algorithms to compute both the probability of an observed 2H interaction being true and the probability of individual proteins being self-activating/promiscuous. This is the first approach that explicitly models systematic errors in protein-protein interaction data; in contrast, previous work on this topic has modeled errors as being independent and random. By explicitly modeling the sources of noise in 2H systems, we find that we are better able to make use of the available experimental data. In comparison with Bader et al.'s method for estimating confidence in 2H predicted interactions, the proposed method performed 5-10% better overall, and in particular regimes improved prediction accuracy by as much as 76%.

Supplementary Information: http://theory.csail.mit.edu/probmod2H

1. Introduction
The fundamental goal of systems biology is to understand how the various components of the cellular machinery interact with each other and the environment. In pursuit of this goal, experiments for elucidating protein-protein interactions (PPI) have proven to be one of the most powerful tools available. Genome-wide, high-throughput PPI experiments have started to provide data that has already been used for a variety of tasks: predicting the function of uncharacterized proteins; analyzing the relative importance of proteins in signaling pathways; and gaining new perspectives in comparative genomics through cross-species comparisons of interaction patterns. Unfortunately, the quality of currently available PPI data is unsatisfactory, which limits its usefulness. Thus, techniques that enhance the availability of high-quality PPI data are of value.

In this paper, we aim to improve the quality of experimentally available PPI data by identifying erroneous datapoints from PPI experiments. We attempt to move beyond current one-size-fits-all error models that ignore the experimental source of a PPI datapoint; instead, we argue that a better error model will also have components tailored to account for the systematic errors of specific experimental protocols. This may help achieve higher sensitivity without sacrificing specificity. This motivated us to design an error model tailored to one of the most commonly used PPI experimental protocols.

We specifically focus on data from two-hybrid (2H) experiments [6,4], which are one of the most popular high-throughput methods to elucidate protein-protein interactions. Data from 2H experiments forms the majority of the known PPI data for many species, including D. melanogaster, C. elegans, and H. sapiens. However, currently available 2H data also has unacceptably high false-positive rates: von Mering et al. estimate that more than 50% of 2H interactions are spurious [11]. These high error rates seriously hamper analyses of the PPI data. As such, we believe an error model that performs better than existing models, even if it is tailored to 2H data, is of significant practical value, and may also serve as an example for the development of error models for other biological experiments.

[Author footnotes: "These authors contributed equally to the work"; "Corresponding author"; "Also in the MIT Dept. of Mathematics".]

Ideally, the reporting mechanism in a 2H experiment is activated if and only if the pair of proteins being tested truly interact. As in most experimental protocols, there are various sources of random noise. However, there are also systematic, repeatable errors in the data, originating from limitations in the 2H protocol. In particular, there exist proteins that are disproportionately prone to be part of false-positive observations (Fig. 1). It is thought that these proteins either activate the reporting mechanism by themselves or promiscuously bind with many other proteins in the particular setup (promiscuous binding is an experimental artifact; it does not imply a true interaction under plausible biological conditions).

Figure 1: The origin of systematic errors in 2H data. [Four cartoon panels: true positive (signal present), false negative (no signal), true negative (no signal), and false positive (signal present). Each panel contrasts the actual state of proteins A and B with the outcome of the 2H experiment at the promoter region of the reporter gene.] The cartoons demonstrate the mechanism of 2H experiments. Protein A is fused to the DNA binding domain of a particular transcription factor, while protein B is fused to the activation domain of that transcription factor. If A and B physically interact then the combined influence of their respective enhancers results in the activation of the reporter gene. Systematic errors in such experiments may arise in two ways. False negatives occur when two proteins which interact in vivo fail to activate the reporter gene under experimental conditions. False positives may occur due to proteins which trigger the reporting mechanism of the system, either by themselves (self-activation) or by spurious interaction with other proteins (promiscuity). Spurious interaction can occur when a protein is grossly over-expressed. In the figure, protein A in the lower right panel is such a protein: it may either promiscuously bind with B or activate the reporting mechanism even in the absence of B.

Contributions: The key contribution of this paper is a comprehensive error model for 2H experiments, accounting for both random and systematic errors, which is guided by insights into the systematic errors of the 2H experimental protocol. We believe this is the first model to account for both sources of error in a principled manner; in contrast, previous work on estimating error in PPI data has assumed that the error in 2H experiments (as in other experiments) is independent and random. Another contribution of the paper is a set of estimates of which proteins are especially likely to be self-activating/promiscuous (see Supp. Info.). Such estimates of "problem proteins" may enable the design of 2H experimental protocols with lower error rates.

We use the framework of Bayesian networks to encode our assumption that a 2H interaction is likely to be observed if the corresponding protein pair truly interacts or if either of the proteins is self-activating/promiscuous. The Bayesian framework allows us to represent the inherent uncertainty and the relationship between promiscuity of proteins, true interactions, and observed 2H data, while using all the data available to simultaneously learn the model parameters and predict the interactions. We use a Markov Chain Monte Carlo (MCMC) algorithm to do approximate probabilistic inference in our models, jointly inferring both desired sets of quantities: the probability of interaction, and the propensity of a protein for self-activation/promiscuity. We show how to integrate our error model into the two most common probabilistic models used for combining PPI experimental data, and show that our error model can significantly improve the accuracy of PPI prediction.

Related Work: With data from the first genome-wide 2H experiments (Ito et al. [6], Uetz et al. [4]), there came the realization that 2H experiments may have significant systematic errors. Vidalain et al. have identified the presence of self-activators as one of the sources of such errors, and described some changes in the experimental setup to reduce the problem [10]. Our work aims to provide a parallel, computational model of the problem, allowing post-facto filtering of data, even if the original experiment retained the errors. The usefulness of such an approach was recently demonstrated by Sun et al. [2] (to reconstruct transcriptional regulatory networks).

Previous computational methods of modeling systematic errors in PPI data can be broadly classified into two categories. The first class of methods [5,11,8] exploits the observation that if two very different experimental setups (e.g. 2H and Co-IP) observe a physical interaction, then the interaction is likely to be true. This is a reasonable assumption to make because the systematic errors of two different experimental setups are likely to be independent. However, this approach requires multiple costly and time-consuming genome-wide PPI experiments, and may still result in missed interactions, since the experiments have high false negative rates. Many of these approaches also integrate non-PPI functional genomic information, such as co-expression, co-localization, and Gene Ontology functional annotation. The second class of methods is based on the topological properties of the PPI networks. Bader et al. [1], in their pioneering work, used the number of 2H interactions per protein as a negative predictor of whether two proteins truly interact.
Since the prior probability of any interaction is small, disproportionately many 2H interactions involving a particular protein could be explained by it being self-activating or promiscuous. However, such an approach is unable to make fine-grained distinctions: an interaction involving a high-degree protein need not be incorrect, especially if there is support for it from other experiments. Furthermore, the high degree of a promiscuous protein in one experiment (e.g. Ito et al.'s) should not penalize interactions involving that protein observed in another experiment (e.g. Uetz et al.'s) if the errors are mostly independent (e.g. they use different reporters). Our proposed probabilistic models address each of these problems.

2. Data Sets
One difficulty with validating any PPI prediction method is that we must have a gold standard from which to say whether two proteins interact or do not interact. We constructed a gold standard data set of protein-protein interactions in S. cerevisiae (yeast) against which we could validate our methods. Our gold standard test set is an updated version of Bader et al.'s data. Bader et al.'s data consisted of all published interactions found by 2H experiments; data from experiments by Uetz et al. [4] (the Uetz2H data set) and Ito et al. [6] (the Ito2H data set) comprised the bulk of the data set. They also included as possible protein interactions all protein pairs that were at distance at most two in the 2H network. Bader et al. then used published Co-Immunoprecipitation (Co-IP) data to label these purported interactions. When two proteins were found in a bait-hit or hit-hit interaction in Co-IP, they were labeled as having a true interaction. When two proteins were very far apart in the Co-IP network (distance larger than three), they were labeled as not interacting.

We updated Bader et al.'s data to include all published 2H interactions through February 2006, obtained from the MIPS [7] database. We added, for the purposes of evaluation, recently published yeast Co-IP data from Krogan et al. [3]. This allowed us to significantly increase the number of labeled true and false interactions in our data set. Since the goal of our algorithms is to model the systematic errors in large-scale 2H experiments, we evaluated our models' performance on the test data where at least one of Uetz2H or Ito2H indicated an interaction. We were left with 397 positive examples, 2298 negative examples, and 2366 unlabeled interactions. We randomly chose 397 of the 2298 negative examples to be part of our test set. For all of the experiments we performed 4-fold cross validation on the test set, hiding one fourth of the labels while using the remaining labeled data during inference.
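The Co-IP labeling rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build the gold standard: the function names, the edge-list input, and the handling of proteins absent from the Co-IP network (left unlabeled) or unreachable within it (treated as far apart) are all assumptions made for illustration.

```python
from collections import deque

def build_graph(edges):
    """Undirected adjacency sets from a list of (protein, protein) Co-IP edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def distance_within(graph, src, dst, cutoff):
    """Breadth-first shortest-path length, or None if dst is more than `cutoff` hops away."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == cutoff:
            continue  # do not expand beyond the cutoff
        for nbr in graph.get(node, ()):
            if nbr == dst:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

def label_pair(coip_graph, a, b, far=3):
    """True = interacting, False = not interacting, None = unlabeled."""
    if a not in coip_graph or b not in coip_graph:
        return None   # assumption: no Co-IP evidence either way
    dist = distance_within(coip_graph, a, b, cutoff=far)
    if dist == 1:
        return True   # direct Co-IP (bait-hit or hit-hit) edge
    if dist is None:
        return False  # distance larger than `far` (assumption: also if unreachable)
    return None       # distance 2 or 3: left unlabeled

# Toy chain A-B-C-D-E; the protein names are placeholders.
coip = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")])
labels = {pair: label_pair(coip, *pair) for pair in [("A", "B"), ("A", "C"), ("A", "E")]}
```

Pairs at intermediate distance are deliberately left unlabeled, which mirrors the existence of the 2366 unlabeled interactions in the data set.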

3. Probabilistic Models
We show how to integrate our model of systematic errors into the two most common probabilistic models used for PPI prediction. Our first model is complementary to the relational probabilistic model proposed by Jaimovich et al. [8], and can be easily integrated into their approach. Our second model is an extension of Bader et al.'s, and will form the basis of our comparison. Our models also adjust to varying error rates in different experiments. For instance, while we account for random noise and false negatives in our error model for both Uetz2H and Ito2H, we only model self-activation/promiscuity for Ito2H observations. The Uetz2H data set was smaller and included only one protein with degree larger than 20; Ito2H had 36 proteins with degree larger than 30, including one with degree as high as 285. Thus, while modeling promiscuity made a big difference for the Ito2H data, it did not significantly affect our results on the Uetz2H data.

3.1. Generative model

We begin with a simplified model of PPI interaction (Fig. 2). We represent the uncertainty about a protein interaction as an indicator random variable X_ij, which is 1 if proteins i and j truly interact, and 0 otherwise. For each experiment, we construct corresponding random variables (RVs) indicating whether i and j have been observed to interact under that experiment. Thus, U_ij is the observed RV representing the observation from Uetz2H, and I_ij is the observed RV representing the observation from Ito2H. The arrow from X_ij to I_ij indicates the dependency of I_ij on X_ij. The box surrounding the three RVs indicates that this template of three RVs is repeated for all i, j = 1, ..., N (i.e. all pairs of proteins), where N is the number of proteins. In all models of this type, the I_ij RVs are assumed to be independent of one another.

If an experiment provides extra information about each observation, the model can be correspondingly enriched. For instance, for each of their observed interactions Ito et al. provide the number of times the interaction was discovered (called the number of IST hits). Rather than making I_ij binary, we have it equal the number of IST hits, or 3 if IST > 3. We will refer to the portion of Ito2H observations with IST ≥ 3 as ItoCore.

The model is called "generative" because the ground truth about the interaction, X_ij, generates the observations in the 2H experiments, I_ij and U_ij. To our knowledge, all previous generative models of experimental interactions made the assumption that I_ij depended only on X_ij. They allowed for false positives by saying that Pr(I_ij > 0 | X_ij = 0) = fp, where fp is a parameter of their model. Similarly, they allowed for false negatives by saying that Pr(I_ij = 0 | X_ij = 1) = fn, for another parameter fn. However, these models are missing much of the picture.
For example, many experiments have particular difficulty testing the interactions of proteins along the membrane. For these proteins, fn should be significantly higher. In the 2H experiment, for interactions that involve self-activating/promiscuous proteins, fp will be significantly higher.

Figure 2: Generative model. [Plate diagram over X_ij, U_ij, and I_ij, repeated for i, j = 1, ..., N.]

Figure 3: Generative model, with noise variables. [Plate diagram over X_ij, U_ij, I_ij, O^U_ij, and O^I_ij, repeated for i, j = 1, ..., N, and F_k, repeated for k = 1, ..., N.]

In Figs. 2 and 3, clear nodes are unobserved (latent) RVs, and shaded nodes are observed RVs.

In Fig. 3, we propose a novel probabilistic model in which the self-activating/promiscuous tendencies of particular proteins are explicitly modeled. The latent Bernoulli RV F_k is 1 if protein k is believed to be promiscuous or self-activating. In the context of our data set, this RV applies specifically to the Ito2H data; if self-activation/promiscuity in multiple experiments is to be modeled, we may introduce multiple such variables F^H_k (for protein k and experiment H). The I_ij RV thus depends on F_i and F_j. Intuitively, I_ij will be > 0 if either X_ij = 1 or F_k = 1 for k = i or k = j. As we show later in the Results section, this model of noise is significantly more powerful than the earlier model, because it allows for the "explaining away" of false positives in Ito2H. Furthermore, it allows evidence from data sets other than Ito2H to influence (through the X_ij RVs) the determination of the F_k RVs.

We also added the latent variables O^U_ij and O^I_ij, which will be 1 if the Uetz et al. and Ito et al. experiments, respectively, have the capacity to observe a possible interaction between proteins i and j. These RVs act to explain away the false negatives in Uetz2H and Ito2H. We believe that these RVs will be particularly useful for species where we have relatively little PPI data. The distributions in these models all have Dirichlet priors with associated hyperparameters (see Supp. Info. for more details).

There are many advantages to using the generative model described in this section. First, it can easily handle missing data without adding complexity to the inference procedure. This is important when integrating additional experimental data into the model. Suppose, for example, that we use gene expression correlation as an additional signal of protein interaction, by introducing new RVs E_ij (indicating coexpression of genes i and j) and corresponding edges X_ij → E_ij. If, for a pair of proteins, the coexpression data is unavailable, we simply omit the corresponding E_ij from this model. In Bader et al.'s model, and in the second model that we propose below, we would need to integrate over possible values of the missing datapoint, a potentially complicated task.
Second, this generative model can be easily extended: e.g., we could easily combine this model with Jaimovich et al.'s in order to model the common occurrence of transitive closure in PPIs.
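The "explaining away" behavior of the noise variables can be checked by hand on a single protein pair. The following Python sketch enumerates the eight joint settings of (X_ij, F_i, F_j) and computes the posterior of X_ij given a positive Ito2H observation. It is a simplification under stated assumptions: the observation is binary rather than an IST count, the conditional table is a hand-fixed OR-style table instead of a distribution with a Dirichlet prior, and the numerical values of `p_on`, `p_off`, `prior_x`, and `prior_f` are hypothetical.

```python
from itertools import product

def p_obs(i_obs, x, f_i, f_j, p_on=0.9, p_off=0.01):
    """P(I_ij = i_obs | X_ij, F_i, F_j): signal is likely if any cause is active."""
    p_signal = p_on if (x or f_i or f_j) else p_off
    return p_signal if i_obs else 1.0 - p_signal

def posterior_x(i_obs, prior_x=0.1, prior_f=0.05, known_f_i=None):
    """P(X_ij = 1 | I_ij = i_obs), optionally also conditioning on a known F_i."""
    numerator = denominator = 0.0
    for x, f_i, f_j in product((0, 1), repeat=3):
        if known_f_i is not None and f_i != known_f_i:
            continue  # inconsistent with the evidence on F_i
        weight = prior_x if x else 1.0 - prior_x
        if known_f_i is None:
            weight *= prior_f if f_i else 1.0 - prior_f
        weight *= prior_f if f_j else 1.0 - prior_f
        weight *= p_obs(i_obs, x, f_i, f_j)
        denominator += weight
        if x:
            numerator += weight
    return numerator / denominator

baseline = posterior_x(1)                   # positive observation, F_i unknown
promiscuous = posterior_x(1, known_f_i=1)   # protein i known to be promiscuous
clean = posterior_x(1, known_f_i=0)         # protein i known not to be promiscuous
```

Under these assumed numbers, learning that protein i is promiscuous pulls the posterior of a true interaction back to its prior, while learning that it is not promiscuous raises the posterior above the baseline; this is the qualitative effect the F_k variables are meant to capture.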


Figure 4: Bader et al.'s logistic regression model (BaderLR). [Diagram not reproduced.]

Figure 5: Our Bayesian logistic model, with noise variables (BayesLR). [Diagram not reproduced.]

3.2. Bayesian logistic model

In Fig. 4 we show Bader et al.'s model (BaderLR); it includes three new variables in addition to the RVs already mentioned, whose values are pre-calculated using the 2H network. Two of these encode topological information: variable A_ij is the number of adjacent proteins in common between i and j, and variable D_ij is ln(d_i + 1) + ln(d_j + 1), where d_i is the degree of protein i. Variable L_ij is an indicator variable for whether this protein interaction has been observed in any low-throughput experiments. In Bader et al.'s model, I^C_ij is an indicator variable representing whether the interaction between proteins i and j was in the ItoCore data set (IST ≥ 3). X_ij's conditional distribution is given by the logistic function:

$$p(X_{ij} = 1) = \frac{1}{1 + \exp\left(w_{\mathrm{offset}} + U_{ij} w_U + I^C_{ij} w_I + L_{ij} w_L + A_{ij} w_A + D_{ij} w_D\right)}.$$
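The topological features and the logistic form above can be sketched in Python as follows. The helper names and the edge-list representation are assumptions, and edges are assumed to be unique undirected pairs. For the usage example we reuse the average MAP weights that the Results section reports for BayesLR (where the Ito input is I^T_ij and the A_ij and D_ij terms are absent), purely to illustrate the sign convention: because the exponent enters with a positive sign, negative weights increase the probability of interaction.

```python
import math

def degree_feature(edges, i, j):
    """D_ij = ln(d_i + 1) + ln(d_j + 1), where d_k is the 2H degree of protein k."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return math.log(degree.get(i, 0) + 1) + math.log(degree.get(j, 0) + 1)

def common_neighbors(edges, i, j):
    """A_ij = number of proteins adjacent to both i and j in the 2H network."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return len(neighbors.get(i, set()) & neighbors.get(j, set()))

def interaction_probability(features, weights, w_offset):
    """p(X_ij = 1) = 1 / (1 + exp(w_offset + sum_k feature_k * w_k))."""
    z = w_offset + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(z))

# Average MAP weights reported for BayesLR in the Results section.
map_weights = {"U": -2.32, "L": -10.85, "I": -4.26}
w_offset = 7.34

p_none = interaction_probability({"U": 0, "L": 0, "I": 0}, map_weights, w_offset)
p_uetz_ito = interaction_probability({"U": 1, "L": 0, "I": 1}, map_weights, w_offset)
p_lit = interaction_probability({"U": 0, "L": 1, "I": 0}, map_weights, w_offset)
```

With these weights, a pair with no supporting evidence has a very small probability, agreement between Uetz2H and Ito2H raises it substantially, and a low-throughput observation alone pushes it close to one. Note also that D_ij > 6 corresponds to (d_i + 1)(d_j + 1) > e^6, roughly 403.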

The weights w are discriminatively learned using the Iterative Re-weighted Least Squares (IRLS) algorithm, which requires that all of the above quantities are observed in the training data.

In Fig. 5 we propose a new model (BayesLR), with two significant differences. First, we no longer use the two proteins' degree, D_ij, and instead integrate our noise model in the form of the F_k random variables. Second, instead of learning the model using IRLS, we assign the weights uninformative priors and do inference via Markov Chain Monte Carlo (MCMC). This will be necessary because X_ij will have an unobserved parent, I^T_ij. The new RV I^T_ij will be 1 when the Ito et al. experiment should be considered for predicting X_ij. Intuitively, its value should be (I_ij > 0) ∧ ¬(F_i ∨ F_j). However, to allow greater flexibility, we give the conditional distribution for I^T_ij a Dirichlet prior, resulting in a noisy version of the above logical expression. The RVs O_ij are not needed in this logistic model because the parameterization of the X_ij conditional distribution induces a type of noisy-OR distribution in the posterior. Thus, logistic models can easily handle false negatives.

Because we wanted to highlight the advantages of modeling the experimental noise, we omitted A_ij (one-hop) from both models, BayesLR and BaderLR. The one-hop signal, gene expression, co-localization, etc. can be easily added to any of the models to improve their prediction ability.

3.3. Inference

As is common in probabilistic relational models, the parameters for the conditional distributions of each RV are shared across all of their instances. For example, in the generative model, the prior probability Pr(X_ij = 1) is the same for all i and j. With the exception of X_ij in BayesLR, we gave all the distributions a Dirichlet prior. In BayesLR, the conditional distribution of X_ij is the logistic function, and its weights are given Gaussian priors with mean µ_X = 0 and variance σ²_X = 0.01. Note that by specifying these hyperparameters (e.g. µ_X, σ²_X), we never need to do learning of the parameters (i.e., weights). Given the relational nature of our data, and the relatively small amount of it, we think that this Bayesian approach is well-suited. We prevent the models from growing too large by only including protein pairs where at least one experiment hinted at an interaction.

We used BUGS [9] to do inference via Gibbs sampling. We ran 12 MCMC chains for 6000 samples each, from which we computed the desired marginal posterior probabilities. The process is simple enough that someone without much knowledge of machine learning could take our probabilistic models (which we provide in the Supplementary Information) and use them to interpret the results of their 2H experiments.
We also tried using loopy belief propagation instead of MCMC to do approximate inference in the generative model of Fig. 3. These results (see Supp. Info.) were very similar, showing that we are likely not being hurt by our choice of approximate inference method. Furthermore, our implementation of the inference algorithm (in Java) takes only seconds to run, and would easily scale to larger problems.
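To make the inference step concrete, here is a minimal hand-written Gibbs sampler in Python for a stripped-down version of the generative model with noise variables. It is a sketch under strong assumptions, not the BUGS model from the Supplementary Information: observations are binary, there is a single chain, the O_ij variables are omitted, and the parameters `P_ON`, `P_OFF`, `PRIOR_X`, and `PRIOR_F` are fixed by hand (all hypothetical values) instead of being given Dirichlet priors.

```python
import random

P_ON, P_OFF = 0.9, 0.01        # assumed P(I_ij = 1) with / without an active cause
PRIOR_X, PRIOR_F = 0.1, 0.05   # assumed fixed priors for X_ij and F_k

def lik(obs, x, f_a, f_b):
    """P(I_ij = obs | X_ij = x, F_i = f_a, F_j = f_b) under an OR-style table."""
    p = P_ON if (x or f_a or f_b) else P_OFF
    return p if obs else 1.0 - p

def gibbs(observed, n_sweeps=6000, burn_in=1000, seed=0):
    """observed: {(i, j): 0 or 1}. Returns estimates of P(X_ij = 1) and P(F_k = 1)."""
    rng = random.Random(seed)
    pairs = list(observed)
    proteins = sorted({p for pair in pairs for p in pair})
    x = {pair: 0 for pair in pairs}
    f = {k: 0 for k in proteins}
    x_count = {pair: 0 for pair in pairs}
    f_count = {k: 0 for k in proteins}
    for sweep in range(n_sweeps):
        # Resample each F_k given the X's and the other F's.
        for k in proteins:
            weight = {}
            for value in (0, 1):
                f[k] = value
                w = PRIOR_F if value else 1.0 - PRIOR_F
                for (i, j) in pairs:
                    if k in (i, j):
                        w *= lik(observed[(i, j)], x[(i, j)], f[i], f[j])
                weight[value] = w
            f[k] = 1 if rng.random() < weight[1] / (weight[0] + weight[1]) else 0
        # Resample each X_ij given the F's of its two proteins.
        for (i, j) in pairs:
            w1 = PRIOR_X * lik(observed[(i, j)], 1, f[i], f[j])
            w0 = (1.0 - PRIOR_X) * lik(observed[(i, j)], 0, f[i], f[j])
            x[(i, j)] = 1 if rng.random() < w1 / (w0 + w1) else 0
        if sweep >= burn_in:
            for pair in pairs:
                x_count[pair] += x[pair]
            for k in proteins:
                f_count[k] += f[k]
    n = n_sweeps - burn_in
    return ({pair: c / n for pair, c in x_count.items()},
            {k: c / n for k, c in f_count.items()})

if __name__ == "__main__":
    # Protein 0 is a hub with ten observed partners; (11, 12) is an isolated pair.
    data = {(0, j): 1 for j in range(1, 11)}
    data[(11, 12)] = 1
    p_x, p_f = gibbs(data)
    print(p_f[0], p_x[(0, 1)], p_x[(11, 12)])
```

In this toy setting the sampler attributes the hub's many observations to promiscuity, so each individual hub interaction ends up less credible than the isolated observation. Marginals are estimated as post-burn-in sample averages, which is also how several chains could be pooled.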

4. Results
We compared the proposed Bayesian logistic model (BayesLR) with the model based on Bader et al.'s work (BaderLR). Both models were trained and tested on the new, updated version of Bader et al.'s gold standard data set. We show in Fig. 6 that BayesLR achieves 5-10% higher accuracy at most points along the ROC curve. We then checked that the improvement was really coming from the noise model, and not just from our use of unlabeled data and MCMC. We tried using a modified BayesLR model (called Bayesian Bader) which has D_ij RVs instead of the noise model, and which uses ItoCore instead of Ito2H. As expected, it performed the same as BaderLR. We also tried modifying this model to use Ito2H, and found that the resulting performance was much worse.

Investigating this further, we found that the average maximum a posteriori (MAP) weights for BayesLR were {w_U = -2.32, w_L = -10.85, w_I = -4.26, and w_offset = 7.34}. The weight corresponding to Ito2H is almost double the weight for Uetz2H. Interestingly, this is a similar ratio of weights as would be learned had we only used the ItoCore data set, as in BaderLR. In the last of the above-mentioned experiments, the MAP weight for Ito2H was far smaller in magnitude than the weight for Uetz2H, which indicates that Uetz2H was a stronger signal than Ito2H. Overall, these experiments demonstrate that we can get significantly better performance using data with many false positives (Ito2H) and a statistical model of the noise than by using prefiltered data (ItoCore) and no noise model.

In all regimes of the ROC curve, BayesLR performs at least as well as BaderLR; in some, it performs significantly better (Fig. 8). The examples that follow demonstrate the weaknesses inherent in BaderLR and show how the proposed model BayesLR addresses these problems. When IRLS learns the weight for the degree variable (in BaderLR), it must trade off having too high a weight, which would cause other features to be ignored, and having too low a weight, which would insufficiently penalize the false positives caused by self-activation/promiscuity. In BaderLR, a high degree D_ij penalizes positive predictors from all the experiments (U_ij, I_ij, L_ij).
However, the degree of a protein in a particular experiment (say, Ito et al.'s) only gives information about self-activation/promiscuity of the protein in that experiment. Thus, if a protein has a high degree in one experiment, even if that experiment did not predict an interaction (involving some other protein), the degree will negatively affect any predictions made by other experiments on that protein. Our proposed models solve this problem by giving every experiment a different noise model, and by having each noise model be conditionally independent given the X_ij variables. Thus, we get the desired property that noise in one experiment should not affect the influence of other experiments on the X_ij variables.

Figure 6: Comparison of logistic models. [ROC curves (true positive rate vs. false positive rate) for: Bayesian LR with noise model; Bader; Bayesian Bader; Bayesian Bader with full Ito; Random.]

Figure 7: Comparison of generative models. [ROC curves (true positive rate vs. false positive rate) for: Noise model (Ito ISTs); No noise variables (Ito Core); No noise variables (Ito ISTs); No noise variables (Ito Naive); Random.]

Figure 8: Examples of regimes where the noise model is particularly helpful. In parentheses we give the number of test cases that fall into each category. [ROC curves for Bayesian LR with noise model, Bader, and Random in four panels: (a) Medium degree and Uetz2H = 1 or Lit = 1 (29 neg and 115 pos); (b) High degree (107 neg and 50 pos); (c) No ItoCore (342 neg and 211 pos); (d) No signal (286 neg and 64 pos).]

Fig. 8(a) illustrates this by showing the prediction accuracy for the test points where D_ij > 4 and U_ij = 1 or L_ij = 1 (called the "medium" degree range). When the degree of a protein is very high, BaderLR will always classify interactions involving it as false positives. Fig. 8(b) shows the setting of D_ij > 6 (recall that D_ij is on a log scale, and is the sum for both proteins). With a false positive rate of less than 1%, BaderLR detects 42% of the true interactions, while BayesLR detects 74% of the true interactions, a 76% improvement.

Bader et al. found that they got better performance by using only a subset (where IST ≥ 3) of the interactions in Ito2H. Our noise model allows us to make use of all of the predicted interactions, without hurting our overall results. As a result, our predictions for the protein pairs where Bader et al.'s model ignored Ito2H's interactions (i.e. IST < 3) are considerably more accurate. This is illustrated in Fig. 8(c). Finally, we show in Fig. 8(d) that at the very extreme, when neither ItoCore, nor the low-throughput 2H experiments (Lit), nor Uetz2H showed an interaction, we can still make meaningful predictions, using a combination of the noise model and the observed interactions in Ito et al. where IST < 3.

We next compared the various generative models, with the results shown in Fig. 7. Naively implementing the generative model of Fig. 2, using an indicator variable for whether the interaction was observed in Ito2H, results in the worst performance. Changing the indicator variable to a discretized IST count significantly improves performance. Using our noise model (i.e. the model from Fig. 3) provides further improvements, especially in the lower left corner, where the previous two had performed poorly. However, if we remove the noise model and instead pre-filter the data as Bader et al. did, using an indicator variable for whether IST ≥ 3 in Ito2H, we can get almost as good performance using the simple generative model of Fig. 2. The noise model still does better in the upper half of the ROC curve, which is arguably where it matters the most. It is also interesting that our noise model is able to recover the accuracy of the hand-filtered IST ≥ 3 criterion.

We then applied the BayesLR model to the full data set to identify proteins in the Ito2H data which are likely to be self-activating/promiscuous (see Supplementary Information).
As expected, most proteins with high degree in Ito2H (e.g. YPR086W, degree 99) had a high probability of being self-activating/promiscuous. However, three of the proteins with high degree (YER022W, degree 98; YGL127C, degree 68; and YGR218W, degree 34) had very low probabilities. These differences in promiscuity estimates make sense: for example, there were no positively labeled 2H examples involving YPR086W, while there were five involving YER022W. This propagation of information is precisely what we hoped to capture by using our Bayesian framework. When applying this model to new species where no labeled data is available, the inclusion of additional signals (e.g. co-expression) should result in the same effect. (Note that when no labeled data is available, it might be helpful to fix the model parameters to their MAP values from experiments on related species.)
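The operating-point comparison quoted in this section (true positive rate at a bounded false positive rate, and the resulting relative improvement) can be reproduced with a short Python sketch. The function names are hypothetical and the scores in the usage example are made up; only the 42% and 74% detection rates come from the text.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points obtained by lowering the decision threshold step by step."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(1 for label in labels if label)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    start = 0
    while start < len(ranked):
        end = start
        # Examples with tied scores cross the threshold together.
        while end < len(ranked) and ranked[end][0] == ranked[start][0]:
            if ranked[end][1]:
                tp += 1
            else:
                fp += 1
            end += 1
        points.append((fp / n_neg, tp / n_pos))
        start = end
    return points

def tpr_at_fpr(scores, labels, max_fpr):
    """Best true positive rate achievable with false positive rate <= max_fpr."""
    return max(tpr for fpr, tpr in roc_points(scores, labels) if fpr <= max_fpr)

def relative_improvement(baseline, improved):
    """Fractional gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline

# Made-up scores for six test pairs (1 = true interaction, 0 = non-interaction).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_tpr = tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0)

# Detection rates quoted in the text for the high-degree regime (FPR below 1%).
gain = relative_improvement(0.42, 0.74)
```

Here `gain` is (0.74 - 0.42) / 0.42, approximately 0.76, which is the 76% improvement quoted for the high-degree regime. The sketch requires at least one positive and one negative example.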


5. Conclusion
In this paper, we have presented a principled approach to modeling the random and systematic sources of error in two-hybrid experiments, and showed how to integrate our noise models into the two most common probabilistic models for integrating PPI data. Comparisons with previous work demonstrate that explicit modeling of the sources of error can improve protein-protein interaction prediction, making better use of experimental data. Future work could involve discriminative training of the generative models, investigation of systematic sources of noise in other biological experiments such as Co-IP, and applying noise models to the Markov networks of Jaimovich et al. and possibly even to a first-order probabilistic model, where more intricate properties of proteins can be described and jointly predicted.

Acknowledgments: The authors thank Chris Bakal, Leslie Kaelbling, Luke Zettlemoyer, Dan Roy, and Tommi Jaakkola for useful comments. D.S. and R.S. were partially supported by an NSF Graduate Research Fellowship and NSF grant ITR (ASE + NIH)-(dms)-0428715, respectively.

References
1. J.S. Bader et al. Gaining confidence in high-throughput protein interaction networks. Nat Biotech, 22(1):78-85, January 2004.
2. N. Sun et al. Bayesian error analysis model for reconstructing transcriptional regulatory networks. PNAS, 103(21):7988-7993, 2006.
3. Nevan J. Krogan et al. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature, 2006.
4. Peter Uetz et al. A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403(6770):623-627, February 2000.
5. R. Jansen et al. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science, 302(5644):449-453, October 2003.
6. Takashi Ito et al. A comprehensive two-hybrid analysis to explore the yeast protein interactome. PNAS, 98(8):4569-4574, 2001.
7. U. Guldener et al. CYGD: the comprehensive yeast genome database. Nucleic Acids Research, 33(Database issue):D364-8, 2005.
8. Ariel Jaimovich, Gal Elidan, Hanah Margalit, and Nir Friedman. Towards an integrated protein-protein interaction network: A relational Markov network approach. Journal of Computational Biology, 13(2):145-164, 2006.
9. D. J. Lunn, A. Thomas, N. G. Best, and D. J. Spiegelhalter. WinBUGS - a Bayesian modelling framework: concepts, structure and extensibility. Statistics and Computing, 10:321-333, 2000.
10. Pierre-Olivier Vidalain, Mike Boxem, Hui Ge, Siming Li, and Marc Vidal. Increasing specificity in high-throughput yeast two-hybrid experiments. Methods, 32:363-370, 2004.
11. Christian von Mering et al. Comparative assessment of large-scale data sets of protein-protein interactions. Nature, 417:399-403, 2002.