The Bayesian Scoring

The formula for updating the probability of a function F, given a clue from a protein feature (referred to as a feature-clue):

p(F|feature-clue) = p(F) * p(feature-clue|F) / Z

p(F|feature-clue) = The posterior (updated) probability of function F given the feature-clue
p(F) = The prior probability of function F
p(feature-clue|F) = The likelihood, i.e. the probability of observing the feature-clue given function F
Z = Normalization constant, the sum of p(feature-clue|F) * p(F) over all candidate functions F
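
The update above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name bayes_update, the dictionary layout and the numbers for the two functions F1 and F2 are assumptions made for the example, not part of the original method description.

```python
def bayes_update(prior, likelihood):
    """Return p(F|feature-clue) for every candidate function F.

    prior:      dict mapping function name -> p(F)
    likelihood: dict mapping function name -> p(feature-clue|F)
    (Both argument layouts are assumptions made for this sketch.)
    """
    # Numerator of the formula for each function: p(F) * p(feature-clue|F)
    unnormalized = {f: prior[f] * likelihood[f] for f in prior}
    # Z: the sum of the numerators over all candidate functions
    z = sum(unnormalized.values())
    if z == 0:
        raise ValueError("the clue rules out every candidate function")
    return {f: value / z for f, value in unnormalized.items()}


# Hypothetical numbers, chosen only for illustration
prior = {"F1": 0.7, "F2": 0.3}
likelihood = {"F1": 0.2, "F2": 0.6}
posterior = bayes_update(prior, likelihood)
print(posterior)
```

With these made-up numbers Z = 0.7 * 0.2 + 0.3 * 0.6 = 0.32, so the posterior is 0.4375 for F1 and 0.5625 for F2: the clue favors F2 strongly enough to overturn the prior.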

We assume that analyzing a protein reveals certain descriptors, called "features", which give clues to a set of functions. These features could be the sequence of the protein, its fold, motifs, functional linkages, etc. The strength of a clue to a function is estimated by an evidence value (essentially a weight). All such pieces of evidence are combined using Bayes' theorem to arrive at an updated weight for each function. The final result is a set of functions and weights corresponding to the evidence found in the structure.

A dummy example of how it works

        Clue 1                 Clue 2                 Clue 3
Weight  Function       Weight  Function       Weight  Function
0.45    Function 1     0.50    Function 1     0.33    Function 1
0.45    Function 2     0.19    Function 2     0.33    Function 2
0.10    Function 3     0.31    Function 3     0.33    Function 3


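The prior probability is 1/3 for each function at the first step, i.e. the three functions are equiprobable. At each later step, the posterior from the previous clue serves as the prior.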
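
The dummy example can be worked through step by step as a rough sketch. It assumes that each clue's weights play the role of the likelihood p(feature-clue|F) and that the clues are applied one after another, each posterior becoming the next prior; the variable names are made up for the example.

```python
functions = ["Function 1", "Function 2", "Function 3"]

# Weights from the dummy table, one row per clue, in the order of `functions`.
# Assumption: each weight is used as the likelihood p(feature-clue|F).
clue_weights = [
    [0.45, 0.45, 0.10],  # Clue 1
    [0.50, 0.19, 0.31],  # Clue 2
    [0.33, 0.33, 0.33],  # Clue 3
]

# Equiprobable prior for the first step
belief = [1 / 3] * len(functions)

for step, weights in enumerate(clue_weights, start=1):
    # p(F) * p(feature-clue|F) for each function
    unnormalized = [p * w for p, w in zip(belief, weights)]
    # Normalization constant Z
    z = sum(unnormalized)
    # The posterior becomes the prior for the next clue
    belief = [u / z for u in unnormalized]
    print(f"after clue {step}:", [round(p, 3) for p in belief])
```

Under these assumptions the final weights are about 0.659, 0.250 and 0.091 for Functions 1, 2 and 3. Clue 3 weights all three functions equally, so it leaves the result of Clue 2 unchanged.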