By J. Vaidya, et al.

**Additional resources for Privacy Preserving Data Mining**

This method faces an obvious problem if sites collude. Sites $P_{l-1}$ and $P_{l+1}$ can compare the values they send and receive to determine the exact value of $x_l$. The method can be extended to work for an honest majority. Each site divides $x_l$ into shares, and the sum for each share is computed individually. However, the path used is permuted for each share, such that no site has the same neighbor twice. To compute $x_l$, the neighbors of $P_l$ from each iteration would have to collude. Varying the number of shares varies the number of dishonest (colluding) parties required to violate security.
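The share-based extension can be simulated in a single process. The sketch below is illustrative only and is not the book's protocol text: the function names, the modulus, and the way ring orders are built (walking the sites with a different step size each round, using only steps coprime to the number of sites) are all assumptions. That construction is one simple way to guarantee that no site has the same neighbor in two rounds.

```python
import random
from math import gcd


def split_into_shares(value, num_shares, modulus, rng):
    """Split value into random shares that sum to value modulo modulus."""
    shares = [rng.randrange(modulus) for _ in range(num_shares - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def ring_orders(num_sites, num_rounds):
    """Build one ring order per round so that no site repeats a neighbor.

    Round with step s visits sites 0, s, 2s, ... (mod num_sites). Distinct
    steps in 1..(num_sites - 1) // 2 give each site distinct neighbors.
    """
    steps = [s for s in range(1, (num_sites - 1) // 2 + 1)
             if gcd(s, num_sites) == 1]
    if len(steps) < num_rounds:
        raise ValueError("not enough neighbor-disjoint rings for this many shares")
    return [[(k * s) % num_sites for k in range(num_sites)]
            for s in steps[:num_rounds]]


def ring_secure_sum(values, order, modulus, rng):
    """One masked pass around the ring; the first site in order adds the mask."""
    mask = rng.randrange(modulus)  # known only to the initiating site
    running = mask
    for site in order:
        running = (running + values[site]) % modulus
    return (running - mask) % modulus


def shared_secure_sum(private_values, num_shares, modulus, seed=0):
    """Sum private values by summing each share over a differently ordered ring."""
    rng = random.Random(seed)
    num_sites = len(private_values)
    shares = [split_into_shares(v, num_shares, modulus, rng)
              for v in private_values]
    total = 0
    for round_index, order in enumerate(ring_orders(num_sites, num_shares)):
        round_values = [shares[site][round_index] for site in range(num_sites)]
        total = (total + ring_secure_sum(round_values, order, modulus, rng)) % modulus
    return total


print(shared_secure_sum([12, 7, 30, 5, 21], num_shares=2, modulus=1000))  # 75
```

With five sites and two shares, site 0 has neighbors 4 and 1 in the first round and 3 and 2 in the second, so all four would need to collude to recover its value.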

Probabilities are computed differently for nominal and numeric attributes.

Nominal Attributes: For a nominal attribute $X$ with $r$ possible attribute values $x_1, \ldots, x_r$, the probability $P(X = x_k \mid c_j) = n_j / n$, where $n$ is the total number of training examples for which $C = c_j$, and $n_j$ is the number of those training examples that also have $X = x_k$.

Numeric Attributes: In the simplest case, numeric attributes are assumed to have a "normal" or "Gaussian" probability distribution. The mean $\mu$ and variance $\sigma^2$ are calculated for each class and each numeric attribute from the training set.
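As a minimal sketch of these two estimates, the code below computes the nominal ratio of counts and the per-class mean and variance on a tiny hypothetical dataset. The row layout, the attribute names, and the use of the population variance (dividing by the class count) are assumptions made for illustration. The density function is the standard Gaussian formula.

```python
import math
from statistics import fmean, pvariance


def nominal_probability(rows, attribute, value, class_label):
    """Estimate P(attribute = value | class) as a ratio of counts."""
    in_class = [r for r in rows if r["class"] == class_label]
    n = len(in_class)  # examples with C = c_j (assumed nonzero)
    n_j = sum(1 for r in in_class if r[attribute] == value)
    return n_j / n


def gaussian_parameters(rows, attribute, class_label):
    """Mean and population variance of a numeric attribute within one class."""
    values = [r[attribute] for r in rows if r["class"] == class_label]
    return fmean(values), pvariance(values)


def gaussian_density(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


# Hypothetical training data, for illustration only.
rows = [
    {"outlook": "sunny", "temp": 20.0, "class": "yes"},
    {"outlook": "rainy", "temp": 22.0, "class": "yes"},
    {"outlook": "sunny", "temp": 24.0, "class": "yes"},
    {"outlook": "sunny", "temp": 30.0, "class": "no"},
    {"outlook": "rainy", "temp": 32.0, "class": "no"},
]

p_sunny_given_yes = nominal_probability(rows, "outlook", "sunny", "yes")
mean_yes, var_yes = gaussian_parameters(rows, "temp", "yes")
density_at_23 = gaussian_density(23.0, mean_yes, var_yes)
```

Here two of the three "yes" examples are sunny, so the nominal estimate is 2/3, and the "yes" temperatures have mean 22 and variance 8/3.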

Since there are no drivers under age 16 (as determined from the reconstructed distribution), a driver whose age is given as 1 in the "privacy-preserving" dataset is known to be 16 years old; all privacy for this individual is lost. One measure ties privacy to entropy: if a random variable $Y$ has entropy $H(Y)$, the privacy is $2^{H(Y)}$. This has the nice property that for a uniform distribution, the privacy is equivalent to the width of the interval from which the random value is chosen. This gives a meaningful way to compare different noise distributions.
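The uniform-distribution property is easy to check numerically. The sketch below assumes the entropy in question is differential entropy measured in bits, and uses the standard closed forms for uniform and Gaussian distributions. The function names are illustrative.

```python
import math


def uniform_entropy_bits(width):
    """Differential entropy (bits) of a uniform distribution on an interval."""
    return math.log2(width)


def gaussian_entropy_bits(sigma):
    """Differential entropy (bits) of a normal distribution with std dev sigma."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)


def privacy(entropy_bits):
    """Entropy-based privacy measure: 2 raised to the entropy."""
    return 2.0 ** entropy_bits


# Uniform noise on an interval of width 30 has privacy equal to that width.
uniform_privacy = privacy(uniform_entropy_bits(30.0))

# Gaussian noise with sigma = 1 has privacy sqrt(2 * pi * e), roughly 4.13.
gaussian_privacy = privacy(gaussian_entropy_bits(1.0))
```

Under this measure, Gaussian noise with standard deviation $\sigma$ offers the same privacy as uniform noise on an interval of width about $4.13\sigma$, which is the kind of comparison across noise distributions the measure allows.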