ML: answers HW5

1. The variance is given by the formula
σ2 = ∑ (Xi – μ)2 ⋅ P(Xi)
many of you didn’t complete the process.

2. This problem does use a t-test. A t-test is used when you don’t know the variance and you are estimating it from the sample. Since I said sample standard deviation, that would apply. Otherwise, you would use a normal distribution.

4. Normalization is about scaling each attribute to the same range.

5. the sum of squared distances (Euclidean distance) is a good distance function. In the case of binary attributes, this is just the number of attributes that are different.