tags data
2023 Educational Data Mining and Applications HW1.pdf
Five number summary: min, Q1, median, Q3, max
| min | Q1 | median | Q3 | max |
|---|---|---|---|---|
| 13 | 20 | 25 | 35 | 70 |
Given the following 2D data set, rank the points by their similarity to the query point $x = (1.4, 1.6)$:
| point | A1 | A2 |
|---|---|---|
| x1 | 1.5 | 1.7 |
| x2 | 2.0 | 1.9 |
| x3 | 1.6 | 1.8 |
| x4 | 1.2 | 1.5 |
| x5 | 1.5 | 1.0 |
Manhattan distance:
$$d(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$$
| point | A1 | A2 | distance |
|---|---|---|---|
| x | 1.4 | 1.6 | 0 |
| x1 | 1.5 | 1.7 | 0.2 |
| x4 | 1.2 | 1.5 | 0.3 |
| x3 | 1.6 | 1.8 | 0.4 |
| x5 | 1.5 | 1.0 | 0.7 |
| x2 | 2.0 | 1.9 | 0.9 |
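The Manhattan ranking can be reproduced with a few lines of Python. This is a minimal sketch: the names `points`, `manhattan`, and `manhattan_order` are illustrative choices, not part of the assignment.

```python
# Query point and data set copied from the tables in these notes.
x = (1.4, 1.6)
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

manhattan_dist = {name: manhattan(x, p) for name, p in points.items()}

# Point names sorted from nearest to farthest.
manhattan_order = sorted(manhattan_dist, key=manhattan_dist.get)

for name in manhattan_order:
    print(f"{name}: {manhattan_dist[name]:.1f}")
```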
Euclidean distance:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
| point | A1 | A2 | distance | rank |
|---|---|---|---|---|
| x | 1.4 | 1.6 | 0 | |
| x1 | 1.5 | 1.7 | 0.141 | 1 |
| x2 | 2.0 | 1.9 | 0.671 | 5 |
| x3 | 1.6 | 1.8 | 0.283 | 3 |
| x4 | 1.2 | 1.5 | 0.224 | 2 |
| x5 | 1.5 | 1.0 | 0.608 | 4 |
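As a quick check of the Euclidean column, the sketch below uses the standard-library `math.dist` (Python 3.8+). The dictionary and variable names are illustrative.

```python
import math

# Query point and data set copied from the tables in these notes.
x = (1.4, 1.6)
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}

# math.dist returns the Euclidean (L2) distance between two points.
euclid_dist = {name: math.dist(x, p) for name, p in points.items()}

# Point names sorted from nearest to farthest.
euclid_order = sorted(euclid_dist, key=euclid_dist.get)

for rank, name in enumerate(euclid_order, start=1):
    print(f"{rank}. {name}: {euclid_dist[name]:.3f}")
```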
Supremum (Chebyshev) distance:
$$d(x, y) = \max_{1 \le i \le n} \lvert x_i - y_i \rvert$$
| point | A1 | A2 | distance | rank |
|---|---|---|---|---|
| x | 1.4 | 1.6 | 0 | |
| x1 | 1.5 | 1.7 | 0.1 | 1 |
| x2 | 2.0 | 1.9 | 0.6 | 4 (tie) |
| x3 | 1.6 | 1.8 | 0.2 | 2 (tie) |
| x4 | 1.2 | 1.5 | 0.2 | 2 (tie) |
| x5 | 1.5 | 1.0 | 0.6 | 4 (tie) |
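The supremum distance only looks at the largest coordinate gap, so ties are common. The sketch below rounds each distance to 10 decimals (an assumption made here to absorb floating-point noise) and groups points that share a distance. All names are illustrative.

```python
from collections import defaultdict

# Query point and data set copied from the tables in these notes.
x = (1.4, 1.6)
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}

def supremum(a, b):
    """Largest absolute coordinate difference (L-infinity distance)."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))

# Round so that values such as 0.19999999999999996 and 0.2 compare equal.
sup_dist = {name: round(supremum(x, p), 10) for name, p in points.items()}

# Group point names by distance so ties are visible.
groups = defaultdict(list)
for name, d in sup_dist.items():
    groups[d].append(name)

tied_groups = [sorted(groups[d]) for d in sorted(groups)]
print(tied_groups)
```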
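Cosine similarity (larger means more similar):
$$\operatorname{sim}(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$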
| point | A1 | A2 | similarity | rank |
|---|---|---|---|---|
| x | 1.4 | 1.6 | 1 | |
| x1 | 1.5 | 1.7 | 0.99999 | 1 |
| x2 | 2.0 | 1.9 | 0.99575 | 4 |
| x3 | 1.6 | 1.8 | 0.99997 | 2 |
| x4 | 1.2 | 1.5 | 0.99903 | 3 |
| x5 | 1.5 | 1.0 | 0.96536 | 5 |
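Because several similarities differ only in the fifth decimal place, it helps to compute them at full precision before ranking. A minimal sketch, with illustrative names, sorting from most to least similar:

```python
import math

# Query point and data set copied from the tables in these notes.
x = (1.4, 1.6)
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}

def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

cos_sim = {name: cosine_similarity(x, p) for name, p in points.items()}

# Higher similarity ranks first, so sort in descending order.
cos_order = sorted(cos_sim, key=cos_sim.get, reverse=True)

for rank, name in enumerate(cos_order, start=1):
    print(f"{rank}. {name}: {cos_sim[name]:.5f}")
```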
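Normalize each point to unit length, then rank by Euclidean distance on the normalized data:
$$\operatorname{normalize}(x) = \frac{x}{\sqrt{\sum_{i=1}^{n} x_i^2}}$$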
| point | A1 | A2 | distance | rank |
|---|---|---|---|---|
| x | 0.659 | 0.753 | 0 | |
| x1 | 0.662 | 0.750 | 0.0041 | 1 |
| x2 | 0.725 | 0.689 | 0.0922 | 4 |
| x3 | 0.664 | 0.747 | 0.0078 | 2 |
| x4 | 0.625 | 0.781 | 0.0441 | 3 |
| x5 | 0.832 | 0.555 | 0.2632 | 5 |
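The normalize-then-measure step can be sketched as follows. Working at full precision avoids the rounding drift that appears when distances are computed from three-decimal coordinates. The helper names are illustrative.

```python
import math

# Query point and data set copied from the tables in these notes.
x = (1.4, 1.6)
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}

def normalize(v):
    """Scale a vector so that its Euclidean norm equals 1."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return tuple(vi / norm for vi in v)

x_unit = normalize(x)
unit_points = {name: normalize(p) for name, p in points.items()}

# Euclidean distance between the unit vectors.
unit_dist = {name: math.dist(x_unit, p) for name, p in unit_points.items()}
unit_order = sorted(unit_dist, key=unit_dist.get)

for rank, name in enumerate(unit_order, start=1):
    a1, a2 = unit_points[name]
    print(f"{rank}. {name}: ({a1:.3f}, {a2:.3f}) distance {unit_dist[name]:.4f}")
```

For unit vectors the squared Euclidean distance equals $2(1 - \cos\theta)$, so this ranking matches the ranking by cosine similarity.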
Partition the sorted data into equal-frequency bins of depth 3:
| Bin | Data |
|---|---|
| Bin 1 | 13, 15, 16 |
| Bin 2 | 16, 19, 20 |
| Bin 3 | 20, 21, 22 |
| Bin 4 | 22, 25, 25 |
| Bin 5 | 25, 25, 30 |
| Bin 6 | 33, 33, 35 |
| Bin 7 | 35, 35, 35 |
| Bin 8 | 35, 35, 36 |
| Bin 9 | 36, 40, 45 |
| Bin 10 | 46, 52, 70 |
Smoothing by bin means replaces every value with the mean of its bin:
| Bin | Smoothed Data |
|---|---|
| Bin 1 | 14.67, 14.67, 14.67 |
| Bin 2 | 18.33, 18.33, 18.33 |
| Bin 3 | 21.00, 21.00, 21.00 |
| Bin 4 | 24.00, 24.00, 24.00 |
| Bin 5 | 26.67, 26.67, 26.67 |
| Bin 6 | 33.67, 33.67, 33.67 |
| Bin 7 | 35.00, 35.00, 35.00 |
| Bin 8 | 35.33, 35.33, 35.33 |
| Bin 9 | 40.33, 40.33, 40.33 |
| Bin 10 | 56.00, 56.00, 56.00 |
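The two binning tables can be reproduced with the short sketch below. The list `values` is simply the bin contents from the table written out in order, and the depth of 3 is read off the table. Both names are illustrative.

```python
# Sorted values, taken row by row from the bin table in these notes.
values = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
          33, 33, 35, 35, 35, 35, 35, 35, 36, 36, 40, 45, 46, 52, 70]
depth = 3  # number of values per equal-frequency bin

# Slice the sorted list into consecutive bins of `depth` values.
bins = [values[i:i + depth] for i in range(0, len(values), depth)]

# Replace every value in a bin with that bin's mean (rounded for display).
smoothed = [[round(sum(b) / len(b), 2)] * len(b) for b in bins]

for number, (raw, smooth) in enumerate(zip(bins, smoothed), start=1):
    print(f"Bin {number}: {raw} -> {smooth}")
```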
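Find the outlier using the IQR method:
$$
\begin{aligned}
\text{IQR} &= Q_3 - Q_1 = 35 - 20 = 15 \\
\text{Lower bound} &= Q_1 - 1.5 \times \text{IQR} = 20 - 22.5 = -2.5 \\
\text{Upper bound} &= Q_3 + 1.5 \times \text{IQR} = 35 + 22.5 = 57.5
\end{aligned}
$$
Since $70 > 57.5$, the value 70 is an outlier.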
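The IQR rule is easy to express as a small check. This sketch takes the quartiles from the five-number summary in these notes. The function name `is_outlier` is illustrative.

```python
# Quartiles from the five-number summary in these notes.
q1, q3 = 20, 35

iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

def is_outlier(value):
    """True when the value lies outside the 1.5 * IQR fences."""
    return value < lower_bound or value > upper_bound

print(lower_bound, upper_bound)        # fences
print(is_outlier(13), is_outlier(70))  # check the minimum and maximum
```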
Min-max normalization of $x = 35$:
$$x' = \frac{x - \min}{\max - \min} = \frac{35 - 13}{70 - 13} = \frac{22}{57} \approx 0.3860$$
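A minimal sketch of min-max normalization follows. The optional `new_min` and `new_max` parameters are an assumption added here so the same function can target a range other than $[0, 1]$; they are not part of the original exercise.

```python
def min_max_normalize(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Linearly map value from [old_min, old_max] to [new_min, new_max]."""
    scaled = (value - old_min) / (old_max - old_min)
    return scaled * (new_max - new_min) + new_min

# Minimum and maximum come from the five-number summary in these notes.
normalized_35 = min_max_normalize(35, old_min=13, old_max=70)
print(round(normalized_35, 4))
```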
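Z-score normalization of $x = 35$, with $\mu = 29.96$ and $\sigma = 12.7$:
$$z = \frac{x - \mu}{\sigma} = \frac{35 - 29.96}{12.7} \approx 0.3969$$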
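The z-score step can be sketched in the same way. The mean and standard deviation are taken as given in these notes rather than recomputed, and the function name is illustrative.

```python
def z_score(value, mean, std):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std

# mu and sigma as stated in these notes.
z_35 = z_score(35, mean=29.96, std=12.7)
print(round(z_35, 4))
```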
| age | fat |
|---|---|
| 23 | 9.5 |
| 23 | 26.5 |
| 27 | 7.8 |
| 27 | 17.8 |
| 39 | 31.4 |
| 41 | 25.9 |
| 47 | 27.4 |
| 49 | 27.2 |
| 50 | 31.2 |
| 52 | 34.6 |
| 54 | 28.8 |
| 56 | 33.4 |
| 57 | 30.2 |
| 58 | 34.1 |
| 58 | 32.9 |
| 60 | 41.2 |
| 61 | 35.7 |
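Pearson correlation coefficient between age ($x$) and fat ($y$):
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} = \frac{1590.6}{\sqrt{2910 \times 1256.731}} \approx 0.8318$$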
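Sample covariance (dividing by $n - 1 = 16$):
$$\operatorname{cov}(x, y) = \frac{1}{n - 1} \sum_i (x_i - \bar{x})(y_i - \bar{y}) = \frac{1590.6}{16} \approx 99.41$$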
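The sums that feed the correlation and covariance formulas can be checked directly from the 17 rows of the age/fat table. This is a plain-Python sketch with illustrative variable names. It reports both the sample covariance (divide by $n - 1$) and the population covariance (divide by $n$), since the two conventions give different numbers.

```python
import math

# The 17 (age, fat) rows from the table in these notes.
age = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2, 34.6,
       28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

n = len(age)
mean_age = sum(age) / n
mean_fat = sum(fat) / n

# Sums of squared deviations and of cross-products.
sxx = sum((a - mean_age) ** 2 for a in age)
syy = sum((f - mean_fat) ** 2 for f in fat)
sxy = sum((a - mean_age) * (f - mean_fat) for a, f in zip(age, fat))

r = sxy / math.sqrt(sxx * syy)
cov_sample = sxy / (n - 1)
cov_population = sxy / n

print(f"Sxx={sxx:.1f} Syy={syy:.3f} Sxy={sxy:.1f}")
print(f"r={r:.4f} sample cov={cov_sample:.2f} population cov={cov_population:.2f}")
```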