hw2

Let L be a frequent itemset and let S be any nonempty subset of L.
Support(L) ≥ minimum support threshold, because L is frequent.
Since S is a subset of L, any transaction that contains all items in L also contains all items in S.
Therefore Support(S) ≥ Support(L), because the transactions containing S include every transaction containing L.
Hence Support(S) ≥ minimum support threshold.
This proves that S is also a frequent itemset.
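
The argument above can be checked numerically. The following is a minimal sketch on a hypothetical toy database (the transactions and the helper `support` are illustrative assumptions, not part of the exercise); it confirms that every nonempty subset of L has support at least as large as L.

```python
from itertools import combinations

# Hypothetical toy transactions, used only to illustrate the property.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]


def support(itemset, db):
    """Fraction of transactions in db that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)


L = {"a", "b", "c"}
support_L = support(L, transactions)

# Support of every nonempty subset S of L.
subset_supports = {
    frozenset(S): support(S, transactions)
    for r in range(1, len(L) + 1)
    for S in combinations(sorted(L), r)
}

# Anti-monotonicity: support(S) >= support(L) for every subset S.
all_subsets_hold = all(s >= support_L for s in subset_supports.values())
print(support_L, all_subsets_hold)
```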

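Proof by contradiction. Let D be split into non-overlapping partitions P1, ..., Pn, and let minsup be the relative minimum support threshold.
If I is frequent in D, its support count in D is at least minsup × |D|.
Suppose I is not frequent in any partition Pi. Then its support count in each Pi is less than minsup × |Pi|.
Summing over all partitions, the support count of I in D is less than minsup × (|P1| + ... + |Pn|) = minsup × |D|, which contradicts the assumption that I is frequent in D.
Therefore, any itemset that is frequent in the original database D must be frequent in at least one partition of D.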
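
As a sanity check of the partition property, here is a small sketch on a hypothetical database split into three partitions (the data, the threshold, and the helper names are assumptions made for illustration). It enumerates every itemset and verifies that each globally frequent one is locally frequent in at least one partition.

```python
from itertools import combinations


def support_count(itemset, db):
    """Number of transactions in db that contain every item of itemset."""
    return sum(set(itemset) <= t for t in db)


def is_frequent(itemset, db, minsup):
    # Frequent means support count >= minsup * |db| (relative threshold).
    return support_count(itemset, db) >= minsup * len(db)


# Hypothetical database and a split into three non-overlapping partitions.
db = [
    {"a", "b"}, {"a", "c"}, {"a", "b", "c"},
    {"b", "c"}, {"a", "b"}, {"c"},
]
partitions = [db[0:2], db[2:4], db[4:6]]
minsup = 0.5

items = sorted(set().union(*db))
all_itemsets = [
    frozenset(c)
    for r in range(1, len(items) + 1)
    for c in combinations(items, r)
]

globally_frequent = [s for s in all_itemsets if is_frequent(s, db, minsup)]

# Each globally frequent itemset must be frequent in at least one partition.
property_holds = all(
    any(is_frequent(s, p, minsup) for p in partitions)
    for s in globally_frequent
)
print(len(globally_frequent), property_holds)
```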

6.6

min_support = 0.6
min_confidence = 0.8

1-Itemsets

2-Itemsets

3-Itemsets

\text{frequent itemsets}: \{\{E\},\{K\},\{M\},\{O\},\{Y\},\{E,K\},\{E,O\},\{K,M\},\{K,O\},\{K,Y\},\{E,K,O\}\}
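
The level-wise search that produces this list can be sketched as follows. The transaction database is not reproduced in these notes, so the five transactions below are an assumption (the letter sets usually given for this textbook exercise); the function `apriori` is a simplified illustration, not an optimized implementation.

```python
from itertools import combinations

# ASSUMPTION: these five transactions are not listed in the notes; they are
# the letter sets commonly used for this exercise.
transactions = [
    set("MONKEY"),
    set("DONKEY"),
    set("MAKE"),
    set("MUCKY"),
    set("COOKIE"),
]
min_count = 3  # min_support 0.6 over 5 transactions


def count(itemset):
    """Number of transactions containing every item of itemset."""
    return sum(itemset <= t for t in transactions)


def apriori():
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if count(frozenset([i])) >= min_count]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = count(s)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count step: one pass over the database per level.
        level = [c for c in candidates if count(c) >= min_count]
    return frequent


frequent = apriori()
for s in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(s)), frequent[s])
```

Under the assumed data, the sketch yields the same eleven itemsets as listed above.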

item	conditional pattern base	conditional FP-tree (support $\geq$ 0.6)	frequent patterns generated
e	{k:4}	{k:4}	$\{E,K\}$
m	{k,e:2},{k:1}	{k:3}	$\{M,K\}$
o	{k,e,m:1},{k,e:2}	{k:3, e:3}	$\{O,K\},\{O,E\},\{O,E,K\}$
y	{k,e,m:1},{k,e,o:1},{k,m:1}	{k:3}	$\{Y,K\}$

\begin{aligned} \text{frequent itemsets}: &\{\{E:4\},\{K:5\},\{M:3\},\{O:3\},\{Y:3\},\\ &\{E,K:4\},\{E,O:3\},\{K,M:3\},\{K,O:3\},\{K,Y:3\},\\ &\{E,K,O:3\}\} \end{aligned}

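FP-growth scans the database only twice to build the FP-tree and then mines the tree without generating candidates, whereas Apriori rescans the database once for every candidate length. FP-growth is therefore more efficient than Apriori.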

rule	support	confidence
$\{K,O\}\rightarrow\{E\}$	0.6	1.0
$\{E,O\}\rightarrow\{K\}$	0.6	1.0
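
The confidence values can be reproduced from the support counts of {E,K,O} and its 2-item subsets given earlier. This is a minimal sketch; the total of 5 transactions is inferred from a count of 3 corresponding to support 0.6, and the variable names are illustrative.

```python
from itertools import combinations

# Support counts from the frequent-itemset results; 5 transactions in total
# is inferred from count 3 <-> support 0.6.
n_transactions = 5
counts = {
    frozenset("EK"): 4,
    frozenset("EO"): 3,
    frozenset("KO"): 3,
    frozenset("EKO"): 3,
}
min_conf = 0.8
full = frozenset("EKO")

# Rules of the form {item1, item2} -> {item3} built from the 3-itemset.
rules = {}
for antecedent in combinations(sorted(full), 2):
    a = frozenset(antecedent)
    consequent = full - a
    support = counts[full] / n_transactions
    confidence = counts[full] / counts[a]
    rules[(a, consequent)] = (support, confidence)

strong = {r: v for r, v in rules.items() if v[1] >= min_conf}
for (a, c), (s, conf) in rules.items():
    print(sorted(a), "->", sorted(c), s, conf)
```

The rule {E,K} → {O} has confidence 3/4 = 0.75, which is below 0.8, so only the two rules in the table qualify.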

	hot dogs	!(hot dogs)	Total
hamburgers	2000	500	2500
!(hamburgers)	1000	1500	2500
Total	3000	2000	5000

\text{support}(\text{hot dogs}\rightarrow\text{hamburgers})=P(\text{hot dogs}\cap\text{hamburgers})=\frac{2000}{5000}=0.4

\text{confidence}(\text{hot dogs}\rightarrow\text{hamburgers})=\frac{P(\text{hot dogs}\cap\text{hamburgers})}{P(\text{hot dogs})}=\frac{2000}{3000}\approx 0.667

0.4>0.25 \text{ and } 0.667>0.5 \text{, so it is a strong rule}

\text{lift}(\text{hot dogs}\rightarrow\text{hamburgers})=\frac{\frac{2000}{5000}}{\frac{3000}{5000}\cdot\frac{2500}{5000}}=\frac{4}{3}\\ \frac{4}{3}>1 \text{, so the two purchases are not independent; they are positively correlated}

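Here a = hot dogs and b = hamburgers, so P(a) = 0.6, P(b) = 0.5, P(b|a) = 2000/3000 ≈ 0.667 and P(a|b) = 2000/2500 = 0.8.

\begin{aligned} \text{AllConf}(a,b)&=\frac{P(a\cap b)}{\max(P(a),P(b))}=\frac{0.4}{0.6}\approx 0.667\\ \text{MaxConf}(a,b)&=\max(P(a|b),P(b|a))=0.8\\ \text{Kulc}(a,b)&=\frac{1}{2}(P(b|a)+P(a|b))\approx 0.733\\ \text{Cosine}(a,b)&=\frac{P(a\cap b)}{\sqrt{P(a)P(b)}}=\frac{2000}{\sqrt{3000\times 2500}}\approx 0.730\\ \text{Lift}(a,b)&=\frac{P(a\cap b)}{P(a)P(b)}\approx 1.333 \end{aligned}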
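
All of these measures follow from the four counts in the contingency table. The sketch below recomputes them so the arithmetic can be checked; the variable names are illustrative choices, and only the counts come from the table.

```python
import math

# Counts from the contingency table.
n = 5000
n_hotdogs = 3000      # a
n_hamburgers = 2500   # b
n_both = 2000         # a and b together

p_a = n_hotdogs / n
p_b = n_hamburgers / n
p_ab = n_both / n

p_b_given_a = n_both / n_hotdogs      # confidence(hot dogs -> hamburgers)
p_a_given_b = n_both / n_hamburgers   # confidence(hamburgers -> hot dogs)

measures = {
    "support": p_ab,
    "confidence": p_b_given_a,
    "lift": p_ab / (p_a * p_b),
    "all_conf": p_ab / max(p_a, p_b),
    "max_conf": max(p_a_given_b, p_b_given_a),
    "kulc": 0.5 * (p_a_given_b + p_b_given_a),
    "cosine": p_ab / math.sqrt(p_a * p_b),
}

# Strong-rule check against the thresholds used above (0.25 and 0.5).
is_strong = measures["support"] >= 0.25 and measures["confidence"] >= 0.5

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```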