OLS estimate
When we observe outcomes \(Y_i\) and treatment status \(D_i\), we can estimate \(\tau\) by regressing \(Y_i\) on \(D_i\): \(Y_i = a + \tau D_i + \epsilon_i\).
When \(D_i\) is binary, \(\hat{\tau}^{OLS} = \bar{Y}_T - \bar{Y}_C\), where \(\bar{Y}_T\) is the average outcome in the treatment group and \(\bar{Y}_C\) is the average outcome in the control group.
Proof.
\[
D_i \in \{0, 1\} \implies
\begin{cases}
N_T = \sum D_i = \sum D_i^2 \quad (\text{since } D_i^2 = D_i) \\
\bar{Y}_T = \frac{1}{N_T}\sum Y_i D_i \iff \sum Y_i D_i = N_T \bar{Y}_T \\
\bar{Y}_C = \frac{1}{N - N_T}\sum Y_i (1 - D_i) \iff \sum Y_i = (N - N_T)\bar{Y}_C + \sum Y_i D_i = (N - N_T)\bar{Y}_C + N_T \bar{Y}_T
\end{cases}
\tag{1}
\]
Then:
\[
\begin{aligned}
\hat{\tau}^{OLS} &= \frac{\sum (Y_i - \bar{Y})(D_i - \bar{D})}{\sum (D_i - \bar{D})^2} \qquad \text{where } \bar{x} = \frac{1}{N}\sum x_i \\
&= \frac{\sum \left(Y_i D_i - D_i \bar{Y} - Y_i \bar{D} + \bar{Y}\bar{D}\right)}{\sum \left(D_i^2 + \bar{D}^2 - 2 D_i \bar{D}\right)} \\
&\quad \text{(split the sums)} \\
&= \frac{\sum Y_i D_i - \bar{Y}\sum D_i - \bar{D}\sum Y_i + N\bar{Y}\bar{D}}{\sum D_i^2 + N\bar{D}^2 - 2\bar{D}\sum D_i} \\
&\quad \text{(use } \sum x_i \bar{C} = \bar{C}\sum x_i = \frac{1}{N}\sum C_i \sum x_i \text{ and } \sum \bar{C} = N\bar{C}\text{)} \\
&= \frac{\sum Y_i D_i - \frac{1}{N}\sum Y_i \sum D_i - \frac{1}{N}\sum Y_i \sum D_i + \frac{1}{N}\sum Y_i \sum D_i}{\sum D_i^2 + \frac{1}{N}\left(\sum D_i\right)^2 - \frac{2}{N}\left(\sum D_i\right)^2} \\
&= \frac{\sum Y_i D_i - \frac{1}{N}\sum Y_i \sum D_i}{\sum D_i^2 - \frac{1}{N}\left(\sum D_i\right)^2} \\
&\quad \text{(multiply numerator and denominator by } N\text{)} \\
&= \frac{N\sum Y_i D_i - \sum Y_i \sum D_i}{N\sum D_i^2 - \left(\sum D_i\right)^2} \\
&\quad \text{(apply (1); since } D_i^2 = D_i\text{, also } \sum D_i^2 = \sum D_i = N_T\text{)} \\
&= \frac{N N_T \bar{Y}_T - N_T \sum Y_i}{N N_T - N_T^2} = \frac{N\bar{Y}_T - \sum Y_i}{N - N_T} \\
&\quad \text{(substitute } \sum Y_i = (N - N_T)\bar{Y}_C + N_T\bar{Y}_T\text{)} \\
&= \frac{N\bar{Y}_T - N\bar{Y}_C + N_T\bar{Y}_C - N_T\bar{Y}_T}{N - N_T} \\
&= \frac{(N - N_T)(\bar{Y}_T - \bar{Y}_C)}{N - N_T} = \bar{Y}_T - \bar{Y}_C
\end{aligned}
\tag{2}
\]
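As a quick numerical sanity check of (2), here is a minimal sketch in Python with NumPy (an assumption; the note itself uses no code). The data are simulated and hypothetical (1,000 units, a true effect of 2), and the helper names `ols_slope`, `sum_form` and `diff_in_means` are illustrative only. The sketch computes the slope three ways: least squares with an intercept, the summation form from the middle of (2), and the difference in group means.

```python
import numpy as np


def ols_slope(y, d):
    """Slope of y on d from least squares with an intercept column."""
    X = np.column_stack([np.ones(len(d)), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def sum_form(y, d):
    """Intermediate form in (2): (N*sum(yd) - sum(y)*sum(d)) / (N*sum(d^2) - sum(d)^2)."""
    n = len(y)
    num = n * np.sum(y * d) - np.sum(y) * np.sum(d)
    den = n * np.sum(d**2) - np.sum(d) ** 2
    return num / den


def diff_in_means(y, d):
    """Average outcome of treated units minus average outcome of control units."""
    return y[d == 1].mean() - y[d == 0].mean()


# Hypothetical simulated data: binary treatment, true effect 2, Gaussian noise.
rng = np.random.default_rng(0)
d = rng.integers(0, 2, size=1000)
y = 1.0 + 2.0 * d + rng.normal(size=1000)

print(ols_slope(y, d), sum_form(y, d), diff_in_means(y, d))
```

The three printed values agree up to floating-point error. The check needs both groups to be non-empty; otherwise \(N - N_T\) or \(N_T\) is zero and the denominator in (2) vanishes.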
When the sample is large enough:
\[
\begin{aligned}
\operatorname*{plim}_{N \to \infty} \hat{\tau}^{OLS} &= \tau^{OLS} \\
&= \mathbb{E}[Y_i \mid D_i = 1] - \mathbb{E}[Y_i \mid D_i = 0] \\
&= \mathbb{E}[Y_{1i} \mid D_i = 1] - \mathbb{E}[Y_{0i} \mid D_i = 0] \\
&= \underbrace{\mathbb{E}[Y_{1i} - Y_{0i} \mid D_i = 1]}_{ATT} + \underbrace{\mathbb{E}[Y_{0i} \mid D_i = 1] - \mathbb{E}[Y_{0i} \mid D_i = 0]}_{\text{selection bias}} \\
&= \underbrace{\mathbb{E}[Y_{1i} - Y_{0i} \mid D_i = 0]}_{ATC} + \underbrace{\mathbb{E}[Y_{1i} \mid D_i = 1] - \mathbb{E}[Y_{1i} \mid D_i = 0]}_{\text{selection bias}} \\
&= \underbrace{\mathbb{E}[Y_{1i} - Y_{0i}]}_{ATE} \\
&\quad + \underbrace{\mathbb{E}[Y_{0i} \mid D_i = 1] - \mathbb{E}[Y_{0i} \mid D_i = 0]}_{\text{selection bias}} \\
&\quad + \{1 - \Pr[D_i = 1]\} \cdot \underbrace{\{\mathbb{E}[Y_{1i} - Y_{0i} \mid D_i = 1] - \mathbb{E}[Y_{1i} - Y_{0i} \mid D_i = 0]\}}_{ATT - ATC}
\end{aligned}
\tag{3}
\]
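The last term of (3) is the difference in causal effects between the two groups, \(ATT - ATC\), weighted by the share of untreated units \(1 - \Pr[D_i = 1]\); it vanishes when the treatment effect is the same in both groups. The first equality in (3) uses (2): by the law of large numbers, \(\bar{Y}_T\) and \(\bar{Y}_C\) converge in probability to \(\mathbb{E}[Y_i \mid D_i = 1]\) and \(\mathbb{E}[Y_i \mid D_i = 0]\).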
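The decomposition in (3) also holds exactly when every expectation is replaced by its sample analogue, which a small simulation can illustrate. This is a sketch in Python with NumPy under an assumed, purely hypothetical data-generating process: units with a higher baseline \(Y_{0i}\) are more likely to be treated (selection bias), and the effect is larger for treated units (so \(ATT \neq ATC\)). Unlike real data, the simulation keeps both potential outcomes, so ATE, ATT, ATC and the selection bias can be computed directly.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical DGP (assumption for illustration only):
# selection on the untreated outcome y0, and a treatment effect that depends on d.
y0 = rng.normal(size=n)
d = (y0 + rng.normal(size=n) > 0).astype(int)
effect = 1.0 + 0.5 * d + rng.normal(scale=0.1, size=n)
y1 = y0 + effect
y = np.where(d == 1, y1, y0)  # only one potential outcome is observed per unit

treated, control = d == 1, d == 0
p = d.mean()  # sample analogue of Pr[D_i = 1]

naive = y[treated].mean() - y[control].mean()  # equals the OLS slope by (2)
ate = (y1 - y0).mean()
att = (y1 - y0)[treated].mean()
atc = (y1 - y0)[control].mean()
selection_bias = y0[treated].mean() - y0[control].mean()

# Last line of (3) with sample analogues.
decomposition = ate + selection_bias + (1 - p) * (att - atc)
print(naive, att + selection_bias, decomposition)
```

All three printed numbers coincide up to floating-point error, because in the sample the ATE is exactly \(p \cdot ATT + (1 - p) \cdot ATC\). With this assumed process the naive difference overstates the ATE through both the positive selection bias and the \(ATT - ATC\) term.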
Because