Potential Outcomes Framework

Experimental Studies

For unit $i$, we have pretreatment covariates $X_i$, a binary treatment indicator $D_i$, and an observed outcome $Y_i$ with two potential outcomes $Y_i(1)$ and $Y_i(0)$ under treatment and control, respectively. For simplicity, we assume

$$\{D_i, X_i, Y_i(0), Y_i(1)\}_{i=1}^{n} \overset{\text{iid}}{\sim} \{D, X, Y(0), Y(1)\}$$

We can therefore drop the subscript $i$ when referring to population quantities.

Definition Observed outcome $Y$ in terms of the potential outcomes

$$Y = \begin{cases} Y(0), & \text{if } D = 0 \\ Y(1), & \text{if } D = 1 \end{cases}$$

Definition Treatment effect of a unit

$$\mathrm{TE} \equiv Y(1) - Y(0)$$

Definition Average treatment effect on the treated

$$\mathrm{ATT} \equiv E[\mathrm{TE} \mid D=1] = E[Y(1) - Y(0) \mid D=1]$$

Definition Average treatment effect on the untreated

$$\mathrm{ATU} \equiv E[\mathrm{TE} \mid D=0] = E[Y(1) - Y(0) \mid D=0]$$

Definition Average treatment effect

$$\mathrm{ATE} \equiv E[\mathrm{TE}] = E[Y(1) - Y(0)] = \mathrm{ATT} \times P(D=1) + \mathrm{ATU} \times P(D=0)$$
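The last equality in the ATE definition is the law of total expectation applied to TE over the two values of $D$. Spelled out as a short derivation:

$$
\begin{aligned}
\mathrm{ATE} &= E[\mathrm{TE}] \\
&= E[\mathrm{TE} \mid D=1]\,P(D=1) + E[\mathrm{TE} \mid D=0]\,P(D=0) \\
&= \mathrm{ATT} \times P(D=1) + \mathrm{ATU} \times P(D=0).
\end{aligned}
$$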

Definition Difference-in-means estimand (naive estimand)

$$\Delta = E[Y \mid D=1] - E[Y \mid D=0] = E[Y(1) \mid D=1] - E[Y(0) \mid D=0]$$

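Compare $\Delta$ with ATT and ATU:

$$
\begin{aligned}
\Delta - \mathrm{ATT} &= E[Y(0) \mid D=1] - E[Y(0) \mid D=0] \\
\Delta - \mathrm{ATU} &= E[Y(1) \mid D=1] - E[Y(1) \mid D=0]
\end{aligned}
$$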

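This shows that if the treated and control groups differ in their potential outcomes, using $\Delta$ for a group-level average treatment effect incurs selection bias.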
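Both differences follow by subtracting the definition of the group effect from $\Delta$ and cancelling the common term. As a worked step:

$$
\begin{aligned}
\Delta - \mathrm{ATT}
&= \bigl(E[Y(1) \mid D=1] - E[Y(0) \mid D=0]\bigr) - \bigl(E[Y(1) \mid D=1] - E[Y(0) \mid D=1]\bigr) \\
&= E[Y(0) \mid D=1] - E[Y(0) \mid D=0], \\
\Delta - \mathrm{ATU}
&= \bigl(E[Y(1) \mid D=1] - E[Y(0) \mid D=0]\bigr) - \bigl(E[Y(1) \mid D=0] - E[Y(0) \mid D=0]\bigr) \\
&= E[Y(1) \mid D=1] - E[Y(1) \mid D=0].
\end{aligned}
$$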

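Compare $\Delta$ with ATE:

$$
\begin{aligned}
\Delta - \mathrm{ATE} &= \Delta - \mathrm{ATT} \times P(D=1) - \mathrm{ATU} \times P(D=0) \\
&= (\Delta - \mathrm{ATT}) + (\mathrm{ATT} - \mathrm{ATU}) \times P(D=0) \\
&= (\Delta - \mathrm{ATU}) + (\mathrm{ATU} - \mathrm{ATT}) \times P(D=1)
\end{aligned}
$$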

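The last two equalities are two ways of decomposing $\Delta - \mathrm{ATE}$: each splits it into the selection bias of the corresponding group and a heterogeneous treatment effect term.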
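The decompositions can be checked numerically. The sketch below is a minimal illustration, not part of the original notes: it assumes a hypothetical data-generating process in which units with larger gains are more likely to select into treatment, and then verifies that the sample analogues of both decompositions reproduce $\Delta - \mathrm{ATE}$. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes (an assumption made for illustration).
y0 = rng.normal(0.0, 1.0, n)
te = rng.normal(1.0, 1.0, n)            # unit-level treatment effect Y(1) - Y(0)
y1 = y0 + te

# Self-selection: D depends on the potential outcomes, so independence fails.
d = (te + 0.5 * y0 + rng.normal(0.0, 1.0, n) > 1.0).astype(int)
y = np.where(d == 1, y1, y0)            # observed outcome

p1 = d.mean()                           # sample analogue of P(D=1)
p0 = 1.0 - p1
att = te[d == 1].mean()
atu = te[d == 0].mean()
ate = te.mean()
delta = y[d == 1].mean() - y[d == 0].mean()   # naive estimand

# Selection bias terms: Delta - ATT and Delta - ATU.
bias_att = y0[d == 1].mean() - y0[d == 0].mean()
bias_atu = y1[d == 1].mean() - y1[d == 0].mean()

# The two decompositions of Delta - ATE.
decomp_1 = bias_att + (att - atu) * p0
decomp_2 = bias_atu + (atu - att) * p1

print(f"Delta - ATE     = {delta - ate:.4f}")
print(f"decomposition 1 = {decomp_1:.4f}")
print(f"decomposition 2 = {decomp_2:.4f}")
```

Because both potential outcomes are simulated, ATT, ATU, and ATE are directly computable here; with real data only `y`, `d`, and hence `delta` are observed, which is why the bias terms matter.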

Now introduce the independence assumption (IA):

$$\{Y(0), Y(1)\} \perp\!\!\!\perp D$$

That is, treatment assignment is random. It then follows that

$$\Delta = \mathrm{ATT} = \mathrm{ATU} = \mathrm{ATE}$$

In this case, both the selection bias and the heterogeneous treatment effect terms vanish.
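Under independence, $E[Y(d) \mid D=1] = E[Y(d) \mid D=0]$ for $d \in \{0, 1\}$, so both selection bias terms are zero and ATT equals ATU. A minimal simulation sketch, assuming a hypothetical coin-flip assignment and made-up outcome distributions, shows the four quantities agreeing up to sampling noise:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Hypothetical potential outcomes (illustrative assumption).
y0 = rng.normal(0.0, 1.0, n)
te = rng.normal(1.0, 1.0, n)
y1 = y0 + te

# Randomized assignment: D is drawn independently of (Y(0), Y(1)).
d = rng.integers(0, 2, n)
y = np.where(d == 1, y1, y0)

delta = y[d == 1].mean() - y[d == 0].mean()
att = te[d == 1].mean()
atu = te[d == 0].mean()
ate = te.mean()

print(f"Delta={delta:.3f}  ATT={att:.3f}  ATU={atu:.3f}  ATE={ate:.3f}")
```

The equalities hold exactly only in the population; in a finite sample the four numbers differ by sampling error that shrinks as `n` grows.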

Observational Studies

Definition Conditional average treatment effect on the treated

$$\mathrm{ATT}(X) \equiv E[Y(1) - Y(0) \mid D=1, X]$$

Definition Conditional average treatment effect on the untreated

$$\mathrm{ATU}(X) \equiv E[Y(1) - Y(0) \mid D=0, X]$$

Definition Conditional average treatment effect

$$\mathrm{ATE}(X) \equiv E[Y(1) - Y(0) \mid X] = \mathrm{ATT}(X) \times P(D=1 \mid X) + \mathrm{ATU}(X) \times P(D=0 \mid X)$$

Definition Conditional difference-in-means estimand (conditional naive estimand)

$$\Delta(X) = E[Y \mid D=1, X] - E[Y \mid D=0, X] = E[Y(1) \mid D=1, X] - E[Y(0) \mid D=0, X]$$

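Compare $\Delta(X)$ with $\mathrm{ATT}(X)$ and $\mathrm{ATU}(X)$:

$$
\begin{aligned}
\Delta(X) - \mathrm{ATT}(X) &= E[Y(0) \mid D=1, X] - E[Y(0) \mid D=0, X] \\
\Delta(X) - \mathrm{ATU}(X) &= E[Y(1) \mid D=1, X] - E[Y(1) \mid D=0, X]
\end{aligned}
$$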

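Compare $\Delta(X)$ with $\mathrm{ATE}(X)$:

$$
\begin{aligned}
\Delta(X) - \mathrm{ATE}(X) &= \Delta(X) - \mathrm{ATT}(X) \times P(D=1 \mid X) - \mathrm{ATU}(X) \times P(D=0 \mid X) \\
&= [\Delta(X) - \mathrm{ATT}(X)] + [\mathrm{ATT}(X) - \mathrm{ATU}(X)] \times P(D=0 \mid X) \\
&= [\Delta(X) - \mathrm{ATU}(X)] + [\mathrm{ATU}(X) - \mathrm{ATT}(X)] \times P(D=1 \mid X)
\end{aligned}
$$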

Now introduce the conditional independence assumption (CIA):

$$\{Y(0), Y(1)\} \perp\!\!\!\perp D \mid X$$

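That is, treatment assignment is random once we condition on $X$. It then follows that

$$\Delta(X) = \mathrm{ATT}(X) = \mathrm{ATU}(X) = \mathrm{ATE}(X)$$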

In this case, both the selection bias and the heterogeneous treatment effect terms vanish within each level of $X$.

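It follows directly from the definitions (written here for discrete $X$) that

$$
\begin{aligned}
\mathrm{ATT} &= \sum_{x} \mathrm{ATT}(X=x)\, P(X=x \mid D=1) \\
\mathrm{ATU} &= \sum_{x} \mathrm{ATU}(X=x)\, P(X=x \mid D=0) \\
\mathrm{ATE} &= \sum_{x} \mathrm{ATE}(X=x)\, P(X=x)
\end{aligned}
$$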

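Therefore, under the CIA, $\Delta(X)$ is all we need to estimate any of these causal effects (ATT, ATU, or ATE): only the weights over $X$ change.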
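The weighting formulas translate into a simple stratification estimator. The following is a minimal sketch under stated assumptions, not a method taken from the notes: it uses a hypothetical discrete covariate with three levels, a made-up propensity that depends only on $X$ (so the CIA holds by construction), and treatment effects that grow with $X$. In this hypothetical population ATT = 2.4, ATU = 1.6, ATE = 2.0, while the unconditional naive estimand equals 3.2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300_000

# Hypothetical discrete covariate and propensity P(D=1 | X) (assumptions).
x = rng.integers(0, 3, n)
propensity = np.array([0.2, 0.5, 0.8])[x]
d = (rng.random(n) < propensity).astype(int)   # D depends on X only: CIA holds

# Potential outcomes depend on X; the effect 1 + X is heterogeneous in X.
y0 = x + rng.normal(0.0, 1.0, n)
y1 = y0 + 1.0 + x + rng.normal(0.0, 1.0, n)
y = np.where(d == 1, y1, y0)                   # observed outcome

levels = np.unique(x)

# Conditional naive estimand Delta(x), one value per stratum.
delta_x = np.array([
    y[(x == v) & (d == 1)].mean() - y[(x == v) & (d == 0)].mean()
    for v in levels
])

# Weights P(X=x | D=1), P(X=x | D=0), and P(X=x).
w_att = np.array([(x[d == 1] == v).mean() for v in levels])
w_atu = np.array([(x[d == 0] == v).mean() for v in levels])
w_ate = np.array([(x == v).mean() for v in levels])

att_hat = delta_x @ w_att
atu_hat = delta_x @ w_atu
ate_hat = delta_x @ w_ate
naive = y[d == 1].mean() - y[d == 0].mean()    # ignores X, hence biased here

print(f"ATT={att_hat:.3f}  ATU={atu_hat:.3f}  ATE={ate_hat:.3f}  naive={naive:.3f}")
```

The same per-stratum differences `delta_x` feed all three estimates; only the weight vector differs, which mirrors the three summation formulas.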

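Attention

For the conditional naive estimand $\Delta(X)$ to exist, every stratum defined by $X$ must contain both treated and control units. This is called the common support condition, written $0 < P(D=1 \mid X) < 1$. In practice, common support is hard to satisfy when $X$ is high-dimensional, which sets the stage for the propensity score.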
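To see why dimensionality hurts, the sketch below counts how many units fall in strata that contain both treated and control units as the number of binary covariates grows. It is an illustrative assumption-laden example: the helper `common_support_share`, the sample size, and the coin-flip treatment are all hypothetical choices, not part of the original notes.

```python
import numpy as np

def common_support_share(x_binary, d):
    """Share of units whose X-stratum contains both treated and control units.

    x_binary: (n, k) integer array of 0/1 covariates; d: (n,) array of 0/1.
    """
    k = x_binary.shape[1]
    # Encode each row of binary covariates as one integer stratum label.
    codes = x_binary @ (1 << np.arange(k))
    n_total = np.bincount(codes, minlength=2 ** k)
    n_treated = np.bincount(codes, weights=d, minlength=2 ** k)
    n_control = n_total - n_treated
    has_both = (n_treated > 0) & (n_control > 0)
    return has_both[codes].mean()

rng = np.random.default_rng(7)
n = 2_000
d = rng.integers(0, 2, n)                      # hypothetical coin-flip treatment

shares = {}
for k in (2, 6, 10, 14):
    x_binary = rng.integers(0, 2, size=(n, k))
    shares[k] = common_support_share(x_binary, d)
    print(f"k={k:2d}  strata={2 ** k:6d}  share on common support={shares[k]:.3f}")
```

With a fixed sample size, the number of strata doubles with every added binary covariate, so more and more strata end up holding only treated or only control units, and $\Delta(X)$ is undefined there.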