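When Confounders Stay Hidden: The Problem of Causal Inference and Its Remedies
The Basic Problem
A confounder is a variable associated with both the treatment and the outcome. Left uncontrolled, it biases our estimates, so that what we conclude reflects not a causal relationship but a spurious association induced by confounding. When the confounder cannot be observed (unobserved confounding), how can we still learn about causal effects?
Bounding Causal Effects
When we cannot justify assuming unconfoundedness, we can fall back on weaker assumptions and derive an interval estimate of the causal effect. Such methods rest on the idea of partial identification: when confounding cannot be fully ruled out, broad but plausible assumptions restrict the range of values the causal effect can take.
Bounding methods, for example, do not assume unconfoundedness; they assume only that the influence of confounding stays within a given range. This yields an upper and a lower bound on the causal effect rather than a single point estimate, so meaningful inference remains possible under much less stringent assumptions.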
“The Law of Decreasing Credibility: the credibility of inference decreases with the strength of the assumptions maintained.”
No-Assumptions Bound
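For an outcome that is binary (or, more generally, bounded in $[0,1]$), the individual treatment effect (ITE) has the following minimum and maximum:
$$ -1 \leq Y_i(1)-Y_i(0) \leq 1 \qquad \text{if } \forall t,0 \leq Y(t) \leq 1 $$
so the average treatment effect (ATE) also lies in an interval of length $2$.
Compared with the ITE, the interval for the ATE can be halved without any further assumptions: the ATE falls in an interval of length $1$.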
ASSUMPTION Bounded Potential Outcomes
$$ \forall t,a \leq Y(t) \leq b $$
From this assumption it follows that
$$ \begin{aligned} a-b \leq Y_i(1)-Y_i(0) \leq b-a \\\ a-b \leq \Bbb{E}[Y_i(1)-Y_i(0)] \leq b-a \end{aligned} $$
The ITE interval has length $(b-a)-(a-b)=2(b-a)$.
The ATE interval can be cut in half, to length $(b-a)$.
Observational-Counterfactual Decomposition
First, decompose the ATE as
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)]&=\Bbb{E}[Y(1)]-\Bbb{E}[Y(0)]\\\ &=\big(P(T=1)\Bbb{E}[Y(1)|T=1]+P(T=0)\Bbb{E}[Y(1)|T=0]\big) \\\ &\quad-\big(P(T=1)\Bbb{E}[Y(0)|T=1]+P(T=0)\Bbb{E}[Y(0)|T=0]\big) \end{aligned} $$
The first and last terms can be estimated from observed data, because $Y=Y(t)$ whenever $T=t$.
Writing $\pi \triangleq P(T=1)$, the Observational-Counterfactual Decomposition becomes
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &=\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y(1)|T=0]\\\ &\quad-\pi\Bbb{E}[Y(0)|T=1]-(1-\pi)\Bbb{E}[Y|T=0] \end{aligned} $$
Although $\Bbb{E}[Y(1)|T=0]$ and $\Bbb{E}[Y(0)|T=1]$ are counterfactual, their values still lie between $a$ and $b$.
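For the upper bound on the ATE, set $\Bbb{E}[Y(1)|T=0]$ to its maximum $b$ and $\Bbb{E}[Y(0)|T=1]$ to its minimum $a$, so that the two counterfactual terms contribute $+b$ and $-a$;
for the lower bound on the ATE, set $\Bbb{E}[Y(1)|T=0]$ to its minimum $a$ and $\Bbb{E}[Y(0)|T=1]$ to its maximum $b$, so that the two counterfactual terms contribute $+a$ and $-b$.
Together these give an interval for the ATE.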
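As a sanity check on the decomposition, the following minimal sketch builds a toy population in which both potential outcomes are known (which is never the case with real data) and confirms that the four-term decomposition reproduces the ATE. The population values and all variable names are made up for illustration.

```python
# Toy population where BOTH potential outcomes are known.
# Each tuple is (t, y0, y1); the numbers are hypothetical.
population = [
    (1, 0.2, 0.9),
    (1, 0.5, 0.7),
    (0, 0.6, 0.4),
    (0, 0.1, 0.3),
    (0, 0.8, 0.8),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


pi = mean(t for t, _, _ in population)  # P(T=1)
treated = [(y0, y1) for t, y0, y1 in population if t == 1]
control = [(y0, y1) for t, y0, y1 in population if t == 0]

# ATE computed directly from individual treatment effects.
ate_direct = mean(y1 - y0 for _, y0, y1 in population)

# ATE computed through the observational-counterfactual decomposition.
ate_decomposed = (
    pi * mean(y1 for _, y1 in treated)          # observable:     E[Y | T=1]
    + (1 - pi) * mean(y1 for _, y1 in control)  # counterfactual: E[Y(1) | T=0]
    - pi * mean(y0 for y0, _ in treated)        # counterfactual: E[Y(0) | T=1]
    - (1 - pi) * mean(y0 for y0, _ in control)  # observable:     E[Y | T=0]
)

print(round(ate_direct, 4), round(ate_decomposed, 4))
```

With real data only the two terms marked "observable" can be computed; the bounds are obtained by replacing the two counterfactual terms with their extreme admissible values.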
No-Assumptions Bound
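$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] \leq \pi\Bbb{E}[Y(1)|T=1] + (1-\pi) b - \pi a-(1-\pi)\Bbb{E}[Y(0)|T=0] \\\ \Bbb{E}[Y(1)-Y(0)] \geq \pi\Bbb{E}[Y(1)|T=1] + (1-\pi) a - \pi b-(1-\pi)\Bbb{E}[Y(0)|T=0] \end{aligned} $$
Subtracting the lower bound from the upper bound gives an interval of length $b-a$.
However, this interval always contains $0$ (the lower bound is at most $0$ and the upper bound is at least $0$), so on its own it cannot rule out the absence of a causal effect.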
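The No-Assumptions Bound can be computed directly from three observable quantities. The sketch below is a plain transcription of the two inequalities; the function name `no_assumptions_bound` and the input values ($\pi=0.3$, $\Bbb{E}[Y|T=1]=0.9$, $\Bbb{E}[Y|T=0]=0.2$, $a=0$, $b=1$) are assumptions chosen for illustration only.

```python
def no_assumptions_bound(pi, ey_t1, ey_t0, a, b):
    """Return (lower, upper) for the ATE when a <= Y(t) <= b.

    pi    : P(T=1)
    ey_t1 : E[Y | T=1]
    ey_t0 : E[Y | T=0]
    """
    # Part of the decomposition that is identified from observed data.
    observed = pi * ey_t1 - (1 - pi) * ey_t0
    # Upper bound: E[Y(1)|T=0] -> b and E[Y(0)|T=1] -> a.
    upper = observed + (1 - pi) * b - pi * a
    # Lower bound: E[Y(1)|T=0] -> a and E[Y(0)|T=1] -> b.
    lower = observed + (1 - pi) * a - pi * b
    return lower, upper


# Hypothetical inputs for illustration.
lower, upper = no_assumptions_bound(pi=0.3, ey_t1=0.9, ey_t0=0.2, a=0.0, b=1.0)
print(round(lower, 2), round(upper, 2))  # roughly -0.17 and 0.83
print(round(upper - lower, 2))           # interval length b - a
```

For these inputs the interval has length $b-a=1$ and straddles $0$, which is why the sign of the effect remains undetermined.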
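Monotonicity Assumptions
Monotonicity assumptions (Monotone Assumptions) cover situations where an effect is known to act in one direction, or where the direction of the effect is assumed to be consistent only within a particular subset (for example, a specific subpopulation). They are weaker than unconfoundedness, yet they still let us draw conclusions about the causal effect when some confounders are unmeasured.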
Nonnegative Monotone Treatment Response
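The Nonnegative Monotone Treatment Response (MTR) assumption states that treatment never lowers the outcome of any individual:
$$ \forall i \quad Y_i(1) \geq Y_i(0) $$
Every ITE is therefore nonnegative, which raises the lower bound of the interval from $a-b$ to $0$.
Under this assumption we have
$$ \begin{aligned} \Bbb{E}[Y(1)|T=0] &\geq \Bbb{E}[Y(0)|T=0] = \Bbb{E}[Y|T=0] \\\ \Bbb{E}[Y(0)|T=1] &\leq \Bbb{E}[Y(1)|T=1] = \Bbb{E}[Y|T=1] \end{aligned} $$
so for a lower bound we replace $+\Bbb{E}[Y(1)|T=0]$ with the smaller $+\Bbb{E}[Y|T=0]$, and $-\Bbb{E}[Y(0)|T=1]$ with the smaller $-\Bbb{E}[Y|T=1]$:
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &=\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y(1)|T=0]\\\ &\quad-\pi\Bbb{E}[Y(0)|T=1]-(1-\pi)\Bbb{E}[Y|T=0]\\\ &\geq\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y|T=0]\\\ &\quad-\pi\Bbb{E}[Y|T=1]-(1-\pi)\Bbb{E}[Y|T=0]\\\ &=0 \end{aligned} $$
The Nonnegative MTR Lower Bound is: $\Bbb{E}[Y(1)-Y(0)]\geq0$;
by the same argument, if $Y_i(1) \leq Y_i(0)$ for all $i$, the Nonpositive MTR Upper Bound is: $\Bbb{E}[Y(1)-Y(0)]\leq0$.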
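A minimal sketch of how the nonnegative MTR lower bound might be combined with the upper bound that holds under bounded potential outcomes alone. The function name `nonnegative_mtr_interval` and the numeric inputs are assumptions made for illustration.

```python
def nonnegative_mtr_interval(pi, ey_t1, ey_t0, a, b):
    """Interval for the ATE under nonnegative MTR and a <= Y(t) <= b.

    Nonnegative MTR only tightens the lower bound (to 0); the upper bound
    is the one obtained from bounded potential outcomes alone.
    """
    upper = pi * ey_t1 + (1 - pi) * b - pi * a - (1 - pi) * ey_t0
    lower = 0.0
    return lower, upper


# Hypothetical inputs for illustration.
lower, upper = nonnegative_mtr_interval(pi=0.3, ey_t1=0.9, ey_t0=0.2, a=0.0, b=1.0)
print(round(lower, 2), round(upper, 2))  # roughly 0.0 and 0.83
```

The interval shrinks only from below, so the assumption rules out negative effects but says nothing new about how large the effect can be.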
Monotone Treatment Selection
Assume that those who select treatment would, on average, have better outcomes under either treatment value than those who do not:
$$ \begin{aligned} \Bbb{E}[Y(1)|T=1] \geq \Bbb{E}[Y(1)|T=0] \\\ \Bbb{E}[Y(0)|T=1] \geq \Bbb{E}[Y(0)|T=0] \end{aligned} $$
For an upper bound we therefore replace $+\Bbb{E}[Y(1)|T=0]$ with the larger $+\Bbb{E}[Y(1)|T=1]$, and $-\Bbb{E}[Y(0)|T=1]$ with the larger $-\Bbb{E}[Y(0)|T=0]$:
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &=\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y(1)|T=0]\\\ &\quad-\pi\Bbb{E}[Y(0)|T=1]-(1-\pi)\Bbb{E}[Y|T=0] \\\ &\leq\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y|T=1]\\\ &\quad-\pi\Bbb{E}[Y|T=0]-(1-\pi)\Bbb{E}[Y|T=0]\\\ &=\Bbb{E}[Y|T=1]-\Bbb{E}[Y|T=0] \end{aligned} $$
The Monotone Treatment Selection Upper Bound is: $\Bbb{E}[Y(1)-Y(0)]\leq\Bbb{E}[Y|T=1]-\Bbb{E}[Y|T=0]$, that is, the ATE is at most the associational difference.
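As a rough illustration, the MTS upper bound is just the difference of the two observed group means, and it can be paired with the lower bound that holds under bounded potential outcomes alone. The function names and input values below are hypothetical.

```python
def mts_upper_bound(ey_t1, ey_t0):
    """MTS upper bound: the associational difference E[Y|T=1] - E[Y|T=0]."""
    return ey_t1 - ey_t0


def no_assumptions_lower(pi, ey_t1, ey_t0, a, b):
    """Lower bound on the ATE using only a <= Y(t) <= b."""
    return pi * ey_t1 + (1 - pi) * a - pi * b - (1 - pi) * ey_t0


# Hypothetical inputs for illustration.
pi, ey_t1, ey_t0, a, b = 0.3, 0.9, 0.2, 0.0, 1.0

upper = mts_upper_bound(ey_t1, ey_t0)
lower = no_assumptions_lower(pi, ey_t1, ey_t0, a, b)
print(round(lower, 2), round(upper, 2))  # roughly -0.17 and 0.7
```

If one is also willing to assume nonnegative MTR, the lower endpoint can be raised to $0$, since the two assumptions constrain opposite ends of the interval.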
Optimal Treatment Selection
Suppose every individual receives the treatment that suits them best:
$$ T_i=1 \implies Y_i(1) \geq Y_i(0), \qquad T_i=0 \implies Y_i(0) \geq Y_i(1) $$
It follows that
$$ \Bbb{E}[Y(1)|T=0] \leq \Bbb{E}[Y(0)|T=0] = \Bbb{E}[Y|T=0] $$
For the counterfactual $-\Bbb{E}[Y(0)|T=1]$ we proceed as in the No-Assumptions Bound and set $\Bbb{E}[Y(0)|T=1]$ to its minimum $a$, which gives the upper bound on the ATE:
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &=\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y(1)|T=0]\\\ &\quad-\pi\Bbb{E}[Y(0)|T=1]-(1-\pi)\Bbb{E}[Y|T=0]\\\ &\leq\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y|T=0]\\\ &\quad-\pi a-(1-\pi)\Bbb{E}[Y|T=0]\\\ &=\pi\left(\Bbb{E}[Y|T=1]-a\right) \end{aligned} $$
Similarly,
$$ \Bbb{E}[Y(0)|T=1] \leq \Bbb{E}[Y(1)|T=1] = \Bbb{E}[Y|T=1] $$
and setting the counterfactual $+\Bbb{E}[Y(1)|T=0]$ to its minimum $+a$ gives the lower bound on the ATE:
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &=\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y(1)|T=0]\\\ &\quad-\pi\Bbb{E}[Y(0)|T=1]-(1-\pi)\Bbb{E}[Y|T=0]\\\ &\geq\pi\Bbb{E}[Y|T=1]+(1-\pi)a\\\ &\quad-\pi\Bbb{E}[Y|T=1]-(1-\pi)\Bbb{E}[Y|T=0]\\\ &=(1-\pi)\left(a-\Bbb{E}[Y|T=0]\right) \end{aligned} $$
Putting the two together, we obtain
Optimal Treatment Selection Bound 1
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &\leq \pi\left(\Bbb{E}[Y|T=1]-a\right) \\\ \Bbb{E}[Y(1)-Y(0)] &\geq (1-\pi)\left(a-\Bbb{E}[Y|T=0]\right) \\\ \text{Interval Length}&=\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y|T=0]-a \end{aligned} $$
Note: the OTS Bound 1 interval still contains $0$.
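A minimal sketch of OTS Bound 1. The lower bound is written as $(1-\pi)(a-\Bbb{E}[Y|T=0])$, which is what the last step of the lower-bound derivation simplifies to and what the stated interval length requires. The function name `ots_bound_1` and the inputs are assumptions for illustration.

```python
def ots_bound_1(pi, ey_t1, ey_t0, a):
    """Return (lower, upper) for the ATE under optimal treatment selection.

    Only the lower limit `a` of the outcome is needed; the upper limit b
    does not enter this bound.
    """
    upper = pi * (ey_t1 - a)
    lower = (1 - pi) * (a - ey_t0)
    return lower, upper


# Hypothetical inputs for illustration.
pi, ey_t1, ey_t0, a = 0.3, 0.9, 0.2, 0.0
lower, upper = ots_bound_1(pi, ey_t1, ey_t0, a)
length = pi * ey_t1 + (1 - pi) * ey_t0 - a

print(round(lower, 2), round(upper, 2))  # roughly -0.14 and 0.27
print(round(upper - lower, 2), round(length, 2))
```

Because $a \leq \Bbb{E}[Y|T=t]$, the lower endpoint is never positive and the upper endpoint is never negative, which is why this interval cannot exclude $0$.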
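Manski (1990), ‘Nonparametric Bounds on Treatment Effects’, works with a different inequality:
$$ \Bbb{E}[Y(1)|T=0]\leq\Bbb{E}[Y|T=1] $$
Under optimal treatment selection (ignoring ties $Y(0)=Y(1)$), the untreated are exactly the individuals with $Y(0)>Y(1)$, so
$$ \Bbb{E}[Y(1)|T=0]=\Bbb{E}[Y(1)|Y(0)>Y(1)] $$
When taking the expectation of the potential outcome $Y(1)$, flipping the conditioning event from $Y(0)>Y(1)$ to $Y(0) \leq Y(1)$ is taken to give an upper bound, namely
$$\Bbb{E}[Y(1)|Y(0) \leq Y(1)]=\Bbb{E}[Y(1)|T=1]=\Bbb{E}[Y|T=1]$$
which yields the assumption used in Manski's paper.
Replacing $+\Bbb{E}[Y(1)|T=0]$ with the larger $+\Bbb{E}[Y|T=1]$ and setting $\Bbb{E}[Y(0)|T=1]$ to its minimum $a$ gives the upper bound on the ATE:
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)]&\leq\pi\Bbb{E}[Y|T=1]+(1-\pi)\Bbb{E}[Y|T=1]\\\ &\quad-\pi a-(1-\pi)\Bbb{E}[Y|T=0]\\\ &=\Bbb{E}[Y|T=1]-\pi a-(1-\pi)\Bbb{E}[Y|T=0] \end{aligned} $$
The lower bound follows in the same way.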
Optimal Treatment Selection Bound 2
$$ \begin{aligned} \Bbb{E}[Y(1)-Y(0)] &\leq \Bbb{E}[Y|T=1]-\pi a-(1-\pi)\Bbb{E}[Y|T=0] \\\ \Bbb{E}[Y(1)-Y(0)] &\geq \pi\Bbb{E}[Y|T=1]+(1-\pi)a-\Bbb{E}[Y|T=0] \\\ \text{Interval Length}&=(1-\pi)\Bbb{E}[Y|T=1]+\pi\Bbb{E}[Y|T=0]-a \end{aligned} $$
Note: the OTS Bound 2 interval may still contain $0$, but in some settings it excludes $0$ and thereby identifies the sign of the effect.
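The sketch below evaluates OTS Bound 2 for one set of hypothetical inputs in which the lower endpoint turns out positive, so the sign of the effect is determined. The function name `ots_bound_2` and the numbers are assumptions for illustration.

```python
def ots_bound_2(pi, ey_t1, ey_t0, a):
    """Return (lower, upper) for the ATE using OTS Bound 2."""
    upper = ey_t1 - pi * a - (1 - pi) * ey_t0
    lower = pi * ey_t1 + (1 - pi) * a - ey_t0
    return lower, upper


# Hypothetical inputs for illustration.
pi, ey_t1, ey_t0, a = 0.3, 0.9, 0.2, 0.0
lower, upper = ots_bound_2(pi, ey_t1, ey_t0, a)
length = (1 - pi) * ey_t1 + pi * ey_t0 - a

print(round(lower, 2), round(upper, 2))  # roughly 0.07 and 0.76
print(round(upper - lower, 2), round(length, 2))
print(lower > 0)  # True here: the interval excludes 0
```

With other inputs (for instance a larger $\Bbb{E}[Y|T=0]$) the lower endpoint can fall below $0$, so whether the sign is identified depends on the data.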
- Different bounds are better in different cases.
- Different bounds can be better in different ways (e.g., identifying the sign vs. getting a smaller interval).
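For example, one can mix bounds: take the upper (or lower) bound from OTS Bound 1 together with the lower (or upper) bound from OTS Bound 2. Conclusions obtained this way are less precise than those reached under stronger assumptions, but they remain a useful reference in applied research, especially when fully ruling out confounding would be overly optimistic.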
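One way to mix bounds is to keep, for each endpoint, whichever of the two OTS bounds is tighter. The sketch below does this by intersecting the two intervals; the helper name `mix_bounds` and the inputs are hypothetical, and the result is only valid if the assumptions behind both bounds hold.

```python
def ots_bound_1(pi, ey_t1, ey_t0, a):
    """OTS Bound 1 as (lower, upper)."""
    return (1 - pi) * (a - ey_t0), pi * (ey_t1 - a)


def ots_bound_2(pi, ey_t1, ey_t0, a):
    """OTS Bound 2 as (lower, upper)."""
    lower = pi * ey_t1 + (1 - pi) * a - ey_t0
    upper = ey_t1 - pi * a - (1 - pi) * ey_t0
    return lower, upper


def mix_bounds(*intervals):
    """Intersect intervals: largest lower endpoint, smallest upper endpoint."""
    lowers, uppers = zip(*intervals)
    return max(lowers), min(uppers)


# Hypothetical inputs for illustration.
pi, ey_t1, ey_t0, a = 0.3, 0.9, 0.2, 0.0
mixed_lower, mixed_upper = mix_bounds(
    ots_bound_1(pi, ey_t1, ey_t0, a),
    ots_bound_2(pi, ey_t1, ey_t0, a),
)
print(round(mixed_lower, 2), round(mixed_upper, 2))  # roughly 0.07 and 0.27
```

For these inputs the mixed interval takes its lower endpoint from Bound 2 and its upper endpoint from Bound 1, so it both excludes $0$ and is shorter than either bound alone.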