Apr 24 2018 stat.OT
The well-known Simpson's paradox puzzles and surprises many, especially empirical researchers and users of statistics. However, there is nothing surprising as far as the mathematical details are concerned. Much has been written about the paradox, but most of it is beyond the grasp of such users. This short article explains the phenomenon in an easy-to-grasp way using simple algebra and geometry. The mathematical conditions under which the paradox can occur are made explicit, and a simple geometrical illustration is used to describe it. We consider the reversal of the association between two binary variables, say $X$ and $Y$, by a third binary variable, say $Z$. We show that it is always possible to define $Z$ algebraically for non-extreme dependence between $X$ and $Y$; therefore, whether the paradox occurs depends on identifying such a $Z$ with a practical meaning in a given context of interest, which is up to the subject domain expert. Finally, we discuss the paradox in predictive contexts, since it is argued in the literature that the paradox is resolved using causal reasoning.
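As a minimal numerical sketch of the reversal described above, the following Python snippet compares $P(Y=1 \mid X)$ within each level of $Z$ against the same comparison after pooling over $Z$. The counts are hypothetical, chosen only for illustration, and the names `counts` and `rate` are assumptions of this sketch rather than anything defined in the article.

```python
from fractions import Fraction

# Hypothetical counts, made up for illustration only.
# Key: (z, x); value: (number of cases with Y = 1, total number of cases).
counts = {
    (0, 1): (9, 10),
    (0, 0): (80, 100),
    (1, 1): (30, 100),
    (1, 0): (2, 10),
}


def rate(x, z=None):
    """Return P(Y=1 | X=x, Z=z), or P(Y=1 | X=x) pooled over Z when z is None."""
    cells = [
        cell
        for (zz, xx), cell in counts.items()
        if xx == x and (z is None or zz == z)
    ]
    successes = sum(s for s, _ in cells)
    trials = sum(n for _, n in cells)
    return Fraction(successes, trials)


# Direction of the X-Y association within each level of Z.
within = [rate(1, z) > rate(0, z) for z in (0, 1)]

# Direction of the X-Y association after pooling over Z.
overall = rate(1) > rate(0)

# The reversal: X = 1 is favourable in every stratum but not in the pooled data.
reversal = all(within) and not overall

for z in (0, 1):
    print(f"Z={z}: P(Y=1|X=1)={float(rate(1, z)):.3f}, P(Y=1|X=0)={float(rate(0, z)):.3f}")
print(f"Pooled: P(Y=1|X=1)={float(rate(1)):.3f}, P(Y=1|X=0)={float(rate(0)):.3f}")
print("Reversal:", reversal)
```

With these particular counts, $X=1$ has the higher rate of $Y=1$ in both strata of $Z$ yet the lower rate once the strata are pooled, because the $X=1$ cases are concentrated in the stratum where $Y=1$ is rare overall.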