

Showing posts from May, 2018

Encoding design matrices in Patsy

Some of us have seen the connection between ANOVA and linear regression (see here for a more detailed explanation). To draw the equivalence between ANOVA and linear regression, we need a design matrix. For instance, suppose we have a series of observations with levels A, B, and C as follows
\[ \{A, B, C, A, A, B, C\} \]
If we want to reformulate this as an ANOVA-style test, we can compare A vs B and A vs C. We can encode that design matrix as follows
\[ \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix} \]
In the first row of the matrix, only the entries with B are labeled (since we are doing a comparison between A and B). In the second row, only the entries with C are labeled. Since we have set A to implicitly be the reference, there is no row corresponding to A. If we want to explicitly derive this design matrix in patsy, we can do it as follows import pa
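
As a sanity check on the encoding described above, here is a minimal hand-rolled sketch in plain NumPy. It is not patsy's own output or API, just an illustration under the stated convention that A is the reference level and each remaining level gets one indicator row. The names `obs`, `reference`, `levels`, and `design` are made up for this example.

```python
import numpy as np

# Observations from the text; "A" is treated as the implicit reference level.
obs = np.array(["A", "B", "C", "A", "A", "B", "C"])
reference = "A"

# One indicator row per non-reference level, in sorted order (B, then C).
levels = [lvl for lvl in sorted(set(obs.tolist())) if lvl != reference]

# Each row is 1 where the observation equals that level and 0 elsewhere.
design = np.array([(obs == lvl).astype(int) for lvl in levels])

print(levels)
print(design)
```

The row for B marks the second and sixth observations, and the row for C marks the third and seventh; the A observations are zero in both rows because A is the baseline being compared against.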