Posts

Encoding design matrices in Patsy

Some of us have seen the connections between ANOVA and linear regression (see here for a more detailed explanation). In order to draw the equivalence between ANOVA and linear regression, we need a design matrix. For instance, suppose we have a series of observations A, B, C as follows \[ \{A, B, C, A, A, B, C\}\] If we wanted to reformulate this as an ANOVA-style test, we could compare A vs B and A vs C. We can encode that design matrix as follows \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \\ \end{bmatrix} In the first row of the matrix, only the entries with B are labeled (since we are doing a comparison between A and B). In the second row, only the entries with C are labeled. Since we have set A to implicitly be the reference, there is no row corresponding to A. If we want to explicitly derive this design matrix in patsy, we can do it as follows import pa
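
The encoding described above can be sketched by hand in plain Python. This is only an illustration of the treatment (dummy) coding idea and does not call patsy; the names `observations`, `reference`, and `design` are made up for this example.

```python
# Hand-rolled treatment (dummy) coding; this does not use patsy.
observations = ["A", "B", "C", "A", "A", "B", "C"]
reference = "A"  # the implicit reference level, so it gets no row

# Every level except the reference gets its own indicator row.
levels = sorted(set(observations) - {reference})

# A row holds 1 wherever the observation equals that level, else 0.
design = {
    level: [1 if obs == level else 0 for obs in observations]
    for level in levels
}

for level in levels:
    print(f"{level} vs {reference}:", design[level])
```

Each row marks the positions where its level occurs in the series. As far as I know, patsy's default treatment coding produces the same indicators laid out the other way around (one row per observation, one column per non-reference level, plus an intercept column), so expect the transpose of this layout there.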
Recent posts

Behind the scenes with BIOM tables

Today, I'll be covering the BIOM file format, a standardized file format for storing sequence counts in samples. This file format is typically used in the biological sciences, most notably in amplicon sequencing technologies, such as 16S sequencing. For those of you who aren't as familiar with these technologies: when we conduct survey studies, we like to get a broad overview of the microbes that are living within a raw sample. But we don't need to sequence a bacterium's entire genome to identify what the bacterium is. We can just sequence a housekeeping gene that every bacterium has: the 16S ribosomal RNA gene. It's a similar strategy to the one deployed in court. When DNA evidence is presented in the courtroom, only a tiny, tiny portion of an individual's DNA is actually required to uniquely identify that person. But moving on. The BIOM file format was originally designed to store counts of 16S sequences across samples, but it has grown to become a more generalized file fo

ANCOM explained

In case you have not heard, ANCOM is another differential abundance test, designed specifically for teasing out differentially abundant bacteria between groups. Now, note that there are a ton of differential abundance techniques out there. And one might ask why there are so many people focused on this seemingly simple problem. It turns out that this problem is actually impossible. And this is rooted in the issue of relative abundances. A change of 1 species between samples can also be explained by a change of all of the other species between samples. Let's take a look at a simple, concrete example. Here we have ten species, and 1 species doubles after the first time point. If we know the original abundances of this species, it's pretty clear that species 1 doubled. However, if we can only obtain the proportions of species within the environment, the message isn't so clear. Above are the proportions of the species in the exact same environment
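
To make the ten-species example concrete, here is a minimal sketch in plain Python. The starting counts (100 individuals per species) are an assumption chosen purely for illustration and are not taken from the post; the helper `proportions` is likewise hypothetical.

```python
# Hypothetical absolute counts: ten species, 100 individuals each.
before = [100] * 10
after = before.copy()
after[0] *= 2  # species 1 doubles; the other nine do not change


def proportions(counts):
    """Convert absolute counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


p_before = proportions(before)
p_after = proportions(after)

for i, (b, a) in enumerate(zip(p_before, p_after), start=1):
    print(f"species {i}: {b:.4f} -> {a:.4f}")
```

Under these assumed counts, species 1 moves from a proportion of 0.10 to roughly 0.18, while every other species drops from 0.10 to roughly 0.09 even though its absolute count never changed. Looking only at the proportions, "species 1 grew" and "the other nine shrank" are both consistent with the data.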

Setting environmental parameters in conda

Just came across this post about specifying parameters in conda. Basically, when you create and enter a new environment, this gives you the capability to save environment variables specific to that environment. And even better, you can unset those variables once you leave that environment. What does this mean? It means that you can enforce relatively isolated environments, environment variables included. So if you wanted to install 2 different versions of the same package in two different environments, easy! Just specify the path variables that you want in a script under activate.d, and restore the variables that you had previously in a script under deactivate.d. To see what I mean, consider the following test: conda create -n test pip, then source activate test. Now we have a new conda environment, and if you check out your miniconda envs folder you should be able to see it. For me, I can run the following to do this: ls ~/miniconda3/envs/ I'm using miniconda instead bec

Installing non-conda R packages for jupyter notebook on R and conda

When you run into the situation where you want to run all of your R scripts in a jupyter notebook, within a conda environment, you will have to take some slight detours to install non-conda R packages. This assumes that you have already installed R and jupyter through conda. For more information, check out this awesome post. As an example, let's take a look at how one would install the ecodist package in R. We will need to pass in 3 arguments: (1) the name of the package, (2) the library location of the conda R installation and (3) the CRAN repository we want to download from. install.packages('ecodist', '/Users/mortonjt/miniconda3/envs/bio/lib/R/library/', repos="http://cran.cnr.berkeley.edu") So, just modify the path to your environment in the second argument, and you should be good to go :)

Installing qiime through conda

First set of posts on conda. It's becoming increasingly difficult to sift through my inbox to find all of the proper commands, so here it goes :) Anyways, conda has proven to be quite a powerful tool. It enables _all_ of the capabilities provided by virtualenv, plus more. It can install C libraries such as hdf5, and is my personal go-to whenever I'm installing software on a new system. Heck, you can even install different versions of Python - how cool is that? That being said, the fastest way I know of to install qiime on a new cluster is through conda. To get started, you'll first want to install Miniconda. The reason is that you want a minimal conda install, otherwise you'll end up breaking some of the dependencies required by qiime. After getting into your root directory, you can download the Miniconda installer (for Python 3) for Linux: wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh If you have a Mac you can use the following URL instead

Getting started with the XBee

XBee is a wireless module that supports the Zigbee protocol, allowing the development of microcontroller networks. The datasheet is at the following link: http://www.sparkfun.com/datasheets/Wireless/Zigbee/XBee-Datasheet.pdf Materials: To get your first XBee tutorial working, you'll need the following materials: a USB Explorer Dongle, an XBee interface, and two XBee radios. You'll need a way to program the XBee chip. Either of the following options will work: https://www.sparkfun.com/products/9819 https://www.sparkfun.com/products/8687 If you decide to get the second product, make sure you have a micro USB cord. Either of the following products should work: https://www.sparkfun.com/products/11373 https://www.sparkfun.com/products/10854 Just make sure that you also include headers! They can be found at the following URL: https://www.sparkfun.com/products/9280 You'll need to get two XBee radios since they use a P2P protocol to communicate. You'll als