Channel: Mike Love’s blog
Browsing all 22 articles

German nouns and gender

I’m working on a presentation about classifying strings, using 240,000 German nouns as an example dataset.

PCA on training and test data

Over the past few months, I have heard some talks in which dimension reduction (e.g. taking the top k principal components) was applied to the full data set before splitting the data into training and test sets. My...
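
As a rough illustration of the alternative ordering (split first, then reduce), here is a minimal sketch in R. It assumes the concern is that fitting PCA on the full data lets test rows influence the components; the simulated matrix and the names `x.train`, `x.test` and `k` are made up for the example and are not taken from the post.

```r
set.seed(1)
# Simulated data: 100 samples by 20 features (purely illustrative)
x <- matrix(rnorm(100 * 20), nrow = 100)
train.idx <- sample(nrow(x), 70)
x.train <- x[train.idx, ]
x.test <- x[-train.idx, ]

k <- 3  # number of principal components to keep (assumed value)
# Estimate the centering and rotation from the training rows only
pca <- prcomp(x.train, center = TRUE, scale. = FALSE)
pc.train <- pca$x[, 1:k]
# Project the test rows using the training centering and rotation
pc.test <- predict(pca, newdata = x.test)[, 1:k]
```

The test rows never enter `prcomp`, so any model fit on `pc.train` is evaluated on `pc.test` without the test data having shaped the components.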

Block bootstrap

In looking at sequential data (e.g. time-series or genomic data), any inference comparing different sequences needs to take into account local correlations within a sequence. For example, you might...

Plotting hclust

After many years I’ve finally worked out the x and y coordinates of the points in plot.hclust:

    hang <- 0.07
    hc <- hclust(dist)
    plot(hc)
    pt.heights <- c(hc$height[hc$merge[,1] <...

Points and line ranges

Two ways of plotting a grid of points and line ranges. I’m coming around to ggplot2. I recommend skimming the first few chapters of the book to understand what is going on – but it only takes about 30...

Pipe to Rscript

With this, I can switch from doing simple statistics on the command line using awk to using R, which is more familiar to me:

    blah blah blah | Rscript -e 'summary(scan(file("stdin")))'

Splitting data

The caret package has a nice function for splitting data into balanced subsets, though I don’t see why I don’t get 3 rows out of 10 in this example. The p argument is defined as “the percentage of...

Poisson regression

In trying to explain generalized linear models, I often say something like: GLMs are very similar to linear models but with different domains for the target y, e.g. positive numbers, outcomes in {0,1},...

How wrong is hypergeometric test with one random margin?

In biostats and bioinformatics, the hypergeometric distribution is often used to assign probability of surprise to the amount of overlap between results and annotation, e.g.: 100 gene levels are...

Binomial GLM for ratios of read counts

For certain sequencing experiments (e.g. methylation data), one might end up with a ratio of read counts at a certain location satisfying a given property (e.g. ‘is methylated’) and want to test if...
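
A minimal sketch of how such a model is commonly set up in R, assuming the question is whether the proportion differs between two conditions (the excerpt is cut off before it says). The data are simulated, and the names `total`, `meth` and `group` are invented for this example.

```r
set.seed(1)
n.samples <- 10
# Two hypothetical conditions, five samples each
group <- factor(rep(c("A", "B"), each = n.samples / 2))
# Total reads covering the location in each sample (simulated)
total <- rpois(n.samples, lambda = 50) + 1
# Reads with the property, e.g. 'is methylated' (simulated proportions)
p <- ifelse(group == "A", 0.3, 0.6)
meth <- rbinom(n.samples, size = total, prob = p)

# Binomial GLM with a two-column response: successes and failures
fit <- glm(cbind(meth, total - meth) ~ group, family = binomial)
coefs <- summary(fit)$coefficients
```

The `groupB` row of `coefs` holds the estimated log odds ratio between conditions and its Wald test. Passing successes and failures as two columns lets samples with more reads carry more weight than a plain ratio would.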

Jacob and Monod

The original gene regulation diagram? Jacob F, Monod J. Genetic regulatory mechanisms in the synthesis of proteins. J Mol Biol. 1961 Jun;3:318-56.

Plot hclust with colored labels

Again I find myself trying to plot a cluster dendrogram with colored labels. With some insight from this post, I came up with the following function:

    library(RColorBrewer)
    # matrix contains...

More hclust madness

Here is a bit of code for making a heatmap, which orders the rows of a matrix such that the first column (as ordered in the dendrogram) has all 0s then all 1s, then the 2nd column is similarly...

Empirical Bayes and the James-Stein rule

Suppose we observe 300 individual estimates y_i which are distributed N(mean_i, sigma.y^2), with sigma.y known. Now if we assume mean_i ~ N(0, sigma.mean^2), the James-Stein rule gives an estimator for...
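
To make the setup concrete, here is a small simulation sketch using the classical James-Stein estimator that shrinks toward zero, (1 - (n - 2) sigma.y^2 / sum(y^2)) * y. The values chosen for `sigma.y` and `sigma.mean` are assumptions for illustration, and the post itself may go on to a different form of the rule.

```r
set.seed(1)
n <- 300
sigma.y <- 1      # known observation standard deviation (assumed value)
sigma.mean <- 1   # spread of the true means (assumed value)
mean.i <- rnorm(n, 0, sigma.mean)
y <- rnorm(n, mean.i, sigma.y)

# Classical James-Stein shrinkage factor toward zero
shrink <- 1 - (n - 2) * sigma.y^2 / sum(y^2)
js <- shrink * y

# Compare mean squared error of the raw and shrunken estimates
mse.raw <- mean((y - mean.i)^2)
mse.js <- mean((js - mean.i)^2)
```

With these settings the shrinkage factor should land near sigma.mean^2 / (sigma.mean^2 + sigma.y^2), which is the multiplier the normal prior would give if sigma.mean were known.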

R gotchas

I put together a short list of potential R gotchas: unexpected results which might trip up new R users. For example, if we have a matrix m, m[1:2,] returns a matrix, while m[1,] returns a vector,...
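
The matrix example can be checked directly. This short sketch uses a made-up 2 x 3 matrix and also shows the `drop = FALSE` argument, which is base R's way to keep the matrix structure.

```r
m <- matrix(1:6, nrow = 2)

two.rows <- m[1:2, ]               # still a matrix
one.row <- m[1, ]                  # dimensions are dropped: a plain vector
one.row.kept <- m[1, , drop = FALSE]  # stays a 1 x 3 matrix

is.matrix(two.rows)      # TRUE
is.matrix(one.row)       # FALSE
dim(one.row.kept)        # 1 3
```

Code that later calls `nrow()` or `apply()` on the result will behave differently for `one.row`, which is why this trips up new users when a subset happens to select a single row.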

How to check your simple definition of p-value

I just read Andrew Gelman’s post about an article with his name on it starting with an inaccurate definition of p-value. I sympathize with all parties. Journalists and editors are just trying to reduce...

How to use LaTeX math in Rmd to display properly on GitHub Pages

Working on our PH525x online course material, Rafa and I wanted to base all lecture material in Rmd files, as these are easy for students to load into RStudio to walk through the code. Additionally,...

Be precise

I’ve seen a lot of brash negativity lately on Twitter. Here are 3 reasons why you shouldn’t say “x sucks” or “y FAIL” on Twitter: 1. You are being sarcastic. Sarcasm doesn’t work on Twitter and some...

RNA-seq fragment sequence bias

Our paper was just published describing a new method for modeling and correcting fragment sequence bias for estimation of transcript abundances from RNA-seq: “Modeling of RNA-seq fragment sequence bias...
