The caret package has a nice function, `createDataPartition`, for splitting data into balanced subsets. However, I don't understand why this example returns 4 rows instead of 3 out of 10. The `p` argument is documented as “the percentage of data that goes to training”.
```r
d <- data.frame(x = rnorm(10), group = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
d
#             x group
# 1   1.0089900     1
# 2   0.4854706     1
# 3   1.7083259     1
# 4  -1.3362274     2
# 5   1.4905259     2
# 6   1.6451234     2
# 7   1.0361174     3
# 8   0.2369341     3
# 9  -2.0043264     3
# 10  1.4361718     3

library(caret)
# Ask for 30% of the rows, balanced across the groups
d[createDataPartition(d$group, p = 3/10)$Resample1, ]
#             x group
# 3   1.7083259     1
# 4  -1.3362274     2
# 8   0.2369341     3
# 10  1.4361718     3
```
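A minimal base-R sketch of the arithmetic that would produce 4 rows, assuming `createDataPartition` draws `ceiling(p * n_k)` rows separately within each stratum of size `n_k`. caret's documentation says sampling is done within subgroups (and, because `d$group` is numeric, those subgroups come from percentile-based binning, which for this data should yield the same three groups); the per-stratum rounding up is an assumption here, though it matches the 1 + 1 + 2 rows returned above. The helpers `stratified_sizes` and `stratified_sample` are hypothetical and not part of caret.

```r
# Hypothetical helpers (not part of caret): mimic stratified sampling,
# assuming each stratum contributes ceiling(p * n_k) rows.
stratified_sizes <- function(group, p) {
  n_k <- table(group)   # rows per stratum
  ceiling(n_k * p)      # rounded up separately in every stratum
}

stratified_sample <- function(group, p) {
  rows_by_group <- split(seq_along(group), group)
  picked <- lapply(rows_by_group, function(i) {
    k <- ceiling(length(i) * p)
    # Index via sample.int so a stratum of length 1 is not
    # misread by sample() as "sample from 1:i"
    i[sample.int(length(i), k)]
  })
  sort(unlist(picked, use.names = FALSE))
}

group <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
p <- 3/10

sizes <- stratified_sizes(group, p)
print(sizes)       # strata of 3, 3 and 4 rows give 1, 1 and 2
print(sum(sizes))  # 4 rows, although 3/10 of 10 is 3

set.seed(42)
idx <- stratified_sample(group, p)
print(idx)         # one row from group 1, one from group 2, two from group 3
```

Since 0.3 × 3 = 0.9 rounds up to 1 in each of the two small groups and 0.3 × 4 = 1.2 rounds up to 2 in the larger one, the total overshoots the requested 30% when every stratum is rounded independently.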
