Channel: Mike Love’s blog

Splitting data


The caret package has a nice function, createDataPartition, for splitting data into balanced subsets. What I don’t understand is why I get 4 rows rather than 3 out of 10 in this example, given that the p argument is defined as “the percentage of data that goes to training”.


> d <- data.frame(x=rnorm(10), group=c(1,1,1,2,2,2,3,3,3,3))
> d
            x group
1   1.0089900     1
2   0.4854706     1
3   1.7083259     1
4  -1.3362274     2
5   1.4905259     2
6   1.6451234     2
7   1.0361174     3
8   0.2369341     3
9  -2.0043264     3
10  1.4361718     3
> library(caret)
> d[createDataPartition(d$group, p=3/10)$Resample1,]
            x group
3   1.7083259     1
4  -1.3362274     2
8   0.2369341     3
10  1.4361718     3
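
One possible explanation, sketched in base R rather than with caret itself: if the sampling happens within each group and each group's count is rounded up (this is an assumption about caret's internals, not something the output above proves), then groups of size 3, 3 and 4 with p = 3/10 contribute 1, 1 and 2 rows, which adds up to 4. That would match the result above, where one row comes from group 1, one from group 2 and two from group 3. The helper sample_by_group is hypothetical and only mimics that assumed behaviour.

```r
# Sketch only: assumes rows are drawn per group and each count is rounded up.
group <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3)
p <- 3 / 10

group_sizes <- table(group)             # group sizes: 3, 3, 4
per_group <- ceiling(group_sizes * p)   # rows drawn per group after rounding up
total <- sum(per_group)                 # total rows drawn across all groups

# Hypothetical helper (not part of caret): draw ceiling(n * p) indices per group.
sample_by_group <- function(g, p) {
  idx <- split(seq_along(g), g)
  picked <- lapply(idx, function(i) {
    # Index through sample.int so a group of size 1 is handled safely.
    i[sample.int(length(i), ceiling(length(i) * p))]
  })
  sort(unlist(picked, use.names = FALSE))
}

rows <- sample_by_group(group, p)
print(per_group)
print(total)
print(rows)
```

Under this assumption the result can never be smaller than one row per group, so p acts as a lower bound on the fraction of rows returned when the groups are small.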


