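"If your R program is slow, check your loops."
Yep, not bad, writing that line felt pretty satisfying. R ships with some built-in looping functions that are a real pleasure to use, so here are a few.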
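1. apply()
In one sentence: "operate on the rows, the columns, or every single cell of a matrix". I won't say much more about this one. It is a classic, with no end of clever uses.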
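For anyone who has not met apply() yet, here is a minimal sketch of the three cases in that one-sentence summary. The matrix m and the result names are made up purely for illustration.

```r
# A small 2 x 3 matrix, used only for illustration
m <- matrix(1:6, nrow = 2)

# MARGIN = 1: call the function once per row
row_sums <- apply(m, 1, sum)

# MARGIN = 2: call the function once per column
col_max <- apply(m, 2, max)

# MARGIN = c(1, 2): call the function once per cell
doubled <- apply(m, c(1, 2), function(v) v * 2)
```

The second argument (MARGIN) is what switches between rows, columns, and cells; everything else stays the same.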
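2. lapply(X, FUN, ...)
In one sentence: "operate on a list". Anything that is not a list gets coerced to one with as.list() first. As for a data.frame, take note: it is a list too (a list of columns). Yes, really!
sapply() is lapply() with nicer-looking output: it simplifies the result to a vector or matrix whenever it can.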
========================================
# basic usage
> a <- data.frame(name = c('a', 'b', 'c', 'd', 'e'), x1=rep(1, 5), x2=seq(5))
> a
  name x1 x2
1    a  1  1
2    b  1  2
3    c  1  3
4    d  1  4
5    e  1  5
> is.list(a)
[1] TRUE
> lapply(a[-1], mean)
$x1
[1] 1
$x2
[1] 3
# an interesting example: print() shows each value as a side effect, then lapply() returns the same values as a list
> lapply(1:3, function(x)print(x))
[1] 1
[1] 2
[1] 3
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
#clever usage
> lapply(1:3, runif)
[[1]]
[1] 0.7953843
[[2]]
[1] 0.009511986 0.014620154
[[3]]
[1] 0.3515089 0.6179728 0.6724647
> sapply(a[-1], mean)
x1 x2
1 3
# a fun combination of sapply and apply; it also shows that the arguments after FUN are passed along to FUN (much like the ... of paste())
> testcul # a list of matrices
[[1]]
[,1] [,2]
[1,] -1.2866210 -0.2685693
[2,] 1.3842814 1.9853389
[3,] 1.5937214 -0.6717815
[4,] 0.5248024 -0.2175002
[5,] 1.7479599 0.3178725
[6,] -0.9006554 1.0752814
[7,] 0.7844334 1.0482259
[8,] -0.7308886 -0.4430378
[9,] -1.2088685 1.5549266
[10,] -1.0480727 0.2928597
[[2]]
[,1] [,2] [,3]
[1,] -3.1393330 -0.23722704 1.34118049
[2,] 1.3107357 1.82073202 -0.71766319
[3,] -0.7134135 0.36238927 -0.87612364
[4,] 0.7447617 0.67473600 -0.74937624
[5,] -0.0837959 -0.06708523 -0.07118088
[6,] -0.2549086 1.53768357 0.33565245
[7,] -1.7181883 0.58089985 -0.85411148
[8,] 0.1160611 0.67117684 0.71815686
[9,] 0.4569479 0.35652096 -0.41536960
[10,] 0.9169137 -0.94427696 1.11756517
[[3]]
[,1] [,2] [,3]
[1,] 0.97486110 -0.19133644 -0.9147590
[2,] 1.46550890 -1.86653680 -0.6270747
[3,] 0.03275271 0.27933836 0.9450201
[4,] 0.52421699 -1.86095929 0.3082170
[5,] 0.91287800 -0.54052685 -0.8613839
[6,] 0.76407813 -0.73753415 2.6269367
[7,] -0.36976604 -0.67746496 1.0180791
[8,] 0.79666639 0.24423928 0.5493549
[9,] 0.79408187 -0.02954059 0.4128063
[10,] 0.01681600 0.81758855 0.4179971
[[4]]
[,1] [,2]
[1,] -1.21189467 0.59042009
[2,] -0.99203544 2.05900893
[3,] 0.73424020 -0.78684771
[4,] 0.22416011 0.92180061
[5,] 1.26502276 0.03246454
[6,] -1.70265031 -1.63964678
[7,] 1.64618462 -0.86794966
[8,] 0.02223913 -1.49163689
[9,] 1.36978971 1.42062220
[10,] -0.94584516 0.51691580
> sapply(testcul, apply, 1, sd) # standard deviation of every row of every element in the list
[,1] [,2] [,3] [,4]
[1,] 0.7198713 2.27261193 0.9534167 1.27442899
[2,] 0.4250118 1.34275515 1.6841269 2.15741416
[3,] 1.6019525 0.67302061 0.4719054 1.07557158
[4,] 0.5248872 0.84315355 1.3191568 0.49330633
[5,] 1.0112245 0.00870976 0.9454576 0.87155027
[6,] 1.3971984 0.91351225 1.6854646 0.04455022
[7,] 0.1865295 1.16129901 0.9032960 1.77776139
[8,] 0.2035412 0.33488302 0.2767171 1.07047200
[9,] 1.9542983 0.47729069 0.4121884 0.03594400
[10,] 0.9481824 1.13691723 0.4003865 1.03432819
=======================================
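The transcript above never shows how testcul was created, so here is a rough sketch of one way to build a list with the same shape and rerun the sapply/apply trick. The use of rnorm(), the seed, and the names ncols, row_sd and row_sd_explicit are all my assumptions, not the original code.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical reconstruction: four matrices with 10 rows and 2, 3, 3, 2 columns
ncols <- c(2, 3, 3, 2)
testcul <- lapply(ncols, function(k) matrix(rnorm(10 * k), nrow = 10))

# sapply() hands `1` and `sd` through `...` to apply(),
# so each list element m is processed as apply(m, 1, sd)
row_sd <- sapply(testcul, apply, 1, sd)

# The same computation with the inner call written out explicitly
row_sd_explicit <- sapply(testcul, function(m) apply(m, 1, sd))

dim(row_sd)  # 10 rows, one column per list element
```

Because every element has 10 rows, each apply() call returns a vector of length 10, and sapply() can bind the four vectors into a 10 x 4 matrix. With elements of unequal row counts, sapply() would fall back to returning a list.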
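3. tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
In one sentence: "operate on the rows of one column that share the same grouping value" (one column only; for several columns use aggregate()). It is a close cousin of table().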
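To make the "one column versus several columns" point concrete, here is a small sketch on a made-up data frame. The scores data and all names in it are invented for illustration.

```r
# Made-up data: a grouping column and two numeric columns
scores <- data.frame(
  group = c("A", "A", "B", "B", "B"),
  x1 = c(1, 3, 2, 4, 6),
  x2 = c(10, 20, 30, 40, 50)
)

# tapply(): one vector, split by the grouping column, one mean per group
mean_x1 <- tapply(scores$x1, scores$group, mean)

# aggregate(): the same grouping applied to several columns at once
mean_all <- aggregate(cbind(x1, x2) ~ group, data = scores, FUN = mean)
```

tapply() gives back a named array with one entry per group, while aggregate() gives back a data frame with one row per group and one column per summarised variable.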
========================================
# an interesting sample
> ind <- list(c(1, 2, 2), c("A", "A", "B"))
> table(ind)