Today's pitfall comes from ggplot2. Pitfalls are everywhere and nearly impossible to guard against. Try running the code below yourself:

> set.seed(123)
> require(ggplot2)
Loading required package: ggplot2
> rnorm(3)
[1]  0.8005543  1.1902066 -1.6895557
> set.seed(123)
> rnorm(3)
[1] -0.5604756 -0.2301775  1.5587083

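The two set.seed/rnorm pairs differ in only one way: ggplot2 was loaded between set.seed and rnorm the first time, and the results came out different! The second result is the correct one, which means loading ggplot2 ate your seed! Loading a package changes the state of your R session? That is definitely not a good idea. Let's try loading a different package, say my own clusterProfiler: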

> set.seed(123)
> require(clusterProfiler)
Loading required package: clusterProfiler
Loading required package: DOSE

DOSE v3.4.0  For help: https://guangchuangyu.github.io/DOSE

If you use DOSE in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an R/Bioconductor package for Disease Ontology Semantic and Enrichment analysis. Bioinformatics 2015, 31(4):608-609

clusterProfiler v3.6.0  For help: https://guangchuangyu.github.io/clusterProfiler

If you use clusterProfiler in published research, please cite:
Guangchuang Yu., Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
> rnorm(3)
[1] -0.5604756 -0.2301775  1.5587083

Clearly no effect! That is how it should be. Step on a landmine like the ggplot2 one by accident and you won't even know what killed you!
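One way to stay clear of this kind of landmine is to attach every package first and call set.seed only right before the code that needs reproducible random numbers. The sketch below illustrates the ordering effect without depending on any particular ggplot2 version; noisy_attach is a hypothetical stand-in for a package whose attach hook draws a random number, not a real function.

```r
# Hypothetical stand-in for a package whose attach hook draws a random number.
noisy_attach <- function() {
  invisible(stats::runif(1))
}

# Seed first, then "load": the draw inside the hook shifts the random stream.
set.seed(123)
noisy_attach()
shifted <- rnorm(3)

# "Load" first, then seed: the hook can no longer interfere.
noisy_attach()
set.seed(123)
clean <- rnorm(3)

# Reference values with nothing in between.
set.seed(123)
reference <- rnorm(3)

identical(clean, reference)    # TRUE
identical(shifted, reference)  # FALSE
```

In a real script the same idea simply means putting all library() or require() calls at the top and set.seed() after them.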

Let's look at the code:

https://github.com/tidyverse/ggplot2/blob/master/R/zzz.r

.onAttach <- function(...) {
  if (!interactive() || stats::runif(1) > 0.1) return()

  tips <- c(
    "Need help? Try the ggplot2 mailing list: http://groups.google.com/group/ggplot2.",
    "Find out what's changed in ggplot2 at http://github.com/tidyverse/ggplot2/releases.",
    "Use suppressPackageStartupMessages() to eliminate package startup messages.",
    "Stackoverflow is a great place to get help: http://stackoverflow.com/tags/ggplot2.",
    "Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/",
    "Want to understand how all the pieces fit together? Buy the ggplot2 book: http://ggplot2.org/book/"
  )

  tip <- sample(tips, 1)
  packageStartupMessage(paste(strwrap(tip), collapse = "\n"))
}

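The reason is that Hadley's .onAttach draws random numbers while the package is being attached: stats::runif(1) in every interactive session, plus sample() whenever a tip is shown. Each draw advances the random number generator, so the seed you set has already been used up by the time your own code runs. This pitfall was fixed just two days ago by Jim Hester, using withr::with_preserve_seed. It had been sitting there for more than two years!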

.onAttach <- function(...) {
  withr::with_preserve_seed({
    if (!interactive() || stats::runif(1) > 0.1) return()

    tips <- c(
      "RStudio Community is a great place to get help: https://community.rstudio.com/c/tidyverse.",
      "Find out what's changed in ggplot2 at https://github.com/tidyverse/ggplot2/releases.",
      "Use suppressPackageStartupMessages() to eliminate package startup messages.",
      "Need help? Try Stackoverflow: https://stackoverflow.com/tags/ggplot2.",
      "Need help getting started? Try the cookbook for R: http://www.cookbook-r.com/Graphs/",
      "Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/"
    )

    tip <- sample(tips, 1)
    packageStartupMessage(paste(strwrap(tip), collapse = "\n"))
  })
}
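To make the idea behind the fix concrete, here is a minimal base-R sketch of "save the RNG state, run the code, put the state back". The helper name preserve_seed is hypothetical and this is not the withr implementation, only an illustration of the same principle under the assumption that the RNG state lives in .Random.seed in the global environment.

```r
# Hypothetical helper, NOT the withr implementation: run `code`, then restore
# the RNG state that was in place before the call.
preserve_seed <- function(code) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    # Put the saved state back when the function exits.
    on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
  } else {
    # No state existed before the call, so remove any state the code created.
    on.exit(
      if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
        rm(".Random.seed", envir = globalenv())
      },
      add = TRUE
    )
  }
  force(code)  # `code` is evaluated lazily, i.e. only after the state is saved
}

set.seed(123)
preserve_seed(sample(10, 1))  # draws a number, but the state is restored
x <- rnorm(3)

set.seed(123)
y <- rnorm(3)

identical(x, y)  # TRUE
```

Relying on lazy evaluation of the argument keeps the call site short, at the cost of being a little subtle: the expression passed in must not be evaluated before the state has been saved.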

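My earlier posts on strange R incidents were all about pitfalls in R itself, but what you actually face goes well beyond those. As the joke goes, there is a package to cure every ill, so everyone relies on R packages, and each of them may hide pitfalls you would never expect! Fixing a bug is also far less simple than it looks. If you don't believe me, take a look at this Microsoft bug (http://azaleasays.com/2017/01/22/30-year-old-bug-in-microsoft-excel/):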

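Before Excel was born, the spreadsheet world belonged to Lotus 1-2-3. Lotus 1-2-3 assumed that 1900 was a leap year, which made leap-year calculations quick and simple. To be compatible with the market leader, Excel adopted the same date format and reproduced this bug, so that users could read and write Lotus 1-2-3 files in Excel seamlessly. A few years later Excel beat Lotus 1-2-3, but Excel also had to stay compatible with files from its own older versions. If the bug were ever fixed:

Every date in every Excel file would be off by one day. Correcting that data would cost time and money.

Formulas that use date functions could return wrong results.

Other software that is compatible with Excel dates would stop being compatible.

So we decided to let this bug live a long, long life.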
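For R users the practical consequence shows up when converting Excel serial dates. The sketch below first checks the Gregorian leap-year rule and then converts 1900-system serials with the commonly used origin "1899-12-30", which absorbs the phantom 29 February 1900. Both helper names are hypothetical, and the conversion is only assumed to match Excel for serials greater than 60.

```r
# Gregorian rule: divisible by 4, except century years not divisible by 400.
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | year %% 400 == 0
}

is_leap_year(c(1900, 2000))  # FALSE TRUE

# Hypothetical helper: convert a 1900-system Excel serial to an R Date.
# The origin is 1899-12-30 rather than 1899-12-31 because Excel counts a
# 1900-02-29 that never existed (serial 60).
excel_serial_to_date <- function(serial) {
  as.Date(serial, origin = "1899-12-30")
}

excel_serial_to_date(61)  # "1900-03-01", the same day Excel shows
excel_serial_to_date(59)  # "1900-02-27", while Excel shows 1900-02-28
```

The last line shows why the shortcut is only safe after the phantom day: for serials before 60 the compensated origin is itself one day off.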
