# Support Vector Machine

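SVM defines its hyperplane in the same form as logistic regression. The decision boundary of logistic regression is given by $\theta^TX=0$, while SVM writes it as $w^TX+b=0$, where b plays the role of $\theta_0$ in logistic regression. In form the two are no different. As noted earlier, though, their objectives differ: logistic regression looks at all the data, whereas SVM focuses on the support vectors. Every supervised algorithm has a label variable y. In logistic regression it takes values in {0,1}, while SVM uses {-1,1} to make distance calculations convenient.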
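To see why the {-1,1} coding is convenient, here is a minimal R sketch. The weights `w`, intercept `b`, and the three points are made-up values for illustration only, not taken from the post. With labels in {-1,1}, the quantity $y(w^TX+b)/\|w\|$ is a signed distance that is positive exactly when a point lies on the correct side of the hyperplane.

```r
# Hypothetical weights and intercept, chosen only for illustration
w <- c(2, -1)
b <- -0.5

# Three example points, one per row
X <- rbind(c(1, 0), c(0, 1), c(1, 1))

# Labels coded {0,1} as in logistic regression, then recoded to {-1,1} for SVM
y01 <- c(1, 0, 1)
y <- 2 * y01 - 1

# Decision values w^T x + b; the sign gives the side of the hyperplane
f <- as.vector(X %*% w + b)
pred <- sign(f)

# Signed distance to the hyperplane: positive when the point is classified correctly
signed_dist <- y * f / sqrt(sum(w^2))
print(data.frame(f = f, pred = pred, y = y, signed_dist = signed_dist))
```

With {0,1} labels the same check would need a separate case for each class, which is the inconvenience the {-1,1} coding avoids.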

# Five things biologists should know about statistics

Ewan Birney's recent blog post (Five statistical things I wished I had been taught 20 years ago) discusses why statistics matters for biology.

The five areas Ewan covers are:

1. Non-parametric statistics. These are statistical tests which make a bare minimum of assumptions of underlying distributions; in biology we are rarely confident that we know the underlying distribution, and hand waving about central limit theorem can only get you so far. Wherever possible you should use a non-parametric test. This is Mann-Whitney (or Wilcoxon if you prefer) for testing “medians” (Medians is in quotes because this is not quite true. They test something which is closely related to the median) of two distributions, Spearman’s Rho (rather than Pearson’s r2) for correlation, and the Kruskal test rather than ANOVAs (though if I get this right, you can’t in Kruskal do the more sophisticated nested models you can do with ANOVA). Finally, don’t forget the rather wonderful Kolmogorov-Smirnov (I always think it sounds like really good vodka) test of whether two sets of observations come from the same distribution. All of these methods have a basic theme of doing things on the rank of items in a distribution, not the actual level. So - if in doubt, do things on the rank of the metric, rather than the metric itself.
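The tests named above are all available in base R's `stats` package. The sketch below runs them on simulated, skewed data. The samples `x`, `y`, and `z` are invented for illustration and do not come from Ewan's post.

```r
set.seed(1)

# Simulated skewed (exponential) samples; illustrative data only
x <- rexp(30, rate = 1)
y <- rexp(30, rate = 0.5)
z <- rexp(30, rate = 0.25)

# Mann-Whitney / Wilcoxon rank-sum test for a location shift between two samples
mw <- wilcox.test(x, y)

# Spearman's rho: a rank-based alternative to Pearson's correlation
rho <- cor(x, y, method = "spearman")

# Kruskal-Wallis test: a rank-based alternative to one-way ANOVA
kw <- kruskal.test(list(x, y, z))

# Kolmogorov-Smirnov test: do two samples come from the same distribution?
ks <- ks.test(x, y)

# The shared "work on ranks" theme: Spearman's rho is Pearson's r on the ranks
rho_from_ranks <- cor(rank(x), rank(y))

print(c(mann_whitney = mw$p.value, kruskal = kw$p.value, ks = ks$p.value))
print(c(spearman = rho, pearson_on_ranks = rho_from_ranks))
```

The last comparison makes the closing advice concrete: replacing each value by its rank and applying the usual statistic gives the rank-based version.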

# Bootstrap Method

    # Create a sample of 60 values that is not normally distributed (see the figure below)
    a <- c(1:10, rnorm(50))
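For a sample like this one, a bootstrap of the mean might look like the following sketch. The excerpt does not show the post's own procedure, so the number of resamples `B`, the seed, and the percentile interval are assumptions made here for illustration.

```r
set.seed(42)

# The same kind of sample as in the post: 60 values, not normally distributed
a <- c(1:10, rnorm(50))

# Number of bootstrap resamples (an arbitrary choice for this sketch)
B <- 2000

# Resample with replacement and record the mean of each resample
boot_means <- replicate(B, mean(sample(a, replace = TRUE)))

# Bootstrap standard error and a 95% percentile interval for the mean
boot_se <- sd(boot_means)
boot_ci <- quantile(boot_means, c(0.025, 0.975))

print(mean(a))
print(boot_se)
print(boot_ci)
```

Because each resample is drawn from the observed data, no normality assumption about `a` is needed to get the interval.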

# How to stop being single, from a probability perspective

    > 1-0.8^6
    [1] 0.737856
    > 1-0.8^7
    [1] 0.7902848
    > 1-0.7^6
    [1] 0.882351
    > 1-0.7^7
    [1] 0.9176457
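These numbers have the form $1-q^n$, which is the probability of at least one success in $n$ independent attempts when each attempt fails with probability $q$. That reading is an assumption here, since the excerpt shows only the arithmetic. A small helper (the name `p_at_least_one` is made up for this sketch) reproduces the four values:

```r
# Probability of at least one success in n independent attempts,
# where each attempt fails with probability p_fail (assumed reading of the output above)
p_at_least_one <- function(p_fail, n) {
  1 - p_fail^n
}

# Reproduce the console values: 6 and 7 attempts at failure rates 0.8 and 0.7
res_08 <- p_at_least_one(0.8, 6:7)
res_07 <- p_at_least_one(0.7, 6:7)
print(res_08)
print(res_07)
```

Under this reading, lowering the per-attempt failure rate from 0.8 to 0.7 raises the chance of success more than adding a seventh attempt does.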


#### Guangchuang Yu

a senior-in-age-but-not-senior-in-knowledge bioinformatician

Postdoc researcher

Hong Kong