Song Ci: Word Frequency

December 9, 2011 in Software, R

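Lately people keep sending me that word-frequency page that supposedly "wipes out" liberal-arts students. Fine, I'll jump on the bandwagon and have some fun too =,=

Faced with a challenge like the one in the original post, as a science student I would be embarrassed to just look things up with random numbers. Writing a few lines of code is a much better way to show off.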

R for scientific programming

September 24, 2011 in Calculus, R

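I just finished reading the book Introduction to Scientific Programming and Simulation Using R.

Part I is an introduction to programming in R. Part II covers numerical computation, mainly solving equations, integration, and optimization. Part III covers probability and statistics, mainly concepts such as probability and random variables, plus parameter estimation. Part IV is simulation, mainly Monte Carlo integration and variance reduction.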
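To give a flavor of the Part IV material, here is a minimal sketch of Monte Carlo integration with one variance reduction technique, antithetic variates. It is not taken from the book; the integrand (exp(x) on [0, 1]), the seed, and the sample size are my own assumptions, chosen because the exact answer e - 1 is known.

```r
# Estimate I = integral of exp(x) over [0, 1]; the exact value is e - 1.
# Integrand, seed and sample size are illustrative assumptions.
set.seed(42)                 # arbitrary seed, for reproducibility only
n <- 10000                   # number of integrand evaluations per estimator
f <- function(x) exp(x)

# Plain Monte Carlo: average f over n uniform draws.
u <- runif(n)
plain_terms <- f(u)
plain_est <- mean(plain_terms)

# Antithetic variates: pair each draw v with 1 - v and average the pair.
# Only n / 2 draws are used, so both estimators cost n evaluations of f.
v <- runif(n / 2)
anti_terms <- (f(v) + f(1 - v)) / 2
anti_est <- mean(anti_terms)

# Estimated standard errors of the two estimators.
plain_se <- sd(plain_terms) / sqrt(n)
anti_se  <- sd(anti_terms) / sqrt(n / 2)

cat("exact     :", exp(1) - 1, "\n")
cat("plain MC  :", plain_est, "(se", plain_se, ")\n")
cat("antithetic:", anti_est, "(se", anti_se, ")\n")
```

Because exp(x) is monotone, f(v) and f(1 - v) are negatively correlated, which is why the paired estimator should show a smaller standard error at the same cost.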

ggplot2 Version of Figures in <25 Recipes for Getting Started with R>

August 16, 2011 in Visualization, R

To make it easy to compare graphs produced by the base R plotting functions with their ggplot2 counterparts, I have recreated the figures in the book 25 Recipes for Getting Started with R using ggplot2.

The code used to create the images is in separate paragraphs, allowing easy comparison.

1.16 Creating a Scatter Plot

plot(cars)
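The ggplot2 counterpart is cut off in this excerpt. A minimal sketch of what an equivalent scatter plot could look like, assuming the ggplot2 package is installed (the variable name p is my own):

```r
library(ggplot2)

# `cars` ships with base R: speed (mph) and stopping distance (ft).
# Map the two columns to the axes and draw one point per row.
p <- ggplot(cars, aes(x = speed, y = dist)) +
  geom_point()

print(p)
```

Unlike plot(cars), which picks the two columns implicitly, ggplot2 asks for the column-to-axis mapping to be spelled out in aes().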

the batman equation

August 13, 2011 in math, Visualization, R

HardOCP has an image with an equation which apparently draws the Batman logo.

Geometric Construction of the Square Root of 2

August 12, 2011 in R, Math

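I bought a lot of books in grad school and never had time to read most of them. What Is Mathematics? is one of those. I have been leafing through it over the last couple of days.

Chapter 2, "The Number System of Mathematics", recounts a great discovery of its day: the diagonal of a square is incommensurable with its side. The concept of irrational numbers, introduced by way of incommensurable segments, was, like the introduction of negative numbers, still unsettling in the 17th century. The irrational numbers were a huge leap.

Figure 10 on page 73 gives a geometric construction of $\sqrt{2}$.

I tried drawing it in R: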
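The original code is not part of this excerpt. As a stand-in, here is a minimal base-graphics sketch of the usual construction, which I am assuming is what the figure shows: take the diagonal of the unit square and swing it down onto the number line with a compass. All coordinates and variable names are my own choices.

```r
# Unit square with corners (0,0), (1,0), (1,1), (0,1); closed path.
square <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))

# Length of the diagonal from (0,0) to (1,1), by the Pythagorean theorem.
r <- sqrt(1^2 + 1^2)

# Compass arc of radius r centred at the origin, swung from the corner
# (1,1), at angle pi/4, down to the number line, at angle 0.
theta <- seq(pi / 4, 0, length.out = 100)
arc <- cbind(r * cos(theta), r * sin(theta))

plot(NA, xlim = c(-0.2, 1.8), ylim = c(-0.3, 1.3), asp = 1,
     xlab = "", ylab = "", axes = FALSE)
arrows(-0.2, 0, 1.8, 0, length = 0.08)      # the number line
lines(square)                               # unit square
segments(0, 0, 1, 1, lty = 2)               # the diagonal
lines(arc, col = "red")                     # compass arc
points(c(0, 1, r), c(0, 0, 0), pch = 19)    # marks at 0, 1 and sqrt(2)
text(c(0, 1, r), -0.12, labels = c("0", "1", expression(sqrt(2))))
```

The arc ends where it meets the number line, so the last row of arc is the constructed point at distance $\sqrt{2}$ from 0.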

QQ plot

August 2, 2011 in R, Visualization

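R provides plenty of plotting functions, but implementing one yourself is a great exercise and a good way to understand the details.

I have recently been reading Modern Applied Statistics with S-PLUS. On page 115, where Q-Q plots are discussed, the book gives a Trellis implementation. (Trellis is the visualization system of S/S-PLUS; its counterpart in R is the lattice package.)

For a set of numbers we can compute the quartiles at 25%, 50% (the median), and 75%. Each one is the value found X% of the way through the data after sorting it in ascending order. In fact, every data point corresponds to some X%. A Q-Q plot is simple: it is just a scatter plot of the sample data against the quantiles computed from a theoretical distribution. I implemented it with both base graphics and ggplot2. The three panels in the figure were drawn by the built-in function qqnorm and by the qqplot and qqplot2 functions defined here.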
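The definitions of qqplot and qqplot2 are cut off in this excerpt. To illustrate the idea, here is a minimal base-graphics sketch under my own assumptions: the function name my_qqnorm is hypothetical, the theoretical distribution is the standard normal, and the X% for each point comes from ppoints(), which is what qqnorm uses.

```r
# my_qqnorm is a hypothetical name, not the qqplot/qqplot2 from the post.
my_qqnorm <- function(x, plot = TRUE) {
  x <- sort(x[!is.na(x)])          # sample quantiles: the sorted data
  n <- length(x)
  p <- ppoints(n)                  # plotting positions, one X% per point
  q <- qnorm(p)                    # theoretical standard normal quantiles
  if (plot) {
    plot(q, x, xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
         main = "Normal Q-Q Plot (hand-rolled)")
  }
  invisible(data.frame(theoretical = q, sample = x))
}

set.seed(1)                        # arbitrary seed
y <- rnorm(50, mean = 10, sd = 2)  # made-up sample for the demo
res <- my_qqnorm(y)
```

Calling qqnorm(y) on the same data should produce the same cloud of points, which is a quick way to check the hand-rolled version.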

a simple gene finder

July 10, 2011 in conference, R

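I have been attending a course for the past few days (http://ibw2011.fmmu.edu.cn/schedule.htm), and it wrapped up today. I only finished Project 1. I wanted to write the Gibbs sampling one, but I could not figure it out. Embarrassing.

This is purely an exercise and has no practical value.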

Course Projects:

Project 1: Implementation of a simple gene finder

GOAL

Build a simple codon-usage based gene finder for finding genes in E. coli.

Procedure

1. Collect 100 gene sequences from the bacterium E. coli in GenBank (http://www.ncbi.nlm.nih.gov).
2. Compute the codon usage table based on these genes (and the protein sequences translated from them).
3. Build a probabilistic model based on the codon usages.
4. Implement a random sequence model in which the nucleotide frequencies are computed from the 100 E. coli genes.
5. For a given DNA sequence (and one selected reading frame), compare your model with the random sequence model.

Results that you should submit:

1. Two FASTA files for the collected 100 genes and 100 translated protein sequences.
2. The printed codon usage table.
3. A program named ECgnfinder, run with the syntax ECgnfinder -i inputfile

inputfile stands for the name of the input file, which should contain one DNA sequence in FASTA format; the program should report an error message if the input file is in the wrong format.
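The input check could be sketched roughly as follows. This is not my submitted program: read_single_fasta is a hypothetical helper name, and accepting only the letters A, C, G, T (no N or other ambiguity codes) is an assumption on my part.

```r
# read_single_fasta is a hypothetical helper: it returns the sequence as
# one upper-case string, or stops with an error if the format is wrong.
read_single_fasta <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[nzchar(lines)]                     # drop blank lines
  if (length(lines) < 2 || !startsWith(lines[1], ">")) {
    stop("wrong format: expected a '>' header line followed by sequence")
  }
  if (any(startsWith(lines[-1], ">"))) {
    stop("wrong format: input must contain exactly one sequence")
  }
  dna <- toupper(paste(lines[-1], collapse = ""))
  # Assumption: only unambiguous bases are accepted.
  if (grepl("[^ACGT]", dna)) {
    stop("wrong format: sequence contains characters other than A, C, G, T")
  }
  dna
}

# Usage with a temporary file holding a made-up sequence.
tmp <- tempfile(fileext = ".fa")
writeLines(c(">demo", "ATGAAA", "cgttaa"), tmp)
read_single_fasta(tmp)
```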

The output should be printed to the standard output as follows (xxx stands for the likelihood):

ORF1: xxx
ORF2: xxx
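The comparison between the codon-usage model and the random sequence model can be sketched as a log-likelihood ratio. This is only an illustration under stated assumptions: the function names are hypothetical, the add-one smoothing of codon counts is my own choice, and the three training "genes" are made-up placeholders standing in for the 100 real E. coli genes.

```r
bases <- c("A", "C", "G", "T")
# All 64 codons, so that codons unseen in training still get a probability.
all_codons <- apply(expand.grid(bases, bases, bases, stringsAsFactors = FALSE),
                    1, paste, collapse = "")

# Split a DNA string into codons for reading frame 1, 2 or 3.
split_codons <- function(dna, frame = 1) {
  dna <- toupper(dna)
  if (nchar(dna) < frame + 2) return(character(0))
  starts <- seq(frame, nchar(dna) - 2, by = 3)
  substring(dna, starts, starts + 2)
}

# Estimate codon usage and nucleotide frequencies from training genes.
train_models <- function(genes) {
  codons <- unlist(lapply(genes, split_codons))
  # Add-one smoothing is an assumption, not part of the assignment.
  codon_counts <- table(factor(codons, levels = all_codons)) + 1
  nts <- unlist(strsplit(toupper(paste(genes, collapse = "")), ""))
  nt_counts <- table(factor(nts, levels = bases))
  list(codon = codon_counts / sum(codon_counts),
       nt = nt_counts / sum(nt_counts))
}

# Log-likelihood ratio of the codon model versus the random model
# for one reading frame; positive values favour the codon model.
score_frame <- function(dna, model, frame = 1) {
  codons <- split_codons(dna, frame)
  ll_codon <- sum(log(model$codon[codons]))
  nts <- unlist(strsplit(paste(codons, collapse = ""), ""))
  ll_random <- sum(log(model$nt[nts]))
  ll_codon - ll_random
}

# Made-up placeholder sequences, NOT real E. coli genes.
toy_genes <- c("ATGGCTAAAGCTGGTTAA", "ATGAAAGCTCTGGCTTAA", "ATGGCTGGTAAACTGTAA")
model <- train_models(toy_genes)

query <- "ATGGCTAAAGCTTAA"       # made-up query sequence
scores <- sapply(1:3, function(fr) score_frame(query, model, fr))
for (fr in 1:3) cat(sprintf("frame %d: %.3f\n", fr, scores[fr]))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the ratio form puts both models on the same scale for a given frame.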

Guangchuang Yu