One day, I am looking for R packages that can analyze PPI and after searching, I found the ppiPre package in CRAN.

The function of this package is not impressive, and I already knew some related works, including http://intscore.molgen.mpg.de/. The authors of this webserver contacted me for the usages of GOSemSim when they developing it. What makes me curious is that the ppiPre package can calculate GO semantic similarity and supports 20 species exactly like GOSemSim. I opened the source tarball, and surprisingly found that its sources related to semantic similarity calculation are totally copied from GOSemSim.

GOSemSim was firstly released in 2008 Bioconductor 2.4 (at that time, devel version) and published in Bioinformatics in 2010. After compared the sources, I found the sources in ppiPre were copied from GOSemSim version 1.6.8 which released in 2010 Bioconductor 2.6. The Wang method defined in GOKEGGSims.r file of ppiPre is:

   119  WangMethod <- function(GOID1, GOID2, ont="MF", organism="yeast") {
   120      if(!exists("ppiPreEnv")) .initial()
   121      weight.isa = 0.8
   122      weight.partof = 0.6
   123
   124      if (GOID1 == GOID2)
   125          return (1)
   126
   127      Parents.name <- switch(ont,
   128          MF = "MFParents",
   129          BP = "BPParents",
   130          CC = "CCParents"
   131      )
   132      if (!exists(Parents.name, envir=ppiPreEnv)) {
   133          GetGOParents(ont)
   134      }
   135      Parents <- get(Parents.name, envir=ppiPreEnv)
   136
   137      sv.a <- 1
   138      sv.b <- 1
   139      sw <- 1
   140      names(sv.a) <- GOID1
   141      names(sv.b) <- GOID2
   142
   143      sv.a <- WangSemVal(GOID1, ont, Parents, sv.a, sw, weight.isa, weight.partof)
   144      sv.b <- WangSemVal(GOID2, ont, Parents, sv.b, sw, weight.isa, weight.partof)
   145
   146      sv.a <- uniqsv(sv.a)
   147      sv.b <- uniqsv(sv.b)
   148
   149      idx <- intersect(names(sv.a), names(sv.b))
   150      inter.sva <- unlist(sv.a[idx])
   151      inter.svb <- unlist(sv.b[idx])
   152      sim <- sum(inter.sva,inter.svb) / sum(sv.a, sv.b)
   153      return(sim)
   154  }
   155  WangSemVal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   156      if(!exists("ppiPreCache"))
   157          return(WangSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof))
   158      goid.ont <- paste(goid, ont, sep=".")
   159      if (!exists(goid.ont, envir=ppiPreCache)) {
   160          value <- WangSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof)
   161          assign(eval(goid.ont), value, envir=ppiPreCache)
   162      }
   163      return(get(goid.ont, envir=ppiPreCache))
   164  }
   165
   166  WangSemVal_internal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   167      p <- Parents[goid]
   168      p <- unlist(p[[1]])
   169      if (length(p) == 0) {
   170          return(0)
   171      }
   172      relations <- names(p)
   173      old.w <- w
   174      for (i in 1:length(p)) {
   175          if (relations[i] == "is_a") {
   176              w <- old.w * weight.isa
   177          } else {
   178              w <- old.w * weight.partof
   179          }
   180          names(w) <- p[i]
   181          sv <- c(sv,w)
   182          if (p[i] != "all") {
   183              sv <- WangSemVal_internal(p[i], ont, Parents, sv, w, weight.isa, weight.partof)
   184          }
   185      }
   186      return (sv)
   187  }
   188
   189  uniqsv <- function(sv) {
   190      sv <- unlist(sv)
   191      una <- unique(names(sv))
   192      sv <- unlist(sapply(una, function(x) {max(sv[names(sv)==x])}))
   193      return (sv)
   194  }

It is identical to the one I defined in GOSemSim 1.6.8:

   196  ygcWangMethod <- function(GOID1, GOID2, ont="MF", organism="human") {
   197      if(!exists("GOSemSimEnv")) .initial()
   198      weight.isa = 0.8
   199      weight.partof = 0.6
   200
   201      if (GOID1 == GOID2)
   202          return (gosim=1)
   203
   204      Parents.name <- switch(ont,
   205          MF = "MFParents",
   206          BP = "BPParents",
   207          CC = "CCParents"
   208      )
   209      if (!exists(Parents.name, envir=GOSemSimEnv)) {
   210          ygcGetParents(ont)
   211      }
   212      Parents <- get(Parents.name, envir=GOSemSimEnv)
   213
   214      sv.a <- 1
   215      sv.b <- 1
   216      sw <- 1
   217      names(sv.a) <- GOID1
   218      names(sv.b) <- GOID2
   219
   220      sv.a <- ygcSemVal(GOID1, ont, Parents, sv.a, sw, weight.isa, weight.partof)
   221      sv.b <- ygcSemVal(GOID2, ont, Parents, sv.b, sw, weight.isa, weight.partof)
   222
   223      sv.a <- uniqsv(sv.a)
   224      sv.b <- uniqsv(sv.b)
   225
   226      idx <- intersect(names(sv.a), names(sv.b))
   227      inter.sva <- unlist(sv.a[idx])
   228      inter.svb <- unlist(sv.b[idx])
   229      sim <- sum(inter.sva,inter.svb) / sum(sv.a, sv.b)
   230      return(sim)
   231  }
   232
   233
   234
   235  uniqsv <- function(sv) {
   236      sv <- unlist(sv)
   237      una <- unique(names(sv))
   238      sv <- unlist(sapply(una, function(x) {max(sv[names(sv)==x])}))
   239      return (sv)
   240  }
   241
   242  ygcSemVal_internal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   243      p <- Parents[goid]
   244      p <- unlist(p[[1]])
   245      if (length(p) == 0) {
   246          #warning(goid, " may not belong to Ontology ", ont)
   247          return(0)
   248      }
   249      relations <- names(p)
   250      old.w <- w
   251      for (i in 1:length(p)) {
   252          if (relations[i] == "is_a") {
   253              w <- old.w * weight.isa
   254          } else {
   255              w <- old.w * weight.partof
   256          }
   257          names(w) <- p[i]
   258          sv <- c(sv,w)
   259          if (p[i] != "all") {
   260              sv <- ygcSemVal_internal(p[i], ont, Parents, sv, w, weight.isa, weight.partof)
   261          }
   262      }
   263      return (sv)
   264  }
   265
   266  ygcSemVal <- function(goid, ont, Parents, sv, w, weight.isa, weight.partof) {
   267      if(!exists("GOSemSimCache")) return(ygcSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof))
   268      goid.ont <- paste(goid, ont, sep=".")
   269      if (!exists(goid.ont, envir=GOSemSimCache)) {
   270          value <- ygcSemVal_internal(goid, ont, Parents, sv, w, weight.isa, weight.partof)
   271          assign(goid.ont, value, envir=GOSemSimCache)
   272          #cat("recompute ", goid, value, "\n")
   273      }
   274      else{
   275          #cat("cache ", goid, get(goid, envir=GOSemSimCache), "\n")
   276      }
   277      return(get(goid.ont, envir=GOSemSimCache))
   278  }

The information content based method in ppiPre:

   495  GetLatestCommonAncestor<-function(GOID1, GOID2, ont, organism){
   496  #message("Calulating Latest Common Ancestor...")
   497      if(!exists("ppiPreEnv")) .initial()
   498
   499      fname <- paste("Info_Contents", ont, organism, sep="_")
   500      tryCatch(utils::data(list=fname, package="ppiPre", envir=ppiPreEnv))
   501      InfoContents <- get("IC", envir=ppiPreEnv)
   502
   503      rootCount <- max(InfoContents[InfoContents != Inf])
   504      InfoContents["all"] = 0
   505      p1 <- InfoContents[GOID1]/rootCount
   506      p2 <- InfoContents[GOID2]/rootCount
   507      if(is.na(p1) || is.na(p2)) return (NA)
   508      if (p1 == 0 || p2 == 0) return (NA)
   509      Ancestor.name <- switch(ont,MF = "MFAncestors",BP = "BPAncestors",CC = "CCAncestors")
   510      if (!exists(Ancestor.name, envir=ppiPreEnv)) {
   511          TCSSGetAncestors(ont)
   512      }
   513
   514      Ancestor <- get(Ancestor.name, envir=ppiPreEnv)
   515      ancestor1 <- unlist(Ancestor[GOID1])
   516      ancestor2 <- unlist(Ancestor[GOID2])
   517      if (GOID1 == GOID2) {
   518          commonAncestor <- GOID1
   519      } else if (GOID1 %in% ancestor2) {
   520          commonAncestor <- GOID1
   521      } else if (GOID2 %in% ancestor1) {
   522          commonAncestor <- GOID2
   523      } else {
   524          commonAncestor <- intersect(ancestor1, ancestor2)
   525      }
   526      if (length(commonAncestor) == 0)
   527          LCA<-NULL
   528      max<- -100
   529      LCA<-NULL
   530      for(a in commonAncestor){
   531          if(!is.na(InfoContents[a])) {
   532              if(InfoContents[a]>max){
   533                  max<-InfoContents[a]
   534                  LCA<-a
   535              }
   536          }
   537      }
   538  #message("done...")
   539      return (LCA)
   540
   541  }

also identical to the one in GOSemSim 1.6.8:

   280  `ygcInfoContentMethod` <- function(GOID1, GOID2, ont, measure, organism) {
   281      if(!exists("GOSemSimEnv")) .initial()
   282      fname <- paste("Info_Contents", ont, organism, sep="_")
   283      tryCatch(utils::data(list=fname, package="GOSemSim", envir=GOSemSimEnv))
   284      Info.contents <- get("IC", envir=GOSemSimEnv)
   285
   286      rootCount <- max(Info.contents[Info.contents != Inf])
   287      Info.contents["all"] = 0
   288      p1 <- Info.contents[GOID1]/rootCount
   289      p2 <- Info.contents[GOID2]/rootCount
   290
   291      if (p1 == 0 || p2 == 0) return (NA)
   292      Ancestor.name <- switch(ont,
   293          MF = "MFAncestors",
   294          BP = "BPAncestors",
   295          CC = "CCAncestors"
   296      )
   297      if (!exists(Ancestor.name, envir=GOSemSimEnv)) {
   298          ygcGetAncestors(ont)
   299      }
   300
   301      Ancestor <- get(Ancestor.name, envir=GOSemSimEnv)
   302      ancestor1 <- unlist(Ancestor[GOID1])
   303      ancestor2 <- unlist(Ancestor[GOID2])
   304      if (GOID1 == GOID2) {
   305          commonAncestor <- GOID1
   306      } else if (GOID1 %in% ancestor2) {
   307          commonAncestor <- GOID1
   308      } else if (GOID2 %in% ancestor1) {
   309          commonAncestor <- GOID2
   310      } else {
   311          commonAncestor <- intersect(ancestor1, ancestor2)
   312      }
   313      if (length(commonAncestor) == 0) return (NA)
   314      pms <- max(Info.contents[commonAncestor], na.rm=TRUE)/rootCount
   315      sim<-switch(measure,
   316          Resnik = pms,
   317          Lin = pms/(p1+p2),
   318          Jiang = 1 - min(1, -2*pms + p1 + p2),
   319          Rel = 2*pms/(p1+p2)*(1-exp(-pms*rootCount))
   320      )
   321      return (sim)
   322  }

Let’s look at some helper functions in ppiPre:

   477  rebuildICdata <- function(){
   478      ont <- c("MF","CC", "BP")
   479      species <- c("human", "rat", "mouse", "fly", "yeast", "zebrafish", "arabidopsis","worm", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus","coelicolor")
   480      cat("------------------------------------\n")
   481      cat("calulating Information Content...\nSpecies:\t\tOntology\n")
   482      for (i in ont) {
   483          for (j in species) {
   484              cat(j)
   485              cat("\t\t\t")
   486              cat(i)
   487              cat("\n")
   488              TCSSComputeIC(ont=i, organism=j)
   489          }
   490      }
   491      cat("------------------------------------\n")
   492      message("done...")
   493  }

Again, it is identical to GOSemSim 1.6.8:

   390  rebuildICdata <- function(){
   391      ont <- c("MF","CC", "BP")
   392      species <- c("human", "rat", "mouse", "fly", "yeast", "zebrafish", "arabidopsis","worm", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus")
   393      cat("------------------------------------\n")
   394      cat("calulating Information Content...\nSpecies:\t\tOntology\n")
   395      for (i in ont) {
   396          for (j in species) {
   397              cat(j)
   398              cat("\t\t\t")
   399              cat(i)
   400              cat("\n")
   401              ygcCompute_Information_Content(ont=i, organism=j)
   402          }
   403      }
   404      cat("------------------------------------\n")
   405      print("done...")
   406  }

Let’s look at the internal function TCSSComputeIC in ppiPre:

   410  TCSSComputeIC <- function(dropCodes="IEA", ont, organism) {
   411  message("Calulating IC...")
   412      wh_ont <- match.arg(ont, c("MF", "BP", "CC"))
   413      wh_organism <- match.arg(organism, c("human", "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus", "coelicolor"))
   414      CheckAnnotationPackage(wh_organism)
   415      gomap <- switch(organism,
   416          human = org.Hs.egGO,
   417          fly = org.Dm.egGO,
   418          mouse = org.Mm.egGO,
   419          rat = org.Rn.egGO,
   420          yeast = org.Sc.sgdGO,
   421          zebrafish = org.Dr.egGO,
   422          worm = org.Ce.egGO,
   423          arabidopsis = org.At.tairGO,
   424          ecoli = org.EcK12.egGO,
   425          bovine  = org.Bt.egGO,
   426          canine  = org.Cf.egGO,
   427          anopheles   =   org.Ag.egGO,
   428          ecsakai =   org.EcSakai.egGO,
   429          chicken =   org.Gg.egGO,
   430          chimp   =   org.Pt.egGO,
   431          malaria =   org.Pf.plasmoGO,
   432          rhesus  =   org.Mmu.egGO,
   433          pig = org.Ss.egGO,
   434          xenopus =   org.Xl.egGO,
   435          coelicolor  =   org.Sco.egGO
   436      )
   437
   438      mapped_genes <- mappedkeys(gomap)
   439      gomap = AnnotationDbi::as.list(gomap[mapped_genes])
   440      if (!is.null(dropCodes)){
   441          gomap<-sapply(gomap,function(x) sapply(x,function(y) c(y$Evidence %in% dropCodes, y$Ontology %in% wh_ont)))
   442          gomap<-sapply(gomap, function(x) x[2,x[1,]=="FALSE"])
   443          gomap<-gomap[sapply(gomap,length) >0]
   444      }else {
   445          gomap <- sapply(gomap,function(x) sapply(x,function(y) y$Ontology %in% wh_ont))
   446      }
   447
   448      goterms<-unlist(sapply(gomap, function(x) names(x)), use.names=FALSE) # all GO terms appearing in an annotation
   449      goids <- toTable(GOTERM)
   450
   451      goids <- unique(goids[goids[,"Ontology"] == wh_ont, "go_id"])
   452      gocount <- table(goterms)
   453      goname <- names(gocount) #goid of specific organism and selected category.
   454
   455      go.diff <- setdiff(goids, goname)
   456      m <- double(length(go.diff))
   457      names(m) <- go.diff
   458      gocount <- as.vector(gocount)
   459      names(gocount) <- goname
   460      gocount <- c(gocount, m)
   461
   462      Offsprings.name <- switch(wh_ont,
   463          MF = "MFOffsprings",
   464          BP = "BPOffsprings",
   465          CC = "CCOffsprings"
   466      )
   467      if (!exists(Offsprings.name, envir=ppiPreEnv)) {
   468          TCSSGetOffsprings(wh_ont)
   469      }
   470      Offsprings <- get(Offsprings.name, envir=ppiPreEnv)
   471      cnt <- sapply(goids,function(x){ c=gocount[unlist(Offsprings[x])]; gocount[x]+sum(c[!is.na(c)])})
   472      names(cnt) <- goids
   473      IC<- -log(cnt/sum(gocount))
   474  message("done...")
   475      return (IC)
   476  }

and ygcCompute_Information_Content in GOSemSim 1.6.8:

   326  ygcCompute_Information_Content <- function(dropCodes="NULL", ont, organism) {
   327      wh_ont <- match.arg(ont, c("MF", "BP", "CC"))
   328      wh_organism <- match.arg(organism, c("human", "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine","canine","anopheles","ecsakai","chicken","chimp","malaria","rhesus","pig","xenopus"))
   329      ygcCheckAnnotationPackage(wh_organism)
   330      gomap <- switch(wh_organism,
   331          human = org.Hs.egGO,
   332          fly = org.Dm.egGO,
   333          mouse = org.Mm.egGO,
   334          rat = org.Rn.egGO,
   335          yeast = org.Sc.sgdGO,
   336          zebrafish = org.Dr.egGO,
   337          worm = org.Ce.egGO,
   338          arabidopsis = org.At.tairGO,
   339          ecolik12 = org.EcK12.egGO,
   340          bovine  = org.Bt.egGO,
   341          canine  = org.Cf.egGO,
   342          anopheles   =   org.Ag.egGO,
   343          ecsakai =   org.EcSakai.egGO,
   344          chicken =   org.Gg.egGO,
   345          chimp   =   org.Pt.egGO,
   346          malaria =   org.Pf.plasmoGO,
   347          rhesus  =   org.Mmu.egGO,
   348          pig = org.Ss.egGO,
   349          xenopus =   org.Xl.egGO
   350      )
   351      mapped_genes <- mappedkeys(gomap)
   352      gomap = AnnotationDbi::as.list(gomap[mapped_genes])
   353      if (!is.null(dropCodes)){
   354          gomap<-sapply(gomap,function(x) sapply(x,function(y) c(y$Evidence %in% dropCodes, y$Ontology %in% wh_ont)))
   355          gomap<-sapply(gomap, function(x) x[2,x[1,]=="FALSE"])
   356          gomap<-gomap[sapply(gomap,length) >0]
   357      }else {
   358          gomap <- sapply(gomap,function(x) sapply(x,function(y) y$Ontology %in% wh_ont))
   359      }
   360
   361      goterms<-unlist(sapply(gomap, function(x) names(x)), use.names=FALSE) # all GO terms appearing in an annotation
   362      goids <- toTable(GOTERM)
   363      # all go terms which belong to the corresponding category..
   364      goids <- unique(goids[goids[,"Ontology"] == wh_ont, "go_id"])
   365      gocount <- table(goterms)
   366      goname <- names(gocount) #goid of specific organism and selected category.
   367      ## ensure goterms not appearing in the specific annotation have 0 frequency..
   368      go.diff <- setdiff(goids, goname)
   369      m <- double(length(go.diff))
   370      names(m) <- go.diff
   371      gocount <- as.vector(gocount)
   372      names(gocount) <- goname
   373      gocount <- c(gocount, m)
   374
   375      Offsprings.name <- switch(wh_ont,
   376          MF = "MFOffsprings",
   377          BP = "BPOffsprings",
   378          CC = "CCOffsprings"
   379      )
   380      if (!exists(Offsprings.name, envir=GOSemSimEnv)) {
   381          ygcGetOffsprings(wh_ont)
   382      }
   383      Offsprings <- get(Offsprings.name, envir=GOSemSimEnv)
   384      cnt <- sapply(goids,function(x){ c=gocount[unlist(Offsprings[x])]; gocount[x]+sum(c[!is.na(c)])})
   385      names(cnt) <- goids
   386      IC<- -log(cnt/sum(gocount))
   387      save(IC, file=paste(paste("Info_Contents", wh_ont, organism, sep="_"), ".rda", sep=""))
   388  }

Another helper function GetGOMap in ppiPre:

   308  GetGOMap <- function(organism="yeast") {
   309      if(!exists("ppiPreEnv")) .initial()
   310      CheckAnnotationPackage(organism) #download and install the packages
   311      species <- switch(organism,
   312          human = "Hs",
   313          fly = "Dm",
   314          mouse = "Mm",
   315          rat = "Rn",
   316          yeast = "Sc",
   317          zebrafish = "Dr",
   318          worm = "Ce",
   319          arabidopsis = "At",
   320          ecolik12 = "EcK12",
   321          bovine  = "Bt",
   322          canine  = "Cf",
   323          anopheles   =   "Ag",
   324          ecsakai =   "EcSakai",
   325          chicken =   "Gg",
   326          chimp   =   "Pt",
   327          malaria =   "Pf",
   328          rhesus  =   "Mmu",
   329          pig = "Ss",
   330          xenopus =   "Xl",
   331          coelicolor  =   "Sco"
   332      )
   333
   334      gomap <- switch(organism,
   335          human = org.Hs.egGO,
   336          fly = org.Dm.egGO,
   337          mouse = org.Mm.egGO,
   338          rat = org.Rn.egGO,
   339          yeast = org.Sc.sgdGO,
   340          zebrafish = org.Dr.egGO,
   341          worm = org.Ce.egGO,
   342          arabidopsis = org.At.tairGO,
   343          ecoli = org.EcK12.egGO,
   344          bovine  = org.Bt.egGO,
   345          canine  = org.Cf.egGO,
   346          anopheles   =   org.Ag.egGO,
   347          ecsakai =   org.EcSakai.egGO,
   348          chicken =   org.Gg.egGO,
   349          chimp   =   org.Pt.egGO,
   350          malaria =   org.Pf.plasmoGO,
   351          rhesus  =   org.Mmu.egGO,
   352          pig = org.Ss.egGO,
   353          xenopus =   org.Xl.egGO,
   354          coelicolor  =   org.Sco.egGO
   355      )
   356
   357      assign(eval(species), gomap, envir=ppiPreEnv)
   358  }

My ygcGetGOMap in GOSemSim 1.6.8:

   100  ygcGetGOMap <- function(organism="human") {
   101      if(!exists("GOSemSimEnv")) .initial()
   102      ygcCheckAnnotationPackage(organism)
   103      species <- switch(organism,
   104          human = "Hs",
   105          fly = "Dm",
   106          mouse = "Mm",
   107          rat = "Rn",
   108          yeast = "Sc",
   109          zebrafish = "Dr",
   110          worm = "Ce",
   111          arabidopsis = "At",
   112          ecolik12 = "EcK12",
   113          bovine  = "Bt",
   114          canine  = "Cf",
   115          anopheles   =   "Ag",
   116          ecsakai =   "EcSakai",
   117          chicken =   "Gg",
   118          chimp   =   "Pt",
   119          malaria =   "Pf",
   120          rhesus  =   "Mmu",
   121          pig = "Ss",
   122          xenopus =   "Xl"
   123      )
   124      gomap <- switch(organism,
   125          human = org.Hs.egGO,
   126          fly = org.Dm.egGO,
   127          mouse = org.Mm.egGO,
   128          rat = org.Rn.egGO,
   129          yeast = org.Sc.sgdGO,
   130          zebrafish = org.Dr.egGO,
   131          worm = org.Ce.egGO,
   132          arabidopsis = org.At.tairGO,
   133          ecolik12 = org.EcK12.egGO,
   134          bovine  = org.Bt.egGO,
   135          canine  = org.Cf.egGO,
   136          anopheles   =   org.Ag.egGO,
   137          ecsakai =   org.EcSakai.egGO,
   138          chicken =   org.Gg.egGO,
   139          chimp   =   org.Pt.egGO,
   140          malaria =   org.Pf.plasmoGO,
   141          rhesus  =   org.Mmu.egGO,
   142          pig = org.Ss.egGO,
   143          xenopus =   org.Xl.egGO
   144      )
   145      assign(eval(species), gomap, envir=GOSemSimEnv)
   146  }

There are many other small helper functions that are identical. ppiPre copy most of the source code of GOSemSim. There is 862 lines in GOKEGGSims.r, in which only the following function is about KEGG that is not related to GOSemSim.

    10  KEGGSim <- function(protein1, protein2)    # KEGG-based similarity of two proteins
    11  {
    12
    13    if(!require("KEGG.db")){ stop("package KEGG.db is needed.")}
    14    Pathway1 <- KEGG.db::KEGGEXTID2PATHID[[protein1]]
    15      Pathway2 <- KEGG.db::KEGGEXTID2PATHID[[protein2]]
    16      intersec <- length(na.omit(match(Pathway1, Pathway2)))
    17      if(intersec==0)
    18          sim<-0
    19      else
    20          sim<-intersec/(length(Pathway1)+length(Pathway2)-intersec)
    21      return(sim)
    22  }

This function is only 12 lines, and it calculates the similarity by divide the intersect to the total sum. The other lines in GOKEGGSims.r, more than 800 lines, were totally copied from GOSemSim. Other source files in the ppiPre only has less than 450 lines in sum. About 2/3 of ppiPre were copied from GOSemSim.

The author of ppiPre changed the function names and pretend it is their original works. They just copy and paste and take the credit of months of development of GOSemSim. This is really sucks.

After I found this issue, I add a proper use of GOSemSim statement in its github page:

I am very glad that many people find GOSemSim useful and GOSemSim has been cited by 114 (by google scholar, Aug, 2014).

There are two R packages BiSEp and tRanslatome depend on GOSemSim and three R packages clusterProfiler, DOSE and Rcpi import GOSemSim.

SemDist package copy some of the source code from GOSemSim with acknowledging within source code and document.

ppiPre package copy many source code from GOSemSim without any acknowledgement in souce code or document and did not cited GOSemSim in their publication. This violates the restriction of open source license.

For R developers, if you found functions provided in GOSemSim useful, please depends or imports GOSemSim. If you would like to copy and paste source code, you should acknowledge the source code was copied/derived from GOSemSim authored by Guangchuang Yu guangchuangyu@gmail.com within source code, add GOSemSim in Suggests field and also includes the following reference in the man files for functions that copied/derived from GOSemSim and cited in vignettes.

\references{ Yu et al. (2010) GOSemSim: an R package for measuring semantic similarity among GO terms and gene products \emph{Bioinformatics} (Oxford, England), 26:7 976–978, April 2010. ISSN 1367-4803 \url{http://bioinformatics.oxfordjournals.org/cgi/content/abstract/26/7/976} PMID: 20179076 }

You are welcome to use GOSemSim in the way you like, but please cite it and give it the proper credit. I hope you can understand.