Recent technology platforms in proteomics and genomics produce count data for quantitative analysis. In proteomics, the number of MS/MS events observed for a protein in the mass spectrometer has been shown to correlate strongly with the protein's abundance in a complex mixture. In genomics, next-generation sequencing technologies use read count as a reliable measure of the abundance of the target transcript. This R package contains two functions for statistical analysis of count data: bb.test, the beta-binomial test, and ibb.test, the inverted beta-binomial test for paired data.
The package is now available and maintained on CRAN: https://cran.r-project.org/package=countdata
User manual | Example data and R scripts
For R 4.0.0 and later: Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux (64 bit)
For 3.5.0 <= R <= 3.6.3: Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux (64 bit)
On Windows, go to R menu "Packages" --> "Install package(s) from local zip files", and then select the downloaded zip file.
On Linux and MacOS, use the shell command: R CMD INSTALL <the downloaded file>
All rights reserved by the author. This software package is provided for research purposes in a non-commercial environment. Please do not redistribute.
Contact: Thang V Pham
Performs the beta-binomial test for count data.
bb.test(x, tx, group, alternative = c("two.sided", "less", "greater"), n.threads = 1)
x | A vector or matrix of counts. When x is a matrix, the test is performed row by row.
tx | A vector or matrix of the total sample counts. When tx is a matrix, the number of rows must be equal to the number of rows of x.
group | A vector of group indicators.
alternative | A character string specifying the alternative hypothesis: "two.sided" (default), "greater" or "less".
n.threads | The number of threads to be used.
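The shapes of x and tx described above can be sketched in base R. The counts below are made up for illustration, and the names tx_vec and tx_mat are hypothetical. Repeating the per-sample totals across rows only demonstrates the required matrix dimensions; it is not a statement about how the package treats the two forms internally.

```r
# Made-up counts: 3 features (rows) by 4 samples (columns)
x <- matrix(c(1, 5, 10, 9,
              0, 2,  3, 4,
              7, 6,  8, 2),
            nrow = 3, byrow = TRUE)

# Vector form of tx: one total per sample, here the column sums of x
tx_vec <- colSums(x)

# Matrix form of tx: same number of rows as x, one total per cell
tx_mat <- matrix(tx_vec, nrow = nrow(x), ncol = ncol(x), byrow = TRUE)

# The vector has one entry per sample; the matrix matches x row for row
stopifnot(length(tx_vec) == ncol(x), nrow(tx_mat) == nrow(x))
```

The matrix form is useful when the total counts differ from row to row, which a single vector cannot express.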
When n.threads is 0, the maximal number of CPU cores is used. When n.threads is -1, one CPU core less than the maximum is used, and so on.
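The n.threads rule above can be illustrated with a small sketch. The helper resolve_threads is hypothetical and not part of the package. It assumes that positive values are used as given and that the result never drops below one thread; neither assumption is confirmed by the documentation.

```r
# Hypothetical helper, not part of countdata: maps an n.threads value to a
# thread count following the rule "0 = all cores, -1 = all but one, ...".
resolve_threads <- function(n.threads, max.cores = parallel::detectCores()) {
  # detectCores() may return NA on some platforms; fall back to one core
  if (is.na(max.cores)) max.cores <- 1L
  if (n.threads > 0) {
    n <- n.threads               # assumption: positive values are taken literally
  } else {
    n <- max.cores + n.threads   # 0 -> all cores, -1 -> one less, and so on
  }
  # Assumption: always use at least one thread
  max(1L, as.integer(n))
}

resolve_threads(0, max.cores = 8)    # all 8 cores
resolve_threads(-1, max.cores = 8)   # 7 cores
resolve_threads(2, max.cores = 8)    # 2 threads
```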
A list with a single component is returned:
p.value | The p-value of the test.
Thang V. Pham <t.pham@vumc.nl>
Pham TV, Piersma SR, Warmoes M, Jimenez CR (2010) On the beta binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics. Bioinformatics, 26(3):363-369.
library(countdata)

# example proteomics spectral count data
x <- c(1, 5, 1, 10, 9, 11, 2, 8)
tx <- c(19609, 19053, 19235, 19374, 18868, 19018, 18844, 19271)
group <- c(rep("cancer", 3), rep("normal", 5))
bb.test(x, tx, group)

######################
# comparing 3 groups: columns c(1, 2, 3), c(4, 5, 6), and c(7, 8) of a data file
d <- read.delim("example-3groups.txt", header = TRUE)

# compare 3 groups, using all available CPU cores
out <- bb.test(d[, 1:8], colSums(d[, 1:8]), c(rep("a", 3), rep("b", 3), rep("c", 2)), n.threads = 0)

# write result to file
write.table(cbind(d, out$p.value), file = "example-3groups-out.txt", sep = "\t", row.names = FALSE)
Performs the inverted beta-binomial test for paired count data.
ibb.test(x, tx, group, alternative = c("two.sided", "less", "greater"), n.threads = 1)
x | A vector or matrix of counts. When x is a matrix, the test is performed row by row.
tx | A vector or matrix of the total sample counts. When tx is a matrix, the number of rows must be equal to the number of rows of x.
group | A vector of group indicators. There should be two groups of equal size. The samples are matched by the order of appearance in each group.
alternative | A character string specifying the alternative hypothesis: "two.sided" (default), "greater" or "less".
n.threads | The number of threads to be used.
This test is designed for paired count data, for example data acquired before and after treatment.
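The matching rule for the group argument (two groups of equal size, paired by order of appearance within each group) can be made concrete with a short sketch. The helper pair_indices is hypothetical and not part of the package; it only shows which sample positions would be paired under that reading of the rule.

```r
# Hypothetical helper, not part of countdata: lists which sample positions
# are paired when samples are matched by order of appearance in each group.
pair_indices <- function(group) {
  lv <- unique(group)                  # group labels in order of first appearance
  stopifnot(length(lv) == 2)           # the paired test expects exactly two groups
  first  <- which(group == lv[1])
  second <- which(group == lv[2])
  stopifnot(length(first) == length(second))  # groups must have equal size
  out <- cbind(first, second)
  colnames(out) <- lv
  out
}

# Three samples before treatment followed by three samples after treatment
group <- c(rep("pre", 3), rep("post", 3))
pairs <- pair_indices(group)
pairs  # rows give the pairs: (1, 4), (2, 5), (3, 6)
```

Under the same reading, an interleaved layout such as c("pre", "post", "pre", "post") would pair positions (1, 2) and (3, 4).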
A list with two components is returned:
p.value | The p-value of the test.
fc | An estimate of the common fold change.
Thang V. Pham <t.pham@vumc.nl>
Pham TV, Jimenez CR (2012) An accurate paired sample test for count data. Bioinformatics, 28(18):i596-i602.
library(countdata)

# example RNA-seq read count data
x <- c(33, 32, 86, 51, 52, 149)
tx <- c(7742608, 15581382, 20933491, 7126839, 13842297, 14760103)
group <- c(rep("cancer", 3), rep("normal", 3))
ibb.test(x, tx, group)

######################
# columns c(1, 2, 3) are respectively paired with columns c(4, 5, 6)
d <- read.delim("example-paired.txt", header = TRUE)

# perform a paired test for all rows, using all but one CPU core
out <- ibb.test(d[, 1:6], colSums(d[, 1:6]), c(rep("pre", 3), rep("post", 3)), n.threads = -1)

# write result to file
write.table(cbind(d, out$fc, out$p.value), file = "example-paired-out.txt", sep = "\t", row.names = FALSE)