R: The countdata package

Recent technology platforms in proteomics and genomics produce count data for quantitative analysis. In proteomics, the number of MS/MS events observed for a protein in the mass spectrometer has been shown to correlate strongly with the protein's abundance in a complex mixture. In genomics, next-generation sequencing technologies use read counts as a reliable measure of the abundance of the target transcripts. This R package contains two functions for statistical analysis of count data.

The beta-binomial test (bb.test) can be used for significance analysis of independent samples (two or more groups).
The inverted beta-binomial test (ibb.test) can be used for paired sample testing (e.g. pre-treatment and post-treatment data).

The package is now available and maintained via CRAN https://cran.r-project.org/package=countdata.

The following is for archival purposes.

User manual | Example data and R scripts

INSTALLATION
============

For R >= 4.0.0
Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux 64 bit

For 3.5.0 <= R <= 3.6.3
Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux 64 bit

For 3.0.0 <= R <= 3.4.4
Windows (both 32 bit & 64 bit) | MacOS (64 bit) | Linux 32 bit | Linux 64 bit

For R < 3.0.0
Windows (both 32 bit & 64 bit) | MacOS (both 32 bit & 64 bit) | Linux 32 bit | Linux 64 bit

To check your R version, enter 'version' in your R console.
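The same check can be done programmatically. The helper below is hypothetical (it is not part of countdata); it only encodes the version ranges listed above, using base R's getRversion() and package_version().

```r
# Hypothetical helper (not part of countdata): map an R version to the
# download row listed above. Defaults to the running R version.
countdata_build_for <- function(v = getRversion()) {
  v <- package_version(as.character(v))
  if (v >= "4.0.0") {
    "R >= 4.0.0"
  } else if (v >= "3.5.0") {
    "3.5.0 <= R <= 3.6.3"
  } else if (v >= "3.0.0") {
    "3.0.0 <= R <= 3.4.4"
  } else {
    "R < 3.0.0"
  }
}

R.version.string        # human-readable version of the running R
countdata_build_for()   # download row that matches the running R
```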

On Windows, go to R menu "Packages" --> "Install package(s) from local zip files", and then select the downloaded zip file.

On Linux and MacOS, use the shell command: R CMD INSTALL <the downloaded file>
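A downloaded file can also be installed from within R with install.packages() and repos = NULL. The sketch below is an illustration, not part of countdata: the helper names are hypothetical, and it assumes the Windows builds are .zip binaries while any other file is handed to R CMD INSTALL (type = "source"), mirroring the shell command above. The current CRAN release can instead be installed with install.packages("countdata").

```r
# Hypothetical helpers (not part of countdata) for installing a downloaded file.

# Windows builds are assumed to be .zip binaries; anything else is passed to
# R CMD INSTALL, as with the shell command above.
countdata_pkg_type <- function(path) {
  if (grepl("\\.zip$", path, ignore.case = TRUE)) "win.binary" else "source"
}

install_local_countdata <- function(path) {
  if (!file.exists(path)) stop("file not found: ", path)
  install.packages(path, repos = NULL, type = countdata_pkg_type(path))
}

# Usage (the file name is a placeholder for the file you downloaded):
# install_local_countdata("countdata_x.y.zip")
```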

Contact: Thang V. Pham

The beta-binomial test

Description

Performs the beta-binomial test for count data.

Usage

bb.test(x, tx, group, alternative = c("two.sided", "less", "greater"), n.threads = 1)

Arguments

x	A vector or matrix of counts. When x is a matrix, the test is performed row by row.
tx	A vector or matrix of the total sample counts. When tx is a matrix, the number of rows must be equal to the number of rows of x.
group	A vector of group indicators.
alternative	A character string specifying the alternative hypothesis: "two.sided" (default), "greater" or "less".
n.threads	The number of threads to be used.
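The base-R sketch below builds toy inputs with the shapes described above: features in rows, samples in columns, one total per sample in tx, and one group label per sample. The counts are made up; passing tx as a vector of column totals follows the data-file examples below, and the matrix form of tx is assumed to have the same dimensions as x.

```r
# Toy count matrix: 3 features (rows) x 4 samples (columns); values are made up
x <- matrix(c(1, 5, 10, 9,
              0, 2,  3, 4,
              7, 6,  1, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("feature", 1:3), paste0("s", 1:4)))

# tx as a vector: one total count per sample (column), as in the examples
tx <- colSums(x)

# tx as a matrix: assumed to have the same shape as x, one row per feature
tx.mat <- matrix(tx, nrow = nrow(x), ncol = ncol(x), byrow = TRUE)

# group: one label per sample (column of x)
group <- c("a", "a", "b", "b")

stopifnot(length(tx) == ncol(x), all(dim(tx.mat) == dim(x)),
          length(group) == ncol(x))

# Relative abundance of each feature within each sample
x / tx.mat
```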

Details

When n.threads is 0, all CPU cores are used. When n.threads is -1, all but one CPU core are used; -2 leaves two cores unused, and so on.
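The rule above can be written out as a small function. This is a hypothetical illustration, not code from countdata: it uses parallel::detectCores() as the "maximal number of CPU cores", and the floor of one thread for large negative values is an assumption that the documentation does not state.

```r
library(parallel)  # detectCores() ships with base R

# Hypothetical helper (not part of countdata): threads implied by n.threads.
resolve_threads <- function(n.threads, n.cores = detectCores()) {
  if (n.threads > 0) {
    n.threads                      # positive: use exactly this many threads
  } else {
    max(1, n.cores + n.threads)    # 0: all cores; -k: k fewer than all (floor of 1 assumed)
  }
}

resolve_threads(0, n.cores = 8)    # 8
resolve_threads(-1, n.cores = 8)   # 7
```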

Value

A list with a single component is returned:

p.value	The p-value of the test (one per row when x is a matrix).

Author

Thang V. Pham <t.pham@vumc.nl>

Reference

Pham TV, Piersma SR, Warmoes M, Jimenez CR (2010) On the beta binomial model for analysis of spectral count data in label-free tandem mass spectrometry-based proteomics. Bioinformatics, 26(3):363-369.
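The test rests on the beta-binomial distribution: a count x out of a total tx is binomial given a proportion, and that proportion varies between samples according to a beta distribution, which absorbs extra between-sample variation. The base-R sketch below only evaluates the beta-binomial log-probability and a log-likelihood for one group of samples. The helper names and the alpha/beta values are arbitrary, and the parameterization, estimation and testing procedure used by bb.test follow the reference and may differ.

```r
# Log-probability of k counts out of n under a beta-binomial(alpha, beta) model:
#   P(X = k) = choose(n, k) * B(k + alpha, n - k + beta) / B(alpha, beta)
dbetabinom_log <- function(k, n, alpha, beta) {
  lchoose(n, k) + lbeta(k + alpha, n - k + beta) - lbeta(alpha, beta)
}

# Log-likelihood of counts x out of totals tx that share one (alpha, beta)
bb_loglik <- function(x, tx, alpha, beta) {
  sum(dbetabinom_log(x, tx, alpha, beta))
}

# With alpha = beta = 1 the distribution is uniform on 0..n: each value has 1/(n + 1)
exp(dbetabinom_log(0:4, 4, alpha = 1, beta = 1))   # 0.2 0.2 0.2 0.2 0.2

# Counts and totals of the "cancer" samples in the example below; alpha and beta are arbitrary
bb_loglik(c(1, 5, 1), c(19609, 19053, 19235), alpha = 2, beta = 10000)
```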

Examples


# example proteomics spectral count data

x <- c(1, 5, 1, 10, 9, 11, 2, 8)
tx <- c(19609, 19053, 19235, 19374, 18868, 19018, 18844, 19271)
group <- c(rep("cancer", 3), rep("normal", 5))
bb.test(x, tx, group)

######################

# comparing 3 groups: columns c(1, 2, 3), c(4, 5, 6), and c(7, 8) of a data file

d <- read.delim("example-3groups.txt", header = TRUE)

# compare 3 groups, using all available CPU cores

out <- bb.test(d[, 1:8], colSums(d[, 1:8]), c(rep("a", 3), rep("b", 3), rep("c", 2)), n.threads = 0)

# write result to file

write.table(cbind(d, p.value = out$p.value), file = "example-3groups-out.txt", sep = "\t", row.names = FALSE)
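If the example file is not at hand, the same steps can be tried on simulated data. In this sketch the Poisson counts, the temporary file and the column names are all made up; bb.test is only called when countdata is installed, and sorting by p-value is an optional extra step.

```r
set.seed(1)

# Made-up stand-in for example-3groups.txt: 20 rows x 8 count columns
sim <- matrix(rpois(20 * 8, lambda = 10), nrow = 20,
              dimnames = list(NULL, paste0("s", 1:8)))
f <- tempfile(fileext = ".txt")
write.table(sim, file = f, sep = "\t", row.names = FALSE, quote = FALSE)

d <- read.delim(f, header = TRUE)
tx <- colSums(d[, 1:8])                               # one total per sample
group <- rep(c("a", "b", "c"), times = c(3, 3, 2))    # columns 1-3, 4-6, 7-8

if (requireNamespace("countdata", quietly = TRUE)) {
  out <- countdata::bb.test(d[, 1:8], tx, group, n.threads = 0)
  res <- cbind(d, p.value = out$p.value)
  head(res[order(res$p.value), ])                     # smallest p-values first
}
```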

The inverted beta-binomial test

Description

Performs the inverted beta-binomial test for paired count data.

Usage

ibb.test(x, tx, group, alternative = c("two.sided", "less", "greater"), n.threads = 1)

Arguments

x	A vector or matrix of counts. When x is a matrix, the test is performed row by row.
tx	A vector or matrix of the total sample counts. When tx is a matrix, the number of rows must be equal to the number of rows of x.
group	A vector of group indicators. There should be two groups of equal size. The samples are matched by the order of appearance in each group.
alternative	A character string specifying the alternative hypothesis: "two.sided" (default), "greater" or "less".
n.threads	The number of threads to be used.
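The pairing rule can be made explicit. The hypothetical helper below (not part of countdata) lists which columns are matched under "order of appearance in each group": the k-th sample of one group is paired with the k-th sample of the other.

```r
# Hypothetical helper (not part of countdata): column pairs implied by `group`
paired_columns <- function(group) {
  lv <- unique(group)                     # groups in order of first appearance
  stopifnot(length(lv) == 2)
  i1 <- which(group == lv[1])
  i2 <- which(group == lv[2])
  stopifnot(length(i1) == length(i2))     # the two groups must be of equal size
  pairs <- cbind(i1, i2)
  colnames(pairs) <- lv
  pairs
}

paired_columns(c("pre", "pre", "pre", "post", "post", "post"))   # 1-4, 2-5, 3-6
paired_columns(c("pre", "post", "pre", "post"))                   # 1-2, 3-4
```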

Details

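This test is designed for paired count data, for example data acquired before and after treatment. The n.threads argument is interpreted as for bb.test (0 uses all CPU cores, -1 all but one, and so on).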

Value

A list with the following components is returned:

p.value	The p-value of the test.
fc	An estimate of the common fold change.
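For intuition about what fc summarizes, per-pair fold changes of normalized counts can be computed by hand from the RNA-seq example data below. This is a naive calculation, not the estimator used by ibb.test. It assumes the first three samples are paired with the last three and takes the ratio as second group over first; the direction of the fc returned by ibb.test is not specified here.

```r
# RNA-seq example data from the ibb.test examples (first 3 paired with last 3)
x  <- c(33, 32, 86, 51, 52, 149)
tx <- c(7742608, 15581382, 20933491, 7126839, 13842297, 14760103)

p <- x / tx                          # normalized count (proportion) per sample
pair.fc <- p[4:6] / p[1:3]           # naive fold change per pair, group 2 over group 1

# A naive common summary: geometric mean of the per-pair fold changes
exp(mean(log(pair.fc)))
```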

Author

Thang V. Pham <t.pham@vumc.nl>

Reference

Pham TV, Jimenez CR (2012) An accurate paired sample test for count data. Bioinformatics, 28(18):i596-i602.

Examples

# example RNA-seq read count data

x <- c(33, 32, 86, 51, 52, 149)
tx <- c(7742608, 15581382, 20933491, 7126839, 13842297, 14760103)
group <- c(rep("cancer", 3), rep("normal", 3))
ibb.test(x, tx, group)

######################

# columns c(1, 2, 3) are respectively paired with columns c(4, 5, 6)

d <- read.delim("example-paired.txt", header = TRUE)

# perform a paired test for all rows, using all but one CPU core

out <- ibb.test(d[, 1:6], colSums(d[, 1:6]), c(rep("pre", 3), rep("post", 3)), n.threads = -1)

# write result to file

write.table(cbind(d, fc = out$fc, p.value = out$p.value), file = "example-paired-out.txt", sep = "\t", row.names = FALSE)

Last updated 2021, Thang Pham