Monday, 15 July 2013

r - Determining the number of non-zero cells and calculating prevalence by a stratifying variable


I have spent a lot of time on this and cannot find a solution to my specific question. I would really appreciate any help.

I have a large data frame (298 variables, 1258 objects) where each row is a sample record from a participant and each column is a specific bacterial genus found within the sample. There are several records for each participant, which is also indicated by a column variable.

Here is an example of what the data frame looks like:

    Corynebacterium <- c(0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.5, 0.7, 0.1, 0.0)
    Paenibacillus   <- c(0.0, 0.1, 0.7, 0.3, 0.5, 0.7, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0)
    Psychrobacter   <- c(0.1, 0.1, 0.5, 0.0, 0.0, 0.0, 0.3, 0.6, 0.0, 0.6, 0.7, 0.0)
    Staphylococcus  <- c(0.5, 0.0, 0.3, 0.0, 0.3, 0.2, 0.5, 0.0, 0.4, 0.1, 0.1, 0.5)
    Timepoint       <- c("A", "B", "C", "D", "E", "F", "A", "B", "C", "D", "E", "F")
    sampleDF <- data.frame(Corynebacterium, Paenibacillus, Psychrobacter, Staphylococcus, Timepoint)

I would like to know, for each genus, the number of non-zero cells divided by the total number of cells at a given time point.

For example: for Corynebacterium at Timepoint A, this would be # non-zero cells / total # cells = 1/2 = 0.5. Put another way, at Timepoint A 50% of the cells for Corynebacterium are non-zero.
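The arithmetic for that single cell can be checked by hand. This is a minimal sketch, assuming the two Corynebacterium readings at Timepoint A are 0.0 and 0.1 as in the example data; `cells_A`, `n_nonzero`, `n_total` and `prevalence` are names introduced here for illustration only.

```r
# Hypothetical vector: the two Corynebacterium readings at Timepoint A.
cells_A <- c(0.0, 0.1)

n_nonzero  <- sum(cells_A != 0)   # TRUE counts as 1, so this counts non-zero cells
n_total    <- length(cells_A)     # total number of cells at this time point
prevalence <- n_nonzero / n_total

print(prevalence)  # 0.5
```

Because `cells_A != 0` is a logical vector, `mean(cells_A != 0)` gives the same proportion in one step.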

Here is a dplyr answer:

    library(dplyr)

    sampleDF %>%
      group_by(Timepoint) %>%
      summarise_each(funs(sum(. != 0) / length(.)))
    #   Timepoint Corynebacterium Paenibacillus Psychrobacter Staphylococcus
    # 1         A             0.5           0.0           1.0            1.0
    # 2         B             0.5           0.5           1.0            0.0
    # 3         C             0.5           0.5           0.5            1.0
    # 4         D             0.5           1.0           0.5            0.5
    # 5         E             0.5           1.0           0.5            1.0
    # 6         F             0.0           0.5           0.0            1.0
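In current dplyr the `summarise_each()` and `funs()` helpers are deprecated, and the same per-group summary is usually written with `across()`. The following is a sketch of that equivalent, assuming dplyr 1.0.0 or later is installed. The example data is restated so the block runs on its own, with Corynebacterium written out as twelve values to line up with the twelve time points; `prevalence` is a name introduced here.

```r
library(dplyr)

# Example data restated so this block is self-contained.
sampleDF <- data.frame(
  Corynebacterium = c(0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.5, 0.7, 0.1, 0.0),
  Paenibacillus   = c(0.0, 0.1, 0.7, 0.3, 0.5, 0.7, 0.0, 0.0, 0.0, 0.3, 0.3, 0.0),
  Psychrobacter   = c(0.1, 0.1, 0.5, 0.0, 0.0, 0.0, 0.3, 0.6, 0.0, 0.6, 0.7, 0.0),
  Staphylococcus  = c(0.5, 0.0, 0.3, 0.0, 0.3, 0.2, 0.5, 0.0, 0.4, 0.1, 0.1, 0.5),
  Timepoint       = rep(c("A", "B", "C", "D", "E", "F"), times = 2)
)

# For every non-grouping column, compute the share of non-zero cells per time point.
prevalence <- sampleDF %>%
  group_by(Timepoint) %>%
  summarise(across(everything(), ~ sum(.x != 0) / length(.x)))

print(prevalence)
```

Inside a grouped `summarise()`, `everything()` skips the grouping column, so only the genus columns are summarised.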

You can also do this very easily in base R:

    aggregate(. ~ Timepoint, data = sampleDF, function(x) sum(x != 0) / length(x))
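To see what the formula interface is doing, here is a small base R sketch that needs no packages. It assumes a cut-down, hypothetical data frame `smallDF` holding only two of the genera from the example, and uses `mean(x != 0)` as a shorthand for `sum(x != 0) / length(x)`.

```r
# Hypothetical two-genus subset of the example data.
smallDF <- data.frame(
  Corynebacterium = c(0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.5, 0.7, 0.1, 0.0),
  Staphylococcus  = c(0.5, 0.0, 0.3, 0.0, 0.3, 0.2, 0.5, 0.0, 0.4, 0.1, 0.1, 0.5),
  Timepoint       = rep(c("A", "B", "C", "D", "E", "F"), times = 2)
)

# The dot on the left-hand side means "every column not named on the right",
# so each genus is summarised within each Timepoint group.
res <- aggregate(. ~ Timepoint, data = smallDF, FUN = function(x) mean(x != 0))

print(res)
```

One thing to keep in mind with real data: the formula method of `aggregate()` uses `na.action = na.omit` by default, so rows containing `NA` are dropped before the function is applied.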
