I have a data frame containing pairs of elements found in multiple datasets. The order within a pair does not matter: each pair should be treated as if written once in alphabetical order, although which element comes first may vary between databases, as in the example.
data <- data.frame(i = c("b", "b", "b", "c"), j = c("c", "d", "d", "a"), database = c(1, 1, 2, 3))
I want to generate a score for each pair: the number of databases that contain the pair, divided by the number of databases that contain either element of the pair.
I can imagine a crude function like this:
# For every database that includes a particular i or j, check whether it
# has a connection to the other particular element (j or i).
# Count the number of successes and divide by:
# count(number of databases in which i or j is included)
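A minimal sketch of that crude loop in base R might look like the following. The helper name `pair_score` and the `pairs` table are hypothetical names introduced here for illustration. The sketch assumes an element counts as present in a database if it appears in either column.

```r
# Example data from the question
data <- data.frame(i = c("b", "b", "b", "c"),
                   j = c("c", "d", "d", "a"),
                   database = c(1, 1, 2, 3),
                   stringsAsFactors = FALSE)

# Hypothetical helper: databases containing the pair (a, b), divided by
# databases containing a or b at all
pair_score <- function(df, a, b) {
  # Databases in which a or b occurs, in either column
  has_any <- unique(df$database[df$i %in% c(a, b) | df$j %in% c(a, b)])
  # Databases in which a and b occur together as a pair, in either order
  has_pair <- unique(df$database[(df$i == a & df$j == b) |
                                 (df$i == b & df$j == a)])
  length(has_pair) / length(has_any)
}

# Unique unordered pairs, sorted alphabetically within each pair
pairs <- unique(data.frame(x = pmin(data$i, data$j),
                           y = pmax(data$i, data$j),
                           stringsAsFactors = FALSE))

# Score every pair
pairs$score <- mapply(function(a, b) pair_score(data, a, b),
                      pairs$x, pairs$y, USE.NAMES = FALSE)
print(pairs)
```

This scans the whole data frame once per pair, so it is only meant to pin down the intended calculation, not to be efficient.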
The result I expect from the example data set (order unimportant):
a c 0.5
b c 0.33
b d 1
I can see how this crude loop could work, but I'm pretty sure there are more elegant solutions. Is anyone able to help? Maybe a graph library has a dedicated function for this. Thanks!
Just playing a bit with joins (i.e. merge):
library(dplyr)

data <- data.frame(i = c("b", "b", "b", "c"), j = c("c", "d", "d", "a"), database = c(1, 1, 2, 3), stringsAsFactors = FALSE)

# Sort each pair alphabetically and count the databases containing each pair
data2 <- mutate(data, x = pmin(i, j), y = pmax(i, j))
pairs_all <- summarize(group_by(data2, x, y), n_all = length(unique(database)))

# Introduce a helper index to identify the pairs (for the following joins)
pairs_all$pair_id <- 1:nrow(pairs_all)

# Count the databases containing either element of each pair
r <- merge(pairs_all,
           summarize(group_by(merge(merge(pairs_all, transmute(data2, x, db1 = database)),
                                    transmute(data2, y, db2 = database)),
                              pair_id),
                     n_any = length(unique(union(db1, db2)))))

# Finally, calculate the result
transmute(r, x, y, n_all / n_any)
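One caveat, as far as I can tell: the joins match `x` only against first elements and `y` only against second elements of the sorted pairs, so a database in which an element appears only in the other position would not be counted. That does not change the result for the example data. A hedged base-R sketch that counts membership regardless of position could look like this, where `members`, `n_any` and `ratio` are names introduced here for illustration:

```r
data <- data.frame(i = c("b", "b", "b", "c"),
                   j = c("c", "d", "d", "a"),
                   database = c(1, 1, 2, 3),
                   stringsAsFactors = FALSE)

# Long table of (element, database) memberships, ignoring column position
members <- unique(rbind(
  data.frame(element = data$i, database = data$database, stringsAsFactors = FALSE),
  data.frame(element = data$j, database = data$database, stringsAsFactors = FALSE)
))

# Alphabetically sorted pairs, one row per (pair, database)
data2 <- unique(data.frame(x = pmin(data$i, data$j),
                           y = pmax(data$i, data$j),
                           database = data$database,
                           stringsAsFactors = FALSE))

# Number of distinct databases containing each pair
pairs_all <- aggregate(database ~ x + y, data = data2, FUN = length)
names(pairs_all)[3] <- "n_all"

# Number of distinct databases containing either element of each pair
pairs_all$n_any <- mapply(
  function(a, b) length(unique(members$database[members$element %in% c(a, b)])),
  pairs_all$x, pairs_all$y, USE.NAMES = FALSE
)

pairs_all$ratio <- pairs_all$n_all / pairs_all$n_any
print(pairs_all)
```

It uses base R only so that it runs without extra packages; the same idea could be expressed with dplyr joins on the `members` table.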