Sunday, 15 May 2011

r - Merge function with several key repetitions on the "right-crossing" data frame


Sorry for the noob question, but I really do not know where to start with this. My problem is that I need to get the value of the second column for every match in a second data frame. It is a kind of merge, but in this case the right-hand data frame has several repeated key values, so I need all the matches.

To be clear, suppose I have a data frame like this:

  df <- data.frame(CustomerId = c("a", "b", "c", "h"))

    CustomerId
  1          a
  2          b
  3          c
  4          h

and my other data frame looks something like this:

  df2 <- data.frame(CustomerID = c("a", "b", "b", "b", "a", "d", "c"),
                    code = c(1, 2, 2, 3, 2, 4, 4))

    CustomerID code
  1          a    1
  2          b    2
  3          b    2
  4          b    3
  5          a    2
  6          d    4
  7          c    4

I need to "merge" these two data frames so that I get all the codes for each customer ID. I need something like this:

    CustomerId code
  1          a  1,2
  2          b  2,3
  3          c    4
  4          h   NA

The problems I have found are:

  • Keys are repeated several times in the right-hand data frame
  • I could use a loop, but my database is so large that I want to avoid the inefficiency
  • The number of times a customer ID is repeated in the second data frame varies
  • A customer ID may be repeated several times with only one distinct code, in which case I need that code only once, not every repeat
  • If a customer ID is not found in the other data frame, an NA value is required

Thanks, guys, for your help.

Here is a possible solution using data.table:

  library(data.table)

  # convert the data.frames to data.tables
  dt  = data.table(df,  key = "CustomerId")
  dt2 = data.table(df2, key = "CustomerID")

  # rename the column, because names are case sensitive
  setnames(dt2, "CustomerID", "CustomerId")

  # merge, keeping all records, including those that have no match
  dt3 = merge(dt, dt2, all = TRUE)

  # list the distinct codes by "CustomerId"
  dt3[, list(code = list(unique(code))), by = "CustomerId"]

If anyone is uncomfortable working with data.table beyond this point, the data.table objects can easily be converted back to a data.frame: df = data.frame(dt).
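As a cross-check that needs no extra package, here is a minimal base R sketch that reproduces the table requested in the question: one row per customer in df, distinct codes collapsed into a comma-separated string, and NA where there is no match. It is not the answer's method. The column spelling CustomerID in df2 is an assumption inferred from the rename step in the answer, and the names codes and result are only illustrative.

```r
# Example data from the question (df2 key spelling is an assumption).
df  <- data.frame(CustomerId = c("a", "b", "c", "h"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(CustomerID = c("a", "b", "b", "b", "a", "d", "c"),
                  code = c(1, 2, 2, 3, 2, 4, 4),
                  stringsAsFactors = FALSE)

# Collapse the distinct codes of each customer into one string.
# tapply() returns an array named by the customer IDs found in df2.
codes <- tapply(df2$code, df2$CustomerID,
                function(x) paste(sort(unique(x)), collapse = ","))

# Look up every customer of df by name; match() yields NA for customers
# that are absent from df2, which turns into an NA code.
result <- data.frame(
  CustomerId = df$CustomerId,
  code = as.vector(codes[match(df$CustomerId, names(codes))]),
  stringsAsFactors = FALSE
)

print(result)
```

In merge() terms this behaves like a left join (all.x = TRUE): customer "d", which appears only in df2, is dropped, whereas all = TRUE would keep it. No explicit loop over customers is needed, since tapply() and match() are vectorised.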

