I have two character vectors in R.
I want to compare the reference list against the raw character list using jarowinkler and assign a % similarity score. For example, if I have 10 reference items and 20 raw data items, I want to get the best score for each comparison and what the algorithm matched it to (so 2 vectors of 10). If I have raw data of size 8 and 10 reference items, I should end up with only 2 result vectors of 8 items: the best match and the score per item.

    item, match, matched_to
    ice, 78, ice-cream
Below is my code.

    NumItems.Raw = length(words)
    NumItems.Ref = length(ref.Desc)
    for (item in words) {
      for (refitem in ref.Desc) {
        jarowinkler(refitem, item)
        # Find the best match score
        # Find the best-matching item in the reference list
        # Add both items to the result vectors
        # Decrement NumItems.Raw
        # Loop
      }
    }
    library(RecordLinkage)  # provides jarowinkler()
    library(dplyr)

    ref <- c('cat', 'dog', 'tortoise', 'cow', 'horse', 'pig', 'sheep', 'koala', 'bear', 'fish')
    words <- c('dog', 'kiwi', 'emu', 'pig', 'sheep', 'cow', 'cat', 'horse')

    # Build every (raw word, reference) pair
    wordlist <- expand.grid(words = words, ref = ref, stringsAsFactors = FALSE)

    # Score each pair, then keep the best-scoring reference per raw word
    wordlist %>%
      group_by(words) %>%
      mutate(match_score = jarowinkler(words, ref)) %>%
      summarise(match = match_score[which.max(match_score)],
                matched_to = ref[which.max(match_score)])
returns
      words     match matched_to
    1   cat 1.0000000        cat
    2   cow 1.0000000        cow
    3   dog 1.0000000        dog
    4   emu 0.5277778       bear
    5 horse 1.0000000      horse
    6  kiwi 0.5350000      koala
    7   pig 1.0000000        pig
    8 sheep 1.0000000      sheep

Edit: In response to the OP's comment, the last command uses a dplyr pipeline. It groups every combination of raw word and reference by the raw word, adds a match_score column holding the jarowinkler score, and returns a summary containing only the highest score per word (indexed by which.max(match_score)) together with the reference at that same index.
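For readers who find the pipeline hard to follow, the same per-word logic can be sketched as an explicit loop, closer to the outline in the question. This is only an illustrative sketch: best_match() is a hypothetical helper name that does not come from either package, and the word is repeated explicitly so that both arguments to jarowinkler() have the same length.

```r
library(RecordLinkage)  # provides jarowinkler()

ref   <- c('cat', 'dog', 'tortoise', 'cow', 'horse', 'pig', 'sheep', 'koala', 'bear', 'fish')
words <- c('dog', 'kiwi', 'emu', 'pig', 'sheep', 'cow', 'cat', 'horse')

# Hypothetical helper: score one raw word against every reference item
# and keep only the best-scoring reference.
best_match <- function(word, ref) {
  scores <- jarowinkler(rep(word, length(ref)), ref)  # one score per reference
  best <- which.max(scores)                           # first index of the maximum
  data.frame(words = word,
             match = scores[best],
             matched_to = ref[best],
             stringsAsFactors = FALSE)
}

# One row per raw word, in the original order of `words`
result <- do.call(rbind, lapply(words, best_match, ref = ref))
result
```

Unlike the grouped summary, this keeps the rows in the order of the raw input rather than sorting them alphabetically. On ties, which.max() picks the first reference with the top score in both versions.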