Tuesday, 15 May 2012

python - Building a DataFrame in an efficient way from a dictionary


I have a large dataset that I have processed into a dictionary. Now I want to create a DataFrame from this dictionary. Each value in the dictionary is a list of tuples. From those tuples, I need to find the unique labels to use as the DataFrame columns:

  d = {
      '0001': [('Skiing', 0.789), ('Snow', 0.65), ('winter', 0.56)],
      '0002': [('drama', 0.89), ('comedy', 0.678), ('action', -0.42), ('winter', ...), ('Children', 0.12)],
      '0003': [('Action', 0.89), ('funny', 0.58), ('game', 0.12)],
      '0004': [('Dark', 0.89), ('cartoon', -0.89), ('comedy', 0.678), ('mystery', 0.678), ('crime', 0.12), ('adult', -0.423)],
      '0005': [('Action', 0.12)],
      '0006': [('drama', -0.49), ('funny', 0.378), ('Spence', 0.12), ('Thriller', 0.78)],
      '0007': [('Dark', 0.79), ('Mystery', 0.88), ('Crime', 0.32), ('Adult', ...)],
      # ... approximately 800,000 records in the whole dictionary
  }

I iterate over the dictionary to find the unique headers:
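
A pass of this kind might look like the sketch below. It is an assumption about the shape of the loop, not the original code: `col_headers` and `entities` are the names used in the DataFrame call further down, and the two-key sample stands in for the full dictionary.

```python
# Illustrative sample; the real dictionary has the same shape but far more keys.
d = {
    "0001": [("Skiing", 0.789), ("Snow", 0.65), ("winter", 0.56)],
    "0003": [("Action", 0.89), ("funny", 0.58), ("game", 0.12)],
}

entities = list(d)  # the dictionary keys become the row labels

# Collect each distinct tuple label once, in first-seen order.
# The set keeps the "have I seen this label?" check cheap.
seen = set()
col_headers = []
for pairs in d.values():
    for name, _score in pairs:
        if name not in seen:
            seen.add(name)
            col_headers.append(name)
```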

 

I believe this step takes a long time to process. There may also be a problem with the code itself, because it is very slow. Furthermore, when I build the DataFrame row by row, it slows the process down even more:

  df = pd.DataFrame(columns=col_headers, index=entities)
  for k in d:
      df.loc[k] = pd.Series(d[k])
  df.fillna(0.0, axis=1)

How can I optimize this to reduce the processing time?

You also need to unpack the inner lists of key-value pairs into dictionaries:

  df = pd.DataFrame.from_dict({k: dict(v) for k, v in d.items()}, orient="index").fillna(0)
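
To see what that one-liner does, here is a minimal self-contained sketch on a three-key sample. The sample data and the intermediate name `rows` are illustrative assumptions; the `from_dict` call itself is the one above.

```python
import pandas as pd

# Small stand-in for the real dictionary.
d = {
    "0001": [("Skiing", 0.789), ("Snow", 0.65), ("winter", 0.56)],
    "0003": [("Action", 0.89), ("funny", 0.58), ("game", 0.12)],
    "0006": [("drama", -0.49), ("funny", 0.378)],
}

# dict(v) turns each list of (label, score) tuples into {label: score}.
rows = {k: dict(v) for k, v in d.items()}

# orient="index" makes the outer keys the row labels. Labels that a row
# does not have come out as NaN, which fillna(0) replaces with 0.
df = pd.DataFrame.from_dict(rows, orient="index").fillna(0)
```

The column set is the union of all labels, so there is no need to collect the unique headers in a separate pass or to fill the frame one row at a time.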

Optionally, if you want to make the case of the column headings consistent:

  df.columns = [c.lower() for c in df.columns]
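
One thing to watch for: if two labels differ only in case (for example 'Action' and 'action'), lowercasing the columns after the frame is built leaves two columns with the same name. A sketch of one way around that is to normalize the labels before building the frame. The sample data and the extra `strip()` call are assumptions for illustration.

```python
import pandas as pd

# Two rows whose labels differ only in case.
d = {
    "0002": [("drama", 0.89), ("action", -0.42)],
    "0003": [("Action", 0.89), ("funny", 0.58)],
}

# Lowercase (and strip) each label while converting the tuples to a dict,
# so "Action" and "action" land in a single column.
rows = {
    k: {name.strip().lower(): score for name, score in v}
    for k, v in d.items()
}
df = pd.DataFrame.from_dict(rows, orient="index").fillna(0)
```

If a single row contained both spellings, the later tuple would overwrite the earlier one in this sketch, so check whether that is acceptable for your data.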


If you want to go completely crazy, you can also sort the columns:

  df = df.sort_index(axis=1)  # DataFrame.sort was removed in later pandas versions
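
In current pandas, sorting by label is done with `sort_index`. A small sketch with made-up data, showing that `axis=1` reorders the columns and leaves the rows alone:

```python
import pandas as pd

# Tiny frame with columns deliberately out of alphabetical order.
df = pd.DataFrame(
    {"winter": [0.56, 0.0], "action": [0.0, 0.89], "funny": [0.0, 0.58]},
    index=["0001", "0003"],
)

# axis=1 sorts by column labels; the row order is left untouched.
df = df.sort_index(axis=1)
```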


