I have a large dataset (I cannot fit the whole thing in memory), and I want to fit a GMM on it.
Can I repeatedly call GMM.fit() (sklearn.mixture.GMM) on mini-batches of the data?
There is no need to fit repeatedly. Just randomly sample as many data points as your machine can comfortably hold in memory. If the variance is not very high, the random sample will have nearly the same distribution as the full dataset.
randomly_sampled = np.random.choice(full_dataset, size=10000, replace=False)
# If the full dataset does not fit in memory, you will need some way to sample
# it randomly (e.g. sample row indices, or read a random subset from disk).
GMM.fit(randomly_sampled)

and then use

GMM.predict(full_dataset)
# or predict batch by batch, if you cannot read the full dataset into memory.
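A minimal sketch of the idea above, using the modern GaussianMixture class (sklearn.mixture.GMM was its older name). The dataset here is synthetic stand-in data; note that np.random.choice cannot sample rows of a 2-D array directly, so we sample indices instead, and prediction is chunked so the full data never has to be scored in one pass:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # current name for the old sklearn.mixture.GMM

# Hypothetical "large" dataset; in practice this might be memory-mapped or read from disk.
rng = np.random.default_rng(0)
full_dataset = rng.normal(size=(100_000, 2))

# Sample row indices, not rows: np.random.choice requires a 1-D array,
# so choosing indices is the way to subsample a 2-D dataset.
idx = rng.choice(len(full_dataset), size=10_000, replace=False)
randomly_sampled = full_dataset[idx]

# Fit only on the random subsample.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(randomly_sampled)

# Predict in chunks so the full dataset never needs to be scored at once.
labels = np.concatenate(
    [gmm.predict(chunk) for chunk in np.array_split(full_dataset, 10)]
)
```

The chunked predict loop is the piece that matters when the data is truly out of core: each chunk could just as well be loaded from disk one at a time.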