I have a large dataset (I cannot fit the whole thing in memory), and I want to fit a GMM on it.
Can I repeatedly call GMM.fit() (sklearn.mixture.GMM) on mini-batches of the data?
There is no need to fit repeatedly. Just randomly sample as many data points as your machine can comfortably hold in memory. If the variance is not very high, the random sample will have nearly the same distribution as the full dataset.
randomly_sampled = np.random.choice(full_dataset, size=10000, replace=False)
# If the full dataset does not fit in memory, you will need some way to sample
# it randomly (e.g. sample row indices, or read a random subset from disk).
GMM.fit(randomly_sampled)

and then use

GMM.predict(full_dataset)
# or predict batch by batch, if you cannot read the full dataset into memory.
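A minimal sketch of the idea above, using the modern GaussianMixture class (sklearn.mixture.GMM was its older name). The dataset here is synthetic stand-in data; note that np.random.choice cannot sample rows of a 2-D array directly, so we sample indices instead, and prediction is chunked so the full data never has to be scored in one pass:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # current name for the old sklearn.mixture.GMM

# Hypothetical "large" dataset; in practice this might be memory-mapped or read from disk.
rng = np.random.default_rng(0)
full_dataset = rng.normal(size=(100_000, 2))

# Sample row indices, not rows: np.random.choice requires a 1-D array,
# so choosing indices is the way to subsample a 2-D dataset.
idx = rng.choice(len(full_dataset), size=10_000, replace=False)
randomly_sampled = full_dataset[idx]

# Fit only on the random subsample.
gmm = GaussianMixture(n_components=3, random_state=0)
gmm.fit(randomly_sampled)

# Predict in chunks so the full dataset never needs to be scored at once.
labels = np.concatenate(
    [gmm.predict(chunk) for chunk in np.array_split(full_dataset, 10)]
)
```

The chunked predict loop is the piece that matters when the data is truly out of core: each chunk could just as well be loaded from disk one at a time.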