I have text data where each document has multiple labels, and I want to train an LSTM network on this dataset. The code I came across only supports binary classification. If someone has a suggestion on how to move forward, that would be great; I just need an initial viable direction and I can work from there.
Thanks, Amit
1) Change the last layer of the model, i.e.

pred = tensor.nnet.softmax(tensor.dot(proj, tparams['U']) + tparams['b'])

should be replaced by some other activation, e.g. a sigmoid:

pred = tensor.nnet.sigmoid(tensor.dot(proj, tparams['U']) + tparams['b'])
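To see why the sigmoid is the right choice here: softmax forces the outputs to sum to 1, so labels compete with each other, while an element-wise sigmoid scores each label independently. A minimal NumPy sketch of the two activations (an illustration, not the Theano code above):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability, then normalize to sum to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(z):
    # element-wise: each label gets an independent probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 2.0, -1.0])  # two labels equally strong
p_soft = softmax(logits)  # sums to 1: the two strong labels split the mass
p_sig = sigmoid(logits)   # both strong labels can be near 1 at the same time
```

With softmax, neither strong label can exceed 0.5 here; with the sigmoid, both come out above 0.85, which is what a multi-label model needs.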
2) The cost should also be changed, i.e.

cost = -tensor.log(pred[tensor.arange(n_samples), y] + off).mean()

should be replaced by some other cost, e.g. binary cross-entropy:

one = np.float32(1.0)
pred = T.clip(pred, 0.0001, 0.9999)  # avoid log(0)
cost = -T.sum(y * T.log(pred) + (one - y) * T.log(one - pred), axis=1)  # sum over all labels
cost = T.mean(cost, axis=0)  # mean over samples
3) In the function build_model(tparams, options), you should change

y = tensor.vector('y', dtype='int64')

to

y = tensor.matrix('y', dtype='int64')  # each row is a sample's label vector, e.g. [0 1 0 1]

For building such a label matrix, sklearn.preprocessing.MultiLabelBinarizer() may be handy.
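A small sketch of how MultiLabelBinarizer produces the 0/1 rows that y now expects (assuming scikit-learn is installed; the label names are made up for illustration):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# each sample is the set of labels attached to one document
labels = [{"sports", "politics"}, {"sports"}, {"tech"}]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# mlb.classes_ gives the column order (sorted): ['politics' 'sports' 'tech']
# Y is the 0/1 matrix, one row per document
```

Keep mlb around: mlb.inverse_transform() maps predicted 0/1 rows back to label sets at evaluation time.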
4) Change pred_error() so that it supports multilabel evaluation (e.g. using metrics such as accuracy or F1 score from scikit-learn).
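One straightforward option for pred_error() is to threshold the sigmoid outputs at 0.5 and compare whole label vectors. A NumPy sketch of two common multilabel metrics (subset accuracy and micro-averaged F1; the function names are illustrative):

```python
import numpy as np

def subset_accuracy(y_true, y_pred):
    # fraction of samples whose entire label vector is predicted exactly
    return np.mean(np.all(y_true == y_pred, axis=1))

def micro_f1(y_true, y_pred):
    # pool true positives, false positives, and false negatives over all labels
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return 2 * tp / (2 * tp + fp + fn)

probs = np.array([[0.9, 0.2, 0.8],
                  [0.4, 0.7, 0.1]])
y_pred = (probs > 0.5).astype(int)  # threshold sigmoid outputs at 0.5
y_true = np.array([[1, 0, 1],
                   [1, 1, 0]])
```

Subset accuracy is strict (every label must be right), so micro-F1 is often the more informative number when label sets are sparse; sklearn.metrics.f1_score with average='micro' computes the same quantity.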