Fine-Tuning for Downstream Tasks
All Word2Vec and transformer-based embeddings, as well as any embedding followed by an SVD transformation, are fine-tunable for downstream tasks. In other words, if you pass the resulting fine-tunable embedding to a PyTorch training method, the features will automatically be trained for your application.
Fine-tuning is disabled by default. To enable it, set the is_finetuneable parameter to True and specify a torch dtype as the type of the output.
import torch
from textwiser import TextWiser, Embedding, Transformation, WordOptions, PoolOptions

# Word2Vec word embeddings, max-pooled into a document vector, with fine-tuning enabled
emb = TextWiser(Embedding.Word(word_option=WordOptions.word2vec),
                Transformation.Pool(pool_option=PoolOptions.max),
                is_finetuneable=True, dtype=torch.float32)
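As a minimal sketch of what passing the embedding to a PyTorch training method can look like, the emb object above can be optimized jointly with a downstream classifier. This assumes TextWiser behaves as a standard torch.nn.Module (so emb.parameters() is available) alongside its fit_transform/transform interface; the toy documents, labels, and the linear classification head are illustrative assumptions, not part of the library.

import torch
import torch.nn as nn

# Illustrative toy data; replace with your own corpus and labels.
train_docs = ["first document", "second document", "third document", "fourth document"]
train_labels = torch.tensor([0, 1, 0, 1])

# emb is the fine-tunable TextWiser object constructed above.
features = emb.fit_transform(train_docs)   # initial features; also reveals the output size
clf = nn.Linear(features.shape[1], 2)      # hypothetical downstream classification head
optimizer = torch.optim.SGD(list(emb.parameters()) + list(clf.parameters()), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    optimizer.zero_grad()
    features = emb.transform(train_docs)   # differentiable when is_finetuneable=True
    loss = loss_fn(clf(features), train_labels)
    loss.backward()                        # gradients flow back into the embedding weights
    optimizer.step()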
Note also that setting the sparse parameter of a WordOptions.word2vec model to True can yield a significant speedup during training. Currently, optim.SGD (CUDA and CPU), optim.SparseAdam (CUDA and CPU), and optim.Adagrad (CPU) support sparse embeddings.
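The following sketch shows one way the sparse option might be combined with a sparse-capable optimizer. The placement of the sparse flag on Embedding.Word and the toy corpus are assumptions made for illustration; check the parameter reference for the exact signature.

import torch
from textwiser import TextWiser, Embedding, Transformation, WordOptions, PoolOptions

# Assumed placement of the sparse flag on the word2vec embedding
emb = TextWiser(Embedding.Word(word_option=WordOptions.word2vec, sparse=True),
                Transformation.Pool(pool_option=PoolOptions.max),
                is_finetuneable=True, dtype=torch.float32)
emb.fit(["a short document", "another document"])  # toy corpus for illustration

# SparseAdam handles the sparse gradients produced by the embedding lookup;
# keep dense downstream layers in a separate optimizer (e.g. optim.SGD).
optimizer = torch.optim.SparseAdam(list(emb.parameters()), lr=1e-3)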
The Fine Tuning Example shows how to fine-tune embeddings and how fine-tuning can improve prediction performance.