... | @@ -35,14 +35,14 @@ Perplexity of individual checkpoints on each test set (wikipedia, tiger and 10kG |
## DE trained from scratch on diverse data

The model is trained on 70% of the Wikipedia dataset (same as the "normal" DE model) and on 90% of the datasets originally meant for validation only: Tiger, 10kGNAD (news corpus), and Europarl.

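A minimal sketch of the split described above, under stated assumptions: the README does not say how the fractions were drawn, so the seeded random shuffle, the `split_corpus` helper, and the `SPLITS` mapping are hypothetical illustrations rather than the project's actual code. The fractions themselves (70% / 5% for Wikipedia, 90% / 10% for the other corpora) come from the text.

```python
import random

# (train fraction, validation fraction) per corpus, taken from the README text.
SPLITS = {
    "wikipedia": (0.70, 0.05),
    "tiger": (0.90, 0.10),
    "10kgnad": (0.90, 0.10),
    "europarl": (0.90, 0.10),
}


def split_corpus(docs, train_frac, val_frac, seed=0):
    """Shuffle docs and return disjoint (train, val) lists.

    Assumption: a seeded random shuffle; the original split may have been
    made differently (for example, by file order).
    """
    if train_frac + val_frac > 1.0:
        raise ValueError("train and validation fractions overlap")
    shuffled = list(docs)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]
```

For example, `split_corpus(wiki_docs, *SPLITS["wikipedia"])` would return the training and validation portions for a hypothetical list `wiki_docs`.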
-The model is trained for 20 epochs (= 122400 steps). Final eval perplexities for the validation datasets (remaining 10% of the datasets mentioned above + 5% of wikipedia) are:
+The model is trained for 20 epochs (= 122400 steps). Best eval perplexities for the validation datasets (remaining 10% of the datasets mentioned above + 5% of wikipedia) are:

-| DATASET   | PPL       |
+| DATASET   | PPL (at 45k steps) |
| --------- | --------- |
-| Wikipedia | 81        |
-| Tiger     | 2599      |
-| 10kGNAD   | 534       |
-| Europarl  | 855       |
+| Wikipedia | 48        |
+| Tiger     | 792       |
+| 10kGNAD   | 231       |
+| Europarl  | 326       |

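For orientation, here is a small sketch of how the numbers above relate to each other. It assumes the standard definition of perplexity (the exponential of the mean per-token negative log-likelihood, natural log); the README does not state how its values were computed, and the `perplexity` helper is hypothetical. The step arithmetic uses only the figures given in the text, treating "45k" as exactly 45,000 steps.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp(mean per-token negative log-likelihood).

    Assumption: losses are in nats (natural log), as in most training code.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))


TOTAL_STEPS = 122_400  # from the README: 20 epochs = 122400 steps
EPOCHS = 20

steps_per_epoch = TOTAL_STEPS // EPOCHS    # 6120 steps per epoch
epochs_at_45k = 45_000 / steps_per_epoch   # roughly 7.35 epochs
```

Under these assumptions the reported checkpoint at 45k steps falls a little over seven epochs into the 20-epoch run.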
# LINKS
... | | ... | |