Published as a conference paper at ICLR 2022

Task                        Dataset
Coreference Resolution      super_glue/wsc.fixed
Coreference Resolution      winogrande/winogrande_xl
Natural Language Inference  super_glue/cb
Natural Language Inference  super_glue/rte
Natural Language Inference  anli
Paraphrase Identification   glue/mrpc
Paraphrase Identification   glue/qqp
Paraphrase Identification   paws/labeled_final
Closed-Book QA              ai2_arc/ARC_Challenge
Closed-Book QA              ai2_arc/ARC_Easy
Closed-Book QA              kilt_tasks/hotpotqa
Closed-Book QA              trivia_qa/unfiltered
Closed-Book QA              web_questions
Closed-Book QA              wiki_qa
Extractive QA               adversarial_qa/dbidaf
Extractive QA               adversarial_qa/dbert
Extractive QA               adversarial_qa/droberta
Extractive QA               duorc/SelfRC
Extractive QA               duorc/ParaphraseRC
Extractive QA               ropes
Extractive QA               squad_v2
Extractive QA               super_glue/record
Extractive QA               quoref
Extractive QA               tydiqa
Multiple-Choice QA          cos_e/v1.11
Multiple-Choice QA          cosmos_qa
Multiple-Choice QA          dream
Multiple-Choice QA          openbookqa/main
Multiple-Choice QA          qasc
Multiple-Choice QA          quail
Multiple-Choice QA          quarel
Multiple-Choice QA          quartz
Multiple-Choice QA          race/high
Multiple-Choice QA          race/middle
Multiple-Choice QA          sciq
Multiple-Choice QA          social_i_qa
Multiple-Choice QA          super_glue/boolq
Multiple-Choice QA          super_glue/multirc
Multiple-Choice QA          wiki_hop/original
Multiple-Choice QA          wiqa
Multiple-Choice QA          piqa
Sentiment                   amazon_polarity
Sentiment                   app_reviews
Sentiment                   imdb
Sentiment                   rotten_tomatoes
Sentiment                   yelp_review_full
Sentence Completion         super_glue/copa
Sentence Completion         story_cloze/2016
Sentence Completion         hellaswag
Structure-to-Text           common_gen
Structure-to-Text           wiki_bio
Summarization               cnn_dailymail/3.0.0
Summarization               gigaword
Summarization               multi_news
Summarization               samsum
Summarization               xsum
Topic Classification        ag_news
Topic Classification        dbpedia_14
Topic Classification        trec
Word Sense Disambiguation   super_glue/wic

[Columns T0 Train, T0+ Train, T0++ Train, and Eval mark each dataset's membership with an X; the row alignment of these marks is not recoverable here.]

Table 5: All training and evaluation datasets. The datasets are listed by their Hugging Face datasets identifiers, where the part after / is the subset name. Hotpot QA is recast as closed-book QA due to its long input length. Full citations are included in Appendix G.
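
The identifier convention described in the Table 5 caption (dataset name, then an optional subset name after the /) can be illustrated with a minimal sketch. The helper `split_identifier` below is a hypothetical name introduced only for this illustration and is not part of the paper or of any library. It assumes identifiers are written with underscores, as on the Hugging Face Hub.

```python
from typing import Optional, Tuple


def split_identifier(identifier: str) -> Tuple[str, Optional[str]]:
    """Split 'name/subset' into (name, subset); subset is None if absent.

    Hypothetical helper for illustration only.
    """
    name, sep, subset = identifier.partition("/")
    return name, (subset if sep else None)


# A few identifiers taken from Table 5.
examples = ["super_glue/wsc.fixed", "cnn_dailymail/3.0.0", "anli"]
parsed = [split_identifier(e) for e in examples]

for identifier, (name, subset) in zip(examples, parsed):
    # With the `datasets` library installed, the pair would typically be
    # passed on as datasets.load_dataset(name, subset); that call is left
    # out here so the sketch runs without network access.
    print(f"{identifier}: name={name!r}, subset={subset!r}")
```

Splitting on the first / only keeps subset names such as `3.0.0` or `wsc.fixed` intact, since they contain dots but no further slashes.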