
Analyzing Summarization Datasets

Supported Datasets

| Datasets | Version | Task Schema | Dataloader | Comments |
| --- | --- | --- | --- | --- |
| govreport | - | Summarization | `load_dataset("govreport")` | Current definition: text, summary |
| dialogsum | document | Summarization | `load_dataset("dialogsum", "document")` | Current definition: text, summary |
| dialogsum | dialogue | DialogSummarization | `load_dataset("dialogsum", "dialogue")` | Current definition: dialogue: {"speaker": List[str], "text": List[str]}, summary: List[str] |
| wikihow | - | Summarization | `load_dataset("wikihow")` | Current definition: text, summary |
| wikisum | - | Summarization | `load_dataset("wikisum")` | Current definition: text, summary |
| reddit_tifu | - | Summarization | `load_dataset("reddit_tifu")` | Current definition: text, summary |
| bigpatent | - | Summarization | `load_dataset("bigpatent")` | Current definition: text, summary |
| multi_xscience | single-document | Summarization | `load_dataset("multi_xsience", "single-document")` | Current definition: text, summary |
| multi_xscience | multi-document | MultiDocSummarization | `load_dataset("multi_xsience", "multi-document")` | Current definition: texts: List[str], summary: str |
| multinews | raw-single | Summarization | `load_dataset("multinews", "raw-single")` | Raw data. Current definition: text, summary |
| multinews | raw-cleaned-single | Summarization | `load_dataset("multinews", "raw-cleaned-single")` | Cleaned raw data. Current definition: text, summary |
| multinews | preprocessed-single | Summarization | `load_dataset("multinews", "preprocessed-single")` | Preprocessed data. Current definition: text, summary |
| multinews | truncated-single | Summarization | `load_dataset("multinews", "truncated-single")` | Preprocessed and truncated data. Current definition: text, summary |
| multinews | raw-multi | MultiDocSummarization | `load_dataset("multinews", "raw-multi")` | Raw data. Current definition: texts: List[str], summary: str |
| multinews | raw-cleaned-multi | MultiDocSummarization | `load_dataset("multinews", "raw-cleaned-multi")` | Cleaned raw data. Current definition: texts: List[str], summary: str |
| multinews | preprocessed-multi | MultiDocSummarization | `load_dataset("multinews", "preprocessed-multi")` | Preprocessed data. Current definition: texts: List[str], summary: str |
| multinews | truncated-multi | MultiDocSummarization | `load_dataset("multinews", "truncated-multi")` | Preprocessed and truncated data. Current definition: texts: List[str], summary: str |
| samsum | document | Summarization | `load_dataset("samsum", "document")` | Current definition: text, summary |
| samsum | dialogue | DialogSummarization | `load_dataset("samsum", "dialogue")` | Current definition: dialogue: {"speaker": List[str], "text": List[str]}, summary: List[str] |
| qmsum | document | Summarization | `load_dataset("qmsum", "document")` | Current definition: text, summary |
| qmsum | query-based | QuerySummarization | `load_dataset("qmsum", "query-based")` | Current definition: text, summary, query |
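
The four task schemas listed above can be pictured as plain Python record types. The sketch below is illustrative only: the table gives the field names, but modeling them as dataclasses is an assumption, as are the `str` types for `text`, `summary` (in the single-document and query-based cases), and `query`. The helpers `dialogue_to_text` and `concat_documents` are hypothetical and are not part of the documented loader; no real `load_dataset` call is made here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Summarization:
    text: str      # assumed to be a single string
    summary: str   # assumed to be a single string


@dataclass
class MultiDocSummarization:
    texts: List[str]  # one entry per source document
    summary: str


@dataclass
class QuerySummarization:
    text: str
    summary: str
    query: str     # assumed to be a single string


@dataclass
class DialogSummarization:
    # Two parallel lists keyed by "speaker" and "text": one entry per turn.
    dialogue: Dict[str, List[str]]
    summary: List[str]


def dialogue_to_text(record: DialogSummarization) -> str:
    """Hypothetical helper: flatten a dialogue into 'speaker: utterance' lines."""
    speakers = record.dialogue["speaker"]
    turns = record.dialogue["text"]
    if len(speakers) != len(turns):
        raise ValueError("speaker and text lists must have the same length")
    return "\n".join(f"{s}: {t}" for s, t in zip(speakers, turns))


def concat_documents(record: MultiDocSummarization, sep: str = "\n\n") -> Summarization:
    """Hypothetical helper: join multiple source documents into one text field."""
    return Summarization(text=sep.join(record.texts), summary=record.summary)
```

Seen this way, the "document" and "dialogue" versions of a dataset such as samsum or dialogsum differ in whether the conversation is kept as parallel speaker and utterance lists or exposed as a single `text` field; a flattening step like `dialogue_to_text` is one possible way to move from the first shape to the second.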