How to add a new task schema for your dataset?
You need this when you want to add a new dataset whose task schema is NOT supported by any existing task schema. In this case, you need to manually add a new task schema. Here are the existing supported task schemas.
(See this doc for what a task schema is.)
Example
Suppose that we want to add sequence-labeling as a new task schema, which requires three steps:
1. Create a script for the class
We need to create a script (`sequence_labeling.py`) in the folder that defines the class `SequenceLabeling`.
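As a rough illustration of step 1, the sketch below shows what `sequence_labeling.py` might contain. It does not reproduce the library's real base class: `TaskSchema`, `normalize`, `column_mapping`, and the default column names `tokens` and `tags` are hypothetical stand-ins, assumed here only so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict


# Hypothetical minimal base class standing in for whatever base the
# existing task schemas derive from (assumption, not the library's API).
@dataclass(frozen=True)
class TaskSchema:
    task: ClassVar[str] = ""

    @property
    def column_mapping(self) -> Dict[str, str]:
        """Map raw dataset column names to standardized column names."""
        return {}

    def normalize(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        """Keep only the mapped columns, renamed to their standard names."""
        return {std: sample[raw] for raw, std in self.column_mapping.items()}


@dataclass(frozen=True)
class SequenceLabeling(TaskSchema):
    # Name under which this schema would be registered (hypothetical).
    task: ClassVar[str] = "sequence-labeling"
    # Raw column names in the source dataset; the defaults are assumptions.
    tokens_column: str = "tokens"
    tags_column: str = "tags"

    @property
    def column_mapping(self) -> Dict[str, str]:
        # Every sequence-labeling dataset ends up with "tokens" and "tags".
        return {self.tokens_column: "tokens", self.tags_column: "tags"}


# Example: a dataset whose raw columns are named "words" and "ner_tags".
schema = SequenceLabeling(tokens_column="words", tags_column="ner_tags")
print(schema.normalize({"words": ["EU", "rejects"], "ner_tags": ["B-ORG", "O"], "id": 0}))
# {'tokens': ['EU', 'rejects'], 'tags': ['B-ORG', 'O']}
```

Columns not listed in the mapping (such as `id` above) are dropped, so every dataset using this schema ends up with the same two fields.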
2. Register the new class in `__init__.py`
We then need to register the new class in `__init__.py`.
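A minimal sketch of what registration could look like. The `TASK_REGISTRY` dict, the `register_task` decorator, and `get_task_schema` are hypothetical; the actual `__init__.py` may simply import the class and record it in a different structure. A stub class stands in for `from .sequence_labeling import SequenceLabeling` so the snippet runs on its own.

```python
from typing import Dict, Type

# Hypothetical registry mapping schema names to classes.
TASK_REGISTRY: Dict[str, Type] = {}


def register_task(cls: Type) -> Type:
    """Class decorator that records a task schema under its `task` name."""
    name = getattr(cls, "task", "")
    if not name:
        raise ValueError(f"{cls.__name__} must define a non-empty `task` name")
    if name in TASK_REGISTRY:
        raise ValueError(f"task schema {name!r} is already registered")
    TASK_REGISTRY[name] = cls
    return cls


def get_task_schema(name: str) -> Type:
    """Look up a registered schema class, listing known names on failure."""
    if name not in TASK_REGISTRY:
        raise KeyError(f"unknown task schema {name!r}; known: {sorted(TASK_REGISTRY)}")
    return TASK_REGISTRY[name]


# Stand-in for `from .sequence_labeling import SequenceLabeling` in __init__.py.
@register_task
class SequenceLabeling:
    task = "sequence-labeling"


print(get_task_schema("sequence-labeling").__name__)
# SequenceLabeling
```

Rejecting duplicate names keeps two schemas from silently overwriting each other when a new one is added.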
Tips
- The motivation for introducing task schemas is to make it easy to standardize (normalize) different datasets from the same task category. For example, samples from both `ag_news` (topic classification) and `sst2` (sentiment classification) should be formatted as `text` and `label`. The advantage is that all datasets in this task category can then be processed in a unified way, without any additional preprocessing.
- When introducing a new task schema, first refer to the schemas of similar tasks and extend them incrementally ("incrementally" roughly means partially inheriting the similar task schema). For example:
  - you can refer to `QuestionAnsweringExtractive` if you aim to introduce other QA-based tasks.
  - you can refer to `Summarization` if you are adding other generation tasks.
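The two tips can be illustrated together with a small sketch. `TextClassification` normalizes samples from `ag_news` and `sst2` into `text` and `label`, and `TextPairClassification` extends it incrementally by inheriting the parent's fields and adding one more. Both classes, and the raw `sentence` field assumed for `sst2`, are hypothetical illustrations rather than the library's actual schemas.

```python
from dataclasses import dataclass
from typing import Any, ClassVar, Dict


@dataclass(frozen=True)
class TextClassification:
    """Hypothetical schema: every sample becomes {"text": ..., "label": ...}."""
    task: ClassVar[str] = "text-classification"
    text_column: str = "text"    # raw column holding the input text
    label_column: str = "label"  # raw column holding the label

    def normalize(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        return {"text": sample[self.text_column], "label": sample[self.label_column]}


@dataclass(frozen=True)
class TextPairClassification(TextClassification):
    """Hypothetical incremental extension: inherits text/label, adds text2."""
    task: ClassVar[str] = "text-pair-classification"
    text2_column: str = "text2"

    def normalize(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        out = super().normalize(sample)  # reuse the parent's text/label logic
        out["text2"] = sample[self.text2_column]
        return out


# Topic classification (ag_news) and sentiment classification (sst2) share one format.
ag_news = TextClassification()
sst2 = TextClassification(text_column="sentence")  # assumed raw field name
print(ag_news.normalize({"text": "Stocks rally", "label": 2}))
# {'text': 'Stocks rally', 'label': 2}
print(sst2.normalize({"sentence": "a gorgeous film", "label": 1, "idx": 0}))
# {'text': 'a gorgeous film', 'label': 1}
```

Because the subclass only adds what differs, code that already handles `text` and `label` keeps working on samples produced by the extended schema.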