How to add a new task schema for your dataset?

This is necessary when you want to add a new dataset whose task schema is NOT covered by any existing task schema. In this case, you need to manually add a new task schema. Here are the existing supported task schemas.

(Check out this doc to learn what a task schema is.)

Example

Suppose that we want to add sequence-labeling as a new task schema. This requires the following steps:

1. Create a script for the class

We need to create a script (sequence_labeling.py) in the folder to define the class SequenceLabeling.
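As a rough illustration of what such a script could contain, here is a minimal sketch. It assumes the schema is a dataclass that maps dataset-specific column names onto standard ones. The base class TaskTemplate, the column names tokens and tags, and the normalize method are all hypothetical names introduced for this example, not the project's actual API.

```python
# sequence_labeling.py (illustrative sketch; all names below are assumptions)
from dataclasses import dataclass
from typing import ClassVar, Dict, List


@dataclass
class TaskTemplate:
    """Hypothetical stand-in for the project's base task-schema class."""

    task: ClassVar[str] = "base"

    @property
    def column_mapping(self) -> Dict[str, str]:
        raise NotImplementedError


@dataclass
class SequenceLabeling(TaskTemplate):
    """Schema for sequence labeling: one tag per token."""

    task: ClassVar[str] = "sequence-labeling"
    tokens_column: str = "tokens"  # name of the token column in the raw dataset
    tags_column: str = "tags"      # name of the tag column in the raw dataset

    @property
    def column_mapping(self) -> Dict[str, str]:
        # Maps dataset-specific column names onto the standard names.
        return {self.tokens_column: "tokens", self.tags_column: "tags"}

    def normalize(self, sample: Dict[str, List[str]]) -> Dict[str, List[str]]:
        """Rename a raw sample's columns and check token/tag alignment."""
        out = {std: sample[raw] for raw, std in self.column_mapping.items()}
        if len(out["tokens"]) != len(out["tags"]):
            raise ValueError("tokens and tags must have the same length")
        return out


# Usage with a made-up sample whose columns are named differently.
schema = SequenceLabeling(tokens_column="words", tags_column="ner_tags")
normalized = schema.normalize(
    {"words": ["John", "lives", "in", "Paris"],
     "ner_tags": ["B-PER", "O", "O", "B-LOC"]}
)
```

Keeping the column names as fields means one class can serve several datasets that only differ in how they name their columns.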

2. Register the new class in __init__.py

We then need to register the new class in __init__.py.

Tips

  • The motivation for introducing a task schema is to standardize (normalize) different datasets from the same task category. For example, samples from both ag_news (topic classification) and sst2 (sentiment classification) should be formatted as text and label. The advantage is that all datasets within this task category can be processed in a unified way, without any additional preprocessing.
  • When introducing a new task schema, first look at the schema of a similar task and extend it incrementally, that is, partially inherit from the similar task schema. For example:
      • refer to QuestionAnsweringExtractive if you want to introduce other QA-based tasks.
      • refer to Summarization if you are adding other generation tasks.
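Both tips can be sketched in a few lines. The example below assumes a hypothetical TextClassification schema that renames dataset-specific columns to text and label, and a hypothetical TextPairClassification that extends it by inheritance. The class names, the normalize method, the raw column names, and the samples are made up for illustration and are not taken from the project.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TextClassification:
    """Hypothetical schema: every sample becomes {"text": ..., "label": ...}."""

    text_column: str = "text"
    label_column: str = "label"

    def normalize(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        return {
            "text": sample[self.text_column],
            "label": sample[self.label_column],
        }


@dataclass
class TextPairClassification(TextClassification):
    """Hypothetical extension: inherits text/label and adds a second text."""

    text2_column: str = "text2"

    def normalize(self, sample: Dict[str, Any]) -> Dict[str, Any]:
        out = super().normalize(sample)  # reuse the inherited part of the schema
        out["text2"] = sample[self.text2_column]
        return out


# Made-up samples; the raw column names are assumptions for illustration.
ag_news_sample = {"text": "Stocks rally after earnings report", "label": "Business"}
sst2_sample = {"sentence": "a charming and funny film", "label": "positive"}

ag_news_norm = TextClassification().normalize(ag_news_sample)
sst2_norm = TextClassification(text_column="sentence").normalize(sst2_sample)

# Both datasets now expose the same keys, so one pipeline can handle both.
same_keys = set(ag_news_norm) == set(sst2_norm) == {"text", "label"}

pair_norm = TextPairClassification(text_column="premise", text2_column="hypothesis").normalize(
    {"premise": "A man is sleeping", "hypothesis": "A man is awake", "label": "contradiction"}
)
```

The subclass only adds what differs from its parent, which is the sense in which a new schema partially inherits from a similar one.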