How to name a feature:

A. For general features:

Suppose there are two general features: gender_bias_name_female and lexical_richness

Suppose there is one split: train

Suppose there are two fields: text and label

1. dataset level:

The name of the feature should follow this format:

{field name}_{split name}_avg_{feature name}

Ex. text_train_avg_gender_bias_name_female

2. sample level:

The name of the feature should follow this format:

{field name}_{feature name}

Ex. text_gender_bias_name_female
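
The two formats above can be sketched as a small helper. This is only an illustration: the function name general_feature_name and its parameters are hypothetical, and the dataset-level pattern is taken from the example text_train_avg_gender_bias_name_female.

```python
from typing import Optional


def general_feature_name(field: str, feature: str, split: Optional[str] = None) -> str:
    """Build a general feature name (hypothetical helper).

    With a split, return the dataset-level name;
    without one, return the sample-level name.
    """
    if split is None:
        # sample level: {field name}_{feature name}
        return f"{field}_{feature}"
    # dataset level: {field name}_{split name}_avg_{feature name}
    return f"{field}_{split}_avg_{feature}"


dataset_name = general_feature_name("text", "gender_bias_name_female", split="train")
sample_name = general_feature_name("text", "gender_bias_name_female")
```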

B. NER

There are four features: true_entity_info_of, avg_span_length_of (dataset level), avg_eCon_of (dataset level), and avg_eFre_of (dataset level)

Suppose there is one split: train

Suppose there is one field: tokens

1. dataset level:

The name of the feature should follow this format:

{feature name}_{field name}_{split name}

Ex. avg_eFre_of_tokens_train

2. sample level:

The name of the feature should follow this format:

{feature name}_{field name}

Ex. true_entity_info_of_tokens
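
For NER the feature name comes first and the field follows. A minimal sketch, assuming a hypothetical helper called ner_feature_name (not part of any library):

```python
from typing import Optional


def ner_feature_name(feature: str, field: str, split: Optional[str] = None) -> str:
    """Build an NER feature name (hypothetical helper)."""
    if split is None:
        # sample level: {feature name}_{field name}
        return f"{feature}_{field}"
    # dataset level: {feature name}_{field name}_{split name}
    return f"{feature}_{field}_{split}"


dataset_name = ner_feature_name("avg_eFre_of", "tokens", split="train")
sample_name = ner_feature_name("true_entity_info_of", "tokens")
```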

C. NLI

Usually, the fields of an NLI dataset are premise and hypothesis

Suppose there is one split: train

There are three features: minus, add, divide

1. dataset level:

The name of the feature should follow this format:

premise_length_minus_hypothesis_avg_{split name}_length

premise_length_add_hypothesis_avg_{split name}_length

premise_length_divide_hypothesis_avg_{split name}_length

Ex. premise_length_divide_hypothesis_avg_train_length

2. sample level:

The name of the feature should follow this format:

premise_length_minus_hypothesis_length

premise_length_add_hypothesis_length

premise_length_divide_hypothesis_length
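
The NLI names differ only in the operation and in whether a split is included, so they can be generated from one template. The helper nli_feature_name below is a hypothetical sketch that assumes the fields are named premise and hypothesis.

```python
from typing import Optional

# The three operations listed for NLI.
NLI_OPERATIONS = ("minus", "add", "divide")


def nli_feature_name(operation: str, split: Optional[str] = None) -> str:
    """Build an NLI length-feature name (hypothetical helper)."""
    if operation not in NLI_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    if split is None:
        # sample level
        return f"premise_length_{operation}_hypothesis_length"
    # dataset level
    return f"premise_length_{operation}_hypothesis_avg_{split}_length"


dataset_names = [nli_feature_name(op, split="train") for op in NLI_OPERATIONS]
sample_names = [nli_feature_name(op) for op in NLI_OPERATIONS]
```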

D. QA

Usually, the fields of a QA dataset are question and context

Suppose there is one split: train

There are two features: bleu, divide

1. dataset level:

The name of the feature should follow this format:

question_length_divide_context_avg_{split name}_length

bleu_question_context_avg_{split name}

Ex. question_length_divide_context_avg_train_length

2. sample level:

The name of the feature should follow this format:

question_length_divide_context_length

bleu_question_context
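
A short sketch of the QA names. The helper qa_feature_names is hypothetical, and the dataset-level BLEU name is assumed to be bleu_question_context_avg_{split name}, with underscores separating every part as in the sample-level name.

```python
from typing import List, Optional


def qa_feature_names(split: Optional[str] = None) -> List[str]:
    """Return the QA feature names (hypothetical helper).

    Assumes the fields are named question and context.
    """
    if split is None:
        # sample level
        return [
            "question_length_divide_context_length",
            "bleu_question_context",
        ]
    # dataset level (the BLEU pattern is an assumption, see the note above)
    return [
        f"question_length_divide_context_avg_{split}_length",
        f"bleu_question_context_avg_{split}",
    ]


dataset_names = qa_feature_names(split="train")
sample_names = qa_feature_names()
```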

E. Summary

Usually, there are six features: density, coverage, compression, repetition, novelty, copy_length

Suppose there is one split: train

Suppose the fields of the summary dataset are summary and document

1. dataset level:

The name of the feature should follow this format:

avg_density_of_{split}_{field0}_and_{field1}

avg_coverage_of_{split}_{field0}_and_{field1}

avg_compression_of_{split}_{field0}_and_{field1}

avg_repetition_of_{split}_{field0}_and_{field1}

avg_novelty_of_{split}_{field0}_and_{field1}

avg_copy_length_of_{split}_{field0}_and_{field1}

Ex. avg_copy_length_of_train_summary_and_document

2. sample level:

The name of the feature should follow this format:

density_of_{field0}_and_{field1}

coverage_of_{field0}_and_{field1}

compression_of_{field0}_and_{field1}

repetition_of_{field0}_and_{field1}

novelty_of_{field0}_and_{field1}

copy_length_of_{field0}_and_{field1}

Ex. copy_length_of_summary_and_document
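
All six summary features share one template, so the names can be generated in a loop. The helper summary_feature_name below is a hypothetical sketch, with field0 and field1 passed in the order used above (summary, then document).

```python
from typing import Optional

# The six summary features listed above.
SUMMARY_FEATURES = (
    "density", "coverage", "compression",
    "repetition", "novelty", "copy_length",
)


def summary_feature_name(
    feature: str, field0: str, field1: str, split: Optional[str] = None
) -> str:
    """Build a summary feature name (hypothetical helper)."""
    if split is None:
        # sample level: {feature}_of_{field0}_and_{field1}
        return f"{feature}_of_{field0}_and_{field1}"
    # dataset level: avg_{feature}_of_{split}_{field0}_and_{field1}
    return f"avg_{feature}_of_{split}_{field0}_and_{field1}"


dataset_names = [
    summary_feature_name(f, "summary", "document", split="train")
    for f in SUMMARY_FEATURES
]
sample_names = [
    summary_feature_name(f, "summary", "document") for f in SUMMARY_FEATURES
]
```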
