How to name a feature:

A. For general features:

Suppose there are two general features: gender_bias_name_female and lexical_richness

Suppose there is one split: train

Suppose there are two fields: text and label

1. dataset level:

The feature name should follow this format:

{field name}_{split name}_avg_{feature name}

Ex. text_train_avg_gender_bias_name_female

2. sample level:

The feature name should follow this format:

{field name}_{feature name}

Ex. text_gender_bias_name_female
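
The general-feature convention can be sketched as two small helpers. This is only an illustration built from the example names above; the function names `dataset_feature_name` and `sample_feature_name` are hypothetical and not part of any library.

```python
def dataset_feature_name(field: str, split: str, feature: str) -> str:
    """Build a dataset-level name: {field}_{split}_avg_{feature} (hypothetical helper)."""
    return f"{field}_{split}_avg_{feature}"


def sample_feature_name(field: str, feature: str) -> str:
    """Build a sample-level name: {field}_{feature} (hypothetical helper)."""
    return f"{field}_{feature}"


# Reproduce the two examples from the text.
print(dataset_feature_name("text", "train", "gender_bias_name_female"))
print(sample_feature_name("text", "gender_bias_name_female"))
```

The same helpers would apply to the other general feature, for instance `lexical_richness` on the `text` field.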

B. NER

There are four features: true_entity_info_of, avg_span_length_of (dataset level), avg_eCon_of (dataset level), and avg_eFre_of (dataset level)

Suppose there is one split: train

Suppose there is one field: tokens

1. dataset level:

The feature name should follow this format:

{feature name}_{field name}_{split name}

Ex. avg_eFre_of_tokens_train

2. sample level:

The feature name should follow this format:

{feature name}_{field name}

Ex. true_entity_info_of_tokens
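
For NER the feature name comes first and the field follows, the reverse of the general case. A minimal sketch, assuming hypothetical helper names `ner_dataset_feature_name` and `ner_sample_feature_name`:

```python
def ner_dataset_feature_name(feature: str, field: str, split: str) -> str:
    """Build a dataset-level NER name: {feature}_{field}_{split} (hypothetical helper)."""
    return f"{feature}_{field}_{split}"


def ner_sample_feature_name(feature: str, field: str) -> str:
    """Build a sample-level NER name: {feature}_{field} (hypothetical helper)."""
    return f"{feature}_{field}"


# Reproduce the two examples from the text.
print(ner_dataset_feature_name("avg_eFre_of", "tokens", "train"))
print(ner_sample_feature_name("true_entity_info_of", "tokens"))
```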

C. NLI

Usually, the fields of an NLI dataset are premise and hypothesis

Suppose there is one split: train

There are three features: minus, add, divide

1. dataset level:

The feature name should follow this format:

premise_length_minus_hypothesis_avg_{split name}_length

premise_length_add_hypothesis_avg_{split name}_length

premise_length_divide_hypothesis_avg_{split name}_length

Ex. premise_length_divide_hypothesis_avg_train_length

2. sample level:

The feature name should follow this format:

premise_length_minus_hypothesis_length

premise_length_add_hypothesis_length

premise_length_divide_hypothesis_length
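
The NLI names differ only in the operation (minus, add, divide) and, at the dataset level, in the split. A short sketch under that reading; the helper names and the `NLI_OPERATIONS` tuple are hypothetical:

```python
# The three operations listed in the text.
NLI_OPERATIONS = ("minus", "add", "divide")


def nli_dataset_feature_name(operation: str, split: str) -> str:
    """Build premise_length_{operation}_hypothesis_avg_{split}_length (hypothetical helper)."""
    if operation not in NLI_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    return f"premise_length_{operation}_hypothesis_avg_{split}_length"


def nli_sample_feature_name(operation: str) -> str:
    """Build premise_length_{operation}_hypothesis_length (hypothetical helper)."""
    if operation not in NLI_OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    return f"premise_length_{operation}_hypothesis_length"


# List every dataset-level and sample-level name for the train split.
for op in NLI_OPERATIONS:
    print(nli_dataset_feature_name(op, "train"), nli_sample_feature_name(op))
```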

D. QA

Usually, the fields of a QA dataset are question and context

Suppose there is one split: train

There are two features: bleu, divide

1. dataset level:

The feature name should follow this format:

question_length_divide_context_avg_{split name}_length

bleu_question_context_avg_{split name}

Ex. question_length_divide_context_avg_train_length

2. sample level:

The feature name should follow this format:

question_length_divide_context_length

bleu_question_context
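
The QA names can be collected the same way. In this sketch the dataset-level BLEU name is assumed to be the sample-level name followed by `_avg_{split name}`; that reading, along with the helper names, is an assumption rather than a confirmed specification.

```python
def qa_dataset_feature_names(split: str) -> list[str]:
    """Return the two dataset-level QA names for a split (hypothetical helper)."""
    return [
        f"question_length_divide_context_avg_{split}_length",
        # Assumed form: sample-level BLEU name plus an _avg_{split} suffix.
        f"bleu_question_context_avg_{split}",
    ]


def qa_sample_feature_names() -> list[str]:
    """Return the two sample-level QA names, which do not depend on the split."""
    return ["question_length_divide_context_length", "bleu_question_context"]


print(qa_dataset_feature_names("train"))
print(qa_sample_feature_names())
```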

E. Summary

Usually, there are six features: density, coverage, compression, repetition, novelty, copy_length

Suppose there is one split: train

Suppose the fields of the summary dataset are summary and document

1. dataset level:

The feature name should follow this format:

avg_density_of_{split}_{field0}_and_{field1}
avg_coverage_of_{split}_{field0}_and_{field1}
avg_compression_of_{split}_{field0}_and_{field1}
avg_repetition_of_{split}_{field0}_and_{field1}
avg_novelty_of_{split}_{field0}_and_{field1}
avg_copy_length_of_{split}_{field0}_and_{field1}

Ex. avg_copy_length_of_train_summary_and_document

2. sample level:

The feature name should follow this format:

density_of_{field0}_and_{field1}
coverage_of_{field0}_and_{field1}
compression_of_{field0}_and_{field1}
repetition_of_{field0}_and_{field1}
novelty_of_{field0}_and_{field1}
copy_length_of_{field0}_and_{field1}

Ex. copy_length_of_summary_and_document
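
Since all six summary features share one pattern, the full set of names can be generated from the feature list. A minimal sketch; `SUMMARY_FEATURES` and both helper names are hypothetical:

```python
# The six features listed in the text.
SUMMARY_FEATURES = (
    "density", "coverage", "compression",
    "repetition", "novelty", "copy_length",
)


def summary_dataset_feature_names(split: str, field0: str, field1: str) -> list[str]:
    """Build avg_{feature}_of_{split}_{field0}_and_{field1} for every feature."""
    return [f"avg_{f}_of_{split}_{field0}_and_{field1}" for f in SUMMARY_FEATURES]


def summary_sample_feature_names(field0: str, field1: str) -> list[str]:
    """Build {feature}_of_{field0}_and_{field1} for every feature."""
    return [f"{f}_of_{field0}_and_{field1}" for f in SUMMARY_FEATURES]


# field0 is "summary" and field1 is "document", matching the examples in the text.
for name in summary_dataset_feature_names("train", "summary", "document"):
    print(name)
for name in summary_sample_feature_names("summary", "document"):
    print(name)
```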