Analyzed HERMES_192 dataset
Summary statistics:
- Average document length: 151.14 features.
- Average documents per cluster: 5.48 documents.
Spanish side of the corpus (average frequencies):
- NEs: 39.77%
  - PERSON: 2.16%
  - LOCATION: 1.37%
  - ORGANIZATION: 2.39%
  - MISC: 0.46%
  - DATE: 26.30%
  - AMOUNT: 4.76%
- Names: 18.22%
- Verbs: 11.10%
- Adjectives: 7.18%
- Adverbs: 18.22%
- Determiner: 3.30%
- Pronouns: 1.75%
- Conjunction: 18.22%
- Interjection: 0.0%
- Prepositions: 6.24%
- Punctuation mark: 2.37%
- Numbers: 0.62%
- Dates: 0.0%
- Other features: 0.0%
English side of the corpus (average frequencies):
- NEs: 13.55%
  - PERSON: 4.29%
  - LOCATION: 3.15%
  - ORGANIZATION: 2.01%
  - MISC: 2.73%
  - DATE: 0.0%
  - AMOUNT: 0.0%
- Names: 27.13%
- Verbs: 15.49%
- Adjectives: 7.31%
- Adverbs: 3.13%
- Determiner: 3.29%
- Pronouns: 2.86%
- Conjunction: 0.96%
- Interjection: 9.99%
- Prepositions: 0.0%
- Punctuation mark: 3.44%
- Numbers: 3.56%
- Dates: 1.0%
- Other features: 2.18%
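
The report does not say how the averages above were computed. As a minimal sketch, assuming each document is available as a list of feature category labels (a hypothetical input format, not taken from the dataset), and assuming "average frequency" means the per-document share of a category averaged over all documents, figures of this kind could be derived as follows. The function names are illustrative only.

```python
from collections import Counter


def average_document_length(documents):
    """Mean number of features per document (0.0 for an empty corpus)."""
    if not documents:
        return 0.0
    return sum(len(doc) for doc in documents) / len(documents)


def average_frequencies(documents):
    """Mean per-document share (in percent) of each feature category.

    Assumption: `documents` is a list of documents, each given as a list
    of category labels, one label per feature. Empty documents are skipped.
    """
    non_empty = [doc for doc in documents if doc]
    if not non_empty:
        return {}
    totals = Counter()
    for doc in non_empty:
        # Share of each category within this single document, in percent.
        for category, count in Counter(doc).items():
            totals[category] += 100.0 * count / len(doc)
    # Average the per-document shares over all non-empty documents.
    return {category: total / len(non_empty) for category, total in totals.items()}
```

Under this reading, a category that is absent from a document contributes 0% for that document, so a rare category can average well below its share in the documents where it does occur.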