es.nlp.uned.weps.evaluation
Class Measures

java.lang.Object
  extended by es.nlp.uned.weps.evaluation.Measures

public class Measures
extends java.lang.Object

The Measures class contains a set of static methods that implement several clustering evaluation measures and combined measures. Reference: E. Amigó, J. Gonzalo and J. Artiles. Evaluation metrics for clustering tasks: a comparison based on formal constraints. Technical report to be published in http://nlp.uned.es


Field Summary
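static java.lang.String BCUBED_EXT_PRECISION
          The BCUBED_EXT_PRECISION constant (extended BCubed precision).
static java.lang.String BCUBED_EXT_RECALL
          The BCUBED_EXT_RECALL constant (extended BCubed recall).
static java.lang.String BCUBED_F05
          The BCUBED_F05 constant (F-measure of extended BCubed precision and recall).
static int FOLKES_AND_MALLOWS
          The FOLKES_AND_MALLOWS constant (Fowlkes-Mallows index, a metric type for pairsMeasure).
static java.lang.String INVERSE_PURITY
          The INVERSE_PURITY constant (inverse purity).
static int JACCARD_COEFFICIENT
          The JACCARD_COEFFICIENT constant (Jaccard coefficient, a metric type for pairsMeasure).
static java.lang.String MULTIPLICITY
          The MULTIPLICITY constant.
static java.lang.String PAIRS_FOLKES_AND_MALLOWS
          The PAIRS_FOLKES_AND_MALLOWS constant (pair-based Fowlkes-Mallows index).
static java.lang.String PAIRS_JACCARD_COEFFICIENT
          The PAIRS_JACCARD_COEFFICIENT constant (pair-based Jaccard coefficient).
static java.lang.String PAIRS_RAND_STATISTIC
          The PAIRS_RAND_STATISTIC constant (pair-based Rand statistic).
static java.lang.String PURITY
          The PURITY constant.
static java.lang.String PURITY_F05
          The PURITY_F05 constant (F-measure of purity and inverse purity).
static int RAND_STATISTIC
          The RAND_STATISTIC constant (Rand statistic, a metric type for pairsMeasure).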
 
Constructor Summary
Measures()
           
 
Method Summary
static double BCubedExtendedPrecision(Clustering key, Clustering answer)
          Calculates a BCubed precision measure, extended for multicategory clustering problems.
static double BCubedExtendedRecall(Clustering key, Clustering answer)
          Calculates a BCubed recall measure, extended for multicategory clustering problems.
static double FMeasure(double P, double R, double alpha)
          Calculates the F-measure as follows:
F = 1 / (alpha/P + (1-alpha)/R), where alpha is in the range [0.0, 1.0]
static double inversePurity(Clustering key, Clustering answer)
          Calculates the inverse purity measure.
static double multiplicity(Clustering answer, Clustering key)
          Calculates the multiplicity measure.
static double pairsMeasure(Clustering answer, Clustering key, int metricType)
          Calculates a pair-based measure.
static double purity(Clustering key, Clustering answer)
          Calculates the standard purity measure.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

PURITY

public static final java.lang.String PURITY
The PURITY constant.

See Also:
Constant Field Values

INVERSE_PURITY

public static final java.lang.String INVERSE_PURITY
The INVERSE_PURITY constant (inverse purity).

See Also:
Constant Field Values

PURITY_F05

public static final java.lang.String PURITY_F05
The PURITY_F05 constant (F-measure of purity and inverse purity).

See Also:
Constant Field Values

BCUBED_EXT_RECALL

public static final java.lang.String BCUBED_EXT_RECALL
The BCUBED_EXT_RECALL constant (extended BCubed recall).

See Also:
Constant Field Values

BCUBED_EXT_PRECISION

public static final java.lang.String BCUBED_EXT_PRECISION
The BCUBED_EXT_PRECISION constant (extended BCubed precision).

See Also:
Constant Field Values

BCUBED_F05

public static final java.lang.String BCUBED_F05
The BCUBED_F05 constant (F-measure of extended BCubed precision and recall).

See Also:
Constant Field Values

MULTIPLICITY

public static final java.lang.String MULTIPLICITY
The MULTIPLICITY constant.

See Also:
Constant Field Values

PAIRS_RAND_STATISTIC

public static final java.lang.String PAIRS_RAND_STATISTIC
The PAIRS_RAND_STATISTIC constant (pair-based Rand statistic).
See Also:
Constant Field Values

PAIRS_JACCARD_COEFFICIENT

public static final java.lang.String PAIRS_JACCARD_COEFFICIENT
The PAIRS_JACCARD_COEFFICIENT constant (pair-based Jaccard coefficient).
See Also:
Constant Field Values

PAIRS_FOLKES_AND_MALLOWS

public static final java.lang.String PAIRS_FOLKES_AND_MALLOWS
The PAIRS_FOLKES_AND_MALLOWS constant (pair-based Fowlkes-Mallows index).
See Also:
Constant Field Values

RAND_STATISTIC

public static final int RAND_STATISTIC
The RAND_STATISTIC constant (Rand statistic, a metric type for pairsMeasure).

See Also:
Constant Field Values

JACCARD_COEFFICIENT

public static final int JACCARD_COEFFICIENT
The JACCARD_COEFFICIENT constant (Jaccard coefficient, a metric type for pairsMeasure).

See Also:
Constant Field Values

FOLKES_AND_MALLOWS

public static final int FOLKES_AND_MALLOWS
The FOLKES_AND_MALLOWS constant (Fowlkes-Mallows index, a metric type for pairsMeasure).

See Also:
Constant Field Values
Constructor Detail

Measures

public Measures()
Method Detail

FMeasure

public static double FMeasure(double P,
                              double R,
                              double alpha)
Calculates the F-measure as follows:
F = 1 / (alpha/P + (1-alpha)/R)
where alpha is in the range [0.0, 1.0]

Parameters:
P - the precision measure
R - the recall measure
alpha - the weight given to precision, in [0.0, 1.0]; alpha = 0.5 gives the harmonic mean of P and R
Returns:
the F-measure
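A minimal Java sketch of the documented formula follows. It works on plain doubles rather than on any type from this package; the class name FMeasureSketch is hypothetical, and returning 0.0 when a weighted component is zero is an assumed convention, since the behavior of FMeasure in that case is not documented.

```java
public class FMeasureSketch {

    /**
     * Weighted harmonic mean F = 1 / (alpha / P + (1 - alpha) / R).
     * alpha must lie in [0.0, 1.0]; alpha = 0.5 gives the plain harmonic mean.
     */
    static double fMeasure(double p, double r, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0.0, 1.0]");
        }
        double denominator = 0.0;
        // Only add a term whose weight is non-zero, so alpha = 1.0 returns P
        // and alpha = 0.0 returns R even if the other component is 0.
        if (alpha > 0.0) {
            if (p == 0.0) {
                return 0.0; // assumed convention for a zero component
            }
            denominator += alpha / p;
        }
        if (alpha < 1.0) {
            if (r == 0.0) {
                return 0.0; // assumed convention for a zero component
            }
            denominator += (1.0 - alpha) / r;
        }
        return 1.0 / denominator;
    }

    public static void main(String[] args) {
        // 1 / (0.5 / 1.0 + 0.5 / 0.5) = 1 / 1.5, roughly 0.667
        System.out.println(fMeasure(1.0, 0.5, 0.5));
    }
}
```

Skipping zero-weight terms keeps the two endpoints of the alpha range well defined: alpha = 1.0 reduces to precision alone and alpha = 0.0 to recall alone.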

multiplicity

public static double multiplicity(Clustering answer,
                                  Clustering key)
Calculates the multiplicity measure.

Intuitively, multiplicity measures how different two clustering solutions are in terms of the number of clusters assigned to each element.
This measure is useful in clustering solutions where one element can belong to more than one category. In the case of the Web People Search task, one document can contain references to different people using the same name (e.g. in genealogies).

This method builds one vector for the key and another for the answer. Each component of a vector is the number of clusters in which an element appears. Finally, the Euclidean distance between the two vectors is calculated.

Reference: E. Amigó, J. Gonzalo and J. Artiles. Evaluation metrics for clustering tasks: a comparison based on formal constraints. Technical report to be published in http://nlp.uned.es

Parameters:
answer - the answer clustering
key - the key clustering
Returns:
the Euclidean distance between the key and answer multiplicity vectors
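The vector construction described above can be sketched as follows. This is not the class's implementation: the Clustering type is replaced by a hypothetical Map from element id to the set of clusters containing it, elements missing from one side are assumed to appear in 0 clusters there, and the distance is left unnormalized because the description does not mention normalization.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MultiplicitySketch {

    // Hypothetical representation: element id -> ids of the clusters containing it.
    static double multiplicity(Map<String, Set<String>> answer,
                               Map<String, Set<String>> key) {
        // Union of elements, so an element missing from one side counts 0 clusters there.
        Set<String> elements = new HashSet<>(answer.keySet());
        elements.addAll(key.keySet());

        double sumSquares = 0.0;
        for (String e : elements) {
            // One vector component per element: its number of clusters.
            int inAnswer = answer.getOrDefault(e, Set.of()).size();
            int inKey = key.getOrDefault(e, Set.of()).size();
            double diff = inAnswer - inKey;
            sumSquares += diff * diff;
        }
        return Math.sqrt(sumSquares); // Euclidean distance between the two count vectors
    }

    public static void main(String[] args) {
        Map<String, Set<String>> key = Map.of(
            "d1", Set.of("personA", "personB"), // d1 mentions two people
            "d2", Set.of("personA"));
        Map<String, Set<String>> answer = Map.of(
            "d1", Set.of("c1"),
            "d2", Set.of("c1"));
        System.out.println(multiplicity(answer, key)); // 1.0
    }
}
```

In the example, the answer assigns d1 to one cluster while the key assigns it to two, so the vectors differ by 1 in a single component.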

pairsMeasure

public static double pairsMeasure(Clustering answer,
                                  Clustering key,
                                  int metricType)
Calculates a pair-based measure.

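Parameters:
answer - the answer clustering
key - the key clustering
metricType - the type of pair-based metric (RAND_STATISTIC, JACCARD_COEFFICIENT or FOLKES_AND_MALLOWS, i.e. the Rand statistic, the Jaccard coefficient or the Fowlkes-Mallows index)
Returns:
the value of the selected pair-based metric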
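The three pair-based metrics can be sketched by classifying every pair of elements according to whether it is grouped together in the answer and in the key. This sketch assumes hard clustering (each element in exactly one cluster), represented as a hypothetical Map from element id to cluster id, and uses its own metric selectors instead of the RAND_STATISTIC, JACCARD_COEFFICIENT and FOLKES_AND_MALLOWS values, which are not listed here. The formulas are the standard Rand statistic, Jaccard coefficient and Fowlkes-Mallows index; how pairsMeasure itself treats overlapping clusters is not documented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PairsSketch {

    // Hypothetical metric selectors, not the values of the constants in Measures.
    static final int RAND = 0, JACCARD = 1, FOWLKES_MALLOWS = 2;

    // Hypothetical representation: element id -> its single cluster id.
    // Assumes answer and key contain the same elements.
    static double pairsMeasure(Map<String, String> answer, Map<String, String> key, int metric) {
        List<String> elems = new ArrayList<>(key.keySet());
        long ss = 0, sd = 0, ds = 0, dd = 0; // same/different in answer, same/different in key
        for (int i = 0; i < elems.size(); i++) {
            for (int j = i + 1; j < elems.size(); j++) {
                String a = elems.get(i), b = elems.get(j);
                boolean sameAnswer = answer.get(a).equals(answer.get(b));
                boolean sameKey = key.get(a).equals(key.get(b));
                if (sameAnswer && sameKey) ss++;
                else if (sameAnswer) sd++;
                else if (sameKey) ds++;
                else dd++;
            }
        }
        // Each ratio is NaN when its denominator is zero.
        switch (metric) {
            case RAND: return (double) (ss + dd) / (ss + sd + ds + dd);
            case JACCARD: return (double) ss / (ss + sd + ds);
            case FOWLKES_MALLOWS: return ss / Math.sqrt((double) (ss + sd) * (ss + ds));
            default: throw new IllegalArgumentException("unknown metric: " + metric);
        }
    }

    public static void main(String[] args) {
        Map<String, String> key = Map.of("a", "X", "b", "X", "c", "Y");
        Map<String, String> answer = Map.of("a", "c1", "b", "c1", "c", "c1");
        // One pair agrees in both, two pairs are together only in the answer.
        System.out.println(pairsMeasure(answer, key, RAND));
        System.out.println(pairsMeasure(answer, key, JACCARD));
        System.out.println(pairsMeasure(answer, key, FOWLKES_MALLOWS));
    }
}
```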

purity

public static double purity(Clustering key,
                            Clustering answer)
Calculates the standard purity measure.

Parameters:
key - the key clustering
answer - the answer clustering
Returns:
the purity value

inversePurity

public static double inversePurity(Clustering key,
                                   Clustering answer)
Calculates the inverse purity measure.

Parameters:
key - the key clustering
answer - the answer clustering
Returns:
the inverse purity value
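Purity and inverse purity can be sketched together, since inverse purity is purity with the roles of key and answer swapped. The sketch assumes hard clustering, represented as a hypothetical Map from element id to cluster (or category) id, and uses the standard definition: for each answer cluster take the size of its overlap with its most frequent key category, sum these sizes, and divide by the number of elements.

```java
import java.util.HashMap;
import java.util.Map;

public class PuritySketch {

    // Hypothetical representation: element id -> single cluster (or category) id.
    // Assumes key and answer contain the same elements.
    static double purity(Map<String, String> key, Map<String, String> answer) {
        // overlap.get(cluster).get(category) = |cluster ∩ category|
        Map<String, Map<String, Integer>> overlap = new HashMap<>();
        for (Map.Entry<String, String> e : answer.entrySet()) {
            String category = key.get(e.getKey());
            overlap.computeIfAbsent(e.getValue(), c -> new HashMap<>())
                   .merge(category, 1, Integer::sum);
        }
        int sumOfMax = 0;
        for (Map<String, Integer> counts : overlap.values()) {
            // Best-matching category for this cluster.
            sumOfMax += counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        }
        return (double) sumOfMax / answer.size();
    }

    // Inverse purity: the same computation with key and answer swapped.
    static double inversePurity(Map<String, String> key, Map<String, String> answer) {
        return purity(answer, key);
    }

    public static void main(String[] args) {
        Map<String, String> key = Map.of("a", "X", "b", "X", "c", "Y", "d", "Y");
        // Everything in one cluster.
        Map<String, String> answer = Map.of("a", "c1", "b", "c1", "c", "c1", "d", "c1");
        System.out.println(purity(key, answer));        // 0.5
        System.out.println(inversePurity(key, answer)); // 1.0
    }
}
```

In the example, putting every element into a single cluster yields perfect inverse purity but low purity, which illustrates why the two values are combined with FMeasure rather than reported alone.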

BCubedExtendedPrecision

public static double BCubedExtendedPrecision(Clustering key,
                                             Clustering answer)
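Calculates a BCubed precision measure, extended for multicategory clustering problems.
Swapping key and answer yields the equivalent recall measure. The following pseudocode computes both measures at once; clusters come from the answer and categories from the key: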
 
 precision_total = 0
 recall_total = 0
 samples_total = 0
 
 For each element e in the clustering {
 
   precision_samples = 0
   precision = 0
   recall_samples = 0
   recall = 0
 
   For each element e' in the clustering {
     boolean b_1 = e and e' share a cluster
     boolean b_2 = e and e' share a category
 
     IF b_1 {
       precision_samples++
       IF b_2 {
         precision++
       }
     }
     IF b_2 {
       recall_samples++
       IF b_1 {
         recall++
       }
     }
   }
 
   precision_total += precision / precision_samples
   recall_total += recall / recall_samples
   samples_total++
 }
 
 precision_total /= samples_total
 recall_total /= samples_total
 
Reference: E. Amigó, J. Gonzalo and J. Artiles. Evaluation metrics for clustering tasks: a comparison based on formal constraints. Technical report to be published in http://nlp.uned.es

Parameters:
key - the key clustering
answer - the answer clustering
Returns:
the extended BCubed precision
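The pseudocode above translates into the following Java sketch, which returns both precision and recall. It follows the boolean version documented here (e and e' either share a cluster or not), uses a hypothetical Map from element id to the set of its clusters or categories instead of Clustering, iterates over the answer's elements (assuming key and answer cover the same elements), and adds a division-by-zero guard that the pseudocode does not include.

```java
import java.util.Map;
import java.util.Set;

public class BCubedSketch {

    // True if the two sets have at least one id in common.
    static boolean shareAny(Set<String> x, Set<String> y) {
        for (String s : x) {
            if (y.contains(s)) return true;
        }
        return false;
    }

    // Hypothetical representation: element id -> set of ids (clusters in the
    // answer, categories in the key). Returns {precision, recall}.
    static double[] bcubed(Map<String, Set<String>> key, Map<String, Set<String>> answer) {
        double precisionTotal = 0.0, recallTotal = 0.0;
        int samplesTotal = 0;
        for (String e : answer.keySet()) {
            int precisionSamples = 0, precision = 0, recallSamples = 0, recall = 0;
            for (String e2 : answer.keySet()) {
                boolean b1 = shareAny(answer.get(e), answer.get(e2)); // share a cluster
                boolean b2 = shareAny(key.getOrDefault(e, Set.of()),
                                      key.getOrDefault(e2, Set.of())); // share a category
                if (b1) { precisionSamples++; if (b2) precision++; }
                if (b2) { recallSamples++; if (b1) recall++; }
            }
            // Guard (assumption): an element with no samples contributes 0.
            if (precisionSamples > 0) precisionTotal += (double) precision / precisionSamples;
            if (recallSamples > 0) recallTotal += (double) recall / recallSamples;
            samplesTotal++;
        }
        return new double[] {precisionTotal / samplesTotal, recallTotal / samplesTotal};
    }

    public static void main(String[] args) {
        Map<String, Set<String>> key = Map.of(
            "d1", Set.of("A"), "d2", Set.of("A"), "d3", Set.of("B"));
        Map<String, Set<String>> answer = Map.of(
            "d1", Set.of("c1"), "d2", Set.of("c1"), "d3", Set.of("c1"));
        double[] pr = bcubed(key, answer);
        System.out.println("precision = " + pr[0] + ", recall = " + pr[1]);
    }
}
```

In the example, one cluster holding all three documents gives recall 1.0 and precision 5/9. Calling bcubed(answer, key) swaps the roles, and its precision equals the recall of bcubed(key, answer), matching the remark that swapping key and answer yields the recall measure.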

BCubedExtendedRecall

public static double BCubedExtendedRecall(Clustering key,
                                          Clustering answer)
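Calculates a BCubed recall measure, extended for multicategory clustering problems. It is equivalent to BCubedExtendedPrecision with key and answer swapped. Reference: E. Amigó, J. Gonzalo and J. Artiles. Evaluation metrics for clustering tasks: a comparison based on formal constraints. Technical report to be published in http://nlp.uned.es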

Parameters:
key - the key clustering
answer - the answer clustering
Returns:
the extended BCubed recall