Enrique Amigó

Ph.D. in Computer Science (UNED), assistant professor at the Departamento de Lenguajes y Sistemas Informáticos de la UNED. Member of the UNED group in Natural Language Processing and Information Retrieval.

Short bio

I'm currently Assistant Professor at Universidad Nacional de Educación a Distancia (UNED), Spain, and a member of the Research Group in natural language processing and information retrieval. My interests so far have focused on two lines of research. Firstly, the axiomatisation of evaluation metrics on the basis of measurement theory, making contributions in classification, ranking, diversity, clustering, automatic summarisation, etc. Secondly, the extension of information theory over continuous spaces, making contributions in document representation, similarity metrics, unsupervised fusion of rankings and distributional compositional models.

Main Contributions

(I) Evaluation metrics:

My studies on evaluation metrics for multiple IA tasks are grounded in axiomatic methodologies and measurement theory. The most relevant contributions in this line are:

1. General theoretical frameworks and methodologies:
  - On the nature of information access evaluation metrics: a unifying framework. (E. Amigó and S. Mizzaro, IRJ 2020): This paper provides a uniform, general formal account of evaluation metrics for ranking, classification, clustering, and other information access problems. The approach extends Measurement Theory, modelling the notion of measurement closeness at different scales.
  - What is my Problem? Identifying Formal Tasks and Metrics in Data Mining on the Basis of Measurement Theory (E. Amigó, J. Gonzalo and S. Mizzaro, IEEE Transactions on Knowledge and Data Engineering 2022). In this paper, we formalize AI tasks in terms of Measurement Theory, which is a cornerstone of quantitative research in many disciplines, but has not yet been incorporated (in a consensual way) into some areas of Computer Science. The proposed formal framework provides a methodology to precisely define AI tasks for any given scenario and identify appropriate metrics.
  - Are we on the Right Track? An Examination of Information Retrieval Methodologies (Enrique Amigó, Hui Fang, Stefano Mizzaro, ChengXiang Zhai, SIGIR 2018). We categorize existing IR research methodologies along two dimensions, (1) empirical vs. theoretical and (2) top-down vs. bottom-up, and according to 6 desirable aspects. The theoretical analysis suggests that different methodologies are complementary and therefore equally necessary. The categorization of the 167 full papers suggests that most existing work is empirical bottom-up.
  - Combining Evaluation Metrics via the Unanimous Improvement Ratio and its Application to Clustering Tasks. (E. Amigó, J. Gonzalo, J. Artiles, F. Verdejo, JAIR 2011). A measure based on Conjoint Measurement Theory that indicates how robust the measured differences are to changes in the relative weights of the individual metrics (e.g. Precision and Recall) in metric combination functions. The empirical results confirm the validity and usefulness of the metric for the Text Clustering problem.
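
The Unanimous Improvement Ratio lends itself to a short illustration. The sketch below assumes the commonly stated form of the measure: the number of test cases where system a is at least as good as system b on every metric, minus the number where b is at least as good as a on every metric, divided by the total number of test cases. The function name, the input layout and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def unanimous_improvement_ratio(
    scores_a: Sequence[Sequence[float]],
    scores_b: Sequence[Sequence[float]],
) -> float:
    """Sketch of UIR under the assumptions stated in the text.

    scores_a[t][m] is the score of system a on test case t under metric m
    (for example, m = 0 for Precision and m = 1 for Recall).
    """
    if not scores_a or len(scores_a) != len(scores_b):
        raise ValueError("both systems need the same, non-empty set of test cases")

    a_at_least_b = 0  # test cases where a >= b on every metric
    b_at_least_a = 0  # test cases where b >= a on every metric
    for case_a, case_b in zip(scores_a, scores_b):
        if all(x >= y for x, y in zip(case_a, case_b)):
            a_at_least_b += 1
        if all(y >= x for x, y in zip(case_a, case_b)):
            b_at_least_a += 1

    # Exact ties are counted on both sides and therefore cancel out.
    return (a_at_least_b - b_at_least_a) / len(scores_a)


# Hypothetical (Precision, Recall) scores on four test cases.
system_a = [(0.8, 0.6), (0.5, 0.5), (0.9, 0.4), (0.9, 0.8)]
system_b = [(0.7, 0.5), (0.5, 0.5), (0.6, 0.6), (0.8, 0.8)]
print(unanimous_improvement_ratio(system_a, system_b))
```

In the third test case each system wins on one metric, so neither side is counted: whether a or b is preferred there depends on how Precision and Recall are weighted, which is exactly the sensitivity the measure is meant to expose.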

2. Specific tasks:
  - A general evaluation measure for document organization tasks. (E. Amigó, J. Gonzalo, F. Verdejo, SIGIR 2013) A set of five axioms for IR evaluation metrics, and the definition of Reliability and Sensitivity: a metric that can be applied to any mixture of ranking, clustering and filtering tasks. A high score according to the harmonic mean of Reliability and Sensitivity ensures a high score with any of the most popular evaluation metrics in all the Document Retrieval, Clustering and Filtering datasets used in our experiments.
  - A unifying and general account of fairness measurement in recommender systems (E. Amigó, Y. Deldjoo, S. Mizzaro, A. Bellogín, IPM 2022). A general, flexible, and parameterizable framework that covers a whole range of fairness evaluation possibilities in recommendation systems. The framework is grounded on a general working hypothesis: interpreting the space of users and items as a probabilistic sample space, two fundamental measures in information theory (Kullback–Leibler Divergence and Mutual Information) can capture the majority of possible scenarios for measuring fairness on recommender system outputs. In addition, earlier research on fairness in recommender systems could be viewed as single-sided, trying to optimize some form of equity across either user groups or provider/procurer groups, without considering the user/item space in conjunction, thereby overlooking the interplay between user and item groups. Instead, our framework includes the notion of statistical independence between user and item groups.
  - Ranking Interruptus: When Truncated Rankings Are Better and How to Measure That (E. Amigó, S. Mizzaro, D. Spina, SIGIR 2022). In this paper we analyse the problem of Truncated Ranking, i.e. evaluating systems that have a stopping criterion to truncate the ranking at the right position and so avoid retrieving irrelevant documents at the end. We first define formal properties to analyse how effectiveness metrics behave when evaluating truncated rankings. Our theoretical analysis shows that de-facto standard metrics do not satisfy desirable properties to evaluate truncated rankings: only Observational Information Effectiveness (OIE), a metric based on Shannon’s information theory, satisfies them all. We then perform experiments to compare several metrics on nine TREC datasets. According to our experimental results, the most appropriate metrics for truncated rankings are OIE and a novel extension of Rank-Biased Precision that adds a user effort factor penalising the retrieval of irrelevant documents.
  - A comparison of extrinsic clustering evaluation metrics based on formal constraints. (E. Amigó, J. Gonzalo, J. Artiles, F. Verdejo, IRJ 2009) A few intuitive formal constraints on Clustering metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis and the BCubed metric to overlapping clustering.
  - A comparison of filtering evaluation metrics based on formal constraints (E. Amigó, J. Gonzalo, F. Verdejo and D. Spina, IRJ 2019). A study that leads to a typology of measures for Document Filtering based on a set of three (mutually exclusive) formal properties, which help to understand the fundamental differences between measures and to determine which ones are more appropriate depending on the application scenario.
  - An Effectiveness Metric for Ordinal Classification: Formal Properties and Experimental Results. (E. Amigó, J. Gonzalo, S. Mizzaro and J. Carrillo-de-Albornoz, ACL 2020) We propose a new metric for Ordinal Classification, Closeness Evaluation Measure (CEM), that is rooted in Measurement Theory and Information Theory. The results indicate that the proposed metric captures quality aspects from different traditional tasks simultaneously.
  - An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric (E. Amigó and D. Spina and J. Carrillo-de-Albornoz, SIGIR 2018). We define a constraint-based axiomatic framework to study the suitability of existing ranking with diversity metrics. The analysis informed the definition of Rank-Biased Utility (RBU). Our experiments show that the proposed metric captures quality criteria reflected by different metrics, being suitable in the absence of knowledge about particular features of the scenario under study.
  - An Evaluation Framework for Aggregated Temporal Information Extraction (Enrique Amigó, Javier Artiles and Heng Ji, SIGIR 2011). This paper focusses on the representation and evaluation of temporal information about a certain event or entity. Given that the resulting temporal information can be vague, it is necessary that an evaluation framework captures and compares the temporal uncertainty of system outputs and human-assessed gold-standard data. In this paper, we define a temporal representation, formal constraints and an evaluation metric. The task setting and the evaluation measure presented here have been introduced in the TAC 2011 Knowledge Base Population evaluation for the Temporal Slot Filling task.
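
Of the metrics listed above, BCubed (from the clustering comparison) is simple enough to sketch in a few lines. The code below is a minimal illustration of the standard, non-overlapping BCubed precision and recall: for each item, precision is the fraction of items in its cluster that share its gold category, and recall is the fraction of items in its gold category that share its cluster, both averaged over all items. The function name and the sample labels are hypothetical, and the extension to overlapping clustering mentioned above is not covered.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def bcubed(
    clusters: Sequence[Hashable],
    categories: Sequence[Hashable],
) -> Tuple[float, float]:
    """Return (precision, recall) for a non-overlapping clustering.

    clusters[i] is the system cluster of item i,
    categories[i] is its gold-standard category.
    """
    if not clusters or len(clusters) != len(categories):
        raise ValueError("clusters and categories must be non-empty and aligned")

    n = len(clusters)
    cluster_size = Counter(clusters)
    category_size = Counter(categories)
    # Number of items sharing both a given cluster and a given category.
    joint_size = Counter(zip(clusters, categories))

    precision = sum(
        joint_size[(c, g)] / cluster_size[c] for c, g in zip(clusters, categories)
    ) / n
    recall = sum(
        joint_size[(c, g)] / category_size[g] for c, g in zip(clusters, categories)
    ) / n
    return precision, recall


# Hypothetical example: five items, two system clusters, two gold categories.
system_clusters = [0, 0, 0, 1, 1]
gold_categories = ["a", "a", "b", "b", "b"]
print(bcubed(system_clusters, gold_categories))
```

Each item counts itself as sharing its own cluster and category, so a perfect clustering scores 1.0 on both measures.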

(II) Observational Information Theory

In this research line, we have defined a generalization of Shannon's information content for continuous feature values called Observational Information Quantity (OIQ). The following papers describe its implications in document representation, heterogeneous feature aggregation, similarity axiomatics, ranking effectiveness and ranking fusion.

A Formal Account of Effectiveness Evaluation and Ranking Fusion. (E. Amigó, F. Giner, S. Mizzaro, D. Spina, ICTIR 2018) In this paper the observational information framework is used to formalize: (i) system effectiveness as an information theoretic similarity between system outputs and human assessments, and (ii) ranking fusion as an information quantity measure. As a result, the proposed effectiveness metric improves on popular metrics in terms of formal constraints. In addition, our empirical experiments suggest that it captures quality aspects from traditional metrics, while the reverse is not true. Our work also advances the understanding of theoretical foundations of the empirically known phenomenon of effectiveness increase when combining retrieval system outputs in an unsupervised manner.
Integrating learned and explicit document features for reputation monitoring in social media. (F. Giner, E. Amigó, F. Verdejo, KAIS 2020) In this paper, we define the OIQ-based representation and its application to the aggregation of quantitative and discrete features in the context of on-line reputation management. The approach allows intrinsic features (words, n-grams) to be integrated, without supervision, with quantitative features based on training data (proximity to classes or clusters).
On the foundations of similarity in information access. (E. Amigó, J. Gonzalo, F. Giner and F. Verdejo, IRJ 2019) In this paper we show how axiomatic explanations of similarity from other fields do not completely fit the notion of similarity function in Information Access problems. On the basis of the observational information framework, we propose a new set of formal constraints for similarity functions. Based on these formal constraints, we introduce a new parameterized similarity function, the information contrast model (ICM), which generalizes both pointwise mutual information and Tversky’s linear contrast model. Unlike previous similarity models, ICM satisfies the proposed formal constraints for a certain range of values of its parameters.
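
To illustrate how one parameterized function can cover pointwise mutual information as a special case, the following is one form consistent with the description above. The notation (the weights \alpha_1, \alpha_2, \beta and the information content IC) is assumed here for illustration and is not quoted from the paper.

```latex
\begin{align*}
IC(X) &= -\log P(X) \\
\mathrm{ICM}_{\alpha_1,\alpha_2,\beta}(X, Y)
  &= \alpha_1\, IC(X) + \alpha_2\, IC(Y) - \beta\, IC(X \cup Y) \\
\mathrm{ICM}_{1,1,1}(X, Y)
  &= -\log P(X) - \log P(Y) + \log P(X, Y)
   = \log \frac{P(X, Y)}{P(X)\, P(Y)}
   = \mathrm{PMI}(X, Y)
\end{align*}
```

Here IC(X \cup Y) = -\log P(X, Y) stands for the information content of observing the features of both objects together. Setting all three weights to 1 recovers pointwise mutual information, while other weightings change how strongly the joint information is penalised relative to the individual contributions.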

(III) Axiom-Based Distributional Text Representation and Composition

Information Theory-based Compositional Distributional Semantics (E. Amigó, A. Ariza-Casabona, V. Fresno, M.A. Martí, CL Journal 2022). We define and study the notion of Information Theory-based Compositional Distributional Semantics (ICDS): i) we first establish formal properties for embedding, composition and similarity functions based on Shannon's Information Theory; ii) we analyse the existing approaches under this prism, checking whether or not they comply with the established desirable properties; iii) we propose two parameterisable composition and similarity functions that generalise traditional approaches while fulfilling the formal properties; and finally iv) we perform an empirical study on several textual similarity datasets that include sentences with a high and low lexical overlap, and on the similarity between words and their description. Our theoretical analysis and empirical results show that fulfilling formal properties positively affects the accuracy of text representation models in terms of correspondence (isometry) between the embedding and meaning spaces.

Music activity

I have combined my research career with musical projects. I began with the wave of singer-songwriters in Madrid in the 90's, playing in venues such as Libertad 8 or Galileo. Since then, I have shaped my songs through multiple projects. In the late 90's came "La Casa del Conde", a Spanish/Brazilian fusion band with Ronny Vasques and Claudio H. In the 2000's I led Esfumato, a musical-performance project accompanied by toys and shadow montages, with Carlos Manzanares and Julio Gonzalo among others. In the last decade I have launched other projects such as Rap-Madera (an organic rap trio playing exclusively wooden instruments), with Dani Aguilera (Murcia) and Jorgito Kamankola (Cuba), and a violin and guitar duo with Violeta Veinte. In 2018 I founded the project TREVITHICK, with Viti Fresno on bass, Julio Gonzalo on sax, and Gabriel Vidanauta on drums: an organic band sound with Spanish guitar, leaving space for jazz, progressive elements and the lyrics themselves.