Lourdes Araujo and Juan J. Merelo.

This paper presents an evolutionary algorithm for modeling the arrival dates in time-stamped data

sequences such as newscasts, e-mails, IRC conversations, scientific journal articles or weblog postings.

These models are applied to the detection of buzz (i.e., terms that occur with a higher-than-normal

frequency) in such sequences, a task that has attracted considerable interest in the online world as the number of

periodic content producers grows. For this reason, we have used online sequences of this kind to test

our system, though it is also valid for other types of event sequences. The algorithm assigns frequencies

(number of events per time unit) to time intervals so that it produces an optimal fit to the data. The

optimization procedure is a trade-off between accurately fitting the data and avoiding too many frequency

changes, thus overcoming the noise inherent in these sequences. This process has traditionally been performed

using dynamic programming algorithms, whose memory and time requirements limit their use. This limitation

can be a problem when dealing with long sequences, and suggests applying alternative search methods that accept

some degree of uncertainty in exchange for tractability, such as the evolutionary algorithm proposed in this

paper. This algorithm reaches the same solution quality as the classical dynamic programming

algorithms, but in a shorter time. We also test different cost functions and propose a new one that yields

better fits than the one originally proposed by Kleinberg on real-world data. Finally, several distributions

of states for the finite-state automata are tested, with the result that a uniform distribution produces

much better fits than the geometric distribution also proposed by Kleinberg. We also present a variant of

the evolutionary algorithm that quickly fits a sequence extended with new data by taking

advantage of the fit obtained for the original subsequence.
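
The trade-off between fitting the data and limiting frequency changes can be illustrated with a minimal sketch. It is not the cost function used in the paper: it assumes that the gaps between consecutive events follow an exponential density, and that every change of frequency incurs a fixed penalty. The names segmentation_cost and change_penalty are hypothetical and chosen only for this illustration.

```python
import math


def segmentation_cost(gaps, rates, change_penalty=1.0):
    """Cost of assigning one rate (events per time unit) to each gap.

    Fit term (assumption): negative log-likelihood of each inter-arrival
    gap under the exponential density rate * exp(-rate * gap).
    Change term (assumption): a fixed penalty for every change of rate
    between consecutive gaps.
    """
    if len(gaps) != len(rates):
        raise ValueError("gaps and rates must have the same length")
    # -ln(rate * exp(-rate * gap)) simplifies to rate * gap - ln(rate)
    fit = sum(rate * gap - math.log(rate) for gap, rate in zip(gaps, rates))
    # Count positions where the assigned rate differs from the previous one
    changes = sum(1 for a, b in zip(rates, rates[1:]) if a != b)
    return fit + change_penalty * changes


# Toy sequence: two long gaps, a burst of three short gaps, one long gap
gaps = [5.0, 5.0, 0.5, 0.5, 0.5, 5.0]

# Candidate 1: a single rate, the overall number of events per time unit
flat = [len(gaps) / sum(gaps)] * len(gaps)

# Candidate 2: a higher rate during the burst (two rate changes)
bursty = [0.2, 0.2, 2.0, 2.0, 2.0, 0.2]

cost_flat = segmentation_cost(gaps, flat)
cost_bursty_cheap = segmentation_cost(gaps, bursty, change_penalty=1.0)
cost_bursty_dear = segmentation_cost(gaps, bursty, change_penalty=2.0)
```

With a penalty of 1.0 per change, the bursty assignment has a lower cost than the flat one; with a penalty of 2.0 the flat assignment is cheaper. A search method, whether dynamic programming or an evolutionary algorithm, looks for the assignment of rates that minimizes a cost of this kind.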