The distribution of term frequency in texts
E.L. Kuleshov, V.V. Krysanov, and K. Kakusho
Vladivostok, Russia
Pages: 81-90
Abstract
A new mathematical model of the distribution of term frequency in texts, including English texts, Russian texts, and English hypertexts, is proposed. A frequency distribution that generalizes the Pareto distribution is derived, and an algorithm for estimating the model parameters is presented. The proposed distribution is shown to outperform the Pareto distribution, giving better agreement with empirical data.