COMPARISON OF OPTIMIZATION METHODS FOR REGRESSION ESTIMATION OF PROBABILITY DENSITY OF A ONE-DIMENSIONAL RANDOM VARIABLE
A. V. Lapko (1, 2), V. A. Lapko (1, 2)
(1) Institute of Computational Modelling, Siberian Branch, Russian Academy of Sciences, Krasnoyarsk, Russia
(2) Reshetnev Siberian State University of Science and Technology, Krasnoyarsk, Russia
Keywords: regression estimation of the probability density, one-dimensional random variable, kernel probability density estimation, selection of bandwidths, Sturges rule, Heinhold-Gaede rule, large-volume samples
Abstract
Methods for selecting the bandwidths of the kernel functions in the regression estimate of the probability density of a one-dimensional random variable are investigated. The regression estimate of the probability density is a modification of the Rosenblatt-Parzen statistic and is used in processing large-volume statistical data. Its synthesis is based on compressing the initial sample by decomposing the range of values of the random variable into sampling intervals. The elements of the resulting data array are the centers of the sampling intervals and the frequencies with which values of the random variable from the initial sample fall into them. This information is sufficient to estimate the probability density of the random variable in the form of a nonparametric regression. It therefore becomes possible to select the bandwidths of the kernel functions of the regression estimate from the condition of minimum error in approximating the desired probability density. The traditional approach to optimizing a nonparametric estimate of the probability density is based on minimizing its mean square deviation. The approximation properties of the regression estimate of the probability density are analyzed for each of the considered optimization methods.
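The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Gaussian kernel, the exact form of the estimate (interval frequencies weighting kernels placed at the interval centers), the grid-search criterion (squared deviation from the histogram heights at the centers), and all function names are assumptions made for illustration only.

```python
import math
import random


def compress_sample(sample, n_bins=None):
    """Compress a sample into interval centers, relative frequencies and interval width."""
    n = len(sample)
    if n_bins is None:
        # Sturges rule for the number of sampling intervals (rounded up).
        n_bins = 1 + math.ceil(math.log2(n))
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / n_bins  # assumes the sample is not constant
    counts = [0] * n_bins
    for x in sample:
        # The largest value is assigned to the last interval.
        j = min(int((x - lo) / width), n_bins - 1)
        counts[j] += 1
    centers = [lo + (j + 0.5) * width for j in range(n_bins)]
    freqs = [k / n for k in counts]
    return centers, freqs, width


def gaussian_kernel(u):
    """Standard normal kernel (an assumed choice of kernel function)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def regression_density(x, centers, freqs, c):
    """Assumed form of the estimate: frequency-weighted kernels at the centers."""
    return sum(p * gaussian_kernel((x - z) / c)
               for z, p in zip(centers, freqs)) / c


def select_bandwidth(centers, freqs, width, candidates):
    """Pick the candidate bandwidth with the smallest squared approximation
    error against the histogram heights at the interval centers
    (a hypothetical criterion, used here only for illustration)."""
    targets = [p / width for p in freqs]

    def error(c):
        return sum((regression_density(z, centers, freqs, c) - t) ** 2
                   for z, t in zip(centers, targets))

    return min(candidates, key=error)


# Usage on a synthetic standard normal sample.
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(10000)]
z, p, delta = compress_sample(data)
c_best = select_bandwidth(z, p, delta, [delta * f for f in (0.3, 0.5, 1.0, 2.0)])
print(len(z), round(c_best, 4), round(regression_density(0.0, z, p, c_best), 4))
```

After compression, the cost of evaluating the estimate depends on the number of intervals rather than on the sample size, which is what makes the approach attractive for large-volume samples. The number of intervals can be overridden through `n_bins` to try a different discretization rule.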