Golden Valley, MN, United States

Lipovetsky S.,GfK Custom Research North America
Mathematical and Computer Modelling | Year: 2010

With a simple transformation, the ordinary least squares objective can yield a family of modified ridge regressions that outperform the regular ridge model. These models have more stable coefficients and a higher quality of fit as the profile parameter grows. With an additional adjustment based on minimization of the residual variance, all the characteristics improve further: the coefficients of these regressions do not shrink to zero when the ridge parameter increases, the coefficient of multiple determination stays high, and bias and generalized cross-validation remain low. In contrast to regular ridge regression, the modified ridge models yield solutions that are robust across values of the ridge parameter, with interpretable coefficients and good quality characteristics. © 2009 Elsevier Ltd. All rights reserved.
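For orientation, the regular ridge baseline that the abstract contrasts against can be sketched as follows. This is only the textbook estimator b = (X'X + kI)^(-1) X'y for two predictors, not the modified ridge models of the paper, whose form the abstract does not give. The function names and the toy data are assumptions made for the example.

```python
import math

def ridge_2d(X, y, k):
    """Regular ridge solution b = (X'X + k*I)^(-1) X'y for two predictors."""
    # Entries of X'X and X'y.
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    t1 = sum(r[0] * yi for r, yi in zip(X, y))
    t2 = sum(r[1] * yi for r, yi in zip(X, y))
    # Add the ridge parameter k to the diagonal, then solve the 2x2 system.
    a11, a22 = s11 + k, s22 + k
    det = a11 * a22 - s12 * s12
    return ((a22 * t1 - s12 * t2) / det, (a11 * t2 - s12 * t1) / det)

def coefficient_norm(b):
    """Euclidean length of the coefficient vector."""
    return math.hypot(b[0], b[1])

# Hypothetical toy data with two nearly collinear predictors.
X = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
y = [2.1, 3.9, 6.2, 7.9]

# The coefficient norm shrinks toward zero as the ridge parameter k grows.
norms = [coefficient_norm(ridge_2d(X, y, k)) for k in (0.0, 1.0, 10.0, 100.0)]
```

The shrinkage of `norms` with growing `k` is the behaviour of regular ridge regression that the abstract says the modified models avoid.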

Nowakowska E.,Polish Academy of Sciences | Koronacki J.,Polish Academy of Sciences | Lipovetsky S.,GfK Custom Research North America
Information Sciences | Year: 2016

Dimensionality reduction that preserves certain characteristics of data is needed for numerous reasons. In this work we focus on data coming from a mixture of Gaussian distributions and we propose a method that preserves the distinctness of the clustering structure, although this structure is assumed to be as yet unknown. The rationale behind the method is the following: (i) had one known the clusters (classes) within the data, one could facilitate further analysis and reduce space dimensionality by projecting the data to Fisher's linear subspace, which - by definition - best preserves the structure of the given classes; (ii) under some reasonable assumptions, this can be done, albeit approximately, without prior knowledge of the clusters (classes). In this paper, we show how this approach works. We present a method of preliminary data transformation that brings the directions of largest overall variability close to the directions of the best between-class separation. Hence, for the transformed data, simple PCA provides an approximation to Fisher's subspace. We show that the transformation preserves the distinctness of the unknown structure in the data to a great extent. © 2015 Elsevier Inc. All rights reserved.
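Step (i) of the rationale, the case where the classes are known, can be illustrated with a minimal sketch of the classical Fisher discriminant direction for two classes in two dimensions, w proportional to Sw^(-1)(m_B - m_A). This is not the paper's preliminary transformation, which the abstract does not specify; the function name and the toy points are assumptions.

```python
import math

def fisher_direction_2d(class_a, class_b):
    """Unit vector along Sw^(-1) (mean_b - mean_a) for two labelled 2-D classes."""
    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    m_a, m_b = mean(class_a), mean(class_b)

    # Pooled within-class scatter matrix Sw = [[s11, s12], [s12, s22]].
    s11 = s12 = s22 = 0.0
    for points, m in ((class_a, m_a), (class_b, m_b)):
        for p in points:
            dx, dy = p[0] - m[0], p[1] - m[1]
            s11 += dx * dx
            s12 += dx * dy
            s22 += dy * dy

    # Solve Sw * w = (m_b - m_a) with the explicit 2x2 inverse.
    d0, d1 = m_b[0] - m_a[0], m_b[1] - m_a[1]
    det = s11 * s22 - s12 * s12
    w0 = (s22 * d0 - s12 * d1) / det
    w1 = (s11 * d1 - s12 * d0) / det
    length = math.hypot(w0, w1)
    return (w0 / length, w1 / length)

# Hypothetical toy data: the classes differ along x, but the overall
# variance is largest along y, so plain PCA would pick the wrong axis.
class_a = [(0.0, -3.0), (0.0, 3.0), (0.2, 0.0), (-0.2, 0.0)]
class_b = [(2.0, -3.0), (2.0, 3.0), (2.2, 0.0), (1.8, 0.0)]
w = fisher_direction_2d(class_a, class_b)
```

On data like this the Fisher direction follows the between-class separation rather than the largest overall variability, which is the gap the abstract's transformation is meant to close.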

Lipovetsky S.,GfK Custom Research North America | Mandel I.,Telmar Group Inc.
International Journal of Information Technology and Decision Making | Year: 2013

We consider estimating the dependence of one variable on another using a new measure called the coefficient of structural association (CSA). It is based on the distribution of one variable along the segments of another one, and yields a gauge similar to the correlation ratio in nonlinear regression modeling. This index can be constructed as a quotient of the observed and maximum possible variances. The relations of CSA to other measures of dependence are also described; in particular, for binary variables CSA reduces to Loevinger's coefficient of association. Numerical simulations show that CSA is a powerful tool for data analysis where traditional measures fail. This method can enrich both theoretical and practical estimations for identifying hidden patterns in the data and help managers and researchers in making appropriate decisions. © 2014 World Scientific Publishing Company.
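The abstract does not give the CSA formula, so the sketch below shows only the classical correlation ratio that it names as a similar gauge: the between-segment sum of squares divided by the total sum of squares. The function name and the grouping of values into segments are assumptions for illustration.

```python
def correlation_ratio(groups):
    """Squared correlation ratio: between-group over total sum of squares.

    `groups` holds the values of one variable, split by the segments
    of the other variable.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    # Variation explained by the segment means.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Total variation around the grand mean.
    total = sum((v - grand_mean) ** 2 for v in values)
    return between / total

# Segments fully determine the value: ratio of 1.
fully_dependent = correlation_ratio([[1.0, 1.0], [3.0, 3.0]])
# Segments carry no information about the value: ratio of 0.
independent = correlation_ratio([[1.0, 3.0], [1.0, 3.0]])
```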

Lipovetsky S.,GfK Custom Research North America
International Journal of Machine Learning and Cybernetics | Year: 2013

Mixed normal distributions are considered in additive and multiplicative forms. While the weighted arithmetic mean of the probability density functions typically demonstrates several peaks corresponding to the parent sub-distributions, their weighted geometric mean is always expressed as one unimodal multivariate normal distribution. Estimation of the cluster center parameters from such a synthesized distribution is considered. The problem is solved by a non-linear least squares optimization yielding the cluster centers and sizes. The relationship to factor analysis by unweighted least squares and generalized least squares is noted, and numerical results are discussed. The described approach uses only the sample variance-covariance matrix and not the observations, so it can be applied to difficult clustering tasks on huge data sets from databases and to data mining problems such as finding an approximation for the cluster centers and sizes. The suggested techniques can enrich both theoretical consideration and practical applications for clustering problems. © 2012 Springer-Verlag.
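The claim that a weighted geometric mean of normal densities is again a single normal density can be checked numerically in the univariate case. The sketch below assumes weights that sum to one; the combined precision is the weighted sum of the component precisions, and the combined mean is the precision-weighted average of the component means. It covers only this identity, not the paper's least squares estimation of cluster centers, and all names are illustrative.

```python
import math

def normal_logpdf(x, mean, sd):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def geometric_mixture_params(weights, means, sds):
    """Mean and sd of the normal proportional to prod_i N(x; m_i, s_i)^w_i."""
    precision = sum(w / (s * s) for w, s in zip(weights, sds))
    mean = sum(w * m / (s * s) for w, m, s in zip(weights, means, sds)) / precision
    return mean, math.sqrt(1.0 / precision)

def log_geometric_mean(x, weights, means, sds):
    """Log of the (unnormalised) weighted geometric mean of the densities."""
    return sum(w * normal_logpdf(x, m, s) for w, m, s in zip(weights, means, sds))

# Hypothetical two-component example.
weights, means, sds = (0.3, 0.7), (-2.0, 3.0), (1.0, 2.0)
mean, sd = geometric_mixture_params(weights, means, sds)

# If the geometric mean is proportional to N(mean, sd), this difference
# is the same constant (the log normalising factor) at every x.
gaps = [log_geometric_mean(x, weights, means, sds) - normal_logpdf(x, mean, sd)
        for x in (-3.0, 0.0, 1.5, 4.0)]
```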

Lipovetsky S.,GfK Custom Research North America
Advances in Adaptive Data Analysis | Year: 2010

The coefficients of a multiple regression define the change in the dependent variable due to a change in one predictor while all other predictors are held constant. Rearranging the data into paired differences of observations and keeping only the biggest changes yields a matrix of single-variable changes, which is close to an orthogonal design, so multicollinearity has no impact on the regression. A similar approach is used to obtain meaningful coefficients of nonlinear regressions, with coefficients of half-elasticity, elasticity, and odds' elasticity due to the gradients in each predictor. In contrast to regular linear and nonlinear regressions, the suggested technique produces interpretable coefficients not prone to multicollinearity effects. © 2010 World Scientific Publishing Company.
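The idea of paired differences dominated by a change in a single predictor can be sketched as follows. The selection rule used here (keep a pair when the chosen predictor accounts for at least a given share of the total absolute change) and the through-origin slope are assumptions made for illustration; the abstract does not state the paper's exact rule, and the function name is hypothetical.

```python
from itertools import combinations

def single_change_slope(X, y, j, share=0.9):
    """Slope of y on predictor j from pairs where predictor j dominates the change.

    Assumed rule: a pair of observations is kept when |dx_j| makes up at
    least `share` of the summed absolute changes over all predictors.
    Raises ZeroDivisionError if no pair qualifies.
    """
    num = den = 0.0
    for a, b in combinations(range(len(X)), 2):
        dx = [xb - xa for xa, xb in zip(X[a], X[b])]
        total = sum(abs(d) for d in dx)
        if total == 0 or abs(dx[j]) / total < share:
            continue  # not (nearly) a single-variable change
        dy = y[b] - y[a]
        # Least squares through the origin on the retained differences.
        num += dx[j] * dy
        den += dx[j] * dx[j]
    return num / den

# Hypothetical grid data generated from y = 2*x1 + 3*x2.
X = [(float(a), float(b)) for a in range(3) for b in range(3)]
y = [2.0 * x1 + 3.0 * x2 for x1, x2 in X]

slope_x1 = single_change_slope(X, y, 0)
slope_x2 = single_change_slope(X, y, 1)
```

Because each retained pair changes essentially one predictor, each slope is estimated without interference from the other predictor, which is the property the abstract attributes to the near-orthogonal design.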