Title: Normalized entropy: A comparison with traditional techniques in variable selection
Authors: Pedro Macedo, Maria Conceição Costa, and João Pedro Cruz
Abstract. A variable selection procedure for regression analysis based on a normalized entropy measure was first proposed in 1996 by Amos Golan, George Judge and Douglas Miller in the book Maximum Entropy Econometrics: Robust Estimation with Limited Data. To the best of the authors’ knowledge, the idea has not been explored in the literature since then, despite several noteworthy advantages pointed out by Golan and coauthors: it is simple to perform, even for a large number of variables (useful in some big data problems); it allows the use of non-sample information (easily incorporated in the optimization structure); and it can be implemented for ill-posed models (frequently observed in real-world problems). Following a recent work illustrating that normalized entropy is a promising approach for identifying pure noise models, this paper revisits the normalized entropy procedure, proposes some improvements, and illustrates its performance in variable selection problems against some well-known traditional techniques.
Title: Neagging: An aggregation procedure based on normalized entropy
Authors: Maria Conceição Costa, Pedro Macedo, and João Pedro Cruz
Abstract. The analysis of big data, in particular of inhomogeneous large-scale data in the regression context, has been a topic of growing research interest in recent years, with bagging and magging as two well-known aggregation procedures. Because such data may be recorded in different time regimes or taken from multiple sources, inhomogeneities are expected, and they compromise regression modelling. The classical framework of independent and identically distributed errors around a single underlying model does not apply, and the usual alternatives (such as time-varying coefficient models or mixture models) may impose a prohibitive computational burden. This paper revisits the methodology developed in a recent work, where an aggregation procedure based on normalized entropy was proposed with very promising results, and illustrates its performance in real data applications under distinct scenarios.
Title: A Process to machine learning with real n-dimensional data
Authors: Francisco Miranda, Carlos Abreu, and Daniel Miranda