The optimal feature selection problem is essentially a combinatorial problem. The number of non-dominant solutions grows exponentially along the number of features and objectives. Given its NP-hardness, it is neither practical nor necessary to find the entire true Pareto front [38]. Instead, obtaining a set of solutions that evenly distributed over the true Pareto front is representative, pragmatic and computationally referrable.
The normal constraint (NC) method [19] is an effective method in the literature that guarantees generating evenly distributed solutions over the entire Pareto front. The main idea of NC method is to use the utopia plane to approximate the true Pareto front such that a set of evenly distributed reference points on utopia plane would result to a set of evenly distributed solutions on Pareto front, after the projection along the normal vector of utopia plane. Here, utopia plane refers to the hyper plane determined by all the anchor points, each of which individually optimizes a single objective (e.g. y1 for y1-axis and y2 for y2-axis in Fig 3.a). However, the original NC method was developed on continuous solution space. In the case of IP, it can generate dominated solutions (see Fig 3.a) where point A is the solution generated using NC, but there exists a point B outside the shaded feasible region, dominating A. Therefore, we propose to combine the idea of utopia plane and the epsilon-constraint method such that all solutions are guaranteed to be non-dominant ones (see Fig 3.c).
Our proposed method consists of four main steps: 1) determine the utopia plane; 2) generate uniformly distributed reference points on the utopia plane; 3) for each reference point, use its (k-1) coordinates to constrain the (k-1) objectives; 4) optimize the k-th objective within the reduced solution space.
Repeat steps 2) to 4) to achieve the representative Pareto front. In step 2), we resort to the hit-and-run (H&R) method [18] to sample the reference points. Its principle is straightforward: let pt denote the current point then Pk+1= pk+ l×d is the next point, where d is a random direction vector and l is the random length of the jump. As a random-walk algorithm, H&R is proven to generate uniformly distributed points inside any polyhedron after a sufficient number of runs [18].
Note that an obvious alternative is to equally divide the Pareto front along each objective direction and then traverse each grid. However, it might not evenly cut the Pareto front and the computational efforts grow exponentially along the number of objectives.