Insufficient samples are a common issue in real engineering problems because collecting samples is time-consuming and expensive. Nonlinear modeling based on limited samples is generally difficult, and incorporating prior knowledge offers a promising remedy. In practice, different forms of prior knowledge may be available, and using them can compensate for the limited number of training samples. The primary focus of this study is to introduce an alternative approach for incorporating prior knowledge based on the Pareto optimality concept, by improving the initialization of the chromosome and obtaining a reliable Pareto front. The proposed technique generates a set of solutions by considering both the available training samples and the prior knowledge during modeling. Because obtaining a good Pareto front is difficult, we discuss the challenges of implementing the proposed technique, including the formulation of the two objective functions, the uncertainty of the obtained Pareto front, and the complexity of the problem space. To validate the proposed technique, a benchmark problem and a control engineering problem are investigated. The results show that the proposed technique can be implemented by selecting the best solution from the obtained Pareto front, and that the prediction accuracy for the system identification problem can be improved by up to 10%.
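To make the Pareto optimality concept concrete, the following is a minimal sketch, not the implementation used in this study. It assumes each candidate model is scored by two objectives to be minimized, here given the hypothetical names `data_error` (misfit on the training samples) and `prior_violation` (disagreement with prior knowledge), and it filters the non-dominated candidates. The candidate values are made up for illustration only.

```python
from typing import List, Tuple

# Assumed objective pair: (data_error, prior_violation), both minimized.
# These names are illustrative and do not come from the study.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """Return True if a is no worse than b in both objectives
    and strictly better in at least one."""
    no_worse = a[0] <= b[0] and a[1] <= b[1]
    strictly_better = a[0] < b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[int]:
    """Return the indices of the points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Illustrative (made-up) objective values for five candidate models.
candidates = [(0.10, 0.90), (0.20, 0.40), (0.30, 0.50), (0.50, 0.10), (0.60, 0.60)]
front = pareto_front(candidates)
```

In this sketch the pairwise comparison costs O(n²) in the number of candidates, which is acceptable for small populations. How a single "best" solution is then chosen from the front is a separate decision that the sketch does not address.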