This is also a terrible question.
how do you compute the model with a minimum residual after fitting probability distributions on a dataset
I can tell by the question that you are either in a master’s-level statistics class or in a master’s-level science class that requires these skills to complete assignments. My guess is the latter, because in a statistics class you could ask your classmates or professor and receive reasonable answers.
I can also tell by your posted question that you probably should not be in that class. Your posted question indicates a lack of communication skills, short-sighted thinking, and just plain laziness. If you posted this on Stack Exchange, it would be deleted, but not before you were mocked for being brain-dead.
The answer to your question is not difficult, but the actual process can be difficult. Start by using known data models that process the type of data you have. If it’s climate data then use climate models; if it’s biomedical data then use biomedical models. There are plenty of boxed models available for either of these types of data sets.
After running the data through several boxed models, select the top N models that have the best fit based on least-squares and chi-squared statistics. To minimize the residual, you will need to analyze the residual. The residual is often referred to as “noise” in statistics, and it’s composed of measurement errors, rounding errors, or random errors. It can also be false noise, which shows up when the data falls outside the model’s measurement process.
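The selection step can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not what any particular boxed model does: the three candidate distributions, the method-of-moments fits, the function names, and the synthetic sample are all made up for the example, and it scores each fit by least squares against the empirical CDF only (a chi-squared score on binned counts would slot in the same way).

```python
import math
from statistics import NormalDist, fmean, stdev

def fit_candidates(data):
    """Fit a few simple distributions by moments; return {name: cdf function}."""
    mu, sigma = fmean(data), stdev(data)
    lo, hi = min(data), max(data)
    normal = NormalDist(mu, sigma)
    return {
        "normal": normal.cdf,
        # Assumes positive data with a positive mean (rate = 1 / mean).
        "exponential": lambda x: 1.0 - math.exp(-x / mu) if x > 0 else 0.0,
        "uniform": lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo))),
    }

def residual_sum_of_squares(data, cdf):
    """Sum of squared differences between the fitted CDF and the empirical CDF."""
    xs = sorted(data)
    n = len(xs)
    # Midpoint plotting positions for the empirical CDF.
    return sum((cdf(x) - (i + 0.5) / n) ** 2 for i, x in enumerate(xs))

def rank_models(data):
    """Return (name, residual) pairs, smallest residual first."""
    fits = fit_candidates(data)
    scores = {name: residual_sum_of_squares(data, cdf) for name, cdf in fits.items()}
    return sorted(scores.items(), key=lambda item: item[1])

# Synthetic sample (an assumption for the demo): exact quantiles of N(10, 2).
sample = [NormalDist(10.0, 2.0).inv_cdf((i + 0.5) / 200) for i in range(200)]
ranking = rank_models(sample)
best_name, best_rss = ranking[0]
print(ranking)
```

The first entry of `ranking` is the model with the minimum residual; keep the top N entries if you want to carry several models forward.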
To correct for noise and false noise, use inverse modeling. This is accomplished by setting one or more data elements of each multidimensional data point to an average, then rerunning the altered data through the same models that produced the original fit. The outputs will start to indicate which dimensional parameters the noise is originating from, or where valid data is not being utilized. After identifying the parameters of interest, adjustments are made to accommodate or restrict the processing of the data in the original set.
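Here is a rough sketch of that averaging procedure, again under stated assumptions: `model` is a hypothetical stand-in for a boxed model (it uses the first two dimensions and ignores the third), and the data is synthetic, built so that the second dimension carries a measurement error of plus or minus 1. Each dimension is averaged out in turn, the model is rerun, and the change in the residual sum of squares is recorded.

```python
from statistics import fmean

def model(point):
    # Hypothetical stand-in for a boxed model: uses x0 and x1, ignores x2.
    x0, x1, _x2 = point
    return 2.0 * x0 + 0.5 * x1

def rss(points, observed):
    """Residual sum of squares of the model against the observed values."""
    return sum((model(p) - y) ** 2 for p, y in zip(points, observed))

def average_out(points, dim):
    """Copy of the data with one dimension replaced by its average."""
    mean = fmean(p[dim] for p in points)
    return [tuple(mean if i == dim else v for i, v in enumerate(p)) for p in points]

def residual_changes(points, observed):
    """Baseline residual, plus the change in residual per averaged dimension."""
    baseline = rss(points, observed)
    changes = {
        dim: rss(average_out(points, dim), observed) - baseline
        for dim in range(len(points[0]))
    }
    return baseline, changes

# Synthetic data (an assumption for the demo): x1 is truly 5.0 but is
# measured with an alternating error of +1 / -1; x2 is never used.
n = 8
points = [(float(i), 5.0 + (1.0 if i % 2 else -1.0), float(i * i)) for i in range(n)]
observed = [2.0 * i + 0.5 * 5.0 for i in range(n)]

baseline, changes = residual_changes(points, observed)
print(baseline, changes)
```

Reading the result: a negative change (here, dimension 1) means averaging removed residual, so that dimension is a likely noise source. A change of zero (dimension 2) means the model is not using that dimension at all. A large positive change (dimension 0) means the dimension carries real signal and should be left alone.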
Note that the complexity of inverse modeling increases well beyond the exponential if your data points (Px) have large numbers of dimensional parameters in their composite form. If some of the parameters depend heavily on other parameters, it increases even more.
Some boxed model programs document Big O estimates that approximate the time needed for various inverse modeling procedures. It’s not unusual for the time to exceed thousands of hours on fast, highly optimized computers.
I think you should do it manually. This will give you an appreciation for what the computer is doing.
GA