Latin Hypercube Sampling

Latin Hypercube Sampling (LHS)

JOSE LOPEZ-COLLADO

JULY, 2015

After searching for a while, I finally found a paper describing LHS (Swiler and Wyss 2004). LHS re-scales uniform random variates within their domain so as to obtain a better dispersion of the input numbers used to generate the deviates of the target distribution. The paper by Swiler and Wyss presents a detailed example of the algorithm, so anybody can check the results and the algorithm itself (pages 2-9 of the paper).

In essence, the sample size ss divides the sampling space into ss strata, and the i-th uniform value u is re-scaled to the limits of its stratum as u/ss + (i - 1)/ss, which places it inside ((i - 1)/ss, i/ss):
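As a quick check of that formula (a minimal sketch, separate from the full example below), here it is applied to four uniform draws with ss = 4; each re-scaled value lands in its own stratum:

scaleu[u_, i_, ss_] := u (1/ss) + ((i - 1)/ss);
SetAttributes[scaleu, Listable];
scaleu[{0.2, 0.9, 0.5, 0.1}, Range[4], 4]
(* {0.05, 0.475, 0.625, 0.775}, one value in each of (0, 1/4), (1/4, 1/2), (1/2, 3/4), (3/4, 1) *)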

(* ===========HERE IS THE MATHEMATICA CODE WITH COMMENTS ======== *)

SetDirectory[NotebookDirectory[]];

wdist = WeibullDistribution[1.5, 3];

dname = wdist;

(* ss is the sample size *)

ss = 1500;

(* scaleu is the LHS re-scaling, very simple indeed! *)

scaleu[u_, i_, ss_] := u (1/ss) + ((i - 1)/ss);

(* make the function Listable, so it can handle lists *)

SetAttributes[scaleu, Listable];

(* get a list of uniform random numbers *)

un = RandomVariate[UniformDistribution[{0, 1}], ss];

(* get a sequence of integers 1, 2, 3, ..., ss *)

strata = Range[ss];

(* Re-distribute the random numbers using the LHS re-scaling *)

usc = scaleu[un, strata, ss];

(* dname is the target distribution; in this example it is wdist, a Weibull distribution with shape 1.5 and scale 3 *)

(* Obtain a list of LHS random numbers; note that we use the inverse of the cumulative distribution function (CDF) to translate the re-scaled U values into deviates from the target distribution *)

pvL = Map[InverseCDF[dname, #] &, usc];

(* get some statistics, mean and standard deviation *)

mL = Mean[pvL]

2.70744

sdL = StandardDeviation[pvL]

1.83532

(* The next call is conventional random sampling, NOT LHS; it uses the built-in Mathematica function RandomVariate with the distribution and the sample size as arguments *)

pvR = RandomVariate[dname, ss];

(* get the same statistics: mean and standard deviation *)

mR = Mean[pvR]

2.62812

sdR = StandardDeviation[pvR]

1.7902

(* Draw the distributions, Latin Hypercube on the left, regular sampling on the right *)

GraphicsRow[{Histogram[pvL], Histogram[pvR]}]

(* ===========END OF CODE HERE ======== *)

We can observe that the distribution on the left is smoother than the one on the right, thanks to LHS. More of the theory is presented by McKay et al. (1979) and Stein (1987). It is suggested that a different stream of random numbers be used every time we draw a sample, as in the sketch below.
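One way to get separate streams in Mathematica (a minimal sketch; the per-draw seeding scheme is my own choice, not part of the LHS algorithm) is to localize the generator state with BlockRandom and seed each draw independently:

(* draw three independent U samples of size 5, each from its own seeded stream *)
streams = Table[
   BlockRandom[
    SeedRandom[1000 + k]; (* 1000 + k is an arbitrary choice of seeds; any distinct values work *)
    RandomVariate[UniformDistribution[{0, 1}], 5]],
   {k, 3}];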

Is LHS worth it?

There have been several numerical analyses of the effectiveness of LHS; broadly, it improves the estimate of the shape of the target distribution. Here I only demonstrate visually how good it is. It is assumed that we are working in another Mathematica notebook, which is why some of the code above is repeated. The analysis is done in batch, that is, lists of samples of different sizes are generated.

(* ===========HERE IS THE MATHEMATICA CODE WITH COMMENTS ======== *)

SetDirectory[NotebookDirectory[]];

(* Use the Weibull distribution again *)

adist = WeibullDistribution[1.75, 4];

(* set a list of sample sizes *)

nsv = {10, 50, 100, 250, 500, 1000, 10000};

scaleu[u_, i_, ss_] := u (1/ss) + ((i - 1)/ss);

SetAttributes[scaleu, Listable];

(* generate lists of U random variates, one list per sample size in nsv *)

unv = Map[RandomVariate[UniformDistribution[{0, 1}], #] &, nsv];

(* generate the strata indices for each sample size *)

strata = Map[Range[#] &, nsv];

(* Apply the LHS re-scaling *)

usc = MapThread[scaleu[#1, #2, #3] &, {unv, strata, nsv}];

(* Obtain the LHS deviates using the inverse CDF *)

pvL = Map[InverseCDF[adist, #] &, usc];

(* Generate the Weibull variates directly, standard call *)

pvR = Map[RandomVariate[adist, #] &, nsv];

(* generate the histograms for LHS *)

figsLHC = Map[Histogram[#] &, pvL];

(* generate histograms for standard sampling *)

figsStd = Map[Histogram[#] &, pvR];

(* thread the figures and display them in a grid: the left column shows the LHS histograms, the right column the standard-sampling histograms *)

GraphicsGrid[Thread[{figsLHC, figsStd}], ImageSize -> 350]

(* ===========END OF CODE HERE ======== *)

So, it seems clear that LHS improves the estimate of the shape of the target distribution, even at the largest sample size (n = 10,000).

Some people argue (for example, D. Vose) that LHS is a relic of the past, carrying an overhead in memory and computing time. LHS applies only to distributions for which the inverse-CDF method exists, and for distributions like the normal and lognormal the inverse CDF can only be evaluated approximately, so it may be better to rely on another number-generation method for them instead.
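For instance (a minimal sketch following the same pattern as the code above), LHS for a standard normal still goes through InverseCDF, which Mathematica evaluates via the inverse error function, a numerical special-function computation:

(* LHS for a standard normal, using the same re-scaling as scaleu above *)
ss = 1000;
un = RandomVariate[UniformDistribution[{0, 1}], ss];
usc = (un + Range[ss] - 1)/ss; (* equivalent to scaleu[un, Range[ss], ss] *)
pvN = Map[InverseCDF[NormalDistribution[0, 1], #] &, usc];
{Mean[pvN], StandardDeviation[pvN]} (* should be close to {0, 1} *)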

In my opinion, it does no harm to implement the algorithm for medium-sized projects.

KEY WORDS: MATHEMATICA, LATIN HYPERCUBE SAMPLING, MONTE CARLO SIMULATION

REFERENCES

McKay MD, Beckman RJ, Conover WJ. 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21(2): 239-245.

Stein M. 1987. Large sample properties of simulations using Latin hypercube sampling. Technometrics 29(2): 143-151.

Swiler LP, Wyss GD. 2004. A user's guide to Sandia's Latin Hypercube Sampling software: LHS UNIX library/standalone version. SAND2004-2439, UC-505. Sandia National Laboratories, Albuquerque, NM.