Bootstrapping is a method for statistical inference that focuses on building a sampling distribution, with the key idea of resampling the originally observed data with replacement. The term bootstrapping, proposed by Bradley Efron in his paper “Bootstrap methods: another look at the jackknife”, published in 1979, derives from the cliché of ‘pulling oneself up by one’s bootstraps’. In keeping with that image, the sample data are treated as if they were a population, and repeated samples are drawn from them to generate statistical inferences about the sample data. The essential bootstrap analogy states that “the population is to the sample as the sample is to the bootstrap samples”.

                   
Bootstrapping divides into two methods: parametric and nonparametric. Parametric bootstrapping assumes that the original data set is drawn from some specific distribution, e.g. a normal distribution, whose parameters are estimated from the data; the bootstrap samples, usually of the same size as the original data set, are then simulated from that fitted distribution. Nonparametric bootstrapping is the method described at the beginning, which draws bootstrap samples directly from the original data. Bootstrapping is quite useful in non-linear regression and generalized linear models. For small sample sizes the parametric method is generally preferred, while for large sample sizes the nonparametric method is typically used.

To clarify nonparametric bootstrapping further, suppose a sample data set A = {x1, x2, …, xk} is randomly drawn from a population B = {X1, X2, …, XK}, where K is much larger than k. The statistic T = t(A) is considered an estimate of the corresponding population parameter P = t(B). Nonparametric bootstrapping generates an estimate of the sampling distribution of the statistic in an empirical way; no assumption about the form of the population is needed. Next, a sample of size k is drawn from the elements of A with replacement, denoted A*1 = {x*11, x*12, …, x*1k}.
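As a minimal sketch of this resampling step, assuming Python with NumPy (the values in A below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# A hypothetical observed sample A of size k (illustrative values only).
A = np.array([2.1, 3.4, 1.7, 4.0, 2.9, 3.1, 2.5])
k = len(A)

# One bootstrap sample A*: k draws from A *with* replacement.
A_star = rng.choice(A, size=k, replace=True)
print(A_star)  # some elements of A repeat, others are absent
```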
In the resampling, a star (*) is added to distinguish resampled data from the original data. Sampling with replacement is mandatory; without it, each bootstrap sample would simply reproduce the original sample A. The resampling is repeated many times, typically 1,000 or 10,000 (a figure that keeps rising as computing power increases), giving bootstrap samples A*1, A*2, … and corresponding bootstrap estimates T* = t(A*). The mean of these bootstrap estimates is calculated to estimate the expectation of the bootstrapped statistic; the difference, mean(T*) - T, is the estimate of T's bias. The variance of the bootstrap estimates, the bootstrap variance estimate, estimates the sampling variance of the statistic T.
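The following sketch, again assuming Python with NumPy, puts these pieces together; the function name bootstrap_bias_variance and the replicate count n_boot are illustrative choices, not fixed conventions:

```python
import numpy as np

def bootstrap_bias_variance(A, statistic, n_boot=10_000, seed=0):
    """Nonparametric bootstrap estimates of the bias and sampling
    variance of statistic(A)."""
    rng = np.random.default_rng(seed)
    k = len(A)
    T_hat = statistic(A)               # T = t(A), estimate from the original sample
    T_star = np.empty(n_boot)
    for b in range(n_boot):
        A_star = rng.choice(A, size=k, replace=True)
        T_star[b] = statistic(A_star)  # bootstrap replicate T* = t(A*)
    bias = T_star.mean() - T_hat       # mean(T*) - T estimates the bias of T
    variance = T_star.var(ddof=1)      # bootstrap estimate of the sampling variance of T
    return T_hat, bias, variance

# Example: bias and variance of the sample median (illustrative data).
A = np.array([2.1, 3.4, 1.7, 4.0, 2.9, 3.1, 2.5])
print(bootstrap_bias_variance(A, np.median))
```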
Bootstrap confidence intervals can then be constructed using either the bootstrap percentile interval approach or the normal theory interval approach. The percentile method uses the empirical quantiles of the bootstrap estimates, written as T*(lower) < P < T*(upper). More specifically, reflecting those quantiles around the original estimate T̂ gives the basic bootstrap interval T̂ - (T*(upper) - T̂) ≤ P ≤ T̂ + (T̂ - T*(lower)).
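A sketch of these interval constructions under the same assumptions (Python with NumPy; bootstrap_intervals is an illustrative name, and the normal-theory quantile is hardcoded for the 95% level):

```python
import numpy as np

def bootstrap_intervals(A, statistic, alpha=0.05, n_boot=10_000, seed=0):
    """Percentile, basic, and normal-theory bootstrap intervals for statistic(A)."""
    rng = np.random.default_rng(seed)
    k = len(A)
    T_hat = statistic(A)
    T_star = np.array([statistic(rng.choice(A, size=k, replace=True))
                       for _ in range(n_boot)])
    lower, upper = np.quantile(T_star, [alpha / 2, 1 - alpha / 2])
    percentile_ci = (lower, upper)                     # T*(lower) <= P <= T*(upper)
    basic_ci = (2 * T_hat - upper, 2 * T_hat - lower)  # quantiles reflected around T-hat
    z = 1.96                                           # normal quantile, valid for alpha = 0.05
    se = T_star.std(ddof=1)                            # bootstrap standard error
    normal_ci = (T_hat - z * se, T_hat + z * se)       # normal-theory interval
    return percentile_ci, basic_ci, normal_ci
```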
Bootstrapping is an effective method to double-check the stability of model estimation results, and its intervals are often more accurate than those calculated from the sample variance under a normality assumption. Simplicity is another important benefit: for complicated estimators, such as correlation coefficients, percentile points, or complex parameters of a distribution, bootstrapping is a straightforward way to generate estimates of confidence intervals and standard errors. However, that simplicity can also be a disadvantage, because it makes the assumptions underlying the bootstrap easy to neglect. Bootstrap results are often over-optimistic, and their theoretical guarantees are asymptotic rather than assured for finite sample sizes.

There are several bootstrapping schemes for regression problems. One typical approach is to resample the residuals of the regression model. The procedure is: first, fit the model to the original data set, generating the model estimates β̂, and calculate the residuals ε̂; second, randomly and repeatedly resample the residuals (again typically 1,000 or 10,000 times) to obtain sets of residuals of size k, and add each resampled set of residuals to the fitted values of the original equation, generating bootstrapped responses Y*; finally, use each bootstrapped Y* to refit the model and obtain the bootstrap estimates β̂*.

Another typical approach in the regression context is random-x resampling, also called case resampling. One can either apply a Monte Carlo algorithm, repeatedly resampling data sets of the same size as the original with replacement, or enumerate every possible resampling of the data set. In this scheme, instead of fitting the regression model once to the original predictor-response pairs (xi, yi), for i = 1, 2, …, k, the pairs themselves are resampled to produce many new data sets of size k. The regression model is then fit to each of these new data sets, and the distribution of β̂* is formed from the resulting parameter estimates.
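Both regression schemes can be sketched as follows, assuming Python with NumPy and a simple straight-line model; the helper names fit_line, residual_bootstrap, and case_bootstrap are illustrative:

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def residual_bootstrap(x, y, n_boot=1000, seed=0):
    """Resample residuals: keep x fixed, add resampled residuals to the fitted values."""
    rng = np.random.default_rng(seed)
    beta_hat = fit_line(x, y)
    fitted = beta_hat[0] + beta_hat[1] * x
    resid = y - fitted                           # estimated residuals
    estimates = np.empty((n_boot, 2))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=len(y), replace=True)
        estimates[b] = fit_line(x, y_star)       # bootstrap estimate for this Y*
    return estimates

def case_bootstrap(x, y, n_boot=1000, seed=0):
    """Random-x (case) resampling: resample (x_i, y_i) pairs with replacement."""
    rng = np.random.default_rng(seed)
    k = len(y)
    estimates = np.empty((n_boot, 2))
    for b in range(n_boot):
        idx = rng.integers(0, k, size=k)         # indices drawn with replacement
        estimates[b] = fit_line(x[idx], y[idx])
    return estimates
```

Residual resampling keeps the predictors fixed, which suits designed experiments; case resampling also captures randomness in the x values.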
