Confidence intervals can be used to estimate several population parameters. One type of parameter that can be estimated using inferential statistics is a population proportion. For example, we may want to know the percentage of the U.S. population that supports a particular piece of legislation. For this type of question, we need to find a confidence interval.
In this article, we will see how to construct a confidence interval for a population proportion, and examine some of the theory behind this.
We begin by looking at the big picture before we get into the specifics. The type of confidence interval that we will consider is of the following form:
Estimate +/- Margin of Error
This means that there are two numbers that we will need to determine. These values are an estimate for the desired parameter, along with the margin of error.
Before conducting any statistical test or procedure, it is important to make sure that all of the conditions are met. For a confidence interval for a population proportion, we need to make sure that the following hold:
- We have a simple random sample of size n from a large population.
- Our individuals have been chosen independently of one another.
- There are at least 15 successes and 15 failures in our sample.
If the last item is not satisfied, then it may be possible to adjust our sample slightly and to use a plus-four confidence interval. In what follows, we will assume that all of the above conditions have been met.
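The third condition is the only one that can be checked from the counts alone; random sampling and independence depend on how the data were collected. A minimal sketch of that check follows. The function names are hypothetical, and the plus-four adjustment is written in its usual textbook form (add two successes and two failures), which this article mentions but does not spell out.

```python
def check_conditions(successes: int, n: int, minimum: int = 15) -> bool:
    """Return True when the sample has at least `minimum` successes and failures."""
    failures = n - successes
    return successes >= minimum and failures >= minimum


def plus_four_estimate(successes: int, n: int) -> float:
    """Usual plus-four adjustment: add two successes and two failures."""
    return (successes + 2) / (n + 4)


# Hypothetical counts, for illustration only.
print(check_conditions(64, 100))   # 64 successes and 36 failures
print(check_conditions(8, 100))    # only 8 successes
print(plus_four_estimate(8, 100))  # adjusted estimate for the small-count case
```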
Sample and Population Proportions
We start with the estimate for our population proportion. Just as we use a sample mean to estimate a population mean, we use a sample proportion to estimate a population proportion. The population proportion is an unknown parameter. The sample proportion is a statistic. This statistic is found by counting the number of successes in our sample and then dividing by the total number of individuals in the sample.
The population proportion is denoted by p and is self-explanatory. The notation for the sample proportion is a little more involved. We denote a sample proportion as p̂, and we read this symbol as "p-hat" because it looks like the letter p with a hat on top.
This becomes the first part of our confidence interval. The estimate of p is p̂.
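As a small illustration, the sample proportion can be computed by counting successes and dividing by the sample size. The function name and the five-person sample below are hypothetical.

```python
def sample_proportion(outcomes: list[bool]) -> float:
    """Count the successes and divide by the number of individuals."""
    if not outcomes:
        raise ValueError("the sample must contain at least one individual")
    return sum(outcomes) / len(outcomes)


# Hypothetical sample of five individuals, three of whom are successes.
sample = [True, False, True, True, False]
p_hat = sample_proportion(sample)
print(p_hat)
```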
Sampling Distribution of Sample Proportion
To determine the formula for the margin of error, we need to think about the sampling distribution of p̂. We will need to know the mean, the standard deviation, and the particular distribution that we are working with.
The number of successes in our sample follows a binomial distribution with n trials and probability of success p. Because p̂ is this count divided by n, the sampling distribution of p̂ is a rescaled binomial distribution with a mean of p and a standard deviation of (p(1 - p)/n)^0.5. There are two problems with this.
The first problem is that a binomial distribution can be very tricky to work with. The presence of factorials can lead to some very large numbers. This is where the conditions help us. As long as our conditions are met, we can approximate this distribution with a normal distribution that has the same mean and standard deviation.
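To get a feel for how close the normal approximation is, the exact binomial probabilities can be compared with the normal curve directly. The sketch below assumes n = 100 and p = 0.64, values chosen only for illustration, and applies no continuity correction, so a small gap between the two probabilities is expected.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.64  # hypothetical values chosen for illustration

# Exact probability of each possible count of successes k = 0, ..., n.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and standard deviation of p-hat = k / n under the exact distribution.
mean = sum((k / n) * prob for k, prob in enumerate(pmf))
sd = sqrt(sum((k / n - mean) ** 2 * prob for k, prob in enumerate(pmf)))

# Probability that p-hat is at most 0.70: exact sum versus normal approximation.
exact = sum(pmf[: 70 + 1])
approx = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)).cdf(0.70)

print(f"mean of p-hat: {mean:.4f}, sd of p-hat: {sd:.4f}")
print(f"P(p-hat <= 0.70): exact {exact:.4f}, normal approximation {approx:.4f}")
```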
The second problem is that the standard deviation of p̂ uses p in its definition. We would be estimating the unknown population parameter with a margin of error that depends on that very same parameter. This circular reasoning is a problem that needs to be fixed.
The way out of this conundrum is to replace the standard deviation with its standard error. Standard errors are based upon statistics, not parameters. A standard error is used to estimate a standard deviation. What makes this strategy worthwhile is that we no longer need to know the value of the parameter p.
To use the standard error, we replace the unknown parameter p with the statistic p̂. The result is the following formula for a confidence interval for a population proportion:
p̂ +/- z* (p̂(1 - p̂)/n)^0.5
Here the value of z* is determined by our level of confidence C. For the standard normal distribution, exactly C percent of the area under the curve lies between -z* and z*. Common values for z* include 1.645 for 90% confidence and 1.96 for 95% confidence.
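The formula can be sketched in a few lines using Python's standard library. The function names are hypothetical, and the confidence level is passed as a fraction (0.95 rather than 95). Since the area between -z* and z* is C, the area to the left of z* is (1 + C)/2, which is the value handed to the inverse of the standard normal cumulative distribution function.

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Critical value with `confidence` of the standard normal area between -z* and z*."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) for p-hat +/- z* (p-hat(1 - p-hat)/n)^0.5."""
    p_hat = successes / n
    margin = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


print(round(z_star(0.90), 3))
print(round(z_star(0.95), 3))

# Hypothetical sample: 30 successes out of 50 individuals.
print(proportion_interval(30, 50, confidence=0.90))
```

Computing z* this way gives slightly more decimal places than the rounded table values, so intervals may differ from hand calculations in the last digits.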
Let's see how this method works with an example. Suppose that we wish to know with 95% confidence the percent of the electorate in a county that identifies itself as Democratic. We conduct a simple random sample of 100 people in this county and find that 64 of them identify as a Democrat.
We see that all of the conditions are met. The estimate of our population proportion is 64/100 = 0.64. This is the value of the sample proportion p̂, and it is the center of our confidence interval.
The margin of error consists of two pieces. The first is z*. As we said, for 95% confidence, the value of z* is 1.96.
The other part of the margin of error is given by the formula (p̂(1 - p̂)/n)^0.5. We set p̂ = 0.64 and calculate the standard error to be (0.64(0.36)/100)^0.5 = 0.048.
We multiply these two numbers together and obtain a margin of error of 0.09408. The end result is:
0.64 +/- 0.09408,
or we can rewrite this as 54.592% to 73.408%. Thus we are 95% confident that the true population proportion of Democrats is somewhere in the range of these percentages. This means that in the long run, our technique and formula will capture the population proportion 95% of the time.
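The arithmetic of this example, and the long-run interpretation, can both be checked with a short script. This is only a sketch: the function names are hypothetical, and the simulation assumes a true proportion of 0.64 purely so that there is a known value to capture. Because the method rests on a normal approximation, the simulated capture rate should land near 95% rather than exactly on it.

```python
import random
from math import sqrt

Z_STAR = 1.96  # critical value for 95% confidence, as used in the example


def interval(successes: int, n: int) -> tuple[float, float]:
    """Return (lower, upper) for p-hat +/- z* (p-hat(1 - p-hat)/n)^0.5."""
    p_hat = successes / n
    margin = Z_STAR * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Reproduce the worked example: 64 successes in a sample of 100.
lower, upper = interval(64, 100)
print(f"{lower:.5f} to {upper:.5f}")


def coverage(true_p: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    hits = 0
    for _ in range(trials):
        # Draw n independent individuals, each a success with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval(successes, n)
        hits += low <= true_p <= high
    return hits / trials


# Assumed true proportion of 0.64, used only for the simulation.
print(coverage(true_p=0.64, n=100, trials=2000))
```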
There are a number of ideas and topics that are connected to this type of confidence interval. For instance, we could conduct a hypothesis test pertaining to the value of the population proportion. We could also compare two proportions from two different populations.