3 Ways of Collecting Statistical Samples

Where does our data come from?

Almost any data project involves some form of statistical sampling, simply because there must be data in order to perform analysis. Imagine for a minute that you want to see whether an athlete’s height correlates with their speed, or whether time spent playing video games correlates with myopia. In each of these scenarios, you must select the participants in a way that does not introduce unwanted bias. Below is an overview of three different statistical sampling methods.

Simple Random Sampling

Simple random sampling means that every individual in the population has an equal chance of being selected for the sample. This method is one of the most commonly used ways of obtaining a sample.

Simple random sampling, along with the other two methods that follow, will be examined in the context of the following research question: Should schools in the state of New Jersey start later? For simple random sampling, the benefits include the lack of bias that randomness provides and the fact that the procedure itself is not particularly arduous or long to carry out. Of course, there are drawbacks, including cost and time. Since it may be beneficial to avoid the biases that come with online surveys, it could cost large sums of money to purchase a full database of the population of New Jersey to sample from.

How would we actually implement this method? One way to carry out simple random sampling is to randomly assign each member of the population a unique number from 1 to n, where n is the population size. Then take the members assigned the numbers 1 through x as the sample, where x is the desired sample size (it is important to make sure that the sample is large enough to avoid high variability). Because the numbers were assigned at random, every member has the same chance of landing in the sample, which satisfies the definition of a simple random sample.
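
The labeling procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name simple_random_sample, the seed parameter, and the made-up list of 10,000 resident identifiers are hypothetical and not part of any real New Jersey dataset.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Randomly label members 1..n, then keep those labelled 1..sample_size."""
    if not 0 <= sample_size <= len(population):
        raise ValueError("sample_size must be between 0 and the population size")
    rng = random.Random(seed)  # seed is optional; it only makes runs repeatable
    labels = list(range(1, len(population) + 1))
    rng.shuffle(labels)  # member i receives labels[i]; all labelings are equally likely
    return [member for label, member in zip(labels, population)
            if label <= sample_size]

# Hypothetical population of 10,000 residents, identified by placeholder names.
residents = [f"resident_{i}" for i in range(10_000)]
srs = simple_random_sample(residents, 500, seed=42)
print(len(srs))  # 500
```

In practice, Python's built-in random.sample(residents, 500) draws the same kind of sample in a single call; the longer version is shown only because it mirrors the labeling steps one by one.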

Proportional Stratified Sampling

Proportional stratified sampling requires the population to be divided into subsets called strata, after which simple random sampling is carried out within each stratum. The sample size drawn from each stratum is directly proportional to that stratum’s share of the population.

For example, say one wanted to survey 1000 people across two strata. Stratum 1 makes up 75% of the population while Stratum 2 makes up 25%. By the definition of proportional stratified sampling, 750 of the 1000 respondents would come from Stratum 1, while the other 250 would come from Stratum 2.
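
The arithmetic in this example can be written as a small helper. The sketch below is one possible approach, not a standard library routine: the function name proportional_allocation is hypothetical, and the largest-remainder rule used to hand out leftover seats is my own choice for the case where the shares do not divide evenly.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to each stratum's size."""
    total = sum(stratum_sizes.values())
    if total <= 0:
        raise ValueError("the strata must contain at least one member")
    # Whole-number part of each stratum's exact share (integer arithmetic only).
    quotas = {name: sample_size * size // total
              for name, size in stratum_sizes.items()}
    remainders = {name: sample_size * size % total
                  for name, size in stratum_sizes.items()}
    # Assumption: leftover seats go to the strata with the largest remainders.
    leftover = sample_size - sum(quotas.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[name] += 1
    return quotas

# The sizes only need to be in the right ratio, so 75 and 25 stand in for 75% and 25%.
allocation = proportional_allocation({"Stratum 1": 75, "Stratum 2": 25}, 1000)
print(allocation)  # {'Stratum 1': 750, 'Stratum 2': 250}
```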

This sampling method keeps its random element, as simple random sampling still takes place within the stratified process. Additionally, there is more potential to reduce bias, since the strata help ensure that each part of the population is heard. However, working with strata can also backfire, as deciding which strata to use, or whether meaningful strata even exist in the population, is difficult. If members are assigned to the wrong stratum, statistical results can be skewed, especially if a stratum ends up severely overrepresenting part of the population.

In the context of the New Jersey question, the strata can be defined as high-income families (which I define as more than $100,000 in annual income) and low-income families. Families with different economic backgrounds may view changing the school start time differently, as some people may need to work longer hours. According to the United States Census Bureau, in 2012, 17.5% of families in New Jersey made more than $100,000 while 82.5% of families in New Jersey made less than $100,000. Therefore, if we were to sample 1000 parents, it would be ideal to have 82.5% of the sample (825 parents) come from families with income under $100,000 and 17.5% of the sample (175 parents) come from families with income over $100,000.
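
To make the procedure concrete, here is a minimal sketch of drawing such a sample in Python. Everything about the data is an assumption for illustration: the 10,000 families, their placeholder incomes (150,000 and 60,000), and the names stratified_sample and income_stratum are hypothetical, with the population built so that 17.5% fall in the high-income stratum.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Group members into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = {}
    for name, members in strata.items():
        # Proportional quota, rounded to a whole person. With awkward
        # proportions the rounded quotas may not add up exactly to sample_size.
        quota = round(sample_size * len(members) / len(population))
        sample[name] = rng.sample(members, quota)
    return sample

def income_stratum(family):
    """Classify a (family_id, annual_income) pair using the $100,000 cutoff."""
    return "high" if family[1] > 100_000 else "low"

# Hypothetical population: 1,750 high-income and 8,250 low-income families.
families = [(i, 150_000 if i < 1_750 else 60_000) for i in range(10_000)]
parents = stratified_sample(families, income_stratum, 1000, seed=7)
print(len(parents["high"]), len(parents["low"]))  # 175 825
```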

Convenience Sampling

A convenience sample is one that does not rely on any type of probability. Rather, the sample is simply drawn from a segment of the population that is close at hand, so it is not random in nature. Convenience samples are usually avoided when rigor matters, as the chance of bias is high compared to probability-based samples. However, they do offer financial and time benefits, as one only needs to collect the sample from members of the population that are in close proximity.

Going back to the New Jersey question, say the person collecting the sample resided in Princeton and wanted to interview 100 people. Said person would only need to walk outside, maybe on Nassau Street, and ask the first 100 people about their opinion on school start times. This could be done in less than an hour, while other methods require precise calculation before they can be implemented.
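
For contrast with the random methods, the Nassau Street approach amounts to nothing more than taking the first people encountered. The short sketch below assumes a hypothetical list of passersby in the order they were met; the names convenience_sample and passersby are made up for illustration.

```python
def convenience_sample(people_in_order_met, sample_size):
    """Take the first sample_size people encountered; nothing is randomized."""
    return people_in_order_met[:sample_size]

# Hypothetical passersby, listed in the order the interviewer meets them.
passersby = [f"passerby_{i}" for i in range(1, 501)]
interviewed = convenience_sample(passersby, 100)
print(interviewed[0], interviewed[-1])  # passerby_1 passerby_100
```

Note that anyone who is not on that street at that time has no chance of being selected, which is exactly where the bias comes from.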

Decision?

Out of these three sampling methods, the proportional stratified method will most likely produce the best results. By using strata, it guarantees that families from each income group are included in proportion to their share of the population, which lowers the risk that one group is over- or underrepresented by chance. Generally, proportional stratified sampling can provide a level of precision that not many other sampling methods can offer.

Conclusion

I hope you have learned a little bit more about how we collect data. Sampling was the first topic I researched that even remotely related to statistics, and I found it extremely interesting! Personally, I have found this useful in my own research, where I incorporated sampling to answer a research question about different school learning methods. Statistical sampling will always be powerful because it is a necessary step for analysis!

18 || High School Senior || Stats & Math

Albert Ming
