Introduction to Probability
Probability is a measure of uncertainty. By uncertainty we mean that we do not know what the outcome of a given input will be, or what will happen later. In other words, probability answers the question “How likely is it that an event will occur?” We all know the sun rises in the east and sets in the west. In this case, it is 100% likely that the sun will rise in the east and set in the west.
Let’s look at some real-world examples:
Weather forecasting – Have you ever thought about how a meteorologist forecasts the weather? How does a weather report conclude that there is an 80% chance that it will be cloudy or rainy tomorrow?
Sales forecasting – How does a company forecast future sales? By doing so it can predict future business trends, plan for future growth, etc. A sales forecast answers the question “How much of a particular product is likely to be sold in a specified future period at a certain price?”
Political forecasting – During an election campaign, “How can election outcomes be predicted from Twitter by using sentiment analysis?”
In the above examples, probability is the common concept used for forecasting. Some of the everyday terms we use when talking about probability are chance, likelihood, expectation, percentage and odds.
Probability can be defined in many ways. One common definition is “It is a mathematical measure which indicates the chance of occurrence of an event.” In other words, it is a measure of the likelihood that an event will occur. It ranges from 0 to 1, where 0 means the event cannot occur and 1 means it is certain to occur.
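To make the 0 to 1 range concrete, here is a minimal sketch that assumes the classical setting in which every outcome is equally likely (a fair six-sided die). The function name classical_probability is our own, chosen only for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Share of equally likely outcomes in sample_space that belong to event."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

die = [1, 2, 3, 4, 5, 6]                         # assumed: one roll of a fair die

p_even = classical_probability({2, 4, 6}, die)   # an ordinary event
p_seven = classical_probability({7}, die)        # an impossible event
p_any = classical_probability(set(die), die)     # a certain event

print(p_even, p_seven, p_any)
```

The impossible event sits at the bottom of the range, the certain event at the top, and every other event falls somewhere in between.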
Probability – Foundation of Statistics
Probability and statistics are two different mathematical areas that are closely interconnected and go side by side. Applying probability in statistics makes the outcomes more meaningful and understandable. Now we might be wondering, “How is probability the foundation of statistics?”
Let’s discuss a few of the concepts.
One common assumption we make during data analysis is “Our data should be normal.” (We do have alternatives if the data is not normal.) Now our question is, “How do we come to know whether our data is normal or not?” The answer is that the data must follow a normal distribution, which we can check by comparing the data with what that distribution predicts. The normal distribution is a probability distribution.
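As a rough illustration of why normality is a probability question, the sketch below compares the share of data lying within 1, 2 and 3 standard deviations of the mean with the probabilities a normal distribution assigns to those ranges. This is only an informal check, not a formal normality test, and both data sets are simulated with a fixed seed rather than taken from a real process. The helper names expected_share and observed_share are our own.

```python
import random
from statistics import NormalDist, mean, stdev

def expected_share(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

def observed_share(data, k):
    """Fraction of the data within k sample standard deviations of the sample mean."""
    centre, spread = mean(data), stdev(data)
    inside = sum(1 for x in data if abs(x - centre) <= k * spread)
    return inside / len(data)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
normal_like = [rng.gauss(50, 5) for _ in range(10_000)]      # simulated, not real data
uniform_like = [rng.uniform(0, 100) for _ in range(10_000)]  # deliberately non-normal

# Columns: k, normal probability, share in normal-like data, share in uniform data
for k in (1, 2, 3):
    print(k,
          round(expected_share(k), 4),
          round(observed_share(normal_like, k), 4),
          round(observed_share(uniform_like, k), 4))
```

If the observed shares stay close to the normal probabilities, the data is at least consistent with a normal distribution; large gaps, as with the uniform sample, suggest it is not.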
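Statistical inference is the process of making a decision about a population based on sample data. We often use the p-value in statistical process control (SPC), design of experiments (DOE), regression analysis, etc. to draw inferences about a process. “How do we draw an inference by using the p-value?” We compare the p-value with the significance level alpha (α): if the p-value is less than or equal to α, we reject the null hypothesis; otherwise we fail to reject it. Now our question is, “What is a p-value?” The p-value is a probability, so it lies between 0 and 1: it is the probability of seeing a result at least as extreme as the one observed, assuming the null hypothesis is true.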
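To show that a p-value really is a probability, here is a small sketch built on a hypothetical example of our own: a coin is flipped 100 times and lands heads 60 times, and we ask how likely a result at least that far from 50 would be if the coin were fair. The function names and the α of 0.05 are assumptions made for illustration.

```python
from math import comb

def fair_coin_p_value(heads, flips):
    """Two-sided p-value: chance, for a fair coin, of a head count at least
    as far from flips / 2 as the count that was observed."""
    expected = flips / 2
    observed_gap = abs(heads - expected)
    total = 0.0
    for k in range(flips + 1):
        if abs(k - expected) >= observed_gap:
            total += comb(flips, k) * 0.5 ** flips  # P(exactly k heads)
    return total

def decide(p_value, alpha=0.05):
    """Compare the p-value with alpha, as described in the text."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p_value = fair_coin_p_value(60, 100)   # hypothetical observation
print(round(p_value, 4), decide(p_value))
```

Because the p-value is a sum of probabilities of possible outcomes, it can never fall below 0 or rise above 1.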
Suppose that in a beverage manufacturing company, a quality inspector wants to know whether the diameter of the bottle caps is equal to 3 cm. If the diameter is not 3 cm, the caps won’t fit during the filling process. In such circumstances we conduct a hypothesis test: we collect samples from the process and compute a p-value from them. We then compare the p-value with our alpha value and draw the inference.
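The bottle cap check could be sketched as follows. Several things here are assumptions rather than part of the example above: the 50 diameters are simulated with a fixed seed, α is set to 0.05, and the p-value uses a normal approximation (a z-test). For small samples a t-test would be the more usual choice.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def one_sample_z_test(sample, target):
    """Two-sided test of H0: population mean == target (normal approximation)."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    z = (mean(sample) - target) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

TARGET_CM = 3.0   # required cap diameter from the text
ALPHA = 0.05      # assumed significance level

rng = random.Random(7)  # fixed seed so the sketch is repeatable
diameters = [rng.gauss(3.0, 0.02) for _ in range(50)]  # simulated, not real measurements

z, p_value = one_sample_z_test(diameters, TARGET_CM)
if p_value <= ALPHA:
    verdict = "reject H0: the mean diameter appears to differ from 3 cm"
else:
    verdict = "fail to reject H0: no evidence that the mean differs from 3 cm"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, {verdict}")
```

With real measurements, the inspector would replace the simulated list with the sampled diameters and keep the rest of the comparison unchanged.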