# Correlation Analysis

## Definition

**Correlation** is a statistical tool used to study the relationship between two or more variables, i.e. the degree to which the variables are associated with each other. In simpler words, it measures how closely the variables are related. For example, price and supply, demand and supply, and income and expenditure are correlated.

Suppose a manufacturing firm wants to know the relationship between production volume and the efficiency of its machinery. In this case, we can use correlation analysis.

## What are the types of correlation?

**Positive Correlation** – When the variables change in the same direction (both increase or both decrease together), they are said to be positively correlated. For example: the price of a good and its supply, hot weather and cold drink consumption, etc.

**Negative Correlation** – When the variables change in opposite directions (one increases while the other decreases), they are said to be negatively correlated. For example: alcohol consumption and life expectancy, smartphone usage and battery life, etc.

**Zero Correlation** – When there is no relationship between the variables (correlation = 0), they are said to have zero correlation. For example: HR recruitment and temperature, paper production and beverage production, etc.
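The three types can be told apart by the sign of the correlation coefficient r, which is discussed below. Here is a minimal Python sketch under stated assumptions. The helper names `pearson_r` and `correlation_type` are hypothetical. So is the tolerance `tol = 0.1` used to treat a small r as zero, and so are all the data values. None of these come from the original text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

def correlation_type(r, tol=0.1):
    """Label r as positive, negative or (approximately) zero.

    tol is a hypothetical cut-off chosen only for illustration.
    """
    if r > tol:
        return "positive"
    if r < -tol:
        return "negative"
    return "zero"

# Hypothetical data: daily temperature (°C) vs. cold drinks sold
temperature = [20, 24, 28, 32, 36]
cold_drinks = [110, 135, 160, 190, 205]
# Hypothetical data: hours of smartphone use vs. remaining battery (%)
usage_hours = [1, 2, 3, 4, 5]
battery_left = [90, 78, 61, 50, 35]
# Hypothetical data with no systematic pattern
x_unrelated = [1, 2, 3, 4, 5]
y_unrelated = [4, 2, 3, 2, 4]

for name, x, y in [
    ("temperature vs. cold drinks", temperature, cold_drinks),
    ("usage vs. battery", usage_hours, battery_left),
    ("unrelated", x_unrelated, y_unrelated),
]:
    r = pearson_r(x, y)
    print(f"{name}: r = {r:.3f} -> {correlation_type(r)}")
```

With these numbers the three pairs are labelled positive, negative and zero, respectively. Real data rarely give exactly r = 0, which is why a tolerance is used; the value 0.1 is arbitrary.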

Suppose that in a glass manufacturing plant we want to know whether temperature and chemical formulation are related, and how strong or weak that relationship is. Correlation analysis measures the strength and direction of the relationship between the two factors. On its own, however, it does not prove a cause-and-effect relationship. Correlation is also a basic building block of regression analysis.

The three types of correlation described above are usually illustrated with scatterplots. In a positive correlation, both variables change in the same direction: as one increases, the other increases too. A negative correlation is the opposite: as one variable increases, the other decreases. When there is no relationship between the variables, the data points are scattered without any pattern, and there is no correlation.

Typical uses and advantages of correlation analysis with a scatterplot:

• Applicable when the data are numerical.

• To explore possible cause-and-effect relationships between a pair of continuous variables (a correlation alone does not confirm causation).

• To identify outliers in a process.

• To examine whether a relationship exists between the variables.

• Easy visualization of the variables or factors.

• Plotting the graph is relatively simple.

• To track patterns or trends in the data.

• Useful for process optimization.

The strength and direction of a correlation are expressed by the **correlation coefficient (r)**. This is a **statistical measure** of the degree to which a change in the value of one variable is associated with a change in the value of another.

## Correlation methods

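**Karl Pearson’s Coefficient of Correlation** – It is widely used to measure the linear correlation between numeric variables. To find the relationship between two variables (say x and y), we can use the formula

$$
r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}
$$

where $\bar{x}$ and $\bar{y}$ are the means of x and y.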

For example, in the automobile manufacturing industry, we can check the relationship between the weight and the mileage of cars. We take the weight of each car as x and its mileage as y. We find the means of x and y and substitute the values into the formula above, which gives r = 0.73. This means that the weight and mileage of the cars are positively correlated.
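The deviation-from-the-mean calculation described above can be sketched in Python. The car data behind r = 0.73 are not given in the text. The paired values `x` and `y` below are therefore made up and do not reproduce that result. The function name `pearson_r` is also a hypothetical choice.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's r computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    x_bar = sum(x) / n              # mean of x
    y_bar = sum(y) / n              # mean of y
    dx = [xi - x_bar for xi in x]   # deviations (x_i - x̄)
    dy = [yi - y_bar for yi in y]   # deviations (y_i - ȳ)
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator

# Hypothetical paired observations (not the data behind r = 0.73)
x = [2, 4, 5, 7, 9]
y = [3, 5, 4, 8, 10]
print(round(pearson_r(x, y), 3))  # prints 0.952
```

Keeping the deviations in separate lists mirrors the formula term by term. In practice, a library routine such as NumPy's `corrcoef` would normally be used instead.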

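**Spearman’s Rank Correlation Coefficient** – It is used to study the degree of association between ranked variables. To find the relationship between two variables (say X and Y), we can use the formula

$$
r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}
$$

where D is the difference between the two ranks of each pair and n is the number of pairs. This form assumes there are no tied ranks.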

For example, in the steel industry, a manufacturer wants to find the relationship between material and labour costs based on expenditure. We rank the items by cost ($R_1$ for material and $R_2$ for labour) and subtract the ranks to get $D = R_1 - R_2$. Substituting these values into the formula above gives r = 0.89. This means that material and labour expenditure are strongly positively correlated.
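The ranking steps above can be sketched in Python, assuming no tied values. With ties, average ranks would be needed and the simple formula is no longer exact. The steel-industry figures are not given, so `material_cost` and `labour_cost` are made-up numbers. The helpers `ranks` and `spearman_r` are hypothetical names.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks; not handled in this sketch")
    order = sorted(values)
    # list.index is O(n) per lookup, which is fine for a small example
    return [order.index(v) + 1 for v in values]

def spearman_r(x, y):
    """Spearman's rank correlation via r = 1 - 6*sum(D^2) / (n*(n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    r1 = ranks(x)  # R1: ranks of the first variable
    r2 = ranks(y)  # R2: ranks of the second variable
    d_squared = sum((a - b) ** 2 for a, b in zip(r1, r2))  # sum of D^2, D = R1 - R2
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical expenditure figures (not the data behind r = 0.89)
material_cost = [120, 150, 90, 200, 170, 130]
labour_cost = [80, 95, 60, 110, 115, 70]
print(round(spearman_r(material_cost, labour_cost), 3))  # prints 0.886
```

Here the ranks differ by at most one place, so the sum of D² is 4 and r = 1 - 24/210.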

## Some important interpretations

→ The value of the correlation coefficient ‘r’ ranges from -1 to +1.

→ If r = +1, then the correlation between the two variables is said to be perfect and positive.

→ If r = -1, then the correlation between the two variables is said to be perfect and negative.

→ If r = 0, then there is no correlation between the variables. For Pearson’s r, this means no *linear* relationship.
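These interpretations can be checked numerically with a small Python sketch on made-up data. The helper name `pearson_r` and all values are hypothetical. An exact straight-line relationship gives r = +1 or r = -1. The last case shows that r = 0 does not mean the variables are unrelated: y = x² depends entirely on x, yet Pearson’s r is 0 because the relationship is not linear.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y_increasing = [2 * xi + 1 for xi in x]   # exact positive linear relation
y_decreasing = [-3 * xi + 4 for xi in x]  # exact negative linear relation
y_curved = [xi ** 2 for xi in x]          # exact but non-linear relation

print(pearson_r(x, y_increasing))  # prints 1.0
print(pearson_r(x, y_decreasing))  # prints -1.0
print(pearson_r(x, y_curved))      # prints 0.0
```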