Introduction to Data Science
We live in the era of the “Internet of Things”, where data is produced every fraction of a second. According to an estimate reported by Forbes, by 2020 about 1.7 MB of data would be created every second for every person on earth. We use applications and visit websites for many purposes: online streaming, e-commerce, transactions, social media, and so on. These activities generate large volumes of data (known as big data), and we use data science to gain insights from it and unlock its real value.
While watching YouTube or any other streaming site, we often see content in the recommended section that matches our interests. Similarly, shopping sites recommend products that are likely to interest us. Have you wondered how this happens? Simply by applying data science.
What is Data Science?
Data science is the practice of applying various techniques (data mining, machine learning, business analytics, etc.) to gain better insights into any kind of process (whether manufacturing or service) and to predict future scenarios. In short, data science goes hand in hand with machine learning, data mining and business analytics.
Common disciplines used in Data Science
- Business understanding – The first step is to understand the business processes with regard to different domains (sales, production, R&D, etc.). Here we
→ Define the project objective
→ Define the goal to achieve
→ Identify the data sources
- Data understanding – Here we focus on collecting, describing and exploring data to gain initial insights. We use descriptive statistics and visualization to look for relationships between the dependent and independent variables. In simple terms, we collect data from various sources and visualize it to see how the variables relate to each other.
- Data preparation – This is the process of converting messy or inconsistent data into a form that is useful for further analysis. It is a crucial part of the data science process: without proper data preparation, we cannot gain reliable insights. It is also often the most time-consuming step. It involves
→ Data cleaning
→ Data integration
→ Data transformation
→ Data reduction
→ Data discretization
- Data modelling – In this phase, we develop models, which may be predictive or descriptive. We choose machine learning algorithms depending on the available data and the requirements. The resulting models help analyze the data in ways that meet the business requirements.
- Model evaluation – After creating a model, we need to evaluate whether it represents the data well and choose the best one from the available alternatives. Evaluation also checks that the model properly addresses the problem. If a model has a fault, we take diagnostic measures. Hence model evaluation is an integral part of the model development process.
- Deployment – Once the model has been evaluated and performs satisfactorily against the specified requirements, we move on to deployment.
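To make the middle phases more concrete, here is a minimal sketch in Python (standard library only) that walks through data understanding, data preparation, modelling and evaluation on a tiny data set. Everything in it is an assumption made for illustration: the numbers are made up, the "advertising spend vs. sales" framing is hypothetical, and a straight-line fit with a simple hold-out split stands in for whatever model and validation scheme a real project would choose.

```python
import statistics

# Hypothetical toy data: (advertising spend, sales). One value is missing.
raw = [(1.0, 3.1), (2.0, 4.9), (3.0, None), (4.0, 9.2), (5.0, 10.8),
       (6.0, 13.1), (7.0, 15.0), (8.0, 16.9), (9.0, 19.2), (10.0, 20.8)]

# Data understanding: how many records are incomplete?
n_missing = sum(1 for x, y in raw if x is None or y is None)

# Data preparation (cleaning): drop incomplete records.
# Dropping rows is only one possible strategy; imputation is another.
clean = [(x, y) for x, y in raw if x is not None and y is not None]
xs = [x for x, _ in clean]
ys = [y for _, y in clean]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    ss_a = sum((p - mean_a) ** 2 for p in a)
    ss_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (ss_a * ss_b) ** 0.5


# Data understanding: descriptive statistics and strength of the relationship.
mean_sales = statistics.fmean(ys)
sd_sales = statistics.stdev(ys)
r = pearson(xs, ys)


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    px = [x for x, _ in points]
    py = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(px), statistics.fmean(py)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x in px)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Data modelling: fit on the first six records, hold out the rest.
train, test = clean[:6], clean[6:]
slope, intercept = fit_line(train)

# Model evaluation: mean absolute error on the held-out records.
errors = [abs(y - (slope * x + intercept)) for x, y in test]
mae = statistics.fmean(errors)

print(f"missing records: {n_missing}, usable records: {len(clean)}")
print(f"mean sales: {mean_sales:.2f}, std dev: {sd_sales:.2f}, r: {r:.3f}")
print(f"model: y = {slope:.2f} * x + {intercept:.2f}, hold-out MAE: {mae:.2f}")
```

The point of the hold-out split is that the model is judged on records it did not see during fitting, which is what the evaluation phase is meant to check before deployment.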
Statistical Concepts used in Data Science
What role does statistics play in Data Science?
Data science is one of the emerging and fast-growing areas in the present era of analytics. Two major domains contribute heavily to it: statistics and computer science. Statistics contributes to the discipline in a fundamental way. From understanding the problem to deploying the model, statistics plays a significant part, including in activities such as
• Experimental design
• Statistical modelling – predictive or descriptive models
• Statistical data analysis…
To carry out these activities, one needs foundational statistical knowledge: “What do these methods mean?” “Where and when can we apply them?” Hence statistics plays a pivotal role in data science.
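As a small illustration of how experimental design and statistical data analysis fit together, the sketch below randomly assigns units to a control and a treatment group, then compares the two groups with Welch's t statistic. The unit count, the seed and all outcome values are hypothetical numbers invented for this example, and the sketch stops at the test statistic rather than computing a p-value.

```python
import random
import statistics

# Experimental design: randomly assign 20 hypothetical units to two groups.
rng = random.Random(42)  # fixed seed so the assignment is reproducible
units = list(range(20))
rng.shuffle(units)
control_ids = sorted(units[:10])
treatment_ids = sorted(units[10:])

# Hypothetical outcomes recorded for each group (made-up numbers).
control = [12.1, 11.8, 12.5, 11.9, 12.3, 12.0, 11.7, 12.4, 12.2, 11.9]
treatment = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0, 13.3, 12.6, 13.2, 12.9]


def welch_t(a, b):
    """Welch's t statistic for the difference in means (b minus a)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.fmean(b) - statistics.fmean(a)) / std_err


# Statistical data analysis: describe each group, then compare them.
diff = statistics.fmean(treatment) - statistics.fmean(control)
t_stat = welch_t(control, treatment)

print(f"control units:   {control_ids}")
print(f"treatment units: {treatment_ids}")
print(f"difference in means: {diff:.2f}, Welch t: {t_stat:.2f}")
```

Random assignment is what lets us attribute a difference between the groups to the treatment rather than to how the groups were chosen. A large absolute t value suggests the difference is unlikely to be noise alone, but interpreting it properly requires the statistical grounding described above.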