1.0.Data Science Introduction

vamsi krishna
Nov 27, 2021

1.0.Introduction to Data Science

What is data?

What are variables?

What is statistics?

What is Analytics?

How are data, variables, statistics, and analytics related?

Data:

Data is a collection of facts recorded in raw form. Raw data is unorganized.

Data is further classified into three main categories:

  1. Structured
  2. Semi-structured
  3. Unstructured

Structured data:

Data stored in a table format of rows and columns is called structured data. It is generally stored in relational databases such as MySQL.
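A minimal sketch of structured data, assuming Python's built-in `sqlite3` module with an in-memory database as a stand-in for a relational database such as MySQL. The `students` table and its rows are hypothetical. Because every row has the same columns, the data can be queried directly with SQL.

```python
import sqlite3

# In-memory SQLite database standing in for a relational database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, height_cm REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Asha", 21, 162.5), ("Ravi", 23, 175.0), ("Meena", 22, 158.2)],
)

# Every row shares the same columns, so a plain SQL query works.
rows = conn.execute(
    "SELECT name, age FROM students WHERE age > 21 ORDER BY age"
).fetchall()
print(rows)  # [('Meena', 22), ('Ravi', 23)]
conn.close()
```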

Semi-structured data:

Semi-structured data is not completely structured, but it is easier to analyze than unstructured data, and with some processing it can be stored in relational databases. Ex: email, JSON, XML, HTML.
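The "some processing" step can be sketched with Python's standard `json` module. The email-like record below is hypothetical: it has labels (keys), but the `to` field holds a variable-length list, so it does not map directly onto fixed columns. Flattening it into one row per recipient is one possible way (an assumption, not the only design) to make it fit a relational table.

```python
import json

# Hypothetical email-like JSON record: labelled fields, but a nested list.
raw = """
{"id": 1, "from": "a@example.com",
 "to": ["b@example.com", "c@example.com"],
 "subject": "Meeting"}
"""

record = json.loads(raw)

# Flatten into fixed-width rows (id, sender, recipient, subject),
# one row per recipient, so the data fits a relational table.
rows = [
    (record["id"], record["from"], recipient, record["subject"])
    for recipient in record["to"]
]
for row in rows:
    print(row)
```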

Unstructured data:

Unstructured data does not fit into relational databases (rows and columns); examples include text files, audio files, videos, and images. Most of the data available in the world is unstructured.

Information:

What we understand from raw data is called information. Information is structured and organized. Statistics helps us extract information from data.

Variables:

A variable is a characteristic recorded for each observation (data point). In a table, each column is a variable, each row is an observation, and each cell holds the value of one variable for one observation; all of these values together make up the data.
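To make the row/column picture concrete, here is a small sketch using plain Python lists and dictionaries (a hypothetical table, no library assumed): each dictionary is one observation, each key is one variable, and collecting one key across all rows gives every value of that variable.

```python
# Hypothetical table: each dict is one observation (row),
# each key is one variable (column), each value is one cell.
table = [
    {"name": "Asha", "country": "India", "age": 21, "height_cm": 162.5},
    {"name": "Ravi", "country": "Nepal", "age": 23, "height_cm": 175.0},
]

variables = list(table[0].keys())
n_observations = len(table)

# All values of a single variable, taken across every observation.
ages = [row["age"] for row in table]

print(variables)        # ['name', 'country', 'age', 'height_cm']
print(n_observations)   # 2
print(ages)             # [21, 23]
```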

Variables are broadly classified into two types:

  • Qualitative variables
  • Quantitative variables

Qualitative variables:

Qualitative variables, also called categorical variables, record data as labels or words (categories) rather than as measured numbers.

Categorical data is further divided into two types:

  • Nominal
  • Ordinal

Nominal:

Nominal data is categorical data without any order, such as names of countries, political parties, blood groups, gender, or people's names.

Here each value indicates a distinct category and does not imply any rank.

Ordinal:

Ordinal data is categorical data whose categories have a rank or order. Examples: education levels (PhD, MSc, BTech, PUC, 10th), financial status (rich, upper middle class, lower middle class, poor).
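The difference between nominal and ordinal data shows up when encoding them for analysis. A minimal sketch, assuming the education order listed above (lowest to highest) and a hypothetical list of blood groups: ordinal categories can be mapped to ranks and sorted, while nominal categories are commonly one-hot encoded so that no artificial order is introduced.

```python
# Ordinal: categories have a natural order, so they can be mapped to ranks.
# Order taken from the education levels above, lowest to highest.
education_order = ["10th", "PUC", "BTech", "MSc", "PhD"]
education_rank = {level: rank for rank, level in enumerate(education_order)}

people = ["MSc", "10th", "PhD"]
sorted_people = sorted(people, key=education_rank.get)
print(sorted_people)  # ['10th', 'MSc', 'PhD']

# Nominal: categories have no order, so ranking them would be meaningless.
# One-hot encoding gives each category its own 0/1 indicator instead.
blood_groups = ["A", "B", "AB", "O"]

def one_hot(value):
    """Return a 0/1 indicator list with a 1 at the position of `value`."""
    return [1 if value == group else 0 for group in blood_groups]

print(one_hot("AB"))  # [0, 0, 1, 0]
```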

Quantitative variables:

Quantitative variables are numerical: their values are numbers that represent measurable quantities.

Quantitative variables are further divided into two categories:

  • Discrete
  • Continuous

Discrete:

Discrete variables take only separate, countable values, typically integers. Examples include the number of students in a class or the number of languages spoken in a country.

Continuous:

Continuous variables can take any real value within a range, including fractional (floating-point) values. Examples: height, distance.
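A small sketch contrasting the two, using hypothetical values: counts are whole numbers, measurements can be fractional. It also shows a common point of confusion: a summary of a discrete variable (such as its mean) can be fractional even though no single observation can be.

```python
# Discrete: values are counted, so only whole numbers occur (hypothetical data).
students_per_class = [32, 28, 35]

# Continuous: values are measured, so fractional values occur (hypothetical data).
heights_cm = [162.5, 175.0, 158.2]

all_whole = all(float(x).is_integer() for x in students_per_class)
any_fractional = any(not float(x).is_integer() for x in heights_cm)

# The mean class size is a summary, not a possible class size.
mean_students = sum(students_per_class) / len(students_per_class)

print(all_whole, any_fractional)  # True True
print(round(mean_students, 2))    # 31.67
```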

Statistics:

In simple terms, statistics is about how to get information from data.

Statistics uses analytics to test the hypotheses it makes, and it is further divided into two types:

  • Descriptive stats
  • Inferential Stats

Descriptive Stats:

Descriptive statistics talks about what is in the data (sample data): for example, how many rows and columns a table has and their data types; for numerical columns, the mean, median, and percentiles; and for categorical columns, the mode and the frequency of each value.
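These summaries can be computed with Python's standard `statistics` module and `collections.Counter`. A minimal sketch on a hypothetical sample with one numerical and one categorical column; note that `statistics.quantiles` uses its default "exclusive" method here, and other tools may interpolate percentiles slightly differently.

```python
import statistics
from collections import Counter

# Hypothetical sample: one numerical and one categorical column.
heights_cm = [158, 162, 165, 170, 175, 180, 181]
blood_groups = ["O", "A", "O", "B", "AB", "O", "A"]

# Numerical column: mean, median, and quartiles (25th, 50th, 75th percentiles).
mean_h = statistics.mean(heights_cm)
median_h = statistics.median(heights_cm)
quartiles = statistics.quantiles(heights_cm, n=4)

# Categorical column: mode and frequency of each value.
mode_bg = statistics.mode(blood_groups)
freq = Counter(blood_groups)

print("mean:", round(mean_h, 2))
print("median:", median_h)       # 170
print("quartiles:", quartiles)   # [162.0, 170.0, 180.0]
print("mode:", mode_bg)          # O
print("frequencies:", dict(freq))
```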

Inferential Stats:

Inferential statistics uses descriptive statistics computed on a sample to draw conclusions about the population, build and test hypotheses, or make decisions.
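One common inferential step is estimating a population mean from a sample. A minimal sketch, assuming a hypothetical sample and a normal approximation (z of about 1.96 for a 95% interval); for small samples like this one, a t-distribution interval would be the more careful choice.

```python
import math
import statistics

# Hypothetical sample of heights (cm) drawn from a larger population.
sample = [158, 162, 165, 170, 175, 180, 181, 168, 172, 166]

n = len(sample)
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)        # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)

# Normal approximation: z value for a two-sided 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
low = sample_mean - z * standard_error
high = sample_mean + z * standard_error

print(f"Sample mean: {sample_mean:.1f} cm")
print(f"Approximate 95% CI for the population mean: ({low:.1f}, {high:.1f}) cm")
```

The sample mean describes only the sample; the interval is the inferential part, a statement about the unseen population.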

Analytics:

Analytics is the process of analyzing data to validate statistical hypotheses.

Analytics is further classified into four main categories:

  • Descriptive
  • Diagnostic
  • Predictive
  • Prescriptive

Descriptive Analytics:

Descriptive analytics describes what is in the existing or available data, so the examples from descriptive statistics apply here too. One example is exploratory data analysis (EDA), the process of exploring what is in the data: data types, missing data, outliers, correlations, and so on.

Diagnostic analytics:

Diagnostic analytics investigates the data to find out why something happened. It answers questions such as why an event occurred and what caused it.
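One simple diagnostic technique is drilling down: after seeing that a total changed, break it into parts to find which part drove the change. A minimal sketch on hypothetical monthly sales by region; a drill-down shows where the change came from, and further investigation is usually needed to establish the underlying cause.

```python
# Hypothetical sales by region for two consecutive months.
sales = {
    "North": {"before": 120, "after": 118},
    "South": {"before": 100, "after": 60},
    "East": {"before": 90, "after": 92},
}

# What happened: total sales fell.
total_before = sum(r["before"] for r in sales.values())
total_after = sum(r["after"] for r in sales.values())

# Why it happened (first step): which region drove the change?
change_by_region = {region: r["after"] - r["before"] for region, r in sales.items()}
main_driver = min(change_by_region, key=change_by_region.get)

print("total change:", total_after - total_before)  # -40
print("change by region:", change_by_region)
print("main driver:", main_driver)                   # South
```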

Predictive Analytics:

Predictive analytics uses patterns in existing data to make predictions about the unknown.

Here we use machine learning: first we train a machine learning algorithm on known data, then we use it to make predictions on new (unknown) data.

Prescriptive Analytics:

Prescriptive analytics is the highest level of analytics; it addresses the question "What should we do?" For example, Google Maps suggests the best route based on its predictions of traffic.
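The step from prediction to prescription can be sketched as choosing the option that optimizes an objective. This toy example is not how Google Maps works internally; the route names and predicted travel times are hypothetical stand-ins for the output of a predictive model.

```python
# Hypothetical output of a predictive model: expected travel time (minutes)
# for each candidate route under current traffic.
predicted_minutes = {
    "Route A (highway)": 34.0,
    "Route B (city center)": 41.5,
    "Route C (ring road)": 29.5,
}

# Prescriptive step: turn predictions into a recommendation by picking
# the option that minimizes the objective (here, travel time).
best_route = min(predicted_minutes, key=predicted_minutes.get)
print(f"Suggested: {best_route} ({predicted_minutes[best_route]} min)")
```

Real systems usually optimize several objectives at once (time, distance, tolls), but the pattern of "predict outcomes, then choose the best action" is the same.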

Data Analysis:

Data analysis is a subfield of data analytics that refers to specific actions, such as analyzing variables.

There are three main types of data analysis:

  • Univariate
  • Bivariate
  • Multivariate
