To be a successful analyst, or to be part of a great analytics team, there are three important dimensions to develop: technical, business and tools. We begin with one sub-dimension of the technical skills, i.e. being a "Quantified Self", or developing quantitative skills.
According to INFORMS, analytics is defined as follows:
“Analytics is defined as the scientific process of transforming data into insight for making better decisions”
Analytics is highly quantitative in nature. Statistics and mathematics play a major role in drawing insights from data, and they give an analyst effective tools to summarize data quantitatively.
The Five Number Summary is one of the basic techniques for analysing a quantitative variable.
Anyone who does descriptive analytics or statistics most probably knows this technique. The Five Number Summary gives an analyst the Minimum, First Quartile, Median, Third Quartile and Maximum of a set of numerical data. Together, these five values describe how the data is distributed.
The five number summary of a set of observations on a single variable consists of the following statistics:
Maximum (max) – the largest observation
Upper Quartile (Q3) – a value that separates the largest 25% of the observations from the smallest 75%
Median (M) – a value that separates the largest 50% of the observations from the smallest 50%
Lower Quartile (Q1) – a value that separates the largest 75% of the observations from the smallest 25%.
Minimum (min) – the smallest observation
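As a minimal sketch of these definitions, each of the five statistics can be computed in R with a base function. The vector x below is a hypothetical example, not data from this post, and quantile() is used with its default settings.

```r
# Hypothetical example data (an assumption for illustration only)
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)

five_stats <- c(
  min    = min(x),                     # smallest observation
  q1     = unname(quantile(x, 0.25)),  # lower quartile
  median = median(x),                  # middle value
  q3     = unname(quantile(x, 0.75)),  # upper quartile
  max    = max(x)                      # largest observation
)
print(five_stats)
```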
We can use any statistical package to arrive at the Five Number Summary.
This technique helps an analyst draw insights from quantitative (numerical) data.
Dataset A below contains 10 days of sales for Company A.
Dataset A – 133,195,194,150,210,345,234,245,345,355
To calculate the Five Number Summary, you can use either R or Excel.
Method 1: type the following at the R prompt.
> five <- c(133, 195, 194, 150, 210, 345, 234, 245, 345, 355)  # store Dataset A in a vector
> summary(five)  # prints the minimum, quartiles, median, mean and maximum
The output is as follows:
Min – 133.0
1st Quartile (Lower Quartile Q1) – 194.2
Median – 222.0
Mean – 240.6 (reported by summary(), but not part of the Five Number Summary)
3rd Quartile (Upper Quartile Q3) – 320.0
Max – 355.0
The Lower Quartile Q1 – 194.2 states that about 25% of the sales values fall below 194.2 and about 75% fall above it.
The Upper Quartile Q3 – 320.0 states that about 75% of the sales values fall below 320.0 and about 25% fall above it.
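The value 194.2 is not one of the ten observations, because summary() interpolates between neighbouring sorted values. The sketch below reproduces that arithmetic by hand, assuming R's default quantile rule (position 1 + (n - 1) * p in the sorted data). The helper name quartile_by_hand is introduced here purely for illustration.

```r
five <- c(133, 195, 194, 150, 210, 345, 234, 245, 345, 355)

# Illustrative helper (not part of base R): linear interpolation at
# position 1 + (n - 1) * p, the default rule used by quantile().
quartile_by_hand <- function(x, p) {
  s <- sort(x)
  pos <- 1 + (length(s) - 1) * p   # fractional position in the sorted data
  lo <- floor(pos)
  hi <- ceiling(pos)
  s[lo] + (pos - lo) * (s[hi] - s[lo])
}

q1 <- quartile_by_hand(five, 0.25)  # position 3.25: 194 + 0.25 * (195 - 194) = 194.25
q3 <- quartile_by_hand(five, 0.75)  # position 7.75: 245 + 0.75 * (345 - 245) = 320
print(c(Q1 = q1, Q3 = q3))
```

summary() displays its results with four significant digits, which is why 194.25 appears as 194.2.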
Method 2:
Step 1 – Sort the data in ascending order.
Dataset B – 133,150,194,195,210,234,245,345,345,355
Step 2 – Split the sorted data into two halves.
Dataset C – 133,150,194,195,210
Dataset D – 234,245,345,345,355
Step 3 – Calculate the Five Number Summary
Min and Max can be easily identified: they are the first and last values in Dataset B.
Median – With an even number of observations, the median is the mean of the two middle values of Dataset B, i.e. (210 + 234) / 2 = 222
Lower Quartile Q1 – The lower quartile value is the median of the lower half of the data, i.e. Dataset C
Upper Quartile Q3 – The upper quartile value is the median of the upper half of the data, i.e. Dataset D
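Following Method 2, the results are as follows. They can be checked in R with the fivenum() function:
> fivenum(five)  # R command
Min – 133, Lower Quartile (Q1) – 194, Median – 222, Upper Quartile (Q3) – 345 and Max – 355
Q1 and Q3 differ from the summary() output above (194.2 and 320.0) because summary() interpolates between neighbouring observations, whereas this method takes the medians of the two halves.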
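The steps of Method 2 can be sketched in R as follows. This is a minimal illustration that assumes an even number of observations, so the sorted data splits cleanly into two halves; the variable names are chosen here for illustration only.

```r
five <- c(133, 195, 194, 150, 210, 345, 234, 245, 345, 355)

sorted <- sort(five)             # Step 1: ascending order (Dataset B)
n <- length(sorted)
lower <- sorted[1:(n / 2)]       # Step 2: lower half (Dataset C)
upper <- sorted[(n / 2 + 1):n]   #         upper half (Dataset D)

# Step 3: min, median of lower half, overall median, median of upper half, max
by_hand <- c(min(sorted), median(lower), median(sorted), median(upper), max(sorted))
print(by_hand)
print(fivenum(five))             # built-in function, for comparison
```

For this dataset the hand calculation and fivenum() give the same five values.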
We can also explore the same Five Number Summary using a Box Plot. We will see that in the coming post.