Do you aspire to be a data scientist? Then it is essential that you have a solid understanding of the core concepts of statistics. Not everyone has a Ph.D. in Statistics, and that isn’t the only way to excel in the field of data science. But yes, knowing stats well is a prerequisite for data science.
Nowadays, popular libraries like TensorFlow free the user from the intricacies of complex mathematics. Still, it is advisable to be familiar with the fundamental principles on which they work, because that will enable you to use the libraries better.
In this blog, we attempt to shed light on some basic concepts, theorems and equations of statistics for data science.
Statistical distributions are important tools that you must arm yourself with to be a skilled data scientist. Here, we shall talk about two important distributions, namely the Poisson distribution and the Binomial distribution.
The Poisson distribution is used to find the probability of a given number of events occurring during a fixed interval of time, when the events happen independently at a known average rate. Examples include the number of page views in one second, the number of phone calls in a particular period of time, and the number of sales per hour.
The probability of observing exactly x events in an interval is P(x; λ) = (e^(-λ) λ^x) / x!
The symbols used in the equation are:
x: exact number of successes
e: constant equal to 2.71828 approximately
λ: average number of successes per time interval
Poisson distribution is used for calculating losses in manufacturing. Consider a machine that produces metal sheets with ‘x’ flaws per yard, and suppose the average flaw rate is 2 per yard of sheet (λ = 2). Applying this information to the Poisson distribution, we can calculate the probability of having exactly two flaws in a yard.
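To make the metal sheet example concrete, here is a minimal Python sketch that evaluates the standard Poisson probability formula for λ = 2 and x = 2. The function name `poisson_pmf` is our own choice for illustration and uses only the standard library.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the average rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


# Rate taken from the example above: 2 flaws per yard on average.
p_two_flaws = poisson_pmf(2, 2.0)
print(f"P(exactly 2 flaws in a yard) = {p_two_flaws:.4f}")
```

The result is roughly 0.27, so a yard with exactly two flaws turns up a little more than a quarter of the time.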
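Poisson distribution is also used for faster detection of anomalies: when event counts normally follow a known average rate λ, a count that is very unlikely under that rate can be flagged as a possible anomaly.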
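One simple way to use the Poisson distribution for anomaly detection is a tail-probability check, sketched below. This is only an illustration, not a production method: the helper names, the expected rate of 2 events per interval, and the 0.001 threshold are all assumptions made for the example.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x events when the average rate is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_tail(k: int, lam: float) -> float:
    """Probability of seeing k or more events, i.e. P(X >= k)."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def is_anomalous(count: int, lam: float, alpha: float = 0.001) -> bool:
    """Flag a count whose upper-tail probability falls below alpha (assumed threshold)."""
    return poisson_tail(count, lam) < alpha


expected_rate = 2.0  # hypothetical average number of events per interval
for observed in (3, 10):
    print(observed, is_anomalous(observed, expected_rate))
```

With these assumed values, a count of 3 is unremarkable while a count of 10 is flagged, because ten or more events are extremely unlikely when the average is 2.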
The Binomial distribution is a very common distribution in Statistics. Suppose you have flipped a coin thrice. Using basic combinatorics, we see that there are eight possible combinations of heads and tails. We find the probabilities of getting 0, 1, 2 or 3 heads and plot them on a graph. This gives us the binomial distribution for this particular problem. Note that when the number of trials is large, the Binomial distribution curve looks similar to a Normal distribution curve. The Normal distribution is used when values are continuous, and the Binomial distribution is used for discrete values.
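The coin-flip example can be checked by listing all eight outcomes and counting the heads in each. The sketch below assumes a fair coin and uses only the Python standard library; the variable names are ours.

```python
from collections import Counter
from itertools import product

# Every possible sequence of three flips, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# How many sequences contain 0, 1, 2 or 3 heads.
heads_counts = Counter(seq.count("H") for seq in outcomes)

# With a fair coin every sequence is equally likely, so divide by the total.
distribution = {k: heads_counts[k] / len(outcomes) for k in range(4)}

for k, prob in distribution.items():
    print(f"P({k} heads) = {prob}")
```

The probabilities come out as 1/8, 3/8, 3/8 and 1/8 for 0, 1, 2 and 3 heads, which are the values you would plot.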
Binomial distribution is a discrete probability distribution where the number of trials is fixed in advance and each trial has two possible outcomes: success or failure, win or lose, gain or loss. When certain conditions hold, namely that the trials are independent and identical, the total number of trials is large, and the probability of success is not too close to 0 or 1, the binomial distribution can be approximated by a normal distribution.
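To see how close the normal approximation can be, the sketch below compares binomial probabilities with the normal density that has the same mean, np, and standard deviation, sqrt(np(1-p)). The values n = 100 and p = 0.5 are hypothetical choices for illustration, and the function names are ours.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


n, p = 100, 0.5                      # hypothetical trial count and success probability
mu = n * p                           # mean of the binomial
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial

for k in (40, 45, 50, 55, 60):
    print(k, round(binomial_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
```

For these values the two columns agree closely. Repeating the comparison with a small n, or with p very near 0 or 1, should show a visibly worse match.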
Binomial distribution has many applications in business. For example, it is estimated that 5% of tax returns for individuals with high net worth in the USA are fraudulent. These frauds might be uncovered through audits. Binomial distribution is used to find, for ‘n’ audited tax returns, the probability that a given number of fraudulent returns, say 5, will be uncovered.
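The tax audit question can be worked through with the binomial formula P(k) = C(n, k) p^k (1-p)^(n-k). In the sketch below, the 5% fraud rate and the target of 5 fraudulent returns come from the example, while the number of audited returns (n = 100) is a hypothetical value chosen only for illustration. It also assumes that every fraudulent return that is audited gets uncovered.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


fraud_rate = 0.05   # from the example: 5% of returns are fraudulent
n_audited = 100     # hypothetical number of audited returns
k_found = 5         # from the example: exactly 5 fraudulent returns

prob = binomial_pmf(k_found, n_audited, fraud_rate)
print(f"P(exactly {k_found} fraudulent in {n_audited} audits) = {prob:.3f}")
```

Under these assumptions the probability is about 0.18. Changing `n_audited` shows how the chance of uncovering exactly 5 frauds depends on how many returns are audited.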
There are some more probability distributions, like the Bernoulli and Geometric distributions. We shall cover these and more in the following blogs. So, stay tuned and follow DexLab Analytics. The experts here offer top-quality data science courses in Delhi. Go through the data science certification details right now!