Sampling Methods

Sampling methods, also called sampling techniques, fall into two broad groups:

  1. Probability Sampling Techniques
    Simple Random Sampling
    Systematic Sampling
    Stratified Sampling
    Cluster Sampling
  2. Non-probability Sampling Techniques
    Convenience Sampling
    Judgmental Sampling
    Snowball Sampling
    Quota Sampling

Market Basket Analysis made simple

Market Basket Analysis helps to discover associations between items in customers' purchases. It reveals patterns like:

"Customers who buy bread often also buy butter"

This means: Bread → Butter

Such insights are useful for product placement, promotions and increasing sales.

Three metrics help us to evaluate such rules.
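The excerpt does not name the three metrics. The sketch below assumes they are support, confidence and lift, which are the metrics commonly used to evaluate association rules. The transactions are hypothetical and invented purely for illustration, and the function names are my own.

```python
# Hypothetical shopping baskets, invented for illustration only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence of the rule divided by the baseline support of `consequent`."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


# Evaluate the rule Bread -> Butter on the hypothetical baskets.
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
rule_lift = lift({"bread"}, {"butter"}, transactions)

print(f"support    = {rule_support:.2f}")
print(f"confidence = {rule_confidence:.2f}")
print(f"lift       = {rule_lift:.2f}")
```

A lift above 1 suggests the two items are bought together more often than chance would predict, a lift near 1 suggests little association, and a lift below 1 suggests they are bought together less often than chance would predict.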

Timeless Statistical Concepts Every Data Scientist Must Master - With Links to Visual Illustrations & Examples

In this blog post, I outline the foundational statistical concepts that every Data Scientist should know. These concepts are timeless and fundamental to Data Science: unlike the constantly evolving versions of the software we use for data analysis, they will not change. Hence, if you learn them thoroughly, you will stay up to date and be better equipped to excel.
  1. Scales of measurement (nominal, ordinal, interval and ratio) - understanding the scales of measurement is essential for deciding on the appropriate analysis to perform on each type of data.
  2. Degrees of Freedom - a basic concept
  3. Z-score - how many standard deviations a data point is from the mean
  4. Central Limit Theorem - an important concept
  5. Standard Deviation vs Standard Error - a confusing topic
  6. Confidence Interval - useful for interpretation
  7. Confusion matrix: a useful tool for measuring the accuracy of a classification model.
  8. Occam's Razor, Bias-Variance Tradeoff, No Free Lunch Theorem and The Curse of Dimensionality - to understand the limitations of machine learning
  9. Train-Test split and Cross-validation: for building an optimum model which neither underfits nor overfits the dataset.
  10. Components of Time Series (TCSI): this is the fundamental concept for time series analysis.
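As a small illustration of item 3 in the list above, here is a minimal sketch of a z-score calculation. The data values are hypothetical, and the sketch assumes the population standard deviation is the appropriate one to use.

```python
import statistics


def z_score(x, mean, std_dev):
    """Number of standard deviations that `x` lies from `mean`."""
    return (x - mean) / std_dev


# Hypothetical data, chosen so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.mean(data)       # mean of the data
sigma = statistics.pstdev(data)  # population standard deviation (assumed here)

z = z_score(9, mu, sigma)
print(f"mean = {mu}, standard deviation = {sigma}, z-score of 9 = {z}")
```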




Non-linear Relationships: When a 0 Pearson Correlation Coefficient Can Be Surprisingly Meaningful

We know that the Pearson correlation coefficient (r) ranges from -1 to +1. A Pearson correlation coefficient of zero means that there is no linear relationship between the variables.

Here the word linear is crucial. Why? Let's find out using an example where the Pearson correlation coefficient is 0.

Consider a case where Y = X².
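A minimal sketch of this case, assuming X takes values placed symmetrically around zero (the specific values are my own choice for illustration): Y is completely determined by X, yet the Pearson correlation coefficient comes out as zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x_sum = sum((x - mean_x) ** 2 for x in xs)
    var_y_sum = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / math.sqrt(var_x_sum * var_y_sum)


# X values symmetric around zero (an assumption made for this illustration).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # Y depends perfectly on X, but not linearly

r = pearson_r(xs, ys)
print(f"Pearson r = {r}")
```

The positive and negative halves of X cancel each other out in the covariance, so r is zero even though knowing X tells you Y exactly.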

Understanding Confidence Intervals with an Intuitive Example

The concept of confidence intervals (CI) is commonly used in data science. Hence, using an intuitive example, let us learn it with confidence!

Imagine you are waiting for the bus at a bus stop. Usually, the bus arrives at 9.30 am. But the arrival time varies.

Another person arrives at the bus stop to catch the same bus and asks you, "Based on your experience, what percentage of the time has the bus arrived between 9.25 am and 9.35 am?"

You think and answer, "90% of the time".
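The answer in this exchange can be thought of as an empirical proportion. The sketch below uses a hypothetical log of 20 past arrivals, recorded as minutes before or after 9.30 am, and counts how often the bus fell inside the 9.25 to 9.35 window. The data are invented for illustration, and this is only the intuition behind the example, not a formal confidence interval calculation.

```python
# Hypothetical arrival log: minutes relative to 9.30 am
# (negative = early, positive = late). Invented for illustration.
arrival_offsets_min = [
    -2, 1, 0, 3, -4, 2, 7, -1, 0, 4,
    -3, 1, 2, -6, 0, 5, -5, 3, -2, 1,
]

# The window 9.25 am to 9.35 am is 5 minutes either side of 9.30 am.
lower, upper = -5, 5

inside = sum(lower <= offset <= upper for offset in arrival_offsets_min)
coverage = inside / len(arrival_offsets_min)

print(f"{inside} of {len(arrival_offsets_min)} arrivals fell inside the window "
      f"({coverage:.0%})")
```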

Standard Deviation vs Standard Error: Clearing up the Confusion with Visual Examples

Standard deviation and standard error are two statistical concepts that are often confused with each other. Though these two measures are related to variability in the data, they are different.

Standard deviation measures the variability in the dataset. The formula for standard deviation is given below.
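The formula itself is not included in this excerpt. As a rough sketch, assuming the usual sample standard deviation (with n - 1 in the denominator) and the standard error of the mean defined as the standard deviation divided by the square root of n, the two measures can be computed as follows. The sample values are hypothetical.

```python
import math
import statistics

# Hypothetical sample, invented for illustration.
sample = [4, 8, 6, 5, 3, 7, 9, 6]
n = len(sample)

# Sample standard deviation: spread of the individual data points.
sd = statistics.stdev(sample)  # uses n - 1 in the denominator

# Standard error of the mean: how much the sample mean would vary
# from one sample to another (assumed definition: sd / sqrt(n)).
se = sd / math.sqrt(n)

print(f"n = {n}")
print(f"standard deviation = {sd:.3f}")
print(f"standard error     = {se:.3f}")
```

The standard deviation describes the data, while the standard error describes the precision of the sample mean, which is why the standard error shrinks as the sample size grows.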

Mastering Central Limit Theorem (CLT) with Intuitive Examples

To understand the Central Limit Theorem (CLT), let's use the example of rolling two dice repeatedly (say 30 times). In each round, we calculate the sample mean (the mean of the two dice values), and then we plot the distribution of these sample means.

Round 1:
We got 2 and 5. The sample mean of 2 and 5 is 3.5.
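The remaining rounds can be simulated. The following is a minimal sketch, assuming fair six-sided dice and an arbitrary random seed chosen only for reproducibility. It prints a rough text histogram in place of a plot.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, used only to make the run repeatable


def roll_sample_mean(num_dice=2):
    """Roll `num_dice` fair six-sided dice and return the mean of the values."""
    rolls = [random.randint(1, 6) for _ in range(num_dice)]
    return sum(rolls) / num_dice


# 30 rounds, each giving one sample mean of two dice.
num_rounds = 30
sample_means = [roll_sample_mean() for _ in range(num_rounds)]

# Rough text histogram of the sample means.
counts = Counter(sample_means)
for value in sorted(counts):
    print(f"{value:>4}: {'*' * counts[value]}")
```

Increasing num_rounds, or the number of dice per round, should make the bell shape around 3.5 easier to see.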