Last Updated : 11 Feb, 2024


The quantile-quantile (Q-Q) plot is a graphical method for determining whether a dataset follows a particular probability distribution, or whether two samples of data came from the same population. Q-Q plots are particularly useful for assessing whether a dataset is normally distributed or follows some other known distribution. They are commonly used in statistics, data analysis, and quality control to check assumptions and identify departures from expected distributions.

Quantiles And Percentiles

Quantiles are points in a dataset that divide the data into intervals containing equal probabilities or proportions of the total distribution. They are often used to describe the spread or distribution of a dataset. The most common quantiles are:

  1. Median (50th percentile): The median is the middle value of a dataset when it is ordered from smallest to largest. It divides the dataset into two equal halves.
  2. Quartiles (25th, 50th, and 75th percentiles): Quartiles divide the dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data falls.
  3. Percentiles: Percentiles are similar to quartiles but divide the dataset into 100 equal parts. For example, the 90th percentile is the value below which 90% of the data falls.
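The quantiles listed above can be computed directly. The following is a minimal sketch, assuming NumPy is available and using the integers 1 to 100 as purely illustrative data; `np.quantile` takes probabilities between 0 and 1, while `np.percentile` takes values between 0 and 100, and both use linear interpolation by default.

```python
import numpy as np

# Illustrative data (an assumption for this sketch): the integers 1..100
data = np.arange(1, 101)

# Quartiles: probabilities are given on a 0-1 scale
q1, median, q3 = np.quantile(data, [0.25, 0.5, 0.75])

# The 90th percentile: the same idea on a 0-100 scale
p90 = np.percentile(data, 90)

print("Q1:", q1, "Median:", median, "Q3:", q3, "90th percentile:", p90)
```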

Note:

  • A two-sample Q-Q plot is a plot of the quantiles of the first dataset against the quantiles of the second dataset.
  • For reference, a 45-degree line (y = x) is also plotted; if the two samples come from the same population, the points fall approximately along this line.


Normal Distribution:

The normal distribution (also known as the Gaussian distribution or bell curve) is a continuous probability distribution that is symmetric about its mean and is fully described by two parameters: the mean and the standard deviation.

[Figure: Normal distribution with area under the curve]
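The area under the normal curve within a given number of standard deviations of the mean can be checked numerically. This is a small sketch, assuming SciPy is installed; the dictionary name `areas` is introduced here only for illustration.

```python
from scipy import stats

# Area under the standard normal curve within k standard deviations of the mean
areas = {}
for k in (1, 2, 3):
    areas[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"Within {k} standard deviation(s): {areas[k]:.4f}")
```

These values correspond to the familiar 68-95-99.7 rule.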

How to Draw Q-Q plot?

To draw a Quantile-Quantile (Q-Q) plot, you can follow these steps:

  1. Collect the Data: Gather the dataset for which you want to create the Q-Q plot. Ensure that the data are numerical and represent a random sample from the population of interest.
  2. Sort the Data: Arrange the data in ascending order. The sorted values are the sample quantiles, so this step is essential for pairing them correctly with the theoretical quantiles.
  3. Choose a Theoretical Distribution: Determine the theoretical distribution against which you want to compare your dataset. Common choices include the normal distribution, exponential distribution, or any other distribution that you expect your data to follow.
  4. Calculate Theoretical Quantiles: Compute the quantiles for the chosen theoretical distribution, one for each sorted data point. For example, if you’re comparing against a normal distribution, you would apply the inverse cumulative distribution function (CDF) of the normal distribution to a set of evenly spaced probabilities to find the expected quantiles.
  5. Plotting:
    • Plot the theoretical quantiles on the x-axis.
    • Plot the corresponding sorted dataset values on the y-axis.
    • Each data point (x, y) represents a pair of expected and observed values.
    • Add a straight reference line to visually inspect the relationship between the dataset and the theoretical distribution.
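The steps above can be sketched by hand, without a dedicated Q-Q function. This is an illustrative sketch rather than a definitive implementation: the sample is synthetic, the plotting positions (i - 0.5) / n are one common convention among several (an assumption here), and the theoretical quantiles are placed on the x-axis, matching the axis labels used by `scipy.stats.probplot`.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Step 1: illustrative data (synthetic, assumed for this sketch)
rng = np.random.default_rng(0)
sample = rng.normal(loc=10, scale=2, size=200)

# Step 2: sort the data in ascending order
ordered = np.sort(sample)
n = ordered.size

# Steps 3-4: normal reference; inverse CDF at plotting positions (i - 0.5) / n
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = stats.norm.ppf(probs)

# Step 5: theoretical quantiles on x, ordered sample values on y
plt.scatter(theoretical, ordered, s=10, label="Data")

# Reference line built from the sample mean and standard deviation
reference = ordered.mean() + ordered.std(ddof=1) * theoretical
plt.plot(theoretical, reference, color="red", label="Reference line")

plt.xlabel("Theoretical quantiles")
plt.ylabel("Ordered values")
plt.title("Manual normal Q-Q plot")
plt.legend()
plt.show()
```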


Interpretation of Q-Q plot

  • If the points on the plot fall approximately along a straight line, it suggests that your dataset follows the assumed distribution.
  • Deviations from the straight line indicate departures from the assumed distribution, requiring further investigation.
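How closely the points follow a straight line can also be summarized numerically. The sketch below assumes synthetic samples and relies on `scipy.stats.probplot`, which, when called with its default `fit=True` and no `plot` argument, returns the plotted quantiles together with the slope, intercept, and correlation coefficient `r` of a least-squares line. A value of `r` close to 1 is consistent with the assumed distribution, though it is a rough guide rather than a formal test.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumed for illustration)
rng = np.random.default_rng(0)
normal_sample = rng.normal(size=500)
skewed_sample = rng.exponential(size=500)

# probplot returns ((theoretical, ordered), (slope, intercept, r))
_, (_, _, r_normal) = stats.probplot(normal_sample, dist="norm")
_, (_, _, r_skewed) = stats.probplot(skewed_sample, dist="norm")

print(f"r for normal sample: {r_normal:.4f}")
print(f"r for skewed sample: {r_skewed:.4f}")
```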

Exploring Distribution Similarity with Q-Q Plots


Exploring distribution similarity using Q-Q plots is a fundamental task in statistics. Comparing two datasets to determine if they originate from the same distribution is vital for various analytical purposes. When the assumption of a common distribution holds, merging datasets can improve parameter estimation accuracy, such as for location and scale. Q-Q plots, short for quantile-quantile plots, offer a visual method for assessing distribution similarity. In these plots, quantiles from one dataset are plotted against quantiles from another. If the points closely align along a diagonal line, it suggests similarity between the distributions. Deviations from this diagonal line indicate differences in distribution characteristics.

While tests like the chi-square and Kolmogorov-Smirnov tests can evaluate overall distribution differences, Q-Q plots provide a nuanced perspective by directly comparing quantiles. This enables analysts to discern specific differences, such as shifts in location or changes in scale, which may not be evident from formal statistical tests alone.
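A two-sample Q-Q plot can be sketched by evaluating both datasets at the same set of probabilities, which also handles unequal sample sizes. The data, the 0.5 shift in location, and the choice of 99 probabilities are assumptions made purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Two synthetic samples of different sizes; the second is shifted by 0.5
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=300)
b = rng.normal(loc=0.5, scale=1.0, size=500)

# Evaluate both samples at the same probabilities
probs = np.linspace(0.01, 0.99, 99)
qa = np.quantile(a, probs)
qb = np.quantile(b, probs)

plt.scatter(qa, qb, s=10, label="Quantile pairs")

# 45-degree reference line (y = x)
lims = [min(qa.min(), qb.min()), max(qa.max(), qb.max())]
plt.plot(lims, lims, color="red", linestyle="--", label="45-degree line")

plt.xlabel("Quantiles of sample A")
plt.ylabel("Quantiles of sample B")
plt.title("Two-sample Q-Q plot")
plt.legend()
plt.show()
```

With a pure shift in location, the points should run roughly parallel to the 45-degree line but offset from it; a change in scale would instead show up as a different slope.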

Python Implementation Of Q-Q Plot

Python3

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate example data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Create Q-Q plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('Normal Q-Q plot')
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered Values')
plt.grid(True)
plt.show()

Output:

[Figure: Q-Q plot]

Here, as the data points approximately follow a straight line in the Q-Q plot, it suggests that the dataset is consistent with the assumed theoretical distribution, which in this case we assumed to be the normal distribution.

Advantages of Q-Q plot

  1. Flexible Comparison: Q-Q plots can compare two datasets even when their sample sizes differ.
  2. Dimensionless Analysis: A change of units or scale alters only the slope and intercept of the line, not its straightness, so datasets measured in different units or scales can still be compared.
  3. Visual Interpretation: They provide a clear visual comparison of the data distribution with a theoretical distribution.
  4. Sensitive to Deviations: They readily reveal departures from the assumed distribution, particularly in the tails, aiding in identifying data discrepancies.
  5. Diagnostic Tool: They help in assessing distributional assumptions, identifying outliers, and understanding data patterns.

Applications Of Quantile-Quantile Plot

The Quantile-Quantile plot is used for the following purposes:

  1. Assessing Distributional Assumptions: Q-Q plots are frequently used to visually inspect whether a dataset follows a specific probability distribution, such as the normal distribution. By comparing the quantiles of the observed data to the quantiles of the assumed distribution, deviations from the assumed distribution can be detected. This is crucial in many statistical analyses, where the validity of distributional assumptions impacts the accuracy of statistical inferences.
  2. Detecting Outliers: Outliers are data points that deviate significantly from the rest of the dataset. Q-Q plots can help identify outliers by revealing data points that fall far from the expected pattern of the distribution. Outliers may appear as points that deviate from the expected straight line in the plot.
  3. Comparing Distributions: Q-Q plots can be used to compare two datasets to see if they come from the same distribution. This is achieved by plotting the quantiles of one dataset against the quantiles of another dataset. If the points fall approximately along the 45-degree line, it suggests that the two datasets are drawn from the same distribution; a straight line with a different slope or intercept suggests distributions of the same shape that differ in scale or location.
  4. Assessing Normality: Q-Q plots are particularly useful for assessing the normality of a dataset. If the data points in the plot closely follow a straight line, it indicates that the dataset is approximately normally distributed. Deviations from the line suggest departures from normality, which may require further investigation or non-parametric statistical techniques.
  5. Model Validation: In fields like econometrics and machine learning, Q-Q plots are used to validate predictive models. By comparing the quantiles of observed responses with the quantiles predicted by a model, one can assess how well the model fits the data. Deviations from the expected pattern may indicate areas where the model needs improvement.
  6. Quality Control: Q-Q plots are employed in quality control processes to monitor the distribution of measured or observed values over time or across different batches. Departures from expected patterns in the plot may signal changes in the underlying processes, prompting further investigation.
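As one illustration of the model validation use case, a common approach is to check whether the residuals of a fitted model look normally distributed. The sketch below is an assumption-laden example: the data are synthetic, the model is a straight line fitted with `np.polyfit`, and the Q-Q correlation `r` returned by `scipy.stats.probplot` is used as a rough summary instead of drawing the plot.

```python
import numpy as np
from scipy import stats

# Synthetic data from a linear model with normal noise (assumed for illustration)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 3.0 * x + 2.0 + rng.normal(scale=1.0, size=x.size)

# Fit a straight line; polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# Compare residual quantiles with normal quantiles
(osm, osr), (qq_slope, qq_intercept, r) = stats.probplot(residuals, dist="norm")
print(f"Fitted slope: {slope:.3f}, intercept: {intercept:.3f}")
print(f"Q-Q correlation of residuals: {r:.4f}")
```

Passing `plot=plt` to `probplot`, as in the examples elsewhere in this article, would draw the corresponding Q-Q plot of the residuals.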

Types of Q-Q plots

A Q-Q plot against a normal reference takes a characteristic shape depending on how the data are distributed. The most common patterns are:

  1. Normal Distribution: A symmetric distribution where the points fall approximately along the straight reference line.
  2. Right-skewed Distribution: A distribution where the points curve upward away from the straight line at the upper end (and often sit above it at the lower end as well), indicating a longer tail on the right side.
  3. Left-skewed Distribution: A distribution where the points curve downward away from the straight line at the lower end (and often sit below it at the upper end as well), indicating a longer tail on the left side.
  4. Under-dispersed Distribution: A distribution whose tails are lighter than the theoretical distribution expects; the points flatten at both ends, lying above the line at the lower end and below it at the upper end.
  5. Over-dispersed Distribution: A distribution that is more spread out than the theoretical distribution expects; with heavy tails, the points lie below the line at the lower end and above it at the upper end.

Python3

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Fix the seed so the figure is reproducible
np.random.seed(0)

# Generate a random sample from a normal distribution
normal_data = np.random.normal(loc=0, scale=1, size=1000)

# Generate a random sample from a right-skewed distribution (exponential distribution)
right_skewed_data = np.random.exponential(scale=1, size=1000)

# Generate a random sample from a left-skewed distribution (negative exponential distribution)
left_skewed_data = -np.random.exponential(scale=1, size=1000)

# Generate a random sample from an under-dispersed distribution (truncated normal distribution)
under_dispersed_data = np.random.normal(loc=0, scale=0.5, size=1000)
under_dispersed_data = under_dispersed_data[(under_dispersed_data > -1) & (under_dispersed_data < 1)]  # Truncate

# Generate a random sample from an over-dispersed distribution (mixture of normals)
over_dispersed_data = np.concatenate((np.random.normal(loc=-2, scale=1, size=500),
                                      np.random.normal(loc=2, scale=1, size=500)))

# Create Q-Q plots; every panel uses the normal distribution as the reference,
# so departures from normality show up as departures from the straight line
plt.figure(figsize=(15, 10))

plt.subplot(2, 3, 1)
stats.probplot(normal_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Normal Distribution')

plt.subplot(2, 3, 2)
stats.probplot(right_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Right-skewed Distribution')

plt.subplot(2, 3, 3)
stats.probplot(left_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Left-skewed Distribution')

plt.subplot(2, 3, 4)
stats.probplot(under_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Under-dispersed Distribution')

plt.subplot(2, 3, 5)
stats.probplot(over_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Over-dispersed Distribution')

plt.tight_layout()
plt.show()

Output:

[Figure: Q-Q plot for different distributions]



pawangfg
