The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
Install it by running one of these commands at the Python command prompt or the Anaconda prompt:
pip install seaborn
or
conda install seaborn
Once Seaborn is installed, import it in your applications with the import keyword. The version string is stored in the __version__ attribute.
import seaborn as sns
print(sns.__version__)
0.11.2
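The version matters because displot() and histplot(), used later on, were introduced in Seaborn 0.11.0. The following is a minimal sketch of a version check; the helper names parse_version and has_displot are hypothetical, not part of Seaborn, and the function takes a plain string so it runs without Seaborn installed.

```python
from itertools import takewhile


def parse_version(version):
    """Turn a version string such as '0.11.2' into a tuple of three ints."""
    # Pad short strings like '0.11' so the tuple always has three parts.
    pieces = (version.split(".") + ["0", "0"])[:3]
    parts = []
    for piece in pieces:
        # Keep only the leading digits, so '0rc1' is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def has_displot(version):
    """Hypothetical helper: True if this Seaborn version provides displot()/histplot()."""
    return parse_version(version) >= (0, 11, 0)


print(has_displot("0.11.2"))
print(has_displot("0.10.1"))
```

In a real session you would pass sns.__version__ instead of a literal string.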
## Import matplotlib
import matplotlib.pyplot as plt
distplot
jointplot
pairplot
df = sns.load_dataset("tips")  ## Load the example "tips" dataset
df.head()
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
df.dtypes ## Data types
total_bill     float64
tip            float64
sex           category
smoker        category
day           category
time          category
size             int64
dtype: object
A correlation heatmap uses colored cells, typically on a monochromatic scale, to show a 2D correlation matrix (table): each cell gives the correlation between the row variable and the column variable. It is very useful for exploratory data analysis (EDA).
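df.select_dtypes("number").corr()  ## Numeric columns only; recent pandas versions raise an error on the category columns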
| | total_bill | tip | size |
|---|---|---|---|
| total_bill | 1.000000 | 0.675734 | 0.598315 |
| tip | 0.675734 | 1.000000 | 0.489299 |
| size | 0.598315 | 0.489299 | 1.000000 |
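To make the numbers in the table less of a black box, here is a minimal sketch of the Pearson correlation (the default method of corr()) computed by hand with NumPy. It uses only the five rows printed by df.head(), so the result will differ from the 0.675734 obtained on the full dataset; pearson_r is an illustrative helper name.

```python
import numpy as np

# First five rows of the "tips" table shown earlier (total_bill and tip).
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r = pearson_r(total_bill, tip)
print(round(r, 3))
```

The value always lies between -1 and 1, and a variable correlated with itself gives 1, which is why the diagonal of the table is all ones.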
## HeatMap
sns.heatmap(df.select_dtypes("number").corr())
<AxesSubplot:>
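A plain heatmap can be hard to read without the numbers. The sketch below, assuming the correlation values from the table above are typed in by hand so it runs without downloading the dataset, prints each coefficient inside its cell and pins the color scale to the range 0 to 1. The colormap and figure size are arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Correlation matrix copied from the df.corr() output shown earlier.
cols = ["total_bill", "tip", "size"]
corr = pd.DataFrame(
    [[1.000000, 0.675734, 0.598315],
     [0.675734, 1.000000, 0.489299],
     [0.598315, 0.489299, 1.000000]],
    index=cols,
    columns=cols,
)

fig, ax = plt.subplots(figsize=(4, 3))
# annot=True writes the value in each cell; fmt controls the number format.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="Blues", vmin=0, vmax=1, ax=ax)
ax.set_title("Correlation heatmap")
fig.tight_layout()
```

Fixing vmin and vmax keeps the colors comparable between heatmaps drawn from different datasets.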
sns.displot(df['tip'],kind='kde') ## Kernel Density
plt.title("Kernel Density Plot")
sns.displot(df['tip']) ## Histogram
plt.title("Histogram")
Text(0.5, 1.0, 'Histogram')
## Overlay a kernel density estimate on a histogram using sns.distplot()
sns.distplot(df['tip'])
C:\Users\User\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).
  warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='tip', ylabel='Density'>
sns.distplot(df['total_bill'],label="Total Bill")
sns.distplot(df['tip'],label="Tip")
plt.legend()
<matplotlib.legend.Legend at 0x2118673df70>
sns.kdeplot(df.iloc[:,0],label="Total Bill")
sns.kdeplot(df.iloc[:,1],label="Tip")
plt.legend()
<matplotlib.legend.Legend at 0x211865379a0>
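To show what a kernel density curve actually is, here is a minimal NumPy sketch that averages one Gaussian bump per observation. It uses the five tip values printed by df.head(), and the fixed bandwidth of 0.5 is an assumption made for illustration; kdeplot chooses a bandwidth automatically.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centred on each sample."""
    samples = np.asarray(samples, dtype=float)
    # z[i, j] is the scaled distance from grid point i to sample j.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)


tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])
grid = np.linspace(-3.0, 8.0, 1101)
density = gaussian_kde_1d(tip, grid, bandwidth=0.5)

# A density should integrate to (approximately) 1 over the grid.
area = float(density.sum() * (grid[1] - grid[0]))
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points; a larger one gives a smoother curve.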
A joint plot draws an X-Y scatter plot together with the marginal distributions (histogram or kernel density) of X and Y.
sns.jointplot(x='total_bill',y='tip',data=df)
<seaborn.axisgrid.JointGrid at 0x2118b44d400>
sns.jointplot(x='total_bill',y='tip',data=df,kind='hex') ## kind='hex' draws hexagonal bins
<seaborn.axisgrid.JointGrid at 0x21189bf2b50>
sns.jointplot(x='total_bill',y='tip',data=df,kind='reg') ## kind='reg' adds a regression line
<seaborn.axisgrid.JointGrid at 0x2118c4ec9d0>
sns.jointplot(x='total_bill',y='tip',data=df,kind='kde') ## kind='kde' draws kernel density contours
<seaborn.axisgrid.JointGrid at 0x2118df0b610>
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="flipper_length_mm", y="bill_length_mm", hue="species")
<seaborn.axisgrid.JointGrid at 0x2118f8a4d30>
A pair plot is also known as a scatterplot matrix. In a scatterplot, one variable in a data row is matched with another variable's value; pair plots are elaborations on this, showing every variable paired with all the other variables.
sns.pairplot(df)
<seaborn.axisgrid.PairGrid at 0x2118e317490>
sns.pairplot(df,hue="sex") ## Group By Sex
<seaborn.axisgrid.PairGrid at 0x2118e317520>
sns.pairplot(data=penguins, hue="species")
<seaborn.axisgrid.PairGrid at 0x2118bbd0e50>
sns.boxplot(x='smoker',y='total_bill',data=df)
<AxesSubplot:xlabel='smoker', ylabel='total_bill'>
sns.boxplot(x='total_bill',y='day',hue='smoker',data=df,palette='rainbow')
<AxesSubplot:xlabel='total_bill', ylabel='day'>
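A box plot summarizes each group by its quartiles: the box spans the first to third quartile, the line inside marks the median, and the whiskers reach the most extreme points within 1.5 times the interquartile range (the default whis value). The sketch below computes those numbers with NumPy; box_stats is a hypothetical helper, and the value 60.0 is made up and added to the five total_bill values from df.head() to produce an outlier.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers, as a box plot would draw them."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }


# Five total_bill values from df.head() plus one made-up large bill.
stats = box_stats([16.99, 10.34, 21.01, 23.68, 24.59, 60.0])
print(stats)
```

Points listed under "outliers" are the ones a box plot draws as individual markers beyond the whiskers.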
sns.countplot(x='sex',data=df)
<AxesSubplot:xlabel='sex', ylabel='count'>
sns.countplot(y='day',data=df) ## Horizontal Countplot
<AxesSubplot:xlabel='count', ylabel='day'>
sns.barplot(x='sex',y='tip',data=df)
<AxesSubplot:xlabel='sex', ylabel='tip'>
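The bar heights in these two plots can be checked directly in pandas: countplot shows the number of rows per category, and barplot shows the mean of y per category by default (with an error bar for the confidence interval). This minimal sketch uses only the five rows printed by df.head(), typed in by hand.

```python
import pandas as pd

# The five rows shown by df.head() (tip and sex columns only).
head = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

counts = head["sex"].value_counts()           # what countplot draws
mean_tip = head.groupby("sex")["tip"].mean()  # what barplot draws by default

print(counts.to_dict())
print(mean_tip.round(2).to_dict())
```

On the full dataset the same two lines, applied to df, give the numbers behind the plots above.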
Further Reading: https://seaborn.pydata.org/tutorial/introduction.html