What is Seaborn?
Seaborn is a Python data visualization library built on top of Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics. It is particularly well-suited for visualizing complex datasets and exploring relationships between variables.
Key Features of Seaborn:
- Easy-to-use Interface: Seaborn offers simple functions to create complex visualizations with minimal code.
- Integrated with Pandas: Seaborn works seamlessly with Pandas DataFrames, making it easy to visualize data stored in tabular formats.
- Statistical Graphics: Seaborn includes functions to plot statistical graphics such as bar plots, box plots, scatter plots, and more.
Example 1: Scatter Plot
import seaborn as sns
import matplotlib.pyplot as plt
height = [62, 64, 69, 75, 66, 68, 65, 71, 76, 73]
weight = [120, 136, 148, 175, 137, 165, 154, 172, 200, 187]
sns.scatterplot(x=height, y=weight)
plt.show()
Example 2: Count Plot
import seaborn as sns
import matplotlib.pyplot as plt
gender = ["Female", "Female", "Female", "Female",
          "Male", "Male", "Male", "Male", "Male", "Male"]
sns.countplot(x=gender)
plt.show()
Example 3: Using DataFrames
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("masculinity.csv")
sns.countplot(x="how_masculine", data=df)
plt.show()
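If masculinity.csv is not at hand, the same DataFrame workflow can be tried on a small in-memory table. The sketch below is illustrative only: it assumes the height, weight and gender lists from Examples 1 and 2 describe the same ten people (the original does not say so) and stores them in a hypothetical DataFrame called people. Because x and y are column names passed together with data=, Seaborn also uses those names as the axis labels.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Lists from Examples 1 and 2; treating them as the same ten people is an assumption
height = [62, 64, 69, 75, 66, 68, 65, 71, 76, 73]
weight = [120, 136, 148, 175, 137, 165, 154, 172, 200, 187]
gender = ["Female", "Female", "Female", "Female",
          "Male", "Male", "Male", "Male", "Male", "Male"]

# Hypothetical DataFrame combining the three lists column by column
people = pd.DataFrame({"height": height, "weight": weight, "gender": gender})

# With data=, x and y are column names; Seaborn labels the axes with them
ax = sns.scatterplot(x="height", y="weight", data=people)
plt.show()
```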
Adding a Third Variable Using hue
The hue parameter in Seaborn allows you to add a third variable to your plots by using different colors to represent different categories. This can be particularly useful for visualizing relationships between multiple variables.
Example: Scatter Plot with hue
Let's create a scatter plot that adds a third variable using hue.
A Basic Scatter Plot:
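import matplotlib.pyplot as plt
import seaborn as sns
# Load Seaborn's example "tips" dataset (downloaded and cached on first use)
tips = sns.load_dataset("tips")
sns.scatterplot(x="total_bill", y="tip", data=tips)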
plt.show()
A Scatter Plot with hue:
sns.scatterplot(x="total_bill", y="tip", data=tips, hue="smoker")
plt.show()
Setting hue order:
sns.scatterplot(x="total_bill", y="tip", data=tips,
                hue="smoker", hue_order=["Yes", "No"])
plt.show()
Specifying hue colors:
sns.scatterplot(x="total_bill", y="tip", data=tips,
                hue="smoker", palette={"Yes": "black", "No": "red"})
plt.show()
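The same three options (hue, hue_order, palette) work on any DataFrame with a categorical column. As a sketch, using a hypothetical people DataFrame built from the Example 1 and 2 lists (assumed to describe the same people), hue_order below puts "Male" first in the legend and palette fixes one color per category.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame built from the lists in Examples 1 and 2
people = pd.DataFrame({
    "height": [62, 64, 69, 75, 66, 68, 65, 71, 76, 73],
    "weight": [120, 136, 148, 175, 137, 165, 154, 172, 200, 187],
    "gender": ["Female"] * 4 + ["Male"] * 6,
})

# hue colors the points by gender, hue_order sets the legend order,
# and palette maps each category to an explicit color
ax = sns.scatterplot(x="height", y="weight", data=people,
                     hue="gender", hue_order=["Male", "Female"],
                     palette={"Male": "red", "Female": "black"})
plt.show()
```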
Example: Count Plot with hue:
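import matplotlib.pyplot as plt
import seaborn as sns
# Load Seaborn's example "tips" dataset (skip if it is already loaded)
tips = sns.load_dataset("tips")
sns.countplot(x="smoker", data=tips, hue="sex")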
plt.show()
Introducing relplot()
Seaborn's relplot() is a figure-level function for relational plots: through its kind parameter it draws scatter plots (kind='scatter', the default) and line plots (kind='line') with a single interface.
Unlike scatterplot() and lineplot(), it can also create subplots based on the values of categorical variables.
Scatter Plot with relplot:
We will use the tips dataset again to demonstrate relplot.
Subgroups with point size:
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter plot with relplot
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', size='size')
plt.show()
Subgroups with point size and hue:
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter plot with relplot
sns.relplot(x='total_bill', y='tip', hue='size', data=tips, kind='scatter', size='size')
plt.show()
Subgroups with point style:
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter plot with relplot
sns.relplot(x='total_bill', y='tip', hue='smoker', data=tips, kind='scatter', style='smoker')
plt.show()
Changing the alpha value (point transparency, from 0 for invisible to 1 for fully opaque):
import seaborn as sns
import matplotlib.pyplot as plt
# Scatter plot with relplot
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', alpha=0.4)
plt.show()
Subplots in Columns and Rows:
relplot can create subplots by faceting the data across multiple columns or rows.
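Because relplot() is figure-level, it returns a FacetGrid rather than a single Axes; the grid's axes attribute is a 2D array of matplotlib Axes with one entry per facet, and each facet gets a "variable = value" title. A sketch, assuming the hypothetical people DataFrame (the Example 1 and 2 lists combined, as if they described the same people):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame built from the lists in Examples 1 and 2
people = pd.DataFrame({
    "height": [62, 64, 69, 75, 66, 68, 65, 71, 76, 73],
    "weight": [120, 136, 148, 175, 137, 165, 154, 172, 200, 187],
    "gender": ["Female"] * 4 + ["Male"] * 6,
})

# col= creates one subplot per gender value, laid out side by side
g = sns.relplot(x="height", y="weight", data=people, kind="scatter", col="gender")

# g.axes has shape (n_rows, n_cols): one row, one column per gender
print(g.axes.shape)  # (1, 2)
plt.show()
```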
Subplots in Columns:
# Subplots in columns
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', col='smoker')
plt.show()
Subplots in Rows:
# Subplots in rows
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', row='smoker')
plt.show()
Subplots in Rows and Columns:
# Subplots in rows and columns
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', col='smoker', row='time')
plt.show()
Using col_wrap:
col_wrap wraps the column facets onto multiple rows, placing at most col_wrap subplots in each row.
# Subplots with col_wrap
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', col='day', col_wrap=2)
plt.show()
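The number of rows follows from simple arithmetic: n facets wrapped at col_wrap columns need ceil(n / col_wrap) rows, so the four days in tips fill a 2 x 2 grid. The sketch below checks this on a made-up toy DataFrame whose columns (x, y, panel) are hypothetical. Note that with col_wrap, g.axes is a flat 1D array rather than a 2D grid.

```python
import math

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data: four facet levels (A to D) with two points each
toy = pd.DataFrame({
    "x": [1, 2] * 4,
    "y": [1, 3, 2, 4, 3, 5, 4, 6],
    "panel": ["A", "A", "B", "B", "C", "C", "D", "D"],
})

col_wrap = 2
g = sns.relplot(x="x", y="y", data=toy, kind="scatter",
                col="panel", col_wrap=col_wrap)

# With col_wrap, g.axes is a flat array holding one Axes per facet
n_facets = len(g.axes)
n_rows = math.ceil(n_facets / col_wrap)
print(n_facets, n_rows)  # 4 2
plt.show()
```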
Using col_order:
col_order sets the order in which the column facets appear.
# Subplots with col_order
sns.relplot(x='total_bill', y='tip', data=tips, kind='scatter', col='day', col_wrap=2, col_order=['Thur', 'Fri', 'Sat', 'Sun'])
plt.show()
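col_order only changes where each facet is placed; every subplot's title still names its own level, and facets are filled left to right, then top to bottom. A quick check on a made-up toy DataFrame (hypothetical columns x, y and panel), reversing the default order:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up toy data: four facet levels (A to D) with two points each
toy = pd.DataFrame({
    "x": [1, 2] * 4,
    "y": [1, 3, 2, 4, 3, 5, 4, 6],
    "panel": ["A", "A", "B", "B", "C", "C", "D", "D"],
})

# Reverse the default (order of appearance) facet order
g = sns.relplot(x="x", y="y", data=toy, kind="scatter",
                col="panel", col_wrap=2, col_order=["D", "C", "B", "A"])

# Titles read in grid order: first row left to right, then the second row
titles = [ax.get_title() for ax in g.axes]
print(titles)  # ['panel = D', 'panel = C', 'panel = B', 'panel = A']
plt.show()
```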
Line Plot with relplot:
We can also create line plots using relplot.
# Line plot with relplot
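# air_df_mean is not defined in this section; it is assumed to be a DataFrame
# of mean NO2 readings with columns 'hour', 'NO_2_mean' and 'location'
sns.relplot(x='hour', y='NO_2_mean', data=air_df_mean, kind='line', markers=True, style='location', hue='location')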
plt.show()
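In this example:
- markers=True adds a marker at each data point on the line.
- style='location' gives each location its own line style (dash pattern) and, because markers=True, its own marker shape.
- hue='location' also gives each location its own color.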
Introducing catplot()
Seaborn's catplot() is a figure-level function for categorical plots. Through its kind parameter it can draw count plots (kind='count'), bar plots (kind='bar'), box plots (kind='box'), and more, and, like relplot(), it can facet the data across multiple subplots with col and row.
Count Plot with catplot:
A count plot is a type of bar plot that shows the number of observations in each category.
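import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Assumes the same survey file used in Example 3
masculinity_data = pd.read_csv("masculinity.csv")
# order fixes the left-to-right order of the bars
category_order = ["No answer", "Not at all", "Not very", "Somewhat", "Very"]
sns.catplot(x="how_masculine", data=masculinity_data, kind="count", order=category_order)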
plt.show()
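To see what the bars count, the sketch below rebuilds the gender list from Example 2 as a DataFrame (the name survey is hypothetical) and compares the bar heights with pandas' value_counts(). Here order puts "Male" first, so the bars follow that order rather than the order of appearance.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Gender list from Example 2, wrapped in a hypothetical DataFrame
survey = pd.DataFrame({"gender": ["Female"] * 4 + ["Male"] * 6})

order = ["Male", "Female"]
g = sns.catplot(x="gender", data=survey, kind="count", order=order)

# Each bar's height is the number of rows in that category, left to right in `order`
bar_heights = [int(p.get_height()) for p in g.ax.patches]
print(bar_heights)                                     # [6, 4]
print(survey["gender"].value_counts()[order].tolist())  # [6, 4]
plt.show()
```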
Bar Plot with catplot:
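A bar plot shows the relationship between a categorical variable and a quantitative variable: by default, each bar's height is the mean of the quantitative variable for that category, and the error bar shows a 95% confidence interval around that mean.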
# Bar plot with catplot
sns.catplot(x='day', y='total_bill', data=tips, kind='bar', order=['Thur', 'Fri', 'Sat', 'Sun'])
plt.show()
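Under the default estimator, each bar's height should match the group mean that pandas computes. A sketch on the hypothetical people DataFrame (the height and gender lists from Examples 1 and 2, assumed to belong to the same people):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame built from the lists in Examples 1 and 2
people = pd.DataFrame({
    "height": [62, 64, 69, 75, 66, 68, 65, 71, 76, 73],
    "gender": ["Female"] * 4 + ["Male"] * 6,
})

g = sns.catplot(x="gender", y="height", data=people, kind="bar",
                order=["Female", "Male"])

# Bar heights (rounded for display) versus the per-group means from pandas
bar_heights = [round(float(p.get_height()), 2) for p in g.ax.patches]
print(bar_heights)  # [67.5, 69.83]
print(people.groupby("gender")["height"].mean().round(2).tolist())  # [67.5, 69.83]
plt.show()
```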
Box Plot with catplot:
A box plot shows the distribution of a continuous variable for each category of a categorical variable.
# Box plot with catplot
sns.catplot(x='time', y='total_bill', data=tips, kind='box', order=['Dinner', 'Lunch'])
plt.show()
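Each box summarizes quartiles: it spans the first to the third quartile, the line inside marks the median, and with Seaborn's default whis=1.5 the whiskers reach the most extreme points within 1.5 times the interquartile range of the box. Those numbers can be computed directly with pandas; the sketch assumes the hypothetical people DataFrame (Example 1 and 2 lists) again.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame built from the lists in Examples 1 and 2
people = pd.DataFrame({
    "height": [62, 64, 69, 75, 66, 68, 65, 71, 76, 73],
    "gender": ["Female"] * 4 + ["Male"] * 6,
})

# First quartile, median and third quartile of height for each gender
quartiles = people.groupby("gender")["height"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)

# Each box spans the 0.25 to 0.75 values, with its median line at 0.5
g = sns.catplot(x="gender", y="height", data=people, kind="box")
plt.show()
```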
Adding Titles and Labels
- plt.title(): Sets the title of the plot.
- plt.xlabel(): Sets the label for the x-axis.
- plt.ylabel(): Sets the label for the y-axis.
- plt.xticks(rotation=90): Rotates the x-axis tick labels for better readability.
import seaborn as sns
import matplotlib.pyplot as plt
# Load Seaborn's example "mpg" (car fuel economy) dataset
mpg = sns.load_dataset("mpg")
# Scatter plot with relplot
sns.relplot(x="weight",
            y="horsepower",
            data=mpg,
            kind="scatter")
plt.title('Title')
plt.xlabel('xlabel')
plt.ylabel('ylabel')
plt.xticks(rotation=90)
plt.show()
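The plt functions above act only on the current Axes, so on a faceted relplot() or catplot() figure they label just one subplot (typically the last one drawn). The FacetGrid that these functions return has its own helpers: set_axis_labels() for the outer axis labels, set_titles() for per-facet titles, and figure.suptitle() for one overall title (older Seaborn releases expose the figure as g.fig). A sketch, assuming the hypothetical people DataFrame built from the Example 1 and 2 lists:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical DataFrame built from the lists in Examples 1 and 2
people = pd.DataFrame({
    "height": [62, 64, 69, 75, 66, 68, 65, 71, 76, 73],
    "weight": [120, 136, 148, 175, 137, 165, 154, 172, 200, 187],
    "gender": ["Female"] * 4 + ["Male"] * 6,
})

g = sns.relplot(x="height", y="weight", data=people, kind="scatter", col="gender")

# Axis labels on the outer subplots, a short title per facet, one figure title
g.set_axis_labels("Height", "Weight")
g.set_titles(col_template="{col_name}")
g.figure.suptitle("Weight vs. height by gender", y=1.03)

# Rotate x tick labels on every subplot, not just the current one
for ax in g.axes.flat:
    ax.tick_params(axis="x", labelrotation=90)
plt.show()
```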