Why Is the Legend Changing From Continuous to Categorical in ggplot2?
Introduction
The mtcars dataset contains information on 32 cars (1973–74 models) from the 1974 Motor Trend US magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.
# Load the ggplot2 package
library(ggplot2)
library(dplyr)
# Explore the mtcars data frame with str()
str(mtcars)
'data.frame': 32 obs. of 13 variables:
 $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num 160 160 108 258 360 ...
 $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num 2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num 16.5 17 18.6 19.4 17 ...
 $ vs  : num 0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num 1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
 $ fcyl: Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
 $ fam : Factor w/ 2 levels "automatic","manual": 2 2 2 1 1 1 1 1 1 1 ...
(The base mtcars data frame has 11 columns; fcyl and fam appear here because this output was captured after they were added later in the tutorial.)
# Execute the following command
ggplot(mtcars, aes(cyl, mpg)) +
  geom_point()
Notice that ggplot2 treats cyl as a continuous variable. We get a plot, but it's not quite right: the axis suggests that there could be 5- or 7-cylinder cars, yet no such cars appear in the data.
Data column types affect plot types
Although cyl (the number of cylinders) is categorical, you probably noticed that it is classified as numeric in mtcars. This is misleading, because the plot then represents cyl as a continuous quantity even though it only takes a few discrete values. You'll have to explicitly tell ggplot2 that cyl is a categorical variable.
# Load the ggplot2 package
library(ggplot2)
# Change the command below so that cyl is treated as factor
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point()
Notice that ggplot2 now treats cyl as a factor. This time the x-axis does not contain values like 5 or 7, only the values that are present in the dataset.
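The same rule answers the question in the title: ggplot2 picks a scale, and therefore a legend type, from the column's class. A numeric column gets a continuous scale (drawn as a color bar when mapped to color), while a factor gets a discrete scale (drawn as a key with one entry per level). The sketch below uses ggplot2's exported scale_type() generic to make this visible; the object names are illustrative.

```r
library(ggplot2)

# scale_type() reports which kind of scale ggplot2 would choose for a vector
type_numeric <- scale_type(mtcars$cyl)          # cyl is stored as a number
type_factor  <- scale_type(factor(mtcars$cyl))  # a factor has a fixed set of levels
print(c(type_numeric, type_factor))

# Same mapping, different column types, different legends
p_continuous <- ggplot(mtcars, aes(wt, mpg, color = cyl)) + geom_point()          # color bar
p_discrete   <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) + geom_point()  # one key per level
```

So if a legend unexpectedly switches between a color bar and a set of keys, the first thing to check is whether the mapped column is numeric or a factor.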
The grammar of graphics
Mapping data columns to aesthetics
Let's dive a little deeper into the three main topics in this course: the data, aesthetics, and geom layers.
# Edit to add a color aesthetic mapped to disp
ggplot(mtcars, aes(wt, mpg, color = disp)) +
  geom_point()
# Change the color aesthetic to a size aesthetic
ggplot(mtcars, aes(wt, mpg, size = disp)) +
  geom_point()
Understanding variables
In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale.
Another argument of aes() is the shape of the points. There is a finite number of shapes that ggplot2 can automatically assign to the points, so shape can only be mapped to categorical variables.
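A quick way to see this limit is to map a numeric column to shape and catch the error that ggplot2 raises when the plot is built; converting the column to a factor fixes it. This is a sketch: the exact error wording differs between ggplot2 versions, so the code only checks that an error occurs.

```r
library(ggplot2)

# Shape has no continuous scale, so building this plot fails
p_bad <- ggplot(mtcars, aes(wt, mpg, shape = disp)) + geom_point()
result <- tryCatch({
  ggplot_build(p_bad)
  "built"
}, error = function(e) conditionMessage(e))
print(result)  # the error message from ggplot2

# A factor gives shape a discrete scale with one symbol per level
p_ok <- ggplot(mtcars, aes(wt, mpg, shape = factor(cyl))) + geom_point()
```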
ggplot2 layers
Adding geometries
The diamonds dataset contains details of 53,940 diamonds. Among the variables included are carat (a measurement of the diamond's size) and price.
We'll use two common geom layer functions:
- geom_point() adds points (as in a scatter plot).
- geom_smooth() adds a smooth trend curve.
# Explore the diamonds data frame with str()
str(diamonds)
Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
 $ carat  : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point()
# Add geom_smooth() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth()
Changing one geom or every geom
If we have multiple geoms, then mapping an aesthetic to a data variable inside the call to ggplot() will change all the geoms. It is also possible to make changes to individual geoms by passing arguments to the geom_*() functions.
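The difference between a global and a per-geom mapping can be checked directly with layer_data(), which returns the computed data for one layer. This sketch uses mtcars and a linear smoother instead of diamonds to keep it fast; the names mt, p_global and p_local are illustrative.

```r
library(ggplot2)

# Illustrative copy of mtcars with a labelled transmission factor
mt <- transform(mtcars, fam = factor(am, labels = c("automatic", "manual")))

# Color mapped in ggplot(): every layer inherits it, so geom_smooth() fits one line per group
p_global <- ggplot(mt, aes(wt, mpg, color = fam)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Color mapped in geom_point() only: geom_smooth() sees no grouping and fits a single line
p_local <- ggplot(mt, aes(wt, mpg)) +
  geom_point(aes(color = fam)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Count the distinct line colors in the smoothing layer (layer 2) of each plot
n_global <- length(unique(layer_data(p_global, 2)$colour))
n_local  <- length(unique(layer_data(p_local, 2)$colour))
print(c(global = n_global, local = n_local))
```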
geom_point() has an alpha argument that controls the opacity of the points. A value of 1 (the default) means that the points are totally opaque; a value of 0 means the points are totally transparent (and therefore invisible). Values in between specify transparency.
# Map the color aesthetic to clarity
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point() +
  geom_smooth()
# Make the points 40% opaque
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha = 0.4) +
  geom_smooth()
Saving plots as variables
Plots can be saved as variables, which can be added to later on using the + operator. This is really useful if you want to make multiple related plots from a common base.
# Draw a ggplot
plt_price_vs_carat <- ggplot(
  # Use the diamonds dataset
  diamonds,
  # For the aesthetics, map x to carat and y to price
  aes(carat, price)
)
# Add a point layer to plt_price_vs_carat
plt_price_vs_carat + geom_point()
# Edit this to make points 20% opaque: plt_price_vs_carat_transparent
plt_price_vs_carat_transparent <- plt_price_vs_carat + geom_point(alpha = 0.2)
# See the plot
plt_price_vs_carat_transparent
# Edit this to map color to clarity,
# Assign the updated plot to a new object
plt_price_vs_carat_by_clarity <- plt_price_vs_carat + geom_point(aes(color = clarity))
# See the plot
plt_price_vs_carat_by_clarity
Assigning the shared parts of plots to a variable and then reusing that variable in other plots makes it really clear how much those plots have in common.
Aesthetics
Color, shape and size
These are the aesthetics we can consider within aes() in this chapter: x, y, color, fill, size, alpha, label and shape.
One common convention is that you don't name the x and y arguments to aes(), since they almost always come first, but you do name other arguments.
# Create the factors fcyl and fam
mtcars <- mtcars %>%
  mutate(fcyl = as.factor(cyl), fam = as.factor(am))
library(forcats)
mtcars <- mtcars %>%
  mutate(fam = fct_recode(fam, "manual" = "1", "automatic" = "0"))
# Map x to mpg and y to fcyl
ggplot(mtcars, aes(mpg, fcyl)) +
  geom_point()
# Swap mpg and fcyl
ggplot(mtcars, aes(fcyl, mpg)) +
  geom_point()
# Map x to wt, y to mpg and color to fcyl
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  geom_point()
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Set the shape and size of the points
  geom_point(shape = 1, size = 4)
Color vs. fill
Typically, the color aesthetic changes the outline of a geom and the fill aesthetic changes the inside. geom_point() is an exception: you use color (not fill) for the point color. However, some shapes have special behavior.
The default geom_point() uses shape = 19: a solid circle. An alternative is shape = 21: a circle that allows you to use both fill for the inside and color for the outline. This lets you map two aesthetics to each point.
All shape values are described on the points() help page.
# Map fcyl to fill
ggplot(mtcars, aes(wt, mpg, fill = fcyl)) +
  geom_point(shape = 1, size = 4)
ggplot(mtcars, aes(wt, mpg, fill = fcyl)) +
  # Change point shape; set alpha
  geom_point(shape = 21, size = 4, alpha = 0.6)
# Map color to fam
ggplot(mtcars, aes(wt, mpg, fill = fcyl, color = fam)) +
  geom_point(shape = 21, size = 4, alpha = 0.6)
Notice that mapping a categorical variable onto fill doesn't change the colors, although a legend is generated! This is because the shape used there (shape = 1, a hollow circle), like the default solid circle, only has a color attribute and not a fill attribute. Use fill when you have another shape (such as a bar), or when using a point that does have both a fill and a color attribute, such as shape = 21, which is a circle with an outline. Any time you use a solid color, make sure to use alpha blending to account for overplotting.
Comparing aesthetics
Be careful of a major pitfall: these attributes can overwrite the aesthetics of your plot!
# Establish the base layer
plt_mpg_vs_wt <- ggplot(mtcars, aes(wt, mpg))
# Map fcyl to size
plt_mpg_vs_wt + geom_point(aes(size = fcyl))
# Map fcyl to alpha, not size
plt_mpg_vs_wt + geom_point(aes(alpha = fcyl))
# Map fcyl to shape, not alpha
plt_mpg_vs_wt + geom_point(aes(shape = fcyl))
# Use text layer and map fcyl to label
plt_mpg_vs_wt + geom_text(aes(label = fcyl))
Label and shape are only applicable to categorical data.
Color, shape, size and alpha
This time we'll use these arguments to set attributes of the plot, not map variables onto aesthetics.
We can specify colors in R using hex codes: a hash followed by two hexadecimal digits each for red, green, and blue ("#RRGGBB"). Hexadecimal is base-16 counting: after the digits 0 to 9 come A (representing 10) up to F (representing 15). A pair of hexadecimal digits gives you a range from 0 to 255. "#000000" is black (no color), "#FFFFFF" is white, and "#00FFFF" is cyan (mixed green and blue).
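The arithmetic can be checked with base R's col2rgb() and rgb(), which convert between hex codes and 0 to 255 channel values. The example reuses the hex code assigned to my_blue just below.

```r
my_blue <- "#4ABEFF"

# Split the hex code into its red, green and blue channels (a 3 x 1 integer matrix)
channels <- col2rgb(my_blue)
print(channels)

# 4A = 4 * 16 + 10 = 74, BE = 11 * 16 + 14 = 190, FF = 15 * 16 + 15 = 255
strtoi("BE", base = 16L)

# Rebuild the hex code from the three channel values
rebuilt <- rgb(74, 190, 255, maxColorValue = 255)
```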
# A hexadecimal color
my_blue <- "#4ABEFF"
ggplot(mtcars, aes(wt, mpg)) +
  # Set the point color and alpha
  geom_point(color = my_blue, alpha = 0.6)
# Change the color mapping to a fill mapping
ggplot(mtcars, aes(wt, mpg, fill = fcyl)) +
  # Set point size and shape
  geom_point(color = my_blue, size = 10, shape = 1)
ggplot2 lets you control these attributes in many ways to customize your plots.
Conflicts with aesthetics
We can use all the aesthetics as attributes. Let's see how this works with the aesthetics you used in the previous exercises: x, y, color, fill, size, alpha, label and shape.
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add point layer with alpha 0.5
  geom_point(alpha = 0.5)
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add text layer with label rownames(mtcars) and color red
  geom_text(label = rownames(mtcars), color = "red")
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add points layer with shape 24 and color yellow
  geom_point(shape = 24, color = "yellow")
Going all out
Now, we will gradually add more aesthetic mappings to the plot. We're still working with the mtcars dataset, but this time we're using more features of the cars. Each of the columns is described on the mtcars help page.
Notice that adding more aesthetic mappings to our plot is not always a good idea! We may just increase complexity and decrease readability.
# 3 aesthetics: qsec vs. mpg, colored by fcyl
ggplot(mtcars, aes(mpg, qsec, color = fcyl)) +
  geom_point()
# 4 aesthetics: add a mapping of shape to fam
ggplot(mtcars, aes(mpg, qsec, color = fcyl, shape = fam)) +
  geom_point()
# 5 aesthetics: add a mapping of size to hp / wt
ggplot(mtcars, aes(mpg, qsec, color = fcyl, shape = fam, size = hp / wt)) +
  geom_point()
Between the x and y dimensions, the color, shape, and size of the points, your plot displays five dimensions of the dataset!
Modifying aesthetics
Updating aesthetic labels
We'll modify some aesthetics to make a bar plot of the number of cylinders for cars with different types of transmission.
We'll also make use of some functions for improving the appearance of the plot.
- labs() to set the x- and y-axis labels. It takes strings for each argument.
- scale_fill_manual() (and its counterpart scale_color_manual() for the color aesthetic) defines properties of the fill scale, which is drawn as the legend. The first argument sets the legend title; values is a named vector of colors to use.
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  # Set the axis labels
  labs(x = "Number of Cylinders", y = "Count")
palette <- c(automatic = "#377EB8", manual = "#E41A1C")
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  labs(x = "Number of Cylinders", y = "Count") +
  # Set the fill color scale
  scale_fill_manual("Transmission", values = palette)
# Set the position
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar(position = "dodge") +
  labs(x = "Number of Cylinders", y = "Count") +
  scale_fill_manual("Transmission", values = palette)
Choosing the right position argument is an important part of making a good plot.
Setting a dummy aesthetic
We saw that all the visible aesthetics can serve as both attributes and aesthetics, but we left out x and y. That's because although we can make univariate plots (such as histograms, which you'll get to in the next chapter), a y-axis will always be provided, even if you didn't ask for it.
We can make univariate plots in ggplot2, but we will need to add a fake y axis by mapping y to zero.
When setting y-axis limits, we can specify the limits as separate arguments or as a single numeric vector: ylim(lo, hi) or ylim(c(lo, hi)).
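Both forms build the same continuous y scale, which can be confirmed by comparing the stored limits. A related option, shown here as an addition not used in the original exercise, is coord_cartesian(ylim = ...): ylim() sets scale limits, so points outside the range are typically dropped with a warning, whereas coord_cartesian() zooms the view without removing data.

```r
library(ggplot2)

# Two equivalent ways to set the same y-axis limits
s_args   <- ylim(-2, 2)
s_vector <- ylim(c(-2, 2))
same_limits <- identical(s_args$limits, s_vector$limits)
print(same_limits)

# Zooming instead of limiting: no observations are removed
p_zoom <- ggplot(mtcars, aes(mpg, 0)) +
  geom_jitter() +
  coord_cartesian(ylim = c(-2, 2))
```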
# Plot 0 vs. mpg
ggplot(mtcars, aes(mpg, 0)) +
  # Add jitter
  geom_point(position = "jitter")
ggplot(mtcars, aes(mpg, 0)) +
  geom_jitter() +
  # Set the y-axis limits
  ylim(-2, 2)
The best way to make your plot depends on a lot of different factors and sometimes ggplot2 might not be the best choice.
Aesthetics best practices
Appropriate mappings
Incorrect aesthetic mapping causes confusion or misleads the audience.
Typically, the dependent variable is mapped onto the y-axis and the independent variable is mapped onto the x-axis.
Form follows function
Function
Primary:
- Accurate and efficient representations
Secondary:
- Visually appealing, beautiful plots
Guiding principles
Never:
- Misrepresent or obscure data
- Confuse viewers with complexity
Always:
- Consider the audience and purpose of every plot
The best choices for aesthetics
- Efficient
  - Provides a faster overview than numeric summaries
- Accurate
  - Minimizes information loss
Geometries
Scatter Plots: Overplotting
Large datasets
Scatter plots (using geom_point()) are intuitive, easily understood, and very common, but we must always consider overplotting, particularly in the following four situations:
- Large datasets
- Aligned values on a single axis
- Low-precision data
- Integer data
Typically, alpha blending (i.e. adding transparency) is recommended when using solid shapes. Alternatively, you can use opaque, hollow shapes.
Small points are suitable for large datasets with regions of high density (lots of overlapping).
# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))
# Add a point layer with tiny points
plt_price_vs_carat_by_clarity + geom_point(alpha = 0.5, shape = ".")
# Set transparency to 0.5, set shape to 16
plt_price_vs_carat_by_clarity + geom_point(alpha = 0.5, shape = 16)
Aligned values
Let's take a look at another case where we should be aware of overplotting: Aligning values on a single axis.
This occurs when one axis is continuous and the other is categorical, which can be overcome with some form of jittering.
# Plot base
plt_mpg_vs_fcyl_by_fam <- ggplot(mtcars, aes(fcyl, mpg, color = fam))
# Default points are shown for comparison
plt_mpg_vs_fcyl_by_fam + geom_point()
# Alter the point positions by jittering, width 0.3
plt_mpg_vs_fcyl_by_fam + geom_point(position = position_jitter(width = 0.3))
# Now jitter and dodge the point positions
plt_mpg_vs_fcyl_by_fam +
  geom_point(position = position_jitterdodge(jitter.width = 0.3, dodge.width = 0.3))
Low-precision data
We already saw how to deal with overplotting when using geom_point() in two cases:
- Large datasets
- Aligned values on a single axis
We used position = 'jitter' inside geom_point() or geom_jitter().
Let's take a look at another case:
- Low-precision data
This results from low-resolution measurements like in the iris dataset, which is measured to 1mm precision. It's similar to case 2, but in this case we can jitter on both the x and y axis.
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Swap for jitter layer with width 0.1
  geom_jitter(alpha = 0.5, width = 0.1)
# Jitter within geom_point
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Set the position to jitter
  geom_point(alpha = 0.5, position = "jitter")
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Use a jitter position function with width 0.1
  geom_point(alpha = 0.5, position = position_jitter(width = 0.1))
Notice that jitter can be a geom itself (i.e. geom_jitter()), an argument in geom_point() (i.e. position = "jitter"), or a position function, (i.e. position_jitter()).
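One practical advantage of the position-function form is that position_jitter() accepts a seed argument, so the random offsets are the same every time the plot is drawn. The sketch below, which drops the color mapping for simplicity and uses the illustrative seed 42, checks that two identically seeded plots place the points in the same spots and that no point moves more than the requested width.

```r
library(ggplot2)

# Same jitter settings and seed, so the random offsets should match
jitter_fixed <- position_jitter(width = 0.1, height = 0.1, seed = 42)
p1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(alpha = 0.5, position = jitter_fixed)
p2 <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point(alpha = 0.5, position = jitter_fixed)

# layer_data() exposes the jittered x positions used for drawing
x1 <- layer_data(p1)$x
x2 <- layer_data(p2)$x
print(identical(x1, x2))
print(max(abs(x1 - iris$Sepal.Length)))  # bounded by the jitter width of 0.1
```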
Integer data
Let's take a look at the last case of dealing with overplotting:
- Integer data
This can be variables of type integer (e.g. 1, 2, 3…) or categorical variables (i.e. class factor). A factor is just a special class of type integer.
We'll typically have a small, defined number of intersections between two variables, which is similar to case 3, but you may miss it if you don't realize that integer and factor data are the same as low precision data.
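The claim that a factor is "just a special class of type integer" can be checked in base R: the values are stored as integer codes that index into the levels. The small vector below is illustrative.

```r
# An illustrative factor built from cylinder counts
cyl_factor <- factor(c(6, 4, 8, 6))

storage <- typeof(cyl_factor)    # underlying storage type
codes   <- as.integer(cyl_factor) # integer codes into the levels
lvls    <- levels(cyl_factor)     # levels are sorted: "4", "6", "8"
print(list(storage = storage, codes = codes, levels = lvls))

# To recover the original numbers, go through the level labels, not the codes
values <- as.numeric(as.character(cyl_factor))
```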
The Vocab dataset provided contains the years of education and vocabulary test scores from respondents to US General Social Surveys from 1974 to 2016.
library(carData)
Attaching package: 'carData'
The following object is masked _by_ '.GlobalEnv': Vocab
# Examine the structure of Vocab
str(Vocab)
'data.frame': 30351 obs. of 4 variables:
 $ year      : num 1974 1974 1974 1974 1974 ...
 $ sex       : Factor w/ 2 levels "Female","Male": 2 2 1 1 1 2 2 2 1 1 ...
 $ education : num 14 16 10 10 12 16 17 10 12 11 ...
 $ vocabulary: Factor w/ 11 levels "0","1","2","3",..: 10 10 10 6 9 9 10 6 4 6 ...
# Plot vocabulary vs. education
ggplot(Vocab, aes(education, vocabulary)) +
  # Add a point layer
  geom_point()
ggplot(Vocab, aes(education, vocabulary)) +
  # Change to a jitter layer
  geom_jitter()
ggplot(Vocab, aes(education, vocabulary)) +
  # Set the transparency to 0.2
  geom_jitter(alpha = 0.2)
ggplot(Vocab, aes(education, vocabulary)) +
  # Set the shape to 1
  geom_jitter(alpha = 0.2, shape = 1)
Notice how jittering and alpha blending serve as a great solution to the overplotting problem here. Setting the shape to 1 didn't really help, but it was useful in the previous exercises when you had less data. We need to consider each plot individually.
Histograms
Histograms cut up a continuous variable into discrete bins and, by default, map the internally calculated count variable (the number of observations in each bin) onto the y aesthetic. Another internal variable, density, can be accessed using the .. notation, i.e. ..density.. (written after_stat(density) in ggplot2 3.3.0 and later). Plotting this variable puts the bars on a density scale, where the height of each bar times the bin width equals the relative frequency (the proportion of observations in that bin).
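To see the relationship between density, bin width and relative frequency, the sketch below builds the mtcars histogram with a density y-axis and inspects the computed bins with layer_data(). It assumes ggplot2 3.3.0 or later for after_stat(); with older versions, ..density.. plays the same role.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(mpg, after_stat(density))) +
  geom_histogram(binwidth = 1)

# One row per bin, including the computed count and density columns
bins <- layer_data(p)
bin_width <- bins$xmax - bins$xmin

# Height times width gives each bin's share of the data, so the shares add up to 1
total_share <- sum(bins$density * bin_width)
total_count <- sum(bins$count)  # every car falls in exactly one bin
print(c(total_share = total_share, total_count = total_count))
```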
# Plot mpg
ggplot(mtcars, aes(mpg)) +
  # Add a histogram layer
  geom_histogram()
ggplot(mtcars, aes(mpg)) +
  # Set the binwidth to 1
  geom_histogram(binwidth = 1)
# Map y to ..density..
ggplot(mtcars, aes(mpg, ..density..)) +
  geom_histogram(binwidth = 1)
datacamp_light_blue <- "#51A8C9"
ggplot(mtcars, aes(mpg, ..density..)) +
  # Set the fill color to datacamp_light_blue
  geom_histogram(binwidth = 1, fill = datacamp_light_blue)
Histograms are one of the most common exploratory plots for continuous data. If you want to use density on the y-axis be sure to set your binwidth to an intuitive value.
Positions in histograms
Here, we'll examine the various ways of applying positions to histograms. geom_histogram(), a special case of geom_bar(), has a position argument that can take on the following values:
- stack (the default): Bars for different groups are stacked on top of each other.
- dodge: Bars for different groups are placed side by side.
- fill: Bars for different groups are shown as proportions.
- identity: Plot the values as they appear in the dataset.
# Update the aesthetics so the fill color is by fam
ggplot(mtcars, aes(mpg, fill = fam)) +
  geom_histogram(binwidth = 1)
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to dodge
  geom_histogram(binwidth = 1, position = "dodge")
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to fill
  geom_histogram(binwidth = 1, position = "fill")
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to identity, with transparency 0.4
  geom_histogram(binwidth = 1, position = "identity", alpha = 0.4)
Bar plots
Position in bar and col plots
Let's see how the position argument changes geom_bar().
We have three position options:
- stack: The default
- dodge: Preferred
- fill: To show proportions
While we will be using geom_bar() here, note that geom_col() is essentially geom_bar() with stat = "identity": it still stacks by default (position = "stack"), but instead of counting rows it uses the y values in the data as the bar heights. It is used when we want the heights of the bars to represent the exact values in the data.
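The sketch below contrasts the two: geom_bar() counts the rows per cylinder group itself, while geom_col() plots counts that have already been computed (here with base R's table()). The names mt and counts are illustrative; the bar heights are compared via layer_data().

```r
library(ggplot2)

mt <- transform(mtcars, fcyl = factor(cyl))

# geom_bar(): ggplot2 counts the rows in each category (stat = "count")
p_bar <- ggplot(mt, aes(fcyl)) + geom_bar()

# geom_col(): the heights are taken from a pre-computed column (stat = "identity")
counts <- as.data.frame(table(fcyl = mt$fcyl))  # columns: fcyl, Freq
p_col <- ggplot(counts, aes(fcyl, Freq)) + geom_col()

# geom_bar(stat = "identity") draws the same bars as geom_col()
p_bar_identity <- ggplot(counts, aes(fcyl, Freq)) + geom_bar(stat = "identity")

heights_bar <- layer_data(p_bar)$y
heights_col <- layer_data(p_col)$y
print(rbind(heights_bar, heights_col))
```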
# Plot fcyl, filled by fam
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Add a bar layer
  geom_bar()
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Set the position to "fill"
  geom_bar(position = "fill")
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Change the position to "dodge"
  geom_bar(position = "dodge")
Different kinds of plots need different position arguments, so it's important to be familiar with this attribute.
Overlapping bar plots
We can customize bar plots further by adjusting the dodging so that our bars partially overlap each other. Instead of using position = "dodge", we're going to use position_dodge(), like we did with position_jitter() previously. The result can also be saved as an object, such as posn_d, so that we can easily reuse it.
Remember, the reason we want to use position_dodge() (and position_jitter()) is to specify how much dodging (or jittering) you want.
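Saving the dodge specification once and reusing it might look like the sketch below. The object name posn_d follows the text; mt is an illustrative copy of mtcars with a labelled fam factor so the block runs on its own.

```r
library(ggplot2)

mt <- transform(mtcars, fam = factor(am, labels = c("automatic", "manual")))

# Define the amount of dodging once ...
posn_d <- position_dodge(width = 0.2)

# ... and reuse the same object in several plots or layers
p_overlap      <- ggplot(mt, aes(cyl, fill = fam)) + geom_bar(position = posn_d)
p_overlap_fade <- ggplot(mt, aes(cyl, fill = fam)) + geom_bar(position = posn_d, alpha = 0.6)
```

Keeping the width in one object means a later change (say, to 0.3) only has to be made in one place.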
ggplot(mtcars, aes(cyl, fill = fam)) +
  # Change position to use the functional form, with width 0.2
  geom_bar(position = position_dodge(width = 0.2))
ggplot(mtcars, aes(cyl, fill = fam)) +
  # Set the transparency to 0.6
  geom_bar(position = position_dodge(width = 0.2), alpha = 0.6)
By using these position functions, we can customize our plots to suit our needs.
Bar plots: sequential color palette
We'll fill each segment according to an ordinal variable. The best way to do that is with a sequential color palette.
# For comparison: Set1 is a qualitative (not sequential) brewer palette,
# which suits unordered categories such as fam
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set1")
Vocab <- Vocab %>% mutate(vocabulary = as.factor(vocabulary))
# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  geom_bar()
# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  # Add a bar layer with position "fill"
  geom_bar(position = "fill")
# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  # Add a bar layer with position "fill"
  geom_bar(position = "fill") +
  # Add a brewer fill scale with default palette
  scale_fill_brewer()
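One caveat worth checking: the sequential ColorBrewer palettes provide at most 9 colors, while vocabulary has 11 levels, so scale_fill_brewer() alone may warn and leave some levels without a fill. A common workaround, sketched here as an alternative rather than the original exercise's method, is to interpolate the palette with grDevices::colorRampPalette() and pass the result to scale_fill_manual(). The small data frame df is hypothetical and only stands in for Vocab.

```r
library(ggplot2)
library(RColorBrewer)

# Maximum number of colors ColorBrewer defines for the default sequential palette
max_blues <- brewer.pal.info["Blues", "maxcolors"]

# Interpolate the 9 Blues colors into 11 evenly spaced shades
blues_11 <- colorRampPalette(brewer.pal(9, "Blues"))(11)

# Hypothetical data with 11 ordered score levels, standing in for Vocab
df <- data.frame(
  education  = rep(c(10, 12, 16), each = 11),
  vocabulary = factor(rep(0:10, times = 3))
)
p <- ggplot(df, aes(education, fill = vocabulary)) +
  geom_bar(position = "fill") +
  scale_fill_manual(values = blues_11)
```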
Line plots
We'll use the economics dataset to make some line plots. The dataset contains a time series for unemployment and population statistics from the Federal Reserve Bank of St. Louis in the United States. The data is contained in the ggplot2 package.
To begin with, we can look at how the number of unemployed people (unemploy, in thousands) and the proportion of the population that is unemployed (unemploy/pop) change over time.
# Print the head of economics
head(economics)
# Using economics, plot unemploy vs. date
ggplot(economics, aes(x = date, y = unemploy)) +
  # Make it a line plot
  geom_line()
# Change the y-axis to the proportion of the population that is unemployed
ggplot(economics, aes(date, unemploy / pop)) +
  geom_line()
Multiple time series
We already saw how the form of your data affects how you can plot it. Let's explore that further with multiple time series. Here, it's important that all lines are on the same scale, and if possible, on the same plot.
fish.species contains the global capture rates of seven salmon species from 1950–2010. Apart from Year, each variable (column) is a salmon species, and each observation (row) is one year. fish.tidy contains the same data, but in three columns: Year, Species, and Capture (i.e. one variable per column).
load("fish.RData")
library(tidyr)
# Use gather to go from fish.species to fish.tidy
# (gather() still works, but tidyr 1.0.0 and later recommend pivot_longer() instead)
fish.tidy <- gather(fish.species, Species, Capture, -Year)
str(fish.species)
'data.frame': 61 obs. of 8 variables:
 $ Year    : int 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
 $ Pink    : int 100600 259000 132600 235900 123400 244400 203400 270119 200798 200085 ...
 $ Chum    : int 139300 155900 113800 99800 148700 143700 158480 125377 132407 113114 ...
 $ Sockeye : int 64100 51200 58200 66100 83800 72000 84800 69676 100520 62472 ...
 $ Coho    : int 30500 40900 33600 32400 38300 45100 40000 39900 39200 32865 ...
 $ Rainbow : int 0 100 100 100 100 100 100 100 100 100 ...
 $ Chinook : int 23200 25500 24900 25300 24500 27700 25300 21200 20900 20335 ...
 $ Atlantic: int 10800 9701 9800 8800 9600 7800 8100 9000 8801 8700 ...
str(fish.tidy)
'data.frame': 427 obs. of 3 variables:
 $ Year   : int 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
 $ Species: chr "Pink" "Pink" "Pink" "Pink" ...
 $ Capture: int 100600 259000 132600 235900 123400 244400 203400 270119 200798 200085 ...
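With tidyr 1.0.0 or later, the same wide-to-long reshaping can be written with pivot_longer(). Since fish.RData is not reproduced here, the sketch uses a small wide data frame, fish_wide, holding the first two years of the Pink and Chum columns from the str(fish.species) output above; the name is illustrative.

```r
library(tidyr)

# Two-year, two-species excerpt in the same wide layout as fish.species
fish_wide <- data.frame(
  Year = c(1950L, 1951L),
  Pink = c(100600L, 259000L),
  Chum = c(139300L, 155900L)
)

# Every column except Year becomes a (Species, Capture) pair
fish_long <- pivot_longer(
  fish_wide,
  cols      = -Year,
  names_to  = "Species",
  values_to = "Capture"
)
print(fish_long)
```

Unlike gather(), pivot_longer() names the new columns with explicit names_to and values_to arguments, which reads more clearly when revisiting the code later.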
# Plot the Rainbow Salmon time series
ggplot(fish.species, aes(x = Year, y = Rainbow)) +
  geom_line()
# Plot the Pink Salmon time series
ggplot(fish.species, aes(x = Year, y = Pink)) +
  geom_line()
# Plot multiple time-series by grouping by species
ggplot(fish.tidy, aes(Year, Capture)) +
  geom_line(aes(group = Species))
# Plot multiple time-series by coloring by species
ggplot(fish.tidy, aes(x = Year, y = Capture, color = Species)) +
  geom_line(aes(group = Species))
As the last couple of plots show, a grouping aesthetic is vital here. If you don't specify group = Species (or map color = Species, which also groups the data), you'll get a mess of lines.
Themes
Moving the legend
To change stylistic elements of a plot, call theme() and set plot properties to a new value. For example, the following changes the legend position.
p + theme(legend.position = new_value)
Here, the new value can be
- "top", "bottom", "left", or "right": place it at that side of the plot.
- "none": don't draw it.
- c(x, y): c(0, 0) means the bottom-left and c(1, 1) means the top-right.
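Recent ggplot2 releases changed how the numeric c(x, y) option is written: from version 3.5.0, the documented form is legend.position = "inside" together with legend.position.inside = c(x, y), and a bare numeric legend.position is deprecated. The sketch below branches on the installed version so it should run on older and newer releases alike; the plot p and the position (0.9, 0.8) are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) + geom_point()

if (packageVersion("ggplot2") >= "3.5.0") {
  # Newer releases: request an inside legend, then give its (x, y) position separately
  p_inside <- p + theme(legend.position = "inside", legend.position.inside = c(0.9, 0.8))
} else {
  # Older releases accept the numeric vector directly
  p_inside <- p + theme(legend.position = c(0.9, 0.8))
}
```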
recess <- data.frame(
  begin = c("1969-12-01", "1973-11-01", "1980-01-01", "1981-07-01",
            "1990-07-01", "2001-03-01", "2007-12-01"),
  end = c("1970-11-01", "1975-03-01", "1980-07-01", "1982-11-01",
          "1991-03-01", "2001-11-01", "2009-07-30"),
  event = c("Fiscal & Monetary\ntightening", "1973 Oil crisis", "Double dip I",
            "Double dip II", "Oil price shock", "Dot-com bubble",
            "Sub-prime\nmortgage crisis"),
  y = c(.01415981, 0.02067402, 0.02951190, 0.03419201,
        0.02767339, 0.02159662, 0.02520715),
  stringsAsFactors = FALSE
)
library(lubridate)
recess$begin <- ymd(recess$begin)
recess$end <- ymd(recess$end)
plt_prop_unemployed_over_time <- ggplot(economics, aes(x = date, y = unemploy / pop)) +
  ggtitle("The percentage of unemployed Americans \n increases sharply during recessions") +
  geom_line() +
  # Shade each recession period across the full height of the panel
  geom_rect(data = recess,
            aes(xmin = begin, xmax = end, ymin = -Inf, ymax = +Inf, fill = "Recession"),
            inherit.aes = FALSE, alpha = 0.2) +
  # Label each recession at its end date
  geom_label(data = recess, aes(x = end, y = y, label = event), size = 3) +
  scale_fill_manual(name = "", values = "red", labels = "Recessions")
plt_prop_unemployed_over_time
# View the default plot
plt_prop_unemployed_over_time
# Remove legend entirely
plt_prop_unemployed_over_time + theme(legend.position = "none")
# Position the legend at the bottom of the plot
plt_prop_unemployed_over_time + theme(legend.position = "bottom")
# Position the legend inside the plot at (0.6, 0.1)
plt_prop_unemployed_over_time + theme(legend.position = c(0.6, 0.1))
But be careful when placing a legend inside your plotting space. You could end up obscuring data.
Modifying theme elements
Many plot elements have multiple properties that can be set. For example, line elements in the plot such as axes and gridlines have a color, a thickness (size, renamed linewidth in ggplot2 3.4.0), and a line type (solid, dashed, or dotted). To set the style of a line, you use element_line(). For example, to make the axis lines red and dashed, you would use the following.
p + theme(axis.line = element_line(color = "red", linetype = "dashed"))
Similarly, element_rect() changes rectangles and element_text() changes text. You can remove a plot element using element_blank().
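A minimal sketch combining the four element functions on a simple mtcars scatter plot is shown below; the particular theme settings are illustrative. Line thickness is left out on purpose, because its argument name depends on the ggplot2 version (size before 3.4.0, linewidth afterwards).

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

p_styled <- p + theme(
  axis.line        = element_line(color = "red", linetype = "dashed"),  # a line element
  panel.background = element_rect(fill = "grey92"),                     # a rectangle element
  axis.title       = element_text(face = "bold"),                       # a text element
  panel.grid.minor = element_blank()                                    # remove an element
)
```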
plt_prop_unemployed_over_time +
  theme(
    # For all rectangles, set the fill color to grey92
    rect = element_rect(fill = "grey92"),
    # For the legend key, turn off the outline
    legend.key = element_rect(color = NA)
  )
plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    # Turn off axis ticks
    axis.ticks = element_blank(),
    # Turn off the panel grid
    panel.grid = element_blank()
  )
plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    # Add major y-axis panel grid lines back
    panel.grid.major.y = element_line(
      # Set the color to white
      color = "white",
      # Set the size to 0.5
      size = 0.5,
      # Set the line type to dotted
      linetype = "dotted"
    )
  )
plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(
      color = "white",
      size = 0.5,
      linetype = "dotted"
    ),
    # Set the axis text color to grey25
    axis.text = element_text(color = "grey25"),
    # Set the plot title font face to italic and font size to 16
    plot.title = element_text(size = 16, face = "italic")
  )
This explanatory plot is now ready for prime time: it's both attractive and informative. Make sure that all your text is legible in the context in which the plot will be viewed.
Modifying whitespace
Whitespace means all the non-visible margins and spacing in the plot.
To set a single whitespace value, use unit(x, unit), where x is the amount and unit is the unit of measure.
Margins require you to set four positions, so use margin(top, right, bottom, left, unit). To remember the margin order, think TRouBLe.
The default unit is "pt" (points), which scales well with text. Other options include "cm", "in" (inches) and "lines" (of text).
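The sketch below makes the TRouBLe order explicit by comparing a positional margin() call with the same call using named arguments (t, r, b, l), and uses unit() for a single value such as tick length. The plot and the values are illustrative.

```r
library(ggplot2)
library(grid)

# margin() takes top, right, bottom, left, followed by the unit
m_positional <- margin(10, 30, 50, 70, "mm")
m_named      <- margin(t = 10, r = 30, b = 50, l = 70, unit = "mm")
print(identical(m_positional, m_named))

# unit() builds a single value, e.g. a tick length of 2 lines of text
tick_len <- unit(2, "lines")

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(plot.margin = m_named, axis.ticks.length = tick_len)
```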
# View the original plot
plt_mpg_vs_wt_by_cyl <- ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  ylab("Miles per gallon") +
  xlab("Weight (1000 lbs)") +
  geom_point()
plt_mpg_vs_wt_by_cyl
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the axis tick length to 2 lines
    axis.ticks.length = unit(2, "lines")
  )
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the legend key size to 3 centimeters
    legend.key.size = unit(3, "cm")
  )
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the legend margin to (20, 30, 40, 50) points
    legend.margin = margin(20, 30, 40, 50, "pt")
  )
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the plot margin to (10, 30, 50, 70) millimeters
    plot.margin = margin(10, 30, 50, 70, "mm")
  )
Changing the whitespace can be useful if you need to make your plot more compact, or if you want to create more space to reduce "busyness".
Built-in Themes
In addition to making your own themes, there are several out-of-the-box solutions that may save you lots of time.
- theme_gray() is the default.
- theme_bw() is useful when you use transparency.
- theme_classic() is more traditional.
- theme_void() removes everything but the data.
# Add a black and white theme
plt_prop_unemployed_over_time + theme_bw()
# Add a classic theme
plt_prop_unemployed_over_time + theme_classic()
# Add a void theme
plt_prop_unemployed_over_time + theme_void()
The black and white theme works really well if you use transparency in your plot.
Exploring ggthemes package
Outside of ggplot2, another source of built-in themes is the ggthemes package.
library(ggthemes)
# Use the fivethirtyeight theme
plt_prop_unemployed_over_time + theme_fivethirtyeight()
# Use Tufte's theme
plt_prop_unemployed_over_time + theme_tufte()
# Use the Wall Street Journal theme
plt_prop_unemployed_over_time + theme_wsj()
ggthemes has over 20 themes for you to try.
Setting themes
Reusing a theme across many plots helps to provide a consistent style. You have several options for this.
- Assign the theme to a variable, and add it to each plot.
- Set your theme as the default using theme_set().
A good strategy that we'll use here is to begin with a built-in theme then modify it.
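theme_set() changes the default for every plot drawn later in the session, so it can help to keep the previous theme and restore it afterwards. theme_set() invisibly returns the theme that was active before the call. The sketch below shows the pattern, with theme_bw() standing in for your own theme.

```{r}
library(ggplot2)

# theme_set() invisibly returns the theme that was active before the call
old_theme <- theme_set(theme_bw())

# Plots built now use theme_bw() unless they add their own theme
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Put the previous default back when you are done
theme_set(old_theme)
```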
# Save the theme as theme_recession
theme_recession <- theme(
  rect = element_rect(fill = "grey92"),
  legend.key = element_rect(color = NA),
  axis.ticks = element_blank(),
  panel.grid = element_blank(),
  panel.grid.major.y = element_line(color = "white", size = 0.5, linetype = "dotted"),
  axis.text = element_text(color = "grey25"),
  plot.title = element_text(face = "italic", size = 16),
  legend.position = c(0.6, 0.1)
)

# Combine the Tufte theme with theme_recession
theme_tufte_recession <- theme_tufte() + theme_recession

# Add the Tufte recession theme to the plot
plt_prop_unemployed_over_time + theme_tufte_recession

theme_recession <- theme(
  rect = element_rect(fill = "grey92"),
  legend.key = element_rect(color = NA),
  axis.ticks = element_blank(),
  panel.grid = element_blank(),
  panel.grid.major.y = element_line(color = "white", size = 0.5, linetype = "dotted"),
  axis.text = element_text(color = "grey25"),
  plot.title = element_text(face = "italic", size = 16),
  legend.position = c(0.6, 0.1)
)
theme_tufte_recession <- theme_tufte() + theme_recession

# Set theme_tufte_recession as the default theme
theme_set(theme_tufte_recession)

# Draw the plot (without explicitly adding a theme)
plt_prop_unemployed_over_time
Publication-quality plots
plt_prop_unemployed_over_time +
  # Add Tufte's theme
  theme_tufte()

plt_prop_unemployed_over_time +
  theme_tufte() +
  # Add individual theme elements
  theme(
    # Turn off the legend
    legend.position = "none",
    # Turn off the axis ticks
    axis.ticks = element_blank()
  )

plt_prop_unemployed_over_time +
  theme_tufte() +
  theme(
    legend.position = "none",
    axis.ticks = element_blank(),
    # Set the axis title's text color to grey60
    axis.title = element_text(color = "grey60"),
    # Set the axis text's text color to grey60
    axis.text = element_text(color = "grey60")
  )

plt_prop_unemployed_over_time +
  theme_tufte() +
  theme(
    legend.position = "none",
    axis.ticks = element_blank(),
    axis.title = element_text(color = "grey60"),
    axis.text = element_text(color = "grey60"),
    # Set the panel gridlines major y values
    panel.grid.major.y = element_line(
      # Set the color to grey60
      color = "grey60",
      # Set the size to 0.25
      size = 0.25,
      # Set the linetype to dotted
      linetype = "dotted"
    )
  )
Using geoms for explanatory plots
Let's focus on producing beautiful and effective explanatory plots. In the next couple of exercises, we'll create a plot that is similar to the one shown in the video using gm2007, a filtered subset of the gapminder dataset.
This type of plot will be in an info-viz style, meaning that it would be similar to something you'd see in a magazine or website for a mostly lay audience.
library(gapminder)
gapminder
gm2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent) %>%
  filter(lifeExp > 80.6 | lifeExp < 46) %>%
  arrange(lifeExp)
gm2007

gm2007_full <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent)
# Add a geom_segment() layer
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  theme(legend.position = "right")

# Add a geom_text() layer
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = lifeExp), color = "white", size = 1.5) +
  theme(legend.position = "right")

library(RColorBrewer)
# Set the color scale
palette <- brewer.pal(5, "RdYlBu")[-(2:4)]
# Modify the scales
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = round(lifeExp, 1)), color = "white", size = 1.5) +
  scale_x_continuous("", expand = c(0, 0), limits = c(30, 90), position = "top") +
  scale_color_gradientn(colors = palette) +
  theme(legend.position = "right")

# Set the color scale
palette <- brewer.pal(5, "RdYlBu")[-(2:4)]
# Add a title and caption
plt_country_vs_lifeExp <- ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = round(lifeExp, 1)), color = "white", size = 1.5) +
  scale_x_continuous("", expand = c(0, 0), limits = c(30, 90), position = "top") +
  scale_color_gradientn(colors = palette) +
  labs(title = "Highest and lowest life expectancies, 2007",
       caption = "Source: gapminder") +
  theme(legend.position = "right")
plt_country_vs_lifeExp
Using annotate() for embellishments
We completed our basic plot. Now let's polish it by playing with the theme and adding annotations. In this exercise, we'll use annotate() to add text and a curve to the plot.
The following values have been calculated for you to assist with adding embellishments to the plot:
global_mean <- mean(gm2007_full$lifeExp)
x_start <- global_mean + 4
y_start <- 5.5
x_end <- global_mean
y_end <- 7.5

# Define the theme
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  theme_classic() +
  theme(axis.line.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text = element_text(color = "black"),
        axis.title = element_blank(),
        legend.position = "none")
plt_country_vs_lifeExp

# Add a vertical line
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  geom_vline(xintercept = global_mean, color = "grey40", linetype = 3)
plt_country_vs_lifeExp

# Add text
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  annotate(
    "text",
    x = x_start, y = y_start,
    label = "The\nglobal\naverage",
    vjust = 1, size = 3, color = "grey40"
  )
plt_country_vs_lifeExp

# Add a curved arrow
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  annotate(
    "curve",
    x = x_start, y = y_start,
    xend = x_end, yend = y_end,
    arrow = arrow(length = unit(0.2, "cm"), type = "closed"),
    color = "grey40"
  )
plt_country_vs_lifeExp
Your explanatory plot clearly shows the countries with the highest and lowest life expectancy and would be great for a lay audience.
---
title: "Introduction to Visualization with ggplot2"
output:
  html_notebook:
    toc: true
    toc_float: true
    toc_collapsed: false
    
    toc_depth: 3
---

# Introduction

The mtcars dataset contains information on 32 cars from a 1973 issue of Motor Trend magazine. This dataset is small, intuitive, and contains a variety of continuous and categorical variables.

```{r}
# Load the ggplot2 package
library(ggplot2)
library(dplyr)
# Explore the mtcars data frame with str()
str(mtcars)

# Execute the following command
ggplot(mtcars, aes(cyl, mpg)) +
  geom_point()
```
Notice that ggplot2 treats cyl as a continuous variable. We get a plot, but it's not quite right, because it gives the impression that there is such a thing as a 5 or 7-cylinder car, which there is not.

## Data column types affect plot types

Although cyl (the number of cylinders) is categorical, you probably noticed that it is classified as numeric in mtcars. This is really misleading because the representation in the plot doesn't match the actual data type. You'll have to explicitly tell ggplot2 that cyl is a categorical variable.

```{r}
# Load the ggplot2 package
library(ggplot2)

# Change the command below so that cyl is treated as factor
ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_point()
```
Notice that ggplot2 treats cyl as a factor. This time the x-axis does not contain values like 5 or 7, only the values that are present in the dataset.
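The same data-type rule decides what kind of legend you get. In this minimal sketch (assuming ggplot2 is loaded), mapping color to the numeric cyl produces a continuous color scale, drawn as a colorbar. Mapping it to factor(cyl) produces a discrete scale with one legend key per level. layer_data() is used only to inspect the colors ggplot2 assigned, without drawing the plots.

```{r}
library(ggplot2)

# cyl is numeric, so color gets a continuous scale (shown as a colorbar)
p_num <- ggplot(mtcars, aes(wt, mpg, color = cyl)) + geom_point()

# factor(cyl) is categorical, so color gets a discrete scale (one key per level)
p_fac <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) + geom_point()

# layer_data() returns the computed layer data, including each point's color
num_cols <- unique(layer_data(p_num)$colour)
fac_cols <- unique(layer_data(p_fac)$colour)
fac_cols
```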

## The grammar of graphics

### Mapping data columns to aesthetics

Let's dive a little deeper into the three main topics in this course: the data, aesthetics, and geom layers.

```{r}
# Edit to add a color aesthetic mapped to disp
ggplot(mtcars, aes(wt, mpg, color = disp)) +
  geom_point()
# Change the color aesthetic to a size aesthetic
ggplot(mtcars, aes(wt, mpg, size = disp)) +
  geom_point()
```
### Understanding variables

In the previous exercise you saw that disp can be mapped onto a color gradient or onto a continuous size scale.

Another aesthetic you can map in aes() is the shape of the points. ggplot2 can only assign a limited number of shapes automatically (the default shape palette handles at most six levels), so shape works only with categorical variables.
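Here is a short sketch of that restriction, assuming ggplot2. Mapping the continuous disp column to shape fails when the plot is built. A factor with three levels gets three different shapes. tryCatch() is used only so that the chunk keeps running after the expected error.

```{r}
library(ggplot2)

# Mapping a continuous variable to shape errors when the plot is built
p_bad <- ggplot(mtcars, aes(wt, mpg, shape = disp)) + geom_point()
result <- tryCatch(
  {
    ggplot_build(p_bad)
    "built"
  },
  error = function(e) "error"
)
result

# A factor works: each of the three levels gets its own shape
p_ok <- ggplot(mtcars, aes(wt, mpg, shape = factor(cyl))) + geom_point()
shapes <- unique(layer_data(p_ok)$shape)
```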

## ggplot2 layers

### Adding geometries

The diamonds dataset, which ships with ggplot2, contains details of 53,940 diamonds. Among the variables included are carat (a measurement of the diamond's size) and price.

We'll use two common geom layer functions:
- geom_point() adds points (as in a scatter plot).
- geom_smooth() adds a smooth trend curve.
```{r}
# Explore the diamonds data frame with str()
str(diamonds)
```
```{r}
# Add geom_point() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point()
```
```{r}
# Add geom_smooth() with +
ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth()
```
### Changing one geom or every geom

If we have multiple geoms, then mapping an aesthetic to a data variable inside the call to ggplot() will change all the geoms. It is also possible to make changes to individual geoms by passing arguments to the geom_*() functions.

geom_point() has an alpha argument that controls the opacity of the points. A value of 1 (the default) means that the points are totally opaque; a value of 0 means the points are totally transparent (and therefore invisible). Values in between specify transparency.
```{r}
# Map the color aesthetic to clarity
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point() +
  geom_smooth()
```
```{r}
# Make the points 40% opaque
ggplot(diamonds, aes(carat, price, color = clarity)) +
  geom_point(alpha = 0.4) +
  geom_smooth()
```
### Saving plots as variables

Plots can be saved as variables, which can be added to later on using the + operator. This is really useful if you want to make multiple related plots from a common base.

```{r}
# Draw a ggplot
plt_price_vs_carat <- ggplot(
  # Use the diamonds dataset
  diamonds,
  # For the aesthetics, map x to carat and y to price
  aes(carat, price)
)

# Add a point layer to plt_price_vs_carat
plt_price_vs_carat + geom_point()
```
```{r}
# Edit this to make points 20% opaque: plt_price_vs_carat_transparent
plt_price_vs_carat_transparent <- plt_price_vs_carat + geom_point(alpha = 0.2)

# See the plot
plt_price_vs_carat_transparent
```
```{r}
# Edit this to map color to clarity,
# Assign the updated plot to a new object
plt_price_vs_carat_by_clarity <- plt_price_vs_carat + geom_point(aes(color = clarity))

# See the plot
plt_price_vs_carat_by_clarity
```
Assigning parts of plots to a variable and then reusing that variable in other plots makes it really clear how much those plots have in common.


# Aesthetics

## Color, shape and size

These are the aesthetics we can consider within aes() in this chapter: x, y, color, fill, size, alpha, label and shape.

One common convention is that you don't name the x and y arguments to aes(), since they almost always come first, but you do name other arguments.
```{r}
# Create factor versions of cyl (fcyl) and am (fam)
mtcars <- mtcars %>% 
  mutate(fcyl = as.factor(cyl),
         fam = as.factor(am))
```


```{r}
library(forcats)
mtcars <- mtcars %>% 
  mutate(fam = fct_recode(fam,
                          "manual" = "1",
                         "automatic" = "0"))
```

```{r}
# Map x to mpg and y to fcyl
ggplot(mtcars, aes(mpg, fcyl)) +
  geom_point()

```
```{r}
# Swap mpg and fcyl
ggplot(mtcars, aes(fcyl, mpg)) +
  geom_point()
```
```{r}
# Map x to wt, y to mpg and color to fcyl
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  geom_point()
```
```{r}
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Set the shape and size of the points
  geom_point(shape = 1, size = 4)
```
## Color vs. fill

Typically, the color aesthetic changes the outline of a geom and the fill aesthetic changes the inside. geom_point() is an exception: you use color (not fill) for the point color. However, some shapes have special behavior.

The default geom_point() uses shape = 19: a solid circle. An alternative is shape = 21: a circle that allows you to use both fill for the inside and color for the outline. This lets you map two aesthetics to each point.

All shape values are described on the points() help page.
```{r}
# Map fcyl to fill
ggplot(mtcars, aes(wt, mpg, fill = fcyl)) +
  geom_point(shape = 1, size = 4)
```
```{r}
ggplot(mtcars, aes(wt, mpg, fill = fcyl)) +
  # Change point shape; set alpha
  geom_point(shape = 21, size = 4, alpha = 0.6)
```
```{r}
# Map color to fam
ggplot(mtcars, aes(wt, mpg, fill = fcyl, color = fam)) +
  geom_point(shape = 21, size = 4, alpha = 0.6)
```
Notice that mapping a categorical variable onto fill doesn't change the colors, although a legend is generated! This is because shape 1, like the default shape 19, only has a color attribute and not a fill attribute. Use fill when you have another shape (such as a bar), or when using a point that does have both a fill and a color attribute, such as shape = 21, which is a circle with an outline. Any time you use a solid color, make sure to use alpha blending to account for overplotting.

## Comparing aesthetics

Be careful of a major pitfall: these attributes can overwrite the aesthetics of your plot!

```{r}
# Establish the base layer
plt_mpg_vs_wt <- ggplot(mtcars, aes(wt, mpg))
```
```{r}
# Map fcyl to size
plt_mpg_vs_wt +
  geom_point(aes(size = fcyl))
```
```{r}
# Map fcyl to alpha (ggplot2 warns that using alpha for a discrete variable is not advised)
plt_mpg_vs_wt +
  geom_point(aes(alpha = fcyl))
```
```{r}
# Map fcyl to shape, not alpha
plt_mpg_vs_wt +
  geom_point(aes(shape = fcyl))
```
```{r}
# Use text layer and map fcyl to label
plt_mpg_vs_wt +
  geom_text(aes(label = fcyl))
```
Label and shape are only applicable to categorical data.

## Color, shape, size and alpha

This time we'll use these arguments to set attributes of the plot, not map variables onto aesthetics.

We can specify colors in R using hex codes: a hash followed by two hexadecimal digits each for red, green, and blue ("#RRGGBB"). Hexadecimal is base-16 counting: we have 0 to 9, then A representing 10 up to F representing 15. A pair of hexadecimal digits gives a range from 0 to 255. "#000000" is black (no color), "#FFFFFF" is white, and "#00FFFF" is cyan (mixed green and blue).
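Base R can decode a hex code for you. In the sketch below, col2rgb() (from grDevices) returns the red, green, and blue channels of my_blue, and strtoi() converts the individual hex digit pairs to numbers.

```{r}
# col2rgb() (base R, grDevices) splits a color into red, green and blue channels
my_blue <- "#4ABEFF"
channels <- col2rgb(my_blue)[, 1]
channels  # red = 74, green = 190, blue = 255

# Each pair of hex digits is one base-16 number between 00 (0) and FF (255)
strtoi(c("4A", "BE", "FF"), base = 16L)  # 74 190 255

# Cyan has no red and full green and blue
col2rgb("#00FFFF")[, 1]  # red = 0, green = 255, blue = 255
```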
```{r}
# A hexadecimal color
my_blue <- "#4ABEFF"
```
```{r}
ggplot(mtcars, aes(wt, mpg)) +
  # Set the point color and alpha
  geom_point(color = my_blue, alpha = 0.6)
```
```{r}
# Change the color mapping to a fill mapping
ggplot(mtcars, aes(wt, mpg, fill = fcyl)) +
  # Set point size and shape
  geom_point(color = my_blue, size = 10, shape = 1)
```
ggplot2 lets you control these attributes in many ways to customize your plots.

## Conflicts with aesthetics

We can use all the aesthetics as attributes. Let's see how this works with the aesthetics you used in the previous exercises: x, y, color, fill, size, alpha, label and shape.
```{r}
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add point layer with alpha 0.5
  geom_point(alpha = 0.5)
```
```{r}
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add text layer with label rownames(mtcars) and color red
  geom_text(label = rownames(mtcars), color = "red")
```
```{r}
ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  # Add points layer with shape 24 and color yellow
  geom_point(shape = 24, color = "yellow")
```
## Going all out

Now, we will gradually add more aesthetics layers to the plot. We're still working with the mtcars dataset, but this time we're using more features of the cars. Each of the columns is described on the mtcars help page.

Notice that adding more aesthetic mappings to our plot is not always a good idea! We may just increase complexity and decrease readability.
```{r}
# 3 aesthetics: qsec vs. mpg, colored by fcyl
ggplot(mtcars, aes(mpg, qsec, color = fcyl)) +
  geom_point()
```
```{r}
# 4 aesthetics: add a mapping of shape to fam
ggplot(mtcars, aes(mpg, qsec, color = fcyl, shape = fam)) +
  geom_point()
```
```{r}
# 5 aesthetics: add a mapping of size to hp / wt
ggplot(mtcars, aes(mpg, qsec, color = fcyl, shape = fam, size = hp/wt)) +
  geom_point()
```
Between the x and y dimensions, the color, shape, and size of the points, your plot displays five dimensions of the dataset!

# Modifying aesthetics

## Updating aesthetic labels

We'll modify some aesthetics to make a bar plot of the number of cylinders for cars with different types of transmission.

We'll also make use of some functions for improving the appearance of the plot.

- labs() sets the x- and y-axis labels. It takes strings for each argument.
- scale_fill_manual() defines properties of the fill color scale. The first argument sets the legend title; values is a named vector of colors to use.

```{r}
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  # Set the axis labels
  labs(x = "Number of Cylinders", y = "Count")
```

```{r}
palette <- c(automatic = "#377EB8", manual = "#E41A1C")
```
```{r}
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  labs(x = "Number of Cylinders", y = "Count") +
  # Set the fill color scale
  scale_fill_manual("Transmission", values = palette)
```
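Because values is a named vector, scale_fill_manual() matches each color to the factor level with the same name instead of relying on order. The sketch below checks this by reversing the palette. mtcars2 is a hypothetical helper copy of mtcars with the fcyl and fam columns built in, so the chunk runs on its own.

```{r}
library(ggplot2)

# Hypothetical helper copy of mtcars with the factor columns used in this chapter
mtcars2 <- transform(mtcars,
  fcyl = factor(cyl),
  fam  = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

palette <- c(automatic = "#377EB8", manual = "#E41A1C")
# The same colors, listed in the opposite order
palette_rev <- palette[c("manual", "automatic")]

p <- ggplot(mtcars2, aes(fcyl, fill = fam)) +
  geom_bar() +
  scale_fill_manual("Transmission", values = palette)
p_rev <- ggplot(mtcars2, aes(fcyl, fill = fam)) +
  geom_bar() +
  scale_fill_manual("Transmission", values = palette_rev)

# Values are matched to levels by name, so both plots color the bars identically
fills     <- layer_data(p)$fill
fills_rev <- layer_data(p_rev)$fill
```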
```{r}
# Set the position
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar(position = "dodge") +
  labs(x = "Number of Cylinders", y = "Count") +
  scale_fill_manual("Transmission", values = palette)
```
Choosing the right position argument is an important part of making a good plot.

## Setting a dummy aesthetic


We saw that all the visible aesthetics can serve as both attributes and aesthetics, but we left out x and y. That's because, although we can make univariate plots (such as histograms, which you'll get to in the next chapter), a y-axis will always be provided, even if you didn't ask for it.

We can make univariate plots in ggplot2, but we will need to add a fake y axis by mapping y to zero.

When setting y-axis limits, we can specify the limits as separate arguments or as a single numeric vector. That is, ylim(lo, hi) or ylim(c(lo, hi)).
```{r}
# Plot 0 vs. mpg
ggplot(mtcars, aes(mpg, 0)) +
  # Add jitter 
  geom_point(position = "jitter")

```
```{r}
ggplot(mtcars, aes(mpg, 0)) +
  geom_jitter() +
  # Set the y-axis limits
  ylim(-2, 2)
```
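ylim() sets the scale limits, and ggplot2 treats values outside scale limits as missing. Those points are dropped, not just hidden. If you only want to zoom, coord_cartesian(ylim = ...) keeps all the data. The sketch below uses the mtcars mpg column with the arbitrary limits 15 to 25.

```{r}
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# ylim(lo, hi) and ylim(c(lo, hi)) build the same scale limits
same_limits <- identical(ylim(15, 25)$limits, ylim(c(15, 25))$limits)

# Scale limits censor data: mpg values outside [15, 25] become NA
# (ggplot2 then removes those points, with a warning, when drawing)
p_limits <- base + ylim(15, 25)
n_dropped <- sum(is.na(layer_data(p_limits)$y))

# coord_cartesian() only zooms the view, so every point keeps its value
p_zoom <- base + coord_cartesian(ylim = c(15, 25))
n_zoom_na <- sum(is.na(layer_data(p_zoom)$y))
```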
The best way to make your plot depends on a lot of different factors and sometimes ggplot2 might not be the best choice.

## Aesthetics best practices

### Appropriate mappings
Incorrect aesthetic mapping causes confusion or misleads the audience.

Typically, the dependent variable is mapped onto the y-axis and the independent variable is mapped onto the x-axis.
![aesthetics](images/aesthetics01.png)

### Form follows function


### Function
Primary:

- Accurate and efficient representations

Secondary:

- Visually appealing, beautiful plots

### Guiding principles

Never:

- Misrepresent or obscure data
- Confuse viewers with complexity

Always:

- Consider the audience and purpose of every plot

### The best choices for aesthetics

- Efficient

  - Provides a faster overview than numeric summaries
 
- Accurate

  - Minimizes information loss

# Geometries

## Scatter Plots: Overplotting

### Large datasets

Scatter plots (using geom_point()) are intuitive, easily understood, and very common, but we must always consider overplotting, particularly in the following four situations:

1. Large datasets
2. Aligned values on a single axis
3. Low-precision data
4. Integer data

Typically, alpha blending (i.e. adding transparency) is recommended when using solid shapes. Alternatively, you can use opaque, hollow shapes.

Small points are suitable for large datasets with regions of high density (lots of overlapping).
```{r}
# Plot price vs. carat, colored by clarity
plt_price_vs_carat_by_clarity <- ggplot(diamonds, aes(carat, price, color = clarity))
# Add a point layer with tiny points
plt_price_vs_carat_by_clarity + geom_point(alpha = 0.5, shape = ".")
```
```{r}
# Set transparency to 0.5, set shape to 16
plt_price_vs_carat_by_clarity + geom_point(alpha = 0.5, shape = 16)
```
### Aligned values

Let's take a look at another case where we should be aware of overplotting: Aligning values on a single axis.

This occurs when one axis is continuous and the other is categorical, which can be overcome with some form of jittering.
```{r}
# Plot base
plt_mpg_vs_fcyl_by_fam <- ggplot(mtcars, aes(fcyl, mpg, color = fam))
```
```{r}
# Default points are shown for comparison
plt_mpg_vs_fcyl_by_fam + geom_point()
```
```{r}
# Alter the point positions by jittering, width 0.3
plt_mpg_vs_fcyl_by_fam + geom_point(position = position_jitter(width = 0.3))
```
```{r}
# Default points are shown for comparison
plt_mpg_vs_fcyl_by_fam + geom_point()
```
```{r}
# Now jitter and dodge the point positions
plt_mpg_vs_fcyl_by_fam + geom_point(position = position_jitterdodge(jitter.width = 0.3, dodge.width = 0.3))
```
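Jittering is random, so each rendering of these plots moves the points differently. In recent versions of ggplot2, position_jitter() accepts a seed argument that makes the offsets reproducible. This sketch assumes such a version. mtcars2 is a hypothetical helper copy of mtcars with the fcyl column added.

```{r}
library(ggplot2)

# Hypothetical helper copy of mtcars with the fcyl factor
mtcars2 <- transform(mtcars, fcyl = factor(cyl))

# A fixed seed makes the random offsets repeatable; height = 0 keeps mpg unchanged
posn_j <- position_jitter(width = 0.3, height = 0, seed = 123)

p <- ggplot(mtcars2, aes(fcyl, mpg)) +
  geom_point(position = posn_j)

# Building the plot twice gives exactly the same jittered x positions
x1 <- as.numeric(layer_data(p)$x)
x2 <- as.numeric(layer_data(p)$x)
```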
### Low-precision data

We already saw how to deal with overplotting when using geom_point() in two cases:

1.	Large datasets
2.	Aligned values on a single axis

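For the first case we used alpha blending with small or hollow shapes; for the second we jittered the points with position_jitter() and position_jitterdodge() inside geom_point().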

Let's take a look at another case:

3.	Low-precision data

This results from low-resolution measurements like in the iris dataset, which is measured to 1mm precision. It's similar to case 2, but in this case we can jitter on both the x and y axis.
```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Swap for jitter layer with width 0.1
  geom_jitter(alpha = 0.5,width = 0.1)
```
```{r}
#jitter within geom_point
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Set the position to jitter
  geom_point(alpha = 0.5, position = "jitter")
```
```{r}
ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  # Use a jitter position function with width 0.1
  geom_point(alpha = 0.5, position = position_jitter(width = 0.1))
```
Notice that jitter can be a geom itself (i.e. geom_jitter()), an argument in geom_point() (i.e. position = "jitter"), or a position function, (i.e. position_jitter()).
 
### Integer data

Let's take a look at the last case of dealing with overplotting:

4. Integer data

This can be integer data (e.g. 1, 2, 3, ...) or categorical data (i.e. class factor). A factor is just a special class of type integer.

We'll typically have a small, defined number of intersections between two variables, which is similar to case 3, but you may miss it if you don't realize that integer and factor data are the same as low precision data.
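You can see how a factor is stored directly in base R. typeof() reports the underlying integer type. as.integer() exposes the codes, which are the positions 1, 2, 3, ... where a discrete axis places the levels.

```{r}
# A factor is an integer vector with a "levels" attribute holding the labels
f <- factor(c("6", "4", "8", "4"))

typeof(f)       # "integer"
levels(f)       # "4" "6" "8" (sorted by default)
as.integer(f)   # 2 1 3 1: the code of each value
```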

The Vocab dataset provided contains the years of education and vocabulary test scores from respondents to US General Social Surveys from 1972-2004.
```{r}
library(carData)
# Examine the structure of Vocab
str(Vocab)

# Plot vocabulary vs. education
ggplot(Vocab, aes(education, vocabulary)) +
  # Add a point layer
  geom_point()
```
```{r}
ggplot(Vocab, aes(education, vocabulary)) +
  # Change to a jitter layer
  geom_jitter()
```
```{r}
ggplot(Vocab, aes(education, vocabulary)) +
  # Set the transparency to 0.2
  geom_jitter(alpha=0.2)
```
```{r}
ggplot(Vocab, aes(education, vocabulary)) +
  # Set the shape to 1
  geom_jitter(alpha = 0.2, shape = 1)
```
Notice how jittering and alpha blending serve as a great solution to the overplotting problem here. Setting the shape to 1 didn't really help, but it was useful in the previous exercises when you had less data. We need to consider each plot individually.

## Histograms

Histograms cut up a continuous variable into discrete bins and, by default, map the internally calculated count variable (the number of observations in each bin) onto the y aesthetic. An internal variable called density can be accessed with the .. notation, i.e. ..density.. (in ggplot2 3.3.0 and later, after_stat(density) is the preferred spelling). Plotting this variable shows the density, scaled so that the height times the width of each bar equals the relative frequency (the proportion of observations) in that bin.
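To see what the density variable means in numbers, you can inspect the variables that stat_bin() computed using layer_data(). In this minimal sketch, multiplying each bar's density by its width and summing over the bars gives 1. The counts add up to the 32 cars.

```{r}
library(ggplot2)

p <- ggplot(mtcars, aes(mpg)) + geom_histogram(binwidth = 1)

# One row per bar, with the variables computed by stat_bin()
bins <- layer_data(p)

# density = count / (total count * bin width), so density * width sums to 1
area  <- sum(bins$density * (bins$xmax - bins$xmin))
total <- sum(bins$count)
```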
```{r}
# Plot mpg
ggplot(mtcars, aes(mpg)) +
  # Add a histogram layer
  geom_histogram()
```
```{r}
ggplot(mtcars, aes(mpg)) +
  # Set the binwidth to 1
  geom_histogram(binwidth = 1)
```
```{r}
# Map y to ..density..
ggplot(mtcars, aes(mpg, ..density..)) +
  geom_histogram(binwidth = 1)
```
```{r}
datacamp_light_blue <- "#51A8C9"

ggplot(mtcars, aes(mpg, ..density..)) +
  # Set the fill color to datacamp_light_blue
  geom_histogram(binwidth = 1, fill = datacamp_light_blue)
```
Histograms are one of the most common exploratory plots for continuous data. If you want to use density on the y-axis be sure to set your binwidth to an intuitive value.

### Positions in histograms

Here, we'll examine the various ways of applying positions to histograms. geom_histogram(), a special case of geom_bar(), has a position argument that can take on the following values:

- stack (the default): Bars for different groups are stacked on top of each other.
- dodge: Bars for different groups are placed side by side.
- fill: Bars for different groups are shown as proportions.
- identity: Plot the values as they appear in the dataset.
```{r}
# Update the aesthetics so the fill color is by fam
ggplot(mtcars, aes(mpg, fill = fam)) +
  geom_histogram(binwidth = 1)
```
```{r}
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to dodge
  geom_histogram(binwidth = 1, position = "dodge")
```
```{r}
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to fill
  geom_histogram(binwidth = 1, position = "fill")
```
```{r}
ggplot(mtcars, aes(mpg, fill = fam)) +
  # Change the position to identity, with transparency 0.4
  geom_histogram(binwidth = 1, position = "identity", alpha = 0.4)
```
## Bar plots

### Position in bar and col plots

Let's see how the position argument changes geom_bar().

We have three position options:

- stack: The default
- dodge: Preferred
- fill: To show proportions

While we will be using geom_bar() here, note that geom_col() is just geom_bar() with stat = "identity" (both use position = "stack" by default). It is used when we want the heights of the bars to represent the exact values in the data.
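Here is a sketch of the difference, assuming ggplot2. The hand-typed data frame below holds the mtcars cylinder counts (11, 7 and 14 cars). geom_col() and geom_bar(stat = "identity") draw the same bar heights from it. Plain geom_bar() computes those same counts itself from the raw mtcars rows.

```{r}
library(ggplot2)

# Pre-summarized data: one row per bar, heights typed in from mtcars
cyl_counts <- data.frame(cyl = c("4", "6", "8"), n = c(11, 7, 14))

p_col <- ggplot(cyl_counts, aes(cyl, n)) + geom_col()
p_bar <- ggplot(cyl_counts, aes(cyl, n)) + geom_bar(stat = "identity")

# geom_bar() with its default stat ("count") tallies the raw rows instead
p_count <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

heights_col <- layer_data(p_col)$ymax
heights_bar <- layer_data(p_bar)$ymax
counts      <- layer_data(p_count)$count
```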
```{r}
# Plot fcyl, filled by fam
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Add a bar layer
  geom_bar()
```
```{r}
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Set the position to "fill"
  geom_bar(position = "fill")
```
```{r}
ggplot(mtcars, aes(fcyl, fill = fam)) +
  # Change the position to "dodge"
  geom_bar(position = "dodge")
```
Different kinds of plots need different position arguments, so it's important to be familiar with this attribute.

### Overlapping bar plots

We can customize bar plots further by adjusting the dodging so that our bars partially overlap each other. Instead of using position = "dodge", we're going to use position_dodge(), like we did with position_jitter() previously. The position function can also be saved as an object, such as posn_d, so that it is easy to reuse.

Remember, the reason we want to use position_dodge() (and position_jitter()) is to specify how much dodging (or jittering) we want.
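Saving the position function as posn_d could look like the sketch below. mtcars2 is a hypothetical helper copy of mtcars with the fam factor added, so the chunk runs on its own. The same posn_d object could be passed to any other layer that should dodge by the same amount.

```{r}
library(ggplot2)

# Hypothetical helper copy of mtcars with a labelled transmission factor
mtcars2 <- transform(mtcars,
  fam = factor(am, levels = c(0, 1), labels = c("automatic", "manual"))
)

# Save the dodge adjustment once...
posn_d <- position_dodge(width = 0.2)

# ...then reuse it wherever the same overlap is wanted
p_bar <- ggplot(mtcars2, aes(cyl, fill = fam)) +
  geom_bar(position = posn_d, alpha = 0.6)

# One computed bar per cyl/fam combination
bars <- layer_data(p_bar)
```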
```{r}
ggplot(mtcars, aes(cyl, fill = fam)) +
  # Change position to use the functional form, with width 0.2
  geom_bar(position = position_dodge(width = 0.2))
```
```{r}
ggplot(mtcars, aes(cyl, fill = fam)) +
  # Set the transparency to 0.6
  geom_bar(position = position_dodge(width = 0.2), alpha = 0.6)
```
By using these position functions, we can customize our plot to suit our needs.

### Bar plots: sequential color palette

We'll fill each segment according to an ordinal variable. The best way to do that is with a sequential color palette.
```{r}
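# Example of a Brewer fill scale; note that "Set1" is a qualitative palette
# (suited to unordered categories such as fam), not a sequential one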
ggplot(mtcars, aes(fcyl, fill = fam)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set1")
```
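ColorBrewer groups its palettes into sequential, diverging, and qualitative sets, and RColorBrewer records that grouping in brewer.pal.info. The sketch below confirms that "Set1" is qualitative. "Blues", the palette scale_fill_brewer() uses when called with no arguments, is sequential.

```{r}
library(RColorBrewer)

# brewer.pal.info has one row per palette; its category column is
# "seq" (sequential), "div" (diverging) or "qual" (qualitative)
categories <- as.character(brewer.pal.info[c("Set1", "Blues", "RdYlBu"), "category"])
categories  # "qual" "seq" "div"

# A sequential palette steps from light to dark, which suits ordered levels
blues <- brewer.pal(5, "Blues")
```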
```{r}
Vocab = Vocab %>% 
  mutate(vocabulary = as.factor(vocabulary))
# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  geom_bar()
```

```{r}
# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  # Add a bar layer with position "fill"
  geom_bar(position = "fill")
```
```{r}
# Plot education, filled by vocabulary
ggplot(Vocab, aes(education, fill = vocabulary)) +
  # Add a bar layer with position "fill"
  geom_bar(position = "fill") +
  # Add a brewer fill scale with default palette
  scale_fill_brewer()
```
## Line plots

We'll use the economics dataset to make some line plots. The dataset contains a time series for unemployment and population statistics from the Federal Reserve Bank of St. Louis in the United States. The data is contained in the ggplot2 package.

To begin with, we can look at how the median unemployment time and the unemployment rate (the number of unemployed people as a proportion of the population) change over time.
```{r}
# Print the head of economics
head(economics)

# Using economics, plot unemploy vs. date
ggplot(economics, aes(x = date, y = unemploy)) +
  # Make it a line plot
  geom_line()
```
```{r}
# Change the y-axis to the proportion of the population that is unemployed
ggplot(economics, aes(date, unemploy/pop)) +
  geom_line()
```
### Multiple time series
We already saw how the form of your data affects how you can plot it. Let's explore that further with multiple time series. Here, it's important that all lines are on the same scale, and if possible, on the same plot.

fish.species contains the global capture rates of seven salmon species from 1950–2010. Each variable (column) is a salmon species and each observation (row) is one year. fish.tidy contains the same data, but in three columns: Species, Year, and Capture (i.e. one variable per column).
```{r}
load("fish.RData")
```
```{r}
library(tidyr)
# Use gather to go from fish.species to fish.tidy
fish.tidy <- gather(fish.species, Species, Capture, -Year)
```
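In current versions of tidyr, gather() is superseded by pivot_longer(). Here is a sketch of the equivalent call on a hypothetical two-species, two-year table shaped like fish.species. The numbers are placeholders, not real capture data.

```{r}
library(tidyr)

# Hypothetical wide table: one column per species (placeholder values)
wide <- data.frame(Year = c(1950, 1951), Pink = c(1, 2), Chum = c(3, 4))

# Every column except Year becomes a Species/Capture pair
long <- pivot_longer(wide, cols = -Year, names_to = "Species", values_to = "Capture")
long
```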
```{r}
str(fish.species)
str(fish.tidy)
```
```{r}
# Plot the Rainbow Salmon time series
ggplot(fish.species, aes(x = Year, y = Rainbow)) +
  geom_line()
```
```{r}
# Plot the Pink Salmon time series
ggplot(fish.species, aes(x = Year, y = Pink)) +
  geom_line()
```
```{r}
# Plot multiple time-series by grouping by species
ggplot(fish.tidy, aes(Year, Capture)) +
  geom_line(aes(group = Species))
```
```{r}
# Plot multiple time-series by coloring by species
ggplot(fish.tidy, aes(x = Year, y = Capture, color = Species)) +
  geom_line(aes(group = Species))
```
As we can see in the last couple of plots, a grouping aesthetic was vital here. If you don't specify group = Species (or color = Species, which also groups the lines), you'll get a mess of lines.

# Themes

### Moving the legend

To change stylistic elements of a plot, call theme() and set plot properties to a new value. For example, the following changes the legend position.

    p + theme(legend.position = new_value)

Here, the new value can be

- "top", "bottom", "left", or "right": place it at that side of the plot.
- "none": don't draw it.
- c(x, y): c(0, 0) means the bottom-left and c(1, 1) means the top-right.
```{r}
recess <- data.frame(
  begin = c("1969-12-01","1973-11-01","1980-01-01","1981-07-01","1990-07-01","2001-03-01", "2007-12-01"), 
  end = c("1970-11-01","1975-03-01","1980-07-01","1982-11-01","1991-03-01","2001-11-01", "2009-07-30"),
  event = c("Fiscal & Monetary\ntightening", "1973 Oil crisis", "Double dip I","Double dip II", "Oil price shock", "Dot-com bubble", "Sub-prime\nmortgage crisis"),
  y =  c(.01415981, 0.02067402, 0.02951190,  0.03419201,  0.02767339, 0.02159662,0.02520715),
  stringsAsFactors = FALSE
  )

library(lubridate)
# Convert the date strings to Date objects
recess$begin <- ymd(recess$begin)
recess$end <- ymd(recess$end)
```
```{r}
plt_prop_unemployed_over_time <- ggplot(economics, aes(x = date, y = unemploy/pop)) +
  ggtitle("The percentage of unemployed Americans \n increases sharply during recessions") +
  geom_line() +
  # Shade each recession; the constant fill value creates a legend entry
  geom_rect(data = recess,
            aes(xmin = begin, xmax = end, ymin = -Inf, ymax = +Inf, fill = "Recession"),
            inherit.aes = FALSE, alpha = 0.2) +
  # Label each recession at its end date
  geom_label(data = recess, aes(x = end, y = y, label = event), size = 3) +
  scale_fill_manual(name = "", values = "red", labels = "Recessions")

plt_prop_unemployed_over_time
```

```{r}
# View the default plot
plt_prop_unemployed_over_time

# Remove legend entirely
plt_prop_unemployed_over_time +
  theme(legend.position = "none")
```
```{r}
# Position the legend at the bottom of the plot
plt_prop_unemployed_over_time +
  theme(legend.position = "bottom")
```
```{r}
# Position the legend inside the plot at (0.6, 0.1)
plt_prop_unemployed_over_time +
  theme(legend.position = c(0.6,0.1))
```
But be careful when placing a legend inside your plotting space. You could end up obscuring data.
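Numeric values for legend.position, as used above, were deprecated in ggplot2 3.5.0. They still work but trigger a warning. On recent versions, the replacement sets legend.position to "inside" and passes the coordinates through legend.position.inside. Here is a sketch on mtcars, which ships with R. It assumes ggplot2 3.5.0 or later.

```{r}
library(ggplot2)

# cyl is converted to a factor so the legend is discrete
p <- ggplot(mtcars, aes(wt, mpg, color = factor(cyl))) + geom_point()

# Older style (deprecated since ggplot2 3.5.0):
# p + theme(legend.position = c(0.9, 0.8))

# Newer style: ask for an inside legend, then give its coordinates
p_inside <- p + theme(
  legend.position = "inside",
  legend.position.inside = c(0.9, 0.8)
)
p_inside
```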

### Modifying theme elements

Many plot elements have multiple properties that can be set. For example, line elements such as axes and gridlines have a color, a thickness (linewidth, called size before ggplot2 3.4.0), and a line type (solid, dashed, or dotted). To set the style of a line, use element_line(). For example, to turn the axis lines into red, dashed lines, you would use the following.

    p + theme(axis.line = element_line(color = "red", linetype = "dashed"))
    
Similarly, element_rect() changes rectangles and element_text() changes text. You can remove a plot element using element_blank().
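The sketch below applies each of these element functions to a single mtcars plot. It assumes ggplot2 3.4.0 or later, where line thickness is set with linewidth (older versions use size).

```{r}
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

p_styled <- p + theme(
  # Lines: red, dashed axis lines
  axis.line = element_line(color = "red", linetype = "dashed", linewidth = 1),
  # Rectangles: light grey panel background without a border
  panel.background = element_rect(fill = "grey95", color = NA),
  # Text: bold axis titles
  axis.title = element_text(face = "bold"),
  # Remove the minor gridlines entirely
  panel.grid.minor = element_blank()
)
p_styled
```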
```{r}
plt_prop_unemployed_over_time +
  theme(
    # For all rectangles, set the fill color to grey92
    rect = element_rect(fill = "grey92"),
    # For the legend key, turn off the outline
    legend.key = element_rect(color = NA)
  )
```
```{r}
plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    # Turn off axis ticks
    axis.ticks = element_blank(),
    # Turn off the panel grid
    panel.grid = element_blank()
  )
```
```{r}
plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    # Add major y-axis panel grid lines back
    panel.grid.major.y = element_line(
      # Set the color to white
      color = "white",
      # Set the size to 0.5
      size = 0.5,
      # Set the line type to dotted
      linetype = "dotted"
    )
  )
```
```{r}
plt_prop_unemployed_over_time +
  theme(
    rect = element_rect(fill = "grey92"),
    legend.key = element_rect(color = NA),
    axis.ticks = element_blank(),
    panel.grid = element_blank(),
    panel.grid.major.y = element_line(
      color = "white",
      size = 0.5,
      linetype = "dotted"
    ),
    # Set the axis text color to grey25
    axis.text = element_text(color ="grey25"),
    # Set the plot title font face to italic and font size to 16
   plot.title = element_text(size = 16, face = "italic")
  )
```
This explanatory plot is now ready for prime time: it's both attractive and informative. Make sure all your text is legible in the context where the plot will be viewed.

### Modifying whitespace

Whitespace means all the non-visible margins and spacing in the plot.

To set a single whitespace value, use unit(x, unit), where x is the amount and unit is the unit of measure.

Margins require four values, so use margin(top, right, bottom, left, unit). To remember the order, think TRouBLe.

The default unit is "pt" (points), which scales well with text. Other options include "cm", "in" (inches) and "lines" (of text).
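Here is a sketch combining both helpers on an mtcars plot. It uses margin() with named arguments, which makes the TRouBLe order explicit, and unit() for a single length.

```{r}
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

p_margins <- p + theme(
  # margin() takes top, right, bottom, left; naming them avoids mistakes
  plot.margin = margin(t = 5, r = 20, b = 5, l = 20, unit = "pt"),
  # A single length uses unit(): tick marks 0.25 cm long
  axis.ticks.length = unit(0.25, "cm")
)
p_margins
```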
```{r}
# View the original plot
plt_mpg_vs_wt_by_cyl <- ggplot(mtcars, aes(wt, mpg, color = fcyl)) +
  ylab("Miels per gallon") + 
  xlab("weight (1000/lbs)") +
  geom_point()
plt_mpg_vs_wt_by_cyl
```
```{r}
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the axis tick length to 2 lines
    axis.ticks.length = unit(2, "lines")
  )
```
```{r}
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the legend key size to 3 centimeters
    legend.key.size = unit(3, "cm")
  )
```
```{r}
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the legend margin to (20, 30, 40, 50) points
    legend.margin = margin(20, 30, 40, 50, "pt")
  )
```
```{r}
plt_mpg_vs_wt_by_cyl +
  theme(
    # Set the plot margin to (10, 30, 50, 70) millimeters
    plot.margin = margin(10, 30, 50, 70, "mm")
  )
```
Changing the whitespace is useful if you need to make your plot more compact, or if you want to create more space to reduce visual “busyness”.

## Built-in Themes

In addition to making your own themes, there are several [out-of-the-box solutions](https://ggplot2.tidyverse.org/reference/ggtheme.html) that may save you lots of time.

- theme_gray() is the default.
- theme_bw() is useful when you use transparency.
- theme_classic() is more traditional.
- theme_void() removes everything but the data.
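Each built-in theme function also accepts a base_size argument, in points, that scales all of its text at once. This is a quick way to make a theme legible at a given output size. Here is a sketch on mtcars:

```{r}
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# Same plot, two built-in themes; base_size sets the base font size in points
p_bw <- p + theme_bw(base_size = 14)
p_classic <- p + theme_classic(base_size = 10)
p_bw
p_classic
```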
```{r}
# Add a black and white theme
plt_prop_unemployed_over_time +
  theme_bw()
```
```{r}
# Add a classic theme
plt_prop_unemployed_over_time +
  theme_classic()
```
```{r}
# Add a void theme
plt_prop_unemployed_over_time +
  theme_void()
```
The black and white theme works really well if you use transparency in your plot.

## Exploring the ggthemes package

Outside of ggplot2, another source of built-in themes is the ggthemes package. 
```{r}
library(ggthemes)
```
```{r}
# Use the fivethirtyeight theme
plt_prop_unemployed_over_time +
  theme_fivethirtyeight()
```
```{r}
# Use Tufte's theme
plt_prop_unemployed_over_time +
  theme_tufte()
```
```{r}
# Use the Wall Street Journal theme
plt_prop_unemployed_over_time +
  theme_wsj()
```
ggthemes has over 20 themes for you to try.
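To see what is available in your installed version, you can list the package's exported objects whose names start with `theme_`. This assumes ggthemes is attached with library(), so that `package:ggthemes` is on the search path.

```{r}
library(ggthemes)

# Every exported ggthemes object whose name starts with "theme_"
ggthemes_themes <- grep("^theme_", ls("package:ggthemes"), value = TRUE)
ggthemes_themes
length(ggthemes_themes)
```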

## Setting themes

Reusing a theme across many plots helps to provide a consistent style. You have several options for this.

1. Assign the theme to a variable, and add it to each plot.
2. Set your theme as the default using theme_set().

A good strategy that we'll use here is to begin with a built-in theme then modify it.
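For option 2, note that theme_set() returns the theme that was active before the call. You can keep that return value and restore it afterwards, so a temporary default does not leak into later plots. Here is a minimal sketch using theme_bw() and mtcars:

```{r}
library(ggplot2)

# theme_set() returns the previously active theme, so keep it
old_theme <- theme_set(theme_bw())

# Plots drawn now use theme_bw() without adding it explicitly
p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p

# Restore whatever theme was active before
theme_set(old_theme)
```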
```{r}
# Save the theme as theme_recession
theme_recession <- theme(
  rect = element_rect(fill = "grey92"),
  legend.key = element_rect(color = NA),
  axis.ticks = element_blank(),
  panel.grid = element_blank(),
  panel.grid.major.y = element_line(color = "white", size = 0.5, linetype = "dotted"),
  axis.text = element_text(color = "grey25"),
  plot.title = element_text(face = "italic", size = 16),
  legend.position = c(0.6, 0.1)
)

# Combine the Tufte theme with theme_recession
theme_tufte_recession <- theme_tufte() + theme_recession

# Add the Tufte recession theme to the plot
plt_prop_unemployed_over_time + theme_tufte_recession
```
```{r}
# Set theme_tufte_recession as the default theme
theme_set(theme_tufte_recession) 

# Draw the plot (without explicitly adding a theme)
plt_prop_unemployed_over_time
```
## Publication-quality plots

```{r}
plt_prop_unemployed_over_time +
  # Add Tufte's theme
  theme_tufte()
```
```{r}
plt_prop_unemployed_over_time +
  theme_tufte() +
  # Add individual theme elements
  theme(
    # Turn off the legend
    legend.position = "none",
    # Turn off the axis ticks
    axis.ticks = element_blank()
  )
```
```{r}
plt_prop_unemployed_over_time +
  theme_tufte() +
  theme(
    legend.position = "none",
    axis.ticks = element_blank(),
    # Set the axis title's text color to grey60
    axis.title = element_text(color = "grey60"),
    # Set the axis text's text color to grey60
    axis.text = element_text(color = "grey60")
  )
```
```{r}
plt_prop_unemployed_over_time +
  theme_tufte() +
  theme(
    legend.position = "none",
    axis.ticks = element_blank(),
    axis.title = element_text(color = "grey60"),
    axis.text = element_text(color = "grey60"),
    # Set the panel gridlines major y values
    panel.grid.major.y = element_line(
      # Set the color to grey60
      color = "grey60",
      # Set the size to 0.25
      size = 0.25,
      # Set the linetype to dotted
      linetype = "dotted"
    )
  )
```
## Using geoms for explanatory plots

Let's focus on producing beautiful and effective explanatory plots. Over the next few steps, we'll create a lollipop-style plot using gm2007, a filtered subset of the gapminder dataset that keeps only the countries with the highest and lowest life expectancies in 2007.

This type of plot will be in an info-viz style, meaning that it would be similar to something you'd see in a magazine or website for a mostly lay audience.


```{r}
library(gapminder)
gapminder
```


```{r}
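# Countries with the most extreme life expectancies in 2007
gm2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent) %>%
  # Keep only the highest (> 80.6) and lowest (< 46) life expectancies
  filter(lifeExp > 80.6 | lifeExp < 46) %>%
  arrange(lifeExp)
gm2007

# All countries in 2007, used later to compute the global mean
gm2007_full <- gapminder %>%
  filter(year == 2007) %>%
  select(country, lifeExp, continent)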
```
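arrange() sorts the rows of gm2007, but ggplot2 orders a discrete axis by the factor levels of country, and arrange() does not change those levels. If you want the countries sorted by life expectancy on the axis, one option is reorder(), shown here on a hypothetical three-row data frame (made-up values):

```{r}
library(ggplot2)

# Hypothetical toy data (values are made up for illustration)
toy <- data.frame(
  country = factor(c("A", "B", "C")),
  lifeExp = c(80, 40, 60)
)

# reorder() changes the factor levels (ascending lifeExp),
# and the levels set the order of a discrete axis
toy$country <- reorder(toy$country, toy$lifeExp)
levels(toy$country)

p <- ggplot(toy, aes(x = lifeExp, y = country)) +
  geom_segment(aes(xend = 30, yend = country)) +
  geom_point(size = 4)
p
```

Reordering gm2007 this way would also move the countries relative to the hard-coded y positions used for the annotations later on, so those values would need revisiting.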
```{r}
# Add a geom_segment() layer
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  theme(legend.position="right")
```
```{r}
# Add a geom_text() layer
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = lifeExp), color = "white", size = 1.5) +
  theme(legend.position="right")
```
```{r}
library(RColorBrewer)
# Set the color scale: keep only the two end colors (red and blue) of RdYlBu
palette <- brewer.pal(5, "RdYlBu")[-(2:4)]

# Modify the scales
ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = round(lifeExp,1)), color = "white", size = 1.5) +
  scale_x_continuous("", expand = c(0, 0), limits = c(30,90), position = "top") +
  scale_color_gradientn(colors = palette) +
  theme(legend.position="right")
```
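The palette line above does two things at once. Unpacked step by step:

```{r}
library(RColorBrewer)

# Five colors from the diverging red-yellow-blue palette
full_palette <- brewer.pal(5, "RdYlBu")

# Dropping positions 2 to 4 keeps only the first and last colors,
# so scale_color_gradientn() interpolates between just these two
end_colors <- full_palette[-(2:4)]
end_colors
```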
```{r}
# Set the color scale: keep only the two end colors (red and blue) of RdYlBu
palette <- brewer.pal(5, "RdYlBu")[-(2:4)]

# Add a title and caption
plt_country_vs_lifeExp <- ggplot(gm2007, aes(x = lifeExp, y = country, color = lifeExp)) +
  geom_point(size = 4) +
  geom_segment(aes(xend = 30, yend = country), size = 2) +
  geom_text(aes(label = round(lifeExp,1)), color = "white", size = 1.5) +
  scale_x_continuous("", expand = c(0,0), limits = c(30,90), position = "top") +
  scale_color_gradientn(colors = palette) +
  labs(title = "Highest and lowest life expectancies, 2007", caption = "Source: gapminder") +
  theme(legend.position="right")
plt_country_vs_lifeExp
```
## Using annotate() for embellishments

We completed our basic plot. Now let's polish it by playing with the theme and adding annotations. In this exercise, we'll use annotate() to add text and a curve to the plot.
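Here is a minimal sketch of the same annotate() pattern on mtcars. It draws a reference line at the mean, a text label, and a curved arrow pointing from the label to the line. The coordinates are illustrative choices for this dataset, not values from the original exercise.

```{r}
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
avg_mpg <- mean(mtcars$mpg)

p_annotated <- p +
  # Dotted reference line at the mean mpg
  geom_hline(yintercept = avg_mpg, linetype = "dotted", color = "grey40") +
  # Text label placed above the line
  annotate("text", x = 5, y = avg_mpg + 6, label = "Mean mpg", color = "grey40") +
  # Curved arrow from just below the label down to the line
  annotate(
    "curve",
    x = 5, y = avg_mpg + 5,
    xend = 4.5, yend = avg_mpg,
    arrow = arrow(length = unit(0.2, "cm"), type = "closed"),
    color = "grey40"
  )
p_annotated
```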

The following values are used to position the embellishments on the plot:
```{r}
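# Global mean life expectancy across all countries in 2007
global_mean <- mean(gm2007_full$lifeExp)
# The label and curve start 4 years to the right of the mean;
# y values are positions on the discrete country axis
x_start <- global_mean + 4
y_start <- 5.5
# The curve ends on the global mean line
x_end <- global_mean
y_end <- 7.5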
```

```{r}
# Define the theme
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  theme_classic() +
  theme(axis.line.y = element_blank(),
        axis.ticks.y = element_blank(),
        axis.text = element_text(color = "black"),
        axis.title = element_blank(),
        legend.position = "none")
plt_country_vs_lifeExp
```
```{r}
# Add a vertical line
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  geom_vline(xintercept = global_mean, color = "grey40", linetype = 3)
plt_country_vs_lifeExp
```
```{r}
# Add a text label for the global average
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  annotate(
    "text",
    x = x_start, y = y_start,
    label = "The\nglobal\naverage",
    vjust = 1, size = 3, color = "grey40"
  )
plt_country_vs_lifeExp
```
```{r}
# Add a curved arrow from the label to the global average line
plt_country_vs_lifeExp <- plt_country_vs_lifeExp +
  annotate(
    "curve",
    x = x_start, y = y_start,
    xend = x_end, yend = y_end,
    arrow = arrow(length = unit(0.2, "cm"), type = "closed"),
    color = "grey40"
  )
plt_country_vs_lifeExp
```
 Your explanatory plot clearly shows the countries with the highest and lowest life expectancy and would be great for a lay audience.

Source: https://rstudio-pubs-static.s3.amazonaws.com/614151_17aa2ca6247b410ab70a15169f3e828d.html