Hello! This week I am learning about multivariate graphs and maps.
Multivariate graphs
Similar to bivariate graphs, multivariate graphs display the relationships among three or more variables. The two most common ways to show these relationships are grouping and faceting.
Grouping
Grouping lets you plot data for several groups on a single graph. The values of the first two variables are mapped to the x and y axes. The other variables are mapped to color, shape, size, and other visual characteristics.
This chapter in the book uses the Salaries dataset, which shows the relationship between years since receiving a PhD and salary. There are several options you can change. For example, you can use different shapes to represent the sex of the professors and different colors to show their job title. You can even change the size of the elements. In the graph below, the element size represents the number of years they have been working:
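Here is a minimal sketch of that kind of grouped scatterplot. It assumes the Salaries data comes from the carData package (the column names yrs.since.phd, yrs.service, rank, sex, and salary are from that dataset); the transparency and axis labels are my own choices, not necessarily the book's:

```r
library(ggplot2)

# Salaries ships with the carData package
data(Salaries, package = "carData")

# x and y carry the first two variables; color, shape, and size carry the rest
grouped_plot <- ggplot(Salaries,
                       aes(x = yrs.since.phd,
                           y = salary,
                           color = rank,         # job title
                           shape = sex,
                           size = yrs.service)) +
  geom_point(alpha = 0.6) +
  labs(x = "Years since PhD", y = "Salary",
       color = "Rank", shape = "Sex", size = "Years of service")

print(grouped_plot)
```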
You can also add best fit lines to graphs to better show the relationship between variables:
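As a rough illustration, a best fit line can be layered on with geom_smooth. This sketch assumes the same Salaries data from carData and uses a straight-line fit per group, which is my choice for simplicity; the book's figure may use a different curve:

```r
library(ggplot2)

data(Salaries, package = "carData")

# one straight line per sex, drawn over the points
fit_plot <- ggplot(Salaries,
                   aes(x = yrs.since.phd, y = salary, color = sex)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Years since PhD", y = "Salary", color = "Sex")

print(fit_plot)
```

Setting se = FALSE hides the confidence band so the lines are easier to compare.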
Faceting
Another method is faceting. Faceting creates several small graphs, one for each level of a third variable or combination of variables. Using the facet_wrap function, you can create a separate graph for each level of job rank in the Salaries dataset. The ncol option controls the number of columns. You can also lay out the panels with the facet_grid function, which assigns one variable to the rows and another variable to the columns, like the graph below:
You can also combine faceting and grouping in one graph and customize several aesthetics such as color and element shape. This is done pretty much the same way as in the previous chapters, except you add facet_wrap or facet_grid to control how the panels are laid out in rows and columns. This chapter was pretty brief, so that's it for multivariate graphs! Chapter 6 is about maps.
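To make the two faceting functions concrete, here is a small sketch using salary histograms. It assumes the Salaries data from carData; the histogram and the bin count are my own choices for illustration:

```r
library(ggplot2)

data(Salaries, package = "carData")

# facet_wrap: one panel per job rank, stacked in a single column
wrap_plot <- ggplot(Salaries, aes(x = salary)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ rank, ncol = 1)

# facet_grid: sex defines the rows, rank defines the columns
grid_plot <- ggplot(Salaries, aes(x = salary)) +
  geom_histogram(bins = 20) +
  facet_grid(sex ~ rank)

print(wrap_plot)
print(grid_plot)
```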
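For the combined case mentioned above, a sketch might look like this. It assumes the Salaries data from carData, and the particular pairing (sex for color and shape, rank for the panels) is just one possible choice:

```r
library(ggplot2)

data(Salaries, package = "carData")

# grouping (color and shape by sex) inside facets (one panel per rank)
combined_plot <- ggplot(Salaries,
                        aes(x = yrs.since.phd, y = salary,
                            color = sex, shape = sex)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ rank, ncol = 3) +
  labs(x = "Years since PhD", y = "Salary")

print(combined_plot)
```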
Maps
In this chapter I finally get to learn something new! The last couple of chapters got a bit redundant for me. This chapter is super cool because it uses actual geographical maps to display data. Fun! This section uses the ggmap and choroplethr packages.
Dot Density Map
Dot density maps use points on a map to show spatial relationships. For these examples, the book uses a dataset on crime in Houston. It contains the date, time, and address of 6 types of crimes from January 2010 to August 2010. The geocode function uses the Google Maps API to take an address and return its latitude and longitude coordinates.
Unfortunately, the code for this section doesn't seem to want to work. RStudio says Google Maps now requires an API key.
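If you do get a key, ggmap has a register_google function for handing it over. This is only a sketch: GOOGLE_MAPS_API_KEY is a hypothetical environment variable name I made up, so store your key wherever suits you:

```r
library(ggmap)

# GOOGLE_MAPS_API_KEY is a hypothetical variable name, not something ggmap defines
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (nzchar(api_key)) {
  # make the key available to geocode() and Google-based get_map() calls
  register_google(key = api_key)
} else {
  message("No key found, so geocode() and Google map requests will fail.")
}

# TRUE once a key has been registered for this session
key_ready <- has_google_key()
print(key_ready)
```

Reading the key from an environment variable keeps it out of the script itself, which matters if you share your code.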
For this section, the crime data for rapes is shown. The first thing you want to do is import the data: the date of the offense, the crime (rape), the address, and the longitude and latitude coordinates. After this, you set up the graph.
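The import step can be sketched like this, assuming the Houston data is the crime dataset bundled with ggmap. In that dataset the longitude column is named lon, so that is what this sketch uses; adjust the name if your import calls it long:

```r
library(ggmap)

# crime is the Houston dataset bundled with ggmap
data(crime, package = "ggmap")

# keep only rape offenses and the columns needed for the map
rapes <- subset(crime, offense == "rape",
                select = c(date, offense, address, lon, lat))

# drop rows that have no coordinates to plot
rapes <- rapes[!is.na(rapes$lon) & !is.na(rapes$lat), ]

head(rapes)
```

This part does not touch the Google Maps API, so it runs without a key.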
- The first step is to find the center coordinates for Houston, Texas using: houston_center <- geocode("Houston, TX") which returns the latitude and longitude coordinates.
- Second, you get the background map image. You do this by specifying a zoom factor from 3 to 21. Three is for a continent, and 21 zooms all the way down to a specific building. The default setting is 10, which is for a city. You also pick a map type, which includes terrain, terrain-background, and satellite, just to name a few. To get a road map centered on Houston, TX, you would type:
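library(ggmap)
# download a road map centered on the Houston coordinates
houston_map <- get_map(houston_center,
                       zoom = 13,
                       maptype = "roadmap")
# display the background map on its own
ggmap(houston_map)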
- The third step is to add your graph elements to the map. This is done pretty much the same way as for other dot plot graphs, except you pass the plot to the ggmap function as a base layer: ggmap(houston_map, base_layer = ggplot(data = rapes, aes(x = long, y = lat))) + and then use the geom_point function for the points.
- The fourth and final step is cleaning up the graph and adding labels. The result is the following graph:
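Putting the four steps together, a sketch could look like the function below. plot_crime_map is a hypothetical helper name of mine, it assumes a Google key is already registered, and it assumes the data has columns named lon and lat (as ggmap's crime data does). The point color, size, and labels are my own choices:

```r
library(ggmap)
library(ggplot2)

# plot_crime_map is a hypothetical helper, not part of ggmap
plot_crime_map <- function(points, place = "Houston, TX", zoom = 13) {
  # the Google calls below fail without a registered key, so stop early
  stopifnot(has_google_key())

  # step 1: find the center coordinates
  center <- geocode(place)

  # step 2: get the background map image
  base_map <- get_map(location = c(lon = center$lon, lat = center$lat),
                      zoom = zoom, maptype = "roadmap")

  # step 3: add the points; step 4: add labels
  ggmap(base_map,
        base_layer = ggplot(data = points, aes(x = lon, y = lat))) +
    geom_point(color = "red", size = 3, alpha = 0.5) +
    labs(title = paste("Reported offenses in", place),
         x = "Longitude", y = "Latitude")
}
```

With a key registered, calling plot_crime_map(rapes) on the filtered crime data would draw the dot density map.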
Choropleth
Choropleth maps use color and/or shading to indicate numeric values of a variable in that area. A quick example that we've probably all seen this last year is a map of the US showing COVID-19 numbers in each state. This chapter shows 3 different map examples, but they're honestly all the exact same thing on different scales (world, country, state/province).
The first example shows a world map with life expectancy in different regions for the 2007 gapminder data.
Much like the dot density map, the first thing you do is import and prepare the data. After that, you prepare your map, add your data, clean it up, and add labels.
The choroplethr package has several functions that make creating a choropleth map very easy. You can use the country_choropleth function to map values by country. This function requires that there be a column named region and a column named value. The entries in the region column must match the entries in the region column of the dataset country.map from the choroplethrMaps package.
After this, you have to edit the gapminder dataset so that R is able to read it. Remember, your entries must match the map data exactly! Here is a screenshot of the code:
After this, you make the map. Just like choroplethr has a country function, it also has functions for mapping a specific country (like the US by state) or even the counties within a state. These work exactly the same way as the country/world map. It's also worth mentioning that, just like with the dot density map, you can use the Google Maps API (if you have a key) to get more exact locations and longitude and latitude coordinates.
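A sketch of that preparation step, using dplyr. The three renamed countries are only examples of mismatches, not a complete list, so the last lines print whatever names still fail to match country.map:

```r
library(dplyr)

data(gapminder, package = "gapminder")
data(country.map, package = "choroplethrMaps")

plotdata <- gapminder %>%
  filter(year == 2007) %>%
  # choroplethr expects columns named region and value
  rename(region = country, value = lifeExp) %>%
  # region names in country.map are lowercase
  mutate(region = tolower(region)) %>%
  # example fixes only; other countries may need renaming too
  mutate(region = recode(region,
                         "united states" = "united states of america",
                         "tanzania" = "united republic of tanzania",
                         "serbia" = "republic of serbia"))

# list the names that still do not match the map data
unmatched <- setdiff(plotdata$region, unique(country.map$region))
print(unmatched)
```

Checking the leftovers with setdiff is quicker than spotting blank countries on the finished map.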
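To show how similar the three scales are, here is a sketch that uses the small population tables bundled with choroplethr (df_pop_country, df_pop_state, df_pop_county) instead of the gapminder data, so it runs on its own. A prepared gapminder data frame with region and value columns would be passed to country_choropleth the same way. The titles and the Texas zoom are my own choices, and details may differ between choroplethr versions:

```r
library(choroplethr)
library(choroplethrMaps)

# example tables bundled with choroplethr; each has region and value columns
data(df_pop_country)
data(df_pop_state)
data(df_pop_county)

# world scale: one value per country
world_map <- country_choropleth(df_pop_country,
                                title = "Population by country",
                                num_colors = 7)

# country scale: one value per US state
state_map <- state_choropleth(df_pop_state,
                              title = "Population by US state")

# state scale: one value per county, zoomed in on Texas
texas_map <- county_choropleth(df_pop_county,
                               title = "Population by Texas county",
                               state_zoom = "texas")

print(world_map)
print(state_map)
print(texas_map)
```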