Hello!

This week I read Chapter 9 in "Data Visualization with R" by Rob Kabacoff. This chapter is about several different types of graphs that don't really fit in the other chapters.

3-D Scatterplot

This graph is pretty self-explanatory. It can't be created using the ggplot2 package (ggplot2 can't make 3-D graphs), so we use the conveniently named scatterplot3d function in the scatterplot3d package.

The book doesn't really explain what kind of information this graph is best for showing, but it uses the data in the mtcars data frame and shows mileage vs. engine displacement vs. car weight. Customizing the labels for this type of graph is actually very simple. You type the labels you want for the x, y, and z axes directly, and customize the graph much like you would other graphs. You can add vertical lines going up to each point, customize the colors of the lines and points, change the weight, etc.
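
Here is a rough sketch of what that can look like. This is my own minimal version, not the book's exact code: the axis labels, title, and color are choices I made for illustration, and it assumes the scatterplot3d package is already installed.

```r
# Sketch only: assumes scatterplot3d is installed
# install.packages("scatterplot3d")
library(scatterplot3d)

# mtcars ships with R: disp = displacement, wt = weight, mpg = mileage
plot3d <- with(mtcars, scatterplot3d(
  x = disp, y = wt, z = mpg,
  xlab = "Displacement (cu. in.)",   # axis labels are typed in directly
  ylab = "Weight (1000 lbs)",
  zlab = "Miles per gallon",
  main = "Mileage vs. displacement vs. weight",
  pch = 16,                          # solid points
  color = "steelblue",               # color of points and lines
  type = "h"                         # vertical line up to each point
))
```

The `type = "h"` option is what draws the vertical lines, which makes it much easier to see where each point sits on the floor of the plot.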


Biplots

A biplot shows three kinds of relationships at once: between observations, between variables, and between observations and variables.

To make this type of graph, you use the fviz_pca function from the factoextra package. This function creates a ggplot2 graph.

Looking at the picture below, you see the axis labels say Dim1 and Dim2. These are the first two principal components, which are linear combinations of the original p variables.

The points represent observations. The smaller the distance between two points, the more similar those observations are on the original set of variables.

The vectors represent variables. The angle between two vectors shows the correlation between those variables. A smaller angle means a stronger correlation.

The points farthest along a vector have the highest values for that variable.
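
A minimal sketch of how a biplot like this could be built is below. Running the PCA on mtcars with prcomp is my assumption for the example (the text above only names fviz_pca), and it assumes the factoextra package is installed.

```r
# Sketch only: assumes factoextra (and its ggplot2 dependency) is installed
# install.packages("factoextra")
library(factoextra)

# Principal components analysis on the mtcars data (my choice of dataset).
# scale. = TRUE standardizes the variables so none dominates because of its units.
fit <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Biplot: points are cars (observations), arrows are variables
p <- fviz_pca(fit,
              repel = TRUE,    # nudge labels so they don't overlap
              labelsize = 3)   # smaller text for the labels
p
```

Because the result is a ggplot2 object, you can keep adding ggplot2 layers to it, such as a title or a theme.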


Bubble Chart

This type of graph is just a scatterplot, but with larger, generally transparent points whose size represents a third variable. You can customize the transparency, color, and border color of the bubbles.

This example shows auto mileage by weight and horsepower.
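
A short sketch of a bubble chart along those lines is below. The colors, size range, and labels are my own picks rather than the book's settings.

```r
# Sketch only: assumes ggplot2 is installed
library(ggplot2)

# Weight on x, mileage on y, horsepower mapped to bubble size
p <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point(alpha = 0.5,               # transparency of the bubbles
             fill = "cornflowerblue",   # bubble color
             color = "black",           # border color
             shape = 21) +              # shape 21 has both fill and border
  scale_size_continuous(range = c(1, 14)) +
  labs(title = "Auto mileage by weight and horsepower",
       x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       size = "Gross horsepower")
p
```

Shape 21 matters here: it is one of the point shapes that accepts a separate fill and border color.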


Flow Diagram

Flow diagrams show a dynamic relationship, such as the flow of people, materials, or objects through a set of nodes. An example of a flow diagram is the Sankey diagram. In this type of flow diagram, the width of the line between two nodes is proportional to the flow amount. This example uses the UK energy forecast data, which shows energy production and consumption for 2050.

The nodes and the links between them (along with the flow along each link) are created separately. They are then combined using the sankeyNetwork function in the networkD3 package. Note that this is NOT a ggplot2 graph.
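
To show the nodes-and-links idea, here is a tiny sketch with made-up data. The node names and flow values are hypothetical numbers I invented for illustration; they are not the UK energy data from the book. It assumes the networkD3 package is installed.

```r
# Sketch only: assumes networkD3 is installed
# install.packages("networkD3")
library(networkD3)

# Nodes: one row per node (hypothetical names)
nodes <- data.frame(
  name = c("Coal", "Gas", "Electricity", "Homes", "Industry")
)

# Links: source and target are ZERO-based row positions in `nodes`;
# value is the amount flowing along that link (hypothetical numbers)
links <- data.frame(
  source = c(0, 1, 2, 2),
  target = c(2, 2, 3, 4),
  value  = c(30, 50, 45, 35)
)

sankey <- sankeyNetwork(
  Links = links, Nodes = nodes,
  Source = "source", Target = "target",
  Value = "value", NodeID = "name",
  units = "TWh", fontSize = 12, nodeWidth = 30
)
sankey
```

The result is an interactive HTML widget, which shows up in the RStudio Viewer pane rather than the Plots pane.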

This example doesn't seem to want to work in R for me. I get a few error messages. One is about a different version of R, so maybe that's the issue. Here's the example from the book!


Another type of flow diagram is an alluvial diagram. These diagrams can be used in place of a mosaic plot, and show the relationship among categorical variables. Blocks represent clusters of observations, and stream fields represent changes to the clusters over time. They don't just have to be used for time, though. The book uses Titanic survival as an example for this graph type.
To make an alluvial diagram, you use the ggalluvial package, which makes a ggplot2 graph. I figured this would also not work if the Sankey diagram didn't work, and I was right, so the book picture will have to suffice.
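
For reference, a minimal alluvial sketch is below. I'm using the Titanic table that ships with base R as a stand-in, which may not be the exact file the book loads, and it assumes ggalluvial is installed.

```r
# Sketch only: assumes ggplot2 and ggalluvial are installed
# install.packages("ggalluvial")
library(ggplot2)
library(ggalluvial)

# Built-in Titanic table, reshaped to one row per combination with a count
titanic <- as.data.frame(Titanic)

p <- ggplot(titanic,
            aes(axis1 = Class, axis2 = Sex, axis3 = Survived, y = Freq)) +
  geom_alluvium(aes(fill = Survived)) +             # the flowing streams
  geom_stratum() +                                  # the blocks on each axis
  geom_text(stat = "stratum",
            aes(label = after_stat(stratum))) +     # label each block
  scale_x_discrete(limits = c("Class", "Sex", "Survived"),
                   expand = c(0.1, 0.1)) +
  labs(title = "Titanic survival by class and sex",
       y = "Number of passengers")
p
```

Each `axis` in the aesthetics becomes one column of blocks, so this example has no time dimension at all.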

Heatmaps

A heatmap uses a set of colors to display values of variables. These graphs are great if you have several variables. R has a heatmap function, and also a superheat package. You can use clustering to sort the rows and/or columns, and change the text and label sizes.

The data for the superheat function must be formatted like so:

  • all variables are numeric
  • row names are used to label the left axis
  • missing values are allowed
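
Here is a small sketch using the heatmap function that comes with R, so it needs no extra packages. Using mtcars is my choice for the example, and the superheat line in the comment is an untested assumption about a roughly equivalent call.

```r
# Base R heatmap: the data must be a numeric matrix with row names
cars <- as.matrix(mtcars)

hm <- heatmap(cars,
              scale = "column",        # standardize each variable
              col = heat.colors(50),   # the set of colors for the values
              margins = c(6, 8),       # room for the labels
              cexRow = 0.7,            # row label size
              cexCol = 0.9)            # column label size

# Rows and columns are reordered by clustering; the new order is returned
hm$rowInd

# Assumed superheat equivalent (requires the superheat package):
# superheat::superheat(mtcars, scale = TRUE, row.dendrogram = TRUE)
```

Scaling by column matters here because the variables in mtcars are on very different scales, and without it one variable would wash out the colors for the rest.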

Radar Charts

Radar charts are also called spider or star charts. This example definitely looks more spider-like. The charts show one or more groups or observations on three or more quantitative variables. This graph requires the data be in the following format:
  • the first variable is called group, and contains the identifier for each observation.
  • the numeric variables must be rescaled to range from 0 to 1.
This example is a little more fun. It compares dogs, pigs, and cows on body size, brain size, and sleep characteristics, using data from the Mammal Sleep dataset.

Radar charts can be created with the ggradar function in the ggradar package. It isn't readily available, so I copied and pasted the code in the book to install it. It still didn't work in RStudio.
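
For what it's worth, this is roughly what the data preparation and the call could look like. It is a sketch in my own words rather than the book's exact code, and it assumes ggradar is installed from GitHub (the repository name in the comment is my assumption) along with dplyr and scales.

```r
# Sketch only: assumes ggplot2, dplyr, scales, and ggradar are installed
# Assumed install route: remotes::install_github("ricardo-bion/ggradar")
library(ggplot2)   # provides the msleep (Mammal Sleep) data
library(dplyr)
library(scales)
library(ggradar)

plot_data <- msleep %>%
  filter(name %in% c("Cow", "Dog", "Pig")) %>%
  select(name, sleep_total, sleep_rem, sleep_cycle, brainwt, bodywt) %>%
  rename(group = name) %>%              # first variable must be the group
  mutate(across(-group, rescale))       # rescale every numeric variable to 0-1

p <- ggradar(plot_data)
p
```

Note that the rescaling is relative to just these three animals, so a value of 1 only means "largest of the three".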

Scatterplot Matrix

This graph type is basically several scatterplots organized as a grid. Although it is similar to a correlation plot, it shows the underlying data rather than only the correlations.

This type of graph is made using the ggpairs function in the GGally package. This example also shows the relationship between animal size and sleep characteristics.

I swear none of these graphs want to work on my version of R. It shows the default version of the graph, but doesn't do anything when I paste in the customizations for the graph.

This graph type is pretty neat, because it shows a few different types of graphs at once. Each of these graphs can be customized the same way as its stand-alone version. You can change colors, add a line of best fit to the scatterplots, and shade the area under the curve of the kernel density plot. The numbers are correlation coefficients.
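
A small sketch of a customized scatterplot matrix is below. The variable selection and the log transform are my own guesses at a sensible setup, not the book's exact code, and it assumes GGally and dplyr are installed.

```r
# Sketch only: assumes ggplot2, dplyr, and GGally are installed
library(ggplot2)   # provides the msleep data
library(dplyr)
library(GGally)

# Brain and body weight are very skewed, so take logs (my assumption)
df <- msleep %>%
  mutate(log_brainwt = log(brainwt),
         log_bodywt  = log(bodywt)) %>%
  select(log_brainwt, log_bodywt, sleep_total, sleep_rem)

p <- ggpairs(df,
             lower = list(continuous = "smooth"),       # scatterplots with a fit line
             diag  = list(continuous = "densityDiag"))  # kernel density on the diagonal
p
```

The data has some missing values, so expect warnings about removed rows when the plot is drawn.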

Below is what my RStudio is showing:


Waterfall Charts

This type of chart shows the cumulative effect of a sequence of positive and negative values, such as revenue and expenses.

And finally! A graph that works. This example isn't using a particular dataset apparently. It's just showing revenue and expenses for an imaginary company.

This graph type also has a conveniently named function and package: waterfall is the function, and the package is waterfalls. This is a ggplot2 graph, so you can use all the ggplot2 options.
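
Here is a minimal sketch of that idea. The categories and amounts are numbers I made up for an imaginary company, not necessarily the ones in the book, and it assumes the waterfalls package is installed.

```r
# Sketch only: assumes the waterfalls package is installed
# install.packages("waterfalls")
library(waterfalls)

# Hypothetical revenue (positive) and expenses (negative)
income <- data.frame(
  category = c("Sales", "Services", "Fixed Costs", "Variable Costs", "Taxes"),
  amount   = c(101000, 52000, -23000, -15000, -10000)
)

p <- waterfall(values = income$amount,
               labels = income$category,
               calc_total = TRUE,          # add a final bar with the running total
               total_axis_text = "Net")    # label for that final bar
p
```

Since the result is a ggplot2 graph, you can add things like a title or a different theme to `p` afterward.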


Word Clouds

A word cloud is basically a cluster of words that appear the most often in a text, or collection of texts. I've seen these all over the place, and I never knew they were an actual graph type.
In order to use word clouds, you need to download packages, so I was not confident that it would work... And it did not work. It actually couldn't even download the package.
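
For anyone who can get a package installed, here is a tiny sketch of the basic idea using the wordcloud package. That package is my assumption and may not be the one the book uses, and the sentence is a throwaway example of mine rather than the speech.

```r
# Sketch only: assumes the wordcloud package is installed
# install.packages("wordcloud")
library(wordcloud)

# Throwaway example text (hypothetical)
text <- "the quick brown fox jumps over the lazy dog while the other dog watches the fox"

# Lowercase, strip punctuation, split into words, and count them
words  <- strsplit(tolower(gsub("[^A-Za-z ]", "", text)), "\\s+")[[1]]
counts <- sort(table(words), decreasing = TRUE)

set.seed(1)  # the layout is random, so fix the seed to get the same picture
wordcloud(words = names(counts),
          freq = as.numeric(counts),   # more frequent words are drawn larger
          min.freq = 1,
          random.order = FALSE,        # most frequent words in the center
          colors = c("gray40", "steelblue", "darkblue"))
```

A real example would also drop common filler words like "the" before counting, otherwise they end up being the biggest words in the cloud.
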
The example in the book uses President Kennedy's Address during the Cuban Missile Crisis, and this is the result:



Those are all the graphs for this chapter! The next section will actually be the last section for this semester!
