Hello!
This week I looked at time-dependent graphs in chapter 7 and statistical models in chapter 8.
Time-dependent Graphs
Time-dependent graphs are used to show change over a period of time. The most common type is a time series line graph, but you can also use dumbbell charts and slope graphs.
Time series
A time series is a set of quantitative values measured at time points that are an equal length of time apart, and a time series graph plots those values as a line.
The data used for this chapter is the economics time series included in the ggplot2 package. It contains monthly US economic data from July 1967 through April 2015.
Using the scale_x_date function, you can reformat the date labels on the x-axis.
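Here is a minimal sketch of what that can look like. I picked psavert (the personal savings rate column in economics) and a five-year break interval just for illustration; the chapter's own example may use different settings.

```r
library(ggplot2)

# economics ships with ggplot2; psavert is the personal savings rate
p <- ggplot(economics, aes(x = date, y = psavert)) +
  geom_line() +
  # date_breaks sets the tick spacing, date_labels sets the label format
  scale_x_date(date_breaks = "5 years", date_labels = "%b-%y") +
  labs(x = "", y = "Personal savings rate (%)")

p
```

The "%b-%y" format prints an abbreviated month and a two-digit year; swapping it for "%Y" would show the full year only.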
For a slope graph, the plotting function needs four things, in this order:
- data frame
- time variable (must be a factor)
- numeric variable to be plotted
- grouping variable (one line per group)
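The four items above can be sketched like this. I am assuming the function in question is newggslopegraph from the CGPfunctions package, since it takes exactly these arguments in this order; that is my best guess, not something stated above. The scores data frame is made up purely for illustration.

```r
library(CGPfunctions)

# Made-up scores for three hypothetical groups at two time points
scores <- data.frame(
  year  = factor(rep(c("2010", "2020"), each = 3)),  # time variable as a factor
  score = c(50, 62, 71, 58, 60, 80),                 # numeric variable to plot
  group = rep(c("A", "B", "C"), times = 2)           # one line per group
)

# Arguments in order: data frame, time, measurement, grouping
p <- newggslopegraph(scores, year, score, group)

p
```

If the time variable is left as a plain number instead of a factor, the function should complain, which is why the factor() call is there.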
Area charts
An area chart is just a line graph that is shaded below the line. A stacked area chart can be used to show differences between groups over time. Stacked area charts work best when you are interested in both how each group changes over time and how the overall total changes over time.
By default, the population is recorded in thousands, which gives axis labels in scientific notation. To fix this, you simply divide the "thousands" variable by 1000 and add a label saying that the population is in millions.
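A rough sketch of that fix is below. The pop data frame and its numbers are made up for illustration (it is not the chapter's dataset), and I named the column Thousands to match the description above.

```r
library(ggplot2)

# Made-up population counts in thousands, for illustration only
pop <- data.frame(
  Year      = rep(c(1990, 2000, 2010), times = 2),
  AgeGroup  = rep(c("Under 40", "40 and over"), each = 3),
  Thousands = c(150000, 155000, 160000, 98000, 120000, 145000)
)

# Dividing by 1000 turns thousands into millions, so the axis labels stay short
p <- ggplot(pop, aes(x = Year, y = Thousands / 1000, fill = AgeGroup)) +
  geom_area(color = "black") +   # geom_area stacks the groups by default
  labs(y = "Population in millions", fill = "Age group")

p
```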
Here is the final product:
Statistical Models
Statistical models show the relationship between explanatory variables and response variables. This chapter goes over models with a single response variable that is either quantitative or binary (yes/no).
There are 5 different types of plots used to visualize statistical models: correlation plots, linear regression, logistic regression, survival plots, and mosaic plots.
Correlation plots
Correlation plots show the pairwise relationship between quantitative variables using color and shading.
This example uses the Saratoga Houses dataset, which shows sale price and characteristics of Saratoga County, NY homes in 2006.
It is super easy to visualize the data. You just load library(ggcorrplot) along with your usual library(ggplot2) package, and then call ggcorrplot(r), where r is the correlation matrix.
The ggcorrplot function has tons of options such as:
- hc.order = TRUE reorders the variables so that similar ones are placed next to each other.
- type = "lower" plots only the lower portion of the correlation matrix.
- lab = TRUE displays the correlation coefficients on the plot.
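Putting those options together looks roughly like this. I am using the built-in mtcars data as a stand-in here, not the Saratoga housing data, so the sketch runs without any extra dataset packages.

```r
library(ggplot2)
library(ggcorrplot)

# mtcars is a built-in stand-in, not the Saratoga data from the chapter
r <- cor(mtcars, use = "complete.obs")   # pairwise correlation matrix

p <- ggcorrplot(r,
                hc.order = TRUE,   # group similar variables together
                type = "lower",    # show only the lower triangle
                lab = TRUE)        # print the coefficients in each cell

p
```

With real data you would first keep only the numeric columns before calling cor(), since it does not accept factors or text.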
Linear regression
Linear regression plots visualize the relationship between a quantitative response variable and an explanatory variable.
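As a small sketch of the idea, here is a scatterplot with a fitted regression line. The mtcars data and the choice of mpg as the response and wt as the explanatory variable are my own stand-ins, not the chapter's example.

```r
library(ggplot2)

# Fit the model directly so the coefficients can be inspected
fit <- lm(mpg ~ wt, data = mtcars)

# geom_smooth(method = "lm") draws the same straight-line fit on the plot
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

p
```

Calling summary(fit) afterwards shows the slope and intercept behind the line drawn on the plot.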
Survival Plots
This statistical model is exactly what it sounds like. It is common in healthcare research, where there is interest in time to recovery, time to death, and time to relapse. This plot can also be used to show the probability that an individual will survive up to time t.
The example uses data from the Titanic sinking, and what role sex played in survival. This does not seem to want to work on my version of R, so I'll just screenshot from the text.
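For anyone who wants to try a survival plot in code, here is a minimal sketch that only needs the survival package. It is not the example from the text: I swapped in the lung dataset that ships with survival, and I used base R plotting rather than a ggplot2-based helper.

```r
library(survival)

# lung ships with the survival package; status == 2 means the patient died,
# and sex is coded 1 = male, 2 = female
fit <- survfit(Surv(time, status) ~ sex, data = lung)

# One Kaplan-Meier curve per sex: probability of surviving past each day
plot(fit, col = c("blue", "red"),
     xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("Male", "Female"),
       col = c("blue", "red"), lty = 1)
```

Each step down in a curve marks a death, so the height of the curve at time t is the estimated probability of surviving up to t.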