Yay! The district's computer system is back up, so I can access the blog from my computer again.
This week I read about and practiced univariate graphs, which plot the distribution of a single variable. That variable can be categorical or quantitative. A categorical variable, such as race or sex, is usually plotted using a bar graph, pie chart, or tree map. A quantitative variable, such as age or height, is usually plotted using a histogram, kernel density plot, or dot plot.
Graphs for both categorical and quantitative variables are built with the ggplot() function from the ggplot2 package, which I used last week. I followed along with the book and started with its examples for a categorical variable. The dataset being used is the Marriage dataset, which has the marriage records of 98 individuals in Mobile County, Alabama.
Categorical Variable Graphs
Bar Graph
The first graph demonstrated is a bar graph. Just like with the graph from my last blog post, you can change almost everything about it: the colors, showing percentages instead of counts, ordering the bars from smallest to largest (or largest to smallest), and so on.
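Here is a rough sketch of what that can look like in code. I'm assuming the Marriage data comes from the mosaicData package and that the column is named race; the colors, labels, and the reorder() trick for sorting the bars are my own choices for illustration, not necessarily what the book does.

```r
library(ggplot2)

# Assumption: the Marriage dataset is provided by the mosaicData package
data(Marriage, package = "mosaicData")

# Bars ordered from smallest to largest group, with the y-axis shown as percents
p_bar <- ggplot(
  Marriage,
  aes(
    x = reorder(race, race, length),        # order the categories by their counts
    y = after_stat(count / sum(count))      # turn counts into proportions
  )
) +
  geom_bar(fill = "cornflowerblue", color = "black") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Race", y = "Percent", title = "Participants by race")

p_bar
```

Swapping length for function(v) -length(v) inside reorder() should flip the order to largest first.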
Tree Map
Quantitative Variable Graphs
Histogram
We are going to continue using the Marriage dataset to demonstrate quantitative variable graphs, but instead of race we're going to look at the individuals' ages. If you want a histogram, you use geom_histogram(). You edit the colors and axis labels the same way you would for a bar graph. By default, the bars of a histogram have no visible separation between them. To add separation, you can change the border color by typing color = "white".
Bins, or the bars the plot is made of, are the most important part of a histogram. You can choose how many bins you want your histogram to have by typing bins = (number of bins) in the geom function. Alternatively, you can choose the width of the bins by typing binwidth = (number here). The binwidth is the range of values each bar covers, not the gap between bars. For example, if you type 5 for binwidth, each bar will cover 5 years of age. The graph below shows a histogram with a binwidth of 5, and the percent of individuals in each age group:
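A minimal sketch of a histogram with white borders, assuming the Marriage data is loaded from the mosaicData package and the column is called age (the fill color and labels are just ones I picked):

```r
library(ggplot2)

# Assumption: the Marriage dataset is provided by the mosaicData package
data(Marriage, package = "mosaicData")

# color sets the border of each bar, fill sets the inside
p_hist <- ggplot(Marriage, aes(x = age)) +
  geom_histogram(fill = "cornflowerblue", color = "white") +
  labs(x = "Age", y = "Count", title = "Participants by age")

p_hist
```

ggplot2 will print a message about using the default number of bins, which is harmless here.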
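Something like the following should produce that kind of graph. This is only a sketch: I'm assuming the data comes from the mosaicData package, and computing the percents with after_stat() is one way to do it, not necessarily the way the book does.

```r
library(ggplot2)

# Assumption: the Marriage dataset is provided by the mosaicData package
data(Marriage, package = "mosaicData")

# binwidth = 5 makes every bar cover 5 years of age;
# after_stat() converts the bin counts into proportions of all individuals
p_bins <- ggplot(Marriage, aes(x = age, y = after_stat(count / sum(count)))) +
  geom_histogram(binwidth = 5, fill = "cornflowerblue", color = "white") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Age", y = "Percent", title = "Participants by age")

p_bins
```

To fix the number of bars instead of their width, replace binwidth = 5 with something like bins = 20.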
Kernel Density Plot
A kernel density plot is basically a histogram, but drawn as a smooth curve where the area under the curve equals 1. It's easier to show than explain! To make this graph, you just type geom_density() for the default graph, which is literally a wavy line. You can use the fill = (insert color) option to fill below the line with color:
The bandwidth controls how smooth the curve is. The default for this data is 5.18, which is fairly wide, but you can use a smaller bandwidth to better define each peak. This goes inside geom_density(), the same way fill does, and is typed bw = (insert number here). If we make the bandwidth 1, the graph looks like this:
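As a small sketch, assuming again that Marriage comes from the mosaicData package (the fill color is just an example):

```r
library(ggplot2)

# Assumption: the Marriage dataset is provided by the mosaicData package
data(Marriage, package = "mosaicData")

# Default kernel density curve for age, filled in below the line
p_dens <- ggplot(Marriage, aes(x = age)) +
  geom_density(fill = "indianred3") +
  labs(x = "Age", y = "Density", title = "Participants by age")

p_dens
```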
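Here is a sketch of the narrower version, under the same assumption that the data comes from the mosaicData package. As far as I can tell, the default bandwidth is what base R's bw.nrd0() returns, so that call is included as a way to check the default yourself.

```r
library(ggplot2)

# Assumption: the Marriage dataset is provided by the mosaicData package
data(Marriage, package = "mosaicData")

# The default bandwidth rule used by geom_density() is "nrd0";
# this prints the value it picks for age
default_bw <- bw.nrd0(Marriage$age)
default_bw

# bw = 1 is much narrower than the default, so the curve shows more bumps
p_bw <- ggplot(Marriage, aes(x = age)) +
  geom_density(fill = "deepskyblue", bw = 1) +
  labs(x = "Age", y = "Density", title = "Participants by age (bw = 1)")

p_bw
```

A smaller bw gives a bumpier curve and a larger one gives a smoother curve, so it's worth trying a few values.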
Dot Chart
Last but not least, a dot chart is similar to a histogram, except each participant is represented by an individual dot. These charts work best for variables without too many participants, since you want to be able to easily count the number of participants in each category. You make a dot plot using geom_dotplot(). Of course, you can edit the size, fill color, border color, and many other options:
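A minimal sketch, assuming the Marriage data from the mosaicData package. The colors and the binwidth of 1 are my own picks, and I hide the y-axis because its numbers aren't meaningful for a dot plot.

```r
library(ggplot2)

# Assumption: the Marriage dataset is provided by the mosaicData package
data(Marriage, package = "mosaicData")

# Each dot is one individual; binwidth = 1 stacks people of the same age
p_dot <- ggplot(Marriage, aes(x = age)) +
  geom_dotplot(fill = "gold", color = "black", binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +   # drop the y-axis labels and ticks
  labs(x = "Age", title = "Participants by age")

p_dot
```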
That's it for this week! Next week, I am reading about bivariate graphs!