Yay! The district's computer system is back up, so I can access the blog from my computer again.

This week I read about and practiced univariate graphs, which plot the distribution of a single variable. That variable can be categorical or quantitative. A categorical variable is something such as race or sex, and is usually plotted using a bar graph, pie graph, or tree map. A quantitative variable is something like age or height, and is usually plotted using a histogram, kernel density plot, or dot plot.

Both categorical and quantitative variables are plotted with the ggplot() function from the ggplot2 package, which I used last week. I followed along with the book and started with its examples for a categorical variable. The dataset being used is the Marriage dataset, which has the marriage records of 98 individuals in Mobile County, Alabama.

Categorical Variable Graphs

Bar Graph

The first graph demonstrated is a bar graph. Just like with the graph from my last blog, you can edit almost everything about it: the colors, whether the axis shows counts or percentages, the order of the bars (smallest to largest or largest to smallest), etc.


The picture below shows the graph with percent symbols. The original code the book provides doesn't set anything inside geom_bar(), so the graph is the default gray. I decided to edit the geom_bar() section. The left is the original code the book gives, and the right is my edit:

 

I didn't do too much, but it still gave me a chance to look at the code and figure out why the example graph went from blue back to the default gray when the numbers were changed to percentages.
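Here is a minimal sketch of a percentage bar graph with a fill set inside geom_bar(). It is not the book's exact code. I'm assuming the Marriage data is loaded from the mosaicData package, and the colors and axis titles are just my own picks. If you leave out the fill argument, the bars fall back to the default gray.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

# Bar heights are each race's share of all records instead of raw counts
p_percent <- ggplot(Marriage, aes(x = race, y = after_stat(count / sum(count)))) +
  geom_bar(fill = "cornflowerblue", color = "black") +   # illustrative colors
  scale_y_continuous(labels = scales::label_percent()) + # show 0.25 as 25%
  labs(x = "Race", y = "Percent")

p_percent
```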

This is how you would sort the bars into descending order and add a label to each bar so you can see the exact percentage for each race:
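A minimal sketch of one way to do it, assuming the Marriage data comes from the mosaicData package. The helper columns pct and pctlabel are names I made up for the example. The idea is to work out the percentages first, then use reorder() to sort the bars and geom_text() to print the labels.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

# Count each race, then compute its share and a printable label
plotdata <- as.data.frame(table(race = Marriage$race))   # columns: race, Freq
plotdata$pct <- plotdata$Freq / sum(plotdata$Freq)       # hypothetical helper column
plotdata$pctlabel <- paste0(round(plotdata$pct * 100), "%")

# reorder(race, -pct) sorts the bars from largest to smallest
p_sorted <- ggplot(plotdata, aes(x = reorder(race, -pct), y = pct)) +
  geom_col(fill = "indianred3", color = "black") +
  geom_text(aes(label = pctlabel), vjust = -0.25) +      # label just above each bar
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Race", y = "Percent")

p_sorted
```

Using reorder(race, pct) without the minus sign sorts the bars from smallest to largest instead.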



Sometimes your axis labels will overlap each other if they have really long names, or if there are several of them. There are a few ways to fix this, but I'm just going to mention two. First, you can simply rotate the whole graph from vertical to horizontal by adding a line of code that reads coord_flip(). Another option is to rotate the axis labels so they are read diagonally. This is a little more complicated, but still fairly simple. To rotate the labels, you use the theme function like so: theme(axis.text.x = element_text(angle = (insert any angle), hjust = 1))
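Both fixes can be sketched on the same base graph. This assumes the Marriage data comes from the mosaicData package, and the 45 degree angle is only an example value.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

base_plot <- ggplot(Marriage, aes(x = race)) +
  geom_bar()

# Option 1: flip the whole graph so the bars run horizontally
p_flipped <- base_plot + coord_flip()

# Option 2: keep the bars vertical and tilt the x-axis labels
p_rotated <- base_plot +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

p_flipped
p_rotated
```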

Pie Graph

Pie charts are apparently not typically used in statistics, but they can work if you are trying to compare each category with the whole, and if you have a small number of categories. The book only has two examples for the pie chart: a basic pie chart with a legend but no labels, and a pie chart with labels but no legend. The picture below shows both sets of code, but since the labeled pie chart code is the most recent, that's the graph shown:
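For reference, here is a rough sketch of both versions. It is not the book's exact code. It relies on the usual ggplot2 trick of drawing one stacked bar and wrapping it into a circle with coord_polar(), it assumes the Marriage data comes from the mosaicData package, and the label column is a name I made up.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

# One row per race with its count and a "name + percent" label
plotdata <- as.data.frame(table(race = Marriage$race))   # columns: race, Freq
plotdata$label <- paste0(plotdata$race, "\n",
                         round(100 * plotdata$Freq / sum(plotdata$Freq)), "%")

# Version 1: legend, no labels. A single stacked bar bent into a circle.
pie_legend <- ggplot(plotdata, aes(x = "", y = Freq, fill = race, group = race)) +
  geom_col(width = 1, color = "black") +
  coord_polar(theta = "y", start = 0) +
  theme_void()

# Version 2: labels centered in each slice, legend removed
pie_labeled <- pie_legend +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  theme(legend.position = "none")

pie_labeled
```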


Tree Map

A tree map is basically a pie chart, but square. Tree maps can handle several categories that have many levels. I think they look a bit confusing, but I'm sure they're good for some things.

Quantitative Variable Graphs

Histogram

We are going to continue using the Marriage dataset to demonstrate quantitative variable graphs, but instead of race we're going to plot the individuals by age. If you want a histogram, you type geom_histogram(). You edit the colors and x-axis labels the same way you would for a bar graph. The default histogram has no separation between the bars. To add separation, you can change the border color by typing color = "white".

Bins, or the number of bars the plot has, are the most important part of a histogram. You can choose how many bins you want your histogram to have by typing bins = (number of bins) in the geom function. You can also choose the width of the bins by typing binwidth = (number here). The width sets the range of values each bin covers. For example, if you type 5 for binwidth, each bar will contain 5 years. The graph below shows a histogram with a binwidth of 5, and percent of individuals in each age group:
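A histogram like that one can be sketched as follows, assuming the Marriage data comes from the mosaicData package. The fill color and axis titles are my own picks.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

# Each bar covers 5 years of age; heights are the share of all individuals
p_hist <- ggplot(Marriage, aes(x = age, y = after_stat(count / sum(count)))) +
  geom_histogram(fill = "cornflowerblue",
                 color = "white",      # white borders separate the bars
                 binwidth = 5) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(x = "Age", y = "Percent")

p_hist
```

Swapping binwidth = 5 for something like bins = 20 fixes the number of bars instead of their width.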



Kernel Density Plot

This graph is basically a histogram, but with a smooth curve where the area under the curve equals 1. It's easier to show than explain! To make this graph, you just type geom_density() for the default graph, which is literally a wavy line. You can use the fill = (insert color) option to fill below the line with color:

The bandwidth controls how smooth the curve is. It isn't a fixed number: by default it is calculated from the data, and for age in this dataset it comes out to 5.18, which is fairly wide. You can make the bandwidth smaller to better define each peak. This goes within the geom_ function, just like color or binwidth, and is typed bw = (insert number here). If we make the bandwidth 1, the graph looks like this:
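Here is a small sketch comparing the default curve with a bandwidth of 1, assuming the Marriage data comes from the mosaicData package. The colors are arbitrary. The last line uses bw.nrd0() from base R, which is the rule geom_density() uses by default, to print the data-driven bandwidth.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

# Default bandwidth: one smooth, wide curve
p_default <- ggplot(Marriage, aes(x = age)) +
  geom_density(fill = "indianred3")

# Smaller bandwidth: a bumpier curve that follows each peak more closely
p_narrow <- ggplot(Marriage, aes(x = age)) +
  geom_density(fill = "deepskyblue", bw = 1)

p_default
p_narrow

# The default bandwidth that would be used for age
default_bw <- bw.nrd0(Marriage$age)
default_bw
```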


Dot Chart

Last but not least, a dot chart is another form of histogram, except each participant is represented as their own individual dot. These charts work best for variables with not too many participants, since you want to be able to easily count the number of participants in each category. You make a dot plot using geom_dotplot(). Of course, you can edit the size, fill color, border color, and many other options:
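A minimal dot plot sketch, assuming the Marriage data comes from the mosaicData package. The colors and the binwidth of 1 are just example values. I've also hidden the y-axis, since the numbers ggplot2 puts there don't mean much for a dot plot.

```r
library(ggplot2)

# Assumption: the Marriage dataset comes from the mosaicData package
data(Marriage, package = "mosaicData")

# One dot per individual, stacked within each 1-year-wide age bin
p_dots <- ggplot(Marriage, aes(x = age)) +
  geom_dotplot(fill = "gold", color = "black", binwidth = 1) +
  scale_y_continuous(NULL, breaks = NULL) +   # hide the y-axis
  labs(x = "Age")

p_dots
```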


That's it for this week! Next week, I am reading about bivariate graphs!
