Hello!

I got a little behind on my blog for this semester, but here I am!

This semester we are learning to use R, a programming language for statistical computing and graphics. From what I've seen so far, it is used a lot to analyze data and visualize it with graphs and tables. I don't know a whole lot about R beyond that, but I am excited to learn a new language since my major is actually Computer Science. This will be a fun opportunity to enjoy both computer science and biology together.

So, I started reading chapter 1 of Data Visualization with R by Rob Kabacoff. The first chapter is about preparing your data for visualization. The first step is importing it, and different packages let you import data from text files, Excel, statistical packages, and even databases.

Text files are imported using the readr package, Excel spreadsheets using the readxl package, and files from statistical packages (such as SPSS, Stata, and SAS) using the haven package. Importing data from databases is apparently more complicated and isn't covered in the book I am referencing.

Using RStudio, you can type your commands into the console window. The book uses employee salaries as examples, so I'll mirror those examples.

To import a comma-delimited text file, you would type:

library(readr)

Salaries <- read_csv("salaries.csv")

For Excel, you type something similar, but you specify the sheet you want:

library(readxl)

Salaries <- read_excel("salaries.xlsx", sheet=1)

If you wanted to import a data file saved by Stata (a commercial statistical software package), you would type:

library(haven)

Salaries <- read_dta("salaries.dta")
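If you don't have a salaries file on hand, here is a minimal sketch for trying these import functions anyway. It assumes readr and haven are installed, makes up a tiny three-person salary table (the names and numbers are hypothetical, not the book's data), writes it to temporary files, and reads it back. Note that Stata files end in .dta and are read with read_dta(), while read_sav() is haven's function for SPSS .sav files.

```r
library(readr)
library(haven)

# Hypothetical toy data, made up for this demo (not from the book)
toy <- data.frame(
  name = c("Ann", "Bo", "Cy"),
  salary = c(52000, 61000, 58500),
  stringsAsFactors = FALSE
)

# Comma-delimited text file: write it out, then import it with readr::read_csv()
csv_path <- tempfile(fileext = ".csv")
write_csv(toy, csv_path)
Salaries_csv <- read_csv(csv_path, col_types = cols())  # cols() = guess types quietly

# Stata .dta file: write it out, then import it with haven::read_dta()
dta_path <- tempfile(fileext = ".dta")
write_dta(toy, dta_path)
Salaries_dta <- read_dta(dta_path)

print(Salaries_csv)
print(Salaries_dta)
```

Both imports come back as tibbles, which is the table format the rest of the tidyverse packages expect.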

I decided to try making a simple little table in Excel, importing it into R, and seeing what happens. I'm a huge Lord of the Rings nerd, so I made a very incomplete table sorting out the different races and their groups in Middle-earth. FOR SCIENCE.


This is what I typed into Excel. It's very messy and I did not clean it up at all, which didn't help:




This still gave me a much better idea of how R sorts and displays your data, and how I should type my data in Excel so it looks cleaner in R. I decided it would be easier, and require less research on my part, if I just listed character names from the Lord of the Rings universe, so here is take two:


It looks MUCH tidier in R now!

So, moving on...

After importing data, you usually need to clean it up and get it into the right shape. The book shows examples of this using a Star Wars data set. Two packages help with this: dplyr and tidyr. The book has a table that shows each package's functions and what they do:



It seems to me that you use these functions whenever your imported data isn't already organized the way you need it, so you have to clean it up yourself. I just messed around with the book's Star Wars examples for this. In a nutshell, the dplyr package is used to select, filter, sort, create, and change the variables and rows in your data, while the tidyr package is used to change how your data is laid out.
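To get a feel for the dplyr side, here is a small sketch using the starwars data set that comes with dplyr. The particular columns, the filter on humans, and the new height_in column (height converted from centimeters to inches) are my own choices for illustration, not necessarily the book's example.

```r
library(dplyr)

# Each step takes a data frame and returns a new one
humans <- select(starwars, name, height, mass, species)  # keep only some columns
humans <- filter(humans, species == "Human")             # keep only some rows
humans <- mutate(humans, height_in = height / 2.54)      # create a new column (cm -> inches)
humans <- arrange(humans, desc(height))                  # sort, tallest first (NAs go last)

head(humans)
```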

This example uses the mutate, filter, and group_by functions, and also shows how you can make the code more compact:

Simply put, instead of typing out newdata <- on every single line, you can start with newdata <- starwars and end each line with the pipe operator, %>% (the symbol between the two percent signs), which passes the result of one step on to the next function.
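Here is a rough sketch of the two styles side by side, assuming dplyr is installed. Both produce the same table (average human height in inches for each value of the gender column); the steps and the names newdata_piped, height_in, and mean_height_in are hypothetical, not copied from the book.

```r
library(dplyr)

# Without the pipe: newdata is reassigned on every line
newdata <- filter(starwars, species == "Human")
newdata <- mutate(newdata, height_in = height / 2.54)
newdata <- group_by(newdata, gender)
newdata <- summarize(newdata, mean_height_in = mean(height_in, na.rm = TRUE))

# With the pipe: %>% hands the result of each step to the next function
newdata_piped <- starwars %>%
  filter(species == "Human") %>%
  mutate(height_in = height / 2.54) %>%
  group_by(gender) %>%
  summarize(mean_height_in = mean(height_in, na.rm = TRUE))

newdata_piped
```

The %>% pipe comes from the magrittr package and is re-exported by dplyr. R 4.1 and newer also have a built-in pipe, |>, that works much the same way for code like this.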

The tidyr package, like I said, is used to change how the data is laid out: whether you want it stacked vertically, with one row per measurement (long format), or spread out horizontally, with one column per measurement (wide format).
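A minimal sketch of going back and forth between the two layouts, assuming tidyr 1.0.0 or newer, which provides pivot_longer() and pivot_wider(). Older tutorials (possibly including the edition of the book I'm reading) use gather() and spread() for the same job. The salary-by-year table is made up.

```r
library(tidyr)

# Hypothetical wide table: one row per person, one column per year
wide <- data.frame(
  name  = c("Ann", "Bo"),
  y2020 = c(50000, 60000),
  y2021 = c(52000, 61000),
  stringsAsFactors = FALSE
)

# Wide -> long: one row per (name, year) pair
long <- pivot_longer(wide, cols = c(y2020, y2021),
                     names_to = "year", values_to = "salary")

# Long -> wide again: the year values become column names
wide_again <- pivot_wider(long, names_from = year, values_from = salary)

long
wide_again
```

Long data like this is often what plotting code wants (for example, one column to map to color), while wide data is usually easier for a person to read.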


