If you're having trouble programming data file imports, try RStudio's code preview

When it comes to importing flat data files stored locally on your computer, such as CSV or XLS files, you might be uncertain which method to use. It can also be hard to remember how to do it, or which options are available for each file type. Thankfully, RStudio will suggest the code for importing them. In the lower right quadrant (the area tabbed with “Files,” “Plots,” “Packages,” “Help,” and “Viewer”), make sure you’re on the “Files” tab and navigate to the folder where your data file is stored.
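For orientation, import code for a flat file looks roughly like the sketch below. It uses base R's read.csv() on a small throwaway file created just so the example runs; the file, its columns, and the name "sales" are all hypothetical. If the readr or readxl packages are installed, read_csv() and read_excel() are the usual equivalents.

```r
# Hypothetical example file, written to a temp location only so this sketch is runnable.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(year = c(2011, 2012), sales = c(10, 20)),  # made-up placeholder values
  csv_path,
  row.names = FALSE
)

# Import the CSV into a data frame (base R, no extra packages needed).
sales <- read.csv(csv_path)

# Check the column names and types that were read in.
str(sales)
```

For a real file, replace csv_path with the path to your own data.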


Note to Self: Using the filter and select functions from the dplyr package

This is the first post in a series where I write to myself about the various data science spells I’m learning. Today’s spell: dplyr’s filter function. For some reason, when I first learned how to filter data with the dplyr package, I thought the function was designed only to remove or discard data, specifically columns. That is not the case, and I’m writing this blog post to try to correct this automatic thinking in my brain.
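A minimal sketch of the distinction, assuming the dplyr package is installed. The "vehicles" data frame and its values are made up for illustration. filter() keeps the rows that match a condition, while select() is the function that works on columns.

```r
library(dplyr)

# Hypothetical toy data, invented for this example.
vehicles <- data.frame(
  model = c("A", "B", "C"),
  type  = c("hybrid", "electric", "hybrid"),
  sales = c(120, 80, 45)
)

# filter() KEEPS the rows where the condition is TRUE; all columns stay.
hybrids <- filter(vehicles, type == "hybrid")

# select() KEEPS the named columns; all rows stay.
model_sales <- select(vehicles, model, sales)

# select() with a minus sign drops a column instead.
no_type <- select(vehicles, -type)
```

Thinking of both functions as "keep what I ask for" rather than "throw things away" may be the easier mental model.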


Plotting multiple lines on the y axis of a ggplot graph

I wanted to plot the yearly sales of three different types of hybrid and electric vehicles on the same graph. The dataset was originally wide, with years as columns and the types of cars as rows. After cleaning the data (making it skinnier by switching cars to columns and years to rows) and saving it under the name “ev_csv_3”, it was time to plot. To draw multiple lines against the y axis, simply skip the y argument in the aes function in the first line of the ggplot call.
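One way this can look, as a sketch rather than the exact code from the post: the first ggplot() line sets only x, and each geom_line() layer then supplies its own y. The column names (year, hybrid, plug_in_hybrid, electric) and all of the numbers below are assumptions made up for illustration; only the name ev_csv_3 comes from the text above. The ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Stand-in for the cleaned data: years as rows, one column per vehicle type.
# Column names and values are hypothetical placeholders, not real sales figures.
ev_csv_3 <- data.frame(
  year           = 2011:2014,
  hybrid         = c(100, 120, 140, 130),
  plug_in_hybrid = c(10, 30, 50, 55),
  electric       = c(5, 15, 45, 60)
)

# No y in the top-level aes(); each line layer names its own y column.
# Mapping colour to a constant string gives each line a legend entry.
p <- ggplot(ev_csv_3, aes(x = year)) +
  geom_line(aes(y = hybrid, colour = "Hybrid")) +
  geom_line(aes(y = plug_in_hybrid, colour = "Plug-in hybrid")) +
  geom_line(aes(y = electric, colour = "Electric")) +
  labs(x = "Year", y = "Yearly sales", colour = "Vehicle type")

print(p)
```

This works directly on data with one column per series. An alternative is to reshape the data to long format and map colour to a single type column, which scales better as the number of series grows.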
