Introduction
I have collected some reference materials here so that I can always get to this information quickly. These are PDF files that you can download.
- The Periodic Table of Alteryx Tools – Front Side
- The Periodic Table of Alteryx Tools – Back Side
- Regular Expressions Cheat Sheet
Online References
- Tableau Mapping
- Parsing Command Line Parameters (thanks to Joe Mako)
R-Based Data Science Curriculum
DataCamp Courses
DataCamp firmly believes in educating people to be the best data scientists possible. As such, they allow students to take as many classes as they would like for free while enrolled, and there are a LOT to choose from, not only in R but also in Python, SQL, and others. Below is a comprehensive list of classes that were available in Jan 2017.
Make sure to register for DataCamp.
R Programming
- Introduction to R (mostly working with data structures like vectors, matrices, factors, dataframes and lists)
- Intermediate R (if/then, loops, functions, the apply family, debugging, working with text via regular expressions and substitutions, working with dates)
- Working With Dates and Times in R (using the lubridate package)
- Writing Functions in R (uses the purrr package, the “dplyr” of function-writing, to help write functions; course covers handling errors, arguments, etc., and is a somewhat more advanced treatment)
- Writing Efficient R Code (benchmarking/timing, profiling, parallel programming, very advanced stuff)
- Reporting with R Markdown (those .Rmd files you’re always using…)
Reading in, Cleaning, and Manipulating Data
- Importing Data in R part 1 (using the readr and data.table packages for reading in data, reading in Excel data, XLConnect with Excel)
- Importing Data in R part 2 (importing from databases, using SQL in R, importing data from the web with API, JSON, importing data from SAS, STATA, SPSS, etc)
- Cleaning Data in R (uses the tidyr package to help separate/unite columns, handle messy data, do string and data type conversion, handle missing values and data errors)
- Manipulating data with dplyr (learning the dplyr package and its tbl data structure for selecting, mutating, filtering, arranging, grouping, summarizing, and aggregating data)
- Introduction to Data (an overview of working with data with some cautionary tales, uses dplyr)
- Data Table Manipulation in R (another class using dplyr)
- Joining Data in R with dplyr (another dplyr course; using this package will make you an expert in manipulating data)
- Introduction to Spark in R using sparklyr (Big Data topics)
Working with and Summarizing (Structured) Data
- Exploratory Data Analysis (charts, tables, counts vs proportions, histograms, boxplots, density plots, numerical and graphical summaries, case study)
- Case Studies in Importing and Cleaning Data in R (a few case studies where data issues crop up, such as warnings, dates, removing redundant data, readxl, data type conversions, separating columns, replacing missing values, removing useless columns, splitting data)
- Exploring Pitch Data with R (an extended case study in Baseball using tapply, prop.table, ggplot, and “for” loops to do some analytics)
- Exploratory Data Analysis in R Case Study (practice with dplyr and ggplot2, intro to the broom package and tidyr, looks at UN General Assembly voting history)
- Case study in Credit Risk Modeling (doesn’t seem like a financial analytics example exclusively but rather a great case study)
Visualization
- Data Visualization in R (overview of plot, lines, points, par, adding text, lines, legend functions, histogram, boxplot, etc., making effective plots and plot layouts)
- Data Visualization with ggplot2 (if you ever want to be a true expert at making ridiculously great and flexible visualizations, this sequence is for you) Part 1, Part 2, Part 3
- Communicating with Data in the tidyverse (how to make great graphics and presentations with ggplot2 and markdown)
- Data Visualization with lattice (another package for making great plots)
- Data Visualization with ggvis (another package for making great plots)
Regression
- Correlation and Regression (basics)
- Inference for Linear Regression (variability of coefficients, simulation and bootstrapping of coefficients, assumptions of model and what to do when violated)
- Multiple and Logistic Regression (handling categorical predictors, interactions, adjusted R², case study with Italian restaurants)
- Supervised Learning in R with Regression (machine learning perspective of regression, uses some ggplot2 and tree-based models, one-hot encoding of categorical variables with designTreatmentsZ)
Machine Learning and Data Mining (BAS 474 stuff!)
- Introduction to Machine Learning (overview and basic algorithms, performance measures and bias/variance, cross-validation)
- Machine Learning with Tree Based Models (rpart, randomForest, gbm)
- Supervised Learning / Classification in R (nearest neighbor, naive Bayes, logistic regression, classification trees and random forest)
- Unsupervised Learning in R (k-means and hierarchical clustering, dimension reduction with PCA case study, Pokemon data)
- Machine Learning Toolbox (learning with caret, linear/logistic regression, tuning parameters, pre-processing data, selecting and comparing models)
Time Series and Forecasting (BAS 475 stuff)
- Introduction to Time Series Analysis (autoregression and simple moving averages)
- Visualizing Time Series (mostly using plot(), case study in selecting a stock that improves portfolio)
- ARIMA modeling in R (seasonal and non-seasonal)
- Forecasting with R (ARIMA, smoothing methods, dynamic harmonic regression, TBATS)
- Manipulating Time Series Data and Case Studies (using xts and zoo, cases about flights, weather, unemployment, GDP, sports) and Forecasting
- String Manipulation in R with stringr (formatting characters and strings, regular expressions replacements, case study)
- Introduction to Text Mining and Bag of Words (basic handling of data, word clouds, distance matrices and dendrograms, n-grams, case study)
- Sentiment Analysis in R (qdap’s sentiment function polarity(), visualizing sentiment, airBnB reviews)
- Sentiment Analysis the tidy way (Tweets, Shakespeare, TV news, songs)
- Working with Web Data in R (using httr to interact with APIs, JSON, XML, scraping with XPath, CSS web scraping)
Probability and Statistics
- Foundations of Probability (binomial distribution, simulations, Bayesian statistics, Poisson and Geometric distributions, the replicate function)
- Foundations of Inference (randomization, hypothesis tests, confidence intervals, bootstrapping)
- Inference for Numerical Data (bootstrapping, t-tests, differences in averages, ANOVA for comparing many averages)
- Statistical Modeling in R Part 1 and Part 2 (unsure what to think of this class; I feel it’s approaching the field from a totally different angle than what I’m used to)
Spatial Analysis (geo-spatial statistics)
Network Analysis in R (e.g., social networks)
- igraph is an amazing package in R that handles nearly every aspect of network analysis you might be interested in
Finance in R