Technical notes on all things Data Science

Setup Workspace
  Run Python code using Jupyter Notebook
  Install SciPy and Scikit-learn
  Launch Jupyter Notebook in different directory using Icon

Python
  Flow control
  Function basics
  Load CSV file with Python

Data Wrangling
  NumPy array basics
  Pandas Series
  Pandas DataFrame Basics
  Load CSV file with NumPy
  Load CSV file with Pandas
  Preprocessing data: Standardization using scikit-learn
  Preprocessing data: Rescaling using scikit-learn
  Preprocessing data: Normalization using scikit-learn
  Preprocessing data: Binarization using scikit-learn

Probability and Statistics
  Generate descriptive statistics
  Calculate correlation of columns
  Calculate skew of columns
  Summary statistic: Mean, Variance and Effect size

Data Visualization
  Matplotlib Line Plot
  Matplotlib Scatter Plot
  Matplotlib: Histogram
  Matplotlib: Density Plot
  Matplotlib: Box and Whisker Plot
  Matplotlib: Correlation Matrix Plot
  Matplotlib: Scatter Plot Matrix

Machine Learning
  Feature selection: Univariate Selection
  Feature selection: Recursive Feature Elimination
  Feature selection: Principal Component Analysis
  Feature selection: Feature Importance
  Scikit-learn: Estimator objects and how to choose the right one
  Cross-validation: Train Test Split
  Cross-validation: k-fold
  Cross-validation: Leave One Out
  Cross-validation: Random Test-Train Splits
  Classification Model evaluation: Classification Accuracy
  Classification Model evaluation: Logarithmic Loss
  Classification Model evaluation: Area Under ROC Curve
  Classification Model evaluation: Confusion Matrix
  Classification Model evaluation: Classification Report
  Regression Model evaluation: Mean Absolute Error
  Regression Model evaluation: Mean Squared Error
  Regression Model evaluation: R2
  Classification Algorithm: Logistic Regression
  Classification Algorithm: Linear Discriminant Analysis
  Classification Algorithm: k-Nearest Neighbors
  Classification Algorithm: Naive Bayes
  Classification Algorithm: Classification and Regression Trees
  Classification Algorithm: Support Vector Machines
  Regression Algorithm: Linear Regression
  Regression Algorithm: Ridge Regression
  Regression Algorithm: LASSO Regression
  Regression Algorithm: Elastic Net Regression
  Regression Algorithm: k-Nearest Neighbors
  Regression Algorithm: Classification and Regression Trees
  Regression Algorithm: Support Vector Machines
  Comparison of Machine Learning Algorithms
  Pipeline: Data Preprocessing and Modeling
  Pipeline: Feature Selection and Modeling
  Bagging Ensemble: Bagged Decision Trees
  Bagging Ensemble: Random Forest
  Bagging Ensemble: Extra Trees
  Boosting Ensemble: AdaBoost
  Boosting Ensemble: Stochastic Gradient Boosting
  Voting Ensemble: VotingClassifier
  Hyperparameter optimization: Grid Search
  Hyperparameter optimization: Random Search
  Save and Load Model using pickle
  Save and Load Model using Joblib