Preprocess: LDA and Kernel PCA in Python

Principal component analysis (PCA) is an unsupervised linear transformation technique that is widely used across different fields, most prominently for dimensionality reduction. We talked about it here: https://charleshsliao.wordpress.com/2017/05/28/preprocess-pca-application-in-python/ We use data from the scikit-learn library, and the code is written for Python 3. Most of the code comes from Sebastian Raschka's book: https://www.goodreads.com/book/show/25545994-python-machine-learning?ac=1&from_search=true
###1. import the data
###pls […]
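As a reminder of what PCA does in practice, here is a minimal sketch of dimensionality reduction with scikit-learn. The wine dataset and the choice of two components are assumptions made for illustration; the excerpt only says the data comes from the sklearn library.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Assumption: the wine dataset stands in for whichever sklearn dataset the post uses.
X, y = load_wine(return_X_y=True)

# PCA is sensitive to feature scale, so standardize the features first.
X_std = StandardScaler().fit_transform(X)

# Project the 13 original features onto the first two principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_std)

print(X_pca.shape)
print(pca.explained_variance_ratio_)
```

The `explained_variance_ratio_` attribute reports how much of the total variance each retained component captures, which is a common way to decide how many components to keep.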



NBA Winning Estimator with Decision Tree in Python

It would be interesting to use prediction to understand which NBA teams tend to win. We use data from http://www.basketball-reference.com/leagues/NBA_2017_games-june.html and follow a step-by-step workflow. More details can be found in Robert Layton's book: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true
###1. Load data from http://www.basketball-reference.com/leagues/NBA_2017_games-june.html
import pandas as pd
file = "NBA2017.csv"
NBA2017 = pd.read_csv(file, sep=",", parse_dates=["Date"])  # parse the "Date" strings into date values
NBA2017.columns=["Date", "Start […]
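The loading step can be tried without the downloaded file. The sketch below feeds `pd.read_csv` an in-memory CSV; the column names and the two rows are hypothetical placeholders, not real game results, and only the `Date` column mirrors the excerpt.

```python
import io

import pandas as pd

# Hypothetical placeholder rows, used only to demonstrate date parsing.
csv_text = """Date,Visitor,VisitorPts,Home,HomePts
2017-06-01,Team A,90,Team B,100
2017-06-04,Team A,95,Team B,105
"""

# parse_dates converts the "Date" strings into datetime values while loading.
games = pd.read_csv(io.StringIO(csv_text), sep=",", parse_dates=["Date"])

print(games.dtypes)
print(games.head())
```

Parsing dates at load time means later steps can sort or filter games chronologically without a separate conversion.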



Preprocess in Python-Scale

We use data from the scikit-learn library, and the editor is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true
###Preprocess methods
###The StandardScaler in scikit-learn ensures that for each feature, the mean is zero,
###and the variance is one, bringing all features to the same magnitude. However,
###this scaling does not ensure […]
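The StandardScaler claim is easy to check on a small array. This is a minimal sketch with a made-up two-feature matrix (an assumption for illustration, not the data used in the post) whose columns differ in magnitude.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up toy data: the second feature is two orders of magnitude larger.
X = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
    [4.0, 700.0],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, each column has mean 0 and (population) variance 1.
means = X_scaled.mean(axis=0)
variances = X_scaled.var(axis=0)

print(means)
print(variances)
```

In a train/test setting, the usual pattern is to call `fit` on the training data only and then `transform` both splits with the same fitted scaler.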
