NBA Winning Estimator with Decision Tree in Python

It would be interesting to predict NBA game outcomes and see which teams tend to win. We use data from http://www.basketball-reference.com/leagues/NBA_2017_games-june.html and follow the workflow below. More details can be found in Robert Layton’s book: https://www.goodreads.com/book/show/26019855-learning-data-mining-with-python?from_search=true

###1. Load data from http://www.basketball-reference.com/leagues/NBA_2017_games-june.html
import pandas as pd
file="NBA2017.csv"
NBA2017=pd.read_csv(file,sep=",",parse_dates=["Date"]) #parse the "Date" strings as date values
NBA2017.columns=["Date", "Start […]



Preprocess: t-SNE in Python

We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###There is a class of algorithms for visualization called manifold learning algorithms,
###which allow for much more complex mappings and often provide better visualizations than PCA.
###A particularly useful one is the […]



Preprocess: PCA Application in Python

We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###Sometimes the features (variables) in the data are not independent of each other.
###We can always inspect the data before we preprocess it with […]



Preprocess in Python-Scale

We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

###Preprocessing methods
###The StandardScaler in scikit-learn ensures that for each feature, the mean is zero
###and the variance is one, bringing all features to the same magnitude. However,
###this scaling does not ensure […]
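As a rough illustration of what standardization does, the sketch below rescales each feature column to zero mean and unit variance using only the standard library. The helper name `standardize` and the toy data are made up for this example; it mirrors the idea behind scikit-learn's `StandardScaler` and is not its implementation.

```python
from statistics import fmean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical toy data: two features on very different scales.
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
scaled = standardize(data)
for row in scaled:
    print([round(v, 3) for v in row])
```

After scaling, both columns hold the same values even though the raw features differed by two orders of magnitude, which is the "same magnitude" effect described above.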



Multi Layer Perceptrons in Python

You can read more about MLPs in R here: https://charleshsliao.wordpress.com/2017/04/10/tune-multi-layer-perceptron-mlp-in-r-with-mnist/ Generally speaking, a deep learning model is a neural network with more than one hidden layer. Whether such a model succeeds depends largely on how its parameters are tuned. We use data from the sklearn library, and the IDE is Sublime Text 3. Most […]
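To make "more than one hidden layer" concrete, here is a minimal forward pass through a network with two hidden layers, written in plain Python. The layer sizes, the random weights, and the helper names (`dense`, `init_layer`, `forward`) are all assumptions for illustration. The network is untrained, so the output probabilities are meaningless; only the structure matters.

```python
import math
import random

def relu(vector):
    return [max(0.0, v) for v in vector]

def softmax(vector):
    peak = max(vector)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in vector]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer: `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
layer_sizes = [4, 8, 8, 3]  # 4 inputs, two hidden layers of 8 units, 3 classes
layers = [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    activation = x
    for weights, biases in layers[:-1]:   # hidden layers use ReLU
        activation = relu(dense(activation, weights, biases))
    weights, biases = layers[-1]          # output layer uses softmax
    return softmax(dense(activation, weights, biases))

probs = forward([5.1, 3.5, 1.4, 0.2])
print(probs)
```

The number of hidden layers, the units per layer, and the activation function are exactly the kind of parameters that need tuning.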



Ensemble with Gradient Boosting in Python

We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the code comes from the book: https://www.goodreads.com/book/show/32439431-introduction-to-machine-learning-with-python?from_search=true

from sklearn.ensemble import GradientBoostingClassifier
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
cancer=load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=0)
###Gradient boosted regression trees are another ensemble […]
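The core idea of gradient boosting is to build trees one after another, each fitted to the errors the ensemble still makes. The sketch below shows that idea for regression with squared loss, using one-split "stumps" on a single feature. It is a simplified stand-in, not what `GradientBoostingClassifier` does internally, and the data and the function names (`fit_stump`, `boost`, `mse`) are hypothetical.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature, judged by squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_stumps=20, learning_rate=0.5):
    base = sum(ys) / len(ys)          # start from the mean prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_stumps):
        # Each new stump is fitted to what the current ensemble gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions

def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

# Hypothetical toy data with three rough plateaus.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
base, stumps, predictions = boost(xs, ys)
baseline_mse = mse(ys, [base] * len(ys))
boosted_mse = mse(ys, predictions)
print(f"mean-only MSE: {baseline_mse:.3f}, boosted MSE: {boosted_mse:.3f}")
```

The learning rate controls how strongly each stump corrects its predecessors; smaller values usually need more stumps to reach the same training error.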



Logistic Regression in Python to Tune Parameter C

The trade-off parameter of logistic regression that determines the strength of the regularization is called C; higher values of C correspond to less regularization (and we can specify which regularization function to use). C is the inverse of the regularization strength (lambda). We use data from the sklearn library, and the IDE is Sublime Text 3. Most of the […]
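The relationship between C and lambda can be seen in a small hand-rolled example. The sketch below fits a one-feature logistic regression by gradient descent with an L2 penalty of strength lambda = 1 / C. The function `fit_logistic`, the toy data, and the learning-rate settings are assumptions made for illustration; scikit-learn uses different solvers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, C, lr=0.05, epochs=5000):
    """Gradient descent on sum(log-loss) + w**2 / (2 * C); the bias is not penalized."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        # The penalty contributes lambda * w to the gradient, with lambda = 1 / C.
        grad_w = sum(e * x for e, x in zip(errors, xs)) + w / C
        grad_b = sum(errors)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: one feature, two classes.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
fitted = {C: fit_logistic(xs, ys, C) for C in (0.01, 1.0, 100.0)}
for C, (w, b) in fitted.items():
    print(f"C={C:<6} lambda={1 / C:<6} w={w:.3f}")
```

A small C (large lambda) keeps the weight close to zero, while a large C lets the weight grow to fit the training data more closely.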



Quick KNN Examples in Python

I walked through two basic kNN models in Python to get more familiar with modeling and machine learning in Python, using Sublime Text 3 as the IDE. The first kNN example uses the iris data from the sklearn library.

###1. import data
from sklearn.datasets import load_iris
iris=load_iris()
print(iris.keys())
print('\n"x:',iris['feature_names'])
print('\n"y:',iris['target_names'])
print('\n"type of data:',type(iris['data'])) […]
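For readers who want to see what a kNN classifier does under the hood, here is a minimal plain-Python sketch: measure the distance from the query to every training point, keep the k closest, and take a majority vote. The function `knn_predict` and the handful of iris-like measurements are hypothetical; this is not the sklearn implementation.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data in the style of iris: (petal length, petal width) in cm.
points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
labels = ["setosa", "setosa", "setosa", "versicolor", "versicolor", "versicolor"]

print(knn_predict(points, labels, (1.6, 0.25)))
print(knn_predict(points, labels, (4.6, 1.3)))
```

There is no training step beyond storing the data, which is why kNN is quick to set up but can be slow to predict on large datasets.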



Build Perceptron to Classify Iris Data with Python

It would be interesting to write a basic neuron function for classification, which helps refresh some essential points about neural networks. I used Sublime Text 3 and IPython3 as the IDE, and the code mostly comes from: https://www.goodreads.com/book/show/25545994-python-machine-learning?from_search=true

import pandas as pd
df=pd.read_csv('iris.data', header=None)
def rstr(df): return df.shape, df.apply(lambda x:[x.unique()])
print(df.tail())
print(rstr(df))
import matplotlib.pyplot as plt
import numpy […]
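As a compact reminder of how a single neuron learns, the sketch below implements the classic perceptron learning rule in plain Python: whenever a sample is misclassified, the weights move by `eta * (target - prediction) * x`. The function names, the learning rate, and the eight iris-like samples are assumptions for illustration, not the post's own code.

```python
def predict(x, weights, bias):
    """Unit-step activation: +1 if the weighted sum is non-negative, else -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0.0 else -1

def train_perceptron(samples, targets, eta=0.1, epochs=25):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, targets):
            # The update is zero when the prediction is already correct.
            update = eta * (target - predict(x, weights, bias))
            weights = [w + update * xi for w, xi in zip(weights, x)]
            bias += update
    return weights, bias

# Hypothetical, linearly separable toy data: (sepal length, petal length) in cm.
samples = [(5.1, 1.4), (4.9, 1.4), (4.7, 1.3), (5.0, 1.5),
           (7.0, 4.7), (6.4, 4.5), (6.9, 4.9), (5.5, 4.0)]
targets = [-1, -1, -1, -1, 1, 1, 1, 1]

weights, bias = train_perceptron(samples, targets)
print(weights, bias)
print([predict(x, weights, bias) for x in samples])
```

Because the two classes can be separated by a straight line, the updates stop once every training sample is classified correctly; on data that is not linearly separable the weights would keep changing.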



Using R’s ggplot within IPython Notebook

As a data scientist, I often use Python to write quick scripts to transform and massage data, but for data visualization I love using R’s ggplot. Although there is a version of ggplot written in Python, I found it lacking a lot of features compared to its R counterpart. Luckily, using IPython Notebook you can have […]
