Dr.Manish Kumar Jain: June 2019

Saturday 22 June 2019

Test link

Link for dataset and Questions

https://urlzs.com/oQu2f

Link for google form

https://forms.gle/vEaWL2XnCTtYU1Y1A

# Load libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation

col_names = ['pregnant', 'glucose', 'bp', 'skin', 'insulin', 'bmi', 'pedigree', 'age', 'label']
# load dataset
# header=0 assumes the first row of the CSV is a header row; that row is replaced by col_names
pima = pd.read_csv(r"C:\Users\Manish\Desktop\VNR CDC\Day 2\diabetes.csv", header=0, names=col_names)
pima.head()

#split dataset in features and target variable
feature_cols = ['pregnant', 'insulin', 'bmi', 'age','glucose','bp','pedigree']
X = pima[feature_cols] # Features
y = pima.label # Target variable

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1) # 70% training and 30% test

# Create Decision Tree classifier object
clf = DecisionTreeClassifier()

# Train Decision Tree classifier
clf = clf.fit(X_train,y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

from sklearn.tree import export_graphviz
from io import StringIO # sklearn.externals.six was removed from recent scikit-learn releases
from IPython.display import Image
import pydotplus
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols, class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('C:\\Users\\Manish\\Desktop\\VNR CDC\\Day 2\\diabetes.png')
Image(graph.create_png())

# Create Decision Tree classifier object, this time with the entropy criterion and a depth limit of 3
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3)

# Train Decision Tree classifier
clf = clf.fit(X_train,y_train)

#Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy, how often is the classifier correct?
print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

from io import StringIO # sklearn.externals.six was removed from recent scikit-learn releases
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols, class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('diabetes.png')
Image(graph.create_png())
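
The second tree above is built with criterion="entropy". As a rough illustration of what that criterion measures at a node, the sketch below computes the Shannon entropy of a list of class labels in plain Python. The function name entropy and the sample labels are assumptions made for this example; this is not scikit-learn's internal code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum of -p * log2(p) over every class that actually occurs
    return sum(-(n / total) * log2(n / total) for n in counts.values())

mixed = entropy(['man', 'woman', 'woman', 'man'])   # evenly split node
pure = entropy(['woman', 'woman', 'woman'])         # single-class node
print(mixed, pure)
```

An evenly split two-class node has the highest entropy and a single-class node has none, so a split that lowers entropy produces purer child nodes.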

DT1

from sklearn import tree
clf = tree.DecisionTreeClassifier()

#[height, hair-length, voice-pitch]
X = [ [180, 15,0],
[167, 42,1],
[136, 35,1],
[174, 15,0],
[141, 28,1]]

Y = ['man', 'woman', 'woman', 'man', 'woman']

clf = clf.fit(X, Y)
prediction = clf.predict([[133, 37,1]])
print(prediction)

TF OD

Python Packages

!pip install tensorflow numpy scipy pillow matplotlib h5py keras

!pip install opencv-python

!pip install https://github.com/OlafenwaMoses/ImageAI/releases/download/2.0.1/imageai-2.0.1-py3-none-any.whl

Check PWD

import os

os.getcwd()

Keep the trained model file and the input image in the present working directory (PWD)

from imageai.Detection import ObjectDetection

import os

execution_path = os.getcwd()

detector = ObjectDetection()

detector.setModelTypeAsRetinaNet()

detector.setModelPath( os.path.join(execution_path , "resnet50_coco_best_v2.0.1.h5"))

detector.loadModel()

detections = detector.detectObjectsFromImage(input_image=os.path.join(execution_path , "image.png"), output_image_path=os.path.join(execution_path , "image2new.jpg"), minimum_percentage_probability=30)

for eachObject in detections:

    print(eachObject["name"] , " : ", eachObject["percentage_probability"])

    print("--------------------------------")

Call the detectObjectsFromImage() function and pass in the path to our image and the path to the new image that the function will save. The function returns an array of dictionaries, one dictionary for each object detected in the image. Each dictionary has the properties name (name of the object), percentage_probability (percentage probability of the detection) and box_points (the x1, y1, x2 and y2 coordinates of the bounding box of the object).
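
To make the shape of that return value concrete, here is a small sketch that works on a hand-written list of dictionaries instead of a real detector call. The sample values and the helper name confident_objects are made up for illustration, and box_points is assumed to hold the four coordinates in the order x1, y1, x2, y2 as described above.

```python
# Hypothetical sample shaped like the list returned by detectObjectsFromImage();
# the values are invented for illustration only.
sample_detections = [
    {"name": "person", "percentage_probability": 91.5, "box_points": [10, 20, 110, 220]},
    {"name": "dog", "percentage_probability": 42.0, "box_points": [150, 80, 260, 200]},
    {"name": "car", "percentage_probability": 31.2, "box_points": [300, 40, 480, 160]},
]

def confident_objects(detections, threshold):
    """Return (name, box width, box height) for detections at or above threshold."""
    results = []
    for obj in detections:
        if obj["percentage_probability"] >= threshold:
            x1, y1, x2, y2 = obj["box_points"]
            results.append((obj["name"], x2 - x1, y2 - y1))
    return results

strong = confident_objects(sample_detections, 40)
for name, width, height in strong:
    print(name, width, height)
```

The same loop can be pointed at the detections variable from the script above once the model and image are in place.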

RetinaNet is appropriate for detection tasks that demand high performance and high accuracy.

ImageAI provides convenient and powerful methods to perform object detection on images and extract each object from the image. The object detection class supports RetinaNet, YOLOv3 and TinyYOLOv3. Before you start performing object detection, you must download the RetinaNet, YOLOv3 or TinyYOLOv3 object detection model.

Types of ModelPath

- RetinaNet (Size = 145 MB, high performance and accuracy, with longer detection time)

- YOLOv3 (Size = 237 MB, moderate performance and accuracy, with a moderate detection time)

- TinyYOLOv3 (Size = 34 MB, optimized for speed and moderate performance, with fast detection time)

Download Link

https://github.com/OlafenwaMoses/ImageAI/tree/master/imageai/Detection

ANN

import numpy as np
feature_set = np.array([[0,1,0],[0,0,1],[1,0,0],[1,1,0],[1,1,1]])
labels = np.array([[1,0,0,1,1]])
labels = labels.reshape(5,1)

#######################################

np.random.seed(42) #random.seed function so that we can get the same random values whenever the script is executed.
weights = np.random.rand(3,1)
bias = np.random.rand(1)
lr = 0.05

##########################################

def sigmoid(x): #activation function is the sigmoid function
    return 1/(1+np.exp(-x))

def sigmoid_der(x): #calculates the derivative of the sigmoid function
    return sigmoid(x)*(1-sigmoid(x))

##########################################

#Train our neural network so that it can predict whether a person is diabetic or not.

#An epoch is one pass of the training algorithm over the whole dataset.
#We will train for 20,000 epochs. The ultimate goal is to minimize the error.

for epoch in range(20000):
    inputs = feature_set

    #Here we find the dot product of the input and the weight vector and add the bias to it.

    # feedforward step1
    XW = np.dot(feature_set, weights) + bias

    #We pass the dot product through the sigmoid activation function

    #feedforward step2
    z = sigmoid(XW)

    #The variable z contains the predicted outputs. The first step of the backpropagation is to find the error.

    # backpropagation step 1
    error = z - labels

    print(error.sum())

    # backpropagation step 2
    dcost_dpred = error
    dpred_dz = sigmoid_der(z)

    #Here we have the z_delta variable, which contains the product of dcost_dpred and dpred_dz.
    #Instead of looping through each record and multiplying the input with the corresponding z_delta,
    #we take the transpose of the input feature matrix and multiply it with z_delta.
    #Finally, we multiply the derivative by the learning rate lr, which controls the size of each update step.

    z_delta = dcost_dpred * dpred_dz

    inputs = feature_set.T
    weights -= lr * np.dot(inputs, z_delta)

    for num in z_delta:
        bias -= lr * num

################################

#You can see that the error is extremely small at the end of training.
#At this point our weights and bias have values that can be used to detect whether a person is diabetic or not,
#based on their smoking habits, obesity, and exercise habits.

#TEST: suppose a patient comes in who smokes, is not obese, and doesn't exercise.
#Let's find out whether they are likely to be diabetic. The input feature vector will look like this: [1,0,0].

single_point = np.array([1,0,0])
result = sigmoid(np.dot(single_point, weights) + bias)
print(result)

#Let's test another person who doesn't smoke, is obese, and doesn't exercise. The input feature vector will be [0,1,0].

single_point = np.array([0,1,0])
result = sigmoid(np.dot(single_point, weights) + bias)
print(result)

########################################

#The value is very close to 1, which is likely due to the person's obesity.
#Multiply each result by 100 to express the predicted probability as a percentage,
#e.g. 0.99837029 x 100 = 99.837029 percent.

A=0.00707584 * 100
B=0.99837029 * 100
print(A)
print(B)
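
The update rule described in the training loop above can also be written out without NumPy. The following is a minimal sketch under a few assumptions: the weights start at zero instead of random values, and the sigmoid derivative is written as z * (1 - z) for the activated output z, whereas the loop above calls sigmoid_der(z). The printed numbers will therefore not match the values quoted above exactly. The names train and predict are illustrative only.

```python
import math

# Same toy data as above: columns are [smoker, obese, exercises], label 1 = diabetic
feature_set = [[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
labels = [1, 0, 0, 1, 1]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def train(features, targets, lr=0.05, epochs=20000):
    """Batch gradient descent for one sigmoid neuron with squared error."""
    weights = [0.0] * len(features[0])   # assumption: zero start instead of random
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, targets):
            z = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            # error times the sigmoid derivative, written as z * (1 - z)
            delta = (z - y) * z * (1 - z)
            for j, xi in enumerate(x):
                grad_w[j] += xi * delta
            grad_b += delta
        # Apply the accumulated gradients once per epoch
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

def predict(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

weights, bias = train(feature_set, labels)
p_smoker = predict([1, 0, 0], weights, bias)   # smokes, not obese, no exercise
p_obese = predict([0, 1, 0], weights, bias)    # non-smoker, obese, no exercise
print(p_smoker, p_obese)
```

In this toy data the label always equals the obesity column, so the trained neuron should rate the second person as far more likely to be diabetic than the first.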

Friday 21 June 2019

KNN

import numpy as np
import pylab as pl
from sklearn import neighbors, datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features. 
Y = iris.target


h = .02 # step size in the mesh

knn=neighbors.KNeighborsClassifier()

# we create an instance of Neighbours Classifier and fit the data.
knn.fit(X, Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:,0].min() - .5, X[:,0].max() + .5
y_min, y_max = X[:,1].min() - .5, X[:,1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])

# Put the result into a color plot
Z = Z.reshape(xx.shape)
pl.figure(1, figsize=(4, 3))
pl.set_cmap(pl.cm.Paired)
pl.pcolormesh(xx, yy, Z)

# Plot also the training points
pl.scatter(X[:,0], X[:,1],c=Y )
pl.xlabel('Sepal length')
pl.ylabel('Sepal width')

pl.xlim(xx.min(), xx.max())
pl.ylim(yy.min(), yy.max())
pl.xticks(())
pl.yticks(())

pl.show()
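
KNeighborsClassifier is used above with its defaults, which means each mesh point is labelled by a vote among its 5 nearest training points. The sketch below shows that voting rule in plain Python on a few made-up 2-D points (not the iris data). The function name knn_predict is an assumption for this example, and scikit-learn's own implementation is considerably more optimized.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=5):
    """Majority vote among the k training points closest to query."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_points)), key=lambda i: dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points forming two well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
classes = [0, 0, 0, 1, 1, 1]

near_first = knn_predict(points, classes, (1.1, 1.0), k=3)
near_second = knn_predict(points, classes, (4.0, 4.0), k=3)
print(near_first, near_second)
```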

Game of KNN

import tkinter

#...and for creating random numbers.

import random

#the list of possible colours.

colours = ['Red','Blue','Green','Pink','Black','Yellow','Orange','White','Purple','Brown']

#the player's score, initially 0.

score=0

#the game time left, initially 30 seconds.

timeleft=30

#a function that will start the game.

def startGame(event):

    #if the game has not started yet (the timer is still at 30)...

    if timeleft == 30:

        #start the countdown timer.

        countdown()

    #run the function to choose the next colour.

    nextColour()

#function to choose and display the next colour.

def nextColour():

    #use the globally declared 'score' and 'timeleft' variables above.

    global score

    global timeleft

    #if a game is currently in play...

    if timeleft > 0:

        #...make the text entry box active.

        e.focus_set()

        #if the colour typed is equal to the colour of the text...

        if e.get().lower() == colours[1].lower():

            #...add one to the score.

            score += 1

        #clear the text entry box.

        e.delete(0, tkinter.END)

        #shuffle the list of colours.

        random.shuffle(colours)

        #change the colour to type, by changing the text _and_ the colour to a random colour value

        label.config(fg=str(colours[1]), text=str(colours[0]))

        #update the score.

        scoreLabel.config(text="Score: " + str(score))

#a countdown timer function.

def countdown():

    #use the globally declared 'timeleft' variable above.

    global timeleft

    #if a game is in play...

    if timeleft > 0:

        #decrement the timer.

        timeleft -= 1

        #update the time left label.

        timeLabel.config(text="Time left: " + str(timeleft))

        #run the function again after 1 second.

        timeLabel.after(1000, countdown)

#create a GUI window.

root = tkinter.Tk()

#set the title.

root.title("TTCANTW")

#set the size.

root.geometry("375x200")

#add an instructions label.

instructions = tkinter.Label(root, text="Type in the colour of the words, and not the word text!", font=('Helvetica', 12))

instructions.pack()

#add a score label.

scoreLabel = tkinter.Label(root, text="Press enter to start", font=('Helvetica', 12))

scoreLabel.pack()

#add a time left label.

timeLabel = tkinter.Label(root, text="Time left: " + str(timeleft), font=('Helvetica', 12))

timeLabel.pack()

#add a label for displaying the colours.

label = tkinter.Label(root, font=('Helvetica', 60))

label.pack()

#add a text entry box for typing in colours.

e = tkinter.Entry(root)

#run the 'startGame' function when the enter key is pressed.

root.bind('<Return>', startGame)

e.pack()

#set focus on the entry box.

e.focus_set()

#start the GUI

root.mainloop()

Reg using GUI 1

from pandas import DataFrame
from sklearn import linear_model
import tkinter as tk
import statsmodels.api as sm

Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]
}

df = DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])

X = df[['Interest_Rate','Unemployment_Rate']] # here we have 2 input variables for multiple regression. If you just want to use one variable for simple linear regression, use X = df[['Interest_Rate']] (double brackets keep X two-dimensional, as sklearn expects). Alternatively, you may add additional variables within the brackets
Y = df['Stock_Index_Price'] # output variable (what we are trying to predict)

# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

# with statsmodels
X = sm.add_constant(X) # adding a constant

model = sm.OLS(Y, X).fit()
predictions = model.predict(X)

# tkinter GUI
root= tk.Tk()

canvas1 = tk.Canvas(root, width = 1200, height = 450)
canvas1.pack()

# with sklearn
Intercept_result = ('Intercept: ', regr.intercept_)
label_Intercept = tk.Label(root, text=Intercept_result, justify = 'center')
canvas1.create_window(260, 220, window=label_Intercept)

# with sklearn
Coefficients_result = ('Coefficients: ', regr.coef_)
label_Coefficients = tk.Label(root, text=Coefficients_result, justify = 'center')
canvas1.create_window(260, 240, window=label_Coefficients)

# with statsmodels
print_model = model.summary()
label_model = tk.Label(root, text=print_model, justify = 'center', relief = 'solid', bg='LightSkyBlue1')
canvas1.create_window(800, 220, window=label_model)

# New_Interest_Rate label and input box
label1 = tk.Label(root, text='Type Interest Rate: ')
canvas1.create_window(100, 100, window=label1)

entry1 = tk.Entry (root) # create 1st entry box
canvas1.create_window(270, 100, window=entry1)

# New_Unemployment_Rate label and input box
label2 = tk.Label(root, text=' Type Unemployment Rate: ')
canvas1.create_window(120, 120, window=label2)

entry2 = tk.Entry (root) # create 2nd entry box
canvas1.create_window(270, 120, window=entry2)

def values():
    global New_Interest_Rate #our 1st input variable
    New_Interest_Rate = float(entry1.get())

    global New_Unemployment_Rate #our 2nd input variable
    New_Unemployment_Rate = float(entry2.get())

    Prediction_result = ('Predicted Stock Index Price: ', regr.predict([[New_Interest_Rate ,New_Unemployment_Rate]]))
    label_Prediction = tk.Label(root, text= Prediction_result, bg='orange')
    canvas1.create_window(260, 280, window=label_Prediction)

button1 = tk.Button (root, text='Predict Stock Index Price',command=values, bg='orange') # button to call the 'values' command above
canvas1.create_window(270, 150, window=button1)

root.mainloop()
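
The comment in the script above mentions simple linear regression with a single input variable. As a rough sketch of what that fit computes, the block below applies the closed-form least-squares formulas to the Interest_Rate and Stock_Index_Price columns in plain Python. The helper name fit_line is an assumption for this example; it is not how LinearRegression is implemented internally.

```python
# Interest_Rate and Stock_Index_Price columns copied from the Stock_Market dictionary above
interest_rate = [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75]
stock_index_price = [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                     1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]

def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(interest_rate, stock_index_price)
residuals = [y - (intercept + slope * x) for x, y in zip(interest_rate, stock_index_price)]
print(slope, intercept)
```

With an intercept in the model the residuals sum to zero up to rounding, which is a quick sanity check on any fit of this kind.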

Reg using GUI

from pandas import DataFrame
from sklearn import linear_model
import tkinter as tk
import matplotlib.pyplot as plt
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg

Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]
}

df = DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price'])

X = df[['Interest_Rate','Unemployment_Rate']].astype(float) # here we have 2 input variables for multiple regression. If you just want to use one variable for simple linear regression, use X = df[['Interest_Rate']] (double brackets keep X two-dimensional, as sklearn expects). Alternatively, you may add additional variables within the brackets
Y = df['Stock_Index_Price'].astype(float) # output variable (what we are trying to predict)

# with sklearn
regr = linear_model.LinearRegression()
regr.fit(X, Y)

print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

# tkinter GUI
root= tk.Tk()

canvas1 = tk.Canvas(root, width = 500, height = 300)
canvas1.pack()

# with sklearn
Intercept_result = ('Intercept: ', regr.intercept_)
label_Intercept = tk.Label(root, text=Intercept_result, justify = 'center')
canvas1.create_window(260, 220, window=label_Intercept)

# with sklearn
Coefficients_result = ('Coefficients: ', regr.coef_)
label_Coefficients = tk.Label(root, text=Coefficients_result, justify = 'center')
canvas1.create_window(260, 240, window=label_Coefficients)

# New_Interest_Rate label and input box
label1 = tk.Label(root, text='Type Interest Rate: ')
canvas1.create_window(100, 100, window=label1)

entry1 = tk.Entry (root) # create 1st entry box
canvas1.create_window(270, 100, window=entry1)

# New_Unemployment_Rate label and input box
label2 = tk.Label(root, text=' Type Unemployment Rate: ')
canvas1.create_window(120, 120, window=label2)

entry2 = tk.Entry (root) # create 2nd entry box
canvas1.create_window(270, 120, window=entry2)

def values():
    global New_Interest_Rate #our 1st input variable
    New_Interest_Rate = float(entry1.get())

    global New_Unemployment_Rate #our 2nd input variable
    New_Unemployment_Rate = float(entry2.get())

    Prediction_result = ('Predicted Stock Index Price: ', regr.predict([[New_Interest_Rate ,New_Unemployment_Rate]]))
    label_Prediction = tk.Label(root, text= Prediction_result, bg='orange')
    canvas1.create_window(260, 280, window=label_Prediction)

button1 = tk.Button (root, text='Predict Stock Index Price',command=values, bg='orange') # button to call the 'values' command above
canvas1.create_window(270, 150, window=button1)

#plot 1st scatter
figure3 = plt.Figure(figsize=(5,4), dpi=100)
ax3 = figure3.add_subplot(111)
ax3.scatter(df['Interest_Rate'].astype(float),df['Stock_Index_Price'].astype(float), color = 'r', label='Stock Index Price') # the label gives legend() an entry to show
scatter3 = FigureCanvasTkAgg(figure3, root)
scatter3.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)
ax3.legend()
ax3.set_xlabel('Interest Rate')
ax3.set_title('Interest Rate Vs. Stock Index Price')

#plot 2nd scatter
figure4 = plt.Figure(figsize=(5,4), dpi=100)
ax4 = figure4.add_subplot(111)
ax4.scatter(df['Unemployment_Rate'].astype(float),df['Stock_Index_Price'].astype(float), color = 'g', label='Stock Index Price') # the label gives legend() an entry to show
scatter4 = FigureCanvasTkAgg(figure4, root)
scatter4.get_tk_widget().pack(side=tk.RIGHT, fill=tk.BOTH)
ax4.legend()
ax4.set_xlabel('Unemployment Rate')
ax4.set_title('Unemployment Rate Vs. Stock Index Price')

root.mainloop()

Thursday 20 June 2019

Projects AzureML

Word cloud

https://gallery.azure.ai/Experiment/40c882dcbb5345ada0a5a1cdd996d3f1

Human Resources Analytics - Why employees are leaving

https://gallery.azure.ai/Experiment/Human-Resources-Analytics-Why-employees-are-leaving

Model

https://gallery.azure.ai/Experiment/Predictive-Experiment-Mini-Twitter-sentiment-analysis-2

https://gallery.azure.ai/Experiment/Training-Experiment-Mini-Twitter-sentiment-analysis-1

Wednesday 19 June 2019

SQL1 -MLAzure

SELECT
    t3.userID AS userID,
    t3.placeID AS placeID,
    (2015 - t1.birth_year) AS age,
    weight,
    height,
    budget,
    color,
    activity,
    religion,
    personality,
    interest,
    hijos,
    marital_status,
    transport,
    ambience,
    dress_preference,
    drink_level,
    smoker,
    abs(t1.latitude - t2.latitude) AS latitude_diff,
    abs(t1.longitude - t2.longitude) AS longitude_diff,
    city,
    state,
    country,
    alcohol,
    smoking_area,
    dress_code,
    accessibility,
    price,
    Rambience,
    franchise,
    area,
    other_services,
    rating
FROM t1, t2, t3
WHERE t1.userID = t3.userID
  AND t2.placeID = t3.placeID;
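
The query above combines three inputs: t1 joined to t3 on userID and t2 joined to t3 on placeID. To try the same join conditions on a small scale, the sketch below uses Python's built-in sqlite3 module as a stand-in. The reduced table layouts and the inserted rows are hypothetical and only cover a few of the columns in the query; which real dataset each table corresponds to is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, heavily reduced versions of the three input tables
cur.execute("CREATE TABLE t1 (userID TEXT, birth_year INTEGER, latitude REAL)")  # assumed: users
cur.execute("CREATE TABLE t2 (placeID INTEGER, latitude REAL)")                  # assumed: places
cur.execute("CREATE TABLE t3 (userID TEXT, placeID INTEGER, rating INTEGER)")    # assumed: ratings

cur.execute("INSERT INTO t1 VALUES ('U1', 1990, 22.0)")
cur.executemany("INSERT INTO t2 VALUES (?, ?)", [(100, 22.5), (200, 21.0)])
cur.executemany("INSERT INTO t3 VALUES (?, ?, ?)", [("U1", 100, 2), ("U1", 200, 1)])

# Same join conditions as the query above, with a shortened column list
rows = cur.execute(
    """
    SELECT t3.userID, t3.placeID,
           (2015 - t1.birth_year) AS age,
           abs(t1.latitude - t2.latitude) AS latitude_diff,
           rating
    FROM t1, t2, t3
    WHERE t1.userID = t3.userID
      AND t2.placeID = t3.placeID
    ORDER BY t3.placeID
    """
).fetchall()
conn.close()
print(rows)
```

Each output row corresponds to one rating record, enriched with the user's age and the distance-like difference between the user's and the place's latitude.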
