TensorFlow and scikit-learn: same solution but different outputs
I'm implementing a simple linear regression with scikit-learn and TensorFlow.
My scikit-learn solution seems fine, but with TensorFlow the evaluation output shows some crazy numbers.
The problem is to predict a salary based on years of experience.
I'm not sure what I'm doing wrong in the TensorFlow code.
Thanks!
Scikit-learn solution
import pandas as pd
data = pd.read_csv('Salary_Data.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, 1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
X_single_data = [[4.6]]
y_single_pred = regressor.predict(X_single_data)
print(f'Train score: {regressor.score(X_train, y_train)}')
print(f'Test score: {regressor.score(X_test, y_test)}')
Train score: 0.960775692121653
Test score: 0.9248580247217076
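For comparison with the TensorFlow run below, the fitted line itself can be inspected (these are standard LinearRegression attributes; the exact values depend on the train/test split):

# slope and intercept of the fitted line: salary = coef * years + intercept
print(f'Coefficient: {regressor.coef_}')
print(f'Intercept: {regressor.intercept_}')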
TensorFlow solution
import tensorflow as tf
f_cols = [tf.feature_column.numeric_column(key='X', shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=f_cols)
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train, shuffle=False)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test, shuffle=False)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=test_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
({'average_loss': 7675087400.0,
  'label/mean': 84588.11,
  'loss': 69075790000.0,
  'prediction/mean': 5.0796494,
  'global_step': 6},
 )
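Note that prediction/mean is about 5 while label/mean is about 84588, and global_step is only 6, so it looks as if the model has barely trained. My guess (unverified) is that the estimator needs many more passes over the data; here is a sketch of what I'd try, with batch_size and steps picked arbitrarily:

# guess: repeat the data indefinitely (num_epochs=None) and train longer,
# so the LinearRegressor's default optimizer has a chance to converge
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'X': X_train}, y=y_train, batch_size=4, num_epochs=None, shuffle=True)
estimator.train(input_fn=train_input_fn, steps=5000)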
Data
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
machine-learning scikit-learn linear-regression tensorflow-estimator
asked Nov 22 '18 at 5:03 by gabrielpe
2 Answers
Per your code request in the comments: though I used my online curve- and surface-fitting web site zunzun.com for this equation (http://zunzun.com/Equation/2/Sigmoidal/Sigmoid%20B/) for the modeling work, here is a graphing source code example that uses scipy's differential_evolution genetic algorithm module to find initial parameter estimates. The scipy implementation of differential evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and it requires bounds within which to search. In this example those bounds are taken from the data's maximum and minimum values, and the fit statistics and parameter values are almost identical to those from the web site.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
yData = numpy.array([39.343, 46.205, 37.731, 43.525, 39.891, 56.642, 60.15, 54.445, 64.445, 57.189, 63.218, 55.794, 56.957, 57.081, 61.111, 67.938, 66.029, 83.088, 81.363, 93.94, 91.738, 98.273, 101.302, 113.812, 109.431, 105.582, 116.969, 112.635, 122.391, 121.872])


def func(x, a, b, c):  # Sigmoid B equation, salary in thousands
    return a / (1.0 + numpy.exp(-(x - b) / c))


# function for the genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by the genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # data min and max used for the search bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minY, maxY])  # search bounds for a
    parameterBounds.append([minX, maxX])  # search bounds for b
    parameterBounds.append([minX, maxX])  # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x


# generate initial parameter values with the genetic algorithm
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best-fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters)

absError = modelPredictions - yData
SE = numpy.square(absError)  # squared errors
MSE = numpy.mean(SE)  # mean squared errors
RMSE = numpy.sqrt(MSE)  # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth / 100.0, graphHeight / 100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('Years of experience')  # X axis data label
    axes.set_ylabel('Salary in thousands')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot


graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
answered Nov 22 '18 at 15:02, edited Nov 22 '18 at 15:14 by James Phillips
I cannot place an image in a comment, so I place it here. I suspected the relationship might be sigmoidal rather than linear, and found the following sigmoidal equation and fit statistics, using units of thousands for salary: y = a / (1.0 + exp(-(x-b)/c)) with fitted parameters a = 1.5535069418318591E+02, b = 5.4580059234664899E+00, and c = 3.7724942500630938E+00, giving R-squared = 0.96 and RMSE = 5.30 (thousand).
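Plugging the fitted parameters into the equation gives a quick salary predictor; a minimal sketch (the 4.6-year example input is borrowed from the question's scikit-learn code, and the output is in thousands):

import math

a = 1.5535069418318591E+02
b = 5.4580059234664899E+00
c = 3.7724942500630938E+00

def predict_salary_thousands(years):
    return a / (1.0 + math.exp(-(years - b) / c))

print(predict_salary_thousands(4.6))  # roughly 68.9, i.e. about $69k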
answered Nov 22 '18 at 12:46 by James Phillips

Thanks for your help. Do you mind posting your code here? I put my solution on GitHub; please check how I found a linear solution with scikit-learn: github.com/gabrielpsilva/ai-study-models/blob/master/… I'm still taking my first steps, learning by examples :) – gabrielpe, Nov 22 '18 at 13:29
I cannot format code in a comment, and so posted it as a second answer. – James Phillips, Nov 22 '18 at 15:09