TensorFlow and scikit-learn: same solution but different outputs

I'm implementing a simple linear regression in both scikit-learn and TensorFlow.

My scikit-learn solution seems fine, but with TensorFlow the evaluation output shows some wildly wrong numbers.

The problem is basically to predict a salary from years of experience.

I'm not sure what I'm doing wrong in the TensorFlow code.

Thanks!



scikit-learn solution



import pandas as pd

data = pd.read_csv('Salary_Data.csv')

# first column (YearsExperience) as the feature, second (Salary) as the target
X = data.iloc[:, :-1].values
y = data.iloc[:, 1].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# single prediction at 4.6 years of experience
X_single_data = [[4.6]]
y_single_pred = regressor.predict(X_single_data)

print(f'Train score: {regressor.score(X_train, y_train)}')
print(f'Test score: {regressor.score(X_test, y_test)}')



Train score: 0.960775692121653
Test score: 0.9248580247217076
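
For reference (my annotation, not in the original post): LinearRegression.score returns the coefficient of determination R², so the same statistic can be computed explicitly from the held-out predictions:

# equivalent manual computation of the test score above
# (r2_score is the same R^2 statistic that .score() reports)
from sklearn.metrics import r2_score
print(r2_score(y_test, y_pred))  # ~0.9249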




TensorFlow solution



import tensorflow as tf

# one numeric feature column, fed under the key 'X'
f_cols = [tf.feature_column.numeric_column(key='X', shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=f_cols)

# input functions over the numpy arrays from the scikit-learn section;
# note that no batch size, epoch count, or number of training steps is set here
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train, shuffle=False)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test, shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=test_input_fn)

tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)



({'average_loss': 7675087400.0,
  'label/mean': 84588.11,
  'loss': 69075790000.0,
  'prediction/mean': 5.0796494,
  'global_step': 6},
 )
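
A note on reading these numbers (my annotation, not part of the original post): for an Estimator regressor, average_loss is a mean squared error, so its square root is an error in salary units, directly comparable to label/mean:

# sanity check on the evaluation dict above
import math
print(math.sqrt(7675087400.0))  # ~87607, the same order as the salaries themselves,
                                # while prediction/mean is only ~5.08

Together with global_step = 6, this suggests the model has taken only a handful of optimizer steps and is essentially untrained at evaluation time.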




Data



YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00









machine-learning scikit-learn linear-regression tensorflow-estimator

asked Nov 22 '18 at 5:03 by gabrielpe

2 Answers

Per your code request in the comments: though I used my online curve and surface fitting web site zunzun.com for the modeling work, with this equation at http://zunzun.com/Equation/2/Sigmoidal/Sigmoid%20B/, here is a graphing source code example that uses scipy's differential_evolution genetic algorithm module to supply initial parameter estimates. The scipy implementation of Differential Evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and that search requires bounds; in this example the bounds are taken from the data's maximum and minimum values. The fit statistics and parameter values are almost identical to those from the web site.



import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings

xData = numpy.array([1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
yData = numpy.array([39.343, 46.205, 37.731, 43.525, 39.891, 56.642, 60.15, 54.445, 64.445, 57.189, 63.218, 55.794, 56.957, 57.081, 61.111, 67.938, 66.029, 83.088, 81.363, 93.94, 91.738, 98.273, 101.302, 113.812, 109.431, 105.582, 116.969, 112.635, 122.391, 121.872])


# sigmoid model: y = a / (1 + exp(-(x - b) / c))
def func(x, a, b, c):
    return a / (1.0 + numpy.exp(-(x - b) / c))


# function for the genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)


def generate_Initial_Parameters():
    # data min and max used for the search bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minY, maxY])  # search bounds for a
    parameterBounds.append([minX, maxX])  # search bounds for b
    parameterBounds.append([minX, maxX])  # search bounds for c

    # "seed" the random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x


# differential_evolution by default polishes its best solution with a
# local minimizer, still within the parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best-fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()

modelPredictions = func(xData, *fittedParameters)

absError = modelPredictions - yData

SE = numpy.square(absError)  # squared errors
MSE = numpy.mean(SE)  # mean squared errors
RMSE = numpy.sqrt(MSE)  # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth / 100.0, graphHeight / 100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('Years of experience')  # X axis data label
    axes.set_ylabel('Salary in thousands')  # Y axis data label

    plt.show()
    plt.close('all')  # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
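
A brief note on the design (my addition): seeding curve_fit with the winner of a global differential_evolution search is a common two-stage strategy for nonlinear fits like this sigmoid, since plain least squares can stall in a poor local minimum when started from a bad initial guess.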





answered Nov 22 '18 at 15:02 by James Phillips (edited Nov 22 '18 at 15:14)




I cannot place an image in a comment, so I place it here. I suspected the relationship might be sigmoidal rather than linear, and found the following sigmoidal equation and fit statistics, using units of thousands for salary: "y = a / (1.0 + exp(-(x-b)/c))" with fitted parameters a = 1.5535069418318591E+02, b = 5.4580059234664899E+00, and c = 3.7724942500630938E+00, giving R-squared = 0.96 and RMSE = 5.30 (thousand).

[figure: sigmoidal model curve plotted through the salary data]
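
To make the quoted equation concrete, here is a minimal evaluation sketch (my illustration: the parameters are the ones quoted above, and the 4.6-years input is borrowed from the question's scikit-learn example):

import math

# fitted parameters quoted above (salary in units of thousands)
a = 1.5535069418318591E+02
b = 5.4580059234664899E+00
c = 3.7724942500630938E+00

def predicted_salary_thousands(years):
    # y = a / (1 + exp(-(x - b) / c))
    return a / (1.0 + math.exp(-(years - b) / c))

print(predicted_salary_thousands(4.6))  # ~68.9, i.e. roughly $68,900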






answered Nov 22 '18 at 12:46 by James Phillips

• Thanks for your help. Do you mind posting your code here? I put my solution on GitHub; please check how I could find a linear solution with scikit-learn: github.com/gabrielpsilva/ai-study-models/blob/master/… I'm still on my first steps, learning by examples :)
  – gabrielpe, Nov 22 '18 at 13:29

• I cannot format code in a comment, and so posted it as a second answer.
  – James Phillips, Nov 22 '18 at 15:09

















