OpenCL kernel to calculate pi does not return the correct value

Good day,

I have an OpenCL kernel that uses the Leibniz formula to calculate pi. My issue is that the value I get back isn't pi, but just 4.

    __kernel void calculatePi(int numIterations, __global float *outputPi,
                              __local float* local_result, int numWorkers)
    {
        __private const uint gid = get_global_id(0);
        __private const uint lid = get_local_id(0);
        __private const uint offset = numIterations*gid*2;
        __private float sum = 0.0f;

        // Have the first worker initialize local_result
        if (gid == 0)
        {
            for (int i = 0; i < numWorkers; i++)
            {
                local_result[i] = 0.0f;
            }
        }

        // Have all workers wait until this is completed
        barrier(CLK_GLOBAL_MEM_FENCE);

        // Have each worker calculate their portion of pi
        // This is a private value
        for (int i=0; i<numIterations; i++)
        {
            if (i % 2 == 0)
            {
                sum += 1 / (1 + 2*i + offset);
            }
            else
            {
                sum -= 1 / (1 + 2*i + offset);
            }
        }

        // Have each worker move their value to the appropriate
        // local_result slot so that the first worker can see it
        // when reducing next
        local_result[gid] = sum;

        // Make sure all workers complete this task before continuing
        barrier(CLK_LOCAL_MEM_FENCE);

        // Have the first worker add up all of the other worker's values
        // to get the final value
        if (lid == 0)
        {
            outputPi[0] = 0;
            for (int i = 0; i < numWorkers; i++)
            {
                outputPi[0] += local_result[i];
            }

            outputPi[0] *= 4;
        }
    }

I've routed each of my inputs to the output to verify that they are what I expect: numIterations is 16 and numWorkers is also 16.

For the first worker, I would expect sum to be
1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 - 1/15 + 1/17 - 1/19 + 1/21 - 1/23 + 1/25 - 1/27 + 1/29 - 1/31

Evaluating the first 16 terms with this calculator, I expect the result to be around 3.2: https://scratch.mit.edu/projects/19546118/
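
As a cross-check on that expectation, here is a minimal host-side C sketch that evaluates the same 16 terms directly. It is not part of the original kernel, and the function name `leibniz_pi` is made up for illustration. Summing the terms by hand gives roughly 3.08 after multiplying by 4, a little below the 3.2 estimate quoted above. Either way, the result should be clearly different from 4.

```c
/* Hypothetical reference helper (not from the original post):
   returns 4 * (1 - 1/3 + 1/5 - ...) over the first `terms` terms. */
float leibniz_pi(int terms)
{
    float sum = 0.0f;
    for (int k = 0; k < terms; k++) {
        /* Floating-point division: both operands are floats. */
        float term = 1.0f / (2.0f * (float)k + 1.0f);
        sum += (k % 2 == 0) ? term : -term;
    }
    return 4.0f * sum;
}
```

Calling `leibniz_pi(16)` from a small `main` and printing the result gives a reference value to compare against what the first worker reports.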



If I change the last part of the kernel to the following, so that I can inspect a single worker's computed value of sum:

    // Debug: have the first worker write its own sum to the output
    if (lid == 0)
    {
        outputPi[0] = sum * 4;
    }

then the value returned for the first worker is 4 instead of the expected 3.2.

If I change the condition from lid == 0 to any other worker, that worker reports its sum as 0. So my question is: why are these the calculated values? Am I doing something wrong with my sum variable? As I understand it, sum is a private variable and the for loop runs sequentially within each worker, while the loops of the different workers execute in parallel.

Here's a link to my GitHub repository, which has the kernel and main code uploaded:

https://github.com/TreverWagenhals/TreverWagenhals/tree/master/School/Heterogeneous%20Computing/Lab2

Thanks

  • I would expect offset = numIterations / (gid*2); or am I wrong?
    – OznOg
    Nov 20 at 18:04

  • No, I don't think so. The first term for the next worker (gid == 1) should be 1/33 (I showed the first 16 terms above), so its offset is 16*1*2 = 32. For the first iteration of the for loop, the calculation is then + 1/(1 + 2(0) + 32) = + 1/33, which is what is expected. Just for sanity, I tried what you suggested too, and it results in a floating point exception (16/(15*2) = 16/30 is no longer an integer, so it makes sense that this isn't correct).
    – Trever Wagenhals
    Nov 20 at 18:08


















c for-loop kernel opencl pi






edited Nov 20 at 18:00 by OznOg

asked Nov 20 at 17:15 by Trever Wagenhals












1 Answer

You are performing integer divisions in your code; the operands should be floating-point:

    if (i % 2 == 0)
    {
        sum += 1. / (1 + 2*i + offset); // notice the 1.
    }
    else
    {
        sum -= 1. / (1 + 2*i + offset);
    }





answered Nov 20 at 20:02 by OznOg





























