Reduce total sum of vector elements in R












4














in R, I have a vector of integers. From this vector, I would like to reduce the value of each integer element randomly, in order to obtain a sum of the vector that is a percentage of the initial sum.



In this example, I would like to reduce the vector "x" to a vector "y", where each element has been randomly reduced to obtain a sum of the elements equal to 50% of the initial sum.



The resulting vector should have values that are non-negative and below the original value.



set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01)
x # 2 2 3 5 2 5 5 4 4 1

y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)


Can you think of a way to do it? Thanks!










share|improve this question
























  • Maybe I'm confused, but I don't quite follow why based on your description you can't just do 0.5 * x? Is there some other criteria that you haven't mentioned?
    – joran
    Nov 20 at 17:00










  • Can we assume that the vector is long? With at least, say, 200 elements?
    – Julius Vainora
    Nov 20 at 17:00












  • @JuliusVainora yes, the vector can be arbitrarily long
    – Federico Giorgi
    Nov 20 at 17:04










  • @joran sorry, I clarified it: the vector reduction should not be proportional to each element, but random
    – Federico Giorgi
    Nov 20 at 17:05






  • 1




    Am I right to assume you'd like to avoid computationally inefficient solutions, e.g., sample random integers until their sum is sum(x)/2 and then repeatedly randomly subtract them from x until you get a vector with no negative values?
    – Milan Valášek
    Nov 20 at 17:15
















4














in R, I have a vector of integers. From this vector, I would like to reduce the value of each integer element randomly, in order to obtain a sum of the vector that is a percentage of the initial sum.



In this example, I would like to reduce the vector "x" to a vector "y", where each element has been randomly reduced to obtain a sum of the elements equal to 50% of the initial sum.



The resulting vector should have values that are non-negative and below the original value.



set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01)
x # 2 2 3 5 2 5 5 4 4 1

y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)


Can you think of a way to do it? Thanks!










share|improve this question
























  • Maybe I'm confused, but I don't quite follow why based on your description you can't just do 0.5 * x? Is there some other criteria that you haven't mentioned?
    – joran
    Nov 20 at 17:00










  • Can we assume that the vector is long? With at least, say, 200 elements?
    – Julius Vainora
    Nov 20 at 17:00












  • @JuliusVainora yes, the vector can be arbitrarily long
    – Federico Giorgi
    Nov 20 at 17:04










  • @joran sorry, I clarified it: the vector reduction should not be proportional to each element, but random
    – Federico Giorgi
    Nov 20 at 17:05






  • 1




    Am I right to assume you'd like to avoid computationally inefficient solutions, e.g., sample random integers until their sum is sum(x)/2 and then repeatedly randomly subtract them from x until you get a vector with no negative values?
    – Milan Valášek
    Nov 20 at 17:15














4












4








4


1





in R, I have a vector of integers. From this vector, I would like to reduce the value of each integer element randomly, in order to obtain a sum of the vector that is a percentage of the initial sum.



In this example, I would like to reduce the vector "x" to a vector "y", where each element has been randomly reduced to obtain a sum of the elements equal to 50% of the initial sum.



The resulting vector should have values that are non-negative and below the original value.



set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01)
x # 2 2 3 5 2 5 5 4 4 1

y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)


Can you think of a way to do it? Thanks!










share|improve this question















in R, I have a vector of integers. From this vector, I would like to reduce the value of each integer element randomly, in order to obtain a sum of the vector that is a percentage of the initial sum.



In this example, I would like to reduce the vector "x" to a vector "y", where each element has been randomly reduced to obtain a sum of the elements equal to 50% of the initial sum.



The resulting vector should have values that are non-negative and below the original value.



set.seed(1)
perc<-50
x<-sample(1:5,10,replace=TRUE)
xsum<-sum(x) # sum is 33
toremove<-floor(xsum*perc*0.01)
x # 2 2 3 5 2 5 5 4 4 1

y<-magicfunction(x,perc)
y # 0 2 1 4 0 3 2 1 2 1
sum(y) # sum is 16 (rounded half of 33)


Can you think of a way to do it? Thanks!







r






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 20 at 17:44

























asked Nov 20 at 16:55









Federico Giorgi

5,20463246




5,20463246












  • Maybe I'm confused, but I don't quite follow why based on your description you can't just do 0.5 * x? Is there some other criteria that you haven't mentioned?
    – joran
    Nov 20 at 17:00










  • Can we assume that the vector is long? With at least, say, 200 elements?
    – Julius Vainora
    Nov 20 at 17:00












  • @JuliusVainora yes, the vector can be arbitrarily long
    – Federico Giorgi
    Nov 20 at 17:04










  • @joran sorry, I clarified it: the vector reduction should not be proportional to each element, but random
    – Federico Giorgi
    Nov 20 at 17:05






  • 1




    Am I right to assume you'd like to avoid computationally inefficient solutions, e.g., sample random integers until their sum is sum(x)/2 and then repeatedly randomly subtract them from x until you get a vector with no negative values?
    – Milan Valášek
    Nov 20 at 17:15


















  • Maybe I'm confused, but I don't quite follow why based on your description you can't just do 0.5 * x? Is there some other criteria that you haven't mentioned?
    – joran
    Nov 20 at 17:00










  • Can we assume that the vector is long? With at least, say, 200 elements?
    – Julius Vainora
    Nov 20 at 17:00












  • @JuliusVainora yes, the vector can be arbitrarily long
    – Federico Giorgi
    Nov 20 at 17:04










  • @joran sorry, I clarified it: the vector reduction should not be proportional to each element, but random
    – Federico Giorgi
    Nov 20 at 17:05






  • 1




    Am I right to assume you'd like to avoid computationally inefficient solutions, e.g., sample random integers until their sum is sum(x)/2 and then repeatedly randomly subtract them from x until you get a vector with no negative values?
    – Milan Valášek
    Nov 20 at 17:15
















Maybe I'm confused, but I don't quite follow why based on your description you can't just do 0.5 * x? Is there some other criteria that you haven't mentioned?
– joran
Nov 20 at 17:00




Maybe I'm confused, but I don't quite follow why based on your description you can't just do 0.5 * x? Is there some other criteria that you haven't mentioned?
– joran
Nov 20 at 17:00












Can we assume that the vector is long? With at least, say, 200 elements?
– Julius Vainora
Nov 20 at 17:00






Can we assume that the vector is long? With at least, say, 200 elements?
– Julius Vainora
Nov 20 at 17:00














@JuliusVainora yes, the vector can be arbitrarily long
– Federico Giorgi
Nov 20 at 17:04




@JuliusVainora yes, the vector can be arbitrarily long
– Federico Giorgi
Nov 20 at 17:04












@joran sorry, I clarified it: the vector reduction should not be proportional to each element, but random
– Federico Giorgi
Nov 20 at 17:05




@joran sorry, I clarified it: the vector reduction should not be proportional to each element, but random
– Federico Giorgi
Nov 20 at 17:05




1




1




Am I right to assume you'd like to avoid computationally inefficient solutions, e.g., sample random integers until their sum is sum(x)/2 and then repeatedly randomly subtract them from x until you get a vector with no negative values?
– Milan Valášek
Nov 20 at 17:15




Am I right to assume you'd like to avoid computationally inefficient solutions, e.g., sample random integers until their sum is sum(x)/2 and then repeatedly randomly subtract them from x until you get a vector with no negative values?
– Milan Valášek
Nov 20 at 17:15












3 Answers
3






active

oldest

votes


















5














Assuming that x is long enough, we may rely on some appropriate law of large numbers (also assuming that x is regular enough in certain other ways). For that purpose we will generate values of another random variable Z taking values in [0,1] and with mean perc.



set.seed(1)
perc <- 50 / 100
x <- sample(1:10000, 1000)
sum(x)
# [1] 5014161
x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
sum(x)
# [1] 2550901
sum(x) * 2
# [1] 5101802
sum(x) * 2 / 5014161
# [1] 1.017479 # One percent deviation


Here for Z I chose a certain beta distribution giving mean perc, but you could pick some other too. The lower the variance, the more precise the result. For instance, the following is much better as the previously chosen beta distribution is, in fact, bimodal:



set.seed(1)
perc <- 50 / 100
x <- sample(1:1000, 100)
sum(x)
# [1] 49921
x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
sum(x)
# [1] 24851
sum(x) * 2
# [1] 49702
sum(x) * 2 / 49921
# [1] 0.9956131 # Less than 0.5% deviation!





share|improve this answer























  • I am totally and thoroughly impressed by this solution!
    – Federico Giorgi
    Nov 20 at 17:19






  • 1




    @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
    – Julius Vainora
    Nov 20 at 17:29






  • 1




    As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
    – mickey
    Nov 20 at 17:31






  • 1




    Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
    – Julius Vainora
    Nov 20 at 17:32








  • 1




    @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
    – Julius Vainora
    Nov 20 at 17:55





















3














An alternative solution is this function, which downsamples the original vector by a random fraction proportional to the vector element size. Then it checks that elements don't fall below zero, and iteratively approaches an optimal solution.



removereads<-function(x,perc=NULL){
xsum<-sum(x)
toremove<-floor(xsum*perc)
toremove2<-toremove
irem<-1
while(toremove2>(toremove*0.01)){
message("Downsampling iteration ",irem)
tmp<-sample(1:length(x),toremove2,prob=x,replace=TRUE)
tmp2<-table(tmp)
y<-x
common<-as.numeric(names(tmp2))
y[common]<-x[common]-tmp2
y[y<0]<-0
toremove2<-toremove-(xsum-sum(y))
irem<-irem+1
}
return(y)
}
set.seed(1)
x<-sample(1:1000,10000,replace=TRUE)
perc<-0.9
y<-removereads(x,perc)
plot(x,y,xlab="Before reduction",ylab="After reduction")
abline(0,1)


And the graphical results:
Downsampling R vector






share|improve this answer





























    1














    Here's a solution which uses draws from the Dirichlet distribution:



    set.seed(1)
    x = sample(10000, 1000, replace = TRUE)

    magic = function(x, perc, alpha = 1){
    # sample from the Dirichlet distribution
    # sum(p) == 1
    # lower values should reduce by less than larger values
    # larger alpha means the result will have more "randomness"
    p = rgamma(length(x), x / alpha, 1)
    p = p / sum(p)

    # scale p up an amount so we can subtract it from x
    # and get close to the desired sum
    reduce = round(p * (sum(x) - sum(round(x * perc))))
    y = x - reduce

    # No negatives
    y = c(ifelse(y < 0, 0, y))

    return (y)
    }

    alpha = 500
    perc = 0.7
    target = sum(round(perc * x))
    y = magic(x, perc, alpha)

    # Hopefully close to 1
    sum(y) / target
    > 1.000048

    # Measure of the "randomness"
    sd(y / x)
    > 0.1376637


    Basically, it tries to figure out how much to reduce each element by while still getting close to the sum you want. You can control how "random" you want the new vector by increasing alpha.






    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53397861%2freduce-total-sum-of-vector-elements-in-r%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      3 Answers
      3






      active

      oldest

      votes








      3 Answers
      3






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      5














      Assuming that x is long enough, we may rely on some appropriate law of large numbers (also assuming that x is regular enough in certain other ways). For that purpose we will generate values of another random variable Z taking values in [0,1] and with mean perc.



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:10000, 1000)
      sum(x)
      # [1] 5014161
      x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
      sum(x)
      # [1] 2550901
      sum(x) * 2
      # [1] 5101802
      sum(x) * 2 / 5014161
      # [1] 1.017479 # One percent deviation


      Here for Z I chose a certain beta distribution giving mean perc, but you could pick some other too. The lower the variance, the more precise the result. For instance, the following is much better as the previously chosen beta distribution is, in fact, bimodal:



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:1000, 100)
      sum(x)
      # [1] 49921
      x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
      sum(x)
      # [1] 24851
      sum(x) * 2
      # [1] 49702
      sum(x) * 2 / 49921
      # [1] 0.9956131 # Less than 0.5% deviation!





      share|improve this answer























      • I am totally and thoroughly impressed by this solution!
        – Federico Giorgi
        Nov 20 at 17:19






      • 1




        @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
        – Julius Vainora
        Nov 20 at 17:29






      • 1




        As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
        – mickey
        Nov 20 at 17:31






      • 1




        Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
        – Julius Vainora
        Nov 20 at 17:32








      • 1




        @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
        – Julius Vainora
        Nov 20 at 17:55


















      5














      Assuming that x is long enough, we may rely on some appropriate law of large numbers (also assuming that x is regular enough in certain other ways). For that purpose we will generate values of another random variable Z taking values in [0,1] and with mean perc.



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:10000, 1000)
      sum(x)
      # [1] 5014161
      x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
      sum(x)
      # [1] 2550901
      sum(x) * 2
      # [1] 5101802
      sum(x) * 2 / 5014161
      # [1] 1.017479 # One percent deviation


      Here for Z I chose a certain beta distribution giving mean perc, but you could pick some other too. The lower the variance, the more precise the result. For instance, the following is much better as the previously chosen beta distribution is, in fact, bimodal:



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:1000, 100)
      sum(x)
      # [1] 49921
      x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
      sum(x)
      # [1] 24851
      sum(x) * 2
      # [1] 49702
      sum(x) * 2 / 49921
      # [1] 0.9956131 # Less than 0.5% deviation!





      share|improve this answer























      • I am totally and thoroughly impressed by this solution!
        – Federico Giorgi
        Nov 20 at 17:19






      • 1




        @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
        – Julius Vainora
        Nov 20 at 17:29






      • 1




        As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
        – mickey
        Nov 20 at 17:31






      • 1




        Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
        – Julius Vainora
        Nov 20 at 17:32








      • 1




        @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
        – Julius Vainora
        Nov 20 at 17:55
















      5












      5








      5






      Assuming that x is long enough, we may rely on some appropriate law of large numbers (also assuming that x is regular enough in certain other ways). For that purpose we will generate values of another random variable Z taking values in [0,1] and with mean perc.



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:10000, 1000)
      sum(x)
      # [1] 5014161
      x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
      sum(x)
      # [1] 2550901
      sum(x) * 2
      # [1] 5101802
      sum(x) * 2 / 5014161
      # [1] 1.017479 # One percent deviation


      Here for Z I chose a certain beta distribution giving mean perc, but you could pick some other too. The lower the variance, the more precise the result. For instance, the following is much better as the previously chosen beta distribution is, in fact, bimodal:



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:1000, 100)
      sum(x)
      # [1] 49921
      x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
      sum(x)
      # [1] 24851
      sum(x) * 2
      # [1] 49702
      sum(x) * 2 / 49921
      # [1] 0.9956131 # Less than 0.5% deviation!





      share|improve this answer














      Assuming that x is long enough, we may rely on some appropriate law of large numbers (also assuming that x is regular enough in certain other ways). For that purpose we will generate values of another random variable Z taking values in [0,1] and with mean perc.



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:10000, 1000)
      sum(x)
      # [1] 5014161
      x <- round(x * rbeta(length(x), perc / 3 / (1 - perc), 1 / 3))
      sum(x)
      # [1] 2550901
      sum(x) * 2
      # [1] 5101802
      sum(x) * 2 / 5014161
      # [1] 1.017479 # One percent deviation


      Here for Z I chose a certain beta distribution giving mean perc, but you could pick some other too. The lower the variance, the more precise the result. For instance, the following is much better as the previously chosen beta distribution is, in fact, bimodal:



      set.seed(1)
      perc <- 50 / 100
      x <- sample(1:1000, 100)
      sum(x)
      # [1] 49921
      x <- round(x * rbeta(length(x), 100 * perc / (1 - perc), 100))
      sum(x)
      # [1] 24851
      sum(x) * 2
      # [1] 49702
      sum(x) * 2 / 49921
      # [1] 0.9956131 # Less than 0.5% deviation!






      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Nov 20 at 17:27

























      answered Nov 20 at 16:58









      Julius Vainora

      31.8k75978




      31.8k75978












      • I am totally and thoroughly impressed by this solution!
        – Federico Giorgi
        Nov 20 at 17:19






      • 1




        @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
        – Julius Vainora
        Nov 20 at 17:29






      • 1




        As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
        – mickey
        Nov 20 at 17:31






      • 1




        Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
        – Julius Vainora
        Nov 20 at 17:32








      • 1




        @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
        – Julius Vainora
        Nov 20 at 17:55




















      • I am totally and thoroughly impressed by this solution!
        – Federico Giorgi
        Nov 20 at 17:19






      • 1




        @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
        – Julius Vainora
        Nov 20 at 17:29






      • 1




        As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
        – mickey
        Nov 20 at 17:31






      • 1




        Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
        – Julius Vainora
        Nov 20 at 17:32








      • 1




        @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
        – Julius Vainora
        Nov 20 at 17:55


















      I am totally and thoroughly impressed by this solution!
      – Federico Giorgi
      Nov 20 at 17:19




      I am totally and thoroughly impressed by this solution!
      – Federico Giorgi
      Nov 20 at 17:19




      1




      1




      @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
      – Julius Vainora
      Nov 20 at 17:29




      @FedericoGiorgi, the choice of this Z is really important, as I'm demonstrating in the answer. If you care about the error, you may pick some distribution that is very concentrated around perc, perhaps taking values only in some interval [perc - epsilon, perc + epsilon]. That said, depending on the details of your problem, the solution can be improved.
      – Julius Vainora
      Nov 20 at 17:29




      1




      1




      As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
      – mickey
      Nov 20 at 17:31




      As you lower the variance on the beta distribution, you are essentially doing x <- round(perc * x).
      – mickey
      Nov 20 at 17:31




      1




      1




      Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
      – Julius Vainora
      Nov 20 at 17:32






      Yes, in the limit. The question is how much of this randomness (variance) is needed for the actual problem. However, you won't be able to achieve super low variance with beta while maintaining the desired mean. So, some other distribution would be needed then.
      – Julius Vainora
      Nov 20 at 17:32






      1




      1




      @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
      – Julius Vainora
      Nov 20 at 17:55






      @FedericoGiorgi, among other properties, symmetry, lower variance, and narrower interval of possible values help precision, while higher variance and wider interval would increase "randomness". So, you may experiment with those parameters. Currently your problem description doesn't provide any information of what any of those should be, and that's not really a programming question anymore anyway.
      – Julius Vainora
      Nov 20 at 17:55















      3














      An alternative solution is this function, which downsamples the original vector by a random fraction proportional to the vector element size. Then it checks that elements don't fall below zero, and iteratively approaches an optimal solution.



      removereads<-function(x,perc=NULL){
      xsum<-sum(x)
      toremove<-floor(xsum*perc)
      toremove2<-toremove
      irem<-1
      while(toremove2>(toremove*0.01)){
      message("Downsampling iteration ",irem)
      tmp<-sample(1:length(x),toremove2,prob=x,replace=TRUE)
      tmp2<-table(tmp)
      y<-x
      common<-as.numeric(names(tmp2))
      y[common]<-x[common]-tmp2
      y[y<0]<-0
      toremove2<-toremove-(xsum-sum(y))
      irem<-irem+1
      }
      return(y)
      }
      set.seed(1)
      x<-sample(1:1000,10000,replace=TRUE)
      perc<-0.9
      y<-removereads(x,perc)
      plot(x,y,xlab="Before reduction",ylab="After reduction")
      abline(0,1)


      And the graphical results:
      Downsampling R vector






      share|improve this answer


























        3














        An alternative solution is this function, which downsamples the original vector by a random fraction proportional to the vector element size. Then it checks that elements don't fall below zero, and iteratively approaches an optimal solution.



        removereads<-function(x,perc=NULL){
        xsum<-sum(x)
        toremove<-floor(xsum*perc)
        toremove2<-toremove
        irem<-1
        while(toremove2>(toremove*0.01)){
        message("Downsampling iteration ",irem)
        tmp<-sample(1:length(x),toremove2,prob=x,replace=TRUE)
        tmp2<-table(tmp)
        y<-x
        common<-as.numeric(names(tmp2))
        y[common]<-x[common]-tmp2
        y[y<0]<-0
        toremove2<-toremove-(xsum-sum(y))
        irem<-irem+1
        }
        return(y)
        }
        set.seed(1)
        x<-sample(1:1000,10000,replace=TRUE)
        perc<-0.9
        y<-removereads(x,perc)
        plot(x,y,xlab="Before reduction",ylab="After reduction")
        abline(0,1)


        And the graphical results:
        Downsampling R vector






        share|improve this answer
























          3












          3








          3






          An alternative solution is this function, which downsamples the original vector by a random fraction proportional to the vector element size. Then it checks that elements don't fall below zero, and iteratively approaches an optimal solution.



          removereads<-function(x,perc=NULL){
          xsum<-sum(x)
          toremove<-floor(xsum*perc)
          toremove2<-toremove
          irem<-1
          while(toremove2>(toremove*0.01)){
          message("Downsampling iteration ",irem)
          tmp<-sample(1:length(x),toremove2,prob=x,replace=TRUE)
          tmp2<-table(tmp)
          y<-x
          common<-as.numeric(names(tmp2))
          y[common]<-x[common]-tmp2
          y[y<0]<-0
          toremove2<-toremove-(xsum-sum(y))
          irem<-irem+1
          }
          return(y)
          }
          set.seed(1)
          x<-sample(1:1000,10000,replace=TRUE)
          perc<-0.9
          y<-removereads(x,perc)
          plot(x,y,xlab="Before reduction",ylab="After reduction")
          abline(0,1)


          And the graphical results:
          Downsampling R vector






          share|improve this answer












          An alternative solution is this function, which downsamples the original vector by a random fraction proportional to the vector element size. Then it checks that elements don't fall below zero, and iteratively approaches an optimal solution.



          removereads<-function(x,perc=NULL){
          xsum<-sum(x)
          toremove<-floor(xsum*perc)
          toremove2<-toremove
          irem<-1
          while(toremove2>(toremove*0.01)){
          message("Downsampling iteration ",irem)
          tmp<-sample(1:length(x),toremove2,prob=x,replace=TRUE)
          tmp2<-table(tmp)
          y<-x
          common<-as.numeric(names(tmp2))
          y[common]<-x[common]-tmp2
          y[y<0]<-0
          toremove2<-toremove-(xsum-sum(y))
          irem<-irem+1
          }
          return(y)
          }
          set.seed(1)
          x<-sample(1:1000,10000,replace=TRUE)
          perc<-0.9
          y<-removereads(x,perc)
          plot(x,y,xlab="Before reduction",ylab="After reduction")
          abline(0,1)


          And the graphical results:
          Downsampling R vector







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 21 at 1:50









          Federico Giorgi

          5,20463246




          5,20463246























              1














              Here's a solution which uses draws from the Dirichlet distribution:



              set.seed(1)
              x = sample(10000, 1000, replace = TRUE)

              magic = function(x, perc, alpha = 1){
              # sample from the Dirichlet distribution
              # sum(p) == 1
              # lower values should reduce by less than larger values
              # larger alpha means the result will have more "randomness"
              p = rgamma(length(x), x / alpha, 1)
              p = p / sum(p)

              # scale p up an amount so we can subtract it from x
              # and get close to the desired sum
              reduce = round(p * (sum(x) - sum(round(x * perc))))
              y = x - reduce

              # No negatives
              y = c(ifelse(y < 0, 0, y))

              return (y)
              }

              alpha = 500
              perc = 0.7
              target = sum(round(perc * x))
              y = magic(x, perc, alpha)

              # Hopefully close to 1
              sum(y) / target
              > 1.000048

              # Measure of the "randomness"
              sd(y / x)
              > 0.1376637


              Basically, it tries to figure out how much to reduce each element by while still getting close to the sum you want. You can control how "random" you want the new vector by increasing alpha.






              share|improve this answer




























                1














                Here's a solution which uses draws from the Dirichlet distribution:



                set.seed(1)
                x = sample(10000, 1000, replace = TRUE)

                magic = function(x, perc, alpha = 1){
                # sample from the Dirichlet distribution
                # sum(p) == 1
                # lower values should reduce by less than larger values
                # larger alpha means the result will have more "randomness"
                p = rgamma(length(x), x / alpha, 1)
                p = p / sum(p)

                # scale p up an amount so we can subtract it from x
                # and get close to the desired sum
                reduce = round(p * (sum(x) - sum(round(x * perc))))
                y = x - reduce

                # No negatives
                y = c(ifelse(y < 0, 0, y))

                return (y)
                }

                alpha = 500
                perc = 0.7
                target = sum(round(perc * x))
                y = magic(x, perc, alpha)

                # Hopefully close to 1
                sum(y) / target
                > 1.000048

                # Measure of the "randomness"
                sd(y / x)
                > 0.1376637


                Basically, it tries to figure out how much to reduce each element by while still getting close to the sum you want. You can control how "random" you want the new vector by increasing alpha.






                share|improve this answer


























                  1












                  1








                  1






                  Here's a solution which uses draws from the Dirichlet distribution:



                  set.seed(1)
                  x = sample(10000, 1000, replace = TRUE)

                  magic = function(x, perc, alpha = 1){
                  # sample from the Dirichlet distribution
                  # sum(p) == 1
                  # lower values should reduce by less than larger values
                  # larger alpha means the result will have more "randomness"
                  p = rgamma(length(x), x / alpha, 1)
                  p = p / sum(p)

                  # scale p up an amount so we can subtract it from x
                  # and get close to the desired sum
                  reduce = round(p * (sum(x) - sum(round(x * perc))))
                  y = x - reduce

                  # No negatives
                  y = c(ifelse(y < 0, 0, y))

                  return (y)
                  }

                  alpha = 500
                  perc = 0.7
                  target = sum(round(perc * x))
                  y = magic(x, perc, alpha)

                  # Hopefully close to 1
                  sum(y) / target
                  > 1.000048

                  # Measure of the "randomness"
                  sd(y / x)
                  > 0.1376637


                  Basically, it tries to figure out how much to reduce each element by while still getting close to the sum you want. You can control how "random" you want the new vector by increasing alpha.






                  share|improve this answer














                  Here's a solution which uses draws from the Dirichlet distribution:



                  set.seed(1)
                  x = sample(10000, 1000, replace = TRUE)

                  magic = function(x, perc, alpha = 1){
                  # sample from the Dirichlet distribution
                  # sum(p) == 1
                  # lower values should reduce by less than larger values
                  # larger alpha means the result will have more "randomness"
                  p = rgamma(length(x), x / alpha, 1)
                  p = p / sum(p)

                  # scale p up an amount so we can subtract it from x
                  # and get close to the desired sum
                  reduce = round(p * (sum(x) - sum(round(x * perc))))
                  y = x - reduce

                  # No negatives
                  y = c(ifelse(y < 0, 0, y))

                  return (y)
                  }

                  alpha = 500
                  perc = 0.7
                  target = sum(round(perc * x))
                  y = magic(x, perc, alpha)

                  # Hopefully close to 1
                  sum(y) / target
                  > 1.000048

                  # Measure of the "randomness"
                  sd(y / x)
                  > 0.1376637


                  Basically, it tries to figure out how much to reduce each element by while still getting close to the sum you want. You can control how "random" you want the new vector by increasing alpha.







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 20 at 18:21

























                  answered Nov 20 at 17:56









                  mickey

                  1,2181216




                  1,2181216






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.





                      Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                      Please pay close attention to the following guidance:


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53397861%2freduce-total-sum-of-vector-elements-in-r%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      Create new schema in PostgreSQL using DBeaver

                      Deepest pit of an array with Javascript: test on Codility

                      Costa Masnaga