R - use if statement to regroup variable



























I want to regroup a variable into a new one:

If the value is 0, the new one should be 0 too.
If the value is 999, make it missing (NA).
Everything else should be 1.



This is my attempt:



id <- 1:10
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)
df <- data.frame(id, variable)

df$variable2 <-
  if (df$variable == 0) {
    df$variable2 = 0
  } else if (df$variable == 999) {
    df$variable2 = NA
  } else {
    df$variable2 = 1
  }


And this is the warning message I get:




In if (df$variable == 0) { : the condition has length > 1 and only
the first element will be used




This is a pretty basic question, but I'm a basic user. Thanks in advance!










Tags: r

asked Nov 25 '18 at 22:07 by Schillerlocke
























4 Answers
































Try ifelse:



          df$variable2 <- ifelse(df$variable == 999, NA, ifelse(df$variable > 0, 1, 0))
          df
          # id variable variable2
          #1 1 0 0
          #2 2 0 0
          #3 3 0 0
          #4 4 1 1
          #5 5 2 1
          #6 6 3 1
          #7 7 4 1
          #8 8 5 1
          #9 9 999 NA
          #10 10 999 NA




When you evaluate df$variable == 0, the result (your condition) is



#[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


but if(condition) expects a length-one logical vector that is not NA; see ?"if".
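To see the length mismatch concretely, here is a small sketch (cond, outcome and summary_msg are illustrative names, not from the answer). One caveat worth knowing: since R 4.2.0, an if() condition of length greater than one is an error rather than the warning quoted in the question, so the sketch uses tryCatch() to record whichever of the two your R version produces.

```r
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)

# The comparison is vectorised: one TRUE/FALSE per element
cond <- variable == 0
length(cond)
#> [1] 10

# if() needs a single TRUE/FALSE. R < 4.2.0 warns and uses only cond[1];
# R >= 4.2.0 stops with an error. tryCatch() records which one happened.
outcome <- tryCatch(
  if (cond) "zero" else "not zero",
  warning = function(w) "warning",
  error = function(e) "error"
)
outcome

# Collapsing the vector to one value first (any(), all()) is valid in if()
summary_msg <- if (any(cond)) "at least one zero" else "no zeros"
summary_msg
#> [1] "at least one zero"
```

For recoding every element, though, you still need a vectorised tool such as ifelse() or subset assignment.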





You can also avoid ifelse, for example like so:



          df$variable2 <- df$variable
          df$variable2[df$variable2 == 999] <- NA
          df$variable2[df$variable2 > 0] <- 1
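The subset-assignment version relies on two details: the 999-to-NA step must run before the > 0 step (otherwise the 999s would already have become 1), and when a logical index contains NA, R leaves those positions untouched as long as a single value is being assigned. A minimal sketch of both points; recode_subset is a hypothetical helper name:

```r
recode_subset <- function(v) {
  out <- v
  out[out == 999] <- NA  # step 1: turn the missing-value code into NA
  out[out > 0] <- 1      # step 2: NA > 0 is NA, and an NA index is skipped
  out
}

recode_subset(c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999))
#> [1]  0  0  0  1  1  1  1  1 NA NA

# Swapping the two steps silently loses the NA recoding:
wrong <- c(0, 2, 999)
wrong[wrong > 0] <- 1
wrong[wrong == 999] <- NA
wrong
#> [1] 0 1 1
```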





answered Nov 25 '18 at 22:14 by markus (edited Nov 25 '18 at 22:47)


























• I have had quite a few bad experiences with ifelse, especially in combination with data frame columns. It can behave very unintuitively (ifelse is NOT simply a vectorized if/else. Definitely not). In my opinion, beginners in particular should be discouraged from using ifelse and should rather try to find a solution with normal if/else clauses.

            – Gwang-Jin Kim
            Nov 25 '18 at 22:45













          • @Gwang-JinKim Thanks for the comment. Posted a second solution to avoid ifelse.

            – markus
            Nov 25 '18 at 22:48











• You're welcome! Yes, your second solution is much more R-ish. I am trying to find the link to a negative ifelse example, but it is difficult to find ...

            – Gwang-Jin Kim
            Nov 25 '18 at 22:50











• @Gwang-JinKim ... Please do find such a link. To date, I have never had any issue with ifelse(). Pass it a vector and it returns a modified vector of the same length!

            – Parfait
            Nov 25 '18 at 23:46











• @Parfait I couldn't find a link, but I added an example to my answer. You are right, using ifelse purely with vectors might not cause any problems. But very often beginners and intermediate users wish to use it on data frames. Choosing a column from a data frame should give the same behaviour as choosing a vector. But it doesn't. See my example.

            – Gwang-Jin Kim
            Nov 26 '18 at 7:48

































It might be easier to avoid the if/else statement altogether by using logical conditions within subset notation:

When df$variable is equal to 0, set it to 0 (this leaves those values unchanged):

df$variable[df$variable == 0] <- 0

When df$variable is equal to 999, change it to NA:

df$variable[df$variable == 999] <- NA

When df$variable is greater than 0 and is not NA, change it to 1:

df$variable[df$variable > 0 & !is.na(df$variable)] <- 1
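Note that these steps overwrite df$variable itself, while the question asks for a new column. A variant of the same subsetting idea (a sketch, not part of the original answer) fills a new variable2 column using masks computed from the untouched original column, so the order of the assignments no longer matters:

```r
id <- 1:10
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)
df <- data.frame(id, variable)

# Masks are computed once from the original values, which are never modified
is_zero    <- df$variable == 0
is_missing <- df$variable == 999

df$variable2 <- 1               # default: everything else is 1
df$variable2[is_zero] <- 0
df$variable2[is_missing] <- NA
df$variable2
#> [1]  0  0  0  1  1  1  1  1 NA NA
```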





answered Nov 25 '18 at 23:04 by SPJ (edited Nov 26 '18 at 0:35 by mischva11)















































Looks like you want to recode your variable. You can do this (and other data/variable transformations) with the sjmisc package, in your case with the rec() function:



            id <- 1:10
            variable <- c(0,0,0,1,2,3,4,5,999,999)
            df <- data.frame(id,variable)

            library(sjmisc)
            rec(df, variable, rec = c("0=0;999=NA;else=1"))
            #> id variable variable_r
            #> 1 1 0 0
            #> 2 2 0 0
            #> 3 3 0 0
            #> 4 4 1 1
            #> 5 5 2 1
            #> 6 6 3 1
            #> 7 7 4 1
            #> 8 8 5 1
            #> 9 9 999 NA
            #> 10 10 999 NA

            # or a single vector as input
            rec(df$variable, rec = c("0=0;999=NA;else=1"))
            #> [1] 0 0 0 1 1 1 1 1 NA NA


There are many examples, including in the help file, and you can find a sjmisc cheatsheet in the RStudio cheatsheet collection (also available as a direct PDF download).



















































              df$variable2 <- sapply(df$variable, 
              function(el) if (el == 0) {0} else if (el == 999) {NA} else {1})


This one-liner mirrors your specification:




If value is 0, new one should be 0 too. If value is 999, then make it missing, NA. Everything else 1




It is slower, though, than @markus's second solution or @SPJ's solution, which are the most R-ish approaches.
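To check the speed difference on your own machine, here is a rough benchmark sketch using base R's system.time(); the vector size and the helper names with_sapply and with_subset are arbitrary, and timings will vary:

```r
set.seed(1)
x <- sample(c(0:5, 999), 1e5, replace = TRUE)

# Element by element: one R function call per value
with_sapply <- function(v) {
  sapply(v, function(el) if (el == 0) 0 else if (el == 999) NA else 1)
}

# Vectorised subset assignment: a few whole-vector operations
with_subset <- function(v) {
  out <- v
  out[out == 999] <- NA
  out[out > 0] <- 1
  out
}

system.time(a <- with_sapply(x))
system.time(b <- with_subset(x))
identical(a, b)  # both approaches produce the same recoding
```

The vectorised version works on the whole vector at once, whereas sapply() calls an R function once per element, which is usually where the gap comes from.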



Why I keep my hands off ifelse



              tt <- c(TRUE, FALSE, TRUE, FALSE)
              a <- c("a", "b", "c", "d")
              b <- 1:4
              ifelse(tt, a, b) ## [1] "a" "2" "c" "4"
              # totally perfect and as expected!

              df <- data.frame(a=a, b=b, c=tt)
              df$d <- ifelse(df$c, df$a, df$b)
              ## > df
              ## a b c d
              ## 1 a 1 TRUE 1
              ## 2 b 2 FALSE 2
              ## 3 c 3 TRUE 3
              ## 4 d 4 FALSE 4

              ######### This is wrong!! ##########################
              ## df$d is not [1] "a" "2" "c" "4"
              ## the problem is that
              ## ifelse(df$c, df$a, df$b)
              ## returns for each TRUE or FALSE the entire
## df$a or df$b instead of treating it like a vector.
              ## Since the last df$c is FALSE, df$b is returned
              ## Thus we get df$b for df$d.
              ## Quite an unintuitive behaviour.
              ##
              ## If one uses purely vectors, ifelse is fine.
              ## But actually df$c, df$a, df$b should be treated each like a vector.
              ## However, `ifelse` does not.
              ## No warnings that using `ifelse` with them will lead to a
              ## totally different behaviour.
              ## In my view, this is a design mistake of `ifelse`.
              ## Thus I decided myself to abandon `ifelse` from my set of R commands.
              ## To avoid that such kind of mistakes can ever happen.
              #####################################################


As @Parfait correctly pointed out, this was a misinterpretation.
The problem was that data.frame() had converted df$a to a factor, so ifelse() returned the factor's integer codes (1 to 4), which happen to coincide with df$b.



df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = FALSE)
              df$d <- ifelse(df$c, df$a, df$b)
              df


gives the correct result:



                a b     c d
              1 a 1 TRUE a
              2 b 2 FALSE 2
              3 c 3 TRUE c
              4 d 4 FALSE 4


Thank you, @Parfait, for pointing that out!
Strange that I didn't notice it in my initial trials.
But yes, you are absolutely right!
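Since R 4.0.0, data.frame() defaults to stringsAsFactors = FALSE, so the original example no longer reproduces the problem on a current R. The sketch below sets the argument explicitly to show both behaviours on any version. It also shows that data.frame() matches arguments after ... exactly, so a misspelling such as stringsAsFactor is silently turned into an extra column instead of raising an error.

```r
tt <- c(TRUE, FALSE, TRUE, FALSE)
a <- c("a", "b", "c", "d")
b <- 1:4

# Factor column: ifelse() drops the factor attributes and returns integer codes
df_factor <- data.frame(a = a, b = b, c = tt, stringsAsFactors = TRUE)
res_factor <- ifelse(df_factor$c, df_factor$a, df_factor$b)
res_factor
#> [1] 1 2 3 4

# Character column: the expected element-wise result
df_char <- data.frame(a = a, b = b, c = tt, stringsAsFactors = FALSE)
res_char <- ifelse(df_char$c, df_char$a, df_char$b)
res_char
#> [1] "a" "2" "c" "4"

# A misspelled argument name becomes an extra column named "stringsAsFactor"
misspelled_names <- names(data.frame(a = a, stringsAsFactor = FALSE))
misspelled_names
```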



























• That's not an ifelse issue but a factor issue. Use the stringsAsFactors = FALSE argument in data.frame() so that column a is treated as character. This is a common oversight among new useRs.

                – Parfait
                Nov 26 '18 at 12:27













• @Parfait - that would be nice, but I tried stringsAsFactors = FALSE, and it is as I wrote here: you get the entire df$b as df$d. I would have recognized a stringsAsFactors issue quite quickly, because that is the first thing I suspect when something is wrong with data frames in R... df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = F) and then df$d <- ifelse(df$c, df$a, df$b) looks exactly the same as before: df$d is identical to df$b. I already wrote the explanation above in the example. And that is the point: even experienced users like you could not predict this behavior from the code!

                – Gwang-Jin Kim
                Nov 26 '18 at 13:56








• Hmmmm... I am unable to reproduce this using stringsAsFactors = FALSE. See the demo, with the resulting df$d exactly equal and identical to the original ifelse: rextester.com/YGEL36294.

                – Parfait
                Nov 26 '18 at 16:22











• @Parfait Oh, I did something wrong before; it works. Strange that it didn't work before.

                – Gwang-Jin Kim
                Nov 26 '18 at 20:17











• Thank you for pointing that out!

                – Gwang-Jin Kim
                Nov 26 '18 at 20:18











                      2














                      Looks like you want to recode your variable. You can do this (and other data/variable transformations) with the sjmisc-package, in your case with the rec()-command:



                      id <- 1:10
                      variable <- c(0,0,0,1,2,3,4,5,999,999)
                      df <- data.frame(id,variable)

                      library(sjmisc)
                      rec(df, variable, rec = c("0=0;999=NA;else=1"))
                      #>    id variable variable_r
                      #> 1   1        0          0
                      #> 2   2        0          0
                      #> 3   3        0          0
                      #> 4   4        1          1
                      #> 5   5        2          1
                      #> 6   6        3          1
                      #> 7   7        4          1
                      #> 8   8        5          1
                      #> 9   9      999         NA
                      #> 10 10      999         NA

                      # or a single vector as input
                      rec(df$variable, rec = c("0=0;999=NA;else=1"))
                      #>  [1]  0  0  0  1  1  1  1  1 NA NA


                      There are many more examples in the help file, and you can find an sjmisc cheat sheet in the RStudio cheat sheet collection (or as a direct PDF download here).
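If you would rather not add a package dependency, the same recoding rule ("0=0;999=NA;else=1") can be sketched in base R with two nested `ifelse()` calls. Unlike `if`, `ifelse()` is vectorized and evaluates its condition element by element, so it accepts the whole column. This sketch assumes `variable` is numeric, as in the question. The column name `variable_r` is only chosen to mirror the suffix that rec() produces above:

```r
id <- 1:10
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)
df <- data.frame(id, variable)

# Outer ifelse() maps 0 -> 0; for all other rows the inner
# ifelse() maps 999 -> NA and everything else -> 1
df$variable_r <- ifelse(df$variable == 0, 0,
                        ifelse(df$variable == 999, NA, 1))

print(df)
```

The order of the tests matters only if the categories overlap; here 0 and 999 are distinct, so the nesting simply reads like the "if / else if / else" chain from the question.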
















                          answered Nov 26 '18 at 7:53









                          Daniel

                          3,82341730



























                              1














                              df$variable2 <- sapply(df$variable, 
                              function(el) if (el == 0) {0} else if (el == 999) {NA} else {1})


                              This one-liner mirrors your specification:




                              If value is 0, new one should be 0 too. If value is 999, then make it
                              missing, NA. Everything else 1




                              It is slightly slower than @markus's second solution or @SPJ's solution, which are the most idiomatic (vectorized) R solutions.
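The one-liner works because `sapply()` hands the function one element at a time, so each `if` sees a length-one condition, which avoids the "condition has length > 1" problem from the question. Below is a slightly more defensive sketch, assuming the result should always be numeric. `vapply()` with `FUN.VALUE = numeric(1)` checks that every call returns exactly one number, and an explicit `is.na()` test avoids the "missing value where TRUE/FALSE needed" error that `if (NA == 0)` would raise. The helper name `recode_one` is hypothetical:

```r
id <- 1:10
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)
df <- data.frame(id, variable)

# Hypothetical helper: recodes a single (length-one) value
recode_one <- function(el) {
  if (is.na(el)) {
    NA_real_          # keep existing missing values missing
  } else if (el == 0) {
    0
  } else if (el == 999) {
    NA_real_          # 999 is the missing-value code
  } else {
    1
  }
}

# vapply() stops with an error if any call does not return one double
df$variable2 <- vapply(df$variable, recode_one, FUN.VALUE = numeric(1))

print(df$variable2)
```

Like the `sapply()` version, this still calls an R function once per element, so the vectorized approaches remain the faster choice for large data.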



                              Why one should keep one's hands off ifelse



                              tt <- c(TRUE, FALSE, TRUE, FALSE)
                              a <- c("a", "b", "c", "d")
                              b <- 1:4
                              ifelse(tt, a, b) ## [1] "a" "2" "c" "4"
                              # totally perfect and as expected!

                              df <- data.frame(a=a, b=b, c=tt)
                              df$d <- ifelse(df$c, df$a, df$b)
                              ## > df
                              ## a b c d
                              ## 1 a 1 TRUE 1
                              ## 2 b 2 FALSE 2
                              ## 3 c 3 TRUE 3
                              ## 4 d 4 FALSE 4

                              ######### This is wrong!! ##########################
                              ## df$d is not [1] "a" "2" "c" "4"
                              ## the problem is that
                              ## ifelse(df$c, df$a, df$b)
                              ## returns, for each TRUE or FALSE, the entire
                              ## df$a or df$b instead of treating it like a vector.
                              ## Since the last df$c is FALSE, df$b is returned.
                              ## Thus we get df$b for df$d.
                              ## Quite unintuitive behaviour.
                              ##
                              ## If one uses plain vectors, ifelse is fine.
                              ## But df$c, df$a, df$b should each be treated like a vector too.
                              ## However, `ifelse` does not do that.
                              ## There is no warning that using `ifelse` with them will lead to
                              ## totally different behaviour.
                              ## In my view, this is a design mistake of `ifelse`.
                              ## So I decided to drop `ifelse` from my set of R commands,
                              ## to make sure this kind of mistake can never happen.
                              #####################################################


                              As @Parfait correctly pointed out, this was a misinterpretation.
                              The problem was that data.frame() had stored df$a as a factor, so ifelse() returned its integer codes instead of the letters.



                              df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = FALSE)
                              df$d <- ifelse(df$c, df$a, df$b)
                              df


                              This gives the correct result:



                                a b     c d
                              1 a 1  TRUE a
                              2 b 2 FALSE 2
                              3 c 3  TRUE c
                              4 d 4 FALSE 4


                              Thank you, @Parfait, for pointing that out!
                              Strange that I didn't notice that in my initial trials.
                              But yes, you are absolutely right!
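To make the factor explanation concrete regardless of R version (since R 4.0.0, `data.frame()` no longer converts strings to factors by default), here is a minimal sketch that builds the factor column explicitly with `factor()`. `ifelse()` copies the factor's underlying integer codes rather than its labels, which reproduces the 1 2 3 4 from the first attempt; with a character column it returns the expected mix of letters and numbers:

```r
tt <- c(TRUE, FALSE, TRUE, FALSE)
a <- c("a", "b", "c", "d")
b <- 1:4

# Factor column: what data.frame() produced by default before R 4.0.0
df_fac <- data.frame(a = factor(a), b = b, c = tt)
d_fac <- ifelse(df_fac$c, df_fac$a, df_fac$b)
print(d_fac)  # [1] 1 2 3 4  (factor codes where c is TRUE, b where FALSE)

# Character column: requested explicitly, so the result does not depend on the R version
df_chr <- data.frame(a = a, b = b, c = tt, stringsAsFactors = FALSE)
d_chr <- ifelse(df_chr$c, df_chr$a, df_chr$b)
print(d_chr)  # [1] "a" "2" "c" "4"

# Converting the factor back to character inside the call also works
d_fix <- ifelse(df_fac$c, as.character(df_fac$a), df_fac$b)
print(d_fix)
```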



























                              • 1





                                That's not an ifelse issue but a factor issue. Use the stringsAsFactors = FALSE argument in data.frame() so column a is treated as character. This is a common oversight among new useRs.

                                – Parfait
                                Nov 26 '18 at 12:27













                              • @Parfait - it would be nice, but I tried stringsAsFactors = FALSE, and it is as I wrote here: you get the entire df$b as df$d. I would have recognized a stringsAsFactors issue quite quickly, because that is the first thing I suspect when something is wrong with data frames in R... - df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = F) and then df$d <- ifelse(df$c, df$a, df$b) - looks exactly the same as before - df$d is identical to df$b. The explanation is already in the example above. - And that is the point - even experienced users like you could not predict this behavior from the code!

                                – Gwang-Jin Kim
                                Nov 26 '18 at 13:56








                              • 1





                                Hmmmm... I am unable to reproduce this with stringsAsFactors = FALSE. See this demo, where the resulting df$d is exactly equal and identical to the original ifelse output: rextester.com/YGEL36294.

                                – Parfait
                                Nov 26 '18 at 16:22











                              • @Parfait Oh, I did something wrong before - it works. Strange that it didn't work before.

                                – Gwang-Jin Kim
                                Nov 26 '18 at 20:17











                              • Thank you for pointing that out!

                                – Gwang-Jin Kim
                                Nov 26 '18 at 20:18
























                              edited Nov 26 '18 at 20:22

























                              answered Nov 25 '18 at 23:07









                              Gwang-Jin Kim

                              2,484217











