R - use if statement to regroup variable
I want to regroup a variable into a new one.
If the value is 0, the new one should be 0 too.
If the value is 999, it should be missing (NA).
Everything else should be 1.
This is my try:
id <- 1:10
variable <- c(0,0,0,1,2,3,4,5,999,999)
df <- data.frame(id,variable)
df$variable2 <-
if (df$variable == 0) {
df$variable2 = 0
} else if (df$variable == 999){
df$variable2 = NA
} else {
df$variable2 = 1
}
And this is the error message:
In if (df$variable == 0) { : the condition has length > 1 and only the first element will be used
A pretty basic question but I'm a basic user. Thanks in advance!
asked Nov 25 '18 at 22:07 by Schillerlocke
4 Answers
Try ifelse
df$variable2 <- ifelse(df$variable == 999, NA, ifelse(df$variable > 0, 1, 0))
df
# id variable variable2
#1 1 0 0
#2 2 0 0
#3 3 0 0
#4 4 1 1
#5 5 2 1
#6 6 3 1
#7 7 4 1
#8 8 5 1
#9 9 999 NA
#10 10 999 NA
When you do df$variable == 0, the output (i.e. the condition) is
#[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
whereas if(condition) needs a length-one logical vector that is not NA, see ?"if".
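To see the problem directly, here is a small sketch (the names cond and msg are only for illustration). It shows that the comparison returns one value per row, and it captures what if() reports when it is given such a vector. Since R 4.2.0 this is an error rather than the warning quoted in the question, so the tryCatch() below handles both cases.

```r
id <- 1:10
variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)
df <- data.frame(id, variable)

# One TRUE/FALSE per row, not a single value
cond <- df$variable == 0
length(cond)

# if() needs a single TRUE or FALSE. Older R versions warn and use only
# cond[1]; R >= 4.2.0 stops with an error. Catch either and keep the message.
msg <- tryCatch(
  if (cond) "zero" else "not zero",
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
msg
```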
You can avoid ifelse, for example, like so:
df$variable2 <- df$variable
df$variable2[df$variable2 == 999] <- NA
df$variable2[df$variable2 > 0] <- 1
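A sketch of the same subsetting idea wrapped in a small helper (recode_var is a hypothetical name, not part of any package). One assumption worth flagging: the code above uses > 0, which would leave negative values untouched. If "everything else" should really include negatives, testing != 0 (after the 999 codes have become NA) follows the question's rule more literally.

```r
# Hypothetical helper: 0 -> 0, 999 -> NA, any other value -> 1
recode_var <- function(x) {
  out <- x
  out[out == 999] <- NA              # turn the missing-value code into NA first
  out[!is.na(out) & out != 0] <- 1   # everything that is neither NA nor 0
  out
}

variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999)
recode_var(variable)
recode_var(c(-2, 0, 999))   # the negative value also becomes 1
```

The order of the two assignments matters: the 999 codes are converted to NA first, so the != 0 step skips them.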
edited Nov 25 '18 at 22:47
answered Nov 25 '18 at 22:14 by markus
With ifelse I had quite some bad experiences, especially in combination with data frame columns. It can behave very unintuitively. (ifelse is NOT simply a vectorized if - else. Definitely not.) In my opinion, one should always discourage people (especially beginners) from using ifelse and instead try to find a solution with normal if - else clauses.
– Gwang-Jin Kim
Nov 25 '18 at 22:45
@Gwang-JinKim Thanks for the comment. Posted a second solution to avoid ifelse.
– markus
Nov 25 '18 at 22:48
Welcome! Yes, your second solution is much more R-ish. I am trying to find the link for a negative ifelse example, but it is difficult to find...
– Gwang-Jin Kim
Nov 25 '18 at 22:50
@Gwang-JinKim ... Please do find such a link. To date, I have never had any issue with ifelse(). Pass it a vector to get back a modified vector of the same length!
– Parfait
Nov 25 '18 at 23:46
@Parfait I couldn't find a link, but I added an example to my answer. You are right, using ifelse purely with vectors might not cause any problems. But very often beginners and intermediates wish to use it on data frames. Choosing a column from a data frame should give the same behaviour as choosing a vector. But it doesn't. See my example.
– Gwang-Jin Kim
Nov 26 '18 at 7:48
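It might be easier to avoid the if/else statement altogether by using logical conditions within subset notation (note that this overwrites df$variable in place; run it on a copy if you need to keep the original codes):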
When df$variable is equal to zero, change it to zero (this step leaves the values as they are):
df$variable[df$variable==0] <- 0
When df$variable is equal to 999, change it to NA:
df$variable[df$variable==999] <- NA
When df$variable is greater than 0 and is not NA, change it to 1:
df$variable[df$variable>0 & !is.na(df$variable)] <- 1
edited Nov 26 '18 at 0:35 by mischva11
answered Nov 25 '18 at 23:04 by SPJ
Looks like you want to recode your variable. You can do this (and other data/variable transformations) with the sjmisc package, in your case with the rec() command:
id <- 1:10
variable <- c(0,0,0,1,2,3,4,5,999,999)
df <- data.frame(id,variable)
library(sjmisc)
rec(df, variable, rec = c("0=0;999=NA;else=1"))
#> id variable variable_r
#> 1 1 0 0
#> 2 2 0 0
#> 3 3 0 0
#> 4 4 1 1
#> 5 5 2 1
#> 6 6 3 1
#> 7 7 4 1
#> 8 8 5 1
#> 9 9 999 NA
#> 10 10 999 NA
# or a single vector as input
rec(df$variable, rec = c("0=0;999=NA;else=1"))
#> [1] 0 0 0 1 1 1 1 1 NA NA
There are many examples, including in the help file, and you can find an sjmisc cheatsheet in the RStudio cheatsheet collection (or as a direct PDF download here).
answered Nov 26 '18 at 7:53 by Daniel
df$variable2 <- sapply(df$variable,
function(el) if (el == 0) {0} else if (el == 999) {NA} else {1})
This one-liner mirrors your rules:
If value is 0, new one should be 0 too. If value is 999, then make it missing, NA. Everything else 1
It is slightly slower than @markus's second solution or @SPJ's solution, which are the most R-ish solutions.
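One caveat with the sapply() one-liner: if the input already contains NA, el == 0 is NA and if() stops with "missing value where TRUE/FALSE needed". A sketch with an explicit guard, using vapply() so that every call is checked to return exactly one double (recode_one is a hypothetical name):

```r
# Hypothetical helper: recode one value, guarding against NA input first
recode_one <- function(el) {
  if (is.na(el)) {
    NA_real_                 # el == 0 would be NA here and make if() stop
  } else if (el == 0) {
    0
  } else if (el == 999) {
    NA_real_
  } else {
    1
  }
}

variable <- c(0, 0, 0, 1, 2, 3, 4, 5, 999, 999, NA)

# vapply() checks that each call returns a single double
result <- vapply(variable, recode_one, numeric(1))
result
```

With the question's data frame this would be df$variable2 <- vapply(df$variable, recode_one, numeric(1)).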
Why one should keep away from ifelse
tt <- c(TRUE, FALSE, TRUE, FALSE)
a <- c("a", "b", "c", "d")
b <- 1:4
ifelse(tt, a, b) ## [1] "a" "2" "c" "4"
# totally perfect and as expected!
df <- data.frame(a=a, b=b, c=tt)
df$d <- ifelse(df$c, df$a, df$b)
## > df
## a b c d
## 1 a 1 TRUE 1
## 2 b 2 FALSE 2
## 3 c 3 TRUE 3
## 4 d 4 FALSE 4
######### This is wrong!! ##########################
## df$d is not [1] "a" "2" "c" "4"
## the problem is that
## ifelse(df$c, df$a, df$b)
## returns for each TRUE or FALSE the entire
## df$a or df$b instead of treating it like a vector.
## Since the last df$c is FALSE, df$b is returned
## Thus we get df$b for df$d.
## Quite an unintuitive behaviour.
##
## If one uses purely vectors, ifelse is fine.
## But actually df$c, df$a, df$b should be treated each like a vector.
## However, `ifelse` does not.
## No warnings that using `ifelse` with them will lead to a
## totally different behaviour.
## In my view, this is a design mistake of `ifelse`.
## Thus I decided to drop `ifelse` from my set of R commands,
## so that this kind of mistake can never happen.
#####################################################
As @Parfait correctly pointed out, this was a misinterpretation.
The problem was that df$a was stored in the data frame as a factor.
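# stringsAsFactors must be spelled in full: it follows ... in data.frame(), so it is not partially matched
df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = FALSE)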
df$d <- ifelse(df$c, df$a, df$b)
df
This gives the correct result:
a b c d
1 a 1 TRUE a
2 b 2 FALSE 2
3 c 3 TRUE c
4 d 4 FALSE 4
Thank you @Parfait for pointing that out!
Strange that I didn't recognize that in my initial trials.
But yes, you are absolutely right!
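The factor effect can be reproduced without a data frame. In this sketch, f stands in for the factor column that data.frame() created by default before R 4.0.0 (since R 4.0.0 the default is stringsAsFactors = FALSE, so the original example no longer shows the problem on a current R). ifelse() copies the factor's underlying integer codes rather than its labels; converting with as.character() first gives the expected result.

```r
tt <- c(TRUE, FALSE, TRUE, FALSE)
a  <- c("a", "b", "c", "d")
b  <- 1:4

# Stand-in for a character column that was turned into a factor
f <- factor(a)

# The integer codes of f end up in the result: 1 2 3 4
codes <- ifelse(tt, f, b)
codes

# Converting the factor back to character keeps the labels: "a" "2" "c" "4"
by_label <- ifelse(tt, as.character(f), b)
by_label
```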
That's not an ifelse issue but a factor issue. Use the stringsAsFactors = FALSE argument in data.frame() so column a is treated as character. This is a common oversight for new useRs.
– Parfait
Nov 26 '18 at 12:27
@Parfait - it would be nice, but I tried stringsAsFactors = FALSE and it is as I wrote here: you get the entire df$b as df$d. I would have recognized a stringsAsFactors issue quite quickly, because this is the first thing I suspect when something is wrong with data frames in R... df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = F) and then df$d <- ifelse(df$c, df$a, df$b) looks exactly the same as before: df$d is identical to df$b. The explanation is the one I already wrote above in the example. And that is the point: even experienced users like you could not predict this behavior from the code!
– Gwang-Jin Kim
Nov 26 '18 at 13:56
Hmmmm... I am unable to reproduce that using stringsAsFactors = FALSE. See the demo, where the resulting df$d is exactly equal and identical to the original ifelse: rextester.com/YGEL36294.
– Parfait
Nov 26 '18 at 16:22
@Parfait Oh, I did something wrong before; it works. Strange that it didn't work before.
– Gwang-Jin Kim
Nov 26 '18 at 20:17
Thank you for pointing it out!
– Gwang-Jin Kim
Nov 26 '18 at 20:18
add a comment |
df$variable2 <- sapply(df$variable,
function(el) if (el == 0) {0} else if (el == 999) {NA} else {1})
This one-liner reflects your:
If value is 0, new one should be 0 too. If value ist 999, then make it
missing, NA. Everything else 1
Well, it is slightly slower than @markus's second or @SPJ's solutions which are most r-ish solutions.
Why one should put away the hands from ifelse
tt <- c(TRUE, FALSE, TRUE, FALSE)
a <- c("a", "b", "c", "d")
b <- 1:4
ifelse(tt, a, b) ## [1] "a" "2" "c" "4"
# totally perfect and as expected!
df <- data.frame(a=a, b=b, c=tt)
df$d <- ifelse(df$c, df$a, df$b)
## > df
## a b c d
## 1 a 1 TRUE 1
## 2 b 2 FALSE 2
## 3 c 3 TRUE 3
## 4 d 4 FALSE 4
######### This is wrong!! ##########################
## df$d is not [1] "a" "2" "c" "4"
## the problem is that
## ifelse(df$c, df$a, df$b)
## returns for each TRUE or FALSE the entire
## df$a or df$b intead of treating it like a vector.
## Since the last df$c is FALSE, df$b is returned
## Thus we get df$b for df$d.
## Quite an unintuitive behaviour.
##
## If one uses purely vectors, ifelse is fine.
## But actually df$c, df$a, df$b should be treated each like a vector.
## However, `ifelse` does not.
## No warnings that using `ifelse` with them will lead to a
## totally different behaviour.
## In my view, this is a design mistake of `ifelse`.
## Thus I decided myself to abandon `ifelse` from my set of R commands.
## To avoid that such kind of mistakes can ever happen.
#####################################################
As @Parfait pointed out correctly, it was a misinterpretation.
The problem was that df$a was treated in the data frame as a factor.
df <- data.frame(a=a, b=b, c=tt, stringsAsFactor = F)
df$d <- ifelse(df$c, df$a, df$b)
df
Gives the correct result.
a b c d
1 a 1 TRUE a
2 b 2 FALSE 2
3 c 3 TRUE c
4 d 4 FALSE 4
Thank you @Parfait to pointing that out!
Strange that I didn't recognized that in my initial trials.
But yeah, you are absolutely right!
1
That's not anifelse
issue but a factor issue. Use thestringsAsFactors = FALSE
argument indata.frame()
so column a is treated as character. This is a regular new useR overlook.
– Parfait
Nov 26 '18 at 12:27
@Parfait - it would be nice but I tried stringsAsFactors = FALSE, but it is as I wrote here - you get entire df$b as df$d; I would have recognized a stringsAsFactors issue quite quickly. Because this is the first thing I assume when sth is wrong with data frames in R... -df <- data.frame(a=a, b=b, c=tt, stringsAsFactors = F)
and thendf$d <- ifelse(df$c, df$a, df$b)
- looks exactly the same like before - df$d is identical to df$b. And the explanation I already wrote above in the example. - And that is the point - even experienced users as you could not predict this behavior from code!
– Gwang-Jin Kim
Nov 26 '18 at 13:56
1
Hmmmm... I am unable to reproduce usingstringsAsFactors = FALSE
. See demo with resultingdf$d
exactly equal and identical to originalifelse
: rextester.com/YGEL36294.
– Parfait
Nov 26 '18 at 16:22
@Parfait Oh I did sth wrong before - it works. Strange - why it didn't work before.
– Gwang-Jin Kim
Nov 26 '18 at 20:17
Thank you then pointing out!
– Gwang-Jin Kim
Nov 26 '18 at 20:18
add a comment |
edited Nov 26 '18 at 20:22
answered Nov 25 '18 at 23:07
Gwang-Jin Kim