How to convert a factor to integer/numeric without loss of information?
up vote
495
down vote
favorite
When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.
f <- factor(sample(runif(5), 20, replace = TRUE))
## [1] 0.0248644019011408 0.0248644019011408 0.179684827337041
## [4] 0.0284090070053935 0.363644931698218 0.363644931698218
## [7] 0.179684827337041 0.249704354675487 0.249704354675487
## [10] 0.0248644019011408 0.249704354675487 0.0284090070053935
## [13] 0.179684827337041 0.0248644019011408 0.179684827337041
## [16] 0.363644931698218 0.249704354675487 0.363644931698218
## [19] 0.179684827337041 0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218
as.numeric(f)
## [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2
as.integer(f)
## [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2
I have to resort to paste to get the real values:
as.numeric(paste(f))
## [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
## [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901
Is there a better way to convert a factor to numeric?
r casting r-faq
1
The levels of a factor are stored as character data type anyway (attributes(f)), so I don't think there is anything wrong with as.numeric(paste(f)). Perhaps it would be better to think why (in the specific context) you are getting a factor in the first place, and try to stop that. E.g., is the dec argument in read.table set correctly?
– CJB
Jan 25 '16 at 9:44
If you use a data frame you can use convert from hablar: df %>% convert(num(column)). Or if you have a factor vector you can use as_reliable_num(factor_vector).
– davsjob
Nov 1 at 9:53
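The first comment's suggestion of fixing the import, rather than the factor, can be sketched as follows. The inline text and column names are made up for illustration, and stringsAsFactors = TRUE is set explicitly because it has not been the default since R 4.0.0.

```r
# Hypothetical input: semicolon-separated values with a decimal comma.
csv_text <- "id;value\n1;0,25\n2;1,5\n3;3,75\n"

# With the default dec = ".", "0,25" is not a number, so the column is read
# as text and (with stringsAsFactors = TRUE) turned into a factor.
wrong <- read.table(text = csv_text, header = TRUE, sep = ";",
                    stringsAsFactors = TRUE)

# Declaring the decimal mark lets read.table parse the column as numeric.
right <- read.table(text = csv_text, header = TRUE, sep = ";", dec = ",")

class(wrong$value)
class(right$value)
```

If the column arrives as numeric in the first place, no factor conversion is needed afterwards.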
edited Apr 1 at 11:06 by Jaap
asked Aug 5 '10 at 18:53 by Adam SO
7 Answers
up vote
590
down vote
accepted
See the Warning section of ?factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
The FAQ on R has similar advice.
Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?
as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
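A minimal sketch of that point, using a made-up factor g that is not part of the original post: both routes return the same numbers, but the first parses only the distinct level strings.

```r
# Hypothetical data: 1000 elements, only three distinct levels.
g <- factor(rep(c("0.5", "1.25", "10"), times = c(500, 300, 200)))

# Route 1: parse the level strings once, then index by the integer codes.
via_levels <- as.numeric(levels(g))[g]

# Route 2: expand to one string per element, then parse every string.
via_character <- as.numeric(as.character(g))

identical(via_levels, via_character)  # the two routes agree
nlevels(g)                            # strings parsed by route 1
length(g)                             # strings parsed by route 2
```

The gap between nlevels(g) and length(g) is what drives the difference in work; when the two are close, neither route has much of an advantage.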
Some timings
library(microbenchmark)
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),  # x is not defined in this snippet; kept as in the original benchmark
  paste(x),
  times = 1e5
)
## Unit: microseconds
## expr min lq mean median uq max neval
## as.numeric(levels(f))[f] 3.982 5.120 6.088624 5.405 5.974 1981.418 1e+05
## as.numeric(levels(f)[f]) 5.973 7.111 8.352032 7.396 8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249 9.628264 8.534 9.671 1983.694 1e+05
## paste0(x) 7.964 9.387 11.026351 9.956 10.810 2911.257 1e+05
## paste(x) 7.965 9.387 11.127308 9.956 11.093 2419.458 1e+05
2
For timings see this answer: stackoverflow.com/questions/6979625/…
– Ari B. Friedman
Aug 8 '11 at 11:27
2
Many thanks for your solution. Can I ask why the as.numeric(levels(f))[f] is more precise and faster? Thanks.
– Sam
Apr 18 '14 at 0:25
6
@Sam as.character(f) requires a "primitive lookup" to find the function as.character.factor(), which is defined as as.numeric(levels(f))[f].
– Jonathan
Jun 27 '14 at 19:12
8
When I apply as.numeric(levels(f))[f] or as.numeric(as.character(f)), I get a warning message: "NAs introduced by coercion". Do you know where the problem could be? Thank you!
– maycca
Apr 13 '16 at 21:23
@maycca did you overcome this issue?
– user08041991
Jan 31 '17 at 12:25
up vote
72
down vote
R has a number of (undocumented) convenience functions for converting factors:
- as.character.factor
- as.data.frame.factor
- as.Date.factor
- as.list.factor
- as.vector.factor
- ...
But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I would suggest overcoming this omission by defining your own idiomatic function:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
that you can store at the beginning of your script, or even better in your .Rprofile file.
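As a variation on the definition above, here is a sketch of the same helper under a name that cannot be mistaken for an S3 method. The name factor_to_numeric and the sample factor h are hypothetical and not part of base R.

```r
# Hypothetical helper; the name is illustrative, not a base R function.
factor_to_numeric <- function(x) {
  stopifnot(is.factor(x))
  # Parse each level string once, then expand through the integer codes.
  as.numeric(levels(x))[x]
}

h <- factor(c("3.5", "10", "3.5", "-2"))
factor_to_numeric(h)
```

A plain function name keeps as.numeric() and as.integer() on factors behaving as documented, returning the level codes.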
12
There's nothing to handle the factor-to-integer (or numeric) conversion because it's expected that as.integer(factor) returns the underlying integer codes (as shown in the examples section of ?factor). It's probably okay to define this function in your global environment, but you might cause problems if you actually register it as an S3 method.
– Joshua Ulrich
Apr 18 '14 at 12:03
That's a good point and I agree: a complete redefinition of the factor->numeric conversion is likely to mess a lot of things. I found myself writing the cumbersome factor->numeric conversion a lot before realizing that it is in fact a shortcoming of R: some convenience function should be available... Calling it as.numeric.factor makes sense to me, but YMMV.
– Jealie
Apr 18 '14 at 20:11
4
If you find yourself doing that a lot, then you should do something upstream to avoid it altogether.
– Joshua Ulrich
Apr 18 '14 at 22:44
as.numeric.factor returns NA?
– jO.
Aug 8 '14 at 7:56
@jO.: in the cases where you used something like v=NA;as.numeric.factor(v) or v='something';as.numeric.factor(v), then it should, otherwise you have a weird thing going on somewhere.
– Jealie
Aug 8 '14 at 14:43
up vote
27
down vote
The easiest way would be to use the unfactor function from the varhandle package:
unfactor(your_factor_variable)
This example can be a quick start:
x <- rep(c("a", "b", "c"), 20)
y <- rep(c(1, 1, 0), 20)
class(x) # -> "character"
class(y) # -> "numeric"
x <- factor(x)
y <- factor(y)
class(x) # -> "factor"
class(y) # -> "factor"
library(varhandle)
x <- unfactor(x)
y <- unfactor(y)
class(x) # -> "character"
class(y) # -> "numeric"
The unfactor function converts to character data type first and then converts back to numeric. Type unfactor at the console and you can see it in the middle of the function. Therefore it doesn't really give a better solution than what the asker already had.
– CJB
Jan 25 '16 at 9:32
Having said that, the levels of a factor are of character type anyway, so nothing is lost by this approach.
– CJB
Jan 25 '16 at 9:38
The unfactor function takes care of things that cannot be converted to numeric. Check the examples in help("unfactor").
– Mehrad Mahmoudian
Jul 25 '16 at 13:15
2
@Selrac I've mentioned that this function is available in varhandle package, meaning you should load the package (library("varhandle")) first (as I mentioned in the first line of my answer!!)
– Mehrad Mahmoudian
Sep 29 '16 at 13:06
1
@Gregor adding a light dependency does not usually do harm, and of course if you are looking for the most efficient way, writing the code yourself might perform faster. But as you can also see in your comment, this is not trivial, since you also put the as.numeric() and as.character() in the wrong order ;) What your code chunk does is turn the factor's level index into a character matrix, so what you will have at the end is a character vector that contains some numbers that were once assigned to a certain level of your factor. Functions in that package are there to prevent these confusions.
– Mehrad Mahmoudian
Nov 10 '16 at 11:53
up vote
15
down vote
Every answer in this post failed to generate results for me; NAs were generated instead.
y2<-factor(c("A","B","C","D","A"));
as.numeric(levels(y2))[y2]
# [1] NA NA NA NA NA
# Warning message: NAs introduced by coercion
What worked for me is this -
as.integer(y2)
# [1] 1 2 3 4 1
Note: this particular answer is not for converting numeric-valued factors to numerics, it is for converting categorical factors to their corresponding level numbers.
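The two situations can be told apart before converting. The sketch below uses hypothetical names (levels_are_numeric, y_cat, y_num) and simply checks whether every level string parses as a number.

```r
y_cat <- factor(c("A", "B", "C", "D", "A"))   # categorical labels
y_num <- factor(c("5", "15", "20", "2"))      # numbers stored as labels

# TRUE when every level parses as a number (warnings silenced for the check).
levels_are_numeric <- function(x) {
  !anyNA(suppressWarnings(as.numeric(levels(x))))
}

levels_are_numeric(y_cat)          # labels are not numbers
as.integer(y_cat)                  # so only the level codes are meaningful

levels_are_numeric(y_num)          # labels are numbers
as.numeric(levels(y_num))[y_num]   # recovers the values behind the labels
as.integer(y_num)                  # level codes, which differ from the values
```

For y_num the level codes follow the sorted order of the label strings, so they do not match the numeric values.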
Are you sure you had a factor? Look at this example: y<-factor(c("5","15","20","2")); unclass(y) %>% as.numeric. This returns 4,1,3,2, not 5,15,20,2. This seems like incorrect information.
– MrFlick
Feb 22 '17 at 19:19
Ok, this is similar to what I was trying to do today :- y2<-factor(c("A","B","C","D","A")); as.numeric(levels(y2))[y2] [1] NA NA NA NA NA Warning message: NAs introduced by coercion whereas unclass(y2) %>% as.numeric gave me the results that I needed.
– Indi
Feb 22 '17 at 19:34
Let me update my scenario in the answer that I had provided
– Indi
Feb 22 '17 at 19:36
3
OK, well that's not the question that was asked above. In this question the factor levels are all "numeric". In your case, as.numeric(y) should have worked just fine, no need for the unclass(). But again, that's not what this question was about. This answer isn't appropriate here.
– MrFlick
Feb 22 '17 at 19:37
3
Well, I really hope it helps someone who was in a hurry like me and read just the title !
– Indi
Feb 22 '17 at 19:45
up vote
7
down vote
It is possible only in the case when the factor labels match the original values. I will explain it with an example.
Assume the data is vector x:
x <- c(20, 10, 30, 20, 10, 40, 10, 40)
Now I will create a factor with four labels:
f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))
1) x has type double, while f has type integer. This is the first unavoidable loss of information: factors are always stored as integers.
> typeof(x)
[1] "double"
> typeof(f)
[1] "integer"
2) It is not possible to revert back to the original values (10, 20, 30, 40) having only f available. We can see that f holds only integer values 1, 2, 3, 4 and two attributes - the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.
> str(f)
Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
> attributes(f)
$levels
[1] "A" "B" "C" "D"
$class
[1] "factor"
To revert back to the original values we have to know the values of levels used in creating the factor. In this case c(10, 20, 30, 40). If we know the original levels (in correct order), we can revert back to the original values.
> orig_levels <- c(10, 20, 30, 40)
> x1 <- orig_levels[f]
> all.equal(x, x1)
[1] TRUE
This works only when labels have been defined for all possible values in the original data.
So if you need the original values, you have to keep them. Otherwise there is a high chance that you cannot recover them from the factor alone.
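One way to keep the original values, sketched here as an assumption rather than as the answer's own method, is a named lookup vector that maps each label to its value. The name lookup is hypothetical.

```r
x <- c(20, 10, 30, 20, 10, 40, 10, 40)

# Hypothetical lookup table: label -> original value.
lookup <- c(A = 10, B = 20, C = 30, D = 40)

# Build the factor from the same table, so labels and values stay in sync.
f <- factor(x, levels = unname(lookup), labels = names(lookup))

# Recover the values by label name rather than by position.
x_back <- unname(lookup[as.character(f)])
all.equal(x, x_back)
```

Indexing by name keeps the recovery correct even if the factor's levels are later reordered, which positional indexing with orig_levels[f] would not survive.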
up vote
0
down vote
You can use hablar::convert if you have a data frame. The syntax is easy:
Sample df
library(hablar)
library(dplyr)
df <- dplyr::tibble(a = as.factor(c("7", "3")),
b = as.factor(c("1.5", "6.3")))
Solution
df %>%
convert(num(a, b))
gives you:
# A tibble: 2 x 2
a b
<dbl> <dbl>
1 7. 1.50
2 3. 6.30
Or if you want one column to be integer and one numeric:
df %>%
convert(int(a),
num(b))
results in:
# A tibble: 2 x 2
a b
<int> <dbl>
1 7 1.50
2 3 6.30
up vote
-1
down vote
Late to the game: by accident, I found that trimws() can convert factor(3:5) to c("3","4","5"). Then you can call as.numeric(). That is:
as.numeric(trimws(x_factor_var))
1
Is there a reason you would recommend using trimws over as.character as described in the accepted answer? It seems to me like unless you actually had whitespace you needed to remove, trimws is just going to do a bunch of unnecessary regular expression work to return the same result.
– MrFlick
Nov 13 at 18:54
protected by Joshua Ulrich Jul 9 '13 at 13:53
edited Jan 17 '16 at 7:51 by Jaap
answered Aug 5 '10 at 19:01 by Joshua Ulrich (accepted answer)
edited Jun 4 '14 at 18:19
MrLore
3,37922033
answered Mar 27 '14 at 23:39
Jealie
4,3382133
up vote
27
down vote
The easiest way would be to use the unfactor function from the varhandle package:
unfactor(your_factor_variable)
This example can be a quick start:
x <- rep(c("a", "b", "c"), 20)
y <- rep(c(1, 1, 0), 20)
class(x) # -> "character"
class(y) # -> "numeric"
x <- factor(x)
y <- factor(y)
class(x) # -> "factor"
class(y) # -> "factor"
library(varhandle)
x <- unfactor(x)
y <- unfactor(y)
class(x) # -> "character"
class(y) # -> "numeric"
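For comparison, the same round trip can be sketched in base R without the extra dependency. This is only an illustration under the assumption that you already know which factor holds text and which holds numbers; the variable names x_chr and y_num are mine.

```r
# Same sample data as in the example above.
x <- factor(rep(c("a", "b", "c"), 20))
y <- factor(rep(c(1, 1, 0), 20))

# Text-valued factor: go back to character.
x_chr <- as.character(x)

# Number-valued factor: convert the level labels, then index by the codes.
y_num <- as.numeric(levels(y))[y]

class(x_chr)
class(y_num)
```

The base R version leaves the choice between character and numeric to the caller, which is the part a wrapper function would otherwise decide for you.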
The unfactor function converts to character data type first and then converts back to numeric. Type unfactor at the console and you can see it in the middle of the function. Therefore it doesn't really give a better solution than what the asker already had.
– CJB
Jan 25 '16 at 9:32
Having said that, the levels of a factor are of character type anyway, so nothing is lost by this approach.
– CJB
Jan 25 '16 at 9:38
The unfactor function takes care of things that cannot be converted to numeric. Check the examples in help("unfactor")
– Mehrad Mahmoudian
Jul 25 '16 at 13:15
2
@Selrac I've mentioned that this function is available in the varhandle package, meaning you should load the package (library("varhandle")) first (as I mentioned in the first line of my answer!!)
– Mehrad Mahmoudian
Sep 29 '16 at 13:06
1
@Gregor adding a light dependency usually does no harm, and of course if you are looking for the most efficient way, writing the code yourself might perform faster. But as you can also see in your comment, this is not trivial, since you also put the as.numeric() and as.character() in the wrong order ;) What your code chunk does is turn the factor's level index into a character matrix, so what you will have at the end is a character vector that contains some numbers that were once assigned to certain levels of your factor. Functions in that package are there to prevent these confusions.
– Mehrad Mahmoudian
Nov 10 '16 at 11:53
edited Jan 25 '16 at 9:14
answered Dec 1 '15 at 14:11
Mehrad Mahmoudian
1,5181726
up vote
15
down vote
Every answer in this post failed to generate results for me; NAs were being generated.
y2 <- factor(c("A","B","C","D","A"))
as.numeric(levels(y2))[y2]
# [1] NA NA NA NA NA
# Warning message: NAs introduced by coercion
What worked for me is this:
as.integer(y2)
# [1] 1 2 3 4 1
Note: this particular answer is not for converting numeric-valued factors to numerics; it is for converting categorical factors to their corresponding level numbers.
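To tell the two situations apart before converting, one option is to test whether every level parses as a number. The sketch below is only an illustration: the helper name levels_are_numeric is hypothetical, and the second factor is the numeric-looking example from the comment thread on this answer.

```r
# Hypothetical helper: TRUE when every level label can be read as a number.
levels_are_numeric <- function(f) {
  stopifnot(is.factor(f))
  parsed <- suppressWarnings(as.numeric(levels(f)))
  !anyNA(parsed)
}

y2 <- factor(c("A", "B", "C", "D", "A"))  # categorical labels
y  <- factor(c("5", "15", "20", "2"))     # labels that look like numbers

levels_are_numeric(y2)     # FALSE: only the level codes are meaningful
as.integer(y2)             # 1 2 3 4 1

levels_are_numeric(y)      # TRUE: the labels carry the values
as.integer(y)              # 4 1 3 2 (codes, not values)
as.numeric(levels(y))[y]   # 5 15 20 2
```

Note that a factor with a level spelled "NA" would also be reported as non-numeric by this check.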
Are you sure you had a factor? Look at this example. y<-factor(c("5","15","20","2")); unclass(y) %>% as.numeric This returns 4,1,3,2, not 5,15,20,2. This seems like incorrect information.
– MrFlick
Feb 22 '17 at 19:19
Ok, this is similar to what I was trying to do today :- y2<-factor(c("A","B","C","D","A")); as.numeric(levels(y2))[y2] [1] NA NA NA NA NA Warning message: NAs introduced by coercion whereas unclass(y2) %>% as.numeric gave me the results that I needed.
– Indi
Feb 22 '17 at 19:34
Let me update my scenario in the answer that I had provided
– Indi
Feb 22 '17 at 19:36
3
OK, well that's not the question that was asked above. In this question the factor levels are all "numeric". In your case, as.numeric(y) should have worked just fine, no need for the unclass(). But again, that's not what this question was about. This answer isn't appropriate here.
– MrFlick
Feb 22 '17 at 19:37
3
Well, I really hope it helps someone who was in a hurry like me and read just the title!
– Indi
Feb 22 '17 at 19:45
edited Jun 1 at 15:13
Gregor
62.4k988163
answered Feb 22 '17 at 18:26
Indi
762523
up vote
7
down vote
This is possible only when the factor labels match the original values. I will explain with an example.
Assume the data is vector x:
x <- c(20, 10, 30, 20, 10, 40, 10, 40)
Now I will create a factor with four labels:
f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))
1) x has type double, while f has type integer. This is the first unavoidable loss of information: factors are always stored as integers.
> typeof(x)
[1] "double"
> typeof(f)
[1] "integer"
2) It is not possible to revert back to the original values (10, 20, 30, 40) having only f available. We can see that f holds only integer values 1, 2, 3, 4 and two attributes - the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.
> str(f)
Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
> attributes(f)
$levels
[1] "A" "B" "C" "D"
$class
[1] "factor"
To revert back to the original values we have to know the values of levels used in creating the factor. In this case c(10, 20, 30, 40). If we know the original levels (in correct order), we can revert back to the original values.
> orig_levels <- c(10, 20, 30, 40)
> x1 <- orig_levels[f]
> all.equal(x, x1)
[1] TRUE
This works only when labels have been defined for all possible values in the original data.
So if you need the original values, you have to keep them. Otherwise there is a high chance that you will not be able to recover them from the factor alone.
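The two situations can be put side by side in a short sketch. The named vector lookup is an assumption of mine for illustration: it is one possible way of keeping the original values next to their labels, not something the factor stores for you.

```r
x <- c(20, 10, 30, 20, 10, 40, 10, 40)

# Case 1: default labels. The labels are the values written as text,
# so the factor alone is enough to get the numbers back.
f_default <- factor(x)
x_back <- as.numeric(levels(f_default))[f_default]
identical(x, x_back)

# Case 2: custom labels. Keep an explicit label -> value mapping around.
lookup <- c(A = 10, B = 20, C = 30, D = 40)
f <- factor(x, levels = unname(lookup), labels = names(lookup))

# Index the mapping by label, then drop the names it carries along.
x_back2 <- unname(lookup[as.character(f)])
identical(x, x_back2)
```

Indexing by label rather than by position means the mapping does not have to be stored in level order.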
answered Oct 9 '15 at 12:34
djhurio
4,18021941
up vote
0
down vote
You can use hablar::convert if you have a data frame. The syntax is easy:
Sample df
library(hablar)
library(dplyr)
df <- dplyr::tibble(a = as.factor(c("7", "3")),
                    b = as.factor(c("1.5", "6.3")))
Solution
df %>%
  convert(num(a, b))
gives you:
# A tibble: 2 x 2
      a     b
  <dbl> <dbl>
1    7.  1.50
2    3.  6.30
Or if you want one column to be integer and one numeric:
df %>%
  convert(int(a),
          num(b))
results in:
# A tibble: 2 x 2
      a     b
  <int> <dbl>
1     7  1.50
2     3  6.30
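If you would rather not add a package, roughly the same column-wise conversion can be sketched in base R. This is an illustration only: it assumes a plain data.frame and that every factor column holds numeric-looking labels, which is not checked here.

```r
# Same sample values as above, in a plain data.frame.
df <- data.frame(a = factor(c("7", "3")),
                 b = factor(c("1.5", "6.3")))

# Find the factor columns, then convert each one via its level labels.
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], function(col) as.numeric(levels(col))[col])

str(df)
```

To get an integer column instead, as.integer() could be used in place of as.numeric() inside the anonymous function.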
answered Nov 1 at 10:05
davsjob
up vote
-1
down vote
Late to the game: I accidentally found that trimws() can convert factor(3:5) to c("3", "4", "5"). You can then call as.numeric() on the result. That is:
as.numeric(trimws(x_factor_var))
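A quick check, as a sketch, that this gives the same result as converting through as.character() or through the levels. The variable names via_trimws, via_character and via_levels are made up for illustration:

```r
x_factor_var <- factor(3:5)

# trimws() coerces the factor to its labels before stripping whitespace
via_trimws <- as.numeric(trimws(x_factor_var))

# Direct conversion through the labels
via_character <- as.numeric(as.character(x_factor_var))

# Convert the levels once, then index by the factor's codes
via_levels <- as.numeric(levels(x_factor_var))[x_factor_var]

stopifnot(identical(via_trimws, via_character),
          identical(via_character, via_levels))

# For contrast, this returns the integer codes, not the values
as.numeric(x_factor_var)
```

All three routes go through the character labels, so trimws() only adds something when the labels actually contain leading or trailing whitespace.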
answered Nov 13 at 2:37
Jerry T
1
Is there a reason you would recommend using trimws over as.character as described in the accepted answer? It seems to me like unless you actually had whitespace you needed to remove, trimws is just going to do a bunch of unnecessary regular expression work to return the same result.
– MrFlick
Nov 13 at 18:54
protected by Joshua Ulrich Jul 9 '13 at 13:53
1
The levels of a factor are stored as character data type anyway (attributes(f)), so I don't think there is anything wrong with as.numeric(paste(f)). Perhaps it would be better to think about why (in the specific context) you are getting a factor in the first place, and try to stop that. E.g., is the dec argument in read.table set correctly?
– CJB
Jan 25 '16 at 9:44
If you use a data frame, you can use convert from hablar: df %>% convert(num(column)). Or, if you have a factor vector, you can use as_reliable_num(factor_vector).
– davsjob
Nov 1 at 9:53