How to convert a factor to integer/numeric without loss of information?











When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.



f <- factor(sample(runif(5), 20, replace = TRUE))
f
## [1] 0.0248644019011408 0.0248644019011408 0.179684827337041
## [4] 0.0284090070053935 0.363644931698218 0.363644931698218
## [7] 0.179684827337041 0.249704354675487 0.249704354675487
## [10] 0.0248644019011408 0.249704354675487 0.0284090070053935
## [13] 0.179684827337041 0.0248644019011408 0.179684827337041
## [16] 0.363644931698218 0.249704354675487 0.363644931698218
## [19] 0.179684827337041 0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
## [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
## [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2


I have to resort to paste to get the real values:



as.numeric(paste(f))
## [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
## [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901


Is there a better way to convert a factor to numeric?










  • The levels of a factor are stored as character data type anyway (attributes(f)), so I don't think there is anything wrong with as.numeric(paste(f)). Perhaps it would be better to think about why (in the specific context) you are getting a factor in the first place, and try to stop that. E.g., is the dec argument in read.table set correctly?
    – CJB
    Jan 25 '16 at 9:44












  • If you use a data frame, you can use convert() from hablar: df %>% convert(num(column)). Or, if you have a factor vector, you can use as_reliable_num(factor_vector).
    – davsjob
    Nov 1 at 9:53
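To illustrate CJB's point about fixing the problem upstream, here is a small sketch with made-up data that uses a comma as the decimal separator. It assumes R 4.0.0 or later, where stringsAsFactors defaults to FALSE, so the mis-read column comes back as character (older versions would give a factor).

```r
# Made-up data that uses a comma as the decimal separator
txt <- "a\n1,5\n2,25"

# Default dec = ".": "1,5" does not parse as a number, so the column stays text
bad <- read.table(text = txt, header = TRUE)

# Telling read.table about the separator yields a numeric column directly
good <- read.table(text = txt, header = TRUE, dec = ",")

class(bad$a)   # "character" in R >= 4.0.0
class(good$a)  # "numeric"
good$a
```

With the right dec value there is no factor (or string) to convert afterwards.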















r casting r-faq






edited Apr 1 at 11:06 by Jaap
asked Aug 5 '10 at 18:53 by Adam SO








7 Answers
accepted










See the Warning section of ?factor:




In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).




The FAQ on R has similar advice.





Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?



as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so the string-to-number conversion is performed on length(f) values rather than on nlevels(f) values. The speed difference is most apparent for long vectors with few levels; if the values are mostly unique, there won't be much difference. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
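A small sketch of the two routes, assuming a factor built the same way as in the question (the seed is arbitrary and only there for reproducibility). Both should give identical numbers; the first converts only the nlevels(f) level strings and then indexes that short vector by the factor's integer codes.

```r
set.seed(42)
f <- factor(sample(runif(5), 20, replace = TRUE))

# Route 1: convert the nlevels(f) level strings once, then index by the codes
lev_num <- as.numeric(levels(f))
via_levels <- lev_num[f]   # indexing by a factor uses its integer codes

# Route 2: expand to length(f) strings first, then convert every element
via_chars <- as.numeric(as.character(f))

identical(via_levels, via_chars)  # TRUE
```

Both routes parse the same level strings, which is also why the help page says "approximately": the levels were created with as.character(), so very long decimals may not round-trip exactly.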





Some timings



library(microbenchmark)
x <- f  # assumed: the paste() rows were timed on the same factor as f
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                        expr   min    lq      mean median     uq      max neval
##    as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##    as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
## as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                   paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                    paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05





  • For timings see this answer: stackoverflow.com/questions/6979625/…
    – Ari B. Friedman
    Aug 8 '11 at 11:27






  • Many thanks for your solution. Can I ask why as.numeric(levels(f))[f] is more precise and faster? Thanks.
    – Sam
    Apr 18 '14 at 0:25






  • @Sam as.character(f) requires a "primitive lookup" to find the function as.character.factor(), which is essentially levels(f)[f]; as.numeric() then has to convert every one of the length(f) resulting strings.
    – Jonathan
    Jun 27 '14 at 19:12








  • When I apply as.numeric(levels(f))[f] or as.numeric(as.character(f)), I get the warning "NAs introduced by coercion". Do you know what the problem could be? Thank you!
    – maycca
    Apr 13 '16 at 21:23










  • @maycca did you overcome this issue?
    – user08041991
    Jan 31 '17 at 12:25































R has a number of (undocumented) convenience functions for converting factors:




  • as.character.factor

  • as.data.frame.factor

  • as.Date.factor

  • as.list.factor

  • as.vector.factor

  • ...


But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I suggest overcoming this omission by defining your own idiomatic function:



as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}


that you can store at the beginning of your script, or even better in your .Rprofile file.
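Following Joshua Ulrich's caution in the comments below about registering such a function as an S3 method, here is a sketch of the same idea under a plain, hypothetical name (factor_to_numeric is not part of base R or any package), with a guard against non-factor input. Missing values pass through as NA.

```r
# Hypothetical helper; the name avoids looking like an as.numeric() S3 method
factor_to_numeric <- function(x) {
  stopifnot(is.factor(x))        # refuse plain vectors instead of guessing
  as.numeric(levels(x))[x]       # convert the levels once, index by the codes
}

f <- factor(c("10", "2.5", "10", NA))
factor_to_numeric(f)  # returns c(10, 2.5, 10, NA)
```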






  • There's nothing to handle the factor-to-integer (or numeric) conversion because it's expected that as.integer(factor) returns the underlying integer codes (as shown in the examples section of ?factor). It's probably okay to define this function in your global environment, but you might cause problems if you actually register it as an S3 method.
    – Joshua Ulrich
    Apr 18 '14 at 12:03










  • That's a good point and I agree: a complete redefinition of the factor->numeric conversion is likely to mess up a lot of things. I found myself writing the cumbersome factor->numeric conversion a lot before realizing that it is in fact a shortcoming of R: some convenience function should be available... Calling it as.numeric.factor makes sense to me, but YMMV.
    – Jealie
    Apr 18 '14 at 20:11






  • If you find yourself doing that a lot, then you should do something upstream to avoid it altogether.
    – Joshua Ulrich
    Apr 18 '14 at 22:44










  • as.numeric.factor returns NA?
    – jO.
    Aug 8 '14 at 7:56










  • @jO.: in the cases where you used something like v=NA;as.numeric.factor(v) or v='something';as.numeric.factor(v), then it should, otherwise you have a weird thing going on somewhere.
    – Jealie
    Aug 8 '14 at 14:43

































The easiest way is to use the unfactor function from the varhandle package:



unfactor(your_factor_variable)


This example can be a quick start:



x <- rep(c("a", "b", "c"), 20)
y <- rep(c(1, 1, 0), 20)

class(x) # -> "character"
class(y) # -> "numeric"

x <- factor(x)
y <- factor(y)

class(x) # -> "factor"
class(y) # -> "factor"

library(varhandle)
x <- unfactor(x)
y <- unfactor(y)

class(x) # -> "character"
class(y) # -> "numeric"
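If you would rather not add a dependency, a rough base-R approximation of the round trip through character that CJB describes in the comments below is type.convert() applied to as.character(). This is a sketch, not a reimplementation of unfactor: one difference is that type.convert() returns integer rather than double for whole-number values, so an extra as.numeric() is needed where a double is wanted.

```r
x <- factor(rep(c("a", "b", "c"), 20))
y <- factor(rep(c(1, 1, 0), 20))

# Convert the labels back to the most specific type they allow
x2 <- type.convert(as.character(x), as.is = TRUE)
y2 <- type.convert(as.character(y), as.is = TRUE)

class(x2)  # "character"
class(y2)  # "integer" (whole numbers), not "numeric"

y3 <- as.numeric(y2)  # double, if that is what you need
```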





  • The unfactor function converts to character data type first and then converts back to numeric. Type unfactor at the console and you can see it in the middle of the function. Therefore it doesn't really give a better solution than what the asker already had.
    – CJB
    Jan 25 '16 at 9:32










  • Having said that, the levels of a factor are of character type anyway, so nothing is lost by this approach.
    – CJB
    Jan 25 '16 at 9:38










  • The unfactor function takes care of things that cannot be converted to numeric. Check the examples in help("unfactor")
    – Mehrad Mahmoudian
    Jul 25 '16 at 13:15






  • @Selrac I've mentioned that this function is available in the varhandle package, meaning you should load the package (library("varhandle")) first (as I mentioned in the first line of my answer!)
    – Mehrad Mahmoudian
    Sep 29 '16 at 13:06








  • @Gregor adding a light dependency usually does no harm, and of course if you are looking for the most efficient way, writing the code yourself might perform faster. But as your comment shows, this is not trivial: you put as.numeric() and as.character() in the wrong order ;) What your code chunk does is turn the factor's level indices into a character vector, so what you end up with is a character vector containing the numbers that were once assigned to certain levels of your factor. Functions in that package are there to prevent these confusions.
    – Mehrad Mahmoudian
    Nov 10 '16 at 11:53































Every answer in this post failed to give results for me; NAs were generated instead:



y2 <- factor(c("A", "B", "C", "D", "A"))
as.numeric(levels(y2))[y2]
# [1] NA NA NA NA NA
# Warning message:
# NAs introduced by coercion


What worked for me is this:



as.integer(y2)
# [1] 1 2 3 4 1


Note: this particular answer is not for converting numeric-valued factors to numerics, it is for converting categorical factors to their corresponding level numbers.
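To make the distinction in the note concrete, here is MrFlick's example from the comments below, extended slightly: with numeric-looking labels, as.integer() returns the level codes, while the idiom from the accepted answer returns the values. The levels are sorted as character strings, which is why the codes look scrambled.

```r
y <- factor(c("5", "15", "20", "2"))

levels(y)                 # "15" "2" "20" "5" (sorted as strings)
as.integer(y)             # 4 1 3 2: positions in levels(y), not values
as.numeric(levels(y))[y]  # 5 15 20 2: the values themselves
```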






  • Are you sure you had a factor? Look at this example: y<-factor(c("5","15","20","2")); unclass(y) %>% as.numeric returns 4,1,3,2, not 5,15,20,2. This seems like incorrect information.
    – MrFlick
    Feb 22 '17 at 19:19










  • OK, this is similar to what I was trying to do today: with y2<-factor(c("A","B","C","D","A")), as.numeric(levels(y2))[y2] gave [1] NA NA NA NA NA and the warning "NAs introduced by coercion", whereas unclass(y2) %>% as.numeric gave me the results that I needed.
    – Indi
    Feb 22 '17 at 19:34










  • Let me update my scenario in the answer that I had provided
    – Indi
    Feb 22 '17 at 19:36






  • OK, well that's not the question that was asked above. In this question the factor levels are all "numeric". In your case, as.numeric(y) should have worked just fine, with no need for unclass(). But again, that's not what this question was about. This answer isn't appropriate here.
    – MrFlick
    Feb 22 '17 at 19:37






  • Well, I really hope it helps someone who was in a hurry like me and read just the title!
    – Indi
    Feb 22 '17 at 19:45































It is possible only when the factor labels match the original values. I will explain with an example.



Assume the data is vector x:



x <- c(20, 10, 30, 20, 10, 40, 10, 40)


Now I will create a factor with four labels:



f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))


1) x has type double, while f has type integer. This is the first unavoidable loss of information: factors are always stored as integers.



> typeof(x)
[1] "double"
> typeof(f)
[1] "integer"


2) It is not possible to revert to the original values (10, 20, 30, 40) with only f available. We can see that f holds only the integer values 1, 2, 3, 4 and two attributes: the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.



> str(f)
Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
> attributes(f)
$levels
[1] "A" "B" "C" "D"

$class
[1] "factor"


To revert to the original values, we have to know the levels used to create the factor, in this case c(10, 20, 30, 40). If we know the original levels (in the correct order), we can recover the original values.



> orig_levels <- c(10, 20, 30, 40)
> x1 <- orig_levels[f]
> all.equal(x, x1)
[1] TRUE


This works only if levels were defined for every value present in the original data; any value missing from levels is stored as NA.



So if you will need the original values, keep them. Otherwise there is a high chance you will not be able to recover them from the factor alone.
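Continuing the example above, a short sketch of both failure modes: with letter labels the accepted answer's idiom yields only NA, and a value that was never listed in levels becomes NA when the factor is created, so no lookup can bring it back. The value 50 below is made up for illustration.

```r
x <- c(20, 10, 30, 20, 10, 40, 10, 40)
f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))

# The labels "A".."D" are not numbers, so this gives only NA
suppressWarnings(as.numeric(levels(f))[f])

# With the original levels kept, indexing by the codes restores x
orig_levels <- c(10, 20, 30, 40)
orig_levels[f]

# A value outside `levels` is stored as NA and is lost for good
g <- factor(c(10, 50), levels = c(10, 20, 30, 40))
orig_levels[g]  # 10 NA
```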



















    You can use hablar::convert if you have a data frame. The syntax is easy:



    Sample df



    library(hablar)
    library(dplyr)

    df <- dplyr::tibble(a = as.factor(c("7", "3")),
    b = as.factor(c("1.5", "6.3")))


    Solution



    df %>% 
    convert(num(a, b))


    gives you:



    # A tibble: 2 x 2
    a b
    <dbl> <dbl>
    1 7. 1.50
    2 3. 6.30


    Or if you want one column to be integer and one numeric:



    df %>% 
    convert(int(a),
    num(b))


    results in:



    # A tibble: 2 x 2
    a b
    <int> <dbl>
    1 7 1.50
    2 3 6.30
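For comparison, a base-R sketch of the same column conversion without extra packages, applying the accepted answer's idiom to every factor column of a plain data frame. The sample data mirrors the df above; factor_cols is just an illustrative name.

```r
df <- data.frame(a = factor(c("7", "3")),
                 b = factor(c("1.5", "6.3")))

# Convert each factor column via its levels, leaving other columns untouched
factor_cols <- vapply(df, is.factor, logical(1))
df[factor_cols] <- lapply(df[factor_cols],
                          function(col) as.numeric(levels(col))[col])

str(df)
```

Unlike hablar's int() and num(), this sketch always produces doubles; wrap the result in as.integer() for an integer column.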


















      Late to the game, but I accidentally found that trimws() converts factor(3:5) to c("3","4","5"). You can then call as.numeric(). That is:



      as.numeric(trimws(x_factor_var))
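A quick check of why this works, as a sketch: trimws() hands its input to sub(), which turns a factor into its labels, so the result should match the accepted answer's as.character() route. As MrFlick notes below, the regular-expression step is extra work when there is no whitespace to strip.

```r
f <- factor(c("3", "4", "5", "4"))

via_trimws <- as.numeric(trimws(f))
via_char <- as.numeric(as.character(f))

identical(via_trimws, via_char)  # TRUE
```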





      • Is there a reason you would recommend using trimws over as.character as described in the accepted answer? It seems to me like unless you actually had whitespace you needed to remove, trimws is just going to do a bunch of unnecessary regular expression work to return the same result.
        – MrFlick
        Nov 13 at 18:54










      protected by Joshua Ulrich Jul 9 '13 at 13:53

















      Accepted answer: edited Jan 17 '16 at 7:51 by Jaap, answered Aug 5 '10 at 19:01 by Joshua Ulrich








      • 2




        For timings see this answer: stackoverflow.com/questions/6979625/…
        – Ari B. Friedman
        Aug 8 '11 at 11:27






      • 2




        Many thanks for your solution. Can I ask why the as.numeric(levels(f))[f] is more precise and faster? Thanks.
        – Sam
        Apr 18 '14 at 0:25






      • 6




        @Sam as.character(f) requires a "primitive lookup" to find the function as.character.factor(), which is defined as as.numeric(levels(f))[f].
        – Jonathan
        Jun 27 '14 at 19:12








      • 8




        when apply as.numeric(levels(f))[f] OR as.numeric(as.character(f)), I have an warning msg: Warning message:NAs introduced by coercion. Do you know where the problem could be? thank you !
        – maycca
        Apr 13 '16 at 21:23










      • @maycca did you overcame this issue?
        – user08041991
        Jan 31 '17 at 12:25


























      up vote
      72
      down vote













      R has a number of (undocumented) convenience functions for converting factors:




      • as.character.factor

      • as.data.frame.factor

      • as.Date.factor

      • as.list.factor

      • as.vector.factor

      • ...


      But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I would suggest overcoming this omission by defining your own idiomatic function:



      as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}


      You can store it at the beginning of your script or, even better, in your .Rprofile file.
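As a quick check of the as.numeric.factor helper (the factor here is made up for illustration), it returns the same values as the as.numeric(as.character(f)) approach, while as.numeric(f) alone returns the level codes:

```r
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}

f <- factor(c("0.5", "10", "0.5", "2"))

codes   <- as.numeric(f)                # level codes, not the values
values  <- as.numeric.factor(f)         # parses each level once, then indexes by the codes
via_chr <- as.numeric(as.character(f))  # parses every element

print(values)
identical(values, via_chr)
```

Indexing a vector by a factor uses the factor's integer codes, which is what makes levels(x)[x]-style lookups work.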














      edited Jun 4 '14 at 18:19 by MrLore

      answered Mar 27 '14 at 23:39 by Jealie








      • There's nothing to handle the factor-to-integer (or numeric) conversion because it's expected that as.integer(factor) returns the underlying integer codes (as shown in the examples section of ?factor). It's probably okay to define this function in your global environment, but you might cause problems if you actually register it as an S3 method.
        – Joshua Ulrich
        Apr 18 '14 at 12:03










      • That's a good point and I agree: a complete redefinition of the factor->numeric conversion is likely to mess up a lot of things. I found myself writing the cumbersome factor->numeric conversion a lot before realizing that it is in fact a shortcoming of R: some convenience function should be available. Calling it as.numeric.factor makes sense to me, but YMMV.
        – Jealie
        Apr 18 '14 at 20:11






      • If you find yourself doing that a lot, then you should do something upstream to avoid it altogether.
        – Joshua Ulrich
        Apr 18 '14 at 22:44










      • as.numeric.factor returns NA?
        – jO.
        Aug 8 '14 at 7:56










      • @jO.: It should in cases like v = NA; as.numeric.factor(v) or v = 'something'; as.numeric.factor(v). Otherwise, something weird is going on somewhere.
        – Jealie
        Aug 8 '14 at 14:43
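Following the advice to fix things upstream (and the earlier hint about the dec argument of read.table), here is a minimal sketch with made-up data in which comma decimals keep a column from being read as numeric, and dec = "," fixes it at import time:

```r
# Made-up semicolon-separated data that uses a decimal comma
txt <- "id;value\na;1,5\nb;2,25"

# Without dec = ",", "1,5" is not a valid number, so the column stays non-numeric
# (character in R >= 4.0.0; a factor in older versions, where stringsAsFactors defaulted to TRUE)
bad <- read.table(text = txt, sep = ";", header = TRUE)

# Declaring the decimal separator yields a numeric column directly
good <- read.table(text = txt, sep = ";", header = TRUE, dec = ",")

str(bad$value)
str(good$value)
```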




























      up vote
      27
      down vote













      The easiest way is to use the unfactor function from the varhandle package:



      unfactor(your_factor_variable)


      This example can be a quick start:



      x <- rep(c("a", "b", "c"), 20)
      y <- rep(c(1, 1, 0), 20)

      class(x) # -> "character"
      class(y) # -> "numeric"

      x <- factor(x)
      y <- factor(y)

      class(x) # -> "factor"
      class(y) # -> "factor"

      library(varhandle)
      x <- unfactor(x)
      y <- unfactor(y)

      class(x) # -> "character"
      class(y) # -> "numeric"
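For comparison, the same conversion can be done in base R without the extra dependency. This sketch reuses the toy vectors from the example above and relies only on as.character() and the levels-indexing idiom:

```r
x <- factor(rep(c("a", "b", "c"), 20))
y <- factor(rep(c(1, 1, 0), 20))

# Character levels: as.character() returns the labels
x_chr <- as.character(x)

# Numeric-looking levels: parse the levels once, then index by the factor codes
y_num <- as.numeric(levels(y))[y]

class(x_chr)
class(y_num)
```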













      edited Jan 25 '16 at 9:14

      answered Dec 1 '15 at 14:11 by Mehrad Mahmoudian












      • The unfactor function converts to character data type first and then converts back to numeric. Type unfactor at the console and you can see it in the middle of the function. Therefore it doesn't really give a better solution than what the asker already had.
        – CJB
        Jan 25 '16 at 9:32










      • Having said that, the levels of a factor are of character type anyway, so nothing is lost by this approach.
        – CJB
        Jan 25 '16 at 9:38










      • The unfactor function takes care of things that cannot be converted to numeric. Check the examples in help("unfactor")
        – Mehrad Mahmoudian
        Jul 25 '16 at 13:15






      • @Selrac As I mentioned in the first line of my answer, this function is in the varhandle package, so you need to load the package first with library("varhandle").
        – Mehrad Mahmoudian
        Sep 29 '16 at 13:06








      • @Gregor Adding a light dependency usually does no harm, and of course, if you are looking for the most efficient way, writing the code yourself might be faster. But as your comment shows, this is not trivial, since you also put as.numeric() and as.character() in the wrong order ;) What your code chunk does is turn the factor's level indices into characters, so what you end up with is a character vector containing the numbers that were once assigned to certain levels of your factor. Functions in that package are there to prevent these confusions.
        – Mehrad Mahmoudian
        Nov 10 '16 at 11:53
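To illustrate the point about levels that cannot be converted to numeric, here is a hypothetical base-R helper. factor_to_vector is a made-up name, and this is not varhandle's actual implementation. It returns numbers only when every level parses, and the character labels otherwise:

```r
# Hypothetical helper, not part of any package
factor_to_vector <- function(x) {
  lev <- levels(x)
  num <- suppressWarnings(as.numeric(lev))  # NA for levels that are not numbers
  if (anyNA(num)) {
    lev[x]   # at least one level is non-numeric: keep the character labels
  } else {
    num[x]   # all levels are numeric: index the parsed levels by the codes
  }
}

factor_to_vector(factor(c("1", "0", "1")))
factor_to_vector(factor(c("a", "b", "a")))
```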




























      up vote
      15
      down vote













      Every answer in this post failed for me; NAs were generated instead.



      y2 <- factor(c("A", "B", "C", "D", "A"))
      as.numeric(levels(y2))[y2]
      # [1] NA NA NA NA NA
      # Warning message:
      # NAs introduced by coercion


      What worked for me was this:



      as.integer(y2)
      # [1] 1 2 3 4 1


      Note: this answer is not about converting numeric-valued factors to numbers; it converts categorical factors to their corresponding level codes.
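Because as.integer() on a categorical factor returns positions in levels(), which default to sorted order, the codes you get depend on the level order. A small sketch with the same toy data shows that you can set the levels explicitly if you need a particular numbering:

```r
y2 <- factor(c("A", "B", "C", "D", "A"))
as.integer(y2)      # codes follow the default (sorted) levels A, B, C, D

# An explicit level order changes the codes
y3 <- factor(c("A", "B", "C", "D", "A"), levels = c("D", "C", "B", "A"))
as.integer(y3)

# levels() maps the codes back to the labels
levels(y3)[as.integer(y3)]
```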














      edited Jun 1 at 15:13 by Gregor

      answered Feb 22 '17 at 18:26 by Indi












      • Are you sure you had a factor? Look at this example: y <- factor(c("5","15","20","2")); unclass(y) %>% as.numeric. This returns 4,1,3,2, not 5,15,20,2. This seems like incorrect information.
        – MrFlick
        Feb 22 '17 at 19:19










      • OK, this is similar to what I was trying to do today: y2 <- factor(c("A","B","C","D","A")); as.numeric(levels(y2))[y2] gives [1] NA NA NA NA NA with "Warning message: NAs introduced by coercion", whereas unclass(y2) %>% as.numeric gave me the results I needed.
        – Indi
        Feb 22 '17 at 19:34










      • Let me update my answer with my scenario.
        – Indi
        Feb 22 '17 at 19:36






      • OK, well, that's not the question that was asked above. In this question the factor levels are all "numeric". In your case, as.numeric(y) should have worked just fine; there is no need for unclass(). But again, that's not what this question is about. This answer isn't appropriate here.
        – MrFlick
        Feb 22 '17 at 19:37






      • Well, I really hope it helps someone who was in a hurry like me and read just the title!
        – Indi
        Feb 22 '17 at 19:45
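MrFlick's example can be checked directly. This sketch contrasts the level codes with the recovered values for a factor whose levels look numeric:

```r
y <- factor(c("5", "15", "20", "2"))

levels(y)                   # sorted as strings, not as numbers
as.numeric(y)               # level codes: 4 1 3 2
as.numeric(levels(y))[y]    # original values: 5 15 20 2
```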






















      Ok, this is similar to what I was trying to do today :- y2<-factor(c("A","B","C","D","A")); as.numeric(levels(y2))[y2] [1] NA NA NA NA NA Warning message: NAs introduced by coercion whereas unclass(y2) %>% as.numeric gave me the results that I needed.
      – Indi
      Feb 22 '17 at 19:34












      Let me update my scenario in the answer that I had provided
      – Indi
      Feb 22 '17 at 19:36




      Let me update my scenario in the answer that I had provided
      – Indi
      Feb 22 '17 at 19:36




      3




      3




      OK, well that's not the question that was asked above. In this question the factor levels are all "numeric". In your case , as.numeric(y) should have worked just fine, no need for the unclass(). But again, that's not what this question was about. This answer isn't appropriate here.
      – MrFlick
      Feb 22 '17 at 19:37




      OK, well that's not the question that was asked above. In this question the factor levels are all "numeric". In your case , as.numeric(y) should have worked just fine, no need for the unclass(). But again, that's not what this question was about. This answer isn't appropriate here.
      – MrFlick
      Feb 22 '17 at 19:37




      3




      3




      Well, I really hope it helps someone who was in a hurry like me and read just the title !
      – Indi
      Feb 22 '17 at 19:45




      Well, I really hope it helps someone who was in a hurry like me and read just the title !
      – Indi
      Feb 22 '17 at 19:45























      It is possible only when the factor labels match the original values. I will explain with an example.



      Assume the data is the vector x:



      x <- c(20, 10, 30, 20, 10, 40, 10, 40)


      Now I will create a factor with four labels:



      f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))


      1) x has type double, while f has type integer. This is the first unavoidable loss of information: factors are always stored as integers.



      > typeof(x)
      [1] "double"
      > typeof(f)
      [1] "integer"


      2) Given only f, it is not possible to revert to the original values (10, 20, 30, 40). f holds only the integer codes 1, 2, 3, 4 and two attributes: the levels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.



      > str(f)
      Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
      > attributes(f)
      $levels
      [1] "A" "B" "C" "D"

      $class
      [1] "factor"


      To revert to the original values, we have to know the level values used when creating the factor, in this case c(10, 20, 30, 40). If we know the original levels (in the correct order), we can recover the original values:



      > orig_levels <- c(10, 20, 30, 40)
      > x1 <- orig_levels[f]
      > all.equal(x, x1)
      [1] TRUE


      This works only if levels were defined for all values present in the original data; any value not listed in levels becomes NA in the factor and cannot be recovered.
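A quick way to see why the levels must cover every value is to leave one out. In the sketch below, the names `x2`, `f2`, and `orig_levels2` and the shortened level set are hypothetical. The value 40 is missing from `levels`, so `factor()` stores it as `NA`, and indexing the original levels cannot bring it back.

```r
x2 <- c(20, 10, 30, 20, 10, 40, 10, 40)

# 40 is deliberately left out of levels/labels
f2 <- factor(x2, levels = c(10, 20, 30), labels = c("A", "B", "C"))
print(f2)  # the two 40s are now NA

orig_levels2 <- c(10, 20, 30)
x2_back <- orig_levels2[f2]  # factor index = integer codes; NA code gives NA
print(x2_back)               # 20 10 30 20 10 NA 10 NA

# The round trip fails: the information about 40 is gone
print(isTRUE(all.equal(x2, x2_back)))  # FALSE
```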



      So if you will need the original values, keep them. Otherwise you will likely be unable to get them back from the factor alone.
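For contrast with the first sentence of this answer, consider the case where the labels *do* match the original values. A factor built without `labels` has the distinct values themselves as its levels, stored as character strings, so they can be converted back. This is a minimal sketch; `f_plain` and `x_back` are illustrative names. Because levels are stored as text, doubles with many significant digits may not round-trip exactly.

```r
x <- c(20, 10, 30, 20, 10, 40, 10, 40)

# No labels: the levels are the distinct values, stored as strings
f_plain <- factor(x)
print(levels(f_plain))  # "10" "20" "30" "40"

# Convert each level once, then index by the factor's integer codes
x_back <- as.numeric(levels(f_plain))[f_plain]
print(all.equal(x, x_back))  # TRUE
```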
















          answered Oct 9 '15 at 12:34









          djhurio

          4,18021941







































              You can use hablar::convert if you have a data frame. The syntax is easy:



              Sample df



              library(hablar)
              library(dplyr)

              df <- dplyr::tibble(a = as.factor(c("7", "3")),
                                  b = as.factor(c("1.5", "6.3")))


              Solution



              df %>%
                convert(num(a, b))


              gives you:



              # A tibble: 2 x 2
                    a     b
                <dbl> <dbl>
              1    7.  1.50
              2    3.  6.30


              Or, if you want one column to be integer and the other numeric:



              df %>%
                convert(int(a),
                        num(b))


              results in:



              # A tibble: 2 x 2
                    a     b
                <int> <dbl>
              1     7  1.50
              2     3  6.30
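If you would rather not add a dependency, below is a base-R sketch of the integer/numeric variant. It is not part of hablar. It uses a plain `data.frame` as a stand-in for the tibble and applies the `as.numeric(as.character())` idiom to each column.

```r
# Same sample data as above, as a base data.frame with explicit factor columns
df <- data.frame(a = factor(c("7", "3")),
                 b = factor(c("1.5", "6.3")))

# Convert through the labels, never through the internal codes
df$a <- as.integer(as.character(df$a))
df$b <- as.numeric(as.character(df$b))

str(df)  # a is now integer, b is now numeric (double)
```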















                  answered Nov 1 at 10:05









                  davsjob

                  50726







































                      Late to the game, but I accidentally found that trimws() converts factor(3:5) to c("3","4","5"). You can then call as.numeric(). That is:



                      as.numeric(trimws(x_factor_var))
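`trimws()` appears to work here only because it coerces its input to character before stripping whitespace. `as.character()` performs that coercion directly. This small sketch uses a hypothetical `x_factor_var` to check that both routes read the labels rather than the codes and give the same numbers.

```r
x_factor_var <- factor(3:5)  # codes are 1 2 3, labels are "3" "4" "5"

via_trimws <- as.numeric(trimws(x_factor_var))
via_char   <- as.numeric(as.character(x_factor_var))

# Both routes go through the labels, so they agree
print(via_trimws)                        # 3 4 5
print(identical(via_trimws, via_char))   # TRUE
```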






















                      • Is there a reason you would recommend using trimws over as.character as described in the accepted answer? It seems to me that unless you actually had whitespace to remove, trimws is just going to do a bunch of unnecessary regular-expression work to return the same result.
                        – MrFlick
                        Nov 13 at 18:54

























                      answered Nov 13 at 2:37









                      Jerry T

                      591710

















                      protected by Joshua Ulrich Jul 9 '13 at 13:53



                      Thank you for your interest in this question.
                      Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).





