Create loop to calculate percentage under conditions and fill in results











up vote
0
down vote

favorite












I have tried to find some information, but did not really find what I was looking for.



This is my dataframe full.data (excerpt)



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....


I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.



The result would look like this



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%


I have not posted what I have done so far, as I did not manage doing a lot....



Thanks for help of any kind.



Nordsee










share|improve this question
























  • Please always provide a minimal data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
    – R Yoda
    Nov 18 at 14:43










  • Contains "emissions" a percentage or an absolute value?
    – R Yoda
    Nov 18 at 14:44






  • 1




    @RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
    – Nordsee
    Nov 18 at 14:58















up vote
0
down vote

favorite












I have tried to find some information, but did not really find what I was looking for.



This is my dataframe full.data (excerpt)



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....


I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.



The result would look like this



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%


I have not posted what I have done so far, as I did not manage doing a lot....



Thanks for help of any kind.



Nordsee










share|improve this question
























  • Please always provide a minimal data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
    – R Yoda
    Nov 18 at 14:43










  • Contains "emissions" a percentage or an absolute value?
    – R Yoda
    Nov 18 at 14:44






  • 1




    @RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
    – Nordsee
    Nov 18 at 14:58













up vote
0
down vote

favorite









up vote
0
down vote

favorite











I have tried to find some information, but did not really find what I was looking for.



This is my dataframe full.data (excerpt)



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....


I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.



The result would look like this



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%


I have not posted what I have done so far, as I did not manage doing a lot....



Thanks for help of any kind.



Nordsee










share|improve this question















I have tried to find some information, but did not really find what I was looking for.



This is my dataframe full.data (excerpt)



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....


I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.



The result would look like this



country year      sector    emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%


I have not posted what I have done so far, as I did not manage doing a lot....



Thanks for help of any kind.



Nordsee







r dplyr percentage






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 18 at 14:57

























asked Nov 18 at 13:59









Nordsee

405




405












  • Please always provide a minimal data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
    – R Yoda
    Nov 18 at 14:43










  • Contains "emissions" a percentage or an absolute value?
    – R Yoda
    Nov 18 at 14:44






  • 1




    @RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
    – Nordsee
    Nov 18 at 14:58


















  • Please always provide a minimal data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
    – R Yoda
    Nov 18 at 14:43










  • Contains "emissions" a percentage or an absolute value?
    – R Yoda
    Nov 18 at 14:44






  • 1




    @RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
    – Nordsee
    Nov 18 at 14:58
















Please always provide a minimal data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
– R Yoda
Nov 18 at 14:43




Please always provide a minimal data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
– R Yoda
Nov 18 at 14:43












Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44




Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44




1




1




@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58




@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58












2 Answers
2






active

oldest

votes

















up vote
1
down vote



accepted










What about:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)


First 10 rows:



  country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.



If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)


Output (first 10 rows):



   country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.






share|improve this answer



















  • 1




    Thanks again for helping me again. Also thank you for giving me these two options
    – Nordsee
    Nov 18 at 16:56


















up vote
1
down vote













Here comes a solution based on tydiverse/dplyr.



Files stack.txt contains pasted text from your example above.



library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")

full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))


Result:



   country year      sector  emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000


If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e



full_data %>% 
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)

country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193


Regards
Paweł






share|improve this answer























  • Thank you very much this!!
    – Nordsee
    Nov 18 at 16:56











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53361698%2fcreate-loop-to-calculate-percentage-under-conditions-and-fill-in-results%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










What about:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)


First 10 rows:



  country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.



If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)


Output (first 10 rows):



   country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.






share|improve this answer



















  • 1




    Thanks again for helping me again. Also thank you for giving me these two options
    – Nordsee
    Nov 18 at 16:56















up vote
1
down vote



accepted










What about:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)


First 10 rows:



  country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.



If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)


Output (first 10 rows):



   country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.






share|improve this answer



















  • 1




    Thanks again for helping me again. Also thank you for giving me these two options
    – Nordsee
    Nov 18 at 16:56













up vote
1
down vote



accepted







up vote
1
down vote



accepted






What about:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)


First 10 rows:



  country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.



If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)


Output (first 10 rows):



   country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.






share|improve this answer














What about:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)


First 10 rows:



  country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.



If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:



library(tidyverse)
library(zoo)

df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)


Output (first 10 rows):



   country  year sector      emissions iso2  PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA


If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 18 at 16:08

























answered Nov 18 at 16:02









arg0naut

1,527312




1,527312








  • 1




    Thanks again for helping me again. Also thank you for giving me these two options
    – Nordsee
    Nov 18 at 16:56














  • 1




    Thanks again for helping me again. Also thank you for giving me these two options
    – Nordsee
    Nov 18 at 16:56








1




1




Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56




Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56












up vote
1
down vote













Here comes a solution based on tydiverse/dplyr.



Files stack.txt contains pasted text from your example above.



library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")

full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))


Result:



   country year      sector  emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000


If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e



full_data %>% 
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)

country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193


Regards
Paweł






share|improve this answer























  • Thank you very much this!!
    – Nordsee
    Nov 18 at 16:56















up vote
1
down vote













Here comes a solution based on tydiverse/dplyr.



Files stack.txt contains pasted text from your example above.



library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")

full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))


Result:



   country year      sector  emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000


If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e



full_data %>% 
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)

country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193


Regards
Paweł






share|improve this answer























  • Thank you very much this!!
    – Nordsee
    Nov 18 at 16:56













up vote
1
down vote










up vote
1
down vote









Here comes a solution based on tydiverse/dplyr.



Files stack.txt contains pasted text from your example above.



library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")

full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))


Result:



   country year      sector  emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000


If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e



full_data %>% 
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)

country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193


Regards
Paweł






share|improve this answer














Here comes a solution based on tydiverse/dplyr.



Files stack.txt contains pasted text from your example above.



library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")

full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))


Result:



   country year      sector  emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000


If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e



full_data %>% 
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)

country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193


Regards
Paweł







share|improve this answer














share|improve this answer



share|improve this answer








edited Nov 18 at 15:06

























answered Nov 18 at 14:55









Pawel Stradowski

13918




13918












  • Thank you very much this!!
    – Nordsee
    Nov 18 at 16:56


















  • Thank you very much this!!
    – Nordsee
    Nov 18 at 16:56
















Thank you very much this!!
– Nordsee
Nov 18 at 16:56




Thank you very much this!!
– Nordsee
Nov 18 at 16:56


















 

draft saved


draft discarded



















































 


draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53361698%2fcreate-loop-to-calculate-percentage-under-conditions-and-fill-in-results%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Ottavio Pratesi

Tricia Helfer

15 giugno