Create loop to calculate percentage under conditions and fill in results
up vote
0
down vote
favorite
I have tried to find some information, but did not really find what I was looking for.
This is my dataframe full.data (excerpt)
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....
I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.
The result would look like this
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%
I have not posted what I have done so far, as I did not manage doing a lot....
Thanks for help of any kind.
Nordsee
r dplyr percentage
add a comment |
up vote
0
down vote
favorite
I have tried to find some information, but did not really find what I was looking for.
This is my dataframe full.data (excerpt)
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....
I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.
The result would look like this
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%
I have not posted what I have done so far, as I did not manage doing a lot....
Thanks for help of any kind.
Nordsee
r dplyr percentage
Please always provide a minimaldata.framein your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
– R Yoda
Nov 18 at 14:43
Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44
1
@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I have tried to find some information, but did not really find what I was looking for.
This is my dataframe full.data (excerpt)
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....
I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.
The result would look like this
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%
I have not posted what I have done so far, as I did not manage doing a lot....
Thanks for help of any kind.
Nordsee
r dplyr percentage
I have tried to find some information, but did not really find what I was looking for.
This is my dataframe full.data (excerpt)
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 0
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 0
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 0
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 0
Austria 2015 Total 6.670398e+07 AT 0
Austria 2015 Regulated 3.800309e+07 AT 0
Austria 2015 Unregulated 2.870090e+07 AT 0
Austria 2015 ETS 2.949206e+07 AT 0
Austria 2016 Total 6.740209e+07 AT 0
Austria 2016 Regulated 3.765177e+07 AT 0
Austria 2016 Unregulated 2.975031e+07 AT 0
Austria 2016 ETS 2.900012e+07 AT 0
Austria 2017 ETS 3.055523e+07 AT 0
Belgium 1990 Total 1.204844e+08 BE 0
Belgium 1990 Regulated 7.861411e+07 BE 0
Belgium 1990 Unregulated 4.187029e+07 BE 0
Belgium 1991 Total 1.235447e+08 BE 0
Belgium 1991 Regulated 7.981152e+07 BE 0
Belgium 1991 Unregulated 4.373319e+07 BE 0
Belgium 1992 Total 1.226578e+08 BE 0
Belgium 1992 Regulated 7.828396e+07 BE 0
Belgium 1992 Unregulated 4.437385e+07 BE 0
Belgium 1993 Total 1.215573e+08 BE 0
Belgium 1993 Regulated 7.675229e+07 BE 0
Belgium 1993 Unregulated 4.480499e+07 BE 0
Belgium 1994 Total 1.249382e+08 BE 0
Belgium 1994 Regulated 8.064799e+07 BE 0
Belgium 1994 Unregulated 4.429020e+07 BE 0
....
I'm trying to fill into full.data$PercentageDifference the percentage of emissions where sector=ETS and where sector=Regulated(emissions sector=ETS is xx.y% of sector=Regulated) . This percentage value should be filed into PercentageDifference where sector=ETS. This should happen for every year and country. I would assume I would need a loop here. I have read that dplyr is useful for this, but did not really figure out how to do that. However, if there is a better way than dplyr, that's fine with me.
The result would look like this
country year sector emissions iso2 PercentageDifference
....
Austria 2011 Total 7.011567e+07 AT 0
Austria 2011 Regulated 4.214836e+07 AT 0
Austria 2011 Unregulated 2.796732e+07 AT 0
Austria 2011 ETS 3.059942e+07 AT 72.6%
Austria 2012 Total 6.766140e+07 AT 0
Austria 2012 Regulated 3.949445e+07 AT 0
Austria 2012 Unregulated 2.816695e+07 AT 0
Austria 2012 ETS 2.838706e+07 AT 71.9%
Austria 2013 Total 6.800123e+07 AT 0
Austria 2013 Regulated 3.857396e+07 AT 0
Austria 2013 Unregulated 2.942727e+07 AT 0
Austria 2013 ETS 2.980441e+07 AT 77.3%
Austria 2014 Total 6.425333e+07 AT 0
Austria 2014 Regulated 3.631107e+07 AT 0
Austria 2014 Unregulated 2.794225e+07 AT 0
Austria 2014 ETS 2.805597e+07 AT 77.3%
I have not posted what I have done so far, as I did not manage doing a lot....
Thanks for help of any kind.
Nordsee
r dplyr percentage
r dplyr percentage
edited Nov 18 at 14:57
asked Nov 18 at 13:59
Nordsee
405
405
Please always provide a minimaldata.framein your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
– R Yoda
Nov 18 at 14:43
Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44
1
@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58
add a comment |
Please always provide a minimaldata.framein your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)
– R Yoda
Nov 18 at 14:43
Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44
1
@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58
Please always provide a minimal
data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)– R Yoda
Nov 18 at 14:43
Please always provide a minimal
data.frame in your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)– R Yoda
Nov 18 at 14:43
Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44
Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44
1
1
@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58
@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
accepted
What about:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)
First 10 rows:
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.
If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)
Output (first 10 rows):
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.
1
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
add a comment |
up vote
1
down vote
Here comes a solution based on tydiverse/dplyr.
Files stack.txt contains pasted text from your example above.
library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")
full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))
Result:
country year sector emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000
If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)
country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193
Regards
Paweł
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
What about:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)
First 10 rows:
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.
If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)
Output (first 10 rows):
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.
1
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
add a comment |
up vote
1
down vote
accepted
What about:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)
First 10 rows:
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.
If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)
Output (first 10 rows):
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.
1
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
What about:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)
First 10 rows:
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.
If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)
Output (first 10 rows):
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.
What about:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_),
PercentageDifference = if_else(!is.na(PercentageDifference), paste0(PercentageDifference, "%"), NA_character_)
)
First 10 rows:
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <chr>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6%
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9%
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
It's important to know that the percentage column will be of type character then, as you have specified you would like to see % sign in it.
If you want to keep it numeric, you can just remove the last step in mutate, i.e. you can do:
library(tidyverse)
library(zoo)
df %>%
group_by(country, year) %>%
mutate(
PercentageDifference = if_else(sector %in% c("ETS", "Regulated"), emissions, NA_real_),
PercentageDifference = na.locf(PercentageDifference, na.rm = FALSE),
PercentageDifference = if_else(sector == "ETS", round((PercentageDifference / lag(PercentageDifference)) * 100,1), NA_real_)
)
Output (first 10 rows):
country year sector emissions iso2 PercentageDifference
<chr> <int> <chr> <dbl> <chr> <dbl>
1 Austria 2011 Total 70115670 AT NA
2 Austria 2011 Regulated 42148360 AT NA
3 Austria 2011 Unregulated 27967320 AT NA
4 Austria 2011 ETS 30599420 AT 72.6
5 Austria 2012 Total 67661400 AT NA
6 Austria 2012 Regulated 39494450 AT NA
7 Austria 2012 Unregulated 28166950 AT NA
8 Austria 2012 ETS 28387060 AT 71.9
9 Austria 2013 Total 68001230 AT NA
10 Austria 2013 Regulated 38573960 AT NA
If you would like to avoid loading the zoo package, you can also use fill from tidyverse in a separate step, but it is much much slower.
edited Nov 18 at 16:08
answered Nov 18 at 16:02
arg0naut
1,527312
1,527312
1
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
add a comment |
1
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
1
1
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
Thanks again for helping me again. Also thank you for giving me these two options
– Nordsee
Nov 18 at 16:56
add a comment |
up vote
1
down vote
Here comes a solution based on tydiverse/dplyr.
Files stack.txt contains pasted text from your example above.
library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")
full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))
Result:
country year sector emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000
If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)
country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193
Regards
Paweł
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
add a comment |
up vote
1
down vote
Here comes a solution based on tydiverse/dplyr.
Files stack.txt contains pasted text from your example above.
library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")
full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))
Result:
country year sector emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000
If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)
country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193
Regards
Paweł
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
add a comment |
up vote
1
down vote
up vote
1
down vote
Here comes a solution based on tydiverse/dplyr.
Files stack.txt contains pasted text from your example above.
library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")
full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))
Result:
country year sector emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000
If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)
country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193
Regards
Paweł
Here comes a solution based on tydiverse/dplyr.
Files stack.txt contains pasted text from your example above.
library(tidyverse)
full_data <- read.table("stack.txt", quote=""", comment.char="")
names(full_data) <- c("country", "year", "sector", "emission", "iso", "perc")
full_data <- full_data %>%
select(-perc)
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated) %>%
select(country, year, percentage) %>%
right_join(full_data) %>%
select(country, year, sector, emission, iso, percentage) %>%
mutate(percentage = ifelse(sector == "ETS", percentage, 0))
Result:
country year sector emission iso percentage
1 Austria 2011 Total 70115670 AT 0.0000000
2 Austria 2011 Regulated 42148360 AT 0.0000000
3 Austria 2011 Unregulated 27967320 AT 0.0000000
4 Austria 2011 ETS 30599420 AT 0.7259931
5 Austria 2012 Total 67661400 AT 0.0000000
6 Austria 2012 Regulated 39494450 AT 0.0000000
7 Austria 2012 Unregulated 28166950 AT 0.0000000
8 Austria 2012 ETS 28387060 AT 0.7187607
9 Austria 2013 Total 68001230 AT 0.0000000
10 Austria 2013 Regulated 38573960 AT 0.0000000
11 Austria 2013 Unregulated 29427270 AT 0.0000000
If you want more explanation, how it works, I'd suggest to break the pipeline and see intermediate results i.e
full_data %>%
select(-iso) %>%
spread(sector, emission) %>%
mutate(percentage = ETS/Regulated)
country year ETS Regulated Total Unregulated percentage
1 Austria 2011 30599420 42148360 70115670 27967320 0.7259931
2 Austria 2012 28387060 39494450 67661400 28166950 0.7187607
3 Austria 2013 29804410 38573960 68001230 29427270 0.7726562
4 Austria 2014 28055970 36311070 64253330 27942250 0.7726561
5 Austria 2015 29492060 38003090 66703980 28700900 0.7760437
6 Austria 2016 29000120 37651770 67402090 29750310 0.7702193
Regards
Paweł
edited Nov 18 at 15:06
answered Nov 18 at 14:55
Pawel Stradowski
13918
13918
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
add a comment |
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
Thank you very much this!!
– Nordsee
Nov 18 at 16:56
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53361698%2fcreate-loop-to-calculate-percentage-under-conditions-and-fill-in-results%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Please always provide a minimal
data.framein your question and show the expected result too. IMHO your question is difficult to understand and typing in the data to find and answer is a waste of time. THX :-)– R Yoda
Nov 18 at 14:43
Contains "emissions" a percentage or an absolute value?
– R Yoda
Nov 18 at 14:44
1
@RYoda emisisons are absolute values/numeric. I have added an example outcome. Sorry I didn't add this earlier
– Nordsee
Nov 18 at 14:58