Calculating the rate of each category in a factor variable, by categories in another factor
I have two columns, both factor variables: the first is the race of each inmate, and the second is whether they recidivated. I'd like to plot the rate of recidivism by race. How can I do this?
I have tried this:
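df %>%
  group_by(race, Recidivated) %>%
  summarize(count = n()) %>%
  arrange(-count) %>%
  # Each race gets two rows (Recidivate / notRecidivate); geom_col() stacks
  # them, so each bar shows the total number of inmates of that race.
  ggplot(aes(reorder(race, count, FUN = max),
             count, fill = race)) +
  geom_col() +
  coord_flip() +
  # palette_9_colors and plotTheme() are not ggplot2 objects; they are
  # assumed to be defined elsewhere in my code (not shown here).
  scale_fill_manual(values = palette_9_colors) +
  theme(legend.position = "none") +
  labs(x = "Charge", y = "Count",
       title = "Recidivism by Rates",
       subtitle = "Broward County - Source: Propublica",
       caption = "UrbanSpatialAnalysis.com") +
  plotTheme()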
The result is a bar chart of the number of inmates of each race, not the rate. How can I get a plot that shows the rate of recidivism by race? Thank you!
Here is some of the data:
> head(df)
   sex age         age_cat             race priors_count two_year_recid
1 Male  69 Greater than 45            Other            0              0
2 Male  34         25 - 45 African-American            0              1
3 Male  24    Less than 25 African-American            4              1
4 Male  44         25 - 45            Other            0              0
5 Male  41         25 - 45        Caucasian           14              1
6 Male  43         25 - 45            Other            3              0
                   r_charge_desc                  c_charge_desc
1                                  Aggravated Assault w/Firearm
2    Felony Battery (Dom Strang) Felony Battery w/Prior Convict
3    Driving Under The Influence          Possession of Cocaine
4                                                       Battery
5 Poss of Firearm by Convic Felo      Possession Burglary Tools
6                                         arrest case no charge
  c_charge_degree r_charge_degree juv_other_count length_of_stay
1               F                               0              1
2               F            (F3)               0             10
3               F            (M1)               1              1
4               M                               0              1
5               F            (F2)               0              6
6               F                               0              1
    Recidivated
1 notRecidivate
2    Recidivate
3    Recidivate
4 notRecidivate
5    Recidivate
6 notRecidivate
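The attempt above plots counts because nothing divides each count by the size of its race group. A minimal sketch of one way to keep the same group_by()/summarize() steps and turn the counts into within-race rates follows. It assumes `race` and `Recidivated` are factors with the levels shown in the printout, and dplyr 1.0 or later (for `.groups`); the six-row `df` is just those printed rows retyped as hypothetical stand-in data, not the real dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in: the six rows from head(df), only the two columns used
df <- data.frame(
  race = factor(c("Other", "African-American", "African-American",
                  "Other", "Caucasian", "Other")),
  Recidivated = factor(c("notRecidivate", "Recidivate", "Recidivate",
                         "notRecidivate", "Recidivate", "notRecidivate"))
)

rates <- df %>%
  # .drop = FALSE keeps race/outcome combinations with zero rows, so a race
  # with no recidivists still gets a "Recidivate" row with count 0
  group_by(race, Recidivated, .drop = FALSE) %>%
  summarize(count = n(), .groups = "drop_last") %>%  # still grouped by race
  mutate(rate = count / sum(count)) %>%               # share within each race
  ungroup() %>%
  filter(Recidivated == "Recidivate")

p <- ggplot(rates, aes(reorder(race, rate), rate)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Race", y = "Recidivism rate")
```

Without `.drop = FALSE`, a race with no recidivists would have no "Recidivate" row and would disappear after filter() instead of showing 0%. Evaluate `p` to draw the chart; the palette and theme helpers from the original attempt could be added back unchanged.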
r ggplot2 group-by
edited Nov 19 at 3:27
asked Nov 19 at 1:09
Shuhui Huang
11
Could you please post some of your data? Otherwise it may be difficult for us to find a solution for you.
– passiflora
Nov 19 at 2:50
Sure, thank you for pointing this out!
– Shuhui Huang
Nov 19 at 3:28
1 Answer
library(dplyr)
library(ggplot2)
race <- sample(c("A", "B", "C", "D"), size = 100, replace = TRUE)
recidivated <- sample(c(TRUE, FALSE), size = 100, replace = TRUE)
df <- data.frame(race, recidivated)
# For a logical vector, mean() is the proportion of TRUE values
df %>% group_by(race) %>% summarize(recidRate = mean(recidivated)) %>%
  ggplot(aes(race, recidRate)) + geom_col()
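Since Recidivated only records whether someone recidivated, it is easier to store it as a logical (TRUE/FALSE) than as a factor: the mean() of a logical vector is the proportion of TRUE values, which is exactly the recidivism rate for each group.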
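Applied to the data in the question, the factor does not even need to be converted first: comparing `Recidivated` with "Recidivate" yields a logical vector, and its mean within each race is that race's rate. A minimal sketch, again using the six rows from the question's head(df) printout as hypothetical stand-in data:

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in with the question's column names and factor levels
df <- data.frame(
  race = factor(c("Other", "African-American", "African-American",
                  "Other", "Caucasian", "Other")),
  Recidivated = factor(c("notRecidivate", "Recidivate", "Recidivate",
                         "notRecidivate", "Recidivate", "notRecidivate"))
)

# Recidivated == "Recidivate" is TRUE/FALSE, so mean() gives the share of TRUE
rates <- df %>%
  group_by(race) %>%
  summarize(recidRate = mean(Recidivated == "Recidivate"))

p <- ggplot(rates, aes(reorder(race, recidRate), recidRate)) +
  geom_col() +
  coord_flip() +
  labs(x = "Race", y = "Recidivism rate")
```

If two_year_recid is the 0/1 column that Recidivated was derived from (the six rows shown agree, but that is an assumption), summarize(recidRate = mean(two_year_recid)) should give the same rates, because the mean of a 0/1 vector is also a proportion.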
Hope this helps :)
answered Nov 19 at 8:02
passiflora
1187