Calculating the rate of each category in a factor variable, by categories in another factor























I have a data frame with two factor columns: race gives each inmate's race, and Recidivated records whether they recidivated. I'd like to plot the rate of recidivism by race. How can I do this?



And I have tried this:



    df %>%
      group_by(race, Recidivated) %>%
      summarize(count = n()) %>%
      arrange(-count) %>%
      ggplot(aes(reorder(race, count, FUN = max),
                 count, fill = race)) +
      geom_col() +
      coord_flip() +
      scale_fill_manual(values = palette_9_colors) +
      theme(legend.position = "none") +
      labs(x = "Charge", y = "Count",
           title = "Recidivism by Rates",
           subtitle = "Broward County - Source: Propublica",
           caption = "UrbanSpatialAnalysis.com") +
      plotTheme()


The result is a bar chart of the number of inmates of each race, not a rate. How can I get a plot that shows the rate of recidivism by race? Thank you!



Here is some of the data:



    > head(df)
      sex  age age_cat         race             priors_count two_year_recid
    1 Male 69  Greater than 45 Other            0            0
    2 Male 34  25 - 45         African-American 0            1
    3 Male 24  Less than 25    African-American 4            1
    4 Male 44  25 - 45         Other            0            0
    5 Male 41  25 - 45         Caucasian        14           1
    6 Male 43  25 - 45         Other            3            0
      r_charge_desc                  c_charge_desc
    1                                Aggravated Assault w/Firearm
    2 Felony Battery (Dom Strang)    Felony Battery w/Prior Convict
    3 Driving Under The Influence    Possession of Cocaine
    4                                Battery
    5 Poss of Firearm by Convic Felo Possession Burglary Tools
    6                                arrest case no charge
      c_charge_degree r_charge_degree juv_other_count length_of_stay
    1 F                               0               1
    2 F               (F3)            0               10
    3 F               (M1)            1               1
    4 M                               0               1
    5 F               (F2)            0               6
    6 F                               0               1
      Recidivated
    1 notRecidivate
    2 Recidivate
    3 Recidivate
    4 notRecidivate
    5 Recidivate
    6 notRecidivate

































  • Could you please post some of your data? Otherwise it may be difficult for us to find a solution for you.
    – passiflora
    Nov 19 at 2:50










  • Sure, thank you for pointing this out!
    – Shuhui Huang
    Nov 19 at 3:28















r ggplot2 group-by














edited Nov 19 at 3:27

























asked Nov 19 at 1:09









Shuhui Huang

11




























1 Answer






























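    library(dplyr)    # group_by(), summarize(), %>%
    library(ggplot2)
    race <- sample(c("A", "B", "C", "D"), size = 100, replace = TRUE)
    recidivated <- sample(c(TRUE, FALSE), size = 100, replace = TRUE)
    df <- data.frame(race, recidivated)
    # mean() of a logical column is the proportion of TRUE, i.e. the rate per race
    df %>% group_by(race) %>% summarize(recidRate = mean(recidivated)) %>%
      ggplot(aes(race, recidRate)) + geom_bar(stat = "identity")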


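Recidivated is really a logical variable, so it is easier to store it as TRUE/FALSE than as a factor. For a logical vector, mean() returns the proportion of TRUE values, which within each race group is exactly the recidivism rate.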
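If you want to keep Recidivated as a factor, the same idea can be sketched by comparing it to a level first. This assumes the level names are exactly "Recidivate" and "notRecidivate", as in the head(df) output; the small data frame below only rebuilds the two relevant columns from those six rows, and the names rates, recid_rate and n are just illustrative.

```r
library(dplyr)
library(ggplot2)

# The two relevant columns from the six rows shown in head(df)
df <- data.frame(
  race = factor(c("Other", "African-American", "African-American",
                  "Other", "Caucasian", "Other")),
  Recidivated = factor(c("notRecidivate", "Recidivate", "Recidivate",
                         "notRecidivate", "Recidivate", "notRecidivate"))
)

# Comparing the factor to a level gives a logical vector;
# its mean within each group is the recidivism rate for that race
rates <- df %>%
  group_by(race) %>%
  summarize(recid_rate = mean(Recidivated == "Recidivate"), n = n())

# One bar per race, ordered by rate, with the y axis shown as a percentage
p <- ggplot(rates, aes(x = reorder(race, recid_rate), y = recid_rate, fill = race)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  theme(legend.position = "none") +
  labs(x = "Race", y = "Recidivism rate")
print(p)
```

Keeping n next to the rate makes it easy to spot groups that are too small for the rate to mean much. If you would rather skip the summarize step, ggplot(df, aes(race, fill = Recidivated)) + geom_bar(position = "fill") draws the same proportions as stacked bars straight from the raw rows.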



Hope this helps :)
















        answered Nov 19 at 8:02









        passiflora

        1187


































             
