How to select some % of records from dataframe in python? [duplicate]




























This question already has an answer here:




  • Random row selection in Pandas dataframe

    6 answers




I need to select a given percentage of records from my dataframe for analysis. For example, from a dataframe with 100 records I need to randomly select exactly 33 records (33%). I tried random.randint, but it gives only approximately 33% of the records, not exactly 33%. Below is my code:



import random

DF_1['ran'] = [random.randint(0, 99) for k in DF_1.index]
DF_2 = DF_1[DF_1['ran'] < 33]


Is there another function that returns an exact percentage of records from a dataframe? Thank you in advance. Alex

























marked as duplicate by jezrael dataframe
Users with the  dataframe badge can single-handedly close dataframe questions as duplicates and reopen them as needed.

Nov 26 '18 at 9:54


This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.



















  • df.sample() ?

    – Patrick Artner
    Nov 26 '18 at 9:45


















python python-3.x pandas dataframe














edited Nov 26 '18 at 9:51
jpp

asked Nov 26 '18 at 9:41
Alexsander





1 Answer
































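Calling randint in a list comprehension draws an independent random number for every row, so each row is kept with probability 0.33 and the number of rows selected varies from run to run. It won't guarantee an exact count, and the drawn numbers are not guaranteed to be free of duplicates.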
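To illustrate why the per-row approach only approximates the target, here is a minimal sketch. It assumes a hypothetical 100-row frame; the names `df` and `counts` and the seeds are illustrative and do not come from the question.

```python
import random

import pandas as pd

# Hypothetical 100-row frame standing in for DF_1 (an assumption).
df = pd.DataFrame({"value": range(100)})

counts = []
for seed in range(20):
    random.seed(seed)  # fixed seeds only so the run is repeatable
    # One independent draw per row, as in the question's code.
    ran = [random.randint(0, 99) for _ in df.index]
    mask = pd.Series(ran, index=df.index) < 33
    counts.append(int(mask.sum()))

# The number of selected rows changes from run to run.
print(min(counts), max(counts))
```

Each run keeps a different number of rows, which is why a method that draws a fixed number of rows without replacement is needed for an exact percentage.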



With the random module, you can use random.sample, which gives a sample without replacement:



from random import sample

num = int(len(DF_1.index) * 0.33)  # number of rows to draw, e.g. for 33%
# list() because random.sample expects a sequence, not a pandas Index
indices = sample(list(DF_1.index), k=num)
DF_2 = DF_1.loc[indices].copy()


With NumPy, you can use np.random.choice, specifying replace=False:



import numpy as np

indices = np.random.choice(DF_1.index, size=num, replace=False)
DF_2 = DF_1.loc[indices].copy()
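The same idea can also be written with NumPy's newer Generator interface, which takes an explicit seed. This is only a sketch under assumptions: the frame `df`, the name `subset`, and the seed 42 are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical 100-row frame standing in for DF_1 (an assumption).
df = pd.DataFrame({"value": range(100)})

rng = np.random.default_rng(42)  # seed chosen arbitrarily for repeatability
num = int(len(df.index) * 0.33)  # number of rows to draw

# replace=False means no index label is drawn twice.
indices = rng.choice(df.index, size=num, replace=False)
subset = df.loc[indices].copy()

print(len(subset))
```

Passing a seed to `default_rng` makes the selection repeatable; omit it to get a different sample on each run.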


The most idiomatic option is pd.DataFrame.sample:



DF_2 = DF_1.sample(n=num)     # absolute number of rows
DF_2 = DF_1.sample(frac=1/3)  # fraction of rows (rounded to a whole number of rows)
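As a quick check of the exact-count behaviour, the sketch below samples 33% of a hypothetical 100-row frame. The names `df`, `first`, and `second` and the `random_state` value are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical 100-row frame standing in for DF_1 (an assumption).
df = pd.DataFrame({"value": range(100)})

# sample() draws without replacement by default, so no row repeats.
first = df.sample(frac=0.33, random_state=0)
second = df.sample(frac=0.33, random_state=0)

print(len(first))            # number of rows drawn
print(first.equals(second))  # same random_state gives the same rows
```

Setting `random_state` is optional; it is useful when the analysis has to be reproducible.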





edited Nov 26 '18 at 10:00

answered Nov 26 '18 at 9:45
jpp
















