How to select some % of records from dataframe in python? [duplicate]
This question already has an answer here:
Random row selection in Pandas dataframe
6 answers
I need to select a percentage of the records in my dataframe for my analysis. For example, if the dataframe has 100 records and I want 33% of them, I need to randomly select exactly 33 records. I tried "random.randint", but it does not give exactly 33% of the records, only approximately 33%. Below is my code:
import random
DF_1['ran'] = [random.randint(0, 99) for k in DF_1.index]  # one random number in 0..99 per row
DF_2 = DF_1[DF_1['ran'] < 33]  # keep rows whose number is below 33
Is there another function that returns an exact percentage of the records in a dataframe? Thank you in advance. Alex
python python-3.x pandas dataframe
marked as duplicate by jezrael
Nov 26 '18 at 9:54
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
df.sample() ?
– Patrick Artner
Nov 26 '18 at 9:45
edited Nov 26 '18 at 9:51 by jpp
asked Nov 26 '18 at 9:41 by Alexsander
1 Answer
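randint in a list comprehension won't guarantee an exact count: each row is kept independently with probability 33/100, so the number of selected rows varies from run to run. Drawing row indices with randint instead would not guarantee no duplicates either.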
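To illustrate the point about randint, here is a small stdlib-only simulation (a sketch, not part of the original answer). It applies the question's keep-if-below-33 rule to 100 hypothetical rows ten times and compares the resulting counts with random.sample, which always returns exactly k distinct positions. The helper name mask_count is made up for illustration.

```python
import random


def mask_count(n_rows, pct, rng):
    """Hypothetical helper: rows kept by the question's rule (one randint(0, 99) per row, keep if < pct)."""
    return sum(1 for _ in range(n_rows) if rng.randint(0, 99) < pct)


rng = random.Random(0)  # seeded only so the demo is repeatable

# Ten runs of the randint approach over 100 rows: the counts scatter around 33
counts = [mask_count(100, 33, rng) for _ in range(10)]
print(counts)

# random.sample picks exactly k distinct positions on every run
picked = rng.sample(range(100), k=33)
print(len(picked), len(set(picked)))  # 33 33
```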
With the random module, you can use random.sample, which gives a sample without replacement:
from random import sample
num = int(len(DF_1.index) * 0.33)  # e.g. for 33%
indices = sample(list(DF_1.index), k=num)  # random.sample needs a sequence, so convert the Index to a list
DF_2 = DF_1.loc[indices].copy()
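One caveat when computing num: int() truncates, and a floating-point product can land just below a whole number, so int(100 * 0.29) gives 28 rather than 29. The sketch below uses two hypothetical helpers, rows_for_percent and sample_positions (not from the original answer), that compute the count with round() and sample row positions rather than labels. Positions are meant for .iloc, which also sidesteps problems if the index contains duplicate labels.

```python
import random


def rows_for_percent(n_rows, percent):
    """Hypothetical helper: number of rows for `percent` % of `n_rows`.

    Multiplying before dividing keeps integer inputs exact, and round()
    avoids truncating values such as 28.999999999999996 down to 28.
    """
    return round(n_rows * percent / 100)


def sample_positions(n_rows, percent, seed=None):
    """Hypothetical helper: distinct row positions in 0..n_rows-1, for use with .iloc."""
    k = rows_for_percent(n_rows, percent)
    return random.Random(seed).sample(range(n_rows), k)


print(int(100 * 0.29))            # 28: truncation after a float rounding error
print(rows_for_percent(100, 29))  # 29
positions = sample_positions(100, 33, seed=1)
print(len(positions))             # 33
```

Assuming DF_1 as in the question, usage would be DF_2 = DF_1.iloc[sample_positions(len(DF_1), 33)]. Note that Python's round() rounds halves to even, so round(2.5) == 2.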
With NumPy, you can use np.random.choice, specifying replace=False:
import numpy as np
indices = np.random.choice(DF_1.index, size=num, replace=False)
DF_2 = DF_1.loc[indices].copy()
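On NumPy 1.17 or later, the same idea can be written with the Generator API from np.random.default_rng, which NumPy recommends for new code. This is a swapped-in alternative to the legacy np.random.choice call, not what the answer used; the seed 42 and the fixed sizes are arbitrary assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded Generator; 42 is an arbitrary choice
n_rows, num = 100, 33            # e.g. len(DF_1) and the row count computed earlier

# Distinct positions drawn from 0..n_rows-1; replace=False rules out repeats
positions = rng.choice(n_rows, size=num, replace=False)
print(len(positions), len(np.unique(positions)))  # 33 33
```

The result holds positions, so it would be used as DF_2 = DF_1.iloc[positions] rather than with .loc.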
Most idiomatic is to use pd.DataFrame.sample:
DF_2 = DF_1.sample(n=num)  # absolute number of rows
DF_2 = DF_1.sample(frac=1/3)  # fraction of rows; frac * len(DF_1) is rounded to the nearest integer
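A quick check of the pandas route, using a hypothetical 100-row stand-in for DF_1. Passing random_state (a real DataFrame.sample parameter) makes the selection reproducible, and both frac=0.33 and frac=1/3 give 33 rows for 100 rows here.

```python
import pandas as pd

# Hypothetical stand-in for DF_1 with 100 rows
DF_1 = pd.DataFrame({"value": range(100)})

DF_2 = DF_1.sample(frac=0.33, random_state=0)  # 0.33 * 100 = 33 rows
DF_3 = DF_1.sample(frac=1/3, random_state=0)   # 33.33... rounds to 33 rows

# Same random_state, same rows: useful when the analysis must be repeatable
again = DF_1.sample(frac=0.33, random_state=0)
print(len(DF_2), len(DF_3), DF_2.index.equals(again.index))  # 33 33 True
```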
edited Nov 26 '18 at 10:00
answered Nov 26 '18 at 9:45 by jpp