How to select some % of records from dataframe in python? [duplicate]

This question already has an answer here:

Random row selection in Pandas dataframe

6 answers

I need to select a fixed percentage of records from my dataframe for my analysis. For example, from a dataframe with 100 records I need to randomly select exactly 33 records (33%). I tried random.randint, but it does not give exactly 33% of the records, only approximately 33%. Below is my code:

import random

DF_1['ran'] = [random.randint(0, 99) for _ in DF_1.index]
DF_2 = DF_1[DF_1['ran'] < 33]

Is there another function that returns an exact percentage of records from a dataframe? Thank you in advance. Alex

edited Nov 26 '18 at 9:51

jpp

102k2165116

asked Nov 26 '18 at 9:41

Alexsander

5810

marked as duplicate by jezrael dataframe
Users with the dataframe badge can single-handedly close dataframe questions as duplicates and reopen them as needed.

Nov 26 '18 at 9:54

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.

df.sample() ?

– Patrick Artner
Nov 26 '18 at 9:45


python python-3.x pandas dataframe

1 Answer

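randint in a list comprehension draws each value independently, so it guarantees neither an even spread of values nor the absence of duplicates. The number of rows that fall below your threshold therefore varies from run to run.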
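
To see why the randint approach only gives approximately 33%, here is a minimal sketch that repeats the question's draw for a 100-row frame under several seeds. The helper name count_below_threshold and the seed values are assumptions made for illustration; they are not part of the original question or answer.

```python
import random


def count_below_threshold(n_rows, threshold=33, seed=None):
    """Count how many of n_rows independent randint(0, 99) draws fall below threshold."""
    rng = random.Random(seed)  # private generator so each run is reproducible
    return sum(rng.randint(0, 99) < threshold for _ in range(n_rows))


# Repeat the experiment with 20 different seeds.
# Each run selects a different number of rows, scattered around 33.
counts = [count_below_threshold(100, seed=s) for s in range(20)]
print(sorted(set(counts)))
```

Each row is kept or dropped independently with probability 0.33, so the selected count fluctuates around 33 and only sometimes hits it exactly.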

With the random module, you can use random.sample, which gives a sample without replacement:

from random import sample

num = int(len(DF_1.index) * 0.33)  # e.g. for 33%
indices = sample(list(DF_1.index), k=num)  # sample expects a sequence, so pass the index as a list
DF_2 = DF_1.loc[indices].copy()

With NumPy, you can use np.random.choice, specifying replace=False:

import numpy as np

indices = np.random.choice(DF_1.index, size=num, replace=False)
DF_2 = DF_1.loc[indices].copy()

The most idiomatic option is pd.DataFrame.sample:

DF_2 = DF_1.sample(n=num)     # absolute number of rows
DF_2 = DF_1.sample(frac=1/3)  # fraction of rows (the row count is rounded if not whole)
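
As a self-contained sketch of the pd.DataFrame.sample approach, the example below builds a hypothetical 100-row frame standing in for DF_1 (the "value" column and the random_state value are assumptions for illustration) and checks that exactly 33 distinct rows come back.

```python
import pandas as pd

# Hypothetical 100-row frame standing in for the question's DF_1
DF_1 = pd.DataFrame({"value": range(100)})

num = int(len(DF_1) * 0.33)  # number of rows for 33%

# random_state is optional; fixing it makes the sample reproducible
DF_2 = DF_1.sample(n=num, random_state=0)

print(len(DF_2))            # size of the sample
print(DF_2.index.is_unique)  # sampling is without replacement by default
```

Because sample draws without replacement by default, no row appears twice, and the result always has exactly num rows.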

edited Nov 26 '18 at 10:00

answered Nov 26 '18 at 9:45

jpp

102k2165116
