How to drop rows of Pandas DataFrame whose value in certain columns is NaN
I have a DataFrame:
>>> df
                  STK_ID   EPS  cash
STK_ID RPT_Date
601166 20111231   601166   NaN   NaN
600036 20111231   600036   NaN    12
600016 20111231   600016   4.3   NaN
601009 20111231   601009   NaN   NaN
601939 20111231   601939   2.5   NaN
000001 20111231   000001   NaN   NaN
Then I just want the records whose EPS is not NaN; that is, something like df.drop(....) should return the DataFrame below:
                  STK_ID   EPS  cash
STK_ID RPT_Date
600016 20111231   600016   4.3   NaN
601939 20111231   601939   2.5   NaN
How do I do that?
Tags: python, pandas, dataframe (asked Nov 16 '12 by bigbug)
Comments:
dropna: pandas.pydata.org/pandas-docs/stable/generated/… – Wouter Overmeire, Nov 16 '12
df.dropna(subset=['column1_name', 'column2_name', 'column3_name']) – osa, Sep 5 '14
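For reference, a minimal sketch that rebuilds the question's example frame from the values shown above (the exact index construction is an assumption; the thread itself never shows it):

import numpy as np
import pandas as pd

# Values copied from the question. STK_ID appears both in the
# MultiIndex and as a regular column, so keep it in both places.
df = pd.DataFrame({
    "STK_ID": ["601166", "600036", "600016", "601009", "601939", "000001"],
    "RPT_Date": ["20111231"] * 6,
    "EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
    "cash": [np.nan, 12.0, np.nan, np.nan, np.nan, np.nan],
})
df = df.set_index(["STK_ID", "RPT_Date"], drop=False).drop(columns="RPT_Date")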
11 Answers
Accepted answer (score 388, by eumiro):
Don't drop. Just take rows where EPS is finite:
df = df[np.isfinite(df['EPS'])]
Comments:
I'd recommend using pandas.notnull instead of np.isfinite – Wes McKinney, Nov 21 '12
Is there any advantage to indexing and copying over dropping? – Robert Muil, Jul 31 '15
Creates error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' – Philipp Schwarz, Oct 7 '16
@wes-mckinney could you please let me know if dropna() is a better choice than pandas.notnull in this case? If so, why? – stormfield, Sep 7 '17
@PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend using pandas.notnull(), which handles this more generously. – normanius, Apr 5
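The gist of the comments above: np.isfinite() works only on numeric dtypes, while pandas.notnull() also accepts object columns. A minimal sketch with toy Series (an illustration, not part of the original answer):

import numpy as np
import pandas as pd

s_num = pd.Series([1.0, np.nan, 2.5])    # float64 dtype
s_obj = pd.Series([1.0, np.nan, "n/a"])  # object dtype

np.isfinite(s_num)    # works: True, False, True
pd.notnull(s_num)     # works: True, False, True
pd.notnull(s_obj)     # works: True, False, True ("n/a" is a string, not a null)
# np.isfinite(s_obj)  # raises the TypeError quoted in the comments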
Answer (score 653, by Aman):
This question is already resolved, but also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.
In [24]: df = pd.DataFrame(np.random.randn(10,3))
In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;
In [26]: df
Out[26]:
0 1 2
0 NaN NaN NaN
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
In [27]: df.dropna() #drop all rows that have any NaN values
Out[27]:
0 1 2
1 2.677677 -1.466923 -0.750366
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
In [28]: df.dropna(how='all') #drop only if ALL columns are NaN
Out[28]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
In [29]: df.dropna(thresh=2) #Drop row if it does not have at least two values that are **not** NaN
Out[29]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
9 -0.310130 0.078891 NaN
In [30]: df.dropna(subset=[1]) #Drop only if NaN in specific column (as asked in the question)
Out[30]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
9 -0.310130 0.078891 NaN
There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows; a quick sketch of that follows. Pretty handy!
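A minimal sketch of that column-wise variant, reusing the df built above (these exact calls are illustrative and not part of the original answer):

df.dropna(axis='columns', how='all')  # drop columns in which every value is NaN
df.dropna(axis='columns', thresh=7)   # keep only columns with at least 7 non-NaN values
                                      # (for the df above, that is just column 1)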
Comments:
you can also use df.dropna(subset=['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1 – James Tobin, Jun 18 '14
@JamesTobin, I just spent 20 minutes writing a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand what they meant... – osa, Sep 5 '14
This should be #1 – Cord Kaldemeyer, Oct 20 '17
isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer. – TheProletariat, Mar 20
Answer (score 86, by Kirk Hadley):
I know this has already been answered, but just for the sake of a purely pandas solution to this specific question, as opposed to the general description from Aman (which was wonderful), and in case anyone else happens upon this:
import pandas as pd
df = df[pd.notnull(df['EPS'])]
Comments:
Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description from Aman; of course this also works) – joris, Apr 23 '14
notnull is also what Wes (author of pandas) suggested in his comment on another answer. – fantabolous, Jul 9 '14
This may be a noob question, but when I do df[pd.notnull(...)] or df.dropna, the index gets dropped. So if there was a null value at row index 10 in a df of length 200, the resulting dataframe has index values from 1 to 9 and then 11 to 200. Is there any way to "re-index" it? – Aakash Gupta, Mar 4 '16
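On the re-indexing question in the last comment, a minimal sketch (an addition, not from the original thread): reset_index(drop=True) rebuilds a fresh 0..n-1 RangeIndex after filtering.

import pandas as pd

filtered = df[pd.notnull(df['EPS'])].reset_index(drop=True)  # discard the old index
# Leave out drop=True to keep the old index values as a regular column instead.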
Answer (score 31, by Joe):
You can use this:
df.dropna(subset=['EPS'], how='all', inplace = True)
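A small usage note (an addition, not from the original answer): with inplace=True the call mutates df and returns None, so do not assign its result back.

df.dropna(subset=['EPS'], inplace=True)         # correct: df is modified in place
# df = df.dropna(subset=['EPS'], inplace=True)  # wrong: df would become None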
Comments:
how='all' is redundant here, because you are subsetting the dataframe with only one field, so both 'all' and 'any' will have the same effect. – Anton Protopopov, Jan 16
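To see where how matters, give subset more than one column; a small illustrative sketch (not from the original answer):

df.dropna(subset=['EPS', 'cash'], how='all')  # drop rows where BOTH are NaN
df.dropna(subset=['EPS', 'cash'], how='any')  # drop rows where EITHER is NaN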
Answer (score 22, by Gil Baggio):
Simplest of all solutions:
filtered_df = df[df['EPS'].notnull()]
The above solution is preferable to using np.isfinite(), because .notnull() also handles non-numeric (object) columns, on which np.isfinite() raises a TypeError.
Answer (score 19, by Anton Protopopov):
You could use the DataFrame method notnull, the inverse of isnull, or numpy.isnan:
In [332]: df[df.EPS.notnull()]
Out[332]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
In [334]: df[~df.EPS.isnull()]
Out[334]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
In [347]: df[~np.isnan(df.EPS)]
Out[347]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
Answer (score 8, by MaxU):
Yet another solution, which uses the fact that np.nan != np.nan:
In [149]: df.query("EPS == EPS")
Out[149]:
STK_ID EPS cash
STK_ID RPT_Date
600016 20111231 600016 4.3 NaN
601939 20111231 601939 2.5 NaN
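The same trick extends to several columns in one expression; a small sketch (an illustrative extension, not part of the original answer):

df.query("EPS == EPS and cash == cash")  # keep rows where neither column is NaN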
Answer (score 2, by U9-Forward):
Or check for NaNs with isnull, then use ~ to invert the mask and keep the rows with no NaNs:
df = df[~df['EPS'].isnull()]
Now print(df) gives:
                  STK_ID   EPS  cash
STK_ID RPT_Date
600016 20111231   600016   4.3   NaN
601939 20111231   601939   2.5   NaN
Answer (score 1, by Umer):
You can use dropna.
Drop the rows where at least one element is missing:
df = df.dropna()
Define in which columns to look for missing values:
df = df.dropna(subset=['column1', 'column2'])
See the dropna documentation for more examples.
Note: since version 0.23.0, passing a tuple or list of axes to the axis parameter of dropna is deprecated.
Answer (score 0, by David):
It may be added that '&' can be used to combine additional conditions, e.g.
df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]
Notice that when evaluating the statements, pandas needs parentheses.
Comments:
Sorry, but the OP wants something else. Btw, your code is wrong: it returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses: df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]. But it is also not an answer to this question. – jezrael, Mar 16 '16
Answer (score -1, by samthebrand):
For some reason none of the previously submitted answers worked for me. This basic solution did:
df = df[df.EPS >= 0]
Though of course that will drop rows with negative numbers, too. If you want to keep those, combine both sides into one condition instead (running the two filters one after the other would keep only rows where EPS == 0):
df = df[(df.EPS >= 0) | (df.EPS < 0)]
protected by jezrael Mar 16 '16 at 11:53