How to drop rows of Pandas DataFrame whose value in certain columns is NaN
I have a DataFrame:
>>> df
                  STK_ID   EPS  cash
STK_ID RPT_Date
601166 20111231   601166   NaN   NaN
600036 20111231   600036   NaN    12
600016 20111231   600016   4.3   NaN
601009 20111231   601009   NaN   NaN
601939 20111231   601939   2.5   NaN
000001 20111231   000001   NaN   NaN
Then I just want the records whose EPS is not NaN; that is, something like df.drop(....) should return the DataFrame below:
                  STK_ID   EPS  cash
STK_ID RPT_Date
600016 20111231   600016   4.3   NaN
601939 20111231   601939   2.5   NaN
How do I do that?
Tags: python, pandas, dataframe (asked Nov 16 '12 by bigbug)
Comments:
dropna: pandas.pydata.org/pandas-docs/stable/generated/… – Wouter Overmeire, Nov 16 '12
df.dropna(subset=['column1_name', 'column2_name', 'column3_name']) – osa, Sep 5 '14
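For reference, a minimal sketch that rebuilds the question's example frame from the values shown above (the exact index construction is an assumption; the thread itself never shows it):

import numpy as np
import pandas as pd

# Values copied from the question. STK_ID appears both in the
# MultiIndex and as a regular column, so keep it in both places.
df = pd.DataFrame({
    "STK_ID": ["601166", "600036", "600016", "601009", "601939", "000001"],
    "RPT_Date": ["20111231"] * 6,
    "EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
    "cash": [np.nan, 12.0, np.nan, np.nan, np.nan, np.nan],
})
df = df.set_index(["STK_ID", "RPT_Date"], drop=False).drop(columns="RPT_Date")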
11 Answers
Accepted answer (score 388, by eumiro):
Don't drop. Just take rows where EPS is finite:
df = df[np.isfinite(df['EPS'])]
Comments:
I'd recommend using pandas.notnull instead of np.isfinite – Wes McKinney, Nov 21 '12
Is there any advantage to indexing and copying over dropping? – Robert Muil, Jul 31 '15
Creates error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' – Philipp Schwarz, Oct 7 '16
@wes-mckinney could you please let me know if dropna() is a better choice than pandas.notnull in this case? If so, why? – stormfield, Sep 7 '17
@PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend using pandas.notnull(), which handles this more generously. – normanius, Apr 5
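The gist of the comments above: np.isfinite() works only on numeric dtypes, while pandas.notnull() also accepts object columns. A minimal sketch with toy Series (an illustration, not part of the original answer):

import numpy as np
import pandas as pd

s_num = pd.Series([1.0, np.nan, 2.5])    # float64 dtype
s_obj = pd.Series([1.0, np.nan, "n/a"])  # object dtype

np.isfinite(s_num)    # works: True, False, True
pd.notnull(s_num)     # works: True, False, True
pd.notnull(s_obj)     # works: True, False, True ("n/a" is a string, not a null)
# np.isfinite(s_obj)  # raises the TypeError quoted in the comments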
Answer (score 653, by Aman):
This question is already resolved, but also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.
In [24]: df = pd.DataFrame(np.random.randn(10,3))
In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;
In [26]: df
Out[26]:
0 1 2
0 NaN NaN NaN
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
In [27]: df.dropna() #drop all rows that have any NaN values
Out[27]:
0 1 2
1 2.677677 -1.466923 -0.750366
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
In [28]: df.dropna(how='all') #drop only if ALL columns are NaN
Out[28]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
4 NaN NaN 0.050742
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
8 NaN NaN 0.637482
9 -0.310130 0.078891 NaN
In [29]: df.dropna(thresh=2) #Drop row if it does not have at least two values that are **not** NaN
Out[29]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
5 -1.250970 0.030561 -2.678622
7 0.049896 -0.308003 0.823295
9 -0.310130 0.078891 NaN
In [30]: df.dropna(subset=[1]) #Drop only if NaN in specific column (as asked in the question)
Out[30]:
0 1 2
1 2.677677 -1.466923 -0.750366
2 NaN 0.798002 -0.906038
3 0.672201 0.964789 NaN
5 -1.250970 0.030561 -2.678622
6 NaN 1.036043 NaN
7 0.049896 -0.308003 0.823295
9 -0.310130 0.078891 NaN
There are also other options (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows; a quick sketch of that follows. Pretty handy!
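A minimal sketch of that column-wise variant, reusing the df built above (these exact calls are illustrative and not part of the original answer):

df.dropna(axis='columns', how='all')  # drop columns in which every value is NaN
df.dropna(axis='columns', thresh=7)   # keep only columns with at least 7 non-NaN values
                                      # (for the df above, that is just column 1)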
Comments:
you can also use df.dropna(subset=['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1 – James Tobin, Jun 18 '14
@JamesTobin, I just spent 20 minutes writing a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand what they meant... – osa, Sep 5 '14
This should be #1 – Cord Kaldemeyer, Oct 20 '17
isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer. – TheProletariat, Mar 20
Answer (score 86, by Kirk Hadley):
I know this has already been answered, but just for the sake of a purely pandas solution to this specific question, as opposed to the general description from Aman (which was wonderful), and in case anyone else happens upon this:
import pandas as pd
df = df[pd.notnull(df['EPS'])]
Comments:
Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description from Aman; of course this also works) – joris, Apr 23 '14
notnull is also what Wes (author of pandas) suggested in his comment on another answer. – fantabolous, Jul 9 '14
This may be a noob question, but when I do df[pd.notnull(...)] or df.dropna, the index gets dropped. So if there was a null value at row index 10 in a df of length 200, the resulting dataframe has index values from 1 to 9 and then 11 to 200. Is there any way to "re-index" it? – Aakash Gupta, Mar 4 '16
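On the re-indexing question in the last comment, a minimal sketch (an addition, not from the original thread): reset_index(drop=True) rebuilds a fresh 0..n-1 RangeIndex after filtering.

import pandas as pd

filtered = df[pd.notnull(df['EPS'])].reset_index(drop=True)  # discard the old index
# Leave out drop=True to keep the old index values as a regular column instead.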
Answer (score 31, by Joe):
You can use this:
df.dropna(subset=['EPS'], how='all', inplace = True)
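A small usage note (an addition, not from the original answer): with inplace=True the call mutates df and returns None, so do not assign its result back.

df.dropna(subset=['EPS'], inplace=True)         # correct: df is modified in place
# df = df.dropna(subset=['EPS'], inplace=True)  # wrong: df would become None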
Comments:
how='all' is redundant here, because you are subsetting the dataframe with only one field, so both 'all' and 'any' will have the same effect. – Anton Protopopov, Jan 16
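To see where how matters, give subset more than one column; a small illustrative sketch (not from the original answer):

df.dropna(subset=['EPS', 'cash'], how='all')  # drop rows where BOTH are NaN
df.dropna(subset=['EPS', 'cash'], how='any')  # drop rows where EITHER is NaN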
Answer (score 22, by Gil Baggio):
Simplest of all solutions:
filtered_df = df[df['EPS'].notnull()]
The above solution is preferable to using np.isfinite(), because .notnull() also handles non-numeric (object) columns, on which np.isfinite() raises a TypeError.
Answer (score 19, by Anton Protopopov):
You could use the DataFrame method notnull, the inverse of isnull, or numpy.isnan:
In [332]: df[df.EPS.notnull()]
Out[332]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
In [334]: df[~df.EPS.isnull()]
Out[334]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
In [347]: df[~np.isnan(df.EPS)]
Out[347]:
STK_ID RPT_Date STK_ID.1 EPS cash
2 600016 20111231 600016 4.3 NaN
4 601939 20111231 601939 2.5 NaN
Answer (score 8, by MaxU):
Yet another solution, which uses the fact that np.nan != np.nan:
In [149]: df.query("EPS == EPS")
Out[149]:
STK_ID EPS cash
STK_ID RPT_Date
600016 20111231 600016 4.3 NaN
601939 20111231 601939 2.5 NaN
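The same trick extends to several columns in one expression; a small sketch (an illustrative extension, not part of the original answer):

df.query("EPS == EPS and cash == cash")  # keep rows where neither column is NaN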
Answer (score 2, by U9-Forward):
Or check for NaNs with isnull, then use ~ to invert the mask and keep the rows with no NaNs:
df = df[~df['EPS'].isnull()]
Now print(df) gives:
                  STK_ID   EPS  cash
STK_ID RPT_Date
600016 20111231   600016   4.3   NaN
601939 20111231   601939   2.5   NaN
Answer (score 1, by Umer):
You can use dropna.
Drop the rows where at least one element is missing:
df = df.dropna()
Define in which columns to look for missing values:
df = df.dropna(subset=['column1', 'column2'])
See the dropna documentation for more examples.
Note: since version 0.23.0, passing a tuple or list of axes to the axis parameter of dropna is deprecated.
Answer (score 0, by David):
It may be added that '&' can be used to combine additional conditions, e.g.
df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]
Notice that when evaluating the statements, pandas needs parentheses.
Comments:
Sorry, but the OP wants something else. Btw, your code is wrong: it returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses: df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]. But it is also not an answer to this question. – jezrael, Mar 16 '16
Answer (score -1, by samthebrand):
For some reason none of the previously submitted answers worked for me. This basic solution did:
df = df[df.EPS >= 0]
Though of course that will drop rows with negative numbers, too. If you want to keep those, combine both sides into one condition instead (running the two filters one after the other would keep only rows where EPS == 0):
df = df[(df.EPS >= 0) | (df.EPS < 0)]
protected by jezrael Mar 16 '16 at 11:53