How to drop rows of Pandas DataFrame whose value in certain columns is NaN























I have a DataFrame:

>>> df
                  STK_ID  EPS  cash
STK_ID RPT_Date
601166 20111231  601166  NaN   NaN
600036 20111231  600036  NaN    12
600016 20111231  600016  4.3   NaN
601009 20111231  601009  NaN   NaN
601939 20111231  601939  2.5   NaN
000001 20111231  000001  NaN   NaN

Then I just want the records whose EPS is not NaN; that is, something like df.drop(...) should return the DataFrame below:

                  STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN

How do I do that?

python pandas dataframe

490 votes · asked Nov 16 '12 at 9:17 by bigbug · edited Jan 5 '17 at 17:01 by Ninjakannon






























  • dropna: pandas.pydata.org/pandas-docs/stable/generated/…
    – Wouter Overmeire, Nov 16 '12 at 9:29

  • df.dropna(subset=['column1_name', 'column2_name', 'column3_name'])
    – osa, Sep 5 '14 at 23:53

















11 Answers























388 votes · accepted · answered Nov 16 '12 at 9:34 by eumiro









Don't drop. Just take rows where EPS is finite:



df = df[np.isfinite(df['EPS'])]






















  • I'd recommend using pandas.notnull instead of np.isfinite.
    – Wes McKinney, Nov 21 '12 at 3:08

  • Is there any advantage to indexing and copying over dropping?
    – Robert Muil, Jul 31 '15 at 8:15

  • This creates an error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
    – Philipp Schwarz, Oct 7 '16 at 13:18

  • @wes-mckinney Could you please let me know whether dropna() is a better choice than pandas.notnull in this case? If so, why?
    – stormfield, Sep 7 '17 at 11:53

  • @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend pandas.notnull(), which handles this more gracefully.
    – normanius, Apr 5 at 10:02
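
To make the trade-off in these comments concrete, here is a minimal sketch (a throwaway Series of my own, not part of the thread) of why pd.notnull is the more forgiving check when a column may hold non-numeric values:

import numpy as np
import pandas as pd

# Object-dtype column: a float, a missing value, and a stray string.
s = pd.Series([4.3, np.nan, "n/a"])

print(pd.notnull(s).tolist())   # [True, False, True] -- works for any dtype
try:
    np.isfinite(s)              # object dtype: raises the TypeError quoted above
except TypeError as err:
    print("np.isfinite failed:", err)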


















653 votes · answered Nov 17 '12 at 20:27 by Aman · edited Aug 14 '17 at 0:04 by ayhan













This question is already resolved, but...



...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN


In [27]: df.dropna()     # drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295


In [28]: df.dropna(how='all')     # drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN


In [29]: df.dropna(thresh=2)   # drop a row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN


In [30]: df.dropna(subset=[1])   # drop only if NaN in a specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN


There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



Pretty handy!
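
A small addendum (not in the original answer): the column-wise variant mentioned above looks like this on the same toy frame.

df.dropna(axis=1)             # drop any column containing at least one NaN
                              # (every column above has a NaN, so none survive)
df.dropna(axis=1, how='all')  # drop a column only if ALL of its values are NaN
                              # (no column above is entirely NaN, so all three are kept)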

























  • You can also use df.dropna(subset=['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
    – James Tobin, Jun 18 '14 at 14:07

  • @JamesTobin, I just spent 20 minutes writing a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand what they meant...
    – osa, Sep 5 '14 at 23:52

  • This should be #1
    – Cord Kaldemeyer, Oct 20 '17 at 13:10

  • isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
    – TheProletariat, Mar 20 at 21:51


















86 votes · answered Apr 23 '14 at 5:37 by Kirk Hadley













I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



import pandas as pd
df = df[pd.notnull(df['EPS'])]






















  • Actually, the specific answer would be df.dropna(subset=['EPS']) (based on the general description from Aman; of course this also works).
    – joris, Apr 23 '14 at 12:53

  • notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
    – fantabolous, Jul 9 '14 at 3:24

  • This may be a noob question, but when I do df[pd.notnull(...)] or df.dropna, the dropped row's index label disappears. So if there was a null value at row index 10 in a df of length 200, the dataframe after running the drop has index values from 1 to 9 and then 11 to 200. Any way to "re-index" it?
    – Aakash Gupta, Mar 4 '16 at 6:03
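
Not part of the original thread, but a common follow-up to the last comment: if you want a fresh 0..n-1 index after filtering, reset_index rebuilds it (a sketch, assuming the old gappy labels can be discarded):

import pandas as pd

filtered = df[pd.notnull(df['EPS'])]        # index keeps gaps where rows were removed
filtered = filtered.reset_index(drop=True)  # drop=True discards the old index labels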


















31 votes · answered Aug 2 '17 at 16:28 by Joe · edited Aug 21 '17 at 9:49 by Mojtaba Khodadadi













You can use this:



df.dropna(subset=['EPS'], how='all', inplace=True)
























  • how='all' is redundant here: because you are subsetting the dataframe on only one field, 'all' and 'any' have the same effect.
    – Anton Protopopov, Jan 16 at 12:41
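
Following that comment, the equivalent call without the redundant argument would simply be (my paraphrase, not from the answer):

df.dropna(subset=['EPS'], inplace=True)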




















22 votes · answered Nov 23 '17 at 12:08 by Gil Baggio · edited Aug 8 at 15:17 by ayhan













Simplest of all solutions:



filtered_df = df[df['EPS'].notnull()]



The above solution is way better than using np.isfinite()





































19 votes · answered Dec 4 '15 at 7:01 by Anton Protopopov













You could use the DataFrame method notnull, the inverse of isnull, or numpy.isnan:

In [332]: df[df.EPS.notnull()]
Out[332]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN


In [334]: df[~df.EPS.isnull()]
Out[334]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN


In [347]: df[~np.isnan(df.EPS)]
Out[347]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN

































8 votes · answered Apr 20 '17 at 21:15 by MaxU













Yet another solution which uses the fact that np.nan != np.nan:

In [149]: df.query("EPS == EPS")
Out[149]:
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
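
For what it's worth, the same trick works as a plain boolean mask without query (a sketch added here, not part of the original answer):

import numpy as np

print(np.nan == np.nan)     # False: NaN never compares equal, not even to itself
df[df['EPS'] == df['EPS']]  # keeps exactly the rows where EPS is not NaN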

































2 votes













Or (check for NaN with isnull, then use ~ to invert the mask so only the non-NaN rows remain):

df = df[~df['EPS'].isnull()]

Now print(df) gives:

                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN

































1 vote













You can use dropna.

Example

Drop the rows where at least one element is missing:

df = df.dropna()

Define in which columns to look for missing values:

df = df.dropna(subset=['column1', 'column2'])

See this for more examples.

Note: the axis parameter of dropna is deprecated since version 0.23.0.



































0 votes













It may be added that '&' can be used to combine additional conditions, e.g.:

df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]

Notice that when evaluating the statements, pandas needs parentheses.





























  • Sorry, but the OP wants something else. Btw, your code is wrong and returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses - df = df[(df.EPS > 2.0) & (df.EPS < 4.0)] - but it is also not an answer to this question.
    – jezrael, Mar 16 '16 at 11:52
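
To connect this back to the question: a comparison like df.EPS > 2.0 is already False for NaN, so rows with missing EPS are dropped by the range filter anyway; combining it with an explicit NaN check via & just makes that intent visible (my illustration, not from the thread):

df = df[df['EPS'].notnull() & (df['EPS'] > 2.0)]  # keep rows where EPS is present and above 2.0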




















-1 votes













For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

Though of course that will drop rows with negative numbers, too. If you want to keep those as well, combine both conditions with | instead of filtering twice (filtering twice would leave only the zero rows):

df = df[(df.EPS >= 0) | (df.EPS <= 0)]









































              11 Answers
              11






              active

              oldest

              votes








              11 Answers
              11






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes








              up vote
              388
              down vote



              accepted










              Don't drop. Just take rows where EPS is finite:



              df = df[np.isfinite(df['EPS'])]





              share|improve this answer

















              • 352




                I'd recommend using pandas.notnull instead of np.isfinite
                – Wes McKinney
                Nov 21 '12 at 3:08






              • 8




                Is there any advantage to indexing and copying over dropping?
                – Robert Muil
                Jul 31 '15 at 8:15






              • 9




                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
                – Philipp Schwarz
                Oct 7 '16 at 13:18








              • 2




                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?
                – stormfield
                Sep 7 '17 at 11:53






              • 4




                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.
                – normanius
                Apr 5 at 10:02















              up vote
              388
              down vote



              accepted










              Don't drop. Just take rows where EPS is finite:



              df = df[np.isfinite(df['EPS'])]





              share|improve this answer

















              • 352




                I'd recommend using pandas.notnull instead of np.isfinite
                – Wes McKinney
                Nov 21 '12 at 3:08






              • 8




                Is there any advantage to indexing and copying over dropping?
                – Robert Muil
                Jul 31 '15 at 8:15






              • 9




                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
                – Philipp Schwarz
                Oct 7 '16 at 13:18








              • 2




                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?
                – stormfield
                Sep 7 '17 at 11:53






              • 4




                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.
                – normanius
                Apr 5 at 10:02













              up vote
              388
              down vote



              accepted







              up vote
              388
              down vote



              accepted






              Don't drop. Just take rows where EPS is finite:



              df = df[np.isfinite(df['EPS'])]





              share|improve this answer












              Don't drop. Just take rows where EPS is finite:



              df = df[np.isfinite(df['EPS'])]






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Nov 16 '12 at 9:34









              eumiro

              125k18223228




              125k18223228








              • 352




                I'd recommend using pandas.notnull instead of np.isfinite
                – Wes McKinney
                Nov 21 '12 at 3:08






              • 8




                Is there any advantage to indexing and copying over dropping?
                – Robert Muil
                Jul 31 '15 at 8:15






              • 9




                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
                – Philipp Schwarz
                Oct 7 '16 at 13:18








              • 2




                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?
                – stormfield
                Sep 7 '17 at 11:53






              • 4




                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.
                – normanius
                Apr 5 at 10:02














              • 352




                I'd recommend using pandas.notnull instead of np.isfinite
                – Wes McKinney
                Nov 21 '12 at 3:08






              • 8




                Is there any advantage to indexing and copying over dropping?
                – Robert Muil
                Jul 31 '15 at 8:15






              • 9




                Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
                – Philipp Schwarz
                Oct 7 '16 at 13:18








              • 2




                @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?
                – stormfield
                Sep 7 '17 at 11:53






              • 4




                @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.
                – normanius
                Apr 5 at 10:02








              352




              352




              I'd recommend using pandas.notnull instead of np.isfinite
              – Wes McKinney
              Nov 21 '12 at 3:08




              I'd recommend using pandas.notnull instead of np.isfinite
              – Wes McKinney
              Nov 21 '12 at 3:08




              8




              8




              Is there any advantage to indexing and copying over dropping?
              – Robert Muil
              Jul 31 '15 at 8:15




              Is there any advantage to indexing and copying over dropping?
              – Robert Muil
              Jul 31 '15 at 8:15




              9




              9




              Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
              – Philipp Schwarz
              Oct 7 '16 at 13:18






              Creates Error: TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
              – Philipp Schwarz
              Oct 7 '16 at 13:18






              2




              2




              @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?
              – stormfield
              Sep 7 '17 at 11:53




              @wes-mckinney could please let me know if dropna () is a better choice over pandas.notnull in this case ? If so, then why ?
              – stormfield
              Sep 7 '17 at 11:53




              4




              4




              @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.
              – normanius
              Apr 5 at 10:02




              @PhilippSchwarz This error occurs if the column (EPS in the example) contains strings or other types that cannot be digested by np.isfinite(). I recommend to use pandas.notnull() that will handle this more generously.
              – normanius
              Apr 5 at 10:02












              up vote
              653
              down vote













              This question is already resolved, but...



              ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



              In [24]: df = pd.DataFrame(np.random.randn(10,3))

              In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

              In [26]: df
              Out[26]:
              0 1 2
              0 NaN NaN NaN
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [27]: df.dropna()     #drop all rows that have any NaN values
              Out[27]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295




              In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
              Out[28]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
              Out[29]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN




              In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
              Out[30]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN


              There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



              Pretty handy!






              share|improve this answer



















              • 177




                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
                – James Tobin
                Jun 18 '14 at 14:07






              • 8




                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...
                – osa
                Sep 5 '14 at 23:52






              • 2




                This should be #1
                – Cord Kaldemeyer
                Oct 20 '17 at 13:10






              • 1




                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
                – TheProletariat
                Mar 20 at 21:51















              up vote
              653
              down vote













              This question is already resolved, but...



              ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



              In [24]: df = pd.DataFrame(np.random.randn(10,3))

              In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

              In [26]: df
              Out[26]:
              0 1 2
              0 NaN NaN NaN
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [27]: df.dropna()     #drop all rows that have any NaN values
              Out[27]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295




              In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
              Out[28]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
              Out[29]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN




              In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
              Out[30]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN


              There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



              Pretty handy!






              share|improve this answer



















              • 177




                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
                – James Tobin
                Jun 18 '14 at 14:07






              • 8




                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...
                – osa
                Sep 5 '14 at 23:52






              • 2




                This should be #1
                – Cord Kaldemeyer
                Oct 20 '17 at 13:10






              • 1




                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
                – TheProletariat
                Mar 20 at 21:51













              up vote
              653
              down vote










              up vote
              653
              down vote









              This question is already resolved, but...



              ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



              In [24]: df = pd.DataFrame(np.random.randn(10,3))

              In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

              In [26]: df
              Out[26]:
              0 1 2
              0 NaN NaN NaN
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [27]: df.dropna()     #drop all rows that have any NaN values
              Out[27]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295




              In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
              Out[28]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
              Out[29]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN




              In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
              Out[30]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN


              There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



              Pretty handy!






              share|improve this answer














              This question is already resolved, but...



              ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.



              In [24]: df = pd.DataFrame(np.random.randn(10,3))

              In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

              In [26]: df
              Out[26]:
              0 1 2
              0 NaN NaN NaN
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [27]: df.dropna()     #drop all rows that have any NaN values
              Out[27]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295




              In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
              Out[28]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              4 NaN NaN 0.050742
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              8 NaN NaN 0.637482
              9 -0.310130 0.078891 NaN




              In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
              Out[29]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN




              In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
              Out[30]:
              0 1 2
              1 2.677677 -1.466923 -0.750366
              2 NaN 0.798002 -0.906038
              3 0.672201 0.964789 NaN
              5 -1.250970 0.030561 -2.678622
              6 NaN 1.036043 NaN
              7 0.049896 -0.308003 0.823295
              9 -0.310130 0.078891 NaN


              There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.



              Pretty handy!







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 14 '17 at 0:04









              ayhan

              35.6k66397




              35.6k66397










              answered Nov 17 '12 at 20:27









              Aman

              23.3k62435




              23.3k62435








              • 177




                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
                – James Tobin
                Jun 18 '14 at 14:07






              • 8




                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...
                – osa
                Sep 5 '14 at 23:52






              • 2




                This should be #1
                – Cord Kaldemeyer
                Oct 20 '17 at 13:10






              • 1




                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
                – TheProletariat
                Mar 20 at 21:51














              • 177




                you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
                – James Tobin
                Jun 18 '14 at 14:07






              • 8




                @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...
                – osa
                Sep 5 '14 at 23:52






              • 2




                This should be #1
                – Cord Kaldemeyer
                Oct 20 '17 at 13:10






              • 1




                isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
                – TheProletariat
                Mar 20 at 21:51








              177




              177




              you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
              – James Tobin
              Jun 18 '14 at 14:07




              you can also use df.dropna(subset = ['column_name']). Hope that saves at least one person the extra 5 seconds of 'what am I doing wrong'. Great answer, +1
              – James Tobin
              Jun 18 '14 at 14:07




              8




              8




              @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...
              – osa
              Sep 5 '14 at 23:52




              @JamesTobin, I just spent 20 minutes to write a function for that! The official documentation was very cryptic: "Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include". I was unable to understand, what they meant...
              – osa
              Sep 5 '14 at 23:52




              2




              2




              This should be #1
              – Cord Kaldemeyer
              Oct 20 '17 at 13:10




              This should be #1
              – Cord Kaldemeyer
              Oct 20 '17 at 13:10




              1




              1




              isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
              – TheProletariat
              Mar 20 at 21:51




              isfinite() is probably more pythonic, but this answer is more elegant and in line with pandas principles. Great answer.
              – TheProletariat
              Mar 20 at 21:51










              up vote
              86
              down vote













              I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



              import pandas as pd
              df = df[pd.notnull(df['EPS'])]





              share|improve this answer

















              • 7




                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)
                – joris
                Apr 23 '14 at 12:53






              • 2




                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
                – fantabolous
                Jul 9 '14 at 3:24










              • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it
                – Aakash Gupta
                Mar 4 '16 at 6:03















              up vote
              86
              down vote













              I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



              import pandas as pd
              df = df[pd.notnull(df['EPS'])]





              share|improve this answer

















              • 7




                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)
                – joris
                Apr 23 '14 at 12:53






              • 2




                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
                – fantabolous
                Jul 9 '14 at 3:24










              • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it
                – Aakash Gupta
                Mar 4 '16 at 6:03













              up vote
              86
              down vote










              up vote
              86
              down vote









              I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



              import pandas as pd
              df = df[pd.notnull(df['EPS'])]





              share|improve this answer












              I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:



              import pandas as pd
              df = df[pd.notnull(df['EPS'])]






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Apr 23 '14 at 5:37









              Kirk Hadley

              1,00672




              1,00672








              • 7




                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)
                – joris
                Apr 23 '14 at 12:53






              • 2




                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
                – fantabolous
                Jul 9 '14 at 3:24










              • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it
                – Aakash Gupta
                Mar 4 '16 at 6:03














              • 7




                Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)
                – joris
                Apr 23 '14 at 12:53






              • 2




                notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
                – fantabolous
                Jul 9 '14 at 3:24










              • This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it
                – Aakash Gupta
                Mar 4 '16 at 6:03








              7




              7




              Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)
              – joris
              Apr 23 '14 at 12:53




              Actually, the specific answer would be: df.dropna(subset=['EPS']) (based on the general description of Aman, of course this does also work)
              – joris
              Apr 23 '14 at 12:53




              2




              2




              notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
              – fantabolous
              Jul 9 '14 at 3:24




              notnull is also what Wes (author of Pandas) suggested in his comment on another answer.
              – fantabolous
              Jul 9 '14 at 3:24












              This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it
              – Aakash Gupta
              Mar 4 '16 at 6:03




              This maybe a noob question. But when I do a df[pd.notnull(...) or df.dropna the index gets dropped. So if there was a null value in row-index 10 in a df of length 200. The dataframe after running the drop function has index values from 1 to 9 and then 11 to 200. Anyway to "re-index" it
              – Aakash Gupta
              Mar 4 '16 at 6:03










              up vote
              31
              down vote













              You can use this:



              df.dropna(subset=['EPS'], how='all', inplace = True)





              share|improve this answer



















              • 9




                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.
                – Anton Protopopov
                Jan 16 at 12:41

















              up vote
              31
              down vote













              You can use this:



              df.dropna(subset=['EPS'], how='all', inplace = True)





              share|improve this answer



















              • 9




                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.
                – Anton Protopopov
                Jan 16 at 12:41















              up vote
              31
              down vote










              up vote
              31
              down vote









              You can use this:



              df.dropna(subset=['EPS'], how='all', inplace = True)





              share|improve this answer














              You can use this:



              df.dropna(subset=['EPS'], how='all', inplace = True)






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Aug 21 '17 at 9:49









              Mojtaba Khodadadi

              56447




              56447










              answered Aug 2 '17 at 16:28









              Joe

              5,80421129




              5,80421129








              • 9




                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.
                – Anton Protopopov
                Jan 16 at 12:41
















              • 9




                how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.
                – Anton Protopopov
                Jan 16 at 12:41










              9




              9




              how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.
              – Anton Protopopov
              Jan 16 at 12:41






              how='all' is redundant here, because you subsetting dataframe only with one field so both 'all' and 'any' will have the same effect.
              – Anton Protopopov
              Jan 16 at 12:41












              up vote
              22
              down vote













              Simplest of all solutions:



              filtered_df = df[df['EPS'].notnull()]



              The above solution is way better than using np.isfinite()







              share|improve this answer



























                up vote
                22
                down vote













                Simplest of all solutions:



                filtered_df = df[df['EPS'].notnull()]



                The above solution is way better than using np.isfinite()







                share|improve this answer

























                  up vote
                  22
                  down vote










                  up vote
                  22
                  down vote









                  Simplest of all solutions:



                  filtered_df = df[df['EPS'].notnull()]



                  The above solution is way better than using np.isfinite()







                  share|improve this answer














                  Simplest of all solutions:



                  filtered_df = df[df['EPS'].notnull()]



                  The above solution is way better than using np.isfinite()








                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Aug 8 at 15:17









                  ayhan

                  35.6k66397




                  35.6k66397










                  answered Nov 23 '17 at 12:08









                  Gil Baggio

                  2,2371420




                  2,2371420






















                      up vote
                      19
                      down vote













                      You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                      In [332]: df[df.EPS.notnull()]
                      Out[332]:
                      STK_ID RPT_Date STK_ID.1 EPS cash
                      2 600016 20111231 600016 4.3 NaN
                      4 601939 20111231 601939 2.5 NaN


                      In [334]: df[~df.EPS.isnull()]
                      Out[334]:
                      STK_ID RPT_Date STK_ID.1 EPS cash
                      2 600016 20111231 600016 4.3 NaN
                      4 601939 20111231 601939 2.5 NaN


                      In [347]: df[~np.isnan(df.EPS)]
                      Out[347]:
                      STK_ID RPT_Date STK_ID.1 EPS cash
                      2 600016 20111231 600016 4.3 NaN
                      4 601939 20111231 601939 2.5 NaN





                      share|improve this answer

























                        up vote
                        19
                        down vote













                        You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                        In [332]: df[df.EPS.notnull()]
                        Out[332]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN


                        In [334]: df[~df.EPS.isnull()]
                        Out[334]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN


                        In [347]: df[~np.isnan(df.EPS)]
                        Out[347]:
                        STK_ID RPT_Date STK_ID.1 EPS cash
                        2 600016 20111231 600016 4.3 NaN
                        4 601939 20111231 601939 2.5 NaN





                        share|improve this answer























                          up vote
                          19
                          down vote










                          up vote
                          19
                          down vote









                          You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                          In [332]: df[df.EPS.notnull()]
                          Out[332]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [334]: df[~df.EPS.isnull()]
                          Out[334]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [347]: df[~np.isnan(df.EPS)]
                          Out[347]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN





                          share|improve this answer












                          You could use dataframe method notnull or inverse of isnull, or numpy.isnan:



                          In [332]: df[df.EPS.notnull()]
                          Out[332]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [334]: df[~df.EPS.isnull()]
                          Out[334]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN


                          In [347]: df[~np.isnan(df.EPS)]
                          Out[347]:
                          STK_ID RPT_Date STK_ID.1 EPS cash
                          2 600016 20111231 600016 4.3 NaN
                          4 601939 20111231 601939 2.5 NaN






                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Dec 4 '15 at 7:01









                          Anton Protopopov

                          14.4k34657




                          14.4k34657






















                              up vote
                              8
                              down vote













                              yet another solution which uses the fact that np.nan != np.nan:



                              In [149]: df.query("EPS == EPS")
                              Out[149]:
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
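A quick aside, added here rather than taken from the original answer: the trick works because NaN is the only value that does not compare equal to itself, so "EPS == EPS" is True exactly for the non-NaN rows. A minimal sketch:

    import numpy as np
    import pandas as pd

    print(np.nan == np.nan)          # False: NaN never equals itself

    df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5, np.nan]})
    print(df.query("EPS == EPS"))    # keeps exactly the rows where EPS is not NaN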





answered Apr 20 '17 at 21:15
MaxU

                                      up vote
                                      2
                                      down vote













Or: check for NaN's with isnull, then use ~ to invert the mask so that only the non-NaN rows are kept:



df = df[~df['EPS'].isnull()]


Now:



print(df)


gives:



                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
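A side note added here: newer pandas versions (0.21.0 and later) also expose notna/isna as aliases of notnull/isnull, so the inversion can be skipped entirely. A tiny equivalent sketch:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5]})
    print(df[df['EPS'].notna()])   # same rows as df[~df['EPS'].isnull()], no inversion needed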





answered Oct 18 at 23:55
U9-Forward

                                              up vote
                                              1
                                              down vote













You can use dropna.


Example


Drop the rows where at least one element is missing:


df = df.dropna()


Define in which columns to look for missing values:


df = df.dropna(subset=['column1', 'column2'])


See the dropna documentation for more examples.


Note: since version 0.23.0, passing a tuple or list to the axis parameter of dropna is deprecated; only a single axis is allowed.
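To illustrate the most commonly used dropna parameters, here is a short sketch added for this write-up; the EPS/cash column names are assumed from the question:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'EPS': [np.nan, np.nan, 4.3, 2.5],
                       'cash': [np.nan, 12, np.nan, np.nan]})

    df.dropna()                  # drop rows containing any NaN
    df.dropna(how='all')         # drop rows only if every value is NaN
    df.dropna(thresh=1)          # keep rows with at least 1 non-NaN value
    df.dropna(subset=['EPS'])    # look for NaN only in the EPS column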







answered Oct 14 at 19:26
Umer

                                                      up vote
                                                      0
                                                      down vote













It may be added that '&' can be used to combine additional conditions, e.g.


df = df[(df.EPS > 2.0) & (df.EPS < 4.0)]


Notice that when evaluating the statements, pandas needs parentheses around each condition.
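For instance, an illustrative sketch (added here, not part of the original answer) combining a NaN check with a value filter - each condition wrapped in its own parentheses because & binds tighter than the comparison operators:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'EPS': [np.nan, 4.3, 2.5, 1.0]})

    out = df[(df.EPS.notnull()) & (df.EPS > 2.0) & (df.EPS < 4.0)]
    print(out)   # only the 2.5 row survives

The explicit notnull() condition is technically redundant here, since comparisons against NaN are already False, but it makes the intent obvious.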






• Sorry, but the OP wants something else. Btw, your code is wrong and returns ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). You need to add parentheses - df = df[(df.EPS > 2.0) & (df.EPS < 4.0)] - but it is also not an answer to this question.
  – jezrael
  Mar 16 '16 at 11:52

















edited Jan 26 '17 at 23:12
aesede

answered Mar 15 '16 at 15:33
David


                                                      up vote
                                                      -1
                                                      down vote













For some reason none of the previously submitted answers worked for me. This basic solution did:


df = df[df.EPS >= 0]


Though of course that will also drop rows with negative numbers. If you want to keep those too, don't apply a second filter afterwards (that would leave only the rows where EPS is exactly 0); combine both conditions into one mask instead:


df = df[(df.EPS >= 0) | (df.EPS <= 0)]
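Why this works, as a brief note added here: every comparison with NaN evaluates to False, so a numeric condition silently drops the NaN rows, and combining the two opposite conditions with | keeps every non-NaN value. A small sketch, with the EPS column assumed from the question:

    import numpy as np
    import pandas as pd

    print(np.nan >= 0, np.nan <= 0)           # False False: comparisons with NaN are never True

    df = pd.DataFrame({'EPS': [np.nan, 4.3, -1.2, np.nan]})
    kept = df[(df.EPS >= 0) | (df.EPS <= 0)]  # same rows as df[df.EPS.notnull()]
    print(kept)                               # 4.3 and -1.2 remain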





edited Oct 9 '15 at 18:25

answered Oct 9 '15 at 18:00
samthebrand