Pandas conditional filter



























I have a dataframe



       A      B      C
0   True   True   True
1   True  False  False
2  False  False  False


I would like to add a column D with the following condition:



D is True if A, B and C are all True; otherwise, D is False.



I tried



df['D'] = df.loc[(df['A'] == True) & df['B'] == True & df['C'] == True] 


I get



TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]
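For reference, the immediate problem in this attempt is most likely operator precedence: Python's & binds more tightly than ==, so the unparenthesized part is read as a chained comparison that evaluates True & df['C'], and with a float64 column C that bitwise step raises the TypeError. Separately, df.loc[mask] returns the matching rows, not a True/False column. A minimal sketch of the intended mask, assuming A, B and C are genuinely boolean columns as in the printed sample:

```python
import pandas as pd

# Sample data from the question (assumed to be real bool columns).
df = pd.DataFrame({'A': [True, True, False],
                   'B': [True, False, False],
                   'C': [True, False, False]})

# Each comparison needs its own parentheses because & binds tighter than ==.
mask = (df['A'] == True) & (df['B'] == True) & (df['C'] == True)

# Assign the boolean mask itself as the new column; df.loc[mask] would
# return the matching rows instead of one True/False value per row.
df['D'] = mask
print(df)
```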


Then I tried to follow this example and wrote a similar function as suggested in the link:



def all_true(row):

    if row['A'] == True:
        if row['B'] == True:
            if row['C'] == True:
                val = True
            else:
                val = 0

    return val

df['D'] = df.apply(all_true(df), axis=1)


In which case I get



ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
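The ValueError arises because all_true(df) is called immediately with the whole DataFrame, so row['A'] == True produces a Series inside an if statement. DataFrame.apply expects the function object itself and, with axis=1, calls it once per row. A minimal corrected sketch (row-wise apply is much slower than the vectorized answers, so this only illustrates the mechanics); it also returns a value on every path, which the original function does not:

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, False],
                   'B': [True, False, False],
                   'C': [True, False, False]})

def all_true(row):
    # row is a Series holding one row's values, so row['A'] etc. are
    # scalars and the if-test is unambiguous. Every path returns a value.
    if row['A'] and row['B'] and row['C']:
        return True
    return False

# Pass the function itself (no parentheses); apply calls it once per row.
df['D'] = df.apply(all_true, axis=1)
print(df)
```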


I'd appreciate suggestions. Thanks!










      python pandas
















      asked Nov 23 '18 at 6:43 by Meeep
























          3 Answers
          Use DataFrame.all along each row (axis=1), which is True only when every value in the row is True:



          df['D'] = df.all(axis=1)


          And now:



          print(df)


          gives:



                 A      B      C      D
          0   True   True   True   True
          1   True  False  False  False
          2  False  False  False  False
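Since df.all(axis=1) reduces over every column in the frame, the result changes once the DataFrame holds anything besides A, B and C (including D itself on a second run). A sketch that restricts the check to the three columns, with a hypothetical extra boolean column named flag added only to show the difference:

```python
import pandas as pd

df = pd.DataFrame({'A': [True, True, False],
                   'B': [True, False, False],
                   'C': [True, False, False],
                   'flag': [False, True, True]})  # hypothetical extra column

# Reduce across columns (axis=1) using only A, B and C.
df['D'] = df[['A', 'B', 'C']].all(axis=1)

# Including 'flag' changes row 0, because flag is False there.
everything = df[['A', 'B', 'C', 'flag']].all(axis=1)

print(df)
print(everything.tolist())
```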






          answered Nov 23 '18 at 6:45 by U9-Forward








          • Did the trick. Thanks

            – Meeep
            Nov 23 '18 at 12:07











          • @Meeep Happy to help, :-), 😊😊😊

            – U9-Forward
            Nov 24 '18 at 23:45














          Comparing with True is not necessary; just chain the boolean masks with &:



          df['D'] = df['A'] & df['B'] & df['C']


          If performance is important, combine the underlying NumPy arrays instead:



          df['D'] = df['A'].values & df['B'].values & df['C'].values


          Or use DataFrame.all to check whether all values in each row are True:



          df['D'] = df[['A','B','C']].all(axis=1)

          # NumPy alternative:
          # df['D'] = np.all(df.values, 1)




          print(df)
                 A      B      C      D
          0   True   True   True   True
          1   True  False  False  False
          2  False  False  False  False
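A quick sanity check, on the sample data from the question, that the chained mask, the NumPy version and the two all() variants agree. It uses to_numpy(), the newer pandas spelling of .values (available since pandas 0.24):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [True, True, False],
                   'B': [True, False, False],
                   'C': [True, False, False]})

chained = df['A'] & df['B'] & df['C']                    # pandas Series
chained_np = (df['A'].to_numpy() & df['B'].to_numpy()
              & df['C'].to_numpy())                       # NumPy array
row_all = df[['A', 'B', 'C']].all(axis=1)                # pandas Series
np_all = np.all(df[['A', 'B', 'C']].to_numpy(), axis=1)  # NumPy array

# All four give the same element-wise result.
for result in (chained_np, row_all, np_all):
    assert (np.asarray(result) == chained.to_numpy()).all()

df['D'] = chained
print(df)
```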


          Performance:



          [Figure: perfplot timing comparison of the kernels below, plotted against len(df)]



          import numpy as np
          import pandas as pd
          import perfplot  # third-party package: pip install perfplot

          np.random.seed(125)

          # Each kernel adds column D to the DataFrame it receives.
          def all1(df):
              df['D'] = df.all(axis=1)
              return df

          def all1_numpy(df):
              df['D'] = np.all(df.values, 1)
              return df

          def eval1(df):
              df['D'] = df.eval('A & B & C')
              return df

          def chained(df):
              df['D'] = df['A'] & df['B'] & df['C']
              return df

          def chained_numpy(df):
              df['D'] = df['A'].values & df['B'].values & df['C'].values
              return df

          # Random boolean columns A, B and C with n rows.
          def make_df(n):
              df = pd.DataFrame({'A': np.random.choice([True, False], size=n),
                                 'B': np.random.choice([True, False], size=n),
                                 'C': np.random.choice([True, False], size=n)})
              return df

          perfplot.show(
              setup=make_df,
              kernels=[all1, all1_numpy, eval1, chained, chained_numpy],
              n_range=[2**k for k in range(2, 25)],
              logx=True,
              logy=True,
              equality_check=False,  # don't compare kernel outputs
              xlabel='len(df)')






          edited Nov 23 '18 at 7:11
          answered Nov 23 '18 at 6:44 by jezrael













          • @jezrael, what is perfplot? Is it a matplotlib import? I'm learning this, and it's a good example.

            – pygo
            Nov 23 '18 at 8:31






          • No, it is a separate module (I learned about it from unutbu): github.com/nschloe/perfplot. It does use matplotlib for the plots.

            – jezrael
            Nov 23 '18 at 8:33
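For readers without perfplot, a rough comparison is possible with the standard-library timeit module. This is a sketch under its own assumptions: one arbitrary size (n = 100_000), only three of the kernels, and kernels that return the mask rather than assigning it, so the shared input frame is never modified. It prints timings rather than claiming any ranking, since results depend on the machine and library versions.

```python
import functools
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(125)
n = 100_000  # arbitrary size chosen for illustration
base = pd.DataFrame({col: rng.choice([True, False], size=n) for col in 'ABC'})

def chained(df):
    return df['A'] & df['B'] & df['C']

def chained_numpy(df):
    return df['A'].to_numpy() & df['B'].to_numpy() & df['C'].to_numpy()

def row_all(df):
    return df[['A', 'B', 'C']].all(axis=1)

results = {}
for func in (chained, chained_numpy, row_all):
    # Best of 5 repeats of 20 calls each, converted to seconds per call.
    best = min(timeit.repeat(functools.partial(func, base), number=20, repeat=5))
    results[func.__name__] = best / 20
    print(f'{func.__name__:>14}: {results[func.__name__] * 1e3:.3f} ms per call')
```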



















          Using pandas eval:



          df['D'] = df.eval('A & B & C')


          Or:



          df = df.eval('D = A & B & C')
          # alternative, modifying df in place: df.eval('D = A & B & C', inplace=True)


          Or:



          df['D'] = np.all(df.values,1)

          print(df)
                 A      B      C      D
          0   True   True   True   True
          1   True  False  False  False
          2  False  False  False  False
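One caveat with np.all(df.values, 1): it reduces over every column, so once D exists (or any other column is present) it should be limited to A, B and C. A sketch comparing the three eval forms with a restricted NumPy version on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [True, True, False],
                   'B': [True, False, False],
                   'C': [True, False, False]})

# eval returns a new boolean Series; assign it as column D.
df['D'] = df.eval('A & B & C')

# The assignment form returns a new DataFrame with D added.
df2 = df[['A', 'B', 'C']].eval('D = A & B & C')

# With inplace=True the frame itself is modified (and None is returned).
df3 = df[['A', 'B', 'C']].copy()
df3.eval('D = A & B & C', inplace=True)

# NumPy version restricted to the three input columns.
d_numpy = np.all(df[['A', 'B', 'C']].to_numpy(), axis=1)

assert df['D'].tolist() == df2['D'].tolist() == df3['D'].tolist() == d_numpy.tolist()
print(df)
```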






          edited Nov 23 '18 at 7:05
          answered Nov 23 '18 at 6:46 by Sandeep Kadapa













          • Good Try +1 :-)

            – pygo
            Nov 23 '18 at 12:45


















