Counting DISTINCT over multiple columns












153















Is there a better way of doing a query like this:



SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery


I need to count the number of distinct items from this table but the distinct is over two columns.



My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)










share|improve this question

























  • IordanTanev, Mark Brackett, RC - thanks for replies, it was a nice try, but you need to check what you doing before posting to SO. The queries you provided are not equivalent to my query. You can easily see I always have a scalar a result but your query returns multiple rows.

    – Novitzky
    Sep 24 '09 at 12:30











  • Just updated the question to include your clarifying comment from one of the answers

    – Jeff
    Mar 29 '16 at 2:54











  • FYI: community.oracle.com/ideas/18664

    – quetzalcoatl
    Sep 6 '18 at 14:49
















153















Is there a better way of doing a query like this:



SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery


I need to count the number of distinct items from this table but the distinct is over two columns.



My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)










share|improve this question

























  • IordanTanev, Mark Brackett, RC - thanks for replies, it was a nice try, but you need to check what you doing before posting to SO. The queries you provided are not equivalent to my query. You can easily see I always have a scalar a result but your query returns multiple rows.

    – Novitzky
    Sep 24 '09 at 12:30











  • Just updated the question to include your clarifying comment from one of the answers

    – Jeff
    Mar 29 '16 at 2:54











  • FYI: community.oracle.com/ideas/18664

    – quetzalcoatl
    Sep 6 '18 at 14:49














153












153








153


31






Is there a better way of doing a query like this:



SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery


I need to count the number of distinct items from this table but the distinct is over two columns.



My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)










share|improve this question
















Is there a better way of doing a query like this:



SELECT COUNT(*) 
FROM (SELECT DISTINCT DocumentId, DocumentSessionId
FROM DocumentOutputItems) AS internalQuery


I need to count the number of distinct items from this table but the distinct is over two columns.



My query works fine but I was wondering if I can get the final result using just one query (without using a sub-query)







sql sql-server tsql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 29 '16 at 2:54









Jeff

9,36352449




9,36352449










asked Sep 24 '09 at 12:03









NovitzkyNovitzky

1,83931826




1,83931826













  • IordanTanev, Mark Brackett, RC - thanks for replies, it was a nice try, but you need to check what you doing before posting to SO. The queries you provided are not equivalent to my query. You can easily see I always have a scalar a result but your query returns multiple rows.

    – Novitzky
    Sep 24 '09 at 12:30











  • Just updated the question to include your clarifying comment from one of the answers

    – Jeff
    Mar 29 '16 at 2:54











  • FYI: community.oracle.com/ideas/18664

    – quetzalcoatl
    Sep 6 '18 at 14:49



















  • IordanTanev, Mark Brackett, RC - thanks for replies, it was a nice try, but you need to check what you doing before posting to SO. The queries you provided are not equivalent to my query. You can easily see I always have a scalar a result but your query returns multiple rows.

    – Novitzky
    Sep 24 '09 at 12:30











  • Just updated the question to include your clarifying comment from one of the answers

    – Jeff
    Mar 29 '16 at 2:54











  • FYI: community.oracle.com/ideas/18664

    – quetzalcoatl
    Sep 6 '18 at 14:49

















IordanTanev, Mark Brackett, RC - thanks for replies, it was a nice try, but you need to check what you doing before posting to SO. The queries you provided are not equivalent to my query. You can easily see I always have a scalar a result but your query returns multiple rows.

– Novitzky
Sep 24 '09 at 12:30





IordanTanev, Mark Brackett, RC - thanks for replies, it was a nice try, but you need to check what you doing before posting to SO. The queries you provided are not equivalent to my query. You can easily see I always have a scalar a result but your query returns multiple rows.

– Novitzky
Sep 24 '09 at 12:30













Just updated the question to include your clarifying comment from one of the answers

– Jeff
Mar 29 '16 at 2:54





Just updated the question to include your clarifying comment from one of the answers

– Jeff
Mar 29 '16 at 2:54













FYI: community.oracle.com/ideas/18664

– quetzalcoatl
Sep 6 '18 at 14:49





FYI: community.oracle.com/ideas/18664

– quetzalcoatl
Sep 6 '18 at 14:49












17 Answers
17






active

oldest

votes


















55














If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.



Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.



I believe a distinct count of the computed column would be equivalent to your query.






share|improve this answer





















  • 3





    Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

    – tumchaaditya
    Oct 4 '13 at 22:48













  • Too good suggestion. It avoided me to write unnecessary code to this.

    – Avrajit Roy
    Mar 9 '16 at 6:39



















47














Edit: Altered from the less-than-reliable checksum-only query
I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable



SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
FROM DocumentOutPutItems





share|improve this answer


























  • +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

    – Bernoulli IT
    Nov 7 '12 at 12:54






  • 7





    With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

    – crokusek
    Jan 31 '14 at 19:35








  • 1





    the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

    – pvolders
    Jul 23 '14 at 7:53






  • 3





    Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

    – The Red Pea
    Jan 9 '16 at 15:40






  • 1





    this answer is better than good: "good enough"

    – scottlittle
    Oct 9 '17 at 17:44



















22














What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?



It certainly works as you might expect in Oracle.



SQL> select distinct deptno, job from emp
2 order by deptno, job
3 /

DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN

9 rows selected.


SQL> select count(*) from (
2 select distinct deptno, job from emp
3 )
4 /

COUNT(*)
----------
9

SQL>


edit



I went down a blind alley with analytics but the answer was depressingly obvious...



SQL> select count(distinct concat(deptno,job)) from emp
2 /

COUNT(DISTINCTCONCAT(DEPTNO,JOB))
---------------------------------
9

SQL>


edit 2



Given the following data the concatenating solution provided above will miscount:



col1  col2
---- ----
A AA
AA A


So we to include a separator...



select col1 + '*' + col2 from t23
/


Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.






share|improve this answer


























  • +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

    – Novitzky
    Sep 24 '09 at 13:57



















14














How about something like:




select count(*)
from
(select count(*) cnt
from DocumentOutputItems
group by DocumentId, DocumentSessionId) t1



Probably just does the same as you are already though but it avoids the DISTINCT.






share|improve this answer


























  • Yes, you right. It does the same job as my original one.

    – Novitzky
    Sep 24 '09 at 13:35











  • in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

    – KM.
    Sep 24 '09 at 13:43











  • +1 for a nice try and the explanation.

    – Novitzky
    Sep 24 '09 at 13:50






  • 1





    Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

    – Lukas Eder
    Dec 17 '13 at 9:08





















8














To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.



SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;


In MySQL you can do the same thing without the concatenation step as follows:



SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;


This feature is mentioned in the MySQL documentation:



http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct






share|improve this answer
























  • This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

    – sstan
    Jul 28 '16 at 20:31











  • FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

    – ijoseph
    Jun 25 '18 at 20:12





















6














Here's a shorter version without the subselect:



SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems


It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.



Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.






share|improve this answer





















  • 6





    in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

    – KM.
    Sep 24 '09 at 13:35











  • This is what I was thinking of. I want do similar thing in MSSQL if possible.

    – Novitzky
    Sep 24 '09 at 13:38











  • @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

    – KM.
    Sep 24 '09 at 13:45











  • Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

    – Custodio
    Nov 16 '12 at 14:53



















3














There's nothing wrong with your query, but you could also do it this way:



WITH internalQuery (Amount)
AS
(
SELECT (0)
FROM DocumentOutputItems
GROUP BY DocumentId, DocumentSessionId
)
SELECT COUNT(*) AS NumberOfDistinctRows
FROM internalQuery





share|improve this answer































    3














    I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)



    SELECT COUNT(DISTINCT DocumentID) AS Count1, 
    COUNT(DISTINCT DocumentSessionId) AS Count2
    FROM DocumentOutputItems





    share|improve this answer



















    • 5





      The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

      – Anthony Geoghegan
      Jul 28 '14 at 9:21



















    2














    if you had only one field to "DISTINCT", you could use:



    SELECT COUNT(DISTINCT DocumentId) 
    FROM DocumentOutputItems


    and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:



        SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
    FROM DocumentOutputItems


    but you'll have issues if NULLs are involved. I'd just stick with the original query.






    share|improve this answer
























    • +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

      – Novitzky
      Sep 24 '09 at 13:58



















    2














    Hope this works i am writing on prima vista



    SELECT COUNT(*) 
    FROM DocumentOutputItems
    GROUP BY DocumentId, DocumentSessionId





    share|improve this answer





















    • 7





      In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

      – Dave Costa
      Sep 24 '09 at 13:19











    • Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

      – Novitzky
      Sep 24 '09 at 13:32



















    1














    I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.



    At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.



    Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using



    SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))


    but this is also not conforting.



    You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.



    I would suggest using



    SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems





    share|improve this answer

































      1














      Many (most?) SQL databases can work with tuples like values so you can just do:

      SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
      FROM DocumentOutputItems;

      If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.

      COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))
      .



      A related use of tuples is performing IN queries such as:

      SELECT * FROM DocumentOutputItems
      WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));






      share|improve this answer
























      • what databases support select count(distinct(a, b))? :D

        – Vytenis Bivainis
        Oct 11 '18 at 21:39











      • @VytenisBivainis I know PostgreSQL does--not sure since which version.

        – karmakaze
        Oct 17 '18 at 0:35



















      0














      It works for me. In oracle:



      SELECT SUM(DECODE(COUNT(*),1,1,1))
      FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;


      In jpql:



      SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
      FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;





      share|improve this answer































        0














        How about this,



        Select DocumentId, DocumentSessionId, count(*) as c 
        from DocumentOutputItems
        group by DocumentId, DocumentSessionId;


        This will get us the count of all possible combinations of DocumentId, and DocumentSessionId






        share|improve this answer

































          0














          You can just use the Count Function Twice.



          In this case, it would be:



          SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
          FROM DocumentOutputItems





          share|improve this answer


























          • this doesn't do as require in the question, it counts the distinct in separate for each column

            – naviram
            Jan 10 at 22:17



















          -1














          select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)



          from DocumentOutputItems



          group by i ,s;



          This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in mysql like a charm.






          share|improve this answer


























          • This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

            – Harsha B
            Nov 16 '18 at 13:10











          • Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

            – rishi jain
            Nov 20 '18 at 6:45











          • Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

            – Harsha B
            Nov 20 '18 at 7:04



















          -3














          This was posed and answered on Quora (https://www.quora.com/In-SQL-how-to-I-count-DISTINCT-over-multiple-columns):



          select col1, col2, col3, count(*)
          from table
          group by col1, col2, col3


          I was working on this in SAS, and SAS Proc SQL does not like DISTINCT with more than one column.






          share|improve this answer
























          • The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

            – jumxozizi
            Jan 27 '17 at 14:27











          Your Answer






          StackExchange.ifUsing("editor", function () {
          StackExchange.using("externalEditor", function () {
          StackExchange.using("snippets", function () {
          StackExchange.snippets.init();
          });
          });
          }, "code-snippets");

          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "1"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: true,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: 10,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f1471250%2fcounting-distinct-over-multiple-columns%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          17 Answers
          17






          active

          oldest

          votes








          17 Answers
          17






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          55














          If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.



          Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.



          I believe a distinct count of the computed column would be equivalent to your query.






          share|improve this answer





















          • 3





            Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

            – tumchaaditya
            Oct 4 '13 at 22:48













          • Too good suggestion. It avoided me to write unnecessary code to this.

            – Avrajit Roy
            Mar 9 '16 at 6:39
















          55














          If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.



          Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.



          I believe a distinct count of the computed column would be equivalent to your query.






          share|improve this answer





















          • 3





            Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

            – tumchaaditya
            Oct 4 '13 at 22:48













          • Too good suggestion. It avoided me to write unnecessary code to this.

            – Avrajit Roy
            Mar 9 '16 at 6:39














          55












          55








          55







          If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.



          Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.



          I believe a distinct count of the computed column would be equivalent to your query.






          share|improve this answer















          If you are trying to improve performance, you could try creating a persisted computed column on either a hash or concatenated value of the two columns.



          Once it is persisted, provided the column is deterministic and you are using "sane" database settings, it can be indexed and / or statistics can be created on it.



          I believe a distinct count of the computed column would be equivalent to your query.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Jul 27 '16 at 19:18









          Scott Beeson

          7,6882178149




          7,6882178149










          answered Sep 26 '09 at 3:42









          JasonHornerJasonHorner

          2,61111827




          2,61111827








          • 3





            Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

            – tumchaaditya
            Oct 4 '13 at 22:48













          • Too good suggestion. It avoided me to write unnecessary code to this.

            – Avrajit Roy
            Mar 9 '16 at 6:39














          • 3





            Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

            – tumchaaditya
            Oct 4 '13 at 22:48













          • Too good suggestion. It avoided me to write unnecessary code to this.

            – Avrajit Roy
            Mar 9 '16 at 6:39








          3




          3





          Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

          – tumchaaditya
          Oct 4 '13 at 22:48







          Excellent suggestion! The more I read, the more I am realizing that SQL is less about knowing syntax and functions and more about applying pure logic.. I wish I had 2 upvotes!

          – tumchaaditya
          Oct 4 '13 at 22:48















          Too good suggestion. It avoided me to write unnecessary code to this.

          – Avrajit Roy
          Mar 9 '16 at 6:39





          Too good suggestion. It avoided me to write unnecessary code to this.

          – Avrajit Roy
          Mar 9 '16 at 6:39













          47














          Edit: Altered from the less-than-reliable checksum-only query
          I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable



          SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
          FROM DocumentOutPutItems





          share|improve this answer


























          • +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

            – Bernoulli IT
            Nov 7 '12 at 12:54






          • 7





            With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

            – crokusek
            Jan 31 '14 at 19:35








          • 1





            the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

            – pvolders
            Jul 23 '14 at 7:53






          • 3





            Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

            – The Red Pea
            Jan 9 '16 at 15:40






          • 1





            this answer is better than good: "good enough"

            – scottlittle
            Oct 9 '17 at 17:44
















          47














          Edit: Altered from the less-than-reliable checksum-only query
          I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable



          SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
          FROM DocumentOutPutItems





          share|improve this answer


























          • +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

            – Bernoulli IT
            Nov 7 '12 at 12:54






          • 7





            With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

            – crokusek
            Jan 31 '14 at 19:35








          • 1





            the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

            – pvolders
            Jul 23 '14 at 7:53






          • 3





            Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

            – The Red Pea
            Jan 9 '16 at 15:40






          • 1





            this answer is better than good: "good enough"

            – scottlittle
            Oct 9 '17 at 17:44














          47












          47








          47







          Edit: Altered from the less-than-reliable checksum-only query
          I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable



          SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
          FROM DocumentOutPutItems





          share|improve this answer















          Edit: Altered from the less-than-reliable checksum-only query
          I've discovered a way to do this (in SQL Server 2005) that works pretty well for me and I can use as many columns as I need (by adding them to the CHECKSUM() function). The REVERSE() function turns the ints into varchars to make the distinct more reliable



          SELECT COUNT(DISTINCT (CHECKSUM(DocumentId,DocumentSessionId)) + CHECKSUM(REVERSE(DocumentId),REVERSE(DocumentSessionId)) )
          FROM DocumentOutPutItems






          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Sep 4 '14 at 14:01

























          answered Jul 6 '12 at 23:01









          JayTeeJayTee

          2,23521825




          2,23521825













          • +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

            – Bernoulli IT
            Nov 7 '12 at 12:54






          • 7





            With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

            – crokusek
            Jan 31 '14 at 19:35








          • 1





            the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

            – pvolders
            Jul 23 '14 at 7:53






          • 3





            Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

            – The Red Pea
            Jan 9 '16 at 15:40






          • 1





            this answer is better than good: "good enough"

            – scottlittle
            Oct 9 '17 at 17:44



















          • +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

            – Bernoulli IT
            Nov 7 '12 at 12:54






          • 7





            With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

            – crokusek
            Jan 31 '14 at 19:35








          • 1





            the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

            – pvolders
            Jul 23 '14 at 7:53






          • 3





            Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

            – The Red Pea
            Jan 9 '16 at 15:40






          • 1





            this answer is better than good: "good enough"

            – scottlittle
            Oct 9 '17 at 17:44

















          +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

          – Bernoulli IT
          Nov 7 '12 at 12:54





          +1 Nice one, works perfect (when you have the right column types to perform a CheckSum on... ;)

          – Bernoulli IT
          Nov 7 '12 at 12:54




          7




          7





          With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

          – crokusek
          Jan 31 '14 at 19:35







          With hashes like Checksum(), there is small chance that the same hash will be returned for different inputs so the count may be very slightly off. HashBytes() is an even smaller chance but still not zero. If those two Ids were int's (32b) then a "lossless hash" could combine them into an bigint (64b) like Id1 << 32 + Id2.

          – crokusek
          Jan 31 '14 at 19:35






          1




          1





          the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

          – pvolders
          Jul 23 '14 at 7:53





          the chance is not so small even, especially when you start combining columns (which is what it was supposed to be meant for). I was curious about this approach and in a particular case the checksum ended up with a count 10% smaller. If you think of it a bit longer, Checksum just returns an int, so if you'd checksum a full bigint range you'll end up with a distinct count about 2 billion times smaller than there actually is. -1

          – pvolders
          Jul 23 '14 at 7:53




          3




          3





          Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

          – The Red Pea
          Jan 9 '16 at 15:40





          Could we avoid CHECKSUM -- could we just concatenate the two values together? I suppose that risks considering as the same thing: ('he', 'art') == 'hear', 't'). But I think that can be solved with a delimiter as @APC proposes (some value that doesn't appear in either column), so 'he|art' != 'hear|t' Are there other problems with a simple "concatenation" approach?

          – The Red Pea
          Jan 9 '16 at 15:40




          1




          1





          this answer is better than good: "good enough"

          – scottlittle
          Oct 9 '17 at 17:44





          this answer is better than good: "good enough"

          – scottlittle
          Oct 9 '17 at 17:44











          22














          What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?



          It certainly works as you might expect in Oracle.



          SQL> select distinct deptno, job from emp
          2 order by deptno, job
          3 /

          DEPTNO JOB
          ---------- ---------
          10 CLERK
          10 MANAGER
          10 PRESIDENT
          20 ANALYST
          20 CLERK
          20 MANAGER
          30 CLERK
          30 MANAGER
          30 SALESMAN

          9 rows selected.


          SQL> select count(*) from (
          2 select distinct deptno, job from emp
          3 )
          4 /

          COUNT(*)
          ----------
          9

          SQL>


          edit



          I went down a blind alley with analytics but the answer was depressingly obvious...



          SQL> select count(distinct concat(deptno,job)) from emp
          2 /

          COUNT(DISTINCTCONCAT(DEPTNO,JOB))
          ---------------------------------
          9

          SQL>


          edit 2



          Given the following data the concatenating solution provided above will miscount:



          col1  col2
          ---- ----
          A AA
          AA A


          So we to include a separator...



          select col1 + '*' + col2 from t23
          /


          Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.






          share|improve this answer


























          • +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

            – Novitzky
            Sep 24 '09 at 13:57
















          22














          What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?



          It certainly works as you might expect in Oracle.



          SQL> select distinct deptno, job from emp
          2 order by deptno, job
          3 /

          DEPTNO JOB
          ---------- ---------
          10 CLERK
          10 MANAGER
          10 PRESIDENT
          20 ANALYST
          20 CLERK
          20 MANAGER
          30 CLERK
          30 MANAGER
          30 SALESMAN

          9 rows selected.


          SQL> select count(*) from (
          2 select distinct deptno, job from emp
          3 )
          4 /

          COUNT(*)
          ----------
          9

          SQL>


          edit



          I went down a blind alley with analytics but the answer was depressingly obvious...



          SQL> select count(distinct concat(deptno,job)) from emp
          2 /

          COUNT(DISTINCTCONCAT(DEPTNO,JOB))
          ---------------------------------
          9

          SQL>


          edit 2



          Given the following data the concatenating solution provided above will miscount:



          col1  col2
          ---- ----
          A AA
          AA A


          So we to include a separator...



          select col1 + '*' + col2 from t23
          /


          Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.






          share|improve this answer


























          • +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

            – Novitzky
            Sep 24 '09 at 13:57














          22












          22








          22







          What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?



          It certainly works as you might expect in Oracle.



          SQL> select distinct deptno, job from emp
          2 order by deptno, job
          3 /

          DEPTNO JOB
          ---------- ---------
          10 CLERK
          10 MANAGER
          10 PRESIDENT
          20 ANALYST
          20 CLERK
          20 MANAGER
          30 CLERK
          30 MANAGER
          30 SALESMAN

          9 rows selected.


          SQL> select count(*) from (
          2 select distinct deptno, job from emp
          3 )
          4 /

          COUNT(*)
          ----------
          9

          SQL>


          edit



          I went down a blind alley with analytics but the answer was depressingly obvious...



          SQL> select count(distinct concat(deptno,job)) from emp
          2 /

          COUNT(DISTINCTCONCAT(DEPTNO,JOB))
          ---------------------------------
          9

          SQL>


          edit 2



          Given the following data the concatenating solution provided above will miscount:



          col1  col2
          ---- ----
          A AA
          AA A


          So we to include a separator...



          select col1 + '*' + col2 from t23
          /


          Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.






          share|improve this answer















          What is it about your existing query that you don't like? If you are concerned that DISTINCT across two columns does not return just the unique permutations why not try it?



          It certainly works as you might expect in Oracle.



          SQL> select distinct deptno, job from emp
          2 order by deptno, job
          3 /

          DEPTNO JOB
          ---------- ---------
          10 CLERK
          10 MANAGER
          10 PRESIDENT
          20 ANALYST
          20 CLERK
          20 MANAGER
          30 CLERK
          30 MANAGER
          30 SALESMAN

          9 rows selected.


          SQL> select count(*) from (
          2 select distinct deptno, job from emp
          3 )
          4 /

          COUNT(*)
          ----------
          9

          SQL>


          edit



          I went down a blind alley with analytics but the answer was depressingly obvious...



          SQL> select count(distinct concat(deptno,job)) from emp
          2 /

          COUNT(DISTINCTCONCAT(DEPTNO,JOB))
          ---------------------------------
          9

          SQL>


          edit 2



          Given the following data the concatenating solution provided above will miscount:



          col1  col2
          ---- ----
          A AA
          AA A


          So we to include a separator...



          select col1 + '*' + col2 from t23
          /


          Obviously the chosen separator must be a character, or set of characters, which can never appear in either column.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Apr 19 '12 at 6:15

























          answered Sep 24 '09 at 12:41









          APCAPC

          118k15117229




          118k15117229













          • +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

            – Novitzky
            Sep 24 '09 at 13:57



















          • +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

            – Novitzky
            Sep 24 '09 at 13:57

















          +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

          – Novitzky
          Sep 24 '09 at 13:57





          +1 from me. Thanks for your answer. My query works fine but I was wondering if I can get the final result using just one query (without using a subquery)

          – Novitzky
          Sep 24 '09 at 13:57











          14














          How about something like:




          select count(*)
          from
          (select count(*) cnt
          from DocumentOutputItems
          group by DocumentId, DocumentSessionId) t1



          Probably just does the same as you are already though but it avoids the DISTINCT.






          share|improve this answer


























          • Yes, you right. It does the same job as my original one.

            – Novitzky
            Sep 24 '09 at 13:35











          • in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

            – KM.
            Sep 24 '09 at 13:43











          • +1 for a nice try and the explanation.

            – Novitzky
            Sep 24 '09 at 13:50






          • 1





            Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

            – Lukas Eder
            Dec 17 '13 at 9:08


















          14














          How about something like:




          select count(*)
          from
          (select count(*) cnt
          from DocumentOutputItems
          group by DocumentId, DocumentSessionId) t1



          Probably just does the same as you are already though but it avoids the DISTINCT.






          share|improve this answer


























          • Yes, you right. It does the same job as my original one.

            – Novitzky
            Sep 24 '09 at 13:35











          • in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

            – KM.
            Sep 24 '09 at 13:43











          • +1 for a nice try and the explanation.

            – Novitzky
            Sep 24 '09 at 13:50






          • 1





            Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

            – Lukas Eder
            Dec 17 '13 at 9:08
















          14












          14








          14







          How about something like:




          select count(*)
          from
          (select count(*) cnt
          from DocumentOutputItems
          group by DocumentId, DocumentSessionId) t1



          Probably just does the same as you are already though but it avoids the DISTINCT.






          share|improve this answer















          How about something like:




          select count(*)
          from
          (select count(*) cnt
          from DocumentOutputItems
          group by DocumentId, DocumentSessionId) t1



          Probably just does the same as you are already though but it avoids the DISTINCT.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Sep 24 '09 at 12:57

























          answered Sep 24 '09 at 12:46









          Trevor TippinsTrevor Tippins

          2,667107




          2,667107













          • Yes, you right. It does the same job as my original one.

            – Novitzky
            Sep 24 '09 at 13:35











          • in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

            – KM.
            Sep 24 '09 at 13:43











          • +1 for a nice try and the explanation.

            – Novitzky
            Sep 24 '09 at 13:50






          • 1





            Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

            – Lukas Eder
            Dec 17 '13 at 9:08





















          • Yes, you right. It does the same job as my original one.

            – Novitzky
            Sep 24 '09 at 13:35











          • in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

            – KM.
            Sep 24 '09 at 13:43











          • +1 for a nice try and the explanation.

            – Novitzky
            Sep 24 '09 at 13:50






          • 1





            Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

            – Lukas Eder
            Dec 17 '13 at 9:08



















          Yes, you right. It does the same job as my original one.

          – Novitzky
          Sep 24 '09 at 13:35





          Yes, you right. It does the same job as my original one.

          – Novitzky
          Sep 24 '09 at 13:35













          in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

          – KM.
          Sep 24 '09 at 13:43





          in my tests (using SET SHOWPLAN_ALL ON), it had the same execution plan and exact same TotalSubtreeCost

          – KM.
          Sep 24 '09 at 13:43













          +1 for a nice try and the explanation.

          – Novitzky
          Sep 24 '09 at 13:50





          +1 for a nice try and the explanation.

          – Novitzky
          Sep 24 '09 at 13:50




          1




          1





          Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

          – Lukas Eder
          Dec 17 '13 at 9:08







          Depending on the complexity of the original query, solving this with GROUP BY may introduce a couple of additional challenges to the query transformation to achieve the desired output (e.g. when the original query already had GROUP BY or HAVING clauses...)

          – Lukas Eder
          Dec 17 '13 at 9:08













          8














          To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.



          SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;


          In MySQL you can do the same thing without the concatenation step as follows:



          SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;


          This feature is mentioned in the MySQL documentation:



          http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct






          share|improve this answer
























          • This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

            – sstan
            Jul 28 '16 at 20:31











          • FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

            – ijoseph
            Jun 25 '18 at 20:12


















          8














          To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.



          SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;


          In MySQL you can do the same thing without the concatenation step as follows:



          SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;


          This feature is mentioned in the MySQL documentation:



          http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct






          share|improve this answer
























          • This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

            – sstan
            Jul 28 '16 at 20:31











          • FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

            – ijoseph
            Jun 25 '18 at 20:12
















          8












          8








          8







          To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.



          SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;


          In MySQL you can do the same thing without the concatenation step as follows:



          SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;


          This feature is mentioned in the MySQL documentation:



          http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct






          share|improve this answer













          To run as a single query, concatenate the columns, then get the distinct count of instances of the concatenated string.



          SELECT count(DISTINCT concat(DocumentId, DocumentSessionId)) FROM DocumentOutputItems;


          In MySQL you can do the same thing without the concatenation step as follows:



          SELECT count(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems;


          This feature is mentioned in the MySQL documentation:



          http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_count-distinct







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Jul 28 '16 at 20:21









          spelunk1spelunk1

          12914




          12914













          • This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

            – sstan
            Jul 28 '16 at 20:31











          • FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

            – ijoseph
            Jun 25 '18 at 20:12





















          • This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

            – sstan
            Jul 28 '16 at 20:31











          • FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

            – ijoseph
            Jun 25 '18 at 20:12



















          This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

          – sstan
          Jul 28 '16 at 20:31





          This was a SQL Server question, and both options you posted have already been mentioned in the following answers to this question: stackoverflow.com/a/1471444/4955425 and stackoverflow.com/a/1471713/4955425.

          – sstan
          Jul 28 '16 at 20:31













          FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

          – ijoseph
          Jun 25 '18 at 20:12







          FWIW, this almost works in PostgreSQL; just need extra parentheses: SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId)) FROM DocumentOutputItems;

          – ijoseph
          Jun 25 '18 at 20:12













          6














          Here's a shorter version without the subselect:



          SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems


          It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.



          Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.






          share|improve this answer





















          • 6





            in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

            – KM.
            Sep 24 '09 at 13:35











          • This is what I was thinking of. I want do similar thing in MSSQL if possible.

            – Novitzky
            Sep 24 '09 at 13:38











          • @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

            – KM.
            Sep 24 '09 at 13:45











          • Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

            – Custodio
            Nov 16 '12 at 14:53
















          6














          Here's a shorter version without the subselect:



          SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems


          It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.



          Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.






          share|improve this answer





















          • 6





            in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

            – KM.
            Sep 24 '09 at 13:35











          • This is what I was thinking of. I want do similar thing in MSSQL if possible.

            – Novitzky
            Sep 24 '09 at 13:38











          • @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

            – KM.
            Sep 24 '09 at 13:45











          • Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

            – Custodio
            Nov 16 '12 at 14:53














          6












          6








          6







          Here's a shorter version without the subselect:



          SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems


          It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.



          Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.






          share|improve this answer















          Here's a shorter version without the subselect:



          SELECT COUNT(DISTINCT DocumentId, DocumentSessionId) FROM DocumentOutputItems


          It works fine in MySQL, and I think that the optimizer has an easier time understanding this one.



          Edit: Apparently I misread MSSQL and MySQL - sorry about that, but maybe it helps anyway.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Dec 1 '16 at 10:40









          starbeamrainbowlabs

          2,99052947




          2,99052947










          answered Sep 24 '09 at 13:33









          Alexander KjällAlexander Kjäll

          3,02332145




          3,02332145








          • 6





            in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

            – KM.
            Sep 24 '09 at 13:35











          • This is what I was thinking of. I want do similar thing in MSSQL if possible.

            – Novitzky
            Sep 24 '09 at 13:38











          • @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

            – KM.
            Sep 24 '09 at 13:45











          • Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

            – Custodio
            Nov 16 '12 at 14:53














          • 6





            in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

            – KM.
            Sep 24 '09 at 13:35











          • This is what I was thinking of. I want do similar thing in MSSQL if possible.

            – Novitzky
            Sep 24 '09 at 13:38











          • @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

            – KM.
            Sep 24 '09 at 13:45











          • Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

            – Custodio
            Nov 16 '12 at 14:53








          6




          6





          in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

          – KM.
          Sep 24 '09 at 13:35





          in SQL Server you get: Msg 102, Level 15, State 1, Line 1 Incorrect syntax near ','.

          – KM.
          Sep 24 '09 at 13:35













          This is what I was thinking of. I want do similar thing in MSSQL if possible.

          – Novitzky
          Sep 24 '09 at 13:38





          This is what I was thinking of. I want do similar thing in MSSQL if possible.

          – Novitzky
          Sep 24 '09 at 13:38













          @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

          – KM.
          Sep 24 '09 at 13:45





          @Kamil Nowicki, in SQL Server, you can only have one field in a COUNT(), in my answer I show that you can concatenate the two fields into one and try this approach. However, I'd just stick with the original since the query plans would end up the same.

          – KM.
          Sep 24 '09 at 13:45













          Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

          – Custodio
          Nov 16 '12 at 14:53





          Please give a look in @JayTee answer. It works like a charm. count ( distinct CHECKSUM ([Field1], [Field2])

          – Custodio
          Nov 16 '12 at 14:53











          3














          There's nothing wrong with your query, but you could also do it this way:



          WITH internalQuery (Amount)
          AS
          (
          SELECT (0)
          FROM DocumentOutputItems
          GROUP BY DocumentId, DocumentSessionId
          )
          SELECT COUNT(*) AS NumberOfDistinctRows
          FROM internalQuery





          share|improve this answer




























            3














            There's nothing wrong with your query, but you could also do it this way:



            WITH internalQuery (Amount)
            AS
            (
            SELECT (0)
            FROM DocumentOutputItems
            GROUP BY DocumentId, DocumentSessionId
            )
            SELECT COUNT(*) AS NumberOfDistinctRows
            FROM internalQuery





            share|improve this answer


























              3












              3








              3







              There's nothing wrong with your query, but you could also do it this way:



              WITH internalQuery (Amount)
              AS
              (
              SELECT (0)
              FROM DocumentOutputItems
              GROUP BY DocumentId, DocumentSessionId
              )
              SELECT COUNT(*) AS NumberOfDistinctRows
              FROM internalQuery





              share|improve this answer













              There's nothing wrong with your query, but you could also do it this way:



              WITH internalQuery (Amount)
              AS
              (
              SELECT (0)
              FROM DocumentOutputItems
              GROUP BY DocumentId, DocumentSessionId
              )
              SELECT COUNT(*) AS NumberOfDistinctRows
              FROM internalQuery






              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Sep 24 '09 at 13:37









              BliekBliek

              40637




              40637























                  3














                  I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)



                  SELECT COUNT(DISTINCT DocumentID) AS Count1, 
                  COUNT(DISTINCT DocumentSessionId) AS Count2
                  FROM DocumentOutputItems





                  share|improve this answer



















                  • 5





                    The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

                    – Anthony Geoghegan
                    Jul 28 '14 at 9:21
















                  3














                  I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)



                  SELECT COUNT(DISTINCT DocumentID) AS Count1, 
                  COUNT(DISTINCT DocumentSessionId) AS Count2
                  FROM DocumentOutputItems





                  share|improve this answer



















                  • 5





                    The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

                    – Anthony Geoghegan
                    Jul 28 '14 at 9:21














                  3












                  3








                  3







                  I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)



                  SELECT COUNT(DISTINCT DocumentID) AS Count1, 
                  COUNT(DISTINCT DocumentSessionId) AS Count2
                  FROM DocumentOutputItems





                  share|improve this answer













                  I found this when I Googled for my own issue, found that if you count DISTINCT objects, you get the correct number returned (I'm using MySQL)



                  SELECT COUNT(DISTINCT DocumentID) AS Count1, 
                  COUNT(DISTINCT DocumentSessionId) AS Count2
                  FROM DocumentOutputItems






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Apr 12 '13 at 16:31









                  tehaugmentertehaugmenter

                  6311




                  6311








                  • 5





                    The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

                    – Anthony Geoghegan
                    Jul 28 '14 at 9:21














                  • 5





                    The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

                    – Anthony Geoghegan
                    Jul 28 '14 at 9:21








                  5




                  5





                  The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

                  – Anthony Geoghegan
                  Jul 28 '14 at 9:21





                  The above query will return a different set of results than what the OP was looking for (the distinct combinations of DocumentId and DocumentSessionId). Alexander Kjäll already posted the correct answer if the OP was using MySQL and not MS SQL Server.

                  – Anthony Geoghegan
                  Jul 28 '14 at 9:21











                  2














                  if you had only one field to "DISTINCT", you could use:



                  SELECT COUNT(DISTINCT DocumentId) 
                  FROM DocumentOutputItems


                  and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:



                      SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
                  FROM DocumentOutputItems


                  but you'll have issues if NULLs are involved. I'd just stick with the original query.






                  share|improve this answer
























                  • +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

                    – Novitzky
                    Sep 24 '09 at 13:58
















                  2














                  if you had only one field to "DISTINCT", you could use:



                  SELECT COUNT(DISTINCT DocumentId) 
                  FROM DocumentOutputItems


                  and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:



                      SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
                  FROM DocumentOutputItems


                  but you'll have issues if NULLs are involved. I'd just stick with the original query.






                  share|improve this answer
























                  • +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

                    – Novitzky
                    Sep 24 '09 at 13:58














                  2












                  2








                  2







                  if you had only one field to "DISTINCT", you could use:



                  SELECT COUNT(DISTINCT DocumentId) 
                  FROM DocumentOutputItems


                  and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:



                      SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
                  FROM DocumentOutputItems


                  but you'll have issues if NULLs are involved. I'd just stick with the original query.






                  share|improve this answer













                  if you had only one field to "DISTINCT", you could use:



                  SELECT COUNT(DISTINCT DocumentId) 
                  FROM DocumentOutputItems


                  and that does return the same query plan as the original, as tested with SET SHOWPLAN_ALL ON. However you are using two fields so you could try something crazy like:



                      SELECT COUNT(DISTINCT convert(varchar(15),DocumentId)+'|~|'+convert(varchar(15), DocumentSessionId)) 
                  FROM DocumentOutputItems


                  but you'll have issues if NULLs are involved. I'd just stick with the original query.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Sep 24 '09 at 13:34









                  KM.KM.

                  82.4k27147189




                  82.4k27147189













                  • +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

                    – Novitzky
                    Sep 24 '09 at 13:58



















                  • +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

                    – Novitzky
                    Sep 24 '09 at 13:58

















                  +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

                  – Novitzky
                  Sep 24 '09 at 13:58





                  +1 from me. Thanks but I will stick with my query as you suggested. Using "convert" can decrease performance even more.

                  – Novitzky
                  Sep 24 '09 at 13:58











                  2














                  Hope this works i am writing on prima vista



                  SELECT COUNT(*) 
                  FROM DocumentOutputItems
                  GROUP BY DocumentId, DocumentSessionId





                  share|improve this answer





















                  • 7





                    In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

                    – Dave Costa
                    Sep 24 '09 at 13:19











                  • Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

                    – Novitzky
                    Sep 24 '09 at 13:32
















                  2














                  Hope this works i am writing on prima vista



                  SELECT COUNT(*) 
                  FROM DocumentOutputItems
                  GROUP BY DocumentId, DocumentSessionId





                  share|improve this answer





















                  • 7





                    In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

                    – Dave Costa
                    Sep 24 '09 at 13:19











                  • Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

                    – Novitzky
                    Sep 24 '09 at 13:32














                  2












                  2








                  2







                  Hope this works i am writing on prima vista



                  SELECT COUNT(*) 
                  FROM DocumentOutputItems
                  GROUP BY DocumentId, DocumentSessionId





                  share|improve this answer















                  Hope this works i am writing on prima vista



                  SELECT COUNT(*) 
                  FROM DocumentOutputItems
                  GROUP BY DocumentId, DocumentSessionId






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Mar 7 '13 at 12:49









                  Alexander

                  20.4k74667




                  20.4k74667










                  answered Sep 24 '09 at 12:10









                  IordanTanevIordanTanev

                  4,55533146




                  4,55533146








                  • 7





                    In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

                    – Dave Costa
                    Sep 24 '09 at 13:19











                  • Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

                    – Novitzky
                    Sep 24 '09 at 13:32














                  • 7





                    In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

                    – Dave Costa
                    Sep 24 '09 at 13:19











                  • Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

                    – Novitzky
                    Sep 24 '09 at 13:32








                  7




                  7





                  In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

                  – Dave Costa
                  Sep 24 '09 at 13:19





                  In order for this to give the final answer, you would have to wrap it in another SELECT COUNT(*) FROM ( ... ). Essentially this answer is just giving you another way to list the distinct values you want to count. It's no better than your original solution.

                  – Dave Costa
                  Sep 24 '09 at 13:19













                  Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

                  – Novitzky
                  Sep 24 '09 at 13:32





                  Thanks Dave. I know you can use group by instead of distinct in my case. I was wondering if you get the final result using just one query. I think is impossible but I might be wrong.

                  – Novitzky
                  Sep 24 '09 at 13:32











                  1














                  I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.



                  At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.



                  Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using



                  SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))


                  but this is also not conforting.



                  You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.



                  I would suggest using



                  SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems





                  share|improve this answer






























                    1














                    I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.



                    At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.



                    Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using



                    SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))


                    but this is also not conforting.



                    You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.



                    I would suggest using



                    SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems





                    share|improve this answer




























                      1












                      1








                      1







                      I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.



                      At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.



                      Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using



                      SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))


                      but this is also not conforting.



                      You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.



                      I would suggest using



                      SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems





                      share|improve this answer















                      I wish MS SQL could also do something like COUNT(DISTINCT A, B). But it can't.



                      At first JayTee's answer seemed like a solution to me bu after some tests CHECKSUM() failed to create unique values. A quick example is, both CHECKSUM(31,467,519) and CHECKSUM(69,1120,823) gives the same answer which is 55.



                      Then I made some research and found that Microsoft does NOT recommend using CHECKSUM for change detection purposes. In some forums some suggested using



                      SELECT COUNT(DISTINCT CHECKSUM(value1, value2, ..., valueN) + CHECKSUM(valueN, value(N-1), ..., value1))


                      but this is also not conforting.



                      You can use HASHBYTES() function as suggested in TSQL CHECKSUM conundrum. However this also has a small chance of not returning unique results.



                      I would suggest using



                      SELECT COUNT(DISTINCT CAST(DocumentId AS VARCHAR)+'-'+CAST(DocumentSessionId AS VARCHAR)) FROM DocumentOutputItems






                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited May 23 '17 at 11:47









                      Community

                      11




                      11










                      answered Jan 24 '13 at 10:34









                      Oncel Umut TUREROncel Umut TURER

                      430812




                      430812























                          1














                          Many (most?) SQL databases can work with tuples like values so you can just do:

                          SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
                          FROM DocumentOutputItems;

                          If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.

                          COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))
                          .



                          A related use of tuples is performing IN queries such as:

                          SELECT * FROM DocumentOutputItems
                          WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));






                          share|improve this answer
























                          • what databases support select count(distinct(a, b))? :D

                            – Vytenis Bivainis
                            Oct 11 '18 at 21:39











                          • @VytenisBivainis I know PostgreSQL does--not sure since which version.

                            – karmakaze
                            Oct 17 '18 at 0:35
















                          1














                          Many (most?) SQL databases can work with tuples like values so you can just do:

                          SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
                          FROM DocumentOutputItems;

                          If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.

                          COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))
                          .



                          A related use of tuples is performing IN queries such as:

                          SELECT * FROM DocumentOutputItems
                          WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));






                          share|improve this answer
























                          • what databases support select count(distinct(a, b))? :D

                            – Vytenis Bivainis
                            Oct 11 '18 at 21:39











                          • @VytenisBivainis I know PostgreSQL does--not sure since which version.

                            – karmakaze
                            Oct 17 '18 at 0:35














                          1












                          1








                          1







                          Many (most?) SQL databases can work with tuples like values so you can just do:

                          SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
                          FROM DocumentOutputItems;

                          If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.

                          COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))
                          .



                          A related use of tuples is performing IN queries such as:

                          SELECT * FROM DocumentOutputItems
                          WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));






                          share|improve this answer













                          Many (most?) SQL databases can work with tuples like values so you can just do:

                          SELECT COUNT(DISTINCT (DocumentId, DocumentSessionId))
                          FROM DocumentOutputItems;

                          If your database doesn't support this, it can be simulated as per @oncel-umut-turer's suggestion of CHECKSUM or other scalar function providing good uniqueness e.g.

                          COUNT(DISTINCT CONCAT(DocumentId, ':', DocumentSessionId))
                          .



                          A related use of tuples is performing IN queries such as:

                          SELECT * FROM DocumentOutputItems
                          WHERE (DocumentId, DocumentSessionId) in (('a', '1'), ('b', '2'));







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered Jun 2 '18 at 16:42









                          karmakazekarmakaze

                          21.7k11922




                          21.7k11922













                          • what databases support select count(distinct(a, b))? :D

                            – Vytenis Bivainis
                            Oct 11 '18 at 21:39











                          • @VytenisBivainis I know PostgreSQL does--not sure since which version.

                            – karmakaze
                            Oct 17 '18 at 0:35



















                          • what databases support select count(distinct(a, b))? :D

                            – Vytenis Bivainis
                            Oct 11 '18 at 21:39











                          • @VytenisBivainis I know PostgreSQL does--not sure since which version.

                            – karmakaze
                            Oct 17 '18 at 0:35

















                          what databases support select count(distinct(a, b))? :D

                          – Vytenis Bivainis
                          Oct 11 '18 at 21:39





                          what databases support select count(distinct(a, b))? :D

                          – Vytenis Bivainis
                          Oct 11 '18 at 21:39













                          @VytenisBivainis I know PostgreSQL does--not sure since which version.

                          – karmakaze
                          Oct 17 '18 at 0:35





                          @VytenisBivainis I know PostgreSQL does--not sure since which version.

                          – karmakaze
                          Oct 17 '18 at 0:35











                          0














                          It works for me. In oracle:



                          SELECT SUM(DECODE(COUNT(*),1,1,1))
                          FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;


                          In jpql:



                          SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
                          FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;





                          share|improve this answer




























                            0














                            It works for me. In oracle:



                            SELECT SUM(DECODE(COUNT(*),1,1,1))
                            FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;


                            In jpql:



                            SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
                            FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;





                            share|improve this answer


























                              0












                              0








                              0







                              It works for me. In oracle:



                              SELECT SUM(DECODE(COUNT(*),1,1,1))
                              FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;


                              In jpql:



                              SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
                              FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;





                              share|improve this answer













                              It works for me. In oracle:



                              SELECT SUM(DECODE(COUNT(*),1,1,1))
                              FROM DocumentOutputItems GROUP BY DocumentId, DocumentSessionId;


                              In jpql:



                              SELECT SUM(CASE WHEN COUNT(i)=1 THEN 1 ELSE 1 END)
                              FROM DocumentOutputItems i GROUP BY i.DocumentId, i.DocumentSessionId;






                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Mar 29 '18 at 7:59









                              NataNata

                              11




                              11























                                  0














                                  How about this,



                                  Select DocumentId, DocumentSessionId, count(*) as c 
                                  from DocumentOutputItems
                                  group by DocumentId, DocumentSessionId;


                                  This will get us the count of all possible combinations of DocumentId, and DocumentSessionId






                                  share|improve this answer






























                                    0














                                    How about this,



                                    Select DocumentId, DocumentSessionId, count(*) as c 
                                    from DocumentOutputItems
                                    group by DocumentId, DocumentSessionId;


                                    This will get us the count of all possible combinations of DocumentId, and DocumentSessionId






                                    share|improve this answer




























                                      0












                                      0








                                      0







                                      How about this,



                                      Select DocumentId, DocumentSessionId, count(*) as c 
                                      from DocumentOutputItems
                                      group by DocumentId, DocumentSessionId;


                                      This will get us the count of all possible combinations of DocumentId, and DocumentSessionId






                                      share|improve this answer















                                      How about this,



                                      Select DocumentId, DocumentSessionId, count(*) as c 
                                      from DocumentOutputItems
                                      group by DocumentId, DocumentSessionId;


                                      This will get us the count of all possible combinations of DocumentId, and DocumentSessionId







                                      share|improve this answer














                                      share|improve this answer



                                      share|improve this answer








                                      edited May 1 '18 at 10:57









                                      ADyson

                                      23.6k112445




                                      23.6k112445










                                      answered May 1 '18 at 10:45









                                      Nikhil SinghNikhil Singh

                                      11




                                      11























                                          0














                                          You can just use the Count Function Twice.



                                          In this case, it would be:



                                          SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
                                          FROM DocumentOutputItems





                                          share|improve this answer


























                                          • this doesn't do as require in the question, it counts the distinct in separate for each column

                                            – naviram
                                            Jan 10 at 22:17
















                                          0














                                          You can just use the Count Function Twice.



                                          In this case, it would be:



                                          SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
                                          FROM DocumentOutputItems





                                          share|improve this answer


























                                          • this doesn't do as require in the question, it counts the distinct in separate for each column

                                            – naviram
                                            Jan 10 at 22:17














                                          0












                                          0








                                          0







                                          You can just use the Count Function Twice.



                                          In this case, it would be:



                                          SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
                                          FROM DocumentOutputItems





                                          share|improve this answer















                                          You can just use the Count Function Twice.



                                          In this case, it would be:



                                          SELECT COUNT (DISTINCT DocumentId), COUNT (DISTINCT DocumentSessionId) 
                                          FROM DocumentOutputItems






                                          share|improve this answer














                                          share|improve this answer



                                          share|improve this answer








                                          edited Nov 16 '18 at 8:17









                                          Nick

                                          27k111939




                                          27k111939










                                          answered Aug 14 '18 at 16:11









                                          BibekBibek

                                          91




                                          91













                                          • this doesn't do as require in the question, it counts the distinct in separate for each column

                                            – naviram
                                            Jan 10 at 22:17



















                                          • this doesn't do as require in the question, it counts the distinct in separate for each column

                                            – naviram
                                            Jan 10 at 22:17

















                                          this doesn't do as require in the question, it counts the distinct in separate for each column

                                          – naviram
                                          Jan 10 at 22:17





                                          this doesn't do as require in the question, it counts the distinct in separate for each column

                                          – naviram
                                          Jan 10 at 22:17











                                          -1














                                          select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)



                                          from DocumentOutputItems



                                          group by i ,s;



                                          This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in mysql like a charm.






                                          share|improve this answer


























                                          • This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

                                            – Harsha B
                                            Nov 16 '18 at 13:10











                                          • Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

                                            – rishi jain
                                            Nov 20 '18 at 6:45











                                          • Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

                                            – Harsha B
                                            Nov 20 '18 at 7:04
















                                          -1














                                          select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)



                                          from DocumentOutputItems



                                          group by i ,s;



                                          This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in mysql like a charm.






                                          share|improve this answer


























                                          • This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

                                            – Harsha B
                                            Nov 16 '18 at 13:10











                                          • Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

                                            – rishi jain
                                            Nov 20 '18 at 6:45











                                          • Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

                                            – Harsha B
                                            Nov 20 '18 at 7:04














                                          -1












                                          -1








                                          -1







                                          select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)



                                          from DocumentOutputItems



                                          group by i ,s;



                                          This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in mysql like a charm.






                                          share|improve this answer















                                          select DISTINCT DocumentId as i, DocumentSessionId as s , count(*)



                                          from DocumentOutputItems



                                          group by i ,s;



                                          This code uses distinct on 2 parameters and provides count of number of rows specific to those distinct values row count. It worked for me in mysql like a charm.







                                          share|improve this answer














                                          share|improve this answer



                                          share|improve this answer








                                          edited Nov 22 '18 at 6:13

























                                          answered Nov 16 '18 at 7:17









                                          rishi jainrishi jain

                                          2519




                                          2519













                                          • This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

                                            – Harsha B
                                            Nov 16 '18 at 13:10











                                          • Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

                                            – rishi jain
                                            Nov 20 '18 at 6:45











                                          • Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

                                            – Harsha B
                                            Nov 20 '18 at 7:04



















                                          • This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

                                            – Harsha B
                                            Nov 16 '18 at 13:10











                                          • Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

                                            – rishi jain
                                            Nov 20 '18 at 6:45











                                          • Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

                                            – Harsha B
                                            Nov 20 '18 at 7:04

















                                          This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

                                          – Harsha B
                                          Nov 16 '18 at 13:10





                                          This is the part of review answer in stackoverflow. Add some explanation though code is explanatory.

                                          – Harsha B
                                          Nov 16 '18 at 13:10













                                          Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

                                          – rishi jain
                                          Nov 20 '18 at 6:45





                                          Can somebody please explain why my answer got a -1. Sorry, I am a newbie and I tried different answers and none of them was working. I made the changes and posted the working code?

                                          – rishi jain
                                          Nov 20 '18 at 6:45













                                          Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

                                          – Harsha B
                                          Nov 20 '18 at 7:04





                                          Answer was not formatted properly. May be that's why someone downvoted. Don't worry, I have formatted the same.

                                          – Harsha B
                                          Nov 20 '18 at 7:04











                                          -3














                                          This was posed and answered on Quora (https://www.quora.com/In-SQL-how-to-I-count-DISTINCT-over-multiple-columns):



                                          select col1, col2, col3, count(*)
                                          from table
                                          group by col1, col2, col3


                                          I was working on this in SAS, and SAS Proc SQL does not like DISTINCT with more than one column.






                                          share|improve this answer
























                                          • The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

                                            – jumxozizi
                                            Jan 27 '17 at 14:27
















                                          -3














                                          This was posed and answered on Quora (https://www.quora.com/In-SQL-how-to-I-count-DISTINCT-over-multiple-columns):



                                          select col1, col2, col3, count(*)
                                          from table
                                          group by col1, col2, col3


                                          I was working on this in SAS, and SAS Proc SQL does not like DISTINCT with more than one column.






                                          share|improve this answer
























                                          • The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

                                            – jumxozizi
                                            Jan 27 '17 at 14:27














                                          -3












                                          -3








                                          -3







                                          This was posed and answered on Quora (https://www.quora.com/In-SQL-how-to-I-count-DISTINCT-over-multiple-columns):



                                          select col1, col2, col3, count(*)
                                          from table
                                          group by col1, col2, col3


                                          I was working on this in SAS, and SAS Proc SQL does not like DISTINCT with more than one column.






                                          share|improve this answer













                                          This was posed and answered on Quora (https://www.quora.com/In-SQL-how-to-I-count-DISTINCT-over-multiple-columns):



                                          select col1, col2, col3, count(*)
                                          from table
                                          group by col1, col2, col3


                                          I was working on this in SAS, and SAS Proc SQL does not like DISTINCT with more than one column.







                                          share|improve this answer












                                          share|improve this answer



                                          share|improve this answer










                                          answered Sep 20 '16 at 20:12









                                          Barry DeCiccoBarry DeCicco

                                          12




                                          12













                                          • The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

                                            – jumxozizi
                                            Jan 27 '17 at 14:27



















                                          • The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

                                            – jumxozizi
                                            Jan 27 '17 at 14:27

















                                          The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

                                          – jumxozizi
                                          Jan 27 '17 at 14:27





                                          The original query in the question returns the number of combinations in given columns. This answer instead returns the number of occurrences for each combination in given columns.

                                          – jumxozizi
                                          Jan 27 '17 at 14:27


















                                          draft saved

                                          draft discarded




















































                                          Thanks for contributing an answer to Stack Overflow!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid



                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function () {
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f1471250%2fcounting-distinct-over-multiple-columns%23new-answer', 'question_page');
                                          }
                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          Ottavio Pratesi

                                          Tricia Helfer

                                          15 giugno