Using group by to return a row with max()


Keywords:mysql 


Question: 

I am trying to get the items that their item_id exists in an array (arr_items) and have the higher num.

player_id | item_id | num | unique_number
-----------------------------------------
10        |    1    |  1  |      1
10        |    1    |  11 |      2
10        |    1    |  93 |      3
10        |    2    |  24 |      4
10        |    2    |  40 |      5

The expected result is to get the item_id 1 with num 93, and item_id 2 with num 40.

The below query returns no results. If I remove the max in the num, it works but doesn't return the item_id with the highest num. Suppose arr_items = [1,2]

SELECT a.player_id, a.item_id, a.num, a.unique_number
FROM my_table a
    INNER JOIN
    (
        SELECT player_id, item_id, max(num) AS max_num, unique_number
        FROM my_table
        WHERE
        player_id = 10
        AND item_id IN (arr_items)
        GROUP BY item_id
    ) b ON a.item_id = b.item_id AND
           a.player_id = b.player_id AND
           a.num = b.max_num AND
           a.unique_number = b.unique_number
;

Edit: The above query works fine if I remove the unique_number from the ON clause but I don't quite understand why. I also noticed, that if I have unique item_ids, then it works with the unique_number field in the on clause.

Meaning if my table was like the below, it would work. (the item_id values changed)

player_id | item_id | num | unique_number
-----------------------------------------
10        |    0    |  1  |      1
10        |    1    |  11 |      2
10        |    2    |  93 |      3
10        |    3    |  24 |      4
10        |    4    |  40 |      5

1 Answer: 

MySQL provides a non-standard way of using group by which has regrettably "taught" many people to believe grouping is as simple as falling off a log, but yet somehow it could correctly guess/interpret the balance of the needed logic. The truth is that while grouping is quite easy, the weird non-standard syntax MySQL allows isn't always right (the result are an "approximation") and does not always guess correctly.

Your existing query can easily be modified to achieve what you need, but note that you really should always specify every non-aggregating column in the group by clause.

SELECT a.player_id, a.item_id, a.num, a.unique_number
FROM my_table a
INNER JOIN (
        SELECT player_id, item_id, max(num) max_num
        FROM my_table
        GROUP BY player_id, item_id
    ) b ON a.item_id = b.item_id 
       AND a.player_id = b.player_id 
       AND a.num = b.max_num
;

In the future, when MySQL does implement window functions, you will be able to use row_number() to achieve what you want and it will be a little more efficient as well:

select * 
from (
     select *
         , row_number() over(partition by player_id, item_id
                             order by num DESC) as rn
     from my_table
     ) d
where rn = 1

nb: This second query will always only return one row per player_id, item_id but the first query above could return more than one row IF num = max(num) is repeated per player_id, item_id