I have 2 tables, authors and books
authors contains the unique id authorId
books also contains this as a foreign key
I need to know the authors with the most number of books. If 2 or more authors are tied for the greatest number of books, I need to show both authors
I have been able to achieve this by first getting the maximum count
SELECT @maxCount := (MAX(counter)) FROM (SELECT count(*) AS counter FROM books GROUP BY authorId) AS counts;
and then using it to get the Ids with that count as part of my author selection
SELECT *
FROM authors
WHERE authorId IN (
SELECT authorId
FROM books
GROUP BY authorId
HAVING COUNT(*) = @maxCount
);
I've been told that I am not allowed to use variables and that what I've done is horribly inefficient if the tables grow very large.
Am I missing something obvious here? Is there a way to do this in a single statement without a variable (or temp table), and without having to select/group the entire books table twice?
SELECT author, COUNT(*)
FROM authors
JOIN books
ON authors.authorId=books.AuthorId
GROUP BY author
ORDER BY COUNT(*) DESC
Will give you a list ordered by the number of books for each author. I don't have an instance nearby to test, and tend to steer clear of embedded variables but expect something like....
SELECT *
FROM (
SELECT author
, @maxcount:=IF(COUNT(*)>@maxcount,COUNT(*), @maxcount)
, COUNT(*) AS cnt
FROM authors
JOIN books
ON authors.authorId=books.AuthorId
GROUP BY author
ORDER BY COUNT(*) DESC
) ilv
WHERE cnt=@maxcount;
Performance still sucks with large datasets (even with the right indexes). If you have to run this query frequently with >100,000 records, then you might consider denormalizing your data.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments