SAKILA Database is the default database that comes with the MySQL Workbench. It contains data related to a movie rental store. I thought of querying that database to answer a few business problems.
Assume the name of the movie rental is “Millenium Movies” and it’s owner is interested in knowing the following details about his business. Therefore, he gathers his analytics team and lists down his requirements.
- Top 10 most in-demand movies by the number of rentals
- How many movies do we have in the store?
- Breakdown of the number of movies by genre.
- Which are the most popular genres?
- What is the average rental duration of a movie?
- What is the income of the store?
- What are the overdue movies and customer names
Top 10 Most In-Demand Movies by the No. of Rentals
The data needed to answer this question was in three tables namely “rental”, “inventory” and “film”. Hence, two Inner Joins were needed to accomplish this task. First Inner Join was done between “rental” and “inventory” tables and then on the “film” table. Next, the result set was grouped by the ‘film_id’ and ordered by the ascending order of the count of ‘rental_id‘. Since the requirement was only to get the top 10 rows, ‘LIMIT 10’ was used.
SELECT inventory.film_id, count(rental.rental_id) AS '# rentals',
film.title
FROM rental INNER JOIN inventory
ON rental.inventory_id = inventory.inventory_id
INNER JOIN film
ON inventory.film_id = film.film_id
group by inventory.film_id
order by count(rental.rental_id) desc
LIMIT 10;
How many movies do we have in the store?
The answer to this was obtained by using a simple aggregation query.
SELECT COUNT(film_id) AS 'Total No. of Movies' from film;
Breakdown of the number of movies by genre.
In order to do this, we need to join the tables “film_category” and “category”. This is an Inner Join since we only want to join by the common fields in each table. Then we take the count of all rows and group it by the “film category”
SELECT category.name AS 'Category Name',
count(*) AS '# of Movies' FROM
film_category INNER JOIN category ON
film_category.category_id = category.category_id
GROUP BY category.category_id
ORDER BY count(*) DESC;
Top 3 Most Popular Movie Genres by No. of Rentals
Four tables were Inner Joined to generate the result set here. Tables joined were “rental”, “inventory”, “film_category”, “category”. Then the results were grouped by film category (a.k.a ‘genre’) and ordered by the descending order of the number of rentals.
SELECT category.name, count(*) AS '# of Rentals'
FROM category
INNER JOIN film_category
ON category.category_id = film_category.category_id
INNER JOIN inventory
ON film_category.film_id = inventory.film_id
INNER JOIN rental
ON rental.inventory_id = inventory.inventory_id
GROUP BY category.name
ORDER BY count(*) DESC
LIMIT 3;
Average Rental Duration of a Film
This can be achieved by using the aggregate function AVG
SELECT AVG(rental_duration) AS 'Average Rental Duration' from film;
Here I thought of going the extra mile and taking the breakdown of the rental duration by film genre. I had to do two Inner Joins to achieve this task given the nature of the database design.
First we Inner Join the “film” and “film_category” tables and then the category table. Once that was done, the result set was grouped by the ‘category_id’ and ordered by the descending order of average film rental_duration.
SELECT category.category_id,category.name as 'GENRE',
AVG(film.rental_duration) AS 'Average Rental Duration'
FROM film INNER JOIN film_category
ON film.film_id = film_category.film_id
INNER JOIN category
ON film_category.category_id = category.category_id
group by category.category_id
order by avg(film.rental_duration) DESC;
Income of the store
Sum aggregation function along with an Inner Join of the tables “rental” and “payment” was used to obtain the desired insight. Aggregated amount has been given an alias of “Store Income” and small case characters were used to denote the tables as an attempt to simplify the code.
SELECT SUM(p.amount) as 'Store Income'
FROM payment p
INNER JOIN rental r ON p.rental_id = r.rental_id;
List of Customers who haven’t returned the movies
This task can be accomplished by joining the tables “rental” and “customer” on the field ‘customer_id’. The logic used here to determine whether a particular customer has returned the rented movie or not is the field ‘return_date’ being ‘NULL’. Simply put, if there is no return date mentioned for a particular rental, that rental is classified as ‘not returned’ or ‘over due’
In order to get the full name of the customer, CONCAT function is used to concatenate the first_name field and last_name field.
SELECT rental.rental_id, rental.rental_date 'Rented Date',
concat(first_name," ", last_name) AS 'Name'
FROM rental
INNER JOIN customer ON
rental.customer_id = customer.customer_id
WHERE return_date IS NULL;
The problem with this output is that all the letters in full name of a particular customer are Upper Case. A workaround for this would be to use a combination of UPPER, LOWER, LEFT and SUBSTRING functions like this;
SELECT rental.rental_id, rental.rental_date 'Rented Date',
concat(CONCAT(UPPER(LEFT(first_name,1)),
LOWER(SUBSTRING(first_name, 2, CHAR_LENGTH(first_name)))) , ' ', customer.last_name)
FROM rental
INNER JOIN customer ON
rental.customer_id = customer.customer_id
WHERE return_date IS NULL;
But this query is a bit cumbersome. The above code only changes the ‘first_name‘. A separate line of code is needed to change the ‘last_name’ into the proper case. Here, the proper case means Capitalizing only the first letter of each word.
Hence, the analytics team thought of another way to do this without having to write lengthy codes. Their solution was to use a custom function for this. Accordingly, they wrote a custom function named “Proper Name” that would take “first_name” and “last_name” as its arguments and combine those two together and convert the text to proper case. The code for the custom functions is as follows;
DELIMITER $$
CREATE FUNCTION ProperName (fname CHAR(30), lname CHAR(30))
RETURNS CHAR(30) DETERMINISTIC
BEGIN
declare fname1 CHAR(30);
declare lname1 CHAR(30);
declare fullname CHAR(60);
set fname1 = CONCAT(UPPER(LEFT(fname,1)), LOWER(SUBSTRING(fname,2,CHAR_LENGTH(fname))));
set lname1 = CONCAT(UPPER(LEFT(lname,1)), LOWER(SUBSTRING(lname,2,CHAR_LENGTH(lname))));
set fullname = CONCAT(fname1, ' ', lname1);
return fullname;
END$$
DELIMITER ;
Now they only have to call this function in SQL query to get the full name of a customer in Proper Case. The updated full query would be as follows;
SELECT rental.rental_id, rental.rental_date 'Rented Date',
ProperName(first_name, last_name) AS 'Name'
FROM rental
INNER JOIN customer ON
rental.customer_id = customer.customer_id
WHERE return_date IS NULL;
The Extra Mile…
Also, a reputed movie production company was planning to deploy an email marketing campaign for an upcoming comedy movie. Therefore, the analysts needed to query the database and get a list of people who rented comedy movies with their names and email addresses.
SELECT CONCAT(first_name, " ", last_name) AS name, email
FROM customer WHERE customer_id IN
(SELECT customer_id FROM rental WHERE inventory_id IN
(SELECT inventory_id FROM inventory WHERE film_id IN
(SELECT film_id FROM film_category JOIN category USING (category_id) WHERE category.name="Comedy")));
In order to accomplish this task, the analytics team used “subqueries”. A subquery is simply a query within a query. They are easy to understand and allow to use results of another query in the outer query.