This article discusses joining multiple tables by rows and columns using SQL along with some examples.
14 min read · 13 September 2021
In practice, it's very rare to have a SQL query involving a single table. We may need to merge multiple tables by rows (records) or columns (fields) to get the desired result. In this article, we will discuss the operators/commands in SQL that make it possible to merge tables by rows or columns.
Multiple tables can be combined by columns in SQL using joins. A join combines two tables based on specified columns (usually the primary key in one table and a foreign key in the other). The generic syntax for SQL joins is as follows.
SELECT
    *
FROM table_1
JOIN table_2
USING(id);
In the above syntax, table_1 and table_2 are the two tables, and id is the key column (the matching column in both tables). We use the USING keyword only when the key column has the same name in both tables. Otherwise, we need to name the key column of each table explicitly, as shown below.
SELECT
    *
FROM table_1 t1
JOIN table_2 t2
ON t1.t1_id = t2.t2_id;
In the above syntax, t1 is an alias for table_1 and t2 is an alias for table_2. When the key columns have different names in the two tables, we join them with the ON keyword, as shown above. We will now discuss some important joins in SQL.
Inner join
The inner join combines two tables by columns and returns only the records that match (on the specified columns) in both tables. The query below returns only the records whose id appears in both left_table and right_table.
SELECT
    *
FROM left_table
INNER JOIN right_table
USING(id);
or
SELECT
    *
FROM left_table l
INNER JOIN right_table r
ON l.id = r.id;
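To see what the two forms return, here is a minimal sketch that runs them through Python's built-in sqlite3 module against an in-memory SQLite database rather than PostgreSQL. The rows and the l_val/r_val column names are hypothetical toy data, since the article's actual table contents are not shown.

```python
import sqlite3

# Hypothetical toy data: only ids 2 and 3 appear in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, l_val TEXT);
    CREATE TABLE right_table (id INTEGER, r_val TEXT);
    INSERT INTO left_table  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_table VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# USING(id): the shared key column appears only once in SELECT *.
inner_using = conn.execute("""
    SELECT * FROM left_table l
    INNER JOIN right_table r USING(id)
    ORDER BY l.id
""").fetchall()

# ON l.id = r.id: both key columns are kept in SELECT *.
inner_on = conn.execute("""
    SELECT * FROM left_table l
    INNER JOIN right_table r ON l.id = r.id
    ORDER BY l.id
""").fetchall()

print(inner_using)  # [(2, 'b', 'x'), (3, 'c', 'y')]
print(inner_on)     # [(2, 'b', 2, 'x'), (3, 'c', 3, 'y')]
```

Rows 1 and 4 are dropped because they have no partner in the other table.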
Left join
The left join combines two tables by columns and returns all records from the left table, but only the matching records (on the specified columns) from the right table. The query below returns the records whose id appears in both tables together with all the remaining records of left_table. For left_table records with no matching id in right_table, the right_table columns are NULL.
SELECT
    *
FROM left_table
LEFT JOIN right_table
USING(id);
or
SELECT
    *
FROM left_table l
LEFT JOIN right_table r
ON l.id = r.id;
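A minimal sketch of the NULL-filling behaviour, again using Python's sqlite3 module with SQLite standing in for PostgreSQL and hypothetical toy rows. The second query keeps only the left rows whose right-side columns came back NULL, which is the same pattern the later dvd_rental examples use to find unmatched records.

```python
import sqlite3

# Hypothetical toy data: id 1 exists only in left_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, l_val TEXT);
    CREATE TABLE right_table (id INTEGER, r_val TEXT);
    INSERT INTO left_table  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_table VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# Every left row is kept; r_val is NULL (None in Python) where nothing matched.
left_using = conn.execute("""
    SELECT * FROM left_table l
    LEFT JOIN right_table r USING(id)
    ORDER BY l.id
""").fetchall()

# Filtering on a NULL right-side key isolates the unmatched left rows.
unmatched = conn.execute("""
    SELECT l.id FROM left_table l
    LEFT JOIN right_table r ON l.id = r.id
    WHERE r.id IS NULL
""").fetchall()

print(left_using)  # [(1, 'a', None), (2, 'b', 'x'), (3, 'c', 'y')]
print(unmatched)   # [(1,)]
```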
Right join
The right join combines two tables by columns and returns all records from the right table, but only the matching records (on the specified columns) from the left table. The query below returns the records whose id appears in both tables together with all the remaining records of right_table. For right_table records with no matching id in left_table, the left_table columns are NULL.
SELECT
    *
FROM left_table
RIGHT JOIN right_table
USING(id);
or
SELECT
    *
FROM left_table l
RIGHT JOIN right_table r
ON l.id = r.id;
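A right join is a left join with the tables swapped. SQLite only added RIGHT and FULL OUTER JOIN in version 3.39.0, so this sketch (Python's sqlite3 module, hypothetical toy rows) writes the right join that way to stay portable across older SQLite builds. Only the column order differs from a true RIGHT JOIN, and the SELECT list restores it.

```python
import sqlite3

# Hypothetical toy data: id 4 exists only in right_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, l_val TEXT);
    CREATE TABLE right_table (id INTEGER, r_val TEXT);
    INSERT INTO left_table  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_table VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

# left_table RIGHT JOIN right_table == right_table LEFT JOIN left_table.
right_rows = conn.execute("""
    SELECT r.id, l.l_val, r.r_val
    FROM right_table r
    LEFT JOIN left_table l ON l.id = r.id
    ORDER BY r.id
""").fetchall()

print(right_rows)  # [(2, 'b', 'x'), (3, 'c', 'y'), (4, None, 'z')]
```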
Full join
A full join can be thought of as a combination of the left and right joins. It combines two tables by columns and returns all records from both the left and the right table. The query below returns every record from both tables; where a record has no matching id in the other table, that table's columns are NULL.
SELECT
    *
FROM left_table
FULL JOIN right_table
USING(id);
or
SELECT
    *
FROM left_table l
FULL JOIN right_table r
ON l.id = r.id;
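The "left join plus right join" description can be taken literally. This sketch (Python's sqlite3 module, hypothetical toy rows) emulates a full join as a left join, followed by UNION ALL with the right-table rows that found no partner. This is a common workaround on engines without FULL JOIN, not the article's PostgreSQL query.

```python
import sqlite3

# Hypothetical toy data: id 1 only on the left, id 4 only on the right.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, l_val TEXT);
    CREATE TABLE right_table (id INTEGER, r_val TEXT);
    INSERT INTO left_table  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_table VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

full_rows = conn.execute("""
    -- Part 1: every left row, matched or not (a left join).
    SELECT COALESCE(l.id, r.id) AS id, l.l_val, r.r_val
    FROM left_table l
    LEFT JOIN right_table r ON l.id = r.id
    UNION ALL
    -- Part 2: right rows with no left partner (an anti-join).
    SELECT r.id, l.l_val, r.r_val
    FROM right_table r
    LEFT JOIN left_table l ON l.id = r.id
    WHERE l.id IS NULL
    ORDER BY id
""").fetchall()

print(full_rows)  # [(1, 'a', None), (2, 'b', 'x'), (3, 'c', 'y'), (4, None, 'z')]
```

UNION ALL is safe here because the two parts cannot overlap: the second part only contains ids missing from left_table.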
Cross join
A cross join returns the Cartesian product of two tables. For example, the Cartesian product of the sets A = {1, 2} and B = {3, 4} is A × B = {(1, 3), (1, 4), (2, 3), (2, 4)}. A cross join needs no key column.
SELECT
    *
FROM left_table
CROSS JOIN right_table;
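The set example above maps directly onto a cross join. This minimal sketch uses Python's sqlite3 module, with two hypothetical one-column tables a and b holding the sets A and B.

```python
import sqlite3

# Hypothetical tables holding A = {1, 2} and B = {3, 4}.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (3), (4);
""")

# No key column: every row of a is paired with every row of b.
product = conn.execute(
    "SELECT x, y FROM a CROSS JOIN b ORDER BY x, y"
).fetchall()

print(product)  # [(1, 3), (1, 4), (2, 3), (2, 4)]
```

The result always has rows(a) × rows(b) rows, which is why cross joins on large tables get expensive quickly.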
Semi-join
A semi-join is not technically an SQL join, but it works like one. It returns the records of the left table that have a match, on a key column, in the right table. Unlike a join, it does not include the right table's columns in the result. In the following example, we return the records of left_table whose id is present in right_table.
SELECT
    *
FROM left_table
WHERE
    id IN
    (
        SELECT id FROM right_table
    );
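A minimal sketch (Python's sqlite3 module, hypothetical toy rows) comparing the IN form above with the equivalent EXISTS form, and with an inner join. right_table deliberately contains id 2 twice: the semi-join still returns each left row at most once, while the inner join repeats it.

```python
import sqlite3

# Hypothetical toy data: id 2 appears twice in right_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, l_val TEXT);
    CREATE TABLE right_table (id INTEGER, r_val TEXT);
    INSERT INTO left_table  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_table VALUES (2, 'x'), (2, 'w'), (3, 'y'), (4, 'z');
""")

# Semi-join written with IN, as in the article.
semi_in = conn.execute("""
    SELECT * FROM left_table
    WHERE id IN (SELECT id FROM right_table)
    ORDER BY id
""").fetchall()

# The same semi-join written with a correlated EXISTS.
semi_exists = conn.execute("""
    SELECT * FROM left_table l
    WHERE EXISTS (SELECT 1 FROM right_table r WHERE r.id = l.id)
    ORDER BY l.id
""").fetchall()

# An inner join returns one row per matching pair instead.
joined = conn.execute("""
    SELECT l.* FROM left_table l
    INNER JOIN right_table r ON l.id = r.id
    ORDER BY l.id
""").fetchall()

print(semi_in)  # [(2, 'b'), (3, 'c')]
print(joined)   # [(2, 'b'), (2, 'b'), (3, 'c')]
```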
Anti-join
The anti-join is not technically an SQL join either, but it also works like one. It returns the records of the left table that have no match, on a key column, in the right table. Like the semi-join, it does not include the right table's columns in the result. In the following example, we return the records of left_table whose id is not present in right_table.
SELECT
    *
FROM left_table
WHERE
    id
    NOT IN
    (
        SELECT id FROM right_table
    );
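One caveat worth knowing about the NOT IN form: under standard SQL NULL semantics, if the subquery returns even one NULL, NOT IN filters out every row. NOT EXISTS does not have this problem. A minimal sketch with Python's sqlite3 module and hypothetical toy rows; PostgreSQL behaves the same way here.

```python
import sqlite3

# Hypothetical toy data: id 1 has no match in right_table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER, l_val TEXT);
    CREATE TABLE right_table (id INTEGER, r_val TEXT);
    INSERT INTO left_table  VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO right_table VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

NOT_IN = """
    SELECT * FROM left_table
    WHERE id NOT IN (SELECT id FROM right_table)
"""
NOT_EXISTS = """
    SELECT * FROM left_table l
    WHERE NOT EXISTS (SELECT 1 FROM right_table r WHERE r.id = l.id)
"""

before_null = conn.execute(NOT_IN).fetchall()  # [(1, 'a')]

# A single NULL key makes "1 NOT IN (2, 3, 4, NULL)" evaluate to NULL, not TRUE.
conn.execute("INSERT INTO right_table VALUES (NULL, 'n')")
not_in_rows = conn.execute(NOT_IN).fetchall()          # []
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()  # [(1, 'a')]
```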
Self join
A self join joins a table with itself. In the query below, we find the records that share the same value in the left column. To do this, we join the table with itself and keep the pairs of records that have the same left value but a different id.
SELECT
    *
FROM left_table l1, left_table l2
WHERE
    l1.left = l2.left
    AND
    l1.id <> l2.id
ORDER BY l1.left;
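A minimal sketch of the self join using Python's sqlite3 module. Because left is a reserved word, this toy table uses a hypothetical val column in its place, with made-up rows. With l1.id <> l2.id every pair shows up twice, once in each order; switching to l1.id < l2.id keeps each pair once.

```python
import sqlite3

# Hypothetical toy data; 'val' stands in for the article's 'left' column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table (id INTEGER, val TEXT);
    INSERT INTO left_table VALUES (1, 'p'), (2, 'q'), (3, 'p'), (4, 'r'), (5, 'q');
""")

# Same value, different id: each pair appears in both orders.
pairs = conn.execute("""
    SELECT l1.id, l2.id, l1.val
    FROM left_table l1, left_table l2
    WHERE l1.val = l2.val AND l1.id <> l2.id
    ORDER BY l1.val, l1.id
""").fetchall()

# Using < instead of <> keeps each unordered pair once.
unique_pairs = conn.execute("""
    SELECT l1.id, l2.id, l1.val
    FROM left_table l1, left_table l2
    WHERE l1.val = l2.val AND l1.id < l2.id
    ORDER BY l1.val, l1.id
""").fetchall()

print(pairs)         # [(1, 3, 'p'), (3, 1, 'p'), (2, 5, 'q'), (5, 2, 'q')]
print(unique_pairs)  # [(1, 3, 'p'), (2, 5, 'q')]
```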
Union
Union combines two tables by rows, as long as the column data types of one table match those of the other, position by position. We cannot combine a table whose columns are integer and text with a table whose columns are text and integer. However, we can combine two tables even if their column names differ. Union returns only the distinct records from both tables, removing duplicates.
(
    SELECT
        *
    FROM left_table
)
UNION
(
    SELECT
        *
    FROM right_table
);
Union all
Like Union, Union All combines tables by rows. Unlike Union, Union All keeps duplicate records. In the query below, we combine the id columns of left_table and right_table, and the result contains a couple of duplicates.
(
    SELECT
        id
    FROM left_table
)
UNION ALL
(
    SELECT
        id
    FROM right_table
);
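A minimal sketch contrasting the two operators on hypothetical id columns, using Python's sqlite3 module. SQLite does not accept the parenthesized operands used in the queries above, so the sketch leaves them out.

```python
import sqlite3

# Hypothetical toy data: ids 2 and 3 are in both tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER);
    CREATE TABLE right_table (id INTEGER);
    INSERT INTO left_table  VALUES (1), (2), (3);
    INSERT INTO right_table VALUES (2), (3), (4);
""")

# UNION removes duplicates across (and within) both inputs.
union_rows = conn.execute("""
    SELECT id FROM left_table
    UNION
    SELECT id FROM right_table
    ORDER BY id
""").fetchall()

# UNION ALL simply appends one result to the other.
union_all_rows = conn.execute("""
    SELECT id FROM left_table
    UNION ALL
    SELECT id FROM right_table
    ORDER BY id
""").fetchall()

print(union_rows)      # [(1,), (2,), (3,), (4,)]
print(union_all_rows)  # [(1,), (2,), (2,), (3,), (3,), (4,)]
```

Because UNION has to deduplicate, UNION ALL is usually the cheaper choice when duplicates are impossible or acceptable.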
Intersect
Intersect returns the records common to both tables. The query below returns the ids that appear in both left_table and right_table.
(
    SELECT
        id
    FROM left_table
)
INTERSECT
(
    SELECT
        id
    FROM right_table
);
Except
Except returns the records from the first (left) table that are not present in the second (right) table. The query below returns the ids of left_table that are not present in right_table.
(
    SELECT
        id
    FROM left_table
)
EXCEPT
(
    SELECT
        id
    FROM right_table
);
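A minimal sketch of both set operators on the same hypothetical id columns, using Python's sqlite3 module (again without the parentheses, which SQLite does not accept around compound operands).

```python
import sqlite3

# Hypothetical toy data: ids 2 and 3 are shared, 1 is only on the left.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_table  (id INTEGER);
    CREATE TABLE right_table (id INTEGER);
    INSERT INTO left_table  VALUES (1), (2), (3);
    INSERT INTO right_table VALUES (2), (3), (4);
""")

# INTERSECT: ids present in both tables.
common = conn.execute("""
    SELECT id FROM left_table
    INTERSECT
    SELECT id FROM right_table
    ORDER BY id
""").fetchall()

# EXCEPT: ids of the first table missing from the second (order matters).
only_left = conn.execute("""
    SELECT id FROM left_table
    EXCEPT
    SELECT id FROM right_table
""").fetchall()

print(common)     # [(2,), (3,)]
print(only_left)  # [(1,)]
```

Swapping the operands of EXCEPT would return [(4,)] instead, whereas INTERSECT is symmetric.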
For the following examples, we use the dvd_rental sample database, downloaded and restored into PostgreSQL. The PostgreSQL documentation describes how to restore a database.
1. Top 5 frequent renters
In this example, we find the top 5 customers who rented the most. To do this, we will:
- Join the customer and rental tables using customer_id.
- Count the rentals per customer (as rental_count), grouping by customer_id.
- Sort the result by rental_count in descending order.
- Limit the result to the first 5 records.
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    COUNT(c.customer_id) AS rental_count
FROM customer c
INNER JOIN rental
USING(customer_id)
GROUP BY c.customer_id
ORDER BY
    COUNT(c.customer_id) DESC
LIMIT 5;
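The same join, count, sort, and limit steps can be tried without the full dvd_rental database. This sketch uses Python's sqlite3 module with hypothetical customer and rental rows that keep only the columns the query needs, and a hypothetical top_renters(n) helper that takes the LIMIT as a bound parameter.

```python
import sqlite3

# Hypothetical miniature versions of the dvd_rental customer and rental tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE rental   (rental_id INTEGER, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Ng');
    INSERT INTO rental VALUES (10, 1), (11, 1), (12, 2), (13, 1), (14, 3), (15, 3);
""")

TOP_RENTERS = """
    SELECT c.customer_id, c.first_name, c.last_name,
           COUNT(c.customer_id) AS rental_count  -- one row per rental after the join
    FROM customer c
    INNER JOIN rental USING(customer_id)
    GROUP BY c.customer_id
    ORDER BY rental_count DESC
    LIMIT ?
"""

def top_renters(n):
    """Hypothetical helper: the n customers with the most rentals."""
    return conn.execute(TOP_RENTERS, (n,)).fetchall()

print(top_renters(2))  # [(1, 'Ann', 'Lee', 3), (3, 'Cy', 'Ng', 2)]
```

Note that ties at the cut-off are broken arbitrarily by LIMIT; the window-function version of the next example handles ties explicitly.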
2. Top 5 and bottom 5 customers by revenue generated
In this example, we use Common Table Expressions (CTEs). A CTE defines a temporary result set that exists only for the duration of a single query. See the official PostgreSQL documentation on CTEs (WITH queries) for details.
In this example, we find the top 5 and bottom 5 customers by the revenue they generated. To do this, we will:
1. Create a CTE named revenue_per_customer by
- Joining the customer and rental tables using customer_id.
- Joining the resulting table with the payment table using rental_id.
- Summing the amount paid by each customer (as total_amount), grouping by customer_id.
- Selecting customer_id, first_name, last_name, and total_amount.
2. Select the top 5 customers by revenue from the CTE by
- Sorting total_amount in revenue_per_customer (the CTE result) in descending order.
- Limiting the result to the first 5 records.
- Adding a comment that labels the records as 'Top 5'.
3. Select the bottom 5 customers by revenue from the CTE by
- Sorting total_amount in revenue_per_customer (the CTE result) in ascending order.
- Limiting the result to the first 5 records.
- Adding a comment that labels the records as 'Bottom 5'.
4. Combine the two results using UNION.
WITH revenue_per_customer AS
(
    SELECT
        c.customer_id,
        c.first_name,
        c.last_name,
        SUM(amount) AS total_amount
    FROM customer c
    INNER JOIN rental
    USING(customer_id)
    INNER JOIN payment
    USING(rental_id)
    GROUP BY c.customer_id
)
(
    SELECT
        *,
        'Top 5' AS comment
    FROM revenue_per_customer
    ORDER BY total_amount DESC
    LIMIT 5
)
UNION
(
    SELECT
        *,
        'Bottom 5' AS comment
    FROM revenue_per_customer
    ORDER BY total_amount ASC
    LIMIT 5
)
ORDER BY comment DESC, total_amount DESC;
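A minimal sketch of the CTE-plus-UNION pattern in Python's sqlite3 module, with hypothetical miniature customer, rental, and payment tables and integer amounts. SQLite does not allow ORDER BY/LIMIT directly on the operands of a UNION, so each half is wrapped in a subquery; the hypothetical top_bottom(n) helper also uses the labels 'Top' and 'Bottom' because n is a parameter.

```python
import sqlite3

# Hypothetical miniature tables; amounts are integers to keep sums exact.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, first_name TEXT);
    CREATE TABLE rental   (rental_id INTEGER, customer_id INTEGER);
    CREATE TABLE payment  (payment_id INTEGER, rental_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy'), (4, 'Di');
    INSERT INTO rental VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 4), (15, 4);
    INSERT INTO payment VALUES (100, 10, 3), (101, 11, 5), (102, 12, 1),
                               (103, 13, 6), (104, 14, 2), (105, 15, 2);
""")

TOP_BOTTOM = """
    WITH revenue_per_customer AS (
        SELECT c.customer_id, c.first_name, SUM(amount) AS total_amount
        FROM customer c
        INNER JOIN rental USING(customer_id)
        INNER JOIN payment USING(rental_id)
        GROUP BY c.customer_id
    )
    SELECT * FROM (SELECT *, 'Top' AS comment FROM revenue_per_customer
                   ORDER BY total_amount DESC LIMIT ?)
    UNION
    SELECT * FROM (SELECT *, 'Bottom' AS comment FROM revenue_per_customer
                   ORDER BY total_amount ASC LIMIT ?)
    ORDER BY comment DESC, total_amount DESC  -- 'Top' sorts after 'Bottom', so DESC puts it first
"""

def top_bottom(n):
    """Hypothetical helper: n highest and n lowest revenue customers, labelled."""
    return conn.execute(TOP_BOTTOM, (n, n)).fetchall()

for row in top_bottom(2):
    print(row)
```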
We can also get the same result using window functions. See the official PostgreSQL documentation on window functions for details.
To find the top 5 and bottom 5 customers by revenue using window functions, we will:
1. Create a CTE named total_amt_rank by
- Joining the customer and rental tables using customer_id.
- Joining the resulting table with the payment table using rental_id.
- Summing the amount paid by each customer (as total_amount), grouping by customer_id.
- Selecting customer_id, first_name, last_name, total_amount, and the rank of total_amount in descending order (as total_amount_rank). The highest amount gets rank 1, and so on.
2. Select the top 5 customers by revenue by keeping the rows of the CTE whose total_amount_rank is BETWEEN 1 and 5.
3. Select the bottom 5 customers by revenue from the CTE by
- Sorting total_amount_rank in total_amt_rank (the CTE result) in descending order.
- Limiting the result to the first 5 records.
4. Combine the two results using UNION.
WITH total_amt_rank AS
(
    SELECT
        c.customer_id,
        c.first_name,
        c.last_name,
        SUM(amount) AS total_amount,
        RANK() OVER (ORDER BY SUM(amount) DESC) AS total_amount_rank
    FROM customer c
    INNER JOIN rental
    USING(customer_id)
    INNER JOIN payment
    USING(rental_id)
    GROUP BY c.customer_id
)
(
    SELECT *
    FROM total_amt_rank
    WHERE
        total_amount_rank BETWEEN 1 AND 5
) UNION
(
    SELECT *
    FROM total_amt_rank
    ORDER BY total_amount_rank DESC
    LIMIT 5
)
ORDER BY total_amount_rank;
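One practical difference from the LIMIT version is how ties are treated: RANK() gives equal totals the same rank, so "rank BETWEEN 1 AND n" can return more than n rows. This sketch shows that with Python's sqlite3 module (window functions need SQLite 3.25 or newer, which recent Python releases typically bundle). To keep it short it assumes a simplified, hypothetical payment table keyed directly by customer_id, skips the rental join, and computes the totals and the rank in two CTEs rather than one.

```python
import sqlite3

# Hypothetical data: Cy and Di tie on 6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER, first_name TEXT);
    CREATE TABLE payment  (customer_id INTEGER, amount INTEGER);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bo'), (3, 'Cy'), (4, 'Di'), (5, 'Eve');
    INSERT INTO payment VALUES (1, 5), (1, 3), (2, 1), (3, 6), (4, 2), (4, 4), (5, 3);
""")

RANKED = """
    WITH totals AS (
        SELECT c.customer_id, c.first_name, SUM(p.amount) AS total_amount
        FROM customer c
        INNER JOIN payment p USING(customer_id)
        GROUP BY c.customer_id
    ),
    total_amt_rank AS (
        -- Equal totals share a rank; the next rank is skipped (1, 2, 2, 4, ...).
        SELECT *, RANK() OVER (ORDER BY total_amount DESC) AS total_amount_rank
        FROM totals
    )
    SELECT * FROM total_amt_rank WHERE total_amount_rank BETWEEN 1 AND ?
    UNION
    SELECT * FROM (SELECT * FROM total_amt_rank
                   ORDER BY total_amount_rank DESC LIMIT ?)
    ORDER BY total_amount_rank, customer_id
"""

def ranked_top_bottom(n):
    """Hypothetical helper: customers ranked 1..n plus the n lowest-ranked."""
    return conn.execute(RANKED, (n, n)).fetchall()

for row in ranked_top_bottom(2):
    print(row)
```

With n = 2 the "top" half returns three customers (Ann at rank 1, Cy and Di sharing rank 2), whereas LIMIT 2 would silently drop one of the tied pair.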
3. Top 5 countries by number of rentals
In this example, we find the top 5 countries with the most rentals. To do this, we will:
- Join the country and city tables using country_id.
- Join the resulting table with the address table using city_id.
- Join the resulting table with the customer table using address_id.
- Join the resulting table with the rental table using customer_id.
- Count country_id (as rental_count), grouping by country_id. We could also count rental_id to get rental_count.
- Sort the result by rental_count in descending order.
- Limit the result to 5 records.
SELECT
    co.country_id,
    co.country,
    COUNT(co.country_id) AS rental_count
FROM country co
INNER JOIN city ci
USING(country_id)
INNER JOIN address ad
USING(city_id)
INNER JOIN customer cu
USING(address_id)
INNER JOIN rental
USING(customer_id)
GROUP BY co.country_id
ORDER BY
    COUNT(co.country_id) DESC
LIMIT 5;
Some addresses and cities have no customers, and an inner join omits such records. The next example uses left joins so that addresses without customers are included in the result.
4. Cities and addresses without customers
Some cities and addresses have no customers (these may be store addresses). Inner joins would leave them out of the result, since they have no matching records in the next table. For example, the city of London in Canada has no matching city_id in the address table, so an inner join would drop London, Canada from the result. Similarly, four addresses in Canada and Australia have no matching address_id in the customer table.
SELECT
    co.country,
    ci.city,
    ad.address,
    cu.customer_id
FROM country co
LEFT JOIN city ci
USING(country_id)
LEFT JOIN address ad
USING(city_id)
LEFT JOIN customer cu
USING(address_id)
WHERE cu.address_id IS NULL;
5. Countries without customers
In this example, we find the countries without customers by
1. Creating a subquery that finds the countries with at least one customer by
- Joining the country table with the city table using country_id.
- Joining the resulting table with the address table using city_id.
- Joining the resulting table with the customer table using address_id.
2. Selecting country from the country table where country_id is not among the country_id values returned by the subquery.
SELECT
    country
FROM country
WHERE country_id
NOT IN
(
    SELECT
        co.country_id
    FROM country co
    INNER JOIN city ci
    USING(country_id)
    INNER JOIN address ad
    USING(city_id)
    INNER JOIN customer
    USING(address_id)
);
6. Are there stores in Australia?
In the previous example, we saw that Australia has no customers. In this example, we check whether there are stores in Australia by
- Joining the country table with the city table using country_id.
- Joining the resulting table with the address table using city_id.
- Joining the resulting table with the store table using address_id.
- Selecting the records in Australia where store_id IS NOT NULL.
The left joins ensure that countries with no cities and cities with no stores are kept in the joined result.
SELECT
    st.store_id,
    co.country,
    ad.address
FROM country co
LEFT JOIN city ci
USING(country_id)
LEFT JOIN address ad
USING(city_id)
LEFT JOIN store st
USING(address_id)
WHERE
    (st.store_id IS NOT NULL)
    AND
    (co.country = 'Australia');
There is a store in Australia. In fact, there are only two stores in the entire database, which we can list with the query below.
SELECT * FROM store;
7. Languages without films
In this example, we check whether there are languages without films by
- Joining the language table with the film table using language_id. The left join ensures that languages without films are also included.
- Filtering the records where film_id IS NULL.
SELECT
    *
FROM language l
LEFT JOIN film f
USING(language_id)
WHERE f.film_id IS NULL;
We see that some languages have no films in the database. To make sure this is not a mistake, we select the films with language_id in (2, 3, 4, 5, 6) from the film table. The query should return no records.
SELECT
    *
FROM film
WHERE language_id IN (2, 3, 4, 5, 6);
8. Popularity of film categories in India
In this example, we find the number of rentals by film category in India by joining the required tables as in the previous examples and
- Grouping by country and category, filtering the records from India, and counting the category name (as film_category_count).
- Sorting the result by country in ascending order and film_category_count in descending order.
SELECT
    co.country,
    cat.name AS film_category,
    COUNT(cat.name) AS film_category_count
FROM country co
INNER JOIN city ci
USING(country_id)
INNER JOIN address ad
USING(city_id)
INNER JOIN customer cu
USING(address_id)
INNER JOIN rental r
USING(customer_id)
INNER JOIN inventory
USING(inventory_id)
INNER JOIN film fi
USING(film_id)
INNER JOIN film_category fc
USING(film_id)
INNER JOIN category cat
USING(category_id)
/*
Using
WHERE co.country = 'India'
here instead of
HAVING co.country = 'India'
reduces query execution time.
*/
GROUP BY co.country, cat.name
HAVING co.country = 'India'
ORDER BY
    co.country ASC,
    COUNT(cat.name) DESC;
9. Films with only one actor
In this example, we find the films with a single actor by
- Joining the film table with the film_actor table using film_id.
- Grouping by film_id and counting the number of actors (as actor_count).
- Filtering the records where actor_count is 1.
SELECT
    f.film_id,
    f.title,
    COUNT(fa.actor_id) AS actor_count
FROM film f
INNER JOIN film_actor fa
USING(film_id)
GROUP BY f.film_id
HAVING COUNT(fa.actor_id) = 1;
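HAVING filters groups after aggregation, which is why the actor count can be tested there but not in WHERE. A minimal sketch with Python's sqlite3 module and hypothetical miniature film and film_actor tables:

```python
import sqlite3

# Hypothetical data: film 1 has two actors, film 2 one, film 3 three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film       (film_id INTEGER, title TEXT);
    CREATE TABLE film_actor (actor_id INTEGER, film_id INTEGER);
    INSERT INTO film VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO film_actor VALUES (10, 1), (11, 1), (12, 2), (10, 3), (12, 3), (13, 3);
""")

single_actor_films = conn.execute("""
    SELECT f.film_id, f.title, COUNT(fa.actor_id) AS actor_count
    FROM film f
    INNER JOIN film_actor fa USING(film_id)
    GROUP BY f.film_id
    HAVING COUNT(fa.actor_id) = 1  -- applied to each group, after counting
""").fetchall()

print(single_actor_films)  # [(2, 'Beta', 1)]
```

Films with no actors at all would not appear here, since the inner join removes them before grouping.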
10. Number of films per actor by category
In this example, we find the number of films per actor in each film category by
- Creating a CTE named actor_cat_cnt that returns the number of films for each actor_id and category_id.
- Joining the CTE with the category table using category_id.
- Joining the resulting table with the actor table using actor_id.
- Sorting by actor name (the concatenation of first_name and last_name) in ascending order and film_count in descending order.
WITH
actor_cat_cnt AS
(
    SELECT
        fa.actor_id,
        fc.category_id,
        COUNT(f.film_id) AS film_count
    FROM film_actor fa
    INNER JOIN film f
    USING(film_id)
    INNER JOIN film_category fc
    USING(film_id)
    GROUP BY
        fa.actor_id,
        fc.category_id
)
SELECT
    CONCAT(ac.first_name, ' ', ac.last_name) AS actor,
    ca.name AS category,
    film_count
FROM actor_cat_cnt
INNER JOIN category ca
USING(category_id)
INNER JOIN actor ac
USING(actor_id)
ORDER BY
    CONCAT(ac.first_name, ' ', ac.last_name) ASC,
    film_count DESC;
11. An actor's most popular categories
In the previous example, we found the number of films per actor by film category. In this example, we find each actor's most popular categories (that is, the categories in which the actor has the most films) by
- Creating a CTE named actor_cat_cnt that returns the number of films for each actor_id and category_id, and ranks each actor's categories by number of films in descending order (as cat_rank).
- Joining the CTE with the category table using category_id.
- Joining the resulting table with the actor table using actor_id.
- Filtering the records with cat_rank = 1.
- Sorting by actor name (the concatenation of first_name and last_name) in ascending order and film_count in descending order.
WITH
actor_cat_cnt AS
(
    SELECT
        fa.actor_id,
        fc.category_id,
        COUNT(f.film_id) AS film_count,
        RANK() OVER
            (PARTITION BY fa.actor_id
             ORDER BY COUNT(f.film_id) DESC) AS cat_rank
    FROM film_actor fa
    INNER JOIN film f
    USING(film_id)
    INNER JOIN film_category fc
    USING(film_id)
    GROUP BY
        fa.actor_id,
        fc.category_id
)
SELECT
    CONCAT(ac.first_name, ' ', ac.last_name) AS actor,
    ca.name AS category,
    film_count
FROM actor_cat_cnt
INNER JOIN category ca
USING(category_id)
INNER JOIN actor ac
USING(actor_id)
WHERE cat_rank = 1
ORDER BY
    CONCAT(ac.first_name, ' ', ac.last_name) ASC,
    film_count DESC;
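PARTITION BY restarts the ranking for every actor, so cat_rank = 1 picks each actor's own top category (or several, if tied). A minimal sketch with Python's sqlite3 module (SQLite 3.25 or newer for window functions) and hypothetical miniature tables. It skips the film table, since film_id is already present in film_actor and film_category, ranks in a second CTE, and uses || in place of CONCAT, which older SQLite builds lack.

```python
import sqlite3

# Hypothetical data: Ann has 2 Action films and 1 Drama; Bo has 2 Drama and 1 Action.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor         (actor_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE category      (category_id INTEGER, name TEXT);
    CREATE TABLE film_actor    (actor_id INTEGER, film_id INTEGER);
    CREATE TABLE film_category (film_id INTEGER, category_id INTEGER);
    INSERT INTO actor VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
    INSERT INTO category VALUES (1, 'Action'), (2, 'Drama');
    INSERT INTO film_category VALUES (10, 1), (11, 1), (12, 2), (13, 2);
    INSERT INTO film_actor VALUES (1, 10), (1, 11), (1, 12), (2, 12), (2, 13), (2, 10);
""")

popular = conn.execute("""
    WITH actor_cat_cnt AS (
        SELECT fa.actor_id, fc.category_id, COUNT(fa.film_id) AS film_count
        FROM film_actor fa
        INNER JOIN film_category fc USING(film_id)
        GROUP BY fa.actor_id, fc.category_id
    ),
    ranked AS (
        -- Ranking restarts at 1 for each actor_id.
        SELECT *, RANK() OVER (PARTITION BY actor_id ORDER BY film_count DESC) AS cat_rank
        FROM actor_cat_cnt
    )
    SELECT ac.first_name || ' ' || ac.last_name AS actor, ca.name AS category, film_count
    FROM ranked
    INNER JOIN category ca USING(category_id)
    INNER JOIN actor ac USING(actor_id)
    WHERE cat_rank = 1
    ORDER BY actor ASC, film_count DESC
""").fetchall()

print(popular)  # [('Ann Lee', 'Action', 2), ('Bo Kim', 'Drama', 2)]
```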
This brings the article to a close. We have discussed ways to combine tables by rows or columns in SQL, along with some examples using the dvd_rental database. These concepts underlie almost every SQL query we write. Some of them we may not use often in practice, but they are worth knowing.