[MySQL]LV 3 문제 모음

2023-03-04

Intro

SQL 문제들은 많이 없기 때문에 LV 1 ~ LV 5 까지 문제들을 레벨별로 모아서 한번에 정리하겠다.

문제 1 - 집계함수 max, date + 스칼라 서브쿼리(in)

자동차 대여 기록에서 대여중 / 대여 가능 여부 구분하기

코드

-- 날짜비교1 -> date함수 사용
SELECT CAR_ID, max(case when START_DATE<=DATE(20221016) && END_DATE >= Date(20221016) 
then "대여중" else "대여 가능" end) as AVAILABILITY 
from CAR_RENTAL_COMPANY_RENTAL_HISTORY group by car_id order by car_id desc;

-- 날짜비교2 -> date함수 미사용
select car_id, max(
case when (start_date <= '2022-10-16' and end_date >= '2022-10-16') then '대여중'
else '대여 가능' end) as availability 
from car_rental_company_rental_history 
group by car_id order by car_id desc;

풀이

max(case when START_DATE<=DATE(20221016) && END_DATE >= Date(20221016) then "대여중" else "대여 가능" end)

중요한 점은 왜??? max() 를 사용했냐는 것이다.
- 위 조건의 결과로 “대여중”, “대여 가능” 이 둘다 구해졌을경우?? max() 함수는 “대여중” 을 더 큰값으로 보고 출력해준다!!
  - 잘 생각해보면 위 조건문의 경우 동일한 CAR_ID 가 여러개 있으므로 “대여중”, “대여 가능” 둘다 구해지는 경우가 존재한다!! => 그래서 max() 로 “대여중” 을 선택
  - 문자열 비교해보면 “대여 가능”보다 “대여중”이 더 큼
- 또 다른 방법으로는, 삽입 쿼리문인 IN 쿼리문 을 통해서도 간단히 해결할 수 있다.
  - ```
  select car_id, 
  (case 
   when car_id in (select a.car_id from car_rental_company_rental_history a where a.start_date <= '2022-10-16' and a.end_date >= '2022-10-16') then '대여중' 
   else '대여 가능' 
   end) as availability 
  from car_rental_company_rental_history 
  group by car_id order by car_id desc;
```
- (select a.car_id from car_rental_company_rental_history a where a.start_date <= '2022-10-16' and a.end_date >= '2022-10-16') 가 핵심이다.
- 대여중이 되는 car_id 리스트를 전부 구하는 서브쿼리! 그 리스트에 현재 car_id가 있는지 확인하기 위해 in연산자까지 사용

문제 2 - having 서브쿼리? 단일행, 다중행

대여 횟수가 많은 자동차들의 월별 대여 횟수 구하기
그룹핑 결과에 조건문 다는건 having

코드

-- having 에 서브쿼리
-- 서브쿼리 종류 -> 단일행, 다중행

select month(start_date) as month, car_id, count(*) as records 
from car_rental_company_rental_history 
group by month, car_id having month >= 8 and month <= 10 and 
car_id in(select car_id from car_rental_company_rental_history 
where month(start_date) >= 8 and month(start_date) <= 10 group by car_id having count(*)>=5)
order by month, car_id desc;

-- where절에 서브쿼리 써도 당연히 잘 풀린다.
select month(start_date) as month, car_id, count(*) as records
from CAR_RENTAL_COMPANY_RENTAL_HISTORY 
where year(start_date)=2022 and month(start_date)>=8 and month(start_date)<=10 and 
car_id in 
(select car_id
    from CAR_RENTAL_COMPANY_RENTAL_HISTORY 
    where year(start_date)=2022 and month(start_date)>=8 and month(start_date)<=10
group by car_id having count(*)>=5)
group by month(start_date), car_id
having count(*) != 0 -- 이 줄은 없어도 동작하네?
order by month, car_id desc

풀이

대여 시작일을 기준으로 2022년 8월부터 2022년 10월까지 총 대여 횟수가 5회 이상인 자동차들에 대해서 해당 기간 동안의 월별 자동차 ID 별 총 대여 횟수(컬럼명: RECORDS) 리스트를 출력하는 SQL문을 작성해주세요.

8, 9, 10월의 대여 횟수합이 5회 이상이라면 월에 1번만 대여했더라고 결과에 출력이 되어야 한다.

핵심 코드 : car_id in(select car_id from car_rental_company_rental_history where month(start_date) >= 8 and month(start_date) <= 10 group by car_id having count(*)>=5)

조건에 만족하는 car_id 리스트를 쭉 구해준다. 예로 1, 2
- 그럼 이 결과 리스트에 해당하는 car_id만 사용하게 되니까 “8~10월 데이터”를 잘 출력하면 됨. (그래서 한번 더 날짜 범위 지정하게 됨)
그룹절의 조건문 자리(having or where)에 IN 쿼리문을 사용했다.
총 대여 횟수를 구하기 위해서는 서브쿼리가 필요했다.
- 다중행의 경우 in, any, all 연산자를 사용
- 참고

문제 3 - 서브쿼리 다중컬럼 + rank() + 연관관계

즐겨찾기가 가장 많은 식당 정보 출력하기

코드

# 서브쿼리 다중컬럼 비교
select food_type, rest_id, rest_name, favorites from rest_info
where (food_type, favorites) in (select food_type, max(favorites) from rest_info group by food_type) order by food_type desc; 

# 다중 컬럼 비교하던거 -> 서브쿼리 연관관계로 풀이
select a.food_type, a.rest_id, a.rest_name, a.favorites
from rest_info a
where favorites = (select max(favorites) from rest_info where a.food_type=food_type)
order by a.food_type desc; 

# 윈도우함수 rank()
WITH FAVRK AS(
SELECT FOODTYPE,RESTID,RESTNAME,FAVORITES,
RANK() OVER(PARTITION BY FOODTYPE ORDER BY FAVORITES DESC) AS rk
FROM RESTINFO)

SELECT FOODTYPE,RESTID,RESTNAME,FAVORITES
FROM FAVRK
WHERE rk=1
ORDER BY FOOD_TYPE DESC;

풀이

(1)where (food_type, favorites) in (select food_type, max(favorites) from rest_info group by food_type) 이 핵심이다.

서브쿼리를 group by 를 활용했으므로 food_type은 중복없이 나오는건 자명하고,
다중행이므로 in 을 사용하고, 다중 컬럼이라 (food_type, favorites) 로 비교했다.

(2)where favorites = (select max(favorites) from rest_info where a.food_type=food_type) 이 핵심이다.

서브쿼리 내에서 food_type을 메인쿼리 내용을 활용해서 조건 비교한 것이다.
1번 풀이의 group by하는 불편함을 해소할 수 있다.

(3)윈도우함수를 활용해서 미리 랭킹을 구한후 해당 테이블을 출력하는 방안이다.

문제 4 - 서브쿼리 not in

오랜 기간 보호한 동물(1)

코드

select name, datetime from animal_ins 
where animal_id not in (select a.animal_id from animal_ins a join animal_outs b on (a.animal_id=b.animal_id)) 
order by datetime asc limit 3;

풀이

where animal_id not in (select a.animal_id from animal_ins a join animal_outs b on (a.animal_id=b.animal_id)) 가 핵심!!

in 쿼리문 에서 join 한 값 animal_id 이 동일하지 않는(not in)것을 조건으로 한다!!
in 연산자: 하나라도 같은게 있으면 true
not in 연산자: 전부 다르다면 없다면(같은게 아무것도 없다면) false

문제 5 - having절은 group by 결과를 사용

조건에 맞는 사용자와 총 거래금액 조회하기

코드

# 1:N 로 조인. 조인 결과는 항상 어떤 형태인지 생각하자.
# => 당연히 1인 a쪽이 중복이고, N인 b쪽은 중복아님.
select a.user_id, a.nickname, sum(b.price) as total_sales
from used_goods_user a join used_goods_board b on (a.user_id=b.writer_id)
where b.status = 'DONE'
group by a.user_id having sum(b.price) >= 700000
order by total_sales

풀이

group by 절의 결과를 보면 b.user_id, b.nickname, total_sales 이다.

참고: having 절은 위 결과를 사용한다. 따라서 갑자기 b.city 같은 컬럼을 사용하면
“Unknown column ‘b.city’ in ‘having clause” 에러가 발생한다.

문제 6 - substr과 concat, 1:N 중복 join 해결 + in서브쿼리

조건에 맞는 사용자 정보 조회하기

코드

-- group by 로 1:N join 중복 무시 및 having 에 count 써서 해결한 방식
select b.user_id, b.nickname, concat(b.city,' ', b.street_address1,' ', b.street_address2) as '전체주소', concat(substr(b.tlno,1,3),'-',substr(b.tlno,4,4),'-',substr(b.tlno,8,4)) as '전화번호'
from used_goods_board a join used_goods_user b on (a.writer_id=b.user_id) 
group by b.user_id having count(*)>=3 
order by b.user_id desc;

-- in 서브쿼리 활용 풀이. used_goods_user:used_goods_board 관계는 id가 1:N 관계.
-- in 서브쿼리에서 used_goods_board를 그룹핑하여 1관계로 만들어 매핑(in연산)
select user_id, nickname, concat(city, ' ', street_address1, ' ',street_address2) as '전체주소', concat(substring(tlno,1,3),'-',substring(tlno,4,4),'-',substring(tlno,8,4)) as '전화번호'
from used_goods_user
where user_id in (select writer_id from used_goods_board group by writer_id having count(*)>=3)
order by user_id desc

풀이

concat 메소드 사용, 기존 + 연산으로는 문자+숫자 인 경우 에러가 발생하므로 concat을 추천

substr 까지 함께 기억해두자. (substring은 부분문자열 추출 역할)

두 코드의 차이점은 조인을 했냐 안했냐의 차이일 뿐. (아마 조인을 안한 아래 코드가 더 성능 좋을 듯)

조인 했으면 중복데이터 생각해보고 -> 있으면 그룹핑은 필수(또는 distinct)!! -> 여기선 애초에 count(*)>=3 때문에 그룹핑이 더욱 필수였다.
조인 안했으면 in 서브쿼리를 가공해서 비교 -> 이때 서브쿼리를 그룹핑해서 id를 1:1로 매핑한 것(원래 N)

문제 7 - 단일행 서브쿼리

조회수가 가장 많은 중고거래 게시판의 첨부파일 조회하기

코드

# 게시물:파일 = 1:N 관계
# N:1 로 조인 -> 예상 결과는 당연히 b쪽이 중복 데이터 생성
select concat('/home/grep/src/',a.board_id,'/',a.file_id,a.file_name,file_ext) as file_path
from used_goods_file a join used_goods_board b on (a.board_id=b.board_id)
where b.views = (select max(views) as views from used_goods_board)
order by a.file_id desc

풀이

where a.views = (select max(views) from used_goods_board) 가 핵심!

in 쿼리문 뿐만아니라 “=” 으로도 비교 가능하다 (단일행이라서!) -> 집계함수 특징

문제 8 - not in 활용

업그레이드 할 수 없는 아이템 구하기

코드

-- A->B : A가 부모, B가 업데이트 완료상태
-- 최종 업데이트 : 자신을 부모로 가지고 있지 않은 아이템을 찾으면 된다. (not in 활용)
-- 서브쿼리에 group by a.item_id 넣으면 비교할게 더 줄어든다.(중복 생성이라) 여기선 그냥 생략했다.

select item_id, item_name, rarity from item_info  
where item_id not in (select a.item_id from item_info a join item_tree b on (a.item_id=b.parent_item_id)) 
order by item_id desc;

풀이

where a.item_id not in (select a.item_id from item_info a join item_tree b on (a.item_id=b.parent_item_id)) where절의 join문이 핵심이다.

문제를 해석해보면 “최종 업데이트는 자신을 부모로 가지고 있지 않은 아이템” 이다.
따라서 오히려 “자신을 부모로 가지는 아이템”을 join문으로 구하고,
이것을 not in 을 통해 자신을 부모로 가지지 않는 아이템을 구한 것이다.
그리고 조인할 때 a.item_id(부모) 는 업그레이드 가능한 아이템, b.item_id(자식) 업그레이드 된 아이템을 의미한다.

문제 9 - in절 서브쿼리에 group by 활용 + join

물고기 종류 별 대어 찾기

코드

-- 코드를 작성해주세요
-- 두 테이블 정보가 필요해서 from 절에서 먼저 join
-- "물고기 종류 별 가장 큰 물고기" 를 조건으로 구해야해서 서브쿼리 활용

select a.id, b.fish_name, a.length from fish_info a join fish_name_info b on (a.fish_type=b.fish_type) 
where (a.fish_type, a.length) in (select fish_type, max(length) from fish_info group by fish_type) 
order by a.id;

풀이

출력하는 부분에 fish_type(물고기 종류) 가 없기 때문에 메인쿼리가 아닌 서브쿼리에서 group by 사용하게 되었다.

select a.id, b.fish_name, a.length from fish_info a join fish_name_info b on (a.fish_type=b.fish_type) 처음 from절의 join문은 select의 컬럼을 출력하기 위한 목적이고,

where (a.fish_type, a.length) in (select fish_type, max(length) from fish_info group by fish_type) 두번째 where절의 서브쿼리가 핵심!

“물고기 종류 별” 라서 group by fish_type를 활용하고,
“가장 큰 물고리” 라서 max(length)를 활용했다.
추가로 fish_type 까지 같이 in 연산자로 비교해줘야 정확하게 매핑 되므로 주의하자.
- 그러기 싫다면 메인쿼리의 a.length를 서브쿼리의 having 절에서 max(length)와 비교하는 형태로 바꿔도 된다. (연관 관계)

문제 10 - left join (outer join)

대장균들의 자식의 수 구하기

코드

select a.id, 
(case 
 when b.parent_id is null then 0 
 else count(*) 
 end) as child_count 
from ecoli_data a left join ecoli_data b on (a.id = b.parent_id) 
group by a.id 
order by a.id;

-- case 안쓰는 더 간단한 방법 (count 함수 특징 활용)
select a.id, count(b.id) as child_count
from ecoli_data a left join ecoli_data b on(a.id=b.parent_id)
group by a.id
order by a.id

풀이

(case when b.parent_id is null then 0 else count(*) end) as child_count 는 select 문에 case when 조건문을 사용해서 null 인 값은 0으로 출력해줬다.

from ecoli_data a left join ecoli_data b on (a.id = b.parent_id) 로 left outer join 을 진행한다.

매핑하는 parent_id 가 null 인 경우가 있는데 count(*)는 null도 횟수로 가져가기 때문
- 만약, 직접 컬럼을 count에 기입하면 null은 무시한다. 그럼 원하는데로 0을 출력할 수 있다.
출력은 전체 id가 다 나와야 하므로 a 테이블을 살리기 위해 left outer join을 진행했다.
- inner join으로 진행하면 null 인 parent_id 값은 join 생략 되기 때문

문제 11 - 윈도우 함수, 백분율 문제(percent_rank,over,with)

대장균의 크기에 따라 분류하기 2

코드

with perTable as(
select id, percent_rank() over (order by size_of_colony desc) as per 
from ecoli_data
)

select a.id, 
(case when per <= 0.25 then 'CRITICAL' 
when per <= 0.5 then 'HIGH' 
when per <= 0.75 then 'MEDIUM' 
else 'LOW' end) as colony_name 
from ecoli_data a join perTable b on(a.id=b.id) 
order by a.id;

-- with를 사용 안하고도 당연히 가능 -> from 절에 바로!
select a.id, 
(case when a.per <= 0.25 then 'CRITICAL' 
when a.per <= 0.5 then 'HIGH' 
when a.per <= 0.75 then 'MEDIUM' 
else 'LOW' end) as colony_name 
from (select id, percent_rank() over (order by size_of_colony desc) as per 
from ecoli_data) a
order by a.id;

풀이

참고로 between으로 0 and 0.5 이런식으로 하면 0.51 범위는 포함하지 않기에 <=0.5 를 활용하자.

’<=’ 의 경우 정밀도 문제로 인해 약간 큰 값을 포함할 수 있어서 가능한 것.
다만, 확실히 하려면 반올림을 해서 비교하는게 제일 안전하다. 그럼 between도 충분히 가능하다.

with 란 서브쿼리, 임시 테이블처럼 활용할 수 있는 기능

percent_rank 함수는 인수로 지정한 값의 그룹 내의 상대적 위치를 나타내는 백분위수 순위(순위 퍼센트, Percent rank)를 반환한다.

over 함수는 윈도우 함수이다. 윈도우 함수는 다양한 계산을 도와준다.

PARTITION BY: 데이터 그룹핑 기준을 정의합니다. 이 기준에 따라 데이터를 나누고 각 그룹 내에서 별도로 계산합니다.
ORDER BY: 윈도우 함수가 적용될 행의 순서를 지정합니다.
ROWS 또는 RANGE: 특정 범위 내의 행을 지정합니다. 이는 더 정교한 윈도우 정의를 가능하게 합니다.
over (order by size_of_colony desc) 를 보면 윈도우 함수를 사용해서 order by를 진행한 모습이다.

Twitter Facebook LinkedIn

Intro

문제 1 - 집계함수 max, date + 스칼라 서브쿼리(in)

코드

풀이

문제 2 - having 서브쿼리? 단일행, 다중행

코드

풀이

문제 3 - 서브쿼리 다중컬럼 + rank() + 연관관계

코드

풀이

문제 4 - 서브쿼리 not in

코드

풀이

문제 5 - having절은 group by 결과를 사용

코드

풀이

문제 6 - substr과 concat, 1:N 중복 join 해결 + in서브쿼리

코드

풀이

문제 7 - 단일행 서브쿼리

코드

풀이

문제 8 - not in 활용

코드

풀이

문제 9 - in절 서브쿼리에 group by 활용 + join

코드

풀이

문제 10 - left join (outer join)

코드

풀이

문제 11 - 윈도우 함수, 백분율 문제(percent_rank,over,with)

코드

풀이

공유하기

댓글남기기

참고

2023-03-03 [MySQL]LV 2 문제 모음 ProgrammersTest

2023-03-04 [MySQL]LV 3 문제 모음 ProgrammersTest

2023-04-02 [MySQL]LV 4 문제 모음 ProgrammersTest

2023-03-03

[MySQL]LV 2 문제 모음

ProgrammersTest

2023-03-04

[MySQL]LV 3 문제 모음

ProgrammersTest

2023-04-02

[MySQL]LV 4 문제 모음

ProgrammersTest