중복데이터를 삭제하고 싶습니다.
타겟은 2021/09/28 포함 이후 데이터중에 중복이 발생하면 중복데이터중 1건을 남기고 나머진 삭제하려 합니다.
2021/09/28 포함 이후 데이터가 중복제거 대상이지만 예외로.
2021/09/28의 reservation_id "Ryaaaaa" 의 경우는 09/25일부터 생성되었기때문에 해당 reservation_id에 대해서는 2021/09/25 이후의 데이터 부터가 중복 삭제 대상이 됩니다.
delete문 보다는 삭제대상을 확인할수 있는 select문이 알고 싶습니다.
| reservation_no | check_in_date | Delete Target |
| Ryaaaaa | 2021-09-25 | YES |
| Ryaaaaa | 2021-09-25 | NO |
| Ryaaaaa | 2021-09-26 | NO |
| Ryaaaaa | 2021-09-27 | YES |
| Ryaaaaa | 2021-09-27 | YES |
| Ryaaaaa | 2021-09-27 | NO |
| Ryaaaaa | 2021-09-28 | NO |
| Ryaaaaa | 2021-10-03 | NO |
| Ryaaaaa | 2021-10-04 | YES |
| Ryaaaaa | 2021-10-04 | NO |
| Ryaaaaa | 2021-10-05 | NO |
| Rybbbb | 2021-01-30 | NO |
| Ryccccc | 2021-09-25 | NO |
| Ryccccc | 2021-09-26 | NO |
| Rydddd | 2021-10-06 | NO |
with t1 as ( select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'YES' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'NO' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-09-28' as check_in_dt, 'NO' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-10-03' as check_in_dt, 'NO' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'YES' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'NO' as del_target from dual union all select 'Ryaaaaa' as RESERVATION_ID, '2021-10-05' as check_in_dt, 'NO' as del_target from dual union all select 'Rybbbb' as RESERVATION_ID, '2021-01-30' as check_in_dt, 'NO' as del_target from dual union all select 'Ryccccc' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all select 'Ryccccc' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all select 'Rydddd' as RESERVATION_ID, '2021-10-06' as check_in_dt, 'NO' as del_target from dual ) select * from t1;
select *
from t1 a
where rowid > (select min(rowid)
from t1
where RESERVATION_ID = a.RESERVATION_ID
and check_in_dt = a.check_in_dt)
and ( check_in_dt >= '2021-09-28'
or (RESERVATION_ID = 'Ryaaaaa' and check_in_dt >='2021-09-25'))
앗! 괄호를 잘못 넣었네요. 마농님 감사합니다.~
테스트할 땐 잘 넣었던 것같은데 ^^:
답변감사드립니다. ^^
설명의 예로서 "Ryaaaaa" 를 말하긴 했습니다만 빅테이블 인관계로 특정 예약아이디를 답변주신 것처럼 조건절에 넣는것은 힘들것 같습니다.ㅎ
저.. 그리고 결과부분도 조금 다르게 나오는것 같습니다.^^;; 답변주셔서 정말 감사드려요.
결과를 확인하기 쉽도록 하기 위해서 "del_target" 컬럼을 추가해 두었습니다.
RESERVA CHECK_IN_D DEL
------- ---------- ---
Ryaaaaa 2021-09-25 NO
Ryaaaaa 2021-09-27 YES
Ryaaaaa 2021-09-27 NO
Ryaaaaa 2021-10-04 NO
-- 괄호 오류 수정
and ( check_in_dt >= '2021-09-28'
or (RESERVATION_ID = 'Ryaaaaa' and check_in_dt >='2021-09-25') )
SELECT *
FROM t1 a
WHERE check_in_dt >= '2021-09-25'
AND check_in_dt >= DECODE(reservation_id, 'Ryaaaaa', '2021-09-25', '2021-09-28')
AND EXISTS (SELECT 1
FROM t1
WHERE reservation_id = a.reservation_id
AND check_in_dt = a.check_in_dt
AND ROWID > a.ROWID
)
;
SELECT RESERVATION_ID, check_in_dt
FROM (SELECT RESERVATION_ID, check_in_dt
,ROW_NUMBER() OVER(PARTITION BY RESERVATION_ID, check_in_dt) AS RN
FROM t1
) T
WHERE RN > 1
AND (check_in_dt >= '2021-09-28' OR RESERVATION_ID = 'Ryaaaaa')
특정 RESERVATION_ID (Ryaaaaa) 의 경우는 전체 날짜 기준, 나머지는 20210928 기준으로 중복되는 값들 중에 1개만 제외하고 나머지는 출력(삭제대상)하는 쿼리 입니다.(의도하셨던 질문의 답이 될지는 모르겠지만...)
위의 분들과는 또다른 쿼리가 되었네요
특정ID 만 별도 처리해 달라는 질문의 요구사항과
특정ID 를 조건절에 넣는것은 힘들것 같다는 댓글의 표현은
서로 상충되는 표현인데요?
원하시는 바에 대한 설명을 잘못하고 계신 듯 하네요.
안녕하세요. 답변 정말 감사드립니다.
>> 특정ID 를 조건절에 넣는것은 힘들것 같다는 댓글의 표현은서로 상충되는 표현인데요?
이부분은 리터럴로 직접 입력을 하기힘들다는 뜻으로 드린 말씀입니다.^^
수천만건인 테이블에서 제가 예시로 든 Ryaaaaa 과 같은 케이스가 또 있을 경우 그 예약아이디를 계속 직접 추가해주는게 힘들다는 뜻으로 드린 말씀입니다.^^ 제 설명이 좀 부탁했나보네요
with t1 as (
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-28' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-03' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-05' as check_in_dt, 'NO' as del_target from dual union all
select 'Rybbbb' as RESERVATION_ID, '2021-01-30' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryccccc' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryccccc' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all
select 'Rydddd' as RESERVATION_ID, '2021-10-06' as check_in_dt, 'NO' as del_target from dual
)
,TEMP AS (
SELECT DISTINCT RESERVATION_ID
FROM t1
WHERE check_in_dt < '2021-09-28'
)
SELECT RESERVATION_ID, check_in_dt
FROM (SELECT RESERVATION_ID, check_in_dt
,ROW_NUMBER() OVER(PARTITION BY RESERVATION_ID, check_in_dt) AS RN
FROM t1
) T
WHERE RN > 1
AND (check_in_dt >= '2021-09-28' OR RESERVATION_ID IN (SELECT RESERVATION_ID FROM TEMP))
;
아마 특정 ID 가 아닌 해당 날짜 이전 기록을 가진 ID 값을 말씀하신거 같네요.
제가 이해한 것이 맞다면 이런 식의 쿼리는 어떨까 싶습니다.
안녕하세요 답변 정말 감사드립니다.
먼저 알려주신 SQL이 그대로는 애러가 생기는 관계로 "order by 1" 추가 했습니다. 결과는 조금 다른듯 합니다.
제가 해보겠습니다 ㅎ. 답변 정말 감사드려요.
RESERVA CHECK_IN_D DEL
------- ---------- ---
Ryaaaaa 2021-09-25 YES
Ryaaaaa 2021-09-27 YES
Ryaaaaa 2021-09-27 NO
Ryaaaaa 2021-10-04 YES
답변 정말 감사드립니다.
조건절에 Ryaaaaa 이 직접 들어가서 조금 제 설명이 부족했던것 같습니다.
중복데이터 삭제 조건이 먼저
1. 2021/09/28 이후 데이터에 대해서 삭제.
2. 기본적으로 2021/09/28 이후 발생한 데이터에 대해서 "RESERVATION_ID" + "check_in_dt" 가 유일 하도록 삭제를 진행 하지만.
예로 들은 Ryaaaaa 와 같이 2021/09/28 이후 발생 데이터중에 어떤 "RESERVATION_ID" 의 경우<예의 Ryaaaaa>는 최초 check_in_dt 데이터가 2021/09/28 이전부터도 발생한 경우가 있다. 이경우에는 해당 특정 RESERVATION_ID 의 경우는 최초 발생 데이터의 시점부터를 삭제 대상으로 한다. <예시로 든 Ryaaaaa경우는 2021/09/25 부터 .. >
3. 위 2번 조건으로부터, 아래 Ryccccc 의 경우는 중복 데이터가 있지만 Ryccccc 의 마지막 발생이 목표인 2021/09/28 이후에 발생하진 않았기 때문에 삭제 대상 제외.
1~3 을 요약하면 2021/09/28 이후 발생한 RESERVATION_ID 는 2021/09/28 이전 데이터도 전부 중복 제거 대상이 되겠네요.
| Ryccccc | 2021-09-25 | NO |
| Ryccccc | 2021-09-26 | NO |
입니다. 혹시 부족한 설명이 있다면 말씀주시면 추가 하겠습니다.
SELECT *
FROM t1 a
WHERE EXISTS (SELECT 1
FROM t1
WHERE reservation_id = a.reservation_id
AND check_in_dt = a.check_in_dt
AND ROWID > a.ROWID
)
AND reservation_id IN (SELECT reservation_id
FROM t1
WHERE check_in_dt >= '2021-09-28'
)
;
답변 정말 감사드려요.
좋은 공부가 되었습니다. ㅎ
자답입니다만 이런 느낌일까 싶었습니다.
select *
from t1 t
where exists (
select 'x'
from t1 x
where x.RESERVATION_ID = t.RESERVATION_ID
and x.check_in_dt = t.check_in_dt
and x.rowid < t.rowid
)
and (check_in_dt >= '2021-09-28' OR RESERVATION_ID in (
select RESERVATION_ID
from t1
group by RESERVATION_ID, check_in_dt
having count(*) > 1
)
);