오라클 빅 테이블 데이터 중복삭제에 관해서 조언 부탁드립니다. 0

by dbatrace [SQL Query] [2022.04.20 18:47:16]


중복데이터를 삭제하고 싶습니다.

타겟은 2021/09/28 포함 이후 데이터중에 중복이 발생하면 중복데이터중 1건을 남기고 나머진 삭제하려 합니다.

2021/09/28 포함 이후 데이터가 중복제거 대상이지만 예외로.

2021/09/28의 reservation_id "Ryaaaaa" 의 경우는 09/25일부터 생성되었기때문에 해당 reservation_id에 대해서는 2021/09/25 이후의 데이터 부터가 중복 삭제 대상이 됩니다.

delete문 보다는 삭제대상을 확인할수 있는 select문이 알고 싶습니다.

reservation_no check_in_date Delete Target
Ryaaaaa 2021-09-25 YES
Ryaaaaa 2021-09-25 NO
Ryaaaaa 2021-09-26 NO
Ryaaaaa 2021-09-27 YES
Ryaaaaa 2021-09-27 YES
Ryaaaaa 2021-09-27 NO
Ryaaaaa 2021-09-28 NO
Ryaaaaa 2021-10-03 NO
Ryaaaaa 2021-10-04 YES
Ryaaaaa 2021-10-04 NO
Ryaaaaa 2021-10-05 NO
Rybbbb 2021-01-30 NO
Ryccccc 2021-09-25 NO
Ryccccc 2021-09-26 NO
Rydddd 2021-10-06 NO
with t1 as (
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-28' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-03' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-05' as check_in_dt, 'NO' as del_target from dual union all
select 'Rybbbb' as RESERVATION_ID, '2021-01-30' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryccccc' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryccccc' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all
select 'Rydddd' as RESERVATION_ID, '2021-10-06' as check_in_dt, 'NO' as del_target from dual
)
select * from t1;

 

by 신이만든지기 [2022.04.21 09:34:35]
select * 
  from t1 a
 where rowid > (select min(rowid) 
                  from t1 
                 where RESERVATION_ID = a.RESERVATION_ID 
                   and check_in_dt = a.check_in_dt)
  and (   check_in_dt >= '2021-09-28' 
       or (RESERVATION_ID = 'Ryaaaaa' and check_in_dt >='2021-09-25'))

 

앗! 괄호를 잘못 넣었네요. 마농님 감사합니다.~

테스트할 땐 잘 넣었던 것같은데 ^^:


by dbatrace [2022.04.21 10:04:56]

답변감사드립니다. ^^

설명의 예로서 "Ryaaaaa" 를 말하긴 했습니다만 빅테이블 인관계로 특정 예약아이디를 답변주신 것처럼 조건절에 넣는것은 힘들것 같습니다.ㅎ

저.. 그리고 결과부분도 조금 다르게 나오는것 같습니다.^^;; 답변주셔서 정말 감사드려요.

결과를 확인하기 쉽도록 하기 위해서  "del_target" 컬럼을 추가해 두었습니다. 

RESERVA CHECK_IN_D DEL
------- ---------- ---
Ryaaaaa 2021-09-25 NO
Ryaaaaa 2021-09-27 YES
Ryaaaaa 2021-09-27 NO
Ryaaaaa 2021-10-04 NO


by 마농 [2022.04.21 10:10:36]
-- 괄호 오류 수정
  and (   check_in_dt >= '2021-09-28'
       or (RESERVATION_ID = 'Ryaaaaa' and check_in_dt >='2021-09-25') )

 


by 마농 [2022.04.21 09:54:47]
SELECT *
  FROM t1 a
 WHERE check_in_dt >= '2021-09-25'
   AND check_in_dt >= DECODE(reservation_id, 'Ryaaaaa', '2021-09-25', '2021-09-28')
   AND EXISTS (SELECT 1
                 FROM t1
                WHERE reservation_id = a.reservation_id
                  AND check_in_dt    = a.check_in_dt
                  AND ROWID          > a.ROWID
               )
;

 


by 우주민 [2022.04.21 10:02:01]
SELECT RESERVATION_ID, check_in_dt
FROM (SELECT RESERVATION_ID,  check_in_dt
      ,ROW_NUMBER() OVER(PARTITION BY RESERVATION_ID, check_in_dt) AS RN 
      FROM t1
      ) T
WHERE RN > 1
AND (check_in_dt >= '2021-09-28' OR RESERVATION_ID = 'Ryaaaaa')


특정 RESERVATION_ID (Ryaaaaa) 의 경우는 전체 날짜 기준, 나머지는 20210928 기준으로 중복되는 값들 중에 1개만 제외하고 나머지는 출력(삭제대상)하는 쿼리 입니다.(의도하셨던 질문의 답이 될지는 모르겠지만...)

위의 분들과는 또다른 쿼리가 되었네요


by 마농 [2022.04.21 10:16:12]

특정ID 만 별도 처리해 달라는 질문의 요구사항과
특정ID 를 조건절에 넣는것은 힘들것 같다는 댓글의 표현은
서로 상충되는 표현인데요?
원하시는 바에 대한 설명을 잘못하고 계신 듯 하네요.


by dbatrace [2022.04.21 11:10:33]

안녕하세요. 답변 정말 감사드립니다.

>> 특정ID 를 조건절에 넣는것은 힘들것 같다는 댓글의 표현은서로 상충되는 표현인데요?

이부분은 리터럴로 직접 입력을 하기힘들다는 뜻으로 드린 말씀입니다.^^

수천만건인 테이블에서 제가 예시로 든 Ryaaaaa 과 같은 케이스가 또 있을 경우 그 예약아이디를 계속 직접 추가해주는게 힘들다는 뜻으로 드린 말씀입니다.^^ 제 설명이 좀 부탁했나보네요


by 우주민 [2022.04.21 10:19:31]
with t1 as (
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-27' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-09-28' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-03' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'YES' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-04' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryaaaaa' as RESERVATION_ID, '2021-10-05' as check_in_dt, 'NO' as del_target from dual union all
select 'Rybbbb' as RESERVATION_ID, '2021-01-30' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryccccc' as RESERVATION_ID, '2021-09-25' as check_in_dt, 'NO' as del_target from dual union all
select 'Ryccccc' as RESERVATION_ID, '2021-09-26' as check_in_dt, 'NO' as del_target from dual union all
select 'Rydddd' as RESERVATION_ID, '2021-10-06' as check_in_dt, 'NO' as del_target from dual
)
,TEMP AS (
SELECT DISTINCT RESERVATION_ID
FROM t1
WHERE check_in_dt < '2021-09-28'
)
SELECT RESERVATION_ID, check_in_dt
FROM (SELECT RESERVATION_ID,  check_in_dt
      ,ROW_NUMBER() OVER(PARTITION BY RESERVATION_ID, check_in_dt) AS RN 
      FROM t1
      ) T
WHERE RN > 1
AND (check_in_dt >= '2021-09-28' OR RESERVATION_ID IN (SELECT RESERVATION_ID FROM TEMP))
;


아마 특정 ID 가 아닌 해당 날짜 이전 기록을 가진 ID 값을 말씀하신거 같네요.

제가 이해한 것이 맞다면 이런 식의 쿼리는 어떨까 싶습니다.


by dbatrace [2022.04.21 11:38:43]

안녕하세요 답변 정말 감사드립니다.

먼저 알려주신 SQL이 그대로는 애러가 생기는 관계로  "order by 1" 추가 했습니다. 결과는 조금 다른듯 합니다.

제가 해보겠습니다 ㅎ. 답변 정말 감사드려요.

RESERVA CHECK_IN_D DEL
------- ---------- ---
Ryaaaaa 2021-09-25 YES
Ryaaaaa 2021-09-27 YES
Ryaaaaa 2021-09-27 NO
Ryaaaaa 2021-10-04 YES


by dbatrace [2022.04.21 11:34:21]

답변 정말 감사드립니다.

조건절에 Ryaaaaa 이 직접 들어가서 조금 제 설명이 부족했던것 같습니다.

중복데이터 삭제 조건이 먼저

1. 2021/09/28 이후 데이터에 대해서 삭제.

2. 기본적으로 2021/09/28 이후 발생한 데이터에 대해서 "RESERVATION_ID" + "check_in_dt" 가 유일 하도록 삭제를 진행 하지만.

   예로 들은 Ryaaaaa 와 같이 2021/09/28 이후 발생 데이터중에 어떤 "RESERVATION_ID" 의 경우<예의 Ryaaaaa>는 최초  check_in_dt 데이터가 2021/09/28 이전부터도 발생한 경우가 있다. 이경우에는 해당 특정 RESERVATION_ID 의 경우는 최초 발생 데이터의 시점부터를 삭제 대상으로 한다. <예시로 든 Ryaaaaa경우는 2021/09/25 부터 .. >

3.  위 2번 조건으로부터, 아래 Ryccccc 의 경우는 중복 데이터가 있지만 Ryccccc 의 마지막 발생이 목표인 2021/09/28 이후에 발생하진 않았기 때문에 삭제 대상 제외.

1~3 을 요약하면  2021/09/28 이후 발생한 RESERVATION_ID 는  2021/09/28 이전 데이터도 전부 중복 제거 대상이 되겠네요.

Ryccccc 2021-09-25 NO
Ryccccc 2021-09-26 NO

입니다.  혹시 부족한 설명이 있다면 말씀주시면 추가 하겠습니다.


by 마농 [2022.04.21 12:40:08]
SELECT *
  FROM t1 a
 WHERE EXISTS (SELECT 1
                 FROM t1
                WHERE reservation_id = a.reservation_id
                  AND check_in_dt    = a.check_in_dt
                  AND ROWID          > a.ROWID
               )
   AND reservation_id IN (SELECT reservation_id
                            FROM t1
                           WHERE check_in_dt >= '2021-09-28'
                          )
;

 


by dbatrace [2022.04.21 12:49:59]

답변 정말 감사드려요.

좋은 공부가 되었습니다. ㅎ


by dbatrace [2022.04.21 12:41:27]

자답입니다만 이런 느낌일까 싶었습니다.

select *
from t1 t
where exists (
            select 'x'
            from t1 x
            where x.RESERVATION_ID = t.RESERVATION_ID
            and x.check_in_dt = t.check_in_dt
            and x.rowid < t.rowid
            )
and (check_in_dt >= '2021-09-28' OR RESERVATION_ID in (
                                                        select RESERVATION_ID
                                                        from t1
                                                        group by RESERVATION_ID, check_in_dt
                                                        having count(*) > 1
                                                        )
    );

 

댓글등록
SQL문을 포맷에 맞게(깔끔하게) 등록하려면 code() 버튼을 클릭하여 작성 하시면 됩니다.
로그인 사용자만 댓글을 작성 할 수 있습니다. 로그인, 회원가입