Results 1 to 4 of 4

Thread: Catching duplicates and best way to test slowly changing dimensions

  1. #1
    Junior Member
    Join Date
    May 2013
    Posts
    2

    Catching duplicates and best way to test slowly changing dimensions

    Hello,

    Im looking for a way to catch duplicates in a table. Suppose say we have 100,000 (mill) records - and somewhere in between at 50,000 some of the records say 50 records we have duplicate values - ex: lets say we have emp table - the emp_name column will have duplicates ( 50 records). Can anybody please let me know the best way to catch these records. Assuming we dont have any testing tool and we should identify by running a sql.

    Also,

    We have slowly changing dimensions - say address column - how to catch these changes as well? Please provide your thoughts - sitting as a test engineer we are not sure that the source will change these - these should be caught once the load is complete in DW.

    Appreciate your help.

    Thanks

  2. #2
    Member
    Join Date
    Nov 2012
    Posts
    34
    If you want to catch duplicate records a simple way is

    Code:
    select id from table name group by id having count(1) > 1;
    This gives the list of all the values in id column that are repeated more than once (which means duplicate values).

    can you please elaborate second question

  3. #3
    Junior Member
    Join Date
    May 2013
    Posts
    2

    Slowly Changing Dimension Test

    Hi!! Katie - thanks for your response.

    More in details about second question - Slowly Changing Dimensions Test:

    Ex: The address column in Employee Table have some changes came in but we are not sure in which record these changes are. We can identify these by looking at the date column from previous day but at the same time if there are some 1000 records to be identified some where in between 50,000 and 60,000th record - whts the best way to test these. Please let me know - Thanks for your help..

  4. #4
    Member
    Join Date
    Nov 2012
    Posts
    34
    Here is a rough query for your requirement:

    Code:
    select  employee_id, 
              max(case when trunc(date_column) = trunc(sysdate) then employee_name else null end) todays_emp_name,
              max(case when trunc(date_column) = trunc(sysdate-1) then employee_name else null end) yeterdays_emp_name
    from    employees
    where  trunc(date_column) >= trunc(sysdate-1)    --this is for checking today and yesterdays data
    group by employee_id
    The above sql query gives yesterday and today changes in the same row. Now you can compare them easily.

Similar Threads

  1. Slowly changing dimensions
    By Shoban in forum ETL Testing
    Replies: 1
    Last Post: 11-07-2012, 02:58 AM
  2. Replies: 1
    Last Post: 07-16-2012, 11:02 AM
  3. Replies: 1
    Last Post: 06-06-2012, 06:54 AM

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •