Data Deduplication

Agency of Duplicates

The Data Deduplication process runs each year for ABE clients prior to the NRS reporting due date of 10/1/YYYY.  This process creates an "Agency of Duplicates," which displays on the client agency drop-down list as "Agency of Duplicates YYYY," where YYYY is the fiscal year, such as 2122, 2223, etc.  The Deduplicate Reporting Year process merges the data of duplicated students and staff into one record at the agency of duplicates for the reporting year.  ONLY users with access to the Agency of Duplicates will see the agency on the drop-down agency list.


A duplicated student is a student who was entered into more than one client agency, including students entered into more than one agency through the Cross Agency Tracking process.

A duplicated staff member is a staff member who was entered into more than one client agency.



Students


By merging student data into the agency of duplicates and running the NRS Deduplicated table searches and the State and Local Performance (SPR) Deduplicated searches at the client SEA, students are prevented from counting multiple times in the same reporting year.

Data deduplication may cause the numbers on your NRS Deduplicated tables to be lower or higher than the numbers on your regular, non-deduplicated tables.  This may be caused by:


Students who were served by more than one agency are merged based on the following criteria:

1/1/2021 - 6/30/2023

Note that 1/1/2021 is 18 months prior to the 22/23 start date of 7/1/2022. 
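The date range above can be derived from the fiscal year code.  The following is a minimal sketch only, not the system's actual implementation.  It assumes fiscal years run 7/1 through 6/30 (as in the 22/23 example) and that the code falls in the 2000s.  The function name `activity_window` is hypothetical.

```python
from datetime import date


def activity_window(fy_code: str) -> tuple:
    """Return (window_start, window_end) for a fiscal-year code such as '2223'.

    Assumption: fiscal years run 7/1 through 6/30 and fall in the 2000s.
    """
    start_year = 2000 + int(fy_code[:2])
    fy_start = date(start_year, 7, 1)
    fy_end = date(start_year + 1, 6, 30)

    # Step back 18 months from the first day of the fiscal year.
    month_index = fy_start.year * 12 + (fy_start.month - 1) - 18
    year, month_zero_based = divmod(month_index, 12)
    window_start = date(year, month_zero_based + 1, 1)
    return window_start, fy_end


print(activity_window("2223"))
```

For "2223" this yields 1/1/2021 through 6/30/2023, which matches the range shown above.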


Note that some clients do not require SSN, either when student records are entered manually or when students self-register through the portal.  When a matching SSN is not found, student records are not merged, so it is possible to still have duplicate records on NRS tables even after the Data Deduplication process has been run.  It is up to the client to omit the duplicate records prior to submitting NRS reports.


When at least two student records match on SSN, Birthdate, and Gender (Sex) with activity or intake date in the proper date range, all other records that match on SSN, Birthdate, and Gender (Sex) will also be included in the merge.  This means that data from more than two records may be merged, but the additional records may have activity older than 18 months prior to the FY being deduplicated.
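The matching rule in the paragraph above can be illustrated with a short sketch.  This is an assumption-laden illustration, not the actual Deduplicate Reporting Year logic: the `StudentRecord` fields and the function `find_merge_groups` are hypothetical names, and a single `last_activity` date stands in for "activity or intake date."

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StudentRecord:
    agency: str
    ssn: Optional[str]
    birthdate: date
    sex: str
    last_activity: date  # stands in for "activity or intake date"


def find_merge_groups(records, window_start, window_end):
    """Group records on SSN, birthdate, and sex; keep groups that qualify."""
    groups = defaultdict(list)
    for rec in records:
        if not rec.ssn:
            continue  # records without an SSN cannot be matched
        groups[(rec.ssn, rec.birthdate, rec.sex)].append(rec)

    merge_groups = []
    for recs in groups.values():
        in_window = [r for r in recs
                     if window_start <= r.last_activity <= window_end]
        # At least two records must fall in the date range; once they do,
        # every matching record joins the merge, including older ones.
        if len(in_window) >= 2:
            merge_groups.append(recs)
    return merge_groups
```

Note that the whole group is returned, not just the in-range records, which is why a merged record may carry activity older than 18 months.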


Once the merge criteria are met, ALL student data from the duplicate records is merged.  A new student record is then created and added to the Agency of Duplicates for that fiscal year.  Previous FY PoPs are synced and included in the merged record.

FY summaries are created for the current reporting year as well as the prior two fiscal years.  For example, students merged into the Agency of Duplicates 2223 will have FY summaries for 22/23, 21/22, and 20/21.


How is student data merged?

When students match the criteria above, all student data from all agencies that have a matching record are combined to create a new student record.  This record will differ from the student's records in the agencies that served the student individually. 

Merged record contains:

Assessment records that are the same in the student's individual records - whether from an external import or from manual entry with the same information - are not duplicated in the agency of duplicates but display only once.  For example, if an assessment with the same assessed date, instrument/form/level/subtest, and scaled score was in two duplicate student records, the assessment would display only once in the student's merged record in the agency of duplicates.
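As a rough illustration of the assessment rule above, the sketch below combines assessments from several records and keeps one copy of each identical entry.  The dictionary keys (`assessed_date`, `instrument`, `form`, `level`, `subtest`, `scaled_score`) and the function name `merge_assessments` are hypothetical; the actual field names in the system may differ.

```python
def merge_assessments(*assessment_lists):
    """Combine assessment lists, keeping one copy of identical assessments."""
    seen = set()
    merged = []
    for assessments in assessment_lists:
        for a in assessments:
            # Two assessments are treated as the same when all of these match.
            key = (
                a["assessed_date"],
                a["instrument"],
                a["form"],
                a["level"],
                a["subtest"],
                a["scaled_score"],
            )
            if key not in seen:
                seen.add(key)
                merged.append(a)
    return merged
```

An assessment that differs in any one of these fields, such as the scaled score, is kept as a separate entry.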


Staff who served in more than one agency are merged based on the following criteria:

AND

Staff member records that do not have a start date are NOT merged. 


How is staff data merged?

When staff match the criteria above, all staff data from all agencies that have a matching record are combined to create a new staff record.  This record will differ from the staff member's records in the individual agencies.

Merged record contains:


Classes, Groups, Workshops, and Pairs

A record will be created in the agency of duplicates for each class, group, workshop, or pair with a merged student.  Pair records will also display the instructor name but tutors are NOT added to the agency of duplicates. 

Classes and workshop records are created for each staff assignment.  Some classes in the agency of duplicates may not have any students at the enrollment tab. This is because a class or workshop may have been created for merged staff, but there were no students enrolled in the class or workshop who met the criteria to be merged. 


Suggested process for SEA users:

After the Agency of Duplicates has been created, data deduplication can be thought of as a 3-step process:

Step 1:  Identify students who were not merged during the deduplication process.  There are 3 searches that help identify students who were not merged. (Information on the searches is below.)  

Step 2:  Determine if the students should be merged and ask agencies to update information so it matches based on the criteria outlined above.  

Step 3:  Contact Tech Support to request that the Deduplicate Reporting Year process be run again to capture these students.


FY Duplicate Searches

The FY Duplicate search comprises 3 separate searches available at the SEA agency.  The search finds duplicated students in the SEA agency and may be run on the FY reporting system if your database was deduplicated for that year.  The search should be run from All Students.

To run the search:

1. Click the >Searches link at the student grid in the SEA agency. 

2. Expand the search category for NRS Deduplicated. 

3. Select the FY Duplicate search and click the Replace button to run the search on all students. 

4. Select the Reporting System fiscal year. 

5. After the search rows populate, you may drill down to the student records. 


Students Merged into Agency of Duplicates

This search will find all students who have been merged into the Agency of Duplicates for a particular FY based on the merge criteria listed above.   


Students who match SSN of a merged student but aren't in the Agency of Duplicates because birthdate and/or gender don't match

This search is intended to help identify students who have the same SSN as a merged student in the Agency of Duplicates but whose data was not merged because the birthdate and/or gender (sex) did not match.  For example, a student who was served by three different agencies in the FY may have an identical SSN, birthdate, and gender (sex) in two of the agencies.  Data from these two agencies is merged and the student is in the Agency of Duplicates.  The third agency that served the student may have entered the same SSN but a different birthdate or gender (sex).  This search identifies these students so the records may be updated.  Once SSN, birthdate, and gender (sex) match in all three agencies, the student will be re-merged when the deduplication process is run again.

This search includes students who match on all 3 data points, but whose intake date is after the last day of the FY being deduplicated. 

For example:


State Directors should contact Tech Support if the Deduplicate Reporting Year process needs to be run again to capture these students.


Students who match based on SSN alone and a merged student with this SSN does not exist in the Agency of Duplicates

This search finds students who have the same SSN but do not have matching birthdates or gender (sex).  These students are not necessarily duplicates; they share an SSN but do not meet the additional criteria to be merged.
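The difference between the three FY Duplicate searches can be sketched as a simple classification.  This is an illustration under assumptions, not how the searches are implemented: the record keys (`id`, `ssn`, `birthdate`, `sex`), the category labels, and the function `classify_records` are hypothetical, and the intake date check described above is left out for brevity.

```python
from collections import Counter


def classify_records(records, merged_keys):
    """Label each record with the FY Duplicate search it would fall under.

    merged_keys: set of (ssn, birthdate, sex) tuples already merged into
    the Agency of Duplicates.
    """
    merged_ssns = {ssn for ssn, _, _ in merged_keys}
    ssn_counts = Counter(r["ssn"] for r in records if r["ssn"])

    labels = {}
    for r in records:
        key = (r["ssn"], r["birthdate"], r["sex"])
        if key in merged_keys:
            labels[r["id"]] = "merged"
        elif r["ssn"] in merged_ssns:
            # Same SSN as a merged student, but birthdate or sex differs.
            labels[r["id"]] = "ssn_matches_merged_student"
        elif r["ssn"] and ssn_counts[r["ssn"]] > 1:
            # SSN is shared, but no merged student has this SSN.
            labels[r["id"]] = "ssn_match_only"
        else:
            labels[r["id"]] = "no_match"
    return labels
```

Records in the second and third categories are the ones to review with the agencies before asking for the process to be run again.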


NRS Searches at the SEA

The NRS searches allow you to drill down to the list of students populating a particular cell in an NRS report.  Searches should be run from All Students at the SEA agency.  

There are two NRS searches.



NRS Table Searches return all NRS participants who met the criteria to be reportable, including duplicated reportable students.  For example, Student 1 and Student 2 are duplicate records of the same student, who was served by two agencies in the FY and is NRS reportable in both agencies.  When you run NRS Tables from the NRS search category, this student will count twice.


As part of the Data Deduplication process, this student is merged into the agency of duplicates to count only once on the deduplicated NRS tables.  When you run a deduplicated table at the SEA on all students, the duplicated students are filtered out.  Then students are "pulled" from the agency of duplicates and reported on the table from the merged record.  If you drill down on a cell in the deduplicated table searches to display the students on the grid, you may see that some students display Agency of Duplicates as their agency, IF any students in that particular drill-down list were merged.

To see the agency name after running a deduplicated table search:

1. Add the Agency column to a view at the SEA student area.  

2. Display all students. 

3. Click the >Searches link.

4. Expand the NRS Deduplicated category.

5. Select the NRS Table 4 Deduplicated search. 

6. Select the reporting year. 

7. Drill down on a number in one of the cells. 

8. If any of the students in the list were merged, the agency name will be Agency of Duplicates XXXX.

IMPORTANT NOTE:  You will only see the students from the Agency of Duplicates on the grid.  You are not able to do a grid column filter by Agency of Duplicates or by the name of students populating the table from the Agency of Duplicates. For example, Bob Smith counts on the deduplicated Table 4.  His agency displays as Agency of Duplicates.  If you attempt to filter the Agency column by Agency of Duplicates or the Last Name column by Smith, the grid search will return 0 students.  This is because the Agency of Duplicates is not part of the SEA agency tree, but is a separate agency for the sole purpose of containing merged records.  The record for Bob Smith that is included on the deduplicated Table 4 is also not in the actual SEA agency, but is a record in the Agency of Duplicates. 


How to verify the numbers on your deduplicated NRS searches

You can verify the numbers on the deduplicated NRS searches with the following search pattern, using Table 4 as an example.

1. At the SEA, display all students.

2. Run the regular Table 4 search (from the NRS search category) for the reporting FY. 

3. Drill down on Column B, Grand Total.

4. Run the deduplicated Table 4 search for the reporting FY on the selection.

5. Drill down on Column B, Grand Total of the deduplicated Table 4.

6. You will have two search tiles. Change the operator between the two tiles to 'And not.'

7. The resulting records are the students who were merged into the Agency of Duplicates.  Note the number of records.

8. Subtract that number from the number of students in the grand total of 'regular' Table 4.

9. Now go to the Agency of Duplicates for the reporting FY.  (If you do not see the Agency of Duplicates on the drop-down list of agencies, you may not have access to that agency.  Remember that ONLY users with access to the Agency of Duplicates will see the agency on the drop-down agency list.) 

10. Display all students and run Table 4.

11. Add the number from the column B, Grand Total cell to the number from step #8.

12. Go back to the SEA and run deduplicated Table 4 on all students.

13. The number from step #11 should match the number of students who count in column B, Grand Total of deduplicated Table 4.
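The arithmetic behind the steps above can be sketched as follows.  This is only an illustration of the check, assuming you have copied the drill-down results out of the system; the function name `expected_dedup_total` and the use of record IDs in sets are hypothetical.

```python
def expected_dedup_total(regular_ids, dedup_on_selection_ids, aod_total):
    """Reproduce the verification arithmetic for a deduplicated table.

    regular_ids: record IDs from the regular Table 4 Grand Total (step 3).
    dedup_on_selection_ids: record IDs returned by running the deduplicated
        Table 4 on that selection (steps 4-5).
    aod_total: Grand Total of Table 4 run at the Agency of Duplicates
        (step 10).
    """
    # Step 6-7: 'And not' leaves the records that were filtered out
    # because they were merged into the Agency of Duplicates.
    merged_out = regular_ids - dedup_on_selection_ids

    # Step 8: subtract the merged records from the regular grand total.
    remaining = len(regular_ids) - len(merged_out)

    # Step 11: add the merged records back, once each, from the
    # Agency of Duplicates.
    return remaining + aod_total
```

The returned value is the number to compare against the Grand Total of deduplicated Table 4 run on all students at the SEA (step 12).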


For example (these numbers are for illustration purposes only and are not meant to be representative of the numbers on your tables):

Important note: You cannot run a deduplicated table on a selection from another deduplicated table.  For example, if you run deduplicated Table 4 and drill down on the Grand Total, you cannot then run a deduplicated Table 6 on that selection of students and expect accurate numbers.  This is because of the way students are filtered out and then pulled from the agency of duplicates. Run deduplicated tables one at a time, or select "Print All NRS Deduplicated Searches" from the More icon drop-down menu at the student area. 


State and Local Performance (SPR) Deduplicated Searches at the SEA

The State and Local Performance (SPR) is also run on deduplicated data. 



Last Update:  9/8/2023