The Setup
The test database has the same CPU/RAM specs with Production.
Tested against 50million records. (next is 350M) It takes time to generate test data.
We added constant workload simulation. Around 5% constant CPU load. See Label #2 below.
Conclusion
It is safe and optimal to process 50Million records at a time.
Duration: 300 Seconds
5% increase in CPU workload only. See label #3 below.
RDS limits the CPU allocation to 5%
RDS db.r5.12xlarge
Once scripts are run. You cannot cancel
While scripts are running. It is not committed
yet. I will not reflect in record count
Processing small chunks of data. (ex. Archive data 1 day at a time) but looping multiple times causes the CPU to spike. See label #1 below.
90
Executing:
SELECT
(select count(*) from execution_archive)
+
(select count(*) from execution_report_1),
(select count(*) from execution_archive
where created <= current_date - 90)
BEFORE Archiving: Source + destiantion total and Data for archiving: [285604306, 45931504]
Executing:
insert into execution_report_1
select *
from execution_archive
where created <= current_date - 90;
delete from execution_archive
where created <= current_date - 90;
select count (*)
from execution_archive
where created <= current_date - 90;
Executing:
SELECT
(select count(*) from execution_archive)
+
(select count(*) from execution_report_1)
AFTER Archiving: Source + destination total: [285604306]
time_elapsed 348.073655128479
mikaelgulapa@Mikaels-MacBook-Pro trading-archive-execution-reports % python3 oneTimeArchiveExecutionReportTable.py
CREATE TABLE IF NOT EXISTS execution_archive AS TABLE execution_report WITH NO DATA;
Executing:
CREATE TABLE IF NOT EXISTS execution_archive AS TABLE execution_report WITH NO DATA;
90
Executing:
SELECT
(select count(*) from execution_report_1)
+
(select count(*) from execution_archive),
(select count(*) from execution_report_1
where created <= current_date - 90)
BEFORE Archiving: Source + destiantion total and Data for archiving: [285604306, 259003765]
time_elapsed 33.96823811531067
Executing:
insert into execution_archive
select *
from execution_report_1
where created <= current_date - 90;
delete from execution_report_1
where created <= current_date - 90;
select count (*)
from execution_report_1
where created <= current_date - 90;
Executing:
SELECT
(select count(*) from execution_report_1)
+
(select count(*) from execution_archive)
AFTER Archiving: Source + destination total: [285604306]
Success!
time_elapsed 2192.168580055237