One of the benefits of applied machine learning is the enhanced automation of tedious tasks currently performed by middle and back office operations teams, freeing them up for more creative and productive work.
One such ubiquitous and time-consuming back office task in the financial services industry is the matching of financial transactions. This function is performed in all organizations and can include matching and reconciliation of bank statements, trades, financial positions and other financial transactions.
Currently, because of data quality limitations, not all financial transactions match automatically, leaving the remainder to be matched manually. In the payments field alone, over 17 billion transactions are settled by the Federal Reserve each year. Typical automated match rates for these types of transactions are often less than 90%, leaving tens of millions of transactions to be manually processed. Reconciliation teams typically spend 30% of their day on this manual matching, leading to delays in the completion of reconciliations.
The current standard industry practice for matching transactions is to use an ordered set of pre-designed match rules. A common illustrative example is the matching of general ledger (GL) entries to bank transactions, which is required when performing account reconciliations. In this process, each bank transaction (or group of bank transactions) must be matched to a unique GL entry (or group of GL entries), or alternatively the bank transaction can be classified as not having any matching GL entry.
For the GL-to-bank reconciliation example, the GL and bank transactions would typically have fields such as amount, value date, reference and description.
Example ordered match rules might be:
1. Bank amount = GL amount and Bank value date = GL value date and Bank reference = GL reference
2. Bank amount = GL amount and Bank value date = GL value date and Bank description = GL description
3. Bank amount = GL amount and Bank value date = GL value date +/- 5 days and Bank reference = GL reference
4. Bank amount = GL amount and Bank value date = GL value date +/- 5 days and Bank description = GL description
5. Bank amount = GL amount and Bank value date = GL value date +/- 5 days and first 5 characters of Bank reference = first 5 characters of GL reference
These rules are applied in order of priority to create pairs of GL and bank transactions. Such rules are usually handcrafted, and reconciliations with more fields and tolerances may require 50 or more rules to achieve acceptable match rates.
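As an illustration only, the five example rules could be applied in priority order roughly as follows. This is a minimal sketch, not the method used by any particular reconciliation system: the record layout (dicts with `id`, `amount`, `value_date`, `reference`, `description`), the helper names and the sample data are all assumptions made for the example, and it only handles one-to-one matches, not groups of transactions.

```python
from datetime import date

# Hypothetical condition builders; each returns a test on a (bank, gl) pair.
def same(field):
    return lambda b, g: b[field] == g[field]

def date_within(days):
    return lambda b, g: abs((b["value_date"] - g["value_date"]).days) <= days

def prefix_equal(field, n):
    return lambda b, g: b[field][:n] == g[field][:n]

# The five example rules in priority order; all conditions in a rule must hold.
RULES = [
    [same("amount"), same("value_date"), same("reference")],
    [same("amount"), same("value_date"), same("description")],
    [same("amount"), date_within(5), same("reference")],
    [same("amount"), date_within(5), same("description")],
    [same("amount"), date_within(5), prefix_equal("reference", 5)],
]

def match_by_rules(bank, gl, rules=RULES):
    """Return {bank_id: (gl_id, rule_number)}; each GL entry is used at most once."""
    matches = {}
    used_gl = set()
    for rule_no, conditions in enumerate(rules, start=1):
        for b in bank:
            if b["id"] in matches:
                continue  # already matched by a higher-priority rule
            for g in gl:
                if g["id"] in used_gl:
                    continue
                if all(cond(b, g) for cond in conditions):
                    matches[b["id"]] = (g["id"], rule_no)
                    used_gl.add(g["id"])
                    break  # greedy: take the first GL entry that satisfies the rule
    return matches

# Made-up sample data; amounts are in minor units (cents) to avoid float comparison.
bank = [
    {"id": "B1", "amount": 10000, "value_date": date(2023, 10, 2),
     "reference": "INV-1001", "description": "ACME LTD"},
    {"id": "B2", "amount": 25000, "value_date": date(2023, 10, 5),
     "reference": "INV-20", "description": "PAYROLL"},
    {"id": "B3", "amount": 7500, "value_date": date(2023, 10, 9),
     "reference": "", "description": "FEE"},
]
gl = [
    {"id": "G1", "amount": 10000, "value_date": date(2023, 10, 2),
     "reference": "INV-1001", "description": "Acme invoice"},
    {"id": "G2", "amount": 25000, "value_date": date(2023, 10, 3),
     "reference": "INV-2099", "description": "PAYROLL"},
]

result = match_by_rules(bank, gl)
print(result)  # B1 pairs with G1 under rule 1, B2 with G2 under rule 4, B3 stays unmatched
```

Because each match records the rule number that produced it, the result stays explainable in terms of the rule that created the match. B3 is left for manual processing.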
The advantages of matching with rules include:
● They are easily understood and matches are explainable directly in terms of the match rule which created the match.
● They are easy to code.
● They are fast to apply.
The disadvantages of match rules are:
● The rules must be created, tested, deployed and periodically updated as data characteristics change.
● If the transaction data suffers from any data quality issues (for example missing or inconsistent references, descriptions, date or amount differences) then it is difficult to achieve high match rates whilst maintaining high accuracy.
For reconciliations with data quality limitations, it is common for automated match rates to be less than 90%, which for the high-volume reconciliations in the financial services industry leaves millions of transactions to be manually matched by operations teams each year. This competition explores machine learning models for improving matching beyond what is achievable with rule-based systems.
The competition consists of creating a machine learning model which is able to perform matching between two sets of financial transactions with both a high automated match rate and high accuracy (the two key operational metrics used in reconciliation departments).
Competition dates
Obfuscated real-world training and evaluation datasets and a sample submission file have been created in a Kaggle-compatible format and will be made available at 9am EST on October 30th, 2023.
Submissions may be made until 5pm EST on November 23rd.
Submission format:
Submissions will require a submission file in the specified format along with a high-level description of the algorithm used (you will be provided with a Kaggle link when you request to join the competition).
Evaluation Criteria:
The following (custom) operational metrics will be evaluated (using a Kaggle notebook script, since these metrics are not available as standard metrics on the Kaggle platform):
Match rate: The fraction of matched transactions for which the model was able to predict a match which was consistent with the actual match.
Match accuracy: The fraction of matches predicted by the model which are consistent with the actual match.
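One possible reading of these two definitions is sketched below. This is not the official evaluation script: the function name `match_metrics`, the dict-based input format (transaction id mapped to the id it is matched to, or `None` for no match) and the sample data are assumptions made for illustration. Under this reading, match rate behaves like recall over the truly matched transactions and match accuracy like precision over the predicted matches.

```python
def match_metrics(actual, predicted):
    """Compute (match_rate, match_accuracy) under one reading of the definitions.

    actual / predicted: dict mapping a transaction id to the id it is matched
    to, or None when there is no match (hypothetical format for illustration).
    """
    truly_matched = [t for t, m in actual.items() if m is not None]
    predicted_matches = {t: m for t, m in predicted.items() if m is not None}

    # A prediction is consistent when it names the same counterpart as the actual match.
    correct = sum(1 for t in truly_matched if predicted_matches.get(t) == actual[t])

    match_rate = correct / len(truly_matched) if truly_matched else 0.0
    match_accuracy = correct / len(predicted_matches) if predicted_matches else 0.0
    return match_rate, match_accuracy

# Made-up example: three true matches, two predicted matches, one of them correct.
actual = {"B1": "G1", "B2": "G2", "B3": "G3", "B4": None}
predicted = {"B1": "G1", "B2": "G9", "B3": None, "B4": None}

rate, accuracy = match_metrics(actual, predicted)
print(rate, accuracy)  # one of three true matches recovered; one of two predictions correct
```

A model that declines to predict when unsure, as for B3 here, lowers its match rate but does not hurt its match accuracy.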
If the model includes a confidence prediction, a model calibration curve will also be calculated (although this will not be evaluated as part of the competition).
The competition winner will be the submission which achieves the highest value of the weighted combination of match rate and match accuracy:
match rate - k * (1 - match accuracy)
where k will be an industry cost/benefit parameter representing the cost saving associated with an automated versus manual match minus the cost of the operational consequences of a mismatch.
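To see how k trades match rate against accuracy, the score can be computed for two made-up models. The values of k and the model figures below are purely hypothetical, since the actual k is not stated here; `competition_score` is an illustrative name, not part of any provided script.

```python
def competition_score(match_rate, match_accuracy, k):
    """Weighted combination from the text: match rate - k * (1 - match accuracy)."""
    return match_rate - k * (1.0 - match_accuracy)

# Hypothetical models as (match rate, match accuracy).
model_a = (0.95, 0.97)   # matches more, but with more mismatches
model_b = (0.92, 0.995)  # matches less, but more accurately

for k in (0.5, 2.0):     # hypothetical values of k
    score_a = competition_score(*model_a, k)
    score_b = competition_score(*model_b, k)
    best = "A" if score_a > score_b else "B"
    print(f"k={k}: A={score_a:.4f}, B={score_b:.4f}, higher score: {best}")
```

With the smaller k the higher match rate of model A wins; with the larger k the mismatch penalty dominates and model B scores higher.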
ICAIF presentation
The winner will be announced at ICAIF 2023 and will be given the opportunity to present on their solution at the ICAIF conference.
Non-winning models with a high match rate or match accuracy will also receive an honourable mention at the conference.
How to enter:
To enter the competition, please email the following details to fintran.icaif2023.competitions@gmail.com:
Name
Organization
Registration # for ICAIF 2023.
You will be provided with a link to the Kaggle training, evaluation and sample submission datasets.
You may also contact us at this email address with any questions about the dataset or the competition.