Resources

Possible solutions

To get you started, here are some pointers toward a solution. We do, however, encourage you to come up with your own innovative approaches to the problem.

Manual annotations
- The BRAT annotation tool can be used to crowdsource the problem. Human annotators guess the masked entities and, optionally, impute values for them as well. A more feasible approach is to have annotators supply only the entity type of each masked entity (for example, a person, place, or credit card number) and then impute a value of that type from a dictionary.
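The dictionary-imputation step that follows manual type annotation can be sketched as below. This is a minimal illustration under several assumptions. Masks appear in the text as a hypothetical `[MASK]` token. Annotators return one entity type per mask, in left-to-right order. The small `DICTIONARIES` lists stand in for real name, place, and card-number dictionaries.

```python
import random

# Hypothetical dictionaries; in practice these would be loaded from files.
DICTIONARIES = {
    "PERSON": ["Asha Rao", "Rahul Mehta", "Priya Nair"],
    "CITY": ["Bengaluru", "Mumbai", "Ahmedabad"],
    "CARD_NUMBER": ["4111 1111 1111 1111"],
}

MASK = "[MASK]"  # assumed mask token; adjust to the actual input format


def impute(sentence: str, entity_types: list[str], seed: int = 0) -> str:
    """Replace each mask, left to right, with a random dictionary value
    of the entity type an annotator assigned to that mask."""
    rng = random.Random(seed)  # seeded so results are reproducible
    parts = sentence.split(MASK)
    if len(parts) - 1 != len(entity_types):
        raise ValueError("need exactly one entity type per mask")
    out = [parts[0]]
    for etype, rest in zip(entity_types, parts[1:]):
        out.append(rng.choice(DICTIONARIES[etype]))
        out.append(rest)
    return "".join(out)
```

For example, `impute("[MASK] paid with card [MASK].", ["PERSON", "CARD_NUMBER"])` fills the first mask with a name and the second with the card number. Picking values at random, rather than always the first entry, keeps the imputed data from being dominated by a single value.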
Rule based annotations
- A rule-based system that uses dictionaries (of names, places, credit card numbers, etc.) can find patterns in sentences and replace the masked portions. IBM's SystemT (or any other such tool, or even plain regular expressions) can be used to find these patterns. The following course provides an introduction to SystemT.
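A rule-based filler using only regular expressions can be sketched as follows. The sketch assumes a hypothetical `[MASK]` token, a few illustrative rules, and toy dictionaries. Each rule is a regex that must match the text immediately to the left of a mask, and it implies an entity type. The first matching rule wins. Masks that no rule matches are left untouched.

```python
import re
from typing import Optional

MASK = "[MASK]"  # assumed mask token

# Hypothetical single-entry dictionaries, for illustration only.
DICTIONARIES = {
    "PERSON": ["Priya Nair"],
    "CITY": ["Mumbai"],
    "CARD_NUMBER": ["4111 1111 1111 1111"],
}

# (regex on the left context, entity type), tried in order.
RULES = [
    (re.compile(r"\bcard(?: number)?\s*$", re.IGNORECASE), "CARD_NUMBER"),
    (re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\.\s*$"), "PERSON"),
    (re.compile(r"\b(?:in|at|from|to)\s*$", re.IGNORECASE), "CITY"),
]


def guess_type(left_context: str) -> Optional[str]:
    """Return the entity type of the first rule matching the left context."""
    for pattern, etype in RULES:
        if pattern.search(left_context):
            return etype
    return None


def fill_by_rules(sentence: str) -> str:
    """Replace each mask whose left context matches a rule; keep the rest."""
    out, pos = [], 0
    for m in re.finditer(re.escape(MASK), sentence):
        out.append(sentence[pos:m.start()])
        etype = guess_type(sentence[:m.start()])  # context from original text
        out.append(DICTIONARIES[etype][0] if etype else MASK)
        pos = m.end()
    out.append(sentence[pos:])
    return "".join(out)
```

Here, `fill_by_rules("Dr. [MASK] flew from [MASK].")` becomes `"Dr. Priya Nair flew from Mumbai."`. Rule order matters, so more specific rules (such as card numbers) should come before broad ones (such as prepositions).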
Data Programming
- Snorkel is a system for quickly generating large amounts of noisy training data from user-written labeling functions (heuristics that each label the data imperfectly). Once a gold set has been created manually, Snorkel can be used to annotate much more data.
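The data programming idea can be illustrated in plain Python, without using Snorkel's own API. In this sketch, a few hypothetical labeling functions each guess the entity type of a mask from the text to its left, or abstain. A majority vote combines their guesses, and the manually built gold set is used to check each function's accuracy. Snorkel itself replaces the majority vote with a learned label model that weights labeling functions by their estimated accuracies.

```python
from collections import Counter
from typing import Callable, Optional

ABSTAIN = None
LabelingFunction = Callable[[str], Optional[str]]


# Hypothetical labeling functions over the text to the left of a mask.
def lf_card(left: str) -> Optional[str]:
    return "CARD_NUMBER" if "card" in left.lower()[-20:] else ABSTAIN


def lf_honorific(left: str) -> Optional[str]:
    return "PERSON" if left.rstrip().endswith(("Mr.", "Ms.", "Dr.")) else ABSTAIN


def lf_location_prep(left: str) -> Optional[str]:
    words = left.lower().split()
    return "CITY" if words and words[-1] in {"in", "at", "from", "to"} else ABSTAIN


LFS: list[LabelingFunction] = [lf_card, lf_honorific, lf_location_prep]


def majority_label(left: str) -> Optional[str]:
    """Combine non-abstaining votes; ties are treated as abstentions."""
    votes = [v for v in (lf(left) for lf in LFS) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (label, count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == count:
        return ABSTAIN
    return label


def lf_accuracy(lf: LabelingFunction, gold: list[tuple[str, str]]) -> float:
    """Accuracy of one labeling function on the gold examples where it votes."""
    voted = [(lf(left), label) for left, label in gold if lf(left) is not ABSTAIN]
    return sum(p == y for p, y in voted) / len(voted) if voted else 0.0
```

Checking each function against the gold set shows which heuristics are too noisy to keep before they are used to label the rest of the data.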
Model based annotations
- A machine learning model can also generate the words or numbers that replace the redacted portions of a sentence. Natural Language Generation (NLG) models, which are typically sequence-to-sequence models, may be a good fit for this problem.
- Natural Language Generation
- Sequence-to-sequence models
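A sequence-to-sequence model is too large for a short example. As a much simpler stand-in that still shows the model-based idea (learn from unmasked text, then predict the masked span), the sketch below uses a count-based bigram language model, not an NLG or seq2seq model. It assumes each mask is a single whitespace-separated `[MASK]` token. For each mask, it picks the vocabulary word w that maximises the add-one-smoothed score P(w | previous word) * P(next word | w).

```python
from collections import Counter

MASK = "[MASK]"  # assumed single-token mask


def train_bigrams(corpus: list[str]) -> tuple[Counter, Counter]:
    """Count unigrams and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def fill_mask(sentence: str, unigrams: Counter, bigrams: Counter) -> str:
    """Fill each mask with the word that best fits between its neighbours."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    vocab = [w for w in unigrams if w not in ("<s>", "</s>")]
    v = len(unigrams)  # vocabulary size used for add-one smoothing
    for i, tok in enumerate(tokens):
        if tok != MASK:
            continue
        prev, nxt = tokens[i - 1], tokens[i + 1]

        def score(w: str) -> float:
            p_w_given_prev = (bigrams[(prev, w)] + 1) / (unigrams[prev] + v)
            p_next_given_w = (bigrams[(w, nxt)] + 1) / (unigrams[w] + v)
            return p_w_given_prev * p_next_given_w

        tokens[i] = max(vocab, key=score)
    return " ".join(tokens[1:-1])
```

A seq2seq model would replace these counts with a learned encoder and decoder that condition on the whole sentence, but the input/output contract is the same: a masked sentence in, a filled sentence out.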

GitHub Repo

https://github.com/hackathoniima/ICADABAI2019
- Our GitHub repository contains sample code that reads the input file and writes to an output file. It will help you jump-start your solution.
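The repository's sample code defines the actual input and output formats. Purely as an illustration of the read-fill-write loop, the sketch below assumes a hypothetical format with one masked sentence per line in a UTF-8 text file, and the same layout for the output. The filling step is passed in as a function so that any of the approaches described on this page can be plugged in.

```python
from pathlib import Path
from typing import Callable


def process_file(in_path: str, out_path: str, fill: Callable[[str], str]) -> int:
    """Read one masked sentence per line, fill each one, and write the
    results one per line. Returns the number of lines written."""
    lines = Path(in_path).read_text(encoding="utf-8").splitlines()
    filled = [fill(line) for line in lines]
    text = "\n".join(filled) + ("\n" if filled else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(filled)
```

For example, `process_file("input.txt", "output.txt", lambda s: s.replace("[MASK]", "UNKNOWN"))` writes a placeholder-filled copy of a hypothetical `input.txt`. Keeping file handling separate from the filling function makes it easy to compare different approaches on the same data.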
