that is constructed according to the same guidelines as the Japanese recipe named entity corpus [1], using the IOB2 chunking format. It consists of 300 recipes: 100 recipes were sampled from each category of 'dish type' and another 200 recipes were randomly sampled from the Allrecipes UK/Ireland web site (http://allrecipes.co.uk/) as of December 2016. Please refer to the paper [2] for details.
Ten types of tags are defined; for example, 'F' stands for food and 'T' for tool.
Mash/B-Ac the/O eggs/B-F with/O a/O fork/B-T in/O a/O mixing/B-T bowl/I-T ./O
Add/B-Ac the/O avocado/B-F ,/O onion/B-F ,/O pickle/B-F ,/O mustard/B-F and/O mayonnaise/B-F ./O
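As a minimal sketch of how to consume this format (the helper functions below are ours, not part of the corpus distribution), the following Python splits such whitespace-separated 'token/tag' lines and recovers entity spans from the IOB2 tags:

def parse_line(line):
    # Split a 'token/tag token/tag ...' line into (token, tag) pairs.
    return [pair.rsplit("/", 1) for pair in line.split()]

def iob2_spans(pairs):
    # Collect (entity_type, tokens) spans: B-X opens a span, I-X extends it.
    spans, current = [], None
    for token, tag in pairs:
        if tag.startswith("B-"):
            current = (tag[2:], [token])
            spans.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:
            current = None
    return spans

line = "Mash/B-Ac the/O eggs/B-F with/O a/O fork/B-T in/O a/O mixing/B-T bowl/I-T ./O"
print(iob2_spans(parse_line(line)))
# [('Ac', ['Mash']), ('F', ['eggs']), ('T', ['fork']), ('T', ['mixing', 'bowl'])]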
The result of NER using BERT-NER [3], an NER architecture based on BERT [4], is shown below (in each per-type line, the trailing number is the number of predicted phrases of that type):
processed 3386 tokens with 1399 phrases; found: 1419 phrases; correct: 1270.
accuracy: 94.21%; precision: 89.50%; recall: 90.78%; FB1: 90.13
Ac: precision: 94.74%; recall: 94.74%; FB1: 94.74 456
Ac2: precision: 45.45%; recall: 33.33%; FB1: 38.46 11
Af: precision: 57.89%; recall: 50.00%; FB1: 53.66 19
At: precision: 100.00%; recall: 100.00%; FB1: 100.00 2
D: precision: 85.71%; recall: 89.36%; FB1: 87.50 49
F: precision: 93.66%; recall: 96.30%; FB1: 94.96 473
Q: precision: 86.67%; recall: 83.87%; FB1: 85.25 60
Sf: precision: 60.87%; recall: 69.14%; FB1: 64.74 92
St: precision: 82.95%; recall: 87.95%; FB1: 85.38 88
T: precision: 91.12%; recall: 90.06%; FB1: 90.59 169
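The overall numbers can be sanity-checked from the reported counts (1399 gold phrases, 1419 predicted, 1270 correct), using the usual conlleval-style phrase-level definitions:

gold, found, correct = 1399, 1419, 1270
precision = correct / found                          # 0.8950 -> 89.50%
recall = correct / gold                              # 0.9078 -> 90.78%
f1 = 2 * precision * recall / (precision + recall)   # 0.9013 -> FB1 90.13
print(f"precision: {precision:.2%}; recall: {recall:.2%}; FB1: {f1:.2%}")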
The pre-trained model 'cased_L-12_H-768_A-12' was downloaded from here.
Our trained model can be downloaded from here. In this experiment, the 300 recipes were split into [train, dev, test] = [240, 30, 30].
The training data includes 2,250 sentences and 31,295 tokens. The maximum sequence length was 128, the training batch size was 32, the learning rate was 2e-5, and the number of epochs was 4.
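For reference, here is a minimal fine-tuning sketch with these hyperparameters. It uses the Hugging Face Transformers token-classification API rather than the original BERT-NER [3] code, the label set is derived from the ten tag types above, and the one-sentence toy dataset merely stands in for the real 240/30/30 recipe split:

from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer)

# The ten NE types in IOB2 yield 21 labels (B-/I- per type, plus O).
TYPES = ["Ac", "Ac2", "Af", "At", "D", "F", "Q", "Sf", "St", "T"]
LABELS = ["O"] + [f"{p}-{t}" for t in TYPES for p in ("B", "I")]
label2id = {l: i for i, l in enumerate(LABELS)}

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(LABELS))

def encode(words, tags):
    # Tokenize a pre-split sentence; special tokens and padding get -100,
    # and every word piece inherits the IOB2 tag of its source word.
    enc = tokenizer(words, is_split_into_words=True, truncation=True,
                    max_length=128, padding="max_length")  # max seq length 128
    enc["labels"] = [-100 if i is None else label2id[tags[i]]
                     for i in enc.word_ids()]
    return enc

# Toy stand-in for the 240-recipe training set.
train_data = [encode(["Mash", "the", "eggs"], ["B-Ac", "O", "B-F"])]

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=32,  # training batch size 32
    learning_rate=2e-5,              # learning rate 2e-5
    num_train_epochs=4,              # 4 epochs
)
Trainer(model=model, args=args, train_dataset=train_data).train()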