Patentability
CS634 - Tedi Pano - Milestone 3/4
URL: https://huggingface.co/spaces/panotedi/milestone3
Description
This code is a Streamlit web application that uses a pre-trained DistilBERT model for sequence classification to predict the patentability of a given patent application.
The application first loads a dataset of patent applications using Hugging Face's load_dataset function. It then keeps only the entries marked ACCEPTED or REJECTED and adds a patentability score column that encodes these values as one or zero. Finally, it splits the data into training and validation sets.
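The filtering, labeling, and splitting step described above can be sketched in plain Python. This is a minimal illustration, not the application's actual code: the field names `decision` and `patentability_score`, the helper `prepare_records`, the mapping ACCEPTED to 1 and REJECTED to 0, the validation fraction, and the sample records are all assumptions made for the example.

```python
import random

def prepare_records(records, val_fraction=0.2, seed=0):
    """Keep only decided applications, add a 0/1 label, and split the result.

    Assumes each record is a dict with a hypothetical "decision" field.
    """
    # Assumed mapping: ACCEPTED -> 1, REJECTED -> 0.
    label_for = {"ACCEPTED": 1, "REJECTED": 0}

    # Drop every record whose decision is neither ACCEPTED nor REJECTED,
    # and attach the numeric label to the ones that remain.
    kept = [
        {**r, "patentability_score": label_for[r["decision"]]}
        for r in records
        if r.get("decision") in label_for
    ]

    # Shuffle deterministically, then carve off the validation slice.
    rng = random.Random(seed)
    rng.shuffle(kept)
    n_val = int(len(kept) * val_fraction)
    return kept[n_val:], kept[:n_val]

# Hypothetical sample data, for illustration only.
records = [
    {"patent_number": "A1", "abstract": "A widget.", "decision": "ACCEPTED"},
    {"patent_number": "A2", "abstract": "A gadget.", "decision": "REJECTED"},
    {"patent_number": "A3", "abstract": "A gizmo.", "decision": "PENDING"},
    {"patent_number": "A4", "abstract": "A device.", "decision": "ACCEPTED"},
    {"patent_number": "A5", "abstract": "A method.", "decision": "REJECTED"},
]

train, val = prepare_records(records, val_fraction=0.25)
print(len(train), len(val))
```

With a Hugging Face dataset object, the same idea would normally be expressed through the dataset's own filtering and mapping utilities; the list-of-dicts version here only shows the logic.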
Next, it extracts the abstract column and the patentability score column, tokenizes the abstracts with the DistilBERT tokenizer, builds TensorFlow Datasets from the result, and fine-tunes the DistilBERT sequence classification model on the training set. Once training completes, the fine-tuned model is used to predict the patentability score of a user-selected patent application.
The user interface allows the user to select a patent application by its patent number and submit it for prediction. The application then displays the predicted patentability score (either ACCEPTED or REJECTED) with a certainty score based on the model's prediction.
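One common way to turn a two-class model output into a label plus a certainty score is to apply a softmax to the logits and report the larger probability. The description above does not say how the application computes its certainty score, so the sketch below is an assumption: the helper `predict_from_logits`, the label order in `LABELS`, and the example logits are all hypothetical.

```python
import math

# Assumed label order: index 0 -> REJECTED, index 1 -> ACCEPTED.
LABELS = ("REJECTED", "ACCEPTED")

def predict_from_logits(logits):
    """Return (label, certainty) for a sequence of two raw model logits."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # The predicted class is the one with the highest probability,
    # and that probability doubles as the certainty score.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits, for illustration only.
label, certainty = predict_from_logits([-1.2, 2.3])
print(f"{label} ({certainty:.1%})")
```

In a Streamlit app, the resulting label and certainty could then be shown with an ordinary text output call after the user submits the selected patent number.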