This is one of the fun projects I did during my COMS 561 database coursework. Starting from a university database (ER diagram), the project implements a graph model in Neo4j. These are the steps I took to write the Cypher scripts that create the nodes and edges:
Set-up Neo4j Server
Import CSV Files
Script Queries
Script Modifies
Drop All Nodes and Edges
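The import and cleanup steps above can be sketched as a small Python helper that assembles the corresponding Cypher statements. This is only an illustration: the `Student` label, the `students.csv` file name, and the property names are hypothetical placeholders, not the schema used in the actual project, and the helper names `load_csv_nodes` and `drop_all` are made up for this sketch.

```python
def load_csv_nodes(file_name, label, properties):
    """Build a LOAD CSV statement that creates one node per CSV row."""
    # Map each node property to the CSV column with the same name.
    props = ", ".join(f"{p}: row.{p}" for p in properties)
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row "
        f"CREATE (:{label} {{{props}}})"
    )


def drop_all():
    """Statement that deletes every node together with its relationships."""
    return "MATCH (n) DETACH DELETE n"


# Hypothetical file and schema, for illustration only.
statement = load_csv_nodes("students.csv", "Student", ["id", "name"])
print(statement)
print(drop_all())
```

The generated strings could then be pasted into the Neo4j Browser or sent through a driver session; `DETACH DELETE` is used for the cleanup step because a plain `DELETE` fails on nodes that still have relationships.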
Part-2
Continuing with Neo4j, the second part of the project used a Twitter database, for which I wrote nine queries based on the project requirements.
This project was based on a COVID-19 corpus stored in MongoDB and accessed through the PyMongo connector. The project was split into two parts.
Part-1
Libraries and tools used: PyMongo, JSON, BSON, MapReduce, etc.
Set-up MongoDB server
Importing JSON datasets into MongoDB through PyMongo
Script Queries
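The import step above might look roughly like the following sketch. It assumes the dataset is stored as JSON Lines (one document per line), which the original project may or may not have used, and the helper names `read_json_lines` and `import_documents` are hypothetical. The only PyMongo call relied on is `Collection.insert_many`, so any object with that method works.

```python
import json


def read_json_lines(path):
    """Yield one document per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def import_documents(collection, documents, batch_size=1000):
    """Insert documents in batches; return the number of documents sent."""
    batch, total = [], 0
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        total += len(batch)
    return total
```

With a running server, `collection` would be a PyMongo collection, for example `pymongo.MongoClient("mongodb://localhost:27017")["covid"]["tweets"]`, where the database and collection names are placeholders. Batching keeps memory use bounded when the corpus is large.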
Part-2: Pattern Mining
Libraries used: NumPy, Pandas, JSON, NLTK, etc.
Mining frequent itemsets of the data
Preprocessing involves:
Tokenization (sentences and words); removal of stopwords, punctuation, and duplicate words
Apriori and FP-Growth algorithms were used to mine the frequent itemsets.
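As a rough illustration of the pipeline above, here is a minimal pure-Python sketch of the preprocessing step followed by Apriori (FP-Growth is not shown). The tiny stopword set and the sample sentences are invented for the example; the project itself used NLTK for tokenization and stopwords, and its actual implementation may differ.

```python
import re
from itertools import combinations

# Tiny illustrative stopword list; NLTK's English list would be used in practice.
STOPWORDS = {"the", "a", "of", "in", "and", "is", "to"}


def preprocess(text):
    """Lowercase, tokenize into words, drop stopwords and duplicates."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return frozenset(w for w in words if w not in STOPWORDS)


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions."""
    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    result, k = dict(frequent), 2
    while frequent:
        prev = set(frequent)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Made-up sample documents, for illustration only.
docs = [
    "Covid vaccine trial results",
    "Vaccine trial in the city",
    "Covid cases and vaccine news",
]
transactions = [preprocess(d) for d in docs]
frequent = apriori(transactions, min_support=2)
```

Representing each document as a `frozenset` of tokens removes duplicate words for free and lets the support check use the subset operator `<=`.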
Build a binary fortune cookie classifier
Implement the ID3 decision tree learning algorithm
Implement the information gain heuristic for selecting the next feature
Implement the decision tree pruning algorithm (via validation data)
Compute the accuracy of decision tree and pruned decision tree on validation and testing examples
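The information gain heuristic and the ID3 recursion listed above can be sketched as follows. This is a minimal version under simplifying assumptions: features are categorical, each example is a dict, class labels are not tuples, and pruning is left out. The toy weather-style data is invented for the example and is not the assignment's dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Build a tree: a leaf is a label, an inner node is (feature, {value: subtree})."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]  # pure node or majority vote
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    rest = [f for f in features if f != best]
    children = {}
    for value in {row[best] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        children[value] = id3(list(sub_rows), list(sub_labels), rest)
    return (best, children)


def predict(tree, row, default=None):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        feature, children = tree
        if row[feature] not in children:
            return default  # feature value never seen during training
        tree = children[row[feature]]
    return tree


# Invented toy data: the label depends only on "outlook".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]
tree = id3(rows, labels, ["outlook", "windy"])
```

Accuracy on validation or test examples is then just the fraction of rows for which `predict` returns the true label; a pruning pass would compare that figure before and after replacing a subtree with its majority label.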
Implement LeNet (LeCun Network) using PyTorch and apply it to the image recognition task on CIFAR-10 (10-class classification). The CIFAR-10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. APIs used:
- nn.Conv2d
- nn.MaxPool2d
- nn.BatchNorm2d
- nn.Linear
- nn.ReLU()
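A LeNet-style network built from the layers listed above could look like the sketch below. The layer sizes follow the classic LeNet-5 layout adapted to 3-channel 32x32 inputs, with batch normalization after each convolution; the exact architecture and hyperparameters used in the project are not stated, so treat these as assumptions.

```python
import torch
from torch import nn


class LeNet(nn.Module):
    """LeNet-style CNN for 3x32x32 inputs, with batch norm after each conv."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),   # 3x32x32 -> 6x28x28
            nn.BatchNorm2d(6),
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16x10x10
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten everything except the batch dimension
        return self.classifier(x)


model = LeNet()
batch = torch.randn(4, 3, 32, 32)  # random stand-in for a CIFAR-10 mini-batch
logits = model(batch)              # one row of 10 class scores per image
```

The final layer returns raw logits rather than probabilities, which is what `nn.CrossEntropyLoss` expects during training.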