Abstract: Worked on Developing Private Embeddings
Abstract: Almost all recent state-of-the-art (SOTA) techniques for Large Language Models (LLMs) use transformer-based architectures, which rely entirely on the now ubiquitous attention mechanism. However, the computational cost of full attention grows quadratically with input length, limiting Transformers to relatively short passages. Many recent efforts aim to overcome this limitation through selective attention mechanisms, with common approaches splitting attention into two types: local and global. Although theoretical guarantees for sparse attention have been proven, such as Turing completeness on par with full attention, its impact on pre-training has yet to be studied. In this work, we empirically study the impact of global attention on LLM pre-training. First, we created a large structure-aware text corpus from arXiv data for pre-training, along with its text-only counterpart. We then pre-trained on the two datasets, studied the changes in attention patterns, and examined their impact on downstream tasks. Through our analysis, we demonstrate that encoding document structure into LLMs is crucial for enabling more abstract tasks, such as document understanding.
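To make the local/global split concrete, the sketch below builds a sparse attention mask in which each token attends to a fixed window of neighbours while a few designated tokens attend (and are attended to) globally. This is a minimal illustrative example under assumed window sizes and global positions, not the configuration used in the work described above.

```python
# Illustrative sketch of local + global sparse attention (not this work's
# implementation). Window size and global positions are arbitrary assumptions.
import numpy as np

def local_global_mask(seq_len, window, global_positions):
    """Boolean mask where mask[i, j] == True means query i may attend to key j."""
    idx = np.arange(seq_len)
    # Local (sliding-window) attention: each token sees neighbours within `window`.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global attention: designated tokens attend to, and are attended by, all tokens.
    mask[global_positions, :] = True
    mask[:, global_positions] = True
    return mask

def sparse_attention(Q, K, V, mask):
    """Scaled dot-product attention with disallowed query-key pairs masked out."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 16, 8
    Q, K, V = (rng.standard_normal((L, d)) for _ in range(3))
    # Assume the first token (e.g. a document/section marker) gets global attention.
    mask = local_global_mask(L, window=2, global_positions=[0])
    out = sparse_attention(Q, K, V, mask)
    print(out.shape)  # (16, 8)
```

Full attention would require all L x L score entries; the mask restricts computation to the banded local pattern plus the rows and columns of the global tokens, which is what makes longer inputs tractable.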
Abstract: The uncertain yield of crops is one of the major problems facing the agricultural sector today, especially in India. The objective of this paper is to provide accurate and reliable predictions of crop yield, helping farmers make decisions that render their farming more efficient and profitable. We propose a novel deep learning model: an ensemble neural network combining Long Short-Term Memory (LSTM) networks and one-dimensional Convolutional Neural Networks (CNNs). We used data for over 30 crops from 1997 to 2015 across all Indian districts. Our model substantially outperforms all other models tested (Linear Regression, Random Forest, extreme Gradient Boosting (XGB) Regressor, and a Feed-forward Neural Network (FFNN)) in crop yield prediction accuracy, achieving correlation coefficients of over 0.90 and 0.92 on the training and test datasets, respectively. Our model has several advantages over the alternatives. First, it captures the temporal dependence of yield on temperature and rainfall. Second, it works on a large and diverse dataset, unlike most models, which only perform well in small regions. Lastly, it can draw on diverse geographical, social, and economic features to make predictions.
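For illustration only, the following PyTorch sketch shows one way an LSTM branch and a 1-D CNN branch over monthly weather sequences can be averaged into an ensemble regressor alongside static district-level features. The feature names, dimensions, and averaging scheme are assumptions, not the authors' exact architecture.

```python
# Illustrative LSTM + 1-D CNN ensemble for yield regression (a sketch, not the
# paper's code). Sequence length, feature counts and averaging are assumptions.
import torch
import torch.nn as nn

class LSTMBranch(nn.Module):
    def __init__(self, n_weather=2, n_static=10, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_weather, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_static, 1)

    def forward(self, weather, static):
        _, (h, _) = self.lstm(weather)                 # h: (1, batch, hidden)
        return self.head(torch.cat([h[-1], static], dim=1))

class CNNBranch(nn.Module):
    def __init__(self, n_weather=2, n_static=10, channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_weather, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(channels + n_static, 1)

    def forward(self, weather, static):
        z = self.conv(weather.transpose(1, 2)).squeeze(-1)  # (batch, channels)
        return self.head(torch.cat([z, static], dim=1))

class YieldEnsemble(nn.Module):
    """Averages the predictions of the LSTM and CNN branches."""
    def __init__(self, n_weather=2, n_static=10):
        super().__init__()
        self.lstm_branch = LSTMBranch(n_weather, n_static)
        self.cnn_branch = CNNBranch(n_weather, n_static)

    def forward(self, weather, static):
        return 0.5 * (self.lstm_branch(weather, static) +
                      self.cnn_branch(weather, static))

if __name__ == "__main__":
    batch, months = 8, 12
    weather = torch.randn(batch, months, 2)  # e.g. monthly temperature, rainfall
    static = torch.randn(batch, 10)          # e.g. geographic/social/economic features
    model = YieldEnsemble()
    print(model(weather, static).shape)      # torch.Size([8, 1])
```

The LSTM branch models the temporal dependence on weather, the CNN branch picks up local seasonal patterns, and the static features carry the geographical, social, and economic context; averaging the two heads is one simple way to form the ensemble.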