Arpan Chatterjee

Wikipedia Question Answering Model

At the core of this project is a question-answering system that uses Microsoft's MPNet model as the retriever over a dataset of 6 million Wikipedia articles. For efficiency, I integrated a Pinecone index, which stores the vector embeddings and retrieves the closest matches to a question using cosine similarity queries. To generate the answer, I use the ELI5 BART model, a sequence-to-sequence model that composes a response from the most relevant retrieved results.
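As a rough illustration of the hand-off between retrieval and generation, the sketch below ranks scored passages and packs the best ones into a single input string for a sequence-to-sequence model. It is a minimal sketch under assumptions, not the project's code: the function name `build_generator_input`, the `question: ... context: <P> ...` layout, and the toy passages and scores are all hypothetical stand-ins for what a vector index would return.

```python
from typing import List, Tuple


def build_generator_input(question: str,
                          matches: List[Tuple[float, str]],
                          top_k: int = 3) -> str:
    """Assemble one seq2seq input string from retrieved passages.

    `matches` holds (similarity score, passage text) pairs, such as a
    vector index might return. The prompt layout is an assumed convention.
    """
    # Keep only the top_k passages, highest similarity first.
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)[:top_k]
    # Mark the start of each passage with a separator token (assumed: "<P>").
    context = " ".join(f"<P> {text}" for _, text in ranked)
    return f"question: {question} context: {context}"


# Toy scores and passages, for illustration only.
matches = [
    (0.41, "Paris is known for the Louvre."),
    (0.87, "The Eiffel Tower was completed in 1889."),
    (0.65, "Gustave Eiffel's company built the tower."),
]
prompt = build_generator_input("When was the Eiffel Tower built?", matches, top_k=2)
print(prompt)
```

Limiting the context to the top few passages keeps the generator's input short and focused on the material most similar to the question.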

GitHub

Patentability Score

In this project, I carried out exploratory data analysis on the Harvard USPTO patent dataset, which covers over 4,500,000 patents, to understand the structure of patent data. I then fine-tuned Google's BERT transformer on this dataset to predict a patentability score from a patent's abstract. This work can be used to check the viability of a patent.
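One plausible way to turn a classifier's output into a single score is to apply a softmax to its logits and read off the probability of the "accepted" class. The sketch below assumes a two-class head with the label order [rejected, accepted]; that ordering, the function names, and the example logits are hypothetical and are not taken from the project.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def patentability_score(logits: Sequence[float], accepted_index: int = 1) -> float:
    """Probability assigned to the 'accepted' class (assumed to be index 1)."""
    return softmax(logits)[accepted_index]


# Hypothetical logits for one patent abstract: [rejected, accepted].
score = patentability_score([-0.4, 1.2])
print(f"{score:.3f}")
```

A score near 1 would indicate that the model considers the abstract similar to accepted patents, and a score near 0 the opposite.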

GitHub

Autoregressive Text Generation

I built a text generation system using over 70,000 e-books from Project Gutenberg, about 100GB of data in total. The generator is a 3-gram model backed by a frequency dictionary, both constructed with PySpark. I designed a MongoDB database to store the e-book metadata and developed a Streamlit user interface in which users select authors and their works and then generate paragraphs from them.
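The idea behind a 3-gram model with a frequency dictionary can be sketched in a few lines: count which word follows each pair of words, then generate text by repeatedly sampling the next word in proportion to those counts. The sketch below uses plain Python on a toy sentence rather than PySpark on the full corpus, and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_trigram_counts(tokens):
    """Map each pair of consecutive words to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def generate(counts, seed, length, rng):
    """Extend a two-word seed, sampling each next word in proportion to its count."""
    out = list(seed)
    for _ in range(length):
        followers = counts.get((out[-2], out[-1]))
        if not followers:
            break  # unseen context: stop rather than guess
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(out)


# Toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat sat on the rug".split()
counts = build_trigram_counts(corpus)
text = generate(counts, ("the", "cat"), 5, random.Random(0))
print(text)
```

Because each new word depends only on the two words before it, generation is autoregressive: every sampled word becomes part of the context for the next step.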

GitHub

Improved Node Localization In Wireless Sensor Networks

This project is a sensor node localization technique built on neural networks. The model was trained across a range of scenarios that vary the numbers and positions of anchor and sensor nodes, so the technique is precise and also adapts to a range of real-world situations. It achieves an accuracy of 99.142%, and the work has been published.

Read the paper

Most Similar Faces

Ever wanted to find your doppelganger? Imagine feeding an image into a system and getting back a selection of images that closely resemble your input. This task is called similarity search.

Our approach uses the ResNet-50 model, pre-trained on ImageNet. We ran it over more than 13,000 labelled images from the "Labeled Faces in the Wild" dataset, extracting the hidden features that form the basis of the search.

The key step is our use of the cosine similarity metric, which quantifies the likeness between images. Ranking the dataset by this metric lets us return the faces that are most visually similar to the user's input.
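To make the ranking step concrete, the sketch below computes cosine similarity between a query vector and a small gallery of vectors and returns the closest entries. It is a minimal sketch under assumptions: the three-dimensional vectors stand in for the much longer feature vectors a network such as ResNet-50 would produce, and the names `most_similar` and `gallery` are hypothetical. On real embeddings, an array library would usually replace the pure-Python loops.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float],
                 gallery: Dict[str, Sequence[float]],
                 top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k gallery entries ranked by cosine similarity to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy feature vectors, for illustration only.
gallery = {
    "face_a": [0.8, 0.6, 0.0],
    "face_b": [0.6, 0.8, 0.0],
    "face_c": [0.0, 0.0, 1.0],
}
results = most_similar([1.0, 0.0, 0.0], gallery, top_k=2)
print([name for name, _ in results])
```

Cosine similarity compares only the direction of two vectors, not their length, so two images can score as similar even when their feature magnitudes differ.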

GitHub