Case Study
Research Platform
Scholarly Intake
A web platform that helps writers and researchers build a web journal: uploaded academic papers are automatically stored in a MongoDB database along with their extracted text and metadata.
Scholarly Intake grew out of my frustration with keeping track of numerous academic papers and extracting relevant information from them efficiently for my online student journal. The platform streamlines the research process by letting editors upload PDFs of academic papers, which are then processed to extract text and metadata for easy retrieval and analysis.
The system was created to manage the articles, essays, and papers that student writers submit for publication in TheStudentOpinion, another project I worked on. The existing workflow required me to upload each paper to the website manually using HTML and to send papers to editors one at a time. This was time-consuming and became harder to sustain as the volume of submissions increased. The platform automates the process, allowing bulk uploads and automatic extraction of key information.
The backend is written in Python and uses PyMuPDF to extract text from PDFs. LangChain handles document processing and chunking, and Pinecone serves as the vector database for storing and retrieving the extracted information. The frontend is built with Next.js and provides a user-friendly interface for uploading documents and managing the research database.
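The extract-then-chunk step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `Chunk`, `extract_pdf`, `chunk_text`, and `build_chunks` are hypothetical, the chunk size and overlap values are assumed, and a plain character-window splitter stands in for LangChain's text splitter so the example runs without extra dependencies. Only `extract_pdf` needs PyMuPDF installed, and the Pinecone upload is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    """One piece of a paper, ready to be embedded and stored (hypothetical shape)."""
    paper_id: str
    index: int
    text: str


def extract_pdf(path: str) -> tuple[str, dict]:
    """Return (full_text, metadata) for a PDF using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the chunking code runs without it

    with fitz.open(path) as doc:
        # Concatenate the plain text of every page.
        text = "\n".join(page.get_text() for page in doc)
        # doc.metadata holds fields such as title and author when present.
        metadata = dict(doc.metadata or {})
    return text, metadata


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    A simplified stand-in for a LangChain text splitter (assumption).
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def build_chunks(paper_id: str, text: str, **kwargs) -> list[Chunk]:
    """Attach a paper id and position to each chunk of text."""
    return [Chunk(paper_id, i, piece)
            for i, piece in enumerate(chunk_text(text, **kwargs))]
```

In a pipeline like this, each uploaded PDF would go through `extract_pdf`, and the resulting text through `build_chunks`, before the chunks are embedded and written to the vector store. The overlap keeps a sentence that straddles a chunk boundary retrievable from either side.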
Moving forward, I plan to enhance Scholarly Intake with new features such as advanced search and AI-driven summarization of papers, to help researchers quickly grasp the main points of each document. I also aim to integrate collaboration tools so that multiple users can work on the same research projects seamlessly. This project is open source and available on GitHub for other researchers and developers to create their own online journals.