Uploaded by hardiktogadiya

PDF Task

Next Task: PDF, Microsoft Office, and Google Suite interaction with the model.
We will start by giving the model the ability to extract the text from PDF and
DOCX files, and by letting the user interact with that text and information.
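Because uploads may be either PDFs or DOCX files, the backend has to send each file to the right extractor. Here is a minimal sketch, assuming the server receives each upload as raw bytes plus its original file name. `detectDocumentKind` is a hypothetical helper, not part of any library. It checks the file signature: PDF files normally begin with `%PDF-`, and DOCX files are ZIP archives that begin with `PK\x03\x04`.

```typescript
// Hypothetical helper: decide which extractor an uploaded file should be sent to.
// Assumption: each upload arrives as raw bytes plus its original file name.
type DocumentKind = "pdf" | "docx" | "unsupported";

// PDF files normally begin with the ASCII bytes "%PDF-".
const PDF_SIGNATURE = [0x25, 0x50, 0x44, 0x46, 0x2d];
// DOCX files are ZIP archives, which begin with "PK\x03\x04".
const ZIP_SIGNATURE = [0x50, 0x4b, 0x03, 0x04];

// True if `bytes` starts with every byte of `signature`.
function hasSignature(bytes: Uint8Array, signature: number[]): boolean {
  return bytes.length >= signature.length && signature.every((b, i) => bytes[i] === b);
}

function detectDocumentKind(bytes: Uint8Array, fileName: string): DocumentKind {
  if (hasSignature(bytes, PDF_SIGNATURE)) return "pdf";
  // .xlsx, .pptx and plain .zip files share the ZIP signature,
  // so the file name must also end in .docx.
  if (hasSignature(bytes, ZIP_SIGNATURE) && fileName.toLowerCase().endsWith(".docx")) {
    return "docx";
  }
  return "unsupported";
}
```

Checking the bytes, rather than trusting the extension alone, keeps a renamed file from reaching the wrong parser. The extension check is still needed for DOCX because other Office formats share the ZIP signature.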
User Flow
1. The user creates a new BEBA AI chat.
2. The user presses a new button labeled "Upload PDFs to this chat."
3. After pressing the button, the user is prompted to upload one or more PDFs
into the model window.
4. Using the pdf-parse (PDF) and Mammoth.js (DOCX) npm packages listed below,
we will extract the text from each uploaded document, including the PDF
metadata. (PyPDF2 and python-docx are the Python counterparts of these
libraries; they are not npm packages.)
5. The user can then ask questions about the uploaded PDFs, and BEBA AI will
answer based on any of them.
6. Note: the BEBA AI model will search through the data from all of the
uploaded PDFs to give an answer.
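Steps 5 and 6 require answering from all uploaded documents at once. One common approach is to split each document's extracted text into passages, pick the passages most relevant to the question, and give those to the model as context. The sketch below uses simple keyword overlap as a stand-in for whatever retrieval BEBA AI actually uses (embedding-based search would be a typical upgrade); `findRelevantPassages` and the other names are hypothetical.

```typescript
// Minimal keyword-overlap retrieval over every uploaded document.
// Assumption: a stand-in for BEBA AI's real retrieval; all names are hypothetical.
interface UploadedDoc {
  fileName: string;
  text: string; // text already extracted from the PDF or DOCX
}

interface Passage {
  fileName: string;
  text: string;
  score: number; // number of distinct question words found in the passage
}

// Lower-case and split on anything that is not an ASCII letter or digit.
function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Return up to `k` passages, from any document, that share the most words with the question.
function findRelevantPassages(docs: UploadedDoc[], question: string, k = 3): Passage[] {
  const queryTerms = Array.from(new Set(tokenize(question)));
  const scored: Passage[] = [];
  for (const doc of docs) {
    // Treat blank-line-separated paragraphs as passages.
    const paragraphs = doc.text
      .split(/\n\s*\n/)
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
    for (const text of paragraphs) {
      const passageTerms = new Set(tokenize(text));
      const score = queryTerms.filter((t) => passageTerms.has(t)).length;
      if (score > 0) scored.push({ fileName: doc.fileName, text, score });
    }
  }
  // Highest score first; only the top k are sent to the model as context.
  return scored.sort((a, b) => b.score - a.score).slice(0, k);
}
```

Each returned passage keeps its `fileName`, so the answer can say which PDF it came from. Common words such as "is" also count toward the score in this sketch; a real implementation would at least drop stop words.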
A link to each PDF should be stored in MongoDB, with the full PDF files stored
in a Backblaze bucket. Mr. Saliba can provide access to the bucket.
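A minimal sketch of the MongoDB record that links a chat to a stored file. The field names, the `chats/<chatId>/<fileName>` key layout, and the `downloadBaseUrl` parameter are all assumptions for illustration, not an existing BEBA AI schema. The link follows Backblaze B2's friendly-URL pattern `<download host>/file/<bucket>/<key>`; use whatever download host the team's Backblaze account actually provides.

```typescript
// Hypothetical shape of the MongoDB document for one uploaded file.
interface UploadedFileRecord {
  chatId: string;
  fileName: string;
  bucketName: string;
  objectKey: string; // key of the full file inside the Backblaze bucket
  url: string;       // link stored in MongoDB
  sizeBytes: number;
  uploadedAt: Date;
}

// Build the object key and link. `downloadBaseUrl` is an assumption (e.g. the account's B2 download host).
function buildFileRecord(
  chatId: string,
  fileName: string,
  sizeBytes: number,
  bucketName: string,
  downloadBaseUrl: string,
  now: Date = new Date(),
): UploadedFileRecord {
  // Prefix with the chat id so files from different chats never collide.
  const objectKey = `chats/${chatId}/${fileName}`;
  // Encode each path segment so spaces and special characters survive in the URL.
  const encodedKey = objectKey.split("/").map(encodeURIComponent).join("/");
  const base = downloadBaseUrl.replace(/\/+$/, ""); // drop any trailing slash
  return {
    chatId,
    fileName,
    bucketName,
    objectKey,
    url: `${base}/file/${encodeURIComponent(bucketName)}/${encodedKey}`,
    sizeBytes,
    uploadedAt: now,
  };
}
```

With the official MongoDB Node.js driver, the record could then be saved with `collection.insertOne(record)`; the collection name is up to the team. If the bucket is private, the plain link will not work without an authorization token, so keeping `objectKey` next to `url` lets the backend generate an authorized download link when needed. Two uploads with the same name in the same chat would share a key here; adding a unique id to the key would prevent overwrites.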
NPMs
pdf-parse (pdf-parse-fork) - https://www.npmjs.com/package/pdf-parse-fork
Mammoth.js - https://github.com/mwilliamson/mammoth.js
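The two packages return differently shaped results, so it helps to normalize them into one record before storing or searching. This is a sketch, assuming the service calls `pdf(buffer)` from pdf-parse (and that pdf-parse-fork keeps the upstream result shape) and `mammoth.extractRawText({ buffer })` from Mammoth.js. The interfaces re-declare only the fields used, so the sketch compiles without the packages installed; `ExtractedDocument`, `fromPdfParse`, and `fromMammoth` are hypothetical names.

```typescript
// Subset of what pdf-parse resolves to when called as `pdf(buffer)`.
interface PdfParseResult {
  numpages: number;
  info: Record<string, unknown>; // PDF info dictionary, e.g. Title, Author
  metadata: unknown;             // XMP metadata, may be null
  text: string;
}

// What `mammoth.extractRawText({ buffer })` resolves to.
interface MammothRawTextResult {
  value: string;                                 // the raw text
  messages: { type: string; message: string }[]; // conversion warnings/errors
}

// Hypothetical common shape the chat backend stores and searches, whatever the file type.
interface ExtractedDocument {
  fileName: string;
  kind: "pdf" | "docx";
  text: string;
  pageCount?: number;
  metadata: Record<string, unknown>;
  warnings: string[];
}

function fromPdfParse(fileName: string, r: PdfParseResult): ExtractedDocument {
  return {
    fileName,
    kind: "pdf",
    text: r.text.trim(),
    pageCount: r.numpages,
    metadata: { ...r.info, xmp: r.metadata }, // keep both metadata sources
    warnings: [],
  };
}

function fromMammoth(fileName: string, r: MammothRawTextResult): ExtractedDocument {
  return {
    fileName,
    kind: "docx",
    text: r.value.trim(),
    metadata: {}, // extractRawText returns text and messages only, not document properties
    warnings: r.messages.map((m) => `${m.type}: ${m.message}`),
  };
}
```

In the service, the result of `pdf(buffer)` would be passed to `fromPdfParse` and the result of `mammoth.extractRawText({ buffer })` to `fromMammoth`, after which every document can be searched the same way.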