I'm excited to share Content Extractor with Vision LLM, an open-source Python tool that extracts content from documents (PDF, DOCX, PPTX), describes embedded images using Vision Language Models, and saves the results in clean Markdown files.
This is an evolving project, and I'd love your feedback, suggestions, and contributions to make it even better!
Key Features
- Multi-format support: Extract text and images from PDF, DOCX, PPTX.
- Advanced image description: Choose from local models (Ollama's llama3.2-vision) or cloud models (OpenAI GPT-4 Vision).
- Two PDF processing modes:
  - Text + Images: Extract text and embedded images.
  - Page as Image: Preserve complex layouts with high-resolution page images.
- Markdown outputs: Text and image descriptions are neatly formatted.
- CLI: Simple command-line interface for specifying input/output folders and file types.
- Modular & extensible: Built with SOLID principles for easy customization.
- Detailed logging: Logs all operations with timestamps.
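To make the "Markdown outputs", "Modular & extensible", and "Detailed logging" points a bit more concrete, here is a minimal sketch of how a pluggable image describer could feed a Markdown renderer. This is not the project's actual code: `ImageDescriber`, `StubDescriber`, `Page`, and `render_markdown` are hypothetical names invented for illustration, and the output layout is an assumption.

```python
import logging
from dataclasses import dataclass, field
from typing import Protocol

# Timestamped log lines, in the spirit of the "detailed logging" feature.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("extractor_sketch")


class ImageDescriber(Protocol):
    """Hypothetical interface: anything that turns image bytes into text."""

    def describe(self, image: bytes) -> str: ...


class StubDescriber:
    """Placeholder backend; a real one would call a local or cloud vision model."""

    def describe(self, image: bytes) -> str:
        return f"(stub) image of {len(image)} bytes"


@dataclass
class Page:
    """Hypothetical container for one page's text and embedded images."""

    text: str
    images: list[bytes] = field(default_factory=list)


def render_markdown(title: str, pages: list[Page], describer: ImageDescriber) -> str:
    """Render page text and image descriptions as a single Markdown string."""
    lines = [f"# {title}", ""]
    for number, page in enumerate(pages, start=1):
        log.info("Rendering page %d with %d image(s)", number, len(page.images))
        lines += [f"## Page {number}", "", page.text.strip(), ""]
        for index, image in enumerate(page.images, start=1):
            lines += [f"**Image {index}:** {describer.describe(image)}", ""]
    # Collapse trailing blank lines to a single newline.
    return "\n".join(lines).rstrip() + "\n"
```

Because the renderer only depends on the `describe` method, swapping the stub for an Ollama-backed or OpenAI-backed class would not require touching the Markdown code.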
Tech Stack
- Programming: Python 3.12
- Document processing: PyMuPDF, python-docx, python-pptx
- Vision Language Models: Ollama llama3.2-vision, OpenAI GPT-4 Vision
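For anyone curious what describing an image with the local model can look like, here is a rough sketch that talks to Ollama's REST endpoint (`/api/generate` on the default port 11434) using only the standard library. Whether the project uses this endpoint, the `ollama` Python client, or something else is not stated here, so treat `build_payload`, `describe_image`, and the default prompt as assumptions for illustration.

```python
import base64
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image: bytes, prompt: str, model: str = "llama3.2-vision") -> dict:
    """Build the JSON body: images are sent as base64-encoded strings."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def describe_image(image: bytes, prompt: str = "Describe this image.") -> str:
    """Send the image to a running Ollama server and return its description."""
    body = json.dumps(build_payload(image, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]
```

`describe_image` needs `ollama serve` running and the `llama3.2-vision` model pulled; `build_payload` can be inspected on its own without a server.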
Installation
Clone the repo and install dependencies using Poetry. System dependencies like LibreOffice and poppler are required for processing specific file types.
Detailed setup instructions: GitHub Repo
How to Use
- Clone the repo and install dependencies.
- Start the Ollama server: `ollama serve`
- Pull the llama3.2-vision model: `ollama pull llama3.2-vision`
- Run the tool: `poetry run python main.py --source /path/to/source --output /path/to/output --type pdf`
- Review the results: Markdown files containing the extracted text and image descriptions.
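The command-line flags in the run step could be parsed roughly like this. It is a sketch with `argparse`, not the actual `main.py`: the flag names `--source`, `--output`, and `--type` come from the command above, while making them required and accepting `docx` and `pptx` as values are assumptions based on the supported formats.

```python
import argparse
from pathlib import Path

# Assumed values for --type, based on the formats the tool supports.
SUPPORTED_TYPES = ("pdf", "docx", "pptx")


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the source folder, output folder, and file type to process."""
    parser = argparse.ArgumentParser(
        description="Extract document content and describe embedded images."
    )
    parser.add_argument("--source", type=Path, required=True,
                        help="folder containing the input documents")
    parser.add_argument("--output", type=Path, required=True,
                        help="folder where Markdown files are written")
    parser.add_argument("--type", dest="file_type", choices=SUPPORTED_TYPES,
                        required=True, help="file type to process")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"Processing {args.file_type} files from {args.source} into {args.output}")
```

Using `choices` means an unsupported file type is rejected by the parser before any processing starts.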
Why Share?
This is a work in progress, and I'd love your input to:
- Improve features and functionality
- Test with different use cases
- Compare image descriptions from models
- Suggest new ideas or report bugs
Repo & Contribution
GitHub: Content Extractor with Vision LLM
Feel free to open issues, create pull requests, or fork the repo for your own projects.
Let's Collaborate!
This tool has a lot of potential, and with your help, it can become a robust library for document content extraction and image analysis. Let me know your thoughts, ideas, or any issues you encounter!
Looking forward to your feedback, contributions, and testing results.