PDF/DOCX Reader for Cursor IDE

A simple Python tool I built to help read PDF and DOCX files directly in Cursor IDE. I was tired of not being able to process documents in my AI workflows, so I created this tool to extract text content from PDFs and Word documents.

What it does

Reads PDF files (using pdfplumber and PyPDF2 as backup)
Reads DOCX files (using python-docx)
Extracts metadata like title, author, creation date
Outputs in JSON or plain text format
Handles errors gracefully
Preserves document structure (pages/paragraphs)
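
The "pdfplumber first, PyPDF2 as backup" behaviour listed above could look roughly like the sketch below. It is not the code from pdf_docx_reader.py: the function names (extract_with_pdfplumber, extract_with_pypdf2, extract_pages) are made up for illustration, and it assumes the fallback triggers both when a library fails and when it returns no text.

```python
from typing import Callable, List, Sequence


def extract_with_pdfplumber(path: str) -> List[str]:
    """Return one text string per page using pdfplumber."""
    import pdfplumber  # imported lazily so a missing library only fails this extractor

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None for a page without a text layer
        return [page.extract_text() or "" for page in pdf.pages]


def extract_with_pypdf2(path: str) -> List[str]:
    """Return one text string per page using PyPDF2 (3.x API)."""
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def extract_pages(
    path: str,
    extractors: Sequence[Callable[[str], List[str]]] = (
        extract_with_pdfplumber,
        extract_with_pypdf2,
    ),
) -> List[str]:
    """Try each extractor in order and return the first result that contains text."""
    errors = []
    for extractor in extractors:
        try:
            pages = extractor(path)
        except Exception as exc:  # missing library, corrupt file, and so on
            errors.append(f"{extractor.__name__}: {exc}")
            continue
        if any(text.strip() for text in pages):
            return pages
        errors.append(f"{extractor.__name__}: no text extracted")
    raise RuntimeError("All PDF extractors failed: " + "; ".join(errors))
```

Passing the extractors as a parameter keeps the fallback order explicit and makes the logic easy to test without real PDF files.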

Setup

Clone this repo
Install the Python dependencies:

pip install -r requirements.txt

That's it! No complex setup needed.

How to use it

Just run it with a PDF or DOCX file:

# Read a PDF (defaults to JSON output)
python pdf_docx_reader.py document.pdf

# Read a DOCX file
python pdf_docx_reader.py document.docx

# Get plain text instead of JSON
python pdf_docx_reader.py document.pdf --output-format text

You can also pipe the output to files or use it in scripts:

# Save to file
python pdf_docx_reader.py document.pdf > output.json

# Process multiple files (plain text output, one .txt per PDF)
for file in *.pdf; do
    python pdf_docx_reader.py "$file" --output-format text > "${file%.pdf}.txt"
done

Options

file_path: The PDF or DOCX file to read (required)
--output-format: Choose json (default) or text
--help: Show help
--version: Show version
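
For readers who want to see how these options fit together, here is a minimal argparse sketch that accepts the same arguments. It is an assumption about how such a command line could be defined, not the tool's actual source; build_parser and the VERSION placeholder are hypothetical.

```python
import argparse

VERSION = "0.0.0"  # placeholder only; the real version string lives in the tool


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options documented above."""
    parser = argparse.ArgumentParser(
        prog="pdf_docx_reader.py",
        description="Extract text from PDF and DOCX files.",
    )
    parser.add_argument("file_path", help="The PDF or DOCX file to read")
    parser.add_argument(
        "--output-format",
        choices=["json", "text"],
        default="json",
        help="Output format (default: json)",
    )
    # --help is added by argparse automatically
    parser.add_argument("--version", action="version", version=f"%(prog)s {VERSION}")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.file_path, args.output_format)
```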

Output Format

JSON Output (Default)

{
  "file_path": "/path/to/document.pdf",
  "file_type": "PDF",
  "pages": [
    {
      "page_number": 1,
      "text": "Page content here...",
      "char_count": 150
    }
  ],
  "full_text": "Complete document text...",
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "creation_date": "2024-01-01",
    "modification_date": "2024-01-02"
  },
  "page_count": 1
}
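
If you consume this JSON from another script, something like the following sketch works against the fields shown above. The function name describe_output is made up for this example, and it assumes the output always has the keys listed in the sample.

```python
import json


def describe_output(raw_json: str) -> str:
    """Build a short human-readable report from the tool's JSON output."""
    data = json.loads(raw_json)
    metadata = data.get("metadata") or {}
    lines = [
        f"{data['file_type']} file: {data['file_path']}",
        f"Title: {metadata.get('title', '(none)')}",
        f"Pages: {data['page_count']}",
    ]
    # One line per page, using the per-page fields from the sample output
    for page in data.get("pages", []):
        lines.append(f"  page {page['page_number']}: {page['char_count']} characters")
    return "\n".join(lines)
```

You could feed it the contents of output.json from the "Save to file" example.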

Text Output

File: /path/to/document.pdf
Type: PDF
Pages: 1

Metadata:
  title: Document Title
  author: Author Name
  creation_date: 2024-01-01

Content:
Complete document text here...

Using with Cursor IDE

I built this specifically for Cursor IDE, so it works great there. Just open the terminal in Cursor and run:

python pdf_docx_reader.py your_document.pdf

You can also create a simple wrapper script if you want:

#!/bin/bash
# pdf_reader.sh
python /path/to/pdf_docx_reader.py "$1" --output-format text

Or use it in your Python code:

from pdf_docx_reader import FileReader

reader = FileReader()
data = reader.read_file("document.pdf")
print(data['full_text'])

🤖 AI Auto-Detection (NEW in v1.1.0)

As of v1.1.0, the extension adds commands that detect and process PDF/DOCX files automatically, so AI tooling can read them without manual steps.

New AI Commands

pdfDocxReader.autoDetectAndProcess - Automatically detects and processes any PDF/DOCX file
pdfDocxReader.processForAI - Returns AI-optimized data structure with enhanced context
pdfDocxReader.getAIReadyContent - Returns clean, AI-ready text content

AI Integration Example

// AI can now automatically process documents
const result = await vscode.commands.executeCommand(
    'pdfDocxReader.processForAI', 
    '/path/to/document.pdf'
);

const data = JSON.parse(result);
if (data.ai_ready) {
    console.log(`Processing ${data.file_type} document:`);
    console.log(`Summary: ${data.summary}`);
    console.log(`Content: ${data.content}`);
}

AI-Optimized Output

The new AI commands return data in this enhanced format:

{
  "ai_ready": true,
  "file_path": "/path/to/document.pdf",
  "file_type": "PDF",
  "content": "Full document text...",
  "summary": "This is a PDF document with 5 pages containing 1,234 words...",
  "metadata": { "title": "Document Title", "author": "Author" },
  "structure": {
    "page_count": 5,
    "char_count": 12345,
    "word_count": 1234
  },
  "processed_at": "2024-01-01T12:00:00.000Z"
}
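
To show how the fields in this format relate to the basic JSON output, here is a rough Python sketch that derives one from the other. This is a guess for illustration, not the extension's implementation: to_ai_ready is a hypothetical name, and the word and character counts are assumed to be computed from full_text.

```python
from datetime import datetime, timezone


def to_ai_ready(data: dict) -> dict:
    """Derive the AI-optimized structure from the basic reader output."""
    content = data.get("full_text", "")
    page_count = data.get("page_count", 0)
    word_count = len(content.split())  # assumption: words are whitespace-separated
    page_word = "page" if page_count == 1 else "pages"
    summary = (
        f"This is a {data['file_type']} document with {page_count} {page_word} "
        f"containing {word_count:,} words"
    )
    return {
        "ai_ready": True,
        "file_path": data["file_path"],
        "file_type": data["file_type"],
        "content": content,
        "summary": summary,
        "metadata": data.get("metadata", {}),
        "structure": {
            "page_count": page_count,
            "char_count": len(content),
            "word_count": word_count,
        },
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
```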

Perfect for AI workflows! 🚀

Troubleshooting

If something goes wrong:

"PDF reading libraries not available"
- Run pip install -r requirements.txt

"File is not a PDF/DOCX"
- Make sure the file has a .pdf or .docx extension

Empty text extraction
- Some PDFs are just images - you'll need OCR for those
- The tool tries pdfplumber first and PyPDF2 as backup; if both return nothing, the PDF probably has no text layer

Permission errors
- Make sure the file isn't locked by another app
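
For the empty-extraction case, a quick check on the JSON output can tell you which pages came back without text. The sketch below assumes the "pages" list shown in the JSON Output section; pages_without_text and looks_scanned are hypothetical helper names.

```python
from typing import Dict, List


def pages_without_text(data: Dict) -> List[int]:
    """Return the page numbers whose extracted text is empty or whitespace."""
    return [
        page["page_number"]
        for page in data.get("pages", [])
        if not (page.get("text") or "").strip()
    ]


def looks_scanned(data: Dict) -> bool:
    """True when the document has pages but none of them produced any text."""
    pages = data.get("pages", [])
    return bool(pages) and len(pages_without_text(data)) == len(pages)
```

If looks_scanned returns True, the file is likely image-only and needs OCR before this tool can read it.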

Dependencies

pdfplumber>=0.9.0 - Main PDF reader
PyPDF2>=3.0.0 - Backup PDF reader
python-docx>=0.8.11 - DOCX reader

Examples

Reading a research paper:

python pdf_docx_reader.py research_paper.pdf --output-format text

Batch processing:

for pdf in *.pdf; do
    echo "Processing: $pdf"
    python pdf_docx_reader.py "$pdf" --output-format text > "${pdf%.pdf}.txt"
done

Get just the metadata:

python pdf_docx_reader.py document.pdf | jq '.metadata'

Installation

From VS Code Marketplace

Open VS Code or Cursor IDE
Go to Extensions (Ctrl/Cmd + Shift + X)
Search for "PDF/DOCX Reader"
Click Install

From GitHub

Download the latest VSIX from Releases
Install from VSIX file

License

MIT License - feel free to use it however you want.

Contributing

Found a bug? Have an idea? Open an issue or send a PR. I'm always looking to improve this tool.

Marketplace

Available on VS Code Marketplace: PDF/DOCX Reader
