The Web Page Content Query System retrieves web page content and lets you query it in natural language. Built with LangChain, Ollama, and Chroma, it provides both a command-line interface and an interactive Streamlit web interface: load a web page, let the system process its content, and ask questions to receive AI-generated answers grounded in that content. Both embedding and answer generation run locally through Ollama.
Features:
- Load and analyze the content of any web page
- Split content into manageable chunks for processing
- Generate embeddings using Ollama's local embedding model
- Store and retrieve relevant content using Chroma vector database
- Query content using natural language questions
- Get AI-generated answers using local Ollama models
- Interactive web interface with Streamlit
- Colorful and intuitive UI design
Built with:
- LangChain v0.2+: Framework for building LLM applications
- Ollama: Runtime for running large language models locally, used for both embeddings and answer generation
- Chroma: Vector database for storing and retrieving embeddings
- Streamlit: Web interface framework
- BeautifulSoup4: Web scraping and HTML parsing
- Python 3.x: Programming language
Prerequisites:
- Python 3.x installed
- Ollama installed and running locally
- Git (for cloning the repository)
Installation:
- Clone the repository:
```bash
git clone <repository-url>
cd <repository-name>
```
- Install required packages:
```bash
pip install -r requirements.txt
```
- Ensure Ollama is running locally with the required models:
```bash
ollama pull llama3.1
```
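Answer generation uses the chat model pulled above. If the project's embedding step uses a dedicated Ollama embedding model rather than llama3.1 itself (the exact model name is an assumption here; nomic-embed-text is a common choice), pull that as well:

```bash
ollama pull nomic-embed-text
```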
Run the command-line version:
```bash
python rag_app.py
```
Follow the prompts to:
- Enter a webpage URL
- Ask questions about the content
- Type 'new' to analyze a different webpage
- Type 'quit' to exit
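In outline, that interaction loop looks like the sketch below. The helpers `build_vectorstore` and `answer_question` are hypothetical placeholders for the real loading and querying logic in rag_app.py (see the end-to-end sketch later in this README for what they would do):

```python
# Hypothetical outline of rag_app.py's prompt loop; function names and
# messages are illustrative, not the actual implementation.
def build_vectorstore(url: str):
    """Placeholder: load the page, split, embed, and index it."""
    return url

def answer_question(store, question: str) -> str:
    """Placeholder: retrieve relevant chunks and generate an answer."""
    return f"(answer about {store} for: {question})"

def main() -> None:
    while True:
        url = input("Enter a webpage URL: ").strip()
        store = build_vectorstore(url)
        while True:
            q = input("Question ('new' = different page, 'quit' = exit): ").strip()
            if q.lower() == "quit":
                return
            if q.lower() == "new":
                break
            print(answer_question(store, q))

if __name__ == "__main__":
    main()
```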
Run the Streamlit interface:
```bash
streamlit run streamlit_app.py
```
The web interface provides:
- URL input field for loading web pages
- Question input for querying content
- Clear button to reset the application
- Visual feedback for successful/failed operations
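A minimal skeleton for such an interface, using Streamlit's standard widgets. The widget labels and the imports from rag_app are hypothetical stand-ins for the project's real loading and querying functions:

```python
# Hypothetical skeleton of streamlit_app.py; labels and the rag_app
# imports are assumptions for illustration.
import streamlit as st

from rag_app import build_vectorstore, answer_question  # hypothetical helpers

st.title("Web Page Content Query System")

url = st.text_input("Web page URL")
if st.button("Load page") and url:
    try:
        st.session_state["store"] = build_vectorstore(url)
        st.success(f"Loaded {url}")  # visual feedback on success
    except Exception as exc:
        st.error(f"Failed to load page: {exc}")  # visual feedback on failure

question = st.text_input("Ask a question about the page")
if question and "store" in st.session_state:
    st.write(answer_question(st.session_state["store"], question))

if st.button("Clear"):
    st.session_state.clear()  # reset the application state
    st.rerun()
```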
Project structure:
- rag_app.py: Core RAG functionality and CLI interface
- streamlit_app.py: Streamlit web interface
- requirements.txt: Project dependencies
- chroma_db/: Directory for vector database storage
How it works (a compact end-to-end sketch follows the list):
- Web Page Loading: The application fetches and parses web page content using WebBaseLoader
- Content Processing: The text is split into manageable chunks using RecursiveCharacterTextSplitter
- Embedding Generation: Each chunk is converted to an embedding using Ollama's local embedding model
- Vector Storage: The embeddings are stored in a Chroma vector database
- Query Processing: The user's question is embedded and the most similar chunks are retrieved from the vector store
- Answer Generation: Ollama generates an answer from the retrieved chunks and the user's question
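Here is that pipeline as one compact sketch, assuming the LangChain v0.2 community integrations and a locally running Ollama server. The model names and chunking parameters are illustrative assumptions, not necessarily rag_app.py's settings:

```python
# End-to-end sketch of the six steps above (assumed models and sizes).
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Fetch and parse the page (WebBaseLoader uses BeautifulSoup4 under the hood).
docs = WebBaseLoader("https://example.com/article").load()

# 2. Split the text into manageable, overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# 3 & 4. Embed each chunk and persist the vectors in a local Chroma store.
embeddings = OllamaEmbeddings(model="nomic-embed-text")  # assumed embedding model
vectorstore = Chroma.from_documents(chunks, embeddings, persist_directory="chroma_db")

# 5. Embed the question and retrieve the most similar chunks.
question = "What is the page's main argument?"
relevant = vectorstore.as_retriever(search_kwargs={"k": 4}).invoke(question)

# 6. Have the local model answer from the retrieved context.
context = "\n\n".join(doc.page_content for doc in relevant)
llm = Ollama(model="llama3.1")
print(llm.invoke(f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"))
```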
The application includes error handling for:
- Failed webpage loading
- API errors
- Invalid URLs
- Query processing issues
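The shape of that handling for the loading path might look like the sketch below; the exception types and messages are illustrative assumptions, not the actual code:

```python
# Illustrative error handling for page loading; rag_app.py's actual
# checks and messages may differ.
import requests
from langchain_community.document_loaders import WebBaseLoader

def load_page(url: str):
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Invalid URL: {url}")
    try:
        docs = WebBaseLoader(url).load()
    except requests.RequestException as exc:  # network or HTTP failure
        raise RuntimeError(f"Failed to load {url}: {exc}") from exc
    if not docs or not docs[0].page_content.strip():
        raise RuntimeError(f"No readable content found at {url}")
    return docs
```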
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.