CoT vs ToT (Reasoning with LLMs)

This project explores different reasoning strategies for large language models (LLMs):

• Chain-of-Thought (CoT) → a single reasoning path, step by step.
• Self-Consistency → sample multiple CoTs, then vote for the most consistent answer.
• Tree-of-Thoughts (ToT) → branching exploration of possible reasoning steps.
• Graph-of-Thoughts (GoT) → voting across multiple CoT runs (a toy graph-based variant).

The idea is to see how prompting strategies can improve reasoning accuracy on benchmarks like GSM8K, BIG-Bench Hard, and ARC-Challenge. This repo is a research playground for myself: it’s not about state-of-the-art results, but about learning how reasoning methods differ in practice.

What you’ll learn

• How prompting alone can change the reasoning ability of LLMs.
• Why sampling multiple paths (self-consistency) often beats a single chain-of-thought.
• How to set up a reproducible evaluation harness for reasoning tasks.
• How to run simple experiments with OpenAI or local HuggingFace models.

Requirements

This project requires Python 3.10 or higher. To install dependencies, run:

pip install -r requirements.txt

You will also need API keys:
• OpenAI API key: set it in a .env file as OPENAI_API_KEY=…
• (Optional) Hugging Face token: set HUGGINGFACEHUB_API_TOKEN=… if you want to use local transformers models.

How to run

To run a simple Chain‑of‑Thought experiment on the GSM8K dataset:

python -m src.cot --dataset openai/gsm8k --subset main --split test --max-samples 20 --model openai:gpt-4o-mini

To try the Graph‑of‑Thoughts variant:

python -m src.graph_of_thought --dataset openai/gsm8k --subset main --split test --max-samples 10 --iters 4 --model openai:gpt-4o-mini

Results are saved under the results/ directory, and you can compute accuracy with:

python -m src.eval --run-dir results/<your_run_folder>

Key takeaways

• Reasoning is not “one prompt and done.” Different strategies yield different results.
• Self‑consistency and tree‑based approaches can uncover better reasoning paths than plain chain‑of‑thought.
• Prompting alone can provide research‑level insights without any model fine‑tuning.

References

• Chain-of-Thought Prompting
• Self-Consistency
• Tree-of-Thoughts
• BIG-Bench Hard

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
configs		configs
scripts		scripts
src		src
.env.example		.env.example
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
README.md		README.md
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CoT vs ToT (Reasoning with LLMs)

What you’ll learn

Requirements

How to run

Key takeaways

References

License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

MackCesar/cot-vs-tot

Folders and files

Latest commit

History

Repository files navigation

CoT vs ToT (Reasoning with LLMs)

What you’ll learn

Requirements

How to run

Key takeaways

References

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages