mr-sarthakgupta/deep-paper-search
Research Paper Agent with Local LLM

An intelligent research agent that finds academic papers, discovers related works through Connected Papers, and identifies relevant GitHub repositories. Works completely offline with local LLMs - no API keys required!

πŸš€ Quick Start

Get started in two minutes with minimal setup:

# 1. Install dependencies
pip install -r requirements.txt

# 2. Run immediately with rule-based processing
python main.py "transformer neural networks"

✨ Features

  • πŸ” Multi-source Paper Search: arXiv, Google Scholar integration
  • πŸ•ΈοΈ Connected Papers: Discover related works and paper networks
  • πŸ“¦ GitHub Discovery: Find code repositories for research papers
  • πŸ€– Local LLM Support: Ollama, Transformers, or rule-based processing
  • πŸ”’ Privacy-First: Everything runs locally, no data sent to external APIs
  • πŸ“Š Structured Output: Clean JSON results with relevance scoring

πŸ› οΈ Setup Options

Option 1: Rule-Based (Instant Start) ⚑

No configuration needed - works immediately:

python main.py "machine learning"

Uses intelligent keyword matching and scoring algorithms.
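As a rough illustration, a keyword-overlap scorer along these lines ranks text by how many query terms it contains. This is a hedged sketch, not the project's actual implementation:

```python
import re

def relevance_score(query: str, text: str) -> float:
    """Fraction of query keywords (3+ characters) that appear in the text.
    A simplified stand-in for the rule-based scorer, not the real code."""
    keywords = {w for w in re.findall(r"[a-z0-9]+", query.lower()) if len(w) > 2}
    if not keywords:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(keywords & words) / len(keywords)

print(relevance_score("machine learning", "A survey of machine learning methods"))  # → 1.0
```

A real scorer would also weight fields (title vs. abstract) and handle word variants, but the core idea is the same.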

Option 2: Ollama (Recommended for AI) 🧠

Best local AI experience:

# 1. Install Ollama from https://ollama.ai
# 2. Pull a model
ollama pull llama2:7b

# 3. Configure the agent
cp .env.example .env
# Edit .env and set: LLM_PROVIDER=ollama

# 4. Run with AI enhancement
python main.py "deep learning"

Option 3: Transformers (Alternative AI) πŸ€—

Offline Hugging Face models:

# Configure for Transformers
cp .env.example .env
# Edit .env and set: LLM_PROVIDER=transformers

python main.py "neural networks"

Option 4: OpenAI (If you have API key) πŸ’°

# Edit .env and set:
# LLM_PROVIDER=openai
# OPENAI_API_KEY=your_key_here

πŸ“– Usage

Command Line Interface

# Basic search
python main.py "attention mechanisms in transformers"

# Interactive mode
python main.py --interactive

# Limit results for faster testing
python main.py "computer vision" --max-papers 5

# Export results to JSON
python main.py "reinforcement learning" --export results.json

Python API

from research_agent import ResearchAgent

# Initialize agent (auto-detects configuration)
agent = ResearchAgent()

# Search for papers
results = agent.search("graph neural networks")

# Access results
print(f"Found {len(results['papers'])} papers")
print(f"Found {len(results['related_papers'])} related papers")
print(f"Found {len(results['github_repos'])} GitHub repos")

# Print summary
if results.get('llm_summary'):
    print(f"Summary: {results['llm_summary']}")

agent.close()

πŸ“ Project Structure

connected-papers/
β”œβ”€β”€ main.py              # CLI interface
β”œβ”€β”€ research_agent.py    # Core agent orchestrator
β”œβ”€β”€ paper_searcher.py    # arXiv/Scholar search
β”œβ”€β”€ connected_papers.py  # Related papers discovery
β”œβ”€β”€ github_searcher.py   # GitHub repository finder
β”œβ”€β”€ llm_processor.py     # Local LLM integration
β”œβ”€β”€ config.py            # Configuration settings
β”œβ”€β”€ utils.py             # Helper functions
β”œβ”€β”€ requirements.txt     # Dependencies
└── .env.example         # Environment template

βš™οΈ Configuration

Copy .env.example to .env and customize:

# LLM Provider: "ollama", "transformers", "rule_based", or "openai"
LLM_PROVIDER=ollama

# Ollama settings (if using Ollama)
OLLAMA_MODEL=llama2:7b
OLLAMA_BASE_URL=http://localhost:11434

# Transformers settings (if using Transformers)
TRANSFORMERS_MODEL=microsoft/DialoGPT-small

# Optional: GitHub token for higher rate limits
GITHUB_TOKEN=your_github_token

# Optional: OpenAI key (if using OpenAI)
OPENAI_API_KEY=your_openai_key
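For reference, a minimal sketch of how config.py might read these variables. The defaults shown here are assumptions for illustration, not the project's actual values, and a .env loader such as python-dotenv would populate the environment first:

```python
import os

# Read settings from the environment; defaults below are illustrative only.
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "rule_based")
OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama2:7b")
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
TRANSFORMERS_MODEL = os.getenv("TRANSFORMERS_MODEL", "microsoft/DialoGPT-small")
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")  # optional; None if unset
```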

πŸ“Š Output Format

The agent returns structured JSON with:

{
  "query": "transformer neural networks",
  "papers": [
    {
      "title": "Attention Is All You Need",
      "authors": ["Vaswani, Ashish", "..."],
      "abstract": "...",
      "url": "https://arxiv.org/abs/1706.03762",
      "published": "2017-06-12",
      "relevance_score": 0.95
    }
  ],
  "related_papers": [...],
  "github_repos": [
    {
      "name": "pytorch/pytorch",
      "description": "Tensors and Dynamic neural networks...",
      "url": "https://github.com/pytorch/pytorch",
      "stars": 70000,
      "relevance_score": 0.88
    }
  ],
  "llm_summary": "Found 10 papers on transformer networks..."
}
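The exported JSON can be post-processed directly. For example, a small helper (hypothetical, but using only the field names shown above) that lists the top-scoring papers:

```python
import json

def top_papers(results: dict, n: int = 5) -> list:
    """Return the n highest-scoring paper titles from a results dict."""
    ranked = sorted(results["papers"], key=lambda p: p["relevance_score"], reverse=True)
    return [p["title"] for p in ranked[:n]]

# Typical use with a file written by `--export results.json`:
# with open("results.json") as f:
#     print(top_papers(json.load(f)))
```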

πŸ”§ Troubleshooting

Ollama Issues

# Check if Ollama is running
ollama list

# Start Ollama service
ollama serve

# Pull a smaller model if needed
ollama pull llama2:7b-chat

Transformers Issues

# Install PyTorch if needed
pip install torch torchvision torchaudio

# Try a smaller model
# Edit .env: TRANSFORMERS_MODEL=distilbert-base-uncased

Search Issues

# If getting rate limited, add GitHub token to .env
GITHUB_TOKEN=your_token_here

# For Google Scholar blocks, the agent will fall back to arXiv

πŸ“ Examples

Research Query Examples

# Find papers and code for specific topics
python main.py "graph neural networks for drug discovery"
python main.py "attention mechanisms in computer vision"
python main.py "reinforcement learning robotics"
python main.py "natural language processing transformers"

# Academic paper investigation
python main.py "BERT model architecture"
python main.py "GPT transformer decoder"
python main.py "ResNet convolutional networks"

Advanced Usage

from research_agent import ResearchAgent

# Custom configuration
agent = ResearchAgent(max_papers=20, max_repos=10)

# Multiple queries
queries = [
    "federated learning privacy",
    "quantum machine learning",
    "explainable AI methods"
]

for query in queries:
    results = agent.search(query)
    print(f"Query: {query}")
    print(f"Papers: {len(results['papers'])}")
    print(f"Repos: {len(results['github_repos'])}")
    print("-" * 40)

agent.close()

πŸ”„ How It Works

  1. Query Processing: LLM or rule-based system expands and refines the research query
  2. Paper Search: Searches arXiv and Google Scholar for relevant papers
  3. Connected Papers: Uses paper IDs to find related works and citations
  4. GitHub Search: Searches for repositories using paper titles and keywords
  5. Relevance Scoring: Ranks results by relevance to the original query
  6. Summary Generation: LLM provides insights and summary of findings
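The six steps above can be sketched as a single pipeline function. The stage functions here are trivial stand-ins to show the data flow; in the project the real logic lives in llm_processor.py, paper_searcher.py, connected_papers.py, and github_searcher.py:

```python
def expand_query(query):                 # 1. query processing (stub)
    return query.lower().strip()

def search_papers(query):                # 2. arXiv / Scholar search (stub)
    return [{"title": f"Paper on {query}", "relevance_score": 1.0}]

def find_related(papers):                # 3. connected-papers lookup (stub)
    return []

def search_github(query):                # 4. GitHub repository search (stub)
    return []

def rank(items):                         # 5. relevance scoring
    return sorted(items, key=lambda p: p["relevance_score"], reverse=True)

def summarize(results):                  # 6. summary generation (stub)
    return f"Found {len(results['papers'])} paper(s) for '{results['query']}'"

def run_pipeline(query):
    q = expand_query(query)
    papers = rank(search_papers(q))
    return {
        "query": q,
        "papers": papers,
        "related_papers": find_related(papers),
        "github_repos": search_github(q),
        "llm_summary": summarize({"query": q, "papers": papers}),
    }
```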

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make changes and test thoroughly
  4. Submit a pull request

πŸ“„ License

MIT License - see LICENSE file for details.

πŸš€ Future Enhancements

  • Support for more paper databases (PubMed, IEEE, ACM)
  • Citation network visualization
  • Paper PDF analysis and extraction
  • More local LLM providers (LM Studio, GPT4All)
  • Web interface for non-technical users
  • Paper recommendation system
  • Research trend analysis

πŸ™ Acknowledgments


Ready to explore research? Start with:

python main.py "your research topic here"
