9F: How to Use AI for Free and Privately - A Local Installation Guide for Multiple AI Models with AnythingLLM
Welcome to my presentation on AI tools and their applications. I've been working with AI systems since 2023, exploring their capabilities and limitations across different platforms.
JP
by Johnny Phung
1st Half Agenda
1. What is the GPT in ChatGPT?
2. There's more to life than ChatGPT
3. Getting to know AnythingLLM
4. Demo Time: Installation and Setup
5. Basic API Configuration
2nd Half Agenda
1. Tokens, inference, and context windows, oh my!
2. System Hardware Requirements for Local Inference
3. Self-Hosting Requirements and Benefits
4. Agent Flow Overview
5. MCP Overview
A Brief History of ChatGPT (Generative Pre-trained Transformer)
2017: The Foundation
Google researchers publish the "Attention Is All You Need" paper, introducing the transformer architecture.
2018: GPT-1 Release
OpenAI releases the first GPT model, demonstrating generative pre-training on large text corpora.
2019-2020: Rapid Evolution
GPT-2 launches in 2019, followed by GPT-3 in 2020.
2022: ChatGPT
ChatGPT, built on GPT-3.5, launches in November 2022 and brings conversational AI to the mainstream.
Beyond ChatGPT: Alternative AI Assistants
Claude Desktop
Anthropic's conversational AI with strong reasoning capabilities and ethical guardrails.
Long context window
Document analysis
Low hallucination rate
Google AI Studio
Google's platform for accessing Gemini models with integrated development tools.
Multimodal capabilities
Search integration
Code generation focus
Other Contenders
The field continues to expand with specialized and open-source alternatives.
Mistral AI
Microsoft Copilot
Llama, Gemma and Phi models
Deepseek and Qwen Models
AnythingLLM: The All-in-One Solution
Document Support
Works with PDFs, Word documents, CSVs, and PowerPoint files.
LLM Flexibility
Compatible with almost any LLM, local or cloud-based.
Privacy-Focused
Everything stored and run locally by default.
Multi-Modal
Supports text, images, and audio in one interface.
AnythingLLM offers both free desktop and self-hosted options, plus cloud plans starting at $50/month for individuals.
Getting Started with AnythingLLM
Visit the Website
Go to anythingllm.com to access all installation options.
Choose Your Version
Select between Desktop (free), Docker, or Cloud-hosted options.
Download and Install
For desktop users, download the application and follow setup prompts.
Connect Your LLM
Configure your preferred LLM connection, local or cloud-based.
The desktop version offers the easiest way to get started without technical configuration.
Demo Time and Exit Strategy
AnythingllmWorkflow.pdf
Click to open the step-by-step guide shown in the live demo.
Understanding AI Core Concepts
Tokens
Words or word fragments that AI models process. GPT models read text as chunks, not complete words.
Inference
The process of AI generating responses. This is where computational resources are consumed when using AI.
Context Window
The amount of text an AI can "remember" during a conversation. Larger windows allow more comprehensive analysis.
These concepts directly impact AI performance and costs. Different models offer varying token limits and inference speeds, affecting their practical applications.
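To make the token idea concrete, here is a toy greedy longest-match tokenizer. It is a heavy simplification of the BPE-style tokenizers GPT models actually use, and the vocabulary is invented for the example:

```python
def tokenize(word, vocab):
    """Greedy longest-prefix matching: the core idea behind
    BPE-style subword tokenizers, heavily simplified."""
    tokens = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest prefix first
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # unknown single chars pass through
                tokens.append(piece)
                i = j
                break
    return tokens

# A made-up vocabulary; real GPT vocabularies hold ~100k learned pieces.
VOCAB = {"token", "ization", "un", "believ", "able"}

print(tokenize("tokenization", VOCAB))  # ['token', 'ization']
print(tokenize("unbelievable", VOCAB))  # ['un', 'believ', 'able']
```

The takeaway: the model never sees "tokenization" as one unit, it sees two pieces, which is why token counts rather than word counts drive context-window limits and API pricing.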
Comparing Models:
GPT-4.1 (OpenAI)
Context Window: 1 Million tokens
Requires cloud access via API
No local hardware option
Best for document analysis
Pricing: ~$2 input / ~$8 output per million tokens
Claude 3.7 Sonnet (Anthropic)
Context Window: 200K tokens
Cloud-only via API or Bedrock
Superior for long-form reasoning
Best for code generation
Pricing: ~$3 input / ~$15 output per million tokens
Gemini 2.5 Pro (Google)
Context Window: Up to 1M tokens
Available via Google Cloud
Excels at multimodal tasks
Strong Google Workspace integration
Pricing: ~$2.50-$15 (varies by usage)
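Since all three vendors price per million tokens, rough cost estimates are simple arithmetic. A sketch using the approximate GPT-4.1 rates quoted above:

```python
def api_cost_usd(input_tokens, output_tokens, in_price, out_price):
    """Prices are USD per million tokens, as quoted on this slide."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# A 50k-token document plus a 5k-token answer at GPT-4.1's ~$2/~$8 rates:
print(api_cost_usd(50_000, 5_000, in_price=2.0, out_price=8.0))  # about $0.14
```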
Retrieval Augmented Generation (RAG)
Retrieval Augmented Generation combines AI language models with external knowledge retrieval to provide more accurate, factual responses.
Document Retrieval
System searches through your documents to find relevant information based on user queries.
Vector Storage
Retrieved information is stored in embeddings that capture semantic meaning beyond keywords.
Context Integration
Relevant documents are inserted into the AI's context window alongside the query.
Response Generation
LLM generates answers based on both retrieved information and its trained capabilities.
This approach dramatically reduces hallucinations while allowing AI to reference your specific knowledge base rather than relying solely on its training data.
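A minimal sketch of the four steps above, using word counts as stand-in embeddings. Real systems use learned embedding models and a vector database, and the document text here is invented for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (word counts); real RAG systems
    use learned embedding models instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

DOCS = [
    "AnythingLLM stores document embeddings in a local vector database.",
    "VRAM requirements scale with the size of the model.",
]

def retrieve(query, docs, k=1):
    """Steps 1-2: rank stored documents by similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    """Step 3: insert the retrieved text into the model's context window."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("Where are document embeddings stored?", DOCS))
```

Step 4 is then just sending the assembled prompt to the LLM: the model answers from the supplied context rather than from memory alone.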
Understanding Embeddings & Vector Storage
1. Embedding Models
Convert text into numerical vector representations that capture semantic meaning.
nomic-embed-text-v1.5: Open-source model with strong performance
OpenAI Embeddings: High-quality but API-dependent
Creates "understanding" for AI systems
2. Vector Databases
Specialized storage systems optimized for vector similarity search.
LanceDB: Lightweight, embedded vector database
Pinecone: Cloud-native vector search service
Enable semantic search capabilities
3. Why They Matter
Essential infrastructure for AI applications like AnythingLLM.
Powers document retrieval
Enables context-aware responses
Reduces token usage and costs
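"Semantic meaning beyond keywords" comes down to vector geometry: similar texts get nearby vectors, usually compared with cosine similarity. A sketch with made-up 4-dimensional vectors (real embedding models such as nomic-embed output hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Hypothetical embeddings, invented for the example.
cat     = [0.9, 0.1, 0.0, 0.2]
kitten  = [0.85, 0.15, 0.05, 0.25]
invoice = [0.0, 0.1, 0.9, 0.0]

print(cosine_similarity(cat, kitten))   # close to 1.0: semantically similar
print(cosine_similarity(cat, invoice))  # close to 0.0: unrelated
```

This is the comparison a vector database like LanceDB runs at scale: it indexes millions of vectors so the nearest neighbors of a query can be found without scanning everything.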
System Hardware Requirements for Local AI
7GB
7B Model
Entry-level models like Llama 2 7B require minimal VRAM for inference.
14GB
13B Model
Mid-tier models like Llama 2 13B need consumer-grade GPUs.
24GB+
70B Model
Large models demand high-end GPUs like RTX 4090 or A100.
VRAM requirements scale directly with model size. Each billion parameters typically needs 1-2 GB of VRAM for inference, depending on numeric precision.
Quantization techniques can reduce these requirements by up to 75%, enabling larger models on consumer hardware.
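The scaling rule above can be written as a weights-only estimate. Treat it as a floor: runtime overhead, activations, and the KV cache add more on top:

```python
def approx_vram_gb(params_billion, bits=16):
    """Weights-only VRAM estimate: parameter count x bytes per parameter."""
    return params_billion * bits / 8

print(approx_vram_gb(7, bits=8))    # 7.0 GB  -- the slide's 7B figure
print(approx_vram_gb(70, bits=16))  # 140.0 GB -- why 70B needs multiple GPUs
print(approx_vram_gb(70, bits=4))   # 35.0 GB  -- 4-bit quantization, a 75% cut
```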
Self-Hosting AnythingLLM: Requirements & Benefits
AnythingLLM Docker supports both single and multi-user environments with local LLMs, RAG, and Agents—all with minimal configuration and complete privacy.
Docker hosting works locally or in cloud environments like Hostinger or Railway. Choose Docker for team collaboration, browser access, and public-facing features.
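A minimal sketch of the Docker route. The image name and port below match the project's published instructions, but verify flags and paths against the current AnythingLLM Docker docs before relying on them:

```shell
# Keep workspaces, documents, and vector data outside the container
export STORAGE_LOCATION="$HOME/anythingllm"
mkdir -p "$STORAGE_LOCATION"

docker run -d -p 3001:3001 \
  -e STORAGE_DIR="/app/server/storage" \
  -v "$STORAGE_LOCATION":/app/server/storage \
  mintplexlabs/anythingllm

# The web UI should then be reachable at http://localhost:3001
```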
Agent Flow Overview: No-Code AI Capabilities
Agent Flows
A no-code visual interface for building custom AI capabilities without programming.
Drag-and-drop interface
Built for everyone
Available in Docker and Desktop versions
What is the AnythingLLM Community Hub?
The AnythingLLM Community Hub is a platform and marketplace for AnythingLLM users to share system prompts, slash commands, agent skills, and more.
The community hub enables you to share your own items, skills, and workflows with the AnythingLLM community both publicly and privately.
Model Context Protocol (MCP)
An open-source protocol developed by Anthropic for AI integration. It creates standardized connections between LLM applications and external tools.
Key Benefits
Enables seamless access to data sources without custom coding. Perfect for AI-powered IDEs, chat interfaces, and custom workflows.
AnythingLLM Integration
Fully supports all MCP tools for use with AI Agents. Works with existing MCP-compatible tools right out of the box.
MCP expands AnythingLLM's capabilities by standardizing how your AI agents connect with external resources. This open protocol ensures compatibility with a growing ecosystem of tools.
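Concretely, MCP servers are declared in a JSON config using the standard mcpServers format. The sketch below wires in Anthropic's reference filesystem server; the config file's name and location depend on your AnythingLLM version (check the current MCP docs), and the directory path is a placeholder:

```json
{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/allowed/dir"]
    }
  }
}
```

Once a server is registered, its tools should appear to AnythingLLM agents alongside the built-in skills.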