A
- AI Agent
- An autonomous AI system that can perceive its environment, make decisions, and take actions to achieve specific goals. Unlike chatbots, agents can use tools, execute code, browse the web, and complete multi-step tasks independently. Learn more →
- Agentic Workflow
- A workflow where AI agents autonomously complete tasks with minimal human intervention. The agent decides which tools to use, what information to gather, and how to proceed based on the situation.
- API (Application Programming Interface)
- A set of protocols that allows different software applications to communicate. In AI, APIs like OpenAI's GPT API let you integrate language models into your applications.
C
- Chunking
- The process of breaking down large documents into smaller, manageable pieces for processing by an LLM. Proper chunking is crucial for RAG systems to retrieve relevant context without exceeding token limits.
- Context Window
- The maximum amount of text (measured in tokens) that an LLM can process in a single request. GPT-4 Turbo has a 128K token context window, while Claude 3 supports up to 200K tokens.
- Chain of Thought (CoT)
- A prompting technique where you ask the model to show its reasoning step-by-step before giving a final answer. This often improves accuracy on complex reasoning tasks.
E
- Embeddings
- Numerical vector representations of text that capture semantic meaning. Similar concepts have similar embeddings, enabling semantic search and retrieval. Essential for RAG systems.
- Enterprise RAG
- RAG systems designed for enterprise use cases with additional requirements: role-based access control, audit logging, data governance, scalability, and integration with existing enterprise systems. Learn more →
F
- Fine-Tuning
- Training a pre-trained model on your specific data to improve its performance on your use case. More expensive than RAG but can be necessary for specialized domains or formatting requirements. Learn more →
- Function Calling
- A capability where LLMs can output structured JSON to call external functions or APIs. Enables AI agents to interact with databases, send emails, or perform actions in external systems.
G
- Guardrails
- Safety measures that constrain AI behavior to prevent harmful, off-topic, or incorrect outputs. Includes content filtering, topic restrictions, and output validation.
- GPT (Generative Pre-trained Transformer)
- A family of large language models developed by OpenAI. The architecture uses transformers and is trained on massive text datasets to generate human-like text.
H
- Hallucination
- When an AI model generates information that sounds plausible but is factually incorrect or made up. RAG systems help reduce hallucinations by grounding responses in retrieved documents.
- Human-in-the-Loop (HITL)
- A system design where humans review, approve, or correct AI outputs before they're used. Common in high-stakes applications like healthcare, legal, and financial services.
L
- LLM (Large Language Model)
- AI models trained on massive text datasets that can understand and generate human-like text. Examples include GPT-4, Claude, Gemini, and Llama. The foundation for most modern AI applications.
- LangChain
- A popular open-source framework for building LLM applications. Provides abstractions for chains, agents, memory, and retrieval, making it easier to build complex AI workflows.
- LoRA (Low-Rank Adaptation)
- An efficient fine-tuning technique that adds small trainable layers to a frozen model. Dramatically reduces the compute and memory needed for fine-tuning while maintaining quality.
M
- Multimodal AI
- AI models that can process and generate multiple types of data: text, images, audio, and video. GPT-4V and Gemini are examples of multimodal models.
- Model-as-Judge
- Using an LLM to evaluate the quality of outputs from another LLM. A scalable alternative to human evaluation for testing and monitoring AI systems.
N
- NLP (Natural Language Processing)
- The field of AI focused on enabling computers to understand, interpret, and generate human language. LLMs are the current state-of-the-art approach to most NLP tasks.
P
- Prompt Engineering
- The practice of designing and optimizing prompts to get better outputs from LLMs. Includes techniques like few-shot learning, chain of thought, and system prompts.
- Prompt Injection
- A security vulnerability where malicious users craft inputs that override or manipulate the system prompt, causing the AI to behave unexpectedly or reveal sensitive information.
R
- RAG (Retrieval-Augmented Generation)
- A technique that enhances LLM responses by first retrieving relevant documents from a knowledge base, then using them as context for generation. Reduces hallucinations and enables AI to access up-to-date or proprietary information. Learn more →
- Reranking
- A second-stage retrieval step that uses a more sophisticated model to reorder initial search results by relevance. Improves RAG accuracy, especially for complex queries.
S
- Semantic Search
- Search that understands meaning rather than just matching keywords. Uses embeddings to find documents that are conceptually similar to the query, even if they don't share exact words.
- System Prompt
- Instructions given to an LLM that define its behavior, personality, and constraints. The system prompt persists across the conversation and shapes all responses.
T
- Tokens
- The basic units that LLMs use to process text. A token is roughly 4 characters or ¾ of a word in English. API pricing and context limits are measured in tokens.
- Transformer
- The neural network architecture behind modern LLMs. Uses attention mechanisms to process sequences in parallel, enabling training on massive datasets. Introduced in the 2017 paper "Attention Is All You Need."
- Temperature
- A parameter that controls randomness in LLM outputs. Lower temperature (0-0.3) produces more focused, deterministic responses. Higher temperature (0.7-1.0) produces more creative, varied outputs.
V
- Vector Database
- A database optimized for storing and querying embeddings (vectors). Essential for RAG systems. Popular options include Pinecone, Weaviate, Qdrant, and pgvector.
- Vector Search
- Finding similar items by comparing their vector representations. Uses algorithms like HNSW or IVF to efficiently search through millions of vectors.
Z
- Zero-Shot Learning
- When an LLM performs a task it wasn't explicitly trained for, without any examples. Modern LLMs excel at zero-shot tasks due to their broad training. Contrast with few-shot learning, where you provide examples in the prompt.