GenAI with LLMs

Note

This blog post is a compendium of my notes from the online Coursera course Generative AI with LLMs. It covers, in detail, how to select and tune LLMs for your specific use case.

Generative AI Project Lifecycle

The course outlines a project lifecycle for incorporating an LLM into your application.

LLM Optimisation Techniques

|  | Pre-training | Prompt Engineering | Prompt Tuning and Fine-tuning | Reinforcement Learning from Human Feedback (RLHF) | Compression / Optimisation / Deployment |
| --- | --- | --- | --- | --- | --- |
| Training duration | Days to weeks to months | Not required | Minutes to hours | Minutes to hours, similar to fine-tuning | Minutes to hours |
| Customisation | Determine model architecture, size and tokeniser; choose vocabulary size and number of input/context tokens; requires a large amount of domain training data | No change to model weights; only the prompt is customised | Tune for specific tasks; add domain-specific data; update LLM or adapter weights | Needs a separate reward model to align with human goals (helpful, honest, harmless); updates LLM or adapter weights | Reduce model size through model pruning, weight quantization or distillation; smaller size, faster inference |
| Objective | Next-token prediction | Increase task performance | Increase task performance | Increase alignment with human preferences | Increase inference performance |
| Expertise | High | Low | Medium | Medium-High | Medium |
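
To make the fine-tuning column concrete: a common way to "update adapter weights" rather than all model weights is to attach low-rank (LoRA) adapters. Below is a minimal sketch assuming the Hugging Face transformers and peft libraries; the base model name and hyperparameters are illustrative, not from the course.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal LM works the same way.
base = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling applied to the adapter output
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

Because only the small adapter matrices receive gradients, this is what keeps the training duration in the "minutes to hours" range rather than the weeks needed for pre-training.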
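
Likewise, for the compression column, PyTorch's post-training dynamic quantization illustrates the "weight quantization" idea in a few lines. This is a toy sketch on a stand-in model, not an actual LLM:

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM's feed-forward layers.
model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Convert Linear weights from fp32 to int8; activations are quantized
# dynamically at inference time, trading a little accuracy for a
# smaller model and faster CPU inference.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
print(quantized(x).shape)  # same interface, smaller weights
```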

Using LLMs in Applications

Retrieval Augmented Generation (RAG)

Considerations

  1. The retrieved text plus the prompt must fit into the model's context window
    1. Large documents must be split into smaller chunks
  2. Retrieval is ranked by relevance, computed from embedding vectors (see the sketch after this list)
    1. Requires a vector database to store and search the embeddings
    2. Generated text can include a citation back to the original source document
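
The sketch below shows the retrieval step, with a toy hashing embedding standing in for a real embedding model and a plain Python list standing in for a vector database; all names and values here are illustrative assumptions, not part of the course material.

```python
import hashlib
import numpy as np

DIM = 64  # embedding dimensionality (toy value)

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words hashing embedding; stands in for a real model."""
    vec = np.zeros(DIM)
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def split_document(doc: str, max_words: int = 100) -> list[str]:
    """Split a large document into chunks that fit the context window."""
    words = doc.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

# The index pairs each chunk with its embedding and source document id,
# so generated answers can cite the original document. In production
# this would live in a vector database.
documents = {"doc-1": "retrieval augmented generation grounds answers in documents"}
index = [(doc_id, chunk, embed(chunk))
         for doc_id, doc in documents.items()
         for chunk in split_document(doc)]

def retrieve(query: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k chunks most relevant to the query, with citation ids."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: -float(q @ item[2]))
    return [(doc_id, chunk) for doc_id, chunk, _ in ranked[:k]]

print(retrieve("how does retrieval grounding work?"))
```

The retrieved chunks are then prepended to the prompt, subject to the context-window limit from consideration 1, and the document ids allow the generated answer to cite its sources.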