GenAI with LLMs
Note
This blog post is a compendium of my notes from the online Coursera course. It covers in detail how to select and tune LLMs for your specific use case.
Generative AI Project Lifecycle
The course outlines a project lifecycle for incorporating an LLM into your application.
LLM Optimisation Techniques
| | Pre-training | Prompt Engineering | Prompt tuning and fine-tuning | Reinforcement learning from human feedback (RLHF) | Compression / Optimisation / Deployment |
|---|---|---|---|---|---|
| Training Duration | Days to weeks to months | Not required | Minutes to hours | Minutes to hours, similar to fine-tuning | Minutes to hours |
| Customisation | Determine model architecture, size and tokeniser; choose vocabulary size and number of input/context tokens; requires a large amount of domain training data | No change to model weights; only the prompt is customised | Tune for specific tasks; add domain-specific data; update LLM model or adapter weights | Needs a separate reward model to align with human goals (helpful, honest, harmless); update LLM model or adapter weights | Reduce model size through model pruning, weight quantisation or distillation; smaller size, faster inference |
| Objective | Next-token prediction | Increase task performance | Increase task performance | Increase alignment with human preferences | Increase inference performance |
| Expertise | High | Low | Medium | Medium-high | Medium |
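As one concrete illustration of the fine-tuning column above, the sketch below applies parameter-efficient fine-tuning with LoRA, so only small adapter weights are updated rather than the full model. It assumes the Hugging Face `transformers` and `peft` libraries; the `gpt2` checkpoint and all hyperparameters are illustrative choices, not prescribed by the course.

```python
# A minimal LoRA sketch, assuming the Hugging Face `transformers` and
# `peft` libraries; model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # assumption: any causal LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA injects small trainable low-rank matrices into the attention
# layers, so only the adapter weights are updated, not the base model.
lora_config = LoraConfig(
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the adapter output
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because only the adapter matrices are trained, memory and compute costs are a small fraction of full fine-tuning, which is what makes the "minutes to hours" training duration in the table achievable.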
Using LLMs in Applications
Retrieval Augmented Generation (RAG)
Considerations
- The RAG response, including the retrieved context, must fit into the model's context window
- Large documents must be split into smaller chunks
- Chunks are ranked by relevance, typically via embedding-vector similarity (see the sketch after this list)
- Requires a vector database, or another vector store, for efficient similarity search
- The generated text can include a citation back to the original source document
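The sketch below ties these considerations together: splitting a document into chunks, ranking chunks by embedding similarity, and building a prompt that can cite its sources. It uses the `sentence-transformers` library, with brute-force cosine similarity standing in for a real vector database; the embedding model, chunk size and prompt template are illustrative assumptions, not from the course.

```python
# A minimal RAG retrieval sketch, assuming `sentence-transformers` and
# numpy; brute-force similarity stands in for a vector database.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def split_document(text: str, chunk_size: int = 500) -> list[str]:
    """Split a large document into fixed-size chunks so each retrieved
    passage fits comfortably inside the LLM's context window."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

documents = {"handbook.txt": "..."}  # source corpus (placeholder content)
chunks = [(doc_id, chunk)
          for doc_id, text in documents.items()
          for chunk in split_document(text)]
chunk_vectors = embedder.encode([c for _, c in chunks])  # (n_chunks, dim)

def retrieve(query: str, top_k: int = 3) -> list[tuple[str, str]]:
    """Rank chunks by cosine similarity to the query embedding and
    return the top_k (doc_id, chunk) pairs for the prompt."""
    q = embedder.encode([query])[0]
    sims = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]

# Build an augmented prompt whose answer can cite the source documents.
query = "What is the refund policy?"
context = "\n".join(f"[{doc_id}] {chunk}" for doc_id, chunk in retrieve(query))
prompt = (f"Answer using only the context below. Cite sources.\n"
          f"{context}\n\nQuestion: {query}")
```

Prefixing each chunk with its document ID is what lets the generated answer cite the original document; a production system would swap the brute-force loop for a vector database to keep retrieval fast at scale.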