GenAI with LLMs
Note
This blog post is a compendium of my notes from the online Coursera course. It covers in detail how to select and tune LLMs for your specific use case.
Generative AI Project Lifecycle
The course outlines a project lifecycle for incorporating an LLM into your application.
LLM Optimisation Techniques
| | Pre-training | Prompt Engineering | Prompt tuning and fine-tuning | Reinforcement learning from human feedback (RLHF) | Compression / Optimisation / Deployment |
|---|---|---|---|---|---|
| Training Duration | Days to weeks to months | Not required | Minutes to hours | Minutes to hours, similar to fine-tuning | Minutes to hours |
| Customisation | Determine model architecture, size and tokeniser; choose vocabulary size and number of input/context tokens; requires a large amount of domain training data | No change to model weights; only the prompt is customised | Tune for specific tasks; add domain-specific data; update LLM model or adapter weights | Needs a separate reward model to align with human goals (helpful, honest, harmless); update LLM model or adapter weights | Reduce model size through model pruning, weight quantisation or distillation; smaller size, faster inference |
| Objective | Next-token prediction | Increase task performance | Increase task performance | Increase alignment with human preferences | Increase inference performance |
| Expertise | High | Low | Medium | Medium-high | Medium |
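As one concrete illustration of the fine-tuning column above, the sketch below applies parameter-efficient fine-tuning with LoRA, so only small adapter weights are updated rather than the full model. It assumes the Hugging Face `transformers` and `peft` libraries; the `gpt2` checkpoint and all hyperparameters are illustrative choices, not prescribed by the course.

```python
# A minimal LoRA sketch, assuming the Hugging Face `transformers` and
# `peft` libraries; model name and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "gpt2"  # assumption: any causal LM checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# LoRA injects small trainable low-rank matrices into the attention
# layers, so only the adapter weights are updated, not the base model.
lora_config = LoraConfig(
    r=8,                # rank of the low-rank update matrices
    lora_alpha=16,      # scaling factor applied to the adapter output
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because only the adapter matrices are trained, memory and compute costs are a small fraction of full fine-tuning, which is what makes the "minutes to hours" training duration in the table achievable.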
Using LLMs in Applications
Retrieval Augmented Generation (RAG)
Considerations
- The RAG response, including the retrieved context, must fit into the model's context window
- Large documents must be split into smaller chunks
- Chunks are ranked by relevance, typically via embedding-vector similarity (see the sketch after this list)
- Requires a vector database, or another vector store, for efficient similarity search
- The generated text can include a citation back to the original source document
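The sketch below ties these considerations together: splitting a document into chunks, ranking chunks by embedding similarity, and building a prompt that can cite its sources. It uses the `sentence-transformers` library, with brute-force cosine similarity standing in for a real vector database; the embedding model, chunk size and prompt template are illustrative assumptions, not from the course.

```python
# A minimal RAG retrieval sketch, assuming `sentence-transformers` and
# numpy; brute-force similarity stands in for a vector database.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def split_document(text: str, chunk_size: int = 500) -> list[str]:
    """Split a large document into fixed-size chunks so each retrieved
    passage fits comfortably inside the LLM's context window."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

documents = {"handbook.txt": "..."}  # source corpus (placeholder content)
chunks = [(doc_id, chunk)
          for doc_id, text in documents.items()
          for chunk in split_document(text)]
chunk_vectors = embedder.encode([c for _, c in chunks])  # (n_chunks, dim)

def retrieve(query: str, top_k: int = 3) -> list[tuple[str, str]]:
    """Rank chunks by cosine similarity to the query embedding and
    return the top_k (doc_id, chunk) pairs for the prompt."""
    q = embedder.encode([query])[0]
    sims = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q))
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]

# Build an augmented prompt whose answer can cite the source documents.
query = "What is the refund policy?"
context = "\n".join(f"[{doc_id}] {chunk}" for doc_id, chunk in retrieve(query))
prompt = (f"Answer using only the context below. Cite sources.\n"
          f"{context}\n\nQuestion: {query}")
```

Prefixing each chunk with its document ID is what lets the generated answer cite the original document; a production system would swap the brute-force loop for a vector database to keep retrieval fast at scale.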