If you need help with anything quantization or ML related (e.g. debugging code), feel free to book a 30-minute consultation session: https://calendly.com/oscar-savolainen. I'm also available for long-term freelance work, e.g. training / productionizing models, teaching AI concepts, etc.

*Video Summary:*
In this video we speedrun deploying an LLM embedding model from Hugging Face into production on a GPU instance. We spin up a Runpod GPU instance, expose the required ports, install the Infinity inference engine, and make API requests to the server to obtain text embeddings.
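
As a rough sketch of the final step (the video walks through the full setup interactively), here is how one might query such a server once it is running. This assumes an Infinity server was started on the pod (e.g. via `pip install "infinity-emb[all]"` and `infinity_emb v2 --model-id <model> --port 7997`); the Runpod proxy URL, model ID, port, and the `/embeddings` path below are placeholders / Infinity defaults, not values taken from the video, and the exact endpoint path can vary by Infinity version:

```python
import requests

# Placeholder URL: Runpod typically exposes a pod's HTTP port via a proxy URL of this shape.
INFINITY_URL = "https://<your-runpod-pod-id>-7997.proxy.runpod.net"
# Placeholder model ID: use whichever Hugging Face embedding model the server was started with.
MODEL_ID = "BAAI/bge-small-en-v1.5"

def get_embeddings(texts):
    """Request embeddings from Infinity's OpenAI-compatible embeddings endpoint."""
    response = requests.post(
        f"{INFINITY_URL}/embeddings",
        json={"model": MODEL_ID, "input": texts},
        timeout=30,
    )
    response.raise_for_status()
    # Each item in "data" carries an "embedding": a list of floats for one input text.
    return [item["embedding"] for item in response.json()["data"]]

if __name__ == "__main__":
    vectors = get_embeddings(["Hello world", "Deploying embedding models on a GPU"])
    print(len(vectors), "embeddings, dimension", len(vectors[0]))
```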