Friday, April 26, 2024

RAG using Milvus, HuggingFace, LangChain, Ragas, with or without OpenAI


Christy Bergman, Developer Advocate, Zilliz

Abstract: You’ve heard good data matters in Machine Learning, but does it matter for Generative AI applications? Corporate data often differs significantly from the general Internet data used to train most foundation models. Join me for a Python demo tutorial on building a customizable RAG (Retrieval Augmented Generation) stack using OSS Milvus vector database, LangChain, Ragas, HuggingFace, and optional Zilliz cloud and OpenAI. Learn best practices and advanced techniques to optimize GenAI workflows with your own data.

What you’ll learn:
* Using Python, learn how to build a customizable open source RAG (Retrieval Augmented Generation) chatbot with Milvus vector database, LangChain, Ragas, and HuggingFace models, and optional Zilliz cloud and OpenAI.
* Best practices around embedding text data ("embedding" in AI is like "featurization" in ML).
* Best practices around vector indexing and search.
* Best practices around RAG evaluation with Ragas.

Tutorial notebook link will be linked here: https://github.com/milvus-io/bootcamp/tree/master/bootcamp
Tutorial instructions like this but more focused on running locally: https://docs.google.com/document/d/1yetuGEkYqh_1rAYEBXFAnwsFClMAIQFx1erLHHKXTLg
Slides like these: https://docs.google.com/presentation/d/1hpiaiVMHm4oQr5P86NhcrL0qXwBIdWhHEZOlWKySHyM

Speaker Bio: 6+ years building AI and ML systems with math and coding. My mission is to help developers and customers use those tools (with fewer heartaches than I had teaching myself) to organize and search unstructured data, such as images, videos, texts, and audios, using LLM and multi-modal apps. I enjoy learning new technologies and tools and solving challenging problems with math and coding. As a Developer Advocate, I use my skills in Python, HuggingFace, PyTorch, Spark, RLlib, Ray distributed computing, and vector databases to create and share engaging and informative content, such as tutorials, demos, blogs, and talks.

I also manage the Bay Area Unstructured Data meetup group, where I organize events and foster a community of enthusiasts and experts in the field. Outside of work, I enjoy hiking and bird watching. In my background photo: Australian bustard, spotted near Cairns, Australia.

https://www.meetup.com/sf-bay-acm/events/300144563/

Pre-trimmed times:
0:00 Chapter Intro
3:17 Speaker Intro
Gap, bad audio & missing first few slides, see link above
10:22 Presentation
