GPT Discussion

Architecture
Document Loading
Document Chunking
OpenAI Embeddings
Pinecone
LLM

Slide 1

SYSTEM ARCHITECTURE

The system architecture consists of the components that let us build a bot that answers questions from our own documents: document loading, document chunking, OpenAI embeddings, a Pinecone vector index, and an LLM.

Slide 2

DOCUMENT LOADING

This step loads the source PDF into LangChain document objects.


                # Load your data
                from langchain.document_loaders import UnstructuredPDFLoader

                loader = UnstructuredPDFLoader("field-guide-to-data-science.pdf")
                data = loader.load()
                print("Done with loading the pdf file...")

Slide 3

DOCUMENT CHUNKING

This step splits the loaded documents into smaller chunks of up to 1,000 characters with no overlap.


                # Chunk your data up into smaller documents
                from langchain.text_splitter import RecursiveCharacterTextSplitter

                text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
                texts = text_splitter.split_documents(data)
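
To make the chunk_size and chunk_overlap settings concrete, here is a minimal sketch in plain Python. It is a simplified stand-in that cuts fixed-size character windows; RecursiveCharacterTextSplitter itself first tries to split on separators such as paragraphs and line breaks, so its chunks will differ. The function name chunk_text and the sample string are assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Cut text into character windows of chunk_size, sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij" * 3  # 30 characters of illustrative text
no_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=5)
print(len(no_overlap), len(with_overlap))
```

With an overlap, neighbouring chunks repeat some characters, which can help when an answer straddles a chunk boundary, at the cost of storing more chunks.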

Slide 4

OPENAI EMBEDDINGS

This step sets up the OpenAI embedding model that turns each chunk into a vector for semantic search.


                # Create embeddings of your documents to get ready for semantic search
                import os

                import pinecone
                from langchain.embeddings.openai import OpenAIEmbeddings
                from langchain.vectorstores import Pinecone

                # Read credentials from the environment; never hard-code real keys in slides
                OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
                PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
                PINECONE_API_ENV = os.environ.get("PINECONE_API_ENV", "northamerica-northeast1-gcp")

                embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

Slide 5

PINECONE

This step connects to Pinecone and stores the chunk embeddings in an index.


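                # Connect to Pinecone using the credentials loaded earlier
                pinecone.init(
                    api_key=PINECONE_API_KEY,
                    environment=PINECONE_API_ENV
                )
                index_name = "datascience-book"

                # Embed each chunk and store it in the named index
                docsearch = Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)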
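
Conceptually, a vector store keeps one vector per chunk and answers a query by ranking the stored vectors by similarity to the query vector. The sketch below illustrates that idea in plain Python, assuming cosine similarity as the metric. TinyVectorIndex, the three-dimensional vectors, and the chunk labels are hypothetical toy values, not Pinecone's API or real embeddings, which have many more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorIndex:
    """Toy in-memory stand-in for a vector store (illustration only)."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def similarity_search(self, query_vector: list[float], k: int = 4) -> list[str]:
        # Rank every stored chunk by similarity to the query, best first
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


index = TinyVectorIndex()
index.add("chunk about data science teams", [0.9, 0.1, 0.0])
index.add("chunk about data maturity stages", [0.1, 0.9, 0.1])
index.add("chunk about tooling", [0.0, 0.2, 0.9])

query_vector = [0.2, 0.8, 0.0]  # pretend embedding of a question about maturity
top = index.similarity_search(query_vector, k=2)
print(top)
```

A hosted index avoids scanning every vector the way this sketch does, which matters once the number of chunks grows large.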

Slide 6

LLM

This step retrieves the chunks most similar to a question and passes them to an LLM to produce an answer.


                query = "What are examples of good data science teams?"
                docs = docsearch.similarity_search(query)

                # Here's an example of the first document that was returned
                print(docs[0].page_content[:450])

                # Query those docs to get your answer back
                from langchain.llms import OpenAI
                from langchain.chains.question_answering import load_qa_chain

                # temperature=0 keeps the answers as deterministic as possible
                llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
                chain = load_qa_chain(llm, chain_type="stuff")

                query = "What is the collect stage of data maturity?"
                docs = docsearch.similarity_search(query)
                answer = chain.run(input_documents=docs, question=query)

                print(f"Question: {query}")
                print(f"Answer: {answer}")
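
The "stuff" chain type places all retrieved chunks into a single prompt together with the question. The sketch below shows that idea in plain Python. The function build_stuff_prompt, the prompt wording, and the placeholder chunks are assumptions for illustration and are not LangChain's actual template.

```python
def build_stuff_prompt(chunks: list[str], question: str) -> str:
    """Concatenate all retrieved chunks into one context block, then append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder strings standing in for the chunks returned by similarity search
retrieved = [
    "first retrieved chunk (placeholder text)",
    "second retrieved chunk (placeholder text)",
]
prompt = build_stuff_prompt(retrieved, "What is the collect stage of data maturity?")
print(prompt)
```

Because every chunk goes into one prompt, this approach only works while the combined text fits within the model's context window.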
