Terms and Conditions Documents of Health Care Benefits

Install the Indexify Extractor SDK, LangChain Retriever, and the Indexify Client

In [ ]:

%%capture
!pip install indexify-extractor-sdk indexify-langchain indexify langchain-openai

Start the Indexify Server

In [ ]:

!./indexify server -d
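
Before moving on, it can help to confirm that the server is accepting connections. The sketch below only checks that a TCP port is open; `is_listening` is a hypothetical helper, and port 8900 is an assumption about the server's default, so adjust it to your setup.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Port 8900 is an assumption about the server's default; change it if yours differs.
print(is_listening("localhost", 8900))
```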

Download an Embedding Extractor

In another terminal, download and start the embedding extractor, which we will use to index text from the insurance PDF document.

In [ ]:

!indexify-extractor download hub://embedding/minilm-l6
!indexify-extractor join-server

Download a Chunking Extractor

In another terminal, download and start the chunking extractor, which splits the extracted text into chunks that are then embedded.

In [ ]:

!indexify-extractor download hub://text/chunking
!indexify-extractor join-server

Download the PDF Extractor

In another terminal, install the necessary dependencies and start the PDF extractor, which we will use to get text, bytes, or JSON out of insurance PDF documents.

Install Poppler on your machine (the command below targets Debian/Ubuntu)

In [ ]:

!sudo apt-get install -y poppler-utils

Download and start the PDF extractor

In [ ]:

!indexify-extractor download hub://pdf/pdf-extractor
!indexify-extractor join-server

Create Extraction Policies

Instantiate the Indexify Client

In [ ]:

from indexify import IndexifyClient, ExtractionGraph
client = IndexifyClient()

Next, create an extraction graph whose policies extract the contents of the insurance PDF, split them into chunks, and embed the chunks.

In [ ]:

extraction_graph_spec = """
name: 'knowledgebase'
extraction_policies:
  - extractor: 'tensorlake/pdf-extractor'
    name: 'pdfextractor'
  - extractor: 'tensorlake/chunk-extractor'
    name: 'chunks'
    content_source: 'pdfextractor'
    input_params:
      chunk_size: 512
      overlap: 150
  - extractor: 'tensorlake/minilm-l6'
    name: 'getembeddings'
    content_source: 'chunks'
"""
extraction_graph = ExtractionGraph.from_yaml(extraction_graph_spec)
client.create_extraction_graph(extraction_graph)
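
The `chunk_size` and `overlap` values in the graph control how the extracted text is split before embedding. As a rough illustration of the idea only, here is a character-based sliding window. This is an assumption about the general technique: the actual chunk extractor may split on separators or tokens rather than raw characters, and `chunk_text` is a hypothetical helper, not part of Indexify.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 150) -> list[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 1000
print([len(c) for c in chunk_text(sample)])  # prints [512, 512, 276]
```

With these settings each chunk repeats the last 150 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.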

Upload an Insurance PDF File

In [ ]:

import requests

# Download the sample policy document and fail early on an HTTP error.
req = requests.get("https://irdai.gov.in/documents/37343/931203/CHIHMGP22132V012122_HEALTH2083.pdf/b5c15df2-8a5a-5927-10d8-c3d136055139?version=1.1&t=1668769248703&download=true", timeout=60)
req.raise_for_status()

with open('HEALTH2083.pdf','wb') as f:
    f.write(req.content)

In [ ]:

client.upload_file(path="HEALTH2083.pdf")

What is happening behind the scenes

Indexify reacts to ingestion events by evaluating all existing policies and triggering the extractors that apply. Once the PDF extractor has extracted text, bytes, and JSON from the document, its output is passed to the chunk extractor, and the resulting chunks are passed to the embedding extractor, which extracts embeddings and populates an index.
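
To make the chaining concrete: each policy in the graph names its input through `content_source`, so the three policies form a pipeline. The toy sketch below is not Indexify's scheduler; `policy_order` is a hypothetical helper, and representing a policy without a `content_source` as `None` is an assumption made for illustration. It derives the order in which the policies would fire for a newly uploaded file.

```python
policies = [
    {"name": "pdfextractor", "content_source": None},
    {"name": "chunks", "content_source": "pdfextractor"},
    {"name": "getembeddings", "content_source": "chunks"},
]


def policy_order(policies: list[dict]) -> list[str]:
    """Return policy names so that each one comes after its content_source."""
    ordered: list[str] = []
    remaining = list(policies)
    while remaining:
        # A policy is ready when it reads the raw upload or its source has already run.
        ready = [p for p in remaining
                 if p["content_source"] is None or p["content_source"] in ordered]
        if not ready:
            raise ValueError("content_source refers to an unknown policy or forms a cycle")
        for p in ready:
            ordered.append(p["name"])
            remaining.remove(p)
    return ordered


print(policy_order(policies))  # prints ['pdfextractor', 'chunks', 'getembeddings']
```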

With Indexify, you can upload hundreds of insurance PDF files at once, and the platform handles extraction and indexing of their contents without manual intervention. To speed up extraction, you can deploy multiple instances of the extractors; Indexify's built-in scheduler distributes the workload among them.
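
Uploading many files is a loop around the same `client.upload_file(path=...)` call used above. A minimal sketch, assuming the PDFs sit in one local folder; `upload_pdfs` and the folder name `policies` are hypothetical.

```python
from pathlib import Path


def upload_pdfs(client, directory: str) -> list[str]:
    """Upload every *.pdf in `directory` via client.upload_file and return the paths sent."""
    uploaded = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        client.upload_file(path=str(pdf))
        uploaded.append(str(pdf))
    return uploaded


# Usage, assuming a folder named "policies" (hypothetical) and the `client` created earlier:
# upload_pdfs(client, "policies")
```

Each upload is an independent ingestion event, so the extraction graph runs once per file.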

Perform RAG

Initialize the LangChain Retriever.

In [ ]:

from indexify_langchain import IndexifyRetriever
params = {"name": "knowledgebase.getembeddings.embedding", "top_k": 3}
retriever = IndexifyRetriever(client=client, params=params)
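
Setting `"top_k": 3` asks the retriever for the three chunks closest to the query embedding. The toy example below illustrates that idea with made-up 2-D vectors and cosine similarity. The similarity metric Indexify's index actually uses is not shown in this notebook, so cosine is an assumption, and `cosine` and `top_k` are hypothetical helpers.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k chunk ids whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda cid: cosine(query, index[cid]), reverse=True)
    return ranked[:k]


# Made-up 2-D "embeddings" for four chunks (real MiniLM-L6 vectors have 384 dimensions).
index = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.9, 0.1],
    "chunk-c": [0.0, 1.0],
    "chunk-d": [0.7, 0.7],
}
print(top_k([1.0, 0.0], index, k=3))  # prints ['chunk-a', 'chunk-b', 'chunk-d']
```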

Now create a chain that prompts OpenAI with the data retrieved from Indexify, giving us a simple Q&A bot.

In [ ]:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

In [ ]:

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

model = ChatOpenAI()  # reads the OPENAI_API_KEY environment variable

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)
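
The first two stages of the chain can be imitated in plain Python: the retrieved chunks are substituted for `{context}` and the user's question for `{question}`. `build_prompt` and the sample chunks are made up for illustration. In the chain above the retriever output is passed through as-is, so the exact text the model sees may differ.

```python
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def build_prompt(chunks: list[str], question: str) -> str:
    """Join retrieved chunks and fill the template, roughly as the prompt step does."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


# Hypothetical retrieved chunks, not actual text from the PDF.
chunks = ["Definition of term A.", "Definition of term B."]
print(build_prompt(chunks, "What is term A?"))
```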

Now ask any question related to the ingested insurance PDF document.

In [ ]:

chain.invoke("What is a Ayush Hospital?")
# Sample output: AYUSH Hospital is a healthcare facility wherein medical/surgical/para-surgical treatment procedures and interventions are carried out by AYUSH Medical Practitioner(s).