[Computing][Machine Learning] Feeding external data to the OpenAI API so it can answer questions from it

 

ai / machine learning / LLM / language model / OpenAI API / ChatGPT / chatgpt / gpt code / code ai / analyze code / source code github

Feeding external data to the OpenAI API so it can answer questions from it

Installation

pip install llama-index

git clone https://github.com/jerryjliu/gpt_index.git
cd gpt_index/examples/paul_graham_essay

In the paul_graham_essay folder there is a data directory, and inside it you will find paul_graham_essay.txt. In short, this .txt file is handed to the OpenAI API so that answers are generated based on it. Under the hood, llama-index splits the text into chunks, embeds them, and at query time retrieves the relevant chunks and puts them into the prompt it sends to OpenAI.

The index obtained this way can be saved with index.save_to_disk().

Create a .py file like the following.

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# read every file in the ./data folder (here, paul_graham_essay.txt)
documents = SimpleDirectoryReader('data').load_data()
# build a vector index; this calls the OpenAI embedding API
index = GPTSimpleVectorIndex(documents)
# load from disk
# index = GPTSimpleVectorIndex.load_from_disk('my-index.json')

response = index.query("What did the author do growing up?")
print(response)

# save to disk
index.save_to_disk('my-index.json')

Then set OPENAI_API_KEY as an environment variable and run the code. (The SET syntax below is for the Windows command prompt; on Linux/macOS use export instead.)

SET OPENAI_API_KEY=sk-myopenapikey
python run.py
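
Once my-index.json has been created by the script above, a later run can reload the saved index instead of re-embedding the documents. A minimal sketch, assuming the file from the first run exists; setting the key from Python is just an alternative to the SET command, and the question is only an example:

import os
from llama_index import GPTSimpleVectorIndex

# Alternative to the SET command above: llama-index picks up
# OPENAI_API_KEY from the environment via the openai package.
os.environ["OPENAI_API_KEY"] = "sk-myopenapikey"

# Reload the index persisted by index.save_to_disk() and query it again.
index = GPTSimpleVectorIndex.load_from_disk('my-index.json')
response = index.query("Where did the author go to college?")
print(response)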

Tools built on ChatGPT these days

What these tools actually do is build a prompt for ChatGPT. Tools like this have been appearing a lot lately.

For example, the code of one such tool (for answering questions about a GitHub repository) uses a prompt like the following.

const makeQAPrompt = (projectName: string, repositoryUrl: string, contentType: string, chatPrompt: string, targetAudience: string) =>
  PromptTemplate.fromTemplate(
    `You are an AI assistant for a software project called ${projectName}. You are trained on all the ${contentType} that makes up this project.
  The ${contentType} for the project is located at ${repositoryUrl}.
You are given the following extracted parts of a technical summary of files in a ${contentType} and a question. 
Provide a conversational answer with hyperlinks back to GitHub.
You should only use hyperlinks that are explicitly listed in the context. Do NOT make up a hyperlink that is not listed.
Include lots of ${contentType} examples and links to the ${contentType} examples, where appropriate.
Assume the reader is a ${targetAudience} but is not deeply familiar with ${projectName}.
Assume the reader does not know anything about how the project is strucuted or which folders/files are provided in the context.
Do not reference the context in your answer. Instead use the context to inform your answer.
If you don't know the answer, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not about the ${projectName}, politely inform them that you are tuned to only answer questions about the ${projectName}.
Your answer should be at least 100 words and no more than 300 words.
Do not include information that is not directly relevant to the question, even if the context includes it.
Always include a list of reference links to GitHub from the context. Links should ONLY come from the context.
${
  chatPrompt.length > 0
    ? `Here are some additional instructions for answering questions about ${contentType}:\n${chatPrompt}`
    : ''
}
Question: {question}
Context:
{context}
Answer in Markdown:`,
  );
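
To make this concrete, here is a rough Python sketch of what such a tool does at query time: fill the template with the retrieved context and the user's question, then send the finished prompt to ChatGPT. The template, model name, and example arguments below are simplified assumptions, not the tool's actual code, and the call uses the pre-1.0 openai.ChatCompletion interface.

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical, heavily simplified version of the template above.
QA_TEMPLATE = """You are an AI assistant for a software project called {project_name}.
Use only the context below to answer. If you don't know, say "Hmm, I'm not sure."

Question: {question}
Context:
{context}
Answer in Markdown:"""

def answer(project_name, question, context):
    # Fill the template, then hand the finished prompt to ChatGPT.
    prompt = QA_TEMPLATE.format(project_name=project_name,
                                question=question,
                                context=context)
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # assumed model
        messages=[{"role": "user", "content": prompt}],
    )
    return resp["choices"][0]["message"]["content"]

print(answer("gpt_index",
             "What does SimpleDirectoryReader do?",
             "SimpleDirectoryReader loads every file in a folder as a Document."))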

Reference

  1. Installation and Setup — LlamaIndex documentation : installing llama-index
  2. Starter Tutorial — LlamaIndex documentation : building the test code
