I want to manually control the agent’s context, for example:
user: Hello
assistant: Hi there
user: Tell me about yourself
assistant: I'm a large language model
user: XXXXX
assistant: AAAAA
At this point, I only want to send (manually constructed in some way):
user: Hello
assistant: Hi there
user: What are you doing?
That is to say, I might select only part of the context as the new input to the model (the model generates replies based only on the context I send), or even freely add content that wasn’t part of the current conversation and send it as a whole input for the model to generate a reply.
In other words, I want complete custom control over the context.
When using OpenAI API-compatible models, I would use input like this:
{
"role": "user",
"content": [
{
"type": "input_text",
"text": "Hello"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "Hi there! How can I help you? 😊"
}
]
},
{
"role": "user",
"content": [
{
"type": "input_text",
"text": "What are you doing?"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "I'm here waiting for your questions or requests, ready to assist! How are you doing? Is there anything you want to talk about or need help with?"
}
]
}
However, when I use Gemini, it returns an error:
GenerateContentRequest.contents[1].parts: contents.parts must not be empty
Question
If I want the agent to generate replies only based on my custom-defined context, then:
- Do I need to adapt the interface for each model myself, or is there a unified input format that will be automatically adapted to each model’s API? If so, how do I do that?
- How can I achieve the above custom context input behavior—ensuring the reply is based only on my constructed context?
Furthermore, how should I adapt to different model-specific API formats, such as Gemini?