JSON (JavaScript Object Notation) is so organised and structured that it’s an ideal format for turning most of your data into readable, accessible output.

In this post, we’ll feed an AI Prompt with a PDF file to get the output in a JSON format that we can use in our Agent flow.

PDF input

Below is an extract of my sample PDF. It contains text, tables, and other values. We’ll use it in the prompt to test the model.

Agent flow

Let’s build our Agent flow as a Tool by clicking on the Tools tab, + Add a tool, then Agent Flow.

Hypothetically, we’d want the Agent flow / AI Prompt to read a PDF file that a user has uploaded in the chat, so we’ll add a File input to the trigger.

Next, we’ll add the Run a prompt action, and create a new prompt by clicking on + New custom prompt in the dropdown.

Here’s our first shot at what looks like a good prompt for our scenario 🙂

  1. We give some instructions
  2. We provide the PDF sample for testing
  3. And by clicking Test, the model returns a JSON formatted response.

The Output (top right) says Text, but you can choose JSON from the dropdown if you’d like a prettier view. However, we’ll use the Parse JSON action in the agent flow, so we can work with the text output even if it isn’t indented.

Once we think the response from the model is what we need, click on Save. The PDF File input in the action will then be the File content contentBytes.

Let’s run the flow to see if all goes well (Publish and Test), and get what we need for later…

When the flow runs successfully, the output we need is under:

body/responsev2/predictionOutput/text

But before going any further, notice that we’ve got things we don’t want 😥 New lines (\n), carriage returns (\r), or any other extra characters will cause issues that we need to fix.

We want our text to be as clean as possible, so we’ll add this instruction to the prompt:

Return only valid JSON format with no comments, no explanation, no markdown, no new lines, and no carriage returns.
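
If the model still slips in a markdown fence or stray line breaks, the same clean-up can be done after the fact. Here is a minimal Python sketch of the idea, outside of the agent flow. The function names and the sample property names (report, total) are made up for illustration and are not part of Copilot Studio.

```python
import json
import re


def clean_model_text(raw: str) -> str:
    """Strip a surrounding markdown code fence and outer whitespace."""
    text = raw.strip()
    # Remove an opening fence such as ```json, then a closing ``` if present.
    text = re.sub(r"^```[a-zA-Z]*\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return text.strip()


def parse_model_json(raw: str):
    """Parse the cleaned text; raises json.JSONDecodeError if it is not valid JSON."""
    return json.loads(clean_model_text(raw))


# Hypothetical model response wrapped in a fence, with \n and \r\n line breaks.
raw = "```json\n{\r\n  \"report\": \"Q1\",\n  \"total\": 42\n}\n```"
data = parse_model_json(raw)

# Re-serialising gives a compact, single-line string with no line breaks.
compact = json.dumps(data, separators=(",", ":"))
print(compact)
```

Asking for clean output in the prompt is still the first line of defence. A clean-up step like this is only a safety net.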

Now, as mentioned earlier, add the Parse JSON action. The Content will be:

outputs('<YOUR-PROMPT-ACTION-NAME>')?['body/responsev2/predictionOutput']?['text']

and I’ve used the prompt model output as a sample payload to generate the schema.
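
As I understand it, the ?[...] parts of that expression return null instead of failing when a property is missing. The Python sketch below mimics that behaviour on a nested dictionary. The get_path helper and the shape of action_output are assumptions made for illustration, based only on the path shown above.

```python
from typing import Any, Optional


def get_path(obj: Any, path: str) -> Optional[Any]:
    """Walk a slash-separated path, returning None if any segment is missing."""
    current = obj
    for key in path.split("/"):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical shape of the prompt action's output, mirroring the path above.
action_output = {
    "body": {"responsev2": {"predictionOutput": {"text": '{"report": "Q1"}'}}}
}

text = get_path(action_output, "body/responsev2/predictionOutput/text")
missing = get_path(action_output, "body/responsev2/nothing/text")
print(text, missing)
```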

Let’s run the flow again to check the results 🙂

What’s the problem?

All seems to work as expected, and we can continue with more actions in our Agent flow. Grab the values from the Parse JSON, etc…

BUT what I have noticed is that the prompt’s response may change the JSON object and property names between runs! 😅

This means that your next actions will fail because, instead of body/report, the next response from the model might contain body/report_name.
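
To make that drift visible, here is a small Python sketch that compares the property names in a response against the ones the next actions expect. EXPECTED_KEYS and check_keys are hypothetical names, and the sample values are invented.

```python
import json

# Hypothetical top-level property names the downstream actions rely on.
EXPECTED_KEYS = {"report", "total"}


def check_keys(payload_text: str, expected: set) -> tuple:
    """Return (missing, unexpected) top-level property names, each sorted."""
    data = json.loads(payload_text)
    actual = set(data.keys())
    return sorted(expected - actual), sorted(actual - expected)


# A response where the model renamed "report" to "report_name".
missing, unexpected = check_keys(
    '{"report_name": "Q1", "total": 42}', EXPECTED_KEYS
)
print("missing:", missing, "unexpected:", unexpected)
```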

What can we do?

Instructions with Generative AI are extremely important. Therefore, we’ll add our JSON schema into the prompt.

Let’s assume that the sample payload we’ve added to the Parse JSON action has the structure we want every time. That’s the one we’ll use in the prompt.

I’ve also added a Compose action with the JSON schema for simplicity because we need it in the prompt action now.

Make sure that your schema is the same everywhere. By doing that, we’re clear about what we want the model to output. This adds consistency and mitigates the risk of the flow failing.
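
The "same schema everywhere" idea can be sketched in Python as a single schema object that is reused when building the prompt text. The schema content, the build_prompt helper, and the wording of the instructions are illustrative assumptions, not the exact prompt from this post.

```python
import json

# Hypothetical schema, defined once and reused wherever it is needed.
SCHEMA = {
    "type": "object",
    "properties": {
        "report": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["report", "total"],
}


def build_prompt(instructions: str, schema: dict) -> str:
    """Append the schema to the instructions as compact, single-line JSON."""
    schema_text = json.dumps(schema, separators=(",", ":"))
    return f"{instructions}\nThe JSON must follow this schema:\n{schema_text}"


prompt = build_prompt(
    "Return only valid JSON format with no comments and no markdown.", SCHEMA
)
print(prompt)
```

Keeping the schema in one place plays the same role as the Compose action in the flow: the prompt and the Parse JSON step cannot drift apart.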

Thanks for reading! 🙂
