Back in June 2023, we ran a series of experiments with Google’s Bard AI to determine whether it could read and understand a PDF document. Our goal was to upload a PDF, or provide a link to one, have the AI read and understand it, and then ask questions about its contents or request a summary. Bard was not able to perform this task.
Read more about our experiences here: Can Bard Read PDFs and URLs? Here’s What We Found
At the beginning of December 2023, Google announced the release of its newest AI system: Gemini.
Gemini is a fully multi-modal AI system, meaning it can interpret text, audio, images, and even video. Learn more here: Google Announces Gemini – The Most Advanced AI To Date
While the full power of Gemini is not yet available to the public, Google announced it will be slowly integrating features of Gemini into Bard. This means the feature set of Bard will continue to grow as it gets more upgrades behind the scenes.
As of December 12, 2023, Google has improved Bard by using Gemini’s advanced text-understanding abilities.
In our testing at the time this post was written (December 15, 2023), we found that Bard still cannot read PDFs or access web URLs.
Gemini’s image processing capabilities have not yet been included in Bard. We will continue to monitor Google’s announcements and update this post when that happens. Read more here: Google Bard Latest Updates
In the meantime, Bard can still accept an uploaded image and summarize it. This can be used to summarize single-page PDFs.
Prompt Bard To Read And Summarize a Single PDF Page – Workaround Solution
While Bard cannot yet accept PDF files as input, it can accept images that contain text.
In the example below, we will walk you through the process of converting a single-page PDF to an image that Bard can read and understand.
As of right now (December 15, 2023), Bard only allows a single image to be uploaded at a time. Google’s AI Studio does let you attach multiple images to a prompt, but it is a more advanced tool that takes more effort to access and use.
Here is a breakdown of the process we will use to get started.
- Open the PDF document in your favorite PDF viewer
- Zoom the PDF document so the full page is in view
- Take a screenshot of the page and save as an image file
- Upload the image to Bard and provide a prompt with instructions
In this example, we will use the PDF document detailing the NASA Mars MAVEN project:
Step 1: Using the link above, either open the PDF in your web browser or download the file to your computer and open it using Adobe Acrobat Reader or a third-party PDF viewer. In the example below, I’m using Google Chrome to open the PDF.
Step 2: Click the drop-down that reads ‘Automatic Zoom’ and select the ‘Page Fit’ option.
Step 3: Now take a screen grab of the PDF page. Save as an image file.
- For Windows: Use the Snipping Tool to select the area to save as a PNG file.
- For Mac: Press CMD+SHIFT+5 to open the screen grab dialog. Use the Capture Selected Portion option to size the dashed lines around the edges of the PDF page.
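If you would rather script Step 3 than take a screenshot by hand, the sketch below shows one possible alternative. It is not the method we used above: it assumes the Poppler command-line tool pdftoppm is installed, and the function names and file names are hypothetical. It renders only the first page of a PDF to a single PNG, which you can then upload to Bard in Step 4.

```python
import shutil
import subprocess
from pathlib import Path


def build_render_command(pdf_path, out_prefix, dpi=150):
    """Build a pdftoppm command that renders page 1 of a PDF to one PNG."""
    return [
        "pdftoppm",
        "-png",          # write PNG output
        "-f", "1",       # first page to render
        "-l", "1",       # last page to render (same page, so one image)
        "-r", str(dpi),  # resolution in DPI
        "-singlefile",   # write <out_prefix>.png with no page-number suffix
        str(pdf_path),
        str(out_prefix),
    ]


def render_first_page(pdf_path, out_prefix, dpi=150):
    """Run pdftoppm and return the path of the PNG it is expected to write."""
    # Assumption: Poppler is installed and pdftoppm is on the PATH.
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm not found; install Poppler or use a screenshot")
    subprocess.run(build_render_command(pdf_path, out_prefix, dpi), check=True)
    return Path(f"{out_prefix}.png")
```

For example, calling render_first_page("maven.pdf", "maven_page1") should leave a file named maven_page1.png next to your script. A higher dpi value gives Bard sharper text to work with, at the cost of a larger image file.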
Step 4: Upload the captured image to Bard. Prompt Bard to summarize the document.
- Prompt: Summarize the text in this document into 3 bullet points.
- Click the Image Icon to the left of the text box. Upload the screenshot of the PDF page you captured
After running the prompt, Bard will interpret the image and output a summary:
Issues We Experienced
As mentioned above, the multi-modal capabilities of Gemini are not yet fully integrated into Bard. This means Bard is not yet using the more advanced image processing capabilities we can expect soon.
Because of this, when we prompted Bard to return the EXACT text in the image, the results were inconsistent: sometimes it returned the exact text, sometimes only a summary, and sometimes nothing at all.
Being limited to a single image also reduces the usefulness of this method.
Can Bard Read PDFs Directly Through URLs?
As of this writing, it seems Bard still cannot follow a direct link.
From our testing, it seems Bard considers the URL in the prompt as just part of the prompt text. It does not see the prompt (or command) and the URL as two different things.
Most URLs have some clues about the company name, what the post is about, etc. For example:
Each of these URLs has multiple clues as to what it is about. Bard is able to parse these words and generate a response without even opening the URL.
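To get a feel for how much a URL gives away on its own, here is a minimal Python sketch that pulls the readable words out of a URL. It only illustrates the idea and makes no claim about how Bard actually processes prompts; the function name, the list of generic tokens, and the example.com URLs are all hypothetical.

```python
import re
from urllib.parse import urlparse


def url_hint_words(url):
    """Return the readable words found in a URL's host and path."""
    parts = urlparse(url)
    text = f"{parts.netloc} {parts.path}".lower()
    # Split on anything that is not a letter (dots, slashes, dashes, digits).
    tokens = re.split(r"[^a-z]+", text)
    # Assumption: these tokens carry no topical meaning, so we drop them.
    generic = {"www", "com", "org", "net", "gov", "html", "htm", "pdf", "php"}
    return [t for t in tokens if len(t) > 2 and t not in generic]


# A descriptive URL leaks its topic through the words in its path.
print(url_hint_words("https://www.example.com/blog/mars-maven-mission-overview.pdf"))
# An opaque URL leaves little more than the host name.
print(url_hint_words("https://example.com/d/1a2b3c/x9"))
```

The first call yields words such as mars, maven, and mission, which is enough for a language model to produce a plausible-sounding answer. The second yields only the host name.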
When we provided a URL that did not contain any ‘hints’, Bard responded that it could not help us.
Here is the URL we used to test this theory:
While Bard is constantly improving and gaining new features, it still lacks the ability to read a PDF in raw format (i.e., without first converting the PDF into an image) and the ability to follow URLs.
We expect that as Gemini Pro is merged into Bard, Bard will become able to read text in images and output the complete text. We also hope that as Bard’s multi-modal capabilities expand, Google will allow users to either upload files or provide URLs to files in the cloud. This would be a major breakthrough in the productivity gains Bard could provide to users.