Creating Google Search API with AgentQL

Published Time: Feb 27, 2024

Reading Time: 15 min

Besides the human-facing use cases for Large Language Models (LLMs), many industry-facing use cases need LLMs to handle structured data and code. In these settings, human supervision is usually unnecessary, or is deliberately avoided for efficiency. The team at Tiny Fish Inc. recently built a tool in this domain, AgentQL, that helps humans (mostly software developers and researchers) gain programmatic access to one of the largest databases in the world: the Internet.

What purpose is AgentQL built for?

Works of fiction often imagine AI assistants that can do (almost) everything for the user, such as Jarvis and Friday, created by Tony Stark. "AI agent" is the term often used for this kind of AI, one that has access to tools and perhaps can also plan. Here I define AI agents as generative models with access to tools (or functions) and the ability to plan how to use these tools toward a specific goal. The hard way to achieve this is to train the model end to end. It comes with the promise that the resulting agent can generalize to actions it was not taught a priori: autonomous agents.

While autonomous agents sound closer to Jarvis in my mind, they might not fit into our lives immediately (plus, I do not think humans are ready for them). So what if we take the "planning" part out? What humans actually want is for AIs to complete tasks that humans have specified well. In other words, the AI does not need to plan which actions to perform; it only needs to act as instructed.

Following instructions often requires outside knowledge that the LLM did not see during training, or could not have seen because it concerns events after its training data was collected. After OpenAI introduced the plug-in API for ChatGPT, many tools appeared to assist the LLM, for example with mathematical reasoning. As you dive deeper into this area and try to build a tool for your own use case, you will discover another resource shortage besides compute: a shortage of APIs.

LLMs are ready to consume the information relevant to your task, but you may discover that gathering that information is the real bottleneck. Automated flight booking with LLMs is possible because airlines provide APIs for developers. Automated Amazon product search with LLMs is harder, because no such API seems to be available to the general public.

This is the problem AgentQL is trying to tackle: creating programmatic access to any webpage, for any query.

How does AgentQL create APIs for websites?

Under the hood, AgentQL essentially prompts the LLM with the source code (e.g. the HTML) of the page, but in a very careful way. AgentQL expects users to write a structured query string containing the minimal sufficient information about the web elements of interest. For example, if you want to interact with the search box and the search button on a page, your query may look like the following. I refer readers to [1] for the rules and supported features of queries.

QUERY = """{
    search_box
    search_btn
}"""
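To make the structure of such a query concrete, here is a hypothetical helper, parse_query_terms, that lists the element names in a flat query like the one above. It is only a sketch under the assumption that a query is one pair of braces with one name per line; it is not AgentQL's own parser and ignores the nested forms described in [1].

```python
def parse_query_terms(query: str) -> list[str]:
    """List the element names in a flat, one-name-per-line query.

    Hypothetical helper for illustration only; AgentQL's real grammar
    supports more than this flat form.
    """
    body = query.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("query must be wrapped in a single pair of braces")
    inner = body[1:-1]
    # Each non-blank line inside the braces names one web element.
    return [line.strip() for line in inner.splitlines() if line.strip()]


QUERY = """{
    search_box
    search_btn
}"""

print(parse_query_terms(QUERY))  # ['search_box', 'search_btn']
```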

AgentQL lets you name each element in the query however you like, as long as the name consists of English letters and a few special characters such as `_`. Therefore, either of the following works:

search_btn

search_button
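The naming rule can be expressed as a simple check. This sketch assumes a conservative reading of the rule, accepting only ASCII letters and underscores; is_valid_term is a hypothetical name, and AgentQL's actual grammar may accept more characters than this.

```python
import re

# Assumption: names made only of ASCII letters and underscores.
TERM_NAME = re.compile(r"[A-Za-z_]+")


def is_valid_term(name: str) -> bool:
    """Return True if `name` uses only letters and '_' (hypothetical check)."""
    return TERM_NAME.fullmatch(name) is not None


for name in ["search_btn", "search_button", "search-btn", "search btn"]:
    print(name, is_valid_term(name))
```

Under this reading, both search_btn and search_button pass, while names with hyphens or spaces are rejected.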

To help the user locate and interact with the elements of interest, AgentQL prompts the LLM with the query and the HTML of the page, and the completion is expected to contain the location of every queried element in the HTML. The Tiny Fish team created an open-source SDK that uses this completion to create interaction methods, such as click and fill, for each term in the query. The supported interactions depend on the actual functionality of the web element. For example, if the query asks for a button, the user can only click it; they should not expect the SDK to implement a "fill" method for the button, because a real button cannot be filled with text.
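The idea that the available methods follow from what an element actually does can be modeled with a toy sketch. Everything here (the Button and TextBox classes, the "button"/"textbox" kind labels, build_response) is hypothetical and only mimics the behavior described above; it is not how the SDK is implemented, and the SDK's click also accepts options such as force=True.

```python
from types import SimpleNamespace


class Button:
    """Toy stand-in for a clickable element: it has click() but no fill()."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clicks = 0

    def click(self) -> None:
        self.clicks += 1


class TextBox(Button):
    """Toy stand-in for a text input: it can be clicked and filled."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.value = ""

    def fill(self, text: str) -> None:
        self.value = text


# Hypothetical mapping from the element kind (as located in the HTML) to a wrapper.
KIND_TO_CLASS = {"button": Button, "textbox": TextBox}


def build_response(located: dict[str, str]) -> SimpleNamespace:
    """Turn {query term: element kind} into an object with one attribute per term."""
    return SimpleNamespace(
        **{term: KIND_TO_CLASS[kind](term) for term, kind in located.items()}
    )


response = build_response({"search_box": "textbox", "search_btn": "button"})
response.search_box.fill("Who is the CEO of OpenAI?")
response.search_btn.click()
print(hasattr(response.search_btn, "fill"))  # False: a button cannot be filled
```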

Creating Google Search API with AgentQL

https://google.com is probably the most visited website in the world, and people use it every day. So I will demo how to create a Google Search API that lets you run Google searches straight from your command line. You can integrate this function into your LLM applications so they can access information from Google and perform retrieval-augmented generation (RAG) or other kinds of generation. I am using Python in this example.

Step 0. Get Access to the Server

First things first: you need to get an API key from Tiny Fish and set it as an environment variable. More information can be found in the Tiny Fish Discord channel or on the homepage.

export WEBQL_API_KEY=<your-api-key>
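Because the key is passed through the environment, a missing or empty variable is an easy mistake to make. As an optional guard, you could call a helper like the hypothetical require_api_key below at the top of your script, before starting a session, so the script fails early with a clear message. Only the variable name WEBQL_API_KEY comes from this post.

```python
import os


def require_api_key(var: str = "WEBQL_API_KEY") -> str:
    """Return the API key stored in `var`, or fail early with a clear message."""
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; run: export {var}=<your-api-key>")
    return key
```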

Step 1. Install Dependencies

First, install the SDK and the Chromium browser used by Playwright. Run the following in your terminal.

pip install webql

playwright install chromium
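If you want to confirm from Python that the installation worked, a small check with importlib can report which packages are not importable. The module names webql and playwright are assumed from the install commands above, and missing_modules is a hypothetical helper; this does not verify that the Chromium binary itself was downloaded.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed from the install commands above.
missing = missing_modules(["webql", "playwright"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("webql and playwright are importable.")
```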

Step 2. Import Packages in Your Python Script

import logging
import time  # provides time.sleep(), used in Step 4

import webql

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

Logging is optional, but recommended.

Step 3. Start to Query the Google Search Page

The following lines retrieve two web elements, the search box and the search button, which are sufficient for performing a search. I start a session on the Google homepage, define the query, and send it in the last line; the session returns a response.

session = webql.start_session("https://www.google.com")

QUERY = """{
    search_box
    search_btn
}"""

log.debug("Analyzing...")
response = session.query(QUERY)
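If you query different pages for different sets of elements, it can be convenient to build the query string from a list of names instead of editing a literal each time. Here is a minimal sketch with a hypothetical build_query helper that emits the same flat form as QUERY above; it assumes, based on the examples in this post, that whitespace inside the braces is insignificant.

```python
def build_query(*terms: str) -> str:
    """Assemble a flat, one-name-per-line query from element names.

    Hypothetical convenience helper; it only produces the flat form used here.
    """
    if not terms:
        raise ValueError("at least one term is required")
    body = "\n".join(f"    {term}" for term in terms)
    return "{\n" + body + "\n}"


print(build_query("search_box", "search_btn"))
```

With the SDK, this would be used as `response = session.query(build_query("search_box", "search_btn"))`.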

Step 4. Search for Anything You Want

The response now has two attributes, search_box and search_btn, matching the terms in your query. You can fill a string into the search box and click the search button. If you run the following code, you will see the browser load the Google Search results for the question "Who is the CEO of OpenAI?". I put the program to sleep afterwards in case the whole process is too fast for you to see what is happening.

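search_content = "Who is the CEO of OpenAI?"
response.search_box.fill(search_content)
response.search_btn.click(force=True)
# Keep the browser open for 1000 s (about 17 minutes) to inspect the results;
# lower this value for a shorter pause.
time.sleep(1000)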

Do not forget to stop the session when you are done.

session.stop()

The full script is here.

import logging
import time  # provides time.sleep()

import webql

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

if __name__ == "__main__":
    session = webql.start_session("https://www.google.com")

    # about_link is located too, but this script does not use it.
    QUERY = """
    {
        search_box
        search_btn
        about_link
    }
    """

    log.debug("Analyzing...")
    response = session.query(QUERY)

    log.debug("Inputting text...")
    response.search_box.fill("tinyfish")

    log.debug('Clicking "Search" button...')
    response.search_btn.click(force=True)

    # Keep the browser open so you can inspect the results.
    time.sleep(1000)

    session.stop()
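The script above hard-codes its search text. To get something closer to a command-line Google Search API, you could wrap the same steps in a function and put a small argparse front end on top. This is a sketch that uses only the SDK calls shown in this post (start_session, query, fill, click(force=True), stop); the names google_search and main, and the --pause flag, are hypothetical. It only submits the search and does not read results back from the page.

```python
import argparse
import time
from typing import Any, Callable, Optional

GOOGLE_URL = "https://www.google.com"

# Same two elements used in Step 3.
QUERY = """
{
    search_box
    search_btn
}
"""


def google_search(
    text: str,
    start_session: Optional[Callable[[str], Any]] = None,
    pause: float = 10.0,
) -> None:
    """Fill Google's search box with `text` and click the search button.

    start_session defaults to webql.start_session, imported lazily so this
    module can be loaded (and tested) without the SDK installed.
    """
    if start_session is None:
        import webql  # installed in Step 1

        start_session = webql.start_session
    session = start_session(GOOGLE_URL)
    try:
        response = session.query(QUERY)
        response.search_box.fill(text)
        response.search_btn.click(force=True)
        time.sleep(pause)  # keep the results page visible for `pause` seconds
    finally:
        session.stop()  # always release the browser session


def main(argv: Optional[list[str]] = None) -> None:
    """Command-line entry point: parse the search text and an optional pause."""
    parser = argparse.ArgumentParser(description="Run a Google search via AgentQL.")
    parser.add_argument("text", help="what to search for")
    parser.add_argument(
        "--pause", type=float, default=10.0, help="seconds to keep the browser open"
    )
    args = parser.parse_args(argv)
    google_search(args.text, pause=args.pause)
```

To use it, save the code as, say, google_search.py, add `if __name__ == "__main__": main()` at the bottom, and run `python google_search.py "Who is the CEO of OpenAI?"`. Passing start_session as a parameter keeps google_search testable without a browser.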

Upcoming blog post

In the next blog post, I will share how I integrated multiple AI models with AgentQL to create even more complex functions.

[1] Best Practice of Creating Query

[2] Youtube Video: OpenAI's Agent 2.0: Excited or Scared?


Contact: thezifan@gmail.com