# Website

> Learn how to use your website as a source to automatically pull content into your AI Agent’s knowledge.

Using the Website source, you can easily **"crawl"** the pages on your website (using a sitemap, your root domain, or manually added URLs) and **integrate** that information into your AI Agent.

Once the pages are crawled, your AI Agent can use this knowledge to answer questions quickly and accurately. The Website source also keeps your AI Agent up to date with the **latest information** on your website, without manually updating or uploading documents.

<Info>
  Website is available on all plans. Limits differ per plan; for more information, see our [Pricing page](/help-center/plans-pricing/pricing-overview).
</Info>

## Adding a Website Source

<Steps>
  <Step title="Open your AI Agent">
    Go to **Agents** in the main menu and select the relevant AI Agent.

    <Frame>
      <img src="https://mintcdn.com/watermelon/SnvqguCHhaAI544H/images/features/website/website-menu.png?fit=max&auto=format&n=SnvqguCHhaAI544H&q=85&s=192665b9ce7e484b4c2a2d07c535b655" alt="Website Menu" width="2922" height="1390" data-path="images/features/website/website-menu.png" />
    </Frame>
  </Step>

  <Step title="Navigate to Website">
    In the Agent menu, click **Website** to view your website settings, URL list, and crawling options.

    <Frame>
      <img src="https://mintcdn.com/watermelon/gPQ-0zAmhgSTI442/images/Navigate-website-crawl.png?fit=max&auto=format&n=gPQ-0zAmhgSTI442&q=85&s=6150129f010e8d0e6e073b228f6d10f8" alt="Website Website" width="2878" height="1460" data-path="images/Navigate-website-crawl.png" />
    </Frame>
  </Step>

  <Step title="Fetch URLs">
    <Frame>
      <img src="https://mintcdn.com/watermelon/gPQ-0zAmhgSTI442/images/fetch-urls-website.png?fit=max&auto=format&n=gPQ-0zAmhgSTI442&q=85&s=cbed3e22579d58959c338130201598d0" alt="Website Sitemap" width="2880" height="1460" data-path="images/fetch-urls-website.png" />
    </Frame>

    You can add URLs in three different ways:

    * **Option A:** Add your sitemap (recommended)
      * This gives the **most complete** list of URLs. Enter your sitemap URL (without a trailing slash):
            <Check>
              **Example:**
              ✅ [https://website.com/sitemap.xml](https://website.com/sitemap.xml)
              ❌ [https://website.com/sitemap.xml/](https://website.com/sitemap.xml/)
            </Check>
    * **Option B:** Fetch URLs from your root domain
      * Add your homepage (e.g., [https://website.com](https://website.com)) and the system will attempt to **discover pages** across the site.
    * **Option C:** Add URLs manually
      * Use this for specific pages you want to include **without** crawling the whole site.

    After selecting the type of URL, click **Fetch URLs**. Fetched URLs appear in a **table** that shows:

    * The **URL**
    * The **date** added
    * **Status** indicators
    * Toggles for **including/excluding** and for link-sharing

    Depending on your **site size**, fetching may take a few moments.

    <Warning>
      If you reach the size limit of your Agent, you'll need to remove sources before you can **crawl** additional content.
    </Warning>
  </Step>

  <Step title="Customize URL usage" stepNumber={4}>
    <Frame>
      <img src="https://mintcdn.com/watermelon/gPQ-0zAmhgSTI442/images/customize-url-website.png?fit=max&auto=format&n=gPQ-0zAmhgSTI442&q=85&s=ec52a2f487232f058553acab8edb9f23" alt="Website Customize Urls" width="2880" height="1460" data-path="images/customize-url-website.png" />
    </Frame>

    After a URL is fetched, you can choose at any time to:

    * **Include** → page content will be **added** to the AI Agent
    * **Exclude** → page content will be **ignored**
    * **Delete URL** → the URL is removed, along with its **stored** content

    These settings let you **fine-tune** which parts of your website your AI Agent uses.
  </Step>

  <Step title="Crawl Agent" stepNumber={5}>
    Once your URLs are prepared, select **Crawl** to update your Agent.

    <Frame>
      <img src="https://mintcdn.com/watermelon/gPQ-0zAmhgSTI442/images/crawl-urls-website.png?fit=max&auto=format&n=gPQ-0zAmhgSTI442&q=85&s=f5772d3f787b06beabe9a9920564ab2d" alt="Website Synchronize" width="2880" height="1460" data-path="images/crawl-urls-website.png" />
    </Frame>

    ### Crawl statuses

    * **Crawled** – Content added successfully
    * **Not Crawled** – Not processed yet
    * **Queued** – Waiting to be processed
    * **Excluded** – Skipped by your choice
  </Step>

  <Step title="Crawled information">
    After a URL has been crawled, you can click **Details** next to the URL to view the exact information that was extracted from the page. This helps you make sure the information on your website is part of your AI Agent’s knowledge.

    <Frame>
      <img src="https://mintcdn.com/watermelon/zuQ320efvzZzFYU4/images/website-details.png?fit=max&auto=format&n=zuQ320efvzZzFYU4&q=85&s=0e297d585c464779563a66e2ffc1f1c3" alt="Website Details" width="2878" height="1462" data-path="images/website-details.png" />
    </Frame>
  </Step>
</Steps>
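
If you want to check a sitemap before adding it in the **Fetch URLs** step, the sketch below may help. It is not part of the product: `normalize_sitemap_url` and `extract_page_urls` are hypothetical helper names, and the sample XML is made up for illustration. It assumes your sitemap follows the standard sitemaps.org format and only shows how to strip a trailing slash and list the URLs a sitemap contains.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalize_sitemap_url(url: str) -> str:
    """Strip surrounding whitespace and any trailing slash."""
    return url.strip().rstrip("/")


def extract_page_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap or sitemap index."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample; replace with the contents of your own sitemap.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://website.com/</loc></url>
  <url><loc>https://website.com/pricing</loc></url>
</urlset>"""

print(normalize_sitemap_url("https://website.com/sitemap.xml/"))
print(extract_page_urls(SAMPLE))
```

If the list comes back empty or much shorter than expected, the sitemap may be incomplete, and fetching from your root domain or adding URLs manually could be a better fit.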

### Important behavior notes

* The crawling process may take up to **24 hours** to complete.
* You do **not** need to stay on the page or stay logged in; the crawling process continues automatically.
* Failed pages are retried up to **50 times** before the URL is marked as failed.
* If crawling a sitemap or domain finishes **within seconds**, it may indicate access issues or a site that is technically difficult to crawl.
* If progress appears **stuck** at 90–100%, the system is still retrying a small number of slower or temporarily unavailable URLs.

<Warning>
  Be sure to **re-crawl after website updates** to refresh your AI Agent's knowledge!
</Warning>

## Troubleshooting

Common reasons a URL **cannot** be crawled include:

* robots.txt **restrictions**
  * The page **blocks** crawlers.
* Incorrect or **inaccessible** URLs
  * **Typos**, wrong protocol, or pages that do not load.
* **Anti-bot** protection
  * **CAPTCHAs** or bot detection systems block access.
* Server **errors**
  * 404, 500, 504, or temporary **downtime**.
* IP or geographic **restrictions**
  * Some sites **block** scraping or certain regions.
* Server **overload**
  * Too many requests can cause **timeouts**.

Try correcting the URL, batching crawls, or retrying later.
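
To rule out robots.txt restrictions yourself, a small check like the one below can be a starting point. This is a sketch under stated assumptions: `is_allowed` is a hypothetical helper, the sample rules are invented, and the user agent defaults to the wildcard `*` because this page does not state which user agent the crawler sends. It uses only Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Invented rules for illustration; paste your site's robots.txt here instead.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

for url in ("https://website.com/pricing", "https://website.com/private/report"):
    verdict = "allowed" if is_allowed(SAMPLE_ROBOTS, url) else "blocked by robots.txt"
    print(f"{url}: {verdict}")
```

A URL reported as allowed can still fail for the other reasons listed above, such as anti-bot protection or server errors, so treat this as a first check only.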
