04. Initiating a Test Crawl

Once a seed has been identified and added to a collection, the archivist should run a test crawl of the seed. Test crawls are important because an improperly scoped seed can either capture superfluous content, using an unnecessarily large portion of the Bentley's document and data budgets, or miss content that ought to have been captured, resulting in a loss of important contextual information. To start a test crawl, locate the seed in the relevant collection's seeds list, select the check box next to the seed URL, and click the Run Crawl button. [Note: Test crawls for multiple seeds from the same collection can be run at the same time by selecting each of the relevant check boxes before clicking Run Crawl.]

A pop-up will appear with options for configuring the crawl type, document limit, data limit, and time limit for the selected seed URL(s).

Crawl Type: Select Test Crawl.

Doc. Limit: Leave blank. As this is a test crawl, no documents will be captured.

Data Limit: Leave blank. As this is a test crawl, no data will be stored.

Time Limit: Set to the length of time the seed will be crawled at its regularly scheduled crawl frequency (5 days for Annual and Semiannual, 3 days for other frequencies). Crawling for a shorter or longer period of time may result in document and data counts that do not accurately reflect what will be captured in production.
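The settings above can be summarized as a small lookup. The following Python sketch is illustrative only: it does not call any web archiving service, and the names `CrawlSettings`, `time_limit_days_for`, and `settings_for` are hypothetical helpers introduced here, not part of any tool described in this manual. It simply encodes the time limit rule (5 days for Annual and Semiannual, 3 days for other frequencies) and the blank document and data limits.

```python
from dataclasses import dataclass
from typing import Optional

# Frequencies that receive the longer limit, per the Time Limit guidance above.
# Matching is case-insensitive; the spellings are assumed to match the manual's.
LONG_LIMIT_FREQUENCIES = {"annual", "semiannual"}


@dataclass(frozen=True)
class CrawlSettings:
    """Hypothetical record of the values entered in the crawl pop-up."""
    crawl_type: str
    doc_limit: Optional[int]    # None means "leave blank"
    data_limit: Optional[int]   # None means "leave blank"
    time_limit_days: int


def time_limit_days_for(frequency: str) -> int:
    """Return the test crawl time limit in days for a scheduled frequency."""
    normalized = frequency.strip().lower()
    return 5 if normalized in LONG_LIMIT_FREQUENCIES else 3


def settings_for(frequency: str) -> CrawlSettings:
    """Build the test crawl settings for a seed with the given frequency."""
    return CrawlSettings(
        crawl_type="Test Crawl",
        doc_limit=None,
        data_limit=None,
        time_limit_days=time_limit_days_for(frequency),
    )


if __name__ == "__main__":
    for freq in ("Annual", "Semiannual", "Quarterly", "Monthly"):
        print(freq, settings_for(freq))
```

A lookup like this can serve as a quick checklist when configuring test crawls for several seeds with different frequencies; the values themselves are still entered by hand in the pop-up.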

Click Crawl. Once the test crawl has finished (after either all documents have been crawled or the allotted time has passed), proceed to Evaluating a Crawl Report for instructions on what to look for in a crawl report and how to appropriately scope a seed based on the results of the crawl report.
