b. Modifying Crawl Scope

If the results of the test crawl show that the current crawl scope is capturing too much or too little content, the crawl scope will need to be modified. Scope rules can block or include specific hosts or sub-domains of a host, as well as URLs that match a regular expression or SURT rule. They can also limit crawls at certain frequencies, for example by capturing only up to a data limit, archiving only PDF files, or lengthening or shortening the crawl time limit. Scoping rules can be applied at the collection level or the seed level.
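The regular expression and SURT rule types can be hard to picture. The Python sketch below is a simplified illustration, not Archive-It's implementation. It assumes a Heritrix-style SURT form (scheme, then the host labels reversed and comma-separated in parentheses, then the path), skips the fuller canonicalization real crawlers apply (ports, "www" handling, and so on), and uses hypothetical helper names (`to_surt`, `in_scope`). A SURT prefix such as `http://(com,example,` covers example.com and all of its sub-domains, and a regular expression can then block matching URLs within that scope.

```python
import re
from urllib.parse import urlsplit


def to_surt(url: str) -> str:
    """Convert a URL to a simplified SURT form (illustrative only).

    http://www.example.com/a -> http://(com,example,www,)/a
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit lower-cases the hostname; port is dropped
    reversed_host = ",".join(reversed(host.split("."))) + ","
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"{parts.scheme}://({reversed_host}){path}"


def in_scope(url: str, surt_prefixes: list[str], block_patterns: list[str]) -> bool:
    """Hypothetical scope check: the URL must fall under one of the SURT
    prefixes and must not match any of the blocking regular expressions."""
    surt = to_surt(url)
    if not any(surt.startswith(prefix) for prefix in surt_prefixes):
        return False
    return not any(re.search(pattern, url) for pattern in block_patterns)


# Example: include all of example.com (and sub-domains) but block calendar pages.
prefixes = ["http://(com,example,"]
blocks = [r"/calendar/"]
print(in_scope("http://blog.example.com/post", prefixes, blocks))
print(in_scope("http://example.com/calendar/2020", prefixes, blocks))
```

The trailing comma in the prefix is what keeps `http://(com,example,` from also matching a different host such as example.community.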

To modify the crawl scope for a particular host at the collection level (note: this will apply the given rule to all seeds in the collection), click the Collection Scope tab on a collection page.

To add or edit a collection scope rule, navigate to Add Collection Scope Rule and select a collection scope rule type from the drop-down menu:

Existing collection scope rules for a given collection are listed below the Add Collection Scope Rule section. Use the toggle controls to activate or deactivate rules as needed. Changes to collection scope rules affect every seed in the collection unless they are overridden by rules added at the individual seed level.
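The interaction between toggling and seed-level overrides can be modeled roughly as follows. This is a hypothetical sketch of the behavior described above, not how Archive-It actually resolves rules. It assumes that deactivated rules are ignored and that an active seed-level rule of a given type replaces all collection-level rules of that same type. The rule-type labels ("Data Limit", "Block URL") and the `ScopeRule`/`effective_rules` names are illustrative assumptions, not Archive-It's exact menu text or API.

```python
from dataclasses import dataclass


@dataclass
class ScopeRule:
    rule_type: str  # illustrative label, e.g. "Data Limit" or "Block URL"
    value: str
    active: bool = True  # mirrors the on/off toggle next to each rule


def effective_rules(collection_rules: list[ScopeRule],
                    seed_rules: list[ScopeRule]) -> list[ScopeRule]:
    """Hypothetical model: active collection rules apply to every seed,
    and active seed rules of a type replace the collection rules of that type."""
    by_type: dict[str, list[ScopeRule]] = {}
    for rule in collection_rules:
        if rule.active:  # deactivated rules are skipped
            by_type.setdefault(rule.rule_type, []).append(rule)

    seed_by_type: dict[str, list[ScopeRule]] = {}
    for rule in seed_rules:
        if rule.active:
            seed_by_type.setdefault(rule.rule_type, []).append(rule)

    by_type.update(seed_by_type)  # seed-level rules override collection-level ones
    return [rule for rules in by_type.values() for rule in rules]


collection = [
    ScopeRule("Data Limit", "10 GB"),
    ScopeRule("Block URL", "/calendar/"),
    ScopeRule("Block URL", "/login", active=False),
]
seed = [ScopeRule("Data Limit", "2 GB")]
for rule in effective_rules(collection, seed):
    print(rule.rule_type, rule.value)
```

In this model the seed keeps the collection's active "Block URL" rule, drops the deactivated one, and uses its own data limit in place of the collection's.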

To modify crawl scope rules at the individual seed level, navigate to the desired seed and click its Seed Scope tab. The Add Seed Scope Rule menu offers many of the same rule types for an individual seed that the Add Collection Scope Rule menu offers for the whole collection; see the list above for rules that can be used to scope crawls at the seed level. Note that "Block Hosts" and "Add Document Limit" are available only when scoping crawls at the collection level.
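The restriction on "Block Hosts" and "Add Document Limit" amounts to a simple filter over the available rule types. The sketch below is illustrative only: the full list of rule types is not reproduced here, so the caller supplies it, and the function name `rule_types_for` is hypothetical.

```python
# Rule types that, per this guide, can only be added at the collection level.
COLLECTION_ONLY_RULE_TYPES = frozenset({"Block Hosts", "Add Document Limit"})


def rule_types_for(level: str, all_rule_types: list[str]) -> list[str]:
    """Return the rule types offered at the given level ("collection" or "seed").

    Hypothetical helper: all_rule_types is whatever list the drop-down menu shows.
    """
    if level == "collection":
        return list(all_rule_types)
    if level == "seed":
        return [t for t in all_rule_types if t not in COLLECTION_ONLY_RULE_TYPES]
    raise ValueError(f"unknown scoping level: {level!r}")


menu = ["Block Hosts", "Add Document Limit", "Block URL"]  # illustrative subset
print(rule_types_for("seed", menu))
```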

Certain types of seeds with known complications (e.g., Facebook pages, Wix sites) require special scoping rules to be captured effectively. The Archive-It Help Center offers recommendations for scoping a variety of known complicated seed types.

If issues with capturing a seed persist after modifying scoping rules, consult with the Archivist for Metadata and Digital Projects on next steps.