Open Web to ClueWeb12 Mapping
The CWI team has provided a mapping from Open Web URLs to ClueWeb12 document IDs for the URLs submitted to the track.
The mapping is a tab-separated file with three columns:
The first column is the URL.
The second column is either "exact" or "normalized", depending on whether the match is an exact string match or the URL had to be normalized to find a match (by removing "http://", "https://", "www", and trailing slashes).
The third column is the ClueWeb12 ID.
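The three-column layout described above can be read with a few lines of Python. This is only a minimal sketch, not the CWI team's code: the function names normalize_url and read_mapping are hypothetical, and the normalization shown is an approximation that assumes "www" means a leading "www." prefix and that the scheme is stripped before it. The ID in the sample line is a made-up placeholder, not a real ClueWeb12 ID.

```python
import csv


def normalize_url(url):
    """Approximate the normalization described above (an assumption,
    not the exact procedure used to build the mapping)."""
    # Drop a leading scheme, if present.
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
            break
    # Assumption: "www" refers to a leading "www." host prefix.
    if url.startswith("www."):
        url = url[len("www."):]
    # Drop any trailing slashes.
    return url.rstrip("/")


def read_mapping(lines):
    """Yield (url, match_type, clueweb12_id) from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        url, match_type, doc_id = row
        yield url, match_type, doc_id


# Placeholder sample line; the ID is illustrative only.
sample = ["http://example.com/\texact\tclueweb12-EXAMPLE-ID\n"]
for url, match_type, doc_id in read_mapping(sample):
    print(url, match_type, doc_id)
```

To read the real file, pass an open file object instead of the sample list, for example open(path, newline="", encoding="utf-8"), where path is wherever you saved the mapping.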
If you have questions about this mapping, contact the track organizers or Thaer Samar.