5D. Check Cross References
In Toolbox, a cross reference does not have to point at an existing entry in the database, but in FLEx it does. If you import an SFM file containing references whose targets are not in the database, the results may not be what you expect. To prevent this, read the following considerations.
You can check cross references with the script check_cf_special.pl.
The script reads an .ini file that specifies the input and output files and the markers to check.
It reports references that have no target.
It changes the marker of each targetless reference by adding _NF (\sy becomes \sy_NF, \cf becomes \cf_NF, and so on).
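To show what this kind of check involves, here is a minimal sketch in shell and awk. It is not check_cf_special.pl and does not read its .ini file. The function name mark_unmatched_refs is hypothetical, and the sketch assumes that headwords are in \lx, that each \cf or \sy line holds exactly one target spelled exactly like its headword, that the file has LF line endings, and that targets carry no homograph or sense numbers.

```shell
# mark_unmatched_refs FILE
# Hypothetical sketch (not check_cf_special.pl): prints FILE with every \cf or
# \sy line whose target is not an \lx headword renamed to \cf_NF or \sy_NF.
# Assumes one target per line, LF line endings, no homograph/sense numbers.
# Usage: mark_unmatched_refs dictionary.db > dictionary-checked.db
mark_unmatched_refs() {
  awk '
    # Pass 1 (first copy of the file): remember every headword.
    NR == FNR {
      if ($1 == "\\lx") heads[substr($0, length($1) + 2)] = 1
      next
    }
    # Pass 2 (second copy): rename markers whose target is not a headword.
    $1 == "\\cf" || $1 == "\\sy" {
      target = substr($0, length($1) + 2)
      if (!(target in heads)) $0 = $1 "_NF " target
    }
    { print }
  ' "$1" "$1"
}
```

Reading the file twice keeps the sketch simple: the first pass collects all headwords, so a reference can point forward to an entry that appears later in the file.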
Options to consider: The script also reports references that point to items with homograph numbers, and marks them with _NF so that the user can make the link in FLEx to the item with the correct homograph number. Alternatively, you could leave the markers on these references unchanged (\sy, \cf, etc.) so that they are imported. FLEx will then link to "the first homograph" or "the first sense", which may or may not be what the user intended. There is a script that can add a note to these entries to flag that this guess was made. After the import, the user can search for this note, verify that each link points to the right entry or sense, and fix any that do not. [Need to add a link to CheckRefs.pl with instructions.]
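References to homographs can be spotted in a similar way. The sketch below is only an illustration, not the script mentioned above: report_homograph_refs is a hypothetical name, it treats a target as a homograph whenever the same \lx text occurs in more than one record, and it keeps the one-target-per-line assumption. It lists each such reference with the entry that contains it, so someone can decide which homograph was meant.

```shell
# report_homograph_refs FILE
# Hypothetical sketch: lists every \cf or \sy reference whose target matches
# the \lx text of more than one record, with the headword of the entry that
# contains the reference.
# Usage: report_homograph_refs dictionary.db > homograph-refs.txt
report_homograph_refs() {
  awk '
    # Pass 1: count how many records use each headword.
    NR == FNR {
      if ($1 == "\\lx") count[substr($0, length($1) + 2)]++
      next
    }
    # Pass 2: track the current entry and report ambiguous references.
    $1 == "\\lx" { entry = substr($0, length($1) + 2) }
    $1 == "\\cf" || $1 == "\\sy" {
      target = substr($0, length($1) + 2)
      if (count[target] > 1) print entry ": " $0 " (" count[target] " entries)"
    }
  ' "$1" "$1"
}
```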
When importing, map each of these _NF markers to its own custom field (e.g., SynonymNotFound, CompareNotFound) so the linguist can easily find them and decide how best to fix them. Alternatively, map all of them into a single "References Not Found" custom field and include the original marker in the field contents, for example:
References Not Found: \sy bolok; \an bala; \cf loman
This approach results in fewer custom fields.
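If you choose the single-field option, the _NF lines in each record need to be combined into one field before the import. Here is a minimal awk sketch of that step. The \rnf marker and the merge_not_found name are hypothetical; map whatever marker you choose to the "References Not Found" custom field. The sketch assumes each record starts with \lx, and it writes the combined line at the end of each record (just before the next \lx).

```shell
# merge_not_found FILE
# Hypothetical sketch: in each record, replaces all \xx_NF lines with a single
# \rnf line such as "\rnf \sy bolok; \an bala; \cf loman" (original marker kept,
# items separated by "; "). \rnf is an invented marker name.
# Usage: merge_not_found dictionary-checked.db > dictionary-merged.db
merge_not_found() {
  awk '
    # Print the collected references (if any) and start a new record.
    function flush() {
      if (pending != "") print "\\rnf " pending
      pending = ""
    }
    $1 == "\\lx" { flush() }
    # A marker ending in _NF: strip the suffix and queue "\xx value".
    $1 ~ /_NF$/ {
      ref = substr($1, 1, length($1) - 3) " " substr($0, length($1) + 2)
      pending = (pending == "" ? ref : pending "; " ref)
      next
    }
    { print }
    END { flush() }
  ' "$1"
}
```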
Prepare instructions for the linguist explaining how to filter these fields for non-blank values and then correct the affected data.
Check for double references: In FLEx it is fine for more than one entry to point at the same subentry or variant, but the import process cannot handle this, so it is important to check for it at this stage.
At the command line, use a command like this: egrep "^.va " dictionary.db | sort | uniq -c | sort -rn > va-sorted.txt
This finds every \va line, sorts the lines, counts identical ones (uniq -c), sorts that list again with the highest counts first, and writes the result to the file va-sorted.txt.
Open that file and find every line where the count (the number at the start of the line) is greater than 1. If these duplicates are errors, fix them. If they are intentional, you will need a process to handle them. [Need to write more about this. Two options: keep only one of the references and add the other link manually in FLEx after the import, or use a process developed by Wes (need more info)]
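To decide whether a duplicate is an error or intentional, it helps to see which entries share the value. Here is a minimal awk sketch for that. The name list_shared_values is hypothetical, and it assumes each record starts with \lx and each \va (or other marker) line holds a single value.

```shell
# list_shared_values FILE MARKER
# Hypothetical sketch: for MARKER given without the backslash (e.g. va), prints
# each value that occurs more than once, followed by the \lx headwords of the
# entries that contain it. Output order is not sorted.
# Usage: list_shared_values dictionary.db va
list_shared_values() {
  awk -v m="$2" '
    BEGIN { marker = "\\" m }
    $1 == "\\lx" { entry = substr($0, length($1) + 2) }
    $1 == marker {
      v = substr($0, length($1) + 2)
      n[v]++
      owners[v] = owners[v] (n[v] > 1 ? ", " : "") entry
    }
    END { for (v in n) if (n[v] > 1) print v ": " owners[v] }
  ' "$1"
}
```

The marker is passed without its backslash because awk's -v option processes escape sequences, so a value such as \va would be read as a vertical tab followed by "a".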
Repeat the same procedure for \se, and for any other marker that is used for variants or subentries.
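To repeat the check for several markers at once, the command-line pipeline can be wrapped in a loop. This sketch reuses the grep, sort, and uniq -c steps, sorts numerically with sort -rn, and adds an awk filter so that only counts greater than 1 are kept. The function name and the output file names (va-dups.txt and so on) are hypothetical.

```shell
# check_duplicate_refs FILE MARKER...
# Hypothetical sketch: for each MARKER (given without the backslash), writes
# the values that occur more than once, with their counts, to MARKER-dups.txt.
# Usage: check_duplicate_refs dictionary.db va se
check_duplicate_refs() {
  db=$1
  shift
  for m in "$@"; do
    # As in the egrep command, "." matches the backslash before the marker.
    grep "^.$m " "$db" | sort | uniq -c | sort -rn | awk '$1 > 1' > "$m-dups.txt"
  done
}
```

An empty output file means that marker has no duplicated values.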
If a record has more than one \mn field, that is also a case of a subentry with "more than one parent". For the import to succeed, each entry can have only one \mn; the other parent links must be added by hand in FLEx after the import. [Or Wes has a process to help with this; need link.]
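Records with more than one \mn can be found with a short awk sketch like the one below. The name find_multiple_mn is hypothetical, and it assumes each record starts with \lx.

```shell
# find_multiple_mn FILE
# Hypothetical sketch: prints the headword of each record that contains more
# than one \mn line, with the number of \mn lines found.
# Usage: find_multiple_mn dictionary.db
find_multiple_mn() {
  awk '
    # Report the record just finished, if it had more than one \mn.
    function report() { if (mn > 1) print entry ": " mn " \\mn fields" }
    $1 == "\\lx" { report(); entry = substr($0, length($1) + 2); mn = 0 }
    $1 == "\\mn" { mn++ }
    END { report() }
  ' "$1"
}
```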