This Data specification page and the Data provenance page should probably be merged, or at least clearly distinguished from each other. This page should also be made consistent with the Statistics Show page.
The notion of data on this page seems to refer to things that are actually measured, as opposed to the slightly more general conception on the Data provenance page, where data could also arise from an expert elicitation process. Another possible way to distinguish the two is to treat data values as different from parameter values estimated from data values and assumptions, which obviously require justification.
Marco's data specification/documentation/searching scheme
What's on the list of possible and necessary elements?
measurand
kind of value (count, continuous point-value)
value (point-value, interval, distribution, p-box, other)
units
precision
uncertainty
reference to calibration data
who collected (or observed, or generated, or calculated)
measurement device, sensor, protocol
where
when
auditor
range of possible values
conditions & circumstances of collection
reason for collection
who/what paid for collection
acknowledgements (permissions, help, guidance, advice, etc.)
previous record like this one
next record like this one
list of data sets of which this measurement is a part*
etc.
*Data set structures: pairs, vectors, replicate groups, temporal sequence, assemblages to make datasets and meta-analyses
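To make the element list above concrete, here is a minimal sketch of a single record as a Python dataclass. It covers only a subset of the listed elements, and every field name, type, and the `value_in_possible_range` helper are assumptions for illustration, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MeasurementRecord:
    # All names below are hypothetical; they mirror items on the list above.
    measurand: str
    value_kind: str                       # e.g. "count" or "continuous"
    value: object                         # point value, (lo, hi) interval, etc.
    units: Optional[str] = None
    precision: Optional[float] = None
    uncertainty: Optional[float] = None
    collector: Optional[str] = None       # who collected, observed, or calculated
    device: Optional[str] = None          # measurement device, sensor, protocol
    where: Optional[str] = None
    when: Optional[str] = None            # assumed to be an ISO 8601 string
    possible_range: Optional[Tuple[float, float]] = None
    reason: Optional[str] = None
    previous_record: Optional[str] = None  # identifier of the previous record
    next_record: Optional[str] = None      # identifier of the next record
    datasets: List[str] = field(default_factory=list)

    def value_in_possible_range(self) -> bool:
        """True when no range is declared, the value is not a plain number,
        or the numeric point value lies inside the declared range."""
        if self.possible_range is None or not isinstance(self.value, (int, float)):
            return True
        lo, hi = self.possible_range
        return lo <= self.value <= hi
```

Optional fields default to `None` so that an empty field stays visibly empty rather than being silently filled.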
Addressability: presumably some uniform resource identifier (URI) scheme compatible with URLs, ORCID, the Semantic Web, Wiktionary, the Resource Description Framework (RDF), and the Web Ontology Language (OWL)
Searchability: to identify records collected by someone, or at an institution during a given period, or using a protocol, etc.
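The kind of search described above could be sketched as a simple filter over records. The `Record` class, its field names, and the `search` function are hypothetical; a real implementation would more likely query a database or index.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Record:
    # Hypothetical minimal record; only the searchable fields are shown.
    collector: str
    institution: str
    protocol: str
    collected_on: date


def search(
    records: Iterable[Record],
    collector: Optional[str] = None,
    institution: Optional[str] = None,
    protocol: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> List[Record]:
    """Return the records that match every criterion that was supplied.

    Criteria left as None are ignored; start and end are inclusive.
    """
    hits = []
    for r in records:
        if collector is not None and r.collector != collector:
            continue
        if institution is not None and r.institution != institution:
            continue
        if protocol is not None and r.protocol != protocol:
            continue
        if start is not None and r.collected_on < start:
            continue
        if end is not None and r.collected_on > end:
            continue
        hits.append(r)
    return hits
```

For example, `search(records, institution="Inst X", start=date(2021, 1, 1))` would return the records collected at that institution on or after the given date.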
Data integrity: checksums, blockchain security, correctness & anti-tampering attestation; if the checksums fail, it should scream; if there's no anti-tampering scheme in place, the user should always know this
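As a rough sketch of the checksum part of this idea, the following uses SHA-256 from Python's standard `hashlib` over a canonical JSON form of the record. The function names, the `IntegrityError` exception, and the status strings are assumptions for illustration. Note that a bare checksum only detects accidental corruption; anyone able to rewrite both the record and its checksum could tamper undetected, so real anti-tampering attestation would need signatures or something similar.

```python
import hashlib
import json
from typing import Optional


class IntegrityError(Exception):
    """Raised loudly when a stored checksum does not match the record."""


def checksum(fields: dict) -> str:
    # Sorting keys and fixing separators gives one canonical serialization,
    # so the same field values always produce the same digest.
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def integrity_status(fields: dict, stored_checksum: Optional[str]) -> str:
    """Return "VERIFIED" or "UNSECURED"; raise IntegrityError on a mismatch."""
    if stored_checksum is None:
        # No scheme in place: the caller must surface this to the user.
        return "UNSECURED"
    if checksum(fields) != stored_checksum:
        raise IntegrityError("checksum mismatch: record may have been altered")
    return "VERIFIED"
```

Returning an explicit "UNSECURED" status, rather than silently passing, is one way to make sure the user always knows when no integrity scheme applies.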
Empty fields: if a field is empty, we might allow the convention of inheriting from the previous record for units, collector, reason, etc. Missing values are presumably to be treated as missing data, but it might be helpful to distinguish three cases: values that were supposed to be collected but weren't; values that were collected but are now missing, corrupted, or marked as suspicious; and values that cannot be collected because the measurand doesn't exist, which are NA (not applicable).
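One possible way to encode these conventions is sketched below: an enumeration for the different reasons a value can be missing, and a pass that fills empty inheritable fields from the previous record. The `Missing` members, the `INHERITABLE` tuple, and `fill_inherited` are hypothetical names, and records are represented as plain dictionaries for brevity.

```python
from enum import Enum
from typing import Dict, List


class Missing(Enum):
    # Hypothetical codes for the distinctions discussed above.
    NOT_COLLECTED = "was supposed to be collected but was not"
    LOST = "was collected but is now missing"
    CORRUPTED = "was collected but has been corrupted"
    SUSPICIOUS = "was collected but is marked as suspicious"
    NOT_APPLICABLE = "cannot be collected; the measurand does not exist"


# Fields assumed to inherit from the previous record when left empty (None).
INHERITABLE = ("units", "collector", "reason")


def fill_inherited(records: List[Dict]) -> List[Dict]:
    """Return copies of the records with empty inheritable fields filled in
    from the nearest earlier record that has them. Inputs are not modified."""
    filled = []
    previous: Dict = {}
    for rec in records:
        current = dict(rec)
        for key in INHERITABLE:
            if current.get(key) is None and previous.get(key) is not None:
                current[key] = previous[key]
        filled.append(current)
        previous = current
    return filled
```

Only `None` triggers inheritance here; an explicit `Missing` code in a field is kept as-is, so "empty" and "known to be missing" stay distinguishable.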
Cautions and flags: display conventions using font style, typeface, or background color to indicate important distinctions, such as integrity-assured data versus unsecured data, human-collected versus mechanical measurements versus data from electronic sensors, or values collected by a particular person, etc.