Next-generation science and engineering need their products, i.e., their documents, calculations, simulations, graphics and arguments, to be different in fundamental ways. They must be
distributed,
collaborative,
co-created,
reproducible,
recoverable,
secure and lawful,
archived automatically,
checkable (uncertainties, units, typing, range errors, etc.),
re-doable (different parameters, different assumptions, augmented or better data), and
highly accessible, to support both rigorous peer review and citizen science, yet
resistant to vandalism, abuse, and co-option by malware.
Many reasons for such changes have been articulated under the Open Science Initiative. But, even without these lofty appeals and prudent arguments, the plain practicality of the changes becomes evident in the overwhelming scope and interconnectedness of science and engineering today. Our technological world is built on proliferating proposals, assessments, comparisons, and calculations, and its workability depends more and more on their clarity and transparency and on their reproducibility and demonstrable correctness.
This website facilitates a collaboration on several related ideas:
§ Document abstraction. A semantic document markup language for report writing can make relevant gray literature accessible and useful. You can't expect a busy person to read an abstract; it needs to be digested for them. With a markup language for document semantics and argumentation structure, a computer could read a text and understand (without conscious cognition) its main point, ancillary points, subplots, arguments, lines of evidence, the evidence itself, its sources and provenance, definitions, examples and counterexamples, and other germane aspects. That understanding would allow it to fish up answers to questions such as "why?", "so what?", "what makes you think so?", and "how do you know?" that are much richer and more relevant than anything achievable with text searching alone.
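As a purely hypothetical illustration (the tag names and the example document are invented here, not part of any proposed standard), a toy semantic markup could let a program answer "what makes you think so?" by collecting everything that supports a claim:

```python
import xml.etree.ElementTree as ET

# Invented semantic markup: tags name the rhetorical role of each span.
doc = """
<report>
  <claim id="c1">Levee upgrades reduce flood risk in the study area.</claim>
  <evidence supports="c1" source="gauge-data-2019">
    Peak stage fell 0.4 m after the 2019 upgrade.
  </evidence>
  <assumption supports="c1">Rainfall patterns remain within historical bounds.</assumption>
</report>
"""

tree = ET.fromstring(doc)

def why(claim_id):
    """Answer 'what makes you think so?' by gathering support for a claim."""
    return [(el.tag, el.text.strip()) for el in tree
            if el.get("supports") == claim_id]
```

Here `why("c1")` returns the evidence and the assumption behind the claim, with their roles labelled, which no text search over the same sentences could distinguish.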
§ Sharing platform for scientific models. Marco De Angelis envisions a platform for sharing scientific and engineering models and analyses for peer review and citizen science, with a user-interface generator for scientific and engineering codes that lets modellers share their models with others without having to design a user interface. The platform handles data wrangling and conversions, checks dimensional soundness, and performs elementary uncertainty analysis. It automatically archives submissions and executions, tracks the provenance of data sources, and publishes justifications for modelling choices about parameter values and assumptions. Such a scheme could help modernise peer review and establish standards of accessibility that could improve transparency and scientific review generally. Tracy Kijewski et al. have described a geospatially explicit platform for managing real-time risk assessment, situational awareness and resilience planning by local governments during hurricanes and storms, featuring cloud-based computation of efficient surrogates of meteorological models, pre-event caching of model results, collaborative visualisation on maps, and privacy and security through role-designated access.
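The interface-generation idea can be sketched with ordinary introspection: a form description is derived automatically from a model's call signature, so the modeller writes only the model. The toy surrogate model and its parameter names below are assumptions for illustration, not part of the envisioned platform:

```python
import inspect

def storm_surge(wind_speed: float = 40.0, pressure_mb: float = 950.0) -> float:
    """Toy surrogate model, purely illustrative."""
    return 0.05 * wind_speed - 0.01 * (pressure_mb - 1000.0)

def form_spec(model):
    """Derive an input-form description from a model's signature,
    so the modeller never has to design an interface by hand."""
    empty = inspect.Parameter.empty
    spec = []
    for name, p in inspect.signature(model).parameters.items():
        spec.append({
            "field": name,
            "type": p.annotation.__name__ if p.annotation is not empty else "text",
            "default": None if p.default is empty else p.default,
        })
    return spec
```

A front end could render `form_spec(storm_surge)` as labelled numeric input boxes with sensible defaults, and the same signature information supports the dimensional and type checking the platform would perform.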
§ Self-documenting statistics. Statistics Show is an interactive checklist and documentation tool for statistical analyses. The idea is that, several months after you did an analysis (or, worse, a series of analyses), it’s often very difficult to reconstruct exactly what you did. This tool assembles all the essential details and inputs into one place in a standard format. It can automatically create (HTML and RTF) files that document the analysis for use in reports. The software would be useful for complying with ISO 9000 record-keeping rules or with OMB's “peer review standards” guidance.
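A minimal sketch of the self-documenting idea, assuming a decorator-based design (the names here are invented for illustration, not Statistics Show's actual interface): every analysis call records its inputs, result and timestamp in one standard log, which can then be rendered as an HTML fragment for a report.

```python
import datetime
import functools
import html
import statistics

ANALYSIS_LOG = []  # one entry per analysis run, in a standard format

def documented(fn):
    """Record every call's inputs and result so the analysis can be
    reconstructed months later."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        result = fn(*args, **kwargs)
        ANALYSIS_LOG.append({
            "analysis": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "result": result,
            "when": datetime.datetime.now().isoformat(timespec="seconds"),
        })
        return result
    return wrapper

@documented
def mean_response(data):
    return statistics.mean(data)

def to_html(log):
    """Render the log as a fragment suitable for pasting into a report."""
    rows = "".join(f"<li>{html.escape(e['analysis'])}: {e['result']}</li>"
                   for e in log)
    return f"<ul>{rows}</ul>"
```

The decorator makes the record keeping automatic rather than a separate chore, which is what such compliance schemes effectively require.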
§ Memo minder. The problem with institutional memory is that it degenerates as the humans in those institutions forget and retire. The office workers we all have to deal with require adherence to certain rules, although no one in the office can quite explain why those rules exist, saying unhelpfully “That’s the way we do it.” Sometimes the loss of history goes beyond mere inefficiency. Today’s NASA could not go to the moon without a substantial and costly ramp-up effort to recover the expertise of the engineers now in retirement homes or already lost to the world. Peggy Sutherland of Long Island Science (formerly of Brookhaven National Laboratory) suggests using an elaborated computerised record system to answer questions such as "Why do we do this?" (and "When can we stop?") by keeping track of institutional motivations, promises, justifications and their champions, the reasons, legal and otherwise, for the various ways and rules governing the conduct of business, the retention and destruction of records, etc. [The allusion is to the memo minder in the Tom Cruise breakout hit film Risky Business, and by subreference to The Jetsons' robotic organiser, not to the misnamed in-home dementia aid.]
§ Data side-collection. Conal Brown of ABB suggests using text analytics and natural language processing to analyse and categorise engineering inspection records (which his company has boatloads of). The company expends considerable effort to collect and store this information, but it is difficult to access after collection, and it is very difficult to survey for trends, which would be helpful to the discipline of engineering as a whole if they were available. Small changes in how the information is recorded and stored could make it more accessible to its owners and more available to researchers.
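A minimal sketch of what such categorisation might look like, assuming a simple keyword scheme (the categories and keywords are invented here; a real system would use proper natural language processing learned from the records themselves):

```python
import re
from collections import Counter

# Invented illustrative categories; a real system would learn these.
CATEGORIES = {
    "corrosion": {"rust", "corrosion", "corroded", "pitting"},
    "fatigue":   {"crack", "cracking", "fatigue"},
    "coating":   {"paint", "coating", "delamination"},
}

def categorise(record):
    """Assign a free-text inspection record to categories by keyword match."""
    words = set(re.findall(r"[a-z]+", record.lower()))
    return sorted(cat for cat, kws in CATEGORIES.items() if words & kws)

def trend(records):
    """Survey category frequencies across a batch of records."""
    counts = Counter()
    for record in records:
        counts.update(categorise(record))
    return counts
```

Even this crude scheme turns a pile of free-text records into countable categories, which is exactly the trend survey that is currently so difficult.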
§ Data provenance. Good practice in engineering and progress in science depend on having a reliable scheme to manage the collection, curation and promulgation of data, along with details about that collection, curation and promulgation. Losing track of where data came from and who collected them can be as great a loss as losing track of the measurement uncertainty associated with the data, which is arguably almost as great a loss as losing track of the units the data were measured in. The interpretation of data depends on its context. Formalising a system to manage an inclusive conception of data provenance is key to managing the torrent of data coming from the Internet of Things. The provenance identifies the origin of each number and the details needed to understand its nature and justification, and it traces any emendations made to the number as an explicit history that shows all prior values. The meaning of the word ‘provenance’ here is as it is in the art world. The provenance includes extended research credits for non-author contributions (sensu Emogor et al. 2025. Crediting non-author contributors in scientific publishing. Nature Human Behaviour. https://doi.org/10.1038/s41562-025-02123-7). The provenance of a number answers essential questions about it such as ‘Where did this come from?’ and ‘How can we trust it is authentic?’.
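One way such provenance could be carried around with a number, as a sketch (the class and field names are assumptions for illustration, not an established standard): the value knows its units, source and uncertainty, and every emendation preserves the prior value and the reason for the change.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvenancedValue:
    """A number that carries its own history: origin, units, uncertainty,
    and an explicit chain of emendations showing all prior values."""
    value: float
    units: str
    source: str
    uncertainty: float = 0.0
    history: tuple = ()  # prior (value, reason) pairs, oldest first

    def emend(self, new_value, reason):
        """Correct the value without losing the old one."""
        return ProvenancedValue(
            value=new_value, units=self.units, source=self.source,
            uncertainty=self.uncertainty,
            history=self.history + ((self.value, reason),))
```

Because the class is immutable, a correction yields a new object and the original record survives untouched, so the chain of emendations really is an explicit history rather than an overwrite.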
§ Version control and trust. The trustworthiness of technical arguments and the reliability of the answers they produce depend not only on the transparency of the arguments, the traceability of the analyses and the data on which they are based, and the demonstrable correctness of the analyses, but also on the data being legitimate and, where possible, reproducible, and on the assumptions still being justified, or at least having been justified at the time they were made. Guaranteeing data in the past was a matter of good faith and honor among researchers, in the world before climate deniers, anti-vaxxers, Flat Earthers, and Juggalos played such a prominent role in the public discourse. But it is not only today's inter-individual and institutional mistrust, but also the sheer volume of data and the growing importance of algorithms in our technological world, that demand a more secure system that automatically protects against mistakes, vandalism, and lies. Various technologies for version control and trust schemes have been suggested, including the model used by Wikipedia (a comprehensive set of rules and an army of human and robotic editors), pre-publication announcements of experimental design, GitHub's open-history publication, and blockchain-mediated publishing for guaranteed authenticity. Bob Williamson of Australian National University shared his proposal "Mnemosyne–End-to-end Trustworthy Data Analytics for Science, Industry and Society" (file:///C:\Users\ferson\Google%20Drive\Proposals\Rob%20Williamson%20%20MnemosyneRedacted.docx).
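The blockchain-mediated idea can be sketched as a simple hash chain in which each record's hash commits to the entire prior history, so any later edit to a past entry is automatically detectable. This is a minimal illustration of the mechanism, not an implementation of any of the cited proposals:

```python
import hashlib
import json

def _digest(entry, prev_hash):
    """Hash an entry together with the hash of everything before it."""
    payload = json.dumps(entry, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(chain, entry):
    """Append an entry whose hash commits to the whole prior history."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"entry": entry, "prev": prev, "hash": _digest(entry, prev)})
    return chain

def verify(chain):
    """Recompute every hash; any tampering with a past entry breaks the chain."""
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block["hash"] != _digest(block["entry"], prev):
            return False
        prev = block["hash"]
    return True
```

Publishing only the latest hash is enough to let anyone later confirm that the archived history has not been quietly rewritten, which is the guarantee that good faith alone can no longer supply.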
§ Sharing platform for data science. Jason O'Rawe envisions a platform for open-source data science, with version control, snippets, commentary, collaborative editing and testing, linkability, and features for interoperability (file:///D:\Google%20Drive\slides\jason.pptx).
Other threads of our research interests may also join together here, including possibly
Numbers of the future,
Bad data and various uncertainty projection projects,
Interpreting hedge words and verbally encoded numerical expressions,
Expert elicitation,
Risk governance and regulatory compliance.