Collaboration Ideas

Contact person: Boris Lorenc, boris.lorenc@blresearch.ee

Potential areas of activity: Responding to Intrastat questionnaires is a large source of administrative burden for businesses. Halving that burden would therefore be a meaningful reduction.

Starting with January 2022 data, the member states are obliged to submit their exports data to a central Eurostat hub. Thus, in addition to its own data on imports from another member state, collected through its Intrastat questionnaire, a member state can now find on the hub that other member state's data on exports to it. The two sets of data describe, in general, the same physical reality. One may call these two datasets symmetrical.

Starting from 2025, the member states can decide to stop collecting their own imports data and instead publish their EU imports based on the hub dataset only.

Interesting methodological questions exist both in the period up to 2025 and afterwards. I mention some of them here.

It has long been known that the data contain differences (called asymmetries). Among the causes found: improper NACE code or member state, possible currency conversion, differing threshold values, possible imputation/weighting, misunderstanding of the reporting rules or the rules' lack of clarity, etc. Until now symmetrical datasets have not existed on a large scale, but now they do. Their existence enables a full-scale study of the asymmetries. To do that, however, the two datasets need to be collated appropriately. Given that the transaction data may contain errors (misstated arrival state, CN8 code or goods value) and that the reporting business may or may not have aggregated them, both database matching and statistical matching face difficulties. New data science methods may be needed.
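
As a minimal sketch of what collating the two datasets could look like, the Python example below aggregates hypothetical export and import records to a common key (exporting state, importing state, commodity code, period) and computes an absolute and a relative asymmetry per key. The record layout, the example values, the function names and the relative measure (difference divided by the mean of the two values) are all assumptions made for illustration, not a Eurostat specification. The digits parameter truncates the CN8 code, to show how a misstated code can produce an asymmetry at the detailed level that disappears at a coarser one.

```python
from collections import defaultdict

# Hypothetical records: (reporting state, partner state, CN8 code, period, value in EUR).
# Exports are reported by the dispatching state, imports by the arrival state.
exports = [
    ("DE", "EE", "84713000", "2022-01", 1000.0),
    ("DE", "EE", "84713000", "2022-01", 500.0),
    ("FI", "EE", "44071110", "2022-01", 200.0),
]
imports = [
    ("EE", "DE", "84713000", "2022-01", 1400.0),
    ("EE", "FI", "44071190", "2022-01", 200.0),  # same goods, different CN8 code
]


def aggregate(records, flow, digits=8):
    """Sum values by (exporter, importer, commodity code prefix, period)."""
    totals = defaultdict(float)
    for reporter, partner, cn8, period, value in records:
        if flow == "export":
            exporter, importer = reporter, partner
        else:
            # For imports the reporter is the arrival state, so swap the roles.
            exporter, importer = partner, reporter
        totals[(exporter, importer, cn8[:digits], period)] += value
    return dict(totals)


def asymmetries(exports, imports, digits=8):
    """Compare mirror flows key by key; a key missing on one side counts as 0."""
    exp = aggregate(exports, "export", digits)
    imp = aggregate(imports, "import", digits)
    result = {}
    for key in sorted(set(exp) | set(imp)):
        e = exp.get(key, 0.0)
        m = imp.get(key, 0.0)
        mean = (e + m) / 2
        result[key] = {
            "export": e,
            "import": m,
            "abs_diff": m - e,
            # Assumed measure: difference relative to the mean of both values.
            "rel_diff": (m - e) / mean if mean else 0.0,
        }
    return result


detailed = asymmetries(exports, imports)           # CN8 level
coarse = asymmetries(exports, imports, digits=6)   # 6-digit level
```

In this toy data the FI to EE flow appears at CN8 level as two unmatched records with relative asymmetries of -2 and +2, while at the 6-digit level the two sides agree. Exact-key aggregation of this kind is only a starting point; it does not address the matching difficulties described above.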

What kind of models should be built, and under what circumstances should they be used after 2025 to produce official import statistics of a member state? For how long?

Provided that it is possible, should effort be put into reducing the asymmetries in the period up to 2025? This question is relevant if models are built that depend on long-term historical data.

Does the production of import statistics based on hub data involve an editing (data verification) step and, if so, how will that step be implemented?