Digital Advisory Services for European Fields
Delineated image of a Polish field
Our development journey moved from exploratory analysis to a sophisticated segmentation and labeling pipeline.
Satellite Choice: After comparing Landsat and ESA options, we selected Sentinel-2. Its near-infrared (NIR) band is well suited to detecting contiguous crop areas and delineating field boundaries.
Retrieval Logic: We adapted techniques from the EveryField master’s thesis to manage image bands and download protocols.
To solve initial boundary issues, we integrated Meta’s "Segment Anything" (SAM) model.
The Masking Process: We use SAM to generate field masks, followed by logic checks to prevent overlapping or faulty polygons.
Stability: This ensures clean separation between adjacent fields, which is critical for accurate data attribution.
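The logic checks are not spelled out above, so the following is only a minimal sketch of one way to make masks non-overlapping. It assumes each mask is a set of (row, column) pixel coordinates (in practice the masks would more likely be boolean arrays), gives smaller masks priority so a large mask cannot swallow a small field, and drops leftover slivers. The function name `resolve_overlaps` and the `min_pixels` threshold are hypothetical.

```python
def resolve_overlaps(masks, min_pixels=4):
    """Return disjoint pixel sets built from possibly overlapping masks.

    masks: list of sets of (row, col) pixels, one set per candidate field.
    Smaller masks claim their pixels first; what is left of a larger mask
    is kept only if it still covers at least min_pixels pixels.
    """
    claimed = set()   # pixels already assigned to some field
    cleaned = []
    for mask in sorted(masks, key=len):
        free = mask - claimed          # remove pixels owned by another field
        if len(free) < min_pixels:     # discard slivers and duplicates
            continue
        cleaned.append(free)
        claimed |= free
    return cleaned


# Two synthetic masks that overlap in a 2 x 2 pixel corner.
field_a = {(r, c) for r in range(0, 4) for c in range(0, 4)}   # 16 pixels
field_b = {(r, c) for r in range(2, 8) for c in range(2, 8)}   # 36 pixels
fields = resolve_overlaps([field_b, field_a])
```

After this step every pixel belongs to at most one field, which is the property needed for attributing survey points and index values to a single polygon.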
We are currently training a Long Short-Term Memory (LSTM) model to classify crop types based on the following steps:
Identify Area of Interest (AOI): Locate a 10km² subset with the highest density of LUCAS survey points.
Centroid Calculation: Determine the center point of these survey locations.
Time-Series Download: Retrieve Copernicus images for the entire growing season.
Subset Selection: Focus on a 5km x 5km square centered on the survey centroid.
Field Segmentation: Process the RGB subset through the "Segment Anything" algorithm to isolate fields from non-agricultural features (rivers, cities, etc.).
Feature Engineering:
Calculate NDVI (Normalized Difference Vegetation Index) and NDWI (Normalized Difference Water Index).
Formula for NDVI:
$$\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}$$
Aggregate these scores as monthly averages per field.
Labeling: Assign crop types (e.g., Sugar Beet, Wheat) from the LUCAS dataset to the segmented polygons.
Output: A comprehensive data frame where each row represents a specific field, featuring monthly vegetation indices as inputs and crop types as targets for the LSTM model.
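The Feature Engineering, Labeling, and Output steps can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the input is taken to be per-field mean band values for each acquisition date, NDWI is assumed to be the green/NIR form, (Green - NIR) / (Green + NIR), since the text does not say which variant is used, and the names `normalized_difference` and `monthly_features` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import fmean


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    return (a - b) / total if total else None


def monthly_features(observations, labels):
    """Build one feature row per field.

    observations: iterable of (field_id, acquisition_date, green, red, nir),
                  where band values are per-field means (assumed input).
    labels:       dict mapping field_id to a crop type (assumed input).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for field_id, day, green, red, nir in observations:
        ndvi = normalized_difference(nir, red)    # (NIR - Red) / (NIR + Red)
        ndwi = normalized_difference(green, nir)  # assumed green/NIR variant
        if ndvi is not None:
            buckets[field_id][f"ndvi_{day.month:02d}"].append(ndvi)
        if ndwi is not None:
            buckets[field_id][f"ndwi_{day.month:02d}"].append(ndwi)

    rows = {}
    for field_id, series in buckets.items():
        # Monthly average of every index, one column per index and month.
        row = {name: fmean(values) for name, values in sorted(series.items())}
        row["crop"] = labels.get(field_id)  # classification target
        rows[field_id] = row
    return rows


# Made-up band values, for illustration only.
observations = [
    ("field_1", date(2022, 5, 3), 1000, 1000, 3000),
    ("field_1", date(2022, 5, 18), 1000, 1000, 4000),
    ("field_1", date(2022, 6, 2), 1000, 500, 4500),
]
rows = monthly_features(observations, {"field_1": "Sugar Beet"})
```

Each entry of `rows` corresponds to one row of the data frame described above: monthly index columns as inputs and the crop type as the target.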
This demonstration focuses on a single subset to showcase the logic required for a future large-scale batch processing pipeline.
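For the single subset, the Centroid Calculation and Subset Selection steps reduce to a mean and a bounding box. The sketch below assumes the survey points are already in a projected coordinate system measured in metres (latitude/longitude coordinates would need to be reprojected first); the function names and the sample coordinates are hypothetical.

```python
from statistics import fmean


def survey_centroid(points):
    """Return the mean (x, y) of a non-empty list of (x, y) points."""
    if not points:
        raise ValueError("at least one survey point is required")
    xs, ys = zip(*points)
    return fmean(xs), fmean(ys)


def square_around(center, side_m=5000.0):
    """Return (min_x, min_y, max_x, max_y) of a square centered on center."""
    cx, cy = center
    half = side_m / 2
    return (cx - half, cy - half, cx + half, cy + half)


# Illustrative coordinates in metres, not real survey locations.
points = [(500000, 5540000), (502000, 5542000), (504000, 5541000)]
center = survey_centroid(points)
bbox = square_around(center)
```

The resulting bounding box is what would be used to crop each downloaded image to the 5km x 5km subset.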
| Feature | Observation |
| --- | --- |
| Input | RGB Subset (Left) |
| Segmentation | Meta’s PyTorch Segment (Right) |
| Target Labels | Red dots (e.g., Common Wheat, Sugar Beets, Unknown Cereal) |
| Regional Performance | High accuracy in Southern Poland due to smaller field sizes. |
While the 5km x 5km subset provides excellent segmentation, we observed that increasing the subset size to 10km x 10km leads to sub-optimal results. Our current focus remains on maintaining high precision at the 5km scale to ensure data integrity for the LSTM training phase.
Our analysis leverages the unique temporal "signatures" of different crops to feed the LSTM (Long Short-Term Memory) model.
Wheat: Characterized by two distinct cycles: summer wheat (spring cycle) and winter wheat (fall cycle).
Root Crops & Others: Crops like sugar beets follow a single-cycle growth pattern, typically harvested in late September or early October.
By identifying these specific temporal patterns, we can establish a robust signature analysis to classify various European crops and derive secondary agricultural factors.
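To make the idea of a temporal signature concrete, the sketch below finds the months in which a field's monthly NDVI series peaks. It is a hand-written stand-in for illustration only; in the pipeline described here the LSTM is meant to learn such patterns itself. The function name, the 0.4 threshold, and both series are invented for the example.

```python
def peak_months(monthly_ndvi, min_ndvi=0.4):
    """Return the 1-based months where NDVI is a local maximum.

    A month counts as a peak when its value is at least min_ndvi, higher
    than the previous month, and not lower than the next month. The first
    and last entries are never reported because they lack a neighbor.
    """
    peaks = []
    for i in range(1, len(monthly_ndvi) - 1):
        previous, current, following = monthly_ndvi[i - 1:i + 2]
        if current >= min_ndvi and current > previous and current >= following:
            peaks.append(i + 1)
    return peaks


# Invented January-to-December series, not measured data.
late_season = [0.10, 0.10, 0.10, 0.15, 0.30, 0.55, 0.75, 0.80, 0.70, 0.30, 0.15, 0.10]
early_season = [0.30, 0.30, 0.40, 0.60, 0.80, 0.70, 0.30, 0.15, 0.15, 0.20, 0.35, 0.35]

late_peaks = peak_months(late_season)
early_peaks = peak_months(early_season)
```

Two crops with similar peak NDVI can still differ in when the peak occurs, which is the kind of timing information a sequence model can exploit.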
The current workflow serves as a proof-of-concept. To reach a production-ready state, we must scale our data processing and infrastructure.
Each satellite tile ($110\text{km} \times 110\text{km}$) occupies 1.2 GB. To train a high-accuracy LSTM, we must process hundreds of these tiles into $5\text{km} \times 5\text{km}$ subsets.
Estimated Monthly Cost: ~$150 for a dedicated Virtual Machine.
Storage Requirement: Minimum of 3 TB of cloud storage.
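A back-of-the-envelope check of these figures, as a sketch: it uses decimal units (1 TB = 1,000,000 MB), treats 1.2 GB as the size of one tile at one acquisition date, and ignores tile overlap and no-data areas. The function names are hypothetical.

```python
def subsets_per_tile(tile_side_km=110, subset_side_km=5):
    """Number of non-overlapping square subsets that fit in one tile."""
    per_side = tile_side_km // subset_side_km
    return per_side * per_side


def tiles_that_fit(storage_tb=3, tile_mb=1200):
    """Number of tile downloads that fit in the given storage."""
    return (storage_tb * 1_000_000) // tile_mb


n_subsets = subsets_per_tile()   # 22 x 22 subsets per tile
n_tiles = tiles_that_fit()       # tile downloads that fit in 3 TB
```

Under these assumptions one tile yields 484 subsets and 3 TB holds 2,500 tile downloads. Because a time series needs one download per acquisition date, the number of distinct tile locations that fit is lower by that factor.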
Yield Data Acquisition: Find a yield estimation dataset with field-level granularity.
LUCAS Synchronization: Download contiguous tiles matching the LUCAS dataset coordinates (two priority areas identified).
Model Training: Train the LSTM using improved masks from the "Segment Anything" model once storage is upgraded.
Crop Code Integration: Expand the model to recognize and classify all relevant European crop codes with high accuracy.
Once the fields are classified and delimited, we can move toward predictive analytics:
Yield Extrapolation: Calculate expected harvest based on polygon surface area and historical data.
Input Requirements: Determine the necessary amount of fertilizer and seed.
Financial Planning: Help farmers calculate the short-term loans required for seeds, which are typically repaid post-harvest.
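The three calculations above can be sketched from a single field polygon. This is an illustration only: it assumes polygon vertices in a projected coordinate system in metres, uses the shoelace formula for area, and every rate and price in the example is a placeholder rather than agronomic data. The names `polygon_area_m2` and `field_plan` are hypothetical.

```python
def polygon_area_m2(vertices):
    """Area of a simple polygon via the shoelace formula.

    vertices: list of (x, y) in metres, in order around the boundary.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def field_plan(vertices, yield_t_per_ha, seed_kg_per_ha, seed_price_per_kg):
    """Estimate harvest, seed requirement, and seed cost for one field."""
    area_ha = polygon_area_m2(vertices) / 10_000   # 1 ha = 10,000 m²
    seed_kg = area_ha * seed_kg_per_ha
    return {
        "area_ha": area_ha,
        "expected_harvest_t": area_ha * yield_t_per_ha,
        "seed_kg": seed_kg,
        "seed_loan": seed_kg * seed_price_per_kg,
    }


# A 200 m x 100 m rectangle with placeholder rates and prices.
rectangle = [(0, 0), (200, 0), (200, 100), (0, 100)]
plan = field_plan(rectangle, yield_t_per_ha=5.0,
                  seed_kg_per_ha=180.0, seed_price_per_kg=0.5)
```

In practice the yield per hectare would come from historical data such as the provincial statistics, and the seed rate and price from local sources.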
We have demonstrated the feasibility of segmenting and monitoring agricultural fields at a micro-level using free and open-source tools. While commercial APIs like ResUNet SentinelHub (approx. €300) offer similar capabilities, our open-source protocol provides a scalable, cost-effective stepping stone for the agricultural community.
Precision: Field delineation allows for exact surface area calculations.
Integration: Masks can be tied to secondary LSTMs for fertilizer and health monitoring.
Insight: Yields can be estimated using a combination of field size and NDVI.
| Dataset | Description |
| --- | --- |
| CORINE Land Cover | Pan-European raster data for land usage. |
| Global Land Cover | Pixel-based data masks for Europe. |
| GUS (Stat.gov.pl) | Crop yield data by Polish provinces. |
| LUCAS Dataset | Primary land usage survey information. |
https://github.com/OmdenaAI/cracow-poland-rural-farmers