Portfolio

SolarData - Simulation of photovoltaic panels

This platform simulates the amount of energy produced by a photovoltaic installation. It extracts weather data from GRIB files (a binary format for weather data). Based on this data and the installation's geographic coordinates, the system computes how much energy a PV installation can produce under particular weather conditions at a given location. Finally, it generates a graphical report that presents the results in charts.
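To illustrate the kind of computation involved, here is a minimal sketch of turning hourly weather values into an energy estimate. It is not the SolarData code. It assumes a simple temperature-corrected power model with a NOCT cell-temperature estimate, irradiance already given on the panel plane, and hypothetical panel parameters (`PanelSpec` and its default values are illustrative only).

```python
from dataclasses import dataclass


@dataclass
class PanelSpec:
    """Hypothetical panel parameters (illustrative values, not SolarData's)."""
    p_stc_w: float = 400.0       # rated power at STC (1000 W/m^2, 25 °C cell temperature)
    gamma_per_c: float = -0.004  # relative power change per °C of cell temperature
    noct_c: float = 45.0         # nominal operating cell temperature (°C)


def cell_temperature(t_air_c: float, irradiance_w_m2: float, noct_c: float) -> float:
    # Simple NOCT model: the cell heats up linearly with irradiance above air temperature.
    return t_air_c + (noct_c - 20.0) / 800.0 * irradiance_w_m2


def dc_power_w(irradiance_w_m2: float, t_air_c: float, panel: PanelSpec) -> float:
    """DC power of one panel, scaled from STC and corrected for cell temperature."""
    if irradiance_w_m2 <= 0:
        return 0.0  # night or no usable irradiance
    t_cell = cell_temperature(t_air_c, irradiance_w_m2, panel.noct_c)
    power = panel.p_stc_w * (irradiance_w_m2 / 1000.0) * (1 + panel.gamma_per_c * (t_cell - 25.0))
    return max(power, 0.0)


def energy_kwh(hourly_weather, panel: PanelSpec, n_panels: int = 1) -> float:
    """hourly_weather: iterable of (irradiance in W/m^2, air temperature in °C), one per hour."""
    # With 1-hour steps, average power in W equals energy in Wh for that hour.
    total_wh = sum(dc_power_w(g, t, panel) for g, t in hourly_weather)
    return n_panels * total_wh / 1000.0
```

In a pipeline like the one described above, the irradiance and temperature inputs would be extracted from GRIB files; pvlib offers more complete models (for example, transposing horizontal irradiance onto a tilted panel), which this sketch deliberately omits.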

Toolbox: Python, pandas, NumPy, ecCodes, pvlib, Flask, Plotly, MongoDB, Docker

https://solardata.pl/

ETL system for cryptoasset transaction data

For an advisory company focusing on investments in cryptoassets, I designed an ETL system that collects relevant data from public block explorers. The system extracts data for multiple cryptoassets, aggregates transaction data, and loads it into a database. It runs continuously over the long term, appending new data as it becomes available. Moreover, it is designed to be resilient to errors (e.g., network issues) and to avoid overloading the servers that provide the data.
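The resilience and incremental-loading properties described above can be sketched as follows. This is a hypothetical illustration, not the production system: `fetch_block_txs`, `load`, the block-height checkpoint, and the per-block aggregate are assumptions. Transient errors are retried with exponential backoff, a fixed pause between requests throttles load on the block explorer, and only blocks after the last stored checkpoint are processed.

```python
import functools
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")

# Errors treated as transient (e.g. network issues); anything else propagates immediately.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)


def fetch_with_retry(fetch: Callable[[], T], max_attempts: int = 5,
                     base_delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); after a transient failure wait base_delay_s * 2**attempt and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts - 1):
        try:
            return fetch()
        except TRANSIENT_ERRORS:
            sleep(base_delay_s * 2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
    return fetch()  # last attempt: let any error propagate to the caller


def aggregate(txs: List[Dict[str, float]]) -> Dict[str, float]:
    """Hypothetical per-block aggregate: number of transactions and total value moved."""
    return {"tx_count": len(txs), "total_value": sum(tx["value"] for tx in txs)}


def run_incremental(fetch_block_txs: Callable[[int], List[Dict[str, float]]],
                    load: Callable[[int, Dict[str, float]], None],
                    last_loaded_height: int, latest_height: int,
                    min_interval_s: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Process only blocks after the stored checkpoint and return the new checkpoint."""
    checkpoint = last_loaded_height
    for height in range(last_loaded_height + 1, latest_height + 1):
        txs = fetch_with_retry(functools.partial(fetch_block_txs, height), sleep=sleep)
        load(height, aggregate(txs))
        checkpoint = height     # a real pipeline would persist this alongside the data
        sleep(min_interval_s)   # throttle requests to the public block explorer
    return checkpoint
```

Injecting `sleep` as a parameter keeps the backoff and throttling logic testable without real waiting.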

Toolbox: Python, pandas, NumPy, Exasol, Linux, various APIs

Anomaly detection in business intelligence smart metrics

I designed and implemented smart metrics infrastructure and algorithms for business intelligence systems. My solution dynamically fetches data from various data sources and loads it into InfluxData. It then analyzes the data online, detects anomalous values, and triggers actions (e.g., notifications) based on the detected anomalies.
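One of the statistical methods listed in the toolbox, a moving average combined with a normal-distribution threshold, can be sketched as below. This is an illustrative stand-in, not the deployed algorithm: the window size, the 3-sigma threshold, and the `on_anomaly` callback (standing in for a notification) are assumptions.

```python
import math
from collections import deque
from statistics import mean, pstdev
from typing import Callable, Iterable, List, Optional, Tuple

Anomaly = Tuple[int, float, float]  # (index, value, z-score)


def detect_anomalies(values: Iterable[float], window: int = 20,
                     z_threshold: float = 3.0,
                     on_anomaly: Optional[Callable[[int, float, float], None]] = None
                     ) -> List[Anomaly]:
    """Flag values lying more than z_threshold standard deviations from the moving
    average of the preceding `window` values (assumes roughly normal fluctuations)."""
    history: deque = deque(maxlen=window)
    anomalies: List[Anomaly] = []
    for i, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:
                z = (value - mu) / sigma
            else:
                # Flat history: any deviation at all is treated as infinitely unlikely.
                z = 0.0 if value == mu else math.copysign(math.inf, value - mu)
            if abs(z) > z_threshold:
                anomalies.append((i, value, z))
                if on_anomaly is not None:
                    on_anomaly(i, value, z)  # e.g. send a notification
        history.append(value)  # the window slides forward; the oldest value drops out
    return anomalies
```

Here anomalous values still enter the window, which widens the band briefly after a spike; excluding them instead is a reasonable alternative when spikes should not mask later anomalies.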

Toolbox: InfluxData, TICK stack, Python, Node.js, MongoDB, JupyterLab, datastream.io, statistical methods for anomaly detection (moving average, normal distribution), unsupervised outlier detection (local outlier factor estimator)

DiploCloud is an efficient and scalable distributed RDF data management system for the cloud. Unlike previous approaches, DiploCloud runs a physiological analysis of both instance and schema information before partitioning the data.
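To convey the general idea of using instance and schema information before partitioning, here is a deliberately simplified sketch. It is not DiploCloud's algorithm. It assumes triples are clustered around their subject (instance information), each cluster is labelled with its `rdf:type` (schema information), and whole class groups are then placed greedily on the least-loaded node. The function name and placement heuristic are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "rdf:type"


def partition_by_type(triples: List[Triple], n_nodes: int) -> Dict[int, List[Triple]]:
    # 1. Instance analysis: cluster triples around their subject.
    clusters: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        clusters[triple[0]].append(triple)

    # 2. Schema analysis: label each cluster with its (alphabetically first) rdf:type.
    groups: Dict[str, List[str]] = defaultdict(list)
    for subject, cluster in clusters.items():
        types = sorted(o for _, p, o in cluster if p == RDF_TYPE)
        groups[types[0] if types else "untyped"].append(subject)

    # 3. Place each class group on the currently least-loaded node, largest groups first,
    #    so that entities of the same class end up together on one node.
    def group_size(subjects: List[str]) -> int:
        return sum(len(clusters[s]) for s in subjects)

    load = [0] * n_nodes
    partitions: Dict[int, List[Triple]] = {node: [] for node in range(n_nodes)}
    for _, subjects in sorted(groups.items(), key=lambda kv: (-group_size(kv[1]), kv[0])):
        node = min(range(n_nodes), key=lambda n: (load[n], n))
        for subject in subjects:
            partitions[node].extend(clusters[subject])
        load[node] += group_size(subjects)
    return partitions
```

Co-locating a subject's triples keeps subject-centred (star-shaped) lookups on a single node; the obvious weakness of this toy version is that one very large class would overload a single node.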

https://exascale.info/projects/diplodocus-rdf/

dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data
DiploCloud: Efficient and Scalable Management of RDF Data in the Cloud

TripleProv is an in-memory RDF database capable of storing, tracing, and querying provenance information while processing RDF queries. TripleProv returns an understandable description of how the results of an RDF query were derived; specifically, it explains in detail which pieces of data were combined, and how, to produce the answer to a query. Moreover, TripleProv lets you tailor query execution with provenance information: you can supply a provenance specification describing the data you want to be used to derive the answer. For example, you may be interested in articles about “Obama” but want the answer to come only from sources attributed to “US News”.
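The example above can be illustrated with a small, hypothetical sketch. It does not reproduce TripleProv's internals: it assumes each triple carries its source (a quad), uses made-up predicate names (`about`, `rdf:type Article`) and source names, and records provenance simply as the alternative sets of sources whose triples were joined to produce each answer. An optional source specification filters triples before the join, so answers can only be derived from the allowed sources.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, str]  # (subject, predicate, object, source)


def articles_about(quads: List[Quad], topic: str,
                   allowed_sources: Optional[Iterable[str]] = None
                   ) -> Dict[str, Set[FrozenSet[str]]]:
    """Answer 'which articles are about <topic>?' and, for each answer, return every
    alternative derivation as the set of sources whose triples were combined."""
    allowed = None if allowed_sources is None else set(allowed_sources)
    # Provenance-aware execution: drop triples whose source is outside the specification.
    usable = [q for q in quads if allowed is None or q[3] in allowed]

    # Index the pattern (?a about <topic>) by subject: subject -> set of sources.
    about: Dict[str, Set[str]] = {}
    for s, p, o, src in usable:
        if p == "about" and o == topic:
            about.setdefault(s, set()).add(src)

    results: Dict[str, Set[FrozenSet[str]]] = {}
    # Join with (?a rdf:type Article) and record which sources each derivation used.
    for s, p, o, src in usable:
        if p == "rdf:type" and o == "Article":
            for about_src in about.get(s, ()):
                results.setdefault(s, set()).add(frozenset({src, about_src}))
    return results
```

With quads where article `a2` is typed by a hypothetical "Example Blog" but tagged as about “Obama” by “US News”, the unrestricted query returns `a2` with provenance `{"Example Blog", "US News"}`, while restricting the query to “US News” drops `a2` entirely because one of the triples it depends on comes from another source.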

TripleProv: Tracking and Querying Provenance in Linked Data

https://github.com/MarcinWylot/tripleprov_demo

https://exascale.info/projects/tripleprov/