Portfolio

Expert System for Blood Management

I developed an expert system that implements official blood transfusion guidelines to help doctors assess a patient's state faster. The system is served as a REST API that accepts blood test results (complete blood count) and returns the guidelines applicable to the case. It also returns a detailed explanation of the reasoning behind the decision.

Toolbox: Python, durable_rules, Linux, Flask, docker

https://aidadx.io
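The idea of returning a recommendation together with its reasoning can be illustrated with a minimal sketch in plain Python. It does not use durable_rules and is not the production rule base: the `Rule` class, the `evaluate` function, the `hemoglobin_g_dl` field, and the threshold value are hypothetical placeholders with no clinical meaning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str                                      # hypothetical rule identifier
    condition: Callable[[Dict[str, float]], bool]  # decides whether the rule fires
    recommendation: str
    rationale: str


def evaluate(blood_count: Dict[str, float], rules: List[Rule]) -> dict:
    """Return the recommendations of all rules that fire, with their reasoning."""
    fired = [r for r in rules if r.condition(blood_count)]
    return {
        "recommendations": [r.recommendation for r in fired],
        "explanation": [f"{r.name}: {r.rationale}" for r in fired],
    }


# Placeholder threshold for illustration only; it carries no clinical meaning.
PLACEHOLDER_THRESHOLD = 7.0

rules = [
    Rule(
        name="low_hemoglobin",
        condition=lambda bc: bc["hemoglobin_g_dl"] < PLACEHOLDER_THRESHOLD,
        recommendation="review transfusion guideline",
        rationale=f"hemoglobin below placeholder threshold {PLACEHOLDER_THRESHOLD}",
    ),
]

result = evaluate({"hemoglobin_g_dl": 6.5}, rules)
print(result["recommendations"])
print(result["explanation"])
```

Keeping the rationale next to each rule means every returned recommendation can be traced back to the rule that produced it.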

Live Contract Trading Data Harvesting

For an advisory company focusing on cryptoasset investments, I designed a system that harvests live (streaming) data on cryptocurrency trades and positions from different trading platforms. For some platforms, I also computed online estimates of traders' positions.

Toolbox: Python, Apache Nifi, exasol, Linux, different APIs, web sockets

Predicting Blood Transfusion Needs

I developed a machine learning system that assists medical doctors in making decisions about blood transfusion. The model was trained on historical medical records and curated doctors' decisions. It is served as a REST API that accepts a patient's data and returns a decision and its probability.

Toolbox: Python, tensorflow/keras, Linux, Flask, docker

https://aidadx.io

Sentiment Data Harvesting

For a client, I designed a system that collects current and historical data (search volume) from Google Trends for sentiment analysis. The system maintains an up-to-date database of popularity indices that serve as features for an ML model. It fetches data for given assets (keywords) from the source, scales it to short- and long-term trends, and pushes it to a database.

Toolbox: Apache Nifi, Python, exasol, Linux, google trends, docker

SolarData - Simulation of photovoltaic panels

This platform simulates the amount of energy produced by a photovoltaic installation. It extracts weather data from GRIB files (a binary format for weather data). Based on this data and geographic coordinates, the system computes how much energy a PV installation can produce for particular weather conditions and a particular location. Finally, it generates a graphical report that presents this data in charts.

Toolbox: Python, pandas, numpy, eccodes, pvlib, flask, plotly, MongoDB, docker

https://solardata.pl/
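To give a feel for the energy computation, here is a rough sketch in plain Python. The platform itself relies on pvlib with real weather data; this sketch only applies the simplified PVWatts DC power relation, and the function names, rated power, temperature coefficient, and hourly samples are all made-up assumptions.

```python
def pvwatts_dc_power(poa_irradiance_w_m2, cell_temp_c, rated_power_w,
                     temp_coeff_per_c=-0.004):
    """Simplified PVWatts DC power (W) for a given irradiance and cell temperature."""
    return (poa_irradiance_w_m2 / 1000.0) * rated_power_w * (
        1.0 + temp_coeff_per_c * (cell_temp_c - 25.0)
    )


def energy_kwh(hourly_conditions, rated_power_w):
    """Sum hourly power samples (each assumed constant over one hour) into kWh."""
    total_wh = sum(
        pvwatts_dc_power(irradiance, temp, rated_power_w)
        for irradiance, temp in hourly_conditions
    )
    return total_wh / 1000.0


# Made-up hourly samples: (plane-of-array irradiance in W/m^2, cell temperature in deg C).
sample_day = [(0.0, 10.0), (500.0, 25.0), (1000.0, 25.0), (500.0, 25.0)]
daily_energy = energy_kwh(sample_day, rated_power_w=5000.0)
print(daily_energy)
```

A full simulation additionally has to derive the irradiance on the panel plane from the weather data, the sun position, and the panel orientation, which is the part pvlib handles.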

ETL system for cryptoasset transaction data

For an advisory company focusing on cryptoasset investments, I designed an ETL system that collects relevant data from public block explorers. The system extracts data for multiple cryptoassets, aggregates data about transactions, and loads it into a database. It runs long-term and appends new data. Moreover, it is designed to be resilient to errors (e.g., network issues) and to avoid overloading the servers providing the data.

Toolbox: Python, pandas, numpy, exasol, Linux, different APIs
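The resilience and politeness requirements of the ETL system above can be sketched as a small wrapper that spaces out requests and retries failed ones with exponential backoff. This is a minimal illustration, not the production code: the `PoliteFetcher` class, its parameters, and the simulated `flaky_fetch` function are hypothetical.

```python
import time


class PoliteFetcher:
    """Wrap a fetch function with request spacing and retry with backoff."""

    def __init__(self, fetch, min_interval_s=1.0, max_retries=5,
                 base_delay_s=2.0, sleep=time.sleep, clock=time.monotonic):
        self.fetch = fetch
        self.min_interval_s = min_interval_s
        self.max_retries = max_retries
        self.base_delay_s = base_delay_s
        self.sleep = sleep          # injectable so the wrapper can be tested without waiting
        self.clock = clock
        self._last_request = None

    def _throttle(self):
        # Wait until at least min_interval_s has passed since the last request.
        if self._last_request is not None:
            wait = self.min_interval_s - (self.clock() - self._last_request)
            if wait > 0:
                self.sleep(wait)
        self._last_request = self.clock()

    def get(self, resource):
        for attempt in range(self.max_retries):
            self._throttle()
            try:
                return self.fetch(resource)
            except (ConnectionError, TimeoutError):
                if attempt == self.max_retries - 1:
                    raise  # give up after the last attempt
                # Exponential backoff: base, 2 * base, 4 * base, ...
                self.sleep(self.base_delay_s * 2 ** attempt)


# Simulated data source that fails twice before succeeding.
calls = {"n": 0}

def flaky_fetch(resource):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network issue")
    return {"resource": resource, "ok": True}

delays = []  # record requested sleeps instead of actually sleeping
fetcher = PoliteFetcher(flaky_fetch, min_interval_s=0.0, sleep=delays.append)
response = fetcher.get("block/1")
print(response, delays)
```

Injecting `sleep` and `clock` keeps the waiting logic testable; in a long-running job the defaults from the `time` module would be used.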

Anomaly detection in business intelligence smart metrics

I designed and implemented smart metrics infrastructure and algorithms for business intelligence systems. My solution dynamically fetches data from various data sources and loads it into InfluxData. It then performs online analysis of this data, detects anomalous values, and triggers actions (e.g., notifications) based on the detected anomalies.

Toolbox: InfluxData, TICK stack, Python, NodeJS, MongoDB, JupyterLab, datastream.io, statistical methods for anomaly detection (moving average, normal distribution), unsupervised outlier detection (local outlier factor estimator)
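The moving-average and normal-distribution approach listed above can be sketched as an online detector that compares each new value with the mean and standard deviation of a sliding window. This is a simplified illustration in plain Python, not the deployed solution: the `RollingAnomalyDetector` class, the window size, the threshold, and the sample stream are assumptions.

```python
from collections import deque
from statistics import mean, stdev


class RollingAnomalyDetector:
    """Flag values that lie far from the rolling mean of recent normal values."""

    def __init__(self, window=20, threshold=3.0):
        self.values = deque(maxlen=window)  # sliding window of recent normal values
        self.threshold = threshold          # allowed distance in standard deviations

    def update(self, value):
        """Return True if value deviates from the rolling mean by more than threshold sigmas."""
        is_anomaly = False
        if len(self.values) >= 2:
            history = list(self.values)
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                is_anomaly = True
        # Keep anomalous points out of the baseline so they do not inflate it.
        if not is_anomaly:
            self.values.append(value)
        return is_anomaly


detector = RollingAnomalyDetector(window=5, threshold=3.0)
stream = [10, 11, 9, 10, 12, 10, 11, 50, 10]  # made-up metric values
flagged = [v for v in stream if detector.update(v)]
print(flagged)
```

In a real pipeline, the point where `update` returns True is where an action such as a notification would be triggered.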

DiploCloud is an efficient and scalable distributed RDF data management system for the cloud. Contrary to previous approaches, DiploCloud runs a physiological analysis of both instance and schema information prior to partitioning the data.

https://exascale.info/projects/diplodocus-rdf/

dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data

DiploCloud: Efficient and Scalable Management of RDF Data in the Cloud

TripleProv is an in-memory RDF database capable of storing, tracing, and querying provenance information while processing RDF queries. TripleProv returns an understandable description of how the results of an RDF query were derived; specifically, it gives a detailed explanation of which pieces of data were combined, and how, to produce the answer to a query. Moreover, with TripleProv you can tailor query execution with provenance information: you can input a provenance specification of the data you want to use to derive the answer. For example, you may be interested in articles about “Obama” but want the answer to come only from sources attributed to “US News”.
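The two ideas, tracing where an answer came from and restricting a query to chosen sources, can be illustrated with a toy sketch in Python. It is not TripleProv's data model or API: the `Quad` type, the `query` function, and the sample data are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Set


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: where the triple came from


def query(store: Iterable[Quad], predicate: str, obj: str,
          allowed_sources: Optional[Set[str]] = None) -> Dict[str, Set[str]]:
    """Map each matching subject to the set of sources that support it.

    If allowed_sources is given, only triples from those sources are used.
    """
    results: Dict[str, Set[str]] = {}
    for quad in store:
        if quad.predicate != predicate or quad.obj != obj:
            continue
        if allowed_sources is not None and quad.source not in allowed_sources:
            continue
        results.setdefault(quad.subject, set()).add(quad.source)
    return results


# Made-up data for illustration.
store = [
    Quad("article:1", "about", "Obama", "US News"),
    Quad("article:2", "about", "Obama", "Example Blog"),
    Quad("article:1", "about", "Obama", "Example Blog"),
]

everything = query(store, "about", "Obama")
only_us_news = query(store, "about", "Obama", allowed_sources={"US News"})
print(everything)
print(only_us_news)
```

The first call shows which sources support each answer; the second uses the provenance specification to exclude data that does not come from the chosen source.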

TripleProv: Tracking and Querying Provenance in Linked Data

https://github.com/MarcinWylot/tripleprov_demo

https://exascale.info/projects/tripleprov/