The role involves implementing data structures and adhering to best practices in data modeling, processes, and technologies. It includes designing, developing, and testing business intelligence (BI) solutions, encompassing databases, data warehouses, queries, views, reports, and dashboards. Responsibilities also cover data conversions, imports, and exports within and between internal and external software systems, as well as integrating BI platforms with enterprise systems and applications. The position further focuses on improving the performance of BI tools by defining filtering and indexing criteria for data, and on documenting new and existing models, solutions, and implementations. Finally, it involves workflows, Glue jobs, ETL automation, data migrations, dashboard manipulation, Redshift data warehousing, data lake solutions, a range of AWS services, and algorithmic intelligence.
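Defining indexing criteria for performance can be sketched as follows, using SQLite from the standard library as a stand-in for the production warehouse (the `sales` table and its columns are illustrative, not from the actual project):

```python
import sqlite3

# In-memory database standing in for the production warehouse (illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("south", 20.0), ("north", 15.0)],
)

# Define an index on the column used as a filtering criterion.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")

# Verify the query planner now uses the index for filtered reads.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE region = ?", ("north",)
).fetchall()
print(plan)  # the plan row mentions idx_sales_region
```

The same idea applies in Redshift or PostgreSQL, where the choice of sort keys or indexed columns is driven by the filters dashboards actually use.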
This project has a number of modules, of which I am working on the ETL module. Data resides in AWS Athena in what we call the standardized zone. My job is to write complex SQL/PostgreSQL queries in Python using SQLAlchemy, extract data from the standardized zone, perform analysis, transform the data using the Python Pandas library, and save it to PostgreSQL databases.
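The extract-transform-load flow described above can be sketched as follows. This is a minimal illustration using SQLite from the standard library in place of Athena and PostgreSQL, and plain Python in place of Pandas; the `events` schema is hypothetical:

```python
import sqlite3

# Source database standing in for the Athena "standardized zone" (hypothetical schema).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE events (user_id INTEGER, amount REAL)")
src.executemany("INSERT INTO events VALUES (?, ?)",
                [(1, 5.0), (1, 7.5), (2, 3.0)])

# Extract: run the SQL query against the standardized zone.
rows = src.execute(
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
).fetchall()

# Transform: apply analysis in Python (Pandas handles this step in the real pipeline).
transformed = [(user_id, round(total * 1.1, 2)) for user_id, total in rows]

# Load: save results to the target database (PostgreSQL in the real pipeline).
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE user_totals (user_id INTEGER, total REAL)")
dst.executemany("INSERT INTO user_totals VALUES (?, ?)", transformed)
dst.commit()

print(dst.execute("SELECT * FROM user_totals ORDER BY user_id").fetchall())
# prints: [(1, 13.75), (2, 3.3)]
```

In the real pipeline the extract and load steps would go through SQLAlchemy engines pointed at Athena and PostgreSQL, with the transform done on a Pandas DataFrame.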
BitMovio is a blockchain-enabled video entertainment marketplace, providing a decentralized Netflix-Twitch-Indiegogo-like platform that enables content creators, consumers, and financiers to transparently and instantaneously exchange value and attention, without editorial censorship.
Data lake platform project in which data arrives from multiple data sources. The data lake solution, with an efficient consumption endpoint, has the following components:
Data Ingestion, Data Management & Data Versioning, Data Processing, DevOps
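The Data Ingestion and Data Versioning components can be sketched as below, under the assumption that versions are keyed by content hash so re-ingesting identical data is idempotent. All names and the on-disk layout are illustrative; the real platform builds on AWS services:

```python
import hashlib
import json
import tempfile
from pathlib import Path

def ingest(lake_root: Path, dataset: str, records: list) -> Path:
    """Write records into the lake, versioned by content hash
    (illustrative layout, not the production one)."""
    payload = json.dumps(records, sort_keys=True).encode()
    version = hashlib.sha256(payload).hexdigest()[:12]
    target = lake_root / dataset / version / "part-0000.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target

lake = Path(tempfile.mkdtemp())
p1 = ingest(lake, "orders", [{"id": 1, "qty": 2}])
p2 = ingest(lake, "orders", [{"id": 1, "qty": 2}])  # same content -> same version
p3 = ingest(lake, "orders", [{"id": 1, "qty": 3}])  # new content -> new version
print(p1 == p2, p1 == p3)  # prints: True False
```

Content-addressed versioning makes ingestion safe to retry, which matters when upstream sources redeliver the same files.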
The project involved developing a data lake comprising many modules, including Data Ingestion, Data Quality, Profiling, Data Management & Data Processing, Data Versioning, Parquet conversion, Hive tables, Spark processing, compression, data ingestion into Redshift, and Data Lineage.
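The Data Quality and Profiling modules can be illustrated with a minimal per-column profile, computing row, null, and distinct counts in plain Python (the sample columns are hypothetical; the production modules run at scale on Spark):

```python
def profile(rows: list) -> dict:
    """Per-column profile: row count, null count, distinct non-null values."""
    stats = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        stats[col] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return stats

sample = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": None},
    {"user_id": 2, "country": "PK"},
]
print(profile(sample))
# prints: {'user_id': {'rows': 3, 'nulls': 0, 'distinct': 2},
#          'country': {'rows': 3, 'nulls': 1, 'distinct': 2}}
```

Profiles like this feed data-quality checks (e.g. flagging columns whose null rate jumps between ingestion runs) before data is converted to Parquet and loaded into Redshift.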