Fumbani Banda
The U.S. Geological Survey (USGS) National Geospatial Program developed the 3D Elevation Program (3DEP) to respond to the growing need for high-quality topographic data and for a wide range of other three-dimensional (3D) representations of the Nation's natural and constructed features. 3DEP informs critical decisions made across the world every day that depend on elevation data, ranging from the immediate safety of life, property, and the environment to the long-term planning of infrastructure projects.
The 3DEP program's data collection consists of Lidar Point Cloud (LPC) data. The lidar point cloud files contain all of the original lidar points collected, with the original spatial reference and units preserved. 3DEP data provides foundational elevation information for earth science studies and mapping applications.
Pipeline:
PDAL: used to build a custom pipeline that fetches and preprocesses lidar data from the USGS API.
las and tif: the pipeline generated a .las file and a .tif file after execution.
The data from the .las file was used to generate a 3D terrain model of the area, while the .tif file was used to estimate the area covered.
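A PDAL pipeline of this kind is typically expressed as a JSON list of stages. The sketch below builds such a spec with only the standard library; the EPT source URL, bounds, filter, and output filenames are illustrative assumptions, not the project's actual values.

```python
import json

# Sketch of a PDAL pipeline spec (stage options here are assumptions;
# the project's actual reader source, bounds, and filenames are not known).
pipeline = {
    "pipeline": [
        {
            # USGS 3DEP point clouds are publicly served as Entwine (EPT) data.
            "type": "readers.ept",
            "filename": "https://s3-us-west-2.amazonaws.com/usgs-lidar-public/IA_FullState/ept.json",
            "bounds": "([-10425171, -10423171], [5164494, 5166494])",
        },
        {
            # Keep only ground points (ASPRS classification code 2).
            "type": "filters.range",
            "limits": "Classification[2:2]",
        },
        {"type": "writers.las", "filename": "terrain.las"},
        {
            # Rasterize the points into a GeoTIFF elevation grid.
            "type": "writers.gdal",
            "filename": "terrain.tif",
            "resolution": 1.0,
            "output_type": "mean",
        },
    ]
}

spec = json.dumps(pipeline, indent=2)
# The spec could then be executed with `pdal pipeline` or the python-pdal bindings.
```

Running the spec produces both outputs described above: `writers.las` emits the point cloud and `writers.gdal` emits the raster used for area estimation.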
The 3D rendering showed which areas of the terrain have high elevation and which have low elevation. This is useful in determining how water flows through the terrain.
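The idea behind inferring water flow from elevation can be sketched with the D8 method: each grid cell drains toward its lowest neighbor. This is a generic illustration, not the project's code, and it ignores the diagonal-distance correction a full D8 implementation would apply.

```python
# Minimal D8-style flow routing on a tiny elevation grid (illustrative
# only). Each cell drains to its lowest neighbor; a cell with no lower
# neighbor is a sink where water would accumulate.
def d8_flow(grid):
    rows, cols = len(grid), len(grid[0])
    flow = {}
    for r in range(rows):
        for c in range(cols):
            best = (r, c)  # start by assuming the cell is a sink
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols:
                        if grid[nr][nc] < grid[best[0]][best[1]]:
                            best = (nr, nc)
            flow[(r, c)] = best
    return flow

elevation = [
    [9, 8, 7],
    [8, 5, 4],
    [7, 4, 1],
]
flow = d8_flow(elevation)
# Water on the high ridge at (0, 0) is routed toward the low corner.
```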
Worked in a group of 7 people, all from different countries, to build an Amharic (a language spoken in Ethiopia) speech-to-text model. Three architectures were used: a simple Recurrent Neural Network, a Convolutional Bidirectional Recurrent Neural Network, and a Residual Network. The Residual Network model produced more accurate speech-to-text transcriptions than the other two.
Approach:
Audio files were preprocessed by normalizing them to the maximum volume of the audio signal without changing the dynamic range (peak normalization).
Leading and trailing silence was trimmed from each audio file, and silent intervals within the recordings were removed.
Some transcriptions for the audio files had incomplete words. To solve this problem, each incomplete Amharic word was looked up in an Amharic dictionary to recover the complete word.
Outlier recordings were also removed from the dataset.
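The first two preprocessing steps above can be sketched with in-memory stand-ins, where a signal is just a list of samples in [-1, 1]. The threshold and target values are illustrative assumptions, not the project's settings.

```python
# Illustrative stand-ins for the audio preprocessing steps (the project
# worked on real audio files; here a signal is a plain list of samples).

def peak_normalize(samples, target=1.0):
    """Scale so the loudest sample reaches `target`; the ratios between
    samples (the dynamic range) are unchanged."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s * target / peak for s in samples]

def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than `threshold`."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

signal = [0.0, 0.0, 0.1, 0.5, -0.25, 0.0]
clean = trim_silence(peak_normalize(signal))
```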
Model Results:
Of the three architectures (simple recurrent neural network, convolutional bidirectional recurrent neural network, and residual network), the residual network performed best at translating Amharic speech to text.
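The defining feature of a residual network, shown in miniature below, is the skip connection: a block outputs F(x) + x, so each block only has to learn a correction to the identity, which tends to make deep networks easier to train. This is a conceptual sketch, not the project's model code.

```python
# The core residual idea in miniature: the block adds its learned
# transform f(x) back onto the input x (a "skip connection").
def residual_block(x, f):
    return [fi + xi for fi, xi in zip(f(x), x)]

# If the learned transform contributes nothing (f(x) = 0), the block
# passes its input through unchanged, so stacking many blocks is safe.
identity_like = residual_block([1.0, 2.0, 3.0], lambda x: [0.0] * len(x))
```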
Binaries are machine code for a computer to execute. Binary exploitation is the process of subverting a compiled application such that it violates some trust boundary in a way that is advantageous to you, the attacker.
Binary exploitation comes down to finding a vulnerability in the program and exploiting it to gain control of a shell or modify the program's functions.
This project used bash and Python scripting to automate processes. GDB-PEDA was used for disassembly, and objdump was used to identify functions/methods within the program.
Approach:
Have a solid understanding of how binaries (Linux executables) work and how they interact with computer memory.
Used gdb-peda, Python 2.7, bash, and objdump to exploit the binary.
All the functions inside the binary were checked and analyzed.
Vulnerable functions were identified and exploited.
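A classic exploit for this kind of vulnerability overwrites the saved return address on the stack. The sketch below (written for Python 3, in the style of the Python 2.7 scripts mentioned above) builds such a payload; the 76-byte offset and the target address are hypothetical placeholders, not values from the project.

```python
import struct

# Sketch of building a stack-overflow payload. OFFSET and TARGET_ADDR
# are hypothetical: in practice they come from inspecting the binary
# in gdb-peda/objdump.
OFFSET = 76                # bytes from buffer start to the saved return address
TARGET_ADDR = 0x080484B6   # hypothetical address of the function to jump to

padding = b"A" * OFFSET                # fill the buffer up to the return address
ret = struct.pack("<I", TARGET_ADDR)   # 32-bit little-endian address
payload = padding + ret
# The payload would be fed to the vulnerable program's stdin or argv.
```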
Results:
Successfully managed to exploit the vulnerability in the binary.
This project focused on cleaning raw Twitter data and saving it to a CSV file. The data in the CSV file was then uploaded to a MySQL database.
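A cleaning-and-export step of this kind can be sketched as follows; the specific cleaning rules (stripping URLs, mentions, and extra whitespace) are illustrative assumptions, not the project's exact rules.

```python
import csv
import io
import re

# Illustrative tweet-cleaning rules (assumed, not the project's actual ones).
def clean_tweet(text):
    text = re.sub(r"https?://\S+", "", text)   # remove URLs
    text = re.sub(r"@\w+", "", text)           # remove @mentions
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

rows = [{"id": 1, "text": clean_tweet("Check this @user https://t.co/x  out!")}]

buf = io.StringIO()  # stands in for the CSV file on disk
writer = csv.DictWriter(buf, fieldnames=["id", "text"])
writer.writeheader()
writer.writerows(rows)
# The resulting CSV could then be loaded into MySQL, e.g. with LOAD DATA INFILE.
```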
Worked on fixing bugs, writing unit tests, and integrating Travis CI to automatically run the unit tests whenever there is a push to the GitHub repository.
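The unit tests in such a setup look roughly like the sketch below, which a CI service like Travis CI runs on every push (e.g. via `python -m unittest`). The function under test here is a hypothetical stand-in, not the project's code.

```python
import unittest

# Hypothetical stand-in for a function under test.
def remove_mentions(text):
    return " ".join(w for w in text.split() if not w.startswith("@"))

class TestCleaning(unittest.TestCase):
    def test_removes_mentions(self):
        self.assertEqual(remove_mentions("hi @bob there"), "hi there")

    def test_plain_text_unchanged(self):
        self.assertEqual(remove_mentions("hi there"), "hi there")

# Run with: python -m unittest <module name>
```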
Worked in a team of 7 people to build a data pipeline that collects Amharic audio recordings for given Amharic text. The pipeline was built on AWS; Kafka and Spark were used for scalability, and Airflow was used as a scheduler to automate the pipeline.
Approach:
Flask was used to build a web app that displays Amharic text to the user and records the corresponding audio.
The recorded audio file was sent to a Kafka cluster.
Airflow was scheduled to trigger Spark every 10 minutes to read the audio files from the Kafka cluster and write them to an S3 bucket for storage.
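The flow above can be sketched end-to-end with in-memory stand-ins (a list for the Kafka topic, a dict for the S3 bucket). This is purely illustrative: the real pipeline used Flask, Kafka, Spark, and S3 on AWS, and the key naming scheme here is an assumption.

```python
# In-memory stand-ins for the pipeline components (illustrative only).
kafka_topic = []   # stands in for the Kafka topic
s3_bucket = {}     # stands in for the S3 bucket: key -> audio bytes

def record_audio(text_id, audio_bytes):
    """Web-app step: publish a recorded clip to the topic."""
    kafka_topic.append({"text_id": text_id, "audio": audio_bytes})

def scheduled_batch_job():
    """Scheduler-triggered step: drain the topic into S3-style keys."""
    while kafka_topic:
        msg = kafka_topic.pop(0)
        key = "amharic-audio/%s.wav" % msg["text_id"]  # assumed key scheme
        s3_bucket[key] = msg["audio"]

record_audio("sentence-001", b"...wav bytes...")
record_audio("sentence-002", b"...wav bytes...")
scheduled_batch_job()
```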
Result:
Built a scalable data pipeline that collects Amharic audio files for given Amharic text and stores them in an S3 bucket for further processing.