Cryptocurrency and stock trading engineering: A scalable backtesting infrastructure and a reliable, large-scale trading data pipeline
Table of contents
- Introduction
- Overview
- Objective
- Data
- Requirements
- Install
- Using the application
- Frontend
- Backend
- Screenshots
- Notebooks
- Scripts
- Strategies
- Test
- Authors
Data is everywhere. To get the best out of it, one needs to extract it from several sources, apply the required transformations, and load it into a data warehouse for further analysis and exploration. This is where ETL data pipelines come in.
Trade involves the transfer of goods and services from one person or entity to another, often in exchange for money. Economists refer to a system or network that allows trade as a market.
A cryptocurrency is a digital or virtual currency that is secured by cryptography, which makes it nearly impossible to counterfeit or double-spend. Many cryptocurrencies are decentralized networks based on blockchain technology—a distributed ledger enforced by a disparate network of computers.
A defining feature of cryptocurrencies is that they are generally not issued by any central authority, rendering them theoretically immune to government interference or manipulation.
A startup called Mela (our client for this week’s project) wants to make it simple for everyone to enter the world of cryptocurrencies and general stock market trade. It also wants to give investors a reliable source of investment while lowering the risk associated with trading cryptocurrencies.
Although the past performance of any financial market is never a reliable indicator of the future, it is important to run backtests that simulate particular current and past situations, as well as their trend over time. A clear understanding of the financial system and stock market trading, together with an appreciation of the complex data engineering systems behind crypto and general stock market trading, is essential.
The objective of this project is straightforward: design and build a robust, reliable, large-scale trading data pipeline for both crypto and stock market trading that can run various backtests and store useful artifacts in a robust data warehouse system.
Users will be prompted with several different stock and crypto trading options and parameters. After these parameters are processed, users will be provided with backtesting outputs produced by different strategies on the chosen stock or cryptocurrency.
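The exact parameters are not spelled out here, so as a purely hypothetical sketch (the field names, types, and defaults below are illustrative assumptions, not the application's actual schema), a backtest request might carry parameters like:

```python
# Hypothetical backtest parameters; names and defaults are illustrative
# assumptions, not the application's actual schema.
from dataclasses import dataclass


@dataclass
class BacktestParams:
    asset: str                      # e.g. "BTC-USD" or "AAPL"
    strategy: str                   # name of a backtesting strategy
    start_date: str                 # ISO date, e.g. "2020-01-01"
    end_date: str
    initial_cash: float = 10_000.0  # starting portfolio value


params = BacktestParams("BTC-USD", "sma_cross", "2020-01-01", "2021-01-01")
print(params.strategy)  # → sma_cross
```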
The data used to generate these backtesting results comes from historical stock and cryptocurrency trade data. These datasets are publicly available and can be found at Yahoo Finance and Binance. You can read a brief description of what K-line or candlestick data is here.
Basic features of the datasets:
- Date: The day the specific trade was recorded
- Open: The opening price of the trade at the beginning of the specific day
- High: The highest price of the trade for the specific day
- Low: The lowest price of the trade for the specific day
- Close: The closing price of the trade at the end of the specific day
- Adj Close: The closing price after adjustments for all applicable splits and dividend distributions
- Volume: The volume of the trade for the specific day
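To make that schema concrete, here is a tiny illustration of rows in this shape (the numeric values are made up, not real quotes) and a daily-return calculation from the Close prices:

```python
# Two made-up rows in the OHLCV shape described above.
rows = [
    {"Date": "2021-01-04", "Open": 133.52, "High": 133.61, "Low": 126.76,
     "Close": 129.41, "Adj Close": 128.45, "Volume": 143_301_900},
    {"Date": "2021-01-05", "Open": 128.89, "High": 131.74, "Low": 128.43,
     "Close": 131.01, "Adj Close": 130.04, "Volume": 97_664_900},
]

# Day-over-day return computed from closing prices.
daily_return = rows[1]["Close"] / rows[0]["Close"] - 1
print(f"{daily_return:.4%}")
```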
- pip
- FastAPI
- ZooKeeper
- kafka-python
- Apache Kafka
- backtrader and yfinance
- React (Node.js)
- Apache Airflow
- Python 3.5 or above
- Docker and Docker Compose
You can find the full list of requirements in the requirements.txt file
We highly recommend that you create a new virtual environment and install all the required modules and libraries in it.
- First, clone this repo to your local machine and install the requirements using the commands below:

```shell
git clone https://github.com/TenAcademy/backtesting.git
cd backtesting
pip install -r requirements.txt
```
- One can start using the application by first running the front end and the back end.
- You can run the front end with the following commands in the terminal.
- More detailed instructions regarding the front end can be found in the presentation/readme.md file.

```shell
cd presentation
npm run start
```
- You can run the back end with the following commands in the terminal.

```shell
cd api
uvicorn app:app --reload
```
- After running the front end, one can simply open http://localhost:3000 in the browser, or click this link.
- A page similar to this will appear.
- After creating an account, or if users already have an account, they can simply click on the 'No Sign in' button.
- After clicking on the 'No Sign in' button, a page similar to this will appear.
- Enter your email and password and click on the 'No Sign in' button.
- Users will then fill in the listed parameters to get the backtesting results they want, and click on the 'Run Test' button.
The front end application can be found here in the presentation folder
The back end application can be found here in the api folder
The detailed use and implementation of the pipelines with Apache Airflow, the pipeline summary and interactions, the Kafka clusters, interactions with the topics on the Kafka clusters, and the front-end images and usage can all be found in the screenshots folder as image files.
All the notebooks used in this project, including EDA, data cleaning and summarization, along with some machine learning model generation, are found here in the Notebooks folder.
All the scripts and modules used for this project, relating to interactions with Kafka, Airflow, and other frameworks, along with the default parameters and values used, can be found here in the scripts folder.
All the backtesting strategies and algorithms are found here in the strategies folder.
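One strategy such a folder commonly contains is a simple moving-average (SMA) crossover. The plain-Python sketch below is illustrative only: the function names and window sizes are assumptions, and a real implementation would more likely be a backtrader Strategy subclass.

```python
# Illustrative SMA-crossover signal generator; not the project's actual code.

def sma(prices, window):
    """Simple moving average; None until enough data points exist."""
    return [
        sum(prices[i - window + 1 : i + 1]) / window if i >= window - 1 else None
        for i in range(len(prices))
    ]


def crossover_signals(prices, short=3, long=5):
    """Mark 'buy' when the short SMA crosses above the long SMA, 'sell' on the reverse cross."""
    s, l = sma(prices, short), sma(prices, long)
    signals = [None] * len(prices)
    for i in range(1, len(prices)):
        if None in (s[i - 1], l[i - 1], s[i], l[i]):
            continue  # not enough history for both averages yet
        if s[i - 1] <= l[i - 1] and s[i] > l[i]:
            signals[i] = "buy"
        elif s[i - 1] >= l[i - 1] and s[i] < l[i]:
            signals[i] = "sell"
    return signals


prices = [10, 10, 10, 10, 10, 11, 12, 13, 12, 11, 10, 9]
print(crossover_signals(prices))  # 'buy' at index 5, 'sell' at index 10
```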
All the unit and integration tests are found here in the tests folder.
👤 Birhanu Gebisa
👤 Ekubazgi Gebremariam
👤 Emtinan Salaheldin
👤 Fisseha Estifanos
👤 Natnael Masresha
👤 Niyomukiza Thamar
Give us a ⭐ if you like this project, and also feel free to contact us at any moment.