Migrate from MongoDB to Kafka #15

@mohamedawnallah commented Jan 11, 2023

Description

This pull request includes the changes required to migrate from MongoDB to Kafka in the following files:

  • docker-compose.yaml
  • requirements.txt
  • dataManipulation/stream.py
  • dataManipulation/stream_batch.py
  • dataManipulation/downloadData.py
  • analysis/rttAnalysis.py
  • analysis/routeAnalysis.py
  • analysis/plot.py
  • logs/.gitignore

The specific change I made to the codebase structure:

  • Moved all logs to a unified logs directory

Testing

I have tested these changes using Docker Compose to set up a Kafka cluster with three brokers and one Apache ZooKeeper node. To test the changes, I did the following:

  1. Created the Kafka cluster on my local machine with the docker-compose up -d command, making sure I was in the working directory where the docker-compose.yaml file is located.
  2. Installed Offset Explorer, a GUI tool for managing Apache Kafka, configured it to match my Kafka cluster configuration, and made sure it connected successfully before moving to the next step.
  3. Installed the third-party Python libraries the code needs with the pip3 install -r requirements.txt command.
  4. Added the current working directory to the PYTHONPATH environment variable with the export PYTHONPATH=$PWD command.
  5. Replaced all bootstrap servers in the files I wanted to test with these Kafka broker hostnames: 'bootstrap.servers': 'localhost:29092,localhost:39092,localhost:49092'. Note that these ports are hardcoded in the docker-compose.yaml file.
  6. Executed (among others) stream_batch.py, which generates streaming data into Kafka topics in the Kafka cluster.
  7. Verified that the data was loaded successfully by connecting to the Kafka cluster with Offset Explorer.
  8. Re-checked the log files in the logs directory to track the processes while they were running.
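Condensed, the command-line parts of the steps above look roughly like this (the GUI steps with Offset Explorer are done separately; these commands need a local Docker installation to run):

```shell
# Step 1: start the three-broker cluster plus ZooKeeper, from the
# directory containing docker-compose.yaml.
docker-compose up -d

# Step 3: install the Python dependencies.
pip3 install -r requirements.txt

# Step 4: make the repository's packages importable from the repo root.
export PYTHONPATH=$PWD

# Step 6: generate streaming data into the Kafka topics.
python3 dataManipulation/stream_batch.py
```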

I have confirmed that the changes work as intended and that the data is loaded into the Kafka cluster correctly.
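For context, the producing side of step 6 presumably boils down to something like the sketch below. The function and topic names are hypothetical; only the bootstrap-server list comes from step 5, and confluent-kafka is assumed here as the client library:

```python
# Sketch: publish a batch of measurement records to a Kafka topic.
import json

# Broker list from step 5; these ports are hardcoded in docker-compose.yaml.
BOOTSTRAP = "localhost:29092,localhost:39092,localhost:49092"


def encode_record(measurement: dict) -> bytes:
    """Serialize one measurement as UTF-8 JSON for a Kafka message value."""
    return json.dumps(measurement, sort_keys=True).encode("utf-8")


def produce_batch(topic: str, measurements: list) -> None:
    """Send every measurement in the batch to the given topic."""
    # Imported lazily so encode_record stays usable without the library.
    from confluent_kafka import Producer  # assumed client library

    producer = Producer({"bootstrap.servers": BOOTSTRAP})
    for measurement in measurements:
        producer.produce(topic, value=encode_record(measurement))
    producer.flush()  # block until all queued messages are delivered
```

With the cluster from step 1 running, a call such as produce_batch("rtt_measurements", [{"rtt": 12.3, "probe": 42}]) (topic name invented for illustration) would make the records visible in Offset Explorer as in step 7.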

Impact

These changes migrate the code from MongoDB to Kafka, unify logging under a single logs directory, make it easier to add new features in the future, and update the dependencies and configuration files.

Additional Notes

  • To set up the environment for testing these changes, I used Docker to create multiple containers through docker-compose with the necessary dependencies. To install Docker, I followed the instructions on the Docker website; to install docker-compose on my local machine, I followed the Docker Compose installation instructions.

  • To test the Kafka cluster visually through a friendly graphical user interface, I used Offset Explorer. I downloaded it from the Offset Explorer site; it is available for all operating systems.
