EarthlyAlien/README.md

Hello, I'm Chaitanya 👋

Welcome to my GitHub! I'm a data guy (analytics, engineering, science, and a little bit of AI) with a Master's in Advanced Data Analytics and a solid foundation in Data Analytics, Data Science, Data Engineering, MLOps, and Business Analytics, with a keen interest in AI applications. I'm passionate about building data-driven solutions that fuel growth, innovation, and operational efficiency. My background spans data architecture, scalable ML pipelines, cloud computing, and delivering actionable insights that help teams make strategic decisions.


πŸ› οΈ About Me

  • ⚡ Former Product Lead at Cirrus Nexus (Cumulus Nexus India Pvt Ltd)
  • 👨‍💻 Experienced in Python, R, SQL, Rust, C++, Go, Terraform, and advanced ML frameworks like TensorFlow, PyTorch, and Scikit-Learn
  • ☁️ Proficient in Cloud Platforms: AWS (SageMaker, Glue, Redshift, Lambda), Azure (Data Factory, Synapse, HDInsight, ML Studio), GCP (BigQuery, Looker, Vertex AI Platform); Certified in AWS, Azure, GCP, and Kubernetes
  • πŸ“Š Skilled in Data Engineering (ETL, Data Modeling, Real-Time Streaming), MLOps (CI/CD, Model Deployment), and Data Science (Predictive Modeling, NLP, Computer Vision)
  • πŸ’¬ Advocate for Cloud Cost Optimization strategies, helping companies cut costs while improving performance through structured planning
  • πŸ€– Specialized in Natural Language Processing, Large Language Models (LLMs), Retrieval Augmented Generation (RAG), FAISS, AI Agents, and Vector Databases

πŸ”­ Projects

  • Data Engineering & Big Data Pipelines – Architecting and optimizing ETL pipelines for large-scale data processing with Apache Spark, Flink, Superset, Dagster, Druid, Delta Lake, dbt, Airflow, Snowflake, and Fivetran
  • MLOps Pipelines – Building end-to-end ML pipelines with Kubernetes, Docker, Jenkins, and Kubeflow to automate model training and deployment, with a focus on scalability and CI/CD workflows
  • Generative AI & NLP Models – Developing cutting-edge models for NLP, including language models and sentiment analysis, using transformer architectures
  • Cloud Infrastructure Optimization – Implementing efficient infrastructure with Terraform and other Infrastructure as Code (IaC) practices to optimize cloud resources on AWS, Azure, and GCP
  • Generative AI & LLMs – Building production-ready LLM applications using LangChain, LlamaIndex, and Vector Databases (Pinecone, Weaviate, Milvus). Implementing RAG pipelines with custom knowledge bases and hybrid search strategies. Designing AI Agents using CrewAI, Weaviate and other tools.

🌱 Always Learning

  • Scaling Machine Learning Operations – Expanding knowledge in MLflow, Argo, and advanced MLOps for seamless deployment and monitoring of ML models
  • Distributed Systems & Real-Time Analytics – Exploring Apache Flink, Kafka, and Delta Lake for real-time analytics and streaming solutions
  • Advanced Data Engineering – Diving deeper into data warehouse and data lake architecture, leveraging platforms like Snowflake and Databricks
  • Advanced LLM Engineering – Exploring LLM fine-tuning, prompt engineering, and context window optimization techniques for enterprise applications

🧩 Key Skills & Technologies

Data Engineering & ETL

  • Tools & Platforms: Apache Spark, Kafka, Hadoop, Snowflake, Databricks, Apache Airflow, Fivetran, dbt
  • Cloud & Big Data: AWS (Lambda, Glue, RDS, S3, EMR, Redshift), Azure Data Factory, Azure Databricks, Azure Synapse, GCP BigQuery, Snowflake
  • Skills: Data Pipeline Design, ETL Optimization, Data Modeling, Real-Time Data Streaming

Data Science & Machine Learning

  • Languages & Libraries: Python, R, Julia, Scala, Java, SQL, Scikit-Learn, TensorFlow, PyTorch, PySpark, Keras, Pandas, Dask
  • Specializations: Predictive Modeling, Time Series, NLP, Deep Learning, Hyperparameter Tuning, Computer Vision

MLOps & DevOps

  • MLOps Tools: Docker, Kubernetes, Jenkins, MLflow, Kubeflow, Argo, Terraform, GitHub Actions
  • CI/CD & Automation: CI/CD Pipelines, Model Versioning, Model Deployment, Monitoring & Logging

Data Visualization & Business Analysis

  • Visualization Tools: Power BI, Tableau, Plotly, Matplotlib, ggplot2
  • Business Tools: JIRA, Confluence, Lucidchart, Microsoft Visio, Business Process Mapping, Requirements Analysis

Generative AI & LLMs

  • Frameworks: LangChain, LlamaIndex, Semantic Kernel, OpenAI API, Anthropic API
  • Vector Databases: Pinecone, Weaviate, Milvus, Chroma, FAISS
  • Skills: RAG Pipeline Design, Prompt Engineering, LLM Fine-tuning, Embedding Optimization, Context Window Management

πŸŽ“ Certifications

  • Data Engineering & Cloud:
    • AWS Cloud Data Engineer, Azure Data Engineer, Google Cloud Professional Data Engineer, SnowPro Core, Meta Database Engineer
  • Machine Learning & Data Science:
    • TensorFlow Developer, AWS Certified Machine Learning Specialty, IBM Data Science Professional
  • MLOps & DevOps:
    • Certified Kubernetes Administrator, Terraform Associate, Databricks Certified for Apache Spark

🌟 Featured Projects

Humana-Mays Case Competition

  • Tools: R, SQL, Tableau, ETL
  • Summary: Advanced to Round 2 out of 400 teams by designing KPIs to track healthcare patient engagement, producing insights that supported targeted health improvement.

Real-Time Data Streaming Solution

  • Tools: Kafka, AWS Lambda, Spark
  • Summary: Built a real-time data streaming architecture to process and analyze data as it arrives, achieving 99.9% system availability and reducing latency for business-critical decisions.

Customer Churn Prediction Model

  • Tools: Python, Scikit-Learn, AWS
  • Summary: Developed a predictive model with 86.2% accuracy to forecast customer churn, allowing for proactive retention strategies and enhancing customer engagement.

Automated ML Pipeline for Model Deployment

  • Tools: Python, Apache Airflow, AWS SageMaker
  • Summary: Created an ML pipeline automating data preprocessing, model training, and deployment, reducing operational costs by 14% while maintaining high model performance.

LLM-Powered Document Assistant

  • Tools: LangChain, OpenAI API, ChromaDB
  • Summary: Built a Q&A system over internal documentation using RAG, achieving 85% query relevance while reducing response time by 60% compared to manual searches.

πŸ’¬ Let’s Connect!


⚡ Fun Facts

  • ☕ Tea over coffee! Extra fuel for complex problem-solving.
  • 🎲 Avid puzzle solver and lover of challenging data problems.
  • πŸ‘Ύ I enjoy exploring the latest in Generative AI and contributing to open-source projects.

Thanks for stopping by my profile! Feel free to explore my repos, and let’s collaborate if you share similar interests or need insights on cloud and AI solutions.

Pinned

  1. big-list-of-naughty-strings (Public)

    Forked from minimaxir/big-list-of-naughty-strings

    The Big List of Naughty Strings is a list of strings which have a high probability of causing issues when used as user-input data.

    Python

  2. earthlyalien.github.io (Public)

    HTML

  3. Public-APIs (Public)

    Forked from n0shake/Public-APIs

    πŸ“š A public list of APIs from round the web.

  4. tayllan/awesome-algorithms (Public)

    A curated list of awesome places to learn and/or practice algorithms.

    21.2k stars · 2.7k forks

  5. python-scripts (Public)

    Forked from realpython/python-scripts

    because i'm tired of gists

    Python 1

  6. PayloadsAllTheThings (Public)

    Forked from swisskyrepo/PayloadsAllTheThings

    A list of useful payloads and bypass for Web Application Security and Pentest/CTF

    Python