Below is a list of algorithms, packages, and data science and engineering topics I have worked with.
Projects
End-to-end Machine Learning Platform: Design and implementation
Analytics: Churn, sales, engagement, conversion and budget prediction, customer lifetime value (CLV), explainable models via SHAP
Funnel optimisation: Predicting conversion probability of leads within the funnel
Finance: streamed stock price predictions and feature engineering
Natural Language Processing (NLP): Sentiment analysis, semantic search, sequence-to-sequence models
Image classification: ImageNet classification and object detection
Mail optimisation: via multi-armed bandits
Information retrieval from knowledge graphs
Big data risk reporting: for a DAX company, using Spark, Kafka and Scala
Food intolerance detection: Causal effect estimation of food intake on bodily pain
Startup CTO tasks: building the full tech stack, including backend, frontend, data pipelines and ML models
Models and algorithms
tree models: xgboost, catboost, lightgbm, ngboost, random forest
deep learning: TensorFlow, Keras, CNNs, LSTMs, attention models, TabNet
NLP: gensim, nltk, spacy, topic modelling
bayesian methods: Bayesian inference, Markov Chain Monte Carlo (MCMC), Bayesian A/B tests
multi-armed bandits for stateless reinforcement learning
custom loss functions: for gradient-boosted trees
hyperparameter optimisation: skopt
feature selection
model selection
embeddings
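As an illustration of the multi-armed bandit entry above, here is a minimal sketch of Bernoulli Thompson sampling, one common bandit strategy. It is not taken from any of the projects listed; the function name `thompson_sampling`, the example response rates and the uniform Beta(1, 1) priors are assumptions chosen for the example.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Simulate Bernoulli Thompson sampling and return pulls per arm.

    true_rates: hypothetical success probability of each arm
                (unknown to the algorithm, used only to simulate rewards).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior.
        # Beta(successes + 1, failures + 1) corresponds to a uniform prior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=samples.__getitem__)

        # Simulate a binary reward and update that arm's posterior counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return [s + f for s, f in zip(successes, failures)]


if __name__ == "__main__":
    # Three hypothetical mail variants with made-up response rates.
    print(thompson_sampling([0.05, 0.10, 0.30], n_rounds=2000))
```

The sketch is stateless in the sense that no context or state is observed: each round depends only on the accumulated success and failure counts per arm, so most pulls should shift towards the best-performing arm over time.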
ML Engineering
Airflow workflow management: Scheduling of batch predictions and ETL processes via Airflow
Realtime predictions: Live predictions on streamed stock data via RabbitMQ, containerised
Cloud deployment: of model artifacts and pipelines
TensorFlow Extended (TFX) for deep learning pipelines
ML Metadata: For data and artifact lineage (tracking of their path through the ML pipeline)
Databases and scheduling
databases, warehouses, cloud storage: snowflake, postgres, timescaleDB, neo4j, ElasticSearch, S3
scheduling: airflow, cron
Streaming: Kafka, RabbitMQ
Deployment
AWS
EC2, Lambda, SageMaker, Batch, Fargate, ECR
Google Cloud
Cloud Functions, App Engine
Others
Heroku
Engineering
docker, terraform, git, linux shell
MLOps: Monitoring, tracking
plotly for dashboarding
grafana for technical parameter monitoring
slack integration
Evidently for model drift
MLflow for experiment tracking and model registry
Data validation
pandera
great_expectations
pydantic
tensorflow data validation (tfdv)
Data Exploration and visualisation
sweetviz
facets
matplotlib
seaborn
plotly
APIs
Interactive Brokers (Stock data streaming and order execution)
Alpha Vantage (stock data)
Emarsys (eMarketing system)
Web development
JavaScript
React, React Native
Node.js