Data is essential for training machine learning models, and more high-quality data tends to improve performance, helping you solve business problems more effectively. I can collect and label open data from a wide range of sources using various scraping techniques, working with APIs and databases along the way. This can include data from social media platforms, product reviews, market trends you can use to gauge public demand, and more.
The following list highlights some of my machine learning projects (excluding open-source contributions and other proprietary or small projects).
Digital marketing, web scraping
A digital marketing analytics solution that scrapes websites for SEO factors and predicts advertisement click-through rate (CTR).
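A minimal sketch of the idea, not the production system: pull a few on-page SEO factors with requests and BeautifulSoup, then feed them to a logistic-regression CTR model. The feature set and the toy training data are illustrative assumptions.

```python
import requests
import numpy as np
from bs4 import BeautifulSoup
from sklearn.linear_model import LogisticRegression

def seo_features(url: str) -> list[float]:
    """Simple on-page SEO signals: title length, meta description length, H1 count."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else ""
    meta = soup.find("meta", attrs={"name": "description"})
    meta_text = meta.get("content", "") if meta else ""
    return [len(title), len(meta_text), len(soup.find_all("h1"))]

# Toy training data: rows of SEO features with binary "clicked" labels (assumed).
X = np.array([[62, 155, 1], [10, 0, 0], [70, 140, 2], [25, 40, 1]])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# Predicted CTR-like probability for a new page.
print(model.predict_proba([seo_features("https://example.com")])[:, 1])
```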
Recommender systems
A simple pipeline that collects, preprocesses and labels raw user-item interaction data to build a hybrid recommender system using collaborative filtering (SVD, ALS) and learning-to-rank (XGBoost ranking) methods, evaluated with NDCG and MAP metrics
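A minimal sketch of that pipeline on synthetic data: TruncatedSVD stands in for the SVD/ALS collaborative-filtering stage, XGBRanker re-ranks per-user candidates, and ranking quality is scored with NDCG (MAP would follow the same pattern).

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import ndcg_score
from xgboost import XGBRanker

rng = np.random.default_rng(0)
n_users, n_items = 50, 40
# Synthetic binary user-item interaction matrix (stand-in for the collected data).
interactions = csr_matrix((rng.random((n_users, n_items)) > 0.8).astype(float))

# Collaborative-filtering stage: low-rank user/item factors.
svd = TruncatedSVD(n_components=8, random_state=0)
user_factors = svd.fit_transform(interactions)   # (n_users, 8)
item_factors = svd.components_.T                 # (n_items, 8)
cf_scores = user_factors @ item_factors.T        # predicted affinity

# Learning-to-rank stage: one row per (user, item) candidate, grouped by user.
X = np.column_stack([np.repeat(user_factors, n_items, axis=0),
                     np.tile(item_factors, (n_users, 1)),
                     cf_scores.ravel()[:, None]])
y = np.asarray(interactions.todense()).ravel()   # relevance labels (0/1)
groups = [n_items] * n_users                     # candidate-list size per user

ranker = XGBRanker(objective="rank:pairwise", n_estimators=50)
ranker.fit(X, y, group=groups)

# Evaluate ranking quality per user with NDCG@10.
pred = ranker.predict(X).reshape(n_users, n_items)
true = np.asarray(interactions.todense())
print("mean NDCG@10:", ndcg_score(true, pred, k=10))
```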
Web scraping, data preprocessing
A Python script designed for web scraping, data integration and further model training. It leverages BeautifulSoup for parsing HTML content, TensorFlow and Keras for building and training baseline models, and several other libraries for data processing and automation.
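A minimal sketch of that flow under assumed inputs: scrape pages with BeautifulSoup, turn them into a small numeric dataset, and fit a Keras baseline. The URLs, features, and labels are placeholders, not the script's real data.

```python
import requests
import numpy as np
from bs4 import BeautifulSoup
import tensorflow as tf

def page_features(url: str) -> list[float]:
    """Coarse page features: paragraph count, link count, total text length."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return [len(soup.find_all("p")), len(soup.find_all("a")), len(soup.get_text())]

# Placeholder corpus: in the real script, URLs and labels come from the scraping stage.
urls = ["https://example.com", "https://example.org"]
X = np.array([page_features(u) for u in urls], dtype="float32")
y = np.array([1, 0], dtype="float32")

# Baseline Keras model: a tiny fully connected binary classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
```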
API parsing
Python scripts for parsing Wildberries, the largest Russian online retailer, designed to extract product data via its API.
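A minimal sketch of the approach, not the actual scripts: query an API endpoint with requests and flatten the JSON into CSV rows. The endpoint URL and response fields below are placeholders; Wildberries' public API is not reproduced here and changes over time.

```python
import csv
import requests

API_URL = "https://example-marketplace-api.invalid/cards"  # placeholder endpoint

def fetch_products(query: str) -> list[dict]:
    resp = requests.get(API_URL, params={"query": query}, timeout=10)
    resp.raise_for_status()
    # Assumed response shape: {"products": [{"id": ..., "name": ..., "price": ...}]}
    return resp.json().get("products", [])

def save_csv(products: list[dict], path: str) -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "price"])
        writer.writeheader()
        for p in products:
            writer.writerow({k: p.get(k) for k in ("id", "name", "price")})

if __name__ == "__main__":
    save_csv(fetch_products("laptop"), "products.csv")
```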
OSINT, NLP, web agents, web scraping
An LLM-based OSINT tool designed to perform deep web searches by orchestrating multiple web agents and a knowledge agent built on modern machine learning methods. The tool crawls publicly available web sources at scale to gather open data, then leverages LLM techniques to perform natural language tasks on the gathered information.
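A minimal architectural sketch of the agent orchestration, not the production tool: several "web agents" fetch public sources concurrently and a "knowledge agent" merges the results. The summarize_with_llm function is a stub standing in for the LLM call.

```python
import concurrent.futures as cf
import requests
from bs4 import BeautifulSoup

def web_agent(url: str) -> str:
    """One web agent: fetch a public page and return its visible text."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return soup.get_text(" ", strip=True)[:2000]

def summarize_with_llm(question: str, documents: list[str]) -> str:
    """Knowledge-agent placeholder: in the real tool an LLM answers the question
    from the gathered documents; here we just return a trivial concatenation."""
    return f"{question}\n---\n" + "\n".join(doc[:200] for doc in documents)

def deep_search(question: str, urls: list[str]) -> str:
    with cf.ThreadPoolExecutor(max_workers=8) as pool:
        documents = list(pool.map(web_agent, urls))   # run web agents in parallel
    return summarize_with_llm(question, documents)    # knowledge agent aggregates

if __name__ == "__main__":
    print(deep_search("What is example.com?", ["https://example.com"]))
```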