Data is essential for training machine learning models, and more high-quality data tends to improve performance, helping you solve business problems more effectively. I can collect and label open data from a wide range of sources using various scraping techniques, working with APIs and databases along the way. This can include data from social media platforms, product reviews, market trends you can use to gauge public demand, and more.
The following list highlights some of my machine learning projects (excluding open-source contributions and other proprietary or small projects).
Digital marketing, web scraping
A digital marketing analytics solution that scrapes websites for SEO factors and predicts advertisement click-through rate (CTR).
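A minimal sketch of the idea, not the production system: pull a few on-page SEO factors with requests and BeautifulSoup, then feed them to a logistic-regression CTR model. The feature set and the toy training data are illustrative assumptions.

```python
import requests
import numpy as np
from bs4 import BeautifulSoup
from sklearn.linear_model import LogisticRegression

def seo_features(url: str) -> list[float]:
    """Simple on-page SEO signals: title length, meta description length, H1 count."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title and soup.title.string else ""
    meta = soup.find("meta", attrs={"name": "description"})
    meta_text = meta.get("content", "") if meta else ""
    return [len(title), len(meta_text), len(soup.find_all("h1"))]

# Toy training data: rows of SEO features with binary "clicked" labels (assumed).
X = np.array([[62, 155, 1], [10, 0, 0], [70, 140, 2], [25, 40, 1]])
y = np.array([1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# Predicted CTR-like probability for a new page.
print(model.predict_proba([seo_features("https://example.com")])[:, 1])
```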
Recommender systems
A simple pipeline that collects, preprocesses and labels raw user-item interaction data to build a hybrid recommender system using collaborative filtering (SVD, ALS) and learning-to-rank (XGBoost ranking) methods, evaluated with NDCG and MAP metrics
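A minimal sketch of that pipeline on synthetic data: TruncatedSVD stands in for the SVD/ALS collaborative-filtering stage, XGBRanker re-ranks per-user candidates, and ranking quality is scored with NDCG (MAP would follow the same pattern).

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import ndcg_score
from xgboost import XGBRanker

rng = np.random.default_rng(0)
n_users, n_items = 50, 40
# Synthetic binary user-item interaction matrix (stand-in for the collected data).
interactions = csr_matrix((rng.random((n_users, n_items)) > 0.8).astype(float))

# Collaborative-filtering stage: low-rank user/item factors.
svd = TruncatedSVD(n_components=8, random_state=0)
user_factors = svd.fit_transform(interactions)   # (n_users, 8)
item_factors = svd.components_.T                 # (n_items, 8)
cf_scores = user_factors @ item_factors.T        # predicted affinity

# Learning-to-rank stage: one row per (user, item) candidate, grouped by user.
X = np.column_stack([np.repeat(user_factors, n_items, axis=0),
                     np.tile(item_factors, (n_users, 1)),
                     cf_scores.ravel()[:, None]])
y = np.asarray(interactions.todense()).ravel()   # relevance labels (0/1)
groups = [n_items] * n_users                     # candidate-list size per user

ranker = XGBRanker(objective="rank:pairwise", n_estimators=50)
ranker.fit(X, y, group=groups)

# Evaluate ranking quality per user with NDCG@10.
pred = ranker.predict(X).reshape(n_users, n_items)
true = np.asarray(interactions.todense())
print("mean NDCG@10:", ndcg_score(true, pred, k=10))
```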
Web scraping, data preprocessing
A Python script designed for web scraping, data integration and further model training. It leverages BeautifulSoup for parsing HTML content, TensorFlow and Keras for building and training baseline models, and several other libraries for data processing and automation.
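A minimal sketch of that flow under assumed inputs: scrape pages with BeautifulSoup, turn them into a small numeric dataset, and fit a Keras baseline. The URLs, features, and labels are placeholders, not the script's real data.

```python
import requests
import numpy as np
from bs4 import BeautifulSoup
import tensorflow as tf

def page_features(url: str) -> list[float]:
    """Coarse page features: paragraph count, link count, total text length."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return [len(soup.find_all("p")), len(soup.find_all("a")), len(soup.get_text())]

# Placeholder corpus: in the real script, URLs and labels come from the scraping stage.
urls = ["https://example.com", "https://example.org"]
X = np.array([page_features(u) for u in urls], dtype="float32")
y = np.array([1, 0], dtype="float32")

# Baseline Keras model: a tiny fully connected binary classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, verbose=0)
```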
API parsing
Python scripts for parsing Wildberries, the largest Russian online retailer, designed to extract product data via its API.
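A minimal sketch of the approach, not the actual scripts: query an API endpoint with requests and flatten the JSON into CSV rows. The endpoint URL and response fields below are placeholders; Wildberries' public API is not reproduced here and changes over time.

```python
import csv
import requests

API_URL = "https://example-marketplace-api.invalid/cards"  # placeholder endpoint

def fetch_products(query: str) -> list[dict]:
    resp = requests.get(API_URL, params={"query": query}, timeout=10)
    resp.raise_for_status()
    # Assumed response shape: {"products": [{"id": ..., "name": ..., "price": ...}]}
    return resp.json().get("products", [])

def save_csv(products: list[dict], path: str) -> None:
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "price"])
        writer.writeheader()
        for p in products:
            writer.writerow({k: p.get(k) for k in ("id", "name", "price")})

if __name__ == "__main__":
    save_csv(fetch_products("laptop"), "products.csv")
```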
OSINT, NLP, web agents, web scraping
An LLM-based OSINT tool designed to perform deep web searches by orchestrating multiple web agents and a knowledge agent built on modern machine learning methods. The tool crawls publicly available web sources at scale to gather open data, then leverages LLM techniques to perform natural language tasks on the gathered information.
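A minimal architectural sketch of the agent orchestration, not the production tool: several "web agents" fetch public sources concurrently and a "knowledge agent" merges the results. The summarize_with_llm function is a stub standing in for the LLM call.

```python
import concurrent.futures as cf
import requests
from bs4 import BeautifulSoup

def web_agent(url: str) -> str:
    """One web agent: fetch a public page and return its visible text."""
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    return soup.get_text(" ", strip=True)[:2000]

def summarize_with_llm(question: str, documents: list[str]) -> str:
    """Knowledge-agent placeholder: in the real tool an LLM answers the question
    from the gathered documents; here we just return a trivial concatenation."""
    return f"{question}\n---\n" + "\n".join(doc[:200] for doc in documents)

def deep_search(question: str, urls: list[str]) -> str:
    with cf.ThreadPoolExecutor(max_workers=8) as pool:
        documents = list(pool.map(web_agent, urls))   # run web agents in parallel
    return summarize_with_llm(question, documents)    # knowledge agent aggregates

if __name__ == "__main__":
    print(deep_search("What is example.com?", ["https://example.com"]))
```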