End to end ML Ops infrastructure (video)

This is a personal project to demonstrate a fully automated end-to-end Machine Learning infrastructure, including reading data from a feature store, automated model creation with hyperoptimization tuning and automatically finding the most efficient model, storing models in a model registry, building a CI/CD pipeline for deploying the registered model into a production environment, serving the model using HTTP API or Streaming interfaces, implementing monitoring, writing tests and linters, creating regular automated reporting.

Written on September 22, 2022

Target Encoding in Machine Learning

I’ve been getting my hands dirty with Machine Learning lately. I made this research on target encoding in ML and how it can improve or alter the model’s accuracy and I want to share it with you. Having a machine learning feature from a dataset that has too many categorical variations can make the ML model over complicated. For example unique fields like IDs or categories with too many possibilities (city names). One good option to still use these fields in a ML model is to aggregate and encode them. We will do this by using target encoding. Let’s see how that works in more detail.

Written on January 15, 2022

Scaling reactive APIs

Realtime applications are becoming more and more popular, streams of data travel from one service to another continuously. We want our entire microservices architecture behaving like closed electronic circuits, always reacting on any input changes, triggers, or interruptions. Everything working together like a symphonic orchestra. And if this is not complex enough, we have to continuously grow our businesses, more and more data has to go through our reactive circuit. Services have to process more without appearing slower to the end consumer. Frontend applications are becoming realtime mirrors that reflect any change that our orchestrated backend services do. We can achieve this by streaming data from backend services to frontend applications in realtime, keeping the end user updated with all changes the system is going through. It is an impressive evolution of cloud applications. But growth always comes with challenges, one of which is to make an API service scalable, a service that exposes a reactive API that opens an event stream on each HTTP request and keeps it open for a long period of time, updating the customer with any state change, continuously for many minutes or even hours. At JustEat Takeaway.com, the Tracker application uses this reactive API every day to inform millions of customers about the state of their orders in realtime.

Written on October 7, 2020

Event Sourced Aggregates

As of Domain Driven Design approach, an aggregate is a domain specific pattern for encapsulating a collection of entities that live on their own. The aggregate encapsulates all the behavior of internal parameters and is unique, being identifiable by an id. Changing the state of the aggregate is possible only through it’s exposed domain specific methods. Because an aggregate needs to be persisted for a longer time, we should always save and load the entire object as a whole. By applying different commands to an aggregate, it always results in a new aggregate state that will overwrite any previous states. This means that we loose all history of the aggregate. Logging a message for each behavior taken would help with having a history of the aggregate but what if we want to see what was the state of the aggregate at a particular time in the past. Or if we want to implement new features and use all the past data to update the aggregate.

Written on December 6, 2018

Anti Corruption Layer Architecture Pattern

I see this architectural pattern being underrated and I think it needs to be used more often. Systems are shifting more and more towards microservices architectures and also at the application level we want to gain modularity and decoupling. Especially for services that are not under our control. Putting a boundary between our business and any other dependencies would ensure better future evolvability.

Written on June 10, 2018

Binding Java/Spring microservices using Feign

Recently I am working on the edge microservice of a distributed system, let’s call it Microservice 1. This is built using Spring Boot and as usually it uses Spring MVC to build it’s internal API defined by Java interfaces and DTOs. Microservice 2 wants to consume the API provided by Microservice 1 so we need to bind these 2 microservices together. But what is the ideal way to do this in this case? Since microservices architectures are relying very much on bounded context and loose coupling we need to do this with minimum intersection but also avoiding duplications.

Written on May 3, 2018

Capture, save and resend requests with Wireshark

Today I had to face a strange issue regarding inspecting requests over the network. I am working on a new feature where users can upload files to our server trough a mobile application. I am responsible for developing the backend part and exposing the proper API. The frontend parts for this feature are the mobile applications, Android and iOS done by remote developers.

Written on February 12, 2018

Blockchain 101

I had the opportunity to work with Blockchain lately (on Ethereum) so I learned quickly how is all working, how can I build my own blockchain. The documentation is a little immature at this time, as expected, it’s new technology. Also issues posted online ware broken because of different versions of the tools I used. Most of them weren’t even version 1 yet. I decided to explain the logic behind blockchains, so I will start by building a raw blockchain without using any specific tools. This will be pure Java.

Written on December 4, 2017

Expose port on already running Docker container

This is for those times when we start a container and figure that no port or the wrong port was exposed. Docker uses it’s own bridge network, that we can define or change but there is no official documentation about how it manipulates the OS routing rules. And there is no official solution to expose a port on a docker container but there are a few workarounds. Let’s see what can we do.

Written on July 2, 2017

Investigating Java thread dumps

A thread is just a unit that is doing some processing. A java application runs on multiple threads that can be spawn up any particular time. Each thread has it’s own stack trace bat all the threads in an application share the same heap. Once all threads are stopped, the application will exit. Working with threads can get very complicated and to be able to investigate our applications behavior it is important to be able to read through thread dumps.

Written on May 28, 2016

Practicing TDD with a Java code kata

In this post I will explain the workflow of doing TDD while solving a coding challenge. While following the TDD workflow we will intentionally head in the wrong direction at some point and we will see how we can easily refactor while having confidence in our tests. To stay original I chose a challenge that I didn’t find anywhere. And that is because the the educational system from Eastern Europe (ex Sovietic Block) was different from other places. It is also a good coding challenge to give developers doing a coding interview.

Written on October 5, 2015

Squashing commits with git and keeping your branches clean

Lets face it, it happens often that our branches are flooded with commits with a significat number of them not even being directly related to the feature we are implementing, like fixing typo or cleanup debugger, comments, etc.

Written on March 4, 2015

Quick Ruby ERB

This will be a very short one, just to prove a point. ERB templates are usually very verbose while including partials and layouts and can result is a very complex UI architecture.

Written on April 11, 2014

Big O Notation and Time Complexity (Javascript)

Time complexity measures the performance evolvability of an algorithm. It describes how an algorithm’s performance will scale based on scaling the inputs (or some external factor) or in other words, how will the processing time of the algorithm grow if we grow the input values or data quantity (like: increasing the integer values that represents some form of quantity, increasing the size of an array / string, etc). Instead of comparing the input and execution time, we can compare the input and the number of operations the algorithm has to execute. This is very useful when comparing algorithms.

Written on December 27, 2013