I’ve been getting my hands dirty with machine learning lately. I recently did some research on target encoding and how it can improve or alter a model’s accuracy, and I want to share it with you. A feature with too many categorical values can make an ML model overcomplicated: for example, unique fields like IDs, or categories with too many possible values (city names). One good option for still using these fields in an ML model is to aggregate and encode them. We will do this using target encoding. Let’s see how that works in more detail.
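To make the idea concrete, here is a minimal sketch of target (mean) encoding in plain Java: each category gets replaced by the mean of the target variable over the rows that share that category. The `TargetEncoder` class and the toy city/target data are hypothetical, purely for illustration.

```java
import java.util.*;

// Target (mean) encoding: map each category to the average target value
// observed for rows with that category.
class TargetEncoder {
    static Map<String, Double> fit(List<String> categories, List<Double> target) {
        // Group target values by category.
        Map<String, List<Double>> grouped = new HashMap<>();
        for (int i = 0; i < categories.size(); i++) {
            grouped.computeIfAbsent(categories.get(i), k -> new ArrayList<>())
                   .add(target.get(i));
        }
        // Encode each category as the mean of its target values.
        Map<String, Double> encoding = new HashMap<>();
        grouped.forEach((category, values) -> encoding.put(category,
                values.stream().mapToDouble(Double::doubleValue).average().orElse(0)));
        return encoding;
    }
}
```

A high-cardinality column like a city name thus collapses into a single numeric feature, instead of hundreds of one-hot columns.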
Realtime applications are becoming more and more popular: streams of data travel from one service to another continuously. We want our entire microservices architecture to behave like a closed electronic circuit, always reacting to input changes, triggers, or interruptions, everything working together like a symphony orchestra. And as if this were not complex enough, we have to keep growing our businesses, so more and more data has to go through our reactive circuit. Services have to process more without appearing slower to the end consumer. Frontend applications are becoming realtime mirrors that reflect every change our orchestrated backend services make. We achieve this by streaming data from backend services to frontend applications in realtime, keeping the end user updated with every change the system goes through. It is an impressive evolution of cloud applications. But growth always comes with challenges, one of which is making an API service scalable: a service that exposes a reactive API, opens an event stream on each HTTP request, and keeps it open for a long period of time, updating the customer with every state change, continuously, for many minutes or even hours. At JustEat Takeaway.com, the Tracker application uses this reactive API every day to inform millions of customers about the state of their orders in realtime.
In the Domain-Driven Design approach, an aggregate is a domain-specific pattern for encapsulating a collection of entities that belong together. The aggregate encapsulates all the behavior of its internal state and is unique, being identifiable by an id. Changing the state of the aggregate is possible only through its exposed domain-specific methods. Because an aggregate needs to be persisted over a long time, we should always save and load the entire object as a whole. Applying commands to an aggregate always results in a new aggregate state that overwrites any previous state. This means that we lose all history of the aggregate. Logging a message for each behavior would help us keep a history of the aggregate, but what if we want to see what the state of the aggregate was at a particular time in the past? Or if we want to implement new features and use all the past data to update the aggregate?
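As a minimal sketch, here is what such an aggregate might look like in Java. The `OrderAggregate` and its domain methods are hypothetical; the point is that state changes only through the exposed methods, the object is identified by its id, and an internal list keeps a simple log of applied behaviors.

```java
import java.util.*;

// Hypothetical aggregate: state is mutated only via domain-specific methods.
class OrderAggregate {
    private final String id;                       // identity of the aggregate
    private String status = "CREATED";
    private final List<String> history = new ArrayList<>(); // simple behavior log

    OrderAggregate(String id) { this.id = id; }

    void confirm() { apply("CONFIRMED"); }

    void deliver() {
        // Domain rule: an order can only be delivered after it was confirmed.
        if (!status.equals("CONFIRMED")) throw new IllegalStateException("not confirmed");
        apply("DELIVERED");
    }

    private void apply(String next) {
        status = next;          // the new state overwrites the previous one...
        history.add(next);      // ...but the log remembers what happened.
    }

    String id() { return id; }
    String status() { return status; }
    List<String> history() { return List.copyOf(history); }
}
```

Note that only the latest `status` is what would normally be persisted; the `history` list hints at why event sourcing becomes attractive when past states matter.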
I see this architectural pattern as underrated, and I think it should be used more often. Systems are shifting more and more towards microservices architectures, and at the application level we also want to gain modularity and decoupling, especially from services that are not under our control. Putting a boundary between our business logic and any other dependency ensures better future evolvability.
Recently I have been working on the edge microservice of a distributed system; let’s call it Microservice 1. It is built with Spring Boot and, as usual, uses Spring MVC to build its internal API, defined by Java interfaces and DTOs. Microservice 2 wants to consume the API provided by Microservice 1, so we need to bind these two microservices together. But what is the ideal way to do this? Since microservices architectures rely heavily on bounded contexts and loose coupling, we need to do this with minimal intersection while also avoiding duplication.
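One common approach, sketched below under the assumption that both services are Java-based, is to extract the API contract (interfaces and DTOs only, no framework types) into a small shared module that both microservices depend on. The `OrderApi` interface and `OrderDto` record here are hypothetical examples of such a contract.

```java
// Hypothetical shared "api-contract" module: plain interfaces and DTOs only,
// so neither microservice leaks its framework choices into the other.
interface OrderApi {
    OrderDto getOrder(String id);
}

// Immutable DTO carried over the wire between the two services.
record OrderDto(String id, String status) {}
```

Microservice 1 implements `OrderApi` in its controller layer, while Microservice 2 builds its HTTP client against the same interface, so both sides compile against a single definition instead of duplicating DTOs.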
Today I had to face a strange issue while inspecting requests over the network. I am working on a new feature where users can upload files to our server through a mobile application. I am responsible for developing the backend part and exposing the proper API. The frontend parts of this feature are the mobile applications, Android and iOS, built by remote developers.
I had the opportunity to work with blockchain technology lately (on Ethereum), so I quickly learned how it all works and how I can build my own blockchain. The documentation is a little immature at this time, which is expected for a new technology, and many issues posted online were broken because of differing versions of the tools I used; most of them weren’t even at version 1 yet. I decided to explain the logic behind blockchains, so I will start by building a raw blockchain without using any specific tools. This will be pure Java.
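As a first taste, here is a minimal sketch of the core idea: each block stores the hash of the previous block, which is what chains the blocks together and makes past blocks tamper-evident. This `Block` class is a simplified illustration, not a full blockchain.

```java
import java.security.MessageDigest;

// Each block embeds the previous block's hash; altering any past block
// changes its hash and breaks every link after it.
class Block {
    final int index;
    final String previousHash;
    final String data;
    final String hash;

    Block(int index, String previousHash, String data) {
        this.index = index;
        this.previousHash = previousHash;
        this.data = data;
        // The block's own hash covers its contents and its link to the past.
        this.hash = sha256(index + previousHash + data);
    }

    static String sha256(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes("UTF-8"));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

A chain is then just a list of such blocks where each `previousHash` must equal the hash of the block before it.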
This is for those times when we start a container and realize that no port, or the wrong port, was exposed. Docker uses its own bridge network, which we can define or change, but there is no official documentation about how it manipulates the OS routing rules. There is also no official way to expose a port on an already running Docker container, but there are a few workarounds. Let’s see what we can do.
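One workaround I am aware of, assuming the container’s state can tolerate a restart, is to snapshot the running container with `docker commit` and start a new container from that image with the right `-p` mapping. The container and image names below are placeholders.

```shell
# Snapshot the running container's filesystem state into a temporary image.
docker commit my-container my-container-snapshot

# Stop the old container and start a fresh one with the port published.
docker stop my-container
docker run -d --name my-container-fixed -p 8080:8080 my-container-snapshot
```

Note that `docker commit` captures the filesystem, not in-memory state, so anything the process held only in RAM is lost across the restart.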
Today we will discuss memory leaks: what they are, what causes them, and how to find them using a tool called Java VisualVM.
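Before reaching for the tool, it helps to know what a leak typically looks like in Java code. A common shape, sketched below with a hypothetical `LeakyCache`, is a static collection that only ever grows: everything it holds stays reachable, so the garbage collector can never reclaim it, and in VisualVM the used-heap graph keeps climbing with every request.

```java
import java.util.*;

// A classic Java memory leak shape: a static, ever-growing collection.
class LeakyCache {
    static final List<byte[]> CACHE = new ArrayList<>();

    static void handleRequest() {
        // Each call retains another 1 KB forever; nothing ever removes entries,
        // so the GC cannot free them even under memory pressure.
        CACHE.add(new byte[1024]);
    }
}
```

In a VisualVM heap dump, such a leak shows up as one collection dominating the retained size, with its owner (here the static field) as the GC root keeping it alive.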
A thread is just a unit that does some processing. A Java application runs on multiple threads that can be spawned at any particular time. Each thread has its own stack, but all the threads in an application share the same heap. Once all threads are stopped, the application exits. Working with threads can get very complicated, and to be able to investigate our application’s behavior it is important to be able to read through thread dumps.
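The JVM can produce such a dump programmatically, which is a handy way to get familiar with the format before reading real dumps from `jstack` or VisualVM. A minimal sketch:

```java
import java.util.Map;

// Print a simple thread dump: every live thread's name, state, and stack frames.
class ThreadDumpDemo {
    public static void main(String[] args) {
        Map<Thread, StackTraceElement[]> dump = Thread.getAllStackTraces();
        dump.forEach((thread, frames) -> {
            System.out.println("\"" + thread.getName() + "\" state=" + thread.getState());
            for (StackTraceElement frame : frames) {
                System.out.println("    at " + frame);
            }
        });
    }
}
```

Even in a trivial program this prints several JVM-internal threads (GC, reference handling, etc.) alongside `main`, which is a good reminder that our code is never the only thing running.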
In this post I will explain the workflow of doing TDD while solving a coding challenge. While following the TDD workflow we will intentionally head in the wrong direction at some point, and we will see how easily we can refactor while having confidence in our tests. To stay original, I chose a challenge that I didn’t find anywhere else; that is because the educational system of Eastern Europe (the former Soviet Bloc) was different from that of other places. It is also a good coding challenge to give developers in a coding interview.
Let’s face it: it often happens that our branches are flooded with commits, a significant number of which are not even directly related to the feature we are implementing, like “fixing typo” or “cleanup debugger, comments, etc.”.
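One way to clean such a branch before merging (assuming the noisy commits are the most recent ones and the branch is not shared with others yet) is an interactive rebase that melds them into the real feature commit. The hashes and messages below are placeholders.

```shell
# Rewrite the last 3 commits; git opens an editor with one line per commit.
git rebase -i HEAD~3

# In the editor, keep the real commit and fold the noise into it:
#   pick   1a2b3c4 add image upload endpoint
#   fixup  5d6e7f8 fixing typo
#   fixup  9a0b1c2 cleanup debugger, comments, etc.
# "fixup" melds a commit into the previous one and discards its message
# ("squash" would do the same but let you edit the combined message).
```

Since this rewrites history, it is only safe on branches that nobody else has pulled yet.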
Acceptance tests should be very easy to follow by non-technical people, such as our clients.
Let’s say we want to let our users upload images, and we want to save them to our .../public/gallery folder using Paperclip. But we want to set the destination dynamically, based on another field of our model.
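Paperclip supports this through custom interpolations, which let a token in the `path` option resolve against the record itself. A sketch, assuming a hypothetical `Photo` model with a `category` attribute:

```ruby
# Register a custom :category token that reads the field off the model instance.
Paperclip.interpolates :category do |attachment, style|
  attachment.instance.category
end

# Hypothetical model: the upload path now varies per record.
class Photo < ActiveRecord::Base
  has_attached_file :image,
                    path: ":rails_root/public/gallery/:category/:filename",
                    url: "/gallery/:category/:filename"
end
```

With this, a photo whose `category` is, say, `"landscapes"` would be stored under a `gallery/landscapes/` subfolder, without any custom save logic.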
This will be a very short one, just to prove a point. ERB templates are usually very verbose when including partials and layouts, and can result in a very complex UI architecture.
Time complexity measures how an algorithm’s performance evolves. It describes how the algorithm’s performance scales as its inputs scale (or some external factor does); in other words, how the processing time of the algorithm grows as we grow the input values or data quantity (for example: increasing an integer value that represents some form of quantity, or increasing the size of an array or string). Instead of comparing the input and the execution time, we can compare the input and the number of operations the algorithm has to execute. This is very useful when comparing algorithms.
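As an illustration, here is a small sketch that counts comparisons instead of measuring wall-clock time: a linear search over a sorted array of size n can take up to n comparisons, while a binary search takes at most about log2(n). The counting methods are ad hoc, written only for this comparison.

```java
// Compare algorithms by counting operations rather than timing them.
class OpCounter {
    // Linear search: worst case n comparisons.
    static int linearSearchOps(int[] sorted, int key) {
        int ops = 0;
        for (int value : sorted) {
            ops++;
            if (value == key) break;
        }
        return ops;
    }

    // Binary search on sorted input: worst case about log2(n) comparisons.
    static int binarySearchOps(int[] sorted, int key) {
        int ops = 0, lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            ops++;
            int mid = (lo + hi) >>> 1;   // overflow-safe midpoint
            if (sorted[mid] == key) break;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid - 1;
        }
        return ops;
    }
}
```

For an array of 1024 sorted integers with the key in the last position, the linear count is 1024 while the binary count stays around 11, which is exactly the O(n) versus O(log n) gap the notation describes.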