Remediate Kubernetes issues using Robusta

It’s 3 AM and production is groaning. There is a crazy amount of traffic, but the Horizontal Pod Autoscaler has already hit […]

Read more

Modernizing DevOps with Kubernetes

Software is everywhere, and it has become ingrained into our daily routines. The spread of software has led to […]

Read more

Securing a Kubernetes cluster using TLS certificates

Security is an essential concept in every aspect of life, whether it be your home, bank accounts, social media accounts, emails, or websites […]

Read more

9 Security Best Practices in Identity and Access Management

Access management is a key part of any present-day organization’s security strategy. Whether it be accessing the physical […]

Read more

Data Pipelines and Their Components: An Insight

Data pipelines are the backbone of any batch processing system used to load data to and from data warehouses, data lakes, or data marts […]

Read more

Xgrid Snow DataOps: A Snowflake CI/CD Using Azure DevOps and Flyway

DevOps uses on-demand IT resources to prioritize continuous delivery by automating software testing and deployment […]

Read more

Troubleshooting Kubernetes using Robusta on Kind

In this blog (part of the Troubleshooting K8s Using Robusta blog series), we’ll look at Robusta, an open source tool that enables Kubernetes […]

Read more

Tackling Kubernetes Observability Challenges with Pixie

Over the past several years, the software industry has seen a boom in the adoption of containers and container orchestration technologies […]

Read more

Building Modern Applications Using Serverless and Microservices Architecture

Introduction

Architecture design is not only the first step in building an application but perhaps the most vital one as well. Traditionally, the entire application was built as a single, indivisible unit of code with shared resources, an approach known as monolithic architecture.

However, modern applications tend to favor the loosely coupled or decoupled Microservices Architecture. It allows the services to be autonomous, agile, rapidly scalable, and independently deployable. 

Another increasingly favored option is serverless architecture, because it requires no server management. The servers used to build applications are provisioned and managed by Cloud Service Providers (CSPs) and billed according to consumption.

The options available to build your application are numerous. As a startup founder, you must understand the architecture requirements before starting to build your product. You need to answer questions such as: which architecture is best suited for your unique solution, what benefits each available option offers, and which of those benefits best match your priorities. This blog addresses the application-architecture debate for startup founders.

What is microservices architecture? What are its benefits?

In a microservices architecture, a single large application is broken into smaller elements, each of which delivers a specific value. In monolithic applications, all functions share a single large codebase, which is harder to manage. For modern applications, the preferred development architecture is microservices, in which the codebase is decoupled into smaller functional units that interact with each other through well-defined APIs.

How does microservice architecture impact an application’s performance and scalability?

The biggest benefit of a microservices architecture is agility. The decoupled services are programmed, tested, and even deployed independently. Microservices are also safer because decoupling reduces the blast radius of any failure. Other merits include well-defined interfaces for communication among the decoupled services and well-defined backend code paths. Microservices are also easier to test, since each decoupled service has a small codebase.

In comparison, the demerit of monolithic architecture is that all developers contribute to the same codebase. Over time, the codebase grows and becomes harder to manage. Every team member’s changes must be merged into that one codebase, so approval requests for a large number of changes queue up, and development in monoliths is simply not fast enough.

A well-architected monolithic application can outperform microservices, but only in the short term. Over time, the monolith becomes bulkier and more fragile, and harder to maintain and operate. Regardless of timeframe, though, the agility and safety merits of microservices architecture trump the performance benefits of a monolith.

What is the difference between Monolithic, Service-Oriented Architecture (SOA), and Microservices Architecture?

In a monolithic architecture, all functions, whether the database or the API layer(s), are part of the same codebase and are maintained in a single repository. Generally, they are packaged into one large container or VM that gets deployed. Microservices architecture, explained earlier, is the opposite philosophy.


Microservices architecture builds on the principles of service-oriented architecture. Service-oriented architecture combines discrete business functions that are deployed and maintained separately; it provides data and code integration services and ensures communication between these business functions.

The differences between the two are the scope of operations and the method of communication. Microservices architecture operates at the scope of a single application, whereas service-oriented architecture is an enterprise-wide phenomenon. In microservices, each decoupled element communicates with the others through APIs. In contrast, elements in a service-oriented architecture communicate through an enterprise service bus (ESB).

What are the best practices for designing microservices?

When microservices are designed, there are multiple ways to divide the application into the smallest possible elements. The division can be based on teams (service per team), e.g. front-end and back-end API teams. It can also be based on functional areas, which vary with the business use cases or application capabilities.

Can breaking up an application by number of transactions serve as a guiding principle?

Dividing an application by transaction volume can serve as a good guiding principle, because the frequency of transactions per second affects testing, deployment, and monitoring. Since there is no fixed rule for how to divide, the number of transactions is one reasonable parameter. To cover all the use cases, the development team can devise an agile way to implement, test, and monitor the application. The aim is to achieve the best safety and performance for your application and its deployment.

Is microservice architecture better than others in terms of cost?

It depends on the use case. Any architecture can be costly if used carelessly. For example, a monolithic application is more costly if it runs on big instances or if load balancers are not properly configured. Similarly, microservices can be costly if multiple VMs or containers are in use but not needed. In general, monolithic architecture tends to be slightly less expensive.

Would you say that with a monolithic architecture, CI/CD pipelines are easier to manage?

No. It is rather hard to manage CI/CD pipelines in a monolithic architecture, where one small change in a large codebase can impact other parts of the application. In microservices, by contrast, the well-defined interfaces ensure fast defect isolation and resolution.

CI/CD pipelines and monolithic architecture are like oil and water. Monoliths are harder to build, harder to deploy, and harder to test, whereas microservices are much easier in comparison.

How do I maintain the security of my application in microservices architecture?

In terms of security, microservices and monolithic architectures are not fundamentally different: in both cases you have to perform your own security reviews, and the responsibility for patching your hosts and maintaining availability is on you. In a serverless architecture, however, these actions are the responsibility of the CSP.

Does that mean for security serverless architecture is better?

No. Serverless takes that responsibility, and hence a pain point, away. Depending on the scale of your operations, there is a fleet of VMs and servers that you no longer need to manage. However, serverless is not easy to secure unless the workload is divided into the smallest possible functions. Fine-grained IAM policies then govern each microservice as it communicates with the others. This allows each team to own its own space and work autonomously on that service.
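The fine-grained, per-function policies described above can be sketched with a toy evaluator. The policy format and the names (`orders_fn_policy`, `queue:publish`) are illustrative assumptions, not any specific CSP’s IAM syntax:

```python
# Toy evaluator for fine-grained, per-function access policies.
# First matching statement wins; anything unmatched is denied by default.

def is_allowed(policy, action, resource):
    """Return True if a statement grants `action` on `resource`."""
    for statement in policy["statements"]:
        if action in statement["actions"] and resource in statement["resources"]:
            return statement["effect"] == "allow"
    return False  # deny by default

# Each serverless function gets only the permissions it needs.
orders_fn_policy = {
    "statements": [
        {"effect": "allow",
         "actions": ["queue:publish"],
         "resources": ["payments-queue"]},
    ]
}

print(is_allowed(orders_fn_policy, "queue:publish", "payments-queue"))  # True
print(is_allowed(orders_fn_policy, "db:delete", "users-table"))         # False
```

Because the policy enumerates only what the function needs, a compromised function cannot touch anything else, which is exactly how dividing into small functions shrinks the blast radius.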

What are the challenges you face with serverless architecture? Please go into details of runtime and resource isolation.

One of the challenges is the time budget: a serverless function can only run for a limited period. That is both a blessing and a curse. A blessing because it pushes developers to finish early, or at least on time, and a curse because you must complete the work within the designated time, which puts even more strain on high-load situations.

Resource isolation is another challenge with serverless architecture. Serverless is no different from other models when it comes to sharing resources on the same infrastructure, so the noisy-neighbor effect is a factor. The effect is exacerbated when many serverless functions run on the same infrastructure.

How would the security, latency, and privacy of the data be impacted for a complex, distributed application in microservices architecture?

It was once believed that microservices take longer to deploy. However, computing has evolved to such an extent that, in general, a monolith no longer yields more performance benefits than microservices.

The only time monoliths yield better performance is when IPC communication is used on the same Unix machine; that is faster than calling APIs. A GraphQL query across two microservices running on two different machines, however, may not match it.

Nowadays, network fabric, switching, and computing infrastructure have evolved enough to minimize the difference.

How does microservices architecture deal with asynchronous calls resulting in congestion and latency? 

The issue of congestion and latency resulting from asynchronous calls is the same for monolithic and microservices architectures. If you have a use case with high Transactions Per Second (TPS), you design the architecture to handle high TPS. One important design consideration for both monoliths and microservices is to autoscale your system, and every Cloud Service Provider (CSP) lets you do so. For example, a food delivery service will see low TPS in the middle of the night and high TPS during lunch hour. If you fail to autoscale, you pay the same at both times, when in fact you should pay less during periods with fewer TPS.
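The scaling decision in the food-delivery example can be sketched as a simple function. The per-replica capacity and the replica bounds here are hypothetical numbers, not anything a CSP prescribes:

```python
import math

def desired_replicas(current_tps, tps_per_replica, min_replicas=1, max_replicas=20):
    """Scale the replica count to current load, clamped to [min, max]."""
    needed = math.ceil(current_tps / tps_per_replica)
    return max(min_replicas, min(needed, max_replicas))

# Food-delivery example: assume each replica handles ~50 TPS.
print(desired_replicas(5, 50))    # 1  (middle of the night)
print(desired_replicas(900, 50))  # 18 (lunch-hour peak)
```

Because the replica count, and hence the bill, tracks the load, you pay less at 3 AM than at noon; a fixed fleet would cost the same at both times.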

As a startup founder, what should be my motivation for going serverless? How would this impact my cost?

Your motivation for using serverless architecture should be agility and time efficiency. Independent teams can move faster, and applications enjoy higher availability and safety in a serverless architecture. Serverless also lets you reduce your blast radius, build test cases specific to each microservice, and avoid unpredictable code paths. More important still is the cost benefit: you pay for only what you use, and VMs run only as long as they are needed. All of this cumulatively makes serverless an excellent choice for application architecture. Therefore, you should use serverless until you can’t.

The cost of serverless architecture or the cost of the on-prem solution, which is easier to bear?

Nowadays, on-prem solutions also offer serverless with VM support. Whether you are on-prem or in the cloud, the cost tracks infrastructure consumption time. Serverless prevents you from hogging resources when they are not needed; once a resource is released for use elsewhere, you get more out of the same infrastructure.

What are the different types of APIs that microservices can use to communicate with each other?

There are many, but three are standard. The most commonly practiced is Representational State Transfer (REST). REST uses a client/server approach in which the front-end and back-end operate separately. RESTful APIs usually exchange data or documents and can communicate both directly and through intermediaries such as load balancers and API gateways.

SOAP, i.e. the Simple Object Access Protocol, is commonly used to create web APIs and supports a wide range of internet protocols, such as HTTP, SMTP, and TCP. SOAP works with XML (Extensible Markup Language), whose extensible nature gives developers the ease and flexibility to add new features and functions.

The third is the Remote Procedure Call (RPC), which usually invokes executable processes to exchange parameters and results. RPC is the least commonly used of the three because of its higher security risks and limited data-type support.
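REST’s mapping of HTTP verbs onto create/read/update/delete operations can be illustrated with a minimal in-process sketch. The `handle` dispatcher and its in-memory store are illustrative assumptions; a real service would sit behind an HTTP server, load balancer, or API gateway:

```python
# Minimal in-process sketch of REST semantics: HTTP verbs mapped to CRUD
# operations on a simple resource store, with status codes as results.

store = {}

def handle(method, resource_id, body=None):
    """Dispatch an HTTP-style verb to the matching CRUD operation."""
    if method == "POST":              # create
        store[resource_id] = body
        return 201, body
    if method == "GET":               # read
        return (200, store[resource_id]) if resource_id in store else (404, None)
    if method == "PUT":               # update (only if the resource exists)
        if resource_id not in store:
            return 404, None
        store[resource_id] = body
        return 200, body
    if method == "DELETE":            # delete (idempotent)
        store.pop(resource_id, None)
        return 204, None
    return 405, None                  # method not allowed

print(handle("POST", "order-1", {"item": "pizza"}))  # (201, {'item': 'pizza'})
print(handle("GET", "order-1"))                      # (200, {'item': 'pizza'})
print(handle("DELETE", "order-1"))                   # (204, None)
print(handle("GET", "order-1"))                      # (404, None)
```

The uniform verb-to-operation mapping is what lets intermediaries such as gateways and load balancers route and cache RESTful traffic without knowing anything about the resources themselves.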

What are the best practices to implement microservices APIs?

Instead of implementing APIs yourself, use standard APIs, and make sure the standard you opt for is backed by developer community support. There are many libraries available for implementing APIs, and in the public cloud there are technologies such as AWS API Gateway; other clouds have similar tools. They are easy to set up and can quickly provide a RESTful interface that performs CRUD operations on the backend infrastructure.

Explain the way to implement service discovery in a microservices architecture.

Earlier there may have been debates over many different service discovery architectures, but today the majority use DNS. It is simple: every service has a name, you register that name with DNS, and DNS handles service location and guides communication. Service registration and discovery are standardized on DNS and no longer require complex architectures.
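The register-then-resolve flow above can be mimicked with a toy in-memory registry. The service names and addresses are made up for illustration, and real DNS would add TTLs and round-robin across instances:

```python
# Toy in-memory registry mimicking DNS-based service discovery:
# a service registers under a name; callers resolve the name to an address.

registry = {}

def register(name, address):
    """Add an instance address under a service name."""
    registry.setdefault(name, []).append(address)

def resolve(name):
    """Return a registered address for `name`, or raise if unknown."""
    addresses = registry.get(name)
    if not addresses:
        raise LookupError(f"no instances registered for {name!r}")
    return addresses[0]  # real DNS would rotate or load-balance here

register("payments.internal", "10.0.1.12:8080")
print(resolve("payments.internal"))  # 10.0.1.12:8080
```

Callers only ever hold the name, so instances can come and go behind it without any client-side reconfiguration, which is the whole point of discovery by name.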

What does the future hold for these application development architectures? How do you see them evolving? 

For the application development architecture debate, the winners have been called. The future of computing will be more and more serverless applications, except for the workloads that cannot be serverless. For each use case, you need to choose which workloads will be serverless, so the result will be a mix: serverless alongside containers, or serverless containers alongside databases running as VMs.

To Watch the Complete Episode, Please Click on This YouTube Link:

https://www.youtube.com/watch?v=9Jcse3Vjw4I&t

Read more

MLOps: Exclusive Insights Into the Field of Machine Learning Operations

Introduction

Machine Learning has been around for decades, though the term has often been used interchangeably with Artificial Intelligence, Deep Learning, and others. With machine learning increasingly becoming part of technologies from all walks of life, software development is no exception. We therefore deem it necessary to explain Machine Learning (ML) and all related terms for startup founders. Centrally, this blog addresses how Machine Learning Operations (MLOps) has become an integral part of the Software Development Lifecycle (SDLC) for many businesses.

Introduction to MLOps and Why It Matters

Machine Learning is a set of algorithms that learn from data, unlike explicitly programmed algorithms. It is used in many fields such as spam detection, product recommendations, object recognition, fraud detection, forecast modeling, etc.

For machine learning systems, the real challenge is not to build a machine learning model; it is to build an integrated machine learning system that operates seamlessly in production, with quick solutions to any arising issues. MLOps is the practice of automating and monitoring all parts of the data pipeline, such as data collection, data cleaning, feature engineering, model development, evaluation, deployment, and infrastructure management.

Among the numerous benefits of MLOps, the first is increased development velocity. It may take companies time to develop MLOps processes today, but doing so rapidly reduces development time for data scientists and, consequently, supports the growth of the company.

Another benefit is data and model lineage and tracking. The credibility of a prediction depends on the data source, the cleaning transformations applied, and the metrics used to evaluate the model. In MLOps, every part of the pipeline is tracked and versioned, which allows for model audits in the future.

Another MLOps benefit is reproducibility. The emphasis on version control, tracking, and modularization of all components enables developers to re-run the pipeline and reproduce the same results, models, and data. Modularization and containerization of the code also make upgrading a stack easier. These steps are considered best practices in MLOps and lead to fewer issues in production as well.

What Are the Different Components of a Machine Learning System?

Machine learning has two main parts: data and algorithms. In machine learning systems, these are further divided into six sub-parts, the first of which is data collection and analysis. Given its importance in ML systems, data is collected from multiple sources. The collected data may be structured or unstructured and therefore needs analysis, which addresses the origin of the data, the range and scale of its values, and its quality.

The second part is feature engineering. Once the data is collected, it needs more work before it is fed into an ML model. The feature engineering required varies with the use case: for a spam classification model, features such as the subject line or the email body text would be developed, while a stock market prediction model would require features such as the stock’s historic prices, market indexes, market volatility, or political stability.

The third part is model development. After feature engineering, the data with its engineered features is fed into the ML model. With the evolution of ML technology, model development has become the easiest part of the pipeline: thanks to vast, well-maintained libraries, pipelines require only a few lines of code to deliver state-of-the-art ML performance.

The fourth part is model evaluation and validation. Once the model is built, its quality and performance for the business use case are assessed, providing direction for how the machine learning model should be optimized.

The fifth part is model deployment. The model at this stage is used for live predictions and a pipeline is built around it which continuously deploys and serves the requirement. 

The last part, and a vital one in ML systems, is monitoring. ML models are monitored to ensure the required performance is maintained in production and that there is little to no deviation from offline model development.
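The six parts above can be sketched end to end in a few lines. The keyword-based spam “model” below is a deliberately trivial stand-in (an assumption, not a real ML algorithm) so the pipeline stays dependency-free and the stages remain visible:

```python
# Toy end-to-end sketch of the six ML-system stages described above.

def collect():                                   # 1. data collection
    return [("win a free prize now", 1),
            ("lunch meeting at noon", 0),
            ("free prize inside", 1),
            ("quarterly report attached", 0)]

def featurize(text):                             # 2. feature engineering
    return set(text.split())

def train(data):                                 # 3. model development
    spam_words, ham_words = set(), set()
    for text, label in data:
        (spam_words if label else ham_words).update(featurize(text))
    return spam_words - ham_words                # words seen only in spam

def predict(model, text):
    return 1 if featurize(text) & model else 0

def evaluate(model, data):                       # 4. evaluation/validation
    correct = sum(predict(model, text) == label for text, label in data)
    return correct / len(data)

model = train(collect())                         # 5. deployment: serve `predict`
accuracy = evaluate(model, collect())            # 6. monitoring: track accuracy
print(accuracy)  # 1.0 (perfect on its own training data, as expected)
```

In a real system, each of these functions would be a tracked, versioned pipeline component, and the monitoring stage would run continuously against live traffic rather than once against the training set.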


MLOps Vs DevOps: Are They Really That Different?

DevOps normally focuses on application development, while MLOps combines DevOps and machine learning. The functional features of DevOps, such as CI/CD deployment, dependable releases, load testing, and monitoring, are combined with machine learning components to enable MLOps.

The differences between them can be summarized as follows:

First, an MLOps team is structured differently from a DevOps team, in that data scientists are part of the mix and they might not have software engineering knowledge.

The second difference is experimental development. Traditional software development is roughly linear, whereas an ML pipeline is rather circular.

The third difference is the need for additional testing: data and model testing on top of integration and unit testing.

The fourth difference is the deployment process. Apart from deploying fast, the goal is to be able to re-run the deployment based on signals that appear in production.

Another difference is monitoring additional metrics. The traditional DevOps metrics for health, traffic, and memory are also used in MLOps, but MLOps additionally requires metrics such as prediction quality and model performance.

Do You Need MLOps for Your Startup or Enterprise Business?

It depends on the business and its resources. If the data does not change frequently and only needs an update once or twice a year, manual processes might suffice. For businesses where the data is updated a few times a month, such as insurance risk and disaster risk use cases, a certain degree of machine learning automation will be beneficial.

Then there are businesses whose data changes very frequently; in that case, full MLOps is the only way forward. For example, spam detection requires the latest spam data, and models need to be re-trained in a full feedback-loop pipeline. These factors, among many others, determine your need for MLOps.

Should You Automate the Entire MLOps Pipeline or Parts of It?

MLOps is all about monitoring and automating the pipeline, and should be carried out as required. Automate the most frequently exercised parts of the pipeline rather than the entire pipeline. Businesses that deploy once a year would not benefit from automating the entire process; however, if your deployments are fairly regular, then MLOps should be practiced.

ML Pipeline Vs CI/CD Pipeline: what are the differences and similarities?

Continuous Integration (CI) and Continuous Deployment (CD) pipelines are much the same as MLOps pipelines, with a few additional components. In MLOps, CI/CD pipelines no longer merely test and validate code and components; they also test and validate data schemas and models. The unit of deployment is no longer a single software package or service but an entire system: a machine learning training pipeline that, when triggered, automatically deploys another service, unlike traditional continuous deployment, which is well-defined and linear. Continuous training is a new feature that the MLOps pipeline exhibits, and continuous model monitoring in production is another distinguishing factor.

What should we understand from Model or Concept Drift?

Fully assessing a machine learning model’s performance in an online system is a challenge, so a model is first trained offline on data for a specific use case to generate a prediction model. Over time, however, online performance deviates from offline performance, implying that the model has gone stale and needs to be re-trained on fresh data. This phenomenon is referred to as concept drift, and it is why ML systems require continuous training.
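A minimal drift check along these lines compares live accuracy over a sliding window with the model’s offline baseline and flags retraining when they diverge too far. The window size, tolerance, and `DriftMonitor` class are illustrative assumptions:

```python
from collections import deque

class DriftMonitor:
    """Flag retraining when live accuracy drops below the offline baseline."""

    def __init__(self, baseline_accuracy, window=100, tolerance=0.05):
        self.baseline = baseline_accuracy
        self.tolerance = tolerance
        self.outcomes = deque(maxlen=window)   # 1 = correct, 0 = wrong

    def record(self, prediction, actual):
        self.outcomes.append(1 if prediction == actual else 0)

    def needs_retraining(self):
        if not self.outcomes:
            return False
        live_accuracy = sum(self.outcomes) / len(self.outcomes)
        return (self.baseline - live_accuracy) > self.tolerance

# Offline baseline was 95%; live accuracy in the window drops to 60%.
monitor = DriftMonitor(baseline_accuracy=0.95, window=10)
for prediction, actual in [(1, 1)] * 6 + [(1, 0)] * 4:
    monitor.record(prediction, actual)
print(monitor.needs_retraining())  # True
```

Production systems typically hook such a signal into the continuous-training pipeline, so a `True` here would trigger a retraining run on fresh data rather than a manual investigation.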

How to Monitor the MLOps Deployment?

In MLOps deployment, apart from the traditional metrics of health, memory, and timing, latency (the time it takes the model to make a prediction) is also monitored. Throughput is another important metric: how many examples the model can predict per request. Yet another important metric is data schema skew, because data changes need to be monitored with great attention in ML.
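The schema-skew metric can be sketched as a comparison between the schema inferred from training data and a batch of serving data. Inferring the schema from the first row, and the field names used, are simplifying assumptions for illustration:

```python
# Small sketch of a data-schema skew check: compare the schema seen at
# training time against a batch of serving-time data.

def infer_schema(rows):
    """Toy schema: map each field to its Python type name (from the first row)."""
    return {key: type(value).__name__ for key, value in rows[0].items()}

def schema_skew(train_rows, serving_rows):
    """Return a list of human-readable skew findings (empty = no skew)."""
    train_schema = infer_schema(train_rows)
    serving_schema = infer_schema(serving_rows)
    findings = []
    for field, expected in train_schema.items():
        seen = serving_schema.get(field)
        if seen is None:
            findings.append(f"missing field: {field}")
        elif seen != expected:
            findings.append(f"type change: {field} {expected} -> {seen}")
    return findings

train = [{"age": 34, "country": "PK"}]
serving = [{"age": "34", "country": "PK"}]      # age arrived as a string
print(schema_skew(train, serving))  # ['type change: age int -> str']
```

Real tooling would also compare value distributions, not just types, but even this minimal check catches the common failure where an upstream producer silently changes a field’s format.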

As a Startup Founder, Can I Hire DevOps Engineers to Work as MLOps Engineers?

The technology community holds that either a machine learning engineer or a DevOps engineer can work as an MLOps engineer: a DevOps engineer can learn machine learning systems to become an MLOps engineer, and vice versa. However, without the thorough understanding of machine learning that a data scientist has, a DevOps engineer alone might miss important factors; similarly, without software engineering knowledge, a data scientist might miss important deployment details. This is giving rise to a new role, the machine learning software engineer, with the skills to develop basic machine learning models as well as to test and monitor them in deployment.

How To Determine if an ML Model Is Ready to Be Released?

As mentioned above, an ML model is trained and evaluated offline on a given data set; if it performs well offline, it is deployed to production. That is a very basic version of the deployment process. A more robust approach is to perform A/B testing on the model deployed in production, determine its performance, and update the model while in production. The model is then constantly evaluated to determine its performance and readiness, as opposed to the previous practice.
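The A/B-based release decision can be sketched as a simple promotion rule: the candidate replaces the champion only if its live success rate beats the champion’s by a minimum margin. The traffic split, metric counts, and `min_lift` threshold are hypothetical:

```python
# Toy promotion rule for A/B testing a candidate model in production.

def should_promote(champion_metrics, candidate_metrics, min_lift=0.01):
    """Each metrics dict holds successes and trials observed on live traffic."""
    champion_rate = champion_metrics["successes"] / champion_metrics["trials"]
    candidate_rate = candidate_metrics["successes"] / candidate_metrics["trials"]
    return (candidate_rate - champion_rate) >= min_lift

champion = {"successes": 880, "trials": 1000}   # current model, e.g. 90% of traffic
candidate = {"successes": 920, "trials": 1000}  # new model on a 10% slice
print(should_promote(champion, candidate))  # True (0.92 vs 0.88)
```

A production rollout would add a statistical significance test before promoting, so a small lift on a small traffic slice is not mistaken for a real improvement.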

How Do Tech Giants Use MLOps?

Google, for instance, has an open-source policy and uses many internal libraries and frameworks, with different levels of software available depending on the sophistication required. Another big part of Google’s culture is AutoML, which performs the activities typical of a machine learning team: data is fed into AutoML, which trains models automatically, offering a shortcut that accelerates typical machine learning projects and tasks.

Google also uses TensorFlow Data Validation to monitor production data quality and compute basic statistics; interactive charts allow visual inspection and comparison to catch any concept drift. On top sits Cloud Composer, which orchestrates all the aforementioned functions, from data exploration and collection to model serving and monitoring. In a basic setup, Cloud Composer acts as the orchestrator. For more advanced processes there is Kubeflow, a Kubernetes-native machine learning toolkit that can build and deploy portable, scalable, end-to-end machine learning workflows based on containers. Another advanced tool is TFX, recently open-sourced by Google, a framework that provides components to define, launch, and monitor TensorFlow-based models, covering model training, prediction, and serving.


Step by Step Guide for Companies to Adopt MLOps

Many Cloud Service Providers (CSPs) offer abstractions for MLOps. As mentioned earlier, Google offers the Google ML Kit; similarly, Azure offers the Azure ML service, and Amazon offers SageMaker. These abstractions allow data scientists to focus on business logic rather than production-related problems.

To Watch the Complete Episode, Please Click on This YouTube Link:

https://www.youtube.com/watch?v=d19JfKF5Y38&t=274s

Read more