4 days ago • Hussein Nasser

The launch of the Fundamentals of Operating Systems course was a success. I’m glad people are enjoying it so far and are already asking interesting questions.

It is now a bestseller and is listed on Udemy Business. Check it out! The course can be taken in any order, but I recommend having some programming experience.

I'm also offering $10 discount coupons for all my courses for the month of May; the offer ends May 5th.

Fundamentals of Operating Systems
 https://oscourse.win/ 

Fundamentals of Database Engineering
 https://databases.win/ 

Fundamentals of Network Engineering
 https://network.husseinnasser.com/ 

Fundamentals of Backend Engineering
 https://backend.win/ 

Discovering Backend Bottlenecks
 https://performance.husseinnasser.com/ 

Introduction to NGINX
 https://nginx.husseinnasser.com/ 

Python on the Backend
 https://python.husseinnasser.com/ 

5 days ago • Hussein Nasser

This company improved their Kafka produce tail latency by over 80% when they switched from ext4 to XFS. What I enjoyed most about this article is the detailed analysis and tuning the team did on ext4 before considering a switch to XFS. In my opinion, this is a classic example of what a good tech blog looks like. The folks at Allegro, the company, did a good job.

In this video I cover:

- How Kafka Works
- Why Producer Writes Are Slow
- How Allegro Traced the Kafka Protocol
- Tracing Kernel System Calls
- What Is a Journaled File System
- How Allegro Tuned ext4
- And Finally, Switching to XFS

Video
 https://youtu.be/QAq3HRMmdbo 

Podcast 
 https://podcasters.spotify.com/pod/show/hnasr/episodes/How-Apache-Kafka-got-faster-by-switching-ext4-to-XFS-e2j1cu9 
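"Tail latency" means the slowest requests, usually reported as a high percentile such as p99 rather than as an average. Here's a minimal Python sketch of the nearest-rank percentile to show why the tail behaves differently from the typical request. The latency samples are made up for illustration, not Allegro's data.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need at least one sample and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Hypothetical produce latencies in milliseconds: 98 fast writes and two slow ones.
latencies = [2.0] * 98 + [50.0, 400.0]
print("p50:", percentile(latencies, 50))    # the typical write
print("p99:", percentile(latencies, 99))    # the tail
print("mean:", sum(latencies) / len(latencies))
```

The median stays at the fast value no matter how slow the stragglers get, while p99 lands on them, which is why a fix for occasional slow disk writes shows up in the tail.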

5 days ago • Hussein Nasser

TCP_NODELAY is an interesting socket option.

When an app writes to a socket, the raw bytes are copied from the user-space process to the kernel, where the TCP/IP stack runs.

Each connection has a receive buffer, where data from the other party arrives, and a send buffer, where data from the application goes before it is sent to the NIC (network interface card). Both the send and receive buffers live in the kernel.
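Since both buffers live in the kernel, an application can only see them through socket options. Here's a minimal Python sketch that reads their default sizes with the standard `socket` module. The numbers depend on the OS and its settings (Linux, for example, autotunes TCP buffers), so treat the output as illustrative.

```python
import socket

# Create (but don't connect) a TCP socket; the kernel sets up its buffers.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# SO_SNDBUF / SO_RCVBUF report the kernel-side send and receive buffer sizes in bytes.
send_buf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
recv_buf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"send buffer: {send_buf} bytes, receive buffer: {recv_buf} bytes")

sock.close()
```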

When the app writes data to the connection to be sent to the other party, it goes to the kernel’s send buffer. The kernel buffers the data in hopes of filling a full-size TCP segment (the MSS, or maximum segment size, typically around 1460 bytes on Ethernet: the 1500-byte MTU minus 40 bytes of IP and TCP headers) before sending it over the network.

Each segment carries a ~40-byte header (20 bytes of IPv4 plus 20 bytes of TCP, without options), so sending just a few bytes per segment spends most of the network bandwidth on headers. So by default the kernel delays sending small segments in hopes of receiving more data from the application to fill a full MSS. The algorithm that decides when to delay and for how long is called Nagle’s algorithm: roughly, while previously sent data is still unacknowledged, small writes are held back until either a full MSS accumulates or the ACK arrives.
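To make the bandwidth argument and the delay rule concrete, here's a minimal Python sketch. The header and MSS constants are assumptions (IPv4 and TCP without options, Ethernet MSS), and `nagle_should_send` is a hypothetical, heavily simplified model of Nagle's rule, not the kernel's actual code.

```python
HEADER_BYTES = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options
MSS = 1460         # assumed: Ethernet MTU of 1500 bytes minus HEADER_BYTES


def header_overhead(payload_bytes: int) -> float:
    """Fraction of the bytes on the wire that are headers rather than payload."""
    return HEADER_BYTES / (HEADER_BYTES + payload_bytes)


def nagle_should_send(buffered: int, unacked_in_flight: bool, nodelay: bool) -> bool:
    """Simplified Nagle rule: may the kernel transmit what is in the send buffer now?"""
    if buffered == 0:
        return False           # nothing to send
    if nodelay or buffered >= MSS:
        return True            # TCP_NODELAY is set, or a full segment is ready
    # Small segment: send only if no earlier data is still waiting for an ACK.
    return not unacked_in_flight


print(f"1-byte payload:    {header_overhead(1):.1%} of the wire is header")
print(f"1460-byte payload: {header_overhead(MSS):.1%} of the wire is header")
print(nagle_should_send(10, unacked_in_flight=True, nodelay=False))  # held back
print(nagle_should_send(10, unacked_in_flight=True, nodelay=True))   # sent now
```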

You can disable this delay by setting the TCP_NODELAY socket option, which causes the kernel to send whatever is in the send buffer, even if it is only a few bytes, essentially favoring low latency over network bandwidth efficiency.
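In code this is a single `setsockopt` call at the `IPPROTO_TCP` level. Here's a minimal Python sketch using the standard `socket` module; `make_low_latency_socket` and `nodelay_enabled` are hypothetical helper names. A server could apply the same call to each connection it gets back from `accept()`.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY lives at the IPPROTO_TCP level; 1 turns it on, telling the
    # kernel to send small segments right away instead of holding them back.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nodelay_enabled(sock: socket.socket) -> bool:
    """getsockopt returns a non-zero integer when the option is set."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


s = make_low_latency_socket()
print("TCP_NODELAY enabled:", nodelay_enabled(s))
s.close()
```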

Backend applications benefit specifically from enabling this option when writing responses back to clients, since responses go through the send buffer. Delaying segments just because they are not full can slow down response writes.

—
I talk about this option in my backend and network courses.

9 days ago • Hussein Nasser

Operating systems (their kernels specifically) orchestrate many processes, provide access to memory, disk, and network, and execute processes by scheduling them on the CPU. It sounds simple when we put it this way, but the task is vast. Writing efficient programs depends on how well the engineer understands the OS kernel.

When we access a 32-bit integer in memory or write 6 bytes to disk, simple as that may seem, the kernel and hardware work together through several steps and layers to handle the task. Each step and layer may add unpredictable costs. As a result, without that understanding, we as engineers are bound to write inefficient code that goes against the grain of the kernel.

I built this course to demystify what I believe are the fundamentals of operating systems for software engineers. Once you know how the kernel works, you will naturally start writing software differently, because you will start questioning what happens in each line you author.

Like all my courses, I recommend having some programming experience before taking this course; it makes the material more relatable. The course focuses on Linux, but I do explain how Windows and macOS differ in certain situations.

I hope you enjoy it. I'm happy I was able to finish it after 2 years of work. 

Use code "KERNEL" or head to  https://os.husseinnasser.com/  

 https://www.udemy.com/course/fundamentals-of-operating-systems/?couponCode=KERNEL 

2 weeks ago • Hussein Nasser

Postgres locks are interesting.

For example, CREATE INDEX allows reads but blocks writes, while VACUUM FULL blocks all queries.

In this article I explore all the locks in Postgres and show my pglocks dot org tool, which identifies conflicting commands.
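Those examples come down to Postgres's table-level lock modes: each command acquires a mode on the table, and two commands block each other when their modes conflict. Here's a minimal Python sketch of that lookup. The conflict sets are transcribed from the table-level lock conflict table in the PostgreSQL docs, the command list is a small illustrative subset, and `commands_conflict` is a hypothetical helper, not the pglocks tool itself.

```python
MODES = [
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
]

# For each table-level lock mode, the modes it conflicts with (a symmetric relation).
CONFLICTS: dict[str, set[str]] = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {"SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE UPDATE EXCLUSIVE": {"SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
                               "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE": {"ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
              "EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "SHARE ROW EXCLUSIVE": set(MODES[2:]),  # ROW EXCLUSIVE and every stronger mode
    "EXCLUSIVE": set(MODES[1:]),            # everything except ACCESS SHARE
    "ACCESS EXCLUSIVE": set(MODES),         # everything, including plain reads
}

# Table-level lock mode each command takes (illustrative subset).
COMMAND_LOCK = {
    "SELECT": "ACCESS SHARE",
    "SELECT ... FOR UPDATE": "ROW SHARE",
    "INSERT": "ROW EXCLUSIVE",
    "UPDATE": "ROW EXCLUSIVE",
    "DELETE": "ROW EXCLUSIVE",
    "VACUUM": "SHARE UPDATE EXCLUSIVE",
    "CREATE INDEX CONCURRENTLY": "SHARE UPDATE EXCLUSIVE",
    "CREATE INDEX": "SHARE",
    "VACUUM FULL": "ACCESS EXCLUSIVE",
}


def commands_conflict(a: str, b: str) -> bool:
    """True if commands a and b block each other on the same table."""
    return COMMAND_LOCK[b] in CONFLICTS[COMMAND_LOCK[a]]


for heavy in ["CREATE INDEX", "VACUUM FULL"]:
    for other in ["SELECT", "INSERT"]:
        verdict = "blocks" if commands_conflict(heavy, other) else "allows"
        print(f"{heavy} {verdict} {other}")
```

This reproduces the two examples above: CREATE INDEX takes SHARE, which tolerates reads but conflicts with the ROW EXCLUSIVE lock that writes need, while VACUUM FULL takes ACCESS EXCLUSIVE, which conflicts with everything.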

 https://medium.com/@hnasr/postgres-locks-a-deep-dive-9fc158a5641c 

3 weeks ago • Hussein Nasser

Lunch and cache invalidation - A story

A construction project at work had been blocking the main doorway to the cafeteria where we get lunch.

For the first few days, we kept forgetting it was closed and tried the main door, only to turn around and use the side door, which takes a longer path. The trip to the main door and back was itself long.

This continued until we became accustomed to avoiding the main door altogether. 

Two months later, we were still choosing the longer side-door route to the cafeteria over the shorter main-door path, even though the construction had been complete for weeks.

None of us bothered to check if the main door had reopened; we simply retained the knowledge that it was "closed".

The other day, while going for a walk on campus, I accidentally discovered that it had reopened.

In software engineering speak, we had been using a stale cache and spending more time walking (not complaining, the long walk was nice).

Keeping the cache fresh would have required us to check the main door every day, which itself takes time.

On the other hand, we could have kept using the stale cache and waited for a random event to update it.

We could also have used an expiry: if the cache is X days old, go do the check. But this adds another thing to maintain: how long should the expiry be? Too short and we keep paying the cost of refreshing the cache; too long and we risk using a stale cache for a long time.
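The expiry option is essentially a time-to-live (TTL) cache. Here's a minimal Python sketch, assuming a single cached value and an injectable clock so the "days" can be simulated; `TTLCache` and `check_main_door` are hypothetical names, and the door reopening on day 3 is made up to mirror the story.

```python
import time
from typing import Callable, Generic, Optional, TypeVar, cast

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Caches a single value and refreshes it once it is at least `ttl` time units old."""

    def __init__(self, fetch: Callable[[], T], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # the expensive check (walking to the main door)
        self._ttl = ttl              # the expiry we have to pick
        self._clock = clock
        self._value: Optional[T] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return cast(T, self._value)  # may be stale until the entry expires


# Simulated world: the clock counts days, and the main door reopens on day 3.
today = 0
checks = 0


def check_main_door() -> bool:
    global checks
    checks += 1
    return today >= 3            # True means the main door is open


door = TTLCache(check_main_door, ttl=7, clock=lambda: today)
for today in [0, 1, 3, 5, 7, 8]:
    print(f"day {today}: main door open? {door.get()}")
print("trips to check the door:", checks)
```

With a 7-day expiry, the cache keeps answering "closed" on days 3 and 5 even though the door is already open, and only two trips are made in total. Shrinking `ttl` catches the change sooner at the price of more trips, which is exactly the trade-off above.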

We could have also not cached at all. What made us cache in the first place is a mystery. 

3 weeks ago • Hussein Nasser

Eid Mubarak to all who celebrate, and may you see it return for many years to come!

Giving out 2,000 free coupons for the networking and Python courses, 1,000 each.

Fundamentals of Network Engineering
network.husseinnasser.com
Free coupon: NET-EID2024

Python on the Backend
python.husseinnasser.com
Free coupon: PY-EID2024

1 month ago • Hussein Nasser

I used to think that backend performance depends on the application logic itself. However, that is only one piece of it; many other factors play a role in the overall quality, efficiency, and performance of the application.

A few of them are connection management, the kernel TCP/IP stack, security and TLS, protocol serialization, intermediaries, and much more.

I designed this course for developers and engineers who build backend and frontend apps and would like to take their skills further by understanding the full stack and learning how to identify performance bottlenecks in their backend applications.

Enjoy April coupons for this course and all my courses (links redirect to Udemy):

 https://performance.husseinnasser.com/ 

I recommend taking Fundamentals of Backend Engineering as a prerequisite.

Links to the rest of my courses:

 https://backend.husseinnasser.com/  

 https://database.husseinnasser.com/  

 https://network.husseinnasser.com/  

 https://nginx.husseinnasser.com/  

 https://python.husseinnasser.com/ 

1 month ago • Hussein Nasser

I was researching how a CPU implementation can support context switching between two different processes (or threads of different processes) without flushing the virtual memory translations cached in the translation lookaside buffer, essentially improving context switch time.

You see, we mostly work with virtual addresses, which mean nothing to the actual memory, since memory needs physical addresses. So the CPU has to translate each virtual address to a physical address. This is done by the CPU's memory management unit (MMU), and because the mapping is stored in memory, the MMU caches translations in a cache called the translation lookaside buffer (TLB) for fast lookups.

But guess what? Process A's virtual address V1 and Process B's same virtual address V1 can map to two different physical addresses. So most TLB implementations discard (flush) the TLB when a context switch happens between two processes, because the mapping is different. That is also why switching between threads of the same process is faster: the threads share the same mappings, so no flush is needed.

However, some CPUs have a way to solve this, and I stumbled upon Arm's detailed documentation of this very issue, which answered all my questions.
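The mechanism Arm documents for this is the Address Space Identifier (ASID): each TLB entry is tagged with the ASID of the address space that owns it, so entries from different processes can coexist, and a context switch only changes the current ASID instead of flushing. Here's a toy Python model of the difference, assuming 4 KiB pages and a dict standing in for each process's page table; `ToyTLB` is hypothetical and ignores TLB capacity, eviction, and global pages.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class ToyTLB:
    """Caches virtual page -> physical frame translations, optionally tagged by ASID."""

    def __init__(self, asid_tagged: bool) -> None:
        self.asid_tagged = asid_tagged
        self.entries: dict[tuple[int, int], int] = {}  # (asid, virtual page) -> frame
        self.current_asid = 0
        self.misses = 0

    def context_switch(self, asid: int) -> None:
        if not self.asid_tagged:
            # Untagged entries can't tell processes apart, so they must all go.
            self.entries.clear()
        self.current_asid = asid

    def translate(self, vaddr: int, page_table: dict[int, int]) -> int:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        key = (self.current_asid if self.asid_tagged else 0, vpn)
        if key not in self.entries:
            self.misses += 1                    # TLB miss: walk the page table
            self.entries[key] = page_table[vpn]
        return self.entries[key] * PAGE_SIZE + offset


def run(asid_tagged: bool) -> int:
    """Switch A -> B -> A, each process touching the same virtual address V1."""
    table_a = {1: 10}  # process A: virtual page 1 -> physical frame 10
    table_b = {1: 20}  # process B: same virtual page, different frame
    v1 = 1 * PAGE_SIZE + 8
    tlb = ToyTLB(asid_tagged)
    for asid, table in [(1, table_a), (2, table_b), (1, table_a)]:
        tlb.context_switch(asid)
        print(f"ASID {asid}: V1 -> physical {tlb.translate(v1, table)}")
    return tlb.misses


print("TLB misses without ASIDs:", run(False))
print("TLB misses with ASIDs:", run(True))
```

Both runs translate V1 to the right frame for each process; the difference is that the untagged TLB has to re-walk the page table after every switch, while the tagged one still holds A's entry when A comes back.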

As I'm wrapping up my new OS course, this has been very helpful. I have to thank Arm for fantastic documentation; their CPU architecture documentation is very detailed.

Docs matter.

1 month ago • Hussein Nasser

Facebook installed a ROOT certificate on devices to intercept TLS traffic; this was done through a VPN client as part of their Protect service (yes, the irony).

Of course, users have to opt in to accept the install, but most users don’t know what a certificate is, so they would just click OK. Plus, Facebook was pushing Protect hard.

Once you were part of the Protect service, all your traffic would go through an FB VPN server. TLS handshakes to Snapchat/YouTube would be intercepted and completed by the VPN server with the real Snapchat/YouTube server. On the client side, the FB VPN server would generate its own certificate with a subject common name of Snapchat or YouTube (the site being visited), creating a separate TLS session, with different keys, between the client and the VPN server.

The client device completes the TLS handshake and validates the fake certificate, which succeeds because the root certificate that signed it is in the device's certificate store (added by the VPN client).

A classic man-in-the-middle attack.
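To see why installing one root certificate is enough, here's a deliberately toy Python model of the client's check. It reduces validation to "is the issuer in my trust store, and does the name match the site?"; real X.509 validation also verifies signatures up the chain, validity dates, key usage, and subjectAltName, none of which is modeled. `Cert`, `client_accepts`, and the CA names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject_cn: str  # who the certificate claims to be
    issuer: str      # the CA that (supposedly) signed it


def client_accepts(cert: Cert, hostname: str, trust_store: set[str]) -> bool:
    """Toy validation: trusted issuer plus matching name (no signatures, dates, or SANs)."""
    return cert.issuer in trust_store and cert.subject_cn == hostname


# Hypothetical names: the forged cert is minted on the fly by the VPN's own root CA.
device_store = {"Example Public CA"}
forged = Cert(subject_cn="www.youtube.com", issuer="Example VPN Root CA")

print("before VPN install:", client_accepts(forged, "www.youtube.com", device_store))
device_store.add("Example VPN Root CA")  # the opt-in step that installs the root
print("after VPN install:", client_accepts(forged, "www.youtube.com", device_store))
```

Nothing about the forged certificate changes between the two checks; only the device's trust store does, which is why the opt-in install is the whole attack.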