Pushing Bits: The Surprisingly Arcane Art of Sending Data on the Internet

Introduction

The Internet is based on "store and forward packet switching" technology. Data is sent in discrete chunks called packets. Those packets are transmitted from a source machine over a data link to a packet switch. That packet switch may temporarily hold a packet before that packet is sent over another data link to another packet switch. Typically a packet will traverse several data links and pass through several packet switches as it makes its way to the destination.

Before the Internet

In the era before the Internet, data was usually moved over data links that ran directly from source to destination. There was no storing of data. Rather bits were rigidly clocked out over RS232 or V.35 interfaces. There was typically no flow control push-back on senders beyond the ticking of the bit-clocks and perhaps a Request-to-Send/Clear-to-Send handshake over the RS232 or V.35 plug.

Transmission of data over today's packet switched Internet is much more complicated. Care must be taken to avoid overloading or congesting the Internet. This means that machines that transmit data - whether those machines are user desktops, servers, or intermediate relays - must regulate themselves.

This article is about various methods of transmission regulation and how those methods work.

This article is not about end-to-end flow control such as that performed by the TCP protocol with its "window" machinery. Nor is this article about channel contention systems such as those found in radio systems or some recent satellite constellations.

Rather, this article is concerned with how a computer - whether that computer is the initial source of a packet or is an intermediary store and forward node - regulates its transmission using only internal information (such as the passage of time) and local configuration settings.

In a modern network with layer two switches and layer three routers, this kind of regulated transmission can (and should) occur at each of these devices as it forwards packets (or layer two frames) onto the next hop data link.

This article does not address how data is received from a data link. Nor does this article deal with "offload" software that exists in many network interfaces except to recognize that that offload software and the NIC in which it runs are computers facing the same transmission regulation issues as switches, user computers, and servers.

This article closes with a discussion concerning how one might emulate the effects of transmission of data over a network.

Terminology

For simplicity the rest of this article will refer to both layer two switches and layer three routers as simply "switches".

User computers and servers exist on the edges of the Internet. In this article they will be described as "edge computers" or "edge devices".

When perceived at layer two, data is usually packaged as "frames". At layer three the word is "packets". Usually when carried over a data link each packet is contained in a frame. This article will tend to ignore this often significant distinction and will use the words "frame" and "packet" interchangeably.

The term "transmit queue management" is intended as an umbrella term for all forms of regulating the outgoing flow of packets (or frames) from a device onto a data link. This includes everything from simple first-in-first-out scheduling to fair queueing algorithms (possibly involving multiple "fair" queues) to active queue management (AQM) methods such as fq_codel. The term also includes "traffic shaping" to regulate and constrain data rates of designated flows.

For purposes of this article we do not draw a strong line between switches and the computers (user machines or servers) at the edges of the network. The reason for this is that those edge computers contain many of the same elements as switches.

There Is Always At Least One Data Link

Data Links carry data over a distance, be that distance long, short, or very short. Data, bits or packets, travel over distance only on Data Links. Every computer on the Internet is attached to at least one Data Link. The same is true for every switch.

Data Links Are Complicated; Switches And Routers Are Even More Complicated

Networks are complicated. Even Rube Goldberg would be awed by the complexity of modern networks. That complexity descends even to the most basic elements of networks: the "wires" that connect things and the switches that those wires interconnect.

Understanding Data Links in Computer Networks

Data links connect things (switches and computers) over distance.

Modern data paths are getting rather complicated. For example, what looks like a simple USB-C cable is likely an intricate system containing tiny computer chips at either end that manage power and the data flows.

Data links can be composed of one or more logical (or physical) parallel channels. Data - whether bits or packets - may be spread, "multiplexed" or "load balanced", across those channels. (On modern data links the usual unit of multiplexing is the frame or packet; the spreading of bits across parallel channels has become less common.)

One common example is long distance data trunks between the same endpoints. These trunks are composed of parallel electrical circuits that appear to the switches at either end as separate links but are logically bundled together by those switches.

What we initially perceive as a single data link may have multiple channels, each of which may be a separate, independent data link. For example, we may think of a single fiber optic cable as being one data link. However it may be divided into different “colors”, each of which is its own data link.

Parallel data links are sometimes bundled together. This can sometimes cause reordering of frames or packets. (For instance an original sequence of frames A-B-C-D could arrive as something like A-C-D-B).

Data links are imperfect; they are often noisy. This noise can be internally generated from the electronics or come from external events such as lightning or a tree limb waving in the wind across a microwave link. Many types of link hardware have error detection so that bad data can be discarded at the receiving end; some types have error correction that can repair short bursts of errors.

Users often do not see this noise (except as a perception of slowed performance.) The Internet protocols have checksums and "CRCs" (Cyclic Redundancy Checks) to detect errors. Many Internet protocols have end-to-end feedback handshakes to recover lost or damaged data.

Switches

Switches (and routers) are entirely different kinds of things from data links.

Switches are interconnected with one another (and with computers, whether those be user machines or servers) by data links.

Switches are computers with memory to hold data. Switches use complex software to make forwarding decisions and to manage the data flows.

Switches may have filters that classify traffic into packet queues. Switches may have code that decides how to forward traffic via packet queues to interfaces and onward to "next hop" data links. In other words, switches may contain many packet queues.

It is quite common for switches to give priority to certain data flows or to impose rate limits on other data flows. Sometimes this is as simple as moving priority packets towards the front of the queue. Sometimes, this means holding packets back for a while. But because queues are finite (and also because data can become stale and worthless) packets may be silently discarded.

Further on, this article will delve more deeply; you will encounter Active Queue Management (AQM), Token Buckets, Leaky Buckets, Bit Clocking, RED, Fair Queueing, CODEL, and more.

Every Switch, Every Data Link Erodes Data Quality In Some Way

There is no switch and no data link that does not erode data quality in some way. The best we can hope for is that that erosion is merely the addition of a small amount of transit time.

However, the erosion can be worse. Packets can be lost. Packet content can be damaged. The sequence of packets may be altered. Packets could be duplicated. The time intervals between packets may be altered so that packets that once flowed with nice, even gaps are bunched into tight trains (bursts) with barely any time between them. Or, what were once tight bursts are spread out with added inter-packet time intervals.

The Impact of Individual Data Links And Switches Along A Path

Data packets typically cross the Internet (or even just a local network) in a series of store-and-forward hops across several switches and data links.

The aggregate impact is that data packets rarely travel across the net without enduring a cumulative gauntlet of effects. And because the path travelled by packets can be long and subject to changes in conditions and noise, it can be hard, even impossible, to fully predict end-to-end data rates, packet arrival intervals, sequencing, or loss rates.

A Deeper Look

Every device - user computers, servers, and switches - transmits through a network interface (often called a "NIC") onto a data link.

(Many NICs are computers in their own right, often with potentially deep transmission queues and complex "offload" software. The impact of this software and its transmission queues can be significant and ought not be ignored.)

Let's look at two topics:

  • First, we will look at the transmission of bits from any computer (whether it be a user computer, a server, or a switch) onto a data link.

  • Second, we will look at the management of pending transmission queues inside switches, user computers, and servers.

Is there a difference between edge devices, such as user computers or servers, and switches? From a networking perspective the principal difference is that the former tend to have few interfaces to data links and the latter have several. For edge devices, choosing the network interface (and thus the data link) for an outgoing packet is usually a simple decision. For switches that decision can be much more complex. However, whether a device is a desktop, laptop, or server on the edge of the network, or is a switch inside the network, the task of managing transmission queues is quite similar.

You may ask "what about the reception of bits from a data link into a computer?" There are indeed many optimizations that are deployed to improve reception efficiency, such as aggregation of layer two frames and use of polling rather than interrupts. However, those aspects are largely local and invisible to the overall flow of data across a network in total or even across a directly attached data link.

Transmission Of Bits And Packets Onto A Data Link

If one digs down into the source code in an operating system kernel and looks at the actual code that sends data out onto a data link, things might look quite simple. Often we might see but a single line of code that tells an interface device to pull a buffer from memory and send the buffer's contents.

Often, much of what happens at this stage is done by hardware (or firmware running in what we think of as hardware.)

Some network interface (NIC) hardware contains "offload" code that attempts to relieve the operating system of some amount of routine work, such as computing checksums, or to reduce the number of interrupts generated by splitting outgoing IP packets into data link frames. From our point of view these are actually extensions of the operating system - and may create a new, and often hidden, layer of transmission queue management. (If you get the sense that networks follow the ancient model of the world as the top layer of an infinitely deep stack of turtles you would not be wrong.)

How a user computer, server, or switch integrates these “offloads” with its own outgoing queue management is a complex issue that is often overlooked.

There Are Several Kinds of Transmission Delay

When bits are finally sent the following kinds of delay occur:

  • Channel access delay: Many forms of data link media are like old "party line" telephones: they are shared. This is true for many kinds of radio-based links such as Wi-Fi or satellite uplinks. For these kinds of media, a media-specific procedure must be followed to sense whether the line is in use, to bid for access to the channel, and to resolve collisions when those bids are contentious. This takes time and can result in a non-trivial delay. That delay can be almost zero for wired Ethernet using lightly loaded Ethernet switches. That delay can be several seconds for shared geosynchronous satellite channels.

  • Transmission serialization delay: Data bits are almost always clocked out onto the transmission media bit by bit by bit. (Sometimes larger units, such as bytes or even 32-bit or larger words are sent as a unit, but that's largely the province of internal computer busses.) Clocking bits onto the medium takes time. That time is fairly small when the medium is clocking bits at gigabits per second. But many kinds of media are slower, especially down in the Internet of Things (IoT) and low-power world.

  • Propagation delay: No medium moves bits faster than the speed of light, and most media move bits rather more slowly. This adds overall delay. In addition, some media types have error detection and correction. If those are in use they can add delay, especially if an error does occur and the correction mechanisms are triggered.

  • Reception deserialization delay: Just as when sending bits, arriving bits must be clocked off of the network and into a buffer in the memory of a switch or edge device. Usually reception deserialization delay can be considered as running in parallel to transmission serialization delay, so the cumulative delay is one or the other, not both.

  • Reception notification delay: Modern computers and operating systems do not like to be interrupted. Those interrupts can reduce the efficiency of memory caches and cause software scheduling overhead. Many modern network devices minimize the use of interrupts; the operating system compensates by polling to see whether a device has incoming data. Modern computers and operating systems have tuned this so that delay is limited; but there is some delay nevertheless.
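Of the delays above, serialization delay is the easiest to quantify: it is just frame size divided by link speed. A minimal sketch (Python, with illustrative numbers):

```python
def serialization_delay_s(frame_bits: int, link_bits_per_s: float) -> float:
    """Time to clock a frame's bits onto the medium, one by one."""
    return frame_bits / link_bits_per_s

# A full-size Ethernet frame occupies 1538 bytes on the wire
# (1500-byte payload plus 38 bytes of framing overhead).
frame_bits = 1538 * 8

fast = serialization_delay_s(frame_bits, 1e9)    # 1 Gb/s: ~12.3 microseconds
slow = serialization_delay_s(frame_bits, 250e3)  # 250 kb/s IoT radio: ~49 ms
```

The same frame that is a barely measurable blip on a gigabit link occupies a slow IoT radio for tens of milliseconds, which is why serialization delay cannot be ignored at the low-speed edge of the network.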

Don't Forget Speed Of Light Delays

Most of us have the speed of light committed to memory - 300,000 kilometers (186,000 miles) per second. That is light traveling through a vacuum.

Most data links on the internet are either copper wires (coaxial cable or twisted pairs) or fiber optical cables. Signals on fiber optics travel (propagate) at about 0.6 to 0.7 of the speed of light - 180,000 to 210,000 kilometers per second. It is generally safe to presume the same for signals on copper.

Consequently, when emulating data links one ought to add about one millisecond of latency for every 180 kilometers (112 miles) of emulated distance. Thus, emulation of a 4,100 kilometer straight link between San Francisco and New York deserves about 23 milliseconds of one-direction delay.

But remember that on the Internet, most trans-continental pathways are not straight lines. Indeed Internet paths can often be quite long, with multiple segments, simply to go what is, to humans, but a short distance.
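The rule of thumb above reduces to a one-liner. A sketch using the ~180 kilometers-per-millisecond fiber figure, plus a hypothetical path-stretch factor to account for routes that are not straight lines (the 1.4 multiplier is a guess for illustration, not a measured value):

```python
def propagation_delay_ms(path_km: float, km_per_ms: float = 180.0) -> float:
    """One-way propagation delay, assuming roughly 0.6 c in fiber."""
    return path_km / km_per_ms

straight = propagation_delay_ms(4100)         # SF-NY straight line: ~22.8 ms
realistic = propagation_delay_ms(4100 * 1.4)  # with path stretch: ~31.9 ms
```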

Don't Forget Header Bits

Many people do not fully comprehend how many headers and trailers wrap their data. For example, the Ethernet frame header and trailer add 304 bits. (This includes an 8 byte preamble, 12 bytes of MAC addresses, 2 byte type field, 4 byte CRC, and 12 byte interframe gap.) That is on top of the bits used by IPv4 or IPv6 headers, UDP or TCP headers, and any higher level protocol header.

These bits are too often forgotten.

And when forgotten the data link emulation will not be accurate.

Take a look at our short article on this: Counting Bits.

Failure to account for header bits can cause substantial errors. For example, the common "iperf" family of tools tends to measure only transport data, i.e. only the data carried by TCP or UDP, rather than the entire bit-burden of that data plus transport (TCP or UDP) headers plus IP (v4 or v6) headers plus Ethernet headers. See our note on this: Does IPERF Tell White Lies?
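The arithmetic is easy to sketch. Assuming untagged Ethernet, option-free IPv4, and option-free TCP headers:

```python
# Per-frame overhead in bytes:
ETHERNET_OVERHEAD = 38  # preamble 8 + MAC addresses 12 + type 2 + CRC 4 + gap 12
IPV4_HEADER = 20        # no options
TCP_HEADER = 20         # no options

def wire_bits(payload_bytes: int) -> int:
    """Bits a TCP payload actually occupies on an Ethernet link."""
    return (payload_bytes + TCP_HEADER + IPV4_HEADER + ETHERNET_OVERHEAD) * 8

full = wire_bits(1460)            # full-size frame: 12304 bits on the wire
efficiency = 1460 * 8 / full      # ~0.95: payload-only tools under-report ~5%

tiny = wire_bits(10)              # a 10-byte payload costs 704 bits
tiny_efficiency = 10 * 8 / tiny   # ~0.11: overhead dominates small frames
```

For full-size frames the forgotten headers cost about five percent; for small frames, such as VoIP or gaming packets, the headers can dwarf the payload.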

Transmission Queue Management

Transmission queue management attempts to accomplish various (and not mutually exclusive) goals:

  • Rate limitation (either in terms of bits/second or packets/second)

  • Queue overflow protection

  • Discarding or sequencing of frames and packets to reduce impacts on higher level protocols

Why Would Any Network Device Not Want To Send Data Immediately?

When sending a data packet (or layer two frame) the basic decision of a switch (and user computer or server) is whether to send it now or send it later. Often this basic decision may consider whether to simply and silently discard the packet (or whether to discard another, older, packet that is waiting its turn in the transmit queue.)

When the Internet was young the general rule was "send it as soon as the outgoing data link is ready." (Remember, this article is focused on the packet and frame layer; this article does not generally deal with end-to-end flow control such as TCP's windowing machinery.)

It was soon discovered that, much like highways, pushing traffic into the system as soon as possible can lead to congestion at points where multiple links (or roads) come together or where a fast path (a wide highway) is connected to a slower (narrower) one.

Early switches protected themselves by simply discarding packets when internal resources were inadequate. Usually a simple tail drop algorithm was used, which is a fancy way of saying "the newest packet gets tossed." That is often not a good choice, particularly as it often affects higher level protocol algorithms such as used in TCP to detect and avoid congestion.

Newer software in edge computers and switches uses a variety of algorithms to manage transmit queues. The word "manage" in this case often means "choosing which packet to discard". But it also means selection of which packet to send next, and when.

Devices Often Contain Multiple Queues Awaiting Transmission

Transmission queue management in a single device may involve more than one queue of outgoing packets. This is typical of systems that try to be "fair" so that the entire capacity of an outgoing network link is not consumed by a few dominant sending applications.

A useful mental model is to imagine each device having one or more queues of packets lined up to be sent out of a network interface.

That interface itself is attached to a data link of some sort and will abide by the several constraints (previously listed) under the header "There Are Several Kinds of Transmission Delay".

When there are multiple queues there is always a mechanism that chooses among those queues to pick the next packet to be handed over to the network interface. This could be a simple round robin algorithm or it could be something more complex, and could well involve a calculation of when each of the front-of-queue packets could arrive at the next hop or final destination. This is often done on the basis of packet header information or packet size in order to establish traffic priorities and per-flow rate limits.
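The simplest such mechanism, plain round robin among queues, can be sketched in a few lines. (The class and method names are illustrative; real schedulers typically add weights, or deficit counters that account for packet size, rather than treating every packet as equal.)

```python
from collections import deque

class RoundRobinScheduler:
    """Pick the next packet to hand to the NIC, visiting queues in turn."""

    def __init__(self, num_queues: int):
        self.queues = [deque() for _ in range(num_queues)]
        self.next_q = 0

    def enqueue(self, queue_index: int, packet) -> None:
        self.queues[queue_index].append(packet)

    def dequeue(self):
        """Return the next packet, skipping empty queues; None if all empty."""
        for _ in range(len(self.queues)):
            q = self.queues[self.next_q]
            self.next_q = (self.next_q + 1) % len(self.queues)
            if q:
                return q.popleft()
        return None
```

With two queues holding ["a", "b"] and ["x"], the scheduler emits a, x, b: no single queue can monopolize the interface even if it is much longer than the others.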

Packets on networks tend to flow with the least end-to-end loss and with more consistent end-to-end latency when there is no congestion.

Yet many, perhaps most, network applications are written to send data as fast as that data can be pushed into the outgoing network protocol stack. This can result in surges of traffic from an application. Networks are full of switching and routing points where multiple incoming data streams can converge. When a traffic surge hits one of these points there may not be sufficient outgoing capacity to immediately forward those streams. As a result these switching devices will build internal queues in which they will store packets until they can be sent. Switching devices have finite amounts of memory, so they must protect themselves against queue memory exhaustion.

In addition queues cause packets to be delayed. Sometimes that delay can cause a packet to become stale in some way: real-time data may become worthless, a retransmission packet may have been sent and is also in the queue, etc. Consequently, active queue management may review the data in a queue and cull these kinds of packets.

There is a network syndrome called "bufferbloat". It arises because the availability of large amounts of inexpensive buffer memory has led vendors of switches and routers to use that memory to create potentially large queues. Those large queues can build up, which causes delay. That delay can cause end-to-end protocol machinery (such as TCP congestion detection) to trigger and reduce source traffic. The overall impact is to cause effective end-to-end data rates to drop significantly, to pulse between fast and slow, or to allow bandwidth to go unused. Considerable work has been done in the last few years to redress this serious problem. Much of that work has been to develop active queue management algorithms and code.

Drilling Down: A Single Queue

Let's look at what could happen to just one of these queues of ready-to-go packets.

Rate Limiting Release From The Queue

It is not uncommon for an edge device or switch administrator to consider the characteristics of the outbound paths from that device. Often an outbound path has inadequate traffic-bearing capacity, and often that capacity limit lies several switch hops further towards the destination.

So administrators may impose rate limitations so that an edge device or switch does not make things worse at those points of downstream congestion by throwing an excessive data load onto those points.

Administrators may also choose to leave spare capacity on their network as a "just in case" safety margin.

The way that this is accomplished is to impose a rate-limiter of some kind on a queue (or across all queues).

This is usually done through a "token bucket" or a "leaky bucket" algorithm. These are general classes of algorithms and the details may (and probably will) vary.

One important aspect of these algorithms is how they deal with time in the near-past that went unused.

If the rate limit is being imposed to correspond to a clocked, bit-serial link (such as a modem, radio link, or a fiber optic line) then that past time is of no interest and should take no part in the limit calculation.

On the other hand, if the limit is being imposed to avoid overrunning a shared switching or routing resource that uses packet queues (as all such devices do) then the rate limit calculation ought to consider whether time recently past has been used. Often recently past, but unused, time has allowed queues in those switches to drain, thus allowing short over-rate bursts of traffic.
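The distinction can be made concrete with a token-bucket sketch. Tokens (here, bytes of permitted transmission) accrue over time but are capped at a burst size; that cap is precisely the bound on how much recently past unused time may be "banked" and spent on a burst. (The names and units are illustrative; real implementations vary considerably.)

```python
class TokenBucket:
    """Rate limiter that banks recently unused capacity, up to a cap."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float, now: float = 0.0):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = now

    def try_send(self, nbytes: int, now: float) -> bool:
        # Accrue tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False  # caller must hold the packet and try again later
```

A bit-clocked link roughly corresponds to the degenerate case where the burst size is a single packet, so unused time is never banked.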

Queue Overflow And Selective Elimination Of Packets From The Queue

Packet queue size is always limited. That limit may be based on the number of packets or the number of bytes.

When it comes time to put a fresh packet into a queue of ready-to-send packets, it may be discovered that doing so would cause the queue to exceed its limits.

When this happens something must be discarded.

The method of selecting which packets to discard may, and often does, have an impact on the operation of higher level protocols. For example tail-drop of packets may cause TCP to incorrectly detect path congestion and initiate its various methods of slowing down in order to avoid making that congestion worse.

There are various methods to select which packet to discard. Some of the more common methods are these:

  • Tail Drop - This is perhaps the most common method. The newest packet (the one that triggered the queue overload condition) is discarded.

  • Head Drop - The oldest packet (the one at the head of the queue, the next in-line to be sent) is discarded.

  • Random Drop - A randomly selected packet in the queue is discarded.

  • RED (Random Early Detection) - https://en.wikipedia.org/wiki/Random_early_detection - There are several variations of this method.
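The first three methods can be sketched in a few lines (the function and parameter names are illustrative):

```python
import random
from collections import deque

def enqueue_with_policy(queue: deque, packet, limit: int, policy: str,
                        rng: random.Random) -> None:
    """Add a packet to a bounded queue, discarding per `policy` on overflow."""
    if len(queue) < limit:
        queue.append(packet)
        return
    if policy == "tail":      # discard the newcomer
        return
    if policy == "head":      # discard the oldest in-line packet
        queue.popleft()
    elif policy == "random":  # discard a randomly chosen queued packet
        del queue[rng.randrange(len(queue))]
    queue.append(packet)
```

Note that head drop delivers the loss (and thus the congestion signal) on the oldest data, which some argue reaches the sender's feedback loop sooner than a tail drop would.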

As a general matter, more complex active queue management techniques are starting to replace or supplement these relatively simple methods.

Modern Active Queue Management

As was mentioned, the way that packets are handled (and perhaps discarded) in switching devices can have an impact on higher level protocols such as TCP.

Over the last few years a small group of computer scientists has investigated the impacts of queue strategies on end-to-end network throughput.

This group has come up with useful approaches, along with running code, to greatly improve the prior state of the art. And they have argued and cajoled that code into an increasing number of operating systems and devices.

Active queue management is a complex subject that goes well beyond the scope of this short article.
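To give a flavor of the approach, here is a drastically simplified sketch of the central CoDel insight: watch how long packets actually sit in the queue (their "sojourn time") rather than how long the queue is, and begin dropping only when sojourn times have stayed above a small target for a sustained interval. (The real algorithm adds a control law governing drop spacing, and fq_codel adds per-flow queues; the constants below are merely the commonly cited defaults.)

```python
class SimplifiedCoDel:
    """Sketch of CoDel's core idea: drop when queueing delay persists."""

    def __init__(self, target_s: float = 0.005, interval_s: float = 0.100):
        self.target = target_s        # acceptable standing queue delay
        self.interval = interval_s    # how long delay must persist
        self.above_since = None       # when sojourn time first exceeded target

    def should_drop(self, enqueue_time: float, now: float) -> bool:
        sojourn = now - enqueue_time
        if sojourn < self.target:
            self.above_since = None   # queue drained: reset
            return False
        if self.above_since is None:
            self.above_since = now    # start the persistence clock
            return False
        return (now - self.above_since) >= self.interval
```

The key property is that short bursts that drain quickly are tolerated; only a standing queue, the signature of bufferbloat, attracts drops.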

You may dig deeper by exploring the literature on the algorithms mentioned earlier, such as RED, CoDel, and fq_codel.

Simulating Real-World Internet Protocols with Network Emulators

Earlier we described network paths as a sequence of data links and switches with the overall behavior rather difficult to predict.

So, when doing network emulation in a lab, one of the first steps is to decide what dominates the path: data links or switches.

Network emulation occurs on laboratory or test bench networks. These are usually so fast and reliable relative to any real-life end-to-end pathway that it is safe to assume that, relative to the emulated network, delivery across the lab net is effectively instantaneous and that packet loss, duplication, and resequencing are non-existent.

Emulation tools generally are designed to emulate a single hop, i.e. one data link or one transmission queue. This is to avoid the effectively infinite complexity of concatenating the effects of the several data links and transmit queues that exist on real network paths.

Therefore, it is generally up to the operator of a network emulator to consolidate and merge the end-to-end effects so that they can be entered into the emulator.

A Caveat

Network emulation is synthetic; it is not reality. Certain folk wisdoms apply:

  • Your mileage may vary.

  • In theory, theory and practice are the same; in practice they are not.

When testing protocol software it is important to use network emulation to stress the implementation under a range of conditions even though not every possible condition will be generated by that emulation.

Side Effects

Whether one is emulating a data link or the queue management in a switch, a side effect will be delay of packets. As a general practice one ought not to combine a variable delay (jitter) kind of emulation with data link or transmit queue management emulation - the combined delay may be difficult to predict.

Emulating Data Links

Data Links are emulated using two basic methods:

  • Calculating when the last bit of the packet awaiting transmission should arrive at the destination. The packet is then held until a few microseconds before that calculated arrival time and then transmitted onto the lab network.

  • Token bucket algorithms that pace the outflow of frames and packets based on the accumulation over time of tokens (or credits). A packet or frame is held until there are sufficient tokens available.

These two methods are frequently configured along with queue management emulation.

That arrival time calculation must take into account the full size of the packet, including all header fields, and all of the factors enumerated previously such as serialization and speed-of-light delays.
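That calculation can be sketched by combining the serialization and propagation figures discussed earlier. (The 180,000 km/s figure is the fiber rule of thumb from above; channel-access and error-correction delays are ignored in this sketch.)

```python
def scheduled_arrival_s(send_time_s: float, frame_bytes: int,
                        link_bits_per_s: float, path_km: float) -> float:
    """When the last bit of a frame would arrive over the emulated link.

    A bit-clocked emulator holds the packet until just before this time,
    then releases it onto the (effectively instantaneous) lab network."""
    serialization_s = frame_bytes * 8 / link_bits_per_s
    propagation_s = path_km / 180_000.0  # ~0.6 c in fiber
    return send_time_s + serialization_s + propagation_s

# A 1538-byte on-the-wire frame over an emulated 1 Mb/s, 180 km link:
t = scheduled_arrival_s(0.0, 1538, 1e6, 180.0)  # 12.304 ms + 1 ms
```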

When configuring data link emulation you should take care to choose whether you want to take recently unused capacity into account. When emulating things like a fiber optic link or an Ethernet cable, past capacity is permanently lost. When emulating switches and routers you should select an emulation method that allows recently past unused capacity to be available.

In the IWL network emulators this distinction is typically described as "bit clocked" versus "token bucket".

IWL's network emulators give the user several additional knobs to control these and other link emulation characteristics.

Emulating Queue Management

As one might anticipate, emulating the aggregate impact of queue management in a sequence of switching devices along a network path can be complicated - a good emulator will have a large number of configuration knobs, dials, and options.

Certain aspects of queue management may be emulated using parts of an emulator that are not clearly marked as "queue management".

For example, in the IWL network emulators, the accumulation and sudden release of packets ("dam bursting") and the re-sequencing of packets is done as part of the delay component.

When emulating a full path there may be multiple ways to emulate a particular characteristic. For example, in the IWL emulators one may dial-in an overall packet loss rate (including bursts) in a specific "Drop" impairment. Or such packet loss may be dialed in via a queue management setting in a "rate limit" impairment component.

Conclusion

Sending frames (and packets) of data onto the Internet is complicated.

Undisciplined transmission (whether as an original source or as a relay) can cause congestion and increased end-to-end delays.

There are transport layer (layer 4) end-to-end methods of flow control, such as TCP's window-based flow control. But those methods typically deal with only one transport flow between two endpoints.

The Internet gains much of its power because it is essentially multiplexing many flows of packets between many endpoints onto the shared packet-switching fabric of the Internet. The Internet works best when all of the players - sources of traffic - play fair and do not try to consume all available resources.

Consequently, down at layers 2 and 3, all Internet devices, whether they be end-points or intermediary relaying (store and forward) devices, must take care when emitting packets.

Often being a good network citizen requires that a device refrain from sending packets for a period of time. Sometimes it means holding back some packets while sending others. And sometimes it means discarding packets.

The algorithms used by devices to regulate their sending of data are evolving.

Older algorithms were simple and sometimes tended to interact negatively with higher level protocols, causing reduced end-to-end throughput or increased end-to-end latency and jitter.

Newer algorithms are being developed, coded and deployed; but they are not yet ubiquitous.

Because of the shared nature of the Internet and because of these algorithms, it has become rather difficult to create accurate quantitative descriptions of the end-to-end behavior of a network path.

Emulating and testing end-to-end paths is, therefore, somewhat of an art of approximation.

One who desires to use a network emulator to mimic the behavior of a path between two devices must begin by developing a rough quantitative model of the path to be emulated.

This model might be constructed by observing and measuring an actual, real network, noting its delays, losses, duplications, re-sequencing, and such. This can be quite difficult. Measuring tools often produce aggregate numbers that mask the short and long term dynamics that are occurring underneath. For example see our notes Does IPERF Tell White Lies? and Why You Shouldn't Believe Network Speed Tests.

Or the model might be constructed using hypothetical numbers for delay, jitter (variation of delay), loss, burstiness, etc.

Burstiness can be "high frequency" or "low frequency". High frequency bursts of latency or loss tend to happen over short time spans, often much less than a second. Low frequency changes happen over long time spans, such as between daytime and nighttime, or as a relay satellite transits across the face of the sun (thus blinding the ground station).
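High-frequency loss burstiness is often modeled with a two-state ("Gilbert-Elliott"-style) process: a "good" state with little or no loss and a "bad" state with heavy loss, where dwelling in the bad state is what produces the bursts. A minimal sketch (the transition probabilities below are hypothetical, not measured):

```python
import random

class TwoStateLossModel:
    """Two-state bursty loss model: clustered drops, not uniform ones."""

    def __init__(self, p_good_to_bad: float, p_bad_to_good: float,
                 loss_in_bad: float, rng: random.Random):
        self.bad = False
        self.p_gb = p_good_to_bad
        self.p_bg = p_bad_to_good
        self.loss_in_bad = loss_in_bad
        self.rng = rng

    def drop_next_packet(self) -> bool:
        # Possibly change state, then decide this packet's fate.
        if self.bad:
            if self.rng.random() < self.p_bg:
                self.bad = False
        elif self.rng.random() < self.p_gb:
            self.bad = True
        if self.bad:
            return self.rng.random() < self.loss_in_bad
        return False
```

With, say, a 1% chance of entering the bad state and a 20% chance of leaving it, average loss is only a few percent, yet the drops arrive in clumps — behavior that a single uniform loss-rate setting cannot reproduce.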

Most network emulators compress the entire end-to-end nature of a path into a single set of numbers and impairments. Thus the emulation is usually somewhat artificial.

When emulating a network path in order to test how code in devices handles less than laboratory-perfect, metronomed, noise-free packet flows, the test regime should consider a large number of test cases that vary the emulation configuration. (It can be quite difficult to look inside a device being tested to observe and measure the impact of variations in packet flows.)

Even with the best of network transmission algorithms, high speed links, and solid code, the Internet will always do surprising things. Packets will sometimes arrive and trigger obscure behavior or bugs. Remember, the Internet seems to be a firm practitioner of Murphy's Law: anything that can go wrong will go wrong.