week 11: networking and communications

this week's assignment was particularly fitting, given that i was out of town all week networking (with company executives) and communicating (my career goals to various interviewers). needless to say, i didn't get a chance to work on this assignment at all. HOWEVER, i've been doing some networking research of my own, which i'm happy to share here.

i have a UROP (undergraduate research opportunities position) at MIT's Computer Science and Artificial Intelligence Lab (CSAIL) working with a graduate student on exploring various internet protocols and their benefits and drawbacks. the most commonly-used protocol today is something called sfqCoDel, which stands for stochastic fairness queuing (sfq) with controlled delay (CoDel, pronounced "coddle"). it's a fairly simple algorithm: the packet scheduler works in a round-robin fashion, splitting packets up into a number of different queues and pulling a packet from each queue in order. this is the sfq part of the algorithm--since each packet in a flow tends to go to a different queue, and each queue gets the same amount of attention from the scheduler, each flow gets (stochastically) the same amount of network time. hence: stochastic fairness. CoDel comes in to limit queue size and cut down on something called bufferbloat--it strategically drops packets from queues to make sure that packets aren't sitting in queues for a ridiculously long time. the theory is that, for most applications, a few dropped packets impacts performance much less than high queue delay. this is certainly true for some applications, but not for all--think about something like a videoconferencing service, for example. dropped packets mean patchy audio or video; queue delay means delays between when the sender says something and when the receiver hears it, or lag between the audio and video streams. dropped packets and long delays are what cause most of skype's performance issues. it's a balancing act, then, between what can and cannot be sacrified for good performance.

the research

my grad student's idea is this: choosing one scheduling algorithm over all others has more of a negative impact on network performance than we think. it affects performance of many different services, a good proportion of which are commonly used in today's mainstream network interactions (video and music streaming being two primary examples). he and i are exploring the performance of a few different scheduling algorithms on different types of networks--networks with mostly short flows, networks with a gaussian distribution of flow sizes, and empirically-distributed workloads (mostly short flows with a few VERY large flows, or a heavy-tailed distribution). right now we're mostly in the data-collection phase, although in my basic analysis i've uncovered a few interesting trends. sfq performs quite differently with and without CoDel, for example:

las (least-attained service), an algorithm which gives preference to shorter flows, may have its advantages over sfq in an empirical workload, and maybe even in a more evenly-distributed workload if shorter flows are much more important than longer ones:

las and sfq on an empirical workload, performance on both small and large flows

las and sfq on an empirical workload, average performance

las and sfqCoDel

rather than using traditional metrics (throughput and delay) to analyze protocol performance, we're using something called flow completion time (FCT). again, this is an important metric for audio and video streaming applications, since it directly affects the smoothness of the streaming. FCT is measured as the time between the when the first packet was sent to the time the last packet in the flow was received (successfully).
it's pretty interesting stuff--though all the testing has been done in simulation so far, it's given some promising results. i look forward to when we get to test on real networks rather than simulated ones. should continue to be cool!