Saturday, July 17, 2010

Requesting blocks in BitTorrent

I'm working on perhaps my fourth iteration this week of an algorithm to decide the next block to request. This is kind of important; done poorly, it leads to inefficiencies such as requesting the same block from many peers and failing to reduce the rarity of pieces in the swarm.

I could look at the algorithms in use by other clients, and that would certainly be interesting. However, no particular algorithm is required by the protocol, and it is a fun problem. Checking whether this is covered in academia, and what research has occurred, would be interesting too. Perhaps next Tuesday. For the moment, I've scrapped a couple of designs and mutated others; it's my type of puzzle, and I don't particularly want the journey spoilt - though there is the issue that the result will not be optimal (;-)), and not as tested as existing solutions.

I work on the idea that pieces have rarity, and that given the opportunity it is better to request blocks from a rare piece than from a common one (yep, basic BitTorrent). Completing a piece takes several block requests - a block is a segment of a piece transferred between peers, and since each request names a single piece, a block doesn't cross a piece boundary. For efficiency from the requesting side, batching several block requests together is a good idea.
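
To make that concrete, the rarest-first part can be sketched in a few lines. This isn't my client's code; the names (buildAvailability, pickRarestPiece) and the bitfield-as-boolean-array representation are just for illustration:

    // Availability count per piece, built from the peers' bitfields.
    function buildAvailability(numPieces, peerBitfields) {
      var counts = new Array(numPieces);
      for (var i = 0; i < numPieces; i++) counts[i] = 0;
      peerBitfields.forEach(function (bitfield) {
        for (var i = 0; i < numPieces; i++) {
          if (bitfield[i]) counts[i]++;
        }
      });
      return counts;
    }

    // Pick the rarest piece this peer has that we still need.
    function pickRarestPiece(availability, peerBitfield, needed) {
      var best = -1;
      for (var i = 0; i < availability.length; i++) {
        if (!needed[i] || !peerBitfield[i]) continue;
        if (best === -1 || availability[i] < availability[best]) best = i;
      }
      return best; // -1 if the peer has nothing we need
    }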

At some point a peer may become available that can only help with a piece already being downloaded. That may be great; maybe the new peer has a fast connection. But it would be simpler if each piece were only being received from one peer. Here's the case I don't like: you're waiting on a particular block from peer A. Peer B could also supply it, but for efficiency you don't want to ask a second peer. Time goes on, and peer B leaves the swarm. Time goes on, and your connection to peer A breaks. Sure, it's a special case, and it perhaps doesn't alter the health of the swarm, but it does potentially alter the time it takes to acquire your content.
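
One way out of that corner, at least for the disconnect half of it, is to return a dead peer's pending requests to the open pool so any remaining peer can pick them up. Again just a sketch; pendingByPeer and openBlocks are made-up structures:

    // When a connection drops, put that peer's pending block
    // requests back into the open pool.
    function onPeerDisconnect(peerId, pendingByPeer, openBlocks) {
      var pending = pendingByPeer[peerId] || [];
      pending.forEach(function (block) {
        openBlocks.push(block); // eligible for any other peer again
      });
      delete pendingByPeer[peerId];
    }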

It's this kind of thing that bugs me, and it has given rise to a few ideas in my planning. Which piece should you target when you have a new peer? Maybe keep a list of pieces not yet associated with any pending block request to other peers, and choose to work on one of those first. If none of those is a valid option (i.e. the list is empty, or only contains pieces this peer doesn't have), fall back to a list of pieces that don't yet have requests pending for the whole piece, then finally fall back to a list of the oldest requests - or ignore all that and download whatever is available with no regard for requests to other peers.
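
A rough sketch of that fallback order, with every name in it (peer.has and the three candidate lists) hypothetical:

    // 1. pieces with no pending requests at all,
    // 2. pieces only partially covered by pending requests,
    // 3. duplicate the oldest outstanding request as a last resort.
    function choosePieceFor(peer, untouched, partial, oldestRequests) {
      var candidates = untouched.filter(function (p) { return peer.has(p); });
      if (candidates.length > 0) return candidates[0];

      candidates = partial.filter(function (p) { return peer.has(p); });
      if (candidates.length > 0) return candidates[0];

      for (var i = 0; i < oldestRequests.length; i++) {
        if (peer.has(oldestRequests[i].piece)) return oldestRequests[i].piece;
      }
      return -1; // nothing useful for this peer right now
    }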

I've started such a thing, including work on picking through the missing parts of a piece for a potential block request, but this isn't the route I'm taking. A few ideas have been kept, but I've still disposed of more code than I've kept, and there's much more to write before it's complete.

At the moment it looks like I'm working with a priority queue of potential pieces, which move about based on rarity and current pending requests. The queue has survived the last major removal of code, though everything at the level of picking the block request has been scrapped and restarted. The ability to batch-request several blocks at a time forced the last scrapping: it's fairly easy to pick a single block, but picking several is painful. It may be okay to request the same block from a couple of peers (at some point), but I was heading in the direction of requesting the same block from the same peer... and once that case is caught, forcing the picker to consider anything else is difficult, because requesting an already-pending block is valid in some cases. It sounds like a simple problem, and it is, but for clarity it meant dumping code and taking a different approach.
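
The shape I'm aiming for is something like the following - a sketch rather than the real thing; queue, missingBlocks and isPendingWith are stand-ins:

    // Walk a priority queue of pieces (rarest / least-requested first)
    // and collect up to batchSize blocks, skipping any block already
    // pending with this same peer.
    function pickBatch(peer, queue, batchSize) {
      var picked = [];
      for (var q = 0; q < queue.length && picked.length < batchSize; q++) {
        var piece = queue[q];
        if (!peer.has(piece.index)) continue;
        var blocks = piece.missingBlocks();
        for (var b = 0; b < blocks.length && picked.length < batchSize; b++) {
          // Requesting a block twice can be valid, but never twice
          // from the same peer.
          if (blocks[b].isPendingWith(peer)) continue;
          picked.push(blocks[b]);
        }
      }
      return picked;
    }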

I should probably spend only a day or two more on this; I don't want to lose pace releasing code. Days ago I felt I was back in the same place with memory issues as I was some time ago, and it almost stopped me. Thankfully, these days I work on smaller pieces of code that can be tested independently, and I could eventually reduce my problem to a single line. Having to frequently spam the garbage collector with requests to clean up sucks, but at least the problem of chewing through huge chunks of memory when receiving anything from a socket - even data never assigned to a JavaScript variable - is handled. I was very, very stressed; I was in the same position as last year, all my options looked the same, and I didn't solve it last time. I'm so relieved to have tracked it down to garbage collection.
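
For what it's worth, the gc spamming amounts to something like this. It relies on node being started with --expose-gc so that global.gc exists, and the interval value here is arbitrary:

    // Run with: node --expose-gc client.js
    // Periodically nudge V8's collector; ugly, but it keeps socket
    // buffer garbage from piling up.
    if (typeof global.gc === 'function') {
      setInterval(function () {
        global.gc();
      }, 10000); // every 10 seconds
    }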
