Tuesday, July 20, 2010

Requestin' Blocks in Bittorrent (Part II)

I've worked on a XPCOM component to figure out what block (piece subsection) to request next. The component is probably complete, it's passed parsing and there are no exceptions on calling methods. I've got an idea for a set of tests, so perhaps I'll write that before releasing the code.

I expected to just write code to identify the next block to request, however it kept growing as the decision became influenced by more variables. In the end, the decision of what block to request came as a result of (i) what had already been downloaded, (ii) rarity/what has already been requested, and (iii) what messages have been sent/received from the peer. The third item grew the component so that it now functions by responding to peer messages. The table below summarizes how the component has been written to respond to incoming messages.

General behavior of bittorrent perhaps?
Received MessageResponse Message
BitfieldInterested (if peer has pieces we don't)
HaveInterested (if peer has pieces we don't & we haven't sent Interested)
Choke-
UnchokeRequest (if they still have pieces we don't) else Not Interested
InterestedUnchoke
Not InterestedChoke
PieceRequest (if they still have pieces we don't) else Not Interested
RequestPiece

What isn't covered here, though I think is probably appropriate, is a timeout associated with each peer that will either send Have messages of completed pieces since last timeout, or a keepalive. I haven't written the timeout, but provided it existed and worked in such a manner, there doesn't really need to be any more intelligence in the program, with the exception of handling bandwidth.

Saturday, July 17, 2010

Requesting blocks in Bittorrent

I'm working on perhaps my 4th iteration this week of an algorithm to decide the next block to request. This is kind of important; done poorly it leads to certain inefficiencies, such as getting the same block from many peers and not reducing rarity of pieces in the swarm.

I could look at the algorithm in use by other clients, and that would certainly be interesting. However, a particular algorithm is not required by the protocol, and it is a fun problem. Checking if this is covered in academia, and what research has occured would be interesting. Perhaps next Tuesday. For the moment, I've scrapped a couple of designs and mutated others; it's my type of puzzle, and I don't particularly want to have the journey spoilt - though there is an issue that the result will not be the most optimal (;-)), and not as tested as existing solutions.

I work on the idea that pieces have rarity, and if given the opportunity it would be best to request blocks from a rare piece rather than a common piece (yep, basic bittorrent). In order to complete a piece there needs to be several block requests (a block is an arbitrary segment which can be transferred between peers, and it can go over a piece boundary into the next piece). For efficiency from the requesting point of view, batch requesting several blocks is a good idea.

At some point there can be a peer available that can only help with a piece that has already started to be downloaded. This may be great; maybe the new peer has a fast connection. But it would be simpler if a piece was only being received from one peer. One of the issues I don't like is this: you're waiting on a particular block from peer A. Peer B could also do it, but for efficiency you don't want to ask a second peer. Time goes on, and peer B leaves the swarm. Time goes on, and your connection to Peer A is broken. Sure, it's a special case, and it perhaps doesn't alter the health of the swarm but it does potentially alter the acquisition time to get your content.

It's this kind of thing that bugs me, and has given rise to a few ideas in my planning. What piece should you target when you have a new peer? Maybe have a list of pieces not yet associated with any pending block request to other peers, and choose to work on one of those pieces first. If none of those is a valid option (ie. the list is empty, or only contains pieces this peer doesn't have), fall back to a list of pieces that don't have requests pending for all of the piece yet, then finally fall back to list of the oldest requests, or something similar, or ignore and download whatever is available with no regard for other requests with other peers.

I've started such a thing, and work on picking through missing bits of a piece for a potential block request, but this isn't the route I'm taking. Maybe a few ideas have been kept, however I've still disposed of more code than kept, and there's much more that needs to be written before it's complete.

At the moment it looks like I'm working with a priority queue of potential pieces, that move about based on rarity and current pending requests. The queue has survived the last major removal of code, though things at the level of picking the block request have been scrapped and restarted. The ability to batch request several blocks at a time forced the last scrapping of code; it's fairly easy to pick a single block at a time, but several blocks is painful. It may be okay to request the same block from a couple of peers (at some point), but I was heading in the direction of making the request for the same block from the same peer... and if catching this case, forcing it to consider anything else is kind of difficult, when making a request for an already pending block is valid in some cases. It seems like a simple problem, and it is, but for clarity it meant dumping code and a different approach.

I should probably only spend a day or two more on this, I don't want to lose pace releasing code. Days ago I felt I was in the same place of dealing with memory issues as I was some time ago, and it almost stopped me. Thankfully, these days I work on smaller pieces of code that can be tested independently and I could eventually reduce my problem to a single line of code. The fact I now have to frequently spam the garbage collector with requests to clean up sucks, but at least the problem of chewing huge chunks of memory receiving anything from a socket, even if it's never assigned to a JavaScript variable is handled. I was very, very stressed; I was in the same position I was in last year and all my options looked the same, and I didn't solve it last time. So relieved to track it to garbage collection.

Thursday, July 15, 2010

Info - Bittorrent Wire Protocol

Ohhkay. I'm not going to claim any great wisdom in this, but I will try to cover this component and what it does, with regards to the bittorrent wire protocol.

Specification, where?
Bittorrent.org's BEP 0003 covers the wire protocol, but the link to the section is crap. It's after this bit. Theory.org is a bit easier to digest in my opinion, starting with an introduction section.

Wire
After opening the TCP connection the two parties send a handshake, which provides their capabilities, identities and a confirmation of the torrent they are interested in (the info hash). Theoretically, the party opening the connection will send their handshake first, the party who opened the connection waiting until receiving the info hash before starting their part of the handshake. After this, the two parties use the connection in the same manner, sending messages to one another about what their state is and requesting parts of the torrent. If there's ever an error the connection should be broken.

Messages
Messages are structured with a 4 bytes header, which gives the length of the message, and for messages other than keepalive this is followed by a byte identifying the type of message.
The component informs an observer of the messsage id and the message payload after receiving a complete message. This is simpler than interfacing with the incoming stream directly.

Onto files:
Implementation of WireReader XPCOM component
IDL for WireReader XPCOM component

Example of usage:

// Attempt to connect to the peer
// These should be real values
var peerIpAddress = "127.0.0.1";
var peerPort = 6882;
var infoHash = String.fromCharCode(0,0,0,0,0);
infoHash = infoHash + infoHash + infoHash + infoHash;
var ourId = "01234567890123456789";
var remoteId = ""; // When remote peer id unknown, use empty string

var transportService =
Components.classes["@mozilla.org/network/socket-transport-service;1"].
getService(Components.interfaces.nsISocketTransportService);
var transport = transportService.createTransport(null,0,peerIpAddress,peerPort,null);

var listener = {
onHandshake : function ( context, reader ) {
// Received a valid handshake.
},
onRemoteId : function ( context, reader, remotePeerId ) {
// Received the remote peer id (20 byte identifier of remote peer)

},
onMessage : function( context, reader, messageId, messageBody ) {
// Good.
switch(message_id) {
// Keepalive doesn't have a message id, so we've used -1
case -1: alert("Keepalive"); break;
case 0: alert("Choke"); break;
case 1: alert("UnChoke"); break;
case 2: alert("Interested"); break;
case 3: alert("Not Interested"); break;
// etc.
default: alert("Message with id " + messageId);
}
// Message body is a string, representing the
// bytes in the message.
},
onDisconnect : function( context, reader ) {
// Disconnected
}
};
// Start reading from the input stream of the transport
reader.init(transport, ourId, remoteId, infoHash, listener, null /* context */);


More information on the wire protocol (Theory.org) Bittorrent Protocol Specification (BEP0003)
The bencoding XPCOM component.
The bittorrent tracker XPCOM component.
The Bittorrent File Info XPCOM component.
The Bittorrent File Storage XPCOM component.

Wednesday, July 14, 2010

Brick Wall - Memory Leak

Okay, I've reached a wall. Every time I do sockets, I end up with a memory leak.
Here's an example JS script that can be run with xpcshell (./run_mozilla.sh ./xpcshell pathToWhereYouSavedScript when you've built firefox).

The 'main' method for this program is testSockets(). It creates a server, and then connects to that server. We then see in testSockets_tick() that every 200 milliseconds we'll write more data out. We also see in the onDataAvailable() that the data is read, but not assigned to any variable. If this program is run, it chews and chews and chews memory.

I would assume that the inputstream/binaryinputstream is not holding the data, as calling readBytes() would release the data from the inputstream (and create a string, which would be soon freed by it's reference count being 0) - is this correct?

I would assume that the outputstream/binaryoutputstream is not holding the data, as it would be cleared after being sent - is this correct?

If both of the above are correct, where am I leaking?


/**************************
* This section is from mozilla/src/testing/xpcshell/head.js
* Licensed MPL 1.1/GPL 2.0/LGPL 2.1... as much as I'd like to spend half this post with a license,
* see http://hg.mozilla.org/mozilla-central/file/17dc041b9884/testing/xpcshell/head.js
*******************/
function do_timeout(delay, expr) {
var timer = Components.classes["@mozilla.org/timer;1"]
.createInstance(Components.interfaces.nsITimer);
timer.initWithCallback(new _TimerCallback(expr, timer), delay, timer.TYPE_ONE_SHOT);
}
var _pendingCallbacks=[];
function _TimerCallback(expr, timer) {
this._expr = expr;
// Keep timer alive until it fires
_pendingCallbacks.push(timer);
}
_TimerCallback.prototype = {
_expr: "",

QueryInterface: function(iid) {
if (iid.Equals(Components.interfaces.nsITimerCallback) ||
iid.Equals(Components.interfaces.nsISupports))
return this;

throw Components.results.NS_ERROR_NO_INTERFACE;
},

notify: function(timer) {
_pendingCallbacks.splice(_pendingCallbacks.indexOf(timer), 1);
eval(this._expr);
}
};
/**************************
* Code we're testing.
*********************/

var serverInfo;
var clientInfo;
var data;
function testSockets() {
// Create the data
data = (new Array(1024 * 20)).join("a");
// Create a server
serverInfo = startSocketServer(6503,function(result) {
serverInfo = result;
print("Started");
testSockets_tick();

});
clientInfo = connectToSocketServer("127.0.0.1",6503,function() {
// okay, disconnected.
print("Disconnected");
});
}
var ticks=0;
function testSockets_tick() {
// Every tick, send more data to client
serverInfo.sout.write(data,data.length);
ticks++;
do_timeout(200, "testSockets_tick();");
}

function connectToSocketServer(host,port,callback) {
var result = { sout : false, sin : false};
var transportService =
Components.classes["@mozilla.org/network/socket-transport-service;1"].
getService(Components.interfaces.nsISocketTransportService);
var transport = transportService.createTransport(null,0,host,port,null);

var outstream = transport.openOutputStream(0,0,0);
result.sout = outstream;
var stream = transport.openInputStream(0,0,0);
var instream = Components.classes["@mozilla.org/binaryinputstream;1"].
createInstance(Components.interfaces.nsIBinaryInputStream);
instream.setInputStream(stream);
result.sin = instream;

var dataListener = {
data : "",
onStartRequest: function(request, context){
},
onStopRequest: function(request, context, status){
instream.close();
outstream.close();
callback();
},
onDataAvailable: function(request, context, inputStream, offset, count){
// Not even going to save the data we read, let it go.
instream.readBytes(count);
}
};
var pump = Components.
classes["@mozilla.org/network/input-stream-pump;1"].
createInstance(Components.interfaces.nsIInputStreamPump);
pump.init(stream, -1, -1, 0, 0, false);
pump.asyncRead(dataListener,null);
return result;
}

function startSocketServer(port,readyCallback) {
// http://web.archive.org/web/20080313034101/www.xulplanet.com/tutorials/mozsdk/serverpush.php
if (port === undefined) {
port = 7055;
}
var result = {};
var listener = {
onSocketAccepted : function(serverSocket, transport) {
result.sout = transport.openOutputStream(0,0,0);
var instream = transport.openInputStream(0,0,0);
result.sin = Components.classes["@mozilla.org/binaryinputstream;1"].
createInstance(Components.interfaces.nsIBinaryInputStream);
result.sin.setInputStream(instream);

if (!(readyCallback === undefined)) {
readyCallback(result);
}
},
onStopListening : function(serverSocket, status){
print("Server shutdown");
}
};

result.serverSocket = Components.classes["@mozilla.org/network/server-socket;1"]
.createInstance(Components.interfaces.nsIServerSocket);
result.serverSocket.init(port,false,-1);
result.serverSocket.asyncListen(listener);
return result;
}

testSockets();

/**************
* Code to force xpcshell NOT to quit, allowing async processing
* From https://developer.mozilla.org/en/XPConnect/xpcshell/HOWTO
***************/
gScriptDone = false;
var gThreadManager = Components.classes["@mozilla.org/thread-manager;1"]
.getService(Components.interfaces.nsIThreadManager);
var mainThread = gThreadManager.currentThread;

while (!gScriptDone)
mainThread.processNextEvent(true);
while (mainThread.hasPendingEvents())
mainThread.processNextEvent(true);




Update: I've separated out the code into a script that runs the server, and a script that runs the client. By watching the process id, I can see that it's the client that is having the problem. nsIBinaryInputStream perhaps. So, I kept trying a few things, changing the method I was using. I tried calling gc(); at different times - this worked, but I was getting some strange results depending on how often I was calling it. I used if(count++>20000) { count=0; gc(); } in the method that received data; this worked well, but if I changed it to if(count++==20000) it didn't work at all... bizarre. I've now moved the gc(); call into it's own function that is called every 4 seconds and it's going okay. This is fantastic to me. This could've been what was causing my issues a year ago, and to finally be rid of it... could this be true? Hmm... well, this debug session has left me tired and drained (it's 7:45AM), but I think it's left my code in a better state than I am.

Monday, July 12, 2010

Info - File Storage

If you've used a Bittorrent client, or maybe have installed many games using Steam, you'll notice that files are preallocated space before the download begins. This is different from your average HTTP (web browser) download, though many HTTP clients could do this - there's just less reason for it.
With Bittorrent the download will tend to occur by acquiring pieces of files not in sequential order. This isn't always the case, and some extensions to Bittorrent may mean it's easier to stream the file in the original order. Regardless, it's the norm for the files to be created and then parts overwritten as the file's real contents are received.

Preallocating
My notes reference DownThemAll for when the preallocation code was written - though pre-allocation in downThemAll is different, the general approach is the same: use the nsISeekableStream to get to the position where the last byte would be.

Inserting/Extracting
MDC's guide to writing binary data is excellent, though I've previously had trouble with the suggested file permissions (resulting in (NS_ERROR_FILE_ACCESS_DENIED) [nsIFileOutputStream.init] ). I've reached success through trial and error, though I should probably revisit this at sometime.
The extract method has the form extract(in nsILocalFile baseFolder, in nsIVariant pathArray, in long offset, in long datalength) returning an ACString. Path arrays in Bittorrent have been discussed previously; they represent the path to the file by splitting the path into several strings instead of relying on a OS dependent directory delimiter. The BaseFolder parameter allows an easy way of setting the location where the torrent will be downloaded without having to alter every pathArray.
I still use nsIBinaryInputStream with the readBytes function, which is why the ACString type is being used.

File Storage
So yes, all of this focuses on storing the data in files. It doesn't have to be that way, a torrent can be held completely in memory for other uses... I hope to be able to write about some fun with Bittorrent that is less concerned with just implementing the protocol. Here's Brad Goldsmith's CompTorrent for some interesting use of bittorrent. He has since submitted his PhD thesis, so there is further reading if you're interested.

Anyway, onto files:
Implementation of FileStorage XPCOM component
IDL for FileStorage XPCOM component

Examples:

Usage:

var service = Components.
classes["@wikiscraps.com/storage/file;1"].
getService(Components.interfaces.btIBittorrentFileStorage);
var baseFolder = Components.classes["@mozilla.org/file/directory_service;1"].
getService(Components.interfaces.nsIProperties).
get("TmpD", Components.interfaces.nsILocalFile);
baseFolder.appendRelativePath("test");
// We'll create two files, one in a subfolder
var list = [{path:["a.txt"],size:10},{path:["a","a.txt"],size:10}];
service.preallocate(baseFolder,list);
service.insert(baseFolder, ["a.txt"], 6, "cdef");
service.insert(baseFolder, ["a.txt"], 4, "abcd");
var val = service.extract(baseFolder, ["a.txt"], 4, 6); // returns "abcdef"


Usage with FileInfo:
var infoService = Components.
classes["@wikiscraps.com/fileinfo;1"].
getService(Components.interfaces.btIBittorrentFileInfo);
var storageService = Components.
classes["@wikiscraps.com/storage/file;1"].
getService(Components.interfaces.btIBittorrentFileStorage);
var baseFolder = Components.classes["@mozilla.org/file/directory_service;1"].
getService(Components.interfaces.nsIProperties).
get("TmpD", Components.interfaces.nsILocalFile);
baseFolder.appendRelativePath("test");
// Fake block
var fakeBlock = "Baaaaaaaaaaa";
var fakeBlockOffset = 5;
// Fake info
var fakeInfo = {
files : [
{ length : 10, path : ["a"] },
{ length : 10, path : ["b"] },
{ length : 10, path : ["c"] },
{ length : 10, path : ["d"] },
{ length : 10, path : ["e"] }
]
};
// Add piece length later, because it has a space in it's key.
fakeInfo["piece length"] = 6;

// Okay, we've faked the state, now
// let's try to save the block to files!

var fileList = infoService.listFile(fakeInfo);
var fileParts = infoService.blockToFileParts(fakeInfo, fakeBlockOffset, fakeBlock.length);
var completedIndex = 0;
for (var i=0; fileParts.length > i; i++) {
storageService.insert(
baseFolder,
fileList[fileParts[i].index],
fileParts[i].offset,
fakeBlock.substr(completedIndex,fileParts[i].size));
completedIndex += fileParts[i].size;
}


More information on the .torrent structure (Theory.org) Bittorrent Protocol Specification (BEP0003)
The bencoding XPCOM component.
The bittorrent tracker XPCOM component.
The Bittorrent File Info XPCOM component.

Saturday, July 10, 2010

File Storage for torrent

Writing received data from bittorrent is a bit of a pain. As previously described, a block may end up split into a number of files.
When I first did this, my code for handling a received block was complex. The task of splitting the block into parts relevant for separate files was mixed with the task of writing to files. A tad harder to read and debug.
Now, it's possible to cleanly work out the file parts and write to them. The file storage component is really simple, with just 3 methods: preallocate, insert & extract. I'll post the code tomorrow, for now here's a demo of how it's used with the FileInfo component:


var infoService = Components.
classes["@wikiscraps.com/fileinfo;1"].
getService(Components.interfaces.btIBittorrentFileInfo);
var storageService = Components.
classes["@wikiscraps.com/storage/file;1"].
getService(Components.interfaces.btIBittorrentFileStorage);
var baseFolder = Components.classes["@mozilla.org/file/directory_service;1"].
getService(Components.interfaces.nsIProperties).
get("TmpD", Components.interfaces.nsILocalFile);
baseFolder.appendRelativePath("test");
// Fake block
var fakeBlock = "Baaaaaaaaaaa";
var fakeBlockOffset = 5;
// Fake info
var fakeInfo = {
files : [
{ length : 10, path : ["a"] },
{ length : 10, path : ["b"] },
{ length : 10, path : ["c"] },
{ length : 10, path : ["d"] },
{ length : 10, path : ["e"] }
]
};
// Add piece length later, because it has a space in it's key.
fakeInfo["piece length"] = 6;

// Okay, we've faked the state, now
// let's try to save the block to files!

var fileList = infoService.listFile(fakeInfo);
var fileParts = infoService.blockToFileParts(fakeInfo, fakeBlockOffset, fakeBlock.length);
var completedIndex = 0;
for (var i=0; i < fileParts.length; i++) {
storageService.insert(
baseFolder,
fileList[fileParts[i].index],
fileParts[i].offset,
fakeBlock.substr(completedIndex,fileParts[i].size));
completedIndex += fileParts[i].size;
}

Fairly clear, and much easier to run tests on. A few things ahead bother me. Wrapping the nsIBinaryInputStream to handle the Bittorrent wire protocol should be easy, though I can see foresee that handling the received messages will probably end up with a suboptimal structure. Writing the code to order block requests will be a pain, and I will thankfully separate that into it's own component.

Friday, July 9, 2010

Info - File Information in Bittorrent

The information about files is stored in the info section of the .torrent file. There is a slight difference in this structure if the torrent contains one file or whether it contains more than one file. To make handling some of that information a tad easier, I have a collection of methods for common tasks and to hide the single/multiple file differences.

Subtle Differences
The structure of the info data is different depending on whether it's a torrent containing one file or many. This makes for a large number of "if" conditions when handling the info data, and it's just easier if this is hidden away. The old Firepuddle Firefox/Bittorrent extension didn't get around to handling multiple files; the info data is just the tip of the iceberg of pain that multiple file torrents can be.

Common Functions
The methods the service I've written provides are fairly simple; though partly that's to keep the handling of different info data structures in one place. listFiles(in nsIVariant decodedInfo) returns an array of file paths. A single file has its path stored in a torrent file as an array, for example ["foldername","foldername","filename"] which avoids the issue of an OS specific directory separator character. Handy. So, that means the result of listFiles is an array of arrays of strings. fileCount(in nsIVariant decodedInfo) returns the number of files. fileSize(in nsIVariant decodedInfo, in long fileIndex) returns the size in bytes of a file, given an index from the list. I should note, the ordering of the listFiles function is that provided by the info data.

Not so necessary functions
A couple of functions I don't believe I've used for handling the downloading/uploading of a torrent, but I've needed for another application of bittorrent are fileInTorrent(in nsIVariant decodedInfo, in nsIVariant filePathArray) returning a boolean result (the file path array is expected to be an array of strings, as described previously), and findFile(in nsIVariant decodedInfo, in nsIVariant filePathArray) to provide the index in the fileList of a particular file path. -1 if not found, as normal.

Tetris Block functions
During a session the content of torrent is transferred as a series of blocks, in no pre-ordained order. These blocks are some subsection of the torrent, and one block may end up split into multiple files.
To figure out where a block belongs, blockToFileParts(in nsIVariant decodedInfo, in long offset, in long blockLength) returns an array of objects. Each object specifies a fileIndex, the offset in the file, and how many bytes from the offset belong to that file. The structure of this returned data is like so: [{index:...,offset:...,size:...},...].
The torrent is split into a series of pieces that have a checksum. Sometimes not all files in a torrent are desired, so not all pieces will be needed. To find out which pieces are needed for a particular file piecesInFile(in nsIVariant decodedInfo, in long fileIndex) will return an array of numbers, each number representing a piece index necessary for the file specified by the fileIndex parameter.

Implementation of FileInfo XPCOM component
IDL for FileInfo XPCOM component

Example:

var service = Components.
classes["@wikiscraps.com/fileinfo;1"].
getService(Components.interfaces.btIBittorrentFileInfo);
var fakeInfo = {
files : [
{ length : 10, path : ["a"] },
{ length : 10, path : ["b"] },
{ length : 10, path : ["c"] },
{ length : 10, path : ["d"] },
{ length : 10, path : ["e"] }
]
};
// Add piece length later, because it has a space in it's key.
fakeInfo["piece length"] = 6;
// Check service methods
// Check the list files/file count/file size/file in torrent/find file
var fileList = service.listFiles(fakeInfo);
service.fileCount(fakeInfo); // 5
service.fileSize(fakeInfo, 0); // 10
service.fileInTorrent(fakeInfo, ["a"]); // this will return true
service.fileInTorrent(fakeInfo, ["z"]); // this will return false
service.findFile(fakeInfo, ["c"]); // this will return 2
service.findFile(fakeInfo, ["z"]); // this will return -1 (not in torrent)
service.blockToFileParts(fakeInfo, 10, 9); // returns [{index:1,offset:0,size:9}]
service.piecesInFile(fakeInfo, 1); // returns [1,2,3] (pieces are indexed from 0)

Note that none of this code deals with the relationship between pieces and hashes. At the moment I don't think that's needed in this service, though I might consider a verify piece against hash method at some point. This service will probably change by the time I've finished refactoring.

More information on the .torrent structure (Theory.org) Bittorrent Protocol Specification (BEP0003)
The bencoding XPCOM component.
The bittorrent tracker XPCOM component.

Thursday, July 8, 2010

XPCOM Generating SHA1 hash on a string

Reading some old code, thought this would be interesting. I tend to comment a fair bit, particularly when I'm frustrated or think I'll forget something. What follows are my comments from December 2008, and some updated SHA1 code.
/*****
* Generate a SHA1 hash of some data.
* Sources:
* Mainly based on example from https://developer.mozilla.org/en/NsICryptoHash
* With a conversion from string to array of numbers from http://www.cozmixng.org/retro/projects/piro/diff/backforwardthumbnail/trunk/content/backforwardthumbnail/backforwardthumbnail.js?compare_with=1438&format=diff&rev=1440
*
* Discussion:
* The basis of this function is the Mozilla documentation for nsICryptoHash,
* which does include a demonstration of generating a hash for a string -
* but it won't work (at least in this case). The Mozilla docs do some
* Unicode handling, however this generates an incorrect hash for our
* purposes. Instead, I found if I saved the data to a file, and then
* opened that file and passed the stream to nsICryptoHash, I would get the
* correct result. So, ignore the Unicode handling. Saving then opening a
* file is crazy, so a different stream was needed. StringInputStream was
* considered, and is used by FastServicesJavaScript (1) and Flock (2),
* but XUL Planet docs mention that the incoming string shouldn't contain
* null. This condition can't be guaranteed with the data we're handling,
* so another option was needed. The update method for nsICryptoHash can
* take an array of bytes. How can we convert a string to an array of
* bytes? Apparently, easily. Just create an array of the character's
* ordinal value using charCodeAt. This technique was found from
* cozmixng (3), and is a cut-and-paste of 3 lines.
* 1 - 2008/DEC http://wiki.fastmail.fm/index.php?title=FastServicesJavascript
* 2 - 2008/DEC https://lists.flock.com/pipermail/svn-commits/2007-September/012334.html
* 3 - 2008/DEC http://www.cozmixng.org/retro/projects/piro/diff/backforwardthumbnail/trunk/content/backforwardthumbnail/backforwardthumbnail.js?compare_with=1438&format=diff&rev=1440
* (3) is licensed under MPL, GPL & LGPL
* http://www.cozmixng.org/retro/projects/piro/browse/backforwardthumbnail/trunk/content/backforwardthumbnail/license.txt?rev=1429
********/

SHA1 = function (data) {
var hash = Components.classes["@mozilla.org/security/hash;1"]
.createInstance(Components.interfaces.nsICryptoHash);
hash.init(hash.SHA1);
// cozmixng split/map call
var byteArray = data.split('').map(function(c) { return c.charCodeAt(0); });
hash.update(byteArray, byteArray.length);
return hash.finish(false);
}

It's interesting re-reading this. I have found XPCOM components to be weird beasts, particularly when used in JavaScript. Looking back, this is partly from lack of understanding. I find understanding anything Mozilla will provide a headache; confirming a bug for example. It seems to me that once someone comes to understand anything in Mozilla, it is best if they blurt out that knowledge lest it be lost.

I really miss XUL Planet's documentation on sockets, that good old Pushing and Pulling. I feel lucky to have been able to catch that, and read their XPCOM reference before XUL Planet reached a decision to drop the content.

A few explanations on the SHA1 function, if this helps others. I tend to read data from sockets, and use nsIBinaryInputStream to do it. I don't use nsIScriptableInputStream, in fact I will probably never again use nsIScriptableInputStream simply because it doesn't handle null values. In the past, I've used the readBytes method (of nsIBinaryInputStream) which provides the result as a string in JavaScript. This may have been a good idea, maybe a bad one. Regardless, it is the reason for the above need to call split() on a string. The string is really just a collection of bytes, but now it's in JavaScript so everything has become a little odd, so we need to change the appearance of the string into an array of numbers (split to an array, map to numbers), and then let XPCOM handle the conversion so nsICryptoHash receives an array of bytes.

It seems so bloody obvious now, but reading the comments above replays part of the journey it took to get there.

Wednesday, July 7, 2010

Liking getService over createInstance (plus some xpcshell unit tests for bittorrent extension)

So, I've finished the tracker code as a service, instead of creating an object for each tracker we communicate with. I like it. Next, I'm writing a service to simplify interpreting the info section of a .torrent file (once this data has been bedecoded). Actually, the info section is really easy to understand, as seen in Theory.org's Bittorrent Protocol documentation. It's just that the common things I want to know are not always easy to infer directly from the info section.

If we receive a block 100 bytes long that belongs at offset 3,000 in the torrent, what part of which files does this represent? I like to have a function that given the offset and block length will convert this to an array of objects each containing the filename, offset in the file, and how many bytes to write. You know? Something like blockToFileParts(decodedInfo,offset,blockLength) returning [{file:...,offset:...,size:...},...]. Now I can start taking chunks from the received block and placing them in the correct places in files.

Sooo, that's what's next. I've been testing it with a bunch of unit tests, which have proven quite valuable. I've caught several bugs in a controlled situation, which is much easier than debugging using a network sniffer and the good old pen and paper calculator. Here's a peak at the tests, I'll see about dropping the component tomorrow. It's obviously very much a specialised utility class, it doesn't appear obviously useful to anyone but myself at this time.


function fileInfoTests(){
do_check_eq("Now testing fileInfo", "Now testing fileInfo");
// Check class loaded
do_check_true("@wikiscraps.com/fileinfo;1" in Components.classes);
// Check interface exists
do_check_true("btIBittorrentFileInfo" in Components.interfaces);
// Grab the service object.
var service = Components.
classes["@wikiscraps.com/fileinfo;1"].
getService(Components.interfaces.btIBittorrentFileInfo);
var fakeInfo = {
files : [
{ length : 10, path : ["a"] },
{ length : 10, path : ["b"] },
{ length : 10, path : ["c"] },
{ length : 10, path : ["d"] },
{ length : 10, path : ["e"] }
]
};
// Add piece length later, because it has a space in it's key.
fakeInfo["piece length"] = 6;
// Check service methods
// Check the list files/file count/file size/file in torrent/find file
var fileList = service.listFiles(fakeInfo);
do_check_eq(fileList.length,5);
do_check_eq(service.fileCount(fakeInfo),5);
do_check_eq(service.fileSize(fakeInfo, 0),10);
do_check_true(service.fileInTorrent(fakeInfo, ["a"]));
do_check_false(service.fileInTorrent(fakeInfo, ["a","a"]));
do_check_false(service.fileInTorrent(fakeInfo, ["z"]));
do_check_eq(service.findFile(fakeInfo, ["a"]),0);
do_check_eq(service.findFile(fakeInfo, ["a","a"]),-1);
do_check_eq(service.findFile(fakeInfo, ["z"]),-1);
// Check the blockToFileParts method
// For a block spanning all of one file
var fileParts = service.blockToFileParts(fakeInfo, 0, 10);
do_check_eq(fileParts.length, 1);
do_check_eq(fileParts[0].index, 0);
do_check_eq(fileParts[0].offset, 0);
do_check_eq(fileParts[0].size, 10);
// For a block spanning two files
fileParts = service.blockToFileParts(fakeInfo, 0, 11);
do_check_eq(fileParts.length, 2);
do_check_eq(fileParts[0].index, 0);
do_check_eq(fileParts[0].offset, 0);
do_check_eq(fileParts[0].size, 10);
do_check_eq(fileParts[1].index, 1);
do_check_eq(fileParts[1].offset, 0);
do_check_eq(fileParts[1].size, 1);
// For a block of one byte near the beginning of a file
fileParts = service.blockToFileParts(fakeInfo, 10, 1);
do_check_eq(fileParts.length, 1);
do_check_eq(fileParts[0].index, 1);
do_check_eq(fileParts[0].offset, 0);
do_check_eq(fileParts[0].size, 1);
// For a block of one byte near the end of a file
fileParts = service.blockToFileParts(fakeInfo, 9, 1);
do_check_eq(fileParts.length, 1);
do_check_eq(fileParts[0].index, 0);
do_check_eq(fileParts[0].offset, 9);
do_check_eq(fileParts[0].size, 1);

// PieceList check
// With a pieceLength of 6 & file sizes of 10:
// File 0 contains piece 0 & 1 (4 bytes)
// File 1 contains pieces 1 (2 bytes), 2, 3 (2 bytes)
// File 2 contains pieces 3 (4 bytes), 4
// File 3 contains pieces 5, 6 (4 bytes)
// File 4 contains pieces 6 (2 bytes), 7, 8 (2 bytes, irregular sized piece)
var pieceList = service.piecesInFile(fakeInfo, 0);
do_check_eq(pieceList.length, 2);
do_check_eq(pieceList[0], 0);
do_check_eq(pieceList[1], 1);
pieceList = service.piecesInFile(fakeInfo, 1);
do_check_eq(pieceList.length, 3);
do_check_eq(pieceList[0], 1);
do_check_eq(pieceList[1], 2);
do_check_eq(pieceList[2], 3);
/* check pieceIndex 4 (5th piece) is correctly allocated to fileIndex 2 (3rd file) */
pieceList = service.piecesInFile(fakeInfo, 2);
do_check_eq(pieceList.length, 2);
do_check_eq(pieceList[1], 4);
pieceList = service.piecesInFile(fakeInfo, 3);
do_check_eq(pieceList.length, 2);
do_check_eq(pieceList[0], 5);
// Tests missing:
// when piece is shared between 3 files (piecesize > middle file size)
// Files at end of torrent
}

Hmm. Okay, so a few of those methods aren't clear without understanding what the parameters mean. Here's the IDL:


[scriptable, uuid(463206c3-74dd-44f4-b6bf-dc8fbc198993)]
interface btIBittorrentFileInfo : nsISupports
{
nsIVariant listFiles(in nsIVariant decodedInfo);
long fileCount(in nsIVariant decodedInfo);
long fileSize(in nsIVariant decodedInfo, in long fileIndex);
boolean fileInTorrent(in nsIVariant decodedInfo, in nsIVariant filePathArray);
long findFile(in nsIVariant decodedInfo, in nsIVariant filePathArray);
nsIVariant blockToFileParts(in nsIVariant decodedInfo, in long offset, in long blockLength);
nsIVariant piecesInFile(in nsIVariant decodedInfo, in long fileIndex);
};


This again isn't completely useful, because I've used nsIVariant where arrays or objects would be. Ugh... I can see documentation would be shorter way to enlightenment.

Tuesday, July 6, 2010

Info - Bittorrent Tracker (and XPCOM component)

The tracker is a web service that when given an id of a torrent will respond with a list of IP addresses/ports of other peers interested in that torrent.

Torrent Identity
The identity of a torrent (the 'infoHash') is generated from a hash using SHA1 on the section of the .torrent file describing the list of files & checksums. This means that the rest of the .torrent file can be edited, while the torrent identity will remain unchanged. The editing may be the addition or removal of a tracker for the torrent, or some other meta data that itself is not the 'contents' of the torrent.

Finding Trackers
The .torrent file will contain at least one tracker, identified as 'announce'. Additional trackers may appear in the .torrent file as 'announce-list'.

Talking to Trackers
Trackers are given a HTTP GET request with information passed via the query string. The payload in the response is usually a bencoded dictionary listing other peers in the torrent, though this is an optimal response. Sometimes it's a string describing an error such as not accepting requests without compact=1 (the format of the peers list may be a compact string, or a list of dictionaries; the component defaults to compact=1 to avoid this issue). Darn.

Talking to Trackers, again
A subsequent call to a tracker may occur, to get more peers or to update the tracker on progress. The original tracker response may have included a minimum timeout to use before making this follow up call, and also a Tracker Id which should be sent as part of the follow up call.


Implementation of Tracker XPCOM component
IDL for Tracker XPCOM component

Example:
var trackerCaller = Components.
classes["@wikiscraps.com/tracker;1"].
getService(Components.interfaces.btIBittorrentTracker);
// Get a list of peers
trackerCaller.start(
// infohash of http://www.mininova.org/det/3194273
"\x72\xc2\x4e\x55\xc3\x2c\xde\xf7\x23\x78\x12\xf1\x69\x4d\xad\xd1\xe7\x13\x19\x97",
// announceURI
"http://tracker.mininova.org/announce",
"", // tracker Id
"-YY0000-000000000000", // peer id (should've generated & recorded this)
6881, // local bittorrent port
-1, // maximumPeers (-1 means use default)
0, // uploaded bytes
0, // downloaded bytes
0, // remaining bytes...
{
onResponse : function(peers, interval, trackerId) {
// got a list of peers
alert("Look! Peers!\n" +peers);
// the interval before another request to the tracker can be made
alert
("Timeout (in seconds) before next tracker request:\n" +interval);
// the id to use for subsequent calls to the tracker
alert("Tracker's Id:\n" + trackerId);
},
onError : function(serverResponse) {
// darn.
alert("Unknown error, tracker component response was:\n" + serverResponse);
}
});

More information on the .torrent structure (Theory.org) Bittorrent Protocol Specification (BEP0003)
More information on the tracker response (Theory.org) Bittorrent Protocol Spec (BEP0003)
The bencoding XPCOM component.

Monday, July 5, 2010

Leak due to XPCOM service held in global variable

I've been running some xpcshell tests. I'm testing the bittorrent tracker code, which is asynchronous so that's kinda fun. It means that I call do_test_pending(); before starting the tracker request. Oh, let's show a test:

function run_test() {
// Just check we're running tests okay
do_check_true(true);
// Check the interface has been loaded correctly
do_check_true("btIBittorrentTracker" in Components.interfaces);
// Check the component has been registered correctly
do_check_true("@wikiscraps.com/tracker;1" in Components.classes);
// Grab the service
var trackerService = Components.
classes["@wikiscraps.com/tracker;1"].
getService(Components.interfaces.btIBittorrentTracker);
do_test_pending();
// Make a request to the tracker
trackerService.start(
// infohash of http://www.mininova.org/det/3194273
"\x72\xc2\x4e\x55\xc3\x2c\xde\xf7\x23\x78\x12\xf1\x69\x4d\xad\xd1\xe7\x13\x19\x97",
// announceURI
"http://tracker.mininova.org/announce",
"", // tracker Id
// peer id (should've generated & recorded this)
"-YY0000-000003030303",
6881, // local bittorrent port
-1, // maximumPeers (-1 means use default)
0, // uploaded bytes
0, // downloaded bytes
0, // remaining bytes...
{
onResponse : function(peers, interval, trackerId) {
// got a list of peers
do_check_true(peers instanceof Array);
// finished tests
do_test_finished();
},
onError : function(serverResponse) {
// darn. Fail the test.
do_throw("Unknown error, server response was:\n" + serverResponse);
do_test_finished();
}
});
}

So you pass the tracker service the information needed to make a call to the tracker, and some kind of observer/callback to use when there's a result known. Okay? Got it? Cool.
Anyways, I started seeing a BloatView report after running this test. It didn't appear originally, so I assume it only appears when some kind of leak occurs.

== BloatView: ALL (cumulative) LEAK STATISTICS

|<----------------Class--------------->|<-----Bytes------>|<----------------Objects---------------->|<--------------References-------------->|
Per-Inst Leaked Total Rem Mean StdDev Total Rem Mean StdDev
0 TOTAL 27 520 24783 11 ( 183.34 +/- 238.02) 80546 13 ( 263.14 +/- 337.35)
2 BackstagePass 24 24 1 1 ( 1.00 +/- 0.00) 6366 4 ( 148.79 +/- 78.75)
19 XPCNativeScriptableShared 112 112 1053 1 ( 12.21 +/- 1.38) 0 0 ( 0.00 +/- 0.00)
22 XPCWrappedNative 56 168 1322 3 ( 661.75 +/- 381.27) 8044 3 ( 682.46 +/- 383.57)
23 XPCWrappedNativeProto 32 64 519 2 ( 260.00 +/- 149.61) 0 0 ( 0.00 +/- 0.00)
100 nsJSID 36 36 116 1 ( 58.25 +/- 33.42) 499 1 ( 117.51 +/- 67.79)
135 nsStringBuffer 8 8 2600 1 ( 435.41 +/- 266.14) 4832 1 ( 413.00 +/- 254.37)
144 nsSystemPrincipal 36 36 1 1 ( 1.00 +/- 0.00) 347 1 ( 4.72 +/- 1.09)
147 nsThread 72 72 3 1 ( 1.80 +/- 0.84) 4598 3 ( 571.22 +/- 379.54)

nsTraceRefcntImpl::DumpStatistics: 180 entries

Ouch. So, going back over what I'd done recently, I found this:

if (Tracker.prototype._bencoding == false) {
try {
Tracker.prototype._bencoding = Components.
classes["@wikiscraps.com/encoding/bencoding;1"].
getService(Components.interfaces.Bencoding);
} catch(e) {
callback.onError("Bencoding component not loaded");
return;
}
}

It seems loading a XPCOM service, and then leaving it in a 'global' variable and never explicitly releasing it caused a leak. Whoops. The BloatView goes away if I switch to using local variables and acquire the service each time it's needed, so it seems I can call that fixed?

More information on testing a JavaScript Add-on with XPCShell (Interfaces and components!).
More information on writing XPCShell unit tests.

Friday, July 2, 2010

Bittorrent Tracker ... again

Some time ago I was talking about implementing an object to communicate with bittorrent trackers (the server or servers which provide an initial list of peers).
This week I've been revisiting that code, finishing it yesterday. Aaaand today I'll scrap the code.
I had gotten a platform independent Firefox bittorrent extension working long ago, with a major memory leak which led to wasted time and disinterest. I started breaking the extension into individual components that could be re-used and tested independently, the first release being the encoding/decoding component (with XPIDL and all that). I liked that release, the code is really a translation to JavaScript of the 'official' bittorrent clients bencoding, attributed to Petru Paler and remaining licensed with the Bittorrent Open Source License.
Finishing the tracker code, I looked back and saw I had been attempting to create classes that could be extended, and in turn objects that held state. The idea being that this could be a base tracker class, extended to provide additional abilities. I've looked back at the code with some disgust. I really don't think I should've approached the design that way. Sure, part of the reason for the approach was to make it easier for someone to extend the code, but predicting where/how it needs to be flexible was a waste (in this case).
I'm rewriting the code to be like the bencoding component. The object will hold no state, everything relevant is passed as part of method calls. In this case, there's a networking issue that requires asynchronous activity so I'll provide a callback parameter. If the code is useful for someone in the future, great, take the code and someone else can design for flexibility, I'll just try to focus on cohesion.