Dylan Storey

Recovering academic, hacker, tinkerer, scientist.

Multi-threaded streaming FASTA processing

A while ago I was playing with multi-threaded processing of files. I got it working well and got decent speedups, but I was incredibly lazy about reading the file in and had a huge memory overhead as a result. Here is an updated code block that streams a FASTA file and runs its threads one batch at a time.

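use warnings;
use strict;
use threads;

# Assumption: the input file and thread count come from the command line;
# the original snippet left $file and $threads undeclared.
my $file    = shift(@ARGV) or die "usage: $0 <fasta_file> [threads]\n";
my $threads = shift(@ARGV) || 4;

local $/ = ">"; # split the input on the FASTA header marker
open ( my $in , '<' , $file ) || die "Cannot open $file: $!";
my @executing_threads = ();
while (my $fasta_record = <$in>){  # while not EOF
    chomp $fasta_record; # strip the trailing ">"
    push (@executing_threads , threads->create(\&work , $fasta_record )) if $fasta_record ne ''; # skip the empty record before the first header
    for (my $i = 1; $i < $threads; $i++){ # fill the batch with up to $threads - 1 more records
        last if eof($in); # stop once the file is exhausted
        $fasta_record  = <$in> ; # get a record
        chomp $fasta_record; # strip the trailing ">"
        push(@executing_threads , threads->create(\&work , $fasta_record)) if $fasta_record ne ''; # start a thread
    }
    while (@executing_threads){
        my ($head) = shift(@executing_threads)->join(); # join in launch order and collect the result
        print $head . "\n" if defined $head;
    }
}
close $in;

exit;

sub work{
    my ($fasta_record) = @_;
    my ($head) = split /\n/ , $fasta_record , 2; # placeholder result: the header line
    #do your work
    return $head;
}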

The code block is pretty straightforward. We use the $/ variable again to break the input on the header lines of the file. We then simply read records from the file while they exist and immediately pass each one along to whatever process we're using it for.
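To see the $/ trick in isolation, here is a minimal sketch with the threading left out. The helper name read_fasta_records and the two-record input are made up for illustration, and it assumes no ">" appears anywhere except at the start of a header line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): read FASTA records
# from a filehandle and return [header, sequence] pairs.
sub read_fasta_records {
    my ($fh) = @_;
    local $/ = ">";                # each read now stops at the next ">"
    my @records;
    while (my $record = <$fh>) {
        chomp $record;             # chomp removes the current $/, i.e. the trailing ">"
        next if $record eq '';     # the read before the first header is empty
        my ($header, @seq_lines) = split /\n/, $record;
        push @records, [ $header, join('', @seq_lines) ];
    }
    return @records;
}

# Made-up input, read through an in-memory filehandle.
my $fasta = ">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n";
open(my $fh, '<', \$fasta) or die $!;
my @records = read_fasta_records($fh);
close $fh;

printf "%s => %s\n", @$_ for @records;
```

Only one record is held in memory per read, which is where the savings over slurping the whole file come from.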

Voila!

All of the speedups, nowhere near the memory overhead.
