Dylan Storey

Recovering academic, hacker, tinkerer, scientist.

Benchmarking Regex study() for string searches

Introduction:

I was reading up on study() in Perl, and there are no hard and fast rules on when to use it or when it works best. In fact, the only discussion I was able to find was a year-old thread on Stack Overflow. The single test demonstrated there wasn't very realistic in a genomic context, so I decided to test it myself.

A note on method: I'm not testing the speed increase from a single study() call. I could have, but I wanted to know the average speedup from using study() across an entire FASTA file.
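For comparison, the single-call test mentioned above could be sketched with Benchmark's cmpthese. This is only an illustration: the random 100 kb sequence and the count_codons helper are made up for the example and are not part of the benchmark in this post.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a random 100 kb DNA string, not a real FASTA record.
my @bases    = qw(A C G T);
my $sequence = join '', map { $bases[ int rand 4 ] } 1 .. 100_000;

# Count start and stop codon matches, optionally calling study() first.
sub count_codons {
    my ($seq, $use_study) = @_;
    study $seq if $use_study;
    my $count = 0;
    $count++ while $seq =~ /ATG/g;
    $count++ while $seq =~ /TAG|TAA|TGA/g;
    return $count;
}

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    no_study => sub { count_codons($sequence, 0) },
    study    => sub { count_codons($sequence, 1) },
});
```

Note that perlfunc documents study() as a no-op since Perl 5.16, so on a modern interpreter the two rows should come out roughly equal.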

Code:

use warnings;
use strict;
use threads;
use Benchmark;
 
 
foreach(0..100){
my $t0 = Benchmark->new;
{
local $/ = ">";
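# The input file name was lost from the original listing; here it is taken from the command line (assumption).
open ( IN , '<' , $ARGV[0] ) or die "Cannot open FASTA file: $!";
my @executing_threads;
while (my $fasta_record = <IN>){
    chomp $fasta_record;
    push (@executing_threads , threads->create(\&process_records_1 , $fasta_record )) if $fasta_record  ne ''; 
    # The original number of threads per batch was also lost; $ARGV[1] is a stand-in (assumption).
    my $batch_size = $ARGV[1] // 1;
    for (my $i = 1; $i < $batch_size; $i++){
        unless (eof(IN)){
            $fasta_record = <IN>;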
            chomp $fasta_record;
            push(@executing_threads , threads->create(\&process_records_1 , $fasta_record));
            }
        }
    while (@executing_threads){
        my $head = (pop (@executing_threads)->join());
        }
    }
}   
my $t1 = Benchmark->new;
my $td = timediff($t1,$t0);
print "the non-study code took:".timestr($td)."\n";
close IN;
}
 
 
 
foreach(0..100){
my $t2 = Benchmark->new;
{
local $/ = ">";
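# The input file name was lost from the original listing; here it is taken from the command line (assumption).
open ( IN , '<' , $ARGV[0] ) or die "Cannot open FASTA file: $!";
my @executing_threads;
while (my $fasta_record = <IN>){
    chomp $fasta_record;
    push (@executing_threads , threads->create(\&process_records_2 , $fasta_record )) if $fasta_record  ne ''; 
    # The original number of threads per batch was also lost; $ARGV[1] is a stand-in (assumption).
    my $batch_size = $ARGV[1] // 1;
    for (my $i = 1; $i < $batch_size; $i++){
        unless (eof(IN)){
            $fasta_record = <IN>;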
            chomp $fasta_record;
            push(@executing_threads , threads->create(\&process_records_2 , $fasta_record));
            }
        }
    while (@executing_threads){
        my $head = (pop (@executing_threads)->join());
        }
    }
}   
my $t3 = Benchmark->new;
my $td2 = timediff($t3,$t2);
print "the study code took:".timestr($td2)."\n";
close IN;
}
 
 
exit;
 
 
 
 
sub process_records_1{
    (my $header , my @tmp) = split (/\n/, shift);
    my $sequence = uc(join '', @tmp);
    while ($sequence=~/ATG/ig){
        }
    while ($sequence=~/TAG|TAA|TAR|TRA|TGA/ig){}    
    return;
    }
     
sub process_records_2{
    (my $header , my @tmp) = split (/\n/, shift);
    my $sequence = uc(join '', @tmp);
    study $sequence;
    while ($sequence=~/ATG/ig){
        }
    while ($sequence=~/TAG|TAA|TAR|TRA|TGA/ig){
        }   
    return;
    }
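
The code above reads one FASTA record at a time by setting the input record separator $/ to ">". Here is a minimal single-threaded sketch of just that step, assuming a made-up two-record input held in memory; read_fasta_records is a hypothetical helper, not part of the benchmark.

```perl
use strict;
use warnings;

# Hypothetical two-record FASTA input, read from an in-memory filehandle.
my $fasta = ">seq1 demo\nATGAAA\nTAGCCC\n>seq2 demo\nATGTGA\n";

sub read_fasta_records {
    my ($text) = @_;
    open(my $in, '<', \$text) or die "cannot open in-memory file: $!";
    local $/ = ">";               # read up to the next '>' instead of the next newline
    my @records;
    while (my $record = <$in>) {
        chomp $record;            # chomp strips a trailing '>' because $/ is '>'
        next if $record eq '';    # the first read is empty: the input starts with '>'
        my ($header, @lines) = split /\n/, $record;
        push @records, [ $header, uc join('', @lines) ];
    }
    close $in;
    return @records;
}

# Print each header with its sequence joined onto a single line.
printf "%s\t%s\n", @$_ for read_fasta_records($fasta);
```

The empty first record is why the benchmark code only creates a thread when $fasta_record is not the empty string.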

Results:

              no study (s)   study (s)
              22.77          20.77
              22.51          20.95
              22.27          21.07
              22.14          20.83
              22.25          21.07
              22.16          21
              22.22          20.89
              22.14          20.88
              22.26          20.75
              22.08          21.14
              22.06          21.07
              22.09          20.93
              22.23          20.85
              22.27          20.77
              22.24          20.94
              22.33          20.92
              22.73          20.98
              22.03          20.98
              22.12          20.94
              22.2           20.96
              22.2           20.91
              22.02          20.95
              22.22          21.11
              22.15          20.87
              22.05          20.9
              22.11          20.86
Average       22.225         20.9342307692
Speed Up (%)  5.8077355715
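
The average and speed-up figures can be rechecked from the individual timings. Below is a small sketch, assuming the speed-up is defined as the time saved relative to the no-study average; mean and speedup_percent are helper names made up for this example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Timings in seconds, copied from the results above.
my @no_study = (22.77, 22.51, 22.27, 22.14, 22.25, 22.16, 22.22, 22.14, 22.26,
                22.08, 22.06, 22.09, 22.23, 22.27, 22.24, 22.33, 22.73, 22.03,
                22.12, 22.2, 22.2, 22.02, 22.22, 22.15, 22.05, 22.11);
my @study    = (20.77, 20.95, 21.07, 20.83, 21.07, 21, 20.89, 20.88, 20.75,
                21.14, 21.07, 20.93, 20.85, 20.77, 20.94, 20.92, 20.98, 20.98,
                20.94, 20.96, 20.91, 20.95, 21.11, 20.87, 20.9, 20.86);

# Arithmetic mean of a list of numbers.
sub mean { return sum(@_) / @_ }

# Relative speed-up: the percentage of the baseline time that was saved.
sub speedup_percent {
    my ($baseline, $improved) = @_;
    return 100 * ($baseline - $improved) / $baseline;
}

my $mean_no_study = mean(@no_study);
my $mean_study    = mean(@study);
printf "no study: %.3f s, study: %.3f s, speed-up: %.2f%%\n",
    $mean_no_study, $mean_study, speedup_percent($mean_no_study, $mean_study);
```

Under that definition the numbers agree with the table: roughly 22.225 s against 20.934 s, a speed-up of about 5.8%.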
Verdict:

Not really a huge speedup. It may be useful for really large files, but on average it shaves only about 1.3 seconds off a run across an entire FASTA file.
