Dylan Storey

Recovering academic, hacker, tinkerer, scientist.

Getting Quality Score Offsets from your FASTQ


Knowing which quality scale your vendor is using is important, and honestly these guys can't seem to decide which one they want to use, either as a group or as individuals. Here's a quick little subroutine that tests quality scores and tells you which offset to use in order to convert them to the correct Phred score. It's based on the diagram below:

[Figure: Fastq Offsets]

Any quality character below ASCII 59 indicates that the file uses an offset of 33, while any character above ASCII 74 indicates an offset of 64. Characters from 59 to 74 are valid under both encodings, so they can't settle the question on their own. The subroutine returns the offset the first time it sees a decisive character, so hopefully you shouldn't need to go through more than a few records to get your answer.
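
To make the thresholds concrete, here is a minimal Perl sketch that classifies a single quality character. `classify_char` is a hypothetical helper, not part of the script below; it returns 33 or 64 when the character settles the question and undef when it sits in the ambiguous 59 to 74 range.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the thresholds above: returns 33 or 64
# when one quality character settles the offset, or undef if it can't.
sub classify_char {
    my $ascii = ord(shift);
    return 64 if $ascii > 74;   # above 'J' (74): assumed +64
    return 33 if $ascii < 59;   # below ';' (59): no +64 scale goes this low
    return;                     # 59..74: valid under either encoding
}

for my $char ('#', 'h', 'A') {
    my $result = classify_char($char);
    printf "%s (ASCII %d): %s\n", $char, ord($char),
        defined $result ? "offset $result" : "ambiguous";
}
```

This prints one line per character: `#` (ASCII 35) points to +33, `h` (ASCII 104) points to +64, and `A` (ASCII 65) is ambiguous, which is why the script keeps reading records until it finds a decisive character.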

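use warnings;
use strict;

# Pass the FASTQ path as the first command-line argument.
my $file   = $ARGV[0] // die "Usage: $0 <file.fastq>\n";
my $offset = test_qualities($file);
print "$offset\n";

# Returns a closure that yields one FASTQ record (a hash ref) per call,
# or undef once the file is exhausted. Assumes 4-line records.
sub iterator {
    my $handle = shift // die "iterator() needs a filehandle\n";
    return sub {    ## the actual iterator
        my %return;
        # If another header line exists, read the record; otherwise return undef.
        $return{'head'}  = readline($handle) // return;
        $return{'seq'}   = readline($handle);
        $return{'head2'} = readline($handle);
        $return{'quals'} = readline($handle);
        # Strip newlines; skip fields left undef by a truncated final record.
        for my $field (values %return) {
            chomp $field if defined $field;
        }
        return \%return;
    };
}

# Reads records until a quality character falls outside the ambiguous
# 59-74 band. The lexical filehandle closes automatically on early return.
sub test_qualities {
    my $file = shift;
    open(my $TEST, '<', $file) or die "Cannot open $file: $!\n";
    my $iterator = iterator($TEST);
    while (my $record = $iterator->()) {
        next unless defined $record->{'quals'};
        for my $q (unpack("W*", $record->{'quals'})) {
            return 64 if $q > 74;
            return 33 if $q < 59;
        }
    }
    close $TEST;
    die "Was unable to determine quality offset, you'll need to do this yourself\n";
}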
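
Once you have the offset, turning a quality string into Phred scores is just a matter of subtracting it from each character's ASCII value. Here's a short sketch using a hypothetical `to_phred` helper. It assumes the +64 data is the Phred-based Illumina 1.3+ scale; older Solexa files also use offset 64 but store Solexa scores, which need an extra conversion before they become Phred scores.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): converts a quality
# string to a list of Phred scores by subtracting the detected offset.
sub to_phred {
    my ($quals, $offset) = @_;
    return map { $_ - $offset } unpack("W*", $quals);
}

# 'I' (73) under +33 and 'h' (104) under +64 both encode Q40;
# '#' (35) under +33 and 'B' (66) under +64 both encode Q2.
my @sanger   = to_phred("II#", 33);
my @illumina = to_phred("hhB", 64);
print "@sanger\n";
print "@illumina\n";
```

Both lines print `40 40 2`, which is the whole point: the same read quality is written with different characters depending on the offset, so using the wrong one shifts every score by 31.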