Recovering academic, hacker, tinkerer, scientist.

This is a pretty common way of classifying and handling character data in programming languages, especially when you know a priori how certain combinations are going to be made. To be honest I’ve seen it for years in code was always kind of confused by it and didn’t really need to think about it until recently for a project.

Let’s say we have a need to determine that we’ve seen a specific combination of letters, and then emit a second classification for that first set. (i.e. If we’ve seen both A and T , emit W).

It’s fairly easy to imagine how we might do this with a series of nested if/else if/ else switches, the problem with this is that it’s clunky to write, clunky to read, and your program will have to actually test each conditional making the set of statements in - efficient. It also makes it further complicated if you have multiple levels of combination that are meaningful.

Instead we can do something that is less intuitive on the surface but allows us to use a smaller set of statements that’s easier to read and a little more efficient computationally.

The first thing we do is we define what each of our character clases and find out what their individual ASCII codes are:

```
my @possible = qw /A T G C /;
print "Single Values:\n";
map {print $_ . "\t"} @possible; print "\n";
map {print unpack('C*',$_) . "\t"} @possible;
print "\n";
```

Which outputs:

```
Single Values:
A T G C
65 84 71 67
```

Next we look at our pairs of values:

```
print "Pairwise Values:\n";
print "x\t";
map {print $_ . "\t"} @possible; print "\n";
foreach my $j (@possible){
print "$j\t";
foreach my $i (@possible){
print unpack('C*',$i)+unpack('C*',$j) . " \t";
}
print "\n";
}
```

Which gives us:

```
Pairwise Values:
x A T G C
A 130 149 136 132
T 149 168 155 151
G 136 155 142 138
C 132 151 138 134
```

Third we look at our classes of 3 (which mathmatically is the sum of all codes minus one) :

```
print "3 way values:\n";
my $N;
map {$N+=unpack('C*',$_)} @possible;
map {print "not$_ \t"} @possible; print "\n";
map {print $N - unpack('C*',$_) . "\t"} @possible;
print "\n";
print "N is $N\n";
```

```
Which Gives us:
3 way values:
notA notT notG notC
222 203 216 220
N is 287
```

Once we have all of this calculated, determine what your overlapping classes outputs are and put it in a switch statement:

```
sub iupac{
my @incoming = shift;
my $sum = 0;
map {
if ($_){
if (defined $_){
$sum += unpack('C*',$_);
}
}
} @incoming;
given($sum){
when(65) {return 'A'}
when(84) {return 'T'}
when(71) {return 'G'}
when(67) {return 'C'}
when(136) {return 'R'}
when(151) {return 'Y'}
when(138) {return 'S'}
when(149) {return 'W'}
when(155) {return 'K'}
when(132) {return 'M'}
when(222) {return 'B'}
when(220) {return 'D'}
when(216) {return 'H'}
when(203) {return 'V'}
when(287){return 'N'}
when(0){return ''}
default {die $! . Dumper \@incoming}
}
}
```

Full Table from script:

```
Single Values:
A T G C
65 84 71 67
Pairwise Values:
x A T G C
A 130 149 136 132
T 149 168 155 151
G 136 155 142 138
C 132 151 138 134
3 way values:
notA notT notG notC
222 203 216 220
N is 287
```