## Loading data

This section covers a few preparatory steps: (I) setting up the environment and loading the required packages, (II) loading the sequencing data, (III) removing outlier samples and randomly chosen samples to equalize the number of replicates per group, and (IV) normalizing the data by rarefaction.
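
To make step (IV) concrete, here is a minimal sketch of rarefaction on a single sample. It is not the `seqDataClass` implementation, which is not shown in this notebook; the helper `rarefy_counts` and the toy counts are hypothetical. It assumes that rarefying means drawing a fixed number of reads without replacement and dropping samples that hold fewer reads than the target depth.

```python
import numpy as np

def rarefy_counts(counts, depth, seed=None):
    """Subsample a vector of OTU counts to `depth` reads without replacement.

    Returns None when the sample holds fewer than `depth` reads.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < depth:
        return None
    rng = np.random.default_rng(seed)
    # Equivalent to drawing `depth` balls from an urn with counts[i] balls of colour i
    return rng.multivariate_hypergeometric(counts, depth)

sample = [500, 300, 150, 50]            # toy OTU counts, 1000 reads in total
rarefied = rarefy_counts(sample, depth=100, seed=124)
print(rarefied, rarefied.sum())
```

Every retained sample ends up with the same total, which is what makes the per-sample counts comparable afterwards.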

In [56]:
import sys
import copy

sys.path.append("/home/adam/miseq/seqDataClass")

import pandas as pd
import numpy as np
import random

#import seqDataClass.seqDataClass as seqDataClass
import seqDataClass as seqDataClass

In [2]:
data = seqDataClass.seqObject(mappingFile="/home/adam/miseq/ReverseTrans/mapping.csv", 
                              taxonomyFile= "/home/adam/miseq/ReverseTrans/taxonomy.csv", 
                              otuFile="/home/adam/miseq/ReverseTrans/otutab.txt", 
                              taxonomySep=',',
                              sampleNamesColumn="Name")

#print(data.data.sum(axis=0))

# Removing outlier samples
bad_samples = ["Osnat056", "Osnat045", "Osnat055"]

for sample in bad_samples:
    print("Removing bad sample : {}".format(sample))
    data.remove_sample(category="Name", sample=sample)
    
# Randomly removing samples from the data set
TGIRT = ["Osnat037", "Osnat038", "Osnat039", "Osnat040", "Osnat041", "Osnat042", "Osnat043"]

tgirtChoice = random.sample(TGIRT, 1)

for sample in tgirtChoice:
    print("Removing randomly: {} from TGIRT".format(sample))
    data.remove_sample(category="Name", sample=sample)

promega55 = ["Osnat059", "Osnat060", "Osnat061", "Osnat062"]
promega55Choice = random.sample(promega55, 2)   # Randomly select two elements from the list

for sample in promega55Choice:
    print("Removing randomly: {} from ImPromII 55°C".format(sample))
    data.remove_sample(category="Name", sample=sample)

data.rarefy_to_even_depth(seqDepth=10000,seed=124)

#print(data.data.sum(axis=0))

data.add_otu_parameter(yamlParamDictFile="/home/adam/miseq/ReverseTrans/GC_content_plot/otus_GC.yaml", paramName="GC_content")

Removing bad sample : Osnat056
Removing bad sample : Osnat045
Removing bad sample : Osnat055
Removing randomly: Osnat039 from TGIRT
Removing randomly: Osnat060 from ImPromII 55°C
Removing randomly: Osnat061 from ImPromII 55°C
Sample Osnat038 was removed from the dataset because it contains insufficient amount of sequences (34).
Sample Osnat048 was removed from the dataset because it contains insufficient amount of sequences (121).


In [57]:
data.save_otu_csv(fileName="/home/adam/miseq/ReverseTrans/normalized_tab.csv")

## Saving data for downstream analysis

The data are normalized by column and the results are presented in a table. After that, only the upper quartile of the classes is further normalized by row (class), and the result is used for the GC relative enrichment plot.

In [3]:
data.data

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,Unnamed: 5_level_0,Unnamed: 6_level_0,Enzyme,TGIRT,TGIRT,TGIRT,TGIRT,TGIRT,SuperScriptIV,SuperScriptIV,SuperScriptIV,SuperScriptIV,SuperScriptIV,Promega42,Promega42,Promega42,Promega42,Promega42,Promega55,Promega55,Promega55,Promega55,Promega55
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Replicate,1,4,5,6,7,1,3,4,6,7,1,2,3,4,7,1,2,5,6,7
Unnamed: 0_level_2,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Name,Osnat037,Osnat040,Osnat041,Osnat042,Osnat043,Osnat044,Osnat046,Osnat047,Osnat049,Osnat050,Osnat051,Osnat052,Osnat053,Osnat054,Osnat057,Osnat058,Osnat059,Osnat062,Osnat063,Osnat064
Domain,Phylum,Class,Order,Family,Genus,OTU,GC_content,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3,Unnamed: 23_level_3,Unnamed: 24_level_3,Unnamed: 25_level_3,Unnamed: 26_level_3,Unnamed: 27_level_3
Bacteria,Cyanobacteria,Oxyphotobacteria,Nostocales,Phormidiaceae,Tychonema,Otu0001,0.547847,30,258,17,121,303,48,544,418,128,325,9,1254,294,318,268,17,498,179,37,24
Bacteria,Actinobacteria,Actinobacteria,Micrococcales,Micrococcaceae,Unclassified,Otu0002,0.572127,463,298,301,437,267,464,377,419,485,321,336,180,260,169,232,702,428,406,433,199
Bacteria,Proteobacteria,Alphaproteobacteria,Azospirillales,Azospirillaceae,Skermanella,Otu0003,0.579208,162,437,653,244,296,175,325,356,242,247,96,271,399,327,218,135,159,177,130,47
Bacteria,Proteobacteria,Gammaproteobacteria,Betaproteobacteriales,Burkholderiaceae,Pelomonas,Otu0004,0.550117,54,3,0,20,27,54,4,2,13,21,268,5,274,851,149,49,37,110,313,352
Bacteria,Proteobacteria,Alphaproteobacteria,Rhizobiales,uncultured,Unclassified,Otu0005,0.559406,189,302,415,305,402,151,406,306,348,312,137,201,424,249,313,96,61,90,89,107
Bacteria,Proteobacteria,Alphaproteobacteria,Sphingomonadales,Sphingomonadaceae,Sphingomonas,Otu0006,0.522277,251,102,125,77,106,207,116,95,69,73,438,107,375,809,433,257,196,299,268,587
Bacteria,Proteobacteria,Alphaproteobacteria,Rhizobiales,Beijerinckiaceae,Unclassified,Otu0007,0.564356,47,73,100,65,89,66,164,97,77,75,22,103,66,80,58,54,37,31,31,16
Bacteria,Proteobacteria,Alphaproteobacteria,Rhizobiales,Beijerinckiaceae,Microvirga,Otu0008,0.559406,104,103,86,122,111,84,167,89,149,123,91,108,90,59,75,67,68,47,39,37
Bacteria,Proteobacteria,Alphaproteobacteria,Elsterales,uncultured,Unclassified,Otu0009,0.574257,23,41,85,112,85,23,57,43,107,112,22,22,118,70,157,49,12,63,55,18
Bacteria,Proteobacteria,Deltaproteobacteria,Myxococcales,Unclassified,Unclassified,Otu0010,0.567757,112,155,200,81,88,55,145,139,81,94,40,346,363,157,90,90,128,55,26,31


In [4]:
# A copy of the original data frame
df1 = data.data.copy()

In [5]:
# Summing the counts per class; these sums are used below to normalize
df2 = df1.groupby("Class").sum()

# Standard deviation per enzyme group
df2_std = df2.std(level="Enzyme", axis=1)

# Making a sum on the class and enzyme level
#df2 = df1.sum(level="Class", axis=0)
df2_ave = df2.sum(level="Enzyme", axis=1)

In [6]:
df2_ave

Enzyme,TGIRT,SuperScriptIV,Promega42,Promega55
Class,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0319-7L14,392,332,527,442
ABY1,0,0,0,0
AKAU4049,4,3,0,1
Acidimicrobiia,1858,1518,1203,1617
Acidobacteriia,359,444,257,259
Actinobacteria,11353,10105,5864,11263
Alphaproteobacteria,12385,12148,12943,9302
Anaerolineae,89,113,191,175
Armatimonadia,2,1,6,19
BD2-11,42,48,26,33


In [7]:
df2_std

Enzyme,TGIRT,SuperScriptIV,Promega42,Promega55
Class,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0319-7L14,56.389715,50.257338,85.207394,81.041964
ABY1,0.000000,0.000000,0.000000,0.000000
AKAU4049,1.095445,0.894427,0.000000,0.447214
Acidimicrobiia,55.410288,46.301188,109.917242,47.300106
Acidobacteriia,28.455228,18.088670,29.364945,6.942622
Actinobacteria,274.296919,223.156895,445.351771,581.639321
Alphaproteobacteria,382.188435,303.496787,481.747652,232.488279
Anaerolineae,5.805170,12.700394,7.563068,8.093207
Armatimonadia,0.547723,0.447214,0.447214,2.863564
BD2-11,6.730527,8.820431,5.761944,7.987490


The data frame is reduced to class-level information, OTU names, GC content and the four enzymatic conditions.

In [8]:
# Copy of the original data frame
df2 = df1.copy()
# Dividing the per-enzyme values by the class/enzyme totals (df2_ave) to normalize them
df2 = df2.div(df2_ave)

# Sorting out the rest
df2 = df2.reset_index()
df3 = df2.drop(["Domain","Phylum", "Order", "Family", "Genus"], axis=1)
df3


Enzyme,Class,OTU,GC_content,TGIRT,TGIRT,TGIRT,TGIRT,TGIRT,SuperScriptIV,SuperScriptIV,...,Promega42,Promega42,Promega42,Promega42,Promega42,Promega55,Promega55,Promega55,Promega55,Promega55
Replicate,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,1,4,5,6,7,1,3,...,1,2,3,4,7,1,2,5,6,7
Name,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Osnat037,Osnat040,Osnat041,Osnat042,Osnat043,Osnat044,Osnat046,...,Osnat051,Osnat052,Osnat053,Osnat054,Osnat057,Osnat058,Osnat059,Osnat062,Osnat063,Osnat064
0,Oxyphotobacteria,Otu0001,0.547847,0.034208,0.294185,0.019384,0.137970,0.345496,0.028054,0.317943,...,0.003217,0.448177,0.105075,0.113653,0.095783,0.017672,0.517672,0.186071,0.038462,0.024948
1,Actinobacteria,Otu0002,0.572127,0.040782,0.026249,0.026513,0.038492,0.023518,0.045918,0.037308,...,0.057299,0.030696,0.044338,0.028820,0.039563,0.062328,0.038001,0.036047,0.038444,0.017668
2,Alphaproteobacteria,Otu0003,0.579208,0.013080,0.035285,0.052725,0.019701,0.023900,0.014406,0.026753,...,0.007417,0.020938,0.030827,0.025265,0.016843,0.014513,0.017093,0.019028,0.013975,0.005053
3,Gammaproteobacteria,Otu0004,0.550117,0.020963,0.001165,0.000000,0.007764,0.010481,0.019334,0.001432,...,0.044914,0.000838,0.045919,0.142618,0.024971,0.006439,0.004862,0.014455,0.041130,0.046255
4,Alphaproteobacteria,Otu0005,0.559406,0.015260,0.024384,0.033508,0.024627,0.032459,0.012430,0.033421,...,0.010585,0.015530,0.032759,0.019238,0.024183,0.010320,0.006558,0.009675,0.009568,0.011503
5,Alphaproteobacteria,Otu0006,0.522277,0.020266,0.008236,0.010093,0.006217,0.008559,0.017040,0.009549,...,0.033841,0.008267,0.028973,0.062505,0.033454,0.027628,0.021071,0.032144,0.028811,0.063105
6,Alphaproteobacteria,Otu0007,0.564356,0.003795,0.005894,0.008074,0.005248,0.007186,0.005433,0.013500,...,0.001700,0.007958,0.005099,0.006181,0.004481,0.005805,0.003978,0.003333,0.003333,0.001720
7,Alphaproteobacteria,Otu0008,0.559406,0.008397,0.008317,0.006944,0.009851,0.008962,0.006915,0.013747,...,0.007031,0.008344,0.006954,0.004558,0.005795,0.007203,0.007310,0.005053,0.004193,0.003978
8,Alphaproteobacteria,Otu0009,0.574257,0.001857,0.003310,0.006863,0.009043,0.006863,0.001893,0.004692,...,0.001700,0.001700,0.009117,0.005408,0.012130,0.005268,0.001290,0.006773,0.005913,0.001935
9,Deltaproteobacteria,Otu0010,0.567757,0.037321,0.051649,0.066644,0.026991,0.029324,0.017510,0.046164,...,0.009238,0.079908,0.083834,0.036259,0.020785,0.041783,0.059424,0.025534,0.012071,0.014392


The resulting data frame is "melted" such that the enzymatic conditions become a new column and the GC content and value are the only two numerical value columns.
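
As a small illustration of this reshaping, the sketch below melts a toy wide table with one column per enzyme. The values and OTU names are made up, and the columns here have a single level, whereas the notebook's columns carry several levels (Enzyme, Replicate, Name), so the real result presumably has one extra column per level.

```python
import pandas as pd

# Toy wide table: one row per OTU, one column per enzymatic condition
wide = pd.DataFrame({
    "Class": ["Actinobacteria", "Alphaproteobacteria"],
    "OTU": ["OtuA", "OtuB"],
    "GC_content": [0.57, 0.58],
    "TGIRT": [0.04, 0.01],
    "Promega42": [0.06, 0.02],
})

# Identifier columns are repeated; every other column becomes an (Enzyme, value) pair
long = wide.melt(
    id_vars=["Class", "OTU", "GC_content"],
    var_name="Enzyme",
    value_name="value",
)
print(long)
```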

In [9]:
df4 = df3.melt(id_vars=["Class", "OTU", "GC_content"])

Summing the normalized values within each Class and Enzyme category should give 1, or 0 if the class is not present in that category.
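
The sketch below shows on toy data why the check yields exactly these two values. The frame and its column names are hypothetical. A class with no reads gives 0/0, which pandas stores as NaN, and `sum()` skips NaN by default, so the group total is 0 rather than 1.

```python
import pandas as pd

counts = pd.DataFrame({
    "Class": ["A", "A", "B", "B"],
    "OTU": ["Otu1", "Otu2", "Otu3", "Otu4"],
    "Enzyme": ["TGIRT"] * 4,
    "count": [30, 10, 0, 0],        # class B has no reads at all
})

# Total per (Class, Enzyme) group, aligned back to the rows
group_total = counts.groupby(["Class", "Enzyme"])["count"].transform("sum")

# Relative abundance within the group; 0/0 becomes NaN for class B
counts["value"] = counts["count"] / group_total

# NaN is skipped when summing, so an absent class sums to 0
check = counts.groupby(["Class", "Enzyme"])["value"].sum()
print(check)
```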

In [10]:
df4.groupby(["Class", "Enzyme"]).sum()

Unnamed: 0_level_0,Unnamed: 1_level_0,GC_content,value
Class,Enzyme,Unnamed: 2_level_1,Unnamed: 3_level_1
0319-7L14,Promega42,164.067946,1.0
0319-7L14,Promega55,164.067946,1.0
0319-7L14,SuperScriptIV,164.067946,1.0
0319-7L14,TGIRT,164.067946,1.0
ABY1,Promega42,2.616708,0.0
ABY1,Promega55,2.616708,0.0
ABY1,SuperScriptIV,2.616708,0.0
ABY1,TGIRT,2.616708,0.0
AKAU4049,Promega42,5.941860,0.0
AKAU4049,Promega55,5.941860,1.0


Multiplying the GC content of each OTU by the OTU's relative abundance within a given Class and Enzyme group gives the OTU's proportional contribution to the overall GC content of the Class. Summing these contributions over a Class yields its weighted average GC content. The Class-normalized count is then discarded.
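
A worked toy example of this weighted average, with made-up numbers and a hypothetical `weighted_gc` column: three OTUs in one class with GC contents 0.50, 0.60 and 0.70 and relative abundances 0.5, 0.25 and 0.25 give 0.25 + 0.15 + 0.175 = 0.575.

```python
import pandas as pd

long = pd.DataFrame({
    "Class": ["A", "A", "A"],
    "Enzyme": ["TGIRT"] * 3,
    "GC_content": [0.50, 0.60, 0.70],
    "value": [0.5, 0.25, 0.25],     # relative abundances within the class, summing to 1
})

# Contribution of each OTU to the class GC content
long["weighted_gc"] = long["GC_content"] * long["value"]

# Weighted average GC content per class and enzyme
class_gc = long.groupby(["Class", "Enzyme"])["weighted_gc"].sum()
print(class_gc)
```

Because the weights sum to 1 within each group, no further division is needed. For a class that is absent from a condition all weights are missing and the result is 0, as seen for ABY1.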

In [11]:
df4.GC_content = df4.GC_content*df4.value
df5 = df4.groupby(["Class","Enzyme"]).sum()
df5 = df5.drop("value", axis=1)
df5 

Unnamed: 0_level_0,Unnamed: 1_level_0,GC_content
Class,Enzyme,Unnamed: 2_level_1
0319-7L14,Promega42,0.580750
0319-7L14,Promega55,0.582987
0319-7L14,SuperScriptIV,0.581233
0319-7L14,TGIRT,0.582942
ABY1,Promega42,0.000000
ABY1,Promega55,0.000000
ABY1,SuperScriptIV,0.000000
ABY1,TGIRT,0.000000
AKAU4049,Promega42,0.000000
AKAU4049,Promega55,0.593023


In [43]:
df2_ave2 = df2_ave.reset_index()
df2_ave2 = df2_ave2.melt(id_vars="Class")
df2_ave2 = df2_ave2.set_index(["Class", "Enzyme"])
df2_ave2 = df2_ave2.sort_values(["Class", "Enzyme"])
# Renaming the values column to have neater output
df2_ave2.rename(columns={"value":"Average"},level=0, inplace=True)
df2_ave2

Unnamed: 0_level_0,Unnamed: 1_level_0,Average
Class,Enzyme,Unnamed: 2_level_1
0319-7L14,Promega42,527
0319-7L14,Promega55,442
0319-7L14,SuperScriptIV,332
0319-7L14,TGIRT,392
ABY1,Promega42,0
ABY1,Promega55,0
ABY1,SuperScriptIV,0
ABY1,TGIRT,0
AKAU4049,Promega42,0
AKAU4049,Promega55,1


In [42]:
df2_std2 = df2_std.reset_index()
df2_std2 = df2_std2.melt(id_vars="Class")
df2_std2 = df2_std2.set_index(["Class", "Enzyme"])
df2_std2 = df2_std2.sort_values(["Class", "Enzyme"])
# Renaming the values column to have neater output
df2_std2.rename(columns={"value":"StdDev"},level=0, inplace=True)
df2_std2

Unnamed: 0_level_0,Unnamed: 1_level_0,StdDev
Class,Enzyme,Unnamed: 2_level_1
0319-7L14,Promega42,85.207394
0319-7L14,Promega55,81.041964
0319-7L14,SuperScriptIV,50.257338
0319-7L14,TGIRT,56.389715
ABY1,Promega42,0.000000
ABY1,Promega55,0.000000
ABY1,SuperScriptIV,0.000000
ABY1,TGIRT,0.000000
AKAU4049,Promega42,0.000000
AKAU4049,Promega55,0.447214


In [44]:
dfFinal = pd.concat([df2_ave2, df2_std2, df5],axis=1)
dfFinal

Unnamed: 0_level_0,Unnamed: 1_level_0,Average,StdDev,GC_content
Class,Enzyme,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0319-7L14,Promega42,527,85.207394,0.580750
0319-7L14,Promega55,442,81.041964,0.582987
0319-7L14,SuperScriptIV,332,50.257338,0.581233
0319-7L14,TGIRT,392,56.389715,0.582942
ABY1,Promega42,0,0.000000,0.000000
ABY1,Promega55,0,0.000000,0.000000
ABY1,SuperScriptIV,0,0.000000,0.000000
ABY1,TGIRT,0,0.000000,0.000000
AKAU4049,Promega42,0,0.000000,0.000000
AKAU4049,Promega55,1,0.447214,0.593023


In [45]:
dfFinal = dfFinal.reset_index()
dfFinal = dfFinal.pivot(index="Class", columns="Enzyme")
dfFinal

Unnamed: 0_level_0,Average,Average,Average,Average,StdDev,StdDev,StdDev,StdDev,GC_content,GC_content,GC_content,GC_content
Enzyme,Promega42,Promega55,SuperScriptIV,TGIRT,Promega42,Promega55,SuperScriptIV,TGIRT,Promega42,Promega55,SuperScriptIV,TGIRT
Class,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
0319-7L14,527,442,332,392,85.207394,81.041964,50.257338,56.389715,0.580750,0.582987,0.581233,0.582942
ABY1,0,0,0,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
AKAU4049,0,1,3,4,0.000000,0.447214,0.894427,1.095445,0.000000,0.593023,0.593798,0.594186
Acidimicrobiia,1203,1617,1518,1858,109.917242,47.300106,46.301188,55.410288,0.585221,0.586662,0.588191,0.587488
Acidobacteriia,257,259,444,359,29.364945,6.942622,18.088670,28.455228,0.555791,0.561708,0.561813,0.559804
Actinobacteria,5864,11263,10105,11353,445.351771,581.639321,223.156895,274.296919,0.583321,0.583805,0.584725,0.586320
Alphaproteobacteria,12943,9302,12148,12385,481.747652,232.488279,303.496787,382.188435,0.554835,0.553426,0.562606,0.563610
Anaerolineae,191,175,113,89,7.563068,8.093207,12.700394,5.805170,0.552375,0.559423,0.561238,0.564171
Armatimonadia,6,19,1,2,0.447214,2.863564,0.447214,0.547723,0.579068,0.579835,0.594132,0.575936
BD2-11,26,33,48,42,5.761944,7.987490,8.820431,6.730527,0.609861,0.610253,0.608890,0.610935


In [46]:
dfFinal.to_csv("GC_class_whole.csv")

Only bacterial classes that account for the upper quartile of the dataset will be plotted. The remaining groups are summed and labeled as "Low abundance".
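
The split can be sketched on a toy table of class totals as follows. The cut-off `q=0.75`, the class names and the counts are assumptions chosen for illustration, and `pd.concat` is used here in place of `DataFrame.append`, which is no longer available in recent pandas releases.

```python
import pandas as pd

# Toy class totals per enzymatic condition
totals = pd.DataFrame(
    {"TGIRT": [100, 5, 40, 1], "Promega42": [80, 3, 60, 2]},
    index=["ClassA", "ClassB", "ClassC", "ClassD"],
)

row_sums = totals.sum(axis=1)
threshold = row_sums.quantile(q=0.75)    # hypothetical cut-off
keep = row_sums >= threshold

# Pool everything below the threshold into a single "Low abundance" row
low = totals.loc[~keep].sum(axis=0).to_frame(name="Low abundance").T
result = pd.concat([totals.loc[keep], low])
print(result)
```

Pooling by summation is meaningful for counts; standard deviations and GC content values do not combine this way, so those columns of a pooled row should be read with care.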

In [61]:
# Using row sums to normalize each class
rowSums = dfFinal.sum(axis=1, level=0)["Average"]

# Threshold on the class totals; note that q=0.5 is the median, despite the variable name
quantile85 = rowSums.quantile(q=0.5)
rowSumsVector = rowSums <= quantile85

# Rows at or above the threshold
df3 = dfFinal.loc[rowSums >= quantile85, :]
la = dfFinal.loc[rowSums < quantile85, :].sum(axis=0)

In [62]:
# Pulling lower quantile rows together and appending them to the upper quantile data frame
la = la.to_frame(name="Low abundance")
la = la.transpose()

df3 = df3.append(la)
df3

Unnamed: 0_level_0,Average,Average,Average,Average,StdDev,StdDev,StdDev,StdDev,GC_content,GC_content,GC_content,GC_content
Enzyme,Promega42,Promega55,SuperScriptIV,TGIRT,Promega42,Promega55,SuperScriptIV,TGIRT,Promega42,Promega55,SuperScriptIV,TGIRT
0319-7L14,527.0,442.0,332.0,392.0,85.207394,81.041964,50.257338,56.389715,0.58075,0.582987,0.581233,0.582942
Acidimicrobiia,1203.0,1617.0,1518.0,1858.0,109.917242,47.300106,46.301188,55.410288,0.585221,0.586662,0.588191,0.587488
Acidobacteriia,257.0,259.0,444.0,359.0,29.364945,6.942622,18.08867,28.455228,0.555791,0.561708,0.561813,0.559804
Actinobacteria,5864.0,11263.0,10105.0,11353.0,445.351771,581.639321,223.156895,274.296919,0.583321,0.583805,0.584725,0.58632
Alphaproteobacteria,12943.0,9302.0,12148.0,12385.0,481.747652,232.488279,303.496787,382.188435,0.554835,0.553426,0.562606,0.56361
Anaerolineae,191.0,175.0,113.0,89.0,7.563068,8.093207,12.700394,5.80517,0.552375,0.559423,0.561238,0.564171
BD2-11,26.0,33.0,48.0,42.0,5.761944,7.98749,8.820431,6.730527,0.609861,0.610253,0.60889,0.610935
Bacilli,1356.0,656.0,663.0,520.0,133.784155,60.870354,74.751589,65.076878,0.522874,0.533793,0.531269,0.537584
Bacteroidia,2825.0,1817.0,2307.0,3005.0,141.663333,188.279845,143.224649,194.803747,0.514352,0.514383,0.513738,0.508924
Blastocatellia,336.0,243.0,309.0,328.0,31.546791,14.876155,33.899853,36.929663,0.543584,0.54642,0.548164,0.549827


In [58]:
df3.to_csv("GC_class_forFigure.csv")

In [63]:
df3.to_csv("barplot_data_class.csv")

In [55]:
df3

Unnamed: 0,index,variable,value
0,0319-7L14,level_0,0
1,Acidimicrobiia,level_0,1
2,Acidobacteriia,level_0,2
3,Actinobacteria,level_0,3
4,Alphaproteobacteria,level_0,4
5,Anaerolineae,level_0,5
6,BD2-11,level_0,6
7,Bacilli,level_0,7
8,Bacteroidia,level_0,8
9,Blastocatellia,level_0,9


In [None]:
# Simplifying a multi index into just one index 
idx = pd.IndexSlice

df1 = data.data.copy()

df1 = df1.sum(level="Enzyme", axis=1)

df1 = df1.reset_index()

df1 = df1.iloc[:,1:]

df1 = df1.iloc[:,[1,5,6,7,8,9,10]]

In [None]:
df1.index = df1.OTU

df1 = df1.drop("OTU", axis=1)

In [None]:
colNames = list(df1.columns.get_level_values("Enzyme"))
colNames[1] = "GC"
colNames

df1.columns = colNames

df1.index.name = None

In [None]:
df1melted = df1.melt(id_vars=["Class", "GC"])
df1melted

In [None]:
colSum = df1melted.groupby("variable").sum()["value"]

colSum["Promega42"]

newValue = []
# Dividing each row value by an appropriate total sum of the category (enzyme)
for row in df1melted.iterrows():
    newValue.append(row[1].value/colSum[row[1].variable])
    
df1melted["value"] = newValue

In [None]:
df1sum = df1melted.groupby(["Class", "variable"]).sum()

In [None]:
df1melted.groupby(["Class", "variable"]).mean()

In [None]:
for row in df1melted.iterrows():
    # For a future convenience
    row=row[1]
    classval = row.Class
    enzyme = row.variable
    print(classval, enzyme)
    break

In [None]:
idx = pd.IndexSlice
df1sum.loc[idx[classval, enzyme],idx["value"]]

In [None]:
df1sum.loc[idx[classval, enzyme],:]

In [None]:
df1.to_csv("data_for_gc_plot.csv")

|Enzyme        |TGIRT  |SuperScriptIV|Promega42  |Promega55|
|:-------------|:-----:|:-----------:|:---------:|:-------:|
|TGIRT         | x     | TvS         | Tv42      | Tv55    |  
|SuperScriptIV | TvS   | x           | Sv42      | Sv55    |
|Promega42     | Tv42  | Sv42        | x         | P42v55  |
|Promega55     | Tv55  | Sv55        | P42v55    | x       |