Recent Changes

Friday, August 14

  1. page Session 10.2 edited ... Informative Interlude Before we jump into this afternoon's very short lecture, there is an im…
    ...
    Informative Interlude
    Before we jump into this afternoon's very short lecture, there is an important functionality of matplotlib we should touch on. Many of you are probably familiar with xkcd, the science web comic.
    ...
    details here:
    http://matplotlib.org/xkcd/examples/showcase/xkcd.html
    #!/usr/bin/env python
    ...
    'YJL222W-B':(9345,9483,1,True),
    'YJL222W-A':(9551,9679,1,True)}
    Solutions
    1) Boxplots
    ## 1 ##
    from Bio import SeqIO
    import matplotlib.pyplot as plt
    import numpy as np
    # Parse .fastq file
    ecoli_reads = SeqIO.parse("E_coli_reads.fastq","fastq")
    qualityScores=[]
    for read in ecoli_reads:
    quality = read.letter_annotations["phred_quality"]
    qualityScores.append(quality)
    # Convert list of lists to array
    qualityArray = np.array(qualityScores)
    print qualityArray
    # Plot array
    plt.boxplot(qualityArray,1,showfliers=False)
    plt.savefig("boxplot_ex1.png")
    2) Graphing genome features
    ## 2 ##
    import matplotlib.pyplot as plt
    import pylab as p
    featureDict = {'TEL10L_X_element':(7004,7367,1,False),'ARS1002':(7544,7943,1,False),'YJL223C':(8775,9138,0,True),'YJL222W-B':(9345,9483,1,True),'YJL222W-A':(9551,9679,1,True)}
    plt.figure()
    for feature in featureDict:
    start,stop,orientation,orf = featureDict[feature]
    # adjust start,stop coordinates
    start,stop = start-5000,stop-5000
    # check to see whether or not feature
    # is ORF
    if orf:
    # plot arrow
    orf_length = stop-start+1
    # check orientation
    if orientation:
    pass
    else:
    orf_length=orf_length*-1
    p.arrow(start,3,orf_length,0,length_includes_head=True,head_length=50,head_width=3,width=1.5,color='purple')
    else:
    # plot rectangle
    rectangleWidth=stop-start+1
    rectangle = p.Rectangle((stop,2),rectangleWidth,1.5,color="gray")
    axesInfo = p.gca()
    axesInfo.add_patch(rectangle)
    plt.ylim(0,10)
    plt.xlim(0,5000)
    plt.show()

    (view changes)
    2:30 pm
  2. page Session 10.1 edited ... import numpy as np from Bio import SeqIO ... 50 x 4 5 # Each row represents a position…
    ...
    import numpy as np
    from Bio import SeqIO
    ...
    50 x 45
    # Each row represents a position
    # Each column represents: A, T, G, C, N
    (view changes)
    12:46 pm
  3. page Session 10.1 edited ... (a) Google how to calculate the standard error of the mean using scipy.stats and apply the app…
    ...
    (a) Google how to calculate the standard error of the mean using scipy.stats and apply the appropriate method to get the standard error of the mean (sem) for the gene's expression in each genetic condition.
    (b) Plot the mean measurement of the gene across all mutant conditions as a bar plot and add error bars representing the standard error of the mean you calculated in part (a). Hint: read the documentation of plt.bar to figure out how to add error bars to your bar plot.
    Solutions
    Exercise 1: Plotting %A, %T, %G, %C, and %Ns across read position.
    ## 1 ##
    ## 1a ##
    import numpy as np
    from Bio import SeqIO
    # Create an array of zeros 50 x 4
    # Each row represents a position
    # Each column represents: A, T, G, C, N
    nucleotideArray = np.zeros((50,5))
    # Read into .fastq file
    ecoli_fastq = "E_coli_reads.fastq"
    indexDict = {'A':0,'T':1,'G':2,'C':3,'N':4}
    number_of_reads=0
    for rec in SeqIO.parse(ecoli_fastq,"fastq"):
    read = rec.seq
    number_of_reads+=1
    for pos,base in enumerate(read):
    nucleotideArray[pos,indexDict[base]]+=1
    # Divide this entire array by number_of_reads
    # and multiply by 100 to get percentage
    percentArray = (nucleotideArray/float(1e6))*100
    labelNames = []
    # Now plot percent for each base
    # by slicing down each column
    for base in indexDict:
    column = indexDict[base]
    label = "% "+base
    x = range(50)
    plt.plot(x,percentArray[:,column],'-o',linewidth=3)
    labelNames.append(label)
    plt.legend(labelNames)
    plt.show()
    ## 1b ##
    # Replace the code above
    # Starting at percentArray
    # with the following:
    # Initialize a new figure and
    # Give it a title by passing a string
    # to plt.figure()
    plt.figure("Percent Nucleotide By Read Position")
    percentArray = (nucleotideArray/float(1e6))*100
    labelNames = []
    # Now plot percent for each base
    # by slicing down each column
    for base in indexDict:
    column = indexDict[base]
    label = "% "+base
    x = range(50)
    plt.plot(x,percentArray[:,column],'-o',linewidth=3)
    labelNames.append(label)
    # Set x & y-axis labels
    plt.xlabel("Read Position")
    plt.ylabel("% Nucleotide")
    plt.legend(labelNames)
    # Save the figure
    plt.savefig("Ex10-1b.png")
    Exercise 2: Scatter plot practice
    ## 2 ##
    import pandas as pd
    import matplotlib.pyplot as plt
    # Read into table to get FPKMs
    fpkmTable = pd.read_table("MP_genes.fpkm_table")
    comparisons = [("MP_control_0","MP_2H_0"),("MP_control_0","MP_4H_0"),("MP_2H_0","MP_4H_0")]
    colorList = ['purple','green','yellow']
    # Iterate over comparisons and plot
    for ind,pair in enumerate(comparisons):
    cond1_fpkm = fpkmTable[pair[0]]
    cond2_fpkm = fpkmTable[pair[1]]
    # Use the .add_subplot method on the fig object to add a subplot.
    # This returns an axes instance just like the above plt.subplot(n,n,n)
    plt.subplot(1,3,ind+1)
    plt.scatter(cond1_fpkm,cond2_fpkm,color=colorList[ind],edgecolor="none")
    plt.ylim(1,1000000)
    plt.xlim(1,1000000)
    # Use ax.set_title to add a title to the current axes instance
    plt.title(pair[0]+' vs '+pair[1])
    plt.yscale('log')
    plt.xscale('log')
    plt.show()
    Exercise 3: Bar plots
    ## 3 ##
    ## 3a ##
    import pandas as pd
    import scipy.stats as sp
    import matplotlib.pyplot as plt
    # Read data table
    dataTable = pd.read_table("dataTable.tab")
    # Turn data into float array
    # to make calculations easier
    data = np.zeros((6,3))
    for idx,item in enumerate(dataTable.columns.values):
    if item=="Genotype":
    pass
    else:
    dataI = idx-1
    values = dataTable[item]
    data[:,dataI]+=values
    # Calculate Standard Error of the Mean
    # across rows, or axis = 1
    semArray = sp.sem(data,1)
    print semArray
    ## 3b ##
    # Calculate the mean across rows
    avgArray = np.mean(data,1)
    # plot
    plt.bar(range(1,7),avgArray,align="center",yerr=semArray)
    plt.show()

    (view changes)
    12:44 pm
  4. page Session 10.2 edited ... Plotting Part II: Boxplots and Graphing Shapes Using Python Informative Interlude ... fami…
    ...
    Plotting Part II: Boxplots and Graphing Shapes Using Python
    Informative Interlude
    ...
    familiar with XKCD,xkcd, the science web comic.
    Turns

    Turns
    out there
    ...
    plots into XKCDxkcd comics! To
    ...
    details here: http://matplotlib.org/xkcd/examples/showcase/xkcd.html
    http://matplotlib.org/xkcd/examples/showcase/xkcd.html

    #!/usr/bin/env python
    # http://matplotlib.org/xkcd/examples/showcase/xkcd.html
    (view changes)
    10:35 am
  5. page Session 10.2 edited Plotting Part II: Boxplots and Graphing Shapes Using Python Informative Interlude Before we ju…

    Plotting Part II: Boxplots and Graphing Shapes Using Python
    Informative Interlude
    Before we jump into this afternoon's very short lecture, there is an important functionality of matplotlib we should touch on. Many of you are probably familiar with XKCD, the science web comic.
    Turns out there is a built-in function within matplotlib that will turn your plots into XKCD comics! To explore this functionality, re-start ipython and copy and paste in the code below. More details here: http://matplotlib.org/xkcd/examples/showcase/xkcd.html
    #!/usr/bin/env python
    # http://matplotlib.org/xkcd/examples/showcase/xkcd.html
    # close ipython and restart it
    # $ quit()
    # $ ipython
    # Then, copy & paste the following:
    import matplotlib
    matplotlib.use('QT4Agg')
    from matplotlib import pyplot as plt
    import numpy as np
    import pandas as pd
    plt.xkcd()
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    plt.xticks([])
    plt.yticks([])
    ax.set_ylim([-30, 10])
    data = np.ones(100)
    data[70:] -= np.arange(30)
    plt.annotate(
    'THE DAY I REALIZED\nI COULD COOK BACON\nWHENEVER I WANTED',
    xy=(70, 1), arrowprops=dict(arrowstyle='->'), xytext=(15, -10))
    plt.plot(data)
    plt.xlabel('time')
    plt.ylabel('my overall health')
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    ax.bar([-0.125, 1.0-0.125], [0, 100], 0.25)
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    ax.xaxis.set_ticks_position('bottom')
    ax.set_xticks([0, 1])
    ax.set_xlim([-0.5, 1.5])
    ax.set_ylim([0, 110])
    ax.set_xticklabels(['CONFIRMED BY\nEXPERIMENT', 'REFUTED BY\nEXPERIMENT'])
    plt.yticks([])
    plt.title("CLAIMS OF SUPERNATURAL POWERS")
    plt.show()
    # Now, turn your own plot into
    # an XKCD comic!
    fpkms = pd.read_table("FPKM_table.tab")
    # To find what row "HMLALPHA2" is in
    for num,row in fpkms.iterrows():
    if row["Gene"]=="HMLALPHA2":
    rowNumber = num
    else:
    pass
    print rowNumber
    # Get values for all columns for this row
    # Excluding Gene
    expressionValues = fpkms.ix[1617][1::]
    # Convert to Numpy Array
    expArray = np.array(expressionValues)
    print expArray
    # Get length of the array to pass to plt.bar
    l = len(expArray)
    # Plot
    plt.bar(range(1,l+1),expArray,align="center")
    plt.xticks(range(1,l+1),fpkms.columns.values[1::])
    plt.show()

    This afternoon, we'll be finishing out the second week of the bootcamp by touching on two more plotting capabilities in python: boxplots and graphing shapes. Let's begin!
    Boxplots
    (view changes)
    10:35 am

Thursday, August 13

  1. page Session 9.2 edited ... $ cuffdiff -o HS_cuffdiff/ -L HS_control,HS_15min,HS_30min,HS_60min Escherichia_coli_str_k_12_…
    ...
    $ cuffdiff -o HS_cuffdiff/ -L HS_control,HS_15min,HS_30min,HS_60min Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf HS_control_cuffquant/abundances.cxb HS_15min_cuffquant/abundances.cxb HS_30min_cuffquant/abundances.cxb HS_60min_cuffquant/abundances.cxb
    $ cuffdiff -o MP_cuffdiff/ -L MP_control,MP_2H,MP_6H Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf MP_control_cuffquant/abundances.cxb MP_2H_cuffquant/abundances.cxb MP_6H_cuffquant/abundances.cxb
    https://www.dropbox.com/s/zzjb6u22frfo070/splitCuffdiff.py?dl=0
    import sys
    in_name = sys.argv[1]
    ...
    outp.close()
    B. For each of the comparisons, how many genes are significantly differentially increased? How many change by 2-fold or 4-fold? How does the number of genes that increase compare to the number that decrease for each comparison?
    https://www.dropbox.com/s/tlx2ccy3t6s03n4/countDiffExp.py?dl=0
    import sys
    in_file =sys.argv[1]
    ...
    Exercise 3: Venn diagrams
    A. How many of the genes differentially expressed in long treatment samples are also differentially expressed in the short treatment samples? Does this give you more or less confidence in the results? Make sure that changing genes are going the same direction (up or down) in both experiments. As a hint, the set() datatype will likely be helpful here, since you can calculate the intersections and differences of two lists at a time.
    https://www.dropbox.com/s/vbgg9djg2dftlod/countIntersection_diffExp.py?dl=0
    import sys
    def getSet(filename):
    (view changes)
    9:30 pm
  2. page Session 9.2 edited ... Exercise 3: Venn diagrams A. How many of the genes differentially expressed in long treatment…
    ...
    Exercise 3: Venn diagrams
    A. How many of the genes differentially expressed in long treatment samples are also differentially expressed in the short treatment samples? Does this give you more or less confidence in the results? Make sure that changing genes are going the same direction (up or down) in both experiments. As a hint, the set() datatype will likely be helpful here, since you can calculate the intersections and differences of two lists at a time.
    import sys
    def getSet(filename):
    genes1 = set()
    genes2 = set()
    fh = open(filename, 'r')
    head = fh.readline()
    lines = fh.readlines()
    for line in lines:
    ln = line.strip().split()
    if ln[6] == 'OK':
    ln[9] = float(ln[9])
    if ln[9] > 1:
    genes1.add(ln[2])
    elif ln[9] < -1:
    genes2.add(ln[2])
    fh.close()
    return genes1, genes2
    set_one_up, set_one_dn = getSet(sys.argv[1])
    set_two_up, set_two_dn = getSet(sys.argv[2])
    print
    print "genes in file one: %s" % (len(set_one_up) + len(set_one_dn))
    print "genes in file two: %s" % (len(set_two_up) + len(set_two_dn))
    print "genes in both files %s:" % (len(set_one_up.intersection(set_two_up)) + len(set_one_dn.intersection(set_two_dn)))
    print

    B. Compare the overlap of differentially expressed genes in the heat shock treatment vs in phosphorus starvation. What might these genes be? Get the files with the other experiments results below. Pick what you think are the best time points.
    https://www.dropbox.com/s/hg2l5wq30409ots/HS_gene_exp.diff?dl=0
    (view changes)
    9:24 pm
  3. page Session 9.2 edited ... A. Run Cuffdiff for your experiment (HS or MP). The gene_exp.diff file will have a p-value for…
    ...
    A. Run Cuffdiff for your experiment (HS or MP). The gene_exp.diff file will have a p-value for every pairwise comparison for each gene. From that file, pull the relevant comparisons into their own file (ie. HS_control vs HS_15min, HS_control vs HS30min, etc).
    $ cuffdiff -o HS_cuffdiff/ -L HS_control,HS_15min,HS_30min,HS_60min Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf HS_control_cuffquant/abundances.cxb HS_15min_cuffquant/abundances.cxb HS_30min_cuffquant/abundances.cxb HS_60min_cuffquant/abundances.cxb
    ...
    MP_2H_cuffquant/abundances.cxb MP_6H_cuffquant/abundances.cxb
    [[code format="python"]]

    import sys

    in_name <span class="sy0">=</span> <span class="st0">'MP_gene_exp.diff'</span>= sys.argv[1]
    outbase <span class="sy0">=</span> <span class="st0">'MP_gene_exp.diff'</span>= sys.argv[1]
    out_lines <span class="sy0">=</span> <span class="br0">{}</span>= {}
    inp <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">(</span>in_name<span class="sy0">,</span> <span class="st0">'r'</span><span class="br0">)</span>= open(in_name, 'r')
    head <span class="sy0">=</span> inp.<span class="kw3">readline</span><span class="br0">()</span>.<span class="me1">strip</span><span class="br0">()</span>= inp.readline().strip()
    lines <span class="sy0">=</span> inp.<span class="me1">readlines</span><span class="br0">()</span>
    <span class="kw1">for</span>
    = inp.readlines()
    for
    line <span class="kw1">in</span>in lines:
    line <span class="sy0">=</span> line.<span class="me1">strip</span><span class="br0">()</span>= line.strip()
    ln <span class="sy0">=</span> line.<span class="me1">split</span><span class="br0">()</span>
    cond1<span class="sy0">,</span>
    = line.split()
    cond1,
    cond2 <span class="sy0">=</span> ln<span class="br0">[</span><span class="nu0">4</span><span class="br0">]</span><span class="sy0">,</span> ln<span class="br0">[</span><span class="nu0">5</span><span class="br0">]</span>
    <span class="kw1">if</span> cond1 <span class="sy0">!=</span> <span class="st0">'MP_control'</span> <span class="kw1">and</span> cond2 <span class="sy0">!=</span> <span class="st0">'MP_control'</span>:
    <span class="kw1">continue</span>
    = ln[4], ln[5]
    comp <span class="sy0">=</span>= cond1 + <span class="st0">"-"</span>"-" + cond2
    <span class="kw1">if</span>

    if
    comp <span class="kw1">in</span>in out_lines:
    out_lines<span class="br0">[</span>comp<span class="br0">]</span>.<span class="me1">append</span><span class="br0">(</span>line<span class="br0">)</span>
    <span class="kw1">else</span>:
    out_lines<span class="br0">[</span>comp<span class="br0">]</span> <span class="sy0">=</span> <span class="br0">[</span>line<span class="br0">]</span>
    inp.<span class="me1">close</span><span class="br0">()</span>
    <span class="kw1">for</span>

    out_lines[comp].append(line)
    else:
    out_lines[comp] = [line]
    inp.close()
    for
    c <span class="kw1">in</span>in out_lines:
    out_file <span class="sy0">=</span>= outbase + <span class="st0">"_"</span>"_" + c + <span class="st0">".txt"</span>".txt"
    outp <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">(</span>out_file<span class="sy0">,</span> <span class="st0">'w'</span><span class="br0">)</span>
    <span class="kw1">print</span> <span class="sy0">>></span>outp<span class="sy0">,</span>
    = open(out_file, 'w')
    print >>outp,
    head
    <span class="kw1">print</span> <span class="sy0">>></span>outp<span class="sy0">,</span> <span class="st0">"</span><span class="es0">\n</span><span class="st0">"</span>.<span class="me1">join</span><span class="br0">(</span>out_lines<span class="br0">[</span>c<span class="br0">])</span>
    outp.<span class="me1">close</span><span class="br0">()</span>
    code

    print >>outp, "\n".join(out_lines[c])
    outp.close()

    B. For each of the comparisons, how many genes are significantly differentially increased? How many change by 2-fold or 4-fold? How does the number of genes that increase compare to the number that decrease for each comparison?
    import sys
    in_file =sys.argv[1]
    fh = open(in_file, 'r')
    head = fh.readline()
    lines = fh.readlines()
    sigUp = 0
    sigDown = 0
    twoUp = 0
    twoDown = 0
    fourUp = 0
    fourDown = 0
    totalTested = 0
    for line in lines:
    ln = line.strip().split()
    if ln[6] == 'OK':
    totalTested += 1
    ln[9] = float(ln[9])
    if ln[-1] == 'yes' and ln[9] > 0:
    sigUp += 1
    elif ln[-1] == 'yes' and ln[9] < 0:
    sigDown += 1
    if ln[9] > 1:
    twoUp += 1
    if ln[9] > 2:
    fourUp +=1
    elif ln[9] < -1:
    twoDown += 1
    if ln[9] < -2:
    fourDown += 1
    fh.close()
    print
    print "tested:", totalTested
    print "sig increase:", sigUp
    print "sig decrease:", sigUp
    print "2x increase:", twoUp
    print "2x decrease:", twoDown
    print "4x increase:", fourUp
    print "4x decrease:", fourDown
    print

    Exercise 3: Venn diagrams
    A. How many of the genes differentially expressed in long treatment samples are also differentially expressed in the short treatment samples? Does this give you more or less confidence in the results? Make sure that changing genes are going the same direction (up or down) in both experiments. As a hint, the set() datatype will likely be helpful here, since you can calculate the intersections and differences of two lists at a time.
    (view changes)
    9:12 pm
  4. page Session 9.2 edited ... ExercisesExercises Exercises-Exercise 1: Find FPKM values for all genesExercise 1: Find FPKM …
    ...
    ExercisesExercises
    Exercises-Exercise 1: Find FPKM values for all genesExercise 1: Find FPKM values for all genes
    NOTE: All commands posted as an answer are just examples. You'll need to change things so that it'll run properly on your computer.
    A.
    Run Cuffquant on the accepted_hits.bam output from Tophat using the Ensembl annotation . Remember to use the -b and -u options.
    $ cuffquant -o HS_30min_cuffquant/ -b Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.dna.toplevel.fa -u Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf HS_30min_accepted_hits.bam
    B.
    ...
    for either do the heat
    Heat shock:
    https://www.dropbox.com/s/dtbejctmj0f0ia4/HS_control_abundances.cxb?dl=0
    ...
    https://www.dropbox.com/s/r76ta48r7er6f5n/MP_2H_abundances.cxb?dl=0
    https://www.dropbox.com/s/37kg74ay54gopnh/MP_6H_abundances.cxb?dl=0
    $ cuffnorm -o HS_cuffnorm/ -L HS_control,HS_15min,HS_30min,HS_60min Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf HS_control_cuffquant/abundances.cxb HS_15min_cuffquant/abundances.cxb HS_30min_cuffquant/abundances.cxb HS_60min_cuffquant/abundances.cxb
    $ cuffnorm -o MP_cuffnorm/ -L MP_control,MP_2H,MP_6H Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf MP_control_cuffquant/abundances.cxb MP_2H_cuffquant/abundances.cxb MP_6H_cuffquant/abundances.cxb

    Here are the Cuffnorm files if you have trouble with Cuffquant or it takes too long:
    https://www.dropbox.com/s/wydq8lckdr7kwuk/HS_genes.fpkm_table?dl=0
    ...
    Exercise 2: Find differentially expressed genes
    A. Run Cuffdiff for your experiment (HS or MP). The gene_exp.diff file will have a p-value for every pairwise comparison for each gene. From that file, pull the relevant comparisons into their own file (ie. HS_control vs HS_15min, HS_control vs HS30min, etc).
    $ cuffdiff -o HS_cuffdiff/ -L HS_control,HS_15min,HS_30min,HS_60min Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf HS_control_cuffquant/abundances.cxb HS_15min_cuffquant/abundances.cxb HS_30min_cuffquant/abundances.cxb HS_60min_cuffquant/abundances.cxb
    $ cuffdiff -o MP_cuffdiff/ -L MP_control,MP_2H,MP_6H Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.1.19.gtf MP_control_cuffquant/abundances.cxb MP_2H_cuffquant/abundances.cxb MP_6H_cuffquant/abundances.cxb
    [[code format="python"]]
    in_name <span class="sy0">=</span> <span class="st0">'MP_gene_exp.diff'</span>
    outbase <span class="sy0">=</span> <span class="st0">'MP_gene_exp.diff'</span>
    out_lines <span class="sy0">=</span> <span class="br0">{}</span>
    inp <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">(</span>in_name<span class="sy0">,</span> <span class="st0">'r'</span><span class="br0">)</span>
    head <span class="sy0">=</span> inp.<span class="kw3">readline</span><span class="br0">()</span>.<span class="me1">strip</span><span class="br0">()</span>
    lines <span class="sy0">=</span> inp.<span class="me1">readlines</span><span class="br0">()</span>
    <span class="kw1">for</span> line <span class="kw1">in</span> lines:
    line <span class="sy0">=</span> line.<span class="me1">strip</span><span class="br0">()</span>
    ln <span class="sy0">=</span> line.<span class="me1">split</span><span class="br0">()</span>
    cond1<span class="sy0">,</span> cond2 <span class="sy0">=</span> ln<span class="br0">[</span><span class="nu0">4</span><span class="br0">]</span><span class="sy0">,</span> ln<span class="br0">[</span><span class="nu0">5</span><span class="br0">]</span>
    <span class="kw1">if</span> cond1 <span class="sy0">!=</span> <span class="st0">'MP_control'</span> <span class="kw1">and</span> cond2 <span class="sy0">!=</span> <span class="st0">'MP_control'</span>:
    <span class="kw1">continue</span>
    comp <span class="sy0">=</span> cond1 + <span class="st0">"-"</span> + cond2
    <span class="kw1">if</span> comp <span class="kw1">in</span> out_lines:
    out_lines<span class="br0">[</span>comp<span class="br0">]</span>.<span class="me1">append</span><span class="br0">(</span>line<span class="br0">)</span>
    <span class="kw1">else</span>:
    out_lines<span class="br0">[</span>comp<span class="br0">]</span> <span class="sy0">=</span> <span class="br0">[</span>line<span class="br0">]</span>
    inp.<span class="me1">close</span><span class="br0">()</span>
    <span class="kw1">for</span> c <span class="kw1">in</span> out_lines:
    out_file <span class="sy0">=</span> outbase + <span class="st0">"_"</span> + c + <span class="st0">".txt"</span>
    outp <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">(</span>out_file<span class="sy0">,</span> <span class="st0">'w'</span><span class="br0">)</span>
    <span class="kw1">print</span> <span class="sy0">>></span>outp<span class="sy0">,</span> head
    <span class="kw1">print</span> <span class="sy0">>></span>outp<span class="sy0">,</span> <span class="st0">"</span><span class="es0">\n</span><span class="st0">"</span>.<span class="me1">join</span><span class="br0">(</span>out_lines<span class="br0">[</span>c<span class="br0">])</span>
    outp.<span class="me1">close</span><span class="br0">()</span>
    code

    B. For each of the comparisons, how many genes are significantly differentially increased? How many change by 2-fold or 4-fold? How does the number of genes that increase compare to the number that decrease for each comparison?
    Exercise 3: Venn diagrams
    (view changes)
    9:09 pm

More