Manipulating Files and Processing Text


Topics:

  • Basic text processing with split, join, and partition
  • Text testing with endswith(), startswith(), find()
  • Text conversion with swapcase(), replace(), upper(), and lower()
  • Opening and closing filehandles
  • Reading from the filehandle with read(), readline(), and readlines()
  • Reading from the filehandle iterable
  • Writing or appending to a file with write() and writelines()
  • Writing to a file with a loop

Introduction:

We've learned so far how we can write programs to make many, many decisions with an ordered logic to process information. What we've lacked thus far is how to input and output large tomes of data. In addition to manipulating large amounts of data with functions that open, read, write, and close files, we'll also benefit from learning about Python's marvelously powerful abilities to process text. Not to malign the now-dead king of text-processing languages, Perl (The King is Dead! Long Live the King!), Python really cleans house with its unparalleled text-processing abilities with respect to both speed and ease of use.

Basic Text Processing

Systematically manipulating large text files is one of the most common tasks you will encounter. The most basic tools for this task are the built-in Python string methods. These allow us to convert between strings and lists, test the properties of strings, and modify strings.


Informative Interlude: getting ahead of ourselves with methods vs. functions

Tomorrow, we're going to learn all about writing our own functions to process information. These will be sets of logic that consider variables and manipulate them according to the logic that we assign. In a sense, the functions are formally encapsulated manifestations of the sorts of things we've been writing with our scripts all week.

But, as we're going to see with strings, many types of objects have special built-in functions. We call these endemic functions methods, and in a broader discussion of object-oriented programming practice and theory, we would have much, much more to say about them. However, we're not getting into the object-oriented universe or philosophy here, so you'll have to take as explanation simply that some objects are so routinely manipulated with the same sorts of operations that it pays to have functions dedicated to their processing. In the case of strings and files today, we'll see the methods that routinely operate on these types.

Whereas a function is written to accept variables and arguments to manipulate those variables with, a method already exists for the object under manipulation and is called differently. Whereas a function such as print is called by typing print(string_variable), etc., a method is called by typing a period and the name of the method at the end of the object. For example, if print were a method, it would be called like this: string_variable.print(). Notice that there are still () at the end of the name of the method, and methods can accept arguments just like functions. If all this seems eerily familiar, it may be because we've already seen the list methods append() and extend() earlier in the week. All apologies if this seems out of order and confusing, but we'll see how these concepts interoperate in more detail as the week progresses. This is why these paragraphs are in an I.I. after all...
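
To make the calling difference concrete, here is a minimal sketch that puts one function call next to two method calls. The example string and the variable names are made up for illustration; len(), upper(), and count() are standard Python.

```python
# Hypothetical example string, not taken from the lesson.
string_variable = "humpty dumpty"

# A function is called with the object handed over as an argument.
length = len(string_variable)

# A method is called with a period after the object it belongs to.
shouted = string_variable.upper()

# Methods can accept arguments too, just like functions.
p_count = string_variable.count("p")

# print(...) with a single argument behaves the same in Python 2 and 3.
print(length)
print(shouted)
print(p_count)
```

The object in front of the period plays the role that the first argument plays in a function call.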

Split()


Let's consider the task of converting a character string of a sentence into a list of words separated by spaces and punctuation marks:


#!/usr/bin/python
 
delimiter = ","
sentence_string = "I am a well-written sentence, and so I dependably have punctuation. "
list_from_string = sentence_string.split(delimiter)
print "clause one %s" % list_from_string[0]
print "clause two %s" % list_from_string[1]

clause one I am a well-written sentence
clause two and so I dependably have punctuation.

Note that as we've split with a comma, the comma doesn't appear in our list. We can try out what happens with different arguments to split().

# we don't need to specify the delimiter in a different variable
 
list_A = sentence_string.split(' ')
print list_A
for word in list_A:
     print word
 
list_B = sentence_string.split('a')
print list_B
for vowel_handicapped_lump in list_B:
     print vowel_handicapped_lump

You might also want to take a string and turn it letter-by-letter into a list. Although this isn't done by split(), it fits nicely here:

list_C = list(sentence_string)
print list_C
for letter in list_C:
     print letter

split() can also take a second argument (see, as always, the string methods documentation): you can specify how many times you want to split.

sentence_string = "I am a well-written sentence, and so I dependably have punctuation. "
list_from_string = sentence_string.split(' ', 3)
print list_from_string
for item in list_from_string:
     print item

Now let's see what happens when two delimiters are next to each other:

list_from_string = sentence_string.split('t')
print list_from_string
for consonant_crippled_lump in list_from_string:
     print consonant_crippled_lump

We can see that we have an empty string in our list: "written," in particular, was split into three parts: ["...wri","","en..."]. If delimiters are adjacent to each other, split() finds the empty string between them and gives it to you at the appropriate spot. It's a very one-hand-clapping-in-a-forest sort of thing.

However, there is an exception to this. If you glanced at the split() documentation, you might have noticed that all of its arguments are, in fact, in brackets. That means that it doesn't need arguments to run: it has a default behavior.

# this should look almost the same as splitting by spaces
# (only the empty item left behind by the trailing space is gone)
list_from_string = sentence_string.split()
for item in list_from_string:
     print item
 
# this is not the same as splitting by spaces -- no empty items!
sentence_string = "   this      is    a   different                         string"
list_from_string = sentence_string.split()
for item in list_from_string:
     print item
 
sentence_string = '''   complete
\t\t whitespace                      chaos
             !!!!!!!!!!!         '''
list_from_string = sentence_string.split()
for item in list_from_string:
     print item

We see that the default behavior of split() is to:
  1. Remove all kinds of whitespace from the beginning and end of the string.
  2. Condense all adjacent whitespaces to single space characters.
  3. Split on those spaces.
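
As a small side-by-side sketch of that default behavior, compare split() with split(' ') on the same messy string. The string here is invented for illustration.

```python
# A made-up messy table row: leading spaces, a tab, runs of spaces, a newline.
messy = "  gene1\t 42   +\n"

# Splitting on a single space keeps every empty string between adjacent
# spaces, and leaves the tab and the newline stuck to their neighbors.
by_space = messy.split(" ")

# The default split treats any run of whitespace as one delimiter and
# ignores whitespace at either end.
by_default = messy.split()

print(by_space)
print(by_default)
```

The first list is littered with empty strings and stray whitespace characters; the second contains only the three fields.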

This turns out to be really handy. For instance, if you're using someone else's table, and, as happens more often than you might want to think, they've done a poor job delimiting their fields systematically with whitespace, this cleans things up quickly and easily in just one line.

You'll learn to extend this power of whitespace to other characters, sets of characters, and all sorts of exotic delimiters.

The split() method being popular, it has a few hangers-on:

toes = '''went to the market
stayed home
had roast beef
had none
cried wee wee wee all the way home'''
 
# splitlines splits on linebreaks
list_from_string = toes.splitlines()
print list_from_string
for toe in list_from_string:
     print "this little piggy %s" % toe
 
# rsplit works like split, but starts from the end of the string
last_toe = "and _this_ little piggy went wee wee wee all the way home"
# when given a second argument, rsplit counts its splits from the end
list_from_string = last_toe.rsplit(' ',7)
print list_from_string
for item in list_from_string:
     print item

Though the partition() method isn't named after split(), it's a very similar method. partition() works a lot like split(delimiter,1), taking a delimiter and splitting at the first instance. However, while split(delimiter,1) will return either a list of length two (if it split successfully) or a list of length one (if it didn't), partition() will always return a tuple of length three. Let's look at the output.

rhyme = '''There was a crooked man
Who walked a crooked mile.
He found a crooked sixpence
Against a crooked stile.
He bought a crooked cat
Which caught a crooked mouse,
And they all lived together
In a crooked little house.'''
 
# you can split on words as well as single letters and symbols
split_list = rhyme.split('crooked',1)
print split_list
print
print "List output:"
for item in split_list:
     print item
print
 
# partition keeps the delimiter as the middle item of the tuple it returns
partition_list = rhyme.partition('crooked')
print partition_list
print "Partition output:"
for item in partition_list:
     print item

What if the delimiter doesn't occur within the string?

split_list = rhyme.split('happiness',1)
print split_list
print "List output:"
 
for item in split_list:
    print item
print
 
# Notice that partition still produces a 3-item tuple, but the last two elements are empty strings.
partition_list = rhyme.partition('happiness')
print partition_list
print "Partition output:"
for item in partition_list:
    print item
 

This can be useful if you are looking for that second item, but you're not sure if it's going to be there. The string could be user generated or read in from a file, and you want to gracefully do one thing if it's there and another if it's not. split() can be less than graceful about this:

if rhyme.split('happiness')[1]:
    pass  # if it's there you're all good
else:
    pass  # if it isn't, your program crashes before it even gets here (IndexError)
 
# vs
if rhyme.partition('happiness')[2]:
    pass  # parse the wanted information out of it
else:
    pass  # wait until the next line
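
Here is one way the graceful version might look when fleshed out, as a sketch rather than the one true approach. The input lines and the variable names are hypothetical. It tests the middle item of the tuple instead of the last one, because the last item is also empty when the delimiter happens to sit at the very end of the string.

```python
# Hypothetical input lines: one contains the delimiter, one does not.
lines = ["CHAIN: A", "no delimiter here"]
parsed = []

for line in lines:
    before, found, after = line.partition(": ")
    if found:
        # The middle item is the delimiter itself, so it is only
        # non-empty when the delimiter was actually in the string.
        parsed.append(after)
    else:
        # No delimiter: record a placeholder instead of crashing.
        parsed.append(None)

print(parsed)
```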

Join()


So now we're pretty good at splitting things up, but how do we put things together again? join() takes care of that: it turns lists into strings. Surprisingly enough, it's not a method of lists. It's a string method, and it relies on the delimiter to know how to put lists together. This little surprise renders the syntax of join() to be among the most unintuitive of all syntactic trifles, but we will persevere if we concentrate on the fact that just like split(), join() is a method of strings.

broken = ['hu','m','pty',' du','mpty']
all_the_kings_horses = '...'
all_the_kings_men = '+++'
first_try = all_the_kings_horses.join(broken)
second_try = all_the_kings_men.join(broken)
if (first_try == 'humpty dumpty') or (second_try =='humpty dumpty'):
     print 'hooray!'
else:
     print '''All the king's horses and all the king's men couldn't put Humpty together again'''

Unlike split(), which raises an error when handed an empty delimiter, join() can usefully use the empty string -- it glues the components of the list directly together.

third_try = ''.join(broken)
print third_try
# Paradoxically, 'nothing' can put poor Humpty together again
# To summarize, the syntax of join is variable = ''.join(list)

This is in fact the usual way to use join() -- you don't need to declare a separate variable to act as the glue.

fairy_tale_characters = ['witch','rapunzel','prince']
plot = 'hair'.join(fairy_tale_characters)
print plot
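
Since split() and join() are opposites, chaining them is a handy way to clean up sloppy delimiters. The sketch below, using an invented row of text, turns a whitespace mess into a tidy tab-delimited line.

```python
# A made-up row with inconsistent whitespace between its fields.
messy_row = "  1AOP     sulfite   \t reductase  \n"

# split() with no arguments throws away all the whitespace, and
# '\t'.join() glues the fields back together with exactly one tab each.
clean_row = "\t".join(messy_row.split())

print(clean_row)
```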
 

Testing Text: startswith(), endswith(), and find()


We just saw how you can use an if statement to test for the presence of a delimiter with partition(). There are other tests you will often be interested in, for example asking if a string begins with, ends with, or contains a substring of interest.

#!/usr/bin/env python
 
id_number = '1131431a'
 
# let's see if the id_number string starts with the number one
if (id_number[0] == '1'):
    print "this id starts with a 1!"
 
# now let's use the string method startswith()
if ( id_number.startswith('1') ):
    print "this id starts with a 1!"
 
# and here's the endswith() method
if ( id_number.endswith('1') ):
    print "This id number ends with a 1!"
else:
    print "This id number doesn't end with a 1 at all!"
 
# and these methods can get a little fancier by having multiple things to
# test for if you provide a tuple of characters
if ( id_number.endswith( ('1', 'a') ) ):
    print "this id number ended with either an 'a' or a '1' "
else:
    pass
 

Or maybe we don't care what the string starts or ends with as long as it contains a substring of interest. For this, we can use the find() method, which will return the index of the substring. But be careful when you write if tests using the find() method, as it returns the index of the substring only if the substring is found. Otherwise, find() returns the integer -1, which is not zero, and thus will pass the if test as True. Worse, a substring found at the very beginning of the string has index 0, which fails the if test as False.

beatles = "johnpaulgeorgeandringo"
 
# the wrong way
if ( beatles.find('paul')):
    print "At least we've got a bassist."
else:
    print "Anyone here play bass?"
 
# let's do a comparison for -1 instead
if not (beatles.find('paul') == -1):
    print "At least we've got a bassist"
else:
    print "Well, I guess we're a three piece."
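
If all you want is a yes-or-no answer, Python's in operator sidesteps the -1 problem entirely. This sketch is an alternative to the find() comparison, not something the lesson's examples rely on; it shows both return values of find() next to the plain True/False that in gives back.

```python
beatles = "johnpaulgeorgeandringo"

# find() gives an index: 0 for a match at the very start, -1 for no match.
found_at_start = beatles.find("john")
not_found = beatles.find("pete")

# The in operator gives back True or False, which is safe in an if test.
has_john = "john" in beatles
has_pete = "pete" in beatles

if has_john:
    print("John is in the band.")
if not has_pete:
    print("Pete is not.")
```

Stick with find() when you need to know where the substring is, and with in when you only need to know whether it is there.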

Text Conversions


Systematically replacing the instances of a substring with a replacement substring may be a familiar task of tedium. Python has several methods for systematically converting characters in strings. The most general is the method replace().

beatles = 'johnpaulgeorgeandringo'
beatles = beatles.replace('george', 'PETER')
print beatles
 
# YES! Peter's in!
 
beatles = beatles + "MOREPETER!"
print beatles.replace("PETER", "DIANA!")
print beatles
 
# and we can tell replace how many replacements to make, starting at the beginning
print beatles.replace("PETER", "DIANA!", 1)
print beatles
 
# but notice that replace() does not change the string in place; you have
# to reassign the variable to "save" the change

Since Python is case sensitive, as are most UNIX-based bioinformatics programs you'll be interested in using, you may also find yourself wishing that all the text in your data was the same case. There are methods for both testing and converting cases.

# why not use something a touch relevant for a change
blast_hit = 'ACTGTCAGTACGTAGCATCGAaaatCGATCGACTGAatacgatCG'
 
if ( blast_hit.isupper() ):
    pass
else:
    blast_hit = blast_hit.upper()
    print blast_hit
 
# or if you prefer lower case
 
blast_hit = blast_hit.lower()
print blast_hit
 
# or if you are (or the program you're writing is) indecisive
 
blast_hit = blast_hit.swapcase()
print blast_hit
 
# and we might also be interested in these methods
 
if ( blast_hit.isalpha() ):
    print "we got all letters here"
else:
    print "whoa, something doesn't look like nucleotides!"

Files and Filehandles


Now that we can process text, all we need is... more text. And odds are, that text is going to come in the form of a file, so it's high time that we start using them.

Opening filehandles


A filehandle is an object that controls the stream of information between your program and a file stored somewhere on the computer. Filehandles are not filenames, and they are not the files themselves. They are a tool that your program uses to interact with files, nothing more (for instance, deleting a filehandle in your script using the del command does nothing to the file that handle refers to).

We create filehandles in the simplest sense with the open() command:

fh = open('some_file')

where some_file is the path to a file (i.e. the filename) on your filesystem. In general, it is good practice to use absolute path nomenclature (e.g. /Users/aaron/some_file or /home/aaron/some_file), but you can be lazy if you know the file you want is going to be in the same directory as your program.

#!/usr/bin/env python
 
fh = open('hello.txt')
contents = fh.read()
print contents
fh.close()

$ ./hello.py
THIS IS A TEXT FILE
THAT I AM USING AS AN EXAMPLE
AND WE ARE CURRENTLY
READING FROM IT.

As you can see, the read() method of the filehandle just sucks in the whole file in a single string, newlines and all! This is quick and easy, for sure, but it's not necessarily the most orderly way to deal with the contents of a file.

readline(), readlines(), and strip()


Copy the contents of the following snippet to a text file in your directory for this session, and save the file as pdb_head.

HEADER OXIDOREDUCTASE 08-JUL-97 1AOP
TITLE SULFITE REDUCTASE STRUCTURE AT 1.6 ANGSTROM RESOLUTION
COMPND MOL_ID: 1;
COMPND 2 MOLECULE: SULFITE REDUCTASE HEMOPROTEIN;
COMPND 3 CHAIN: A;

Then try the following:

#!/usr/bin/env python
 
filename = 'pdb_head'
fh = open(filename, 'r')
# the 'r' is for 'read-only', which will keep us from being able to alter
# this file with the filehandle we just created
 
print fh.readline()
print fh.readline()
 
lines = fh.readlines()
 
fh.close()
 
print lines

$ ./hello.py
HEADER OXIDOREDUCTASE 08-JUL-97 1AOP

TITLE SULFITE REDUCTASE STRUCTURE AT 1.6 ANGSTROM RESOLUTION

['COMPND MOL_ID: 1; \n', 'COMPND 2 MOLECULE: SULFITE REDUCTASE HEMOPROTEIN; \n', 'COMPND 3 CHAIN: A; \n']

While this is a bit of a mess, a few things should become apparent:
  1. fh.readline() takes in one line (and since print also supplies a newline, we've got an extra linebreak after each of the first two print statements).
  2. fh.readlines() (plural!) takes the rest of the file, from the current read position all the way to the end, giving back a list of lines (again, with newlines intact).
  3. This file has a bunch of whitespace cluttering things up at the end of each line.

All of these complications are easily resolved with the use of the strip() method whenever we actually make use of the lines we read:



#!/usr/bin/env python
 
filename = 'pdb_head'
fh = open(filename, 'r')
 
print fh.readline().strip()
print fh.readline().strip()
 
lines = fh.readlines()
 
fh.close()
 
lines[0] = lines[0].strip()
 
print lines

$ ./hello.py

HEADER OXIDOREDUCTASE 08-JUL-97 1AOP
TITLE SULFITE REDUCTASE STRUCTURE AT 1.6 ANGSTROM RESOLUTION
['COMPND MOL_ID: 1;', 'COMPND 2 MOLECULE: SULFITE REDUCTASE HEMOPROTEIN; \n', 'COMPND 3 CHAIN: A; \n']

Now the spaces and newlines are gone from the first two, and from the 0th element of the list I printed in the last print statement (since I only bothered to strip() and put back the 0th element).
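
strip() has two one-sided relatives, lstrip() and rstrip(), which come in handy when whitespace at one end of a line is meaningful (as in column-based formats). The sketch below uses an invented line with leading spaces to show the difference.

```python
# A made-up line with leading spaces, a trailing space, and a newline.
line = "  COMPND 3 CHAIN: A; \n"

# strip() removes whitespace from both ends.
both_ends = line.strip()

# rstrip() removes whitespace from the right-hand end only.
right_only = line.rstrip()

# Given an argument, rstrip() removes only those characters;
# here just the newline goes and the trailing space stays.
newline_only = line.rstrip("\n")

print(repr(both_ends))
print(repr(right_only))
print(repr(newline_only))
```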

One crucially important concept of file input in Python is that each time you read something by any of the three methods I've described, you advance the position of the filehandle in the file, which means that you never get the same character or characters twice (unless of course they're in the file twice!)

This is why reading from the filehandle with fh.readline() twice in a row gave two different values; as soon as the line is read, the filehandle has moved to the next line, awaiting another read request. This is an example of an iterable type, meaning that the filehandle is a type of object that knows how to advance itself in anticipation of the next request. That means that to get back to the beginning of the file, you must either close the file with the close() method and reopen it, or use the seek() method of the filehandle (which we don't have time to go into -- Google is your friend!).
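
For the curious, here is a minimal sketch of the read position in action. It writes a small throwaway file first (in a temporary directory, so it won't touch your own pdb_head) and then reads it back; the file contents are a shortened stand-in for the real thing.

```python
import os
import tempfile

# Build a small throwaway file so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "pdb_head")
fh = open(path, "w")
fh.write("HEADER OXIDOREDUCTASE\nTITLE SULFITE REDUCTASE\n")
fh.close()

fh = open(path, "r")
first_read = fh.readline().strip()   # first line
second_read = fh.readline().strip()  # second line: the position has moved on
third_read = fh.readline()           # past the end: an empty string

fh.seek(0)                           # jump back to the very beginning
after_seek = fh.readline().strip()   # the first line again
fh.close()

print(first_read)
print(after_seek)

# Tidy up the throwaway file.
os.remove(path)
os.rmdir(tmp_dir)
```

Note that reading past the end of the file is not an error: readline() simply hands back an empty string.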

While potentially a bit odd now, this behavior will be essential when we discuss reading file contents with loops.... oh, speaking of...

Reading files in a loop


Certainly one of the most common contexts in which you'll encounter for loops is in working your way through a file. You can just put together two things we've already seen to get to where we need to be:


#!/usr/bin/env python
 
fh = open('pdb_head')
lines = fh.readlines()
for line in lines:
    fields = []
    fields.append(line[0:6].strip())
    fields.append(line[6:10].strip())
    print '0th field: %s, 1st field: %s' % (fields[0],fields[1])
fh.close()

$ ./hello.py
0th field: HEADER, 1st field: OXI
0th field: TITLE, 1st field: SULF
0th field: COMPND, 1st field: MOL
0th field: COMPND, 1st field: 2 M
0th field: COMPND, 1st field: 3 C

This is starting to get a little fancier, but we're only doing things you've seen before: read all the lines in a file into a list, then iterate over the list, looking for a couple of different parts of the line, stripping off leading and trailing whitespace, then printing the first and second elements of the resulting list.

We can simplify this one more step using the fact that filehandles are iterable, and know what's being asked of them. So we can replace this:

lines = fh.readlines()
for line in lines:

with:

for line in fh:

to exactly the same end.
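
Put together, a complete loop over the filehandle might look like the sketch below. To keep it self-contained it first writes a shortened, made-up copy of pdb_head into a temporary directory; in your own script you would just open the file you already have.

```python
import os
import tempfile

# Write a shortened stand-in for pdb_head so the sketch runs anywhere.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "pdb_head")
fh = open(path, "w")
fh.write("HEADER OXIDOREDUCTASE 08-JUL-97 1AOP \n")
fh.write("TITLE SULFITE REDUCTASE STRUCTURE \n")
fh.write("COMPND MOL_ID: 1; \n")
fh.close()

record_types = []
fh = open(path)
for line in fh:
    # Each pass through the loop gets the next line, newline included.
    record_types.append(line[0:6].strip())
fh.close()

print(record_types)

# Tidy up the throwaway file.
os.remove(path)
os.rmdir(tmp_dir)
```

Looping over the filehandle directly also avoids holding the whole file in memory at once, which matters once your files get large.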

Writing to Files


Writing output is sorta like doing the dishes. You just did all this work to cook up a fancy program and analyze some data, and the last thing you want to do is put all your answers away into clean little output files. Fortunately, we'll learn about pickle files later, but for now, we'd best make sure you know how to write output to a file.

The default behavior of open() is to open the file supplied in read mode. However, by giving an additional argument, you can either add lines to the bottom of the specified file, or overwrite it entirely:

#!/usr/bin/env python
 
filename = 'test_out'
fh = open(filename, 'w')
# 'w' flag means "write": the file is created, or emptied if it already exists
 
fh.write('Historically, this lesson was used as a medium to hurl insults between')
fh.write(' Matt and our former labmate Brant.\n')
# note that we have to add the '\n' if we want it at the end of the line;
# this is in contrast to the print command's behavior.
 
fh.close()
 
filename = 'test_out2'
fh = open(filename, 'a')
# 'a' flag means "append"
 
fh.write("Unfortunately, I have no beef with Aisha, so this section is a bit mundane.\n")
 
fh.close()

While this script doesn't print anything to the screen, if you run it a few times and look at the contents of test_out vs test_out2, the distinction between the 'w' and 'a' arguments to open() should become clear.

When reading files, the close() method is a good thing to keep in mind, but if you forget it, Python will close the file at the end of the program's execution. With writing files, however, Python may not make the changes you stipulate right away, so if you plan to evaluate the contents of the file you're writing in the same script (or for instance use that file for something else during the run of that script) it is wise to close the filehandle to ensure that all the write operations you've requested are performed.
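
If remembering close() sounds like a chore, Python (2.6 and later) offers the with statement, which closes the filehandle for you as soon as the indented block ends. This goes a little beyond what the lesson covers, so treat the sketch below as optional reading; the file name and contents are placeholders written to a temporary directory.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test_out")

# The filehandle is closed automatically when the with block ends,
# so the writes are guaranteed to be on disk afterwards.
with open(path, "w") as fh:
    fh.write("first line\n")
    fh.write("second line\n")

closed_after_block = fh.closed

# Safe to read the file back right away.
with open(path) as fh:
    contents = fh.read()

print(contents)

# Tidy up the throwaway file.
os.remove(path)
os.rmdir(tmp_dir)
```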

While Python has no writeline() method, the other two read methods are mirrored for writing to files. The first, write(), you've already seen. It takes a string, and puts it in a file. The only difference between this and writelines() is that writelines() takes a list of strings, and writes them all. (But beware! If you want those strings to appear on separate lines, they had best all end with a \n!)


#!/usr/bin/env python
 
filename = 'test_out'
fh = open(filename, 'w')  # 'w' flag means "write"
 
lines = ["Aisha is a friendly dudette.\n", "You'd better be one too.\n"]
lines.extend(["Or next year, she might use this space\n",
"to write a phish song about you.\n"])
 
fh.writelines(lines)
 
fh.close()

And check out the contents of test_out to see your many-line-writing machine in action!
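
You can also write to a file from inside a loop, calling write() once per item and adding the newline yourself each time. Here is a minimal sketch; the list of names and the greeting are invented, and the output goes to a temporary directory so nothing of yours is overwritten.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "greetings")

names = ["Aisha", "Sara", "James"]

fh = open(path, "w")
for name in names:
    # write() adds nothing on its own, so the '\n' is up to us.
    fh.write("Hello, %s!\n" % name)
fh.close()

# Read the file back to check what was written.
fh = open(path)
written = fh.readlines()
fh.close()

print(written)

# Tidy up the throwaway file.
os.remove(path)
os.rmdir(tmp_dir)
```

This is the natural choice when each line has to be built up or formatted as you go, rather than sitting ready-made in a list.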



Exercises


1. Pile of basic split drills:

  • Turn 'Humpty Dumpty sat on a wall' into ['Humpty','Dumpty','sat','on','a', 'wall']
  • Turn 'Humpty Dumpty had a great fall' into ['Humpty Dumpty had a ', ' fall']
  • Turn "All the King's horses" into ["All the King's hor",'e',''] (note: there is still an "s" at the end of "King's")
  • Turn "and all the King's men" into ['and a',''," the King's men"] (note: there is a space at the beginning of " the King's men")
  • Turn "couldn't put Humpty together again" into 'again' (using one line)

2. Pile of basic split, join, and replacement drills:

  • Turn ' Sara AishaEllahi Jeremy\n' into 'Diana\tDebbieThurtle\tChris'
  • Turn 'Sara,Aisha,James' into 'SARA\tAISHA\tJAMES\t'

3. Using the names of the instructors and TAs (Aisha, Sara, James, Mel, Courtney, Chris), write each possible pair of names to a file, separated by a line of hyphens (i.e. '-----------------')

4. Reopen the last output file, and read in the file, then write the lines back out (to a new file) in reverse order, in all capital letters.

5. Parse a FASTA file

Copy the text below into a text file and save it as seq.FASTA

>gene1
ATGAGACGTAGTGCCAGTAGCGCGATGTAGCG
ATGACGCATGACGCGCGACGCGCGAGTGAGCC
ATACGCACGCATTGGCA
>gene2
ATGTTCGACGCATACGACGCGCAGTACCAGCA
ATGACGCACCGGGATACACGACGCGGATTTTT
ACGCACCGAGATAGCATAAAAGACCATTAG
>gene3
TTATGGCACCCACTAGAGCCAGATTATTTTAAA

Write a script called read_fasta.py that will open this file, read the lines, and store the data as a dictionary keyed by gene with values of the sequence. Make sure the sequences are contiguous (i.e. contain no endline characters), and make sure to remove the > from the names of the genes.





Solutions


Alternate solutions in iPython Notebook are available here.

1. Pile of basic split drills:
  • Turn 'Humpty Dumpty sat on a wall' into ['Humpty','Dumpty','sat','on','a', 'wall']
  • Turn 'Humpty Dumpty had a great fall' into ['Humpty Dumpty had a ', ' fall']
  • Turn "All the King's horses" into ["All the King's hor",'e',''] (note: there is still an "s" at the end of "King's")
  • Turn "and all the King's men" into ['and a',''," the King's men"] (note: there is a space at the beginning of " the King's men")
  • Turn "couldn't put Humpty together again" into 'again' (using one line)

#1 Pile of basic split drills:
#Turn 'Humpty Dumpty sat on a wall' into ['Humpty','Dumpty','sat','on','a', 'wall']
 
s='Humpty Dumpty sat on a wall'
split_string=s.split()
print split_string
 
#Turn 'Humpty Dumpty had a great fall' into ['Humpty Dumpty had a ', ' fall']
 
s2='Humpty Dumpty had a great fall'
print s2
split2=s2.split('great')
print split2
 
#Turn "All the King's horses" into ["All the King's hor",'e',''] (note: there is still an "s" at the end of "King's")
 
s3="All the King's horses"
split3=s3.rsplit('s',2)
print split3
 
#Turn "and all the King's men" into ['and a',''," the King's men"] (note: there is a space at the beginning of " the King's men")
 
s4="and all the King's men"
split4=s4.split('l')
print split4
 
#Turn "couldn't put Humpty together again" into 'again' (using one line)
 
s5="couldn't put Humpty together again"
split5=s5.partition('again')[1]
print split5
 
 
2. Pile of basic split, join, and replacement drills:

  • Turn ' Sara AishaEllahi Jeremy\n' into 'Diana\tDebbieThurtle\tChris'
  • Turn 'Sara,Aisha,James' into 'SARA\tAISHA\tJAMES\t'

string2=' Sara AishaEllahi Jeremy\n'
print string2
 
# replace each piece in turn, taking the surrounding whitespace along with it
string2=string2.replace(' Sara ','Diana\t')
string2=string2.replace('AishaEllahi ','DebbieThurtle\t')
string2=string2.replace('Jeremy\n','Chris')
print string2
 
##of course, this replacement could also be done in one step, but that sort of feels like cheating, doesn't it?:
 
string2=string2.replace(string2,'Diana\tDebbieThurtle\tChris')
print string2
 
##For second part: Turn 'Sara,Aisha,James' into 'SARA\tAISHA\tJAMES\t'
 
names='Sara,Aisha,James'
print names
names=names.upper()
names=names.replace(',','\t')+'\t'
print names

3. Using the names of all six instructors and TAs (Aisha, Sara, James, Mel, Courtney, Chris), write each possible pair of names to a file, separated by a line of hyphens (i.e. '-----------------')

fh=open('names','w')
list_of_names=['Aisha','Sara','James','Mel','Courtney','Chris']
#Use a for loop within a for loop to pair one name with every other name in the list.
for name in list_of_names:
    for name2 in list_of_names:
        if name==name2:
            pass
        else:
            line=name+'-----------------'+name2
            fh.write(line+'\n')
 
fh.close()
 
 
### More complex (but more correct) solution - accounting for repeat pairs
 
fh=open('names','w')
list_of_names=['Aisha','Sara','James','Mel','Courtney','Chris']
 
# We create an empty set in which we are going to deposit every unique pair we have written
uniqpairs = set([])
 
for name in list_of_names:
    for name2 in list_of_names:
        if name==name2:
            pass
        else:
            pair=[name,name2]
            pair.sort()
            # Sets can only contain immutable elements, so we need to turn our list pair into a tuple
            pair = tuple(pair)
            if pair in uniqpairs:
                pass
            else:
                pair_string = "-------------------".join(pair)
                uniqpairs.add(pair)
                fh.write(pair_string+'\n')
 
fh.close()

4. Reopen the last output file, and read in the file, then write the lines back out (to a new file) in reverse order, in all capital letters.

fh=open('names','r')  #Open the previous file 'names' with read-only status
fh2=open('names_reverse','w') #Open a new file with the 'w' flag, in which you will write the reversed names
lines=fh.readlines() #Store the lines of fh into a list using readlines()
lines.reverse()  #Reverse the order of these lines using the reverse method of lists
for line in lines:   #Loop through the list and rewrite each item as a new line
    line=line.upper()
    fh2.write(line)  #Each line from readlines() still ends with its own '\n', so don't add another
 
fh.close()
fh2.close()

5. Parse a FASTA file

fh=open('seq.FASTA','r')  #Open the fasta file for reading
genes={}  #Create an empty dictionary that will be populated with 'genes' as keys and sequences as values
for line in fh:     #Parse each line of the fasta file by looping through the file
    line=line.strip()  #Strip to remove all whitespace and newline characters
    if line.startswith('>'):  #A leading '>' tells you that you're dealing with a gene name rather than a sequence
        gene_name=line[1:]   #Define a new variable, gene_name, that is equal to everything in the line except the '>'.
        genes[gene_name]=''  #Make gene_name a key in your dictionary, and use an empty string '' as a value placeholder
    elif line:  #Skip blank lines; anything else is sequence
        genes[gene_name]+=line  #Add the subsequent sequence to the value
fh.close()
 
print genes