17. File IO

  • We know how to read input from a user

  • We know how to store data in variables and lists

  • We know how to manipulate data

  • The trouble is that, with large amounts of data, typing everything in through input is not practical

  • Fortunately, an easy way to address this is to read the data from a file

17.1. Text Files

  • Text files are a great way to store textual data

  • They typically have the file extension “.txt”, but the actual extension doesn’t really matter

  • Most of what we are about to see will work on many different file types too (not just text files)

17.1.1. Reading from a Text File

  • There are a few ways to open and read from a file, but the easiest is as follows

my_file = open("someFileName.txt", "r")
  • The above example opens up a file named "someFileName.txt" in read only mode ("r")

    • This assumes that the file being opened is in the current working directory

  • A reference to the file is stored in the variable my_file

Activity

  1. Create a text file somewhere on your computer (perhaps Desktop for ease).

  2. Upload the file to Colab.

  3. Open your file like in the above example, but with your proper file name.

  4. Try using the methods .readline() and .read().

  5. See if you can figure out how to re-read from the file after you already read the full contents.

  • Note that there are many more methods available beyond .readline() and .read(), but these will likely be the ones you use the most

    • read reads the entire contents of the file

    • readline reads a single line from the file

  • It is also important to .close() the file once you are done using it in Python
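As a rough illustration of how .readline(), .read(), and .close() fit together, here is a minimal sketch. The file name "example_notes.txt" and its contents are made up for this example, and the file is created first only so the sketch can run on its own.

```python
# Create a small file so the example is self-contained.
# (The name "example_notes.txt" is hypothetical, not part of the course files.)
setup_file = open("example_notes.txt", "w")
setup_file.write("first line\nsecond line\nthird line\n")
setup_file.close()

my_file = open("example_notes.txt", "r")

first = my_file.readline()   # reads up to and including the first newline
rest = my_file.read()        # reads everything that is left in the file
nothing = my_file.read()     # we are at the end of the file, so this is empty

my_file.close()

print(repr(first))    # 'first line\n'
print(repr(rest))     # 'second line\nthird line\n'
print(repr(nothing))  # ''
```

Notice that each read continues from where the previous one stopped, which is why the last .read() returns an empty string.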

17.1.2. Writing to a Text File

  • Writing to a text file is similarly simple

my_other_file = open("anotherFileName.txt", "w")
  • Unlike reading, however, the file does not need to exist

  • If it does not exist, Python will create a new file with the name "anotherFileName.txt"

  • If it already exists, opening it in "w" mode erases its old contents

  • The methods you will likely use most when writing to a file are .write(text) and .writelines(listOfText)

    • write will write the provided text to the file

    • writelines will write each string in a list of strings to the file, one after another. It does not add line breaks, so each string must end with "\n" if it should be its own line
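The following is a small sketch of .write() and .writelines() in use. The file name "example_output.txt" is made up for this example. Each string carries its own "\n" here because both methods write the text exactly as given.

```python
# The file name "example_output.txt" is hypothetical.
my_other_file = open("example_output.txt", "w")

my_other_file.write("Hello\n")
my_other_file.writelines(["one\n", "two\n", "three\n"])
my_other_file.writelines(["a", "b"])   # no "\n", so these end up as "ab"

my_other_file.close()

# Read the file back to see exactly what was written
check_file = open("example_output.txt", "r")
contents = check_file.read()
check_file.close()

print(repr(contents))  # 'Hello\none\ntwo\nthree\nab'
```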

Activity

  1. Open some file in write only mode ("w") in Python with a name of your choice.

  2. Use the .write() method to write contents to the file.

  3. Once you are done writing to the file, use the .close() method to close the file.

  4. Open the file you just created in some text editor and confirm that it matches what you wrote.

Warning

It is very important to .close() your files when you are done with them, especially when writing to a file. Because of how Python writes to files, the contents you write are not sent to the file right away. Instead, they go to something called a buffer, which is periodically written to the file. If you fail to .close() your file, there is a chance that the buffer is never fully written to the file before the program terminates. When you .close() the file, it flushes the buffer, meaning that anything left in the buffer is written to the file.
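One common way to avoid forgetting .close() is Python's with statement, which closes the file automatically when the indented block ends. This is an alternative to the explicit .close() calls used in these notes, shown only as a sketch. The file name "example_log.txt" is made up.

```python
# The file name "example_log.txt" is hypothetical.
with open("example_log.txt", "w") as log_file:
    log_file.write("saved safely\n")
# Leaving the with block closes the file, which also flushes the buffer

was_closed = log_file.closed   # True once the with block has ended

# Read the file back to confirm the text made it to the file
check_file = open("example_log.txt", "r")
saved_text = check_file.read()
check_file.close()

print(was_closed)        # True
print(repr(saved_text))  # 'saved safely\n'
```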

17.2. Comma Separated Values (CSV)

  • CSV files are a popular file format for tabular data

    • Data that can be stored in a table

    • Think of rows and columns of data, like in a spreadsheet

  • CSV files are stored in plain text, but values are separated with commas

    • You may come across CSV files that use tabs or spaces to separate data

  • They can be read in a simple text editor, or even in a spreadsheet program where it will format the data nicely

    • In fact, you can typically save data from a spreadsheet into a CSV file

  • An example of data in a CSV is as follows

name, height, weight, IQ
Subject 1, 170, 68, 100
Subject 2, 182, 80, 110
Subject 3, 155, 54, 105
  • The above example can be represented in a table as follows

CSV Viewed as a Table

name       height  weight  IQ
Subject 1  170     68      100
Subject 2  182     80      110
Subject 3  155     54      105

  • The first line in the example CSV is a header, which explains the values in each column

    • A header is optional: some CSV files have one and some don’t

17.2.1. Reading a CSV File

  • Python has a built-in library to help make reading CSV files simple

  • In fact, you have already seen this in the Starbucks Density assignment

def load_starbucks_data(file_name: str) -> list:

    import csv

    # Open the Starbucks file specified by file_name
    starbucks_file = open(file_name, "r")
    starbucks_file_reader = csv.reader(starbucks_file)

    # Create an empty list that the Starbucks location tuples will be added to
    starbucks_locations = []

    # For each row in the file, create a tuple of the lat/lon pair and add it to the list
    for row in starbucks_file_reader:
        location_tuple = (float(row[0]), float(row[1]))
        starbucks_locations.append(location_tuple)

    starbucks_file.close()
    return starbucks_locations
  • The line with the for loop (for row in starbucks_file_reader:) is the trick to reading data from the csv reader

  • When using the for loop, we read one row at a time from the file

    • The file is like a collection of rows

    • So, for each row in the collection of rows

  • Here, the variable row will store a reference to the row’s data in the form of a list, where each element in the list is from a different column
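To see what a row looks like, here is a minimal sketch that reads the subject data from the example CSV earlier in this section. The file name "example_subjects.csv" is made up, and the file is written first only so the sketch can run on its own. Because the example data has a space after each comma, the sketch passes skipinitialspace=True to csv.reader; it also uses next() to step past the header row. Both are choices made for this example, not something the course code requires.

```python
import csv

# Build the example CSV from this section (the file name is hypothetical)
setup_file = open("example_subjects.csv", "w")
setup_file.write("name, height, weight, IQ\n")
setup_file.write("Subject 1, 170, 68, 100\n")
setup_file.write("Subject 2, 182, 80, 110\n")
setup_file.write("Subject 3, 155, 54, 105\n")
setup_file.close()

subjects_file = open("example_subjects.csv", "r", newline="")
# skipinitialspace=True drops the space that follows each comma
subjects_reader = csv.reader(subjects_file, skipinitialspace=True)

header = next(subjects_reader)   # the first row is the header

heights = []
for row in subjects_reader:
    # row is a list of strings, one per column, e.g. ['Subject 1', '170', '68', '100']
    heights.append(float(row[1]))

subjects_file.close()

print(header)   # ['name', 'height', 'weight', 'IQ']
print(heights)  # [170.0, 182.0, 155.0]
```

Every value in a row arrives as a string, which is why the height is converted with float before being stored.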

Activity

  1. Download this csv file to your computer and then upload it to Colab.

  2. Write a function called load_airports() that loads this CSV file into a list and returns the list.

    • Use load_starbucks_data as a reference

  3. Play around with the data a little to get a feel for how the information is stored in the list.

Activity

Write a function get_name_from_code(airport_code, airport_list) that will return a string containing the full name of the airport with the corresponding airport_code. The parameter airport_list should be the list you loaded using load_airports().

If your function made use of a linear search, can you think of a way to alter get_name_from_code and load_airports such that you do not need a linear search?

17.2.2. Writing to a CSV File

  • If our program has large amounts of tabular data that we want to save to a file, we can write it to a CSV file

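import csv

# Create a file to write to (the csv module recommends newline="" when opening)
out_file = open("nameOfOutputFile.csv", "w", newline="")
csv_out_file = csv.writer(out_file)

# Write a row to the file
csv_out_file.writerow(["First cell", "Second cell", "Third cell"])

# Close the file so that everything in the buffer is written out
out_file.close()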
  • In the above example, notice that all the data for the row is contained within a list

    • This is similar to how the data is read in as a list

  • With a csv writer, there are two important methods for us to know

    • writerow, which was discussed above

    • writerows, which takes a list of lists to write a large block of data
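The difference between writerow and writerows can be sketched as follows. The file name "example_table.csv" and the table contents are made up for this example (the values mirror the subject data used earlier in this section).

```python
import csv

# A list of lists: each inner list is one row of the table
table = [
    ["name", "height", "weight", "IQ"],
    ["Subject 1", 170, 68, 100],
    ["Subject 2", 182, 80, 110],
]

# The file name "example_table.csv" is hypothetical
out_file = open("example_table.csv", "w", newline="")
csv_out_file = csv.writer(out_file)

csv_out_file.writerows(table)                        # writes all three rows
csv_out_file.writerow(["Subject 3", 155, 54, 105])   # writes one more row

out_file.close()

# Read the file back to check what was written
in_file = open("example_table.csv", "r", newline="")
rows_read = list(csv.reader(in_file))
in_file.close()

print(len(rows_read))  # 4
print(rows_read[3])    # ['Subject 3', '155', '54', '105']
```

Note that the numbers come back as strings when the file is read again, since a CSV file only stores text.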

17.3. For Next Class