Dictionaries

Motivation

Suppose we need to represent years and the total North American fossil fuel CO2 emissions for those years.

Question: How should we do this?

  • One option is to use parallel lists, in which the item at position i in the years list corresponds to the item at position i in the emissions list:

    years = [1799, 1800, 1801, 1802, 1902, 2002]

    emissions = [1, 70, 74, 79, 82, 1733297] # thousands of metric tons of carbon

    Question: How would operations on the data work? For example:

     (a) to add an entry, such as year 1950 and emissions 734914?
    
     We need to modify both lists.
     We could append to both, or keep both lists sorted (in which case we
     must find the right spot and insert there).
     Either way, both lists must be kept in sync.
    
     (b) to edit the emissions value for a particular year?
    
     We need to find the year in the years list and modify the
     corresponding item in the emissions list.

In general, storing the values in this format is not terribly convenient.

Notice that the lists don't explicitly represent the associations like (1799, 1).

  • A second option is to use a list of lists. For example,

    years_emissions = [[1799, 1], [1800, 70], [1801, 74], [1802, 79], [1902, 82], [2002, 1733297]]

    This is better, but it is still hard to look up a year, because we must search the list to find it.
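
The bookkeeping that both options require can be sketched as follows. This is only an illustration: the replacement value 75 and the helper name find_emissions are made up for this example, and the parallel lists are assumed to be kept sorted by year.

```python
import bisect

# Option 1: parallel lists, kept sorted by year.
years = [1799, 1800, 1801, 1802, 1902, 2002]
emissions = [1, 70, 74, 79, 82, 1733297]

# (a) Add an entry: find the position once, then insert into BOTH lists.
i = bisect.bisect_left(years, 1950)
years.insert(i, 1950)
emissions.insert(i, 734914)

# (b) Edit a value: search years, then use the same index in emissions.
emissions[years.index(1801)] = 75    # 75 is a made-up replacement value

# Option 2: a list of [year, emissions] pairs.
years_emissions = [[1799, 1], [1800, 70], [1801, 74],
                   [1802, 79], [1902, 82], [2002, 1733297]]

def find_emissions(pairs, year):
    """Return the emissions for year, or None if the year is absent."""
    for pair_year, amount in pairs:   # examines the pairs one at a time
        if pair_year == year:
            return amount
    return None

print(years)
print(emissions)
print(find_emissions(years_emissions, 1802))
```

In both cases we have to search a list before we can read or change a value, and in the first case we also have to remember to touch two lists every time.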

There is a better way: a new type of object called a dictionary, which is represented by Python's type dict.

Dictionary basics

A dictionary keeps track of associations for you. Let's consider the emissions example:

In [1]:
# Braces indicate that you are defining a dictionary.
emissions_by_year = {1799: 1, 1800: 70, 1801: 74, 1802: 79,
                     1902: 82, 2002: 1733297}        

# Look up the emissions for the given year
print(emissions_by_year[1801])

# Add another year to the dictionary
emissions_by_year[1950] = 734914
print(emissions_by_year[1950])        
74
734914

Dictionary entries have two parts: a key and a value. In our example, the key is the year and the value is the CO2 emissions.

Why is it called a key? Like a physical (or metaphorical) key, it provides a means of gaining access to something.

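Keys don't have to be numbers, but they do have to be hashable, which for Python's built-in types means immutable: numbers, strings, and tuples of immutable items can be keys, but lists and dicts cannot.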

In [2]:
d = {1: 5, 3: 45, 4: 10}
d["abc"] = "Hello!"
d[ [1, 2, 3] ] = 77        # error; the list [1, 2, 3] cannot
                           # be a key because it is mutable.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-23c0533d9190> in <module>()
      1 d = {1: 5, 3: 45, 4: 10}
      2 d["abc"] = "Hello!"
----> 3 d[ [1, 2, 3] ] = 77        # error; the list [1, 2, 3] cannot
      4                            # be a key because it is mutable.

TypeError: unhashable type: 'list'
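
When we need a key made of several parts, a tuple can stand in for the list. The sketch below uses made-up (year, region) keys, which are not part of the emissions data above, and catches the error so that the cell runs to completion.

```python
# Tuples of immutable items are hashable, so they can be keys.
# The (year, region) keys are hypothetical, for illustration only.
emissions_by_year_region = {}
emissions_by_year_region[(2002, "North America")] = 1733297
print(emissions_by_year_region[(2002, "North America")])

# A tuple that contains a list is not hashable, so using it as a key
# fails with a TypeError, just as a bare list does.
try:
    emissions_by_year_region[(2002, ["North America"])] = 0
    failed = False
except TypeError as error:
    failed = True
    print("TypeError:", error)
```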

And the associated values can be anything: any type, and mutable or not.

In [3]:
d = {}
d[5] = ("Diane", "978-6024", "BA", 4236)
d["weird"] = ["my", "you", "walrus"]
d["nested"] = {"diane": 4236, "paul": 4234}  # The values can be dicts.
print(d)
{'weird': ['my', 'you', 'walrus'], 'nested': {'diane': 4236, 'paul': 4234}, 5: ('Diane', '978-6024', 'BA', 4236)}

Dictionaries themselves are mutable.

In [4]:
print(id(d))
d["me"] = "you"  # Does NOT create a new dict.  It changes this one.
print(id(d))
4391625928
4391625928

Dictionary operations

In [5]:
print(emissions_by_year)
        
# extend (add a new key and its value)
emissions_by_year[2009] = 1000000   # Wishful thinking
print(emissions_by_year) 
        
# update (change the value associated with a key)
emissions_by_year[2009] = 10        # Old value is tossed out
print(emissions_by_year)            # Reports most recent values
        
# check for membership
1950 in emissions_by_year           # A dict operator (not a function
                                    # or method).  This one is binary.
{2002: 1733297, 1950: 734914, 1799: 1, 1800: 70, 1801: 74, 1802: 79, 1902: 82}
{2002: 1733297, 1950: 734914, 1799: 1, 1800: 70, 1801: 74, 1802: 79, 2009: 1000000, 1902: 82}
{2002: 1733297, 1950: 734914, 1799: 1, 1800: 70, 1801: 74, 1802: 79, 2009: 10, 1902: 82}
Out[5]:
True
In [6]:
# remove a key-value pair
del emissions_by_year[1950]         # del is a statement, not an operator.
1950 in emissions_by_year           # This is now false
Out[6]:
False
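
Looking up or deleting a key that is not in the dictionary raises a KeyError. Here is a small sketch of some ways to guard against that. It uses a shortened copy of the emissions data so that it does not depend on the cells above; get and pop are standard dict methods.

```python
emissions_by_year = {1799: 1, 1800: 70, 1801: 74}

# Indexing with a missing key raises KeyError.
try:
    emissions_by_year[1950]
except KeyError:
    print("1950 is not a key")

# Option 1: check for membership before deleting.
if 1950 in emissions_by_year:
    del emissions_by_year[1950]

# Option 2: get returns a default (None unless we supply one)
# instead of raising an error.
print(emissions_by_year.get(1950))
print(emissions_by_year.get(1950, 0))

# Option 3: pop removes the key and returns its value; with a
# default, it does not fail when the key is missing.
removed = emissions_by_year.pop(1801, None)
missing = emissions_by_year.pop(1950, None)
print(removed, missing)
```
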
In [7]:
# determine length (number of key-value pairs)
len(emissions_by_year)
Out[7]:
7
In [8]:
# Iterating over the dictionary
for key in emissions_by_year:
    print(key)
2002
1799
1800
1801
1802
2009
1902

Why did the keys come out in an unexpected order?

In Python 3.5 and earlier versions, the dictionary keys are unordered. The order in which the keys are traversed (when you loop through) is arbitrary: there is no guarantee that it will be the order in which they were added.

Silly analogy: A dict is like a filing assistant who is very efficient but keeps everything in a secret room. You have no idea how he organizes things, and you don't care -- as long as he can pull the file you need when you give him the key.

Note: Starting with Python 3.7, dictionaries are guaranteed to keep their keys in the order in which they were added (CPython 3.6 already behaved this way, but only as an implementation detail). However, order still plays no role in equality: two dictionaries are considered equal if they contain the same key/value pairs, regardless of order:

In [ ]:
d1 = {1: 'a', 2: 'b', 3: 'c'}
d2 = {3: 'c', 1: 'a', 2: 'b'}
d1 == d2
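
To see both facts side by side, here is a short sketch. It assumes Python 3.7 or later; on older versions the two lists of keys may come out in some other order.

```python
d1 = {1: 'a', 2: 'b', 3: 'c'}
d2 = {3: 'c', 1: 'a', 2: 'b'}

# Equality ignores order: the same key/value pairs are present.
print(d1 == d2)

# Assuming Python 3.7 or later, iteration follows insertion order,
# so the key sequences differ even though the dicts are equal.
print(list(d1))
print(list(d2))
```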

Dictionary methods

In [9]:
emissions_by_year.keys()
Out[9]:
dict_keys([2002, 1799, 1800, 1801, 1802, 2009, 1902])
In [10]:
emissions_by_year.values()
Out[10]:
dict_values([1733297, 1, 70, 74, 79, 10, 82])

The method items produces the (key, value) pairs:

In [11]:
emissions_by_year.items()
Out[11]:
dict_items([(2002, 1733297), (1799, 1), (1800, 70), (1801, 74), (1802, 79), (2009, 10), (1902, 82)])

These methods return view objects rather than lists. We can loop over a view directly, but to index, sort, or otherwise treat the data as a list, we convert it to type list. For example:

In [12]:
years = list(emissions_by_year.keys())
print(years)
[2002, 1799, 1800, 1801, 1802, 2009, 1902]
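
One reason the conversion matters: the object returned by keys stays connected to the dictionary, while a list made from it is a separate copy. A minimal sketch, using a shortened version of the emissions data (the variable names keys_view and keys_snapshot are made up for this example):

```python
emissions_by_year = {2002: 1733297, 1799: 1, 1902: 82}

keys_view = emissions_by_year.keys()        # stays tied to the dict
keys_snapshot = list(emissions_by_year)     # a separate list

emissions_by_year[1950] = 734914

print(1950 in keys_view)       # the view sees the new key
print(1950 in keys_snapshot)   # the list does not

# sorted accepts any iterable and always returns a new list.
print(sorted(emissions_by_year))
```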

Practice Exercise: working with dictionaries

  1. Create a variable doctor_to_patients that refers to an empty dictionary.
  2. Add an entry for 'Dr. Ngo' with 1200 patients.
  3. Add another entry for 'Dr. Singh' with 1400 patients.
  4. Add a third entry for 'Dr. Gray' with 1350 patients.
  5. Print the number of patients associated with 'Dr. Singh'.
  6. Change the number of patients associated with 'Dr. Singh' to 1401.
  7. Write an expression to get the number of key-value pairs in the dictionary.
  8. Write an expression to get the doctors.
  9. Write an expression to get the patient quantities.
  10. Write an expression to check whether 'Dr. Koch' is a key in the dictionary.
  11. Remove the key-value pair with 'Dr. Ngo' as the key.

Iterating through a dictionary

In [13]:
phone = {'555-7632': 'Paul', '555-9832': 'Andrew', '555-6677': 'Dan', 
         '555-9823': 'Michael', '555-6342' : 'Cathy', '555-7343' : 'Diane'}

(a) Going through the keys

In [14]:
# The proper way:
for key in phone:
    print(key)

# This is equivalent, but not considered good style:
#for key in phone.keys():
#    print(key)
555-7632
555-9823
555-9832
555-6342
555-6677
555-7343

(b) Going through the key-value pairs:

In [15]:
# This gives you a series of tuples.
for item in phone.items():
    print(item)
('555-7632', 'Paul')
('555-9823', 'Michael')
('555-9832', 'Andrew')
('555-6342', 'Cathy')
('555-6677', 'Dan')
('555-7343', 'Diane')
In [16]:
# You can pull the pieces of the tuple out as you go:
for (number, name) in phone.items():
    print("Name:", name, "; Phone Number:", number)
Name: Paul ; Phone Number: 555-7632
Name: Michael ; Phone Number: 555-9823
Name: Andrew ; Phone Number: 555-9832
Name: Cathy ; Phone Number: 555-6342
Name: Dan ; Phone Number: 555-6677
Name: Diane ; Phone Number: 555-7343
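
Two further patterns follow the same idea: going through only the values, and going through the keys in sorted order. The sketch below uses a shortened copy of the phone dictionary so that it runs on its own.

```python
phone = {'555-7632': 'Paul', '555-9832': 'Andrew', '555-6677': 'Dan'}

# Going through the values only.
names = []
for name in phone.values():
    names.append(name)
print(sorted(names))

# Going through the keys in sorted order, which gives the same
# result no matter how the dictionary stores its entries.
for number in sorted(phone):
    print(number, phone[number])
```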

Practice Exercise: looping over dictionaries

The following dictionary has brand name drugs as keys and generic drug names as values:

brand_to_generic = {'lipitor': 'atorvastatin',
                    'zithromax': 'azithromycin',
                    'amoxcil': 'amoxicillin',
                    'singulair': 'montelukast',
                    'nexium': 'esomeprazole',
                    'plavix': 'clopidogrel',
                    'abilify': 'ARIPiprazole'}

Using the dictionary above and for loops, complete the following tasks:

  1. Get a list of brand name drugs that start with the letter 'a'.
  2. Count the number of generic drugs that end with the letter 'n'.
  3. Get a list of brand name drugs in alphabetical order. (Hint: this can be solved with or without a for loop. Once you have solved it one way, try to solve it using a different approach.)

Inverting a dictionary

Here's a dictionary mapping phone numbers to names.
Some people have more than one phone number, of course.

In [20]:
phone_to_person = {'555-7632': 'Paul', '555-9832': 'Andrew', 
                   '555-6677': 'Dan', '555-9823': 'Michael',
                   '555-6342' : 'Cathy', '555-2222': 'Michael',
                   '555-7343' : 'Diane'}

Suppose we want to create a list of all of Michael's phone numbers:

In [21]:
# Method 1
michael = []
for key in phone_to_person:
    if phone_to_person[key] == 'Michael':
        michael.append(key)
print(michael)
['555-9823', '555-2222']

But what if we want to be able to do this for every person? Question: Is there some object we could create to make this easy? Answer: A dictionary!

  • The original dictionary takes us from numbers to names.
  • The new dictionary will take us in the reverse direction, from names to numbers.

In [22]:
new_phone = {}
for (number, name) in phone_to_person.items():
    if name in new_phone:
        new_phone[name].append(number)
    else:
        new_phone[name] = [number]
new_phone
Out[22]:
{'Andrew': ['555-9832'],
 'Cathy': ['555-6342'],
 'Dan': ['555-6677'],
 'Diane': ['555-7343'],
 'Michael': ['555-9823', '555-2222'],
 'Paul': ['555-7632']}

We call this an inverted dictionary.
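
The if/else in the loop above can be shortened. Two possible variants are sketched below: one uses the dict method setdefault, the other uses defaultdict from the standard collections module. The names person_to_phones and inverted are made up for this example; both variants should build the same mapping as the loop above.

```python
from collections import defaultdict

phone_to_person = {'555-7632': 'Paul', '555-9832': 'Andrew',
                   '555-6677': 'Dan', '555-9823': 'Michael',
                   '555-6342': 'Cathy', '555-2222': 'Michael',
                   '555-7343': 'Diane'}

# Variant 1: setdefault returns the list already stored for name,
# or stores and returns a new empty list if name is not yet a key.
person_to_phones = {}
for number, name in phone_to_person.items():
    person_to_phones.setdefault(name, []).append(number)

# Variant 2: defaultdict(list) creates the empty list automatically
# the first time a missing key is used.
inverted = defaultdict(list)
for number, name in phone_to_person.items():
    inverted[name].append(number)

print(sorted(person_to_phones['Michael']))
print(dict(inverted) == person_to_phones)
```

Note that inverting only works this simply because the values (the names) are strings and can therefore serve as keys in the new dictionary.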