Python Unveiled

June 6, 2023 · 12 min read · 2,480 words · python

Python is an interpreted, dynamically typed language. Its philosophy can be found in any interpreter by typing import this. These are not hard rules, but rather guidelines for writing clean, efficient, and 'Pythonic' code.


>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

(another fun one: import antigravity)

Pythonic Idioms: Embracing the Zen

List Comprehensions

Python's preferred way to filter lists,


def is_even(num):
  return num % 2 == 0

evens = []
for num in range(10):
  if is_even(num):
    evens.append(num)
print(evens)  # [0, 2, 4, 6, 8]

# equivalent to
evens = [num for num in range(10) if is_even(num)]
print(evens)  # [0, 2, 4, 6, 8]

and a functional way to create new lists.


def add_two(num):
  return num + 2

nums = []
for num in range(10):
  nums.append(add_two(num))
print(nums)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# equivalent to
nums = [add_two(num) for num in range(10)]
print(nums)  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

You can combine both approaches to map and filter in one fell swoop.


def add_two(num):
  return num + 2

def is_even(num):
  return num % 2 == 0

nums = [add_two(num) for num in range(10) if is_even(num)]
print(nums)  # [2, 4, 6, 8, 10]

Dictionary Comprehensions

Dictionary comprehensions let you build new dictionaries concisely. The syntax looks like this: {key: value for key, value in pairs}, where pairs is any iterable of two-item tuples.


words = ["foo", "comprehension", "bar", "dictionary", "baz"]

word_lengths = {word: len(word) for word in words}
print(word_lengths)  # {'foo': 3, 'comprehension': 13, 'bar': 3, 'dictionary': 10, 'baz': 3}

easy_words = {word: count for word, count in word_lengths.items() if count < 10}
print(easy_words)  # {'foo': 3, 'bar': 3, 'baz': 3}

Decorators

@staticmethod - Used when a method logically belongs in a class but doesn't operate on its instances.


class MathUtil:

    @staticmethod
    def add(x, y):
        return x + y


# Use the static method
result = MathUtil.add(5, 7)

@classmethod - We know that __init__ is the constructor of a class. Think of class methods as alternate constructors, among other things.


class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    @classmethod
    def from_string(cls, ingredient_string):
        ingredients = ingredient_string.split(',')
        return cls(ingredients)


# Create a Pizza the usual way
p1 = Pizza(['cheese', 'tomatoes'])

# Create a Pizza using the alternative constructor
p2 = Pizza.from_string('cheese,tomatoes')

print(p1.ingredients)  # Output: ['cheese', 'tomatoes']
print(p2.ingredients)  # Output: ['cheese', 'tomatoes']

@property gives us some syntax sugar. Note that we're decorating the function full_name, but we don't use parens when calling it!


class Person:
    def __init__(self, first_name, last_name):
        self.first_name = first_name
        self.last_name = last_name

    @property
    def full_name(self):
        return f"{self.first_name} {self.last_name}"


p = Person('John', 'Doe')
print(p.full_name)  # Output: John Doe

Decorators (custom)

Decorators are functions that wrap other functions, giving you access to the moment just before the wrapped function runs and the moment immediately after. This gives you the ability to create really neat things.

Good use cases: retrying when failures happen, memoization, logging, and more.


import functools


def print_decorator(func):

    @functools.wraps(func)  # <- another decorator. Preserves original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Before calling function {func.__name__}")
        result = func(*args, **kwargs)
        print(f"After calling function {func.__name__}")
        return result

    return wrapper


# Usage:
@print_decorator
def add(x, y):
    return x + y

add(1, 2)

# Outputs:
# Before calling function add
# After calling function add

Note that since a decorator is just a higher-order function, the @ syntax is functionally equivalent to the following. Python just gives us @ as shorthand.


add = print_decorator(add)
add(1, 2)
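
Here's a rough sketch of the "retrying when failures happen" use case. The retry decorator, its times, delay, and exceptions parameters, and the flaky function are all made up for illustration; a real project would probably add logging and backoff.

```python
import functools
import time


def retry(times=3, delay=0.0, exceptions=(Exception,)):
    """Hypothetical decorator factory: try func up to `times` times."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: let the last error propagate
                    time.sleep(delay)  # wait before the next attempt

        return wrapper

    return decorator


calls = {"count": 0}  # tracks how many times flaky() actually ran


@retry(times=3)
def flaky():
    """Illustrative function that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()
print(result, calls["count"])  # ok 3
```

Because retry takes arguments, it is a function that returns a decorator, which is why there are three levels of nesting instead of two.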

Dunder methods

Magic ("double underscore") methods defined on a class allow Python to treat its instances in special ways.

__repr__(self): Controls how an object is represented by repr() and in the interactive interpreter
__str__(self): Controls how an object is presented when the user invokes print() or str()
__add__(self, other): Controls how two objects interact with the + operator
__iter__(self): Controls what is returned as the object's iterator
__next__(self): Controls how next() produces the following item in the iteration

Here we're creating a custom iterable.


class Repeat:
    def __init__(self, value, times):
        self.value = value
        self.times = times
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.times:
            raise StopIteration
        self.count += 1
        return self.value


r = Repeat('Python', 5)
for val in r:
    print(val)

# Output:
# Python
# Python
# Python
# Python
# Python

Note: You should use itertools.repeat if you actually need repeat functionality.
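
The iterator above covers __iter__ and __next__. For the other three methods in the list, here is a minimal sketch using a made-up Vector class (the class and its fields are assumptions for illustration, not something from the standard library).

```python
class Vector:
    """Hypothetical 2D vector used to demonstrate dunder methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Developer-facing form, used by repr() and inside containers
        return f"Vector({self.x!r}, {self.y!r})"

    def __str__(self):
        # User-facing form, used by print() and str()
        return f"({self.x}, {self.y})"

    def __add__(self, other):
        # Returning NotImplemented lets Python raise a TypeError for us
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)


v = Vector(1, 2) + Vector(3, 4)
print(v)        # (4, 6)
print(repr(v))  # Vector(4, 6)
print([v])      # [Vector(4, 6)]  <- containers show the repr of their items
```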

Context Managers

Context managers are Python's solution for working with objects that require specific setup and teardown operations. Here's a timer, for example:


import time

class Timer:
    def __enter__(self):  # <- called when we enter the new indentation
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):  # <- called after we leave that block
        self.end = time.time()
        print(f"Execution time: {self.end - self.start} seconds")


with Timer():  # <- __enter__
    sum(range(10**7))  # Some time-consuming operation
    # other operations can go here...

    # __exit__() is called here

If you've ever typed with open(...), that's a context manager!


lines = []
with open("myfile.csv", "r") as f:  # file is opened in read mode, i.e. "r"
  lines = f.readlines()
  # other operations can go here.
  # file is closed once the end of the block is reached

# roughly equivalent to:
f = open("myfile.csv", "r")
try:
  lines = f.readlines()
finally:
  f.close()  # runs even if readlines() raises

Fun fact: we often use these to close db connections, too! That prevents us from accidentally leaving orphaned open connections.
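
As a sketch of the db connection case, here's one way it could look with the standard library's sqlite3 and contextlib.closing. The in-memory database and the users table are assumptions made up for the example. Note that closing() is used on purpose: a sqlite3 connection used directly in a with statement manages the transaction, but doesn't close the connection.

```python
import sqlite3
from contextlib import closing

# closing() calls conn.close() when the block exits, even on error
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE users (name TEXT)")  # illustrative table
    conn.execute("INSERT INTO users VALUES (?)", ("Alice",))
    rows = conn.execute("SELECT name FROM users").fetchall()

print(rows)  # [('Alice',)]

# The connection is closed now, so using it again raises an error
closed = False
try:
    conn.execute("SELECT 1")
except sqlite3.ProgrammingError:
    closed = True
print(closed)  # True
```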

Generators

You can think of iterables and iterators like a book and a bookmark, respectively. Iterators hold their location until instructed to advance. Generators take it a step further and chop the left side of the book off entirely. In other words, generators only care about what the current value is and how to produce the next one. This removes a lot of memory overhead.

In this example, pretend that we're iterating over a very large file (terabytes, even!). If we were to load that file into memory entirely, Python would most likely raise a MemoryError. We're able to prevent that by using generators.


def read_large_file(file_path):
    with open(file_path, "r") as file:
        for line in file:
            yield line.strip()  # remove surrounding whitespace, including the trailing newline


# Use the generator function
for line in read_large_file("/path/to/your/large/file.log"):
    # Process each line
    print(line)
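
To make the bookmark idea concrete, here's a small sketch with a made-up countdown generator. It shows that nothing runs until the first next() call, that the generator resumes where it left off, and that it can only be read through once.

```python
def countdown(start):
    """Illustrative generator: yields start, start - 1, ..., 1."""
    while start > 0:
        yield start  # pause here and hand one value to the caller
        start -= 1


gen = countdown(3)   # creates the generator; the body hasn't run yet
first = next(gen)    # runs up to the first yield
second = next(gen)   # resumes right after the previous yield
rest = list(gen)     # drains whatever is left
again = list(gen)    # the generator is exhausted now

print(first, second, rest, again)  # 3 2 [1] []
```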

Python's Standard Library: Hidden Gems

Python ships with 'batteries included.' This means that out of the box, Python provides us with a broad selection of libraries to use for many different tasks. 305 modules on my install, to be exact (the number varies by Python version, and sys.stdlib_module_names requires Python 3.10 or newer).


>>> import sys
>>> len(sys.stdlib_module_names)
305  # <- dang, that's a lot!

Dang, that's a lot! (jinx) Probably too much, to be honest. Good news: PEP 594 aims to remove "dead batteries", and Brett Cannon led a discussion at this year's Python Language Summit to answer the question "What is the standard library for?"

However, there are a few modules in the standard library that stand out!

Itertools (for efficient looping)

itertools.cycle(iterable) is used to infinitely repeat iteration over an iterable.


import itertools

colors = itertools.cycle(['red', 'green', 'blue'])
for _ in range(10):
    print(next(colors))

# Output:
# red
# green
# blue
# red
# green
# blue
# red
# ...

# you could also do this, but BEWARE that it is an infinite loop.
for color in itertools.cycle(['red', 'green', 'blue']):
  print(color)

itertools.chain(*iterables) chains multiple iterables together.


import itertools

for i in itertools.chain([1, 2, 3], ['a', 'b', 'c']):
    print(i)

# Output:
# 1
# 2
# 3
# a
# b
# c

itertools.permutations(iterable) iterates over every permutation of the iterable.


import itertools

word = "dog"
perms = itertools.permutations(word)
for perm in perms:
    print(''.join(perm))

# Outputs:
# dog
# dgo
# odg
# ogd
# gdo
# god

Functools (for operations on callable objects)

We've already seen functools.wraps. Here are some others:

functools.partial(func, *args, **kwargs) - Returns a new function with partial application of the given arguments.


from functools import partial

def multiply(x, y):
    return x * y

# create a new function that multiplies by 2
dbl = partial(multiply, 2)
print(dbl(4))   # Output: 8

functools.lru_cache(maxsize) - caches the result of function calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments.


from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


print(fib(36))  # Output: 14930352, but quickly
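
If you're curious how much work the cache saves, functions wrapped with lru_cache expose cache_info() and cache_clear(). Here's a quick sketch using the same fib function:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


fib(36)
info = fib.cache_info()
print(info.misses)  # 37 <- one real computation for each n in 0..36
print(info.hits)    # 34 <- calls answered straight from the cache

fib.cache_clear()   # empties the cache and resets the counters
print(fib.cache_info().currsize)  # 0
```

Without the cache, the same call would recurse tens of millions of times.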

Other neat tools in functools

functools.reduce (think javascript reduce)
functools.total_ordering
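
A brief sketch of both, in case they're new to you. The Version class is a made-up example: total_ordering only needs __eq__ plus one ordering method, and fills in the remaining comparisons.

```python
from functools import reduce, total_ordering

# reduce folds a sequence into one value: ((((1 * 1) * 2) * 3) * 4)
product = reduce(lambda acc, n: acc * n, [1, 2, 3, 4], 1)
print(product)  # 24


@total_ordering
class Version:
    """Hypothetical version number, compared as a (major, minor) tuple."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


# __gt__ and __le__ were never written; total_ordering derived them
newer = Version(3, 10) > Version(3, 9)
same_or_older = Version(3, 9) <= Version(3, 9)
print(newer, same_or_older)  # True True
```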

Collections (for working with container datatypes)

collections.defaultdict(callable) saves us from having to check if a key exists in a dictionary before working with it.


from collections import defaultdict

grades = [
    ('Anna', 'A'),
    ('Brad', 'B'),
    ('Carol', 'A'),
    ('Derek', 'C'),
    ('Ella', 'B'),
    ('Fred', 'A'),
    ('Grace', 'C'),
]

# Using defaultdict
students_by_grade = defaultdict(list)

for student, grade in grades:
    students_by_grade[grade].append(student)

print(students_by_grade)
# Output: defaultdict(<class 'list'>, {'A': ['Anna', 'Carol', 'Fred'], 'B': ['Brad', 'Ella'], 'C': ['Derek', 'Grace']})

collections.namedtuple(name: str, fields: list[str]) - A lightweight, memory-efficient alternative to a class for holding data.


from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

p = Point(1, 2)

print(p.x)  # Output: 1
print(p.y)  # Output: 2


# Python 3.6+ supports an alternative class syntax that supports type annotations
from typing import NamedTuple

class Point(NamedTuple):
  x: int
  y: int

p = Point(1, 2)

print(p.x)  # Output: 1
print(p.y)  # Output: 2

collections.Counter(iterable) - Used to easily count occurrences in an iterable.


from collections import Counter

orders = ['latte', 'espresso', 'cappuccino', 'latte', 'espresso', 'latte', 'cappuccino', 'cappuccino', 'latte']

counter = Counter(orders)

print(counter)
# Output: Counter({'latte': 4, 'cappuccino': 3, 'espresso': 2})

Other neat classes in collections

collections.deque
collections.OrderedDict
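
Here's a short sketch of what those two are good for. The page names and cache keys are placeholder data invented for the example.

```python
from collections import OrderedDict, deque

# deque with maxlen keeps only the most recent items
recent = deque(maxlen=3)
for page in ["home", "about", "blog", "contact"]:
    recent.append(page)
print(recent)  # deque(['about', 'blog', 'contact'], maxlen=3)

# deque is also efficient at both ends, unlike list.insert(0, ...)
queue = deque(["a", "b", "c"])
queue.appendleft("z")
first = queue.popleft()
print(first)  # z

# OrderedDict can reorder keys and pop from either end
cache = OrderedDict(a=1, b=2, c=3)
cache.move_to_end("a")               # mark "a" as most recently used
print(list(cache))                   # ['b', 'c', 'a']
oldest = cache.popitem(last=False)   # remove from the front
print(oldest)                        # ('b', 2)
```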

Python Gotchas: Avoiding the Snakebite

Mutable default arguments: the trap in function definitions

Consider this snippet.


def update_profile(request_data, profile_data={"name": "Guest", "email": "guest@example.com"}):
    profile_data.update(request_data)
    return profile_data

print(update_profile({"name": "Alice"}))
# Output: {'name': 'Alice', 'email': 'guest@example.com'}

print(update_profile({"email": "bob@example.com"}))
# Output: {'name': 'Alice', 'email': 'bob@example.com'}

What happened here? If Alice and Bob were two different users, we wouldn't want Bob's profile to have Alice's name. In Python, default arguments (profile_data in this case) are evaluated once when the function is defined, not each time the function is called. This means that if you use a mutable default argument and mutate it, you will mutate that object for all future calls to the function as well.

A better approach would be to use None as the default value, and create a fresh default dictionary in the function body if None was provided:


def update_profile(request_data, profile_data=None):
    if profile_data is None:
        profile_data = {"name": "Guest", "email": "guest@example.com"}
    profile_data.update(request_data)
    return profile_data

print(update_profile({"name": "Alice"}))
# Output: {'name': 'Alice', 'email': 'guest@example.com'}

print(update_profile({"email": "bob@example.com"}))
# Output: {'name': 'Guest', 'email': 'bob@example.com'}
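
You can watch the "evaluated once" behavior directly, because a function stores its default values on its __defaults__ attribute. A small sketch with a made-up append_item function:

```python
def append_item(item, bucket=[]):  # the default list is created once, right here
    bucket.append(item)
    return bucket


print(append_item.__defaults__)  # ([],)

append_item("a")
append_item("b")

# Both calls mutated the very same list stored on the function object
print(append_item.__defaults__)  # (['a', 'b'],)
```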

Pitfalls of using == and "is" interchangeably

== checks for equality in value: This operator compares the values of two objects and returns True if they are equal.


print(10 == 10.0)  # True, although one is int and the other is float

is checks for identity: This operator checks whether two variables point to the same object in memory, not if their values are equal. This can be thought of as checking if they are exactly the same object.


a = [1, 2, 3]
b = a
print(b is a)  # True, because b and a point to the same list object in memory

c = [1, 2, 3]
print(a is c)  # False, because c points to a new list object in memory, even though its value is equal to a's

The pitfall comes when you assume is checks for value equality, or vice versa.

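For instance, due to CPython's optimization strategies (caching small integers and interning some strings), equal values sometimes point to the same object in memory, making is return True unexpectedly. These are implementation details, and Python 3.8+ emits a SyntaxWarning when is is used with a literal. The results below are what you'd typically see typing each line into the CPython REPL:


print(10 is 10)  # True, due to CPython's small-integer cache
print("hello" is "hello")  # True, identical string literals are interned

# But with larger values or different strings:
a = 1000
print(a is 1000)  # False in the REPL, the cache doesn't apply here (may be True inside a script, where constants are shared)
print("hello " is "hello")  # False, the two strings are different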

Another example


a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True, because the lists have the same contents
print(a is b)  # False, because they're not the same object. They are completely different objects in memory.
print(a is c)  # True, because they're the same object. We instructed c to point to a's location in memory.

Last example. CPython automatically caches (i.e., creates once and reuses) the integers from -5 to 256. So these numbers will have the same id and exist at the same memory location. Here's an example that demonstrates this when each line is entered in the REPL:


# Small numbers are cached
a = 256
b = 256
print(a is b)  # Outputs: True

# Larger numbers are not cached
a = 257
b = 257
print(a is b)  # Outputs: False in the REPL (may be True inside a script, where the compiler reuses the constant)

In the last example we should clearly be using equality checks instead of identity.
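
Identity checks do have their place: comparing against singletons such as None, which is what PEP 8 recommends. Here's a sketch of the split; the describe function and the _MISSING sentinel are hypothetical names for illustration.

```python
_MISSING = object()  # hypothetical sentinel: a unique object nothing else can equal


def describe(value=_MISSING):
    # Identity is the right tool for singletons and sentinels
    if value is _MISSING:
        return "no argument"
    if value is None:
        return "explicit None"
    return f"value {value!r}"


print(describe())      # no argument
print(describe(None))  # explicit None
print(describe(257))   # value 257

# For ordinary values, compare with ==
a = int("257")  # built at runtime, so no literal is involved
b = int("257")
print(a == b)   # True, and that's the comparison we actually care about
```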

General Performance Advice

This is taken from Anthony Shaw's PyCon talk.

Loop invariants


def before():
  x = (1, 2, 3, 4)
  i = 6

  for j in range(10_000):
    print(len(x) * i + j)  # len(x) doesn't change in the context of this loop


def after():
  x = (1, 2, 3, 4)
  i = 6
  x_i = len(x) * i

  for j in range(10_000):
    print(x_i + j)

Missing Comprehensions

From our earlier example,


evens = []
for num in range(10):
  if is_even(num):
    evens.append(num)
print(evens)  # [0, 2, 4, 6, 8]

# equivalent to
evens = [num for num in range(10) if is_even(num)]

The first approach creates an empty list, loops over the range, and appends matching items one at a time. This is slower because each iteration has to look up and call evens.append, whereas Python has optimizations for list comprehensions. Not only is the second approach faster, but it's less code too.
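
If you'd like to measure the difference yourself, here's a rough sketch using timeit. The function names and loop sizes are arbitrary choices for the example, and the actual timings will vary by machine and Python version.

```python
import timeit


def is_even(num):
    return num % 2 == 0


def with_loop(n=1_000):
    evens = []
    for num in range(n):
        if is_even(num):
            evens.append(num)
    return evens


def with_comprehension(n=1_000):
    return [num for num in range(n) if is_even(num)]


# Both approaches must produce the same list before timing means anything
assert with_loop() == with_comprehension()

loop_time = timeit.timeit(with_loop, number=1_000)
comp_time = timeit.timeit(with_comprehension, number=1_000)
print(f"loop:          {loop_time:.3f}s")
print(f"comprehension: {comp_time:.3f}s")
```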

Other

There are many other optimizations you can consider, but remember:

Readability counts.


Authored by Anthony Fox on June 6, 2023
