Time-Saving Tips for Python
Caveat: “Time saving” is a loose term – especially for a large and/or growing codebase shared by many developers, you want to be careful that time saved by one person in writing code (e.g. with clever one-liners) doesn’t grossly compromise time spent by other people in reading it, maintaining it, or otherwise working with it. This answer addresses the question in terms of broader productivity gains across a software team, as opposed to tips and tricks to hack up throwaway Python code as quickly as possible.
General Tips
For idiomatic Python, get comfortable with list comprehensions, generators, decorators, and context managers.
- List comprehensions are often the most idiomatic way of creating and transforming lists.
- Generators are most often used for defining iterators cleanly, usually with less memory overhead than alternative solutions.
- Decorators are great for common setup/cleanup code, like authentication, caching, or timing. Some built-in decorators like @classmethod and @property are very useful when working with classes. See also: What are common uses of Python decorators?
- Context managers are also handy for setup/cleanup code, for example in opening and closing files or tags (Quora uses these to make sure tags in HTML generated from WebNode are always closed). Another cool use case is setting some state for the duration of a code block, like flags for bypassing cache or for pipelining networked instructions.
Familiarize yourself with all the methods of the most common data structures: lists, sets, dictionaries. Python has a lot of useful built-in functionality for working with these data structures, like slicing for lists or mathematical set operations with syntactic sugar for easy use on sets.
Run code snippets at an iPython shell. Take full advantage of the fact that Python has a great interactive shell – test out your code snippets quickly and fiddle with syntax or logic ‘til they work, inspect your objects and functions, time function calls, etc. Make sure to use iPython and not the standard Python shell, as iPython is much more powerful and has nice features like tab-completion and %history.
Use a linter like PyFlakes, PyLint, or PyChecker comparison here. Better yet, hook your linter into your editor so it checks as you go, similar to the code analysis and automatic error or warning highlighting that Java IDEs do.
Familiarize yourself with Python’s standard libraries, like urllib2
,
time
and datetime
, random
, itertools
, functools
, re
, collections
,
etc. See Also.
Use print
statements and pdb
to debug. print
is particularly effective
to use in Python since you don’t need to compile your code again to start
printing useful debug output. As for pdb
, just dump this wherever you need to
start investigating your running program: import pdb; pdb.set_trace();
and
step through/in, continue, or print values as you
please. See also.
Adopt strong naming conventions to make your code more self-documenting. For
example, if you have user ids and user objects in your codebase, always use
_user_id
to refer to the former and _user
to refer to the latter, so it’s
always clear whether you’re working with an int/long that is the id or an object
that is the full user object. (This is an example that is particularly salient
given Python’s lack of static type annotations.) And unless you are using _
for i18n, use _
to indicate values to be thrown away.
Use a Linter
Run pyflakes
and pep8
as part of your tests. This keeps your code from
getting weird. The autopep8
package has an -i
flag that will let you fix
many pep8
mistakes. Consider pylint
or something more configurable for
deeper control.
Package any and all library like things, and install them via pip
instead of
keeping them in a single repository. Use virtualenv and pth files to link to
favor “checked out” libraries if they exist over pip installed libraries. This
will let you edit your libraries in
place. You can turn off pyc creation -
these aren’t always beneficial.
Contribute to open source projects that have people writing code that you find easy to read and useful. You can learn a lot from them. Understand some python syntactical sugar: list comprehensions, generators, and generator comprehensions are all very helpful. Learn what methods produce iterators and when to use them (e.g. dictionaries, xrange). Get an idea of what the standard library provides. collections, for example, is a quite a treat.
Have people more familiar with different aspects of the language code review your code. I get a lot of mileage from working with people who know a lot of python, and I like it when they review my code.
Efficiency Tips
For generating a bunch of data that is similar but not the same, I recently ran across this beginner’s shortcut on Codeacademy using argument unpacking:
class Dog:
def __init__(self, name, age, breed, color):
self.name = name
self.gender = gender
self.breed = breed
self.color = color
# create the defaults
attrs = {"age": 7, "gender": "m", "breed": "mutt", "color": "motley"}
#create boy dogs
spuds = Dog("spuds", **attrs)
spike = Dog("spike", **attrs)
#modify sex and get all of the rest of the default data for free
attrs["gender"] = "f"
#create girl dogs
laika = Dog("laika", **attrs)
lassy = Dog("lassy", **attrs)
**attrs
unpacks into the list init is expecting.
Super easy to throw this concept into a couple of loops and generate a bunch of data files for testing various mechanisms that need valid (and invalid, as needed) bulk data.
I suspect there are more efficient ways of doing this, and look forward to reading about them in the comments.
Data Structures & Algorithms
Understand list comprehensions:
many for
loops can be written more clearly and succinctly with this language
construct. Or, more generally …
Understand generators since list comprehensions are essentially generators realized into a list. Generator expressions can be used in many contexts where the naive way would be to build a list in an additional code block. Once you understand generators, learn about the itertools library.
Understand data structure capabilities. My favorite underutilized data structures are sets and counters, but there are many more. Trying to conceptualize your problem in a mathematical domain will often make the solution simple to write because of Python’s famed psudo-code that runs aesthetic. That aesthetic frequently comes from data structure support for common operations.
Learn another language. Clojure and Haskell (as examples) will force you to write more succinct code in many cases, and that experience can translate to better Python code. Just make sure when you come back to Python, you follow Python’s social mores.
Here’s an example that tries to tie these together: find the number of different
words between two pieces of text, a_str
and b_str
.
a_words, b_words = [set(w.lower() for w in re.split('\W+', s)) for s in (a_str, b_str)]
len(a_words ^ b_words)
A generator expression is used in the construction of two sets. A list comprehension is used to run a transformation over the two inputs. Iterable data structures can be unpacked or destructured into multiple variables. The mathematical notion of set difference is used to find the set of different words.
Other Tips
A few things not yet mentioned.
enumerate()
Sometimes you want the items in a list and a counter. The following are all more or less equivalent.
# Iterating with a counter
for i in xrange(len(thelist)):
v = thelist[i]
...
# Keeping a counter
i = 0
for v in thelist:
...
i += 1
# Using enumerate
for i, v in enumerate(thelist):
...
defaultdict
Python’s collections
module has a bunch of very useful tools. It’s worth
exploring the package. One of my favorites and more used is the defaultdict
.
Let’s say we have a list of values and we want to count occurrences of the items
in the list. (Granted there’s a Counter
in collections now, but bear with
me.) These will work:
# Naive way
counter = {}
for item in thelist:
if item in counter:
counter[item] += 1
else:
counter[item] = 0
# Using get()
counter = {}
for item in thelist:
counter[item] = counter.get(item, 0) + 1
# Using defaultdict
counter = defaultdict(int)
for item in thelist:
counter[item] += 1
As you can see, it’s also handy to know about dict’s get()
method… and
defaultdict
can work with many types. Want a dict of lists? Use
defaultdict(list)
.
Embedding IPython
Using pdb
has been mentioned, but I usually just like to break in at a
particular spot with an IPython shell and go from there to debug. Remember
these two magic lines:
import IPython
IPython.embed()
That’s it! Just toss that in wherever you’d use pdb
.
Know your data types
Looking up an item in a list? It’s runtime is O(n).
Looking up a key in a dict? Runtime is O(1).
What if you have a big list that you’re going to do many lookups on in a loop and you don’t care about duplicates in the list?
# This can be slow
user_ids = get_user_ids_meeting_condition()
for t in transactions:
user_id = t['user_id']
if user_id not in user_ids:
continue
...
# Speeding things up with a set
user_ids = set(get_user_ids_meeting_condition())
for t in transactions:
user_id = t['user_id']
if user_id not in user_ids:
continue
...
Same code, but now user_ids
is a set
and lookups are O(1) instead of
O(n). Moral of the story? Don’t forget about the set!
Saving time later …
I assume time saving doesn’t mean saving time right now but saving time later, avoiding difficulties when debugging or refactoring code. In this context, time saving tips could be:
- Never use
import *
or relative imports - Avoid duplicate ascendant/descendant directory names in your tree.
- Avoid
locals()
andglobals()
- Try hard to have no circular dependencies, import only on top of modules. Exceptions must have explanatory comment.
- Inside the same project, import at the source: do not import x from b if the actual code is in a.
- Do not use default param values when it is not used.
- Use
_private
methods and functions by default for method you don’t use outside the module or the class. - Listen carefully to most of what pylint will tell you, tweak its config if necessary.
- Do not pass variables just in case they might be used in the future. Python is very ductile, adding something later is almost always easier than removing later something that might it might not be useless.
- Never ever write testable code without the tests.
Quick simple debugging:
import pdb
pdb.set_trace()
Tracing live code:
$ python -m trace --trace myModule.py
Start debugger on exception
If you’re running Python code via a console (eg unit tests), you might find the below recipe useful:
it automatically starts the debugger on an exception. Place
the code in the path LIB/python2.X/sitecustomize.py
for this to work.
import bdb
import sys
def info(type, value, tb):
if hasattr(sys, 'ps1') \
or not sys.stdin.isatty() \
or not sys.stdout.isatty() \
or not sys.stderr.isatty() \
or issubclass(type, bdb.BdbQuit) \
or issubclass(type, SyntaxError):
# we are in interactive mode or we don't have a tty-like
# device, so we call the default hook
sys.__excepthook__(type, value, tb)
else:
import traceback, pdb
# we are NOT in interactive mode, print the exception...
traceback.print_exception(type, value, tb)
print
# ...then start the debugger in post-mortem mode.
pdb.pm()
sys.excepthook = info
The version provided here is a little modified based on the comments on the recipe; it is available as a gist.
General Tips
Document your code with comments: This should be self explanatory but it will help dramatically. There is a fine line between doing it right and having way too many comments. What I like to do is, for every module I have a docstring at the top and for every function/method which is not extremely self explanatory (e.g. a property which returns an underlying variable) I will have a short description and describe the arguments and return value. This seems useless at times but it will make your code so much more maintainable and readable to a third party and yourself.
Use the same style throughout: The way I think about this is if I were to write a book, I would try to keep the same style all the way through. If I choose to use the oxford comma then I’ll continue using it. If I put two spaces after a period I’ll continue doing that. I find the same with python especially because white spaces matter and there tend to be a lack of parentheses and brackets to wrap statements. So I try to follow 80 characters per line and when I have line breaks I try to follow the same conventions, especially when passing a lot of arguments to functions. This way it makes it far easier to read when I come back to it.
Use some sort of IDE: Whether it is plugins to VIM or full fledged PyCharm, use something where jumping to different modules is simple. I use sublime text 2 and it’s pretty excellent.