Sunday, April 12, 2009

List Comprehension for filtering

In the previous blog post, I wrote about the list of top-level domains I use to check whether a domain looks valid for indexedbygoogle.com. The list is retrieved from http://data.iana.org/TLD/tlds-alpha-by-domain.txt, with many thanks to IANA. They seem to update the list on a regular basis, so it will do as a resource for now. Here is the bit of code used to download the file and turn it into a list. I want to exclude the first line, which is a comment (its first character is a #), and any empty lines.

import urllib2

url = 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt'

domain_file = urllib2.urlopen(url).read()
domain_list = domain_file.split('\n')
DOMAINS = [tld for tld in domain_list if not (tld.startswith('#') or tld == '')]

The interesting bit is in the last line. The if part of the list comprehension will filter out any blanks and any list items that start with a #.
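
To see the filter in isolation, here is a minimal sketch that runs the same comprehension over a small hard-coded string instead of the downloaded file. The name sample_text and its contents are made up for illustration; the real file is much longer.

```python
# Hypothetical sample standing in for the downloaded file contents.
sample_text = "# sample comment line\nAC\nAD\n\nAE\n"

domain_list = sample_text.split('\n')
# split('\n') leaves an empty string after the trailing newline,
# which is one reason the tld == '' check is needed.
DOMAINS = [tld for tld in domain_list if not (tld.startswith('#') or tld == '')]

print(DOMAINS)
```

Only the three non-comment, non-empty entries survive the filter.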

It does not make sense to make this part of the CGI app itself, since the code above would then download the file every time a user uses indexedbygoogle.com. A better alternative might be to pickle DOMAINS to a file and load it on demand, updating the contents of DOMAINS daily or weekly via a cron job.
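
A rough sketch of the pickle idea might look like the following. The names CACHE_PATH, save_domains and load_domains are hypothetical, the cache location in the temp directory is only an assumption for the example, and the short list stands in for the downloaded one.

```python
import os
import pickle
import tempfile

# Hypothetical cache location; a real deployment would pick a fixed path.
CACHE_PATH = os.path.join(tempfile.gettempdir(), 'tld_cache_example.pickle')

def save_domains(domains, path=CACHE_PATH):
    """Write the list to disk; this is the part a cron job would run."""
    with open(path, 'wb') as f:
        pickle.dump(domains, f)

def load_domains(path=CACHE_PATH):
    """Read the cached list back; this is the part the CGI app would run."""
    with open(path, 'rb') as f:
        return pickle.load(f)

save_domains(['AC', 'AD', 'AE'])  # stand-in for the freshly downloaded list
DOMAINS = load_domains()
print(DOMAINS)
```

The CGI app then only pays for a local file read, and the download happens on whatever schedule the cron job uses.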

Finally, here it is condensed into a one-liner, but keep in mind that this might not be the best way to write it, since it sacrifices readability and clarity for less code. Clarity should always trump brevity! :)

DOMAINS = [tld for tld in urllib2.urlopen('http://data.iana.org/TLD/tlds-alpha-by-domain.txt').read().split('\n') if not (tld.startswith('#') or tld == '')]


Thursday, October 02, 2008

List Comprehension in Python, an example

I'm coding up a little website where a user selects from a list of items (using check boxes) and submits the form. I'm using Django, btw (which is great). Because the list is generated and displayed dynamically, the name of each checkbox would be something like vid0, vid1, etc. The POST data, however, will not have keys for the boxes left unchecked, and I have no way to tell in advance which keys are in the POST dictionary.

So at first, I coded the following bit of code. Simple, straightforward, gets the job done.
feed = []
for i in range(max_results):
    feed.append(request.POST.get('vid' + str(i), ''))  # may append an empty string
    if feed[-1] == '':  # compare with ==, not is: identity checks on strings are unreliable
        del feed[-1]  # if the most recently appended item is an empty string, delete it
But then, I had a light bulb moment and coded this:
feed = [request.POST.get('vid' + str(i)) for i in range(max_results) if request.POST.get('vid' + str(i), '') != '']
One line of code now replaces 5! How awesome is that! Python rocks.

There might be even simpler and more elegant solutions out there. If so, please leave them in the comments. (Maybe something using a lambda?)
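
One possible simplification, sketched here with a plain dict standing in for request.POST: look each key up once, then keep only the values that are present and non-empty. The dict contents and the max_results value are made up for illustration.

```python
# Hypothetical stand-in for request.POST: only checked boxes appear as keys.
post = {'vid0': 'abc', 'vid2': 'xyz'}
max_results = 4

# Look each key up once; missing keys come back as None.
values = [post.get('vid' + str(i)) for i in range(max_results)]
# Keep only truthy values, which drops both None and empty strings.
feed = [v for v in values if v]

print(feed)
```

No lambda is needed for this: the built-in filter(None, values) also drops the falsy items, though the comprehension arguably reads more clearly.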

BTW, I am a newcomer to Python, having started coding with it just a few months ago.
