There is a new release of AstronomyPictureOfTheDay available this morning. Version 2.0 is a substantial rewrite of the 1.1 version, but retains essentially the same functionality. The primary difference is that you no longer have to run a script to "personalize" it during installation. Now that it is distributed as an Application instead of an Automator workflow, you can just drag it to your Applications folder.
The source is still created with Automator, but some of the Finder actions for working with directories have been replaced by Shell Script actions to perform the same operations. The Finder actions are hard-coded to specific folder names, selected in the Automator UI when the action is configured. With a Shell Script action, I can use environment variables such as "$HOME" to make the action more flexible. Using variables also avoids the problem of having everything in the workflow tied to my own home directory, which meant the paths had to be modified before the workflow was usable by anyone else.
I could, of course, have written the entire program as a shell script, Python program, or whatever. But the point of building it with Automator in the first place was that it was easy. I suppose I am stretching the boundaries of where Automator is the easiest tool for this particular job, but at least now the hack is hidden inside the app instead of hanging out where everyone can see it in the installation instructions.
Code Interstices All the little things that happen between bouts of coding. Covering internet technologies, Python, Mac OS X, and open source.
Tuesday, July 31, 2007
Monday, July 30, 2007
PyMOTW: glob
Module: glob
Purpose: Use Unix shell rules to find filenames matching a pattern.
Python Version: 1.4
Description:
Even though the glob API is very simple, the module packs a lot of power. It is useful in any situation where your program needs to look for a list of files on the filesystem with names matching a pattern. If you need a list of filenames that all have a certain extension, prefix, or any common string in the middle, use glob instead of writing code to scan the directory contents yourself.
The pattern rules for glob are not regular expressions. Instead, they follow standard Unix path expansion rules. Shell variable names and tilde (~) are not expanded. There are only a few special characters: two different wildcards and character ranges. The pattern rules are applied to segments of the filename (stopping at the path separator, /), but paths in the pattern can be relative or absolute.
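Since glob leaves shell variables and tilde alone, a program that wants shell-like behavior has to expand them first. Here is a minimal sketch of one way to do that with os.path.expanduser() and os.path.expandvars(). It is written for Python 3, and the helper name expand_and_glob and the GLOB_DEMO_DIR variable are made up for the illustration.

```python
import glob
import os
import tempfile


def expand_and_glob(pattern):
    """Expand ~ and $VARIABLES first, then hand the result to glob."""
    expanded = os.path.expandvars(os.path.expanduser(pattern))
    return sorted(glob.glob(expanded))


# Build a throwaway directory with two files to search.
demo_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    open(os.path.join(demo_dir, name), 'w').close()

# GLOB_DEMO_DIR is a hypothetical variable used only for this example.
os.environ['GLOB_DEMO_DIR'] = demo_dir

# glob treats "$GLOB_DEMO_DIR" as a literal directory name, so nothing matches.
print(glob.glob('$GLOB_DEMO_DIR/*.txt'))

# After expansion the pattern points at the real directory.
print(expand_and_glob('$GLOB_DEMO_DIR/*.txt'))
```

The first print shows an empty list, and the second shows the two files in the temporary directory.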
Example Data
The examples below assume the following test files are present in the current working directory:
dir/
dir/file.txt
dir/file1.txt
dir/file2.txt
dir/filea.txt
dir/fileb.txt
dir/subdir/
dir/subdir/subfile.txt
Use the glob_maketestdata.py script in the sample code to create these files if you want to run the examples.
Wildcards
An asterisk (*) matches zero or more characters in a segment of a name. For example, dir/*.
import glob
print(glob.glob('dir/*'))
The pattern matches every pathname (file or directory) in the directory dir, without recursing further into subdirectories.
$ python glob_asterisk.py
['dir/file.txt', 'dir/file1.txt', 'dir/file2.txt',
'dir/filea.txt', 'dir/fileb.txt', 'dir/subdir']
To list files in a subdirectory, you must include the subdirectory in the pattern:
import glob
print('Named explicitly:')
print(glob.glob('dir/subdir/*'))
print('Named with wildcard:')
print(glob.glob('dir/*/*'))
The first case above lists the subdirectory name explicitly, while the second case depends on a wildcard to find the directory.
$ python glob_subdir.py
Named explicitly:
['dir/subdir/subfile.txt']
Named with wildcard:
['dir/subdir/subfile.txt']
The results, in this case, are the same. If there were another subdirectory, the wildcard would match both subdirectories and include the filenames from both.
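Because a wildcard never crosses a / on its own, finding matches at every depth takes a little extra work. One possible approach, sketched here for Python 3, is to walk the tree with os.walk() and filter each directory's filenames with fnmatch.filter(). The function name find_recursive is an assumption for this example, and it rebuilds part of the test data in a temporary directory rather than relying on glob_maketestdata.py.

```python
import fnmatch
import os
import tempfile


def find_recursive(top, pattern):
    """Return paths under top whose base name matches pattern."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Recreate a small part of the example data in a temporary directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'dir', 'subdir'))
for relative in ('dir/file.txt', 'dir/subdir/subfile.txt'):
    open(os.path.join(root, *relative.split('/')), 'w').close()

# Both the top-level file and the one in subdir are reported.
for path in find_recursive(os.path.join(root, 'dir'), '*.txt'):
    print(os.path.relpath(path, root))
```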
Single Character Wildcard
The other wildcard character supported is the question mark (?). It matches any single character in that position in the name. For example,
import glob
print(glob.glob('dir/file?.txt'))
This pattern matches all of the filenames which begin with "file", have one more character of any type, then end with ".txt".
$ python glob_question.py
['dir/file1.txt', 'dir/file2.txt',
'dir/filea.txt', 'dir/fileb.txt']
Character Ranges
When you need to match a specific character, use a character range instead of a question mark. For example, to find all of the files which have a digit in the name before the extension:
import glob
print(glob.glob('dir/*[0-9].*'))
The character range [0-9] matches any single digit. The range is ordered based on the character code for each letter/digit, and the dash indicates an unbroken range of sequential characters. The same range value could be written [0123456789], in this case.
$ python glob_charrange.py
['dir/file1.txt', 'dir/file2.txt']
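Ranges are not limited to digits, and a set can also be negated with a leading !. The sketch below (Python 3) tries both against the example filenames using the fnmatch module, which glob relies on for matching names, so no files need to exist to run it.

```python
import fnmatch

names = ['file.txt', 'file1.txt', 'file2.txt', 'filea.txt', 'fileb.txt']

# A range of lowercase letters instead of digits.
letters = fnmatch.filter(names, 'file[a-z].txt')
print(letters)

# A leading ! negates the set: exactly one character that is not a digit.
not_digits = fnmatch.filter(names, 'file[!0-9].txt')
print(not_digits)
```

Both patterns select filea.txt and fileb.txt here. Note that file.txt is not matched by the negated set, because the brackets still require one character in that position.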
References:
Python Module of the Week Home
Download Sample Code
Pattern Matching Notation, The Open Group
Sunday, July 22, 2007
PyMOTW: calendar
Module: calendar
Purpose: The calendar module implements classes for working with dates to manage year/month/week oriented values.
Python Version: 1.4, with updates in 2.5
Description:
The calendar module defines the Calendar class, which encapsulates calculations for values such as the dates of the weeks in a given month or year. In addition, the TextCalendar and HTMLCalendar classes can produce pre-formatted output.
Formatting Examples:
A very simple example which produces formatted text output for this month using TextCalendar might use the prmonth() method.
import calendar
c = calendar.TextCalendar(calendar.SUNDAY)
c.prmonth(2007, 7)
I told TextCalendar to start weeks on Sunday, following the American convention. The default is to start on Monday, according to the European convention.
The output looks like:
$ python PyMOTW/calendar/calendar_textcalendar.py
     July 2007
Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7
 8  9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31
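To see what the first-weekday argument changes, it can help to compare the two conventions side by side. This is a small sketch for Python 3 using formatweekheader() and monthdayscalendar(); the variable names are only for illustration.

```python
import calendar

default_cal = calendar.TextCalendar()                # weeks start on Monday
sunday_cal = calendar.TextCalendar(calendar.SUNDAY)  # weeks start on Sunday

# The header row follows the first weekday of each calendar.
print(default_cal.formatweekheader(2))
print(sunday_cal.formatweekheader(2))

# First week of July 2007 under each convention (0 = outside the month).
monday_first = default_cal.monthdayscalendar(2007, 7)[0]
sunday_first = sunday_cal.monthdayscalendar(2007, 7)[0]
print(monday_first)   # [0, 0, 0, 0, 0, 0, 1]
print(sunday_first)   # [1, 2, 3, 4, 5, 6, 7]
```

July 1, 2007 was a Sunday, so with Monday-first weeks it lands alone at the end of the first row, while with Sunday-first weeks it starts a full row.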
The HTML output for the same time period is slightly different, since there is no prmonth() method:
import calendar
c = calendar.HTMLCalendar(calendar.SUNDAY)
print(c.formatmonth(2007, 7))
The rendered output looks roughly the same.
| July 2007 | ||||||
|---|---|---|---|---|---|---|
| Sun | Mon | Tue | Wed | Thu | Fri | Sat |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| 15 | 16 | 17 | 18 | 19 | 20 | 21 |
| 22 | 23 | 24 | 25 | 26 | 27 | 28 |
| 29 | 30 | 31 | ||||
You can see, though, that each table cell has a class attribute corresponding to the day of the week.
<table border="0" cellpadding="0" cellspacing="0" class="month">
<tr><th colspan="7" class="month">July 2007</th></tr>
<tr><th class="sun">Sun</th><th class="mon">Mon</th><th class="tue">Tue</th><th class="wed">Wed</th><th class="thu">Thu</th><th class="fri">Fri</th><th class="sat">Sat</th></tr>
<tr><td class="sun">1</td><td class="mon">2</td><td class="tue">3</td><td class="wed">4</td><td class="thu">5</td><td class="fri">6</td><td class="sat">7</td></tr>
<tr><td class="sun">8</td><td class="mon">9</td><td class="tue">10</td><td class="wed">11</td><td class="thu">12</td><td class="fri">13</td><td class="sat">14</td></tr>
<tr><td class="sun">15</td><td class="mon">16</td><td class="tue">17</td><td class="wed">18</td><td class="thu">19</td><td class="fri">20</td><td class="sat">21</td></tr>
<tr><td class="sun">22</td><td class="mon">23</td><td class="tue">24</td><td class="wed">25</td><td class="thu">26</td><td class="fri">27</td><td class="sat">28</td></tr>
<tr><td class="sun">29</td><td class="mon">30</td><td class="tue">31</td><td class="noday"> </td><td class="noday"> </td><td class="noday"> </td><td class="noday"> </td></tr>
</table>
If you need to produce output in a format other than one of the available defaults, you can use the calendar module to calculate the dates and organize the values into week and month ranges, then iterate over the rest yourself. The weekheader(), monthcalendar(), and yeardays2calendar() methods of Calendar are especially useful for that sort of work.
Calling yeardays2calendar() produces a sequence of "month row" lists. Each list includes the months as another list of weeks. The weeks are lists of tuples made up of day number (1-31) and weekday number (0-6). Days which fall outside of the month have a day number of 0.
pprint.pprint(calendar.Calendar(calendar.SUNDAY).yeardays2calendar(2007, 2))
Calling yeardays2calendar(2007, 2) returns data for 2007, organized with 2 months per row.
$ python calendar_yeardays2calendar.py
[[[[(0, 6), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)],
[(7, 6), (8, 0), (9, 1), (10, 2), (11, 3), (12, 4), (13, 5)],
[(14, 6), (15, 0), (16, 1), (17, 2), (18, 3), (19, 4), (20, 5)],
[(21, 6), (22, 0), (23, 1), (24, 2), (25, 3), (26, 4), (27, 5)],
[(28, 6), (29, 0), (30, 1), (31, 2), (0, 3), (0, 4), (0, 5)]],
[[(0, 6), (0, 0), (0, 1), (0, 2), (1, 3), (2, 4), (3, 5)],
[(4, 6), (5, 0), (6, 1), (7, 2), (8, 3), (9, 4), (10, 5)],
[(11, 6), (12, 0), (13, 1), (14, 2), (15, 3), (16, 4), (17, 5)],
[(18, 6), (19, 0), (20, 1), (21, 2), (22, 3), (23, 4), (24, 5)],
[(25, 6), (26, 0), (27, 1), (28, 2), (0, 3), (0, 4), (0, 5)]]],
[[[(0, 6), (0, 0), (0, 1), (0, 2), (1, 3), (2, 4), (3, 5)],
[(4, 6), (5, 0), (6, 1), (7, 2), (8, 3), (9, 4), (10, 5)],
[(11, 6), (12, 0), (13, 1), (14, 2), (15, 3), (16, 4), (17, 5)],
[(18, 6), (19, 0), (20, 1), (21, 2), (22, 3), (23, 4), (24, 5)],
[(25, 6), (26, 0), (27, 1), (28, 2), (29, 3), (30, 4), (31, 5)]],
[[(1, 6), (2, 0), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5)],
[(8, 6), (9, 0), (10, 1), (11, 2), (12, 3), (13, 4), (14, 5)],
[(15, 6), (16, 0), (17, 1), (18, 2), (19, 3), (20, 4), (21, 5)],
[(22, 6), (23, 0), (24, 1), (25, 2), (26, 3), (27, 4), (28, 5)],
[(29, 6), (30, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]]],
[[[(0, 6), (0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)],
[(6, 6), (7, 0), (8, 1), (9, 2), (10, 3), (11, 4), (12, 5)],
[(13, 6), (14, 0), (15, 1), (16, 2), (17, 3), (18, 4), (19, 5)],
[(20, 6), (21, 0), (22, 1), (23, 2), (24, 3), (25, 4), (26, 5)],
[(27, 6), (28, 0), (29, 1), (30, 2), (31, 3), (0, 4), (0, 5)]],
[[(0, 6), (0, 0), (0, 1), (0, 2), (0, 3), (1, 4), (2, 5)],
[(3, 6), (4, 0), (5, 1), (6, 2), (7, 3), (8, 4), (9, 5)],
[(10, 6), (11, 0), (12, 1), (13, 2), (14, 3), (15, 4), (16, 5)],
[(17, 6), (18, 0), (19, 1), (20, 2), (21, 3), (22, 4), (23, 5)],
[(24, 6), (25, 0), (26, 1), (27, 2), (28, 3), (29, 4), (30, 5)]]],
[[[(1, 6), (2, 0), (3, 1), (4, 2), (5, 3), (6, 4), (7, 5)],
[(8, 6), (9, 0), (10, 1), (11, 2), (12, 3), (13, 4), (14, 5)],
[(15, 6), (16, 0), (17, 1), (18, 2), (19, 3), (20, 4), (21, 5)],
[(22, 6), (23, 0), (24, 1), (25, 2), (26, 3), (27, 4), (28, 5)],
[(29, 6), (30, 0), (31, 1), (0, 2), (0, 3), (0, 4), (0, 5)]],
[[(0, 6), (0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5)],
[(5, 6), (6, 0), (7, 1), (8, 2), (9, 3), (10, 4), (11, 5)],
[(12, 6), (13, 0), (14, 1), (15, 2), (16, 3), (17, 4), (18, 5)],
[(19, 6), (20, 0), (21, 1), (22, 2), (23, 3), (24, 4), (25, 5)],
[(26, 6), (27, 0), (28, 1), (29, 2), (30, 3), (31, 4), (0, 5)]]],
[[[(0, 6), (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 5)],
[(2, 6), (3, 0), (4, 1), (5, 2), (6, 3), (7, 4), (8, 5)],
[(9, 6), (10, 0), (11, 1), (12, 2), (13, 3), (14, 4), (15, 5)],
[(16, 6), (17, 0), (18, 1), (19, 2), (20, 3), (21, 4), (22, 5)],
[(23, 6), (24, 0), (25, 1), (26, 2), (27, 3), (28, 4), (29, 5)],
[(30, 6), (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]],
[[(0, 6), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5)],
[(7, 6), (8, 0), (9, 1), (10, 2), (11, 3), (12, 4), (13, 5)],
[(14, 6), (15, 0), (16, 1), (17, 2), (18, 3), (19, 4), (20, 5)],
[(21, 6), (22, 0), (23, 1), (24, 2), (25, 3), (26, 4), (27, 5)],
[(28, 6), (29, 0), (30, 1), (31, 2), (0, 3), (0, 4), (0, 5)]]],
[[[(0, 6), (0, 0), (0, 1), (0, 2), (1, 3), (2, 4), (3, 5)],
[(4, 6), (5, 0), (6, 1), (7, 2), (8, 3), (9, 4), (10, 5)],
[(11, 6), (12, 0), (13, 1), (14, 2), (15, 3), (16, 4), (17, 5)],
[(18, 6), (19, 0), (20, 1), (21, 2), (22, 3), (23, 4), (24, 5)],
[(25, 6), (26, 0), (27, 1), (28, 2), (29, 3), (30, 4), (0, 5)]],
[[(0, 6), (0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 5)],
[(2, 6), (3, 0), (4, 1), (5, 2), (6, 3), (7, 4), (8, 5)],
[(9, 6), (10, 0), (11, 1), (12, 2), (13, 3), (14, 4), (15, 5)],
[(16, 6), (17, 0), (18, 1), (19, 2), (20, 3), (21, 4), (22, 5)],
[(23, 6), (24, 0), (25, 1), (26, 2), (27, 3), (28, 4), (29, 5)],
[(30, 6), (31, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5)]]]]
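The nesting is easier to follow when it is unpacked one level at a time. The following sketch (Python 3) walks the same structure: month rows, then months, then weeks, then (day, weekday) tuples, and counts the entries that are real days rather than padding.

```python
import calendar

cal = calendar.Calendar(calendar.SUNDAY)
month_rows = cal.yeardays2calendar(2007, 2)

print(len(month_rows))      # 6 rows of months
print(len(month_rows[0]))   # 2 months in each row

# Flatten the nesting: rows -> months -> weeks -> (day, weekday) tuples.
real_days = 0
for row in month_rows:
    for month in row:
        for week in month:
            for day, weekday in week:
                if day:     # day 0 is padding from a neighboring month
                    real_days += 1

print(real_days)            # 365, since 2007 is not a leap year
```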
This is equivalent to the data used by formatyear()
import calendar
print(calendar.TextCalendar(calendar.SUNDAY).formatyear(2007, 2, 1, 1, 2))
which for the same arguments produces output like:
$ python ./calendar_formatyear.py
                   2007

      January               February
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
    1  2  3  4  5  6               1  2  3
 7  8  9 10 11 12 13   4  5  6  7  8  9 10
14 15 16 17 18 19 20  11 12 13 14 15 16 17
21 22 23 24 25 26 27  18 19 20 21 22 23 24
28 29 30 31           25 26 27 28

       March                 April
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
             1  2  3   1  2  3  4  5  6  7
 4  5  6  7  8  9 10   8  9 10 11 12 13 14
11 12 13 14 15 16 17  15 16 17 18 19 20 21
18 19 20 21 22 23 24  22 23 24 25 26 27 28
25 26 27 28 29 30 31  29 30

        May                   June
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
       1  2  3  4  5                  1  2
 6  7  8  9 10 11 12   3  4  5  6  7  8  9
13 14 15 16 17 18 19  10 11 12 13 14 15 16
20 21 22 23 24 25 26  17 18 19 20 21 22 23
27 28 29 30 31        24 25 26 27 28 29 30

        July                 August
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7            1  2  3  4
 8  9 10 11 12 13 14   5  6  7  8  9 10 11
15 16 17 18 19 20 21  12 13 14 15 16 17 18
22 23 24 25 26 27 28  19 20 21 22 23 24 25
29 30 31              26 27 28 29 30 31

     September              October
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
                   1      1  2  3  4  5  6
 2  3  4  5  6  7  8   7  8  9 10 11 12 13
 9 10 11 12 13 14 15  14 15 16 17 18 19 20
16 17 18 19 20 21 22  21 22 23 24 25 26 27
23 24 25 26 27 28 29  28 29 30 31
30

      November              December
Su Mo Tu We Th Fr Sa  Su Mo Tu We Th Fr Sa
             1  2  3                     1
 4  5  6  7  8  9 10   2  3  4  5  6  7  8
11 12 13 14 15 16 17   9 10 11 12 13 14 15
18 19 20 21 22 23 24  16 17 18 19 20 21 22
25 26 27 28 29 30     23 24 25 26 27 28 29
                      30 31
If you want to format the output yourself for some reason (such as including links in HTML output), you will find the day_name, day_abbr, month_name, and month_abbr module attributes useful. They are automatically configured correctly for the current locale.
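As a rough illustration of formatting the output yourself, this Python 3 sketch builds a bare HTML table for one month with a link on every day. The function name month_links and the URL layout are hypothetical; only monthdayscalendar() and month_name come from the calendar module.

```python
import calendar


def month_links(year, month, url_pattern):
    """Return a simple HTML table for one month with a link per day."""
    cal = calendar.Calendar(calendar.SUNDAY)
    lines = ['<table>']
    lines.append('<tr><th colspan="7">%s %d</th></tr>'
                 % (calendar.month_name[month], year))
    for week in cal.monthdayscalendar(year, month):
        cells = []
        for day in week:
            if day == 0:
                # Padding for days that belong to a neighboring month.
                cells.append('<td></td>')
            else:
                url = url_pattern % (year, month, day)
                cells.append('<td><a href="%s">%d</a></td>' % (url, day))
        lines.append('<tr>%s</tr>' % ''.join(cells))
    lines.append('</table>')
    return '\n'.join(lines)


# '/archive/%04d/%02d/%02d/' is a made-up URL layout for the example.
html = month_links(2007, 7, '/archive/%04d/%02d/%02d/')
print(html)
```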
Calculation Examples:
Although the calendar module focuses mostly on printing full calendars in various formats, it also provides functions useful for working with dates in other ways, such as calculating dates for a recurring event. For example, the Python Atlanta User's Group meets the 2nd Thursday of every month. To calculate the dates for the meetings for a year, you could use the return value of monthcalendar().
pprint.pprint(calendar.monthcalendar(2007, 7))
Notice that some days are 0. Those are days of the week which overlap with the given month but which are part of another month.
$ python calendar_monthcalendar.py
[[0, 0, 0, 0, 0, 0, 1],
[2, 3, 4, 5, 6, 7, 8],
[9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22],
[23, 24, 25, 26, 27, 28, 29],
[30, 31, 0, 0, 0, 0, 0]]
Remember that, by default, the first day of the week is Monday. It is possible to change that by calling setfirstweekday(). On the other hand, since the calendar module includes constants for indexing into the date ranges returned by monthcalendar(), it is more convenient to skip that step in this case.
To calculate the PyATL meeting dates for 2007, assuming the second Thursday of every month, we can use the 0 values to tell us whether the Thursday of the first week is included in the month (or if the month starts, for example, on a Friday).
import calendar
# Show every month
for month in range(1, 13):
    # Compute the dates for each week which overlaps the month
    c = calendar.monthcalendar(2007, month)
    first_week = c[0]
    second_week = c[1]
    third_week = c[2]
    # If there is a Thursday in the first week, the second Thursday
    # is in the second week. Otherwise the second Thursday must
    # be in the third week.
    if first_week[calendar.THURSDAY]:
        meeting_date = second_week[calendar.THURSDAY]
    else:
        meeting_date = third_week[calendar.THURSDAY]
    print('%3s: %2s' % (month, meeting_date))
So the PyATL meeting schedule for the year is:
$ python calendar_secondthursday.py
1: 11
2: 8
3: 8
4: 12
5: 10
6: 14
7: 12
8: 9
9: 13
10: 11
11: 8
12: 13
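The same idea can be generalized to any "nth weekday of the month" rule. The helper below is a sketch for Python 3, not part of the calendar module; the name nth_weekday is an assumption. It relies on the default Monday-first week so that the weekday constants can be used directly as list indexes.

```python
import calendar


def nth_weekday(year, month, weekday, n):
    """Return the day of the month for the nth given weekday (n starts at 1).

    Assumes the module default of Monday as the first day of the week,
    so constants such as calendar.THURSDAY index straight into each week.
    """
    weeks = calendar.monthcalendar(year, month)
    # Keep only the weeks in which that weekday belongs to this month.
    days = [week[weekday] for week in weeks if week[weekday] != 0]
    return days[n - 1]


schedule = [nth_weekday(2007, month, calendar.THURSDAY, 2)
            for month in range(1, 13)]
for month, day in enumerate(schedule, 1):
    print('%3s: %2s' % (month, day))
```

For the second Thursday of each month in 2007 this produces the same dates as the schedule listed above.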
References:
Python Module of the Week Home
Download Sample Code
Saturday, July 21, 2007
Converting podcasts to regular tracks in iTunes
I have spent the better part of the morning trying to work out how to convert podcasts to "regular" tracks in iTunes, so they would show up in shuffle, etc. Mostly this was for my collection of Jonathan Coulton "Thing a Week" episodes, but it would be useful for anything you wanted to move out of your podcast list into the main audio portion of the library. I suppose the reason it took so long to find the solution is I started by searching for it on Google instead of just looking through the iTunes menu options, though as you will see the solution wasn't immediately obvious even once I had found it.
I knew that iTunes would let me change settings like "Remember playback position" and "Skip when shuffling" from the info dialog (Cmd-I), so I started there. The only setting that even mentioned podcast was the genre, and I already knew that was not what I needed to change. Some of the tracks already had "real" genres and only some were set to "Podcast".
I also knew there is a separate podcast flag available for queries in Smart Lists and AppleScript, since I used it to create my "Active Queue" podcast playlist with selections of episodes from various podcast series (it's like creating your own mix tape, but for talk radio). I tried to write a simple AppleScript to change the podcast flag of selected tracks to false. It turns out the flag is a read-only attribute. No amount of searching uncovered any way to change the setting using AppleScript.
The first useful-looking suggestions I ran into were to convert the ID3 tag format to an older version, then convert it back. Doing that erased most of the comments and other meta-data associated with the tracks, though, so I didn't like the results.
Next I found a few forum and blog posts that talked about an ITUNESPODCAST setting in the extended ID3 tags. They all mentioned a Windows program for removing or changing the flag, though. I examined a few of the files with Ned Batchelder's Python module id3reader, but didn't see anything that looked like "ITUNESPODCAST" in the output.
Going back to Google, I finally found a reference to converting the files to AAC using an option in the Advanced menu. That seemed like overkill, but at this point I was becoming fed up and just wanted to be done with the whole thing. I could always re-encode as MP3, after all. Well, iTunes didn't have a menu option to "Encode as AAC". It did have "Convert selection to MP3", which didn't make much sense to me. As far as I knew, the tracks were already MP3 files. But lo and behold, selecting that menu option did enable them in the iTunes Music Library. It made copies of all of the tracks as it converted them, so I could even delete the podcast subscription.
So, if you want to add podcast episodes you have already downloaded to your music library and turn off the podcast flag, select the track and choose Advanced->Convert selection to MP3.
Wednesday, July 18, 2007
Unexpectedly broken, and fixed: svnbackup
Yesterday Pierre Lemay sent me one of the clearest bug reports I've seen in quite a while, and a patch to fix the problem. He was having trouble with svnbackup duplicating changesets in the dump files. It turns out every changeset that appeared on a "boundary" (at the end of one dump file and the beginning of the next) was included in both dumps. Oops.
When I tested the script, I was able to recover the repository without any trouble. I didn't check 2 cases that Pierre encountered. First, the changeset revision numbers did not stay consistent. When the changeset was duplicated, that threw off all of the subsequent changeset ids by 1 for each duplicate. That in itself is only annoying. The more troubling problem is when a duplicate changeset includes a delete operation. The second delete would fail while restoring, which prevented Pierre from importing the rest of the backup.
In his email describing all of that, he gave me great details about how he had tested, the specific scenario that caused the problem, and then provided the fix!
So, if you are using version 1.0, go on over and download version 1.1 with Pierre's fixes.
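To make the boundary problem concrete, here is a small Python 3 sketch of splitting a revision range into chunks whose boundaries never overlap. It is not the code from svnbackup (which is a shell script) or from Pierre's patch; the function name and the chunk size are assumptions for illustration.

```python
def chunk_revisions(first, last, chunk_size):
    """Split revisions first..last into inclusive, non-overlapping ranges."""
    ranges = []
    start = first
    while start <= last:
        end = min(start + chunk_size - 1, last)
        ranges.append((start, end))
        # Resume one past the end. Starting the next chunk at `end`
        # would put that changeset in two dump files.
        start = end + 1
    return ranges


print(chunk_revisions(0, 9, 4))   # [(0, 3), (4, 7), (8, 9)]
```

Each revision shows up in exactly one range, so restoring the chunks in order replays every changeset once.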
Sunday, July 15, 2007
Unexpectedly popular: svnbackup
My svnbackup script is the 2nd most popular page on my site, after the PyMOTW home page, and search terms such as "svn backup" and "svn backup script" regularly appear at the top of the list of sources of traffic to my site. The link to svnbackup doesn't appear on the first page of Google's search results, a sign I take to mean that this problem isn't well understood or solved (otherwise, why would so many people page through the search results to find it?).
Requirements
I created svnbackup to manage off-site backups of my svn repository and the repository we run at work. The requirements were pretty basic:
1. Incremental backups.
2. Easy to restore.
3. Safe if the backups were running during a transaction.
Both repositories use FSFS to store the repository contents, so according to some reports it would be safe to simply rsync (or otherwise back up) the raw repository data as long as no transaction was in progress (or possibly even if one was). It turns out not to be so difficult to do a safer backup, though, so that's what I went with.
Solution
The obvious solution is to use "svnadmin dump", to extract transaction information from the repository. svnbackup.sh is a wrapper around svnadmin to produce reasonably-sized chunks for the backups. The only real problems I had to solve were how to track what had been backed up and how to move the backup output offsite.
Tracking the last revision number which had been backed up is easy using a simple text file on the svn server. If the information is lost, the worst thing that happens is the next run backs up the entire repository again. That can be time consuming, but is not destructive. Copying the files offsite is handled via scp.
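The bookkeeping described here can be sketched in a few lines. This Python 3 example is only an illustration of the idea, not the contents of svnbackup.sh: the function names, the state file name, and the repository path are all hypothetical, and it only builds the svnadmin command instead of running it. The --incremental and -r options are standard svnadmin dump options.

```python
import os
import tempfile


def read_last_revision(state_path):
    """Return the last revision recorded, or -1 if nothing is recorded."""
    try:
        with open(state_path) as f:
            return int(f.read().strip())
    except (IOError, ValueError):
        # A missing or damaged state file just means "back up everything".
        return -1


def dump_command(repo_path, last_backed_up, youngest):
    """Build an svnadmin command for the revisions not yet backed up."""
    first = last_backed_up + 1
    if first > youngest:
        return None   # nothing new to dump
    return ['svnadmin', 'dump', repo_path,
            '--incremental', '-r', '%d:%d' % (first, youngest)]


# Hypothetical state file location and repository path.
state_path = os.path.join(tempfile.mkdtemp(), 'last_revision.txt')

# No state yet, so the whole repository is dumped.
first_run = dump_command('/path/to/repo', read_last_revision(state_path), 120)
print(first_run)

# Record the newest revision backed up, then plan the next run.
with open(state_path, 'w') as f:
    f.write('120\n')
second_run = dump_command('/path/to/repo', read_last_revision(state_path), 125)
print(second_run)
```

Losing the state file only costs time, as noted above: the next run falls back to revision 0 and dumps everything again.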
Alternatives
Other alternatives have more or different options. I like Python, and obviously use it a lot, but I'm not sure I would have used it as a shell script replacement as the folks at collab.net did. On the other hand, I didn't care that my tool doesn't run on Windows (though it might, with Cygwin) and they do. If I had found their tool when I needed it, I probably would not have written my own, since the features are largely the same. Their off-site support uses ftp, not scp, but it looks like it would be straightforward to add the scp support.
PyMOTW: getpass
Module: getpass
Purpose: Prompt the user for a value, usually a password, without echoing what they type to the console.
Python Version: 1.5.2
Description:
Many programs which interact with the user via the terminal need to ask the user for password values without showing what the user types on the screen. The getpass module provides a portable way to handle such password prompts securely.
Example:
The getpass() function prints a prompt then reads input from the user until they press return. The input is passed back as a string to the caller.
import getpass
p = getpass.getpass()
print 'You entered:', p
The default prompt, if none is specified by the caller, is "Password:".
$ python getpass_defaults.py
Password:
You entered: sekret
Of course the prompt can be anything your program needs.
import getpass
p = getpass.getpass(prompt='What is your favorite color? ')
if p.lower() == 'blue':
    print 'Right. Off you go.'
else:
    print 'Auuuuugh!'
I don't recommend such an insecure authentication scheme, but it illustrates the point.
$ python getpass_prompt.py
What is your favorite color?
Right. Off you go.
$ python getpass_prompt.py
What is your favorite color?
Auuuuugh!
By default, getpass() uses stdout to print the prompt string. For a program which may produce useful output on sys.stdout, it is useful to send the prompt to another stream such as sys.stderr.
import getpass
import sys
p = getpass.getpass(stream=sys.stderr)
print 'You entered:', p
This way standard output can be redirected (to a pipe or file) without seeing the password prompt. The value entered by the user is still not echoed back to the screen.
$ python getpass_stream.py >/dev/null
Password:
Using getpass Without a Terminal
Under Unix, getpass() always requires a tty it can control via termios, so echo can be disabled. This means values will not be read from a non-terminal stream redirected to standard input.
$ echo "sekret" | python getpass_defaults.py
Traceback (most recent call last):
  File "getpass_defaults.py", line 34, in <module>
    p = getpass.getpass()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/getpass.py", line 32, in unix_getpass
    old = termios.tcgetattr(fd) # a copy to save
termios.error: (25, 'Inappropriate ioctl for device')
It is up to the caller to detect when the input stream is not a tty and use an alternate method for reading in that case.
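import getpass
import os
import sys
if os.isatty(sys.stdin.fileno()):
    # Interactive terminal: getpass can turn off echo.
    p = getpass.getpass('Using getpass: ')
else:
    # Piped or redirected input: read it like any other stream.
    print 'Using readline'
    p = sys.stdin.readline().rstrip()
print 'Read: ', p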
With a tty:
$ python ./getpass_noterminal.py
Using getpass:
Read: sekret
Without a tty:
$ echo "sekret" | python ./getpass_noterminal.py
Using readline
Read: sekret
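The same check can be wrapped in a helper so the rest of a program does not need to care where the value comes from. This is only a sketch: read_secret() and its stdin argument are hypothetical names, and it uses the isatty() method of the stream itself rather than os.isatty() on the file descriptor, which also lets a StringIO object stand in for redirected input.

```python
import getpass
import io
import sys


def read_secret(prompt='Password: ', stdin=None):
    """Read a secret from a terminal if possible, else from a plain stream."""
    if stdin is None:
        stdin = sys.stdin
    if stdin.isatty():
        # Interactive use: let getpass turn off echo.
        return getpass.getpass(prompt)
    # Piped or redirected input: there is no echo to disable.
    return stdin.readline().rstrip('\n')


if __name__ == '__main__':
    # A StringIO object stands in for redirected input here.
    fake_input = io.StringIO(u'sekret\n')
    print('Read: %s' % read_secret(stdin=fake_input))
```

Only the trailing newline is stripped, so a password that ends in spaces is kept as typed.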
References:
Python Module of the Week Home
Download Sample Code
Sunday, July 8, 2007
PyMOTW: atexit
Module: atexit
Purpose: Register function(s) to be called when a program is closing down.
Python Version: 2.1.3 and later
Description:
The atexit module provides a simple interface to register functions to be called when a program closes down normally. The sys module also provides a hook, sys.exitfunc, but only one function can be registered there. The atexit registry can be used by multiple modules and libraries simultaneously.
Examples:
A simple example of registering a function via atexit.register() looks like:
import atexit
def all_done():
    print 'all_done()'
print 'Registering'
atexit.register(all_done)
print 'Registered'
Since the program doesn't do anything else, all_done() is called right away:
$ python atexit_simple.py
Registering
Registered
all_done()
It is also possible to register more than one function, and to pass arguments. That can be useful to cleanly disconnect from databases, remove temporary files, etc. Since it is possible to pass arguments to the registered functions, we don't even need to keep a separate list of things to clean up -- we can just register a clean up function more than once.
import atexit
def my_cleanup(name):
    print 'my_cleanup(%s)' % name
# Register the same function three times with different arguments.
atexit.register(my_cleanup, 'first')
atexit.register(my_cleanup, 'second')
atexit.register(my_cleanup, 'third')
Notice that the order in which the exit functions are called is the reverse of the order in which they are registered. This allows modules to be cleaned up in the reverse order from which they are imported (and therefore register their atexit functions), which should reduce dependencies.
$ python atexit_multiple.py
my_cleanup(third)
my_cleanup(second)
my_cleanup(first)
When are atexit functions not called?
The callbacks registered with atexit are not invoked if:
- the program dies because of a signal
- os._exit() is invoked directly
- a Python fatal error is detected (in the interpreter)
To illustrate a program being killed via a signal, we can modify one of the examples from the subprocess summary last week. There are 2 files involved, the parent and the child programs. The parent starts the child, pauses, then kills it:
import os
import signal
import subprocess
import time
proc = subprocess.Popen('atexit_signal_child.py')
print 'PARENT: Pausing before sending signal...'
time.sleep(1)
print 'PARENT: Signaling %s' % proc.pid
os.kill(proc.pid, signal.SIGTERM)
The child sets up an atexit callback, to prove that it is not called.
import atexit
import time
def not_called():
    print 'CHILD: atexit handler should not have been called'
print 'CHILD: Registering atexit handler'
atexit.register(not_called)
print 'CHILD: Pausing to wait for signal'
time.sleep(5)
When run, the output should look something like this:
$ python atexit_signal_parent.py
CHILD: Registering atexit handler
CHILD: Pausing to wait for signal
PARENT: Pausing before sending signal...
PARENT: Signaling 2038
Note that the child does not print the message embedded in not_called().
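If cleanup matters even when the program is terminated, one common workaround (not part of the examples here) is to install a SIGTERM handler that calls sys.exit(), which turns the signal into a normal exit so the atexit callbacks run. The sketch below assumes a Unix-like system and a Python new enough to have Popen.terminate(). The child source, the "ready" handshake, and the function name are all made up for illustration.

```python
import subprocess
import sys

CHILD_SOURCE = r'''
import atexit
import signal
import sys
import time

def cleanup():
    print('CHILD: atexit handler called')

def on_sigterm(signum, frame):
    # Raise SystemExit so the interpreter shuts down normally.
    sys.exit(0)

atexit.register(cleanup)
signal.signal(signal.SIGTERM, on_sigterm)
print('CHILD: ready')
sys.stdout.flush()
time.sleep(30)
'''


def run_and_terminate():
    """Start the child, wait until it is ready, send SIGTERM, return its output."""
    proc = subprocess.Popen([sys.executable, '-c', CHILD_SOURCE],
                            stdout=subprocess.PIPE)
    # Block until the child reports that its handler is installed.
    first_line = proc.stdout.readline()
    # On Unix, terminate() sends SIGTERM to the child.
    proc.terminate()
    rest = proc.communicate()[0]
    return (first_line + rest).decode()


if __name__ == '__main__':
    print(run_and_terminate())
```

The handshake is there so the signal is not sent before the handler exists; without it the child could still die the way the earlier example shows.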
Similarly, if a program bypasses the normal exit path it can avoid having the atexit callbacks invoked.
import atexit
import os
def not_called():
    print 'This should not be called'
print 'Registering'
atexit.register(not_called)
print 'Registered'
print 'Exiting...'
# os._exit() ends the process immediately, skipping normal interpreter cleanup.
os._exit(0)
Since we call os._exit() instead of exiting normally, the callback is not invoked.
$ python atexit_os_exit.py
Registering
Registered
Exiting...
If we had instead used sys.exit(), the callbacks would still have been called.
import atexit
import sys
def all_done():
    print 'all_done()'
print 'Registering'
atexit.register(all_done)
print 'Registered'
print 'Exiting...'
sys.exit()
$ python atexit_sys_exit.py
Registering
Registered
Exiting...
all_done()
Simulating a fatal error in the Python interpreter is left as an exercise to the reader. :-)
Exceptions in atexit Callbacks
Tracebacks for exceptions raised in atexit callbacks are printed to the console and the last exception raised is re-raised to be the final error message of the program.
import atexit
def exit_with_exception(message):
    raise RuntimeError(message)
atexit.register(exit_with_exception, 'Registered first')
atexit.register(exit_with_exception, 'Registered second')
Notice again that the registration order controls the execution order. If an error in one callback introduces an error in another (registered earlier, but called later), the final error message might not be the most useful error message to show the user.
$ python atexit_exception.py
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "atexit_exception.py", line 36, in exit_with_exception
    raise RuntimeError(message)
RuntimeError: Registered second
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "atexit_exception.py", line 36, in exit_with_exception
    raise RuntimeError(message)
RuntimeError: Registered first
Error in sys.exitfunc:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "atexit_exception.py", line 36, in exit_with_exception
    raise RuntimeError(message)
RuntimeError: Registered first
In general you will probably want to handle and quietly log all exceptions in your cleanup functions, since it is messy to have a program dump errors on exit.
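One way to follow that advice is to wrap each cleanup function so that anything it raises is written to a log instead of the console. This is a minimal sketch, not something the atexit module provides: the quiet() decorator, the "cleanup" logger name, and the failing remove_temp_files() function are all hypothetical, and where the log ends up depends on how logging is configured in your program.

```python
import atexit
import functools
import logging

log = logging.getLogger('cleanup')


def quiet(func):
    """Wrap a cleanup function so exceptions are logged, not raised."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Record the traceback in the log instead of dumping it on exit.
            log.exception('cleanup function %s failed', func.__name__)
    return wrapper


@quiet
def remove_temp_files(directory):
    # Stand-in for a cleanup step that goes wrong.
    raise RuntimeError('could not remove %s' % directory)


if __name__ == '__main__':
    atexit.register(remove_temp_files, '/tmp/example')
```

Because the wrapper swallows the exception, later callbacks still run and the program's final error message is not replaced by a cleanup failure.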
References:
Python Module of the Week
Sample Code
Updated 9/5/2007 with minor formatting changes.
Updated 23 Aug 2007: Corrected download link.
Sunday, July 1, 2007
PyMOTW: subprocess
Module: subprocess
Purpose: Spawn and communicate with additional processes.
Python Version: New in 2.4
An updated version of this article can be found on the main PyMOTW site.
Description:
The subprocess module provides a consistent interface to creating and working with additional processes. It offers a higher-level interface than some of the other available modules, and is intended to replace functions such as os.system, os.spawn*, os.popen*, popen2.* and commands.*. To make it easier to compare subprocess with those other modules, this week I will re-create earlier examples using the functions being replaced.
The subprocess module defines one class, Popen, and a few wrapper functions which use that class. Popen() takes several arguments to make it easier to set up the new process, and then communicate with it via pipes. I will concentrate on example code here; for a complete description of the arguments, refer to section 17.1.1 of the library documentation.
A Note About Portability
The API is roughly the same, but the underlying implementation is slightly different between Unix and Windows. All of the examples shown here were tested on Mac OS X. Your mileage on a non-Unix OS will vary.
Running External Command
To run an external command without interacting with it, as one would do with os.system(), use the call() function.
import subprocess
# Simple command
subprocess.call('ls -l', shell=True)
$ python replace_os_system.py
total 16
-rw-r--r-- 1 dhellman dhellman 0 Jul 1 11:29 __init__.py
-rw-r--r-- 1 dhellman dhellman 1316 Jul 1 11:32 replace_os_system.py
-rw-r--r-- 1 dhellman dhellman 1167 Jul 1 11:31 replace_os_system.py~
And since we set shell=True, shell variables in the command string are expanded:
# Command with shell expansion
subprocess.call('ls -l $HOME', shell=True)
total 40
drwx------ 10 dhellman dhellman 340 Jun 30 18:45 Desktop
drwxr-xr-x 15 dhellman dhellman 510 Jun 19 07:08 Devel
drwx------ 29 dhellman dhellman 986 Jun 29 07:44 Documents
drwxr-xr-x 44 dhellman dhellman 1496 Jun 29 09:51 DownloadedApps
drwx------ 55 dhellman dhellman 1870 May 22 14:53 Library
drwx------ 8 dhellman dhellman 272 Mar 4 2006 Movies
drwx------ 11 dhellman dhellman 374 Jun 21 07:04 Music
drwx------ 12 dhellman dhellman 408 Jul 1 01:00 Pictures
drwxr-xr-x 5 dhellman dhellman 170 Oct 1 2006 Public
drwxr-xr-x 15 dhellman dhellman 510 May 12 15:19 Sites
drwxr-xr-x 5 dhellman dhellman 170 Oct 5 2005 cfx
drwxr-xr-x 4 dhellman dhellman 136 Jan 23 2006 iPod
-rw-r--r-- 1 dhellman dhellman 204 Jun 18 17:07 pgadmin.log
drwxr-xr-x 3 dhellman dhellman 102 Apr 29 16:32 tmp
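call() also returns the exit status of the command, and the command can be given as a list of arguments instead of a single string, in which case no shell is involved and nothing is expanded. A small sketch of both points; using sys.executable as the command is just a convenient assumption so the example does not depend on any particular program being installed.

```python
import subprocess
import sys

# Pass the command as a list: no shell is started, so nothing is expanded.
status = subprocess.call([sys.executable, '-c', 'import sys; sys.exit(3)'])
print('Exit status: %d' % status)

# With shell=False (the default), '$HOME' reaches the program unchanged.
subprocess.call([sys.executable, '-c',
                 'import sys; print(sys.argv[1])', '$HOME'])
```

The list form is usually the safer choice when any part of the command comes from user input, since there is no shell to interpret special characters.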
Reading Output of Another Command
By passing different arguments for stdin, stdout, and stderr it is possible to mimic the variations of os.popen().
Reading from the output of a pipe:
print '\nread:'
proc = subprocess.Popen('echo "to stdout"',
                        shell=True,
                        stdout=subprocess.PIPE,
                        )
stdout_value = proc.communicate()[0]
print '\tstdout:', repr(stdout_value)
Writing to the input of a pipe:
print '\nwrite:'
proc = subprocess.Popen('cat -',
                        shell=True,
                        stdin=subprocess.PIPE,
                        )
proc.communicate('\tstdin: to stdin\n')
Reading and writing, as with popen2:
print '\npopen2:'
proc = subprocess.Popen('cat -',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
stdout_value = proc.communicate('through stdin to stdout')[0]
print '\tpass through:', repr(stdout_value)
Separate streams for stdout and stderr, as with popen3:
print '\npopen3:'
proc = subprocess.Popen('cat -; echo ";to stderr" 1>&2',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        )
stdout_value, stderr_value = proc.communicate('through stdin to stdout')
print '\tpass through:', repr(stdout_value)
print '\tstderr:', repr(stderr_value)
Merged stdout and stderr, as with popen4:
print '\npopen4:'
proc = subprocess.Popen('cat -; echo ";to stderr" 1>&2',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT,
                        )
stdout_value, stderr_value = proc.communicate('through stdin to stdout\n')
print '\tcombined output:', repr(stdout_value)
Sample output:
read:
	stdout: 'to stdout\n'
write:
	stdin: to stdin
popen2:
	pass through: 'through stdin to stdout'
popen3:
	pass through: 'through stdin to stdout'
	stderr: ';to stderr\n'
popen4:
	combined output: 'through stdin to stdout\n;to stderr\n'
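communicate() always returns a (stdout, stderr) tuple, and any stream that was not connected to its own pipe comes back as None. That is why stderr_value carries nothing useful in the popen4 example, where stderr is merged into stdout. A small sketch to make that visible; the child command run through sys.executable is an assumption chosen so the example does not rely on a shell.

```python
import subprocess
import sys

# A child that writes one line to each stream, flushing in between so
# the order of the merged output is predictable.
child = ("import sys; sys.stdout.write('to stdout\\n'); sys.stdout.flush(); "
         "sys.stderr.write('to stderr\\n')")

proc = subprocess.Popen([sys.executable, '-c', child],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)
stdout_value, stderr_value = proc.communicate()

print('combined output: %r' % stdout_value)
print('stderr_value: %r' % stderr_value)
```

Both lines written by the child arrive in stdout_value, and stderr_value is None because no separate pipe was created for it.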
All of the above examples assume a limited amount of interaction. The communicate() method reads all of the output and waits for the child process to exit before returning. It is also possible to write to and read from the individual pipe handles used by the Popen instance. To illustrate this, I will use this simple echo program which reads its standard input and writes it back to standard output:
import sys
sys.stderr.write('repeater.py: starting\n')
while True:
    next_line = sys.stdin.readline()
    if not next_line:
        # An empty string means end-of-file on stdin.
        break
    sys.stdout.write(next_line)
    sys.stdout.flush()
sys.stderr.write('repeater.py: exiting\n')
Note that repeater.py writes to stderr when it starts and stops. We can use that to show the lifetime of the subprocess in the next example. The following interaction example uses the stdin and stdout file handles owned by the Popen instance in different ways. In the first example, a sequence of 10 numbers is written to stdin of the process, and after each write the next line of output is read back. In the second example, the same 10 numbers are written but the output is read all at once using communicate().
import subprocess
print 'One line at a time:'
proc = subprocess.Popen('repeater.py',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
for i in range(10):
    # Write one line, then read the echoed line back before continuing.
    proc.stdin.write('%d\n' % i)
    output = proc.stdout.readline()
    print output.rstrip()
# Closes stdin, which lets repeater.py see end-of-file and exit.
proc.communicate()
print 'All output at once:'
proc = subprocess.Popen('repeater.py',
                        shell=True,
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        )
for i in range(10):
    proc.stdin.write('%d\n' % i)
# Close stdin and collect everything the child wrote to stdout.
output = proc.communicate()[0]
print output
Notice where the "repeater.py: exiting" lines fall in the output for each loop:
$ python interaction.py
One line at a time:
repeater.py: starting
0
1
2
3
4
5
6
7
8
9
repeater.py: exiting
All output at once:
repeater.py: starting
repeater.py: exiting
0
1
2
3
4
5
6
7
8
9
Signaling Between Processes
In part 4 of the series on the os module I included an example of signaling between processes using os.fork() and os.kill(). Since each Popen instance provides a pid attribute with the process id of the child process, it is possible to do something similar with subprocess. For this example, I will again set up a separate script for the child process to be executed by the parent process.
import os
import signal
import time
def signal_usr1(signum, frame):
    "Callback invoked when a signal is received"
    pid = os.getpid()
    print 'Received USR1 in process %s' % pid
print 'CHILD: Setting up signal handler'
signal.signal(signal.SIGUSR1, signal_usr1)
print 'CHILD: Pausing to wait for signal'
time.sleep(5)
And now the parent process:
import os
import signal
import subprocess
import time
proc = subprocess.Popen('signal_child.py')
print 'PARENT: Pausing before sending signal...'
time.sleep(1)
print 'PARENT: Signaling %s' % proc.pid
os.kill(proc.pid, signal.SIGUSR1)
And the output should look something like this:
$ python signal_parent.py
CHILD: Setting up signal handler
CHILD: Pausing to wait for signal
PARENT: Pausing before sending signal...
PARENT: Signaling 4124
Received USR1 in process 4124
Conclusions
As you can see, subprocess can be much easier to work with than fork, exec, and pipes on their own. It provides all of the functionality of the other modules and functions it replaces, and more. The API is consistent for all uses and many of the extra steps of overhead needed (such as closing extra file descriptors, ensuring the pipes are closed, etc.) are "built in" instead of being handled by your code separately.
References:
Python Module of the Week
Sample code
PyMOTW: os (Part 2)
PyMOTW: os (Part 4)
Updated 9/5/2007 with minor formatting changes.
Updated 3/16/2009 with reference to maintained version of this article.