Create a podcast from your media files

RSS and Atom feeds, originally meant for aggregating HTML content, are widely used to distribute multimedia content. When a feed is used to broadcast multimedia files, it is referred to as a podcast. You might be an avid consumer of podcasts yourself, or you might have heard the term without knowing exactly what it means. In either case, you might be interested in creating your own podcast, be it to play your media files with your favorite podcast client or to share content with others.

In this article, we’ll go beyond the “consumption” of podcasts and learn how to turn your media library (or part of it) into an RSS feed (in other words, a podcast) and use it within your home network.

What is a podcast?

Many TV and radio stations make their programs available on-line for a limited time after the first broadcast, so that their audience can watch or listen to the programs they missed whenever they please. With smart-phones and tablets, using podcasts has become even more fun.

Well, this is actually (and fortunately) not the only reason for podcasts to exist. Podcasts cover a wide variety of uses and themes, from computer hacks to yoga, not to mention language courses and college lectures. In fact, you don’t need to own a radio station to broadcast your yummy recipes.

What are RSS and Atom?

Suppose you have a number of articles, videos or audio files that you want to make available to people who may find them interesting. One solution is to drop your content into a directory on your website and share the link to that directory. This works but is not very convenient, because people will have to visit your page regularly in search of new content if they don’t want to miss your latest article. Moreover, any webmasters wishing to include your incredibly interesting articles (or links to them) in their own websites will be annoyed by your constantly changing content.

One way to overcome this is to present your content in a way that computers can understand and manage efficiently, without human intervention. RSS (Really Simple Syndication) and Atom are formats used to create a computer-comprehensible interface to your content, so that an RSS/Atom client (also known as a feed reader) can automatically fetch your recent publications as you put them on-line.

Using RSS

In a nutshell, RSS and Atom use a standard XML file, the so-called feed, to represent the content. As they both use plain text in XML format, the feeds are also readable and editable by humans. For our purposes, the differences between RSS and Atom are not substantial. In this article (and code), we only use RSS 2.0. For more information about feed formats, you can read this article.

Now, this is an example of an RSS 2.0 feed containing two items (an item is an article, a web page, a media file, etc.) hosted on a website called http://www.example.com.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
   <channel>
      <title>Healthy Cooking</title>
      <description>This podcast is about healthy recipes</description>
      <link>http://www.example.com/</link>
 
      <item>
         <guid>http://www.example.com/podcast/media/Add_Olive_Oil_To_Everything.mp3</guid>
         <link>http://www.example.com/podcast/media/Add_Olive_Oil_To_Everything.mp3</link>
         <title>How to add olive oil to almost everything you cook</title>
         <description>Did you know that olive oil can be added to almost everything you cook...</description>
         <pubDate>Sun, 9 Nov 2014 12:00:00 -0000</pubDate>
         <enclosure url="http://www.example.com/podcast/media/Add_Olive_Oil_To_Everything.mp3" type="audio/mpeg" length="40946"/>
      </item>
 
      <item>
         <guid>http://www.example.com/podcast/media/Honey_with_Avocado.mp3</guid>
         <link>http://www.example.com/podcast/media/Honey_with_Avocado.mp3</link>
         <title>A salad with avocado and honey</title>
         <description>Have you ever tried a half avocado with a bit of pure bee honey</description>
         <pubDate>Tue, 23 Dec 2014 13:07:20 -0000</pubDate>
         <enclosure url="http://www.example.com/podcast/media/Honey_with_Avocado.mp3" type="audio/mpeg" length="61358"/>
      </item>
 
   </channel>
</rss>

As we can see, a feed contains a channel made up of a title, a description, a link and a sequence of item tags. Each item tag hosts the following elements:

  • guid: a unique identifier of the item. It can be the URL of the item.
  • link: URL of the item.
  • title: a short text to whet the appetite of the users (or to save their time!).
  • description: a longer text for the potentially interested users.
  • pubDate: date of publication of the item in RFC-822 format.
  • enclosure: the media file attached to the item, described by its url, its type (a MIME type) and its length (size in bytes). Some podcast clients won’t download the media file unless you explicitly declare it as an enclosure.

Note that some of these elements are not mandatory, but they can greatly improve the experience of using the podcast. Take the pubDate tag for instance. If you don’t supply it, a podcast reader may organize the items in a way you neither expect nor like.
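
To see how these elements fit together from a client's point of view, here is a minimal sketch (Python 3, standard library only) that parses a feed like the one above and extracts each item's title, media URL, MIME type and publication date. The function name list_items and the shortened sample feed are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A shortened copy of the example feed (one item only).
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
   <channel>
      <title>Healthy Cooking</title>
      <description>This podcast is about healthy recipes</description>
      <link>http://www.example.com/</link>
      <item>
         <guid>http://www.example.com/podcast/media/Honey_with_Avocado.mp3</guid>
         <title>A salad with avocado and honey</title>
         <pubDate>Tue, 23 Dec 2014 13:07:20 -0000</pubDate>
         <enclosure url="http://www.example.com/podcast/media/Honey_with_Avocado.mp3" type="audio/mpeg" length="61358"/>
      </item>
   </channel>
</rss>
"""

def list_items(feed_text):
    """Return (title, media_url, mime_type, published) for each item of an RSS 2.0 feed."""
    # Parse from bytes so the encoding declared in the XML header is honoured.
    root = ET.fromstring(feed_text.encode("utf-8"))
    items = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        pub = item.findtext("pubDate")
        items.append((
            item.findtext("title"),
            # Fall back to <link> when the item has no enclosure.
            enclosure.get("url") if enclosure is not None else item.findtext("link"),
            enclosure.get("type") if enclosure is not None else None,
            parsedate_to_datetime(pub) if pub else None,
        ))
    return items

for title, url, mime_type, published in list_items(SAMPLE_FEED):
    print(title, url, mime_type, published)
```

A real podcast client does much more (caching, downloading, error handling), but the way it reads the feed is essentially this.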

The core of this article is a tool (genRSS) that can do (almost) everything for you and generate a podcast feed from a directory that contains your media files. But before we go any further, I am going to try to answer the unavoidable question:

Why do I need/want to create my own podcast?

Here are a few reasons:

  • On your smart-phone, you’d rather use your nice podcast reader to play your audio files than a media player.
  • You have a bunch of FSI language courses with no MP3 tags and you want to use them properly on your computer or mobile device (MP3 tags such as artist, album, year, etc., are used by media players to organize your media library so you can use it effectively).
  • You have very limited space on your mobile device and you are looking for a way to stream your videos from your computer to the device (like video on demand) using your home wifi.
  • You want to share some content with your co-tenants, friends or family members without juggling a USB stick.
  • Use your imagination!

What do you need?

  • Python 2.7 or higher.
  • A home wifi connection. If you don’t have a wifi network, you can still use your computer to test everything locally.
  • A web server. Don’t worry if you don’t have an installed and configured web server on your machine. If you have Python, you almost have a basic web server.
  • A couple of media files.

Download the code and test it

Download the code from https://github.com/amsehili/genRSS (click on Download ZIP) and extract the zip file into a directory on your file system. You can also clone the repository:

mkdir genRSS
git clone https://github.com/amsehili/genRSS.git genRSS

Try the Python web server

In order to expose any content to other machines, you need a protocol. Podcasts are normally distributed using the web protocol, HTTP. So to make a podcast accessible to other devices, you will need to turn your machine into a web server. In the future, you may need to install a full-fledged web server such as Apache or Lighttpd on your machine. To keep things simple, everything we need in this article can be handled by the tinyHttpServer.py script.

To launch the server, type:

python tinyHttpServer.py

or simply run this command (found on http://www.commandlinefu.com):

python -m SimpleHTTPServer 8080

Then open a new tab and type (or simply click on) http://localhost:8080

If everything went well, you should see a web page that contains links to tinyHttpServer.py itself and to everything else in the same directory.
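
If you are curious about what such a script boils down to, here is a minimal sketch of a static file server built on Python 3's http.server module (Python 3.7 or later). This is not the code of tinyHttpServer.py; the function name make_server and its parameters are assumptions made for this example.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_server(directory=".", port=8080, host=""):
    """Create (but do not start) an HTTP server that serves files from `directory`.

    An empty `host` means "listen on all network interfaces", which is what
    lets other devices on the same network reach the server.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer((host, port), handler)

# To serve the current directory on port 8080 until interrupted with Ctrl-C:
#     make_server(".", 8080).serve_forever()
```

On Python 3, the one-line equivalent of the SimpleHTTPServer command is python3 -m http.server 8080.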

localhost here is the name of your machine and 8080 is the port that our server is listening on (you can modify the source if you want to use another port). In order for other devices on the same network to reach the server, they need to know the IP address of your machine. Go to your connection information panel and copy your IP address. If your IP address is 192.168.1.22, you can reach your server at the following address: http://192.168.1.22:8080
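
If you prefer not to dig through the connection panel, the address can usually be found from Python as well. The following is a best-effort sketch (Python 3); the function name local_ip is made up, and the probe address 192.0.2.1 is a reserved documentation address chosen here only so the operating system has a destination to pick a route for.

```python
import socket

def local_ip(probe_host="192.0.2.1", probe_port=80):
    """Best-effort guess of this machine's IPv4 address on the local network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no packet; it only asks the OS
        # which local interface it would use to reach the probe address.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (e.g. no network at all): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()

print("http://%s:8080" % local_ip())
```

On a machine with several network interfaces the result may not be the one your wifi uses, so treat it as a hint and double-check against the connection panel if the address does not work.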

Generate the first podcast

genRSS is delivered with a directory (test/media) that contains empty media files and subdirectories. We will use it here to test genRSS.py.

Here is the simplest command to generate a podcast feed from the files in test/media:

python genRSS.py -d test/media --host localhost:8080 -o feed.rss

If you open http://localhost:8080/feed.rss with your browser, you will see the content of the newly generated feed (1.mp3, 1.mp4, 1.mp4, 1.ogg, 2.MP3).

This is a really basic feed. You may want your podcast to:

  • Have a title and a description
python genRSS.py -d test/media --host localhost:8080 --title "A simple podcast" --description "This is the description" -o feed.rss

or

python genRSS.py -d test/media -H localhost:8080 -t "A simple podcast" -p "This is the description" -o feed.rss

You can now refresh the page and see the change.

  • Include all files in subdirectories
python genRSS.py -d test/media --recursive -H localhost:8080 -t "A simple podcast" -p "This is the description" -o feed.rss
  • Only include files with the given extensions
python genRSS.py -d test/media -e mp3,ogg -H localhost:8080 -t "A simple podcast" -p "This is the description" -o feed.rss
  • Use an IP address instead of localhost
python genRSS.py -d test/media -H 192.168.1.22:8080 -t "A simple podcast" -p "This is the description" -o feed.rss
  • Add an image
python genRSS.py -d test/media -H 192.168.1.22:8080 -i images/logo.jpg -t "A simple podcast" -p "This is the description" -o feed.rss

Obviously, images/logo.jpg must be visible to the web server (i.e. located under the directory served by tinyHttpServer.py). You can also use an image from the web by supplying its full http or https URL to the -i option.

Check your feed with a validator

If you want to check the validity of your feed, you can use the W3C feed validator (copy the text of your feed and paste it into the text area). Don’t worry if you get a warning for a “missing atom:link”.

Test your podcast on a mobile device

Get some real media files, put them in the directory where tinyHttpServer.py is located and generate a podcast for that directory. On a tablet or a smart-phone connected to the same network as your local machine, open a web browser and type your machine’s IP address followed by :8080. You should be able to see the content of the server. Copy the link of your feed, open a podcast reader and add the link as a new feed. If the reader has no trouble finding the podcast, you can start streaming or downloading your media files.
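
Before pasting the feed into the validator, a quick local sanity check can catch the most obvious mistakes. The sketch below (Python 3) only tests a handful of RSS 2.0 requirements: well-formed XML, the channel's title, link and description, and the three attributes of each enclosure. It is in no way a replacement for the W3C validator, and the function name check_feed is made up for this example.

```python
import xml.etree.ElementTree as ET

def check_feed(feed_text):
    """Return a list of problems found in an RSS 2.0 feed (empty if none were found)."""
    try:
        root = ET.fromstring(feed_text.encode("utf-8"))
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]
    problems = []
    if root.tag != "rss":
        problems.append("root element is <%s>, expected <rss>" % root.tag)
    channel = root.find("channel")
    if channel is None:
        return problems + ["missing <channel>"]
    # RSS 2.0 requires these three elements in every channel.
    for name in ("title", "link", "description"):
        if not channel.findtext(name):
            problems.append("channel is missing <%s>" % name)
    for n, item in enumerate(channel.findall("item"), start=1):
        # An item needs at least a title or a description.
        if not (item.findtext("title") or item.findtext("description")):
            problems.append("item %d has neither <title> nor <description>" % n)
        enclosure = item.find("enclosure")
        if enclosure is not None:
            for attr in ("url", "type", "length"):
                if not enclosure.get(attr):
                    problems.append("item %d: enclosure lacks %s" % (n, attr))
    return problems

# Typical use, assuming feed.rss was generated in the current directory:
#     with open("feed.rss", encoding="utf-8") as fh:
#         print(check_feed(fh.read()) or "no obvious problem")
```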

Comments on the code

I think the code is simple and fairly well documented. Here are, however, a couple of comments that may be helpful for the interested developer.

Test the code

The code is tested with doctest. If you wish to re-run the tests, open genRSS.py, activate the test mode (TESTRUN = 1), and type:

python genRSS.py -v

Functions

  • getFiles(dirname, extensions=None, recursive=False): crawls a given directory looking for files and returns a list of relative paths. extensions is a list of strings used to restrict the files to the desired set of case-insensitive extensions.
  • buildItem(link, title, guid=None, description="", pubDate=None, indent="   ", extraTags=None): builds an RSS 2.0 item and returns it as a string (newline characters included).
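
To give an idea of what these two functions do, here is a simplified sketch in Python 3. It is not the code of genRSS.py: the names get_files and build_item, the reduced parameter list (no extraTags) and the implementation details are assumptions made for illustration.

```python
import os
from xml.sax.saxutils import escape

def get_files(dirname, extensions=None, recursive=False):
    """Return the sorted paths of the files found under `dirname`.

    `extensions` is an optional list such as ["mp3", "ogg"], matched
    case-insensitively. Subdirectories are visited only if `recursive` is True.
    """
    wanted = None if extensions is None else {e.lower().lstrip(".") for e in extensions}
    found = []
    for root, dirs, files in os.walk(dirname):
        if not recursive:
            dirs[:] = []  # stop os.walk from descending any further
        for name in files:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if wanted is None or ext in wanted:
                found.append(os.path.join(root, name))
    return sorted(found)

def build_item(link, title, guid=None, description="", pub_date=None, indent="   "):
    """Return one RSS 2.0 <item> as a string, trailing newline included."""
    fields = [("guid", guid or link), ("link", link), ("title", title),
              ("description", description), ("pubDate", pub_date)]
    lines = ["<item>"]
    for tag, value in fields:
        if value:  # skip the elements that were not supplied
            # escape() protects characters such as & and < in the text
            lines.append("%s<%s>%s</%s>" % (indent, tag, escape(value), tag))
    lines.append("</item>")
    return "\n".join(lines) + "\n"

print(build_item("http://localhost:8080/test/media/1.mp3", "1.mp3"))
```

Escaping matters more than it seems: a file name containing an ampersand would otherwise produce a feed that is not well-formed XML.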

How items are sorted in the feed

Unless the -C (or --sort-creation) command line option is given to genRSS.py, in which case items appear in the podcast sorted by their date of creation (newest item on top), we assume that items should be sorted by name.

This is however problematic, because we need to supply a pubDate tag for each item so that a podcast reader sorts them the way we want. The code treats the first item in the list of items sorted by name as published right now (at the time the feed is created), the second item as published n minutes and f1 seconds ago, the third item as published 2*n minutes and f2 seconds ago, and so on. In the code, n=1 and f1, f2, … fk are random numbers of seconds between 0 and 10.

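import os
import random
import time

fileNames = getFiles(dirname.encode("utf-8"), extensions=opts.extensions, recursive=opts.recursive)

if opts.sort_creation:
    # Use each file's ctime as its publication date, newest first.
    # Note: ctime is the creation time on Windows but the time of the
    # last metadata change on Unix.
    pubDates = [os.path.getctime(f) for f in fileNames]
    sortedFiles = sorted(zip(fileNames, pubDates), key=lambda f: -f[1])
else:
    # Keep the name order: fake dates going back one minute per item,
    # plus 0 to 10 random seconds.
    now = time.time()
    pubDates = [now - (60 * d + (random.random() * 10)) for d in range(len(fileNames))]
    sortedFiles = zip(fileNames, pubDates)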
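
The following self-contained sketch (Python 3) replays the same idea on a made-up list of file names, to show that a client sorting by date, newest first, ends up with the items in name order. The function name synthetic_pub_dates and the file names are hypothetical; the 60-second step and the 10-second jitter are the values described above.

```python
import random
import time

def synthetic_pub_dates(names, now=None, step=60, jitter=10):
    """Pair each name (sorted alphabetically) with a fake publication time.

    Item k is dated k*step seconds before `now`, minus a random 0..jitter
    seconds, so the first name looks like the most recent publication.
    """
    now = time.time() if now is None else now
    return [(name, now - (step * k + random.random() * jitter))
            for k, name in enumerate(sorted(names))]

dated = synthetic_pub_dates(["03_dessert.mp3", "01_intro.mp3", "02_salad.mp3"])

# A podcast client showing the newest item first recovers the name order,
# because the jitter (under 10 s) is always smaller than the step (60 s).
newest_first = sorted(dated, key=lambda pair: pair[1], reverse=True)
print([name for name, _ in newest_first])
```

The random seconds are not needed for the ordering itself; they only make the dates look less artificial.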


Enclosures for audio and video items

As mentioned above, we need to add an enclosure tag to an item so that a client downloads it. This is particularly useful for audio and video files. To check the type of a file we use the mimetypes module:

import mimetypes

# guess_type() returns (type, encoding); type is None for unknown extensions
mtype = mimetypes.guess_type(fname)[0]
if mtype is not None and ("audio" in mtype or "video" in mtype):
    # generate an enclosure tag
    pass
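
What "generate an enclosure tag" could look like is sketched below in Python 3. This is an illustration rather than the code of genRSS.py: the function name enclosure_tag and its two parameters (the local path of the file and the URL under which the server exposes it) are assumptions.

```python
import mimetypes
import os
from xml.sax.saxutils import quoteattr

def enclosure_tag(path, url):
    """Return an <enclosure/> element for an audio or video file, else None."""
    mtype = mimetypes.guess_type(path)[0]
    if mtype is None or mtype.split("/")[0] not in ("audio", "video"):
        return None
    length = os.path.getsize(path)  # size in bytes, as RSS 2.0 expects
    # quoteattr() adds the surrounding quotes and escapes special characters
    return "<enclosure url=%s type=%s length=%s/>" % (
        quoteattr(url), quoteattr(mtype), quoteattr(str(length)))

# Example, assuming test/media/1.mp3 exists and is served on localhost:8080:
#     enclosure_tag("test/media/1.mp3", "http://localhost:8080/test/media/1.mp3")
```

Comparing the part before the slash, instead of searching for "audio" anywhere in the string, avoids matching an unrelated type that merely contains that word.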


RFC-822 dates

pubDates must follow the RFC-822 date format. Lines such as the following produce such a date:

import time
now = time.time()
# The "-0000" offset marks the time as UTC, so format from gmtime(), not localtime()
time.strftime("%a, %d %b %Y %H:%M:%S -0000", time.gmtime(now))
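
As an alternative to strftime, whose %a and %b fields depend on the current locale (and RFC-822 wants English day and month names), the standard library's email.utils.formatdate can build the same kind of date. This is a small sketch in Python 3; the wrapper name rfc822_date is made up.

```python
from email.utils import formatdate

def rfc822_date(timestamp):
    """Format a Unix timestamp as an RFC-822 style date expressed in UTC."""
    # By default formatdate() returns UTC with a "-0000" offset and always
    # uses English day and month names, whatever the locale.
    return formatdate(timestamp)

print(rfc822_date(0))  # Thu, 01 Jan 1970 00:00:00 -0000
```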

Links

Code’s repository: https://github.com/amsehili/genRSS

RSS and Atom tutorials:

Feed validator: http://validator.w3.org/feed/

RFC-822 date format: http://www.faqs.org/rfcs/rfc822.html (see section 5)
