Unbundling the Washington Post

How a small startup took a piece of Bezos’s new pie

 

In late September 2013, we got a call from a search engine in need of our services: a vertical search engine for churches called FaithStreet. A church engine.

They were part of the Techstars NYC 2013 class, picked up by the incubator after two years of bootstrapping. They had also just raised funding from Graham Holdings (formerly The Washington Post Company) and from other notable investors.

We were intrigued, since we had been working with the Washington Post for almost a year through its Kaplan subsidiary. Coincidentally, we sit directly above the Washington Post offices of Slate.com and The Root in the West Village. At the same time, the news had just hit that Amazon CEO Jeff Bezos was personally acquiring the paper for $250 million on October 1.

The deal:
Just weeks before the sale of the paper, The Washington Post agreed to sell its religion section (“On Faith”) to FaithStreet. On Faith was founded by some influential folks in Washington’s elite inner circle, and some of its articles were written by notable figures such as Desmond Tutu. FaithStreet became the new owner of this historical archive of content.

FaithStreet would be required to identify seven years of On Faith content across the Washington Post’s vast network of sites (including Newsweek) and migrate it over to the FaithStreet.com website. FaithStreet would need to design and build a new home at http://faithstreet.com/onfaith to house the content consisting of articles, comments, videos, author profiles, and more. They would also need to write the redirects for all the On Faith content from the Washington Post to the corresponding URLs on the FaithStreet website.

All of this was bound by a legal agreement that set the deadline for the migration as December 15, 2013, and it all had to happen while everyone adjusted to the fact that the new man in charge of the Post was now Jeff Bezos.

The project:
We struck a deal with FaithStreet to do all the work. In just under two months, we had to do the following:

1. Identify the content. Not an easy task, since the network of WaPo sites is distributed over a myriad of subdomains, servers, CMSs, and subfolders. The content was scattered, and the data was unstructured.

2. Extract the content. Exporting from the various legacy databases and infrastructures was too complex for the time we had, so we decided to crawl and scrape the content from washingtonpost.com instead. While scraping the site, we had to figure out which pages belonged to the On Faith religion section. We also needed PhantomJS and Selenium to extract content that was rendered by JavaScript.

3. Write the redirects that would live on the WashingtonPost.com servers and take users to FaithStreet.com. The redirects had to be type 301 (permanent) to pass PageRank from the Post to FaithStreet for SEO purposes, and they had to be written in nginx syntax. The agreement between the Washington Post and FaithStreet permitted only a limited number of rewrite rules for 1:1 matching. It turned out that fewer than 20 rules (regex matches) in total would be required for all 13,000+ articles, several hundred author pages, and the index and category pages. That was a relief, considering that many URLs look like this:

http://www.washingtonpost.com/national/on-faith/the-tribe-and-the-war-on-terror/2013/03/20/e5189474-9156-11e2-bdea-e32ad90da239_story.html or
http://www.washingtonpost.com/blogs/altmuslimah/post/no-sex-on-campus/2011/10/20/gIQA5ZWy1L_blog.html

We were happy that we didn’t need nearly as many rules as allotted to match all the pattern variations. The only content that needed 1:1 matches was the videos, since each has a unique identifier in its URL.

4. Design a new site that would be responsive on mobile and tablets. We needed to find a designer who specializes in web user experiences, and hired the team over at Mr. UX. We had attempted to create the mockups ourselves, but they were rejected since they didn’t conform to the vision of the new publication.

The Washington Post website (like many newspaper websites) is designed to mimic the print version. FaithStreet wanted something very readable along the lines of medium.com or qz.com, with the shareability of an Upworthy (as per their design brief).

Mister UX delivered the design comps, and Josh, our developer, and Ricky sliced them into HTML and CSS.

5. Deploy WordPress (on Twitter’s Bootstrap 3 framework) to house the content, and host it in a secure and scalable environment. WPEngine was selected since they had a promo offering free credits to Techstars companies. The support team was so impressive that we eventually migrated our own site to them. For strong SEO, we made sure On Faith lived in a subdirectory rather than on a subdomain. We used a reverse proxy to serve the WPEngine site transparently under the FaithStreet domain.

6. Add custom functionality to WordPress to allow FaithStreet to A/B test headlines on articles, capture reader email addresses, manage ads and popups, offer social sharing, etc.

7. Optimize the new site for SEO. Leveraging Yoast’s SEO plugin, each article was given a clean and readable URL (unlike the old schema), a unique title, meta description, and an XML sitemap feed. We also used natural language processing using NLTK to extract relevant noun phrases from each article, serving as a basis for tagging the 13,000+ articles in the On Faith archives.

8. Install Disqus for comments. We also had to configure Disqus to prevent future comment spam by setting rules on when to close discussions.
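
To make the redirect rules from step 3 more concrete, here is a minimal Python sketch of the idea: a handful of regular expressions, each covering a whole family of old URLs. It is not the nginx configuration that was deployed. The target layout in TARGET, the two patterns in RULES, and the function name redirect_target are all assumptions made for illustration, since the post does not spell out the new URL scheme.

```python
import re
from urllib.parse import urlsplit

# Hypothetical target layout: the real FaithStreet URL scheme is not given in the post.
TARGET = "http://faithstreet.com/onfaith/{year}/{month}/{day}/{slug}"

# Each pattern covers a family of old URLs, so no per-article rule is needed.
RULES = [
    # Section articles: /national/on-faith/<slug>/<yyyy>/<mm>/<dd>/<id>_story.html
    re.compile(
        r"^/national/on-faith/(?P<slug>[^/]+)/"
        r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/[^/]+_story\.html$"
    ),
    # Blog posts: /blogs/<blog>/post/<slug>/<yyyy>/<mm>/<dd>/<id>_blog.html
    # Assumption: a real rule would list only the blogs that belonged to On Faith.
    re.compile(
        r"^/blogs/[^/]+/post/(?P<slug>[^/]+)/"
        r"(?P<year>\d{4})/(?P<month>\d{2})/(?P<day>\d{2})/[^/]+_blog\.html$"
    ),
]


def redirect_target(old_url):
    """Return the new URL for an old Post URL, or None if no rule matches."""
    path = urlsplit(old_url).path
    for rule in RULES:
        match = rule.match(path)
        if match:
            return TARGET.format(**match.groupdict())
    return None
```

Under these assumptions, both example URLs from step 3 are handled by two rules, and any URL that matches neither pattern is left alone. The video pages would not fit this approach, since their unique identifiers call for 1:1 mappings.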

The big switch
On December 6, the migration process began. It started with the engineers at the Post creating a testing server where the redirect rules would live. We tested the redirection by crawling the site and checking that each URL returned a 301 to the right destination. Initially, an error in our nginx syntax produced 302s (temporary redirects) instead. Fortunately, we caught it and corrected the mistake.
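
A check like this can be sketched in a few lines of Python. This is an illustration rather than the crawler we actually ran: the function names fetch_head and check_redirects are made up for the example, and the mapping of old URLs to expected new URLs is assumed to come from the scraped content. The important detail is that the request must not follow redirects, so the raw status code is visible and a 302 is not mistaken for a 301.

```python
import http.client
from urllib.parse import urlsplit


def fetch_head(url):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_redirects(expected, fetch=fetch_head):
    """Return (old_url, problem) pairs for every URL that is not a correct 301.

    `expected` maps each old URL to the new URL it should redirect to.
    `fetch` is injectable so the check can be exercised without a network.
    """
    problems = []
    for old_url, new_url in expected.items():
        status, location = fetch(old_url)
        if status != 301:
            problems.append((old_url, f"status {status}, expected 301"))
        elif location != new_url:
            problems.append((old_url, f"redirects to {location}, expected {new_url}"))
    return problems
```

An empty result means every old URL answered with a permanent redirect to its expected destination. A rule that accidentally issues temporary redirects shows up as "status 302, expected 301".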

After a week of testing and updating the redirect rules, we were ready to give the WaPo the green light to flip the switch and go live on December 13, two days before our deadline.

The new On Faith site started getting traffic from the Post immediately thereafter. Within a few days, Google’s index for On Faith started displaying the new URLs for the pages on the faithstreet.com domain. Also within a few days, FaithStreet began to rank #1 for ‘On Faith’, and several days later sitelinks appeared. Google honored the redirects almost immediately, and the change was quicker than expected. The Google Webmaster Tools account for FaithStreet showed over 5 million inbound links that had originally pointed to the WaPo. They were successfully transferred and attributed to the new domain!

Post Migration
Fast forward a month, and the new On Faith is thriving. Many of the new posts have gone viral. One of them, “5 Churchy Phrases that are Scaring Off Millennials,” has garnered 27,125 Facebook likes and has an active debate in the comments, with over 341 threads at the time of this posting!

We are thankful to have participated in this historic ‘Unbundling’ of the Washington Post, the first of its kind that we know of. We are proud of the end result and of our team for a job well done. We thank the dedicated Washington Post engineers who kept the original commitment during their major transition. And of course, we thank the folks at FaithStreet for their hard work (and their business), especially co-founder Ryan Melogy, who spearheaded the efforts on the FaithStreet side, and CTO Glenn Ericksen, who provided significant development support.
