MySQL Deadlocks in ZSQL Methods

17 March 2006 | Zope | No Comments

I ran across an issue with a deadlock in a ZSQL method, which seems obvious looking back on it, but which took me a while to figure out.

The problem is that I have a table containing a queue of rss feeds and I have several threads that will use that queue to grab a feed and parse it. I have a column “available” that is 1 if the feed is up for parsing and 0 if the feed is being parsed or has been parsed.

I was using SQL roughly like this:

set @id=(select min(id) from queue for update)
update queue set available = 0 where id=@id
select @id

And I was consistently getting deadlocks. The above operation takes milli if not microseconds, so I couldn’t understand what the issue was.

The problem was that after executing the above SQL, I was then parsing the feed and since all of that (SQL + parsing) was in a Zope transaction, all the other threads were waiting not only for the SQL to finish, but for the parse to finish as well. Thus, what I thought should take milliseconds was taking seconds.

The solution is to call get_transaction().commit() after the SQL call. That commits the Zope transaction and in so doing, commits the MySQL transaction as well.

Technorati Tags: , ,

ZODB-less Zope

12 March 2006 | Zope | No Comments

We built the Loomia portal and Loomia Recommendations using Zope 2.7.6. There are many aspects of Zope that are marvelous (Python, ZPT and ZSQL methods in particular) and I’m planning to write about them in greater detail later. We’ve decided, however, to switch from the Zope Object Database (ZODB) to using MySQL as our datastore. For the benefit of others who are considering Zope, I decided to outline the reasons for our decision.

Background

We had a number of reasons for choosing Zope and the Zope Object Database (ZODB) in particular. The most compelling being the ease with which we could handle arbitrary metadata. We started off enamored of the semantic web (and FOAF in particular); the prospect of not having to nail down an RDBMS schema was very appealing. Even when we’d abandoned the semantic web, still, the fact of being able to capture metadata from an RSS feed (for both the portal and the Recommendations service we use RSS feeds) was seductive.

What we learned, however, is that that kind of flexibility is more costly than it seems and less useful that it seems (at least in our case). After a while, we got a good handle on the RSS data that we were really interested in, so the appeal of handling arbitrary data diminished considerably. And, of course, you can handle arbitrary data with an RDBMS, it’s just not quite as painless as it with a Python-based object database like the ZODB.

After running a hybrid system that uses both the ZODB and MySQL, we’ve decided to abandon the ZODB in favor of a strictly MySQL-based system. We still like Zope for many reasons that I’ll detail elsewhere, but the ZODB has to go. The reasons for this are:

At the outset, I should say that this discussion applies to Zope 2.7.6. The current version of the Zope 2 series is 2.9. There is also a Zope 3 product, but code written for Zope 2 has to be ported to Zope 3 (or rewritten). Furthermore, the jump from Zope 2.7 to Zope 2.8 was a big one and has probably addressed some of the issues that I’m raising here. So, anyone reading this should certainly take a look at 2.8 or 2.9 or 3 before dismissing the ZODB. Moreover, I’m not in any way trying to argue that the ZODB is a bad technology; I simply want to explain why we’re not going to use it anymore. It’s a well known fact that Zope suffers from a dearth of documentation, so to the extent I can document my experiences for others, I’d like to do so.

Performance

My thesis is that building a highly dynamic, high traffic, high performance website using a large ZODB is an enormous technical challenge; a much greater technical challenge than doing the same with an RDBMS.

Our portal serves on the order of a three-quarters of a million podcasts and videocasts coming from maybe 20,000 feeds. You can search for syndicated media, rate podcasts and videocasts, write reviews of them, and get recommendations based on your preferences. We’re running it on dual Xeon box with 4 GB of RAM. We use a Zeo server with two Zeo clients coupled with Squid and Pound. We’ve been consistently frustrated that our Zope installation just doesn’t perform up to our expectations. We aren’t Craigslist or MySpace. We’re not getting Slash-dotted; our numbers are not huge. For the traffic we get, our system should perform better than it does. We regularly have complaints about page rendering speed and search times (done with the ZCatalog). There are many reasons for our performance issues and some of them certainly relate to poor programming or architecture decisions that I made. My frustration with the ZODB is that these problems are hard to isolate and diagnose. It’s true that performance issues are always challenging, but with the ZODB much more than with an RDBMS, you’re really venturing in a world of arcane, recondite knowledge. Many people reading this will tell me about their sites that are bigger than mine, with more traffic, and brilliant performance. My first question would be: how dynamic is your site? Alas, it’s very difficult for us to cache much of our site and so although Zope’s caching machinery is good, we can derive only marginal benefit it. Alternatively, maybe others do manage to build highly dynamic, high traffic sites with great performance; in that case, my guess is that such people know Zope far, far better than I do (I’ve been at it for 2 years) or they’re smarter than I am or they’re luckier than I am. The point is for “average” developers it shouldn’t be this hard to build a site that performs well under moderate-load.

Documentation

It shouldn’t be hard, but using the ZODB, it is hard. And part of the reason it’s hard is that Zope is extremely poorly documented. There are volumes and volumes of documentation, but most of it is out-of-date. Take the Zope Developer’s Guide, for example. That’s a must-read for Zope developers. The only problem is that it hasn’t been updated in about 4 years. The Zope website is an egregious illustration of this problem: do a search on performance and the first hit you get is a document from 2001. I’ve long since given up using the Zope website as a source of documentation. There are resources: there is the excellent Zope book, maintained online by Chris McDonough (an outstanding Zope developer) and there are mailing lists whose members respond to enquiries with a regularity and rapidity that is truly commendable. Still, working with Zope is not like working with PHP, Java, or MySQL in terms of the information available.

So, while documentation is a big problem with Zope in general, with regard to the ZODB it’s particularly serious because no one (that I know of, anyway) has published a best-practices for building a ZODB-based website. Or a list of ZODB gotchas. Which brings me back to our performance problems: they may well have been caused by bad choices I made, but I had no way of knowing beforehand that the particular solution I chose was going to be a performance problem down the road.

Maintenance

One of the great virtues of the ZODB is that you can undo any transaction. This is extremely useful is you want to be able to rollback your transactions. If you don’t, it’s not only useless, but a maintenance hassle. We don’t need to rollback our transactions, but there’s no way to turn off undo. Instead, what you need to do is pack your database. Packing the database isn’t that big a deal, but it takes a while (20 minutes maybe) and it pins the CPU while it’s happening. So, you have to set up a cron job that runs at 2 o’clock in the morning to do this everyday. If all your traffic is US-based, then doing the pack in the middle of the night isn’t a big deal, but if you’ve got traffic coming from elsewhere, then since the middle of the night in San Francisco is the middle of the day in Japan, your Japanese users may be inconvenienced during that 20 minutes of CPU-pinned packing.

Technical Expertise/Portability

Simply put, it’s easier to find people who know the arcana of MySQL (or RDBMS in general) then it is to find people who are expert in the ZODB. Related to this is the fact that from MySQL it’s a short-trip to PostgreSQL or any other RDBMS. The ZODB is (roughly) sui-generis.

Conclusion

The ZODB has many uses, but dynamic, high traffic websites are perhaps not the ideal one. Zope coupled with an RDBMS makes for a wonderful combination. You get the benefits of Python, Zope Page Templates, ZSQL Methods in combination with the vast body of readily accessible knowledge about RDBMS scalability.

For an alternative viewpoint, I encourage you to read this article: “How to Love the ZODB and Forget RDBMS.”

Technorati Tags: ,