I’ve mentioned before how no ORM is perfect, and how the best frameworks give you the option of going around the ORM when you need to. As great as OO and MVC are, neither is a panacea. I don’t care how frowned-upon it is to use custom SQL in a CakePHP project, practical stuff-that-works beats design dogma every time. Today (and yesterday) I ran into just such a situation.
Working on a CakePHP-based web app, the client wanted a particular report added to the site. The report needed to break down counts and sub-totals of revenue according to attributes that belonged to the line items, the objects being paid for, and the users themselves, given a date range. The report definitely needed to be based on the line-items. The line items “belong_to” the payments and the things being paid for (licensing applications, as it turns out), and the payments and license applications both “belong_to” the users.
It was bad enough that I had to make CakePHP’s model layer do an aggregate/GROUP BY query to get the counts and totals I need (preferable to looping through payment items and totaling them in PHP). This is clearly a case where the functionality I needed was more relational than OO, so I was already trying to hammer a square peg into a round hole (something I seem to do a lot of). Where the trouble really started was getting CakePHP to pull in the users along with the line items. CakePHP has no relationship along the lines of Rails’s “has_many :through”, but it does have “recursive” and “Containable.” Unfortunately, these aren’t smart enough to pull in the two-relationships-away data by joining the table into the query; instead, CakePHP’s crummy not-quite-ORM actually recurses through the models and pulls those in in another query. So the query for line items (joined with their payments and their applications) brings in, let’s say, 1200 rows (typical for a month), then once CakePHP has all the user IDs from all the payments that all those line items belongs to, it sticks all their ids in a list and fires off a big “where id in ([900 or so numbers here])” query. Which, as you can imagine, takes way too long, causing PHP to puke and die because the process doesn’t complete within 30 seconds. So the “proper” CakePHP method of effecting this relationship turns out to be too inefficient for this case.
So, I had to do a little custom SQL. An upside to this is that I at least have tighter control over the query (I can narrow down the fields returned with Containable, but like any time you try to do something a little bit advanced with a query in CakePHP, it gets a bit verbose and ends up actually less readable than straight SQL). After a few minutes of fiddling at the MySQL command line, I came up with this:
select payment_items.product_type, license_application_types.trade, license_application_types.license_type, (licensees.personal_state = 'IA') as in_state, sum(payment_items.amount) as fees, count(*) as count from payment_items join payments on payment_items.payment_id = payments.id left join license_applications on payment_items.thing_id = license_applications.id left join license_application_types on license_application_types.license_application_id = license_applications.id left join licensees on license_applications.licensee_id = licensees.id where payments.approved = 1 and payments.created >= ? and payments.created <= ? group by product_type, trade, license_type, in_state order by product_type, trade, license_type, in_state
This gave me pretty much what I wanted, minus all the sub-totals for different combinations of subsets of the attributes of trade, license type, and in-state/out-of-state. I got back one row for each possible combination of all those attributes though, as well as for every kind of additional fee that showed up on payments but isn’t a license application fee itself; and for each row, I got a count of line items and a total of dollars paid for that kind of item.
Now I had to figure out how to come up with all the sub-grand-totals that the report needed to have – for all out-of-state journeyman applications, all master plumber applications, etc. I could have re-run queries like this one but with each one narrowed down by adding some extra ands to the WHERE clause, but I thought I could probably get by with looping through this data and totaling up the columns that meet each set of criteria.
In Ruby, this would be a cinch – I’d just call select() on this data a few times, each time with a different block to pick out the subset I wanted to total up, then use sum() or inject() on that. PHP, however, being the COBOL of the Web, doesn’t really have the first-class functions that enable things like select() and inject(). It has some things like array_map and array_filter that take “callback” arguments, but you pass those by giving the name of the function, and if needed, the class or object it’s defined on. So I’d have to actually clutter up my class with a bunch of little methods, one describing each subset I need to total. Ugh.
But I figured that, given the usual array-of-associative-arrays “recordset” of data you get back from a SQL database, it would be pretty trivial to write a function that filtered it according to an associative array of conditions – pass it the data set and an associative-array “condition” such as “array(‘trade’ => hvac’),” and it would return the rows for which every element in the condition matched. This would be great, if only CakePHP returned data in the array-of-associative-arrays recordset format we usually get from SQL databases. But alas, CakePHP feels the need to map your data into a pseudo-ORM tree and return you junk like this:
Array(  => Array ( [payment_items] => Array ( [product_type] => application ) [license_application_types] => Array ( [trade] => [license_type] => apprentice )  => Array ( [in_state] => 1 [fees] => 400 [count] => 8 ) )  => Array ( [payment_items] => Array ( [product_type] => application ) [license_application_types] => Array ( [trade] => rapping [license_type] => master )  => Array ( [in_state] => 0 [fees] => 500 [count] => 2 ) ) [.... and so on ...]
Which rather complicates matching against a simple associative array of column_name => value. I’d have to either nest another loop or figure some way to recurse it – eeeew. I tried a more “procedural” solution – coming up with all the totals by looping through the dataset and adding the values from each row to every running total they applied to – but that ended up being the worst pile of nested-loop confusion I’ve ever written – The code to initialize all the totals to 0 alone was longer than the data being processed – and I still couldn’t get it to work right, owing to the complexity.
There’s got to be a way to get just the rows-and-columns data, right? CakePHP is bound to have a “just act like a vanilla SQL data store just this once” escape hatch somewhere, especially given how thin and anemic its ORM (more of a Table Data Gateway combined with an OAAM (object-associative-array-mapper)) is to begin with – right?
Well, I just spent my morning digging though CakePHP’s database-centric naughty bits looking for the method that gets the row-and-column data that then gets passed to a rearrange-into-a-tree method. No such luck – no such separation of concerns between fetching the records and re-structuring them was to be found, the two tasks, with rather different purposes, were being done in the same method. Even a method called fetchRow() returns stuff that is no longer recognizable as a “row.” Each Dbo* class for each type of database CakePHP supports re-implements the wannabe-object-graph re-arranging right there in the fetchResult() method, immediately after calling mysql_fetch_row() or whatever lower-level database call. There is no option to be found anywhere, no parameter that can be passed nor class variable that can be set, that turns this re-arranging off.
So instead, I had to write a method to undo it. I packaged it up with my previously-described filtering method into a Component called RawReportComponent. The better part of a workday wasted trying to find a work-around for some “magic” that’s not all that great to begin with (ooh look, we do you the favor of massaging array data according to what table it’s from!). I’d love to share this undoWannabeORM() method with the “community,” but since it’s code I wrote for work I probably shouldn’t. You could probably write it yourself, with three nested loops like I did. The real pity is that anyone should have to.