In-memory manipulation of data often results in a pile of spaghetti code. The manipulation itself might be simple enough: grouping, aggregating, creating hierarchies, and performing calculations. But once the data munging code is written and the results are sent off to the part of the application where they’re needed, related needs continue to arise. A similar transformation of the data may be required in another part of the application, or more details may be needed: metadata, context, parent or child data, and so on. Particularly in visualization or complex reporting applications, after shoehorning data into some structure for a given need, one realizes that tooltips or synchronized highlights or drilldowns put unexpected pressures on the transformed data. One might address these requirements by:
- Stuffing more details and more levels into the transformed data until it’s huge and ungainly but satisfies the needs of all the nooks and crannies of the application it eventually visits.
- Writing new transformation functions that have to join some already processed node to the global data source to bring in new details.
- Designing complex object classes that somehow know how to handle all the contexts they end up in.
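As a concrete sketch of the first option, consider a hypothetical sales dataset (the records, field names, and functions below are illustrative, not from any particular application). A simple group-and-aggregate serves the original chart; then a tooltip requirement forces the result to grow an extra level of detail:

```python
from collections import defaultdict

# Hypothetical sales records; fields are illustrative only.
sales = [
    {"region": "East", "product": "widget", "amount": 120},
    {"region": "East", "product": "gadget", "amount": 80},
    {"region": "West", "product": "widget", "amount": 200},
]

def totals_by_region(rows):
    """First pass: a flat group-and-aggregate, enough for a bar chart."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

def totals_by_region_with_detail(rows):
    """Later pass: tooltips need per-product breakdowns, so the shape
    grows another level -- more details stuffed into the same structure."""
    result = {}
    for row in rows:
        region = result.setdefault(
            row["region"], {"total": 0, "products": defaultdict(int)}
        )
        region["total"] += row["amount"]
        region["products"][row["product"]] += row["amount"]
    return result
```

Each new downstream need (drilldowns, highlights) tends to repeat this pattern: another function, another level, or another join back to the source data.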
After building data-centric software for 20 or 30 years, as I have, one begins to suspect that one is solving the same set of problems over and over again. We bring in complex loops, list comprehensions, database analytical functions, map or groupBy functions, or even full-fledged reporting engines. As our skills develop, we get better at making any chunk of data munging code clever and concise, but the spaghetti still seems to proliferate.