Peter Bell's presentation on LightWire generated some comments I found very interesting and thought-provoking.
(Perhaps Peter is not simply into application generation, but comment generation as well.)
The one I find most interesting is brought up by several people whose opinions I value - Joe Rinehart, Sean Corfield, Jared Rypka-Hauer, and others during and after the presentation.
That is: what is the distinction between code and data, and specifically, is XML code or data
(assuming there is a difference)?
The first distinction I think needs to be made is: what do we mean when we are talking about
XML? I see two types - XML the paradigm, where you use tags to describe data, and the XML you write - as in,
the concrete tags you put into a file (like, "see that XML?"). We're talking about XML you've written, not
the abstract notion of XML.
The second idea: what is code? What is data? Sean Corfield offers what I would consider to be a concise,
and mostly correct definition: "Code executes, non-code (data) does not execute." To make it correct (rather
than partially so), he adds that (especially in Lisp) code can be data, but data is not code. You see this
code-as-data any time you are using closures or passing code around as data. But taking it a bit further -
your source code is always just data to be passed to a compiler or interpreter, which figures out what the
code means, and does what it has been told to do.
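To make the code-as-data idea a little more concrete, here is a minimal Python sketch. It is my own illustration rather than anything from the presentation, and the names (apply_twice, make_adder) are made up. It shows a closure being passed around as a value, and a piece of source text that is plain data until something compiles and runs it.

```python
import ast


def apply_twice(func, value):
    """Receive a function as an ordinary argument (data) and execute it (code)."""
    return func(func(value))


def make_adder(n):
    """Return a closure that remembers n."""
    def add(x):
        return x + n
    return add


add_three = make_adder(3)
result = apply_twice(add_three, 10)

# Source text is just a string until something interprets it.
source = "1 + 2 * 3"
tree = ast.parse(source, mode="eval")            # data: a syntax tree we can inspect
value = eval(compile(tree, "<sketch>", "eval"))  # code: the same tree, now executed
```

The same expression is data while we are inspecting the tree and code once it is handed to the interpreter, which is the sense in which source code is always "just data" for a compiler.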
So is XML code? Certainly we see it can be: ColdFusion, MXML, and others are languages where your
source code is written (largely) in XML. But what about the broader issue of having a programmatic
configuration file versus a "data-only" config file?
Is the data in the config file executable? It depends on the purpose behind the data. In the case of data
like
<person>
  <name>Bill</name>
  <height>4'2"</height>
</person>
I think (assuming there is nothing strange going on) that it is clearly data. Without knowing anything about the
back end, it seems like we're just creating a data structure. But in the case of
DI (and many other uses for config files),
I see it as giving a command to the DI framework to configure a new bean. In essence, as Peter notes,
we've just substituted one concrete syntax for another.
In the case of XML, we're writing (or using)
a parser to send data to an interpreter we've written that figures out what "real" commands to run based on
what the programmer wrote in the configuration file. We've just created a higher level language than we had before
- it is doing the same thing any other non-machine code language does (and you might even argue
about the machine code comment). In the configuration case,
often it is a DSL (in the DI case specifically, used to describe which objects depend on which other
objects and load them for us).
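As a rough sketch of what I mean by the config file being a set of commands, here is a toy DI loader in Python. The tag and attribute names (beans, bean, property, ref) and the classes are invented for illustration; this is not LightWire's format or any real framework's API, just the smallest interpreter I could think of for the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical config vocabulary, made up for this sketch.
CONFIG = """
<beans>
  <bean id="mailer" class="Mailer"/>
  <bean id="signup" class="SignupService">
    <property name="mailer" ref="mailer"/>
  </bean>
</beans>
"""


class Mailer:
    pass


class SignupService:
    mailer = None


# Hypothetical registry mapping class names in the file to real classes.
CLASSES = {"Mailer": Mailer, "SignupService": SignupService}


def load_beans(xml_text):
    root = ET.fromstring(xml_text)
    beans = {}
    # First pass: each <bean> element is effectively a "create this object" command.
    for node in root.findall("bean"):
        beans[node.get("id")] = CLASSES[node.get("class")]()
    # Second pass: each <property ref="..."> is an "inject this dependency" command.
    for node in root.findall("bean"):
        for prop in node.findall("property"):
            setattr(beans[node.get("id")], prop.get("name"), beans[prop.get("ref")])
    return beans


beans = load_beans(CONFIG)
```

Nothing in the XML "runs" by itself, but every element causes the loader to do something, which is why I read it as a small language rather than as a plain data structure.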
While we don't often have control structures, there is nothing stopping us from implementing them,
and as Peter also notes, just because a language is not
Turing complete doesn't mean it is not
a programming language. In the end, I see it as code.
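To show that nothing stops us from adding control structures, here is a small Python sketch of an interpreter for a made-up XML vocabulary with set and if elements. The vocabulary is entirely hypothetical (it is not taken from any existing framework), and it is deliberately far from Turing complete.

```python
import xml.etree.ElementTree as ET

# A made-up XML "program": assign a variable, then branch on it.
PROGRAM = """
<program>
  <set name="visits" value="3"/>
  <if name="visits" equals="3">
    <then><set name="greeting" value="welcome back"/></then>
    <else><set name="greeting" value="hello"/></else>
  </if>
</program>
"""


def run(node, env):
    """Walk the child elements in order and execute each one against env."""
    for child in node:
        if child.tag == "set":
            env[child.get("name")] = child.get("value")
        elif child.tag == "if":
            matched = env.get(child.get("name")) == child.get("equals")
            body = child.find("then" if matched else "else")
            if body is not None:
                run(body, env)
    return env


env = run(ET.fromstring(PROGRAM), {})
```

A dozen lines of interpreter is enough to give the file a conditional, which is part of why I have trouble calling the file "just data".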
Both approaches are known to have their benefits and drawbacks, and choosing one over the other is largely a matter
of personal taste, size and scope of problem, and problem/solution domain. For me, in the worlds of
JIT compiling and interpreted languages, the programmatic way
of doing things tends to win out - especially with large configurations because I prefer to have
the power of control structures to help me do my job (without having to implement them myself).
On the other hand, going the hard-coded XML route is especially popular in the compiled world, if not
for any other reason than you can change configurations without recompiling your application.
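For comparison, here is a minimal sketch of the programmatic style, again in Python. The environment names, hostnames, and settings are assumptions made up for the example. The point is only that a loop and a conditional come for free from the host language, where a hard-coded XML file would need one near-identical block per environment.

```python
def build_config(environments=("dev", "test", "prod")):
    """Build per-environment settings using ordinary control structures."""
    config = {}
    for env in environments:
        config[env] = {
            "db_host": f"db.{env}.example.com",       # hypothetical hostnames
            "debug": env != "prod",                   # debugging off in production only
            "pool_size": 20 if env == "prod" else 5,  # illustrative numbers
        }
    return config


config = build_config()
```

The trade-off is the one described above: in a compiled language, changing these values means recompiling, whereas an external file can be edited in place.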
I don't draw the distinction that XML is data while programming (or using an in-language DSL)
in your general-purpose language is code. To me, they are both code, and if either is done incorrectly it will
blow up your program.
Finally, I'm not sure what value we gain from seeing that code is data (and in many cases config data is code),
other than perhaps a new way of looking at problems which might lead us to find better solutions.
But that isn't provided by the distinction itself, just the fact that we saw it.
Comments, thoughts, questions, and requests for clarifications are welcome and appreciated.
Hi Sam, Interesting comments! I pretty well agree with your statements, although my conclusions differ. I find that when you start to remove the distinction between "data" and "code" a lot of possibilities present themselves (especially in the world of metaprogramming) that you wouldn't even consider if you thought an API and an XML DTD to be fundamentally different things. Suddenly you base your concrete syntax choices on heuristics based on your use case rather than your initial assumptions about whether you are programming or configuring, and that flexibility can lead to a wonderful collection of interesting solutions to problems.
Posted by
Peter Bell
on May 19, 2007 at 06:44 PM UTC - 5 hrs
There's no question in my mind that *any* data can be seen as code and vice versa. More interesting is why we care about the distinction. The psychology of programming is quite intriguing, and I see code/data distinctions as being part of the same phenomenon as our striving to make our code take on characteristics other than just formal correctness. In fact we tend to use intrinsically fuzzy ideas like "elegance", "high cohesion", "do The Right Thing" both as surrogate measures for formal correctness and as goals in themselves in the very common case where no definition of correctness has been or will be made, or is even seen as desirable.
My take on why we distinguish between code and data is to provide a clue as to what we want to treat as primary and what we want to treat as secondary in any given development effort.
So I would say that "code" is a code word (no pun intended) for that part of the system configuration that we are most interested in tinkering with in a given development context, and "data" refers to that part of the system configuration whose impact on system behaviour we will attempt to understand by understanding its impact on the behaviour of the "code". In this sense, if you're developing an app, things like versions of libraries and OS are "data" - and, lo and behold, are often represented as such in config files, conditional build directives, etc. OTOH, if you're bringing out the next version of an OS, all the existing apps are "data" for input into your compatibility testing.
Posted by Jaime Metcher
on May 19, 2007 at 08:38 PM UTC - 5 hrs
@Jaime, I think the distinction you make is REALLY interesting. I'm not sure the specific intent you select is unequivocally the correct one (although neither is it wrong), but it definitely opens an amazingly interesting discussion.
Another way of viewing code vs data (again, probably not right - just another perspective to consider and possibly discard) is the idea of the skill level required to edit. Much software has some degree of predictable variability (in a software product line, you spend quite some time scoping out such variabilities) so it often makes sense to encapsulate that variability by breaking out some kind of easy to use variation mechanism (content management system, config file, etc.). So one perspective is that data is stuff you expect to need to be able to change more frequently and have packaged to make editing easier, whether by removing it from your class files (in Java or C#) to save the compilation step, or by putting it into a simple XML file or CMS to allow for the management of the data by less technical users.
Just another thought . . !
Posted by
Peter Bell
on May 20, 2007 at 01:27 AM UTC - 5 hrs
Peter,
Good point...and to recast your description back into my terms (not because mine are better, but to show it can be done), the app developer is considering the config file to be data because his/her main focus is the application code. Even the developer assigned to creating the config file format and config file parser is probably focussing more on the parser than the actual config file.
The "less technical user" who is editing the config file is, however, highly likely to describe what they are doing as "programming" - as in "I'll just program the VCR to record tonight's show" or "I've programmed my system to page me when an email from the boss comes in" - which makes the config file their program code.
Posted by Jaime Metcher
on May 20, 2007 at 02:57 AM UTC - 5 hrs
Point of order:
CFML is *not* XML, nor can it be made to be. Nor is/was the intention of Allaire / Macromedia / Adobe to have it considered as XML (it would be embarrassing for them, as it would make it seem like they don't understand what XML is...).
Something is either XML or it isn't, it cannot be "largely XML". Just because some misguided individuals stick a slash at the end of their CFML tags (eg: <cfset foo="bar" />) does not make CFML XML. No matter how hard they wish it.
CFML is *tag-based*, and CFML code could be described as "written largely in a tag-based fashion". But tag-based does not equal "XML".
CFML could be described as being "reminiscent of HTML", but I think it's beyond "a stretch" to describe it as being "reminiscent of XML". It's not.
However MXML is a good example of XML as code.
I think the question of code/data vis-a-vis XML is close to being a non sequitur. XML is a method for formalising an approach to writing text (or some text formalised in that way). Code is a method for formalising an approach to writing computer instructions, usually as text (or some instructions formalised in that way). XML can be code. Code can be XML. Not all code is XML. Not all XML is code.
--
Adam
Posted by Adam Cameron
on May 20, 2007 at 04:07 AM UTC - 5 hrs
@Peter - "I find that when you start to remove the distinction between "data" and "code" a lot of possibilities present themselves (especially in the world of metaprogramming)"
I guess my line about that was a bit too nuanced. Indeed, that is what I meant by "other than perhaps a new way of looking at problems which might lead us to find better solutions." All I was trying to get across is that I think those possibilities are there as a consequence of us /seeing/ that code is data, not by the /fact/ that code is data. My reasoning behind this is that many people use code as data or data as code without knowing it - only when they notice or realize it do those possibilities open up (similar perhaps to the "enlightenment" one is said to achieve when they finally "get" Lisp). Of course, they may be the same thing (the existence and our seeing it), but I'm not yet convinced of that.
@Peter and Jaime - Very good comments. I agree about the psychology behind this and the reasons we might choose to call one thing data and one thing code. Not sure I agree totally with skill level - I'd be more comfortable calling it ease of change, or abstraction ... or something along those lines. My point being that much of the time, the reason behind config files is more along those lines than "I want an idiot to be able to configure this." (although, I think it encompasses that reason too). Generally, (at least in my view) it's more likely to be a case of encapsulation - no need to know the implementation details to do what you want to do.
@Adam - I was originally going to write a simple interpreter to perform some actions based on a made-up XML programming language to show the point, but I thought the mention of CFML and MXML would illustrate it better, without the overhead for what is essentially only a small point of the post (despite its title).
That said, I don't immediately see how CFML cannot be made to be valid XML - the only requirement I can think of off the top of my head is that you put a slash at the end of single-line tags like cfset, cffile, ... (which as you mentioned, "misguided" individuals sometimes do). If you feel up to it, would you explain?
As for "the question of code/data vis-a-vis XML [being] close to being a non sequitur," I felt I addressed that point when I said we are not talking about the abstract notion of XML, to which you refer (as "a method for formalising an approach to writing text"), but the concrete XML we wrote. In particular, XML is code, I posited, depending on the reason behind writing it. In the case of configuration files, I see it as code (in most instances of config files that I've seen - for example, setting colors in an application might not be considered coding).
Posted by
Sammy Larbi
on May 20, 2007 at 07:40 AM UTC - 5 hrs
Hi Sammy.
Fair enough, I see where you're coming from with the example. I just think CFML is inappropriate for the example (and, yes, the whole "let's pretend it's XML!!" is a bit of a bugbear for me. Did you guess?).
How is CFML not XML-compliant?
- <cfif>/<cfelseif>/<cfelse>.
- Any time one uses an ampersand in the CFML (or, I imagine, a few other characters that are reserved / need escaping in XML).
- escaping CDATA in <cfscript> blocks might be tricky too. Or <cfquery> blocks.
But more importantly... *why*? Why would one want to have a coding language that is XML-compliant? What would the gain - in real world terms - be?
I think Macromedia were nuts to decide that Flex ought to be fully XML-compliant. Other than as an exercise, what good does it do other than adding baggage to the code? I can't imagine the compiler is ever going to be doing some xpath searching through the source code? Or maybe a bit of XSLT? (hey, correct me if I'm wrong). Doesn't it just take up unnecessary space?
I can't see the "up" side.
But I think the people who are more nuts are those that go around "closing" their CFML tags when no closure is necessary (nor required by the language, given it's not... you know... XML). I can only think they're thinking "ooh, look at me mum, I'm doing XML! NEAT!" (voiced with a slightly dimwitted-sounding voice). I can't imagine that sort of notion crossing the minds of people using a less niche and less "aimed at beginners" language.
Adolescent. That's what it seems to me: adolescent (I've been trying to think of how I consider it, and haven't come up with a decent description yet, but that's it).
But anyway... whichever way one wishes to spin it, it's neither here nor there I guess. Each to their own.
--
Adam
Posted by Adam Cameron
on May 20, 2007 at 12:57 PM UTC - 5 hrs
Sigh.
The old "remove the CF tags on a CF-oriented blog" trick.
OK, so here's one of those paragraphs again:
How is CFML not XML-compliant?
- CFIF/ CFELSEIF / CFELSE.
...
- escaping CDATA in CFSCRIPT blocks might be tricky too. Or CFQUERY blocks.
--
Adam
Posted by Adam Cameron
on May 20, 2007 at 01:00 PM UTC - 5 hrs
Adam - I had been meaning to allow tags in this and thanks to your comment, I did it today. It is appreciated.
Also, I see your point about the special characters - I hadn't even thought to look there.
As far as making a programming language in XML, I'm not sure about the reasoning. As for real world benefits, I would guess a parse tree would be quite easy to generate using all the tooling already available for XML.
Finally, as for closing the tags in CFML, I'd be more inclined to believe people do it out of habit (or to create the habit) from HTML, rather than out of some desire to be XML compliant. In any case, all this is way off topic, so I'll go ahead and stop now =).
Posted by
Sam
on May 21, 2007 at 10:53 AM UTC - 5 hrs
Sam, Thanks for the mention and link in your post. :D I haven't been blogging a lot lately but hopefully I'll be able to start doing so again soon. Getting married (even though it was last August), moving, running a conference, buying a house and being onsite on contract have all conspired to ruin any sense of "free time" I might have had lately to blog much... hopefully I'll be able to turn that around. Anyway... as for CFML being valid XML, here's one case that blows the idea right out of the water: <cfset myVar = myQueryObject.recordCount GT 0> It's perfectly legal CF syntax and the result will be YES or NO, but as it sits, it's not XML compliant. You may be able to recast it as XML like this: <cfset myVar="#myQuery.recordCount GT 0#" /> But if you did that people would think you'd lost your mind, you'd have typed at least 4 extraneous characters, and gained nothing. So even if most CFML could be typed out in a way that would allow an XML interpreter to parse it, there's not really any great gain AND, and here's the kicker, there's no container hierarchy that would define the xmlRoot, and there's no way to create one. There's no "base tag" that wraps all CFML and therefore CFML isn't and can't be valid XML. Then again if all you're doing is CFCs and HTML pages you could consider <html /> and <cfcomponent /> to be the base tags... but you still gain nothing. So CFML isn't XML and forcing it to BE XML sees no gains and causes (potentially a lot of) extra work and typing. ...and with that I have to get back to work. I have more to say on the subject so maybe this afternoon/evening I'll be able to do that. ;)
Posted by
Jared Rypka-Hauer
on May 22, 2007 at 10:29 AM UTC - 5 hrs
Jared,
I definitely agree that CF is not XML, but thought it was sufficient to prove the case that XML (as it is written, not as the abstract notion of XML) can be made into a programming language.
At the time, I thought it was relatively minor adjustments to make my code XML compliant, but as Adam pointed out, there are special character cases to consider. And, as you pointed out, we don't have the appropriate container. I was clearly wrong about the details, but I think the point still stands (without me needing to create a simple language and compiler for it).
Thanks for the comment, BTW!
Posted by
Sam
on May 22, 2007 at 10:33 AM UTC - 5 hrs
Sam,
Agreed, at least in the sense that XML can be used to provide a foundation for the syntactical vehicle we would call a "programming language".
I wasn't trying to shoot down your post at all, just providing a clarification on one point in your post.
As for creating an example, you don't really need to do that as Fusebox provides an XML "vocabulary" that the Fusebox parsing engine will compile down to the appropriate CFML equivalents. Fuseboxers can use it to condense a lot of the very tiny CFML files they end up writing directly into their circuit.xml files. The biggest issue with it is that in order to provide the level of determinism it needs to actually compile the XML down to CFML it has to be very verbose:
<if condition="#this EQ that#"><true>...actions...</true><false /></if>
<set name="foo" value="wibble" />
There are also several types of loops that can be used.
All of this was engineered in to give fuseboxers the ability to write circuits directly into the XML file and alleviate a lot of the includes that need to be done for a Fusebox application, and there's been MASSIVE amounts of debate over the idea because the circuit.xml file is widely believed to be for CONFIGURATION DATA, not for programmatic execution. The argument centers around the conflict between the idea of configuration data and programmatic execution. Arguably, the circuit.xml file, even using the XML vocabulary for programmatic constructs, is simply telling the compiler to "insert the CFML that's represented by this XML into the parsed result of this XML block" and thus we have a fundamental conflict between the ideas of configuration and execution:
When is something executed?
Anyway, this is a fascinating conversation and hopefully I'll have a chance to blog about it myself soon. First we have to get internet access at home and that should happen this afternoon.
I hate moving! ;)
Posted by
Jared Rypka-Hauer
on May 22, 2007 at 11:14 AM UTC - 5 hrs
Sorry for the BRs... something went wrong with my previous post and it dropped all my carriage returns, so I inserted BRs in my second comment and that didn't work either.
Perhaps this will work? I think the issue may have been that I posted my first comment from my blog reader, not from a "proper" browser.
We shall see!
Posted by
Jared Rypka-Hauer
on May 22, 2007 at 11:15 AM UTC - 5 hrs
Yup, that was it...
Posted by
Jared Rypka-Hauer
on May 22, 2007 at 11:16 AM UTC - 5 hrs
Jared - the first one came through to me as having been marked as potential spam (due to posting through the blog reader I suppose). It didn't seem to have line breaks, but I don't know if the cfdump (I just dump the form and send it to myself as an email) stripped them out or if they were just never there. I also went ahead and removed the BRs from your next comment (as you can tell).
I enjoyed the story about Fusebox's circuit.xml file. Indeed, I was under the impression it was created precisely because too many people were having fat controllers and putting their model code in the place it didn't belong.
Clearly, I'm on the side that it /is/ programming and it is being executed. I see it as something along the lines of "well, we want to be able to have some logic that relates to controllers, but let's limit it so the programmer doesn't need to think about what goes in there - no model code can enter." Of course, I don't know what the original authors intended, but I think that's a decent guess.
In any case, I still see it as programming - just that they've limited what can be programmed so as to cut down on programmer mistakes.
Posted by
Sam
on May 22, 2007 at 11:45 AM UTC - 5 hrs
@Jared, Guess it happens to us all sometimes when we have something we're passionate about :-> Good luck with the move.
I request just one thing (and feel free to ignore). If you're going to blog on what is program vs. config then include a definitive definition of both terms. Otherwise we're all in Alice in Wonderland territory:
`When I use a word,' Humpty Dumpty said in rather a scornful tone, `it means just what I choose it to mean -- neither more nor less.'
Best Wishes,
Peter
Posted by
Peter Bell
on May 22, 2007 at 11:57 AM UTC - 5 hrs
I'll get more into this in my post, but it's not a question of code vs config, it's a question of code vs data and whether configuration should be a function of execution or operation. That is to say "Do I do read operations on my static config file or do I execute my config file as its own block of code?"
But yeah, I think a definition would be a good thing.
Laterz,
J
Posted by
Jared Rypka-Hauer
on May 22, 2007 at 12:13 PM UTC - 5 hrs
Is such a definition possible, or is it by nature like what Humpty Dumpty said? I tried (perhaps not hard enough) to do that here - and it does seem to boil down to intent and perception, which leads me to believe they are closer than we realize when it gets past data structures. Of course, I may have inadvertently argued from that position and gone down that path simply because it was what I believed in the first place.
It will definitely be worth reading more opinions. And good luck with the move as well!
Posted by
Sam
on May 22, 2007 at 12:55 PM UTC - 5 hrs
At a sufficiently fundamental level, all data is code and vice versa. There is a distinction when you get down into chip instructions as you are typically performing an operation on a number, but even the simplest piece of data could create many operations on many numbers.
I think you could argue that any DSL that doesn't contain constructs for (at the very least) conditional logic is a "data" language irrespective of the concrete syntax in which it is implemented (XML, method calls, in-language extension, custom tags, records in a db, a non-XML textual config file, etc.). To me the big issue is to distinguish concrete syntax, abstract grammar, semantics and intent.
They aren't completely orthogonal concerns, but they deserve to be treated independently.
Posted by
Peter Bell
on May 22, 2007 at 01:07 PM UTC - 5 hrs
Well, yes it all gets down to 0s and 1s, but I think we have to start at a more useful level (but what that level is is certainly debatable).
I see what you mean about DSLs and not having conditional operators, but I don't know that I agree. Certainly, creating and using a "data" language is great if it can abstract out any conditional logic you might need, thus making for simpler (and hence less likely to contain errors) code, but I would probably still think of it more as code than data.
Posted by
Sam
on May 22, 2007 at 01:38 PM UTC - 5 hrs
As long as we agree it is code irrespective of concrete syntax (textual config - XML or otherwise, records in db, method calls, etc.), then I'm happy to call it code!
Posted by
Peter Bell
on May 22, 2007 at 01:49 PM UTC - 5 hrs
You and I agree it is code - and I agree that the concrete syntax is not important in the lack of distinction between code and data (we could program in images if we wanted to), but I suspect that some don't see the lack of difference, or don't agree that it exists (the lack of one).
Posted by
Sam
on May 22, 2007 at 01:53 PM UTC - 5 hrs