My Secret Life as a Spaghetti Coder
home | about | contact | privacy statement | getting started with cfrails
(or your favorite regex engine)

I have a feeling this post is going to go over like a ton of bricks.

The subject of regular languages, context free languages, and just formal language theory in general caught my eye today with a question about prime numbers. This was one of my favorite classes as an undergraduate, so I thought I'd join in the discussion.

Raganwald quotes Sam, who said
If you can provide a regular expression such that it actually matches the string representation of only prime (or only non-prime) integers, that would be pretty sweet. A proof that such a thing could not be created would be equally impressive.
(Sam also linked to a blog post that linked to another that constructed a regular expression to decide if the number of 1s in a string of 1s is not prime, or /^1?$|^(11+?)\1+$/.)

A couple of comments from people at Raganwald mentioned using the pumping lemma for regular languages to prove that such a thing was not possible.

Indeed, even determining if 1* is not prime (the regular expression above) can be shown not to be a regular language. Using the pumping lemma for regular languages:
  1. Let the language L = { 1i, i is a number greater than 0 and is not prime }
  2. Assume L is a regular language. Then, by the pumping lemma, there exists a number p >= 1 such that every string w in L of length p or greater can be decomposed into substrings w=xyz, where |y| (length of y) > 0, |xy| <= p, and for all i >= 0, xyiz is also in L.
  3. Choose a string, w from L whose length is greater than p.
  4. Since there are infinitely many prime numbers, we can find one greater than any p.
  5. Therefore, we can choose an i in w=xyiz such that repeating it some number of times will be a prime. We arrive at a contradiction, and note that because w cannot be pumped, L is not a regular language.
Another commenter mentioned that "Regular expressions these days can match any context-free language through the use of sub-expressions." Clearly, since our language L is not regular, but it is matched by a regex, we can see that today's regexes are more powerful than regular languages would allow them to be. But, is our regex even restricted enough to be a CFL?

A similar proof using the pumping lemma for CFLs would show that our language L is more powerful than even a CFL. (Don't let me slide here if that's not the case.)

Still, that doesn't tell us anything useful for the problem at hand - only that the regexen of (at least) Perl and Ruby are more powerful than CFLs. But how much more? If we want to prove that a regular expression (in the practical sense of the word) cannot take a string representation of a number, (e.g., "13" or "256") and determine if it is not prime (or prime), then we need to know how powerful regex are.

But I don't know where to start on that front. Any ideas?

Alternatively, if we want to prove that it can be done, we need only demonstrate so by coming up with the regex to do it. I'm not convinced it's possible, but I'm not convinced it's not possible either. Ideally, I'd like to find the formal definition of how powerful regex are, and prove at least that we don't know if the language is in that class or not. (The pumping lemmas, for example, are necessary but not sufficient to prove membership of L amongst the respective class of languages.)

Comments are appreciated. I'm sort of stuck at the moment, so I wanted to bounce these ideas out there to see if someone might bounce one back.

Hey! Why don't you make your life easier and subscribe to the full post or short blurb RSS feed? I'm so confident you'll love my smelly pasta plate wisdom that I'm offering a no-strings-attached, lifetime money back guarantee!


Comments
Leave a comment

Can you send me some questions on the Pumping Lemma, so that I can practice using it to decide if a language is regular or not.

Posted by Piyush Patel on May 07, 2008 at 04:56 AM UTC - 5 hrs

Hi Piyush. I can't think of any off hand, but I noticed that http://www.google.com/search?q=pumping+lemma lists a few examples and practice problems.

I can also recommend Intro to Automata Theory from Hopcroft, Motwani, and Ullman, as it's the book I read:

(link is to Amazon.com) http://tinyurl.com/6zlde8

It's a slim book for the price, but if you're looking to gain a solid understanding of the topics, it helped me out.

Posted by Sammy Larbi on May 07, 2008 at 07:11 AM UTC - 5 hrs

Leave a comment

Leave this field empty
Your Name
Email (not displayed, more info?)
Website

Comment:

Subcribe to this comment thread
Remember my details
Google
Web CodeOdor.com

Me
Picture of me

Topics
.NET (21)
AI/Machine Learning (12)
C++ (4)
cfrails (22)
ColdFusion (80)
Customer Relations (15)
DRY (18)
DSLs (11)
Future Tech (4)
Games (5)
Groovy/Grails (7)
IDEs (8)
Java (41)
JavaScript (2)
Lisp (1)
Mac OS (1)
Miscellany (59)
OOAD (34)
Programming (97)
Programming Quotables (5)
Rails (18)
Ruby (50)
Save Your Job (40)
scriptaGulous (4)
Software Development Process (22)
TDD (39)
TDDing xorblog (6)
Tools (3)
Web Development (2)
YAGNI (10)

Resources
Agile Manifesto & Principles
Principles Of OOD
ColdFusion
CFUnit
Ruby
Ruby on Rails
JUnit

Reading List
InfoQ - Agile, Ruby, & more
Ray Camden - Cf Jedi Master
Ron Jeffries' XP Mag
Peter Bell's Application Generation
Others coming...

RSS 2.0: Full Post | Short Blurb
Subscribe by email:

Delivered by FeedBurner