Welcome to Geek Cave Creations
Safe, Reliable Insanity, Since 1961!

Blog: The Making of Program O Version 3

By Dave Morton

As some of you folks know, I've begun working on the next incarnation of Program O, namely, version 3. This new version will be a major departure from version 2. In fact, it's going to be so radically different that it won't be "backwards compatible", and will require a fresh install and a new database. I'm going to try to chronicle the process of creation, and this is the start of that process. Here, I'll outline some of the major changes to the project, and I'll explain the reasons behind these changes.

The first, and by far the most fundamental, change with version 3 of Program O is the migration from procedural code to classes and objects. By switching to a more modular approach, I'm hoping to gain benefits that are much harder to get from procedural code, such as greater flexibility, reusability, and a more robust structure. Classes and objects also solve issues with variable scope, which were a prickly problem at times in both versions 1 and 2.

Another change involves using PDO (PHP Data Objects), rather than the older, less flexible, and far less secure mysql_* functions that version 2 uses. In addition to being more secure (provided one uses prepared statements), PDO is also more efficient, and requires fewer lines of code to do the same tasks that the mysql_* functions do.
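To make the prepared-statement point concrete, here is a minimal sketch. It is written in Python with the standard sqlite3 module purely as a stand-in, and it is not Program O's actual PHP code; the same idea maps onto PDO's prepare() and execute(). The table name "aiml" and its columns are assumptions made up for the illustration.

```python
import sqlite3

def make_demo_db():
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aiml (id INTEGER PRIMARY KEY, pattern TEXT, template TEXT)"
    )
    conn.execute(
        "INSERT INTO aiml (pattern, template) VALUES (?, ?)",
        ("HELLO", "Hi there!"),
    )
    conn.commit()
    return conn

def find_templates(conn, pattern):
    """Look up templates by exact pattern.

    The user-supplied value is passed as a bound parameter, never
    concatenated into the SQL string, so it cannot alter the query.
    """
    cur = conn.execute("SELECT template FROM aiml WHERE pattern = ?", (pattern,))
    return [row[0] for row in cur.fetchall()]
```

Because the input is bound rather than spliced into the query text, a classic injection string such as HELLO' OR '1'='1 is treated as an ordinary (non-matching) pattern instead of being executed as SQL.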

Also, one of the other major changes revolves around parsing the AIML templates that are returned from the database. Since AIML is an XML-based language, it only makes sense to parse these templates as XML. That's where PHP's SimpleXML functions come in. I've done some extensive testing of AIML parsing using SimpleXML, and I've found that it's incredibly simple to parse AIML tags using these methods. My tests have shown that I can reduce the amount of code needed to parse even the most complex AIML templates by as much as 60%! In fact, the more complex the code, the more efficient these methods become. Imagine parsing a <RANDOM> tag with 30+ <LI> choices with only 5 lines of code! And this brings me to the final change that I want to describe right now: <SRAI> tags.
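As a rough illustration of why treating the template as XML keeps the code short, here is a sketch in Python using xml.etree.ElementTree as a stand-in for SimpleXML. It is not the Program O parser, it only handles plain text plus <random>/<li>, and the function names are invented for the example.

```python
import random
import xml.etree.ElementTree as ET

def render_node(node, rng):
    """Turn one XML node (and its children) into plain text."""
    if node.tag.lower() == "random":
        # Pick exactly one <li> child, however many there are.
        choices = [child for child in node if child.tag.lower() == "li"]
        return render_node(rng.choice(choices), rng) if choices else ""
    # Any other tag: keep its text, then each child followed by the
    # text that trails that child.
    parts = [node.text or ""]
    for child in node:
        parts.append(render_node(child, rng))
        parts.append(child.tail or "")
    return "".join(parts)

def render(template_xml, rng=random):
    """Parse a template string as XML and return the rendered reply."""
    root = ET.fromstring(template_xml)
    # Collapse runs of whitespace left over from the XML layout.
    return " ".join(render_node(root, rng).split())
```

The <random> branch is the same few lines whether the tag has two <li> choices or thirty, which is the effect described above. Passing a seeded random.Random instance as rng makes the choice repeatable for testing.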

One of the things that I've experienced while working on Program O, starting with version 1, is that it's a huge performance hit to come across an <SRAI> tag. When one of these tags is encountered, the script has to save its current state, begin an entirely new search through the database to obtain the desired content, and then pick up where it left off to complete the parsing of the template. If a given template has, say, three or more <SRAI> tags, the script slows down noticeably.

Well, I've come up with a better approach. Now, when an <SRAI> tag is encountered, the script takes a three-tiered approach. First, it looks directly for an exact pattern match in the database, rather than trying to find ALL potential matches and then selecting the "best match", like it does initially. If it finds a direct match, it does a quick parse of the contents and continues on, saving HUGE loads of time and code. If it fails to locate a direct match, then it searches a special lookup table for previously encountered <SRAI> tags. If it finds a match in the lookup table, then it goes back to the AIML table, looking for the proper category by ID number, again getting the template directly, and parsing continues from that point. If it fails to find a matching entry in the lookup table, THEN it does a "best match" search, selecting the category with the best score, and at that point it stores the matching category's ID in the lookup table, so that if the script encounters that same <SRAI> tag again, it can pull the proper category from there, again accessing it by ID.

This process will actually be slower at first, since there will be nothing in the lookup table initially; but as users chat with the bot, the lookup table will become populated with previously encountered <SRAI> tags, and performance will increase.
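The three tiers described above can be sketched roughly as follows. This is an illustration in Python with sqlite3, not the actual Program O implementation: the table names "aiml" and "srai_lookup", their columns, and the best_match callback are all assumptions, and the real scoring search is left out entirely because the post does not describe it.

```python
import sqlite3

def make_db():
    """In-memory database with a hypothetical AIML table and lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE aiml (id INTEGER PRIMARY KEY, pattern TEXT, template TEXT)"
    )
    conn.execute(
        "CREATE TABLE srai_lookup (pattern TEXT PRIMARY KEY, category_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO aiml (id, pattern, template) VALUES (?, ?, ?)",
        [(1, "HELLO", "Hi there!"), (2, "HI *", "Hello to you too.")],
    )
    conn.commit()
    return conn

def template_by_id(conn, category_id):
    """Fetch a template directly by category ID, or None if it is missing."""
    row = conn.execute(
        "SELECT template FROM aiml WHERE id = ?", (category_id,)
    ).fetchone()
    return row[0] if row else None

def resolve_srai(conn, pattern, best_match):
    """Resolve an <SRAI> pattern using the three tiers, cheapest first.

    best_match(conn, pattern) is a caller-supplied stand-in for the full
    scored search; it should return a category ID or None.
    """
    # Tier 1: exact pattern match in the AIML table.
    row = conn.execute(
        "SELECT template FROM aiml WHERE pattern = ?", (pattern,)
    ).fetchone()
    if row:
        return row[0]

    # Tier 2: previously resolved pattern, then fetch the category by ID.
    row = conn.execute(
        "SELECT category_id FROM srai_lookup WHERE pattern = ?", (pattern,)
    ).fetchone()
    if row:
        return template_by_id(conn, row[0])

    # Tier 3: expensive best-match search; remember the result for next time.
    category_id = best_match(conn, pattern)
    if category_id is None:
        return None
    conn.execute(
        "INSERT OR REPLACE INTO srai_lookup (pattern, category_id) VALUES (?, ?)",
        (pattern, category_id),
    )
    conn.commit()
    return template_by_id(conn, category_id)
```

With this arrangement the expensive search runs at most once per distinct <SRAI> pattern; every later encounter is served by tier 2, which is why performance should improve as the lookup table fills up.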

There are other, less "earth-shattering" changes as well, but the changes that I've outlined will serve to improve efficiency, performance, and security, and will make for a more robust chatbot. I'll also be including parsing support for the new AIML tags that were created for Pandorabots' CallMom app, so an enterprising botmaster will be able to tie their chatbot to CallMom, as well.

One last thing that's being considered for future releases is the possibility of supporting "extended" wildcards and "wildcard groups". More on this in a later post. :) Feedback is always welcome, of course. Please feel free to use my Contact Page to send me a message. :)