Using the HTMLControl Class in Adobe AIR to parse HTML as a data source
One of the cool features of Adobe AIR (especially for Flash developers) is its ability to render full-featured HTML within Flash content. Rendering is handled by the WebKit core, and the HTML can come from a local URL, a remote URL, or a string of HTML text.
HTML rendering within Flash content is handled by the ActionScript HTMLControl class (which is wrapped by the HTML component in Flex). HTMLControl is a DisplayObject (it inherits directly from Sprite), so it normally renders its HTML directly to the display list.
However, it is possible to load HTML content into an HTMLControl instance without ever placing it on the display list. The HTML content is still loaded and executed, and its DOM is exposed to the scripting environment. This means you can use HTMLControl to load and parse HTML solely to retrieve data from it; in other words, you can treat HTML as a data source, as if it were XML.
This is done by leveraging the JavaScript APIs within HTML for manipulating the HTML DOM.
Below is a simple example that loads some HTML, and then uses various JavaScript DOM APIs to extract data and information from the HTML. Note that the HTML is never displayed or rendered to the screen.
index.html
<html>
    <head>
        <title>Example HTML Page</title>
    </head>
    <body>
        <h1>Title 1</h1>
        <p>This is some sample text for title 1</p>
        <p id="foo">This is the foo id</p>
        <p><a href="http://onair.adobe.com">onair</a></p>
        <p><a href="http://www.adobe.com/go/air">Adobe AIR</a></p>
        <ul>
            <li>List Item A</li>
            <li>List Item b</li>
        </ul>
    </body>
</html>
HTMLParsingExample.mxml
<?xml version="1.0" encoding="utf-8"?>
<mx:WindowedApplication xmlns:mx="http://www.adobe.com/2006/mxml"
layout="absolute">
    <mx:Script>
        <![CDATA[
            private var html:HTMLControl;

            private function onLoadAndParseClick():void
            {
                html = new HTMLControl();
                html.addEventListener(Event.COMPLETE, onHTMLLoadComplete);
                html.load(new URLRequest("app-resource:/index.html"));
            }

            private function onHTMLLoadComplete(e:Event):void
            {
                //get a reference to the top level html document
                var document:JavaScriptObject = html.window.document;

                /********** find number of links in html page ************/
                //grab all of the links in the document
                var a:JavaScriptObject = document.links;
                //get the length
                var len:int = a.length;
                trace(len + " links in html page.");

                /*********** Find element by ID and get its value ***********/
                var foo:JavaScriptObject = document.getElementById("foo");
                trace(foo.innerText);

                /*********** Use the document DOM parsing API to parse out LI items **********/
                //get all of the UL items
                var lists:JavaScriptObject = document.getElementsByTagName("ul");
                //make sure we found some
                if(lists.length > 0)
                {
                    //grab the first one
                    var ul:JavaScriptObject = lists[0];
                    //get the child nodes
                    var childNodes:JavaScriptObject = ul.childNodes;
                    var childLen:int = childNodes.length;
                    var tempNode:JavaScriptObject;
                    //loop through the nodes looking for LI elements
                    for(var j:int = 0; j < childLen; j++)
                    {
                        tempNode = childNodes[j];
                        if(String(tempNode.nodeName).toLowerCase() == "li")
                        {
                            //print the value of the LI element
                            trace("LI Found : " + tempNode.innerHTML);
                        }
                    }
                }
            }
        ]]>
    </mx:Script>
<mx:Button label="Load and Parse" right="10" bottom="10" click="onLoadAndParseClick()"/>
</mx:WindowedApplication>
One thing to remember is that when working with the JavaScript APIs from within ActionScript, most of the APIs return JavaScriptObject, JavaScriptArray, and JavaScriptFunction instances (and not ActionScript Objects, Arrays and Functions).
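For readers on later AIR releases, here is a minimal sketch of the same idea. It assumes, rather than takes from the original example, that the class is named HTMLLoader (in flash.html), that DOM objects come back as plain ActionScript Object values instead of JavaScriptObject, and that packaged files are addressed with the app:/ URL scheme. The class name HTMLLoaderParsingSketch is made up for illustration.

```actionscript
package
{
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.html.HTMLLoader;
    import flash.net.URLRequest;

    // Hypothetical class name; a sketch for later AIR releases, not the original example.
    public class HTMLLoaderParsingSketch extends Sprite
    {
        private var loader:HTMLLoader;

        public function HTMLLoaderParsingSketch()
        {
            // The loader is never added to the display list, so nothing is rendered.
            loader = new HTMLLoader();
            loader.addEventListener(Event.COMPLETE, onLoadComplete);
            // Assumption: index.html is packaged in the application directory.
            loader.load(new URLRequest("app:/index.html"));
        }

        private function onLoadComplete(e:Event):void
        {
            // DOM objects are typed as Object here (assumed for later AIR versions).
            var doc:Object = loader.window.document;

            // Print the href of every link in the page.
            var links:Object = doc.links;
            trace(links.length + " links in html page.");
            for (var i:int = 0; i < links.length; i++)
            {
                trace(links[i].href);
            }

            // Look up an element by id, guarding against a missing element.
            var foo:Object = doc.getElementById("foo");
            if (foo != null)
            {
                trace(foo.innerText);
            }
        }
    }
}
```

Because everything is typed as Object, property and method names are only checked at runtime, so a misspelled DOM property fails silently or throws when the handler runs rather than at compile time.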
You can view the API docs for all of the AIR classes here.






This is slick! You could do some cool stuff with microformats using this.
enefekt
9 Nov 07 at 7:16 am
Hi Mike,
Is it possible to do this inside the browser? Or is there a way to port it?
TIA.
Pedro.
Pedro
9 Nov 07 at 8:58 am
>Is it possible to do this inside the browser? Or is there a way to port it?
No, this is AIR only.
mike chambers
mesh@adobe.com
mikechambers
9 Nov 07 at 9:42 am
This does NOT work. Upon packaging and installation, the “Load and Parse” button did NOTHING when clicked. No error message was given either. The html file was NOT loaded, despite the fact that I packaged it with the application.
I even modified the onLoadAndParseClick() function thus:
private var html:HTMLControl;
private function onLoadAndParseClick():void
{
var html:HTMLControl = new HTMLControl();
html.addEventListener(Event.COMPLETE, onHTMLLoadComplete);
var urlReq:URLRequest = new URLRequest("index.html");
html.width = stage.stageWidth;
html.height = stage.stageHeight;
html.load(urlReq);
addChild(html);
}
and obtained the same result – NOTHING. What is wrong with your code?
Elijah
10 Nov 07 at 1:25 am
>What is wrong with your code?
I just retested this and it is working fine for me. What version of Flex Builder are you running? Which version of AIR?
Might I also suggest that you use some of the excellent debugging tools provided by Flex Builder to figure out what is going wrong? If you don’t have Flex Builder, then just put in some trace statements to find out where the code is not working.
Hope that helps…
mike chambers
mesh@adobe.com
mikechambers
10 Nov 07 at 1:42 am
@ Elijah: While I have not tested the code, in theory it seems to be fine… I applied the principle to a Flex/AIR-based web page editor using TinyMCE that grabs values from the page and converts them to Flex variables.
Thanks Mike!!! This example and your sample editor went a long way toward building a Flex-based WYSIWYG editor for my AIR apps. I hope I can clean up the code and release a sample on my site by Monday…
ssandy
10 Nov 07 at 11:34 am
Wouldn’t it be easier to use regular expressions to parse the XML content?
Samuel Agesilas
11 Nov 07 at 11:30 am
>Wouldn’t it be easier to use regular expressions to parse the XML content?
Well, I think that depends in part on how well you know regular expressions.
But, in general, I think the method above is easier and less error prone than using regular expressions.
mike chambers
mesh@adobe.com
mikechambers
11 Nov 07 at 12:40 pm
Hello Mike,
Congratulations, this looks very promising.
First question:
You mention “HTML content is still loaded and executed, and its DOM is exposed to the scripting environment. ”
If there is any JavaScript inside the HTML page, since the DOM is accessible, can we execute any JS and get back the result as JavaScriptObject, JavaScriptArray, and JavaScriptFunction instances as you mention?
Sorry about this newbie question, but what is the difference between
JavaScriptObject vs. ActionScript Object
JavaScriptArray vs. ActionScript Object
Is there a way to convert between them?
Bazard
13 Nov 07 at 11:01 pm
Hi Mike, looks like Elijah was not the only one having problems with this. I also get a blank container (nothing loads into the HTMLControl component)
My code looks like this:
——————————————————-
——————————————————
Any idea?
Many thanks !
Doru Adrian
16 Nov 07 at 1:49 am
[...] I attended Mike’s talk in AIR last year and have been closely following the progress of AIR since then. He gave a brief overview of the makeup of AIR and what’s possible. He then created a very simple HTML editing application from scratch and showed the various security options within AIR – this was new. Apparently you can sign AIR apps with a VeriSign certificate so people know they can trust the source(a good idea). Web: http://www.mikechambers.com/blog/ Source: http://www.mikechambers.com/blog/2007/11/09/using-the-htmlcontrol-in-adobe-air-to-parse-html-as-a-da... [...]
Flash on the Beach 2007 « Pauls Bit
19 Nov 07 at 1:36 am
This means such an application could show data from other websites even if there is no RSS feed to read from. Very useful, at least for price comparisons etc. But is this permissible, and how would website owners know if their data is being used by such an application?
Ashwinee
21 Nov 07 at 8:29 pm
This is great. I’m trying to do some similar stuff, and I had a question about the HTMLControl.
I can’t figure out how to intercept a URLRequest that is constructed by clicking on a link within a rendered HTML page, or by javascript within a rendered page. For example, I load an initial web page with a form that has a submit air UI), and click submit. I can’t figure out how to get read access to the URLRequest that is being sent as a result of clicking the submit button. Further, I would like to be able to intercept that request, possibly modify some of the submission parameters, and send it later with a stand-alone URLLoader.
It’s looking more and more like this isn’t possible. If this was possible, an air tool could be developed which automates a user progressing through a series of web pages to get a set of information at the end.
Noah
6 Dec 07 at 11:34 am
Mike:
I’ve been trying to get PDFs to display in Flash, but I can’t even get this thing started!
***
import flash.html.HTMLControl;
import flash.html.HTMLPDFCapability;
trace(HTMLControl.pdfCapability);
***
Just doing this creates errors…
Please help!
Rob Decker
Rob
20 Dec 07 at 7:15 am
HTMLControl is now HTMLLoader.
This threw me a little, as I was writing code in Flex 3 and compiling in Flash, and I apparently had different sets of classes.
Updating Flash to Beta 3 did the trick.
jvc
11 Jan 08 at 11:59 pm
Would someone mind leaving a simple sample of working with the dom for flexbuilder3 and AIR?
I am building this air app that needs to set a value in the dom and I have yet to find a tutorial, video or otherwise that actually runs without bombing.
The above example gives a null object reference, even after changing out HTMLControl for HTMLLoader.
Anthony
13 Feb 08 at 3:32 pm
This came through in the adobe forums with some info on how to make dom manipulation work on the latest adobe platform.
http://www.adobe.com/cfusion/webforums/forum/messageview.cfm?catid=641&threadid=1337341&CFID=14549621&CFTOKEN=2ebeb71cf3076366-103C2E1E-FBD7-FC12-79E112E46FF334E0&jsessionid=483084bbddbb69541dcc
Anthony
13 Feb 08 at 6:29 pm
Hi Mike,
Just wondering if there is any way to do this from the Flash IDE and not through Flex?
Cheers,
Jassa
jassa
27 Mar 08 at 8:06 pm
Ignore the above message – forgot to import the class. My bad!
jassa
27 Mar 08 at 8:19 pm
Hi,
I am trying to develop an AIR application that shows a list of radio stations. The problem is the stream format: I need to play it in Windows Media Player embedded in an HTML page. I use the mx:HTML control, but WMP doesn’t appear when I run the application.
Is this a bug of the mx:HTML control?
Thanks,
Paul
Paul
9 Oct 08 at 8:49 pm
Hi Mike, in which AIR package is the JavaScriptObject type? In AIR 1.5 it requires the class to have an import statement to be found.
Thanks and great job!
Frito
31 Mar 09 at 11:46 am
Hi Mike :)
I have a problem similar to the one in Noah’s reply.
I am trying to intercept a URLRequest that is constructed by clicking on a link within a rendered HTML page,
because I want to add a URLRequestHeader to all requests for links in the HTML (clicked by the user).
Any idea how I can do that?
Thanks a lot ;)
Lucas
8 Jun 09 at 1:31 pm
Please let me know: can we add text to a PDF instead of using the addImage method? If not, can we add more than one image from the editor?
Muhammad Usman Ashraf
10 Nov 09 at 1:43 am
Hi Mike,
The article is fantastic and Kudos to you in bringing this out with so much clarity.
I would like to clarify something: this example does not work on Adobe AIR 1.5 and later versions. Has the JavaScript object been deprecated? If so, are you aware of any alternative to get this working on AIR 1.5?
Naveen
30 Dec 09 at 4:57 am
Hi Mike,
It’s very bad that a PDF displayed in HTMLLoader always has to remain on top of all other components. Is there any way to put a MovieClip or Button instance on top of the PDF, so as to prevent users from selecting text, clicking, or right-clicking on the PDF document?
If anyone can solve this problem, please let me know.
Kielsoft
8 Feb 10 at 5:13 am