Case Study : ActionScript 3 Performance Optimization
Prompted by some of the work from Grant Skinner (in particular his FOTB 2009 session) and Thibault Imbert, I have been doing a lot of research lately into optimizing ActionScript 3 content. Not just how to make it run faster, but how to approach the process of optimization.
I am also starting to work on a small project which works with pixel data from images, and on which I anticipate performance might be an issue when working with larger images. I figured this would be a good opportunity to use some of the early code as a case study. I wanted to post the process and results here.
The task that I will focus on is grabbing a palette of 16 colors from an image, created by averaging the colors within that image. After searching on Google, I found a very good solution over at soulwire.co.uk, which I will use as the base for creating the palette. I want to point out that the original code targeted Flash Player 9 (and thus couldn’t take advantage of features such as Vectors), and already ran very fast.
I am using Grant Skinner’s performance test harness to profile performance. Each test is run 50 times, and is tested in Flash Player MAC 10,0,32,18 (debug) in the browser.
You can download all of the code from here.
First, here is the original test case, based on soulwire’s code:
/*
    Code adapted from:
    http://blog.soulwire.co.uk/flash/actionscript-3/colourutils-bitmapdata-extract-colour-palette/
*/
package
{
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.geom.Rectangle;
    import flash.geom.Point;
    import com.gskinner.utils.PerformanceTest;

    public class PixelSort extends Sprite
    {
        [Embed(source="../graphics/image.jpg")]
        public var TestImage:Class;

        public function PixelSort()
        {
            addEventListener(Event.ADDED_TO_STAGE, onAddedToStage);
        }

        private var d:BitmapData;

        private function onAddedToStage(event:Event):void
        {
            removeEventListener(Event.ADDED_TO_STAGE, onAddedToStage);
            var b:Bitmap = new TestImage();
            d = b.bitmapData;
            var perfTest:PerformanceTest = PerformanceTest.getInstance();
            perfTest.out = trace;
            perfTest.testFunction(run, 50, "averagecolors", "averagecolors");
        }

        private function run():void
        {
            var out:Array = averagecolors(d, 16);
        }

        // Averages the red, green and blue channels of every pixel in source.
        public function averageColour( source:BitmapData ):uint
        {
            var red:Number = 0;
            var green:Number = 0;
            var blue:Number = 0;
            var count:Number = 0;
            var pixel:Number;
            for (var x:int = 0; x < source.width; x++)
            {
                for (var y:int = 0; y < source.height; y++)
                {
                    pixel = source.getPixel(x, y);
                    red += pixel >> 16 & 0xFF;
                    green += pixel >> 8 & 0xFF;
                    blue += pixel & 0xFF;
                    count++;
                }
            }
            red /= count;
            green /= count;
            blue /= count;
            return red << 16 | green << 8 | blue;
        }

        // Splits source into a grid and returns the average colour of each cell.
        public function averagecolors( source:BitmapData, colors:int ):Array
        {
            var averages:Array = new Array();
            var columns:int = Math.round( Math.sqrt( colors ) );
            var row:int = 0;
            var col:int = 0;
            var x:int = 0;
            var y:int = 0;
            var w:int = Math.round( source.width / columns );
            var h:int = Math.round( source.height / columns );
            for (var i:int = 0; i < colors; i++)
            {
                var rect:Rectangle = new Rectangle( x, y, w, h );
                var box:BitmapData = new BitmapData( w, h, false );
                box.copyPixels( source, rect, new Point() );
                averages.push( averageColour( box ) );
                box.dispose();
                // Note: x and y are updated after they are used, so each pass
                // reads the cell computed on the previous pass (the first cell
                // is sampled twice and the last cell is never reached).
                col = i % columns;
                x = w * col;
                y = h * row;
                if ( col == columns - 1 ) row++;
            }
            return averages;
        }
    }
}
And here is the initial performance test:
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                              1264    25.28
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
First, considering what the code is doing, it is already pretty fast, taking only about 25 ms to split the image into a grid, loop through all of the pixels, and average the values. However, there is probably some room for improvement, especially given that the original code targets Flash Player 9 and thus can’t take advantage of Flash Player 10 features such as Vectors.
Now, the first thing I would normally do is profile the SWF using the profiler in Flash Builder to find out where the most time is being spent. However, in this case, there are only two methods that do anything: averagecolors and averageColour. averagecolors is called once, while averageColour is called once for each swatch we want to create (in this case 16), and ends up looping over each pixel in the image (over those 16 calls). So these are the two areas we will focus on, with particular attention directed to averageColour.
The first thing I did was look at updating the content to Flash Player 10 by converting all of the Arrays to Vectors. I expected to get a decent boost from this, but the improvement was minimal.
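As a rough sketch of what that conversion looks like (the class name VectorConversionSketch and the sample value are made up for illustration and are not part of the project), the palette container changes from an untyped Array to a typed Vector.<uint>:

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical demo class, not part of the original project.
    public class VectorConversionSketch extends Sprite
    {
        public function VectorConversionSketch()
        {
            // Flash Player 9 style: untyped Array that grows with push().
            var averagesArray:Array = new Array();
            averagesArray.push(0xFF0000);

            // Flash Player 10 style: typed Vector, sized up front and
            // marked as fixed length (second constructor argument).
            var averagesVector:Vector.<uint> = new Vector.<uint>(16, true);
            averagesVector[0] = 0xFF0000;

            trace(averagesArray.length, averagesVector.length);
        }
    }
}
```

With only 16 entries in the palette, the container is touched very rarely compared to the per-pixel work, which may be why the change barely registered in the test.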
Within the averagecolors method, I looked at reusing the Point, Rectangle and BitmapData instances, instead of creating new ones on each iteration of the loop. Again, on the desktop this didn’t really make any difference. However, on a mobile device, where memory allocation can be more expensive (and there is less RAM altogether), this change may have a bigger impact (I didn’t test this). This leads to an important point: test performance on the platforms you are targeting, as some optimizations can have a different impact depending on where the content is running.
Next, I marked the averageColour and averagecolors methods as final, which allows them to be looked up at compile time (as opposed to runtime). This led to a small improvement in performance, but again, nothing significant.
At this point, I was getting a very slight performance improvement, but nothing that really mattered:
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                              1224    24.48
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Next, I moved on to the averageColour method, where I expected (and hoped) to have better results, as this is where the bulk of the work occurs.
First I converted some of the Numbers to ints and uints in places where Numbers were not needed. This led to a small improvement.
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                              1190    23.80
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Next, I changed the bitmapData.getPixel call to use bitmapData.getVector. Doing this then allowed me to loop through the pixels using a single loop, instead of a nested double loop, and also eliminated a getPixel call for each pixel. I used a for each loop to loop through the pixel color values.
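The for each version is not in the final listing, so here is a minimal sketch of what that intermediate step could look like. The class name ForEachAverageSketch is hypothetical, and the sketch reads count from the Vector length rather than incrementing it in the loop, which is an assumption on my part:

```actionscript
package
{
    import flash.display.BitmapData;

    // Hypothetical helper class; only the loop mirrors the step described above.
    public class ForEachAverageSketch
    {
        public static function averageColour(source:BitmapData):uint
        {
            var red:Number = 0;
            var green:Number = 0;
            var blue:Number = 0;

            // One call copies every pixel of the bitmap into a Vector.<uint>.
            var pixels:Vector.<uint> = source.getVector(source.rect);
            var count:int = pixels.length;

            // A single loop over the pixel values replaces the nested x/y loops.
            for each (var pixel:uint in pixels)
            {
                red += pixel >> 16 & 0xFF;
                green += pixel >> 8 & 0xFF;
                blue += pixel & 0xFF;
            }

            red /= count;
            green /= count;
            blue /= count;

            return red << 16 | green << 8 | blue;
        }
    }
}
```

Note that getVector returns 32-bit values that include the alpha channel, so the & 0xFF mask on the red channel matters more here than it did with getPixel.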
This provided another slight improvement (although not quite as much as I expected). We are now making some small gains.
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                              1137    22.74
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Next, I decided to try a for loop, instead of a for each loop.
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                               282     5.64
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
Wow! As you can see, that makes a huge difference.
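If you want to try the two loop styles in isolation, something like the following sketch should work. It uses getTimer instead of the PerformanceTest harness, the class name LoopTimingSketch is made up, and the Vector is synthetic data standing in for real pixels, so treat the numbers it prints as indicative only:

```actionscript
package
{
    import flash.display.Sprite;
    import flash.utils.getTimer;

    // Hypothetical timing sketch; not the harness used for the results in this post.
    public class LoopTimingSketch extends Sprite
    {
        public function LoopTimingSketch()
        {
            // Synthetic stand-in for the pixel data of an image.
            var pixels:Vector.<uint> = new Vector.<uint>(150000, true);
            var sum:Number = 0;
            var start:int;

            // Iterate with for each.
            start = getTimer();
            for each (var value:uint in pixels)
            {
                sum += value & 0xFF;
            }
            trace("for each:", getTimer() - start, "ms");

            // Iterate with an indexed for loop and a cached length.
            start = getTimer();
            var len:int = pixels.length;
            for (var i:int = 0; i < len; i++)
            {
                sum += pixels[i] & 0xFF;
            }
            trace("for:", getTimer() - start, "ms");
        }
    }
}
```

A single pass like this is noisy, so running each loop many times (as the harness does) gives more trustworthy numbers.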
Finally, I explicitly cast i to an int when pulling the value from the Vector. This gave a small improvement, but again, nothing significant:
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                               268     5.36
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
I tried a couple more optimizations in the method, converting division operations to multiplication operations and replacing the Math.round calls, but in this case they didn’t make any difference.
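For reference, here is a sketch of the kind of changes I mean. The class and method names are hypothetical, and this is one common way to write these tweaks rather than the exact code I tested:

```actionscript
package
{
    // Hypothetical helper illustrating the two micro-optimizations described above.
    public class MathTweaksSketch
    {
        // Divide once, then multiply each channel total by the cached inverse.
        public static function averageChannels(red:Number, green:Number, blue:Number, count:int):uint
        {
            var inverse:Number = 1 / count;
            return (red * inverse) << 16 | (green * inverse) << 8 | (blue * inverse);
        }

        // Round a non-negative value without calling Math.round.
        public static function roundPositive(value:Number):int
        {
            return int(value + 0.5);
        }
    }
}
```

One caveat with the first tweak: multiplying by a cached inverse can differ from a true division in the last bit of the floating point result, and because the shift operators truncate to an integer, a channel could in principle come out one lower than before. That is another reason to compare the output against the original code.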
I also looked at caching some constants used in some of the bitwise operations, changing
red += pixel >> 16 & 0xFF;
green += pixel >> 8 & 0xFF;
to
private var s16:Number = 16 & 0xFF;
private var s8:Number = 8 & 0xFF;
red += pixel >> s16;
green += pixel >> s8;
First, that optimization actually produces the wrong result (I had my operator precedence backwards). Second, it was actually slower:
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
averagecolors (50 iterations)
Player version: MAC 10,0,32,18 (debug)
averagecolors
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
method...................................................ttl ms...avg ms
averagecolors                                               349     6.98
––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
There are two lessons from this. First, make sure your optimizations produce the same results (ideally by creating and using unit tests). Second, bitwise operations are really, really fast. In this case, they are even faster than doing a variable lookup.
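To make the precedence issue concrete: the shift operator binds more tightly than &, so pixel >> 8 & 0xFF means (pixel >> 8) & 0xFF, while the "cached" value 8 & 0xFF is simply 8 and the mask disappears. A small sketch (the class name and the sample colour are made up for illustration):

```actionscript
package
{
    import flash.display.Sprite;

    // Hypothetical demo of the precedence issue; the sample colour is arbitrary.
    public class ShiftMaskSketch extends Sprite
    {
        public function ShiftMaskSketch()
        {
            var pixel:uint = 0x336699; // red 0x33, green 0x66, blue 0x99

            // Shift binds tighter than &, so this is (pixel >> 8) & 0xFF.
            var green:uint = pixel >> 8 & 0xFF;

            // The cached constant evaluates to plain 8, so the mask is lost
            // and the red bits leak into the result.
            var s8:Number = 8 & 0xFF;
            var unmasked:uint = pixel >> s8;

            trace(green.toString(16));    // 66
            trace(unmasked.toString(16)); // 3366
        }
    }
}
```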
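So, after going through the code and applying a number of different optimizations, I was able to improve performance from an average of 25.28 ms to 5.36 ms. That makes the final code about 4.7 times as fast as the original (25.28 / 5.36 ≈ 4.72); put another way, the original takes roughly 470% of the time the final version does.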
Here is the final code:
/*
    Code adapted from:
    http://blog.soulwire.co.uk/flash/actionscript-3/colourutils-bitmapdata-extract-colour-palette/
*/
package
{
    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.Sprite;
    import flash.events.Event;
    import flash.geom.Rectangle;
    import flash.geom.Point;
    import com.gskinner.utils.PerformanceTest;

    public class PixelSort extends Sprite
    {
        [Embed(source="../graphics/image.jpg")]
        public var TestImage:Class;

        public function PixelSort()
        {
            addEventListener(Event.ADDED_TO_STAGE, onAddedToStage);
        }

        private var d:BitmapData;

        private function onAddedToStage(event:Event):void
        {
            removeEventListener(Event.ADDED_TO_STAGE, onAddedToStage);
            var b:Bitmap = new TestImage();
            d = b.bitmapData;
            var perfTest:PerformanceTest = PerformanceTest.getInstance();
            perfTest.out = trace;
            perfTest.testFunction(run, 50, "averagecolors", "averagecolors");
        }

        private function run():void
        {
            var out:Vector.<uint> = averagecolors(d, 16);
        }

        // Averages the red, green and blue channels of every pixel in source.
        public final function averageColour( source:BitmapData ):uint
        {
            var red:Number = 0;
            var green:Number = 0;
            var blue:Number = 0;
            var count:int = 0;
            var pixel:uint;
            // Copy all pixels into a Vector with a single call.
            var pixels:Vector.<uint> = source.getVector(new Rectangle(0,0, source.width, source.height));
            var len:int = pixels.length;
            for(var i:int = 0; i < len; i++)
            {
                pixel = pixels[int(i)];
                red += pixel >> 16 & 0xFF;
                green += pixel >> 8 & 0xFF;
                blue += pixel & 0xFF;
                count++;
            }
            red /= count;
            green /= count;
            blue /= count;
            return red << 16 | green << 8 | blue;
        }

        // Splits source into a grid and returns the average colour of each cell.
        public final function averagecolors( source:BitmapData, colors:int ):Vector.<uint>
        {
            var averages:Vector.<uint> = new Vector.<uint>(colors, false);
            var columns:int = Math.round( Math.sqrt( colors ) );
            var row:int = 0;
            var col:int = 0;
            var x:int = 0;
            var y:int = 0;
            var w:int = Math.round( source.width / columns );
            var h:int = Math.round( source.height / columns );
            // Point, Rectangle and BitmapData are created once and reused.
            var p:Point = new Point();
            var rect:Rectangle = new Rectangle(0,0,0,0);
            var box:BitmapData = new BitmapData( w, h, false );
            for (var i:int = 0; i < colors; i++)
            {
                rect.x = x;
                rect.y = y;
                rect.width = w;
                rect.height = h;
                box.copyPixels( source, rect, p );
                averages[i] = averageColour( box );
                // Note: x and y are updated after they are used, so each pass
                // reads the cell computed on the previous pass (the first cell
                // is sampled twice and the last cell is never reached).
                col = i % columns;
                x = w * col;
                y = h * row;
                if ( col == columns - 1 )
                {
                    row++;
                }
            }
            box.dispose();
            return averages;
        }
    }
}
Lessons learned
Profile content to isolate bottlenecks : I skipped that step in this case since my code consisted of only two methods, but even in that case, the most significant improvement came from a single optimization. Profile so you know where to focus your efforts.
Test and profile all optimizations : Make sure to test performance after each optimization, as optimizations do not always have the desired effect.
Test on target devices and platforms : Optimizations can have a different impact depending on where they are run. This includes browser, platform and device, as well as player type (debug vs release). For example, when testing directly from Flash Authoring, results were significantly slower than when testing in the browser.
Test the results of the optimizations : Make sure that your optimizations do not break your code or content. The best way to do this is by using unit tests and running them after each optimization.
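As a minimal sketch of that last point, a regression check can be as simple as running the original and the optimized routine on the same small bitmap and comparing the results. The class and method names below are hypothetical, and this is a plain trace-based check rather than a test written for any particular unit test framework:

```actionscript
package
{
    import flash.display.BitmapData;
    import flash.display.Sprite;

    // Hypothetical regression check; not tied to a unit test framework.
    public class AverageColourCheck extends Sprite
    {
        public function AverageColourCheck()
        {
            // Small generated bitmap, so the check needs no embedded image.
            var source:BitmapData = new BitmapData(8, 8, false, 0x336699);
            source.setPixel(0, 0, 0xFF0000);
            source.setPixel(7, 7, 0x0000FF);

            var expected:uint = averageWithGetPixel(source);
            var actual:uint = averageWithGetVector(source);
            trace(expected == actual ? "PASS" : "FAIL", expected.toString(16), actual.toString(16));
        }

        // Reference version: one getPixel call per pixel.
        private function averageWithGetPixel(source:BitmapData):uint
        {
            var red:Number = 0, green:Number = 0, blue:Number = 0;
            for (var x:int = 0; x < source.width; x++)
            {
                for (var y:int = 0; y < source.height; y++)
                {
                    var pixel:uint = source.getPixel(x, y);
                    red += pixel >> 16 & 0xFF;
                    green += pixel >> 8 & 0xFF;
                    blue += pixel & 0xFF;
                }
            }
            var count:int = source.width * source.height;
            return (red / count) << 16 | (green / count) << 8 | (blue / count);
        }

        // Optimized version: one getVector call, then an indexed for loop.
        private function averageWithGetVector(source:BitmapData):uint
        {
            var red:Number = 0, green:Number = 0, blue:Number = 0;
            var pixels:Vector.<uint> = source.getVector(source.rect);
            var len:int = pixels.length;
            for (var i:int = 0; i < len; i++)
            {
                var pixel:uint = pixels[i];
                red += pixel >> 16 & 0xFF;
                green += pixel >> 8 & 0xFF;
                blue += pixel & 0xFF;
            }
            return (red / len) << 16 | (green / len) << 8 | (blue / len);
        }
    }
}
```

Running a check like this after each change would have caught the broken bitwise "optimization" above right away.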
There is still some potential for optimization. In particular, since the code is essentially looping over all of the pixels of a bitmap and then doing some math operations on their values, this could be a good candidate for porting to Pixel Bender.
If you have any additional optimizations, questions or suggestions, post them in the comments.
Also, make sure to check out soulwire’s blog, as he is doing some very cool stuff with ActionScript 3 and Flash.






Don’t forget:
Instead of a division by ‘count’ you could also store the inverse value in a variable and multiply (faster).
Pre-Incrementing a variable (++i) is faster than post-increment (i++).
uint is slower than int. If you do not need these extra bits, use int.
Instead of using Number for r/g/b you should use Integer. This may be slower, since the compiler is not using the available VM Bytecode addInt.
But here we go: Most things should be optimized by the compiler, not by the developer. This would sustain readable code without having slower code.
Joa is collecting optimization at http://wiki.joa-ebert.com
Check it out.
Andre Michelle
13 Oct 09 at 10:50 am
I’d be curious how much of that 470% improvement is due solely to your switch from getPixel() to scanning through the vector? A lot of the changes you made barely moved the needle, and that one got you the bulk (if not all) of your final gains. It’s also the first one I would have made (come on, a function call per-pixel?).
I guess my point about optimization: how much did all of the tweaks hurt the flexibility of the code, or its readability, in exchange for very little gain?
Troy Gilbert
13 Oct 09 at 10:53 am
Good info, thanks. You mention running the profiler in Flash Builder. I assumed this was for Flex projects only. Is it possible to run the profiler on non-Flex projects?
fredo
13 Oct 09 at 10:53 am
@Andre: is pre-increment really faster than post-increment under AS3? I know this comes up as a constant debate in many languages, but I thought it was irrelevant at this point in AS3?
Also, I’ve seen recently that under some performance tests that uint is as-fast or faster than int under Flash Player 10 (I believe Grant’s latest shows that).
Of course, you’re absolutely right about your last point: most of the optimizations that you mentioned and that Mike made should be done by the AS3 compiler (and are done in virtually all C/C++/C#/Java compilers).
Troy Gilbert
13 Oct 09 at 10:57 am
Also important:
Testing with the Debugplayer will not return proper results on performance tests. Some operations are faster in DebugPlayer than ReleasePlayer! Also, when using Eclipse (Flexbuilder, FDT) for a very precise final test you should close/restart Eclipse every time you test it. The Flashplayer sometimes starts with occupied memory from last launches. I have no clue why.
However this is often the reason, why people have different results. We should focus on a clean ReleasePlayer launch.
Andre Michelle
13 Oct 09 at 10:58 am
@Andre
–
Instead of a division by ‘count’ you could also store the inverse value in a variable and multiply (faster).
Pre-Incrementing a variable (++i) is faster than post-increment (i++).
–
Yeah, I tried both of these, but they provided no improvements at all in my case, and so I reverted them.
Thanks for the link to joa’s wiki. I was reading it last night, and it is a great resource.
mike chambers
mesh@adobe.com
mikechambers
13 Oct 09 at 11:05 am
@troy:
Pre-increment are faster. We encountered it when updating our VorbisEncoder. A lot!
uint vs int, I am not sure with FP10. If they have changed the behavior, I am happy to use uint in future. However every type of numbers are strange in Actionscript > http://blog.andre-michelle.com/2007/weird_behavior_of_numbers_in_as3
Anyways, the current compiler sux as hell as Joa demonstrated on several occasions. This compiler does not do anything to make things faster.
Andre Michelle
13 Oct 09 at 11:05 am
@Troy
Well, the biggest change was in how I looped through the data, i.e. just switching to getVector and looping with for each didn’t really give an improvement.
Looping through using for gave a huge improvement. (I am looking into why).
However, the key is that using getVector allowed me to remove 151,500 function calls (one for each pixel), as well as optimize my loop.
Yeah, I got the biggest performance boost where I would expect. I didn’t do that first, because I wanted to update to Flash Player 10, as I expected using Vectors would also help (they didn’t in this case).
As far as code readability, I don’t think it suffered. I tried and reverted some optimizations that made the code less readable. However, your point is important. You have to consider performance gains against how convoluted they make your code. In my case, the gains were not enough to justify making the code less readable.
mike chambers
13 Oct 09 at 11:08 am
@fredo
Yes. You can have it run an arbitrary SWF or HTML page with a SWF (although it is very, very buggy on Mac).
mike chambers
13 Oct 09 at 11:09 am
@andre
–
Pre-increment are faster. We encountered it when updating our VorbisEncoder. A lot!
–
What types of improvements are you seeing? In my case, I didn’t see anything significant (over 151,500 loops).
mike chambers
13 Oct 09 at 11:12 am
//try declare this, before get in loop
var box_copyPixels:Function = box.copyPixels;
//and then in loop just call
box_copyPixels( source, rect, p );
and…fixed length vector :)
katopz
13 Oct 09 at 11:14 am
“In my case, the gains were not enough to justify making the code less readable.”
Imagine you could do this:
Instead of
[red += pixel >> 16 & 0xFF;]
you write
[red += pixel.getRed();]
That is readable code. A usual compiler would fix the optimization for you by inlining the code from the method call. That is what Joa is about to do in Apparat > http://blog.joa-ebert.com/2009/08/11/apparat-is-now-open-source
We are so used to writing optimized code that we forgot how it could be. Not to mention bigger projects, where readability suffers when trying to make things fast. And not to forget ActionScript beginners, who have no clue at all that optimization exists.
Andre Michelle
13 Oct 09 at 11:16 am
@Mike: I’m guessing the difference between for each and for is down to the “proxy” function call overhead? I would think this could be optimized-out by the compiler with a vector, but it wouldn’t surprise me if it’s not.
What I mean: when calling for each, the runtime is internally calling getNextName() (or whatever the proxy iterator function is) on the object. These would be unnecessary for direct indexing.
Troy Gilbert
13 Oct 09 at 11:19 am
@Mike:
“What types of improvements are you seeing?”
Have you tried to test it the way I propose?
If you still don’t get a big performance peak, I will try it tomorrow. Maybe things have change in the last update.
Andre Michelle
13 Oct 09 at 11:21 am
@Andre
Yes. I saw Joa’s session at FOTB and Max, as well as met with him a couple of times about his work.
I, and the compiler team, are aware of the improvements that could be made in the compiler. However, I was trying to focus on improvements that I could make to the code today.
mike chambers
13 Oct 09 at 11:22 am
@Mike
“However, I was trying to focus on improvements that I could make to the code today.”
Yeah, that is our daily bread. Workaround the bad compiler. Spending a huge amount of time. I just wanted to point out, that a new compiler would make a difference in future for everybody, not just for cutting edge guys.
@everyone
Please vote > http://bugs.adobe.com/jira/browse/ASC-3802
Andre Michelle
13 Oct 09 at 11:27 am
Hello!
This may be a dumb question. If you’re saying that “Some operations are faster in DebugPlayer than ReleasePlayer”, how about using an external debugger? Like Alcon maybe? Just asking :S
neon::pixel
13 Oct 09 at 11:41 am
Interesting post, Mike.
Well, I believe it would be essential that all these tricks to improve performance get done by the compiler for mobile applications ONLY.
I understand it sounds nuts writing instructions into the compiler for every single bad coding case in order to keep the code appearance and make life easier for developers.
I believe the flash team is doing a great job already by sharing these tricks, tips, tests, etc to the community, this is something we should learn, not to ask the compiler to do it for us.
Renato Moya
13 Oct 09 at 5:19 pm
Do you really need the ‘count’ variable at all? isn’t it just duplicating ‘len’ = pixels.length ? If so then you could remove one statement from the loop.
David Wilhelm
13 Oct 09 at 5:52 pm
@David
Yeah, you are right. Count is not needed at all. That was left over from the original code.
You could just use len.
Good catch.
mike chambers
13 Oct 09 at 7:52 pm
Andre mentioned this, but what is the diff for release vs. debug player running this code? In my experience, you can see massive differences between the two.
Ben Garney
14 Oct 09 at 12:14 am