Friday, August 26, 2011

The Most Expensive One-Byte Mistake

This article is taken from Communications of the ACM, September 2011 issue.

Information technology (IT) both drives and implements the modern Western-style economy. Thus, we regularly see headlines about staggeringly large amounts of money connected with IT mistakes. Which IT or CS decision has resulted in the most expensive mistake?
Not long ago, a fair number of pundits were doing a lot of hand waving about the financial implications of Sony's troubles with its PlayStation Network, but an event like that does not count here. In my school days, I talked with an inspector from The Guinness Book of World Records who explained that for something to be "a true record," it could not be a mere accident; there had to be direct causation starting with human intent (such as, we stuffed 26 high-school students into our music teacher's Volkswagen Beetle and closed the doors).
Sony (probably) did not intend to see how big a mess it could make with the least attention to security, so this and other such examples of false economy will not qualify. Another candidate could be IBM's choice of Bill Gates over Gary Kildall to supply the operating system for its personal computer. The damage from this decision is still accumulating at breakneck speed, with Stuxnet and the OOXML perversion of the ISO standardization process being exemplary bookends for how far and wide the damage spreads. But that was not really an IT or CS decision. It was a business decision that, as far as history has been able to uncover, centered on Kildall's decision not to accept IBM's nondisclosure demands.
A better example would be the decision for MS-DOS to invent its own directory/filename separator, using the backslash (\) rather than the forward slash (/) that Unix used or the period that DEC used in its operating systems. Apart from the actual damage being relatively modest, however, this does not qualify as a good example either because it was not a real decision selecting a true preference. IBM had decided to use the slash for command flags, eliminating Unix as a precedent, and the period was used between filename and filename extension, making it impossible to follow DEC's example.
Space exploration history offers a pool of well-publicized and expensive mistakes, but interestingly, I did not find any valid candidates there. Fortran syntax errors and space shuttle computer synchronization mistakes do not qualify for lack of intent. Running one part of a project in imperial units and the other in metric is a "random act of management" that has nothing to do with CS or IT.
The best candidate I have been able to come up with is the C/Unix/Posix use of NUL-terminated text strings. The choice was really simple: Should the C language represent strings as an address + length tuple or just as the address with a magic character (NUL) marking the end? This is a decision that the dynamic trio of Ken Thompson, Dennis Ritchie, and Brian Kernighan must have made one day in the early 1970s, and they had full freedom to choose either way. I have not found any record of the decision, which I admit is a weak point in its candidacy: I do not have proof that it was a conscious decision.
As far as I can determine from my research, however, the address + length format was preferred by the majority of programming languages at the time, whereas the address + magic-marker format was used mostly in assembly programs. As the C language was a development from assembly to a portable high-level language, I have a difficult time believing Ken, Dennis, and Brian gave it no thought.
Using an address + length format would cost one more byte of overhead than an address + magic-marker format, and their PDP computer had limited core memory. In other words, this could have been a perfectly typical and rational IT or CS decision, like the many similar decisions we all make every day; but this one had quite atypical economic consequences.
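As a sketch (my illustration, not from the original article), the two candidate representations might look like this in C; `lenstr` is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* NUL-terminated: the length is implicit and must be found by
   scanning every byte until the magic marker '\0'. */
static const char nul_str[] = "hello";

/* Hypothetical address + length representation: the length travels
   with the pointer, at the cost of extra storage per string. */
struct lenstr {
    size_t      len;
    const char *ptr;
};

static const struct lenstr len_str = { 5, "hello" };
```

With `lenstr`, asking for the length is an O(1) field read; with the NUL-terminated form, `strlen(nul_str)` must walk the whole string to find the terminator.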
Hardware development costs. Initially, Unix had little impact on hardware and instruction set design. The CPUs that offered string manipulation instructions—for example, Z-80 and DEC VAX—did so in terms of the far more widespread adr+len model. Once Unix and C gained traction, however, the terminated string appeared on the radar as an optimization target, and CPU designers started to add instructions to deal with them. One example is the Logical String Assist instructions IBM added to the ES/9000 520-based processors in 1992.1
Adding instructions to a CPU is not cheap, and it happens only when there are tangible and quantifiable monetary reasons to do so.
Performance costs. IBM added instructions to operate on NUL-terminated strings because its customers spent expensive CPU cycles handling such strings. That bit of information, however, does not tell us if fewer CPU cycles would have been required if a ptr+len format had been used.
Thinking a bit about virtual memory (VM) systems settles that question for us. Optimizing the movement of a known-length string of bytes can take advantage of the full width of memory buses and cache lines, without ever touching a memory location that is not part of the source or destination string.
One example is FreeBSD's libc, where the bcopy(3)/memcpy(3) implementation will move as much data as possible in chunks of "unsigned long," typically 32 bits or 64 bits, and then "mop up any trailing bytes" as the comment describes it, with byte-wide operations.2
If the source string is NUL terminated, however, attempting to access it in units larger than bytes risks attempting to read characters after the NUL. If the NUL character is the last byte of a VM page and the next VM page is not defined, this would cause the process to die from an unwarranted "page not present" fault.
Of course, it is possible to write code to detect that corner case before engaging the optimized code path, but this adds a relatively high fixed cost to all string moves just to catch this unlikely corner case—not a profitable trade-off by any means.
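A simplified sketch of the idea (my own code, not FreeBSD's actual implementation, and memcpy-style in that it assumes non-overlapping buffers): because the length is known up front, the copy can move word-sized chunks without ever reading past the end of the source:

```c
#include <stddef.h>
#include <string.h>

/* Known-length copy in the spirit of the FreeBSD bcopy(3) comment:
   move unsigned-long-sized chunks, then "mop up any trailing bytes".
   Safe because the length is known up front, so no byte outside the
   source or destination is ever touched. */
void copy_known_length(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy in word-sized chunks while enough bytes remain. */
    while (len >= sizeof(unsigned long)) {
        memcpy(d, s, sizeof(unsigned long)); /* one word-wide move */
        d += sizeof(unsigned long);
        s += sizeof(unsigned long);
        len -= sizeof(unsigned long);
    }
    /* Mop up any trailing bytes one at a time. */
    while (len-- > 0)
        *d++ = *s++;
}
```

The same trick applied to a NUL-terminated source would have to read whole words before knowing whether they contain the terminator, which is exactly the page-fault hazard described above.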
If we have out-of-band knowledge of the strings, things are different.
Compiler development cost. One thing a compiler often knows about a string is its length, particularly if it is a constant string. This allows the compiler to emit a call to the faster memcpy(3) even though the programmer used strcpy(3) in the source code.
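For instance (a sketch of the substitution, not any particular compiler's output; the function names are just illustrative):

```c
#include <string.h>

char buf[32];

/* What the programmer wrote: strcpy must scan the source for the NUL. */
void copy_slow(void)
{
    strcpy(buf, "hello");
}

/* What the compiler can emit instead: the source is a constant, so its
   length (5 characters plus the NUL) is known at compile time, and
   memcpy can move a fixed 6 bytes without any scanning. */
void copy_fast(void)
{
    memcpy(buf, "hello", sizeof "hello"); /* sizeof includes the NUL */
}
```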
Deeper code inspection by the compiler allows more advanced optimizations, some of them very clever, but only if somebody has written the code for the compiler to do it. The development of compiler optimizations has historically been neither easy nor cheap, but obviously Apple is hoping this will change with Low-level Virtual Machine (LLVM), where optimizers seem to come en gros.
The downside of heavy-duty compiler optimization—in particular, optimizations that take holistic views of the source code and rearrange it in large-scale operations—is that the programmer must be really careful that the source code specifies his or her complete intention precisely. A programmer who worked with the compilers on the Convex C3800 series supercomputers related his experience as "having to program as if the compiler was my ex-wife's lawyer."
Security costs. Even if your compiler does not have hostile intent, source code should be written to hold up to attack, and the NUL-terminated string has a dismal record in this respect. Utter security disasters such as gets(3), which "assume the buffer will be large enough," are a problem "we have relatively under control."3
Getting it under control, however, takes additions to compilers that would complain if the gets(3) function were called. Despite 15 years of attention, over- and underrunning string buffers is still a preferred attack vector for criminals, and far too often it pays off.
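The safer pattern is to pass the buffer size explicitly, which fgets(3) does; `read_line_safely` below is my own helper name, a minimal sketch of the idea:

```c
#include <stdio.h>
#include <string.h>

/* gets(3) has no way to know the buffer size, so any longer input
   overruns buf -- the classic stack-smashing vector.  fgets(3) takes
   the size explicitly and stops before overflowing. */
void read_line_safely(char *buf, size_t size, FILE *in)
{
    if (fgets(buf, (int)size, in) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';  /* strip any trailing newline */
    }
}
```

Whatever the attacker feeds in, at most size - 1 characters ever land in the buffer.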
Mitigation of these risks has been added at all levels. Long-missed no-execute bits have been added to CPUs' memory management hardware; operating systems and compilers have added address-space randomization, often at high costs of performance; and static and dynamic analyses of programs have soaked up countless hours, trying to find out if the byzantine diagnostics were real bugs or clever programming.
Yet, absolutely nobody would be surprised if Sony's troubles were revealed to start with a buffer overflow or false NUL-termination assumption.

Friday, June 17, 2011

.NET Certification - Only as good as you make it.

I am currently working toward an MCP with the ASP.NET 3.5 exam, 70-562.  I passed the Application Foundation exam back in June 2010 but have just not had enough time to study for the next one.  I have started studying more lately and plan to take the test in the next few months.  The way I study is to read the book, then take the practice tests and study from those.  I first take the tests that come with the book in study mode.  While doing this I go to MSDN and study in depth every object the questions ask about.  Sometimes I will spend several hours on one question.  Do this and you will pass the test, and you will actually learn something in the process.  This brings us to another point and the purpose of this article.

People have mixed feelings about certifications.  They say that just because someone passes the tests does not mean they can actually program and solve real-world problems.  I completely agree, and if you hire someone based on certifications alone, that is your own error.  Would you hire someone just because they have a master's degree in Computer Science?  If you did, you might find out they can't program in your environment or up to your expectations at all.  It is always best to interview with real-world questions derived from the things you use in your environment.  If you skip this, incompetent people will slip through the cracks and become employees.  I am not a hiring manager, but I have often participated in technical interviews on both the hiring and the being-hired sides of the table.

Friday, June 10, 2011

.Net Application Environment Configuration - How do you move through environments?

Environment configuration: by this I mean the settings that allow your application to run in each of the environments it needs to run in: development, QA, production, and so on.

This is an ever-changing entity where I work.  We use a custom Microsoft Enterprise Library implementation to store connection strings and credentials.  We use .Net config files for almost everything else.  We also have some third-party tools for configuring console applications with command-line parameters.

I have come up with a very nice method that uses custom sections in the config file, such as the one below:

<Environment>
    <!-- 1 = local, 2 = dev, 3 = qa, 4 = prod -->
    <add key="CurrentEnvironment" value="1"/>
</Environment>
<local>
    <keys....
</local>
<Dev>
    <keys....
</Dev>
etc.

Then we set up a class with properties that pull from the custom sections, or from the appSettings section for items that don't change between environments.  The class exposes all of the keys in the config file as properties, so when we use them it is transparent whether they are environment-specific or common.

At build time, the build scripts create a new build for each environment and change the XML in the config file to set CurrentEnvironment for the correct build target.  So far this has worked fine, and we use the built-in .Net configuration methods to access config file values.  This has proven more stable than using custom XML configuration files and cache objects.

Please feel free to leave comments or suggestions if you have better methods of configuring your applications for different environments.

Wednesday, June 8, 2011

C# Extension Methods Really Do Exist - Are there Extension Properties?

I learned about this some time ago in JavaScript and thought it was a brilliant feature.  Well, today I learned it is also available in C# 3.0 and above.  Sometimes my blindness amazes me; I can't believe I did not see this sooner, given that I used it in JavaScript six or seven years ago.

Well here is the MSDN link for extensions.
http://msdn.microsoft.com/en-us/library/bb383977.aspx

Example:
using System;

namespace ExtensionMethods
{
    public static class MyExtensions
    {
        public static int WordCount(this String str)
        {
            return str.Split(new char[] { ' ', '.', '?' },
                             StringSplitOptions.RemoveEmptyEntries).Length;
        }
    }
}


string s = "Hello Extension Methods";
int i = s.WordCount();


This syntax really allows you to write cleaner code.  Extending the built-in .Net objects is just as easy as adding new methods to your own classes.  I have seen people on forums mention extension properties, but I'm not really sure whether they are possible or whether people were just asking for them.  I will need to research that a little further.  One person mentioned he would like to do something like the following with extension properties, which I think would also be nice to have:
DateTime d = 20.Minutes.Ago

Tuesday, June 7, 2011

Porting Applications from HP Tandem NonStop to Windows - Large Software Projects.

I am currently working on a very large software project at work.  This project is a government-mandated change that affects a large part of most of the systems I work on.

I am currently building four new components as part of my change, with several of them being ports from an HP Tandem NonStop environment to our Windows environment.  They are porting these applications because most of the new functionality is written in WCF, and it is just much faster to process the new components on Windows than to call the WCF service from Cobol programs.  Also, most of the new databases are on the Windows side, so again it is just faster to do the processing on Windows.

It really seemed like once they made the decision to move the first process to Windows, it opened the floodgates.  They quickly started moving applications they were having trouble designing solutions for on the NonStop over to Windows, where the solutions were almost trivial.

This is not the first time I have gone through these exercises.  Actually, it seems like I have been going through this type of effort ever since I started working on software.  It really goes in phases: every few years they decide it is time to make the leap.

To Transfer or Redirect is the Question.

While studying for Microsoft certification exam 70-562 I ran across this explanation.  I never really dug into what either of these did, so here are the basics.

Server.Transfer performs a server-side redirection, which minimizes the delay in displaying the page by not requiring the client to make an additional request. Set the second parameter to true to preserve the form and query string collections for the new page.

Response.Redirect performs a client-side redirection, which requires an additional response and request, slowing the display of the new page.