programming

Todd Hoff's picture

XML Encoding

Informative article on XML encoding at
http://www10.org/cdrom/papers/542/index.html.

Gzip often had much lower encode/decode times at the
expense of slightly larger content sizes when
compared to other approaches. In one test gzip took
3.33 msecs to encode test data, producing a packet of
1516 bytes, versus other approaches that took 914 msecs
and produced a packet of 1222 bytes. The next lowest
competitor was at 266 msecs for the encode time.

Give me faster encode/decode times as long as the packet
is not grossly larger. Small size differences are easily
averaged out if data are streamed up to clients. Encode/decode
times are fundamental performance limiters because they
control how many messages per second can be handled.
Using gzip you will be able to process 100s more messages
per second.
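
A rough way to check this tradeoff against your own messages is to time gzip directly. The sketch below is an assumption-laden illustration, not the benchmark from the paper: `SAMPLE_XML` and `measure_gzip` are hypothetical names, and the payload is made up.

```python
import gzip
import time

# Hypothetical message; substitute a real payload from your own system.
SAMPLE_XML = (
    b"<orders>"
    + b"<order id='1'><item>widget</item><qty>2</qty></order>" * 50
    + b"</orders>"
)


def measure_gzip(payload: bytes, rounds: int = 100) -> dict:
    """Return sizes in bytes and average encode/decode times in msecs."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")

    # Time the encode step, averaged over several rounds.
    start = time.perf_counter()
    for _ in range(rounds):
        packed = gzip.compress(payload)
    encode_ms = (time.perf_counter() - start) * 1000 / rounds

    # Time the decode step the same way.
    start = time.perf_counter()
    for _ in range(rounds):
        unpacked = gzip.decompress(packed)
    decode_ms = (time.perf_counter() - start) * 1000 / rounds

    if unpacked != payload:
        raise RuntimeError("round trip changed the payload")

    return {
        "raw_bytes": len(payload),
        "packed_bytes": len(packed),
        "encode_ms": encode_ms,
        "decode_ms": decode_ms,
    }


if __name__ == "__main__":
    print(measure_gzip(SAMPLE_XML))
```

The timings depend entirely on the machine and the payload, so treat the numbers as relative: run the same function against each candidate encoding and compare.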

I've had good results with gzip as well, but chose to go
a different way. Instead I directly write a binary form
of the XML into a buffer so there's no separate encode
step. I also use a generic properties format so I
don't have to worry about arbitrary schemas. On the decode
side the buffer is passed around until needed so it doesn't
have to be decoded immediately; it can be decoded in some
other thread. The binary format is searchable so the
entire message doesn't need to be decoded to get to part
of it.
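
To make the idea concrete, here is a minimal sketch of a searchable binary properties buffer. The layout (a length-prefixed key followed by a length-prefixed value) and the names `append_property` and `find_property` are assumptions for illustration only; the actual format is not described here.

```python
import struct
from typing import Optional

# Assumed record layout: 2-byte key length, 4-byte value length,
# then the key bytes, then the value bytes (network byte order).
_HEADER = struct.Struct("!HI")


def append_property(buf: bytearray, key: str, value: bytes) -> None:
    """Write one property straight into the buffer; no separate encode step."""
    key_bytes = key.encode("utf-8")
    buf += _HEADER.pack(len(key_bytes), len(value))
    buf += key_bytes
    buf += value


def find_property(buf: bytes, key: str) -> Optional[bytes]:
    """Scan record headers and skip unwanted values without decoding them."""
    wanted = key.encode("utf-8")
    offset = 0
    while offset < len(buf):
        key_len, value_len = _HEADER.unpack_from(buf, offset)
        offset += _HEADER.size
        key_end = offset + key_len
        if buf[offset:key_end] == wanted:
            return bytes(buf[key_end:key_end + value_len])
        offset = key_end + value_len  # jump over the value untouched
    return None


if __name__ == "__main__":
    message = bytearray()
    append_property(message, "user", b"todd")
    append_property(message, "body", b"<large payload>")
    print(find_property(message, "user"))
```

Because the buffer is just bytes, it can be handed to another thread and searched there only when a field is actually needed.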

This approach performs excellently.

Group Dynamics

The group dynamics of important software projects under heavy development and release pressure are a lot like those of squads in battle. I've read that soldiers say that after a while it stops being about country, it stops being about what they are fighting for, and becomes just about surviving. All that matters is your squad members helping each other stay alive.

In the crucible of a critical release the same narrowing of focus happens. All that matters is supporting your team members and getting the job done. You long ago stopped caring about your company, customers, and even whatever the hell you are making. You do whatever it takes to help your friends and make the product work.

It's an irrational process, like being replugged into ancient survival behaviours. Only with the perspective of time do you realize what an idiot you were. It probably has something in common with the madness of crowds behaviour that has been observed throughout history.

Personas are a powerful design tool

Personas are a powerful design tool, especially when combined with responsibility-driven design (http://www.boxesandarrows.com/archives/002330.php). Cooper's personas are:
simply pretend users of the system you're building. You describe
them, in a surprising amount of detail, and then design your
system for them.

I have a standard set of personas that I consider when creating a design/architecture that don't seem to be common. When you write code there are a lot of personas looking over your shoulder:

  • other programmers using the code
  • maintenance
  • extension
  • documentation group
  • training group
  • code review
  • test and validation
  • manufacturing
  • field support
  • first and second line technical support
  • live debugging
  • post crash debugging
  • build system (documentation generation and automatic testing)
  • unit testing
  • system testing
  • source code control
  • code readers
  • legal

You are much more careful and more thorough when you really think about all the personas, all the different people and all their different roles and purposes.

    Flow Chart for Project Decision Making

    Not that I'm cynical, but this is my favorite big picture of how projects work. This diagram is from my C++ coding standards page. Some people have complained about the profanity, but I admire its directness.

    In medieval times the majority of developers, for all their brain power, would have been serfs. Few groups work so hard under such difficult circumstances for those so unworthy. Can't complain about the pay, but that's not all there is. Certainly some of us would be wizards or alchemists or jugglers. A few of us, like Galileo, would have cracked open the doors of the Enlightenment and then, like Newton, blown the doors open. But most of us, myself included I think, would have served our masters quietly tending our fields of code.

                           +---------+
                           |  START  |
                           +---------+
                                |
                                V
                YES       +------------+      NO
          +---------------|  DOES THE  |---------------+
          |               | DAMN THING |               |
          V               |   WORK?    |               V
    +------------+        +------------+        +--------------+  NO
    | DON'T FUCK |                              | DID YOU FUCK |-----+
    |  WITH IT   |                              |   WITH IT?   |     |
    +------------+                              +--------------+     |
          |                                            |             |
          |                                            | YES         |
          |                                            V             |
          | +------+     +-------------+       +---------------+     |
          | | HIDE | NO  | DOES ANYONE |<------| YOU DUMBSHIT! |     |
          | |  IT  |<----|    KNOW?    |       +---------------+     |
          | +------+     +-------------+                             |
          |    |                |                                    |

    Programming as Creating Causal Models

    Programming can never just be programming; we must always explain programming using a metaphor. Programming IS manufacturing. Programming IS conducting a symphony. Programming IS making a peanut butter and jelly sandwich. Insert your particular agenda here.

    After reading The Mind's Arrows by Clark Glymour I think an interesting metaphor may be Programming IS Building Causal Models.

    My take is that programming is primarily teleological and analytical in nature. Programs are purposeful. This purpose serves as a grounding against which meanings are resolved. We start from our goals and work backwards, figuring out how to achieve them. Goals arise and subside in feedback loops over time. There is no lack of available paradigms for implementing all of the above, but the idea of causal models is interesting. Then again I may just be channeling long repressed memories from past logic programming classes.

    Cause means "the producer of an effect, result, or consequence." Causal means "indicative of or expressing a cause." The sense of model that applies is "a schematic description of a system, theory, or phenomenon that accounts for its known or inferred properties and may be used for further study of its characteristics." A program is a chain of causes that produces the effects necessary to reach our goals. The program is the model.

    The programmer is responsible for implementing the necessary causes and effects by translating the causal model constructed in their mind into the causal model embodied in a program. In the mind, reason, emotion, and experience meld to provide the deep structure of a causal model. A lot of what we know isn't easy to articulate. Programming requires the elicitation of the model, which is difficult because our thoughts are primarily images that are hard to fully explore and extract.

    The discipline of programming, to paraphrase Mr. Glymour:

    is about the causal processes and mechanisms though

    Server-Side Design Principles for Scalable Internet Systems

    The IEEE Software article Server-Side Design Principles for Scalable Internet Systems by Colleen Roe and Sergio Gonik of GemStone Systems is a good overview of different strategies for achieving scalability (http://computer.org/software/index.shtml). It gives the principles of scalable architecture as:

  • divide and conquer - the system should be partitioned into smaller subsystems that can be deployed onto separate processes or threads, which disperses the load and allows for load balancing and tuning.
  • asynchrony - work can be carried out in the system on a resource-available basis. Asynchrony decouples functions and lets the system better schedule resources.
  • encapsulation - system components are loosely coupled, with few dependencies among the components.
  • concurrency - activities are split across hardware, processes, and threads and can exploit the physical concurrency of modern multiprocessors. Concurrency allows the maximum work to be scheduled.
  • parsimony - designers must be economical in what they design. Pay attention to the thousands of micro details.

    Strategies for achieving scalability:

  • Careful system partitioning
  • Service-based layered architecture
  • Just-enough data distribution
  • Pooling and multiplexing
  • Queueing work for background processes
  • Near real-time synchronization of data
  • Distributed session tracking
  • Keep it simple

    And lots more, with a lot more detail on each topic. It's a very good overview discussion that jibes with my experience. The question is how developers implement all this, which is where I assume GemStone comes in :-) A part of scalability that doesn't get addressed is ordered synchronization between distributed applications that can fail and recover independently. Maybe more on that later.

    Differential Diagnosis Debugging

    Differential Diagnosis is an innovative technique for finding bugs. It's a strategy a co-worker of mine uses that is so obvious in retrospect, yet has an incredible amount of power.

    Usually problem debugging starts from scratch every time and our heroes eventually find the problem. Using Differential Diagnosis you go back and look at the change history for every change made since the code last worked. The idea is that the code worked at one time. The bug is likely to have been introduced in one of the later changes.

    By inspecting the source of only the changes it's often possible to figure out the problem or at least narrow it down considerably.

    This approach makes a lot of sense and works extremely well. But I hadn't seen it before. Obviously no strategy will always work, but it works a lot of the time. Even well unit tested code can have integration-related bugs that don't show up until later. And many products are so complex that unit tests don't scratch the surface of possible tests.

    Since then I've read a paper where the build system, after finding a bug from a smoke test run, would automatically back out changes and rebuild until it found which change caused the bug. Very cool. Someday I hope to add this to our current build system.
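
One way to automate the search is a binary search over the ordered change list. This is a sketch under stated assumptions, not the method from the paper mentioned above: it assumes the build before the first change was good, the build at the last change is bad, and every build after the offending change stays bad. `first_bad_change` and `is_good` are hypothetical names; in practice `is_good` would check out a change, rebuild, and run the smoke test.

```python
from typing import Callable, Sequence


def first_bad_change(changes: Sequence[str],
                     is_good: Callable[[str], bool]) -> str:
    """Return the earliest change whose build fails.

    `changes` is ordered oldest to newest and its last entry is known bad.
    """
    if not changes:
        raise ValueError("need at least one change to inspect")

    lo, hi = 0, len(changes) - 1  # the first bad change lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(changes[mid]):
            lo = mid + 1   # bug was introduced after mid
        else:
            hi = mid       # mid is bad, so the culprit is mid or earlier
    return changes[lo]


if __name__ == "__main__":
    history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
    # Stand-in for "rebuild and run the smoke test": c6 broke the build.
    print(first_bad_change(history, lambda c: history.index(c) < 5))
```

With N changes this needs about log2(N) rebuilds instead of N, which matters when each rebuild and smoke test run is slow.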

    Robert Martin's Rules

    Excellent heuristics from Robert Martin (http://objectmentor.com):

  • Whenever you see the number 1, consider that it might be N.
  • Whenever you see a constant, consider it might be a variable.
  • Whenever you see two or more concepts that are arbitrarily connected, consider they might need separation.
  • If a decision seems arbitrary, consider how it could be made differently.
  • Consider that what is ancillary today will be primary tomorrow.
  • Consider that what is low volume today will be high volume tomorrow.

    Good things to consider when designing/coding. Being a little more reflective during the development process would help prevent a lot of problems. Development isn't a race.

    Developers win through a complex set of tradeoffs that usually look like a loss from a number of other perspectives.

    On Leadership

    I wish i had said this, but it was said by asd@asd.com in comp.software-eng.

    Leaders:

    lead by example
    don't ask anything of anyone they wouldn't do themselves
    are called on to make difficult and unpopular decisions
    keep the team focused
    reward/support their team in whatever they do
    keep/clear unnecessary crap out of the way of the team

    Consensus is great. If it lasts for the project lifecycle, consider yourself blessed. I've been on a couple projects where two engineers just blatantly *disagreed*!

    #1 " x = 1"
    #2 " x != 1"

    That's when a Project Leader is required. Unless you want to flip a coin.

    Oh yea - one more thing. Project leaders: TAKE the blame when things go wrong and SHARE the credit when things go right.

    Ain't easy - but it's the way I try to run my life.

    Schedule to Unblock Others

    Schedules are lies. Schedules suck. Yes, yes, yes. But we still need them.

    The most effective scheduling rule I've used is to schedule so as to unblock others.

    The idea is to complete the portions of a feature that will unblock those dependent on you. This way development moves along smoothly because more lines of development can be active at a time. For example, instead of implementing the entire database, implement the simple interface and stub it out. People can work for a very long time against just the portion of the feature that unblocks them. Plus it's a form of rapid prototyping because you get immediate feedback on these parts. Don't worry about the quality of implementation because it doesn't matter yet.
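
The database example might look like the sketch below. All names here (`UserStore`, `InMemoryUserStore`, `greet`) are hypothetical: the point is only that the interface ships first, backed by a throwaway stub, so dependent code can be written and tested before the real storage exists.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class UserStore(ABC):
    """The simple interface that others are waiting on."""

    @abstractmethod
    def save(self, user_id: str, name: str) -> None:
        ...

    @abstractmethod
    def load(self, user_id: str) -> Optional[str]:
        ...


class InMemoryUserStore(UserStore):
    """Stub: a dict stands in for the database until it is built."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}

    def save(self, user_id: str, name: str) -> None:
        self._rows[user_id] = name

    def load(self, user_id: str) -> Optional[str]:
        return self._rows.get(user_id)


def greet(store: UserStore, user_id: str) -> str:
    """Dependent code written against the interface, not the database."""
    name = store.load(user_id)
    return f"Hello, {name}!" if name is not None else "Hello, stranger!"


if __name__ == "__main__":
    store = InMemoryUserStore()
    store.save("42", "Ada")
    print(greet(store, "42"))
```

When the real implementation arrives it only has to satisfy `UserStore`; code like `greet` does not change.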

    Three Layers of Inheritance Limit

    There is almost never a reason to use more than three layers of inheritance.

    The first layer is an AbstractBaseClass that abstracts a protocol. Don't jump to creating an AbstractBaseClass first; it should come from experience. Much of the time there will just be a concrete implementation with no abstraction or derived classes.

    The second layer would be a complete implementation of the AbstractBaseClass or a partial implementation that is expected to be specialized further.

    The third layer is the complete implementation of the partial implementation at the second layer. You should never need to derive from this layer. Instead, back up and make a new second layer class.

    The advantage of this architecture is that all classes can work as a system in terms of the abstract base class. Yet, with the second layer, developers can make use of a fairly functional and standard base class that is easily extended with new system behaviour. Three layers is not too deep to understand, yet allows almost all solutions to be expressed in an extensible manner because of the abstract base class strategy.
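
A small sketch of the three layers follows. The transport example and every class name in it are made up for illustration; only the layering follows the description above.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class Transport(ABC):
    """Layer 1: the abstract base class that abstracts the protocol."""

    @abstractmethod
    def send(self, message: str) -> None:
        ...

    @abstractmethod
    def flush(self) -> None:
        ...


class NullTransport(Transport):
    """Layer 2, complete implementation: discards everything."""

    def send(self, message: str) -> None:
        pass

    def flush(self) -> None:
        pass


class BufferedTransport(Transport):
    """Layer 2, partial implementation: batches messages, leaves the
    actual write to be specialized further."""

    def __init__(self, limit: int = 2) -> None:
        self._limit = limit
        self._pending: List[str] = []

    def send(self, message: str) -> None:
        self._pending.append(message)
        if len(self._pending) >= self._limit:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._write(self._pending)
            self._pending = []

    @abstractmethod
    def _write(self, batch: List[str]) -> None:
        ...


class MemoryTransport(BufferedTransport):
    """Layer 3: completes the partial implementation. Do not derive from it."""

    def __init__(self, limit: int = 2) -> None:
        super().__init__(limit)
        self.batches: List[List[str]] = []

    def _write(self, batch: List[str]) -> None:
        self.batches.append(list(batch))


def broadcast(transports: Iterable[Transport], message: str) -> None:
    """System code works only in terms of the abstract base class."""
    for transport in transports:
        transport.send(message)
```

If a new kind of buffered output is needed, it becomes another subclass of `BufferedTransport` rather than a fourth layer under `MemoryTransport`.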

    Handling Infinite Work Streams

    Infinite work streams are the new reality of
    most systems. Web servers and application servers
    serve very large user populations where it is
    realistic to expect infinite streams of new work.
    The work never ends. Requests come in 24 hours a day
    7 days a week. Work could easily saturate
    servers at 100% CPU usage.

    Traditionally we have considered 100% CPU usage a bad sign.
    As compensation we create complicated infrastructures
    to load balance work, replicate state, and cluster
    machines.

    CPUs don't get tired so you might think we would
    try to use the CPU as much as possible.

    In other fields we try to increase productivity by
    using a resource to the greatest extent possible.

    In the server world we try to guarantee a certain
    level of responsiveness by forcing an artificially
    low CPU usage. The idea is if we don't have CPU
    availability then we can't respond to new work with a
    reasonable latency or complete existing work.

    Is there really a problem with the CPU being used
    100% of the time? Isn't the real problem that we use CPU
    availability and task priority as a simple cognitive
    shorthand for architecting a system rather than having
    to understand our system's low level work streams and using
    that information to make specific scheduling decisions?

    We simply don't have the tools to do anything other
    than make clumsy architecture decisions based on
    load balancing servers and making guesses at the
    number of threads to use and the priorities for
    those threads.

    We could use 100% of CPU time if we could:

    0. Schedule work so that explicit locking is unnecessary (though possible). This
    will help prevent deadlock and priority inversion.
    1. Control how much of the CPU work items can have.
    2. Decide on the relative priority of work and schedule work by
    that priority.
    3. Have a fairness algorithm for giving a particular level of service
    to each work priority.
    4. Schedule the CPU allowance for work across tasks.
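
As a toy illustration of the list above, here is a cooperative scheduler sketch. It is only one possible reading of these points, with assumptions stated plainly: work items are Python generators that yield after each slice of work, a "CPU allowance" is approximated by a number of slices per round, and `FairScheduler` and `job` are hypothetical names. Because only one slice runs at a time, the work items need no locks.

```python
from collections import deque
from typing import Deque, Dict, Generator, List


class FairScheduler:
    """Weighted round-robin over priority levels.

    `shares` maps a priority name to the slices it may run per round.
    Dict order is the priority order: earlier entries run first.
    """

    def __init__(self, shares: Dict[str, int]) -> None:
        self._shares = dict(shares)
        self._queues: Dict[str, Deque[Generator]] = {
            name: deque() for name in shares
        }

    def submit(self, priority: str, work: Generator) -> None:
        self._queues[priority].append(work)

    def run(self) -> None:
        # Keep going while any priority level still has work queued.
        while any(self._queues.values()):
            for priority, slices in self._shares.items():
                queue = self._queues[priority]
                for _ in range(slices):
                    if not queue:
                        break
                    work = queue.popleft()
                    try:
                        next(work)          # run exactly one slice
                    except StopIteration:
                        continue            # finished; do not requeue
                    queue.append(work)      # more to do; back of the line


def job(name: str, steps: int, log: List[str]) -> Generator:
    """A work item that records its name once per slice."""
    for _ in range(steps):
        log.append(name)
        yield


if __name__ == "__main__":
    trace: List[str] = []
    scheduler = FairScheduler({"high": 2, "low": 1})
    scheduler.submit("high", job("H", 4, trace))
    scheduler.submit("low", job("L", 2, trace))
    scheduler.run()
    print(trace)
```

Low priority work still makes progress every round, which is the fairness guarantee; the share values decide how much service each level gets.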

    Thoughts On Interview Questions, the Process, and Resumes

    Given that I and a few other people I know will be interviewing a bit more now :-) I've
    put together an interview-related wiki page at http://www.possibility.com/epowiki/Wiki.jsp?page=InterviewQuestions.
    It covers C++ and general programming interview questions. It also has some thoughts on issues
    companies should consider when interviewing and issues interview candidates
    should think about during the interview and when making their resume.

    Here's a bit of it.

    Thoughts For The Company Doing The Interviewing

    * Know the kind of person you want, the skills they should have, and design your interview process accordingly.
    * Do pre-interview phone interviews. This can save a lot of time for both parties if there is an obvious lack of a match.
    * Do you really have an open slot with money for it? Interviewing is a ton of work. It sucks to go through the entire process and then find out there really wasn't any money.
    * Decide who gets to decide if a person is hired. Is the manager going to hire who they want no matter what? Then don't bother with interviews. Does it have to be unanimous? Is it majority rules?
    * Every person in an interview should have a defined subject area. Have people know what they are supposed to ask and don't overlap questions.
    * Is someone a friend of the interviewee? If so don't have them interview the person. Make sure that the friendship doesn't influence others when making the decision.
    * People lie. Make people answer a wide variety of questions. Have them read code. Have them write code. Have them demonstrate specific knowledge. Have them demonstrate detailed knowledge. Do not accept generalities or diversions.
    * People lie. Have someone verify that what is on the resume is true. People will say they know C++ but can't describe a destructor!
    * Check references.
    * Be able to tell a candidate what job they are being hired for.

    Software Is Really a Community

    Software is far more a community than it is a well ordered bag of bits. This feeling struck me hard during a "transfer of knowledge" session for software i've worked on for over 6 years. The transfer of knowledge is due to an unfortunate plant closing.
    Just how do you transfer knowledge of a huge piece of software that you have so lovingly worked on for 6 years? It is a daunting task. There's no real place to start and there's no real place to end. The stories are fractally infinite.

    You hope you are transferring knowledge so that the software might live and even prosper. But in the back of my mind I know that this is not the case.

    I can talk about the software for days. I can demo it. I can document it better and better. But that's not the software.

    The software is really all the people and circumstances that gave rise to it, along with the culture that sustained it.

    The meaning of the software isn't in the code. It comes from the society of people who used it. From the traditions and culture that were built around it. The exciting moments when you were able to add something that made someone else's life easier.

    Software is its community. Without a community software cannot be said to live.

    Anything complex does not stay alive by the written word. Software lives through continual use; through old people handing down knowledge to new people, sharing tips, tricks, and workarounds; through steady continual improvement based on the feedback of actual caring users so that the software fits its niche so well nobody can imagine it working any other way.

    When going over each feature I can remember when it was added, who wanted it added, and why they wanted it. I can remember when the feature was completed and their thanks when it worked. Without that person or their living descendants, any explanation of the feature makes no sense. Inside, I know it will never be used again. It will just die.

    Swing, Threading, and Application Architectures

    Here's an interesting thread on writing efficient Swing
    code (http://www.javalobby.org/thread.jspa?forumID=61&threadID=13166).
    It's interesting to me because it talks about improving Swing performance
    by not doing work in the UI thread. I would say this is obvious,
    but I've noticed that in general threads are not talked about much
    in Java.

    As threads are built into java you might expect a more
    energetic discussion.

    But unfortunately threads in java make it so easy to screw
    things up.

    By not even defaulting to having work done in another
    thread, the UI has earned Swing years and years of bad press.

    Observers not requiring notifications to be processed in a separate thread,
    for example, is a disaster waiting to happen. In the naive implementation of Java
    you can handle notifications safely by providing a bridge to an Actor
    type architecture, but few people know or will think to do it.
    Instead you get tangles of recursive code with entirely unpredictable
    latencies and deadlock characteristics.
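
The bridge idea can be sketched in a few lines. The example is in Python rather than Java to keep it short, and `Subject` and `ActorBridge` are hypothetical names, not classes from Swing or any library. The observer's handler runs on the actor's own thread, so the notifying thread only enqueues an event and returns.

```python
import queue
import threading
from typing import Any, Callable, List


class Subject:
    """A plain observable: calls each observer on the publishing thread."""

    def __init__(self) -> None:
        self._observers: List[Callable[[Any], None]] = []

    def subscribe(self, observer: Callable[[Any], None]) -> None:
        self._observers.append(observer)

    def publish(self, event: Any) -> None:
        for observer in self._observers:
            observer(event)


class ActorBridge:
    """Turns a notification into a mailbox message handled on one thread."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, handler: Callable[[Any], None]) -> None:
        self._handler = handler
        self._mailbox: "queue.Queue[Any]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def notify(self, event: Any) -> None:
        """Called by the subject; only enqueues, never runs the handler."""
        self._mailbox.put(event)

    def _run(self) -> None:
        while True:
            event = self._mailbox.get()
            if event is self._STOP:
                break
            self._handler(event)  # events are processed one at a time

    def stop(self) -> None:
        """Drain what is queued, then shut the worker down."""
        self._mailbox.put(self._STOP)
        self._thread.join()


if __name__ == "__main__":
    subject = Subject()
    bridge = ActorBridge(lambda event: print("handled", event))
    subject.subscribe(bridge.notify)
    subject.publish("resize")
    bridge.stop()
```

Since the handler never runs inside `publish`, a handler that publishes further events cannot recurse into the subject on the caller's stack, and the publisher's latency no longer depends on what observers do.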

    High performance applications consider threading architectures very carefully,
    as they do in SEDA (http://www.eecs.harvard.edu/~mdw/proj/seda/),
    for example.

    These issues are not related to Swing only; they exist in every
    application, every JVM, every system. Inputs like databases,
    TCP/IP, RMI, SOAP, JMS, servlets, etc. all have the same problems
    of dispatching work, getting work done, and dealing with
    notifications from all the work performed, which in turn causes more work,
    more notifications, etc.

    Container frameworks like Spring generally assume work is
    processed in a single thread. Thread local variables are used
    to transparently store transaction information or AOP is used to declaratively
    support transactions.

    This approach doesn't support moving work to different threads for different
    processing steps. Nor does it allow you to condition your total work load by limiting