Todd Hoff's blog

Todd Hoff's picture

Encapsulation or Representation?

David Bau has an interesting post on when one should choose Encapsulation or Representation. http://davidbau.com/archives/2003/11/12/encapsulation_or_representation.....

Use encapsulation within a subsystem. Use representation between subsystems.

Encapsulation creates a language API binding. This is the least
flexible option when trying to integrate between subsystems.
CORBA, RMI, RPC, etc. have all pretty much failed for system
integration for this reason.

For example, I want to use Perforce via a programmatic interface.
They have a C++ API that runs on platform X. I use Perl and I am
on platform Y. Within Perforce they should use a carefully
crafted encapsulation. If they had used SOAP or simple HTTP
at the subsystem boundary I would be set. These are available to me
in every language on every platform.

Encapsulation makes for a great internal program architecture.
Representation between subsystems makes for great accessibility
from any language and platform.

Perforce has a command line program called p4 which people often use
to make wrappers around Perforce. The problem is this is neither encapsulation
nor representation. The output must be screen scraped because it is meant
for CLI display. This is a far cry from having a well defined schema with
specific fields and values. With Perl's regular expressions screen scraping
isn't horrible, but it is still crude and error prone.
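
To make the contrast concrete, here is a minimal sketch in Python. The display line and the JSON record are both invented for illustration; neither is real p4 output, and the field names are assumptions. It only shows why named fields are sturdier than a regular expression tied to a display layout.

```python
import json
import re

# Hypothetical display-oriented line. This is NOT real p4 output.
CLI_OUTPUT = "Change 1042 on 2003/11/12 by alice 'Fix login bug'\n"

# Hypothetical structured representation of the same record.
JSON_OUTPUT = (
    '{"change": 1042, "date": "2003-11-12", '
    '"user": "alice", "description": "Fix login bug"}'
)


def scrape_change(line):
    """Screen scraping: works only while the display layout never changes."""
    match = re.match(r"Change (\d+) on (\S+) by (\S+) '(.*)'", line)
    if match is None:
        raise ValueError("unrecognized display format")
    number, _date, user, description = match.groups()
    return {"change": int(number), "user": user, "description": description}


def parse_change(text):
    """Representation: fields are looked up by name, not by position."""
    record = json.loads(text)
    return {
        "change": record["change"],
        "user": record["user"],
        "description": record["description"],
    }


scraped = scrape_change(CLI_OUTPUT)
parsed = parse_change(JSON_OUTPUT)
```

Both paths yield the same record here, but only the scraper breaks when someone reorders the display columns or changes the quoting.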

Why Frameworks are Good

Frameworks get a bad rap because everyone has a story about how they were on a project that tried to build a framework and it spiraled out of control and the whole project failed and everyone died a fiery death. I contend frameworks fail for pretty much the same reason any other software project fails. If it's not done properly it will fail. If it's done properly you get a huge ROI.
From dictionary.com:

frame·work n. 1. A structure for supporting or enclosing something else, especially a skeletal support used as the basis for something being constructed. 2. An external work platform; a scaffold. 3. A fundamental structure, as for a written work. 4. A set of assumptions, concepts, values, and practices that constitutes a way of viewing reality.

There's no reason a framework must apply across multiple applications, there's no reason for it to be OO based, and there's no reason for it to be complete.

My definition of a framework in the context of programming would be something like:

The systemization of a domain expressed in code to solve a particular class of problems in a particular ecology.

The framework could be large or small. It could work in one application or many applications. The primary point is a framework allows developers to solve their problem in terms of the framework. If done well it can provide a lot of leverage (ROI). A framework doesn't solve all problems in every application.

The keys are: 1. Systemization is an experience-based process; otherwise the probability of success is greatly reduced. Experience comes from working on the same or similar problems in multiple projects. It never stops, which is why a framework is never perfect and is never done. Systemization is a key to success in other fields and it can be a key in software.

Leave the platonic realms for philosophers.

Bala Paranj wrote: Hello, If I have an abstract class Bird and subclasses Sparrow, Pigeon and Ostrich having methods fly(), with no-op for the Ostrich fly() implementation. I am violating the LSP. Is this violation acceptable? Is there any case where the violation is acceptable?
My Reply:

It's up to you. LSP is helpful in programming because it lets programmers reason more confidently about programs: they can assume they know how something will behave. LSP is not a constraint in the real world, it is a constraint that can be put on software to help build systems.

One root of the problem is that types in most programming languages are based on classical categorization. The real world is much more interesting and may combine exemplar and prototype based categorization (http://www.bsos.umd.edu/hesp/newman/Newman_classes/Newman300/webpages/ca...).

A tree stump can be considered on the outer boundary of being a seat, but it's not our best example of a seat. And in fact the only reason a tree stump could be considered a seat at all is because we are human. In the classification system of an ant or an elephant, a tree stump probably wouldn't be in the seat category.

This sort of ambiguity is not easily represented in class structures.

The categories we think of as natural and intrinsic to the world are largely the production of our human embodied self's relation with the world.

In the same way categories in your system need to be related to the problem space. The problem space is what embodies your solution and acts as the context in which questions, definitions, and relationships are resolved.

If you wish to build a software system then following LSP is probably a good idea. Figure out why you care about flight and make all your objects consistent with that view of the world. Trying to model reality in an unsuitable medium like a programming language can get very frustrating. Perhaps the brain is the complexity needed.
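
One way to stay consistent with LSP, sketched in Python under the assumption that flight is what the program cares about: keep fly() off Bird and put it on a narrower type, so nothing that expects a flier can be handed an Ostrich. FlyingBird, name() and migrate() are names made up for this illustration and are not part of the original question.

```python
from abc import ABC, abstractmethod


class Bird(ABC):
    """Only what every bird in this problem space can do."""

    @abstractmethod
    def name(self) -> str: ...


class FlyingBird(Bird):
    """Hypothetical narrower type: birds for which flight is meaningful."""

    @abstractmethod
    def fly(self) -> str: ...


class Sparrow(FlyingBird):
    def name(self) -> str:
        return "sparrow"

    def fly(self) -> str:
        return "sparrow flies"


class Ostrich(Bird):
    # No fly() at all, so there is no no-op to surprise a caller.
    def name(self) -> str:
        return "ostrich"


def migrate(flock):
    """Code that cares about flight asks for FlyingBird objects only."""
    return [bird.fly() for bird in flock]
```

Whether this split is worth it depends on why the system cares about flight in the first place; if it doesn't, Bird may not need fly() anywhere.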

XML Encoding

Informative article on XML encoding at
http://www10.org/cdrom/papers/542/index.html.

Gzip often had much lower encode/decode times at the
expense of slightly larger content sizes when
compared to other approaches. In one test gzip took
3.33 msecs to encode test data producing a packet of
1516 bytes versus other approaches that took 914 msec
and produced a packet of 1222 bytes. The next lowest
competitor was at 266 msecs for the encode times.

Give me faster encode/decode times as long as the packet
is not grossly larger. Small size differences are easily
averaged out if data are streamed up to clients. Encode/decode
times are fundamental performance limiters because they
control how many messages per second can be handled.
Using gzip you will be able to process hundreds more messages
per second.
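
The arithmetic behind that claim, as a rough sketch in Python. It assumes encoding is the only cost and runs on a single thread, so the results are upper bounds. The three times are the ones quoted above; the sample payload and the time_gzip helper are made up for illustration.

```python
import gzip
import time


def messages_per_second(encode_msec):
    """Upper bound on throughput if encoding were the only cost."""
    return 1000.0 / encode_msec


# Encode times quoted from the article's test, in milliseconds.
quoted = {
    "gzip": 3.33,
    "next lowest competitor": 266.0,
    "other approach (914 msec)": 914.0,
}
for name, msec in quoted.items():
    print(f"{name}: {messages_per_second(msec):.1f} messages/sec")


def time_gzip(payload, rounds=100):
    """Average gzip encode time in msec and the packed size in bytes."""
    start = time.perf_counter()
    for _ in range(rounds):
        packed = gzip.compress(payload)
    elapsed_msec = (time.perf_counter() - start) * 1000.0 / rounds
    return elapsed_msec, len(packed)


# Hypothetical sample message, just to have something to measure.
sample = b"<order><id>42</id><item>widget</item></order>" * 50
msec, size = time_gzip(sample)
```

At 3.33 msecs per message the ceiling is about 300 messages per second; at 266 msecs it is under 4.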

I've had good results with gzip as well, but chose to go
a different way. Instead I directly write a binary form
of the XML into a buffer so there's no separate encode
step. I also use a generic properties format so I
don't have to worry about arbitrary schemas. On the decode
side the buffer is passed around until needed so it doesn't
have to be decoded immediately; it can be decoded in some
other thread. The binary format is searchable so the
entire message doesn't need to be decoded to get to part
of it.

This approach performs excellently.
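
The actual binary format isn't described here, so the following is only a sketch of the "searchable" idea, in Python and with an invented layout: each property is written as a key length, a value length, then the key and value bytes. A reader can hop from header to header and pull out one property without decoding the rest. HEADER, encode_properties and find_property are all hypothetical names.

```python
import struct

# Hypothetical layout: 2-byte key length, 4-byte value length, network order.
HEADER = struct.Struct("!HI")


def encode_properties(properties):
    """Write string properties straight into one buffer."""
    buffer = bytearray()
    for key, value in properties.items():
        key_bytes = key.encode("utf-8")
        value_bytes = value.encode("utf-8")
        buffer += HEADER.pack(len(key_bytes), len(value_bytes))
        buffer += key_bytes
        buffer += value_bytes
    return bytes(buffer)


def find_property(buffer, wanted):
    """Scan headers and decode only the value that was asked for."""
    wanted_bytes = wanted.encode("utf-8")
    offset = 0
    while offset < len(buffer):
        key_len, value_len = HEADER.unpack_from(buffer, offset)
        offset += HEADER.size
        key = buffer[offset:offset + key_len]
        offset += key_len
        if key == wanted_bytes:
            return buffer[offset:offset + value_len].decode("utf-8")
        offset += value_len  # skip this value without decoding it
    return None
```

Because the buffer is plain bytes it can be handed from thread to thread untouched, and whoever finally needs a field pays only for that field.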

Group Dynamics

The group dynamics of important software projects under heavy development and release pressure is a lot like that of squads in battle. I've read where soldiers say after a while it stops being about country, it stops being about what they are fighting for, and becomes just about surviving. All that matters is your squad members helping each other stay alive.
In the crucible of a critical release the same narrowing of focus happens. All that matters is supporting your team members and getting the job done. You long ago stopped caring about your company, customers, and even whatever the hell you are making. You do whatever it takes to help your friends and make the product work.

It's an irrational process, like being replugged into ancient survival behaviours. Only with the perspective of time do you realize what an idiot you were. It probably has something in common with the madness of crowds behaviour that has been observed throughout history.

Personas are a powerful design tool

Personas are a powerful design tool, especially when combined with responsibility driven design. http://www.boxesandarrows.com/archives/002330.php. Cooper's personas are:
simply pretend users of the system you're building. You describe
them, in a surprising amount of detail, and then design your
system for them.
I have a standard set of personas that I consider when creating a design/architecture that don't seem to be common. When you write code there are a lot of personas looking over your shoulder:

  • other programmers using the code
  • maintenance
  • extension
  • documentation group
  • training group
  • code review
  • test and validation
  • manufacturing
  • field support
  • first and second line technical support
  • live debugging
  • post crash debugging
  • build system (documentation generation and automatic testing)
  • unit testing
  • system testing
  • source code control
  • code readers
  • legal

You are much more careful and more thorough when you really think about all the personas, all the different people and all their different roles and purposes.

    Gossiping as War

    An interesting paper on strategic gossiping as a form of information warfare in reputation based networks (http://cogprints.soton.ac.uk/documents/disk0/00/00/21/12/index.html). A lot of systems on the internet, like slashdot.org, are using reputation based ratings as a form of decentralized collective/community control. With a few glitches it's a strategy that basically seems to work. Interesting how an integrated war strategy might take advantage.
    Reputation systems are fun because they can be endlessly tinkered with and debated, acting as a proxy for designing the ideal society. It's hard to create new governments in the meat world, but in the digital world we can set them up anytime and in endless variation. The internet is one vast experiment in self governance even to the extent of having old colonial powers trying to assert their control.

    Interestingly, girls have been using strategic gossiping tactics forever, as stunningly shown in the great article Girls Just Want to be Mean (http://www.nytimes.com/2002/02/24/magazine/24GIRLS.html). This article made me very glad that as a guy I could count on just being hit or something equally obvious. Girls are much crueler.

    Flow Chart for Project Decision Making

    Not that I'm cynical, but this is my favorite big picture of how projects work. This diagram is from my C++ coding standards page. Some people have complained about the profanity, but I admire its directness.

    In medieval times the majority of developers, for all their brain power, would have been serfs. Few groups work so hard under such difficult circumstances for masters so unworthy. Can't complain about the pay, but that's not all there is. Certainly some of us would be wizards or alchemists or jugglers. A few of us, like Galileo, would have cracked open the doors of the Enlightenment and then, like Newton, blown the doors open. But most of us, myself included I think, would have served our masters quietly tending our fields of code.

                           +---------+
                           |  START  |
                           +---------+
                                |
                                V
                      YES +------------+ NO
          +---------------|  DOES THE  |---------------+
          |               | DAMN THING |               |
          V               |   WORK?    |               V
    +------------+        +------------+       +--------------+  NO
    | DON'T FUCK |                             | DID YOU FUCK |-----+
    |  WITH IT   |                             |   WITH IT?   |     |
    +------------+                             +--------------+     |
          |                                            |            |
          |                                            | YES        |
          |                                            V            |
          | +------+     +-------------+       +---------------+    |
          | | HIDE | NO  | DOES ANYONE |<------| YOU DUMBSHIT! |    |
          | |  IT  |<----|    KNOW?    |       +---------------+    |
          | +------+     +-------------+                            |
          |    |                |                                   |

    Programming as Creating Causal Models

    Programming can never just be programming; we must always explain programming using a metaphor. Programming IS manufacturing. Programming IS conducting a symphony. Programming IS making a peanut butter and jelly sandwich. Insert your particular agenda here.

    After reading The Mind's Arrows by Clark Glymour I think an interesting metaphor may be Programming IS Building Causal Models.

    My take is that programming is primarily teleological and analytical in nature. Programs are purposeful. This purpose serves as a grounding against which meanings are resolved. We start from our goals and work backwards figuring out how to achieve our goals. Goals arise and subside in feedback loops over time. There is no lack of available paradigms for implementing all of the above, but the idea of causal models is interesting. Then again I may just be channeling long repressed memories from past logic programming classes.

    Cause means "the producer of an effect, result, or consequence." Causal means "indicative of or expressing a cause." The sense of model that applies is "a schematic description of a system, theory, or phenomenon that accounts for its known or inferred properties and may be used for further study of its characteristics." A program is a chain of causes that produces the effects necessary to reach our goals. The program is the model.

    The programmer is responsible for implementing the necessary causes and effects by translating the causal model a programmer has constructed in their mind to the causal model embodied in a program. In the mind reason, emotion, and experience meld to provide the deep structure of a causal model. A lot of what we know isn't easy to articulate. Programming requires the elicitation of the model which is difficult because our thoughts are primarily images which are hard to fully explore and extract.

    The discipline of programming, to paraphrase Mr. Glymour:

    is about the causal processes and mechanisms though

    Server-Side Design Principles for Scalable Internet Systems

    The IEEE Software article Server-Side Design Principles for Scalable Internet Systems by Colleen Roe and Sergio Gonik of GemStone Systems is a good overview of different strategies for achieving scalability. http://computer.org/software/index.shtml. It lists the principles of scalable architecture as:

  • divide and conquer - the system should be partitioned into smaller subsystems that can be deployed onto separate processes or threads, which disperses the load and allows for load balancing and tuning.
  • asynchrony - means work can be carried out in the system on a resource-available basis. Asynchrony decouples functions and lets the system better schedule resources.
  • encapsulation - system components are loosely coupled with few dependencies among the components.
  • concurrency - activities are split across hardware, processes, and threads and can exploit the physical concurrency of modern multiprocessors. Concurrency allows the maximum work to be scheduled.
  • parsimony - designers must be economical in what they design. Pay attention to the thousands of micro details.

    Strategies for achieving scalability:

  • Careful system partitioning
  • Service-base layered architecture
  • Just-enough data distribution
  • Pooling and multiplexing
  • Queueing work for background processes
  • Near real-time synchronization of data
  • Distributed session tracking
  • Keep it simple

    And lots more, with a lot more detail on each topic. It's a very good overview discussion that jibes with my experience. The question is how do developers implement all this, which is where I assume GemStone comes in :-) A part of scalability that doesn't get addressed is ordered synchronization between distributed applications that can fail and recover independently. Maybe more on that later.
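
As one small illustration of asynchrony and "queueing work for background processes", here is a minimal sketch in Python using only the standard library. It is not from the article: start_worker and the squaring "job" are stand-ins invented for the example, and a single worker thread is assumed so results come back in order.

```python
import queue
import threading


def start_worker(work_queue, results):
    """Start one background thread that drains the queue until it sees None."""

    def run():
        while True:
            job = work_queue.get()
            if job is None:  # sentinel: no more work is coming
                work_queue.task_done()
                break
            results.append(job * job)  # stand-in for slow work
            work_queue.task_done()

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


work_queue = queue.Queue()
results = []
worker = start_worker(work_queue, results)

# The caller hands work off and moves on instead of waiting for each job.
for job in [1, 2, 3]:
    work_queue.put(job)
work_queue.put(None)

work_queue.join()  # block only when the results are actually needed
worker.join()
```

The producer never waits on an individual job; the queue is the decoupling point the article's principles describe.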

    Differential Diagnosis Debugging

    Differential Diagnosis is an innovative technique for finding bugs. It's a strategy a co-worker of mine uses that is so obvious in retrospect, yet has an incredible amount of power.
    Usually problem debugging starts from scratch every time and our heroes eventually find the problem. Using Differential Diagnosis you go back and look at the change history for every change made since the code last worked. The idea is that the code worked at one time, so the bug is likely to have been introduced in one of the later changes.

    By inspecting the source of only the changes it's often possible to figure out the problem or at least narrow it down considerably.

    This approach makes a lot of sense and works extremely well. But I hadn't seen it before. Obviously no strategy will always work, but it works a lot of the time. Even well unit tested code can have integration related bugs that don't show up until later. And many products are so complex that any unit test doesn't scratch the surface of possible tests.

    Since then I've read a paper where the build system, after finding a bug from a smoke test run, would automatically back out changes and rebuild until it found which change caused the bug. Very cool. Someday I hope to add this to our current build system.
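
A sketch of how such an automated search could look, in Python. Note the swap: the paper is described as backing changes out one by one, while this version uses binary search over the change list, which needs fewer rebuilds. It assumes the build was good before the first change, is bad after the last, and stays bad once broken. find_first_bad_change and passes_smoke_test are hypothetical names; the latter stands in for "rebuild with these changes applied and run the smoke test".

```python
def find_first_bad_change(changes, passes_smoke_test):
    """Return the earliest change after which the smoke test fails.

    changes: ordered list of change ids, oldest first.
    passes_smoke_test: callable taking the list of applied changes and
        returning True if a build with exactly those changes passes.
    """
    low, high = 0, len(changes) - 1
    while low < high:
        mid = (low + high) // 2
        if passes_smoke_test(changes[: mid + 1]):
            low = mid + 1  # still good here, so the culprit is later
        else:
            high = mid  # already broken, so the culprit is here or earlier
    return changes[low]
```

With five changes since the last good build this takes two or three rebuilds instead of up to five, and the one change it returns is where the source inspection starts.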

    Robert Martin's Rules

    Excellent heuristics from Robert Martin (http://objectmentor.com):

  • Whenever you see the number 1, consider that it might be N.
  • Whenever you see a constant, consider it might be a variable.
  • Whenever you see two or more concepts that are arbitrarily connected, consider they might need separation.
  • If a decision seems arbitrary, consider how it could be made differently.
  • Consider that what is ancillary today will be primary tomorrow.
  • Consider that what is low volume today will be high volume tomorrow.

    Good things to consider when designing/coding. Being a little more reflective during the development process would help prevent a lot of problems. Development isn't a race.

    Developers win through a complex set of tradeoffs that usually look like a loss from a number of other perspectives.
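
The first two heuristics are easy to show in a few lines of Python. The function names and the timeout value are made up for the example; it is only meant to show what "1 might be N" and "a constant might be a variable" look like in code.

```python
# Before: assumes exactly one host and a fixed timeout.
DEFAULT_TIMEOUT_SECONDS = 30


def describe_connection(host):
    return f"{host} (timeout {DEFAULT_TIMEOUT_SECONDS}s)"


# After: the 1 became N, and the constant became a parameter with a default.
def describe_connections(hosts, timeout_seconds=DEFAULT_TIMEOUT_SECONDS):
    return [f"{host} (timeout {timeout_seconds}s)" for host in hosts]
```

The single-host case is still there as a list of one, so nothing is lost by asking the question early.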

    On Leadership


    I wish I had said this, but it was said by asd@asd.com in comp.software-eng.

    Leaders:

  • lead by example
  • don't ask anything of anyone they wouldn't do themselves
  • are called on to make difficult and unpopular decisions
  • keep the team focused
  • reward/support their team in whatever they do
  • keep/clear unnecessary crap out of the way of the team

    Consensus is great. If it lasts for the project lifecycle, consider yourself blessed. I've been on a couple projects where two engineers just blatantly *disagreed*!

    #1 " x = 1"
    #2 " x != 1"

    That's when a Project Leader is required. Unless you want to flip a coin.

    Oh yea - one more thing. Project leaders: TAKE the blame when things go wrong and SHARE the credit when things go right.

    Ain't easy - but it's the way I try to run my life.

    Schedule to Unblock Others

    Schedules are lies. Schedules suck. Yes, yes, yes. But we still need them.
    The most effective scheduling rule I've used is to schedule so as to unblock others.

    The idea is to complete first the portions of a feature that will unblock those dependent on you. This way development moves along smoothly because more lines of development can be active at a time. For example, instead of implementing the entire database, implement the simple interface and stub it out. People can work for a very long time using just the portion of the feature that keeps them from blocking. Plus it's a form of rapid prototyping because you get immediate feedback on these parts. Don't worry about the quality of the implementation because it doesn't matter yet.
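
A minimal sketch of the "implement the interface and stub it out" move, in Python. UserStore, InMemoryUserStore and greet are invented names; the point is only that code written against the interface can proceed long before a real database exists.

```python
from abc import ABC, abstractmethod


class UserStore(ABC):
    """The simple interface others are blocked on (hypothetical)."""

    @abstractmethod
    def save(self, user_id, name): ...

    @abstractmethod
    def load(self, user_id): ...


class InMemoryUserStore(UserStore):
    """Stub: a dict stands in for the real database for now."""

    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def load(self, user_id):
        return self._users.get(user_id)


def greet(store, user_id):
    """Dependent code: written against UserStore, unaware of the stub."""
    name = store.load(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"
```

When the real store arrives it only has to honor the same two methods, and the feedback gathered from the stub has already shaped the interface.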

    Three Layers of Inheritance Limit

    There is almost never a reason to use more than three layers of inheritance.
    The first layer is an AbstractBaseClass that abstracts a protocol. Don't jump to creating an AbstractBaseClass first. It should come from experience. Much of the time there will just be a concrete implementation with no abstraction or derived classes.

    The second layer would be a complete implementation of the AbstractBaseClass or a partial implementation that is expected to be specialized further.

    The third layer is the complete implementation of the partial implementation at the second layer. You should never need to derive from this layer. Instead, back up and make a new second layer class.

    The advantage of this architecture is all classes can work as a system in terms of the abstract base class. Yet, with the second layer developers can make use of a fairly functional and standard base class that is easily extended with new system behaviour. Three layers is not too deep to understand, yet allows almost all solutions to be expressed in an extensible manner because of the abstract base class strategy.
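
The three layers might look like this in Python. The transport protocol and all class names are made up for illustration; only the layering itself comes from the text above.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Layer 1: abstract base class that abstracts the protocol."""

    @abstractmethod
    def send(self, message: str) -> str: ...


class FramedTransport(Transport):
    """Layer 2, partial: does the framing, leaves delivery to subclasses."""

    def send(self, message: str) -> str:
        return self.deliver(f"[{len(message)}]{message}")

    @abstractmethod
    def deliver(self, frame: str) -> str: ...


class NullTransport(Transport):
    """Layer 2, complete: a full implementation straight off the base."""

    def send(self, message: str) -> str:
        return ""


class LoopbackTransport(FramedTransport):
    """Layer 3: completes the partial layer. Nothing derives from this."""

    def deliver(self, frame: str) -> str:
        return frame


def broadcast(transports, message):
    """The system works purely in terms of the abstract base class."""
    return [transport.send(message) for transport in transports]
```

If a new variation of LoopbackTransport is needed, the rule says to go back and add another layer 2 class rather than derive a fourth layer.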