Recent Posts

RSS Feeds

The Irony of the Bailout

It is just me, or is it incredibly ironic that the government's proffered "solution" to the self-created crisis in the credit business is to borrow 3/4 of a trillion dollars?

Permalink     1 Comment

The $700 Billion Bailout of Slavery

No, the title is not an careless error - I make enough of those, however, that the potential question needed to be addressed right up front. It's just that I have come to see a tremendous similarity between the recent $700 billion bailout of the US financial backbone and that of the institution of slavery in this country in the late 1700s.

During my Unitarian church service today, my minister was talking about Thomas Jefferson - you know the guy, founding father, US president, unitarian, and huge slave owner. Yup, he held more slaves than anyone else in all of Virginia, and there were a lot of slave holders in Virginia. How could one of our most famous founding fathers, author of the bulk of the US Constitution, declarer that all men are created equal, also be a slave holder? After all, there were a number of prominent and vocal abolitionists at that time so it wouldn't have been without precedent, people including George Washington, John Adams,  and Thomas Jefferson. No, that's not an error, Jefferson was both an abolitionist and a slave holder (as was Washington). Why? The answer to this seeming paradox is simple: economics.

Jefferson was a leader in the abolitionist movement before the mid 1780s. It was then that he realized the huge economic cost he - indeed, the entire US economy - would incur should slavery be suddenly abolished. This is why he couldn't simply free his slaves even after the Virginia legislature passed a bill allowing slave holders to voluntarily free their slaves if they chose to. He simply couldn't afford it. As Howard Zinn wrote in a People's History of the United States, "Jefferson tried his best, as an enlightened, thoughtful individual might.  But the structure of American society, the power of the cotton, the slave trade, the politics of unity between northern and southern elites, and the long culture of race prejudice in the colonies, as well as his own weaknesses- that combination of practical need and idealogical fixation- kept Jefferson a slaveowner throughout his life."

Which brings us to the connection between slavery and the bailout of Wall Street. Congress - and both McCain and Obama - know quite well that they have no choice but to "save" the greedy, allegedly capitalistic institutions from their own malfeasance. Have to. Just as Jefferson realized that the economy of the still-young US could not survive the abolition of slavery during his lifetime, congress knows now what drives the US economy - the stock market. It's the modern-day equivalent of slavery, in a very obtuse way. It's something that they believe must be kept going at all costs ($700 billion for starters in this case) regardless of any other considerations. Just like the plantations had to be kept operating. Had to.

For the slave, unfortunately, it was a basically moot point, as it is for the average US citizen today. If the South stayed in the Union, he was still going to be a slave. If the South left the Union over the issue of slavery, he was still going to be a slave, but without the influence of the Northern abolitionists. Slavery was going to continue in the South anyway. $700 billion bailout or not, we're all similarly screwed. Where do you think that $700 billion is coming from? Same place as all the billions for the war, it's being borrowed. But as the world economy depending on exhaustible resources like oil starts to go down the drain, those debts will be called in. And then the @#%$ will really hit the fan. But our government - and capitalism in general - doesn't care about that. It only cares about today. What will happen when the mortgage of the entire country represented by this debt goes bad?

Revolution, I hope. It's about time for another one, to throw off the shackles of the government we've allowed and to find new guardians for our future security.


Permalink     2 Comments

The Great Media Bias

With both the right and the left proclaiming widespread media bias against them, especially in an election year, who can you trust? Certainly not the venerable AP, as this story illustrates.

This article reports on a web site that shows you what you 'would' pay in taxes under either a McCain presidency or an Obama one. As you read this article, notice the use of the word 'would' when 'could' is the far more appropriate term. Connotation is everything, and even this simple, unimportant article can't resist it's own spin.

What happened to real reporting?

Permalink     1 Comment

DAWG - Bug fixed!

I found a bug in the original code, it wasn't properly testing subtrees. Here is the corrected code:

package twinfeats;

import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.TreeMap;

/*
Directed Acyclic Word Graphs

original algorithm concept and description at http://www.wutka.com/dawg.html
This implementation and revised text by Kent L. Smotherman http://www.twinfeats.com

A Directed Acyclic Word Graph, or DAWG, is a data structure that permits
extremely fast word searches. The entry point into the graph represents 
the starting letter in the search. Each node represents a letter, and you 
can travel from the node to two other nodes, depending on whether you the 
letter matches the one you are searching for.

It's a Directed graph because you can only move in a specific direction 
between two nodes. In other words, you can move from A to B, but you can't 
move from B to A. It's Acyclic because there are no cycles. You cannot have 
a path from A to B to C and then back to A. The link back to A would create 
a cycle, and probably an endless loop in your search program.

The description is a little confusing without an example, so imagine we have 
a DAWG containing the words CAT, CAN, DO, and DOG. The graph woud look like this:

     C --Child--> A --Child--> N (EOW)
     |                         |
     |                       Next
   Next                        |
     |                         v
     |                         T (EOW)
     v
     D--Child--> O (EOW) --Child --> G (EOW)

Now, imagine that we want to see if CAT is in the DAWG. We start at the entry 
point (the C) in this case. Since C is also the letter we are looking for, we 
go to the child node of C. Now we are looking for the next letter in CAT, which 
is A. Again, the node we are on (A) has the letter we are looking for, so we 
again pick the child node which is now N. Since we are looking for T and the 
current node is not T, we take the Next node instead of the child. The Next 
node of N is T. T has the letter we want. Now, since we have processed all 
the letters in the word we are searching for, we need to make sure that the 
current node has an End-of-word flag (EOW) which it does, so CAT is stored in 
the graph.

One of the tricks with making a DAWG is trimming it down so that words with 
common endings all end at the same node. For example, suppose we want to store 
DOG and LOG in a DAWG. The ideal would be something like this:

   D --Child--> O --Child--> G(EOW)
   |            ^
  Next          |
   |            |
   v            |
   L --Child----

In other words, the OG in DOG and LOG is defined by the same pair of nodes.
 
Creating a DAWG

The idea is to first create a tree, where a leaf would represent the end of a 
word and there can be multiple leaves that are identical. For example, DOG and 
LOG would be stored like this:

  D --Child--> O --Child--> G (EOW)
  |
 Next
  |
  v
  L --Child-> O --Child--> G (EOW)

Now, suppose you want to add DOGMA to the tree. You'd proceed as if you were 
doing a search. Once you get to G, you find it has no children, so you add a 
child M, and then add a child A to the M, making the graph look like:

  D --Child--> O --Child--> G (EOW) --Child--> M --Child--> A (EOW)
  |
 Next
  |
  v
  L --Child-> O --Child--> G (EOW)

As you can see, by adding nodes to the tree this way, you share common 
beginnings, but the endings are still separated. To shrink the size of the 
DAWG, you need to find common endings and combine them. To do this, you start 
at the leaf nodes (the nodes that have no children). If two leaf nodes are 
identical, you combine them, moving all the references from one node to the 
other. For two nodes to be identical, they not only must have the same letter, 
but if they have Next nodes, the Next nodes must also be identical (if they 
have child nodes, the child nodes must also be identical).

Take the following tree of CITIES, CITY, PITIES and PITY:

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |                                      |
 |                                     Next
Next                                    |
 |                                      v
 |                                      Y (EOW)
 P --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
                                        |
                                       Next
                                        |
                                        v
                                        Y (EOW)

Once I create the tree, I run through it and tag each node with a number of 
children and a child depth (the highest number of nodes you can go through 
from the current node to reach a leaf). Leaf nodes have 0 for the child
depth and 0 for the number of children. The main reason for the tagging is 
that when looking for identical nodes, you can quickly rule out nodes with 
a different number of children or a different child depth. When I tag the 
nodes, I also put them in an array of nodes with a similar child depth 
(again to speed searching).

Now that the nodes have been tagged and sorted by depth, I start with the 
children that have a depth of 0. If node X and node Y are identical, I make 
any node that points to Y as a child now point to X. Originally, I also 
allowed nodes that pointed to Y as a Next node point to X. While this works 
from a data structure standpoint, the implemented algorithm here does not
perform this step.

In the CITY-PITY graph, the algorithm would see that the S's at the end are 
the same. Although the Y nodes are identical, they are not referenced by any 
Child links, only Next links. As I said before, combining common next links 
make it difficult to store the graph the way I needed to. The graph would look 
like this after processing the nodes with a 0 child depth (leaf nodes):

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |                                      |                       ^
 |                                     Next                     |
Next                                    |                       |
 |                                      v                       |
 |                                      Y (EOW)                 |
 P --Child--> I --Child--> T --Child--> I --Child--> E --Child--
                                        |
                                       Next
                                        |
                                        v
                                        Y (EOW)

Next, the algorithm looks at nodes with a child depth of 1, which in this case 
would be the two E nodes (although T has 1-deep path to Y, it's child depth is 
3 because the longest path is the 3-deep path to S). In order to test that the 
two E's are identical, you first see that the letters are the same, which they 
are. Now you make sure that the children are identical. Since the only child of 
E is the same node, you KNOW they are identical. So the E's are now combined:

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |                                      |          ^
 |                                     Next        |
Next                                    |          |
 |                                      v          |
 |                                      Y (EOW)    |
 P --Child--> I --Child--> T --Child--> I --Child-->
                                        |
                                       Next
                                        |
                                        v
                                        Y (EOW)

Notice that the E and S nodes that come from the PITIES word are no longer 
needed. This technique pares the tree down pretty nicely.

Next come the nodes with a child depth of 2, which would be the I (the I's 
that have ES and Y as children) nodes. Applying the same comparison strategy, 
we can see that both the I nodes are identical, so they become combined. 
This procedure repeats for the T nodes and then the I nodes that are the 
parents of T. The last set of nodes, the C & P nodes, are not identical, 
so they can't be combined. The final tree looks like this:

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |            |                         |
 |            |                        Next
Next          |                         |
 |            |                         v
 |            |                         Y (EOW)
 P --Child---- 
 */
public class Dawg {
static Node head = new Node(); //dummy head node, only so we have the children TreeMap
static int count,words; //just some stats counters

static ArrayList[] depthList = new ArrayList[36];

public Dawg() {
super();
}

public static void main(String[] args) throws Exception {
head.letter = '0';
for (int i=0;i<36;i++) depthList[i] = new ArrayList();
FileReader fr = new FileReader("word.list";);
BufferedReader reader = new BufferedReader(fr);
while (true) {
words++;
String line = reader.readLine();
if (line == null) break;
line = line.trim();
if (line.length() > 36) continue; //only store words up to 36 letters long
addToDawg(line,0,head);
}
System.out.println("starting nodes="+count+", words="+words);
reader.close();
fr.close();
compress();
count = 0;
countNodes(head);
System.out.println("ending nodes="+count);
FileOutputStream fw = new FileOutputStream("words.dawg";);
BufferedOutputStream writer = new BufferedOutputStream(fw);
clearCounts(head);
outputTree(head,writer);
writer.close();
fw.close();
RandomAccessFile file = new RandomAccessFile("words.dawg","r";);
// long start = System.currentTimeMillis();
// dumpWordsDAWG(file);
// System.out.println((System.currentTimeMillis()-start));
// dumpWords(head,buf,0);
}

static void addToDawg(String word, int idx, Node parent) {
char c = word.charAt(idx);
Character ch = new Character(c);
Node node = (Node)parent.children.get(ch);
if (node == null) { //first time for this letter as child of parent letter
node = new Node();
node.parent = parent;
node.letter = c;
if (parent.children.size() > 0) {
Node tempNode = (Node)parent.children.get(parent.children.lastKey());
tempNode.next = node;
node.prev = tempNode;
}
parent.children.put(ch,node); //add letter to parent's list of letters
node.maxDepth = 0;
Node walk = node.parent;
int depth = 0;
while (walk != null) {
depth++;
if (walk.maxDepth < depth) {
walk.maxDepth = depth;
}
walk.numChildren++;
walk = walk.parent;
}
}
if (idx+1 < word.length()) { //not done with word
addToDawg(word,idx+1,node); //next letter
}
else {
node.endOfWord = true; //we have a complete word
}
}

static void compress() {
//first we need to build lists of nodes by their maxDepth
buildDepthArrays(head);

//next we start the node processing from depth 0 to depth 35 (depth is distance from leaf node)
for (int i=0;i<36;i++) {
processDepth(i);
}
}

static void processDepth(int depth) {
Collections.sort(depthList[depth]);
for (int i=0;i<depthList[depth].size();i++) {
Node n1 = (Node)depthList[depth].get(i);
for (int j=i+1;j<depthList[depth].size();j++) {
Node n2 = (Node)depthList[depth].get(j);
if (compareSubTrees(n1,n2)) {
if (compareNexts(n1,n2)) {
n2.parent.children.put(new Character(n1.letter),n1);
if (n2.prev != null) {
n2.prev.next = n1;
}
depthList[depth].remove(j);
j--; //must reprocess due to the remove
}
}
else { //since the depth arrays are sorted, when we find a difference we can skip all the nodes up to this point since they must have been processed already
i = j-1;
break;
}
}
}
}

static boolean compareNexts(Node p1, Node p2) {
while (true) {
if (p1 == p2) return true;
if ((p1==null) != (p2==null)) return false;
p1 = p1.next;
p2 = p2.next;
if (p1 != null && p2 != null) {
if (!compareSubTrees(p1,p2)) {
return false;
}
}
}
}

static boolean compareSubTrees(Node n1, Node n2) {
if (n1 == n2 || (n1.letter == n2.letter && n1.numChildren == n2.numChildren && n1.maxDepth == n2.maxDepth && n1.children.size() == n2.children.size() && n1.endOfWord == n2.endOfWord)) { //might be same subtrie
Iterator nodes1 = n1.children.values().iterator();
Iterator nodes2 = n2.children.values().iterator();
while (nodes1.hasNext()) {
Node node1 = (Node)nodes1.next();
Node node2 = (Node)nodes2.next();
if (!compareSubTrees(node1,node2)) return false;
}
return true;
}
return false;
}

static void buildDepthArrays(Node parent) {
if (parent != head) {
depthList[parent.maxDepth].add(parent);
}
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
buildDepthArrays(node);
}
}
}

static void dumpWords(Node parent, StringBuffer word, int len) {
word.setLength(len);
if (parent != head) {
word.append(parent.letter);
len++;
}
if (parent.endOfWord) {
System.out.println(word.toString());
}
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
dumpWords(node,word,len);
}
}
}

static void dumpWordsDAWG(RandomAccessFile file) throws Exception {
StringBuffer word = new StringBuffer();
int nextIdx = 0;
do {
file.seek(nextIdx*6);
byte[] buf = new byte[6];
file.readFully(buf);
int v1 = ((buf[1]&0xff)>>4)<<16;
int v2 = ((buf[4]&0xff)<<8);
int v3 = buf[5]&0xff;
nextIdx = v1+v2+v3;
dumpWordsDAWG(file,0,word,0);
} while(nextIdx != 0);
}

static void dumpWordsDAWG(RandomAccessFile file, int parentIdx, StringBuffer word, int len) throws Exception {
word.setLength(len);
file.seek(parentIdx*6);
byte[] buf = new byte[6];
file.readFully(buf);
char c = (char)((buf[0]&0x1f) + 'a');
word.append(c);
len++;
if ((buf[0]&0x80) != 0) {
// System.out.println(word.toString());
}
int v1 = (buf[1]&0x0f)<<16;
int v2 = ((buf[2]&0xff)<<8);
int v3 = buf[3]&0xff;
int childIdx = v1+v2+v3;
v1 = ((buf[1]&0xff)>>4)<<16;
v2 = ((buf[4]&0xff)<<8);
v3 = buf[5]&0xff;
int nextIdx = v1+v2+v3;
if (childIdx != 0) {
int walkIdx = childIdx;
while (walkIdx != 0) {
dumpWordsDAWG(file,walkIdx, word, len);
file.seek(walkIdx*6);
byte[] buf2 = new byte[6];
file.readFully(buf2);
v1 = ((buf2[1]&0xff)>>4)<<16;
v2 = ((buf2[4]&0xff)<<8);
v3 = buf2[5]&0xff;
walkIdx = v1+v2+v3;
}
}
}

static void clearCounts(Node parent) {
parent.counted = false;
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
clearCounts(node);
}
}
}

static void countNodes(Node parent) {
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
Node n = null;
if (parent.children.size() > 0)
n = (Node)(parent.children.get(parent.children.firstKey()));
while (it.hasNext()) {
Node node = (Node)it.next();
if (!node.counted) {
node.counted = true;
node.nodeNum = count++;
}
else {
// if (n != node) System.out.println("sibling already counted";);
}
}
it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
countNodes(node);
}
}
}

static void outputTree(Node parent, BufferedOutputStream writer) throws Exception {
byte[] bytes;
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
if (!node.counted) {
node.counted = true;
bytes = node.toBytes(!it.hasNext());
writer.write(bytes);
}
}
it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
outputTree(node,writer);
}
}
}
}

class Node implements Comparable {
char letter;
boolean endOfWord;
TreeMap children = new TreeMap();

Node parent;
Node prev,next;
int numChildren;
int maxDepth;
boolean counted;
int nodeNum;


public int compareTo(Object arg0) {
Node n = (Node)arg0;
if (letter < n.letter)
return -1;
if (letter > n.letter)
return 1;
if (numChildren < n.numChildren) 
return -1;
if (numChildren > n.numChildren)
return 1;
if (maxDepth < n.maxDepth)
return -1;
if (maxDepth > n.maxDepth)
return 1;
return 1;
}


Node() {
Dawg.count++;
}

public byte[] toBytes(boolean endOfList) {
byte[] bytes = new byte[6];
for (int i=0;i<6;i++) bytes[i] = 0;
if (endOfWord) {
bytes[0] |= 0x80;
}
if (endOfList) {
bytes[0] |= 0x40;
}
byte c = (byte)(letter-'a');
bytes[0] |= c;
int num = 0;
if (children.size() > 0) {
Node n = (Node)(children.get(children.firstKey()));
num = n.nodeNum;
}
bytes[1] = (byte)(num>>16);
bytes[2] = (byte)((num&0xffff)>>8);
bytes[3] = (byte)(num&0xff);
if (next != null) {
bytes[1] |= (byte)((next.nodeNum>>16)<<4);
bytes[4] = (byte)((next.nodeNum&0xffff)>>8);
bytes[5] = (byte)(next.nodeNum&0xff);
}

if ((next == null) != endOfList) {
System.out.println("Inconsistent endOfList";);
}
return bytes;
}

Permalink     No Comments

DAWGs and the super-secret iPhone project

I'm still working on a couple of super-secret iPhone projects, and I have to admit that picking up Objective-C and the iPhone SDK have been pretty taxing. I am making good progress, though, with lots of reading and re-reading of the docs. :)

One of my first objectives was to implement a DAWG (Directed Acyclic Word Graph) for storing a word list. I had done this a decade ago for my PalmOS game WordBox, but have long since lost that old source code so had to start from scratch.  I decided to once again write this in Java to build the DAWG file, and then the iPhone app would have to decode it.

I am really happy with the way it turned out. The source word list is the YAWL 2008 list, which has 265,098 words. The starting DAWG comprised 586,566 nodes and after subtree reduction contained 235,474 nodes. The java app took about 30 seconds to perform this task on my 2GHz first generation MacBook.

For your geeky enjoyment, source code is below (feel free to use this code however you like, no warranty expressed or implied - just leave the header in place so proper credit is given to the authors):

package twinfeats;

import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.TreeMap;

/*
Directed Acyclic Word Graphs

original algorithm concept and description at http://www.wutka.com/dawg.html
This implementation and revised text by Kent L. Smotherman http://www.twinfeats.com

A Directed Acyclic Word Graph, or DAWG, is a data structure that permits
extremely fast word searches. The entry point into the graph represents 
the starting letter in the search. Each node represents a letter, and you 
can travel from the node to two other nodes, depending on whether you the 
letter matches the one you are searching for.

It's a Directed graph because you can only move in a specific direction 
between two nodes. In other words, you can move from A to B, but you can't 
move from B to A. It's Acyclic because there are no cycles. You cannot have 
a path from A to B to C and then back to A. The link back to A would create 
a cycle, and probably an endless loop in your search program.

The description is a little confusing without an example, so imagine we have 
a DAWG containing the words CAT, CAN, DO, and DOG. The graph woud look like this:

     C --Child--> A --Child--> N (EOW)
     |                         |
     |                       Next
   Next                        |
     |                         v
     |                         T (EOW)
     v
     D--Child--> O (EOW) --Child --> G (EOW)

Now, imagine that we want to see if CAT is in the DAWG. We start at the entry 
point (the C) in this case. Since C is also the letter we are looking for, we 
go to the child node of C. Now we are looking for the next letter in CAT, which 
is A. Again, the node we are on (A) has the letter we are looking for, so we 
again pick the child node which is now N. Since we are looking for T and the 
current node is not T, we take the Next node instead of the child. The Next 
node of N is T. T has the letter we want. Now, since we have processed all 
the letters in the word we are searching for, we need to make sure that the 
current node has an End-of-word flag (EOW) which it does, so CAT is stored in 
the graph.

One of the tricks with making a DAWG is trimming it down so that words with 
common endings all end at the same node. For example, suppose we want to store 
DOG and LOG in a DAWG. The ideal would be something like this:

   D --Child--> O --Child--> G(EOW)
   |            ^
  Next          |
   |            |
   v            |
   L --Child----

In other words, the OG in DOG and LOG is defined by the same pair of nodes.
 
Creating a DAWG

The idea is to first create a tree, where a leaf would represent the end of a 
word and there can be multiple leaves that are identical. For example, DOG and 
LOG would be stored like this:

  D --Child--> O --Child--> G (EOW)
  |
 Next
  |
  v
  L --Child-> O --Child--> G (EOW)

Now, suppose you want to add DOGMA to the tree. You'd proceed as if you were 
doing a search. Once you get to G, you find it has no children, so you add a 
child M, and then add a child A to the M, making the graph look like:

  D --Child--> O --Child--> G (EOW) --Child--> M --Child--> A (EOW)
  |
 Next
  |
  v
  L --Child-> O --Child--> G (EOW)

As you can see, by adding nodes to the tree this way, you share common 
beginnings, but the endings are still separated. To shrink the size of the 
DAWG, you need to find common endings and combine them. To do this, you start 
at the leaf nodes (the nodes that have no children). If two leaf nodes are 
identical, you combine them, moving all the references from one node to the 
other. For two nodes to be identical, they not only must have the same letter, 
but if they have Next nodes, the Next nodes must also be identical (if they 
have child nodes, the child nodes must also be identical).

Take the following tree of CITIES, CITY, PITIES and PITY:

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |                                      |
 |                                     Next
Next                                    |
 |                                      v
 |                                      Y (EOW)
 P --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
                                        |
                                       Next
                                        |
                                        v
                                        Y (EOW)

Once I create the tree, I run through it and tag each node with a number of 
children and a child depth (the highest number of nodes you can go through 
from the current node to reach a leaf). Leaf nodes have 0 for the child
depth and 0 for the number of children. The main reason for the tagging is 
that when looking for identical nodes, you can quickly rule out nodes with 
a different number of children or a different child depth. When I tag the 
nodes, I also put them in an array of nodes with a similar child depth 
(again to speed searching).

Now that the nodes have been tagged and sorted by depth, I start with the 
children that have a depth of 0. If node X and node Y are identical, I make 
any node that points to Y as a child now point to X. Originally, I also 
allowed nodes that pointed to Y as a Next node point to X. While this works 
from a data structure standpoint, the implemented algorithm here does not
perform this step.

In the CITY-PITY graph, the algorithm would see that the S's at the end are 
the same. Although the Y nodes are identical, they are not referenced by any 
Child links, only Next links. As I said before, combining common next links 
make it difficult to store the graph the way I needed to. The graph would look 
like this after processing the nodes with a 0 child depth (leaf nodes):

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |                                      |                       ^
 |                                     Next                     |
Next                                    |                       |
 |                                      v                       |
 |                                      Y (EOW)                 |
 P --Child--> I --Child--> T --Child--> I --Child--> E --Child--
                                        |
                                       Next
                                        |
                                        v
                                        Y (EOW)

Next, the algorithm looks at nodes with a child depth of 1, which in this case 
would be the two E nodes (although T has 1-deep path to Y, it's child depth is 
3 because the longest path is the 3-deep path to S). In order to test that the 
two E's are identical, you first see that the letters are the same, which they 
are. Now you make sure that the children are identical. Since the only child of 
E is the same node, you KNOW they are identical. So the E's are now combined:

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |                                      |          ^
 |                                     Next        |
Next                                    |          |
 |                                      v          |
 |                                      Y (EOW)    |
 P --Child--> I --Child--> T --Child--> I --Child-->
                                        |
                                       Next
                                        |
                                        v
                                        Y (EOW)

Notice that the E and S nodes that come from the PITIES word are no longer 
needed. This technique pares the tree down pretty nicely.

Next come the nodes with a child depth of 2, which would be the I (the I's 
that have ES and Y as children) nodes. Applying the same comparison strategy, 
we can see that both the I nodes are identical, so they become combined. 
This procedure repeats for the T nodes and then the I nodes that are the 
parents of T. The last set of nodes, the C & P nodes, are not identical, 
so they can't be combined. The final tree looks like this:

 C --Child--> I --Child--> T --Child--> I --Child--> E --Child--> S (EOW)
 |            |                         |
 |            |                        Next
Next          |                         |
 |            |                         v
 |            |                         Y (EOW)
 P --Child---- 
 */
public class Dawg {
static Node head = new Node(); //dummy head node, only so we have the children TreeMap
static int count,words; //just some stats counters

static ArrayList[] depthList = new ArrayList[36];

public Dawg() {
super();
}

public static void main(String[] args) throws Exception {
head.letter = '0';
for (int i=0;i<36;i++) depthList[i] = new ArrayList();
FileReader fr = new FileReader(args[0]);
BufferedReader reader = new BufferedReader(fr);
while (true) {
words++;
String line = reader.readLine();
if (line == null) break;
line = line.trim();
if (line.length() > 36) continue; //only store words up to 36 letters long
addToDawg(line,0,head);
}
System.out.println("starting nodes="+count+", words="+words);
reader.close();
fr.close();
StringBuffer buf = new StringBuffer();
compress();
count = 0;
countNodes(head);
System.out.println("ending nodes="+count);
// dumpWords(head,buf,0);
}

static void addToDawg(String word, int idx, Node parent) {
char c = word.charAt(idx);
Character ch = new Character(c);
Node node = (Node)parent.children.get(ch);
if (node == null) { //first time for this letter as child of parent letter
node = new Node();
node.parent = parent;
node.letter = c;
parent.children.put(ch,node); //add letter to parent's list of letters
node.maxDepth = 0;
Node walk = node.parent;
int depth = 0;
while (walk != null) {
depth++;
if (walk.maxDepth < depth) {
walk.maxDepth = depth;
}
walk.numChildren++;
walk = walk.parent;
}
}
if (idx+1 < word.length()) { //not done with word
addToDawg(word,idx+1,node); //next letter
}
else {
node.endOfWord = true; //we have a complete word
}
}

static void compress() {
//first we need to build lists of nodes by their maxDepth
buildDepthArrays(head);

//next we start the node processing from depth 0 to depth 35 (depth is distance from leaf node)
for (int i=0;i<36;i++) {
processDepth(i);
}
}

static void processDepth(int depth) {
Collections.sort(depthList[depth]);
for (int i=0;i<depthList[depth].size();i++) {
Node n1 = (Node)depthList[depth].get(i);
for (int j=i+1;j<depthList[depth].size();j++) {
Node n2 = (Node)depthList[depth].get(j);
if (compareSubTrees(n1,n2)) {
n2.parent.children.put(new Character(n1.letter),n1);
depthList[depth].remove(j);
j--; //must reprocess due to the remove
}
else { //since the depth arrays are sorted, when we find a difference we can skip all the nodes up to this point since they must have been processed already
i = j-1;
break;
}
}
}
}

static boolean compareSubTrees(Node n1, Node n2) {
if ( (n1.letter == n2.letter && n1.numChildren == n2.numChildren && n1.maxDepth == n2.maxDepth && n1.children.size() == n2.children.size() && n1.endOfWord == n2.endOfWord)) { //might be same subtrie
Iterator nodes1 = n1.children.values().iterator();
Iterator nodes2 = n2.children.values().iterator();
while (nodes1.hasNext()) {
Node node1 = (Node)nodes1.next();
Node node2 = (Node)nodes2.next();
if (!compareSubTrees(node1,node2)) return false;
}
return true;
}
return false;
}

static void buildDepthArrays(Node parent) {
if (parent != head) {
depthList[parent.maxDepth].add(parent);
}
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
buildDepthArrays(node);
}
}
}

static void dumpWords(Node parent, StringBuffer word, int len) {
word.setLength(len);
if (parent != head) {
word.append(parent.letter);
len++;
}
if (parent.endOfWord) {
System.out.println(word.toString());
}
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
dumpWords(node,word,len);
}
}
}

static void clearCounts(Node parent) {
parent.counted = false;
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
clearCounts(node);
}
}
}

static void countNodes(Node parent) {
if (parent.children != null) {
Iterator it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
if (!node.counted) {
node.counted = true;
node.nodeNum = count++;
}
}
it = parent.children.values().iterator();
while (it.hasNext()) {
Node node = (Node)it.next();
countNodes(node);
}
}
}

}

class Node implements Comparable {
char letter;
boolean endOfWord;
TreeMap children = new TreeMap();

Node parent;
int numChildren;
int maxDepth;
boolean counted;
int nodeNum;


public int compareTo(Object arg0) {
Node n = (Node)arg0;
if (letter < n.letter)
return -1;
if (letter > n.letter)
return 1;
if (numChildren < n.numChildren) 
return -1;
if (numChildren > n.numChildren)
return 1;
if (maxDepth < n.maxDepth)
return -1;
if (maxDepth > n.maxDepth)
return 1;
return 1;
}


Node() {
Dawg.count++;
}

public byte[] toBytes(boolean endOfList) {
byte[] bytes = new byte[4];
for (int i=0;i<4;i++) bytes[i] = 0;
if (endOfWord) {
bytes[0] |= 0x80;
}
if (endOfList) {
bytes[0] |= 0x40;
}
byte c = (byte)(letter-'a');
bytes[0] |= c;
int num = 0;
if (children.size() > 0) {
Node n = (Node)(children.get(children.firstKey()));
num = n.nodeNum;
}
bytes[1] = (byte)(num>>16);
bytes[2] = (byte)((num&0xffff)>>8);
bytes[3] = (byte)(num&0xff);

return bytes;
}

Permalink     No Comments

Always use the second stall

After many years of public bathroom research, I am at last ready to publish my resulting theories. The first of these is called "Always use the second stall." Since most men like to pee standing up, but some don't like to do so in public, they will use a stall for that purpose. The most popular stall to use from my observations is the first one. A corollary theory stemming from this observation is that most men are also lazy - it's why they not only choose the first (and hence closest to the door) stall, but also why they tend to be too lazy to actually pay attention to where they are peeing.

Men who bypass the first stall seem to value their privacy all the more, and therefore head to the last stalls furthest from the door. This is again based on the evidence left behind. This group also disdains the second stall because if the first stall is being used by someone from the first group, then there is also the risk of splash damage.

It can be concluded, therefore, that the safest (i.e. most hygienic) stall to use is the second stall, despite the concerns exhibited by the second group. You just have to be sure to keep your feet raised above the bottom of the partition.

My second theory stemming from my extensive research on this topic, is that companies staffing college-educated professionals have a tendency to hire a comparatively large number of handicapped employees, particularly the blind. This theory was formulated by the direct observation of the typically well-kept corporate bathroom stall still being covered in carelessly directed urine. The only conclusion that I am able to reach from this observation is that the companies in question must have a number of blind, male employees. Otherwise, if the men in question weren't blind, then they could easily see the results of their misdeeds and take appropriate corrective action. Clearly, this is not the case.

This research has led me to wonder what the allure of peeing standing up is for men. I'm still stumped on this, however, and have as yet been unable to formulate a coherent hypothesis.

Permalink     No Comments

It's Talk Like a Pirate Day!

Avast, ye wanderers of the open 'net, it be Talk Like a Pirate Day, and ye be likin' it, too, or ye be getting the ropes end!

I'm reminded, I am, of a time a few sails back, when I found meself boot deep in all manner of piratical scallywags. Twas quite a sight to be sure, nigh to a sutler at a strange port 'twas...

Permalink     No Comments

The sky is falling!

The SEC moved yesterday to ban the short-selling of 799 financial stocks. I have mixed feelings about this, because I think that the entire market should be disbanded and made illegal. If you examine all the major scandals and failures of big companies like Enron and now AIG, Lehman, et al, their problems can all be traced back to corporate greed fueled by the stock market. It's because the top execs are all worth far more  with their stocks than with their exaggerated salaries, and the mantra is to keep that price going up. How do you do that when your expansion hits a wall? You lobby to relax the rules so you can keep that annual percentage increase rising.

The money quote for me in this article is this:

 But a recent wave of the maneuvers — profiting by selling unowned shares of companies in the anticipation their prices will drop — has been blamed in part for the demise of venerable investment firm Lehman Brothers and other big financial companies.

 How outrageous it is that some unscrupulous traders would want to sell stock when they think the market is about to take a down-turn! What do they think this is, a free market, Capitalism?! What the market wants - what it loves - is that continuous influx of 401k money, no questions asked, and it's there for the long term. But all this "profit-taking" just must be stopped, that's only reserved for Big Oil.

Permalink     No Comments

LMAO - "C# thousands of times faster than C!"

I came across this site when doing some research on DAWGs, and actually guffawed out loud when I read this bit:



C# code sometimes performs a thousand times faster than C and uses less memory.

 This is so ludicrous that it can only be explained by the assumption that the author has drunk the Microsoft C# Kool-Aid. Thousands of times faster, really - wow, I guess we don't need supercomputers anymore, we just need C# running on run-of-the-mill desktops, running Windows of course. LMAO

If you're a C# developer, I'm not trying to piss you off. C# is a fine language, it is after all a total rip-off of Java so it at least has that going for it. Of course Java is platform independent and C# ties you forever into a Windows world, so you make the call on which is more useful. :)

Permalink     3 Comments

This is a Capitalist country, right?

I found this great editorial by Steven Brant that echos so many sentiments I've expressed for years, but a whole lot articulater. Brant makes the point that not only are all the government (and since they use our money, it's us really) bailouts of big (and supposedly private) business means that Capitalism in this country is dead. This passage is priceless:

They should have let A.I.G. fail, because -- if that had brought about
the collapse of the global economic system -- that would have just sped
up our journey to a point of systemic collapse we are destined to reach
anyway. I say destined to reach not because it's God's will but because
no system can continue to function when its fundamental design is
flawed. You see, the current global economic system is based on a
fundamental assumption that -- while it was true when the system was
first set up -- is no longer true today.

 Exactly! Let them fail - if they can't run a business, then economic Darwinism says let them die and be replaced with a smarter business than can do it better. And by the way, same damn thing goes for the airlines. If they can't figure out how much to charge for a ticket, then let them all go under. It's not harsh, it's Capitalism. Or is it?

Brant continues with another great point:

 

The funny thing is, I've known that a significant portion of the US
economy is Socialistic for years. "What are you talking about?", you
ask? "The Military Industrial Complex," I answer.

You do know that all military weapons are purchased using "cost
plus" contracts, in which businesses are guaranteed a profit, don't
you? And that literally every weapons system comes in over its original
budget... and that those cost overruns are absorbed by the government,
not the arms manufacturer? There is no Capitalism in the Military
Industrial Complex. It's all Socialism, justified by the concept that
these weapons are so important to American security that the companies
that manufacture them have to be guaranteed a profit, so they don't
accidentally go out of business.

 This is the same argument for bailing out airlines - they're so vital to the US economy that they just have to be kept going. If that is our truth, then fine - but make them publicly owned and controlled (Socialist) businesses not allegedly private (Capitalist) ones where the CEOs make scores of millions of dollars at eventual taxpayer expense.

And then there is this bit of neo-Marxism in Brant's piece:

And that's why Capitalism has died. Because it is a system that is
compelled to try and make more and more money based on Darwinian
principles that are no longer true. They were true when Capitalism was
created, but they are obsolete now. This death was inevitable, because
the mismatch between the world Wall Street thinks exists and the world
that really exists is so fundamental... the methods needed to continue
making money in a world of the past had become so complicated... that
self destruction was only a matter of time.

 Marx said that Capitalism is just one stage of economic evolution, not the end-all-be-all of economic evolution that most Americans think. So as Brant says, don't be sad at Capitalism's passing. It's natural selection in action.

But then Brant totally drops the ball right before crossing the goal line:

And with the choice of Barack Obama -- a man who knows cross-cultural,
systemic, and bi-partisan issues and who does not see the world through
warrior eyes as John McCain does - we have the opportunity to take a huge step (politically) in the direction of making this change.

 Missed it by that much. Brant just doesn't seem to understand that Obama isn't the Maverick that McCain claims to be, either. If Obama gets elected, nothing significant will change, that's what political history illustrates. The trouble is timing - the economic collapse that I've been waiting, indeed hoping for, just isn't quite here yet. The real candidate for change can only emerge when we are in it's throes, not before. Again, this is a lesson from history. Hitler rose to power because the climate in Germany at that time allowed him to - indeed, needed him to. FD Roosevelt was a key president for the US at a key time in our history, a couple of key times actually, whether you agree with his policies and actions or not. These two simple examples also serve as a great warning for us - when the collapse comes, it may not be a savior that rises to lead this country to a new future, it could be a great villain.

As the wand maker Ollivander says in Harry Potter, "He (Voldimort) did great things - terrible, yes, but great."

Permalink     No Comments

Look - three questions...

Threequestions
1. Is that a banana in your pocket?

2. Are you happy to see me?

3. If you aren't going to eat that banana, can I have it? Seriously, I want it.

Permalink     No Comments

Growing up

We were shopping at Target earlier, and while I was waiting outside the fitting rooms for Rissy and Melanie, I noticed a girl of about 13 sitting alone in the waiting area by the fitting rooms, too. She was intently examining some sort of toy set, something with small figurines etc. It was clearly something intended for someone younger, and I found it interesting how much this new teen was admiring this toy, this symbol of her own youth. She looked up suddenly and hurriedly put the package back into her cart just as her mom and younger sister came out of a fitting room. She clearly didn't want to get caught longing for this mere child's toy.

As bemused as I was at this observation, I was also somewhat melancholy. If you can't still be a kid at 13, then I guess 47 is just right out... 

Permalink     No Comments

Trying my hand at the iPhone

I've started working on my first iPhone application - the lure of the "big money" to be made in the AppStore is just too much to resist. Or something like that.

After reading through the docs and looking at some sample apps, I dove in. Within about 20 minutes I was time-warped back to about 1997, back to when I started doing PalmOS development. The docs sucked, were often misleading or flat wrong, and stuff just didn't work like it should. Deja vu. As I have admitted for years, working in Java has spoiled me to systems in which I have to do my own memory management, like the iPhone. Now doing my own memory management  I can handle, but I've already wasted hours trying to figure out bugs in which adding a copy to the right spot magically fixed the problem.

For you coders out there, here's a little treat:

NSInteger sorter(id s1, id s2, void *context)
{
NSString* v1=(NSString*)s1; 
NSString* v2=(NSString*)s2; 
return [v1 compare:v2];
}
 
- (void)createPicker
{
animationNames = [[[[NSBundle mainBundle] pathsForResourcesOfType:NULL inDirectory:@"animations"] sortedArrayUsingFunction:sorter context:NULL] copy];
CGRect rect = CGRectMake(0.0, 0.0, 320.0, 240.0);
UIPickerView *myPickerView = [[UIPickerView alloc] initWithFrame:rect];
myPickerView.autoresizingMask = UIViewAutoresizingFlexibleWidth | UIViewAutoresizingFlexibleHeight;

myPickerView.delegate = self;
myPickerView.showsSelectionIndicator = YES; // note this is default to NO

// add this picker to our view controller, initially hidden
[window addSubview:myPickerView];
[myPickerView release];
}

Notice the copy at the end of the first line of createPicker. Without that, the app bombs. In the debugger, animationNames is fine, but by the time the picker delegate gets invoked it's not if I remove that copy. I still don't get that one, and the other two very much like it, although I'm sure it's something incredibly simple and obvious to someone in the iPhone know. 

 

Permalink     No Comments

No, monkey, no!

Fan

Don't play with that fan!

Permalink     No Comments

2008-2009 Everett Skittles

The school we teach chess at, Everett Elementary in Lincoln, is changing the CLC program to a quarterly basis from a semester basis, so chess doesn't start until November this year. Since I'm out of vacation, that helps a little, but I'd really rather be there teaching with Melanie from the first day on if I can find a way to make that happen. At least next year I get 3 weeks of vacation, that should help. :)

If I could, I retire and teach chess full time to kids!

Permalink     No Comments