Thursday, March 15, 2012

Rick Astley, Finite State Automata and Database

I was watching a video on Udacity tonight as part of a course on elementary Python. It's a good course. Tonight's video was about the technology behind the Internet and the speaker was talking about information. I balked slightly when he said that the Bit is the smallest unit of information. I'm sure this is a common idea not just in computing. I balked because a Bit contains potentially two pieces of information - 0 or 1. The speaker was perhaps referring to an instance of a Bit - either 0 OR 1 - and therefore meaning that there is one piece of information encoded in a Bit when it's instantiated in either form. However, because this would mean that any number system can be said to contain the smallest piece of information, e.g. the number 1729698698729837198273826372 is a single piece of information, then I think that we need to be clearer. Since a Bit is part of a Binary system any instantiation is automatically associated with it's opposite, i.e. every 0 is not 1 and every 1 is not 0. Certainly in computing terms it's as important that a value is not 0 if it is 1 and vice versa.

So what is the smallest carrier of information? I think it's the unary system. This is a very familiar number system that uses a single symbol for each single unit (in this case the unit of one). The denary number 4 is 1111 in unary for example. Very interestingly there is no symbol for zero since the addition of a new symbol would cease to make the system unary. Many early number systems (e.g. Roman) were of this type and also did not have a symbol for zero. I don't think this meant that there was no concept of zero, but just that since the number system everyone used didn't have an explicit symbol it wasn't easy to invent one. Debates were certainly had in ancient Greece and the Middle East (e.g. see Zeno of Elea's questions) about the nature of zero and whether it merited a symbol of its own. It has one now and mathematics would be in a very different place without it.

What we're saying by having a zero is that there is an absence of something. This in itself is information. Only by having a system with one possible state can there be the most basic of information-storage mechanisms. We can't have any workable system with no states.

In effect the addition of the zero to mathematics and also the null state in binary (and therefore computing) is a fudge to enable other mechanisms to take place. Nature does not use nulls. At least not that we know of anyway! This is another interesting line of thinking - is energy itself in opposition to anything, e.g. empty spacetime? In concept elementary particles/energy seem to be in opposition to other particles/energy and exist in a space that bears no relation to them except as containers. Multi-dimensional space and M-theory may say different!. If the universe is just one thing that happens to be "folded" in an interesting and eventful way does that make everything One in essence? That's all very Zen and all very unknown at the moment.

So what happens when we take away the zero from computing? We have a unary system that can only represent information by the presence of data - in this case our data is only 1s. If we think of an array of binary, e.g. 01010101 (represented by the denary value 85) our equivalent unary value would be 111111111111111111111111111111111111111111111111111111111111111111111111111111111111 (interestingly not twice the size!).

Binary computers use various logic transformations: AND, OR gates, Flip Flops etc to build up a logic system that ultimately can deal with complex data and representations. Our unary computer doesn't have the logical flexibility to do the same thing. It can do serial processing, however. Finite State Automata are popular processing mechanisms in Artificial Intelligence and Linguistics and deal with operations on a finite state system, potentially a single state. I don't know enough about the differences between Boolean logic and FSA to go further, but this is intriguing to me since it links AI and linguistics in a single system type that also removes the arbitrary nothingness of the added zero.

When considering Artificial Intelligence I often consider the human brain as a model to measure constructed systems against. One major problem, however, is that we don't know how the brain works! I think we have a fairly good idea how it doesn't work; it doesn't use logic gates and binary state systems. It's much more likely to use a number of different mechanisms including parallel processing, weighted integration and controller hierarchies (my terms) among others. For example the amount of inputs and signals flowing through the old grey matter is astonishing. There must be huge amounts of integration and weighting going on that takes many inputs in order to produce very few, but strongly-weighted outputs. Our brains aren't simply a mass of competing impulses, however. There are great controlling structures that, seemingly by virtue of their physical orientation, garner even greater control over outputs. These systems have their positions of influence and the weights are processed the way they are due to innate structures developed through evolution and the continual population of those structures with information during the brain's lifetime.

The brain doesn't store data in zeroes and ones however. Does it store data in a unary format? It's a very difficult question to answer and we would need to define what data is and what it means to store anything in a system like the brain. Suffice it to say that the brain is a more analogue system and too much biology/physics/chemistry comes into play when talking about how data is stored and retrieved for me to say either way. The importance of an absence of data at all levels is crucial, however.

I'd like to think that unary computing is more important than it would seem on the face of it. An alternative approach is to combine a unary system with a map system. Imagine a grid filled with 1s. We can only have a single-dimensional grid, however. It would be exactly as long as the amount of data to be stored. We can use a mapping of the places in the grid to represent something other than 1s. Since the grid will expand and contract with the data being stored the map will have to be commensurately dynamic. How would this work? Groups of 1s cannot be distinguished from other groups of ones. The only way to differentiate various states would be the total length. If the total length of the grid was used to determine the map to use to lookup meaning then we could define a system that could provide a way to store meaningful data in this way. Is it useful? Perhaps not. Interesting certainly and one to consider further, especially combined with FSA and/or possibly Turing Machines.

I understand that computational complete programming environments are readily available through many desktop systems so that the usefulness of a unary data system might be moot. What the investigation of different mechanisms in storage and processing might lead to is new insights in AI, however. That's the goal. Not a unary computer interesting though it might be!

All this talk of data reminds me of a recurring theme in my mind of the last few years that I don't think I've blogged about (apologies if I have). When I imagine myself in interviews (please be kind if prospective interviewers ever read this!) I spout wonderful phrases such as "well I consider the entire universe is a database therefore why wouldn't you hire me to transform your varchar data into nvarchar?".

My masters degree in Database Systems taught me many things and I'm sure my colleagues learnt the same things, but I doubt that any of them began to understand the importance of what a database "really" is. My favourite definition of a database is "an organised collection of data". Now I'm sure that many objects would take on humongous meaning when charged with ambiguous definitions like that, but in this case I think it is perfect. In my particularly loose interpretation "data" just means anything that can be represented and "organised" is just the opposite of chaotic. I take it to mean that there is some framework or organisational rule system that governs how the data behaves, but apart from that it's an extremely wide-ranging definition. So much so that I consider the entire universe to be a database. All matter and energy are entities that can entail representation and the whole system has some framework or rule system that governs it, e.g. Gravity, Weak Force etc. What does this mean (apart from being in interesting interview talking point)?

Not much. Except that it gives an upper limit on the size of a database. It also makes us think of the universe and everything in it slightly differently: it adds fuel to my favourite theory that we're all functioning cogs in a machine constructed as some sort of school project by pan-dimensional beings (actually this just sounds good - we are pan-dimensional beings! Perhaps this should be pan-extra-dimensional beings) - see the Simulation Theory that I like here. On closer inspection it seems that it's a somewhat circular argument - we conceive of a governing rule system outside of our perceptible universe that over arches the entire perceptible universe that's remarkably similar to the governing rule systems that we conceive of from our perceptible universe.

I hope to come back to this theme, perhaps when I haven't already rabbited-on for quite so long already!

Which of these jokes is funnier:

1. Rick Astley asked me to lend him a Pixar DVD today. I said "I'll give you Toy Story or Wall-E, but I'm never gonna give you Up".

2. I asked Rick Astley to lend me a Pixar DVD today. He said "I'll give you Toy Story or Wall-E, but I'm never gonna give you Up".

I think that number one is funnier. I have been thinking about why for a day now. I think it's something to do with the expectation being debunked. This is a common comedic theme: to set up an expectation and then to debunk it often causes comedy. By having Rick Astley say the words seems less funny than having me say it because it's more unexpected that I'll say it.

Personally I think this is an even funnier variant, but perhaps shows too much of my warped mind ...

3. Rick Astley asked me to lend him a DVD. I said "I've got this thing by Pixar called Up. I'll give you that?"