Monday, July 7, 2008

More thoughts on the future of the web

This is a continuation of my earlier post titled "It's Time for a New Web". I wanted to ramble more on the subject.

So fundamentally what we're talking about is bridging the gap between the internet and your operating system: letting web applications install libraries that have direct access to the graphics, sound, I/O, and file system layers. The trick is exposing all of the important interfaces in a useful and secure way. The question is, how low-level do you go and how much freedom do you allow?

How low?

One idea would be to simply take standards such as OpenGL and OpenAL, or maybe SDL, and make that your base API. Of course you would need additional APIs for other things such as networking and local file storage. Most programmers would be more than happy with this level of access. In fact, most would be using libraries that sit on top of OpenGL and simplify and abstract it even further.

But why stop at the level of OpenGL? What if you just exposed the base hardware layer in such a way that OpenGL would just be an internet library on top, in the sense I talked about in the previous post?

How much freedom?

The question of how much freedom you give the programmer is important. For example, if you just gave direct access to your local filesystem, malicious sites/applications could wreak all kinds of havoc on your system. So in this case it makes the most sense to just have a space allocated to the application that is managed by the browser. This space would be insulated from all other applications. The downside to this is that you would lose the ability to interface directly with other applications' data. However, I feel that it's a better design to have the application provide interfaces on its own with some kind of communication protocol.
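To make the "space managed by the browser" idea a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration (the AppStorage class, the SandboxError exception, the example.com application ids); it only shows one way a browser could keep each application inside its own directory.

```python
import tempfile
from pathlib import Path


class SandboxError(Exception):
    """Raised when an application tries to reach outside its own space."""


class AppStorage:
    """Hypothetical browser-managed storage: one private directory per application."""

    def __init__(self, browser_root, app_id):
        self.root = (Path(browser_root) / app_id).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, relative_path):
        # Resolve ".." and symlinks first, then check the result is still inside the root.
        target = (self.root / relative_path).resolve()
        if target != self.root and self.root not in target.parents:
            raise SandboxError(f"{relative_path!r} is outside the application's space")
        return target

    def write(self, relative_path, data):
        target = self._resolve(relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read(self, relative_path):
        return self._resolve(relative_path).read_bytes()


# A temporary directory stands in for wherever the browser keeps application data.
browser_root = tempfile.mkdtemp()
notes = AppStorage(browser_root, "notes.example.com")
mail = AppStorage(browser_root, "mail.example.com")

notes.write("drafts/todo.txt", b"buy milk")
print(notes.read("drafts/todo.txt"))

try:
    # The mail application tries to read the notes application's file directly.
    mail.read("../notes.example.com/drafts/todo.txt")
except SandboxError as err:
    print("blocked:", err)
```

The mail application can't reach the notes file by path, so if it wants that data it has to ask the notes application through whatever communication protocol ends up being defined.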

Another problem with too much freedom is the loss of structure. I mentioned this earlier as well. If the programmer is just thinking about pixels, and the pixels are making up text, what tells other applications that the pixels equate to ASCII characters? In other words, how do we apply semantics to different concepts so we can do things like copy-paste?


One critical part of this vision of the web being a success is to have self-awareness. In other words, an API that lets you do things like query the applications and libraries that are installed.

The idea is that the browser doesn't have a defined UI (though there would be a basic default one). Your home page is your desktop environment. So users would choose different home pages such as Google or Yahoo, and those pages would provide the user with their application links, taskbar, tray, etc.

Of course certain interfaces would need to be defined, such as the concept of an application being open, its window properties, user messaging (toaster GUI), and other basics. However, these would not be concepts with attached graphical standards, though they would often be graphical. What I mean is an application wouldn't know how it was being accessed or represented; it would only know whether it was visible and its size (and maybe some other info as well, but not much). Of course, the application could attach to events and query the other windows if it wanted to interact in some way.

Part of being self-aware is having an event system that all applications could access. So events would be fired when applications are closed, shown, entered, exited, etc. This would let the desktop app do something like track time spent in each app.
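Here is a minimal sketch of that time-tracking idea, in Python. The EventBus and TimeTracker classes and the "entered"/"exited" event names are my own placeholders, not part of any real browser API, and the timestamps are passed in by hand so the example is deterministic.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical browser-wide event system that any application can subscribe to."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def fire(self, event_name, app_id, timestamp):
        for handler in self._handlers[event_name]:
            handler(app_id, timestamp)


class TimeTracker:
    """A desktop (home page) app that totals the seconds spent in each application."""

    def __init__(self, bus):
        self.totals = defaultdict(float)
        self._entered_at = {}
        bus.subscribe("entered", self._on_entered)
        bus.subscribe("exited", self._on_exited)

    def _on_entered(self, app_id, timestamp):
        self._entered_at[app_id] = timestamp

    def _on_exited(self, app_id, timestamp):
        # Ignore an "exited" event that has no matching "entered" event.
        started = self._entered_at.pop(app_id, None)
        if started is not None:
            self.totals[app_id] += timestamp - started


bus = EventBus()
tracker = TimeTracker(bus)

# In the imagined system the browser would fire these; timestamps are in seconds.
bus.fire("entered", "mail", 0.0)
bus.fire("exited", "mail", 90.0)
bus.fire("entered", "editor", 90.0)
bus.fire("exited", "editor", 400.0)
bus.fire("entered", "mail", 400.0)
bus.fire("exited", "mail", 430.0)

print(dict(tracker.totals))  # {'mail': 120.0, 'editor': 310.0}
```

The point is that the tracker knows nothing about how the mail or editor applications are drawn; it only listens to events.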

Breaking out of the box

While we're at it, why not consider implementing things such as P2P as a standard? Wouldn't it be nice if you just downloaded an application update from your coworker next door instead of the main site? What if P2P was a standard resource for programmers?

What about user interface considerations? Multiple mice, multiple keyboards, multiple monitors/screens. How would all of these interfaces be provided/queried? How do you abstract their input, or do you give the raw input and let libraries deal with it? What about new kinds of devices? For example, multi-touch pressure screens could be abstracted as multiple cursors, but they're really a whole array of pressure values. Do you let the browser layer abstract such a device as a cursor, or provide its input directly to applications, or both?
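As a toy illustration of the "both" option, here is a Python sketch that takes a raw frame of pressure values and derives cursors from it. The find_cursors function, the 0.5 threshold, and the tiny 4x5 frame are all assumptions I made up; real touch hardware and real peak detection would be more involved.

```python
def find_cursors(pressure, threshold=0.5):
    """Return (row, col, pressure) for every cell that is a strict local peak
    at or above the threshold. Flat plateaus produce no cursor in this sketch."""
    rows, cols = len(pressure), len(pressure[0])
    cursors = []
    for r in range(rows):
        for c in range(cols):
            value = pressure[r][c]
            if value < threshold:
                continue
            # Compare against the (up to 8) surrounding cells.
            neighbours = [
                pressure[nr][nc]
                for nr in range(max(r - 1, 0), min(r + 2, rows))
                for nc in range(max(c - 1, 0), min(c + 2, cols))
                if (nr, nc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                cursors.append((r, c, value))
    return cursors


# One raw frame from an imagined pressure-sensitive screen: two fingers down.
frame = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.0, 0.3, 0.1],
    [0.0, 0.0, 0.2, 0.7, 0.2],
]

print(find_cursors(frame))  # [(1, 1, 0.9), (3, 3, 0.7)]
```

An application that only cares about pointing could use the cursor list, while a drawing application could ask for the raw frame and interpret the pressure values itself.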

Ending thoughts

Hopefully some of these ideas made sense and you understand what I'm imagining. In a sense this is the holy grail, the thing that would remove operating system barriers and completely standardize computer software while letting programmers achieve anything they could imagine. I don't think it would be easy. You'd need to have a good core group designing the specs and then a marketing engine that could convince people to develop with the new standard and gradually bring it to the mainstream.

I think if such a system were to arrive it would at first be like today's browsers. It would have its own HTML rendering engine and act just like any other browser, with the exception that it could do so much more. As developers began releasing internet applications through it and converting existing ones, people would live in their browser more and more. There would be a point where it seemed like you had two desktops. (You already have two taskbars with tabs in your browser.) People would begin to fullscreen the browser most of the time; eventually operating system distributions would focus solely on getting the user loaded into the browser and not even bother with their own desktop environment.

Tuesday, July 1, 2008

Thoughts on Web Knowledge

Forums suck. Email archives suck. In fact, pretty much all the web content you get from a Google search for the solution to a problem sucks.

But let me start at the beginning...

I recently got a second monitor and got it running in Ubuntu Linux. I'm running an Nvidia card in a TwinView configuration. I had one major frustration: windows would maximize across both monitors, and dialogs would appear right in the middle of the virtual screen across both monitors instead of in the middle of one.

Today I decided to find a solution to these issues. So I went to Google and did a number of searches.

I dug through a number of forums and email archives looking for the solution. I finally found half the solution on a blog and the other half in one of the forums. (I ended up having to remove xserver-xgl and adjust compiz settings.)

Anyway, the point is that across the internet there's a huge duplication of knowledge and it's incredibly unorganized. Also, it is often misleading, outdated, and unhelpful.

The biggest culprit is forums. Forums are poor organizers of knowledge. Threads are often duplicates and experts are often dubious. And sometimes the gold nugget is a comment 3 pages into a thread surrounded by 2 idiotic posts.

So what's the solution?

I'm thinking something like Wikipedia, but directed at practical knowledge. And not a wiki per se, but another type of user-submitted content system with more structure and a better sense of context, as well as dimensions such as time.

It's Time for a New Web

The visual web is being pushed to its limits. What developers call Web 2.0 is really just a creative use of JavaScript and a maturing of web design practices, which are mostly a result of better server-side technologies and development techniques. Unfortunately, behind the scenes of every "Web 2.0" site are some pretty horrid technology layers that are being used in ways they were never designed for.

Many of the layers and interfaces in a site don't stack perfectly and have cross-cutting concerns that are hard to reconcile. For example, with CSS and HTML the idea is to separate your presentation layer from your structure. However, there is a significant amount of cross-over. The order of content, which is presentation, is largely defined in HTML (the order of columns in a table, the order of sections of your page, etc.). You can absolutely position divs in your page with CSS, but you are limited in how you do so.

All of this aside, there are major rendering issues with HTML. First is the lack of pixel-perfect rendering, along with browser inconsistencies, which add significant frustration and development time for designers. The next issue is that HTML was designed for static content. Animating and dynamically changing large portions of the page, as the new sites do, appears choppy at best and may have visual flaws.

The success story

HTML has had enormous success as the premier communication medium on the web. I just wanted to highlight its virtues and explain why I think it has been so successful.

  • Hyperlinks: The defining concept of the web is the hyperlink - the idea that one HTML page can point to another

  • Powerful/Extensible: As we've seen recently, with some clever use of CSS and JavaScript you can do some pretty fancy things. With plugins such as Flash and Java applets you can do even more

  • Human Readable, Low barrier to entry: You can pop open a text editor, type in some HTML and view it. You can copy this file to a server and serve it

  • Open Standard: The fact that HTML is a standard that anyone has the right to implement a browser for is essential to its acceptance

The Essence of HTML

I want to look at the fundamental technological essence of HTML. Basically, you have a Client and a Server. The Server has some content, the Client wants to see it. HTML is the medium or language you use to describe the content. There are many hidden aspects to it, such as how a hyperlink works, what meta-data a site has, asynchronous communication, etc. However, I want to focus on the media aspect of HTML. (Audio-Visual)

There are two essential aspects to HTML as a medium.

1. It's like a compression algorithm

If I want the client to see some text I could send it as an image. But it's far more effective to just send the ASCII characters and have the client's font engine render the text. Essentially, this is compression. I give a few parameters: the text, the font metrics (size, weight, family, etc.) and at the end I get my rendered text.

2. It has semantic meaning

Certain concepts within HTML gain semantic meaning within the client's computing environment. For example, screen readers can read the text. If I copy HTML into my word processor, the word processor translates it into its own markup. I can also copy things like tables into my spreadsheet program. The same thing goes for things like images. This interface to the rest of the user's computing environment is incredibly useful.
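To put some rough numbers on the first point (the compression one), here is a back-of-the-envelope calculation in Python. The numbers are assumptions I picked for illustration: a fixed 8x16 pixel cell per character, 3 bytes per pixel, and no image compression at all, so a real image format would narrow the gap.

```python
import json

text = "Hello, world"
font = {"family": "serif", "size": 16, "weight": "normal"}

# Option A: send the characters plus a few font parameters and let the
# client's font engine do the rendering.
bytes_as_text = len(text.encode("ascii")) + len(json.dumps(font).encode("ascii"))

# Option B: send the rendered pixels instead.
# Assumed: 8x16 pixel glyph cells, 3 bytes per pixel, uncompressed.
GLYPH_WIDTH, GLYPH_HEIGHT, BYTES_PER_PIXEL = 8, 16, 3
bytes_as_bitmap = len(text) * GLYPH_WIDTH * GLYPH_HEIGHT * BYTES_PER_PIXEL

print("text + font parameters:", bytes_as_text, "bytes")
print("uncompressed bitmap:   ", bytes_as_bitmap, "bytes")
print("ratio: about", round(bytes_as_bitmap / bytes_as_text), "to 1")
```

Even for twelve characters the bitmap is 4608 bytes, and the gap only grows with longer text, since the font parameters are sent once.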

A blank canvas

So let's start talking about a new web medium. If we could start again, if we had a blank slate, what would we want, and how would we design things differently?

From an idealistic point of view we want a blank slate, complete technological freedom. Just give me an accelerated 3D canvas with a primitive drawing API and I'll build everything on that foundation.

But why limit it to graphical freedom? Why not have complete freedom?

Imagine your browser had a library versioning system. When you visit a new site the page defines the libraries it needs and where they can be found. Any libraries a client does not have, it downloads. There would be a number of standard libraries most sites would use: font engines, image format rendering, etc. Imagine the possibilities: You could provide new image formats, new font engines and fonts, on the fly.
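Here is roughly how I picture that library check working, sketched in Python. The function names, the dictionary layout, and the example.com URLs are all invented for the sketch, and versions are matched exactly, which a real system would probably want to relax.

```python
def missing_libraries(required, installed):
    """List the required libraries that this client does not have yet."""
    return [
        lib for lib in required
        if lib["version"] not in installed.get(lib["name"], set())
    ]


def visit(page_libraries, installed, download):
    """Download whatever is missing, then record it as installed."""
    for lib in missing_libraries(page_libraries, installed):
        download(lib["url"])
        installed.setdefault(lib["name"], set()).add(lib["version"])


# What the page declares: the libraries it needs and where they can be found.
page_libraries = [
    {"name": "fontengine", "version": "2.0", "url": "https://example.com/libs/fontengine-2.0"},
    {"name": "png", "version": "1.1", "url": "https://example.com/libs/png-1.1"},
    {"name": "newimage", "version": "0.3", "url": "https://example.com/libs/newimage-0.3"},
]

# What this client already has, keyed by library name.
installed = {"fontengine": {"1.2", "2.0"}, "png": {"1.0"}}

# Appending to a list stands in for a real network download.
fetched = []
visit(page_libraries, installed, download=fetched.append)
print(fetched)

# A second visit finds everything in place and downloads nothing.
fetched_again = []
visit(page_libraries, installed, download=fetched_again.append)
print(fetched_again)
```

The first visit fetches only the newer png library and the brand new image format; the font engine is already there. Keeping several versions side by side is one possible answer to the versioning problem, though it does nothing about the anarchy and security questions.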

On the other hand, this would be complete technological anarchy. What's to limit the number of libraries? What are the security implications?

There's also the whole side issue of maintaining semantics within the web. However, I feel like this is actually fairly straightforward. It would be each library's responsibility to attach semantic data. The font engine would attach text semantics to its rendering. The image library would do the same.
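A small Python sketch of what "attaching semantics" could look like. The Region, Canvas, and draw_text names are hypothetical, and the font engine here doesn't rasterise anything; it only records which rectangle of pixels means which text, which is the part that matters for copy-paste.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangle of pixels plus the meaning a library attached to it."""
    x: int
    y: int
    width: int
    height: int
    kind: str      # for example "text" or "image"
    content: str   # the characters, or the image's source

    def intersects(self, x, y, width, height):
        return (self.x < x + width and x < self.x + self.width
                and self.y < y + height and y < self.y + self.height)


class Canvas:
    """Hypothetical drawing surface that remembers semantic annotations."""

    def __init__(self):
        self.regions = []

    def annotate(self, region):
        self.regions.append(region)

    def copy_text(self, x, y, width, height):
        """Return the text behind a selection rectangle, in drawing order."""
        return " ".join(
            r.content for r in self.regions
            if r.kind == "text" and r.intersects(x, y, width, height)
        )


def draw_text(canvas, text, x, y, char_width=8, line_height=16):
    # A real font engine would draw glyphs here; this sketch only records
    # the semantics, assuming fixed-width characters.
    canvas.annotate(Region(x, y, char_width * len(text), line_height, "text", text))


canvas = Canvas()
draw_text(canvas, "Hello", 0, 0)
draw_text(canvas, "world", 48, 0)
canvas.annotate(Region(0, 32, 64, 64, "image", "logo.png"))

print(canvas.copy_text(0, 0, 100, 16))   # Hello world
print(repr(canvas.copy_text(0, 32, 10, 10)))  # ''
```

Other applications (a screen reader, a word processor) would query the annotations rather than the pixels.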

The New Web

If this could be achieved you would have completely redefined the internet and desktop computing experience. Instead of firing up an application, you would go to a certain URL. If done right, such a standard would erase operating system barriers and become the de facto standard for software distribution.

You might say this sounds a lot like Java Web Start. I would say the goals are similar, but the key is that such a system would be language-agnostic. Not only language-agnostic, but compatible with future languages.

Let me go over a detailed example. Imagine I wrote a new scripting language called Ascript. I write its interpreter in C, compile it for a few different processors, and package it according to some web library standards. I then write an application in Ascript. My application has a standard web header, which specifies the Ascript dependency. I put this application on my webserver. When a client visits the URL, they download Ascript if they don't have it, then the browser launches my Ascript script with the downloaded interpreter.
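Here is a sketch of the browser's side of that example, in Python. The header format ("Key: value" lines followed by a blank line), the package layout, the processor names, and the example.com URLs are all things I invented for the sketch; no such standard exists.

```python
def parse_header(source):
    """Split an application file into its header fields and its script body.
    Assumed format: 'Key: value' lines, then a blank line, then the script."""
    head, _, body = source.partition("\n\n")
    fields = {}
    for line in head.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip().lower()] = value.strip()
    return fields, body


def choose_build(package, processor):
    """Pick the interpreter build that matches the client's processor."""
    try:
        return package["builds"][processor]
    except KeyError:
        raise LookupError(f"no {package['name']} build for {processor}") from None


# The Ascript interpreter, compiled for a few processors and packaged.
ascript_package = {
    "name": "ascript",
    "builds": {
        "x86": "https://example.com/libs/ascript-x86.bin",
        "ppc": "https://example.com/libs/ascript-ppc.bin",
    },
}

# The application as it sits on my webserver: header first, then the script.
application = "Interpreter: ascript\nTitle: Hello\n\nprint 'hello from Ascript'\n"

fields, script = parse_header(application)
print(fields["interpreter"])                 # ascript
print(choose_build(ascript_package, "x86"))  # https://example.com/libs/ascript-x86.bin
print(script, end="")
```

From there the browser would download the chosen build if it isn't installed and hand it the script body.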

There are a lot of issues to resolve at this point. What would the API be like? Would you have a direct interface to OpenGL? How would the user control application access? Maybe have dialogs such as "X wants to access your file system. Approve/Deny"? How would the distribution work? Could you assist distribution with automated BitTorrent downloads? How would applications be organized? Would they have desktop/start menu icons? What other APIs are available? Where do you draw the line? Could you execute assembly?

I'll probably elaborate more on these ideas in the future.