How to get URL parsing like Rails or Nitro

I was so pleased that with the transfer to WordPress, everything read much better than on the Manila server. When I next looked at the blog, the same text was completely compressed, without a single line break! Go figure! (The WordPress automatic paragraph formatting had been disabled; the admin put it back for me. Phew!)

Here is today’s question: how do Rails or Nitro process URLs so that a request can be handled without an actual file of that name on the server?

Typing a URL in the address bar at the top of the browser window initiates a communication with the computer hosting the website: the browser opens a connection with that computer and then issues a GET command to receive a file.

telnet 80
GET / HTTP/1.0

The receiving computer recognises the “GET” command, sends info about itself, figures out what file is wanted (in the case of “/”, it looks for a root index page), and if it finds it, sends it to the requesting computer.
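To make that exchange concrete, here is a minimal sketch in Ruby of the two halves of the conversation: reading the request line and writing back a response. The helper names parse_request_line and build_response are made up for this illustration, and mapping "/" to /index.html is only an assumption about a typical server setup.

```ruby
# Sketch only: parse_request_line and build_response are hypothetical helpers,
# not part of any server library.

# Split a request line such as "GET / HTTP/1.0" into its three parts.
def parse_request_line(line)
  method, path, version = line.strip.split(' ', 3)
  { method: method, path: path, version: version }
end

# Build the text a server sends back: status line, headers, blank line, body.
def build_response(body, status = '200 OK')
  [
    "HTTP/1.0 #{status}",
    'Content-Type: text/html',
    "Content-Length: #{body.bytesize}",
    '',
    body
  ].join("\r\n")
end

request = parse_request_line("GET / HTTP/1.0\r\n")
# Assumption: "/" means "the index page of the root directory".
wanted = request[:path] == '/' ? '/index.html' : request[:path]
puts "#{request[:method]} wants #{wanted}"
puts build_response('<h1>Hello</h1>')
```

Nothing here touches the disk: the server is free to decide what a path means, which is the opening that frameworks use.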

Frameworks like Rails or Nitro intercept requests within a certain directory for pre-processing. The framework code has to be the one responding to the GET command issued over the HTTP connection. How is it done?

Web applications are machine independent, so a framework probably works with the web server rather than at the machine level. The rules for handling URLs must be loaded in the server before any request comes in.

In Apache, I imagine some internal function (something like an AP_GET_ handler; the name is my guess) is called by the server when it detects the GET command. The server probably stores the parsed information and initiates a default action unless… there is an alternative default setup, in which case the request information could be passed on in environment variables, or simply as the original URL string.
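The environment-variable route is how CGI works: the server starts a script and describes the request in variables such as REQUEST_METHOD, SCRIPT_NAME, PATH_INFO and QUERY_STRING. Here is a small sketch of a script reading them. The helper name describe_request and the sample values are assumptions made for the example; a real CGI script would pass in Ruby's ENV instead of the simulated hash.

```ruby
# Sketch only: describe_request is a hypothetical helper. The variable names
# (REQUEST_METHOD, SCRIPT_NAME, PATH_INFO, QUERY_STRING) are standard CGI ones.
def describe_request(env)
  {
    method: env.fetch('REQUEST_METHOD', 'GET'),
    script: env.fetch('SCRIPT_NAME', ''),
    # The part of the URL after the script name; no file has to exist for it.
    segments: env.fetch('PATH_INFO', '').split('/').reject(&:empty?),
    query: env.fetch('QUERY_STRING', '')
  }
end

# Simulated environment for a request to /dispatch.cgi/articles/show?id=7
fake_env = {
  'REQUEST_METHOD' => 'GET',
  'SCRIPT_NAME'    => '/dispatch.cgi',
  'PATH_INFO'      => '/articles/show',
  'QUERY_STRING'   => 'id=7'
}
p describe_request(fake_env)
```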

One avenue of inquiry is to look into Apache’s documentation.
Since Rails works with CGI or FastCGI (fcgi), CGI would be another avenue of inquiry.
I prefer to look at how it is done, and see what I can learn from it.
In Rails, when you deploy your application you set up Apache through its httpd.conf file. That is hard-wired and requires restarting Apache. You set up a virtual host (see the Agile Web Development with Rails book on virtual hosts, chapter 22, page 455) and you tell it that a file in the directory corresponding to that virtual host is allowed to override the main configuration. That file is hidden; it is called .htaccess
Any request for a file which does not exist is rewritten by this .htaccess file to dispatch.cgi by the rule
RewriteRule ^(.*)$ dispatch.cgi [QSA,L]
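To check my understanding of that rule, here is a rough imitation of it in Ruby. It is only a sketch: rewrite is a made-up helper, the file list is invented, and I am assuming the rule is guarded by a condition so that it only fires when no real file matches the request.

```ruby
require 'set'

# Sketch only: rewrite is a hypothetical helper that imitates the rule.
# The pattern ^(.*)$ matches any path, so the only real decision is
# whether a file of that name exists.
def rewrite(path, query, existing_files)
  if existing_files.include?(path)
    # A real file (image, stylesheet, static page) is served untouched.
    { target: path, query: query }
  else
    # Everything else goes to the dispatcher. QSA carries the original
    # query string along; L stops any further rewrite rules.
    { target: 'dispatch.cgi', query: query, original: path }
  end
end

files = Set['index.html', 'stylesheets/site.css']
p rewrite('stylesheets/site.css', '', files)
p rewrite('articles/show/1', 'page=2', files)
```

The second call is the interesting one: there is no file called articles/show/1, yet the request still reaches code that can answer it.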

This use of .htaccess reminds me of what I did to get ERB to work:
“Your server is configured so that CGI scripts placed in /Library/WebServer/CGI-Executables/ map to …
Create the following .htaccess file somewhere below your server’s document root at /Library/WebServer/Documents/ to enable the .rhtml handler for a specific directory…”

I also found an interesting website which talks about how search engines deal with dynamic websites.
It explains how to use mod_rewrite to make URLs more search-engine friendly:
RewriteRule ^productid([^.]+).*$ yourscript.php?id=$1 [T=application/x-httpd-php]
This rule is written in the .htaccess file.
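The pattern in that rule is an ordinary regular expression, so it can be tried out in Ruby. This is just a sketch: friendly_to_script is a hypothetical helper name and the sample paths are invented.

```ruby
# Same pattern as the RewriteRule, written as a Ruby regular expression.
PRODUCT_RULE = /^productid([^.]+).*$/

# friendly_to_script is a hypothetical helper name.
def friendly_to_script(path)
  match = PRODUCT_RULE.match(path)
  return nil unless match

  # $1 in the rule is the first capture: everything after "productid"
  # up to the first dot.
  "yourscript.php?id=#{match[1]}"
end

p friendly_to_script('productid42.html')  # => "yourscript.php?id=42"
p friendly_to_script('about.html')        # => nil
```

The visitor and the search engine see productid42.html, while the script receives id=42.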

So over and over I find the same technical solution in Apache: a virtual host with a hidden .htaccess file. PHP, CGI, or a framework then comes down the line to route the request and create the dynamic web links to present.

What about other webservers? What can I learn from them?

I was able to redirect URLs with WEBrick using Catapult. There the code is short and easy to identify.

class RequestHandler < HTTPServlet::AbstractServlet
  def process_request( request, response )
    …. request.path_info.split

request.path_info contains the path part of the URL. The process_request method can create an instance of the requested class and run the associated code to create and return a response.body and a response content type.
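Here is how I picture that dispatch step, as a sketch in plain Ruby without WEBrick. This is not Catapult's code: Articles, HANDLERS and dispatch are names I made up, and a plain string stands in for request.path_info.

```ruby
# Sketch only, without WEBrick: Articles, HANDLERS and dispatch are
# hypothetical names, and a plain string stands in for request.path_info.
class Articles
  def list
    '<ul><li>First post</li></ul>'
  end

  def show(id)
    "<h1>Article #{id}</h1>"
  end
end

# Explicit list of handlers, so a URL cannot instantiate arbitrary classes.
HANDLERS = { 'articles' => Articles }.freeze

# Returns [content_type, body] for a path such as "/articles/show/7".
def dispatch(path_info)
  name, action, *args = path_info.split('/').reject(&:empty?)
  klass = HANDLERS[name]
  return ['text/plain', 'Not found'] unless klass

  handler = klass.new
  action ||= 'list'
  # Only call methods defined on the handler class itself, and only
  # when the URL supplies the number of arguments the method expects.
  unless klass.public_instance_methods(false).include?(action.to_sym) &&
         handler.method(action).arity == args.size
    return ['text/plain', 'Not found']
  end

  ['text/html', handler.public_send(action, *args)]
end

p dispatch('/articles/show/7')
p dispatch('/nothing/here')
```

In a servlet, the two returned values would be copied into the response content type and response.body.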

I was able to redirect URL requests with Nitro very easily. Unlike Rails, I did not have to generate myriads of files and folders. All I had to do was to define my class methods in a file main.rb in a folder called controller.
The server code maps the root URL to the controller:
setting :map, :default => { '/' => Controller }, :doc => 'The server map'
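A map like this can be read as "which class answers under which URL prefix". The sketch below is my own guess at the idea, not Nitro's code: AdminController, SERVER_MAP, resolve and serve are hypothetical names, and the second mount point is added only to show why the longest prefix should win.

```ruby
# Sketch only: this is not Nitro's code. Controller, AdminController,
# SERVER_MAP, resolve and serve are hypothetical names.
class Controller
  def index
    'home page'
  end

  def about
    'about page'
  end
end

class AdminController
  def index
    'admin home'
  end
end

SERVER_MAP = { '/' => Controller, '/admin' => AdminController }.freeze

# Find the class mounted at the longest matching prefix, plus the action name.
def resolve(path)
  mount = SERVER_MAP.keys.sort_by { |m| -m.length }.find do |m|
    m == '/' || path == m || path.start_with?("#{m}/")
  end
  rest = path.delete_prefix(mount).split('/').reject(&:empty?)
  [SERVER_MAP[mount], rest.first || 'index']
end

def serve(path)
  klass, action = resolve(path)
  # Only methods defined on the controller itself are reachable by URL.
  return 'not found' unless klass.public_instance_methods(false).include?(action.to_sym)

  klass.new.public_send(action)
end

puts serve('/about')
puts serve('/admin')
```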

When you run Nitro (server.rb), server/runner.rb determines which web server to use; in the case of WEBrick, it is adapter/webrick:

@webrick =
class WebrickAdapter < WEBrick::HTTPServlet::AbstractServlet
  def handle(req, res)

And in Rails there is a WEBrick server:
class DispatchServlet < WEBrick::HTTPServlet::AbstractServlet
  def service(req, res) #:nodoc:

With WEBrick, I see basic code for which all default behaviours must be defined, but with the advantage of being able to do a basic redirect very easily as part of one’s own code, as shown by the streamlined Catapult.

What did I learn?
Learning how to do a redirect is not a basic task. At the root of internet behavior is a set of rules about the kind of data that can be placed on the internet: HTTP data, XML data, etc. On top of that, any number of programs can be written which access the internet in read or write mode. And on top of that, code or frameworks are written to automate specific kinds of processes.

A more complex server requires code in its configuration file or in a task-specific instruction file. A lighter server like WEBrick expects code in a specified section of the program. Frameworks require a file of the proper name in a specified directory… everywhere, conventions, conventions, conventions. Somehow, people out there assimilate these conventions, manipulate them, and create new behaviors.

What I also learned from this process is the importance of finding a small program like Catapult which has the desired behavior, but is so small that I can see how it is done. For learning at least, small programs are more helpful. Once I know what to look for, it is easier to look for similar behavior in other, more complex code.
