Friday, June 4, 2010

cookie

A cookie, also known as a web cookie, browser cookie, and HTTP cookie, is a text string stored by a user's web browser. A cookie consists of one or more name-value pairs containing bits of information, which may be encrypted for information privacy and data security purposes.

The cookie is sent as an HTTP header by a web server to a web browser and then sent back unchanged by the browser each time it accesses that server. A cookie can be used for authentication, session tracking (state maintenance), storing site preferences, shopping cart contents, the identifier for a server-based session, or anything else that can be accomplished through storing textual data.

As text, cookies are not executable. Because they are not executed, they cannot replicate themselves and are not viruses. However, due to the browser mechanism to set and read cookies, they can be used as spyware. Anti-spyware products may warn users about some cookies because cookies can be used to track people—a privacy concern.

Setting a cookie

Transfer of Web pages follows the HyperText Transfer Protocol (HTTP). Regardless of cookies, browsers request a page from web servers by sending them a usually short text called HTTP request. For example, to access the page http://www.example.org/index.html, browsers connect to the server www.example.org sending it a request that looks like the following one:


GET /index.html HTTP/1.1
Host: www.example.org

browser
server  
The server replies by sending the requested page preceded by a similar packet of text, called 'HTTP response'. This packet may contain lines requesting the browser to store cookies:


HTTP/1.1 200 OK
Content-type: text/html
Set-Cookie: name=value

(content of page)

browser
server
The server sends the line Set-Cookie only if the server wishes the browser to store a cookie. Set-Cookie is a request for the browser to store the string name=value and send it back in all future requests to the server. If the browser supports cookies and cookies are enabled, every subsequent page request to the same server will include the cookie. For example, the browser requests the page http://www.example.org/spec.html by sending the server www.example.org a request like the following:

GET /spec.html HTTP/1.1
Host: www.example.org
Cookie: name=value
Accept: */*


browser
server
This is a request for another page from the same server, and differs from the first one above because it contains the string that the server has previously sent to the browser. This way, the server knows that this request is related to the previous one. The server answers by sending the requested page, possibly adding other cookies as well.
The value of a cookie can be modified by the server by sending a new Set-Cookie: name=newvalue line in response of a page request. The browser then replaces the old value with the new one.
The term "cookie crumb" is sometimes used to refer to the name-value pair. This is not the same as breadcrumb web navigation, which is the technique of showing in each page the list of pages the user has previously visited; this technique, however, may be implemented using cookies.
The Set-Cookie line is typically not created by the base HTTP server but by a CGI program. The basic HTTP server facility (e.g. Apache) just sends the result of the program (a document preceded by the header containing the cookies) to the browser.
Cookies can also be set by JavaScript or similar scripts running within the browser. In JavaScript, the object document.cookie is used for this purpose. For example, the instruction document.cookie = "temperature=20" creates a cookie of name temperature and value 20.

Cookie Basics

 A cookie is a piece of text that a Web server can store on a user's hard disk. Cookies allow a Web site to store information on a user's machine and later retrieve it. The pieces of information are stored as name-value pairs

For example, a Web site might generate a unique ID number for each visitor and store the ID number on each user's machine using a cookie file.

­ If you use Microsoft's Internet Explorer to browse the Web, you can see all of the cookies that are stored on your machine. The most common place for them to reside is in a directory called c:\windows\cookies. When I look in that directory on my machine, I find 165 files. Each file is a text file that contains name-value pairs, and there is one file for each Web site that has placed cookies on my machine.
You can see in the directory that each of these files is a simple, normal text file. You can see which Web site placed the file on your machine by looking at the file name (the information is also stored inside the file). You can open each file by clicking on it.
For example, I have visited goto.com, and the site has placed a cookie on my machine. The cookie file for goto.com contains the following information:

UserID    A9A3BECE0563982D    www.goto.com/
 
Goto.com has stored on my machine a single name-value pair. The name of the pair is UserID, and the value is A9A3BECE0563982D. The first time I visited goto.com, the site assigned me a unique ID value and stored it on my machine. 

How does cookie data move?

As you saw in the previous section, cookie data is simply name-value pairs stored on your hard disk by a Web site. That is all cookie data is. The Web site stores the data, and later it receives it back. A Web site can only receive the data it has stored on your machine. It cannot look at any other cookie, nor anything else on your machine.
Unfinished URL
iStockphot
When you type a URL into a web browser, a web server might look in your cookie file.
The data moves in the following manner:
  • If you type the URL of a Web site into your browser, your browser sends a request to the Web site for the page. For example, if you type the URL http://www.amazon.com into your browser, your browser will contact Amazon's server and request its home page.
  • When the browser does this, it will look on your machine for a cookie file that Amazon has set. If it finds an Amazon cookie file, your browser will send all of the name-value pairs in the file to Amazon's server along with the URL. If it finds no cookie file, it will send no cookie data.
  • Amazon's Web server receives the cookie data and the request for a page. If name-value pairs are received, Amazon can use them.
  • If no name-value pairs are received, Amazon knows that you have not visited before. The server creates a new ID for you in Amazon's database and then sends name-value pairs to your machine in the header for the Web page it sends. Your machine stores the name-value pairs on your hard disk.
  • The Web server can change name-value pairs or add new pairs whenever you visit the site and request a page.
There are other pieces of information that the server can send with the name-value pair. One of these is an expiration date. Another is a path (so that the site can associate different cookie values with different parts of the site).
You have control over this process. You can set an option in your browser so that the browser informs you every time a site sends name-value pairs to you. You can then accept or deny the values.

How do Web sites use cookies?

Cookies evolved because they solve a big problem for the people who implement Web sites. In the broadest sense, a cookie allows a site to store state information on your machine. This information lets a Web site remember what state your browser is in. An ID is one simple piece of state information -- if an ID exists on your machine, the site knows that you have visited before. The state is, "Your browser has visited the site at least one time," and the site knows your ID from that visit.
Web sites use cookies in many different ways. Here are some of the most common examples:
  • Sites can accurately determine how many people actually visit the site. It turns out that because of proxy servers, caching, concentrators and so on, the only way for a site to accurately count visitors is to set a cookie with a unique ID for each visitor. Using cookies, sites can determine:
    • How many visitors arrive
    • How many are new versus repeat visitors
    • How often a visitor has visited
    The way the site does this is by using a database. The first time a visitor arrives, the site creates a new ID in the database and sends the ID as a cookie. The next time the user comes back, the site can increment a counter associated with that ID in the database and know how many times that visitor returns.
  • Sites can store user preferences so that the site can look different for each visitor (often referred to as customization). For example, if you visit msn.com, it offers you the ability to "change content/layout/color." It also allows you to enter your zip code and get customized weather information. When you enter your zip code, the following name-value pair gets added to MSN's cookie file:
     
     WEAT  CC=NC%5FRaleigh%2DDurham&REGION=  www.msn.com/
    Since I live in Raleigh, N.C., this makes sense. 
    Most sites seem to store preferences like this in the site's database and store nothing but an ID as a cookie, but storing the actual values in name-value pairs is another way to do it (we'll discuss later why this approach has lost favor).
  • E-commerce sites can implement things like shopping carts and "quick checkout" options. The cookie contains an ID and lets the site keep track of you as you add different things to your cart. Each item you add to your shopping cart is stored in the site's database along with your ID value. When you check out, the site knows what is in your cart by retrieving all of your selections from the database. It would be impossible to implement a convenient shopping mechanism without cookies or something like them.
­ In all of these examples, note that what the database is able to store is things you have selected from the site, pages you have viewed from the site, information you have given to the site in online forms, etc. All of the information is stored in the site's database, and in most cases, a cookie containing your unique ID is all that is stored on your computer.

Problems with Cookies

Cookies are not a perfect state mechanism, but they certainly make a lot of things possible that would be impossible otherwise. Here are several of the things that make cookies imperfect.
  • People often share machines - Any machine that is used in a public area, and many machines used in an office environment or at home, are shared by multiple people. Let's say that you use a public machine (in a library, for example) to purchase something from an online store. The store will leave a cookie on the machine, and someone could later try to purchase something from the store using your account. Stores usually post large warnings about this problem, and that is why. Even so, mistakes can happen. For example, I had once used my wife's machine to purchase something from Amazon. Later, she visited Amazon and clicked the "one-click" button, not realizing that it really does allow the purchase of a book in exactly one click. On something like a Windows NT machine or a UNIX machine that uses accounts properly, this is not a problem. The accounts separate all of the users' cookies. Accounts are much more relaxed in other operating systems, and it is a problem.
    If you try the example above on a public machine, and if other people using the machine have visited HowStuffWorks, then the history URL may show a very long list of files.
  • Cookies get erased - If you have a problem with your browser and call tech support, probably the first thing that tech support will ask you to do is to erase all of the temporary Internet files on your machine. When you do that, you lose all of your cookie files. Now when you visit a site again, that site will think you are a new user and assign you a new cookie. This tends to skew the site's record of new versus return visitors, and it also can make it hard for you to recover previously stored preferences. This is why sites ask you to register in some cases -- if you register with a user name and a password, you can log in, even if you lose your cookie file, and restore your preferences. If preference values are stored directly on the machine (as in the MSN weather example above), then recovery is impossible. That is why many sites now store all user information in a central database and store only an ID value on the user's machine. If you erase your cookie file for HowStuffWorks and then revisit the history URL in the previous section, you will find that HowStuffWorks has no history for you. The site has to create a new ID and cookie file for you, and that new ID has no data stored against it in the database. (Also note that the HowStuffWorks Registration System allows you to reset your history list whenever you like.)
  • Multiple machines - People often use more than one machine during the day. For example, I have a machine in the office, a machine at home and a laptop for the road. Unless the site is specifically engineered to solve the problem, I will have three unique cookie files on all three machines. Any site that I visit from all three machines will track me as three separate users. It can be annoying to set preferences three times. Again, a site that allows registration and stores preferences centrally may make it easy for me to have the same account on three machines, but the site developers must plan for this when designing the site. If you visit the history URL demonstrated in the previous section from one machine and then try it again from another, you will find that your history lists are different. This is because the server created two IDs for you, one on each machine.
­ There are probably not any easy solutions to these problems, except asking users to register and storing everything in a central database.
When you register with the HowStuffWorks registration system, the problem is solved in the following way: The site remembers your cookie value and stores it with your registration information. If you take the time to log in from any other machine (or a machine that has lost its cookie files), then the server will modify the cookie file on that machine to contain the ID associated with your registration information. You can therefore have multiple machines with the same ID value.

No comments: