Copyright: 1996-8 Nick Kew
Last-modified: May 29th 1998
Posting-Frequency: Every 15 days

Frequently Asked Questions on CGI programming

------------------------------

Subject: Table of Contents
==========================

0. Preamble
   0.1. Changes
   0.2. Notice and Disclaimer
   0.3. Where to get this document
   0.4. How to contribute to this document?
   0.5. Can I email the author my questions?
   0.6. What's up with posting to comp.infosystems.www.authoring.cgi?
   0.7. Credits
1. Basic Questions
   1.1. What is CGI?
   1.2. Is it a script or a program?
   1.3. When do I need to use CGI?
   1.4. Should I use CGI or JAVA?
   1.5. Should I use CGI or SSI or ... { PHP/ASP/... }
   1.6. Should I use CGI or an API?
   1.7. So what are in a nutshell the options for webserver programming?
   1.8. What do I absolutely need to know?
   1.9. Does CGI create new security risks?
   1.10. Do I need to be on Unix?
   1.11. Do I have to use Perl?
   1.12. What languages should I know/use?
   1.13. Do I have to put it in cgi-bin?
   1.14. Do I have to call it *.cgi? *.pl?
   1.15. What is the "CGI Overhead", and should I be worried about it?
   1.16. What is CGIWrap, and how does it affect my program?
   1.17. How do I decode the data in my Form?
2. HTTP Headers and NPH Scripts
   2.1. What is HTTP (HyperText Transfer Protocol)?
   2.2. What HTTP request headers can I use?
   2.3. What Environment variables are available to my application?
   2.4. What HTTP response headers do I need to know about?
   2.5. What is NPH?
   2.6. Must/should/can I write nph scripts?
   2.7. Do I have to call it nph-*
   2.8. What is the difference between GET and POST?
3. Techniques: "How do I..."
   3.1. Can I get information about who is visiting?
   3.2. Can I get the email of visitors?
   3.3. "But I saw some.kool.site display my email address..."
   3.4. Can I verify the email addresses people enter in my Form?
   3.5. Subject: How can I get the hostname of the remote user?
   3.6. Can I get browser details and return different pages?
   3.7. Can I trace where a user has come from/is going to?
   3.8. Can I launch a long process and return a page before it's finished?
   3.9. Can I launch a long process which the user interacts with?
   3.10. Can I password-protect my pages?
   3.11. Can I do HTTP authentication using CGI?
   3.12. Can I identify users/sessions without password protection?
   3.13. Can I redirect users to another page?
   3.14. Can I run a CGI script without returning a new page to the browser?
   3.15. Can I write output to a different Netscape frame?
   3.16. Can I write output to several frames at once?
   3.17. Can I use a CGI script to generate both text and inline images?
   3.18. How can I use Caches to make CGI scripts faster and more Net-friendly?
   3.19. How can I avoid users hitting "submit" twice?
   3.20. How can I stop my CGI script reading and writing files as "nobody"?
   3.21. How can I prevent my CGI results being cached by the browser?
   3.22. How can I control the default filename when downloading a file via CGI?
4. Applications: Is there an existing script to ...
   4.1. Where to look for programs, scripts, and other resources?
   4.2. Where to look for free scripts for my application?
   4.3. Discussion group/bulletin board
   4.4. CSCW/Groupware
   4.5. Database
   4.6. Is there a non-setuid script to allow users to change password?
5. Troubleshooting a CGI application
   5.1. Are there some interactive debugging tools and services available?
   5.2. I'm having trouble with my headers. What can I do?
   5.3. Why do I get Error 500 ("the script misbehaved", or "Internal Server Error")
   5.4. I tried to use (Content-Type|Location|whatever), but it appears in my Browser?
   5.5. How can I run my CGI program 'live' in a debugger?
6. Further Reading
   6.1. Other FAQs/collections
   6.2. Reference Pages
INDEX

-------------------------------------------------------------

Subject: SECTION 0 - PREAMBLE

NOTE: the numbering in this document is automatically generated by my
posting software, and will change between postings if new questions are
added (as _may_ happen when I see - or someone contributes - a FAQ I've
previously overlooked :-)

------------------------------

Subject: 0.1 Changes

Last Modified: May 29th 1998:
  * Added advanced debugging tip from David Jackson
  * Added words of wisdom on Language from J.M. Ivler
  * Added summary table of webserver programming options
    (alternatives to CGI)
  * Updated No-content answer
  * Removed some dead URLs

------------------------------

Subject: 0.2 Notice and Disclaimer

Copyright 1996-8 Nick Kew.

You are free to copy or distribute this document in whole or in part
for any purpose and on any medium you choose, provided:
        You DON'T do so for profit.
        You DO include this notice and disclaimer in full.

Disclaimer: This information is offered in good faith and in the hope
that it may be of use, but is not guaranteed to be correct, up to date
or suitable for any particular purpose. The author accepts no liability
in respect of this information or its use.

------------------------------

Subject: 0.3 Where to get this document

The homes of this document on the Web are now
  * the WebThing Virtual Office, at http://www.webthing.com/
        URL http://www.webthing.com/tutorials/cgifaq.html
  * the Web Design Group, at http://www.htmlhelp.com/
        URL http://www.htmlhelp.org/faq/cgifaq.html

NOTE - If you want to mirror the FAQ on your WWW site on a
publicly-visible server, please make sure you keep it up-to-date.

Other known sources are:

(1) USENET: posted to newsgroups (TEXT)
        news:comp.infosystems.www.authoring.cgi
        news:comp.answers
        news:news.answers
(2) RTFM and mirror sites (TEXT)
        ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq
(3) RTFM WWW mirror sites, including (Partial HTML)
        Europe  - http://www.cs.ruu.nl/cgi-bin/faqwais
        America - http://www.cis.ohio-state.edu/hypertext/faq/usenet/
(4) By EMAIL from my autoresponder (TEXT)
        Send blank email to mailto:nick+cgi_text@webthing.com
        (the HTML version has been discontinued: please use the Web.
        Note too that I'm not always very good at keeping this version
        up to date)
(5) By EMAIL from the FAQserver at RTFM (TEXT)
        Send email to mailto:mail-server@rtfm.mit.edu with
            send usenet/news.answers/www/cgi-faq
        in the body of your message

------------------------------

Subject: 0.4 How to contribute to this document?

The WebThing software permits collaborative authoring using your web
browser. When you are reading any entry in this InterFAQ, you can add
a new entry which will then appear as another "more on" subject.

        http://www3.pair.com/webthing/
(note: the version at this site is no longer listed in the previous
question)

In order to maintain the quality of the FAQ, and avoid inappropriate
'commercial' entries, write permission is limited using an Access
Control List. If you have a contribution to make, send me an email
including your WebThing userid (i.e. what you entered in the
registration form) and I'll add you to the list.

InterFAQ readers - If your browser isn't showing a "new entry" button,
then either you aren't logged in or you're not on the access control
list.

Note that this InterFAQ is limited to questions-and-answers appropriate
to periodic Usenet posting. Other types of contribution can be added
elsewhere in the WebCentre. For example
  * If you have a relevant website and want to link to it, enter it in
    the appropriate collection (e.g. "scripts" or "misc"). You can then
    also include a description of your site, and have it indexed.
  * If you want to post a question or comment on something in this
    document, you can post it as a followup to the "flat" version of
    the FAQ (library document in the "FAQS" collection).

If you don't want to use the InterFAQ you can always mail me
( mailto:nick@webthing.com )

------------------------------

Subject: 0.5 Can I email the author my questions?

Please don't. Post them to an appropriate newsgroup, where they'll be
seen and possibly answered by a whole lot more people than just me.
And remember: bad (or incoherent) questions get bad answers, so think
carefully before posting.

If you have an actual programming job to do, I might be interested.
However, I am currently not interested in jobs below $1000 under any
circumstances.

If you think something already in the FAQ needs clarifying, feel free
to mail me: don't expect a personal reply, but I *might* add something
to the answer in question, so check the next posting (or three).

------------------------------

Subject: 0.6 What's up with posting to comp.infosystems.www.authoring.cgi?

This is now a moderated newsgroup. The moderator is a bot run by
Thomas Boutell ( mailto:boutell@boutell.com ). The charter for
moderation is as follows:

This newsgroup is self-moderated. Your first posting will not appear
until you have read and responded to an automatic welcome mailing, at
which point your posting will appear with no further delay. Provision
will also be made to automatically approve first postings that contain
a header requesting this. Subsequent postings are approved
automatically.

If posting normally doesn't work - as could be the case if your
newsfeed has trouble with moderated groups - you can post articles by
emailing them to:
        mailto:authoring-cgi@boutell.com
Provided the return address in your mail is correct, you will then
receive precise instructions for having your post(s) automatically
approved.
Alternative means of posting are detailed in the WWW FAQ, posted
regularly by Thomas Boutell.

------------------------------

Subject: 0.7 Credits

This FAQ was written by Nick Kew, and has been considerably improved
with the help of comments and criticisms, newsgroup posts and
miscellaneous suggestions from correspondents including Nathan
Neulinger, Maurice L. Marvin, Matthew Healy, Alan J. Flavell, Don
Libes, Alain Deckers, David S. Jackson, J.M. Ivler, and no doubt others
I've forgotten to credit (please remind me if necessary).

-------------------------------------------------------------

Subject: SECTION 1 - BASIC QUESTIONS

This section aims to deal with basic questions, addressing the role
and nature of CGI, and its place in Web programming. Questions/answers
which just don't appear to 'fit' under any other section may also be
included here.

------------------------------

Subject: 1.1 What is CGI?

[ from the CGI reference http://hoohoo.ncsa.uiuc.edu/cgi/overview.html ]

The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP
servers. A plain HTML document that the Web daemon retrieves is
static, which means it exists in a constant state: a text file that
doesn't change. A CGI program, on the other hand, is executed in
real-time, so that it can output dynamic information.

------------------------------

Subject: 1.2 Is it a script or a program?

The distinction is semantic. Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts. In the context of CGI, the distinction has become even
more blurred than before. The words are often used interchangeably
(including in this document). Current usage favours the word "scripts"
for CGI programs.

------------------------------

Subject: 1.3 When do I need to use CGI?

There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program to
process the form inputs.

------------------------------

Subject: 1.4 Should I use CGI or JAVA?

[answer to this non-question hopes to try and reduce the noise level
of the recurrent "CGI vs JAVA" threads].

CGI and JAVA are fundamentally different, and for most applications
are NOT interchangeable.

CGI is a protocol for running programs on a WWW server. Typical
applications include accessing a database, submitting an order, or
posting messages to a bulletin board.

JAVA is a programming language, and is an alternative to C, Perl, etc
rather than to CGI. Java was designed for network applications, and
its close association with the Web stems from its ability to run
programs safely on the Client machine, and its adoption in browsers
(especially Netscape).

In certain instances the two may be combined in a single application:
for example a JAVA applet to define a region of interest from a
geographical map, together with a CGI script to process a query for
the area defined.

------------------------------

Subject: 1.5 Should I use CGI or SSI or ... { PHP/ASP/... }

CGI and SSI (Server-Side Includes) are often interchangeable, and it
may be no more than a matter of personal preference. Here are a few
guidelines:

1) CGI is a common standard agreed and supported by all major HTTPDs.
   SSI is NOT a common standard, but an innovation of NCSA's HTTPD
   which has been widely adopted in later servers. CGI has the
   greatest portability, if this is an issue.
2) If your requirement is sufficiently simple that it can be done by
   SSI without invoking an exec, then SSI will probably be more
   efficient. A typical application would be to include sitewide
   'house styles', such as toolbars, netscapeised tags or embedded
   CSS stylesheets.
3) For more complex applications - like processing a form - where you
   need to exec (run) a program in any case, CGI is usually the best
   choice.

Many more recent variants on the theme of SSI are now available.
Probably the best-known are PHP which embeds server-side scripting in
a pre-html page, and ASP which is a Microsoft proprietary API with the
backing of MS marketing.

------------------------------

Subject: 1.6 Should I use CGI or an API?

APIs are proprietary programming interfaces supported by particular
platforms. By using an API, you lose all portability. If you know your
application will only ever run on one platform (OS and HTTPD), and it
has a suitable API, go ahead and use it. Otherwise stick to CGI.

------------------------------

Subject: 1.7 So what are in a nutshell the options for webserver programming?

Too many to enumerate - but I'll try and summarise. Briefly, there are
several decisions you have to make, including:
  * Power. Is it up to a complex task?
  * Complexity. How much programming manpower is it worth?
  * Portability. Might you want to run your program on another system?

So here's an overview of the main options. It's inevitably subjective,
but may be helpful to someone:

                        Power     Complexity   Portability
Basic SSI:              Low       Low          Medium
Enhanced SSI[2]:        Medium    Medium       Low
CGI:                    High      Medium       High[4]
Enhanced CGI-like[5]:   High      Medium       Medium[6]
Server API:             v.High    High         Low
Servlets[7]:            High      High         Medium

[2] For example, PHP, ASP.
[4] Subject to choice of programming language.
[5] For example: mod_perl or fastcgi, both of which can improve
    efficiency with respect to standard CGI.
[6] You port them by converting to standard CGI
[7] Servlets are really a server API, and make sense if you're already
    running a JAVA VM, but will probably hit your server hard if not.

------------------------------

Subject: 1.8 What do I absolutely need to know?

If you're already a programmer, CGI is extremely straightforward, and
just three resources should get you up to speed in the time it takes
to read them:

1) Installation notes for your HTTPD. Is it configured to run CGI
   scripts, and if so how does it identify that a URL should be
   executed? (Check your manuals, READMEs, ISP webpages/FAQS, and if
   you still can't find it ask your server administrator).
2) The CGI specification at NCSA tells you all you need to know to get
   your programs running as CGI applications.
        http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
3) WWW Security FAQ. This is not required to 'get it working', but is
   essential reading if you want to KEEP it working!
        http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

If you're NOT already a programmer, you'll have to learn. If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
commandline, then you will probably have a hard time with CGI. Make
sure your programs work from the commandline BEFORE trying them with
CGI, so that at least one possible source of errors has been dealt
with.

------------------------------

Subject: 1.9 Does CGI create new security risks?

Yes. Period.

There is a lot you can do to minimise these. The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html .

------------------------------

Subject: 1.10 Do I need to be on Unix?

No, but it helps. The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix. At the time of writing, this is still the most
mature and best-supported platform for Web applications.

------------------------------

Subject: 1.11 Do I have to use Perl?

No - you can use any programming language you please. Perl is simply
today's most popular choice for CGI applications.
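
To illustrate that the interface itself is language-neutral, here is a
minimal sketch of a CGI program in Python (chosen only as an example
language). It assumes the server passes the request in the standard
REQUEST_METHOD and QUERY_STRING environment variables and expects a
header block, a blank line, then the body on standard output. The
function name build_response and the form field "name" are
hypothetical, introduced purely for illustration.

```python
#!/usr/bin/env python3
"""Minimal CGI sketch: environment variables in, headers plus body out."""
import html
import os
import sys
from urllib.parse import parse_qs


def build_response(environ):
    """Return a complete CGI response (headers, blank line, body)."""
    method = environ.get("REQUEST_METHOD", "GET")
    # parse_qs decodes the URL-encoded query string into {name: [values]}.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field used only for this illustration.
    name = params.get("name", ["world"])[0]
    # Escape user-supplied text before placing it in HTML output.
    body = "<html><body><p>Hello, {} (via {})</p></body></html>\n".format(
        html.escape(name), html.escape(method))
    # A CGI response is: header lines, one blank line, then the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ))
```

Because the program only reads the environment and writes to standard
output, it can be tried from the commandline before it goes anywhere
near a server, for example by setting QUERY_STRING by hand and running
the file directly.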
Some other widely-used languages are C, C++, TCL, BASIC and - for
simple tasks - even shell scripts.

Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular the 'regular' expression) and the
fantastic WWW support modules available.

------------------------------

Subject: 1.12 What languages should I know/use?

It isn't really that important. Use what you're comfortable with, or
what you're constrained (eg by your manager) to use.

If you're just dabbling with programming, Perl is a good choice,
simply because of the wealth of ready-to-run Perl/CGI resources
available.

If you're serious about programming, you should be at home in a range
of languages. C, the industry standard, is a must (at least to the
level of comfortably reading other people's code). You'll certainly
want at least one scripting language such as Perl, Python or Tcl. C++
is also a very good idea.

In response to a Usenet newbie question:
> I am seriously wanting to learn some CGI programming languages

J.M. Ivler wrote some eloquent words of wisdom:
> If you want to learn a programming language, learn a programming language.
> If you want to learn how to do CGI programming, learn a programming
> language first.
>
> My book is one of the few that tackles two languages at the same time.
> Why? because it's not about languages (which are just syntax for logic).
> CGI programming is about programming, and how to leverage the experience
> for the person coming to the site, or maintaining the site, or in some way
> meeting some requirements. Language is just a tool to do so.

------------------------------

Subject: 1.13 Do I have to put it in cgi-bin?

see next question

------------------------------

Subject: 1.14 Do I have to call it *.cgi? *.pl?

Maybe. It depends on your server installation. These types of
filenames are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or to
print them.

If you are running your own server, read the manual. If you're on an
ISP's or other rented webspace, check their webpages for information
or FAQs. As a last resort, ask the server administrator.

------------------------------

Subject: 1.15 What is the "CGI Overhead", and should I be worried about it?

The CGI Overhead is a consequence of HTTP being a stateless protocol.
This means that a CGI process must be initialised for every "hit" from
a browser.

In the first instance, this usually means the server forking a new
process. This in itself is a very small overhead, but it can become
important on a heavily-used server if the number of processes grows to
problem levels. If the CGI programs are themselves long-running, this
is heavily exacerbated.

In the second place, the CGI program must initialise. In the case of a
compiled language such as C or C++ this is negligible, but there is a
penalty to pay for scripting languages such as Perl.

Thirdly, CGI is often used as 'glue' to a backend program, such as a
database, which may take some considerable time to initialise. This
represents a major overhead, which must be avoided in any serious
application. The most usual solution is for the backend program to run
as a separate server doing most of the work, while the actual CGI
simply carries messages.

Fourthly, some CGI scripts are just plain inefficient, and may take
hundreds of times the resources they need. Programs using system() or
`backtick` notation often fall into this category.

Note that there are ways to reduce or eliminate all these overheads,
but these tend to be system- or server-specific. The best-supported
server is probably Apache, as commercial server-vendors like to push
their proprietary solutions in preference to CGI.

------------------------------

Subject: 1.16 What is CGIWrap, and how does it affect my program?

[ quoted from http://www.umr.edu/~cgiwrap/intro.html ]

> CGIWrap is a gateway program that allows general users to use CGI scripts
> and HTML forms without compromising the security of the http server.
> Scripts are run with the permissions of the user who owns the script. In
> addition, several security checks are performed on the script, which will not
> be executed if any checks fail.
>
> CGIWrap is used via a URL in an HTML document. As distributed, cgiwrap
> is configured to run user scripts which are located in the
> ~/public_html/cgi-bin/ directory.

See http://www.umr.edu/~cgiwrap/

------------------------------

Subject: 1.17 How do I decode the data in my Form?

The normal format for data in HTTP requests is URLencoded. All Form
data is encoded in a string, of the form
        param1=value1&param2=value2&...paramn=valuen
Many non-alphanumeric characters are "escaped" in the encoding: the
character whose hexadecimal number is "XY" will be represented by the
character string "%XY".

Decoding this string is a fundamental function of every CGI library.

Another format is "multipart/form-data", also known as "file upload".
You will get this from the HTML markup