Easy way to test a URL for 404 in PHP?


I'm teaching myself some basic scraping and I've found that sometimes the URLs that I feed into my code return 404, which gums up all the rest of my code.

So I need a test at the top of the code to check if the URL returns 404 or not.

This would seem like a pretty straightforward task, but Google's not giving me any answers. I worry I'm searching for the wrong stuff.

One blog recommended I use this:

$valid = @fsockopen($url, 80, $errno, $errstr, 30);

and then test to see whether $valid is empty or not.

But I think the URL that's giving me problems has a redirect on it, so $valid is coming up empty for all values. Or perhaps I'm doing something else wrong.

I've also looked into a "head request" but I've yet to find any actual code examples I can play with or try out.

Suggestions? And what's this about curl?

5/1/2009 6:58:53 AM

Accepted Answer

If you are using PHP's curl bindings, you can check the HTTP status code using curl_getinfo, like so:

$handle = curl_init($url);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, TRUE);

/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);

/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if ($httpCode == 404) {
    /* Handle 404 here. */
}

/* Handle $response here. */
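
Since the question mentions both redirects and a "head request", here is a minimal sketch of how those two could be combined with curl. The function names head_status and is_success are made up for illustration, and the timeout and redirect limit are arbitrary assumptions. Some servers answer HEAD differently from GET, so treat the result as a hint rather than a guarantee.

```php
<?php
/**
 * Hypothetical helper: request headers only and return the final
 * HTTP status code after following redirects.
 * Returns 0 when the request itself fails (DNS error, timeout, ...).
 */
function head_status($url, $timeout = 10)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_NOBODY, true);         // send HEAD, skip the body
    curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 redirects
    curl_setopt($handle, CURLOPT_MAXREDIRS, 5);         // assumed limit
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true); // do not echo anything
    curl_setopt($handle, CURLOPT_TIMEOUT, $timeout);
    curl_exec($handle);
    return (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
}

/** Hypothetical helper: true for any 2xx status code. */
function is_success($code)
{
    return $code >= 200 && $code < 300;
}

// Usage (performs a network request):
// if (!is_success(head_status($url))) { /* skip this URL */ }
```

With CURLOPT_FOLLOWLOCATION set, CURLINFO_HTTP_CODE reports the status of the last response in the redirect chain, so a redirect that ends in a 404 is still reported as 404.
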
1/3/2009 1:25:59 AM

If you're running PHP 5 you can use:

$url = 'http://www.example.com';
print_r(get_headers($url, 1));

Alternatively, for PHP 4, a user has contributed the following:

This is a modified version of code from "stuart at sixletterwords dot com", at 14-Sep-2005 04:52. This version tries to emulate get_headers() function at PHP4. I think it works fairly well, and is simple. It is not the best emulation available, but it works.

Features:
- supports (and requires) full URLs.
- supports changing of default port in URL.
- stops downloading from socket as soon as end-of-headers is detected.

Limitations:
- only gets the root URL (see line with "GET / HTTP/1.1").
- doesn't support HTTPS (nor the default HTTPS port).

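    if (!function_exists('get_headers')) {
        function get_headers($url, $format = 0) {
            $url = parse_url($url);
            $end = "\r\n\r\n";
            $fp = fsockopen($url['host'], (empty($url['port']) ? 80 : $url['port']), $errno, $errstr, 30);
            if (!$fp) return false;
            $out  = "GET / HTTP/1.1\r\n";
            $out .= "Host: " . $url['host'] . "\r\n";
            $out .= "Connection: Close\r\n\r\n";
            $var  = '';
            fwrite($fp, $out);
            while (!feof($fp)) {
                $var .= fgets($fp, 1280);
                if (strpos($var, $end) !== false) break; // stop at end of headers
            }
            fclose($fp);
            $head = explode($end, $var, 2);
            $var = explode("\r\n", $head[0]);
            if (!$format) return $var;
            $v = array($var[0]); // keep the status line at index 0
            foreach ($var as $i) {
                if (preg_match('/^([a-zA-Z -]+): +(.*)$/', $i, $parts)) {
                    $v[$parts[1]] = $parts[2];
                }
            }
            return $v;
        }
    }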

Both would have a result similar to:

    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    [ETag] => "3f80f-1b6-3e1cb03b"
    [Accept-Ranges] => bytes
    [Content-Length] => 438
    [Connection] => close
    [Content-Type] => text/html

Therefore, you could just check that the response status was OK, e.g.:

$headers = get_headers($url, 1);
if ($headers[0] == 'HTTP/1.1 200 OK') {
    // page exists
}

if ($headers[0] == 'HTTP/1.1 301 Moved Permanently') {
    // moved or redirect page
}
W3C Codes and Definitions

Licensed under: CC-BY-SA with attribution