Run custom shell scripts in Android

Relying on third-party apps for low-level testing tasks is both insecure and inconvenient. A shell script is especially useful for this purpose.

1. Install Android Debug Bridge (ADB)

Download the Android SDK and run the SDK Manager to install the platform-tools package, as the following image shows:

[Image: android-adb]

2. Connect to the Android device with ADB

Pushing a file to the /system file system requires root access and a read-write remount, so run the following commands:

$ adb root
$ adb remount

If no error occurs, you should see “remount succeeded” in the terminal.

3. Write shell scripts

It’s almost the same as writing general shell scripts for Linux, except for the first line, because the Android shell is located in the /system/bin directory.

#!/system/bin/sh

# DO SOMETHING

4. Push the script to Android

Push the script (test_script, for example) to the device:

$ adb push /path/to/test_script /system/xbin/test_script

Then change its permissions:

$ adb shell chmod 6755 /system/xbin/test_script

5. Run from Android

If you want to run the scripts on the Android device itself, install Android Terminal Emulator.

Windows batch script to start VMware guests

The GUI of VMware Workstation seemed quite slow and always ran into a “not responding” state when I tried to boot a guest. I tried the vmrun command and it works very smoothly, so I wrote a simple batch script which can be used to start a guest automatically when Windows starts.

@echo off

if "%1" == "" (
    set vm_name=X:\\default\\path\\to\\linux.vmx
) else (
    set vm_name=%1
)

vmrun list | findstr /E "%vm_name%" 1>NUL

if errorlevel 1 (
    echo Starting %vm_name% ...
    vmrun -T ws start %vm_name% nogui
) else (
    echo %vm_name% is running
)

pause

I rarely write Windows batch scripts, so I’m making a note here in case I or anyone else needs such a script in the future.

CMD reference: http://ss64.com/nt/.

Use fsockopen to read chunked page

PHP provides many different ways to download a page (or file) over HTTP. The simplest is the file_get_contents function, which is suitable for relatively small files. If you try to download a large file with file_get_contents, the memory allowed for PHP (configured by the memory_limit directive) may be exhausted, resulting in a fatal error. That’s why we need a portable way to deal with large files, and this is where fsockopen comes in.

There are two main cases to consider: the response specifies Content-Length, or the data is chunked.

0. Initiate socket connection

Use fsockopen() to initiate the socket connection. It is best to pass the error number, error message and timeout parameters, and to handle the error if one occurs.

$url = 'http://test.example.com/fetch_file.php?file=testfile.iso';

if (preg_match_all('#http://([^/]+)(/.+)#i', $url, $matches)) {
    $host = $matches[1][0];
    $path = $matches[2][0];
} else {
    die('Invalid URL');
}

$fp = fsockopen($host, 80, $errno, $error, 30);

//Open a file pointer for write
$wfp = fopen('file-write-to', 'w');

//specify the block size to read
$readBlockSize = 512;
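As an alternative to the regular expression above, PHP's built-in parse_url() can split the URL into its parts. The following is only a minimal sketch: the helper name splitHttpUrl and the shape of its return value are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical helper (not from the original post): split an http:// URL
// into the host, the port and the request target (path plus query string).
function splitHttpUrl(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false
        || strtolower($parts['scheme'] ?? '') !== 'http'
        || empty($parts['host'])
    ) {
        return null; // not a usable http URL
    }

    $target = $parts['path'] ?? '/';
    if (isset($parts['query'])) {
        $target .= '?' . $parts['query'];
    }

    return [
        'host'   => $parts['host'],
        'port'   => $parts['port'] ?? 80, // default HTTP port
        'target' => $target,
    ];
}

$info = splitHttpUrl('http://test.example.com/fetch_file.php?file=testfile.iso');
print_r($info);
```

The host and port can then be passed to fsockopen(), and the target used in the GET request line.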

1. Content-Length

If Content-Length is specified in the HTTP response header, reading the body is as straightforward as reading an ordinary file.

Snippet used to read the response body:

$data = fread($fp, $readBlockSize);
fwrite($wfp, $data);

2. Chunked

Chunked encoding uses a different data format. Here is a quotation from Wikipedia:

Each chunk starts with the number of octets of the data it embeds expressed in ASCII followed by optional parameters (chunk extension) and a terminating CRLF sequence, followed by the chunk data. The chunk is terminated by CRLF. If chunk extensions are provided, the chunk size is terminated by a semicolon followed with the extension name and an optional equal sign and value.

The last-chunk is a regular chunk, with the exception that its length is zero.

The encoded data looks like this:

4
Wiki
5
pedia
E
 in

chunks.
0

So we have to process the response chunk by chunk. Snippet to do so:

if ($chunk_length === false) {
    $data = trim(fgets($fp, 128));
    $chunk_length = hexdec($data);
} else if ($chunk_length > 0) {
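    $read_length = $chunk_length > $readBlockSize ? $readBlockSize : $chunk_length;
    $data = fread($fp, $read_length);
    // fread() on a socket may return fewer bytes than requested,
    // so subtract the number of bytes that actually arrived
    $chunk_length -= strlen($data);
    fwrite($wfp, $data);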
    if ($chunk_length <= 0) {
        fseek($fp, 2, SEEK_CUR);
        $chunk_length = false;
    }
} else {
     break;
}
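To look at the chunk format in isolation from the socket handling, here is a small sketch that decodes a complete chunked body held in a string. The function name decodeChunked is hypothetical, and the sketch assumes the whole body is already in memory, which is exactly what the streaming snippet above avoids for large files.

```php
<?php
// Hypothetical helper for illustration: decode a chunked body from a string.
function decodeChunked(string $raw): string
{
    $body = '';
    $pos = 0;
    $total = strlen($raw);

    while (true) {
        // Each chunk starts with a hexadecimal size line terminated by CRLF.
        $lineEnd = $pos <= $total ? strpos($raw, "\r\n", $pos) : false;
        if ($lineEnd === false) {
            throw new RuntimeException('Missing CRLF after chunk size');
        }

        // Optional chunk extensions follow a semicolon; they are ignored here.
        $sizeField = explode(';', substr($raw, $pos, $lineEnd - $pos), 2)[0];
        $size = (int) hexdec(trim($sizeField));
        $pos = $lineEnd + 2;

        if ($size === 0) {
            break; // the last-chunk has length zero
        }

        $body .= substr($raw, $pos, $size);
        $pos += $size + 2; // skip the chunk data and its trailing CRLF
    }

    return $body;
}

// The sample from the quotation above, with explicit CRLF sequences.
$encoded = "4\r\nWiki\r\n5\r\npedia\r\nE\r\n in\r\n\r\nchunks.\r\n0\r\n\r\n";
$decoded = decodeChunked($encoded);
```

Note that the third chunk is 0xE = 14 bytes long because the CRLF sequences inside it count as data.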

The full script:


<?php
/**
 * @copyright (C) 2013 James Tang.
 */

set_time_limit(600);
ignore_user_abort(true);

//$url = 'http://test.example.com/fetch_file.php?file=testfile.iso';
//$saveToFile = 'tmp.iso';
$url = 'http://test.example.com/fetch_file.php?file=tmp.gz';
$saveToFile = 'tmp.gz';

if (preg_match_all('#http://([^/]+)(/.+)#i', $url, $matches)) {
    $host = $matches[1][0];
    $path = $matches[2][0];
} else {
    die('Invalid URL');
}

$fp = fsockopen($host, 80, $errno, $error, 30);
$readBlockSize = 512;

if ($fp) {

    $wfp = fopen($saveToFile, 'w');

    if ($wfp) {
        $request = "GET $path HTTP/1.1\r\n";
        $request .= "Host: $host\r\n";
        $request .= "Connection: close\r\n";
        $request .= "User-Agent: php-download/1.0\r\n";
        $request .= "\r\n";

        fwrite($fp, $request);

        $body_start = false;
        $md5sum = '';
        $content_length = false;
        $chunk_length = false;

        $startLine = fgets($fp, 128);

        if ($startLine && preg_match('#^HTTP/1.\d?\s+200\s+#', $startLine)) {
            while (!feof($fp)) {
                if (!$body_start) {
                    $header = fgets($fp, 128);
                    echo $header;
                    $colon_pos = strpos($header, ':');
                    $header_name = strtolower(trim(substr($header, 0, $colon_pos)));
                    $header_value = trim(substr($header, $colon_pos+1)); 
                    if ($header_name == 'content-md5') {
                        $md5sum = bin2hex(base64_decode($header_value));
                    } else if ($header_name == 'content-length') {
                        $content_length = (int) $header_value;
                    }
                    if ($header == "\r\n") {
                        $body_start = true;
                        echo "Reading data...\n";
                    }
                } else {

                    if ($content_length !== false && $content_length > 0) {
                        $data = fread($fp, $readBlockSize);
                        fwrite($wfp, $data);
                    } else {
                        if ($chunk_length === false) {
                            $data = trim(fgets($fp, 128));
                            $chunk_length = hexdec($data);
                        } else if ($chunk_length > 0) {
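                            $read_length = $chunk_length > $readBlockSize ? $readBlockSize : $chunk_length;
                            $data = fread($fp, $read_length);
                            // fread() on a socket may return fewer bytes than requested,
                            // so subtract the number of bytes that actually arrived
                            $chunk_length -= strlen($data);
                            fwrite($wfp, $data);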
                            if ($chunk_length <= 0) {
                                fseek($fp, 2, SEEK_CUR);
                                $chunk_length = false;
                            }
                        } else {
                            break;
                        }
                    }
                }
            }
        } else {
            echo "Failed to read data: " . $startLine . "\n";
        }

        fclose($wfp);
        if ($md5sum && strlen($md5sum) > 0) {
            $md5sum_check = bin2hex(md5_file($saveToFile, true));
            if ($md5sum_check != $md5sum) {
                echo 'MD5 checksum does not match: ' . $md5sum_check . "\n";
            } else {
                echo "MD5 checksum match\n";
            }
        } else {
            echo "No MD5 checksum detected\n";
        }
        //unlink($saveToFile);
    }

    fclose($fp);
} else {
    echo 'Error: ' . $errno . '#' . $error . "\n";
}

3. Problems

The $readBlockSize value turned out to be critical. When I tested against a remote server with $readBlockSize=4096, the downloaded file was not identical to the source file. The likely cause is the transfer rate: fread() on a network stream returns as soon as some data is available, so when fewer than 4096 bytes have arrived it returns a short read, and any bookkeeping that assumes the full requested length was read loses track of the chunk boundaries. Counting strlen($data), the number of bytes actually returned, avoids this. In my tests a block size of 512 worked fine.
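The short-read issue can be handled independently of the block size by counting the bytes that fread() actually returns. The sketch below uses a hypothetical helper, copyBytes, and in-memory streams in place of the socket and the output file. It illustrates the bookkeeping only and will not reproduce the network timing.

```php
<?php
// Hypothetical helper: copy exactly $length bytes from $in to $out,
// tolerating reads that return fewer bytes than requested.
function copyBytes($in, $out, int $length, int $blockSize = 4096): int
{
    $remaining = $length;

    while ($remaining > 0 && !feof($in)) {
        $data = fread($in, min($blockSize, $remaining));
        if ($data === false || $data === '') {
            break; // read error or nothing delivered; stop instead of spinning
        }
        fwrite($out, $data);
        $remaining -= strlen($data); // count what was actually returned
    }

    return $length - $remaining; // number of bytes copied
}

// In-memory streams stand in for the socket and the destination file.
$in = fopen('php://memory', 'w+');
fwrite($in, str_repeat('x', 10000) . "\r\ntrailer");
rewind($in);

$out = fopen('php://memory', 'w+');
$copied = copyBytes($in, $out, 10000);
```

After the call, the input stream is positioned right after the 10000 data bytes, so the CRLF that ends a chunk can be consumed next.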

4. Reference

1. http://en.wikipedia.org/wiki/Chunked_transfer_encoding

2. http://tools.ietf.org/html/rfc2616#page-118

Run a VMware guest from the command line

VMware Workstation provides a command-line utility called vmrun to control guest machines. It is especially useful when the GUI is not required.

For example, start a guest instance with the following command:

vmrun -T ws start /path/to/instance.vmx nogui

You can check how many guest instances are running:

vmrun list

For more information, run vmrun without arguments in a terminal or refer to Using vmrun to Control Virtual Machines.

Parse HTML with PHP DOM

With the PHP DOM extension, parsing HTML is as straightforward as working with the DOM in JavaScript. This post demonstrates how to use DOMDocument and DOMXPath to extract the data we are interested in from a general HTML file that is not as strictly structured as XML.

Sample file looks like this:

<!DOCTYPE html>
<html>
    <head>
        <title>王小五 - Profile</title>
        <meta charset="utf-8" />
        <link href="/css/profile.css" rel="stylesheet" type="text/css"/>
    </head>
    <body>
        <div id="wrapper">
            <div id="header">
                <h1>王小五</h1>
            </div>
            <div id="content">
                <div id="profile-box">
                    <div id="profile-img">
                        <img id="profile-img" src="/profile_img.php?uid=32153" alt="" />
                    </div>
                    <ul class="ulist">
                        <li>
                            <span class="field-name">Age:</span> 
                            <span class="field-value">31</span>
                        </li>
                        <li>
                            <span class="field-name">Gender:</span> 
                            <span class="field-value">Female</span>
                        </li>
                        <li>
                            <span class="field-name">Location:</span> 
                            <span class="field-value">Guangzhou, China</span>
                        </li>
                    </ul>
                </div>
            </div>
            <div id="footer">
                <div class="x13">&copy; 2012 cctv.</div>
            </div>
        </div>
    </body>
</html>

1. Load HTML

The simplest code looks like this:

$html = file_get_contents('data.html');
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

Notice that there are two elements with id="profile-img" in the HTML, so it’s not a “valid” HTML page. When you run the code, you will encounter the following PHP warning:

PHP Warning:  DOMDocument::loadHTML(): ID profile-img already defined in Entity, line: 16 in /home/james/projects/mixedlab/linux/php/xml/dom/demo/demo.php on line 6

Most of the time we’d like to ignore such warnings, because we are unlikely to be able to change the source HTML file, which may be generated by another PHP script written by a negligent programmer. Fortunately, this problem can be solved with just a little additional work:

libxml_use_internal_errors(true);

$html = file_get_contents('data.html');
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

libxml_clear_errors();

Set libxml_use_internal_errors to true to suppress the warnings. You can still retrieve the errors if you want to do something with them besides ignoring them.
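If you do want to inspect the suppressed messages, libxml_get_errors() returns them as LibXMLError objects. The following is a minimal sketch on an inline HTML string made up for this example. Which messages libxml reports (if any) depends on the libxml2 version, so the sketch makes no assumption about their number.

```php
<?php
// Inline sample with a duplicated id, made up for this illustration.
$html = '<html><body><div id="a">one</div><div id="a">two</div></body></html>';

// Keep parser messages in an internal buffer; the call returns the previous setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Each entry is a LibXMLError object with level, code, line and message fields.
foreach (libxml_get_errors() as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

libxml_clear_errors();                 // empty the buffer
libxml_use_internal_errors($previous); // restore the earlier behaviour
```

Restoring the previous setting keeps the change local, which matters when the parsing code lives inside a larger application.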

Another problem we should solve is the file encoding. The demo HTML is encoded in UTF-8, as the meta tag shows, but DOMDocument does not recognize it. DOMDocument only accepts the http-equiv form of the meta tag, so we have to preprocess the HTML:

$html = str_replace(
    '<meta charset="utf-8" />', 
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />', 
    $html
);

2. Extract Data

DOMDocument provides several methods for locating elements, such as DOMDocument::getElementsByTagName(), but I think they are less useful and more complicated than using XPath.

DOMXPath::query(string $expression [, DOMNode $contextnode [, bool $registerNodeNS = true ]]) always returns a DOMNodeList object if the $expression is well formed and $contextnode is valid or NULL (not set).

Extract the user name from the header (h1):

$nodes = $xpath->query('//*[@id="header"]/h1');
$name = $nodes->item(0)->nodeValue;
echo "Name: " . $name . "\n";

In most cases, the HTML structure is more complicated than the demo, and $contextnode helps us focus on a restricted section and keep the XPath query concise.

Extract a sub-node by specifying the context node (for demonstration only in this case):

$nodes = $xpath->query('//div[@id="header"]');
$headerNode = $nodes->item(0);
$nodes = $xpath->query('h1', $headerNode);
$name = $nodes->item(0)->nodeValue;
echo "Name: " . $name . "\n";

Extract user properties:

$nodes = $xpath->query('//*[@id="profile-box"]/ul/li');
foreach ($nodes as $node) {
    $childNodes = $xpath->query('span', $node);
    $key = $childNodes->item(0)->nodeValue;
    $value = $childNodes->item(1)->nodeValue;
    echo $key . ' ' . $value . "\n";
}
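The snippets above call item(0) directly, which assumes the query matched something. As a hedged sketch on a small inline document (made up for this example), the code below guards against an empty result and also shows DOMXPath::evaluate(), which can return a scalar for expressions such as string() or count().

```php
<?php
// Small inline document made up for this illustration.
$html = '<html><body><div id="header"><h1>Wang</h1></div>'
      . '<ul><li><span>Age:</span> <span>31</span></li></ul></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// query() returns an empty DOMNodeList when nothing matches,
// and item(0) is then null, so check the length first.
$missing = $xpath->query('//*[@id="no-such-id"]/h1');
$name = $missing->length > 0 ? $missing->item(0)->nodeValue : null;

// evaluate() returns a typed result when the expression yields a scalar.
$title = $xpath->evaluate('string(//*[@id="header"]/h1)');
$items = $xpath->evaluate('count(//li)');
```

Here $name stays null because nothing matches, while $title and $items are plain values with no node list to unpack.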

3. innerHTML function

Here is a useful function for extracting the inner HTML of a node:

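function innerHTML($node)
{
    // Assumed value: an http-equiv charset declaration, so that
    // non-ASCII text survives the round trip through saveHTML()
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
    $dom = new DOMDocument();
    $dom->loadHTML($meta);
    $dom->appendChild($dom->importNode($node, true));
    // Keep only what sits between the node's opening and closing tags
    $html = preg_replace(
        '#^.*<' . $node->nodeName . '[^>]*>(.*)</' . $node->nodeName . '>.*$#s', 
        '\1', $dom->saveHTML());
    return $html;
}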
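A different way to get the same result, shown here only as a sketch and not as the author's method, is to serialize each child node with DOMDocument::saveHTML(), which accepts an optional node argument. The function name innerHtmlOf and the sample markup are made up for this example.

```php
<?php
// Alternative sketch: concatenate the serialized children of the node.
function innerHtmlOf(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes just that child and its subtree.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="box"><b>bold</b> text</div></body></html>');
libxml_clear_errors();

$box = (new DOMXPath($dom))->query('//*[@id="box"]')->item(0);
echo innerHtmlOf($box), "\n";
```

This avoids the second document and the regular expression, at the cost of relying on how libxml serializes each child.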

4. Reference

PHP DOM manual: http://php.net/manual/en/book.dom.php

Full Demo for this Post: https://github.com/fwso/mixedlab/tree/master/linux/php/xml/dom/demo