JavaScript for everyday scripting

By Stephen Holdaway

25 Oct, 2012

Though JavaScript tends to cop a lot of flak, I find it’s actually a pretty useful thing to know. No matter how much you may dislike JavaScript, the fact is that it has been part of almost every web browser since 1996, and it’s the only major, cross-browser client-side scripting option for the web (though Dart is looking to join that rank). While I fully agree that writing large systems in JavaScript can be a pain, I can’t say the same thing about what’s in the name: scripting. I’m not talking about scripting in the sense of writing small programs, but rather scripting the way you operate a command line: short, contextual snippets of code to get stuff done.

Every now and then, I find myself popping open the JavaScript console in Google Chrome and writing a few lines of JavaScript to accelerate something I’m doing. While situations that warrant this are rare, there have been a few occasions where the JavaScript console has saved me huge amounts of menial clicking labour. It’s arguable that if you have to write code to do something on the web, then the creator(s) of the page you’re battling with didn’t do a good enough job. In reality, however, most of the things I’ve accomplished with the JavaScript console have been far outside the scope of what the target pages were intended to do, so much so that only once have I used the JavaScript console to work around a real usability issue.

Essentially, this all boils down to the fact that the data on every page is right there in your web browser, and in a highly accessible form at that. Why waste time doing things manually when you can automate?! It’s exactly the reason projects like Greasemonkey exist.

Here’s a list for you to ponder:

  • Every component of a web page is accessible with JavaScript (ignoring a few cross-domain nuances with iframes).
  • You don’t just get access to the raw data of a page; you get a highly organised hierarchy (the DOM).
  • The page is on your computer, so you can do whatever you like to it. No-one can stop you (except maybe with some preemptively written defensive JavaScript, but that’s a bit silly).

As a crude example, let’s say you wanted to open every link on a page in a new window/tab. You could achieve that like this:

// Get all the anchor (link) elements on the page
var links = document.getElementsByTagName('a');

// Loop through the collection of links, opening each one
for( var i=0, c=links.length; i<c; i++){
    window.open(links[i].href);
}

Note that any half-decent pop-up blocker will stop this working (you can easily disable your pop-up blocker if you really want to try it out).
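If the pop-up blocker gets in the way, a gentler variation is to print the link URLs rather than open them. The following is only a sketch: `listLinkUrls` is a made-up helper name, and it takes the document as a parameter purely so it can be tried against any document-like object.

```javascript
// Hypothetical helper: return the unique, non-empty href values of
// every anchor element found in `doc`.
function listLinkUrls(doc) {
    var links = doc.getElementsByTagName('a');
    var seen = Object.create(null); // URLs we have already collected
    var urls = [];

    for (var i = 0; i < links.length; i++) {
        var href = links[i].href;
        // Skip anchors without an href and URLs we have seen before
        if (href && !seen[href]) {
            seen[href] = true;
            urls.push(href);
        }
    }
    return urls;
}

// In the console you might run:
//   console.log(listLinkUrls(document).join('\n'));
```

Because this only reads from the DOM, it can be run on any page without side effects.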

Mining Bebo with JavaScript

In one of my furious efforts to digitally archive everything relevant within arm’s reach, I decided it would be a good idea to save a copy of some old conversations I’d had on Bebo. To save myself the effort of manually retrieving pages, I wrote an email to Bebo’s support asking for a dump of all comments to and from my account, and a few hours later I received this rather helpful response:

Dear Bebo User:

Thank you for your feedback. We always enjoy hearing from our users.

Jordan

I definitely wouldn’t have done what I did next for the likes of Facebook or Twitter (not that it would be necessary with their public APIs), but Bebo wasn’t exactly rocketing through the inter-sky and I didn’t want to lose two years of conversation with my partner because someone pulled the plug on a failing social network. I wrote back kindly pointing out that Jordan hadn’t in fact attempted to answer my question at all, and soon received this curt reply:

That is not a service that we offer.

With the easy option off the table, I set out to harvest my comment wall from Bebo, and I did. Very successfully. With fewer than 100 lines of JavaScript, I turned pages and pages of old Bebo comments into 697 rows in a MySQL table on my server in about 5 minutes flat.

The harvesting code

I used an iframe as a viewport and ran snippets of JavaScript in the outer window to retain state across page changes. Once a small loop had loaded every page and scraped its comments, I packaged everything up in an XMLHttpRequest POST request and fired it off to a script on my local server for database insertion. The key component in this operation was the iframe:

document.body.innerHTML = '<iframe width="100%" height="1000px" src="'+window.location+'"></iframe>';

The browser allowed me to access the contents of this iframe with JavaScript because I created it inside a page on Bebo’s own domain, and its content was also from Bebo’s domain. This line of code essentially moves the current page into an iframe on its own page, retaining the context of the original page as far as the browser is concerned (Inception-frame?). After this trick, the rest of the code I ran through the console was fairly straightforward:

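// Get all comment elements
var comment_elements = window.frames[0].document.getElementById('comment-list').getElementsByTagName('LI');

// Note: `comments` is assumed to already exist as an array in the outer
// window (e.g. created once with `var comments = [];`), so that results
// accumulate across page changes.
for (var i=0; i<comment_elements.length; i+=10) {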
    var el = comment_elements[i],
        comment = {};

    // Ignore a specific type of comment
    try {
        comment.from = el.childNodes[3].innerHTML;
        if(comment.from.search(/^</) != -1)
            comment.from = el.childNodes[5].innerHTML;
    } catch(e) {
        console.log("Ignored Moble");
    }

    comment.date = el.getElementsByClassName('comment-timestamp')[0].innerHTML;
    comment.text = el.getElementsByClassName('comment-text')[0].innerHTML;

    if(comment.from == "Steph" || comment.from == "Stephen"){
        comments.push(comment);
    }
}
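The loop that stepped the iframe through each page isn’t shown above. As a rough sketch of how such a loop might look (not the code I actually ran), the hypothetical `scrapePages` helper below loads a list of URLs into an iframe one at a time and runs a scraping function after each load. The function name, its parameters and the idea of having a ready-made list of page URLs are all assumptions made for illustration.

```javascript
// Hypothetical helper: load each URL in `urls` into `frame` (an iframe
// element) in turn. After each page loads, call scrape(frameDocument),
// which should return an array of items. When every page has been
// visited, call done(allItems).
function scrapePages(frame, urls, scrape, done) {
    var results = [];
    var index = 0;

    function next() {
        if (index >= urls.length) {
            done(results);
            return;
        }
        // Wait for the page to finish loading before touching its DOM
        frame.onload = function () {
            results = results.concat(scrape(frame.contentDocument));
            index++;
            next();
        };
        // Setting src navigates the iframe and eventually fires onload
        frame.src = urls[index];
    }

    next();
}
```

Because `results` lives in the outer window’s script rather than in the iframe, it survives each navigation. This only works while the iframe’s pages share an origin with the outer page; otherwise the browser refuses access to `contentDocument`.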

Once I had the data I wanted, sending cross-domain requests was a breeze. While I could have added a few headers to my server’s response to stop the browser complaining about and ‘blocking’ cross-domain requests, I didn’t need to do anything more than send the requests. The browser complained away, but because it had to send each request before it could read the cross-domain permission headers on the response, my server happily digested the incoming data, and I didn’t care about the discarded response.
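For illustration, a fire-and-forget POST along those lines could look something like the sketch below. This is not the original code: `sendComments`, the endpoint URL in the usage comment and the choice of a JSON body are all assumptions. The constructor parameter exists only so the function can be exercised outside a browser; in the console you would leave it out.

```javascript
// Hypothetical helper: POST `comments` as JSON text to `endpoint` and
// ignore whatever comes back.
function sendComments(comments, endpoint, XhrCtor) {
    // Falls back to the browser's XMLHttpRequest when no constructor is given
    var xhr = new (XhrCtor || XMLHttpRequest)();
    xhr.open('POST', endpoint, true);

    // A text/plain body keeps this a "simple" cross-origin request, which
    // the browser sends without a preflight check. The browser still
    // refuses to expose the response, but the server has received the data.
    xhr.setRequestHeader('Content-Type', 'text/plain;charset=UTF-8');
    xhr.send(JSON.stringify(comments));
    return xhr;
}

// In the console (the endpoint URL is made up):
//   sendComments(comments, 'http://localhost/save-comments.php');
```

Had the request used a content type such as `application/json`, the browser would first have sent a preflight OPTIONS request, and without the right response headers the POST itself would never have left the browser.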

While I could have gathered this data using any old language by downloading the HTML pages, the requirement for a logged-in session to view the pages I wanted was a whole other can of worms. In the end, running JavaScript in the context of an authenticated session in a browser made life a lot easier.

Update, August 2013: My concerns about losing this data were well founded: the old Bebo is now inaccessible. Apparently some data will be available in the future after Bebo gets refurbished, but I’m still glad I got this stuff off when I did.

And with that…

I don’t think that everyone should have JavaScript under their belt, but it has certainly been invaluable to me. If you’re interested in learning JavaScript, the Mozilla Developer Network is an excellent resource, and you could always pop open the JavaScript console in your browser.