Monday 25 June 2012

Advanced Search on GitHub

Today I was cruising the MongoDB Java driver GitHub repo. I was interested in the implementation of the eval() method, as I wanted to ensure I cater for all returned types within mongometer.

Seemed simple enough, I thought.

I went straight to DB.java and saw that we're calling command() and extracting an object keyed by retval. Interesting to see retval, a potentially project-wide constant, defined as a String rather than as an Enum. Anyhoo, this isn't a critique of the driver code, so I'll park that for now; I just wanted to find out what could possibly be returned by eval().
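For what it's worth, here is a minimal sketch of what an enum-based alternative to a String key could look like. CommandField, key() and get() are hypothetical names used only for illustration; they are not the driver's API, and the command reply is modelled as a plain Map rather than the driver's own result type.

```java
// Hypothetical sketch: CommandField, key() and get() are illustrative names,
// not part of the MongoDB Java driver.
import java.util.HashMap;
import java.util.Map;

public class CommandFieldDemo {

    // One place for the well-known field names found in a command reply.
    enum CommandField {
        RETVAL("retval"),
        OK("ok"),
        ERRMSG("errmsg");

        private final String key;

        CommandField(String key) {
            this.key = key;
        }

        String key() {
            return key;
        }
    }

    // Looks a field up in a reply, modelled here as a plain Map.
    static Object get(Map<String, Object> reply, CommandField field) {
        return reply.get(field.key());
    }

    public static void main(String[] args) {
        Map<String, Object> reply = new HashMap<>();
        reply.put("retval", 3.0);
        reply.put("ok", 1.0);
        System.out.println(get(reply, CommandField.RETVAL)); // prints 3.0
    }
}
```

The compiler then catches a mistyped field name, which a bare "retval" string literal scattered across the code base would not.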

An easy way to do this is to fetch the branch and search it locally. But would I really want to do this for every single project that I ever want to cruise? No way, Pedro! So, let's use the online GitHub Search.
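For comparison, the local approach is only a few lines of Java once the branch has been fetched (for example with `git clone https://github.com/mongodb/mongo-java-driver.git`). This is a minimal sketch, assuming the source files are UTF-8; LocalGrep and search() are hypothetical names. Because it reads the working tree, it only searches the checked-out version of each file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalGrep {

    // Returns "relative/path:lineNumber: text" for every line of every *.java
    // file under root that contains term, in sorted path order.
    static List<String> search(Path root, String term) throws IOException {
        List<Path> javaFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            javaFiles = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : javaFiles) {
            // Assumes UTF-8 source; other encodings would need a different Charset.
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                if (lines.get(i).contains(term)) {
                    hits.add(root.relativize(file) + ":" + (i + 1) + ": " + lines.get(i).trim());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LocalGrep <checkout-dir> <term>
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String term = args.length > 1 ? args[1] : "retval";
        for (String hit : search(root, term)) {
            System.out.println(hit);
        }
    }
}
```

Running `java LocalGrep mongo-java-driver retval` against a checkout would list each match once, with its line number, which is exactly the count the online search can't give.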

    A good place to start: https://github.com/search
    Advanced Search: retval repo:mongodb/mongo-java-driver
    Search for: Code
    Search Language: Java

That all seems sane enough. Right?



Wow! That was unexpected. Rather than results limited to Java files, I've been returned a list of files that contain the term java. Let's have a quick look at the querystring.



It seems to be searching for Java, so let's swap out Java for retval, our actual search term.



Now you get the results for retval. We have an unknown number of matches for retval from within the Java driver code base, but it seems we have been returned results for every version of each file that the search term is found in. Let's park that and come back to it later.



You get the same results when you completely remove the language from the querystring. Let's remove it and leave it off, as otherwise the search reverts to using Java as the search term.



It might not seem like it, but we're getting somewhere. Notice there is a repo parameter on the querystring. Let's pull the repo:mongodb/mongo-java-driver out of the q term and stick it in the repo parameter.
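To make the change concrete, here is a small sketch that builds both forms of the URL with URLEncoder. The q and repo parameter names come from the querystring described above and the base URL is the search page used earlier; whether GitHub's search actually honours a separate repo parameter is an assumption, and SearchUrl and build() are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class SearchUrl {

    // Percent-encodes each name and value and joins them into base?name=value&...
    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return query.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Before: the repo filter is embedded in the q term.
        Map<String, String> inQ = new LinkedHashMap<>();
        inQ.put("q", "retval repo:mongodb/mongo-java-driver");
        // prints https://github.com/search?q=retval+repo%3Amongodb%2Fmongo-java-driver
        System.out.println(build("https://github.com/search", inQ));

        // After: q holds only the search term; the repo has its own parameter.
        Map<String, String> separate = new LinkedHashMap<>();
        separate.put("q", "retval");
        separate.put("repo", "mongodb/mongo-java-driver");
        // prints https://github.com/search?q=retval&repo=mongodb%2Fmongo-java-driver
        System.out.println(build("https://github.com/search", separate));
    }
}
```

A LinkedHashMap keeps the parameters in insertion order, so the generated URL is easy to compare with the one in the browser's address bar.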



Now on the search form we have a separate input field where you can specify the repo.



So, let's try limiting it to a single version of each file in the repo. Hmmm, not sure how to do this. Anyone got any ideas? I must be missing something, as I'd have thought search was fundamental to any website these days. Everything I try results in the same error message:

Invalid search query. Try quoting it.

All I want to do is search files for a given string, without having to fetch the entire repo.

I'd look through the github.com repo to investigate further, but I don't seem to be able to find github on github.

To be continued...

10 comments:

  1. When I looked at one of my private repos I saw that there was a separate search box just for that one repo, which you don't get on public ones.

    Looking at the URL you get from doing a search there, I came up with this for you:

    https://github.com/mongodb/mongo-java-driver/search?q=retval&choice=code&langOverride=&start=

    I'd never actually tried to search code before so never noticed how bad it was!

    1. Thanks @wibblymat, but unfortunately that doesn't take me any further, I'm afraid. It still leaves me with the results for every version of the file that the search term is found in, as per the 2nd and 3rd result screenshots above. I can't even tell if it has found all occurrences of the search term.

      It's rather frustrating, but I'm sure someone has the secret to GitHub searching (without having to fetch the source).

  2. I emailed them about this 3-4 months ago.

    They replied:

    From: Petros Amiridis (GitHub Staff)
    Subject: [Contact] search

    Hi,

    I am afraid search has various problems including this one. We are working on fixing it.

    Petros

    1. Awesome. Thanks @Petros. We should expect a fix any day now then.

      Strange that I can't find github.com anywhere on github. I'd be glad to go through some of the source and help out (if possible).

    2. GitHub isn't open source itself.

    3. Hi Unknown.

      That is a shame. Perhaps if it were, they'd get some support and help from its community.

  3. Wow, I've never even gotten as far as you; I've never gotten the "repo:" limit to actually _work_. But I hadn't noticed the separate repo query param, so I'll try doing it like that.

    Every time I've reported my assorted github search problems (repo limit, others) to github support, the response I get back is what Petros reported.

    I would not expect a fix 'any day now'; I've been getting that response for a year or two now, I'm afraid.

    I agree that limiting to the latest version of each file (or, hmm, limiting to any specific arbitrary commit or tag?) would be awfully useful. I don't think there's a way to do it.

    1. It's not too encouraging that you've reported it and still nothing has been done )-:

      I've only spent 10 mins hacking it (it actually took longer to write the blog...) So, when I get a bit of spare time, I'll give it a real go.

      Glad to see that I wasn't being an idiot, and that others are having the same problems.

      You'd have thought that, with it being reported to GitHub, they would attempt to resolve it.

  4. Try this:

    https://github.com/skratchdot/github-code-search.user.js

    1. Hi @xp1,

      I gave this a try, but unfortunately it doesn't take me any further, I'm afraid. It still leaves me with the results for every version of the file that the search term is found in, as per the 2nd and 3rd result screenshots above. I can't even tell if it has found all occurrences of the search term.

      Jp
