Wednesday, February 2, 2011

Using Apache httpclient through an NTLM authenticating proxy with ftp

I needed to (programmatically) retrieve a file from an FTP server out on the Internet. In this example, the URL is ftp://site.com/dir/file.txt. My computer can only access the Internet through proxies. There is an HTTP proxy called web-proxy.local, and an FTP proxy called ftp-proxy.local.

I noticed that I could retrieve the file using my browser, but not using command-line ftp. I determined that the ftp-proxy was slightly misconfigured and didn't believe that my host was a legitimate user. But how did the browser fetch the file, using that URL above? A little work with Wireshark showed that the browser makes an HTTP connection to the web proxy and passes the HTTP command:
GET ftp://site.com/dir/file.txt HTTP/1.1
When I tried to use java.net.URLConnection with this URL, it wouldn't connect to the web-proxy. That seemed reasonable: it was probably trying to connect to it with FTP. But somehow I needed to create an HTTP connection to a URL starting with ftp://.
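To make the browser's trick concrete, here is a minimal sketch of the request text that gets sent to the HTTP proxy. The class and method names (ProxyRequestSketch, buildProxyRequest) are made up for illustration, and the Host and Connection headers are my assumption of a reasonable minimum rather than a copy of what the browser actually sent. The point is only that the request line carries the complete ftp:// URL instead of a bare path.

```java
import java.net.URI;

public class ProxyRequestSketch {

    // Builds the text of an HTTP/1.1 request asking a proxy to fetch targetUrl.
    // The full URL (scheme included) goes on the request line, so the proxy
    // learns that it has to speak FTP to the origin server.
    public static String buildProxyRequest(String targetUrl) {
        URI target = URI.create(targetUrl);
        return "GET " + target.toASCIIString() + " HTTP/1.1\r\n"
                + "Host: " + target.getHost() + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        // Prints the request that would be written to the proxy socket.
        System.out.print(buildProxyRequest("ftp://site.com/dir/file.txt"));
    }
}
```

The client only ever opens an HTTP connection to the proxy; the FTP conversation with site.com is the proxy's job.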

I decided to try Apache HttpComponents HttpClient - Apache code is always great. After a little difficulty getting the right version (eventually 4.1), I found that I was getting this exception:
java.lang.IllegalStateException: Scheme 'ftp' not registered.
at org.apache.http.conn.scheme.SchemeRegistry.getScheme(SchemeRegistry.java:71)
at org.apache.http.impl.conn.DefaultHttpRoutePlanner.determineRoute(DefaultHttpRoutePlanner.java:111)
at org.apache.http.impl.client.DefaultRequestDirector.determineRoute(DefaultRequestDirector.java:710)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:356)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
...
I peered into the source and javadoc of the DefaultHttpRoutePlanner and SchemeRegistry, and discovered that I could add the 'ftp' scheme like this:
DefaultHttpClient hc = new DefaultHttpClient();

// Register a scheme so that we can ask the proxy to use ftp
Scheme ftp = new Scheme("ftp", 80, new PlainSocketFactory());
hc.getConnectionManager().getSchemeRegistry().register(ftp);
In this case, I don't think it matters what port number (80) I use, or even which type of socket factory, since the connection will be sent via the proxy anyway - the system doesn't really need to know how to create an ftp socket, or which port to use.

The next issue I had was making it work with the proxy. I had copied the example HttpClient code for using authenticating proxies, but it didn't work. Again, Wireshark helped. When the browser fetched the file, I could see the three-phase NTLM negotiation, but there was no such negotiation when my software ran. A spot of googling showed me that instead of UsernamePasswordCredentials I should use NTCredentials, which is the credentials type that HttpClient's NTLM support expects. And now it works. The final code looks like this:
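import java.io.InputStream;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.NTCredentials;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.conn.params.ConnRoutePNames;
import org.apache.http.conn.scheme.PlainSocketFactory;
import org.apache.http.conn.scheme.Scheme;
import org.apache.http.impl.client.DefaultHttpClient;

// Declared as DefaultHttpClient so that getCredentialsProvider() is available
DefaultHttpClient hc = new DefaultHttpClient();

// Register a scheme so that we can ask the proxy to use ftp
Scheme ftp = new Scheme("ftp", 80, new PlainSocketFactory());
hc.getConnectionManager().getSchemeRegistry().register(ftp);

// Set up NT(LM)Credentials for use with the proxy.
// Arguments are: user name, password, workstation, domain.
hc.getCredentialsProvider().setCredentials(AuthScope.ANY, new NTCredentials("myUsername", "myPassword", "", ""));

// Set up the proxy
HttpHost proxy = new HttpHost("web-proxy.local", 8080);
hc.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);

// Set up the URL to fetch (the URL must be a quoted string)
HttpGet hg = new HttpGet("ftp://site.com/dir/file.txt");
HttpResponse hr = hc.execute(hg);

HttpEntity entity = hr.getEntity();
InputStream instream = entity.getContent();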
...
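What happens to instream after that depends on the application. As one possibility, here is a small sketch that saves any InputStream to a file using only the JDK. It assumes Java 7 or later (for java.nio.file.Files), and the class and method names (SaveStreamSketch, saveTo) are hypothetical, not part of HttpClient. The main method feeds it an in-memory stream as a stand-in for the real response body.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveStreamSketch {

    // Copies everything from in to target, replacing any existing file,
    // and returns the number of bytes written. The stream is always closed,
    // which matters for HttpClient because closing the content stream is
    // what releases the underlying connection.
    public static long saveTo(InputStream in, Path target) throws IOException {
        try (InputStream body = in) {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent(); no network access is needed here.
        InputStream fake = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        Path out = Files.createTempFile("file", ".txt");
        System.out.println(saveTo(fake, out) + " bytes written to " + out);
    }
}
```

With the code above, the call would be something like saveTo(instream, somePath), where somePath is wherever the file should end up.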