At work I was given the task of implementing a basic device profiler service to
classify incoming HTTP requests into a certain set of groups (desktop,
tablet, mobile, etc.) using the User-Agent header. Such a classification opens
up a multitude of new dimensions at both the client and the server side for
interface and content customizations tailored to the device. For instance, you
can disable some of your fancy gestures (e.g., mouseover events) that will not
be properly used on a touch screen. Or you might want to prioritize console
games for a client who uses a PlayStation to browse your shop.
I first investigated and, where possible, evaluated the existing solutions in the wild, including the commercial ones. I eventually decided to go with Apache DeviceMap, which employs OpenDDR in the background. In this blog post, I try to summarize the experience I collected throughout this pursuit.
While Section 5.5.3 of the HTTP/1.1: Semantics and Content specification has a lot to say about the formatting of the User-Agent header, one thing is for sure: there are no restrictive rules that force the header into a machine-readable format. Consider the following examples:
I can even show you fancier ones:
And some annoying ones:
So drawing a conclusion on which part denotes the product name, version, etc. is a fuzzy, tedious, and error-prone task. That being said, there is another thing we can do here! Every User-Agent is more or less unique to the device that the software runs on. Hence, if we can come up with a database that maps User-Agent strings to devices, we can employ this database to find the device behind a certain User-Agent. This is where the term Device Description Repository (DDR) kicks in:
The Device Description Repository is a concept proposed by the Mobile Web Initiative Device Description Working Group (DDWG) of the World Wide Web Consortium. The DDR is supported by a standard interface and an initial core vocabulary of device properties. Implementations of the proposed repository are expected to contain information about Web-enabled devices (particularly mobile devices).
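To make the idea of such a database concrete, here is a minimal sketch of a repository that maps User-Agent prefixes to device attributes and picks the longest matching prefix. The class name `ToyDeviceRepository`, its methods, and the sample entries are all made up for illustration; real DDR implementations use far more elaborate matching.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy repository, not part of any real DDR implementation.
class ToyDeviceRepository {

    // User-Agent prefix -> attributes of the device registered under that prefix.
    private final Map<String, Map<String, String>> devicesByPrefix = new LinkedHashMap<>();

    void register(String userAgentPrefix, Map<String, String> attributes) {
        devicesByPrefix.put(userAgentPrefix, attributes);
    }

    // Returns the attributes registered under the longest prefix of the
    // given User-Agent, or an empty Optional if no prefix matches.
    Optional<Map<String, String>> lookup(String userAgent) {
        String bestPrefix = null;
        for (String prefix : devicesByPrefix.keySet()) {
            boolean longer = bestPrefix == null || prefix.length() > bestPrefix.length();
            if (userAgent.startsWith(prefix) && longer) {
                bestPrefix = prefix;
            }
        }
        return Optional.ofNullable(bestPrefix).map(devicesByPrefix::get);
    }

    public static void main(String[] args) {
        ToyDeviceRepository repository = new ToyDeviceRepository();
        // Illustrative entries only.
        repository.register("Nokia7110/1.0", Map.of("model", "7110"));
        repository.register("Nokia7110/1.0 (05", Map.of("model", "7110", "table_support", "true"));
        System.out.println(repository.lookup("Nokia7110/1.0 (05.01)"));
        System.out.println(repository.lookup("UnknownBrowser/1.0"));
    }
}
```

Longest-prefix matching is only one possible strategy, chosen here because it keeps the sketch short.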
WURFL (Wireless Universal Resource FiLe) was the first community effort focused on mobile device detection and dates back to 2007. While WURFL was initially released under an “open source / public domain” license, in June 2011 the project’s founders formed ScientiaMobile to provide commercial mobile device detection support and services using WURFL. As of now, the ScientiaMobile WURFL APIs are licensed under a dual-license model, using the AGPL license for non-commercial use and a proprietary commercial license. In a world dominated by capitalism, the current version of the WURFL database itself is no longer open source. Inspired by WURFL and motivated by the gap in the market, it did not take long for alternative companies to emerge, including, but not limited to, DeviceAtlas, Handset Detection, and 51degrees.
So how far can one go using a DDR to detect the properties of a device by just looking at the User-Agent header? Below is a sample output that I collected from 51degrees:
As you can see, what they can get by just looking at your User-Agent header is (to put it mildly) a lot!
As usual, the community’s response did not take long: the most recent open source version of WURFL (dating back to 2011) was forked as the OpenDDR project. Since then, individual contributors have kept the database up to date.
While the OpenDDR file format allows a hierarchical device representation as in WURFL, in practice it maps each device to a set of attributes explicitly. For comparison, see how WURFL takes advantage of its hierarchical device representation:
<device user_agent="Nokia7110/1.0 (04" fall_back="nokia_generic" id="nokia_7110_ver1">
    <!-- ... -->
    <group id="ui">
        <!-- ... -->
        <capability name="table_support" value="false" />
    </group>
</device>
<device user_agent="Nokia7110/1.0 (04.67)" fall_back="nokia_7110_ver1" id="nokia_7110_ver1_sub467" />
<device user_agent="Nokia7110/1.0 (04.69)" fall_back="nokia_7110_ver1" id="nokia_7110_ver1_sub469" />
<!-- ... -->
<device user_agent="Nokia7110/1.0 (04.94)" fall_back="nokia_7110_ver1" id="nokia_7110_ver1_sub494" />

<!-- 7110 new-age -->
<device user_agent="Nokia7110/1.0 (05" fall_back="nokia_7110_ver1" id="nokia_7110_ver2">
    <group id="ui">
        <capability name="table_support" value="true" />
    </group>
</device>
<device user_agent="Nokia7110/1.0 (05.00)" fall_back="nokia_7110_ver2" id="nokia_7110_ver1_sub500" />
<device user_agent="Nokia7110/1.0 (05.01)" fall_back="nokia_7110_ver2" id="nokia_7110_ver1_sub501" />
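The fall_back attribute implies that a capability missing on a device is looked up on its parent, and so on up the chain. A rough sketch of that resolution, assuming the devices have already been parsed into memory, could look as follows. `FallBackResolver` and its methods are hypothetical names of mine and are not part of the WURFL API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of WURFL-style devices with fall_back inheritance.
class FallBackResolver {

    // Device id -> id of the device it falls back to (absent for a root device).
    private final Map<String, String> fallBacks = new HashMap<>();

    // Device id -> capabilities declared directly on that device.
    private final Map<String, Map<String, String>> declaredCapabilities = new HashMap<>();

    void addDevice(String id, String fallBack, Map<String, String> declared) {
        if (fallBack != null) {
            fallBacks.put(id, fallBack);
        }
        declaredCapabilities.put(id, declared);
    }

    // Walks the fall_back chain until some device declares the capability.
    // Returns null if no device in the chain declares it.
    String resolve(String deviceId, String capability) {
        String current = deviceId;
        // The hop counter guards against accidental cycles in the data.
        for (int hops = 0; current != null && hops <= declaredCapabilities.size(); hops++) {
            Map<String, String> declared = declaredCapabilities.get(current);
            if (declared != null && declared.containsKey(capability)) {
                return declared.get(capability);
            }
            current = fallBacks.get(current);
        }
        return null;
    }

    public static void main(String[] args) {
        FallBackResolver resolver = new FallBackResolver();
        resolver.addDevice("nokia_7110_ver1", "nokia_generic", Map.of("table_support", "false"));
        resolver.addDevice("nokia_7110_ver1_sub467", "nokia_7110_ver1", Map.of());
        resolver.addDevice("nokia_7110_ver2", "nokia_7110_ver1", Map.of("table_support", "true"));
        resolver.addDevice("nokia_7110_ver1_sub501", "nokia_7110_ver2", Map.of());
        System.out.println(resolver.resolve("nokia_7110_ver1_sub467", "table_support"));
        System.out.println(resolver.resolve("nokia_7110_ver1_sub501", "table_support"));
    }
}
```

With this scheme each firmware revision needs only a one-line entry, and an override such as table_support on nokia_7110_ver2 automatically applies to every device that falls back to it.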
On the other hand, OpenDDR follows a flatter model:
<device id="SAMSUNG-SGH-i780" parentId="genericSamsung">
    <property name="model" value="SGH-i780"/>
    <property name="displayWidth" value="320"/>
    <property name="displayHeight" value="320"/>
    <property name="mobile_browser" value="Microsoft Mobile Explorer"/>
    <property name="mobile_browser_version" value="7.7"/>
    <property name="device_os" value="Windows Mobile OS"/>
    <!-- ... -->
</device>
<device id="sholest" parentId="genericMotorola">
    <property name="model" value="XT701"/>
    <property name="marketing_name" value="Sholes Tablet"/>
    <property name="displayWidth" value="480"/>
    <!-- ... -->
    <property name="ajax_support_getelementbyid" value="true"/>
    <property name="ajax_support_inner_html" value="true"/>
    <property name="ajax_manipulate_dom" value="true"/>
    <property name="ajax_manipulate_css" value="true"/>
    <property name="ajax_support_events" value="true"/>
    <property name="ajax_support_event_listener" value="true"/>
    <property name="image_inlining" value="true"/>
    <property name="from" value="open_db_modified"/>
</device>
<device id="bravo" parentId="genericHTC">
    <property name="model" value="A8183"/>
    <property name="marketing_name" value="Bravo"/>
    <property name="displayWidth" value="480"/>
    <!-- ... -->
    <property name="ajax_manipulate_css" value="true"/>
    <property name="ajax_support_events" value="true"/>
    <property name="ajax_support_event_listener" value="true"/>
    <property name="image_inlining" value="true"/>
    <property name="from" value="open_db_modified"/>
</device>
While a DDR gives you an almost exhaustive set of device vendors, models, and
attributes, it does not provide a mechanism to search through this swamp. Apache
DeviceMap (which is graduating from incubation as of this writing) is a
project that fills this gap. DeviceMap basically ships two fundamental Maven
artifacts: an OpenDDR clone (devicemap-data) and a driver (devicemap-client)
available for Visual Basic, C#, and Java.
Using Apache DeviceMap is pretty straightforward. You first include the necessary dependencies in your POM file:
<dependency>
    <groupId>org.apache.devicemap</groupId>
    <artifactId>devicemap-data</artifactId>
    <version>1.0.2-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>org.apache.devicemap</groupId>
    <artifactId>devicemap-client</artifactId>
    <version>1.1.0-SNAPSHOT</version>
</dependency>
And then enjoy the API exposed by the driver:
import java.util.Map;

import org.apache.devicemap.DeviceMapClient;
import org.apache.devicemap.loader.LoaderOption;

// Create a DeviceMapClient instance and load the device data from the JAR.
DeviceMapClient deviceMapClient = new DeviceMapClient();
deviceMapClient.initDeviceData(LoaderOption.JAR);

// Try to classify a sample User-Agent string using the DeviceMapClient.
String desc = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) " +
        "AppleWebKit/537.36 (KHTML, like Gecko) " +
        "Chrome/38.0.2125.104 Safari/537.36";
Map<String, String> attrs = deviceMapClient.classify(desc);

// Dump found attributes.
if (attrs != null) {
    for (Map.Entry<String, String> attr : attrs.entrySet()) {
        System.out.format("%s = %s\n", attr.getKey(), attr.getValue());
    }
}
Below you can find the output of the above snippet:
As you can see, the output is not nearly as detailed as the one we got from
51degrees. Almost all of the crucial data is missing and there are some
mistakes in certain entries like display width and height. That being said, it
got three important bits right, including is_tablet. Note that there is not
much of a mechanism to verify the correctness of the attributes returned by
the employed engine. That is, the nature of the problem also implies the
absence of a verification mechanism.
If you have had a chance to check out the websites of the commercial DDR and User-Agent detection solutions, you will have noticed the giant IT leaders (Google, Facebook, PayPal, etc.) in their customer lists. Hence, it was a little bit tempting to go with a commercial solution. That being said, I also wanted to evaluate the performance of Apache DeviceMap. For that purpose, I collected a couple of months' worth of visitor data at work and tried to resolve the User-Agent headers. The results were very promising: Apache DeviceMap managed to resolve almost 90% of all the collected User-Agent strings. That being said, resolving a User-Agent (that is, matching the given User-Agent against the DDR and returning a set of attributes) does not imply the correctness of the returned attributes. To check correctness as well, I repeated the same experiment with almost two dozen different devices at work, and it succeeded for almost every one of them.
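A coverage figure like the one above can be computed with a few lines of code. The sketch below is not the code I ran; it assumes that a User-Agent counts as resolved when the classifier returns a non-null, non-empty attribute map, and it takes the classifier as a plain function so that something like `deviceMapClient::classify` could be plugged in. `CoverageReport` and `coverage` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for measuring how many User-Agent strings get resolved.
class CoverageReport {

    // Returns the fraction (0.0 to 1.0) of User-Agent strings for which the
    // classifier yields at least one attribute. Assumption: a null or empty
    // map means "not resolved".
    static double coverage(List<String> userAgents,
                           Function<String, Map<String, String>> classifier) {
        if (userAgents.isEmpty()) {
            return 0.0;
        }
        long resolved = userAgents.stream()
                .map(classifier)
                .filter(attrs -> attrs != null && !attrs.isEmpty())
                .count();
        return (double) resolved / userAgents.size();
    }

    public static void main(String[] args) {
        // Stand-in classifier for demonstration purposes only.
        Function<String, Map<String, String>> fake =
                ua -> ua.startsWith("Mozilla") ? Map.of("is_desktop", "true") : null;
        List<String> sample = List.of("Mozilla/5.0 (X11)", "Mozilla/4.0", "curl/7.0", "Mozilla/5.0 (iPad)");
        System.out.format("coverage = %.2f%n", coverage(sample, fake));
    }
}
```

Note that this measures only whether a match was found, not whether the returned attributes are correct.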
Since the project that I am working on is still in its early stages and the initial results are more than satisfactory, we decided to go with Apache DeviceMap. Further, we managed to increase its coverage up to 99% by manually adding some entries to the database. We also reported the majority of those enhancements back to the Apache DeviceMap project.