James Strachan's Weblog
Ramblings on Open Source, Java, Groovy, XML and other geeky malarkey
        

web services and databases

This document contains a few random thoughts on the worlds of web services, messaging and persistence.

What is a web service?

When I use the term web service (with lower case letters), I generally think of connecting distributed systems together via logical XML messaging. So it's just about sending and receiving blobs of XML from any language and platform, using any transport and any XML encoding.

Systems then become logically connected together via XML. The physical implementation can use various different technologies. Indeed the real protocol might be binary on the wire or use fixed width files etc. Transport-specific details, such as HTTP, email or JMS headers, could be included in the SOAP headers by the provider, and the application can use them if it wishes.
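To make the "logical XML, any transport" idea concrete, here is a minimal sketch. It assumes a simplified SOAP-like envelope without namespaces (not real SOAP), and the names `build_message`, `read_message`, `QueueTransport` and `DirectTransport` are all made up for illustration. The two transports are in-process stand-ins for something like a message queue and a request/response protocol.

```python
import queue
import xml.etree.ElementTree as ET


def build_message(payload, transport_headers=None):
    """Wrap a payload dict in a simplified SOAP-like envelope (no namespaces)."""
    envelope = ET.Element("Envelope")
    header = ET.SubElement(envelope, "Header")
    # Transport-specific details travel as optional header entries.
    for name, value in (transport_headers or {}).items():
        ET.SubElement(header, "TransportHeader", name=name).text = value
    body = ET.SubElement(envelope, "Body")
    for name, value in payload.items():
        ET.SubElement(body, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def read_message(raw):
    """Return (payload, transport_headers) parsed from the envelope bytes."""
    envelope = ET.fromstring(raw)
    headers = {h.get("name"): h.text for h in envelope.find("Header")}
    payload = {child.tag: child.text for child in envelope.find("Body")}
    return payload, headers


class QueueTransport:
    """Hypothetical stand-in for a queue-based (MOM-style) transport."""

    def __init__(self):
        self._queue = queue.Queue()

    def send(self, raw):
        self._queue.put(raw)

    def receive(self):
        return self._queue.get_nowait()


class DirectTransport:
    """Hypothetical stand-in for a request/response transport."""

    def __init__(self):
        self._last = None

    def send(self, raw):
        self._last = raw

    def receive(self):
        return self._last


def round_trip(transport, payload, headers):
    """Send a message over the given transport and parse what arrives."""
    transport.send(build_message(payload, headers))
    return read_message(transport.receive())
```

The application only ever sees the payload and, if it cares, the headers; swapping `QueueTransport` for `DirectTransport` changes nothing on the receiving side. Note that values come back as text, since the XML carries no type information in this sketch.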

Applications may expect different XML encodings (SOAP 1.1, SOAP 1.2 with attachments), though ultimately bridges and adapters should make this not much of an issue. A web service may or may not have a WSDL definition to declare metadata about the service. Web services can be used one way or as RPC, can use synchronous or asynchronous communication and can travel over any network (HTTP, email, JMS, MOM etc). I think the term Web Services (upper case) typically refers to the subset of services that use SOAP encoding and have a WSDL description, which will probably become the norm. Though ultimately we'll probably have tools to make most legacy systems look like Web Services, whether they were developed with SOAP/WSDL or not.

So for me, web services are like a universal messaging bus that can span the internet, intranets and enterprise MOM systems, connecting systems together in a common way.

What about databases?

For too long we've been stuck in a rut with databases. We often end up creating a (network) connection with a database, sending commands, getting results back, iterating over the results, then sending more commands. Even simple stuff often needs multiple queries just because databases normalise the data; the data we actually need is typically spread across several tables. Even if there's a view or stored procedure that builds a blob in the right way, we often then want to navigate through the data to related fields.

For example, think of a stock portfolio service. I might query one database to get a list of the user's positions. Then another to get prices. Then another to get the stock details. Then another to get the user's favourite way to display the data. Then we aggregate all this stuff together into one big blob of data.

So rather than doing lots of queries, what an application really wants to do is say 'get me my blob in one fell swoop'.
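As a rough sketch of what "one fell swoop" could look like for the portfolio example, the code below hides the lookups behind a single service operation that returns one XML blob. Everything here is an assumption made for illustration: the table and column names, the sample rows, the ticker symbols and the `get_portfolio` function are invented, and a single in-memory SQLite database stands in for what might really be several separate databases.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema; a real system might spread this over several databases.
SCHEMA = """
CREATE TABLE positions (username TEXT, symbol TEXT, quantity INTEGER);
CREATE TABLE prices (symbol TEXT PRIMARY KEY, price REAL);
CREATE TABLE stocks (symbol TEXT PRIMARY KEY, name TEXT);
CREATE TABLE preferences (username TEXT PRIMARY KEY, display TEXT);
"""


def make_demo_database():
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO positions VALUES (?, ?, ?)",
                     [("james", "ACME", 10), ("james", "GLBX", 200)])
    conn.executemany("INSERT INTO prices VALUES (?, ?)",
                     [("ACME", 95.5), ("GLBX", 6.0)])
    conn.executemany("INSERT INTO stocks VALUES (?, ?)",
                     [("ACME", "Acme Corp"), ("GLBX", "Globex Inc")])
    conn.execute("INSERT INTO preferences VALUES (?, ?)", ("james", "table"))
    return conn


def get_portfolio(conn, username):
    """Service operation: return the user's whole portfolio as one XML blob."""
    row = conn.execute(
        "SELECT display FROM preferences WHERE username = ?",
        (username,)).fetchone()
    root = ET.Element("portfolio", user=username,
                      display=row[0] if row else "default")
    # The joins happen behind the service; the caller never sees SQL.
    rows = conn.execute(
        """SELECT p.symbol, s.name, p.quantity, pr.price
           FROM positions p
           JOIN stocks s ON s.symbol = p.symbol
           JOIN prices pr ON pr.symbol = p.symbol
           WHERE p.username = ?
           ORDER BY p.symbol""", (username,))
    for symbol, name, quantity, price in rows:
        ET.SubElement(root, "position", symbol=symbol, name=name,
                      quantity=str(quantity),
                      value=f"{quantity * price:.2f}")
    return ET.tostring(root, encoding="unicode")
```

A caller would do something like `get_portfolio(make_demo_database(), "james")` and receive positions, names, values and the display preference in one response, with no result sets to iterate over and no follow-up queries.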

Let's imagine a wacky idea. Let's imagine that we wrap all access to data in web services. So no more JDBC or SQL in the application tier (obviously they could be used to implement the web service, though that's up to the web service). Immediately we get some benefits.

  • We can access data in one fell swoop, even if we're asking for complex data like a user's stock portfolio. This is the optimal networking pattern: no more multiple RPCs via JDBC connections or RMI with EntityBeans.
  • It's very easy to put a cache in front (e.g. a web cache in front of HTTP) so that we take a huge load off the database.
  • These web services now become very reusable across your enterprise; they're not tied to a certain language (Java), a certain JDK/J2EE spec or a certain app server. Use the data from XSLT, from Java, from .NET, from MS Office, OpenOffice, Mozilla etc.
  • You can, if you so desire, allow web services to be available on the internet or to your partners over a VPN.
  • Web services are much easier to reuse than other technologies like JDBC, EntityBeans etc. You can just use your web browser to view the data and perform queries.
  • We keep SQL and JDBC out of the application code.
  • We perform coarse-grained communication with the database rather than fine-grained (no need to iterate through result sets doing new queries per result in a client-server way).
  • We give the database much more room for performance optimization. The implementor of the web service could be, say, a DBA expert who knows just how to fetch the whole blob of data in one go. Indeed the whole SQL to fetch the blob could well just run inside the database itself.
  • The same approach can be used to fetch any data, whether it's stored in a database or a file, whether it's dynamically generated, whether it's transient (like a realtime stock price), whether it's a Google search etc.
  • We can perform data aggregation to enrich our data with information from the web. Gone are the days of all data being in one single database; now data is spread over the entire net. Using a database-centric approach is no longer valid.
  • We avoid database lock-in. Every database is full of proprietary extensions. You really cannot just switch databases without some wrapping layer. The JDBC/SQL layer is not a good dependency point in your application; it needs to be something else.
  • Designing any kind of database is a massive tradeoff between normalisation, reducing contention and redundant data, avoiding deadlocks, optimising readers and writers, laying stuff out on disks etc. So rather than the database designer having to guess a one-size-fits-all design, we can let the web service contracts dictate how data should be stored. So it's moving one level up the stack.
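The caching benefit in the list above is easy to sketch. Because a coarse-grained service returns a whole response for a simple request key, the response can be cached as a unit. This is a minimal in-process illustration only, assuming a time-to-live policy; `CachingFront` and `CountingBackend` are invented names, and a real deployment would more likely use an off-the-shelf web cache in front of HTTP.

```python
import time


class CachingFront:
    """Cache whole responses in front of a coarse-grained service call."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch        # the real (expensive) service operation
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # request key -> (expiry time, response)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # served from cache, backend untouched
        response = self._fetch(key)
        self._entries[key] = (now + self._ttl, response)
        return response


class CountingBackend:
    """Hypothetical backend that records how often it is really hit."""

    def __init__(self):
        self.calls = 0

    def fetch_portfolio(self, user):
        self.calls += 1
        return f"<portfolio user='{user}'/>"
```

For example, wrapping `CountingBackend().fetch_portfolio` in a `CachingFront` and calling `get("james")` twice within the time-to-live hits the backend only once. This kind of whole-response caching is much harder to bolt on when the client is issuing many fine-grained queries and iterating over result sets.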

Ultimately we shouldn't really care how stuff gets stored. Be it 1 table or 10. Be it in one place or replicated. Be it a file in a file system or a large expensive commercial database. I may write to one database but that data gets replicated all over the place into many different persistent forms. I may use many different forms of database, from relational to XML to file system. I may store XML as a blob of text and create a bunch of XPath indexes; or I may also suck some data out of the XML into some custom, optimised relational schemas.
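One of the storage options mentioned here, keeping XML as a blob of text with XPath indexes beside it, might look roughly like the sketch below. It is a toy under stated assumptions: `XmlStore` is an invented class, it relies on the limited XPath subset supported by Python's `xml.etree.ElementTree`, it only indexes element text, and it does not handle updates or deletes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class XmlStore:
    """Keep each document as raw XML text plus lookups for chosen paths."""

    def __init__(self, indexed_paths):
        self._documents = {}                       # doc id -> XML text
        self._indexed_paths = list(indexed_paths)
        # path -> element text -> set of doc ids containing that value
        self._index = {p: defaultdict(set) for p in self._indexed_paths}

    def put(self, doc_id, xml_text):
        """Store the blob as-is and index the text found at each path."""
        root = ET.fromstring(xml_text)
        self._documents[doc_id] = xml_text
        for path in self._indexed_paths:
            for node in root.findall(path):
                if node.text is not None:
                    self._index[path][node.text.strip()].add(doc_id)

    def find(self, path, value):
        """Return the XML text of documents where `path` matches `value`."""
        ids = sorted(self._index[path].get(value, ()))
        return [self._documents[i] for i in ids]
```

A store created with `XmlStore(["./position/symbol"])` can then answer "which portfolios hold this symbol?" from the index without reparsing every blob. The point is that callers of the service never need to know whether this, a relational schema or a plain file sits underneath.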

All that really matters is how we fetch and update the data. Maybe web services are that mechanism.



© Copyright 2007 James Strachan.
Last update: 17/4/07; 11:11:33.