Why XML?
Airlines began computerizing their reservations systems in the 1960s, with American Airlines' SABRE system being the first. Other airlines soon realized that this was a competitive advantage and built their own systems, and by the 1970s every airline was computerized. Of course, what this means is that the travel industry is based on 1960s technology.
In the 1970s, if you wanted to ask United's Apollo reservation system for a list of published fares between Raleigh/Durham, North Carolina, and Seattle, Washington, you'd say something like this:
$DRDUSEA
and Apollo would respond with something like this:
FARES LAST UPDATED 23JAN 11:31 AM
>$DRDUSEA
RDU-SEA DEPART 23JAN
ADULT FARES
CMP/NXC/FFY/GOV/MIL/MLC/SEN/SPL/VUS/YTH FARES ALSO EXIST
U.S. PASSENGER FACILITY CHARGES / SURCHARGES MAY APPLY
TAXES AND FEES MAY VARY DEPENDING ON THE BOOKED ITINERARY
USD FARE MIN/ XL TVL DATES TKT DATES
CX FARE BASIS AP MAX FE FIRST/LAST FIRST/LAST
1 CO 198.00R TLE7SN 07| SU/30 || -/20JUNC -/-
2 NW 198.00R KO7M1N6 07| 01/-- || -/31MARC -/-
3 DL 198.00R U14M1O49 14| 01/-- || -/06JUNC -/-
4 UA 198.00R TE7ONQ5 07| SU/-- || -/31MARC -/-
)><
In point of fact, it would have looked different for a number of reasons, principally that it's now 30 years later, but you get the idea. The important thing is that this was the state of the art, and if you were talking to Sabre, Amadeus, or Worldspan, things would work pretty much the same. The data formats are pretty terse because, back in 1960, bytes were expensive. Note, for instance, that Apollo tells you that the last effective date for that first fare on Continental is June 20, but that means the next occurrence of June 20; in other words, the meaning of a date depends on today's date. It's human readable, if the human knows a bit about airlines and how they work. I'd be willing to bet that most people who've travelled by air could pick out some meaning; pretty much anybody could tell you that the cheapest fare is $198.00. If you know what you're looking for, it's not bad to parse, except for the complete lack of metadata, the lack of any self-describing features (such as field delimiters), and the fact that this response could change at any time, without warning.
To get around that problem, the airlines started using structured data, and many in the industry built proprietary formats. United's Apollo and British Airways' Galileo systems, now under common ownership, standardized on a format built from fixed-length fields. So in the 1990s, you could make the same request that we just made above using the following request:
PQQ010001FRQ00CHRGFILDIN01142
4080000183K000000010065GFQH000F0002000000002NNNNNNNNNNNNYNNN
NNNNNNNN 0118GFFD000F00020000RDU SEA NNNNNN
NNNNNNN NN
NNNNNNNN
To which Apollo or Galileo would respond:
PRR01000101Y00DOT011425080021921K00000001000052
GFRH000F00030000NNNYYNNNNNNNNNNNNNNNNNNNNNNNNNNN0067GFMM000F
00010000000000000000090FARES LAST UPDATED 24OCT 6:42PM0099G
FMM000F00010000000000000000090 RDU-SEA DEPART 23JAN
0099GFMM000F00010000000000
000000090ADULT FARES
0099GFMM000F00010000000000000000090CMP/NXC/FFY/
GOV/MIL/MLC/SEN/VUS/YTH FARES ALSO EXIST 0099GFMM
000F00010000000000000000090 U.S. PASSENGER FACILITY CHAR
GES / SURCHARGES MAY APPLY 0099GFMM000F00010000000000000
000090 TAXES AND FEES MAY VARY DEPENDING ON THE BOOKED I
TINERARY 0097GFMM000F00010000000000000000090 USD
FARE MIN/ XL TVL DATES TKT DATES0098GFMM000F0
0010000000000000000090 CX FARE BASIS AP MAX FE
FIRST/LAST FIRST/LAST0091GFTD000F00030001NYNNNNNNNNNNNNN
N DL 172.00RUL21M1SN21|01-- || 0404C 09860091
GFTD000F00030002NYNNNNNNNNNNNNNN CO 172.00RTO211BSN21|0
1-- || 0404C 00060091GFTD000F00030003NYNNNNNNNNNNN
NNN NW 172.00RK21PRNR 21|01-- || 0404C 020900
91GFTD000F00030004NYNNNNNNNNNNNNNN AA 172.00RLHE21D1N21
|01-- || 0404C 0002
I've truncated the response to the first 4 fares. This isn't nearly as readable; even an expert would struggle to decode it. You might pick out the RDU, SEA, and UA, and you might even pick up on the inventory listing (that's the part that looks like 01Y 009B 009M...). But it's pretty difficult to read and parse visually. Mechanically, the story isn't much better. While there is metadata for this, the format is pretty unforgiving and unhelpful because it isn't self-describing; you have to know how long each field is before you can parse it. You do get a bit of help, though. The Data Record block (that's the part that starts DOT011001060203294) gives you some hints, in particular that this data record is called 1001 version 6.2 and is 3294 bytes long. The mainframe could send you 1001 version 6.3, but any fields that weren't in version 6.2 would go at the end of the record, so clients parse as far as they know how and discard the remainder of the 3294 bytes that they don't understand. The mainframe can't send you version 7, because version 7 isn't constrained by the "end of block" rule and your request was version 6.2, so the mainframe can't expect you to know about version 7.
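To make the fixed-length idea concrete, here is a minimal sketch in Python of how a client might slice the header quoted above and then parse only the fields it knows. The header layout is an assumption inferred purely from the one example string (a 5-character tag, a 4-digit record id, two 2-digit version parts, and a 5-digit length); the body layout, the field names, and the sample record are hypothetical and are not taken from any Apollo or Galileo documentation.

```python
from dataclasses import dataclass

# Assumed layout, inferred only from the example header "DOT011001060203294":
# a 5-character tag, a 4-digit record id, 2-digit major and minor versions,
# and a 5-digit record length. The real layout may differ.
HEADER_LAYOUT = [("tag", 5), ("record_id", 4), ("major", 2), ("minor", 2), ("length", 5)]

# Hypothetical body layout that a "version 6.2" client knows about.
KNOWN_BODY_LAYOUT = [("origin", 3), ("destination", 3)]


@dataclass
class RecordHeader:
    record_id: str
    major: int
    minor: int
    length: int


def slice_fields(text, layout):
    """Cut text into named fixed-width fields; return (fields, leftover text)."""
    fields = {}
    pos = 0
    for name, width in layout:
        chunk = text[pos:pos + width]
        if len(chunk) < width:
            raise ValueError(f"record too short for field {name!r}")
        fields[name] = chunk
        pos += width
    return fields, text[pos:]


def parse_header(text):
    """Parse the header and return it together with whatever follows it."""
    raw, rest = slice_fields(text, HEADER_LAYOUT)
    header = RecordHeader(
        record_id=raw["record_id"],
        major=int(raw["major"]),
        minor=int(raw["minor"]),
        length=int(raw["length"]),
    )
    return header, rest


def parse_body(header, body, client_major=6):
    """Parse the fields this client knows and hand back anything appended later."""
    if header.major != client_major:
        raise ValueError(f"unsupported major version {header.major}")
    return slice_fields(body, KNOWN_BODY_LAYOUT)


# A made-up record: the example header followed by a hypothetical body with
# two trailing characters that a newer minor version might have appended.
# The sketch does not check the declared length against the actual data.
header, body = parse_header("DOT011001060203294" + "RDUSEA" + "XX")
known, ignored = parse_body(header, body)
print(header)
print(known, "ignored:", repr(ignored))
```

Note that nothing in the data itself says where one field ends and the next begins; all of that knowledge lives in the layout tables, which is exactly the burden the fixed-length format puts on every client.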
In any case, this is better, but really not good enough. The fact is that parsing it takes a lot of code that knows the characteristics of every individual field. There's EDIFACT, which has delimited fields and a hierarchical structure, but EDIFACT turns out to be incredibly complex to compose messages in, and beastly to debug. And then there's XML. What if we took the metadata we have about our proprietary structures, and wrote the last parser anybody will ever write? What if we named each piece of data and wrapped it up in a tag?
<FareQuoteTariffDisplay_8_0>
<FareDisplayMods>
<QueryHeader>
<UniqueKey>0000</UniqueKey>
<LangNum>00</LangNum>
<Action>002</Action>
<RetCRTOutput>N</RetCRTOutput>
<NoMsg>N</NoMsg>
<NoTrunc>N</NoTrunc>
<IMInd>N</IMInd>
<FIPlus>N</FIPlus>
<PEInd>N</PEInd>
<HostUse16>N</HostUse16>
<NBInd>N</NBInd>
<ActionOnlyInd>N</ActionOnlyInd>
<TranslatePeriod>N</TranslatePeriod>
<GFYInd>N</GFYInd>
<IntFrame1>N</IntFrame1>
<SmartParsed>Y</SmartParsed>
<PDCodes>N</PDCodes>
<BkDtOverride>N</BkDtOverride>
<HostUse25>N</HostUse25>
<DefCurrency>N</DefCurrency>
<PFPWInd>N</PFPWInd>
<HostUse28>N</HostUse28>
<HostUse29>N</HostUse29>
<HostUse30>N</HostUse30>
<HostUse31>N</HostUse31>
<DefCurrencyLocInd>N</DefCurrencyLocInd>
<HostUse33>N</HostUse33>
</QueryHeader>
<TravConstraints>
<UniqueKey>0000</UniqueKey>
<StartPt>DEN</StartPt>
<EndPt>ORD</EndPt>
<OW>N</OW>
<RT>N</RT>
<LongDispInd>N</LongDispInd>
<ValidatingDispInd>N</ValidatingDispInd>
<NUCInd>N</NUCInd>
<RetDataInd>N</RetDataInd>
<RulesInd>N</RulesInd>
<BaseFares>N</BaseFares>
<ConxPts>N</ConxPts>
<IncDomTax>N</IncDomTax>
<ConvAP>N</ConvAP>
<FQSFareType>N</FQSFareType>
<HalfRT>N</HalfRT>
<CalShopReq />
<Spare1>NN</Spare1>
<StartDt> <![CDATA[ ]]> </StartDt>
<AirV1 />
<AirV2 />
<AirV3 />
<GlobDir />
<ConxPt1 />
<ConxPt2 />
<EndDt> <![CDATA[ ]]> </EndDt>
<TkDt> <![CDATA[ ]]> </TkDt>
<FareType />
<Currency />
<Pt />
<SellCurrency />
<JointFares>N</JointFares>
<RndWorld>N</RndWorld>
<CircTrip>N</CircTrip>
<Spare2>NNNNN</Spare2>
</TravConstraints>
</FareDisplayMods>
</FareQuoteTariffDisplay_8_0>
You'll get a considerable amount of data back. What we have is sort of readable, it's self-describing, and we can use off the shelf tools to parse it:
<FareQuoteTariffDisplay_8_0>
<FareInfo>
<RespHeader>
<UniqueKey>0000</UniqueKey>
<CRTOutput>N</CRTOutput>
<ErrMsg>N</ErrMsg>
<AgntAlert>N</AgntAlert>
<SmartParsedData>Y</SmartParsedData>
<Spares1>YNNN</Spares1>
<FQSOnlyItin>N</FQSOnlyItin>
<Spares2>N</Spares2>
<IFQLastF0>N</IFQLastF0>
<IFQLastFQ>N</IFQLastFQ>
<IFQLastD>N</IFQLastD>
<IFQLastB>N</IFQLastB>
<IFQLastV>N</IFQLastV>
<Spare3>N</Spare3>
<AppInd1>N</AppInd1>
<AppInd2>N</AppInd2>
<AppInd3>N</AppInd3>
<AppInd4>N</AppInd4>
<AppInd5>N</AppInd5>
<AppInd6>N</AppInd6>
<AppInd7>N</AppInd7>
<AppInd8>N</AppInd8>
<AppInd9>N</AppInd9>
<AppInd10>N</AppInd10>
<AppInd11>N</AppInd11>
<AppInd12>N</AppInd12>
<AppInd13>N</AppInd13>
<AppInd14>N</AppInd14>
<AppInd15>N</AppInd15>
<AppInd16>N</AppInd16>
</RespHeader>
<InfoMsg>
<UniqueKey>0000</UniqueKey>
<QuoteNum>0</QuoteNum>
<MsgNum>0</MsgNum>
<AppNum>0</AppNum>
<MsgType>9</MsgType>
<Lang>0</Lang>
<Text><![CDATA[FARES LAST UPDATED 24OCT 6:42PM]]></Text>
</InfoMsg>
<Tariff>
<UniqueKey>1</UniqueKey>
<Type1>N</Type1>
<Type2>Y</Type2>
<Type3>N</Type3>
<Type4>N</Type4>
<HasCitiesLine>N</HasCitiesLine>
<PermittedDisc>N</PermittedDisc>
<HasFreeForm>N</HasFreeForm>
<HasPF>N</HasPF>
<Spare1>NNNNNNNN</Spare1>
<PIC />
<Type2Qual>
<SpclCondInd />
<AirV>DL</AirV>
<Fare>172.00</Fare>
<RTInd>R</RTInd>
<FIC>UL21M1SN</FIC>
<AP>21</AP>
<APEndItem></APEndItem>
<MinStay>01</MinStay>
<MaxStay>--</MaxStay>
<DirInd />
<Pens></Pens>
<FirstTravDt />
<LastTravDt>0404</LastTravDt>
<FootnoteType>C</FootnoteType>
<FirstTkDt />
<LastTkDt />
<RteInfo>986</RteInfo>
</Type2Qual>
</Tariff>
<Tariff>
<UniqueKey>2</UniqueKey>
<Type1>N</Type1>
<Type2>Y</Type2>
<Type3>N</Type3>
<Type4>N</Type4>
<HasCitiesLine>N</HasCitiesLine>
<PermittedDisc>N</PermittedDisc>
<HasFreeForm>N</HasFreeForm>
<HasPF>N</HasPF>
<Spare1>NNNNNNNN</Spare1>
<PIC />
<Type2Qual>
<SpclCondInd />
<AirV>CO</AirV>
<Fare>172.00</Fare>
<RTInd>R</RTInd>
<FIC>TO211BSN</FIC>
<AP>21</AP>
<APEndItem></APEndItem>
<MinStay>01</MinStay>
<MaxStay>--</MaxStay>
<DirInd />
<Pens></Pens>
<FirstTravDt />
<LastTravDt>0404</LastTravDt>
<FootnoteType>C</FootnoteType>
<FirstTkDt />
<LastTkDt />
<RteInfo>6</RteInfo>
</Type2Qual>
</Tariff>
<Tariff>
<UniqueKey>3</UniqueKey>
<Type1>N</Type1>
<Type2>Y</Type2>
<Type3>N</Type3>
<Type4>N</Type4>
<HasCitiesLine>N</HasCitiesLine>
<PermittedDisc>N</PermittedDisc>
<HasFreeForm>N</HasFreeForm>
<HasPF>N</HasPF>
<Spare1>NNNNNNNN</Spare1>
<PIC />
<Type2Qual>
<SpclCondInd />
<AirV>NW</AirV>
<Fare>172.00</Fare>
<RTInd>R</RTInd>
<FIC>K21PRNR</FIC>
<AP>21</AP>
<APEndItem></APEndItem>
<MinStay>01</MinStay>
<MaxStay>--</MaxStay>
<DirInd />
<Pens></Pens>
<FirstTravDt />
<LastTravDt>0404</LastTravDt>
<FootnoteType>C</FootnoteType>
<FirstTkDt />
<LastTkDt />
<RteInfo>209</RteInfo>
</Type2Qual>
</Tariff>
<Tariff>
<UniqueKey>4</UniqueKey>
<Type1>N</Type1>
<Type2>Y</Type2>
<Type3>N</Type3>
<Type4>N</Type4>
<HasCitiesLine>N</HasCitiesLine>
<PermittedDisc>N</PermittedDisc>
<HasFreeForm>N</HasFreeForm>
<HasPF>N</HasPF>
<Spare1>NNNNNNNN</Spare1>
<PIC />
<Type2Qual>
<SpclCondInd />
<AirV>AA</AirV>
<Fare>172.00</Fare>
<RTInd>R</RTInd>
<FIC>LHE21D1N</FIC>
<AP>21</AP>
<APEndItem></APEndItem>
<MinStay>01</MinStay>
<MaxStay>--</MaxStay>
<DirInd />
<Pens></Pens>
<FirstTravDt />
<LastTravDt>0404</LastTravDt>
<FootnoteType>C</FootnoteType>
<FirstTkDt />
<LastTkDt />
<RteInfo>2</RteInfo>
</Type2Qual>
</Tariff>
</FareInfo>
</FareQuoteTariffDisplay_8_0>
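As an illustration of the "off the shelf tools" point, here is a minimal sketch that reads a response like the one above with Python's standard xml.etree.ElementTree module. The element names (Tariff, Type2Qual, AirV, Fare, FIC, AP) come from the sample; reading FIC as the fare basis and AP as the advance purchase is an assumption based on lining the values up with the columns of the earlier displays, and the list_fares helper and the abbreviated SAMPLE document are made up for this example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abbreviated copy of the response above: two tariffs, a few fields each.
SAMPLE = """<FareQuoteTariffDisplay_8_0>
<FareInfo>
<Tariff>
<UniqueKey>1</UniqueKey>
<Type2Qual>
<AirV>DL</AirV>
<Fare>172.00</Fare>
<FIC>UL21M1SN</FIC>
<AP>21</AP>
</Type2Qual>
</Tariff>
<Tariff>
<UniqueKey>2</UniqueKey>
<Type2Qual>
<AirV>CO</AirV>
<Fare>172.00</Fare>
<FIC>TO211BSN</FIC>
<AP>21</AP>
</Type2Qual>
</Tariff>
</FareInfo>
</FareQuoteTariffDisplay_8_0>"""


def list_fares(xml_text):
    """Return one dict per Type2Qual element found in the response."""
    root = ET.fromstring(xml_text)
    fares = []
    for qual in root.iter("Type2Qual"):
        fare_text = (qual.findtext("Fare") or "").strip()
        fares.append({
            "carrier": qual.findtext("AirV", default=""),
            # Decimal avoids floating-point surprises with money; None if empty.
            "fare": Decimal(fare_text) if fare_text else None,
            # Assumed meanings: FIC = fare basis, AP = advance purchase.
            "fare_basis": qual.findtext("FIC", default=""),
            "advance_purchase": qual.findtext("AP", default=""),
        })
    return fares


fares = list_fares(SAMPLE)
for fare in fares:
    print(fare["carrier"], fare["fare"], fare["fare_basis"], fare["advance_purchase"])
```

The code never needs to know how wide a field is or in what order the fields arrive; it asks for each value by name, which is the practical payoff of wrapping every piece of data in a tag.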