## EMV is the worldwide standard for smart card payments

• EMV (named after its original developers: Europay, MasterCard and Visa) is now used for over 80% of in-person card payments worldwide
• EMV began as a standard for contact smart cards, but it has since expanded to include the closely related contactless payment standards, and other payment standards such as 3D-Secure (for online payments)
• Standards are maintained by EMVCo which makes the current version publicly available
• Over the past 15 years colleagues and I have found numerous security vulnerabilities

# EMV is primarily a compatibility standard, not a security standard

• It is designed to allow terminals and cards to work together, even without fully understanding the data processed, and even though cards have very limited RAM and processing power
• Be conservative in what you do, be liberal in what you accept from others (Postel's law)
• This has the potential to create security vulnerabilities, by increasing complexity and risking that important data will not be properly interpreted

# One way EMV achieves compatibility is through the TLV format

• Data is encoded as Tag, Length, Value
• Very efficient compared to JSON or XML, but not easy to read by eye
• Tree structure can be decoded without knowing all tags
• Unknown tags can be ignored (for better or worse)
• 0x00 values are ignored between TLV items, allowing in-place deletion (historically 0xff too)
• Also known as ASN.1 BER (Basic Encoding Rules) format, from X.209 (the companion to X.208, which defines ASN.1 itself), used for example in X.509 HTTPS certificates
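
The properties listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a hardened decoder: the function name `parse_tlv` and the example bytes are made up for this sketch, it handles only definite lengths, and it does little validation. It shows how the tree can be walked without knowing what any tag means, and how 0x00 padding between items is skipped.

```python
def parse_tlv(data: bytes) -> list:
    """Parse BER-TLV bytes into a list of (tag, value) pairs.

    tag is a hex string; value is a nested list for constructed tags
    and a hex string for primitive ones.
    """
    items = []
    pos = 0
    while pos < len(data):
        # 0x00 bytes between items are padding and are skipped
        if data[pos] == 0x00:
            pos += 1
            continue
        # Tag: b6 of the first byte marks a constructed tag; if the low
        # five bits are all 1, more tag bytes follow (b8 set on all but the last)
        start = pos
        constructed = bool(data[pos] & 0x20)
        if (data[pos] & 0x1F) == 0x1F:
            pos += 1
            while data[pos] & 0x80:
                pos += 1
        pos += 1
        tag = data[start:pos].hex()
        # Length: one byte if b8 is 0, otherwise the low seven bits
        # give the number of length bytes that follow
        first = data[pos]
        pos += 1
        if first & 0x80:
            count = first & 0x7F
            length = int.from_bytes(data[pos:pos + count], "big")
            pos += count
        else:
            length = first
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError(f"tag {tag}: length {length} overruns the data")
        pos += length
        # Constructed values are themselves a sequence of TLV items
        items.append((tag, parse_tlv(value) if constructed else value.hex()))
    return items


# Made-up example: template 0x70 holding two items with a padding byte between
example = bytes.fromhex("70 0a 5a 02 12 89 00 9f 36 02 00 01")
print(parse_tlv(example))
# [('70', [('5a', '1289'), ('9f36', '0001')])]
```

Note that the decoder never consults a dictionary of tags, which is exactly why unknown tags can be carried along or ignored.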

# It is sometimes helpful to manually decode TLV data

• You have to explain why the decoded version is correct
• You might have to write your own decoder (though I wouldn't recommend it)
• Doing things for yourself can help you find where others might have slipped up

Follow along yourself at https://murdoch.is/:/emvdecode with notes at https://murdoch.is/:/emvdecodenotes and repository at https://murdoch.is/:/emvdecoderepo

In [1]:
## Some helpful utilities for processing hex data
from hexutils import *

In [2]:
## Convert hex to binary
to_bin('AA')

Out[2]:
'10101010'
In [3]:
## Strip whitespace around and within hex
strip_bytes("303\n  132 ")

Out[3]:
'303132'
In [4]:
## Split hex into bytes for display
split_bytes("303\n  132 ")

Out[4]:
'30 31 32'
In [5]:
## Count how many bytes in a hex string
len_bytes("303\n  132 ")

Out[5]:
3
In [6]:
## Convert bytes to text (using ISO8859-1)
decode_bytes("303\n  132 ")

Out[6]:
'012'
In [7]:
## Format a byte into a binary table
format_bytes('aa')

        b8 b7 b6 b5 b4 b3 b2 b1
0xaa =   1  0  1  0  1  0  1  0
In [8]:
## Split a byte into fields of specified length
format_bytes('aa', [1,2,0,5])

        b8 b7 b6 b5 b4 b3 b2 b1
0xaa =   1  -  -  -  -  -  -  -
         -  0  1  -  -  -  -  -
         -  -  -  -  -  -  -  -
         -  -  -  0  1  0  1  0
In [9]:
## Take a certain number of bytes from a hex string
take("303\n  132 ", 2)

Out[9]:
'30 31'
In [10]:
## Take a certain number of bytes from a hex string with an offset
take("303\n  132 ", 2, 1)

Out[10]:
'31 32'
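
For readers following along without the repository, the helpers above could be approximated as below. These are hypothetical stand-ins written to reproduce the outputs shown in this notebook, not the actual `hexutils` source, which may differ in details (for instance, `take` is lower-cased here only because the outputs shown later are lower case). `format_bytes` is omitted because it produces a formatted table.

```python
def strip_bytes(s: str) -> str:
    """Remove whitespace around and within a hex string."""
    return "".join(s.split())


def split_bytes(s: str) -> str:
    """Return the hex string as space-separated bytes."""
    h = strip_bytes(s)
    return " ".join(h[i:i + 2] for i in range(0, len(h), 2))


def len_bytes(s: str) -> int:
    """Count the bytes in a hex string (two hex digits per byte)."""
    return len(strip_bytes(s)) // 2


def to_bin(s: str) -> str:
    """Convert hex to a binary string, keeping leading zeros."""
    h = strip_bytes(s)
    return format(int(h, 16), "0{}b".format(4 * len(h)))


def decode_bytes(s: str) -> str:
    """Convert hex-encoded bytes to text using ISO8859-1."""
    return bytes.fromhex(strip_bytes(s)).decode("iso8859-1")


def take(s: str, length: int, offset: int = 0) -> str:
    """Take `length` bytes starting `offset` bytes into a hex string."""
    h = strip_bytes(s).lower()
    return split_bytes(h[2 * offset:2 * (offset + length)])


print(take("303\n  132 ", 2, 1))  # '31 32'
```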

## Output of cardpeek log, requesting EMV record 2 from the file with Short File Identifier (SFI) 2

C:00B2 02 14 00 :6C97:
C:00B2 02 14 97 :9000: 7081948C219F0206...

In [11]:
## Reference control parameter (SFI 2)
format_bytes('14', [5,3])

        b8 b7 b6 b5 b4 b3 b2 b1
0x14 =   0  0  0  1  0  -  -  -
         -  -  -  -  -  1  0  0
In [12]:
## Length of record is 0x97 = 151 bytes
0x97

Out[12]:
151
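
The two log lines can also be unpacked programmatically. The sketch below assumes the usual ISO 7816-4 layout of a READ RECORD command (CLA, INS, P1, P2, Le) and the convention that status 0x6Cxx means "wrong length, retry with Le = xx"; the function names are invented for this illustration.

```python
def parse_read_record(apdu_hex: str) -> dict:
    """Split a 5-byte READ RECORD command into its fields."""
    cla, ins, p1, p2, le = bytes.fromhex(apdu_hex)
    if ins != 0xB2:
        raise ValueError("not a READ RECORD command")
    return {
        "record": p1,                                # record number
        "sfi": p2 >> 3,                              # top five bits of P2
        "p1_is_record_number": (p2 & 0x07) == 0x04,  # low three bits are 100
        "le": le,                                    # expected response length
    }


def retry_length(status_hex: str):
    """Return the length the card asks for if the status is 6Cxx, else None."""
    sw1, sw2 = bytes.fromhex(status_hex)
    return sw2 if sw1 == 0x6C else None


print(parse_read_record("00B2 02 14 00"))
# {'record': 2, 'sfi': 2, 'p1_is_record_number': True, 'le': 0}
print(retry_length("6C97"))  # 151
```

This matches the log: the first command asks with Le = 0x00, the card answers 6C97, and the second command repeats the request with Le = 0x97.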
In [13]:
## Response as a Python string
response="7081948C219F02069F03069F1A0295055F2A029A039C019F37049F35019F45029F4C089F34038D0C910A8A0295059F37049F4C089F08020002571352AAAAAAAAAAAA47D15122011407992700000F5F20134D5552444F43482F53544556454E204A2E44525F300202019F1F183134303739303030303030303030303932373030303030309F420208269F4401029F49039F37049F470103"


Response is 7081948C219F02069F03...

In [14]:
## Look at the first byte of the response (a tag)
take(response, 1)

Out[14]:
'70'
In [15]:
## Application class, constructed, one-byte tag
format_bytes(_, [2,1,5]) # 0x70 is a READ RECORD response message template

        b8 b7 b6 b5 b4 b3 b2 b1
0x70 =   0  1  -  -  -  -  -  -
         -  -  1  -  -  -  -  -
         -  -  -  1  0  0  0  0
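
The same three fields can be pulled out with bit masks instead of reading the table by eye. This is a small sketch; `describe_tag_byte` is a name made up here, and the class names follow the usual BER convention.

```python
CLASSES = ["universal", "application", "context-specific", "private"]


def describe_tag_byte(b: int) -> dict:
    """Decode the first byte of a BER-TLV tag."""
    return {
        "class": CLASSES[b >> 6],                # b8-b7
        "constructed": bool(b & 0x20),           # b6
        "number": b & 0x1F,                      # b5-b1
        "more_tag_bytes": (b & 0x1F) == 0x1F,    # all ones: tag continues
    }


print(describe_tag_byte(0x70))
# {'class': 'application', 'constructed': True, 'number': 16, 'more_tag_bytes': False}
```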

Response is 7081948C219F02069F03...

In [16]:
## First byte of length is 0x81...
take(response, 1, 1)

Out[16]:
'81'
In [17]:
## b8 is 1, so the actual length is in the next byte
format_bytes(_)

        b8 b7 b6 b5 b4 b3 b2 b1
0x81 =   1  0  0  0  0  0  0  1
In [18]:
## The actual length is 0x94...
take(response, 1, 2)

Out[18]:
'94'
In [19]:
## which is 148 in decimal
int(_, 16)

Out[19]:
148
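
The length rule used in the last few cells can be written as a short function. This is a sketch under the assumption that only definite lengths appear (a first byte of exactly 0x80, which BER uses for indefinite length, is rejected); `read_length` is a name chosen for this illustration.

```python
def read_length(data: bytes, pos: int):
    """Return (length, number of bytes the length field occupies)."""
    first = data[pos]
    if (first & 0x80) == 0:
        # Short form: b8 is 0, the byte itself is the length
        return first, 1
    count = first & 0x7F  # long form: number of length bytes that follow
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    if pos + 1 + count > len(data):
        raise ValueError("length field runs past the end of the data")
    return int.from_bytes(data[pos + 1:pos + 1 + count], "big"), 1 + count


print(read_length(bytes.fromhex("708194"), 1))  # (148, 2)
```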

Response is 7081948C219F02069F03...

In [20]:
## The tag value is 148 bytes, starting after tag (1 byte) and length (2 bytes)...
take(response, 148, 1+2)

Out[20]:
'8c 21 9f 02 06 9f 03 06 9f 1a 02 95 05 5f 2a 02 9a 03 9c 01 9f 37 04 9f 35 01 9f 45 02 9f 4c 08 9f 34 03 8d 0c 91 0a 8a 02 95 05 9f 37 04 9f 4c 08 9f 08 02 00 02 57 13 52 aa aa aa aa aa aa 47 d1 51 22 01 14 07 99 27 00 00 0f 5f 20 13 4d 55 52 44 4f 43 48 2f 53 54 45 56 45 4e 20 4a 2e 44 52 5f 30 02 02 01 9f 1f 18 31 34 30 37 39 30 30 30 30 30 30 30 30 30 30 39 32 37 30 30 30 30 30 30 9f 42 02 08 26 9f 44 01 02 9f 49 03 9f 37 04 9f 47 01 03'
In [21]:
## which is the whole of the rest of the response from the card (everything after the 3 bytes of tag and length)
len_bytes(response) - 3

Out[21]:
148

Response is 7081948C219F02069F03...

In [22]:
## The value is constructed so the next byte is a tag
take(response, 1, 1+2)

Out[22]:
'8c'
In [23]:
## Context-specific class, primitive, 1-byte tag
format_bytes(_, [2,1,5]) # 0x8c - CDOL1

        b8 b7 b6 b5 b4 b3 b2 b1
0x8c =   1  0  -  -  -  -  -  -
         -  -  0  -  -  -  -  -
         -  -  -  0  1  1  0  0

Response is 7081948C219F02069F03...

In [24]:
## Next byte will be the length
take(response, 1, 1+2+1)

Out[24]:
'21'
In [25]:
## b8 is 0 so this is a 1 byte length (0x21)...
format_bytes(_)

        b8 b7 b6 b5 b4 b3 b2 b1
0x21 =   0  0  1  0  0  0  0  1
In [26]:
## which is 33 in decimal
int(_, 16)

Out[26]:
33

Response is 7081948C219F02069F03...

In [27]:
## The CDOL1 is 33 bytes, skipping the tags and lengths
cdol1 = take(response, 33, 1+2+1+1)
cdol1

Out[27]:
'9f 02 06 9f 03 06 9f 1a 02 95 05 5f 2a 02 9a 03 9c 01 9f 37 04 9f 35 01 9f 45 02 9f 4c 08 9f 34 03'
In [28]:
## After the CDOL1 the next tag is the CDOL2
take(response, 1, 1+2+1+1 + 33) # 0x8d - CDOL2

Out[28]:
'8d'
In [29]:
## with length 0x0c (12)
take(response, 1, 1+2+1+1 + 33 + 1)

Out[29]:
'0c'
In [30]:
## So the CDOL2 can be extracted
cdol2 = take(response, 0x0c, 1+2+1+1 + 33 + 1+1)
cdol2

Out[30]:
'91 0a 8a 02 95 05 9f 37 04 9f 4c 08'

CDOL1 is 9f 02 06 9f 03 06 9f 1a 02 95 05...

In [31]:
## DOL objects are a list of tags and lengths that describe how to
## send data to a card possibly unable to decode TLV data
take(cdol1, 1)

Out[31]:
'9f'
In [32]:
## 9f starts a context-specific class, primitive, multi-byte tag
format_bytes(_, [2,1,5])

        b8 b7 b6 b5 b4 b3 b2 b1
0x9f =   1  0  -  -  -  -  -  -
         -  -  0  -  -  -  -  -
         -  -  -  1  1  1  1  1
In [33]:
## The next byte of the tag is 0x02
take(cdol1, 1, 1)

Out[33]:
'02'
In [34]:
## 0x02 is the last byte of the tag, giving 0x9f02
format_bytes(_, [1,7]) # 0x9f02 - Amount, Authorised (Numeric)

        b8 b7 b6 b5 b4 b3 b2 b1
0x02 =   0  -  -  -  -  -  -  -
         -  0  0  0  0  0  1  0
In [35]:
## Next is the length of the data expected: 0x06
take(cdol1, 1, 1 + 1)

Out[35]:
'06'
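
Repeating these three steps over the whole CDOL1 gives the full list of tags and lengths. The sketch below assumes every length in the DOL is a single byte, which holds for this card's data; `parse_dol` is a name invented here and it does no validation of truncated input.

```python
def parse_dol(dol: bytes) -> list:
    """Split a DOL into (tag, length) pairs; there are no values in a DOL."""
    entries = []
    pos = 0
    while pos < len(dol):
        start = pos
        # Multi-byte tag if the low five bits of the first byte are all 1
        if (dol[pos] & 0x1F) == 0x1F:
            pos += 1
            while dol[pos] & 0x80:
                pos += 1
        pos += 1
        tag = dol[start:pos].hex()
        length = dol[pos]  # assumed to be a single-byte length
        pos += 1
        entries.append((tag, length))
    return entries


cdol1 = bytes.fromhex(
    "9f 02 06 9f 03 06 9f 1a 02 95 05 5f 2a 02 9a 03 9c 01 "
    "9f 37 04 9f 35 01 9f 45 02 9f 4c 08 9f 34 03"
)
entries = parse_dol(cdol1)
print(entries[:3])                     # [('9f02', 6), ('9f03', 6), ('9f1a', 2)]
print(sum(n for _, n in entries))      # 43
```

The sum is the number of bytes the terminal would concatenate, in this order and without any tags or lengths, when responding to this DOL.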
In [36]:
## Going back to the response, another 2-byte tag is at offset 78...
take(response, 2, 78) # 0x5f20 – Cardholder Name

Out[36]:
'5f 20'
In [37]:
## which has length 0x13 (19)
take(response, 1, 80)

Out[37]:
'13'
In [38]:
## This tag is ASCII encoded
take(response, 0x13, 81)

Out[38]:
'4d 55 52 44 4f 43 48 2f 53 54 45 56 45 4e 20 4a 2e 44 52'
In [39]:
decode_bytes(_)

Out[39]:
'MURDOCH/STEVEN J.DR'
In [40]:
## Another 2-byte tag is at offset 100...
take(response, 2, 100) # 0x5f30 – Service Code

Out[40]:
'5f 30'
In [41]:
## with length 0x02
take(response, 1, 102)

Out[41]:
'02'
In [42]:
## and in binary-coded decimal format: 201
strip_bytes(take(response, 2, 103))

Out[42]:
'0201'
In [43]:
## At offset 57 we have a 1-byte tag (with length 0x13, so the value starts at offset 59)...
take(response, 1, 57) # 0x57 – Track 2 Equivalent Data

Out[43]:
'57'
In [44]:
## which is also in binary-coded decimal
## I've removed the middle of my card number ;-)
strip_bytes(take(response, 0x13, 59))

Out[44]:
'52aaaaaaaaaaaa47d15122011407992700000f'
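
The Track 2 Equivalent Data can be split into its fields once it is viewed as a string of 4-bit digits. The sketch below assumes the usual layout: card number, a 0xD separator, expiry date as YYMM, a three-digit service code, discretionary data, and 0xF padding to a whole number of bytes. `parse_track2` is a name made up for this illustration, and the masked `a` digits are passed through untouched.

```python
def parse_track2(hex_value: str) -> dict:
    """Split Track 2 Equivalent Data (tag 0x57) into its fields."""
    digits = hex_value.lower().rstrip("f")    # drop the trailing 0xF padding
    pan, sep, rest = digits.partition("d")    # 0xD separates the card number
    if not sep:
        raise ValueError("no field separator found")
    return {
        "pan": pan,
        "expiry_yymm": rest[:4],
        "service_code": rest[4:7],
        "discretionary": rest[7:],
    }


print(parse_track2("52aaaaaaaaaaaa47d15122011407992700000f"))
# {'pan': '52aaaaaaaaaaaa47', 'expiry_yymm': '1512', 'service_code': '201', 'discretionary': '1407992700000'}
```

The service code recovered here, 201, agrees with the value read separately from tag 0x5f30.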

# Decoding TLV properly is very tricky to get right

• Different encodings are used in different contexts
• Overflow and underflow errors could easily occur
• Mistakes do happen, and so banks do accept transactions that should in theory be invalid
• Maybe you would like to do this yourself, whether professionally or just out of curiosity
• “The only way to understand the wheel is to reinvent it.” — Mike Bond
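
As one small illustration of the overflow point, consider a record that has been cut short. In Python a plain slice quietly returns fewer bytes than the length field claims, so a decoder has to check explicitly; `checked_value` below is a hypothetical helper for this sketch, not part of any library.

```python
def checked_value(data: bytes, pos: int, length: int) -> bytes:
    """Return data[pos:pos+length], refusing lengths that overrun the data."""
    if pos < 0 or length < 0 or pos + length > len(data):
        raise ValueError(
            f"value of length {length} at offset {pos} overruns {len(data)} bytes"
        )
    return data[pos:pos + length]


# Cardholder Name tag claiming 0x13 bytes, but only two bytes are present
truncated = bytes.fromhex("5f 20 13 4d 55")

print(truncated[3:3 + 0x13])   # b'MU' -- silently shorter than claimed
try:
    checked_value(truncated, 3, 0x13)
except ValueError as err:
    print("rejected:", err)
```

In a language without bounds checking, the same mistake could instead read past the end of the buffer.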

