JPEG/EXIF MESSAGE DIGEST COMPILATION WITH SHA512 HASH FUNCTION

Information security methods for jpeg/exif documents generally aim to prevent attacks by protecting documents with passwords and watermarks. Neither method can be used to determine the state of data integrity in the detection stage of the information security cycle. A message digest is the essence of a file and is used to represent data integrity. This study aims to compile a message digest that detects changes occurring in jpeg/exif documents. The research consists of five stages. The first stage, identification of the jpeg/exif document structure, uses the Boyer-Moore string matching algorithm to find the locations of jpeg/exif segments. The second stage, segment content acquisition, is based on the segment locations and lengths obtained. The third stage computes the hash value of each segment using the SHA512 hash function. The fourth stage consists of jpeg/exif document modification experiments to identify the affected segments. The fifth stage selects and combines the segment hash values into a message digest. The results show that the message digest for jpeg/exif documents is composed of three parts: the hash values of the SOI, APP1 and SOF0 segments. The SOI hash value is used to detect jpeg-to-png conversion and image editing, the APP1 hash value is used to detect metadata editing, and the SOF0 hash value is used to detect image recoloring, cropping and resizing.


INTRODUCTION
The jpeg/exif format is the document format produced by digital cameras, including smartphone cameras. As image files, jpeg/exif documents are widely used in digital communication such as social media. Exchanging information requires security to ensure that the information received is the same as the information sent. Information security for jpeg/exif documents is generally designed to prevent document modification. Passwords have long been used to protect jpeg/exif documents, but they can be removed with a variety of widely available password remover tools. Other forms of security are shown in the study by Wijayanto [1], which shows that the exif metadata of jpeg/exif documents can be used to prevent copyright theft. The watermarking method in the study by Sukarno [2] protects documents against modification. However, passwords, exif metadata and watermarks cannot be used in the detection stage of the information security cycle to detect changes in received jpeg/exif documents.
A message digest is the essence of a file and can be used to represent its data integrity. Message digests are widely used to verify the integrity of installers distributed by open-source application developers. A message digest is compiled using a hash function, which is a one-way cryptographic function. Its output, the hash value, cannot be reversed or decrypted to recover the original input, and it is very sensitive to changes in an input of any size. The Secure Hash Algorithm (SHA) is a family of hash functions published by the National Institute of Standards and Technology (NIST). SHA consists of several categories distinguished by the size of the output hash value [3]. SHA512 is a hash function variant of the SHA-2 group [4]. SHA512 has an output size of 512 bits, longer than earlier variants such as SHA-1 (160 bits), which gives it greater resistance to collision attacks.
Rachmad, JPEG/EXIF Message Digest Compilation… 45
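A short Python illustration of these two properties, using the standard-library hashlib module (our choice of tool, not necessarily the one used in this study): SHA512 always yields 512 bits, and changing a single input character produces a different hash value. The helper name sha512_hex is hypothetical.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 hash value of data as a hexadecimal string (hypothetical helper)."""
    return hashlib.sha512(data).hexdigest()


original = sha512_hex(b"abc")
modified = sha512_hex(b"abd")  # change a single character of the input

print(len(original) * 4)       # 512: 128 hex characters x 4 bits each
print(original == modified)    # False: the two hash values differ
# Count how many of the 128 hex positions changed after a one-character edit.
print(sum(a != b for a, b in zip(original, modified)))
```

The last line gives a rough sense of how widely a small input change spreads through the output.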
The use of SHA512 for information security is found in a study by Refialy [5], which uses SHA512 to compile the hash value of a pdf document. The results indicate that the resulting hash value is able to detect small modifications to pdf documents. Jpeg/exif files, however, are often larger than pdf documents, because their size grows with advances in the optical technology of digital cameras. The larger the document, the more time it takes to compose the message digest. This study therefore aims to compile a concise message digest to detect changes in jpeg/exif documents in information security.
String matching algorithms are used to find occurrences of a pattern of one or more characters within a string [6]. The Boyer-Moore algorithm is an exact string matching algorithm that searches by comparing the characters of the string being tested with those of the pattern sought [7]. It compares the pattern against the string from right to left and uses two rules to decide how far to shift the pattern after a mismatch: the bad-character rule and the good-suffix rule. Under the bad-character rule, when a character in the string does not match the corresponding pattern character, the pattern is shifted right so that this string character lines up with its right-most occurrence in the pattern, or moves past it entirely if the character does not occur in the pattern. Under the good-suffix rule, when the mismatch occurs after a suffix of the pattern has already matched, the pattern is shifted right to align that matched suffix with another occurrence of it in the pattern. The algorithm takes the larger of the two shifts. By skipping unnecessary character comparisons, these rules typically make the search faster than other exact string matching algorithms [8].
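The bad-character rule can be made concrete with a small Python sketch. It covers only that rule (the good-suffix table is omitted for brevity), the function names are hypothetical, and the pattern ffe1 is the JPEG APP1 marker written as hexadecimal text.

```python
def last_occurrence(pattern: str) -> dict[str, int]:
    """Bad-character table: right-most index of each character in the pattern."""
    table: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        table[ch] = i  # later occurrences overwrite earlier ones
    return table


def bad_character_shift(table: dict[str, int], mismatch_pos: int, text_char: str) -> int:
    """Distance to slide the pattern right after a mismatch at pattern index mismatch_pos."""
    last = table.get(text_char, -1)     # -1 means the character is absent from the pattern
    return max(1, mismatch_pos - last)  # never shift by less than one position


table = last_occurrence("ffe1")
print(table)                               # {'f': 1, 'e': 2, '1': 3}
print(bad_character_shift(table, 3, "8"))  # 4: '8' is not in the pattern, jump past it
print(bad_character_shift(table, 3, "f"))  # 2: align the string's 'f' with pattern index 1
```

The max(1, ...) guard handles the case where the right-most occurrence lies to the right of the mismatch position, which would otherwise produce a backward shift.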

RESEARCH METHODOLOGY
The research was conducted in the five stages shown in Figure 1. The first stage is image file segment identification, whose purpose is to identify the jpeg/exif file structure. The image files used as research objects were acquired from two smartphone types, the Asus Z00UD and the Samsung Galaxy A5, with ten images taken by each smartphone. The images were taken with each smartphone's built-in camera app in auto mode at indoor and outdoor sites. The result of the first stage is the location index of each file segment, which the second stage uses as the parameter that identifies the beginning and end of each segment. Identification is carried out with the Boyer-Moore string matching algorithm, which searches for segment markers. Segment markers are two-byte values located at the beginning of each jpeg/exif segment [9]. Table 1 shows the segment marker value for each segment.

Table 1. Segment marker values
Segment | Marker
SOI (Start Of Image) | ffd8
APP1 (Application segment 1) | ffe1
DQT (Define Quantization Table) | ffdb
SOF0 (Start Of Frame-0) | ffc0
DHT (Define Huffman Table) | ffc4
SOS (Start Of Scan) | ffda

The Boyer-Moore string matching algorithm starts comparing from the right-most pattern character and moves left until it reaches the left-most pattern character [10]. The search has two stages. The first stage is preprocessing, which assigns the variables: m for the pattern length, n for the string length, idxstr for the string character index, idxpattern for the pattern character index, and match for the number of matching characters found. The second stage is character comparison, executed as a loop bounded by two conditions: the loop stops when match equals m or when the comparison reaches the end of the string. The segment location index is given by the last value of the idxstr variable.
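A minimal Python sketch of the procedure above, reusing the variable names m, n, idxstr, idxpattern and match. Assumptions: the file has already been converted to a lowercase hexadecimal string (as the location indices in Tables 3 and 4 suggest), mismatches are handled with the bad-character rule only, and SEGMENT_MARKERS, boyer_moore_search, find_marker and locate_segments are hypothetical names.

```python
# Hypothetical helper names; the markers are the standard JPEG values listed in Table 1.
SEGMENT_MARKERS = {
    "SOI": "ffd8", "APP1": "ffe1", "DQT": "ffdb",
    "SOF0": "ffc0", "DHT": "ffc4", "SOS": "ffda",
}


def boyer_moore_search(string: str, pattern: str, start: int = 0) -> int:
    """Index of the first occurrence of pattern at or after start, or -1 (bad-character rule only)."""
    # Preprocessing: variable assignment and bad-character table.
    m = len(pattern)
    n = len(string)
    last = {ch: i for i, ch in enumerate(pattern)}  # right-most index of each character
    align = start  # string index under the pattern's first character
    # Character comparison loop: stops on a full match or at the end of the string.
    while align + m <= n:
        idxpattern = m - 1  # start from the right-most pattern character
        idxstr = align + m - 1
        match = 0
        while string[idxstr] == pattern[idxpattern]:
            match += 1
            if match == m:
                return idxstr  # last idxstr value = segment location index
            idxpattern -= 1    # move left through pattern and string
            idxstr -= 1
        # Mismatch: shift the pattern right using the bad-character rule.
        align += max(1, idxpattern - last.get(string[idxstr], -1))
    return -1


def find_marker(hex_string: str, marker: str, start: int = 0) -> int:
    """Like boyer_moore_search, but skip hits that are not byte-aligned."""
    pos = boyer_moore_search(hex_string, marker, start)
    while pos != -1 and pos % 2 == 1:  # an odd index straddles two bytes
        pos = boyer_moore_search(hex_string, marker, pos + 1)
    return pos


def locate_segments(hex_string: str) -> dict[str, int]:
    """Location index of each segment, searching in file order."""
    locations: dict[str, int] = {}
    start = 0
    for name, marker in SEGMENT_MARKERS.items():
        pos = find_marker(hex_string, marker, start)
        if pos != -1:
            locations[name] = pos
            start = pos + len(marker)
    return locations


# Tiny synthetic file with dummy payloads; a real file would be read with
# open(path, "rb").read().hex().
demo = ("ffd8" + "ffe10004aaaa" + "ffdb0004bbbb" + "ffc00004cccc"
        + "ffc40004dddd" + "ffda0004eeee" + "ffd9")
print(locate_segments(demo))
# {'SOI': 0, 'APP1': 4, 'DQT': 16, 'SOF0': 28, 'DHT': 40, 'SOS': 52}
```

One caveat on this simplification: EXIF APP1 segments often embed a JPEG thumbnail with its own DQT, SOF0, DHT and SOS markers, so a first-occurrence search after APP1 may land inside the thumbnail. A more robust parser would skip over APP1 using its two-byte length field.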
The second stage is segment content acquisition. Acquiring a segment's content requires two parameters: its starting index and its length. The starting index is the segment location index from the first stage, and the segment length is computed by subtracting the location indices of two adjacent segments. The third stage is hash value computation, carried out for every segment. Computing a hash value with the SHA512 hash function consists of three stages, preprocessing, hash computation and hash value compilation, in the sequence shown in Figure 3. Preprocessing consists of four processes: padding, parsing, setting the initial hash value and variable assignment [3]. Padding adjusts the size of the input data so that the processed data is a multiple of 1024 bits. Parsing divides the padded data into 1024-bit blocks, each consisting of sixteen 64-bit words. The initial hash values and register assignments are arranged as shown in Table 2.
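As a sketch of the padding and parsing steps only, following the SHA-512 description in NIST FIPS 180-4, the Python code below pads a message and splits it into blocks of 64-bit words. The function names are hypothetical. The code omits the initial hash values in Table 2 and the 80 rounds of the compression function.

```python
import struct


def sha512_pad(message: bytes) -> bytes:
    """Padding: append a '1' bit, zero bits and the 128-bit message length,
    so that the result is a multiple of 1024 bits (128 bytes)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                      # '1' bit followed by seven '0' bits
    padded += b"\x00" * ((112 - len(padded)) % 128)  # reach 896 bits modulo 1024
    padded += bit_length.to_bytes(16, "big")         # 128-bit big-endian length field
    return padded


def parse_blocks(padded: bytes) -> list[tuple[int, ...]]:
    """Parsing: split into 1024-bit blocks of sixteen 64-bit big-endian words."""
    return [struct.unpack(">16Q", padded[i:i + 128]) for i in range(0, len(padded), 128)]


blocks = parse_blocks(sha512_pad(b"abc"))
print(len(blocks))        # 1: "abc" fits in a single 1024-bit block
print(hex(blocks[0][0]))  # 0x6162638000000000: 'a', 'b', 'c', then the '1' bit
print(blocks[0][15])      # 24: the message length in bits
```

A 112-byte message needs a second block, because the 0x80 byte and the 16-byte length field no longer fit in the first one.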
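The eight registers (a, b, c, d, e, f, g, h) and the blocks of input data are used to compute the hash value in the sequence shown in Figure 4. The computation is executed iteratively, once for each block produced by the parsing stage. The functions used in Figure 4 are given by equations 1 to 6, where all operations act on 64-bit words, ROTR^n denotes a right rotation by n bits and SHR^n a right shift by n bits:

$$
\begin{aligned}
\mathrm{Ch}(x,y,z) &= (x \land y) \oplus (\lnot x \land z) && (1)\\
\mathrm{Maj}(x,y,z) &= (x \land y) \oplus (x \land z) \oplus (y \land z) && (2)\\
\Sigma_0^{\{512\}}(x) &= \mathrm{ROTR}^{28}(x) \oplus \mathrm{ROTR}^{34}(x) \oplus \mathrm{ROTR}^{39}(x) && (3)\\
\Sigma_1^{\{512\}}(x) &= \mathrm{ROTR}^{14}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{ROTR}^{41}(x) && (4)\\
\sigma_0^{\{512\}}(x) &= \mathrm{ROTR}^{1}(x) \oplus \mathrm{ROTR}^{8}(x) \oplus \mathrm{SHR}^{7}(x) && (5)\\
\sigma_1^{\{512\}}(x) &= \mathrm{ROTR}^{19}(x) \oplus \mathrm{ROTR}^{61}(x) \oplus \mathrm{SHR}^{6}(x) && (6)
\end{aligned}
$$

The result of the computation is eight 64-bit hash values stored in the eight registers, which together form the 512-bit hash value. Figure 5 shows the result of calculating the hash value for the input string "abc".

The fourth stage is the modification experiments. Before the image file is modified, the hash value of each segment of the original file (HV0) is calculated. These values are compared with the hash values of the modified files (HV1 to HV6), for which the hash value of each segment is also compiled. The comparison is done segment by segment. The hash values that change under each form of modification are used to compile the message digest in the next stage. The last stage is message digest compilation, in which the selected hash values from the third stage are combined into one string to form the message digest.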
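The final hash value for the input "abc" can be cross-checked against the published SHA-512 test vector with Python's hashlib. The sketch below splits the 512-bit output into eight 64-bit words, mirroring the eight registers described above. Its output format is our own, so it may look different from Figure 5.

```python
import hashlib

digest = hashlib.sha512(b"abc").digest()  # 64 bytes = 512 bits
# Split the hash value into the eight 64-bit words H0..H7 left in the registers.
words = [int.from_bytes(digest[i:i + 8], "big") for i in range(0, 64, 8)]
for i, word in enumerate(words):
    print(f"H{i} = {word:016x}")
# The first line printed is: H0 = ddaf35a193617aba
```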

RESULTS AND DISCUSSION
The results of jpeg/exif file segment identification are shown in Tables 3 and 4. Table 3 shows the segment location indices for the image files from the Asus Z00UD smartphone. The location indices of the SOI and APP1 segments have the same values, 0 and 4, in all jpeg/exif files in Table 3. This is because the SOI segment is located at the beginning of the image file and consists only of the four hexadecimal characters of its segment marker, ffd8. The location indices and lengths of the DQT, SOF0, DHT and SOS segments differ across the ten jpeg/exif files, so the location index of a segment in one jpeg/exif file cannot be used to locate the same segment in another file. Therefore, the Boyer-Moore string matching algorithm must be run to locate the segments every time a message digest is compiled. The segment locations for the jpeg/exif files from the Samsung Galaxy A5 smartphone are shown in Table 4. As in Table 3, the SOI and APP1 segments are located at indices 0 and 4 in all files. Unlike the Asus files, however, every file from the Samsung Galaxy A5 has the same location for every segment. Therefore, identifying the segments of future jpeg/exif files from this device may not require a search from the start. The location indices in Tables 3 and 4 are used to calculate the length of each segment. This calculation is accomplished by finding the difference between the location indices of two adjacent segments, as formulated in equation 7.
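Written out in our own notation (an assumption, since the symbols of equation 7 are not reproduced here), let $I_k$ be the location index of the $k$-th segment in file order, $L_k$ its length, and $N$ the total length of the converted file (the variable n of the Boyer-Moore search). The length calculation described above then reads:

$$
L_k = I_{k+1} - I_k, \qquad L_{\mathrm{SOS}} = N - I_{\mathrm{SOS}}
$$

As a check against the reported values: with $I_{\mathrm{SOI}} = 0$ and $I_{\mathrm{APP1}} = 4$, $L_{\mathrm{SOI}} = 4 - 0 = 4$, exactly the four characters of ffd8. The APP1 length of 26396 reported for the first Asus file therefore implies $I_{\mathrm{DQT}} = 4 + 26396 = 26400$.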
An example of the segment length calculation for jpeg/exif documents from the Asus Z00UD smartphone is shown in Table 5. The APP1 segment length of the first file in Table 3 is 26396, obtained with equation 7 by subtracting the APP1 segment location index from the DQT segment location index. The length calculations for all segments show that the SOS segment is the longest. This is because the SOS segment contains the image data, which is the main data of the image file. The larger the jpeg/exif document, the longer the SOS segment. The SOS segment length is calculated by subtracting the SOS segment location index from the total length of the image data. This total length is obtained from the length of the converted file at the beginning of the segment location identification stage, and it is the same value stored in the n variable during the Boyer-Moore search. The location index and segment length are then used as parameters to acquire the content of each segment. Figure 9 shows the application interface for acquiring and calculating the hash value of each segment. The left part of Figure 10 shows the hash values of six segments in the original file, and the right part shows the hash values of five segments in the recoloring modification file. The comparison result for each segment has one of two conditions: "Match" if the hash values of the two segments are equal and "Not Match" otherwise. The comparison in Figure 10 produces "Match" only for the SOI segment hash value, which shows that the content of the four other segments changed under the recoloring modification. The segments affected by each modification experiment are shown in Table 6. The SOS segment has the longest content of all segments.
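The acquisition, hashing and comparison steps can be sketched in Python as follows. Assumptions: the location indices come from the segment search, indices and lengths count hexadecimal characters, and the hexadecimal text of each segment is hashed (hashing the decoded bytes would work the same way). The function names are hypothetical.

```python
import hashlib


def segment_lengths(locations: dict[str, int], total_length: int) -> dict[str, int]:
    """Equation 7: next location index minus this one; the last segment (SOS) runs to n."""
    names = sorted(locations, key=locations.get)  # segments in file order
    lengths: dict[str, int] = {}
    for i, name in enumerate(names):
        end = locations[names[i + 1]] if i + 1 < len(names) else total_length
        lengths[name] = end - locations[name]
    return lengths


def segment_hashes(hex_string: str, locations: dict[str, int]) -> dict[str, str]:
    """Acquire each segment's content by (start index, length) and hash it with SHA-512."""
    lengths = segment_lengths(locations, len(hex_string))
    hashes: dict[str, str] = {}
    for name, start in locations.items():
        content = hex_string[start:start + lengths[name]]
        hashes[name] = hashlib.sha512(content.encode("ascii")).hexdigest()
    return hashes


def compare_segments(original: dict[str, str], modified: dict[str, str]) -> dict[str, str]:
    """'Match' if a segment's hash value is unchanged, otherwise 'Not Match'."""
    return {name: "Match" if modified.get(name) == value else "Not Match"
            for name, value in original.items()}


# Synthetic example: the "modified" file differs only inside the SOF0 segment.
locations = {"SOI": 0, "APP1": 4, "DQT": 16, "SOF0": 28, "DHT": 40, "SOS": 52}
original_hex = ("ffd8" + "ffe10004aaaa" + "ffdb0004bbbb" + "ffc00004cccc"
                + "ffc40004dddd" + "ffda0004eeee")
modified_hex = original_hex.replace("cccc", "cccd")
result = compare_segments(segment_hashes(original_hex, locations),
                          segment_hashes(modified_hex, locations))
print(result)
# {'SOI': 'Match', 'APP1': 'Match', 'DQT': 'Match', 'SOF0': 'Not Match', 'DHT': 'Match', 'SOS': 'Match'}
```

A segment missing from the modified file (as in the five-segment recoloring case) is reported as "Not Match", because modified.get returns None for it.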
This makes the segment marker search take more time, because the Boyer-Moore string matching algorithm has a worst-case time complexity of O(mn) [7]. SOF0 is the smallest segment apart from SOI. SOF0 stores image information such as the image dimensions and the number of color components, which change when the image is altered, as in the 1st, 2nd, 5th and 6th experiments. Figure 11 shows the jpeg/exif file message digest, arranged from three hash values: SOI, APP1 and SOF0.
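A minimal sketch of the final compilation step follows. It assumes the three hash values are simply concatenated in the order SOI, APP1, SOF0. The order, the helper names and the dummy segment contents are our assumptions. Comparing the corresponding 128-character parts of a sent and a received digest then indicates which kind of modification occurred.

```python
import hashlib

DIGEST_SEGMENTS = ("SOI", "APP1", "SOF0")  # assumed order of the selected segments
HASH_HEX_LENGTH = 128                     # one SHA-512 hash value in hex characters


def compile_message_digest(segment_hashes: dict[str, str]) -> str:
    """Concatenate the selected segments' hash values in a fixed order."""
    return "".join(segment_hashes[name] for name in DIGEST_SEGMENTS)


def changed_parts(sent: str, received: str) -> list[str]:
    """Name the selected segments whose part of the message digest differs."""
    changed = []
    for i, name in enumerate(DIGEST_SEGMENTS):
        part = slice(i * HASH_HEX_LENGTH, (i + 1) * HASH_HEX_LENGTH)
        if sent[part] != received[part]:
            changed.append(name)
    return changed


def sha512_hex(text: str) -> str:
    """SHA-512 of a segment's hexadecimal text (hypothetical helper)."""
    return hashlib.sha512(text.encode("ascii")).hexdigest()


# Dummy segment contents; metadata editing changes only the APP1 content.
original = {"SOI": sha512_hex("ffd8"), "APP1": sha512_hex("ffe1aaaa"),
            "SOF0": sha512_hex("ffc0cccc")}
edited = dict(original, APP1=sha512_hex("ffe1abab"))

digest_sent = compile_message_digest(original)
digest_received = compile_message_digest(edited)
print(len(digest_sent))                             # 384 hex characters = 1536 bits
print(changed_parts(digest_sent, digest_received))  # ['APP1']
```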

CONCLUSION
The jpeg/exif message digest consists of three hash values that together cover the modification experiments. The SOI hash value is used to identify file conversion and text addition, the APP1 hash value to identify metadata editing, and the SOF0 hash value to identify recoloring, resizing and cropping. Locating only these three segments makes the segment search with the Boyer-Moore string matching algorithm faster. A message digest compiled from SHA512 hash values has two advantages: it is small, and it cannot be decrypted back into the original content. Future research could develop message digest compilation for other file types commonly used in digital communication, such as video and audio.