Hamming Distance and Hashing Functions

Description of the Problem

As yesterday, assume $(X,d)$ is a metric space of bitstrings of a fixed length together with the Hamming distance $d$. Today, I am going to look at the distribution of the numbers $d(f(x),x)$ for $x\in X$, where $f$ is a hashing function.

Implementation

As yesterday, I am going to use ironclad:

    (require :ironclad)

And the same MD5 and SHA1 hashing functions:

    (let ((hasher (ironclad:make-digest :md5)))
      ...

    (let ((hasher (ironclad:make-digest :sha1)))
      ...
        (coerce (ironclad:digest-sequence hasher x) 'list)))

And finally, the function that calculates the quantity I am interested in is

    (defun hamming-distance (x y)
      ...
        (reduce #'+ (map 'list (lambda (i j) (calc i j)) x y))))

Since the space of bit strings of length 128 is large, I will use a Monte-Carlo method:

    (defun random-vector (m)
      ...

Now, let me test this on a sample of size 40,000 on MD5:

    (with-open-file (out "hamming2-data1.csv" :direction :output
      ...
          (format out "~4,9f~%" (/ (hamming-distance (md5-digest x) x) 1.0)))))

For the majority of the vectors, about 50% of the bits of the output differ from the input.

Recall that since the SHA1 digest produces 20 bytes, I am going to need random vectors of length 20:

    (with-open-file (out "hamming2-data2.csv" :direction :output
      ...
          (format out "~4,9f~%" (/ (hamming-distance (sha1-digest x) x) 1.0)))))

Again, about 50% of the bits are different between the input and output vectors.

I took the data from these experiments and normalized the values to a common interval. When I compared the two distributions with the Kolmogorov-Smirnov test, I got

    Two-sample Kolmogorov-Smirnov test
    data: normalize(as.matrix(X)) and normalize(as.matrix(Y))
    ...

which means that the distributions are different.
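The Monte-Carlo experiment described in this post can be sketched end to end as follows. This is a minimal reconstruction under assumptions, not the original code: `digest-bytes`, `sample-distances` and `mean` are hypothetical helpers, `random-vector` and `hamming-distance` reuse names from the post but their bodies are guesses, and `hamming-distance` is assumed to return the fraction of differing bits. The only library call is `ironclad:digest-sequence` with the digest names `:md5` and `:sha1`.

```lisp
;; As in the post; assumes ironclad is installed and visible to REQUIRE.
(require :ironclad)

(defun random-vector (m)
  "Return a list of M uniformly random bytes (assumed representation)."
  (loop repeat m collect (random 256)))

(defun digest-bytes (name bytes)
  "Hash the byte list BYTES with the ironclad digest NAME (:md5 or :sha1)
and return the digest as a list of bytes. Hypothetical helper."
  (coerce (ironclad:digest-sequence
           name
           (coerce bytes '(simple-array (unsigned-byte 8) (*))))
          'list))

(defun hamming-distance (x y)
  "Fraction of bit positions in which the equal-length byte lists X and Y
differ. Returning a fraction rather than a count is an assumption."
  (/ (reduce #'+ (mapcar (lambda (i j) (logcount (logxor i j))) x y))
     (* 8 (length x))))

(defun sample-distances (name m n)
  "Draw N random M-byte vectors x and collect d(f(x), x) as floats,
where f is the digest NAME. M must equal the digest length in bytes."
  (loop repeat n
        collect (let ((x (random-vector m)))
                  (float (hamming-distance (digest-bytes name x) x)))))

(defun mean (xs)
  "Arithmetic mean of the list of numbers XS."
  (/ (reduce #'+ xs) (length xs)))

;; MD5 digests are 16 bytes long, SHA1 digests are 20 bytes long.
(format t "MD5  mean: ~,4f~%" (mean (sample-distances :md5 16 40000)))
(format t "SHA1 mean: ~,4f~%" (mean (sample-distances :sha1 20 40000)))
```

Both means should come out close to 0.5. To compare the two distributions rather than their means, the lists returned by `sample-distances` can be written to CSV files with `with-open-file`, one value per line, as the post does.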