Checksum: Difference between revisions

25 bytes added, 1 year ago
→‎Algorithms: Add section anchor for forthcoming redirect
(→‎Algorithms: Section on "fuzzy checksum")
The simple checksums described above fail to detect some common errors that affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most widely used in practice, such as [[Fletcher's checksum]], [[Adler-32]], and [[cyclic redundancy check]]s (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the [[Analysis of algorithms|cost]] of computing the checksum.
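
The difference can be illustrated with a minimal sketch that compares a plain modular sum with a Fletcher-16 style checksum. The function names <code>simple_sum</code> and <code>fletcher16</code> are chosen here for illustration only, and the byte-wise, modulo-255 form shown is one common formulation rather than a reference implementation.

```python
def simple_sum(data: bytes) -> int:
    """Position-independent checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


def fletcher16(data: bytes) -> int:
    """Fletcher-16 style checksum over bytes, using modulo 255.

    sum1 accumulates the byte values; sum2 accumulates the running
    values of sum1, so each byte's contribution depends on its position.
    """
    sum1 = 0
    sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


original = b"\x01\x02"
reordered = b"\x02\x01"        # same words, different order
zero_inserted = b"\x01\x00\x02"  # an all-zero word inserted in the middle

# The simple sum cannot tell these apart from the original.
print(simple_sum(original), simple_sum(reordered), simple_sum(zero_inserted))

# The position-sensitive checksum gives a different value for each.
print(hex(fletcher16(original)), hex(fletcher16(reordered)), hex(fletcher16(zero_inserted)))
```

The extra running sum is the source of the added cost: each input byte requires two additions and two reductions instead of one.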
 
==={{anchor|fuzzy checksum}}Fuzzy checksum===
The idea of a fuzzy checksum was developed for detecting [[email spam]]: ISPs co-operate to build shared databases of email suspected to be spam. The content of such spam often varies in its details, which renders normal checksumming ineffective. By contrast, a "fuzzy checksum" first reduces the body text to its characteristic minimum, then generates a checksum in the usual manner. This greatly increases the chance that slightly different spam emails produce the same checksum. The spam detection software of co-operating ISPs, such as [[SpamAssassin]], submits checksums of all emails to a centralised service such as [[Distributed Checksum Clearinghouse|DCC]]. If the count of a submitted fuzzy checksum exceeds a certain threshold, the database notes that this probably indicates spam. Users of the service similarly generate a fuzzy checksum for each of their emails and query the service for a spam likelihood.<ref>{{cite web| url=https://cwiki.apache.org/confluence/display/spamassassin/iXhash | title = IXhash |publisher= Apache |accessdate=7 January 2020}}</ref>
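
The scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the normalisation rules, the <code>fuzzy_checksum</code> function, the <code>Clearinghouse</code> class and the threshold value are all hypothetical, and do not reproduce the actual algorithms or protocols used by DCC, iXhash or SpamAssassin.

```python
import hashlib
import re
from collections import Counter


def fuzzy_checksum(body: str) -> str:
    """Reduce the body to a rough 'characteristic minimum', then hash it.

    Assumed normalisation (hypothetical): lower-case the text, drop URLs,
    and keep only the letters a-z, so that digits, punctuation and
    whitespace differences do not change the result.
    """
    text = body.lower()
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"[^a-z]+", "", text)
    return hashlib.sha256(text.encode("ascii")).hexdigest()


class Clearinghouse:
    """Hypothetical in-memory stand-in for a central checksum service."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()

    def report(self, checksum: str) -> int:
        """Record one sighting of a checksum and return its new count."""
        self.counts[checksum] += 1
        return self.counts[checksum]

    def is_probably_spam(self, checksum: str) -> bool:
        """True once the count of this checksum exceeds the threshold."""
        return self.counts[checksum] > self.threshold


service = Clearinghouse(threshold=2)
variants = [
    "Dear customer 1234, BUY now!!!",
    "dear customer 9981 ,  buy   NOW",
    "Dear customer 5550: buy now http://example.com/a1",
]
for message in variants:
    service.report(fuzzy_checksum(message))

# All three variants normalise to the same text, so their shared checksum
# has been counted three times, which exceeds the threshold of 2.
print(service.is_probably_spam(fuzzy_checksum(variants[0])))
```

An ordinary checksum of the three raw messages would give three unrelated values, so no single count would ever reach the threshold.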