Checksum: Difference between revisions

The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero. The checksum algorithms most used in practice, such as [[Fletcher's checksum]], [[Adler-32]], and [[cyclic redundancy check]]s (CRCs), address these weaknesses by considering not only the value of each word but also its position in the sequence. This feature generally increases the [[Analysis of algorithms|cost]] of computing the checksum.
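As a minimal illustration, assuming 8-bit data words, the following Python sketch compares a plain byte sum modulo 256 with the 16-bit variant of [[Fletcher's checksum]]. The function name <code>simple_sum</code> is hypothetical and only stands in for the simple checksums described above. Fletcher's second running sum accumulates the first one, so each byte's contribution depends on where it appears. Swapping two bytes or inserting a zero byte therefore changes the Fletcher result but leaves the plain sum unchanged.

```python
def simple_sum(data: bytes) -> int:
    # Position-independent: just add every byte, modulo 256.
    return sum(data) % 256


def fletcher16(data: bytes) -> int:
    # Fletcher-16: two running sums modulo 255.
    sum1 = 0  # plain sum of the bytes
    sum2 = 0  # sum of the running sums, so earlier bytes are counted more often
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


# Reordering two bytes: the simple sum is unchanged, Fletcher-16 is not.
print(simple_sum(b"\x01\x02"), simple_sum(b"\x02\x01"))
print(fletcher16(b"\x01\x02"), fletcher16(b"\x02\x01"))
```

The extra cost is visible in the loop: Fletcher-16 performs two additions and two modular reductions per byte, whereas the simple sum performs one addition.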
===Fuzzy checksum===
The idea of a fuzzy checksum was developed to detect [[email spam]] by building co-operative databases, shared among multiple ISPs, of email suspected to be spam. The details of such spam often vary from one copy to the next, which renders normal checksumming ineffective. A "fuzzy checksum" instead reduces the body text to its characteristic minimum and then generates a checksum in the usual manner. This greatly increases the chance that slightly different spam emails produce the same checksum. The spam-detection software of co-operating ISPs, such as [[SpamAssassin]], submits checksums of all emails to a centralised service such as [[Distributed Checksum Clearinghouse|DCC]]. If the count for a submitted fuzzy checksum exceeds a certain threshold, the database records that the email is probably spam. Users of the ISP's service similarly generate a fuzzy checksum for each of their emails and query the service for a spam likelihood.<ref>{{cite web| url= | title = IXhash |publisher= Apache |accessdate=7 January 2020}}</ref>
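The following Python sketch illustrates the general idea under stated assumptions. It is not the algorithm used by DCC, SpamAssassin or IXhash. The normalisation rules in <code>normalize</code> (lowercasing, dropping everything except ASCII letters and whitespace, collapsing whitespace) are a hypothetical stand-in for "reducing the body text to its characteristic minimum". The in-memory <code>ChecksumClearinghouse</code> class is likewise a hypothetical toy in place of a real networked service. SHA-256 is used here only as an example of "a checksum in the usual manner".

```python
import hashlib
import re
from collections import Counter


def normalize(body: str) -> str:
    # Hypothetical reduction: lowercase, strip digits and punctuation,
    # then collapse runs of whitespace so cosmetic variations disappear.
    text = body.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def fuzzy_checksum(body: str) -> str:
    # Ordinary cryptographic hash of the normalised text.
    return hashlib.sha256(normalize(body).encode("utf-8")).hexdigest()


class ChecksumClearinghouse:
    """Toy in-memory stand-in for a shared checksum database."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.counts = Counter()  # fuzzy checksum -> number of reports

    def report(self, checksum: str) -> None:
        # A co-operating ISP submits the checksum of an email it has seen.
        self.counts[checksum] += 1

    def is_probably_spam(self, checksum: str) -> bool:
        # Flag the checksum once its count exceeds the threshold.
        return self.counts[checksum] > self.threshold


# Two variants that differ only in case, punctuation and a phone number.
a = "WIN a FREE prize!!! Call 555-0100 now"
b = "Win a free prize. Call 555-0199 NOW!"
print(fuzzy_checksum(a) == fuzzy_checksum(b))
```

An exact hash of the raw text would give different values for the two variants. The normalised text is identical for both, so their reports accumulate under a single checksum. The threshold trades false positives (legitimate bulk mail) against how quickly new spam is recognised.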
===General considerations===