Performance Context

A comment by Mark A. McBride (markmcb), published on 02 February 2009, in reply to the article Data Analytics Approach to Name Matching

Great article. I was wondering if you could offer any perspective on the relative performance hit of adding each technique you mention. Will any of them wreak havoc on a system when processing a large data set?

RE: Performance Context by VnutZ :: NR10

Off the top of my head, not at the moment. But I would think the baseline name-to-name operation is roughly analogous to a poor sorting algorithm in terms of efficiency, since every name in one list is compared against every name in the other. Nearly every matching algorithm will end up with similarly poor speed.

The only way I can think to beat it would be to use clustered indices, which would at least guarantee that, for example, trying to match the name Smith would jump straight to the appropriate rows in the second list as opposed to performing a complete table scan.

If the operation were taking place with CURSORS, then you could code it much as you would in normal programming, i.e. looping only through strings beginning with “S”, for instance. But it’s still dependent on having the comparison list sorted.
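
To make the "jump to the right block" idea concrete, here is a rough Python sketch rather than actual SQL. It assumes the comparison list is already sorted and consistently capitalized, and uses a binary search to find the run of names starting with a given letter instead of scanning the whole list. The function name `candidates_by_initial` and the sample names are made up for illustration.

```python
from bisect import bisect_left

def candidates_by_initial(sorted_names, initial):
    """Return the slice of sorted_names whose entries start with `initial`.

    Assumes sorted_names is sorted and uses consistent capitalization.
    """
    # First position whose name is >= the initial letter itself.
    lo = bisect_left(sorted_names, initial)
    # The next character in code-point order bounds the block from above.
    hi = bisect_left(sorted_names, chr(ord(initial) + 1))
    return sorted_names[lo:hi]

# Hypothetical comparison list; sorting it is the precondition.
comparison = sorted(["Taylor", "Smythe", "Adams", "Smith", "Jones", "Smyth"])

# Only the "S" block needs to be compared against a source name like "Smith".
s_block = candidates_by_initial(comparison, "S")
print(s_block)
```

The two binary searches cost on the order of log(n) steps each, so the expensive comparison only runs against the names in the block. If the list is not sorted, none of this holds.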

The real problem occurs if you don’t mitigate “fluff matches”. For example, let’s say you’re using SOUNDEX and have 2 Smiths in the source list and 5 Smiths, 1 Smythe and 1 Smyth in the comparison list. Both of the original Smiths will match every Smith, Smythe and Smyth in the comparison list, resulting in 14 matches (2 × 7) as opposed to just two. The point of this rambling is that as the lists become bigger, the “fluff matches” increase, which could dramatically increase the storage needed for the results.
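
The arithmetic in that example can be checked with a small Python sketch. The `soundex` function below is a simplified take on the classic American Soundex rules, written here only for illustration. A database's built-in SOUNDEX may differ in edge cases, so treat it as an assumption rather than a drop-in equivalent.

```python
def soundex(name):
    """Simplified American Soundex: first letter plus three digits."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""

    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        # Skip vowels (no code) and repeats of the previous code.
        if digit and digit != prev:
            result += digit
        # H and W do not break up a run of identical codes.
        if ch not in "HW":
            prev = digit
    return (result + "000")[:4]

# The lists from the example: 2 Smiths against 5 Smiths, 1 Smythe, 1 Smyth.
source = ["Smith"] * 2
comparison = ["Smith"] * 5 + ["Smythe", "Smyth"]

# Every source row pairs with every comparison row sharing its code.
matches = [(i, j)
           for i, s in enumerate(source)
           for j, c in enumerate(comparison)
           if soundex(s) == soundex(c)]
print(len(matches))  # 14
```

All three spellings reduce to the same code, so the result set grows as the product of the two group sizes rather than their sum.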
