Data Mining News

ECMOD 2010 London

The mine that data - Fri, 08/06/2010 - 03:23
About 15% of MineThatData Nation joins us each day from Europe. I sincerely appreciate our European followers ... blog posts are purposely published in the evening here in the Pacific Northwest so that our European friends have fresh content first thing in the morning!

Now our European friends can get fresh content live, in person. The kind folks at the ECMOD conference have invited me to share information on two different topics.
  • 35 Things Catalogers Can Do To Improve Profitability Tomorrow Morning.
  • Effective Matchback / Allocation Strategies And Methodologies.
Here's a press release for the conference.

Here's a link, follow it and register for the conference now ... what's better than London in early October?

And follow the good folks at ECMOD on Twitter!

BlackBerry - Love it, Hate it, Whatever

I was amused by this moment of journalistic dissonance assembled by Techmeme, where two authors started off their articles with opposing generalities about the BlackBerry.


Gliebers Dresses: Something In My In-Box

The mine that data - Thu, 08/05/2010 - 03:15

From: Pepper Morgan [mailto:pepper.morgan@gliebersdresses.com]

Sent: Wednesday, August 4, 2010 9:37 AM

To: Kevin Hillstrom

Subject: Any Interest In Helping Us?


Kevin:

It's been a long time since we chatted!

I just wanted to let you know that Brandon Templeton, our CEO, was fired on August 1. We were surprised that he was fired, we were not surprised why he was fired.

He wasn't a big fan of our "catalog" mindset, and he was so confident of the future of marketing that he decided he was going to go "all in" during the month of July. He decided that we would not mail our early July sale catalog, and that we would not mail our late July fall preview catalog, and would instead focus our marketing activities on social media and the new iPad and Android apps that Roger created.

Needless to say, sales tanked, and our parent company freaked out. By the end of July, sales were 50% below plan.

Of course, our team viewed this as a validation of the catalog business model. Meredith was absolutely energized by the results, she crowed in meeting after meeting about the importance of what she calls "spread merchandising", the idea that catalog spreads tell an important brand story.

I viewed this in a different way, and my way of viewing this appears to be a bit unpopular. I told the team yesterday that 50% of our sales did not disappear, and that company profit in July was on plan. Lois claimed that we held on to our sales because of her free shipping based loyalty program, but there is more to it than that, isn't there?

I mean, we basically stopped marketing to the customer, and 50% of the sales still happened. We know that free shipping gives us a 10% bump, so this means that customers kept shopping because of e-mail marketing, because of search, because of our apps, because of our social media strategy, because of our loyalty program, and because of brand loyalty.

Do you have other clients who identified this dynamic, where you stop mailing catalogs and you still keep 50% of your business? And if that is the case, what is their response? Do they increase online marketing spend? Do they spend more money in offline marketing? Do they increase their social media presence? Do their apps actually drive sales increases? How do they staff themselves in order to capitalize on this type of outcome? What happens to the 72 year old female shopper who mails her order in an envelope, with a stamp and a check? Would you be willing to share what you know with us via Skype, I'll pay you for your time out of my budget? Are you available next week?

I hope you are enjoying summer. Sonora is off at summer camp at Lake Winnipesaukee, so I'll see her again around August 15.

Meredith and I were talking the other day that we miss having your thoughts in our Executive meetings. We didn't always agree with what you said, but we agree that you made us think. Roger, on the other hand, well, he probably wouldn't be happy that I'm contacting you. He thinks he can run this place, he's making a bid for the CEO job, and he strongly believes in a multichannel mix of traditional marketing and online marketing, coupled with MBA-style business experience. He's asking the parent company for an opportunity to run the business on a temporary basis, he wants an opportunity to demonstrate his skills.


Thanks,

Pepper

Bricks 'n Clicks: Barnes & Noble

The mine that data - Wed, 08/04/2010 - 13:19
If all of the multichannel bricks 'n clicks strategy of the past decade was so prescient, then Barnes & Noble should have beaten Amazon and Wal-Mart and Apple into submission, right?

That didn't happen, and now B&N is for sale.

A generation of marketers were trained to believe that a theory was reality. I've been in meetings, too many to count, where the Executive shares all of the theories about why a bricks 'n clicks or a multichannel strategy that includes paper will trump new technology. The slides are shared, slides that suggest that customers want to do research online or via paper, and then get in a car and fight traffic for 17 minutes in order to complete the transaction, sales tax included. It's a theory supported with misleading data.

Always trust customer response over vendor/consultant theory. Most important, do your own Multichannel Forensics analysis, and teach yourself how your customers prefer to shop!

Testing: Half-Life

The mine that data - Wed, 08/04/2010 - 03:15
It might be the most asked question I get, and for good reason ...

Question: "Kevin, what the heck are you talking about when you say that you shouldn't test something that has a short half-life?"

First, let's define half-life ... go to Wikipedia for your answer!

Now let's view this from a testing standpoint. Maybe you want to test a big green "sign up now" button instead of a small blue "click here for more information" button. You run a test, and you learn that the big green button improves conversion rate by 34%.

Do you assume that the big green button will outperform the control by 34% forever?

If you are measuring "half-life", you are measuring the amount of time that it takes for the 34% lift to become a 17% lift ... or worse, to become a 0% lift.

So often, we test concepts that either have minimal half-life, or are only valid for the time/audience when they were tested. In other words, the concepts we are testing are fleeting in nature, they have no staying power.

There are concepts that have significant staying power. In apparel merchandising, you know that women buy merchandise for men, so you can improve conversion (online) or reduce expense (offline) by exclusively advertising women's merchandise. This has a half-life just shy of infinity; you'll be more profitable for the duration of your career by testing strategies that capitalize on this well-known fact. We don't read about these strategies and test findings, because the strategies are so profitable that they yield a significant competitive advantage to the folks who possess the knowledge.

Many concepts have minimal staying power, and, by definition, minimal half-life. We read about these strategies and test findings all of the time, because these strategies have a short half-life, rendering their competitive advantage only to the audience that saw the message during the timeframe when the test happened.

This is the phenomenon I refer to when I mention "half-life" in my writing. The best way to determine if you have a half-life issue is to re-test your strategy three months, six months, or twelve months later, observing if you still have a meaningful lift over your control group.
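To make the re-test idea concrete, here is a minimal sketch of how a half-life could be estimated from two measurements of the same test. It assumes the lift decays exponentially over time, which is an assumption of this sketch and not something the testing data guarantees; the function name `lift_half_life` is hypothetical.

```python
import math

def lift_half_life(initial_lift, later_lift, elapsed_months):
    """Estimate the half-life (in months) of a test lift.

    Assumes exponential decay: lift(t) = initial_lift * 0.5 ** (t / half_life).
    This decay shape is an assumption made for illustration only.
    """
    if initial_lift <= 0 or elapsed_months <= 0:
        raise ValueError("initial_lift and elapsed_months must be positive")
    if later_lift >= initial_lift:
        # No decay observed between the two tests.
        return math.inf
    if later_lift <= 0:
        # The lift vanished or reversed; the exponential model cannot fit this.
        return None
    return elapsed_months * math.log(2) / math.log(initial_lift / later_lift)

# The 34% lift from the example above, re-tested six months later at 17%.
print(lift_half_life(0.34, 0.17, 6))
```

If the re-test shows the lift is gone entirely, the function returns `None` rather than a number, which is itself the answer: the concept had no staying power.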

Summer Segmentation: Employee Orders

The mine that data - Tue, 08/03/2010 - 03:15
If your database allows you to do this, I'm begging you to do this ... create a segmentation variable for employee orders.

Then, sum employee demand for the past twelve months. Compare that sum to the sum of employee demand from 13-24 months ago, 25-36 months ago, 37-48 months ago, etc.

Go to your human resources department, and ask how many employees you had during each time period. Then calculate demand per employee.

Trend demand per employee by year. What does the productivity of your employee base look like? Is it correlated with the number of twelve month buyers you have? If sales are in decline among your employee base and among your customer base, then ask your employees why they won't buy the merchandise??!!
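The steps above can be sketched in a few lines. This is only an illustration: the period labels, demand totals, and headcounts below are made-up placeholders, and `demand_per_employee` is a hypothetical helper, not a feature of any particular database.

```python
def demand_per_employee(employee_demand, headcount):
    """Divide summed employee demand by headcount for each time period."""
    return {
        period: employee_demand[period] / headcount[period]
        for period in employee_demand
    }

# Hypothetical figures, for illustration only.
employee_demand = {
    "0-12 months": 180_000.0,
    "13-24 months": 210_000.0,
    "25-36 months": 240_000.0,
}
# Headcount per period, as supplied by human resources (also hypothetical).
headcount = {
    "0-12 months": 400,
    "13-24 months": 420,
    "25-36 months": 400,
}

trend = demand_per_employee(employee_demand, headcount)
for period, value in trend.items():
    print(f"{period}: ${value:,.2f} per employee")
```

In this made-up example, demand per employee falls in each successive year, which is the pattern that should prompt the uncomfortable question.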

I did this experiment once ... my marketing department was being pummeled by management for not "promoting the brand". So I summed employee demand among all employees Director and above, showed that demand was in decline, then asked this employee set in a meeting why they were choosing to not spend as much with our brand, after all, marketing should be irrelevant to them, they live and create the brand.

Yes, the room got really, really quiet.

Employee orders are the analyst/marketer's best friend. You can cut through a lot of garbage and get down to the core issues associated with merchandise productivity by demonstrating that the Leadership team is / is not buying the merchandise.

Is there too much data?

Datamining and predictive analytics - Tue, 08/03/2010 - 02:52
I was reading back over some old blog posts, and came across this quote from Moneyball: The Art of Winning an Unfair Game:

Intelligence about baseball statistics had become equated in the public mind with the ability to recite arcane baseball stats. What [Bill] James's wider audience had failed to understand was that the statistics were beside the point. The point was understanding; the point was to make life on earth just a bit more intelligible; and that point, somehow, had been lost. 'I wonder,' James wrote, 'if we haven't become so numbed by all these numbers that we are no longer capable of truly assimilating any knowledge which might result from them.' [italics mine]

I see this phenomenon often these days; we have so much data that we build models without thinking, hoping that the sheer volume of data and sophisticated algorithms will be enough to find the solution. But even with mounds of data, the insight still occurs often on the micro level, with individual cases or customers. The data must tell a story. 


The quote is a good reminder that no matter the size of the data, we are in the business of decisions, knowledge, and insight. Connecting the big picture (lots of data) to decisions takes more than analytics.

The Power of Bing (t-shirts)

I was chatting with a neighbor this weekend. He's an architect, and, upon noticing that I was wearing a Bing t-shirt, mentioned that now at his firm most everyone has switched to using Bing. The reason being that the image search was superior to Google's. Bing's image search, in addition to providing different - in this case, better - results, innovated the page-less scrolling of results.

The model of attracting people via a key feature, which beats the competition, then encouraging them to stay for all the other features is, naturally, what the fight for users is all about. Here's an article from the New York Times, via Techmeme, which illustrates the feature war with a number of similar stories. Such stories are, of course, anecdotal, but each user is won by their own anecdote.

Square is the new Round, in Web Design

One of the hallmarks of the Web 2.0 movement was the use of rounded corners in pretty much every element that graced the pages of any hot new start up. These rounded corners were often accompanied by two other elements: the lozenge-like lighting on buttons and the mirrored reflections on an imagined surface.

Here's a snippet of the Picnik website with lots of rounded goodness:


Apple adopted this idiom in many of its products. The Safari browser uses the reflections in its grid presentation of browsing history:

and perhaps most famously, the iPhone design language is all about the rounded corners of Web 2.0 stickers:

The new Web 2.0 design idiom, however, is all about squares. For example, the new BBC design uses simple blocks and solid colours:

Stamen design and Infosthetics both adopt a dense, but appealing approach to illustrated navigation:

Stamen

Infosthetics

Making the transition complete, this squared approach to design will be the hallmark of the Windows Phone 7 UX:

As the web adopts these new crisp corners, will the iPhone UI start to look stale, just as it looked fresh and current when it launched?

Dear Catalog CEOs: The Secret of Page Counts

The mine that data - Mon, 08/02/2010 - 03:15
Dear Catalog CEOs:

I, too, remember a day when a 600 page catalog represented a welcome arrival in the mailbox.

Today, the 600 page catalog is called "your website". The catalog, for those under the age of 55, represents a vehicle that "creates demand".

So if the job of the modern catalog is to "create demand", then one might think that we should try to optimize the size of the catalog, so that we can reach the largest audience possible at the lowest possible cost.

I know, this flies in the face of everything we've been taught. The printing community and the USPS push us toward optimal page counts, and discourage small catalogs. We're enticed to add four or eight pages to achieve "efficiencies".

Why isn't our goal to achieve "the most profit"?

Every company can identify the relationship between page counts and demand, heck, I do this for clients every week.

Take this example, for instance. A business has a 124 page catalog, and is looking to mail the catalog to outside lists. Last year, the catalog generated $1.4 million in demand and 13,987 responses.

If you assume that 70% of the demand can be achieved on 50% of the pages (these days, you can often achieve 80% of the demand on 40% of the pages or you can do even better), then we can estimate what might happen with 56 pages.

We can simulate the outcome of the 56 page catalog, and in most cases, the smaller catalog is going to outperform the bulky, bigger catalog, regardless of how the smaller catalog is merchandised.

Here, we increase reach, from 600,000 customers to 1,650,000 customers. What's not to like about that, folks?

Here, we increase demand from $1.4 million to $1.9 million. What's not to like about that?

Here, we increase total responses from 13,987 to 19,878 ... isn't that the goal of acquisition marketing?

Here, we increase profit from $62,000 to $78,000. You just paid for a portion of your annual salary!
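For readers who want to see what those four comparisons imply per catalog mailed, here is a small sketch that only does arithmetic on the figures quoted above. It treats the "reach" numbers as the count of catalogs mailed, which is an assumption, and `per_book_metrics` is a hypothetical helper name. It does not reproduce the simulation itself, which depends on cost and list-performance inputs not given here.

```python
def per_book_metrics(circulation, demand, responses, profit):
    """Express campaign totals on a per-catalog-mailed basis."""
    return {
        "response_rate": responses / circulation,
        "demand_per_book": demand / circulation,
        "average_order_value": demand / responses,
        "profit_per_book": profit / circulation,
    }

# Figures quoted in the post: 124 page catalog vs. simulated 56 page catalog.
big = per_book_metrics(600_000, 1_400_000, 13_987, 62_000)
small = per_book_metrics(1_650_000, 1_900_000, 19_878, 78_000)

for name, metrics in (("124 pages", big), ("56 pages", small)):
    print(name)
    for key, value in metrics.items():
        print(f"  {key}: {value:.4f}")
```

Each smaller catalog is less productive on its own (lower response rate and demand per book), yet the totals for demand, responses, and profit all go up because so many more customers are reached.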

Oh, I can hear folks howling already ... "if you don't advertise the item, the customer won't buy the item, so I need to present everything in my catalog."

Horsefeathers!

Why not advertise the best-selling items, and then direct the customer online for your full merchandise assortment?

The secret of page counts is that "smaller is better". Your printer and the USPS will encourage you to go bigger. I can profitably demonstrate that you can "go smaller", and I can help you grow your business in the process while going smaller. Your list vendor and your co-op will sincerely appreciate your new strategy.

If you want help running the simulations, I am available and ready to assist you ... the whole exercise takes an hour, or less!

The Interpretation of Tables in Texts, 2000

Ten years ago, I submitted the final version of my PhD thesis: The Interpretation of Tables in Texts. At the time, there wasn't a huge amount of research going on in the space. Those working in the area pretty much all knew each other and would meet at a couple of conferences, generally in the OCR community as there wasn't much interest in table understanding in other research areas.

Now, there is quite a healthy interest in table understanding due in part to the promise of tabular data being a reasonable way to bootstrap semantic relationships via the large scale mining of the web.

Most recently, I spotted this paper by Finin et al.: Exploiting a Web of Semantic Data for Interpreting Tables, WebSci10, 2010, which echoes much of the promise of the 'first generation' of table understanding work by the likes of myself, Dan Lopresti, Thomas Kieninger, Jianying Hu, etc. In fact, the motivating example in that paper:


 
bears a strong similarity to that from my thesis:

with the latter also illustrating to some extent the complexity of table semantics.

I'm still very much interested in tabular data, perhaps because it represents the simplest transition point from textual presentations of information to graphical, or topological, representations of information.

For posterity, I've embedded the Scribd incarnation of my thesis below.

2000 - Hurst (PhD Thesis) - The Interpretation of Tables in Texts

Visualizing Economics, Catherine Mulbrandon's Blog

I've just stumbled across Visualizing Economics - a blog by Catherine Mulbrandon. The title pretty much says it all, so please take a look.

Google, Microsoft and Cash Cows

Two interesting posts of late: Don Dodge (former Microsoft evangelist and now a Google evangelist) writing about misplaced expectations for Microsoft stock in MSFT earnings up, stock down. What do investors want? and Michael V. Copeland and Seth Weintraub at Fortune writing about Google's transition to being a cash cow company in Google: the search party is over.

Year to date, Google's stock is down about 22%; Microsoft is down about 11%.

There is no room for complacency on either side.

Tech Travels Japan

Just back from a quick trip to Japan, I thought I'd write up some thoughts and observations about the trip from a technology point of view.

All told, Japan is a far more technologically integrated country than any other I've visited. Much of this integration is born of necessity (population density) and of organic processes (what originated as an electronic money system for the railways has morphed into a general e-wallet accepted at many points of sale).

Transport: technology in transport includes automated ticket machines, turnstiles which recognize both of the major electronic wallet formats (which are also embedded in mobile devices), extremely precise timetable execution, conductors carrying wireless, touchscreen ticket verification systems which are integrated with the carriages themselves (when they've verified your ticket, the light above your seat indicates the verification).

In addition to the Japanese side of the trip, traveling on Air Canada was a pretty up to date experience. The 777 had USB and socket outlets on each seat. The touchscreen entertainment system was available for use at the gate (I was already 30 mins into a movie before we took off). On the downside, the video watching experience appears to have morphed into an advertisement-pushing channel that you literally had to look away from to avoid, given that the screen was only inches away from your face.

Entertainment: all of the consumer electronic stores I visited in both Akihabara and elsewhere were full of 3D TV offerings from all the major flat screen manufacturers. The Sony Building in Ginza (which I understand is to be closed down in the near future) was transformed into a 3D aquarium with all of the floors featuring 3D technology and amazing videos of coral reefs, sharks, etc. Some TVs are boasting the ability to recognize the emotions of the human face, but I couldn't quite figure out what they were doing with the results!

Mobile Devices: I live in the Seattle area which is probably a very biased sample of the US in terms of mobile device use. On the bus I take to commute, iPhone adoption is extremely high, as is Kindle and iPad use. In Japan, with a quite different sample of observations on public transport, iPhones were far less prevalent - passengers tending to use the type of device with a physical keyboard. In addition, I only spotted one iPad (and that in the lobby of a hotel) and no other type of reading device. The Japanese have maintained the original form factor of the pocket paperback book (i.e. a paperback book you can actually fit in your pocket) - and that was still clearly popular. Public telephone kiosks seem to be disappearing (though nothing like to the extent in the US).

Search Engines: no-one has heard of Bing, or the fact that Microsoft has a search engine. It was big news when the Yahoo! Google partnership was announced while I was there (Yahoo! Japan is not the same company as Yahoo!). Google is running billboard advertisements for its browser (Chrome).

Tech Corporations: Two well known corporations in Japan (Rakuten and Uniqlo) have or are switching to English as their official corporate language. This is a pretty interesting change and highlights their international ambitions.

While technology is a big part of Japanese culture, much of it is used to support something that can't be packaged and automated - high quality customer service. The dedication and attention to detail one gets as a consumer or traveler in Japan is incredible and often not related to the amount one is paying.

Fetzer's Footwear: Fed Up

The mine that data - Thu, 07/29/2010 - 03:15
Today, I am meeting Lauren Fetzer, CEO of Fetzer's Footwear, for a picnic at Bacteria Bay, located on the west side of Madrona Island.

Bacteria Bay is named after a mysterious illness that befell the early settlers of Madrona Island, causing them to scratch uncontrollably and, in many cases, to run into the bay in a desperate attempt to alleviate their misery.

Kevin: "What are you listening to?"

Lauren: "But It's Alright by Huey Lewis, now that's a classic."

Kevin: "Alright."

Lauren: "You work with a lot of companies, don't you?"

Kevin: "True."

Lauren: "So let me ask you a question. Do the Executives at other companies always do what the CEO tells them to do?"

Kevin: "That's a loaded question. I think most Executives execute the spirit of what the CEO asks them to do."

Lauren: "I want for my team to execute exactly what I ask them to do."

Kevin: "Sort of like the robotics in your distribution center, right?"

Lauren: "In some ways. They're really loyal ... lifeless, but loyal. These folks, I'm asking them to be strategic, and they just keep focusing on tactics and in-fighting. Is it like that at other places?"

Kevin: "Why would you expect Bart Cox to be strategic when you blitz him with a weekly verbal assault because the Alderwood store fails to deliver positive comps?"

Lauren: "It's his job to get that store to perform."

Kevin: "So do you want him to focus on the tactics associated with getting one store to perform, or do you want for him to focus on a five year strategic plan?"

Lauren: "Both, that's the life of the multi-tasker."

Kevin: "But he gets fired if he doesn't get Alderwood to perform this year, right?"

Lauren: "Possibly."

Kevin: "Then that's what he's going to focus on."

Lauren: "And Penny keeps focusing on campaigns. I want to know what our marketing strategy is going to be to compete with Zappos in 2015."

Kevin: "Everybody keeps asking her how the campaigns are working, what would you like for her to do?"

Lauren: "I need a strategic plan from Penny, now. I'm fed up."

Kevin: "Fed up?"

Lauren: "Absolutely. I'm thinking of buying a business. I've been talking with the CEO of Buckley Boots. He's got a five million dollar business that can't get to breakeven, but he's got a social component to his business that is second to none, and he even sends catalogs to his loyal customers. I'm thinking we can fold his business into our business, and then leverage his expertise to develop a marketing plan that takes us to 2015 and beyond. What do you think?"

Kevin: "That's one way to get the job done."

Lauren: "Sure is. Would you be willing to evaluate their customer file so that I can understand the five year potential of their business?"

Kevin: "Absolutely! That's what I do best."

Lauren: "Good! I'll send the data tomorrow, have your review completed by next week."

Kevin: "Alright."

Summer Segmentation: Cyber Monday

The mine that data - Wed, 07/28/2010 - 03:15
If you've followed my blog over the past four and a half years, you know all about my long-standing detestation of Cyber Monday, an online retailing event contrived by a leading trade organization to encourage shoppers to take advantage of margin-eroding discounts and promotions that mainly serve to promote the trade organization that invented the holiday.

Without evidence to the contrary, I assumed that customers who purchased via discounts and promotions on this contrived holiday would not repurchase in the future.

So, I created a segmentation variable for a client called "Cyber Monday". Any customer purchasing on Cyber Monday received a "1", all other customers received a "0" for Cyber Monday purchasing history.

I plugged this variable into a statistical model ... and after controlling for recency and frequency/monetary-value and a host of other variables ... I learned that past purchases on Cyber Monday were not detrimental to subsequent customer value, no positive or negative influence was detected (the variable had a small, negative coefficient, but it was not statistically significant).

So do some due diligence ... create your own Cyber Monday variable, and see if it has a positive, neutral, or negative influence on the long-term value of customers. Don't read opinions and assume that the opinions are truthful, prove a hypothesis for yourself, for your business!
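As a starting point for building your own variable, here is a minimal sketch of the flagging step only. It assumes Cyber Monday is the Monday after US Thanksgiving (the fourth Thursday of November) and that order history is available as simple (customer, order date) pairs; the function names and sample orders are hypothetical. The statistical model itself, with its controls for recency, frequency, and monetary value, is left to whatever modeling tool you already use.

```python
from datetime import date, timedelta

def cyber_monday(year):
    """Return the Monday after US Thanksgiving for the given year."""
    nov1 = date(year, 11, 1)
    # date.weekday(): Monday is 0, Thursday is 3.
    first_thursday = nov1 + timedelta(days=(3 - nov1.weekday()) % 7)
    thanksgiving = first_thursday + timedelta(weeks=3)  # fourth Thursday
    return thanksgiving + timedelta(days=4)

def cyber_monday_flags(orders):
    """Map each customer to 1 if any order fell on a Cyber Monday, else 0."""
    flags = {}
    for customer_id, order_date in orders:
        hit = order_date == cyber_monday(order_date.year)
        flags[customer_id] = max(flags.get(customer_id, 0), int(hit))
    return flags

# Hypothetical order history, for illustration only.
orders = [
    ("cust-1", date(2009, 11, 30)),  # Cyber Monday 2009
    ("cust-1", date(2010, 3, 14)),
    ("cust-2", date(2009, 12, 1)),
    ("cust-3", date(2008, 12, 1)),   # Cyber Monday 2008
]
print(cyber_monday_flags(orders))
```

The resulting 0/1 column can then be added as one more predictor alongside your existing variables.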

Amazing Physics Simulations from Lagoa

Vu Nguyen posts this amazing video from Lagoa.

Lagoa Multiphysics 1.0 - Teaser from Thiago Costa on Vimeo.

The company behind this video is somewhat elusive (a website with their name currently just hosts this video).

Crowd Sourcing Butterfly Conservation

The BBC writes about an effort in the UK to use crowd sourcing to populate data recording the number of different types of butterflies: the Big Butterfly Count. Participants are asked to spend 15 minutes spotting butterflies and moths. The data, currently 5121 sightings (24 hours later, 5866), is displayed on a map.

A couple of thoughts. Firstly, I think the data could be displayed in a far more engaging manner with a heat map of some sort, with the ability to show clusters of different species at least. The following is an inefficient way to show the data for a species:

 


Secondly, I wonder if Twitter could be used in some way to channel the data - one could even tweet a picture to the project. That way, the data could be verified and it would come with geolocation and time associated.

Finally, and perhaps most importantly, the site suffers from the age old problem of inadvertent-tab-ellipsis-renaming:

Summer Segmentation: Bribe Rate

The mine that data - Tue, 07/27/2010 - 03:15
The "bribe rate" is one of the most important metrics you can track.

The "bribe rate", of course, is the percentage of orders during any period of time that include a discount, a promotion, or at least one sale item. The bribe rate is often inversely correlated with brand loyalty.

If you think this metric needs to be on every single performance dashboard, you are right.

If you think this metric makes for a perfect segmentation variable, you are right!

Segment your customer base into high, average, and low bribe rates. It matters!
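Here is a minimal sketch of the metric and the segmentation, under stated assumptions. The order fields (`discount`, `promotion`, `sale_items`) and the 25% / 75% cutoffs are hypothetical choices made for illustration; pick field names and cutoffs that fit your own data.

```python
from collections import defaultdict

def is_bribed(order):
    """True if the order used a discount, a promotion, or any sale item."""
    return bool(order["discount"] or order["promotion"] or order["sale_items"] > 0)

def bribe_rate(orders):
    """Share of orders that included a discount, promotion, or sale item."""
    if not orders:
        return 0.0
    return sum(is_bribed(order) for order in orders) / len(orders)

def segment_customers(orders, low_cutoff=0.25, high_cutoff=0.75):
    """Label each customer high, average, or low by personal bribe rate."""
    by_customer = defaultdict(list)
    for order in orders:
        by_customer[order["customer_id"]].append(order)
    segments = {}
    for customer_id, customer_orders in by_customer.items():
        rate = bribe_rate(customer_orders)
        if rate >= high_cutoff:
            segments[customer_id] = "high"
        elif rate <= low_cutoff:
            segments[customer_id] = "low"
        else:
            segments[customer_id] = "average"
    return segments

# Hypothetical orders, for illustration only.
orders = [
    {"customer_id": "A", "discount": True, "promotion": False, "sale_items": 0},
    {"customer_id": "A", "discount": False, "promotion": False, "sale_items": 2},
    {"customer_id": "B", "discount": False, "promotion": False, "sale_items": 0},
    {"customer_id": "B", "discount": False, "promotion": True, "sale_items": 0},
    {"customer_id": "C", "discount": False, "promotion": False, "sale_items": 0},
]
overall = bribe_rate(orders)
segments = segment_customers(orders)
print(f"Overall bribe rate: {overall:.0%}")
print(segments)
```

The overall rate is the dashboard number; the per-customer label is the segmentation variable.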

Augmented Reality - 17 years from Concept to Product?

Almost twenty years ago, I recall coming across a paper (either in the AI library in Edinburgh, or the Computer Science library in Cambridge) which described an augmented reality approach to that most intractable of problems: fixing printers.

A number of forces have conspired to allow me to access a reference to that paper (Google's crawl/search, my memory being prompted repeatedly by augmented reality applications on mobile devices).

At any rate, I suspect the image below, from a document with a 1993 time stamp, may be one of the earliest incarnations of augmented reality. Feiner, S., MacIntyre, B., and Seligmann, D. (1993) "Knowledge-Based Augmented Reality." Communications of the ACM, Vol. 36(7), pp. 53-62.

 

Looking around now, what will be hitting mainstream in 17 years?
