Tuesday, May 20, 2014

Software Test World Cup 2014 - Part 2

I wrote recently about the Software Test World Cup. In that post I said I would post our scorecard and an analysis if one was possible. First, the raw data, with some "Avg" columns removed. I have not intentionally edited the content other than anonymizing the judges, format-shifting from Excel, and removing extra columns of averages and totals. Oh, and I marked spelling mistakes with [sic], which I just learned should be written with brackets, not parentheses.

Judge scores (the first five categories are out of 20, the bonus out of 10):

Category                                 | A  | B  | C
Importance of Bugs Filed                 | 16 | 15 | 12
Quality of Bug Reports                   | 16 | 15 | 11
Non-Functional Bugs Filed                | 9  | 11 | 10
Writing/Quality of Test Report           | 14 | 13 | 14
Accuracy of Test Report                  | 14 | 13 | 12
BONUS: Teamwork/Judge Interaction (0-10) | 1  | 1  | 1

Judges' notes:

A: Bonus: Test Report/Bugs made me want to engage the customer. Several Usability issues.

B: I found that the report did not flow well. I know many teams are expected to give ship/no ship decisions, this [sic] iritates me, i promise not to let it affect my [sic] juding.

C: I like the test report. It gives practical examples, what could have been [sic] testeed for different aspects, e.g. Load, but the ship decision and the "major" bugs seem not to fit imo. bug spread is okay, they tried to consider disability issues (red/green vs. [sic] colorbling ID326)
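Since I stripped the total and average columns, here is a quick sketch that recomputes them from the scores above. The straight sum and per-category average are my own reconstruction, not necessarily the weighting the contest actually used.

```python
# Recompute the totals/averages I removed from the spreadsheet.
# The first five categories are scored out of 20 and the bonus out of 10;
# the simple sum below is my reconstruction, not the official weighting.
scores = {
    "A": [16, 16, 9, 14, 14, 1],
    "B": [15, 15, 11, 13, 13, 1],
    "C": [12, 11, 10, 14, 12, 1],
}

for judge, s in scores.items():
    total = sum(s)                   # out of 5 * 20 + 10 = 110
    avg_category = sum(s[:5]) / 5.0  # average of the five 0-20 categories
    print("Judge {}: total {}/110, category average {:.1f}/20".format(
        judge, total, avg_category))
```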

First of all, I thank the judges not only for making numeric judgments, which is super hard, but also for spending the time to write some comments. I appreciate that. However, looking at the judges' comments, I find them confusing. For example, Judge C says they liked the test report, but their score for it was no higher than the other judges gave. They complain that the "major" bugs seem not to fit, even though those bugs included crashes and the inability to order the product. Granted, I don't have the full list of bugs we found in front of me, but I wonder what priority of bug they were looking for. That is left to the reader's imagination.

Judges A and B were kinder score-wise. Judge A gives us a bonus, but not the bonus Matt Heusser promised us for answering a question in the YouTube channel. The bonuses were certainly not used the way I imagined: I thought our conversations with Matt and the product owner would be part of that bonus, but it appears the judges ignored them. I did find it interesting that Matt said usability would be part of non-functional [no citation, I'm not rewatching 3 hours of video]. We also asked for permission to do some load testing on the Snagit site but never got a response from the product owner, so we chose not to because of the legal and ethical implications. It seems like that was not considered in the scoring, but maybe I am wrong. With Snagit as the tool to test, there isn't a lot of non-functional testing to do. Judge B, on the other hand, was harsh in comments but gave good scores (relatively). I agree with Judge B that providing a ship/no ship decision is annoying, but that is what a conversation is for. We didn't get to have one of those, so we did the best we could.

It is interesting how diverse the judges' opinions were, and also how middle-of-the-road most of their scores were (all items but the bonus were out of 20). Considering how highly we placed, I am guessing either the judges were never impressed by anyone, or they found out how bad things got and a 10/20 is really more like a 15/20, relatively speaking. Finally, I promised our test report. Sadly I can't upload the file to Blogspot, so instead I am posting it below, with some attempts to deal with formatting.


Functional Test Report

By: JCD, Isaac Howard, Wayne Earl, and KRB


Status:

Do not ship

Major Issues:

We found a good number of bugs, including multiple crashes; some seem more likely to be seen in real use than others.

- Undo does not always work, sometimes undoing the wrong thing.
- Bug 241 showed that webcam capture crashed on one particular Mac.
- When no webcam exists and you attempt to take a camera capture, Snagit closes.
- The Order Now, Tutorial, Get More Stamps, and New Output buttons went to a 404 page.
- With Preferences open in Windows 7 and 8, the application refuses to take screenshots.

There are about 20 priority 1-3 bugs, which suggests the application isn’t finished.

What Did Work:

We did a performance test on a small Windows 8 machine with 4 GB of RAM and a 1.8 GHz CPU; it succeeded in capturing video at a viewable quality. The editor worked well in most cases, as did basic usage. The mobile integration worked.

Misc:

We earned bonus points, per Matt, for giving advice on how to edit video (Jeremy Cd). We asked multiple times in the YouTube channel if we could load test the system’s website but never got a response from the judges/Matt, so we chose not to ‘hack’ the system due to legal and ethical issues.

Limits of Testing:

- No automation was generated.
- State of unit testing is unknown (customers don’t know about unit tests).
- We did not even come close to hitting all the menu items.
    o We only have a rough set of tests on Windows. Most of our testing was on Macs.
- We didn’t have the technical expertise in the system to capture logs.
- We didn’t have the time to capture the before and after.
- We have limited experience with the SUT.
- We only tested the configurations provided. Other configurations of the system were ignored.
- We could not have a conversation with the product owner about critical bugs after he left.
- Driver/hardware testing is limited. We mostly have OS X Mavericks.


How Testing Was Planned:

-Website:

Since we were informed that this is a highly hardware-dependent product (video and screen capture) in a commodity market, we decided that looking at the product's website might be as important as the product itself. A user cannot judge the differences in quality between competing products without trying them, so the website becomes a primary concern. Also, since we don't know whether the product supports the types of computers we have, we might be forced to test the site anyway. Finally, if the product has a price, making sure you can't break the security and get to the download system for free is important.
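As a rough illustration of the kind of website smoke check we had in mind, something like the sketch below could hit a handful of key product pages and flag anything that does not come back with HTTP 200. The URLs are hypothetical placeholders rather than the pages we actually looked at, and it assumes the requests package is installed.

```python
# Hypothetical website smoke check: flag key pages that don't return 200.
import requests

PAGES = [
    "https://www.techsmith.com/",               # vendor home page
    "https://www.techsmith.com/download.html",  # hypothetical download page
    "https://www.techsmith.com/store.html",     # hypothetical purchase page
]

def smoke_check(urls):
    for url in urls:
        try:
            status = requests.get(url, timeout=10).status_code
        except requests.RequestException as exc:
            status = "error: {}".format(exc)
        print("{0}  {1}".format(status, url))

if __name__ == "__main__":
    smoke_check(PAGES)
```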

-Load:

In discussing load testing, we considered recording multiple YouTube videos all playing at once, which will stress the hardware and give the recorder a great deal of variation to capture. It might also add a lot of audio channels to record, if audio is required. A TV-static screen will push the compression algorithm, since noise cannot be compressed well, so we might also test with that. Finally, we could try a slower system with little RAM and a slow hard disk, under heavy CPU and disk usage, to see if the recording fails.
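As a sketch of what "push the compression algorithm" could look like in practice, the snippet below paints full-screen random noise for the recorder to chew on; each frame is independent, so the encoder gets almost no redundancy to exploit. It assumes the numpy and opencv-python packages are installed, and the resolution and frame delay are arbitrary choices.

```python
# Display full-screen "TV static" to stress a screen recorder's encoder.
# Assumes numpy and opencv-python are installed; press Esc to quit.
import numpy as np
import cv2

WIDTH, HEIGHT = 1280, 720  # adjust to the area being recorded

cv2.namedWindow("static", cv2.WINDOW_NORMAL)
cv2.setWindowProperty("static", cv2.WND_PROP_FULLSCREEN, cv2.WINDOW_FULLSCREEN)

while True:
    # Every frame is fresh random noise, which compresses very poorly.
    frame = np.random.randint(0, 256, (HEIGHT, WIDTH, 3), dtype=np.uint8)
    cv2.imshow("static", frame)
    if cv2.waitKey(16) & 0xFF == 27:  # roughly 60 fps; Esc exits
        break

cv2.destroyAllWindows()
```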

-Mobile:

We will attempt to load the system on our relatively few mobile devices, if possible, and do some basic usage testing.

-Usability:

If it is not easy to do the basics of screen/video capture, then users may ask for their money back or go find a different product.

-Feature comparison:
http://en.wikipedia.org/wiki/Comparison_of_screencasting_software#Comparison_by_features
http://lifehacker.com/5839047/five-best-screencasting-or-screen-recording-tools

-Functionality:
• Save a file with a good and a bad file name.
• Record and capture on a per-app basis, full screen, and by selected area
• Two monitors vs. one monitor
• Editing if supported
• Sound if supported
• Pan and zoom if supported
• Arrows, Text, Captions, etc.
• Transitions
• Competitive Intel:
    o http://www.techsmith.com/tutorial-camtasia-8.html
    o http://www.techsmith.com/jing-features.html
    o http://www.telestream.net/screenflow/features.htm
• Long recordings (if possible)
• Upload Tools
• Formats supported
• Merging/Dividing recordings
• Tagging / describing the videos other than just the file name
• How easy is it to take the output and use it with another system? (e.g. I want to quickly take screenshots and videos from this tool and add them to a bug I created in JIRA; see the sketch after this list)
• Can you add your own voice to the recording? Like narrate what you are doing or what you expect via the microphone on the device you are using
• Can you turn this ability off so you don't hear Isaac swearing at the system or can you remove the swearing track after the fact?
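For the JIRA item above, a minimal sketch of what "easy" could look like is pushing a capture straight onto a bug through JIRA's REST attachment endpoint. The server URL, issue key, credentials, and file name are hypothetical placeholders, and the sketch assumes the requests package is installed.

```python
# Hypothetical example: attach a Snagit capture to an existing JIRA issue
# via JIRA's REST API. All identifiers below are placeholders.
import requests

JIRA_URL = "https://jira.example.com"  # hypothetical JIRA instance
ISSUE_KEY = "BUG-123"                  # hypothetical issue key
AUTH = ("username", "password")        # replace with real credentials

def attach_capture(path):
    """Upload a screenshot or video file as an attachment on the issue."""
    with open(path, "rb") as f:
        resp = requests.post(
            "{0}/rest/api/2/issue/{1}/attachments".format(JIRA_URL, ISSUE_KEY),
            auth=AUTH,
            # JIRA requires this header to allow attachment uploads.
            headers={"X-Atlassian-Token": "no-check"},
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    attach_capture("capture.png")  # hypothetical output file from Snagit
```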

Questions:
- Who are the stakeholders for this testing?
- What are the requirements? What is the minimum viable feature set?
- What is the goal of this testing (Important bugs, release decision, support costs, lawsuits, etc.)?
- Can you please give 3 major user scenarios?
- What is the typical user like? Advanced? Beginner?
- What are the top 3 things the stakeholders care about (e.g., usability, security)?
- Can you give an example of the #1 competitor?
- What sorts of problems do you expect to see?
- Does the SUT require multi-system support? What systems need support (e.g., mobile, PC, Mac)? Are there any special features limited to certain browsers/OSes? How many versions back are supported?
- What is the purpose of the product? Do you have a vision or mission statement?
- Can you describe the performance profile?
- Is there any documentation that we should review?
- What languages need to be supported?
- Does the application call home (external servers)? Are debug logs available to us? Even if not, is anything sensitive recorded there? Are they stored in a secure location, either locally or externally?
- Do we know what sorts of networks our typical user will use to download the app with? Do we know what the average patch size is?
- Are there other possible configurations that might need to be tested?
- How does the product make money? What is the business case rather than the customer case?
- Should we/can we do any white box testing?
- Can we get access to a developer to ask questions regarding the internals of the system, code coverage, etc.?






