97.7k chunk size [Apologies in advance if this isn't the right place to mention this, but...] Acquisition 0.67, a Mac OS X Gnutella client, implements the new "chunk" download method where files are downloaded in 97.7k segments. I understand this is a new Gnutella feature (or possibly just a LimeWire feature?) designed to improve the robustness of the download system. Unfortunately, it does not work very well yet. Most of the time, it results in the client downloading 97.7k worth of data, disconnecting, and then attempting to reconnect to continue the download. Needless to say, this prevents the connection from ramping up to full speed, and it also frequently results in someone else taking the "free" download slot before your client can reconnect. What is the justification for breaking downloads into these small chunks, and are there any plans to address the serious problems caused by this method of handling downloads?
Re: 97.7k chunk size This is also used in BearShare and is called 'Partial Content'. The advantages are that the chunks are smaller for the client and the download will complete faster. It also rotates the download connections. Say you're downloading a file that has 100 sources; your client may only be able to download from 16 of them at a time. What this does is allow the client to rotate the hosts it is downloading from. Hopefully this isn't too wordy! :)
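(For anyone wondering what 'Partial Content' actually means on the wire: the sketch below issues a single HTTP Range request for roughly the first 97.7k of a file. The host, port, and path are made-up placeholders, not anything from a real client; a server that really honors the range answers with status 206 and a Content-Range header.)
Code:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PartialContentSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL -- a real Gnutella download URL would point at another peer.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://peer.example.com:6346/get/123/example.mpg"))
                .header("Range", "bytes=0-99999")   // first ~97.7 KB of the file
                .build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());

        // A host that really supports HTTP/1.1 ranges answers 206 Partial Content
        // and says exactly which bytes it sent back.
        System.out.println("status: " + response.statusCode());   // expect 206
        System.out.println("content-range: " + response.headers().firstValue("Content-Range").orElse("(none)"));
        System.out.println("received " + response.body().length + " bytes");
    }
}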
Thanks for the reply, linuxrocks. However, in my experience, downloads do not complete faster with the "partial content" feature. (Like I said, most of them do not complete at all; they simply disconnect at 97.7k, never to resume.) In fact, I'm not sure how it could be faster, as I have a DSL connection that needs a solid, continuous connection in order to ramp up to full speed, and that just doesn't happen in the space of 97.7k. As for rotating downloads from multiple hosts, that might be great if you are downloading Eminem's latest, but most of the time the file I'm trying to download is only hosted by one source. When I've finally managed to track down a rare file, the last thing I want is to have my connection to the host severed mere seconds after it began. I'm not sure if the problem is that the client is actually sending a "disconnect" message after each 97.7k chunk, or if some hosts incorrectly interpret the end of a chunk as their cue to close the connection and offer up the download slot to someone else. Regardless, at this point this "feature" is causing far more harm than good, and I hope these problems will be addressed soon.
Look in your 'Console' or equivalent. The information you need should be there.
The smaller chunk size does not reduce download speed as much as the LimeWire/Acquisition UI tells you. (Try a network monitor or something and look at the overall downstream.) Downloads have been a lot more robust since the change, and LimeWire/Acquisition no longer creates large files containing mainly empty data. Corruption should be identified earlier and downloads overall should be more robust. This new feature might also come in very handy if tree hashes and partial file sharing are implemented one day - although partial file sharing is generally overestimated (except that it can be used to force users to share; in my experience, freeloaders are becoming rarer as connection speeds increase).
trap_jaw, Hmm... I'll grant you I may be deceived about the download speed by the UI. So perhaps you are right that the small chunk size will eventually be a good thing overall. But for the moment, it is a serious hindrance since it doesn't maintain a steady connection -- it downloads 97.7k and then disconnects, leaving your download aborted practically as soon as it has begun. Yes, it tries to reconnect, but 99% of the time that doesn't happen. So while this feature may be good for the network overall, it's terrible for the end users who are stuck with it (at least until the problems with the implementation are resolved).
This can only happen if you were downloading from a buggy client that claims to support HTTP/1.1 but does not really. I've seen that happen, too.
That's what I thought, but there are apparently an awful lot of buggy clients out there. It's difficult to know which ones are the culprits since that information is not available to Acquisition users, but I know for a fact that Shareaza doesn't support this (i.e., it boots Acq users after the first 97.7k). There are surely others, given the howls of outrage from users when this "feature" was introduced. Anyway, Dave Watanabe, the developer of Acquisition, has taken steps to try to fix this on his end, and it seems to be working better now -- so far.
chunks emulate Ethernet More specifically, it's trying to copy the success of the CSMA/CD nature of Ethernet -- instead of trying to establish a constant connection (with guaranteed throughput), experience has shown that it's better to have a "collision domain" where everybody competes for temporary access. If two requests collide, they both back off for a random amount of time and try again. As long as the signaling speed is fast and you don't have more than about 25 requestors, you get consistent transfer rates (60-70% of the theoretical maximum).
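(As a loose illustration of that back-off idea -- this is not code from any actual Gnutella client, and requestChunk() is a made-up placeholder -- a requester that finds the remote slot busy could wait a random, growing interval before trying again:)
Code:
import java.util.Random;

public class BackoffRetrySketch {
    private static final Random RANDOM = new Random();

    /**
     * Retry a request up to maxAttempts times, sleeping a random interval
     * between attempts, in the spirit of CSMA/CD's random back-off.
     * requestChunk() is hypothetical: true on success, false when the
     * remote slot is busy (a "collision").
     */
    static boolean downloadWithBackoff(int maxAttempts) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (requestChunk()) {
                return true;                      // got the chunk
            }
            // Back off: random delay that grows with the number of collisions so far.
            long delayMillis = RANDOM.nextInt(1000 * attempt) + 500;
            Thread.sleep(delayMillis);
        }
        return false;                             // gave up
    }

    // Placeholder standing in for "send a Range request and see if a slot is free".
    private static boolean requestChunk() {
        return RANDOM.nextInt(4) == 0;            // pretend 1 in 4 attempts succeeds
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("succeeded: " + downloadWithBackoff(10));
    }
}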
I think it's obvious from the experience of many users (not just me -- really, check the Acquisition forums) that this is probably okay for local networks, and maybe even okay for files that are commonly hosted by, well, a host of clients, but it is extremely bad for that most common of Gnutella situations, i.e., when many clients are after a file hosted by a tiny minority of hosts. Especially when the clients play by different rules. If you have one client that implements the chunk downloads (handing over its download spot to someone else after only a 97.7k transfer) versus another client that hangs on to the connection like a bulldog until it is complete, who do you think is going to end up with the successful download?
If a client really implements HTTP/1.1, it does not give your download slot to another downloader until you close the connection, and the connection is not closed between requests for different chunks. If you Acquisition users think it's not good, why not tell the guy who put it online to change it? It's really a trivial task. The chunk size is a constant in the ManagedDownloader class.
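(If anyone wants to try that, the edit would look roughly like the snippet below. The field name and exact value are my guesses, not a quote from the LimeWire source -- 100,000 bytes is simply what shows up as 97.7k in the UI.)
Code:
// Hypothetical excerpt -- the real constant in LimeWire's ManagedDownloader
// may have a different name or value; 100,000 bytes displays as roughly 97.7k.
public class ManagedDownloader {
    /** Size of each HTTP Range request, in bytes. A larger value means longer-lived connections. */
    private static final int CHUNK_SIZE = 100000;
    // ... rest of the downloader ...
}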
The idea with chunks is simply swarming downloads. When you divide the file into chunks, you can request the different chunks from different hosts and download the file simultaneously from different sources. That will speed up the transfer process. Dividing into chunks by itself will not help at all; it will actually decrease download efficiency. But when there are multiple sources available and the chunk size is reasonable, it will significantly increase download transfer rates for broadband users. But without Keep-Alive and HTTP pipelining, small chunks may have a huge negative effect on transfer rates.
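(To make the swarming idea concrete, here is a very rough sketch -- not taken from any real client -- of splitting a file into fixed-size chunks and handing them out round-robin to whatever sources are available. A real downloader would also reassign chunks when a host stalls or drops out.)
Code:
import java.util.ArrayList;
import java.util.List;

public class SwarmPlanSketch {
    /** A byte range [start, end] (inclusive) requested from one host. */
    record Assignment(String host, long start, long end) {}

    /**
     * Split a file of fileSize bytes into chunkSize pieces and assign them
     * round-robin to the given hosts.
     */
    static List<Assignment> plan(long fileSize, long chunkSize, List<String> hosts) {
        List<Assignment> plan = new ArrayList<>();
        int i = 0;
        for (long start = 0; start < fileSize; start += chunkSize) {
            long end = Math.min(start + chunkSize, fileSize) - 1;   // inclusive end offset
            plan.add(new Assignment(hosts.get(i % hosts.size()), start, end));
            i++;
        }
        return plan;
    }

    public static void main(String[] args) {
        // Example: a 1 MB file, 100,000-byte chunks, three made-up sources.
        for (Assignment a : plan(1_000_000, 100_000, List.of("hostA", "hostB", "hostC"))) {
            System.out.println(a.host() + " gets bytes " + a.start() + "-" + a.end());
        }
    }
}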
That was essentially my point -- dividing the file into chunks might work well under ideal circumstances, i.e., when downloading files that are hosted by large numbers of hosts, all of whom support the feature. Unfortunately, that simply doesn't reflect the reality of the Gnutella network as it stands today. The overwhelming majority of the files I might actually want to download are available from only one host at any given time (at best).
Downloading in chunks will not hurt performance significantly as long as the chunk size is reasonable and both ends support Keep-Alive and HTTP pipelining. Even if the other end does not support Keep-Alive or HTTP pipelining, download performance shouldn't suffer much as long as the downloader is using reasonable chunk sizes. If a client doesn't have support for at least Keep-Alive, it definitely should not download files in chunks. This is especially important when downloading files from firewalled hosts: establishing the connection through a PUSH has a relatively low success rate, so it is important not to disconnect from the host. I wouldn't call a fixed 97.7k chunk size reasonable. It results in way too many re-requests when downloading large files. Also, if the client doesn't support Keep-Alive (you said it loses the connection after downloading a chunk), it will definitely result in very poor download performance, even if there are a lot of reliable sources available. Keep-Alive is an essential feature when downloading files in chunks, and HTTP pipelining is highly recommended.
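(The sketch below shows what Keep-Alive buys you: two consecutive chunk requests sent over a single TCP connection instead of reconnecting for each one. The host, port, and path are placeholders, the header parsing is deliberately naive, and true pipelining would send the second request before the first response finishes -- this version keeps it sequential for readability.)
Code:
import java.io.DataInputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;

public class KeepAliveChunksSketch {
    // Request bytes [start, end] on an already-open connection and skip over the body,
    // leaving the connection ready for the next chunk.
    static void fetchChunk(Writer out, DataInputStream in, long start, long end) throws Exception {
        out.write("GET /get/123/example.mpg HTTP/1.1\r\n");
        out.write("Host: peer.example.com:6346\r\n");
        out.write("Range: bytes=" + start + "-" + end + "\r\n");
        out.write("Connection: Keep-Alive\r\n\r\n");
        out.flush();

        long contentLength = -1;
        String line;
        while ((line = readHeaderLine(in)).length() > 0) {
            if (line.toLowerCase().startsWith("content-length:")) {
                contentLength = Long.parseLong(line.substring(15).trim());
            }
        }
        if (contentLength > 0) {
            in.skipBytes((int) contentLength);     // consume the chunk body
        }
        System.out.println("got bytes " + start + "-" + end + " without reconnecting");
    }

    static String readHeaderLine(DataInputStream in) throws Exception {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("peer.example.com", 6346)) {   // placeholder host
            Writer out = new OutputStreamWriter(socket.getOutputStream(), "ISO-8859-1");
            DataInputStream in = new DataInputStream(socket.getInputStream());
            fetchChunk(out, in, 0, 99_999);          // first ~97.7k chunk
            fetchChunk(out, in, 100_000, 199_999);   // next chunk, same TCP connection
        }
    }
}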
Phex uses a different way of chunking. You can manually force chunking either with a manually entered chunk size, or with a relative size where you specify how many chunks you would like. Phex also supports an efficient automatic merge of two chunks in case the next chunk has not started downloading yet and the other host does not support Keep-Alive. This way no disconnection is needed. If you don't force manual chunking, Phex will chunk the file for you by itself the moment a new host connection is ready to upload the file to you. Of course the new Phex 0.8 release is also able to download from partial sources or get into a download queue to wait until the other host provides the file to you. Gregor
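(A rough sketch of the merge idea Gregor describes -- an illustration only, not Phex's actual code: if finishing the current chunk would force a disconnect because the uploader lacks Keep-Alive, and the following chunk hasn't started yet, extend the current request to cover both ranges.)
Code:
public class ChunkMergeSketch {
    /** A half-open byte range [start, end) with a flag for whether any data has arrived yet. */
    static class Chunk {
        long start, end;
        boolean started;
        Chunk(long start, long end) { this.start = start; this.end = end; }
    }

    /**
     * If the uploader does not support Keep-Alive (so finishing the current chunk
     * would mean dropping the connection) and the adjacent next chunk has not
     * started yet, fold the next chunk into the current one so the transfer can
     * continue on the same connection. Returns true if a merge happened.
     */
    static boolean maybeMerge(Chunk current, Chunk next, boolean hostSupportsKeepAlive) {
        if (!hostSupportsKeepAlive && !next.started && current.end == next.start) {
            current.end = next.end;      // the current request now covers both ranges
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Chunk a = new Chunk(0, 100_000);
        Chunk b = new Chunk(100_000, 200_000);
        System.out.println("merged: " + maybeMerge(a, b, false) + ", current now ends at " + a.end);
    }
}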
BearShare divides a file into chunks equal to 2.5% of the file size. Using this method, along with the necessary Keep-Alive support, makes it EASY to add additional sources to the download (they just request different same-sized chunks). You also don't have to make any funky estimates of the speed of the remote host in order to figure out how much to ask for (like LimeWire's original swarming implementation). I believe LimeWire has switched to the fixed-chunk size scheme I described above. Note "fixed chunk scheme" means that all chunks are equally sized (in BearShare's case, 2.5% of the file size). "Variable chunk scheme" means that for a single file, the size of chunks requested from different hosts might be different.
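(As a quick worked example of the percentage scheme described above -- the method below is just an illustration, not BearShare's code: 2.5% of the file size means every file is split into 40 equal chunks, so a 4 MB MP3 gets 100 KB chunks while a 200 MB video gets 5 MB chunks.)
Code:
public class PercentChunkSketch {
    /** Chunk size as a fixed fraction of the file size -- 2.5% here, per the scheme described above. */
    static long chunkSize(long fileSize) {
        return Math.max(fileSize / 40, 1);    // 1/40 = 2.5%; never smaller than one byte
    }

    public static void main(String[] args) {
        // A percentage-based scheme always yields 40 equally sized chunks, whatever the file size:
        System.out.println("4 MB MP3     -> " + chunkSize(4_000_000L) + " byte chunks");     // 100,000
        System.out.println("200 MB video -> " + chunkSize(200_000_000L) + " byte chunks");   // 5,000,000
    }
}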
I agree, this is a very harmful change... at least when downloading a large file from only one or two sources. I think the developers forget that a file that returns 200 sources really means that maybe 3 sources exist that can be connected to. The others are busy, long gone (but their data persists), or have such large queues that it would take days to get an active slot. Too many times have I seen (queued: slot 97 of 100)-type messages. If I leave a query running during the download, I may eventually turn up 200 or 300 sources, but Gnutella says that only 22 of them are alive, and all of them are busy, and we'll retry them later (and get another busy, and so on).

The congestion on the Gnet is abysmal, and is getting rapidly worse. It looks like way too many people are downloading but not sharing what they download, or what they have of their own content. Yet as far as I know, not a single software developer group has made any effort to give guidelines on how many upload vs. download slots should be active, and what the real consequences will be if these guidelines are ignored. In fact, it's pretty clear that the people who write the software have not tried to really use it the way the typical user does, or they'd realize that mostly, Gnet doesn't work for large files. It probably works OK for the popular MP3 junk out there, but for multi-hundred-MB files, too many things prevent the user from ever getting a good download.

I'm not pointing at any specific client, as it's the network/users themselves that are causing the problems. Too many must be greedy of bandwidth, and just won't share. So along comes joe_user, and he wants the entire contents of some current music disk, so he searches for the files, selects them all for download, then goes back to browsing. He notices that the throughput is terrible with all this sharing going on, so he turns off his uploads. Multiply this by about 50% of the users, and you can see why people are irritated. Yet there has been a distinct lack of noise from the powers that be in Gnutella development about this, or about what to do to fix it, other than quiet 'don't forget to share' messages in their posts.

We really need every single Gnet client group to post a strong recommendation of the upload/download ratios, and an explanation of what will happen if those guidelines are ignored. The average user has no idea what effect his settings will have on the network, and generally doesn't care, or even think about it much, because no one ever bothered to tell him it matters, and matters a lot. With our current situation, and the clients that do disconnect after 97K transfers, we can't get files anymore. I've gone back to IRC to get my anime files, because I can download an entire 200MB video clip in about 45 minutes, and it's a direct connection to a fast server. I have never (hear me, guys? NEVER) been able to download a complete anime video on Gnet. FastTrack works, but is also getting much slower, just like Gnet. However, I think Gnet has more users, and is suffering more from the situation because of this.

Seriously, I've found numerous posts to the Gnutella forums asking for guidelines on setting up the connection ratios, and not one single person got a direct and satisfactory answer. It's pretty clear to the users that no one cares enough to answer these questions, so they set them to the defaults, or tweak them as they please. To fix this mess, we really need more open slots than requests, or it will never dig its way out.

And unless the authoritative people address the situation in a clear and understandable way, Gnet will deteriorate into an unusable mess, with the exception of the 2-4MB MP3 file with 10,000 active sources. Do we want Gnet to become this? I sure don't, but it's headed that way very quickly. And everyone (except the MP3 users, who don't really see the problem) pretends that nothing is wrong, and that it's the individual user's setup, or their poor connection to the net, or something else that is wrong. We need to stop ignoring the problem, and do something about it.

I leave my system on for hours to make my files available, but I think I am the exception. I have more than one computer, and can easily leave a system up to share anime. It's even legal content, unless the episodes have been released for US television, which luckily is almost none of them. But a lot of people want them, as my systems are always completely filled up with requests, and my queue can get pretty deep too. If the 'industry' won't get together and take some action about this, the net will become more and more unusable for the average user. God help the poor fellow with a 56K modem. Unless he finds a source like my system, he's hosed if he wants anything other than a tiny file.

We can't do anything about people not sharing, or turning their systems off when they are finished downloading their own data. But we need to advise the people who are trying to make the net 'work' that they need to offer more upload slots than downloads at any one time, or they will never see reasonable throughput and reasonable counts of active sources. The situation is so bad that I'd guess we need three or four times as many uploads as downloads in order to offset the people who just won't share. If people throttle their upload bandwidth, fine, but make the slots available. If enough do that, they will get fine aggregate throughput, and no one will suffer as a result. It would be a lot better if they all offered a large number of slots, and as much throughput as they can afford to give. But getting the message out, so that the current trend which is killing the network can start to turn around, is the responsibility of the 'industry' leaders. And they have ducked the issue for too long now.

Sorry for the soapbox, but the irritation is rising to a level where I had to say something. It's clear others feel the same way. Will someone step up and make the effort to address this, or will we continue to ignore the problems? If I had the equipment to measure the status of the net, I'd publish the results everywhere I could in order to get someone to address this. I'm sure some of the development groups do have the equipment and knowledge to do that, but they are too busy to bother with what's really happening on our network. I think they have their priorities wrong. And I think that within another year, Gnet will be useless if this trend continues. There needs to be an over-adjustment in the opposite direction, or we will never overcome the problems caused by the careless or ignorant user.

--Rockkeys
Uh, thanks for reviving this long-dead thread? BTW, my experience has been that I see enough 'weird' stuff from the people downloading from me that I believe the RIAA/MPAA/whoever is sabotaging the Gnet. Don't ask for my examples, please. Anyway, the problems you cite re: non-existent sources etc. are -- I believe -- being intentionally created by those same organizations. Or maybe it's just incompetence that causes people to run Ultrapeers that never delete old listings?